How to Analyze Tweet Sentiments with PHP Machine Learning

syndicated from www.sitepoint.com on July 7, 2017

As of late, it seems everyone and their proverbial grandma is talking about Machine Learning. Your social media feeds are inundated with posts about ML, Python, TensorFlow, Spark, Scala, Go and so on; and if you are anything like me, you might be wondering, what about PHP?

Yes, what about Machine Learning and PHP? Fortunately, someone was crazy enough not only to ask that question, but also to develop a generic machine learning library that we can use in our next project. In this post we are going to take a look at PHP-ML - a machine learning library for PHP - and we'll write a sentiment analysis class that we can later reuse for our own chat or tweet bot. The main goals of this post are:

  • Explore the general concepts around Machine learning and Sentiment Analysis
  • Review the capabilities and shortcomings of PHP-ML
  • Define the problem we are going to work on
  • Prove that trying to do Machine learning in PHP isn't a completely crazy goal (optional)

[Image: a robot elephpant]

What is Machine Learning?

Machine learning is a subset of Artificial Intelligence that focuses on giving "computers the ability to learn without being explicitly programmed". This is achieved by using generic algorithms that can "learn" from a particular set of data.

For example, one common usage of machine learning is classification. Classification algorithms are used to put data into different groups or categories. Some examples of classification applications are:

  • Email spam filters
  • Market segmentation
  • Fraud detection

Machine learning is something of an umbrella term that covers many generic algorithms for different tasks. These algorithms fall into two main types, depending on how they learn: supervised learning and unsupervised learning.

Supervised Learning

In supervised learning, we train our algorithm using labelled data in the form of an input object (vector) and a desired output value; the algorithm analyzes the training data and produces what is referred to as an inferred function which we can apply to a new, unlabelled dataset.
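To make the idea of an "inferred function" a little more concrete, here is a minimal sketch in plain PHP. It does not use PHP-ML; the data, the `trainNearestNeighbour` function, and the choice of a 1-nearest-neighbour rule are assumptions made purely for illustration. Each sample is a small vector, each label is the desired output, and training returns a closure we can apply to new, unlabelled vectors.

```php
<?php
// Hypothetical labelled data: each vector is
// [count of positive words, count of negative words] in a message.
$samples = [[3, 0], [2, 1], [0, 3], [1, 4]];
$labels  = ['positive', 'positive', 'negative', 'negative'];

/**
 * "Train" a 1-nearest-neighbour classifier (an illustrative choice).
 * Returns a closure that plays the role of the inferred function:
 * it labels a new vector with the label of the closest training sample.
 */
function trainNearestNeighbour(array $samples, array $labels): callable
{
    return function (array $input) use ($samples, $labels): string {
        $bestLabel = '';
        $bestDistance = INF;
        foreach ($samples as $i => $sample) {
            // Squared Euclidean distance between the input and this sample.
            $distance = 0;
            foreach ($sample as $j => $value) {
                $distance += ($value - $input[$j]) ** 2;
            }
            if ($distance < $bestDistance) {
                $bestDistance = $distance;
                $bestLabel = $labels[$i];
            }
        }
        return $bestLabel;
    };
}

$predict = trainNearestNeighbour($samples, $labels);

echo $predict([4, 1]), PHP_EOL; // positive
echo $predict([0, 2]), PHP_EOL; // negative
```

A real classifier would use far more data and a better model, but the shape is the same: labelled vectors go in, a reusable prediction function comes out.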

For the remainder of this post we will focus on supervised learning, simply because it's easier to see and validate the relationship between input and output. Keep in mind that both approaches are equally important and interesting; one could argue that unsupervised learning is more useful because it removes the need for labelled data.

Unsupervised Learning

This type of learning on the other hand works with unlabelled data from the get-go. We don't know the desired output values of the dataset and we are letting the algorithm draw inferences from datasets; unsupervised learning is especially handy when doing exploratory data analysis to find hidden patterns in the data.
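As a rough illustration of letting an algorithm find structure on its own, the sketch below groups unlabelled numbers into two clusters using a tiny one-dimensional k-means loop in plain PHP. It does not use PHP-ML; the data, the `kMeans1d` function name, and the choice of k-means are assumptions for the sake of the example.

```php
<?php
// Hypothetical unlabelled data, e.g. the number of words in each message.
$points = [2, 3, 4, 20, 22, 25];

/**
 * A tiny one-dimensional k-means sketch.
 * $centroids holds the starting guess for each cluster centre
 * (a plain list indexed from 0); the function returns the points
 * grouped by the centroid they end up closest to.
 */
function kMeans1d(array $points, array $centroids, int $iterations = 10): array
{
    $clusters = [];
    for ($step = 0; $step < $iterations; $step++) {
        // Assignment step: put every point in the cluster of its nearest centroid.
        $clusters = array_fill(0, count($centroids), []);
        foreach ($points as $point) {
            $nearest = 0;
            foreach ($centroids as $index => $centroid) {
                if (abs($point - $centroid) < abs($point - $centroids[$nearest])) {
                    $nearest = $index;
                }
            }
            $clusters[$nearest][] = $point;
        }
        // Update step: move each centroid to the mean of its members.
        foreach ($clusters as $index => $members) {
            if (count($members) > 0) {
                $centroids[$index] = array_sum($members) / count($members);
            }
        }
    }
    return $clusters;
}

// Start with the smallest and largest values as the two initial centroids.
$clusters = kMeans1d($points, [min($points), max($points)]);
print_r($clusters);
```

Nobody told the algorithm which messages are "short" or "long"; the two groups emerge from the data alone, which is the kind of hidden pattern exploratory analysis looks for.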

PHP-ML

Meet PHP-ML, a library that claims to be a fresh approach to Machine Learning in PHP. The library implements machine learning algorithms, neural networks, and tools for data pre-processing, cross validation, and feature extraction.

I'll be the first to admit PHP is an unusual choice for machine learning, as the language's strengths are not that well suited for Machine Learning applications. That said, not every machine learning application needs to process petabytes of data and do massive calculations - for simple applications, we should be able to get away with using PHP and PHP-ML.

The best use case that I can see for this library right now is the implementation of a classifier, be it something like a spam filter or even sentiment analysis. We are going to define a classification problem and build a solution step by step to see how we can use PHP-ML in our projects.

The Problem

To illustrate the process of using PHP-ML and adding some machine learning to our applications, I wanted to find a fun problem to tackle, and what better way to showcase a classifier than building a tweet sentiment analysis class?

One of the key requirements needed to build successful machine learning projects is a decent starting dataset. Datasets are critical since they will allow us to train our classifier against already classified examples. As there has recently been significant noise in the media around airlines, what better dataset to use than tweets from customers to airlines?

Fortunately, a dataset of tweets is already available to us thanks to Kaggle. The Twitter US Airline Sentiment dataset can be downloaded from their site.

The Solution

Let's begin by taking a look at the dataset we will be working on. The raw dataset has the following columns:

Continue reading: How to Analyze Tweet Sentiments with PHP Machine Learning