In geometry, a two dimensional plane is linearly separable if we can divide the plane using a line. In case of n-dimensions it is a hyperplane. In Machine learning, we use this property to classify the data. Lets first understand what do we mean by Learning here ?.

Lets take following example. For now lets ignore how we will convert out problem to cartesian coordinates.

The problem is if a new point comes, consider the green cross in below picture. In what category we will put it into ?. Is it belongs to blue cluster or red cluster ?.

Most of us will agree that it belongs to red cluster. Because it is close to it. What we do intuitively is we draw a link between these two clusters and we will decide based on what side it is falling into.

For two dimension it is a line and for multi-dimensions it is a hyperplane.

If the data can be separated using a hyperplane (straight line in case of two dimension) then we call the data linearly separable. What we defined now is a simple algorithm to classify the data into two classes. We defined a simple classifier now that can divide the data into two classes. How can we do that if we have more than one class ?.

###
Multi-class classification

Once we have an algorithm to classify the data into two classes, we can generalise it to multiple classes. There are two approaches for doing that.

- One vs Rest (one vs all): Here we will have n classifiers. Meaning we will have n hyperplanes. First we will check whether it belongs to class1. We will check this by drawing the hyperplane between class1 and rest of the classes (we will mix all other classes into one). We will repeat this for all the classes.
- One vs One: This is more complicated but is useful for certain class of problems. Here we will have (n * (n - 1)/2) classifiers. For 4 classes we will have 6 classifiers. We will compare class1 vs class2, class1 vs class3, class1 vs class4, class2 vs class3, class2 vs class4, class4 vs class4. Whichever class comes more times we will classify it as that class.

###
Revisiting Linear separability

Not all problems are linearly separable. Take following example.

We can't draw any line that separates these two classes. Only way is to draw an ellipse. These kind of problems are solved into two ways. As i said before, draw an ellipse instead of line. Meaning, we are using non-linear function to classify the data.

The other way (ex. kernel trick in svm) is to project the data to higher dimension and check whether it is linearly separable. If we project above data into 3rd dimension we will see it as,

Clearly, above data is linearly separable using a hyperplane. In the next blog, we will see how we can use this for sentiment analysis.

###
References

- http://en.wikipedia.org/wiki/Linear_separability
- SVM Example: http://stackoverflow.com/questions/9480605/what-is-the-relation-between-the-number-of-support-vectors-and-training-data-and
- Multi-class classification: http://scikit-learn.org/stable/modules/multiclass.html
- http://www.statsoft.com/textbook/naive-bayes-classifier/