January 17, 2014

Linear Separability

In geometry, a set of points in a two-dimensional plane is linearly separable if we can divide it into two groups using a single straight line. In n dimensions, the separating boundary is a hyperplane. In machine learning, we use this property to classify data. But first, let's understand what we mean by learning here.

Learning vs Memorising

The Oxford dictionary defines learning as "the acquisition of knowledge or skills through study, experience, or being taught". Memorising is the ability to recall whatever we have seen before. The key difference is that if we learn correctly, we can answer questions we haven't seen before. While learning, we generalise from the things we have seen and apply that knowledge to new things. Achieving 100% memorisation in computers is easy; it is just a matter of how quickly we can look up what we have already seen and return the answer. The challenge lies in making the computer learn. To achieve that, we will not define explicit rules for a particular problem; instead, we will build a model that gives the computer the ability to generalise from the examples we provide. The key to learning is the ability to generalise.

Linear separability

Let's take the following example. For now, let's ignore how we convert our problem into Cartesian coordinates.
The problem: when a new point arrives, say the green cross in the picture below, which category do we put it into? Does it belong to the blue cluster or the red cluster?
Most of us will agree that it belongs to the red cluster, because it is close to it. What we do intuitively is draw a line between the two clusters and decide based on which side of that line the point falls.


In two dimensions this boundary is a line; in higher dimensions it is a hyperplane.
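As a minimal sketch of this intuition, suppose we have already drawn a line ax + by + c = 0 between the two clusters (the coefficients and the test point below are made up for illustration). We classify a new point by the sign of ax + by + c:

    # Decide which side of a line a point falls on.
    # The line is a*x + b*y + c = 0; these coefficients are illustrative only.
    a, b, c = 1.0, -1.0, 0.5

    def side(x, y):
        """Return 'red' if the point is on the positive side of the line, else 'blue'."""
        return "red" if a * x + b * y + c > 0 else "blue"

    # The new (green cross) point from the example -- coordinates are assumed.
    print(side(2.0, 1.0))  # -> 'red'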




If the data can be separated by a hyperplane (a straight line in two dimensions), we call the data linearly separable. What we have just described is a simple classifier that divides the data into two classes. How do we handle more than two classes?
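A minimal sketch of such a two-class linear classifier, using scikit-learn (which is linked in the references); the cluster centres and the test point are made up for illustration:

    import numpy as np
    from sklearn.svm import LinearSVC

    # Two linearly separable clusters: blue around (0, 0), red around (4, 4).
    rng = np.random.RandomState(0)
    blue = rng.randn(20, 2) + [0, 0]
    red = rng.randn(20, 2) + [4, 4]
    X = np.vstack([blue, red])
    y = [0] * 20 + [1] * 20  # 0 = blue, 1 = red

    clf = LinearSVC().fit(X, y)       # learns a separating line w.x + b = 0
    print(clf.coef_, clf.intercept_)  # the learned line
    print(clf.predict([[3.0, 3.5]]))  # the new point falls on the red side -> [1]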

Multi-class classification

Once we have an algorithm to classify data into two classes, we can generalise it to multiple classes. There are two common approaches.
  • One vs Rest (one vs all): here we build n classifiers, i.e. n hyperplanes. To check whether a point belongs to class1, we draw a hyperplane between class1 and the rest of the classes (all other classes merged into one). We repeat this for every class.
  • One vs One: this is more involved but useful for certain kinds of problems. Here we build n * (n - 1) / 2 classifiers; for 4 classes that is 6 classifiers. We compare class1 vs class2, class1 vs class3, class1 vs class4, class2 vs class3, class2 vs class4 and class3 vs class4, and assign the point to whichever class wins the most pairwise contests (see the sketch after this list).
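A minimal sketch of both strategies using scikit-learn's multiclass wrappers (the scikit-learn multiclass page is in the references); the three-cluster toy data and the test points are made up for illustration:

    import numpy as np
    from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
    from sklearn.svm import LinearSVC

    # Three small clusters, one per class -- purely illustrative data.
    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(15, 2) + c for c in ([0, 0], [5, 0], [0, 5])])
    y = [0] * 15 + [1] * 15 + [2] * 15

    ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)  # 3 classifiers, one per class
    ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)   # 3 * 2 / 2 = 3 pairwise classifiers

    print(ovr.predict([[4.5, 0.5]]))  # -> [1]
    print(ovo.predict([[0.5, 4.5]]))  # -> [2]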

Revisiting Linear separability

Not all problems are linearly separable. Take the following example.


We can't draw any straight line that separates these two classes; the only way is to draw an ellipse. These kinds of problems are solved in two ways. The first, as mentioned, is to draw an ellipse instead of a line. In other words, we use a non-linear function to classify the data.
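As a minimal sketch of such a non-linear decision rule, assume the inner class lies inside an axis-aligned ellipse centred at the origin (the semi-axes and test points below are made up):

    # Non-linear classifier: which side of the ellipse x^2/a^2 + y^2/b^2 = 1 is the point on?
    # The semi-axes a and b are illustrative values only.
    a, b = 2.0, 1.0

    def classify(x, y):
        """Return 'inner' if the point falls inside the ellipse, else 'outer'."""
        return "inner" if (x / a) ** 2 + (y / b) ** 2 <= 1 else "outer"

    print(classify(0.5, 0.2))  # -> 'inner'
    print(classify(3.0, 2.0))  # -> 'outer'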


The other way (e.g. the kernel trick in SVMs) is to project the data into a higher dimension and check whether it is linearly separable there. If we project the above data into a third dimension, we see it as:



Clearly, the above data is linearly separable using a hyperplane. In the next blog post, we will see how we can use this for sentiment analysis.
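A minimal sketch of such a projection, assuming the two classes form rings around the origin (the points below are made up): adding x^2 + y^2 as a third coordinate lifts the inner ring low and the outer ring high, so a flat plane can separate them.

    import numpy as np

    def lift(points):
        """Project 2-D points (x, y) into 3-D as (x, y, x^2 + y^2)."""
        points = np.asarray(points)
        r2 = (points ** 2).sum(axis=1, keepdims=True)
        return np.hstack([points, r2])

    # Illustrative points: 'inner' class near the origin, 'outer' class on a ring of radius 3.
    inner = [[0.2, 0.1], [-0.3, 0.4]]
    outer = [[3.0, 0.0], [0.0, -3.0]]

    # In 3-D, the plane z = 2 separates the two classes.
    print(lift(inner)[:, 2])  # small z values, below 2
    print(lift(outer)[:, 2])  # large z values, above 2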

References

  • Linear separability: http://en.wikipedia.org/wiki/Linear_separability
  • SVM example: http://stackoverflow.com/questions/9480605/what-is-the-relation-between-the-number-of-support-vectors-and-training-data-and
  • Multi-class classification: http://scikit-learn.org/stable/modules/multiclass.html
  • Naive Bayes classifier: http://www.statsoft.com/textbook/naive-bayes-classifier/
