

Support Vector Machine and its Applications in Classification Problems
A slecture by Xing Liu, partially based on the ECE662 Spring 2014 lecture material of Prof. Mireille Boutin.



Outline of the slecture

  • Background in Linear Classifiers
  • Support Vector Machine
  • Summary
  • References


Background in Linear Classifiers

    In this section, we introduce the framework and the basic idea of the linear classification problem.

    In a linear classification problem, the feature space can be divided into different regions by hyperplanes. In this lecture, we take the two-category case as an illustration. Given training samples $ \textbf{y}_1,\textbf{y}_2,...\textbf{y}_n \in \mathbb{R}^p $, each $ \textbf{y}_i $ is a p-dimensional vector and belongs to either class $ w_1 $ or $ w_2 $. The goal is to find a hyperplane that separates the points in the feature space that belong to class $ w_1 $ from those that belong to class $ w_2 $. The discriminant function can be written as

$ g(\textbf{y}) = \textbf{c}\cdot\textbf{y} $

    We want to find $ \textbf{c}\in\mathbb{R}^{p} $ so that a test data point $ \textbf{y} $ is labelled

$ {w_1} $ if $ \textbf{c}\cdot\textbf{y}>0 $
$ {w_2} $ if $ \textbf{c}\cdot\textbf{y}<0 $
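
    As a concrete illustration, the decision rule can be sketched in a few lines of Python/NumPy (a minimal sketch; the values of c and y below are made up for illustration, not taken from the lecture):

    import numpy as np

    def classify(c, y):
        # Label a sample by the sign of the discriminant g(y) = c . y
        return "w1" if np.dot(c, y) > 0 else "w2"

    c = np.array([1.0, -2.0])   # hypothetical weight vector
    y = np.array([3.0, 0.5])    # hypothetical test sample
    print(classify(c, y))       # g(y) = 1*3 - 2*0.5 = 2 > 0, so "w1"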

    We can apply a trick here: replace every $ \textbf{y} $ in class $ w_2 $ by $ -\textbf{y} $; then the above task is equivalent to looking for $ \textbf{c} $ such that

$ \textbf{c}\cdot \textbf{y}>0, \forall \textbf{y} \in $ the new sample space.
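
    The sign-flip trick itself is a one-liner; the sketch below (with hypothetical arrays Y1 and Y2 standing in for the two classes' samples) builds the new sample space and checks a candidate c:

    import numpy as np

    Y1 = np.array([[2.0, 1.0], [1.5, 2.0]])      # hypothetical class-w1 samples
    Y2 = np.array([[-1.0, -2.0], [-2.0, -0.5]])  # hypothetical class-w2 samples

    # Replace every y in class w2 by -y; in the new sample space a
    # separating c must satisfy c . y > 0 for every row of Y.
    Y = np.vstack([Y1, -Y2])
    c = np.array([1.0, 1.0])                     # hypothetical candidate
    print(np.all(Y @ c > 0))                     # True -> c separates the data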


    You might have already observed the ambiguity of $ \textbf{c} $ in the above discussion: if $ \textbf{c} $ separates the data, then $ \lambda\textbf{c} $ (for any $ \lambda>0 $) also separates the data. One solution is to set $ ||\textbf{c}||=1 $. Another is to introduce the concept of a "margin," which we denote by $ b $, and require

$ \textbf{c}\cdot\textbf{y}\geqslant b > 0, \forall \textbf{y} $.

    In this scenario, the hyperplane is defined by $ \textbf{c}\cdot \textbf{y} = b $. The vector $ \textbf{c} $ is the normal of the plane, and every sample lies on its positive side. With $ b_i = \textbf{c}\cdot\textbf{y}_i $, the ratio $ \frac{b_i}{||\textbf{c}||} $ is the distance from the point $ \textbf{y}_i $ to the plane $ \textbf{c}\cdot\textbf{y} = 0 $.
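
    The distance formula can be checked numerically (again a hypothetical sketch; c and y_i are illustrative values):

    import numpy as np

    c = np.array([3.0, 4.0])        # hypothetical normal vector, ||c|| = 5
    y_i = np.array([2.0, 1.0])      # a hypothetical sample
    b_i = np.dot(c, y_i)            # margin achieved by y_i: 3*2 + 4*1 = 10
    print(b_i / np.linalg.norm(c))  # distance to the plane c . y = 0 -> 2.0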


    The above approach is equivalent to solving the linear system

$ \textbf{Y}\textbf{c} = \begin{bmatrix} b_1\\b_2\\...\\b_n\end{bmatrix} $

where $ \textbf{Y} =\begin{bmatrix} \textbf{y}_1^T \\ \textbf{y}_2^T \\ ... \\ \textbf{y}_n^T \end{bmatrix} $ is the $ n\times p $ matrix whose rows are the (normalized) training samples.


    In most cases, when $ n>p $ the system is overdetermined and an exact solution for $ \textbf{c} $ generally does not exist. An alternative approach is to find the $ \textbf{c} $ that minimizes a criterion function $ J(\textbf{c}) $. There are various choices of criterion function. For example, we can minimize the squared error between $ \textbf{Y}\textbf{c} $ and $ \textbf{b} $, so the criterion function can be defined as

$ J(\textbf{c}) = ||\textbf{Y}\textbf{c}-\textbf{b}||^2 $
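
This criterion is minimized by a standard least-squares step: setting the gradient $ \nabla J(\textbf{c}) = 2\textbf{Y}^T(\textbf{Y}\textbf{c}-\textbf{b}) = 0 $ gives the normal equations $ \textbf{Y}^T\textbf{Y}\textbf{c} = \textbf{Y}^T\textbf{b} $, from which the closed form below follows.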

The solution to the above problem is

$ \textbf{c} = (\textbf{Y}^T\textbf{Y})^{-1}\textbf{Y}^T\textbf{b} $

if $ \det(\textbf{Y}^T\textbf{Y})\neq 0 $;

otherwise, the solution is defined more generally by

$ \textbf{c} = \lim_{\epsilon \to 0}(\textbf{Y}^T\textbf{Y}+\epsilon\textbf{I})^{-1}\textbf{Y}^T\textbf{b} $.

This is an MSE (minimum squared error) solution to $ \textbf{Y}\textbf{c} = \textbf{b} $, and it always exists.
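
    The whole MSE pipeline fits in a few lines of Python/NumPy (the data are hypothetical; np.linalg.pinv is used for the singular case, since the Moore-Penrose pseudoinverse equals the $ \epsilon \to 0 $ limit above):

    import numpy as np

    # Hypothetical normalized training matrix (class-w2 rows already negated)
    # and the common margin choice b_i = 1 for every sample.
    Y = np.array([[2.0, 1.0],
                  [1.5, 2.0],
                  [1.0, 2.0],
                  [2.0, 0.5]])
    b = np.ones(len(Y))

    YtY = Y.T @ Y
    if np.linalg.det(YtY) != 0:
        c = np.linalg.inv(YtY) @ Y.T @ b  # c = (Y^T Y)^{-1} Y^T b
    else:
        c = np.linalg.pinv(Y) @ b         # limit solution when Y^T Y is singular

    print(c)      # MSE weight vector
    print(Y @ c)  # approximately the target margins b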

Support Vector Machine

Support vector machines are an example of a linear two-class classifier.
