Line 16: Line 16:
  
 
== Background in Linear Classification Problem==
 
== Background in Linear Classification Problem==
In this section, we will introduce the framework and basic idea of linear classification problem.  
+
    In this section, we will introduce the framework and basic idea of linear classification problem.  
  
In a linear classification problem, the feature space can be divided into different regions by hyperplanes. In this lecture, we will take a two-catagory case to illustrate. Given training samples <math> \vec{y}_1,\vec{y}_2,...\vec{y}_n \in \mathbb{R}^p</math>, each <math> \vec{y}_i </math> is a p-dimensional vector and belongs to either class <math> w_1</math> or <math>w_2</math>. The goal is to find the maximum-margin hyperplane that separate the points in the feature space that belong to class <math>w_1</math> from those belong to class<math>w_2</math>. The discriminate function can be written as  
+
&nbsp; &nbsp; In a linear classification problem, the feature space can be divided into different regions by hyperplanes. In this lecture, we will take a two-catagory case to illustrate. Given training samples <math> \textbf{y}_1,\textbf{y}_2,...\textbf{y}_n \in \mathbb{R}^p</math>, each <math> \textbf{y}_i </math> is a p-dimensional vector and belongs to either class <math> w_1</math> or <math>w_2</math>. The goal is to find the maximum-margin hyperplane that separate the points in the feature space that belong to class <math>w_1</math> from those that belong to class <math>w_2</math>. The discriminate function can be written as  
  
<center><math> g(\vec{y}) = \vec{c}\cdot\vec{y}</math> </center>
+
<center><math> g(\textbf{y}) = \textbf{c}\cdot\textbf{y}</math> </center>
  
We want to find <math>\vec{c}\in\mathbb{R}^{n+1}</math> so that a testing data point <math>\vec{y}_i</math> is labelled  
+
&nbsp; &nbsp; We want to find <math>\textbf{c}\in\mathbb{R}^{n+1}</math> so that a testing data point <math>\textbf{y}_i</math> is labelled  
  
<center><math> {w_1} </math>  if  <math> \vec{c}\cdot\vec{y}>0</math> </center>
+
<center><math> {w_1} </math>  if  <math> \textbf{c}\cdot\textbf{y}>0</math> </center>
  
<center><math> {w_2} </math>  if  <math> \vec{c}\cdot\vec{y}<0</math> </center>
+
<center><math> {w_2} </math>  if  <math> \textbf{c}\cdot\textbf{y}<0</math> </center>
  
We can apply a trick here to replace all <math>\vec{y}</math>'s in class <math>w_2</math> by <math>-\vec{y}</math>, then the task is looking for <math>\vec{c}</math> so that  
+
&nbsp; &nbsp; We can apply a trick here to replace all <math>\textbf{y}</math>'s in class <math>w_2</math> by <math>-\textbf{y}</math>, then the task is looking for <math>\textbf{c}</math> so that  
  
<center><math>\vec{c}\cdot \vec{y}>0, \forall \vec{y} \in </math>new sample space. </center>
+
<center><math>\textbf{c}\cdot \textbf{y}>0, \forall \textbf{y} \in </math>new sample space. </center>
  
Then hyperplane through origin is defined by <math>\vec{c}\cdot \vec{y} = 0</math>, where \vec{c} is the normal of the plane lying on the positive side of every hyperplane.
 
  
You might have already observe the ambiguity of c in the above discussion: if c separates data, <math>\lambda \vec{c}</math> also separates the data. One solution might be set <math>|\vec{c}|=1</math>. Another solution is to introduce the concept of "margin" which we denote by b, and ask  
+
&nbsp; &nbsp; You might have already observe the ambiguity of c in the above discussion: if c separates data, <math>\lambda \textbf{c}</math> also separates the data. One solution might be set <math>|\textbf{c}|=1</math>. Another solution is to introduce the concept of "margin" which we denote by b, and ask  
  
<center><math>\vec{c}\cdot\vec{y}\geqslant b > 0, \forall \vec{y} </math>.</center>
+
<center><math>\textbf{c}\cdot\textbf{y}\geqslant b > 0, \forall \textbf{y} </math>.</center>
  
In this scenario, <math>\frac{b_i}{||c||}</math> is the distance from each point to the hyperplane.  
+
&nbsp; &nbsp; In this scenario, the hyperplane is defined by <math>\textbf{c}\cdot \textbf{y} = \textbf{b}</math>. <math>\textbf{c}</math> is the normal of the plane lying on the positive side of every hyperplane. <math>\frac{b_i}{||c||}</math> is the distance from each point <math>\textbf{y}_i</math> to the hyperplane.  
  
  
However, it is not always possible to find a solution for c. An alternative approach is to find c that minimize a criterion function <math>J(\textbf{a})</math>that satisfy <math>\vec{c}\cdot \vec{y}>0</math>.
+
&nbsp; &nbsp; The above approach is equivalent to finding a solution for
 +
 
 +
<center><math>\textbf{Y}\textbf{c} = \begin{bmatrix} b_1\\b_2\\...\\b_n\end{bmatrix}</math></center>
 +
 
 +
where <math> \textbf{Y} =\begin{bmatrix} y_1^T \\ y_2^T \\ ... \\ y_n^T \end{bmatrix}</math>
 +
 
 +
 
 +
 
 +
&nbsp; &nbsp;  In most cases when n>p, it is always impossible to find a solution for c. An alternative approach is to find c that minimize a criterion function <math>J(\vec{c})</math>. There are variant forms of criterion functions. For example, we can try to minimize the error vector between <math>\vec{c}\cdot\vec{y}</math> and <math>b</math>, hence the criterion function  can be defined as <math>J(\vec{c}) = ||||

Revision as of 18:16, 1 May 2014


'Support Vector Machine and its Applications in Classification Problems
A slecture by Xing Liu Partially based on the ECE662 Spring 2014 lecture material of Prof. Mireille Boutin.



Outline of the slecture

  • Background in Linear Classification Problem
  • Support vector machine
  • Summary
  • References


Background in Linear Classification Problem

    In this section, we will introduce the framework and basic idea of linear classification problem.

    In a linear classification problem, the feature space can be divided into different regions by hyperplanes. In this lecture, we will take a two-catagory case to illustrate. Given training samples $ \textbf{y}_1,\textbf{y}_2,...\textbf{y}_n \in \mathbb{R}^p $, each $ \textbf{y}_i $ is a p-dimensional vector and belongs to either class $ w_1 $ or $ w_2 $. The goal is to find the maximum-margin hyperplane that separate the points in the feature space that belong to class $ w_1 $ from those that belong to class $ w_2 $. The discriminate function can be written as

$ g(\textbf{y}) = \textbf{c}\cdot\textbf{y} $

    We want to find $ \textbf{c}\in\mathbb{R}^{n+1} $ so that a testing data point $ \textbf{y}_i $ is labelled

$ {w_1} $ if $ \textbf{c}\cdot\textbf{y}>0 $
$ {w_2} $ if $ \textbf{c}\cdot\textbf{y}<0 $

    We can apply a trick here to replace all $ \textbf{y} $'s in class $ w_2 $ by $ -\textbf{y} $, then the task is looking for $ \textbf{c} $ so that

$ \textbf{c}\cdot \textbf{y}>0, \forall \textbf{y} \in $new sample space.


    You might have already observe the ambiguity of c in the above discussion: if c separates data, $ \lambda \textbf{c} $ also separates the data. One solution might be set $ |\textbf{c}|=1 $. Another solution is to introduce the concept of "margin" which we denote by b, and ask

$ \textbf{c}\cdot\textbf{y}\geqslant b > 0, \forall \textbf{y} $.

    In this scenario, the hyperplane is defined by $ \textbf{c}\cdot \textbf{y} = \textbf{b} $. $ \textbf{c} $ is the normal of the plane lying on the positive side of every hyperplane. $ \frac{b_i}{||c||} $ is the distance from each point $ \textbf{y}_i $ to the hyperplane.


    The above approach is equivalent to finding a solution for

$ \textbf{Y}\textbf{c} = \begin{bmatrix} b_1\\b_2\\...\\b_n\end{bmatrix} $

where $ \textbf{Y} =\begin{bmatrix} y_1^T \\ y_2^T \\ ... \\ y_n^T \end{bmatrix} $


    In most cases when n>p, it is always impossible to find a solution for c. An alternative approach is to find c that minimize a criterion function $ J(\vec{c}) $. There are variant forms of criterion functions. For example, we can try to minimize the error vector between $ \vec{c}\cdot\vec{y} $ and $ b $, hence the criterion function can be defined as $ J(\vec{c}) = |||| $

Alumni Liaison

Questions/answers with a recent ECE grad

Ryne Rayburn