[[Category:2010 Spring ECE 662 mboutin]]

=Details of Lecture 20, [[ECE662]] Spring 2010=

Thursday, April 8th 2010 (continuation of the linear discriminant discussion of Lecture 19)

In Lecture 20, we continued our discussion of linear discriminants. The lecture notes have been typed by a student and can be found [[noteslecture20ECE662S10|here]].

----

Let's define the vectors <math>\vec{y}_{i}</math>, <math> i \in \left \{ 1, \ldots, N \right \} </math>, such that <math>\vec{y}_{i}\in \mathcal{D}</math>, where <math>\mathcal{D} = \mathcal{D}_{1} \cup \mathcal{D}_{2}</math>.<br>

There exists a vector <math>\vec{c}</math> for which:<br>
<math>
\begin{align}
\vec{c} \cdot \left ( 1, y_{1}, y_{2}, \ldots , y_{n} \right ) & > 0, \quad \forall \left ( y_{1}, \ldots , y_{n} \right ) \in \mathcal{D}_{1} \\
\vec{c} \cdot \left ( 1, y_{1}, y_{2}, \ldots , y_{n} \right ) & < 0, \quad \forall \left ( y_{1}, \ldots , y_{n} \right ) \in \mathcal{D}_{2}
\end{align}
</math><br>

We can transform the second inequality into:<br><math>\vec{c} \cdot \left ( -1, -y_{1}, -y_{2}, \ldots , -y_{n} \right ) > 0, \quad \forall \left ( y_{1}, \ldots , y_{n} \right ) \in \mathcal{D}_{2}. </math> <br>
<br>
<ins>Example in 1D:</ins><br>
[[Image:Fig1.png]]<br>
<table>
<tr>
<td>[[Image:Fig2.png]]</td>
<td><math>g(y_{1}) = -y_{1} + 3.5</math> <br>
<math>g(y_{1}) = 0 </math> is the decision hyperplane. <br><br>
<math>\vec{c} \cdot \left ( X_{1}, X_{2} \right ) = 0</math> is equivalent here to <math>3.5 X_{1} - X_{2} = 0 </math>, which is the decision hyperplane in the new space. <br>
<math>\vec{c} = \left ( 3.5, -1 \right )</math>
</td>
</tr>
</table>
<br>
<math>\vec{c}</math> is not unique: it can be any vector pointing in the same direction; its norm does not matter.<br>
<math>
\mathcal{D}_{1}
\begin{cases}
\vec{y} = (3)  \\
\vec{y} = (2)  \\
\vec{y} = (1)  \\
\vec{y} = (-1) \\
\vec{y} = (-2) \\
\vec{y} = (-3)
\end{cases}
\qquad
\mathcal{D}_{2}
\begin{cases}
\vec{y} = (4)  \\
\vec{y} = (5)  \\
\vec{y} = (6)  \\
\vec{y} = (7)
\end{cases}
\qquad
g(y_{1}) = \text{cst} \cdot y_{1} + \text{cst}
</math>

<math>\Downarrow</math>
<math>
\text{new } \mathcal{D}_{1}
\begin{cases}
\vec{y} = (1,3)  \\
\vec{y} = (1,2)  \\
\vec{y} = (1,1)  \\
\vec{y} = (1,-1) \\
\vec{y} = (1,-2) \\
\vec{y} = (1,-3)
\end{cases}
\qquad
\text{new } \mathcal{D}_{2}
\begin{cases}
\vec{y} = (1,4)  \\
\vec{y} = (1,5)  \\
\vec{y} = (1,6)  \\
\vec{y} = (1,7)
\end{cases}
</math>

<math>\vec{c} \cdot \vec{y} > 0, \quad \forall \vec{y} \in \mathcal{D}_{1}</math><br>
<math>\vec{c} \cdot \vec{y} < 0, \quad \forall \vec{y} \in \mathcal{D}_{2}</math><br>
We modify <math>\mathcal{D}_{2}</math> such that <math>\vec{c} \cdot \vec{y} > 0, \forall \vec{y} \in \mathcal{D}</math>:<br>
<math>
\text{new } \mathcal{D}_{2}
\begin{cases}
\vec{y} = (-1,-4)  \\
\vec{y} = (-1,-5)  \\
\vec{y} = (-1,-6)  \\
\vec{y} = (-1,-7)
\end{cases}
</math>
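
The augmentation and sign normalization above are easy to check numerically. Here is a minimal Python sketch for the 1D example (the array names and the candidate <math>\vec{c} = (3.5, -1)</math> come from the figures above; everything else is an illustrative assumption, not part of the lecture):

<pre>
import numpy as np

# 1D training samples from the example above
D1 = [3, 2, 1, -1, -2, -3]   # class 1
D2 = [4, 5, 6, 7]            # class 2

# Augment each sample with a leading 1:  y -> (1, y)
Y1 = np.array([[1.0, y] for y in D1])
Y2 = np.array([[1.0, y] for y in D2])

# Sign normalization: negate the class-2 samples, so a separating
# vector c must satisfy c . y > 0 for every (normalized) sample
Y = np.vstack([Y1, -Y2])

c = np.array([3.5, -1.0])    # candidate from the example
print(Y @ c)                 # all entries are strictly positive
print(np.all(Y @ c > 0))     # True -> c separates the data
</pre>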

<math>\vec{c}</math> is not unique.

1) <ins>Norm ambiguity</ins><br>
If <math>\vec{c}</math> separates the data, i.e. <math>\vec{c} \cdot \vec{y} > 0, \forall \vec{y} \in \mathcal{D}</math>, then <math>\lambda\vec{c}</math> also separates it, <math>\forall \lambda \in \mathbb{R}_{>0}</math>:<br>
<math>\vec{c} \cdot \vec{y} > 0 \Rightarrow \lambda (\vec{c} \cdot \vec{y}) > 0</math><br>
We can therefore set <math>\left | \vec{c} \right | = 1</math>.<br>
<br>

2) <ins>Separation location ambiguity</ins><br>
[[Image:Fig3.png]]<br>
Here we can take <math>\vec{c} = (\beta,-1)</math>, with any <math>\beta \in (3,4)</math>.

[[Image:Fig4.png]]<br>
[[Image:Fig7.png]]<br>
To uniquely define <math>\vec{c}</math>, introduce a margin <math>b>0</math> and ask that <math>\vec{c}</math> be the minimum-length vector such that <math>\vec{c} \cdot \vec{y} \ge b, \forall \vec{y} \in \mathcal{D}</math> (after transformation). Then take the separation to be <math>\vec{c} \cdot \vec{y} = 0</math>.<br>

<ins>Observe</ins><br>
There exists <math>\vec{y}_{i_{0}} \in \mathcal{D}</math> such that <math>\vec{c} \cdot \vec{y}_{i_{0}} = b</math>.<br>
Otherwise, write <math>\vec{c} \cdot \vec{y}_{i} = b + \epsilon_{i}, \quad \epsilon_{i} > 0</math>.<br>
Let <math>\vec{c}^\prime = \dfrac{b}{b+\epsilon_{i_{0}}}\vec{c}</math>, where <math>\epsilon_{i_{0}} = \min_{i} \epsilon_{i}</math>.<br>
Observe that <math>\left | \vec{c}^\prime \right | < \left | \vec{c} \right |</math> and <math>\vec{c}^\prime \cdot \vec{y}_{i} = \dfrac{b}{b+\epsilon_{i_{0}}}\,\vec{c} \cdot \vec{y}_{i} = \dfrac{b}{b+\epsilon_{i_{0}}} ( b + \epsilon_{i} ) \ge b, \forall i</math> (with equality when <math>i = i_{0}</math>), which contradicts the minimality of <math>\left | \vec{c} \right |</math>.<br>

<math> \vec{c} \cdot \vec{y}_{i_{0}} = b </math> means that <math> \vec{c} </math> lies on the hyperplane with normal vector <math> \vec{y}_{i_{0}} </math> at distance <math> \dfrac{b}{\left | \vec{y}_{i_{0}} \right |} </math> from the origin (no other <math>\vec{y}_{i} \in \mathcal{D}</math> gives a constraint hyperplane closer to <math>\vec{c}</math>).
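
The rescaling step in this argument can be verified numerically. Below is a minimal Python sketch using the normalized 1D data from above; the margin <math>b = 0.25</math> and the starting vector are illustrative choices, not values from the lecture:

<pre>
import numpy as np

# Augmented, sign-normalized samples (class-2 samples negated), as above
Y = np.array([[1., 3], [1., 2], [1., 1], [1., -1], [1., -2], [1., -3],
              [-1., -4], [-1., -5], [-1., -6], [-1., -7]])

b = 0.25                      # margin (illustrative)
c = np.array([3.5, -1.0])     # satisfies c . y_i = b + eps_i with eps_i > 0

eps = Y @ c - b               # eps_i = c . y_i - b
c_new = (b / (b + eps.min())) * c          # c' = b/(b + eps_{i0}) * c

print(np.linalg.norm(c_new) < np.linalg.norm(c))   # True: c' is shorter
print(np.all(Y @ c_new >= b - 1e-12))              # True: still c' . y_i >= b
</pre>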

Typically, it is impossible to satisfy <math>\vec{c} \cdot \vec{y} > 0, \forall \vec{y} \in \mathcal{D}</math>, so we try to do "as well as we can".<br>

<ins>Idea:</ins><br>
Try to minimize the number of misclassified training samples.<br>

Let <math>J(\vec{c}) = \left | \left \{ \vec{y}_{i} \in \mathcal{D} : \vec{c} \cdot \vec{y}_{i} \le 0 \right \} \right |</math> be the cost function, i.e. the number of misclassified samples. It is hard to minimize!<br>

Other idea: the "perceptron criterion function".

Let <math>J_{p}(\vec{c}) = \sum_{\vec{y}_{i} \in \mathcal{D}\ \text{misclassified}} -\vec{c} \cdot \vec{y}_{i}</math><br>

It measures how "far away" you are from classifying all the training data correctly.<br>
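
Reading this definition directly: sum <math>-\vec{c} \cdot \vec{y}_{i}</math> over the misclassified (normalized) samples only. A minimal Python sketch, with illustrative names that are not from the lecture:

<pre>
import numpy as np

def perceptron_criterion(c, Y):
    """J_p(c): sum of -c . y over the misclassified samples.

    Y holds the augmented, sign-normalized samples (one per row),
    so a sample is misclassified exactly when c . y <= 0.
    """
    margins = Y @ c
    mis = margins <= 0            # misclassified samples
    return (-margins[mis]).sum()  # 0 when everything is correctly classified

# With the 1D example data and c = (3.5, -1), no sample is misclassified:
Y = np.array([[1., 3], [1., 2], [1., 1], [1., -1], [1., -2], [1., -3],
              [-1., -4], [-1., -5], [-1., -6], [-1., -7]])
print(perceptron_criterion(np.array([3.5, -1.0]), Y))   # 0.0
</pre>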

<ins>Geometric interpretation</ins><br>

<table>
<tr>
<td>[[Image:Fig6.png]]</td>
<td>The distance from <math>\vec{y}_{i}</math> to the plane <math>\vec{c} \cdot \vec{y} = 0</math> is:<br>
<math>\left | \text{projection of } \vec{y}_{i} \text{ onto } \dfrac{\vec{c}}{\left | \vec{c} \right |} \right | = \left | \vec{y}_{i} \cdot \dfrac{\vec{c}}{\left | \vec{c} \right |} \right | = \dfrac{1}{\left | \vec{c} \right |} \left | \vec{y}_{i} \cdot \vec{c} \right |</math>
</td>
<td><math>\vec{c} \cdot \vec{y}_{i}</math> is negative when <math>\vec{y}_{i}</math> is misclassified.
</td>
</tr>
</table>
<br>
So <math>-\vec{c} \cdot \vec{y}_{i}</math> is proportional to the distance from <math>\vec{y}_{i}</math> to the decision plane, i.e. to how far <math>\vec{y}_{i}</math> is from being on the correct side.<br>
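
For instance, the distance formula above is a one-liner to evaluate; a small sketch with the example data (names are illustrative):

<pre>
import numpy as np

c = np.array([3.5, -1.0])
y_i = np.array([1.0, 3.0])                 # augmented sample (1, y) with y = 3

# distance from y_i to the plane c . y = 0:  |c . y_i| / |c|
dist = abs(c @ y_i) / np.linalg.norm(c)
print(dist)                                # ~0.137
</pre>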
We can use a gradient descent procedure to minimize <math>J_{p}(\vec{c})</math>:
<math>
\nabla J_{p}(\vec{c}) =
\begin{pmatrix}
  \dfrac{\partial}{\partial c_{1}} \\
  \dfrac{\partial}{\partial c_{2}} \\
  \vdots \\
  \dfrac{\partial}{\partial c_{n+1}}
\end{pmatrix} J_{p}(\vec{c}) =
\begin{pmatrix}
  \dfrac{\partial}{\partial c_{1}} \\
  \dfrac{\partial}{\partial c_{2}} \\
  \vdots \\
  \dfrac{\partial}{\partial c_{n+1}}
\end{pmatrix}
\sum_{\vec{y}_{i}\ \text{misclassified}} -\vec{c} \cdot \vec{y}_{i} =
\begin{pmatrix}
  \sum_{\vec{y}_{i}\ \text{miscl.}} -y_{i,1} \\
  \sum_{\vec{y}_{i}\ \text{miscl.}} -y_{i,2} \\
  \vdots \\
  \sum_{\vec{y}_{i}\ \text{miscl.}} -y_{i,n+1}
\end{pmatrix}
</math>
<br>
Follow the basic gradient descent procedure (a short implementation sketch follows the list):<br>
<ul>
<li>initialize with a guess <math>\vec{c}_{1}</math>;</li>
<li>update to a better guess by moving in the direction of steepest descent:<br>
<math>\vec{c}_{2} = \vec{c}_{1} - \underbrace{\eta(1)}_{\text{step size}} \nabla J_{p}(\vec{c}_{1})</math>
</li>
<li>iterate <math>\vec{c}_{k+1} = \vec{c}_{k} - \eta(k) \nabla J_{p}(\vec{c}_{k}) </math> until convergence.</li>
</ul>
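
Putting the criterion and the descent rule together gives the batch perceptron procedure. The following is a minimal Python sketch under the assumptions of these notes (augmented, sign-normalized samples and a fixed step size <math>\eta(k) = \eta</math>); the function name, the step size and the iteration cap are illustrative choices:

<pre>
import numpy as np

def batch_perceptron(Y, eta=0.1, max_iter=1000):
    """Minimize J_p(c) by gradient descent.

    Y: (N, n+1) array of augmented, sign-normalized samples,
    so the goal is c . y > 0 for every row y of Y.
    """
    c = np.zeros(Y.shape[1])           # initial guess c_1
    for _ in range(max_iter):
        mis = (Y @ c) <= 0             # currently misclassified samples
        if not mis.any():              # J_p(c) = 0: all samples on the right side
            break
        grad = -Y[mis].sum(axis=0)     # gradient of J_p: sum of -y_i (misclassified)
        c = c - eta * grad             # steepest-descent step
    return c

# 1D example from above (class-2 samples already negated)
Y = np.array([[1., 3], [1., 2], [1., 1], [1., -1], [1., -2], [1., -3],
              [-1., -4], [-1., -5], [-1., -6], [-1., -7]])
c = batch_perceptron(Y)
print(c, np.all(Y @ c > 0))            # a separating vector, True
</pre>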
<br>
Interpretation:<br>
<math>\vec{c}_{k+1} = \vec{c}_{k} - \underbrace{\eta(k) \sum_{\vec{y}_{i}\ \text{misclassified}} -\vec{y}_{i}}_{
\begin{matrix}
\text{update by adding a constant} \\
\text{multiple of the sum of all} \\
\text{misclassified samples}
\end{matrix}
}</math>
<br>
<br>
<br>
When dealing with multiple classes, say <math>K</math> of them, there are two possible approaches (a sketch of the second one follows the list):<br>
<ul>
  <li>take the classes two by two → <math>\frac{K(K-1)}{2}</math> pairwise separations;</li>
  <li>or take each class against the rest → <math>K</math> separations.</li>
</ul>
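
For the "one class against the rest" option, one discriminant is trained per class and a new sample is assigned to the class with the largest response. A hedged Python sketch, reusing the batch_perceptron sketch from above (the per-class subproblems need not be linearly separable, so the inner loop may simply stop at its iteration cap):

<pre>
import numpy as np

def one_vs_rest(samples, labels, K, eta=0.1, max_iter=1000):
    """Train K linear discriminants, class k against the rest."""
    aug = np.hstack([np.ones((len(samples), 1)), samples])   # y -> (1, y)
    C = np.zeros((K, aug.shape[1]))
    for k in range(K):
        # sign normalization for the k-th subproblem:
        # keep class-k samples as-is, negate all the others
        Y = np.where((labels == k)[:, None], aug, -aug)
        C[k] = batch_perceptron(Y, eta, max_iter)
    return C

def classify(C, x):
    """Assign x to the class whose discriminant gives the largest value."""
    return int(np.argmax(C @ np.concatenate(([1.0], x))))
</pre>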
<br><br>
(--roy8 17:48, 8 May 2010 (UTC))
----
Previous: [[Lecture19ECE662S10|Lecture 19]] - Next: [[Lecture21ECE662S10|Lecture 21]]

[[ 2010 Spring ECE 662 mboutin|Back to 2010 Spring ECE 662 mboutin]]

Alumni Liaison

Basic linear algebra uncovers and clarifies very important geometry and algebra.

Dr. Paul Garrett