[http://balthier.ecn.purdue.edu/index.php/ECE662 ECE662 Main Page]

Derivation of Fisher's Linear Discriminant
===========================

Main article: [Derivation of Fisher's Linear Discriminant]

The derivation was completed in this lecture.

Recall from last lecture
=================

Last time, we considered

<math>J(\vec{w}) = \frac{\vec{w}^t S_B \vec{w}}{\vec{w}^t S_W \vec{w}}</math>

which is an explicit function of <math>\vec{w}</math>.

One can do this because the numerator of <math>J(\vec{w})</math> can be written as

<math>\mid \tilde m_1 - \tilde m_2 \mid^2 = \mid w \cdot (m_1 - m_2) \mid^2 = w^t (m_1 - m_2) (m_1^t - m_2^t) w</math>

<math>\rightarrow S_B = (m_1 - m_2) (m_1^t - m_2^t)</math>

In the same way, the denominator can be written as

<math>\tilde s_1^2 + \tilde s_2^2 = \sum_{y_i \in class \ i} (w \cdot y_i - \tilde m_i)^2 = \sum w^t (y_i - m_i)(y_i^t - m_i^t) w</math>

<math>= w^t \left[ \sum (y_i - m_i)(y_i^t - m_i^t) \right] w</math>

<math>\rightarrow S_W = \sum_{y_i \in class \ i} (y_i - m_i)(y_i^t - m_i^t)</math>
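
As a concrete illustration of these definitions (not part of the lecture), here is a minimal numpy sketch that builds <math>S_B</math> and <math>S_W</math> from two toy classes and evaluates <math>J(\vec{w})</math> for a candidate direction; the toy data and variable names are my own.

<pre>
import numpy as np

# Toy 2D data for two classes (rows are samples); purely illustrative.
rng = np.random.default_rng(0)
Y1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))   # class 1
Y2 = rng.normal(loc=[2.0, 1.0], scale=0.5, size=(60, 2))   # class 2

m1, m2 = Y1.mean(axis=0), Y2.mean(axis=0)

# Between-class scatter: S_B = (m1 - m2)(m1 - m2)^t
dm = (m1 - m2).reshape(-1, 1)
S_B = dm @ dm.T

# Within-class scatter: S_W = sum over classes of (y - m_i)(y - m_i)^t
S_W = (Y1 - m1).T @ (Y1 - m1) + (Y2 - m2).T @ (Y2 - m2)

def J(w):
    """Fisher criterion J(w) = (w^t S_B w) / (w^t S_W w)."""
    return float(w @ S_B @ w) / float(w @ S_W @ w)

print(J(np.array([1.0, 0.0])))   # an arbitrary direction
print(J(m1 - m2))                # direction of the mean difference
</pre>
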
Fisher Linear Discriminant
===================

  - Maximizing J as defined in the previous class

It is a known result that J is maximized at <math>\omega_0</math> such that <math>S_B\omega_0=\lambda S_W\omega_0</math>. This is the "generalized eigenvalue problem".

Note that if <math>|S_W|\neq 0</math>, then <math>{S_W}^{-1}S_B\omega_0=\lambda\omega_0</math>, which is the "standard eigenvalue problem". The only difficulty (which is a big difficulty when the feature space dimension is large) is that matrix inversion is numerically unstable.
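
For completeness, here is a hedged numerical sketch (not from the lecture) of solving the generalized eigenvalue problem directly, without forming <math>{S_W}^{-1}</math>; it reuses S_B and S_W from the snippet above and relies on scipy.linalg.eigh, which accepts a second matrix for the symmetric-definite generalized problem.

<pre>
from scipy.linalg import eigh

# Generalized symmetric-definite eigenproblem: S_B w = lambda S_W w.
# Passing S_W as the second argument avoids inverting it explicitly.
eigvals, eigvecs = eigh(S_B, S_W)

# eigh returns eigenvalues in ascending order, so the Fisher direction
# is the eigenvector associated with the last (largest) eigenvalue.
w0 = eigvecs[:, -1]
print(eigvals[-1], w0)
</pre>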

Observe that <math>S_B\omega_0=(\vec{m_1}-\vec{m_2})(\vec{m_1}-\vec{m_2})^T\omega_0=cst.(\vec{m_1}-\vec{m_2})</math>, since <math>(\vec{m_1}-\vec{m_2})^T\omega_0</math> is a scalar. Therefore the standard eigenvalue problem presented above becomes <math>{S_W}^{-1}cst.(\vec{m_1}-\vec{m_2})=\lambda\omega_0</math>. From this equation, the value of <math>\omega_0</math> can easily be obtained as <math>\omega_0={S_W}^{-1}(\vec{m_1}-\vec{m_2})</math>, or any constant multiple of it. Note that the magnitude of <math>\omega_0</math> is not important; only the direction it represents matters.
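
Continuing the same toy setup (again, my own sketch rather than lecture material), the closed-form direction can be computed by solving a linear system instead of inverting <math>S_W</math>; S_W, m1 and m2 are taken from the earlier snippet.

<pre>
import numpy as np

# omega_0 = S_W^{-1}(m1 - m2), computed by solving S_W x = (m1 - m2)
# rather than forming the inverse explicitly.
omega_0 = np.linalg.solve(S_W, m1 - m2)

# Only the direction matters, so normalize for readability.
omega_0 = omega_0 / np.linalg.norm(omega_0)
print(omega_0)   # should agree, up to sign, with w0 from the eigh sketch
</pre>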

Fisher's Linear Discriminant in Projected Coordinates
=====================================

**Claim**

<math>\vec{c}=\omega_0={S_W}^{-1}(\vec{m_1}-\vec{m_2})</math>

is the solution to <math>\mathbf{Y}\vec{c}=\vec{b}</math> with <math>\vec{b}=(\underbrace{d/d_1, \cdots, d/d_1}_{d_1 \text{ times}}, \underbrace{d/(d-d_1), \cdots, d/(d-d_1)}_{(d-d_1) \text{ times}})^T</math>, where <math>d</math> is the total number of samples and <math>d_1</math> is the number of samples in class 1.
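
The claim can be checked numerically. The sketch below is my own construction and fills in a convention that is not spelled out above: the rows of <math>\mathbf{Y}</math> are taken to be the samples augmented with a leading 1, with the class-2 rows negated. Under that assumption, the least-squares solution of <math>\mathbf{Y}\vec{c}=\vec{b}</math> points in the same direction as <math>{S_W}^{-1}(\vec{m_1}-\vec{m_2})</math>.

<pre>
import numpy as np

# Reuses Y1, Y2, S_W, m1, m2 from the earlier sketch.
d1, d2 = len(Y1), len(Y2)
d = d1 + d2

# Assumed convention: augment each sample with a leading 1 and negate class 2.
Y = np.vstack([
    np.hstack([np.ones((d1, 1)), Y1]),     # class 1 rows:  (1, y_i)
    -np.hstack([np.ones((d2, 1)), Y2]),    # class 2 rows: -(1, y_i)
])
b = np.concatenate([np.full(d1, d / d1), np.full(d2, d / d2)])

# Least-squares solution of Y c = b; drop the bias component.
c, *_ = np.linalg.lstsq(Y, b, rcond=None)
w_mse = c[1:]

w_fisher = np.linalg.solve(S_W, m1 - m2)

# The two directions should agree up to scale (cosine close to +/- 1).
cos = w_mse @ w_fisher / (np.linalg.norm(w_mse) * np.linalg.norm(w_fisher))
print(cos)
</pre>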

Here is an animation of the 1D example given in class on projections:

.. image:: projection1.gif

Explanation *(feel free to change / correct)*

We start with <math>\vec{\omega} \cdot y_i + \omega_0 > 0</math> for class 1 and <math>\vec{\omega} \cdot y_i + \omega_0 < 0</math> for class 2.

The data points are then projected onto an axis at 1, which results in <math>\vec{\omega} \cdot y_i > 0</math> for class 1 and <math>\vec{\omega} \cdot y_i < 0</math> for class 2.

One class is then projected onto an axis at -1, which results in <math>\vec{\omega} \cdot y_i > 0</math> for all <math>y_i</math>.
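
A tiny numpy sketch of this normalization step (my own illustration of the idea, reusing the toy classes Y1 and Y2 from above): each sample is augmented with a constant coordinate equal to 1, and the class-2 samples are then negated, so that a correct linear classifier satisfies <math>\vec{\omega} \cdot y_i > 0</math> for every sample.

<pre>
import numpy as np

def augment(X):
    """Prepend a constant 1 to every sample, so w.y + w_0 becomes a single
    dot product in the augmented space."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

Z1 = augment(Y1)    # class 1, augmented
Z2 = -augment(Y2)   # class 2, augmented and sign-flipped

# After this normalization, a separating solution a = (w_0, w) must
# satisfy a . z > 0 for every row z of Z.
Z = np.vstack([Z1, Z2])
</pre>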

**Support Vector Machines (SVM)**

A support vector for a hyperplane <math>\vec{c}</math> with margin <math>b_i \geq b</math> is a sample <math>y_{i_0}</math> such that <math>\vec{c} \cdot y_{i_0} = b</math>.

.. image:: lec11_sv_pic1.bmp

.. image:: lec11_sv_pic2.bmp

Support Vector Machines are a two-step process:

1)  Preprocessing - the features <math>X_1,\ldots,X_d</math> in k-dimensional real space are mapped to features in n-dimensional real space, where <math>n \gg k</math>.

2)  Linear Classifier - separates the classes in n-dimensional real space via a hyperplane.

  -  Support Vectors - for finding the hyperplane with the biggest margins.
  -  Kernel - to simplify computation (this is key for real-world applications); see the sketch below.
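
As a small illustration of the kernel point above (my own sketch with a polynomial kernel, not necessarily the kernel used in class): an explicit degree-2 feature map gives exactly the same inner products as the kernel <math>k(x,z) = (x \cdot z + 1)^2</math>, so the classifier never has to form the high-dimensional mapped features.

<pre>
import numpy as np

def phi(x):
    """Explicit degree-2 polynomial feature map for x in R^2."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

def k(x, z):
    """Polynomial kernel: the same inner product, computed directly in R^2."""
    return (np.dot(x, z) + 1.0) ** 2

x = np.array([0.5, -1.2])
z = np.array([2.0, 0.3])

print(np.dot(phi(x), phi(z)))   # inner product via the explicit map
print(k(x, z))                  # identical value via the kernel
</pre>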

Previous: [Lecture 10]

Next: [Lecture 12]


[http://balthier.ecn.purdue.edu/index.php/ECE662#Class_Lecture_Notes Class Lecture Notes]