[http://balthier.ecn.purdue.edu/index.php/ECE662 ECE662 Main Page]
== Derivation of Fisher's Linear Discriminant ==
Main article: [Derivation of Fisher's Linear Discriminant]
The derivation was completed in this lecture.
== Recall from last lecture ==
Last time, we considered
$ J(\vec{w}) = \frac{\vec{w}^t S_B \vec{w}}{\vec{w}^t S_W \vec{w}} $
which is an explicit function of $ \vec{w} $.
One can do this because the numerator of $ J(\vec{w}) $ can be written as
$ \mid \tilde m_1 - \tilde m_2 \mid^2 = \mid w \cdot (m_1 - m_2) \mid^2 = w^t (m_1 - m_2) (m_1^t - m_2^t) w $
$ \rightarrow S_B = (m_1 - m_2) (m_1^t - m_2^t) $
In the same way, the denominator can be written as
$ \tilde s_1^2 + \tilde s_2^2 = \sum_{y_i \in class \ i} (w \cdot y_i - \tilde m_i)^2 = \sum w^t (y_i - m_i)(y_i^t - m_i^t) w $
$ = w^t \left[ \sum (y_i - m_i)(y_i^t - m_i^t) \right] w $
$ \rightarrow S_W = \sum_{y_i \in class \ i} (y_i - m_i)(y_i^t - m_i^t) $
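To make these scatter matrices concrete, here is a minimal numerical sketch (not from the lecture; the helper names and NumPy formulation are illustrative) that estimates $ S_B $ and $ S_W $ from two sets of sample vectors and evaluates $ J(\vec{w}) $:

```python
import numpy as np

def scatter_matrices(Y1, Y2):
    """Estimate S_B and S_W from two classes of samples (one sample per row)."""
    m1, m2 = Y1.mean(axis=0), Y2.mean(axis=0)
    # Between-class scatter: outer product of the mean difference
    S_B = np.outer(m1 - m2, m1 - m2)
    # Within-class scatter: sum of the per-class scatter matrices
    S_W = (Y1 - m1).T @ (Y1 - m1) + (Y2 - m2).T @ (Y2 - m2)
    return S_B, S_W

def J(w, S_B, S_W):
    """Fisher's criterion J(w) = (w^t S_B w) / (w^t S_W w)."""
    return (w @ S_B @ w) / (w @ S_W @ w)
```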
== Fisher Linear Discriminant ==
Maximizing $ J $ as defined in the previous class:

It is a known result that $ J $ is maximized at $ \omega_0 $ such that $ S_B\omega_0=\lambda S_W\omega_0 $. This is the "generalized eigenvalue problem".

Note that if $ |S_W|\neq 0 $, then $ {S_W}^{-1}S_B\omega_0=\lambda\omega_0 $, which is the "standard eigenvalue problem". The only difficulty (which is a big difficulty when the feature space dimension is large) is that matrix inversion is very unstable.

Observe that $ S_B\omega_0=(\vec{m_1}-\vec{m_2})(\vec{m_1}-\vec{m_2})^T\omega_0=cst\cdot(\vec{m_1}-\vec{m_2}) $, since $ (\vec{m_1}-\vec{m_2})^T\omega_0 $ is a scalar. Therefore, the standard eigenvalue problem as presented above becomes $ {S_W}^{-1}\,cst\cdot(\vec{m_1}-\vec{m_2})=\lambda\omega_0 $. From this equation, the value of $ \omega_0 $ is easily obtained as $ \omega_0={S_W}^{-1}(\vec{m_1}-\vec{m_2}) $, or any constant multiple of it. Note that the magnitude of $ \omega_0 $ is not important; only the direction it represents matters.
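As a small sketch of this result (reusing the hypothetical setup from the example above), one can compute $ \omega_0 $ by solving the linear system $ S_W\omega_0=\vec{m_1}-\vec{m_2} $ instead of forming $ {S_W}^{-1} $ explicitly, which sidesteps some of the instability mentioned above:

```python
import numpy as np

def fisher_direction(Y1, Y2):
    """omega_0 = S_W^{-1}(m1 - m2), computed without an explicit matrix inverse."""
    m1, m2 = Y1.mean(axis=0), Y2.mean(axis=0)
    S_W = (Y1 - m1).T @ (Y1 - m1) + (Y2 - m2).T @ (Y2 - m2)
    # Solving S_W w0 = m1 - m2 is more stable than np.linalg.inv(S_W) @ (m1 - m2)
    w0 = np.linalg.solve(S_W, m1 - m2)
    return w0 / np.linalg.norm(w0)  # only the direction matters, so normalize
```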
== Fisher's Linear Discriminant in Projected Coordinates ==
**Claim**

$ \vec{c}=\omega_0={S_W}^{-1}(\vec{m_1}-\vec{m_2}) $ is the solution to $ \mathbf{Y}\vec{c}=\vec{b} $ with

$ \vec{b}=(\underbrace{d/d_1, \cdots, d/d_1}_{d_1 \text{ times}}, \underbrace{d/(d-d_1), \cdots, d/(d-d_1)}_{(d-d_1) \text{ times}})^T $
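A minimal sketch of this claim, assuming the rows of `Y` are the samples already written in the projected coordinates of the 1D example below (augmented with a 1 and sign-normalized), with the $ d_1 $ class-1 samples listed first; the function name is illustrative:

```python
import numpy as np

def solve_projected_fisher(Y, d1):
    """Least-squares solution c of Y c = b with the target vector b from the claim."""
    d = Y.shape[0]
    b = np.concatenate([np.full(d1, d / d1),
                        np.full(d - d1, d / (d - d1))])
    c, *_ = np.linalg.lstsq(Y, b, rcond=None)  # minimizes ||Y c - b||^2
    return c
```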
Here is an animation of the 1D example given in class on projections
[Animation: projection1.gif]
Explanation *(feel free to change / correct)*

One starts with $ \vec{\omega} \cdot y_i + \omega_0 > 0 $ for class 1 and $ \vec{\omega} \cdot y_i + \omega_0 < 0 $ for class 2. The data points are then projected onto an axis at 1, which results in $ \vec{\omega} \cdot y_i > 0 $ for class 1 and $ \vec{\omega} \cdot y_i < 0 $ for class 2. One class is then projected onto an axis at -1, which results in $ \vec{\omega} \cdot y_i > 0 $ for all $ y_i $.
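A short sketch of those two normalization steps (a hypothetical helper, not code from the lecture): appending a coordinate equal to 1 absorbs $ \omega_0 $ into $ \vec{\omega} $, and negating the class-2 samples turns the two conditions into the single condition $ \vec{\omega} \cdot y_i > 0 $ for all samples:

```python
import numpy as np

def normalize_samples(Y1, Y2):
    """Augment each sample with a trailing 1, then flip the sign of class-2 samples."""
    aug = lambda Y: np.hstack([Y, np.ones((len(Y), 1))])
    return np.vstack([aug(Y1), -aug(Y2)])
```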
== Support Vector Machines (SVM) ==
A support vector for a hyperplane $ \vec{c} $ with margin $ b_i \geq b $ is a sample $ y_{i_0} $ such that $ \vec{c} \cdot y_{i_0} = b $.
[Images: lec11_sv_pic1.bmp, lec11_sv_pic2.bmp]
Support Vector Machines are a two-step process (a rough sketch follows this list):

1) Preprocessing - the features $ X_1,\ldots,X_d $ in k-dimensional real space are mapped to features in n-dimensional real space, where $ n \gg k $.

2) Linear Classifier - separates the classes in n-dimensional real space via a hyperplane.

- Support Vectors - used to find the hyperplane with the biggest margin.
- Kernel - used to simplify computation (this is key for real-world applications).
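As a toy illustration of the two steps (the quadratic feature map below is an assumption for the sketch, not the mapping used in class): lift 2-D points into a higher-dimensional space, then classify with a hyperplane there. In practice a kernel evaluates the needed inner products without ever computing $ \phi $ explicitly:

```python
import numpy as np

def phi(x):
    """Step 1 (preprocessing): example polynomial lift from R^2 to R^6."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2])

def classify(w, x):
    """Step 2 (linear classifier): hyperplane decision rule in the lifted space."""
    return 1 if w @ phi(x) > 0 else 2
```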
Previous: [Lecture 10]
Next: [Lecture 12]