[http://balthier.ecn.purdue.edu/index.php/ECE662 ECE662 Main Page]

[http://balthier.ecn.purdue.edu/index.php/ECE662#Class_Lecture_Notes Class Lecture Notes]

LECTURE THEME: Discriminant Functions

To separate several classes, we can draw the "skeleton" (Blum) of the shape defined by the mean vectors:

.. image:: lec6_skel.bmp

The skeleton is the set of points whose distance to the set <math>\{\mu_1, \ldots, \mu_k\}</math> is achieved by at least two different <math>\mu_i</math>'s, i.e., we have

<math>dist(x,set)=\min_i \{dist(x,\mu_i)\}</math>

and we want <math>\exists i_1 \neq i_2</math> such that <math>dist(x,set)=dist(x,\mu_{i_1}) = dist(x,\mu_{i_2})</math>.
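As a quick numerical illustration (not from the lecture), here is a Python/NumPy sketch that evaluates <math>dist(x,set)</math> on a grid and flags the points where the minimum is (approximately) achieved by two different <math>\mu_i</math>'s, i.e., the skeleton. The means and the grid are made-up example values.

<pre>
import numpy as np

# Assumed example means (hypothetical values, for illustration only)
mus = np.array([[0.0, 0.0],
                [4.0, 0.0],
                [2.0, 3.0]])

def dist_to_set(x, mus):
    """Distances from point x to each mean, and dist(x, set) = min over i."""
    d = np.linalg.norm(mus - x, axis=1)
    return d, d.min()

# Sample a grid and flag points whose two smallest distances are (nearly) equal:
# these approximate the skeleton between the means.
xs = np.linspace(-2, 6, 200)
ys = np.linspace(-2, 5, 200)
skeleton_pts = []
for x1 in xs:
    for x2 in ys:
        d, dmin = dist_to_set(np.array([x1, x2]), mus)
        d_sorted = np.sort(d)
        if d_sorted[1] - d_sorted[0] < 5e-2:   # coarse tolerance for the grid spacing
            skeleton_pts.append((x1, x2))

print(f"Found {len(skeleton_pts)} approximate skeleton points on the grid")
</pre>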

The skeleton is a decision boundary defining regions (chambers) <math>R_i</math> where we should decide <math>w_i</math>.

What is the equation of these hyperplanes?

Recall the hyperplane equation: <math>\{ \vec{x} \mid \vec{n} \cdot \vec{x} = const \}</math>

<math>\vec{n}</math> is a normal vector to the plane, because if <math>\vec{x_1}</math> and <math>\vec{x_2}</math> are in this plane,

<math>\Longrightarrow \vec{n} \cdot \vec{x_1} = const, \quad \vec{n} \cdot \vec{x_2} = const</math>

<math>\Longrightarrow \vec{n} \cdot (\vec{x_1} - \vec{x_2}) = const - const = 0</math>

<math>\therefore \vec{n} \bot ( \vec{x_1} - \vec{x_2})</math>

Any linear structure can be written as

<math>\sum_{i=1}^{n} c_i x_i + const = 0</math>

Ex.: planes (lines) in <math>\Re^{2}</math>.
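To connect the two statements above, here is a small sketch (with made-up coefficients) of a line in <math>\Re^{2}</math> written as <math>c_1 x_1 + c_2 x_2 + const = 0</math>; it checks that differences of points in the plane are orthogonal to the coefficient (normal) vector.

<pre>
import numpy as np

# A line in R^2 written as c1*x1 + c2*x2 + const = 0 (coefficients are arbitrary examples)
c = np.array([2.0, -1.0])   # c plays the role of the normal vector n
const = 3.0

def point_on_line(x1):
    """Pick x1 freely and solve c1*x1 + c2*x2 + const = 0 for x2."""
    x2 = -(c[0] * x1 + const) / c[1]
    return np.array([x1, x2])

p1, p2 = point_on_line(0.0), point_on_line(5.0)

print(np.dot(c, p1) + const)   # ~0: p1 lies on the line
print(np.dot(c, p2) + const)   # ~0: p2 lies on the line
print(np.dot(c, p1 - p2))      # ~0: n is orthogonal to differences of points in the plane
</pre>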

Example: for two classes <math>w_1</math>, <math>w_2</math>, the hyperplane is defined by

<math>\{ \vec{x} \mid g_1(\vec{x}) - g_2(\vec{x}) = 0 \}</math>

where (for Gaussian class densities with equal, isotropic covariances <math>\Sigma_i = \sigma^2 I</math>)

<math>g_i(\vec{x})=-\frac{1}{2\sigma^2} \|\vec{x}-\mu_i\|_{L_2}^2+ \ln P(w_i)</math>

<math>= -\frac{1}{2\sigma^2} (\vec{x}-\mu_i)^{\top}(\vec{x}-\mu_i) + \ln P(w_i)</math>

<math>= -\frac{1}{2\sigma^2} (\vec{x}^{\top}\vec{x} - \vec{x}^{\top}\mu_i -\mu_i^{\top}\vec{x} + \mu_i^{\top}\mu_i ) + \ln P(w_i)</math>

but <math>\vec{x}^{\top}\vec{\mu_i}</math> is a scalar, so <math>\left( \vec{x}^{\top} \vec{\mu_i}\right)^{\top} = \vec{\mu_i}^{\top}\vec{x} = \vec{x}^{\top}\vec{\mu_i}</math>

<math>\Longrightarrow g_i(\vec{x}) = -\frac{1}{2\sigma^2} \|\vec{x}\|^2 + \frac{1}{\sigma^2} \vec{x} \cdot \vec{\mu_i} - \frac{\mu_i^{\top}\mu_i}{2\sigma^2} + \ln P(w_i)</math>

The first term is independent of <math>i</math>, therefore we can remove it from <math>g_i\left( \vec{x}\right)</math>:

<math>\Longrightarrow g_i(\vec{x}) = \frac{1}{\sigma^2} \vec{x} \cdot \vec{\mu_i} - \frac{\vec{\mu_i} \cdot \vec{\mu_i}}{2 \sigma^2} + \ln P(w_i)</math>

which is a degree one polynomial in <math>\vec{x}</math>.
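For concreteness, here is a hedged Python/NumPy sketch of this linear discriminant and the resulting decision rule; the means, <math>\sigma^2</math>, and priors below are made-up values, not from the lecture.

<pre>
import numpy as np

# Assumed example parameters (hypothetical)
mu = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]   # class means mu_1, mu_2
prior = [0.5, 0.5]                                   # P(w_1), P(w_2)
sigma2 = 1.0                                         # shared isotropic variance sigma^2

def g(i, x):
    """Linear discriminant g_i(x) = (1/sigma^2) x.mu_i - mu_i.mu_i/(2 sigma^2) + ln P(w_i)."""
    return (x @ mu[i]) / sigma2 - (mu[i] @ mu[i]) / (2 * sigma2) + np.log(prior[i])

def classify(x):
    """Decide the class with the largest discriminant value (returns 1 or 2)."""
    return int(np.argmax([g(i, x) for i in range(len(mu))])) + 1

print(classify(np.array([0.2, 0.1])))   # expected: class 1 (closer to mu_1)
print(classify(np.array([2.8, 0.9])))   # expected: class 2 (closer to mu_2)
</pre>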

A classifier that uses a linear discriminant function is called a "linear machine".

The hyperplane between two classes is defined by

<math>g_1(\vec{x}) - g_2(\vec{x}) = 0</math>

<math>\Leftrightarrow \frac{1}{\sigma^2} \vec{x}^{\top}\mu_1 - \frac{\mu_1^{\top}\mu_1}{2\sigma^2} + \ln P(w_1) - \frac{1}{\sigma^2} \vec{x}^{\top}\mu_2 + \frac{\mu_2^{\top}\mu_2}{2\sigma^2} - \ln P(w_2) = 0</math>

<math>\Leftrightarrow \frac{1}{\sigma^2} \vec{x} \cdot (\vec{\mu_1} - \vec{\mu_2}) =\frac{ \|\vec{\mu_1} \|^2}{2\sigma^2} - \frac{ \|\vec{\mu_2} \|^2}{2\sigma^2} + \ln P(w_2) -\ln P(w_1)</math>
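As a quick numerical check of this boundary equation (not part of the original notes; the means, <math>\sigma^2</math>, and priors are arbitrary example values), any <math>\vec{x}</math> satisfying the last equality should give <math>g_1(\vec{x}) = g_2(\vec{x})</math>:

<pre>
import numpy as np

mu1, mu2 = np.array([0.0, 0.0]), np.array([3.0, 1.0])   # example means (assumptions)
P1, P2 = 0.7, 0.3                                        # example priors
sigma2 = 2.0                                             # example sigma^2

def g(x, mu, P):
    return (x @ mu) / sigma2 - (mu @ mu) / (2 * sigma2) + np.log(P)

# Right-hand side of the boundary equation
rhs = (mu1 @ mu1) / (2 * sigma2) - (mu2 @ mu2) / (2 * sigma2) + np.log(P2) - np.log(P1)

# Construct a point on the boundary: take x = t*(mu1 - mu2) with t chosen so that
# (1/sigma^2) x.(mu1 - mu2) = rhs, then add any vector orthogonal to (mu1 - mu2).
d = mu1 - mu2
t = sigma2 * rhs / (d @ d)
x_boundary = t * d + 5.0 * np.array([-d[1], d[0]]) / np.linalg.norm(d)  # shift within the plane

print(g(x_boundary, mu1, P1) - g(x_boundary, mu2, P2))   # ~0: x_boundary is on the hyperplane
</pre>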

Case 1: When <math>P(w_1)=P(w_2)</math>

.. image:: lec6_case1.png

The hyperplane (black line) in this case passes through the midpoint of the segment connecting the two means (gray line).

Case 2: <math>\Sigma_i = \Sigma</math> for all <math>i</math>'s:

Recall: we can take

<math>g_i(\vec{x}) = -\frac{1}{2}\left( \vec{x}-\vec{\mu_i} \right)^{\top} \Sigma^{-1}\left(\vec{x}-\vec{\mu_i}\right) - \frac{n}{2}\ln 2\pi - \frac{1}{2} \ln |\Sigma| + \ln P(w_i)</math>

but, since <math>\Sigma_i = \Sigma</math>, the terms <math>- \frac{n}{2}\ln 2\pi</math> and <math>- \frac{1}{2} \ln |\Sigma|</math> are independent of <math>i</math>.

Therefore, removing these terms from <math>g_i\left( \vec{x}\right)</math>, the new <math>g_i\left( \vec{x}\right)</math> will look like

<math>\Longrightarrow g_i\left( \vec{x} \right) = - \frac{1}{2} \left( \vec{x} - \vec{\mu_i} \right)^{\top} \Sigma^{-1} \left( \vec{x} - \vec{\mu_i} \right) + \ln{P(w_i)}</math>

So, if all <math>P\left( w_i \right)</math>'s are the same, assign <math>\vec{x}</math> to the class with the "nearest" mean (nearest in the Mahalanobis distance defined by <math>\Sigma^{-1}</math>).

Rewriting <math>g_i\left( \vec{x}\right)</math>,

<math>g_i(\vec{x}) = - \frac{1}{2} ( \vec{x}^{\top} \Sigma^{-1}\vec{x} - 2 \vec{\mu_i}^{\top} \Sigma^{-1}\vec{x} + \vec{\mu_i}^{\top}\Sigma^{-1}\vec{\mu_i}) + \ln{P(w_i)}</math>

Here <math>\vec{x}^{\top} \Sigma^{-1}\vec{x}</math> is independent of <math>i</math>, therefore we can remove this term from <math>g_i\left( \vec{x}\right)</math>:

<math>\Longrightarrow g_i(\vec{x}) = \vec{\mu_i}^{\top} \Sigma^{-1}\vec{x} - \frac{1}{2} \vec{\mu_i}^{\top}\Sigma^{-1}\vec{\mu_i} + \ln{P(w_i)}</math>

Again this is a linear function of <math>\vec{x}</math>.

The equation of the hyperplane:

<math>(\vec{\mu_1}-\vec{\mu_2})^{\top}\Sigma^{-1}\vec{x} = \frac{1}{2} \vec{\mu_1}^{\top}\Sigma^{-1}\vec{\mu_1} - \frac{1}{2}\vec{\mu_2}^{\top}\Sigma^{-1}\vec{\mu_2} + \ln P(w_2) - \ln P(w_1)</math>
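Here is a sketch of this Case 2 discriminant with a shared covariance matrix; the particular <math>\Sigma</math>, means, and priors are made-up example values. It also evaluates the hyperplane equation above to confirm that its sign agrees with the decision.

<pre>
import numpy as np

# Example parameters (assumptions for illustration)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)
mu = [np.array([0.0, 0.0]), np.array([3.0, 2.0])]
prior = [0.6, 0.4]

def g(i, x):
    """g_i(x) = mu_i^T Sigma^{-1} x - 1/2 mu_i^T Sigma^{-1} mu_i + ln P(w_i)."""
    return mu[i] @ Sigma_inv @ x - 0.5 * mu[i] @ Sigma_inv @ mu[i] + np.log(prior[i])

def classify(x):
    return int(np.argmax([g(i, x) for i in (0, 1)])) + 1

# The hyperplane: (mu1 - mu2)^T Sigma^{-1} x = 1/2 mu1^T Sigma^{-1} mu1
#                 - 1/2 mu2^T Sigma^{-1} mu2 + ln P(w2) - ln P(w1)
w = Sigma_inv @ (mu[0] - mu[1])            # normal vector of the boundary
b = (0.5 * mu[0] @ Sigma_inv @ mu[0]
     - 0.5 * mu[1] @ Sigma_inv @ mu[1]
     + np.log(prior[1]) - np.log(prior[0]))

x = np.array([1.0, 1.0])
print(classify(x), np.sign(w @ x - b))     # sign > 0 means g_1(x) > g_2(x), i.e., decide w_1
</pre>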

In sum, whatever the covariance structures are, as long as they are the same for all classes, the final discriminant functions are linear (the quadratic terms drop out).

Below, you see an illustration of this case. If you have ellipses with the same lengths and directions of their principal axes, you can transform them simultaneously to reduce to Case 1.

.. image:: lecture_6_1.jpg

The hyperplane (green line) is perpendicular to the red line connecting the two means. It moves along the red line depending on the values of <math>P(w_1)</math> and <math>P(w_2)</math>. If <math>P(w_1)=P(w_2)</math>, the hyperplane is located at the midpoint between the means.

Here's an animated version of the above figure:

.. image:: Hyperplane_animated.GIF

Another visualization for Case 2 is as follows: consider class 1, which yields a multivariate Gaussian density on a 2D feature vector when conditioned on that class.

.. image:: 662Lecture6_GaussClass1.jpg

Now consider class 2 with a similar Gaussian conditional density, but with a different mean.


.. image:: 662Lecture6_GaussClass2.jpg

If the priors for each class are the same (i.e., 0.5), the decision hypersurface cuts directly between the two means, with a direction parallel to the elliptical level sets of the Gaussian densities shaped by their (identical) covariance matrices.


.. image:: 662Lecture6_GaussbothClasses.jpg

Now if the priors for each class are unequal, the decision hypersurface cuts between the two means with the same direction as before, but is now located further from the mean of the more likely class. This biases the classifier in favor of the more likely class.

.. image:: 662Lecture6_GaussbothClasses_UneqPrior2.jpg
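As a sanity check on this statement (not from the lecture), the following sketch finds where <math>g_1 = g_2</math> along the segment joining the two means, for assumed example parameters, and shows that increasing <math>P(w_1)</math> pushes that crossing point away from <math>\vec{\mu_1}</math>. The closed form for the crossing parameter t is a quick derivation from the hyperplane equation above, so the code also verifies it numerically.

<pre>
import numpy as np

# Example shared-covariance Gaussians (assumed values for illustration)
Sigma_inv = np.linalg.inv(np.array([[1.0, 0.3], [0.3, 2.0]]))
mu1, mu2 = np.array([0.0, 0.0]), np.array([4.0, 1.0])

def g(x, mu, P):
    return mu @ Sigma_inv @ x - 0.5 * mu @ Sigma_inv @ mu + np.log(P)

def crossing(P1):
    """t where g_1 = g_2 on x(t) = mu1 + t (mu2 - mu1); t = 1/2 corresponds to the midpoint."""
    d = mu1 - mu2
    q = d @ Sigma_inv @ d
    return 0.5 + np.log(P1 / (1.0 - P1)) / q

for P1 in (0.5, 0.9):
    t = crossing(P1)
    x = mu1 + t * (mu2 - mu1)
    # t = 0.5 for equal priors; t > 0.5 (further from mu_1) when P(w_1) is larger.
    print(P1, round(t, 3), round(g(x, mu1, P1) - g(x, mu2, 1 - P1), 10))  # last value ~0
</pre>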


`.m file for creating Gaussian surfaces like these`__

A video visualizing how the decision hypersurface changes with the Gaussian parameters is shown on the [Bayes Decision Rule] page.

__ ECE662Lecture6_MakeGaussFigs.m
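The linked .m file is not reproduced here; as a rough, hypothetical Python analogue, a script along the following lines plots one 2D Gaussian class-conditional density as a surface (the mean and covariance are arbitrary examples, not the ones used in the figures above).

<pre>
import numpy as np
import matplotlib.pyplot as plt

# Example class-conditional Gaussian (assumed parameters)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
Sigma_inv, Sigma_det = np.linalg.inv(Sigma), np.linalg.det(Sigma)

def gauss_pdf(X, Y):
    """Evaluate the 2D Gaussian density p(x | w_i) on a meshgrid."""
    pts = np.stack([X - mu[0], Y - mu[1]], axis=-1)
    quad = np.einsum('...i,ij,...j->...', pts, Sigma_inv, pts)
    return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(Sigma_det))

x = np.linspace(-4, 6, 200)
y = np.linspace(-3, 7, 200)
X, Y = np.meshgrid(x, y)
Z = gauss_pdf(X, Y)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, cmap='viridis')
ax.set_xlabel('x1'); ax.set_ylabel('x2'); ax.set_zlabel('p(x | w_i)')
plt.show()
</pre>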

Case 3: When <math>\Sigma_i</math> is arbitrary


.. image:: Lecture6_sigma_arbitrary.JPG


We can take

<math>g_i(\vec{x}) = - \frac{1}{2} ( \vec{x} - \vec{\mu_i})^{\top}\Sigma_i^{-1}(\vec{x}-\vec{\mu_i})-\frac{1}{2} \ln |\Sigma_i|+ \ln P(w_i)</math>

The decision surface between <math>w_1</math> and <math>w_2</math>, <math>\{ \vec{x} \mid g_1(\vec{x}) - g_2(\vec{x}) = 0 \}</math>, is a degree 2 polynomial in <math>\vec{x}</math>.
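A sketch of this Case 3 discriminant with class-specific covariances follows; all numbers are made-up examples. Because the quadratic terms no longer cancel, the boundary <math>g_1(\vec{x}) - g_2(\vec{x}) = 0</math> is a conic rather than a hyperplane.

<pre>
import numpy as np

# Example class-specific parameters (assumptions for illustration)
mu = [np.array([0.0, 0.0]), np.array([3.0, 0.0])]
Sigma = [np.array([[1.0, 0.0], [0.0, 1.0]]),
         np.array([[4.0, 0.0], [0.0, 0.5]])]
prior = [0.5, 0.5]

def g(i, x):
    """g_i(x) = -1/2 (x-mu_i)^T Sigma_i^{-1} (x-mu_i) - 1/2 ln|Sigma_i| + ln P(w_i)."""
    diff = x - mu[i]
    Si = np.linalg.inv(Sigma[i])
    return (-0.5 * diff @ Si @ diff
            - 0.5 * np.log(np.linalg.det(Sigma[i]))
            + np.log(prior[i]))

def classify(x):
    return int(np.argmax([g(i, x) for i in (0, 1)])) + 1

print(classify(np.array([0.5, 0.0])))    # 1: near mu_1
print(classify(np.array([2.5, 0.0])))    # 2: near mu_2
print(classify(np.array([-6.0, 0.0])))   # 2: far from both means, the wider class w_2 wins
</pre>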


Note: decision boundaries need not be connected; in the example below, with <math>P(w_1)=P(w_2)</math>, the decision boundary consists of two disconnected points.

.. image:: fig_case3_lec6.jpg

Class = <math>w_1</math> when <math>-5<x<15</math>

Class = <math>w_2</math> when <math>x<-5</math> or <math>x>15</math>
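The figure's exact parameters are not given here; as a hedged 1D sketch with assumed parameters, the following solves <math>g_1(x) - g_2(x) = 0</math> for two univariate Gaussians with different variances and finds two disconnected boundary points (the numbers will not match the -5 and 15 of the figure).

<pre>
import numpy as np

# Assumed 1D parameters (illustrative only; not the figure's actual values)
mu1, s1 = 5.0, 2.0      # class w_1: mean and standard deviation
mu2, s2 = 5.0, 6.0      # class w_2: same mean, larger spread
P1, P2 = 0.5, 0.5

# With g_i(x) = -(x - mu_i)^2 / (2 s_i^2) - ln s_i + ln P_i,
# g_1(x) - g_2(x) is the quadratic a x^2 + b x + c:
a = -1.0 / (2 * s1**2) + 1.0 / (2 * s2**2)
b = mu1 / s1**2 - mu2 / s2**2
c = (-mu1**2 / (2 * s1**2) + mu2**2 / (2 * s2**2)
     + np.log(s2 / s1) + np.log(P1 / P2))

roots = np.sort(np.roots([a, b, c]))
print(roots)   # two disconnected boundary points; decide w_1 between them, w_2 outside
</pre>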

For the different cases and their figures, refer to pages 42 and 43 of DHS.

Previous: [Lecture 5] Next: [Lecture 7]
