(3 intermediate revisions by 2 users not shown)
Line 1: Line 1:
Classic central limit Thm (Second Fundamental probabilistic):
+
[[Category:2010 Spring ECE 662 mboutin]]
  
"The distribution of the average of a large number of samples from a distribution tends to be normal"
+
=Details of Lecture 8, [[ECE662]] Spring 2010=
 +
In Lecture 8, we justified the commonly made assumption that the features are normally distributed with the Central Limit Theorem. We then discussed the probability of error when using Bayes decision rule. More precisely, we obtained the Chernoff Bound and the Bhattacharrya bound for the probability of error.
  
let X1,X2,...,Xn be n independent and identically distributed variables (i.i.d) with finite mean <math>\mu</math> and finite variance <math>\sigma^2>0</math>.Then as n increases the distribution of <math>\Sigma_{i=1}^n \frac{X_i} {n}</math> approaches <math>N(\mu,\frac {\sigma^2}{n})</math>.
+
Note for this lecture can be found [[noteslecture8ECE662S10|here]].  
 +
  
More precisely the random variable <math>Z_n = \frac{\Sigma_{i=1}^n X_i - n \mu}{\sigma \sqrt{n}}</math> has <math>P(Z_n)\longrightarrow N(0,1)</math> when <math>n \longrightarrow \infty</math>
+
Previous: [[Lecture7ECE662S10|Lecture 7]]
 +
Next: [[Lecture9ECE662S10|Lecture 9]]
  
More generalization of central limit Thm.
+
----
 
+
[[ 2010 Spring ECE 662 mboutin|Back to 2010 Spring ECE 662 mboutin]]
let X1,X2,...,Xn be n independent variables
+
 
+
Xi has mean <math>\mu_i</math> and finite variance <math>\sigma^2 > 0</math> ,i=1,2,...,n
+
 
+
Then <math>Z_n = \frac{\Sigma_{i=1}^n X_i - \Sigma_{i=1}^n \mu_i} {\sqrt{\Sigma_{i=1}^n \sigma^2}}</math> has <math>P(Z_n)\longrightarrow N(\mu ,\Sigma)</math> when <math>n \longrightarrow \infty</math>
+
 
+
-------------------------------------
+
Error bounds for Bayes decision rule:
+
 
+
As we know Bayes decision rule guarantees the lowest average error rate; It Does not tell what the probability of error actually is. Therefore people try to find upper bounds for that: Chernoff and Bhattacharyya bounds
+
 
+
<math>P(error)=\int_{R^n} p(error,x)dx = \int_{R^n} p(error|x)p(x)dx</math>
+
 
+
in two class case we have:
+
 
+
<math> p(error|x) = min \{ p(\omega_1|x) , p(\omega_2|x) \} </math>
+
<math> \Rightarrow P(error)= \int_{R^n}min\{p(\omega_1|x),p(\omega_2|x)\} p(x)dx </math>
+
 
+
as we know from Lemma: <math>min \{a,b\}\leq a^\beta b^ {1-\beta}</math>, <math>\forall a,b \geq 0 , \forall \beta s.t 0 \leq\beta\leq 1 </math>
+
 
+
<math>P(error)\leq \int_{R^n} p(\omega_1 |x)^\beta p(\omega_2 |x)^{1-\beta} p(x) dx </math> , <math>\forall 0 \leq \beta \leq 1</math>
+
 
+
<math>P(error)\leq \int_{R^n} ({\frac {p(x|\omega)P(\omega_1)}{p(x)}})^\beta ({\frac {p(x|\omega_2)P(\omega_2)}{p(x)}})^{1-\beta} p(x) dx </math> , <math>\forall 0 \leq \beta \leq 1</math>
+
 
+
<math>P(error)\leq P(\omega_1) ^\beta P(\omega_2)^ https://kiwi.ecn.purdue.edu/rhea/skins/common/images/button_math.png{1-\beta} \int_{R^n} p(x|\omega_1)^\beta p(x|\omega_2)^{1-\beta} dx =: \varepsilon_\beta </math> , <math>\forall 0 \leq \beta \leq 1</math>
+
 
+
The lower bound <math>S:= min\varepsilon_\beta</math> , <math>\beta \in [0,1]</math>is an upper bound for P(error).
+
 
+
S is called "Chernoff Bound" and <math> \varepsilon_{1/2}</math> is called "Bhattacharyya Bound"
+
 
+
<math> \varepsilon_{1/2}= \sqrt{P(\omega_1)P(\omega_2)} \int_{R^n} \sqrt{p(x|\omega_1)p(x|\omega_2)} dx = e^{-f(1/2)} \sqrt {P(\omega_1)P(\omega_2)}</math>
+
 
+
<math>\varepsilon_\beta = P(\omega_1) ^\beta P(\omega_2)^ {1-\beta} \int_{R^n} p(x|\omega_1)^\beta p(x|\omega_2)^{1-\beta} dx = e^{-f(\beta)} P(\omega_1)^\beta P(\omega_2)^ {1-\beta}</math>
+
 
+
Here <math>f(1/2)</math> is "Bhattacharyya distance" and <math>f(\beta)</math> is "Chernof distance"
+
 
+
Especial case: Bounds for Gaussian density:
+
 
+
<math>f(\beta)= {\frac{ \beta (1- \beta)}{2}} {(\mu_2 - \mu_1)^T} [ \beta \Sigma_1 + (1- \beta) \Sigma_2]^{-1} (\mu_2-\mu_1) + \frac {1}{2} ln{\frac {|\beta \Sigma_1 + (1-\beta) \Sigma_2|}{|\Sigma_1|^\beta |\Sigma_2|^ {1-\beta}}}</math>
+
 
+
<math>f(1/2)=\frac{1}{8} (\mu_2 - \mu_1)^T [\frac {\Sigma_1 + \Sigma_2} {2}]^{-1} (\mu_2 - \mu_1) + \frac {1} {2} ln \frac {[\frac {\Sigma_1 + \Sigma_2} {2}]} {\sqrt {|\Sigma_1| |\Sigma_2|}}</math>
+
 
+
The Chernoff bound is found by analytically or numerically minimizing <math>e^{-f(\beta)}</math>, therefore optimization is now in the one dimensional β space.
+
[[Image:Error.jpg]]
+
 
+
* Now consider a special case where : <math>\Sigma_1 = \Sigma_2 =: \Sigma</math>
+
 
+
<math>f(\beta) = \beta (1-\beta){(\mu_2 - \mu_1)^T} \Sigma^{-1}(\mu_2 - \mu_1)</math>
+
 
+
<math>\frac {df(\beta)}{d\beta} = \frac {1-2\beta}{2}{(\mu_2 - \mu_1)^T} \Sigma^{-1}(\mu_2 - \mu_1)= 0</math>
+
 
+
<math> \Leftrightarrow \beta = \frac {1}{2}</math>
+
therefore Bhattacharyya Bound is optimum in this case.
+
 
+
* A closer look at f(1/2):
+
 
+
<math>f(1/2)=\frac{1}{8} (\mu_2 - \mu_1)^T [\frac {\Sigma_1 + \Sigma_2} {2}]^{-1} (\mu_2 - \mu_1) + \frac {1} {2} ln \frac {[\frac {\Sigma_1 + \Sigma_2} {2}]} {\sqrt {|\Sigma_1| |\Sigma_2|}}</math>
+
 
+
As we see first term of this equation would be zero when <math>\mu_1 = \mu_2</math>; which shows class separability due to distance between means.(error would be higher when means of our classes are close to each other)
+
 
+
Second term of this equation would be zero when <math>\Sigma_1 = \Sigma_2</math>; which shows class separability due to difference of covariance matrices.
+

Latest revision as of 09:09, 11 May 2010


Details of Lecture 8, ECE662 Spring 2010

In Lecture 8, we justified the commonly made assumption that the features are normally distributed with the Central Limit Theorem. We then discussed the probability of error when using Bayes decision rule. More precisely, we obtained the Chernoff Bound and the Bhattacharrya bound for the probability of error.

Note for this lecture can be found here.


Previous: Lecture 7 Next: Lecture 9


Back to 2010 Spring ECE 662 mboutin

Alumni Liaison

ECE462 Survivor

Seraj Dosenbach