[[Category:slecture]]
[[Category:ECE662Spring2014Boutin]]
[[Category:ECE]]
[[Category:ECE662]]
[[Category:pattern recognition]]
[[Category:MATLAB]]

<br>
<center><font size="4"></font>
<font size="4">'''Maximum Likelihood Estimation (MLE) Analysis for various Probability Distributions''' <br> </font> <font size="2">A [http://www.projectrhea.org/learning/slectures.php slecture] by [http://web.ics.purdue.edu/~hseshadr/ Hariharan Seshadri]</font>

Partly based on the [[2014_Spring_ECE_662_Boutin_Statistical_Pattern_recognition_slectures|ECE662 Spring 2014 lecture]] material of [[user:mboutin|Prof. Mireille Boutin]]
</center>

----
----
'''What would be the learning outcome from this slecture?'''

* Basic Theory behind Maximum Likelihood Estimation (MLE)
* Derivations for Maximum Likelihood Estimates for parameters of Exponential Distribution, Geometric Distribution, Binomial Distribution, Poisson Distribution, and Uniform Distribution


'''Outline of the slecture'''

* Introduction
* Derivation for Maximum Likelihood Estimates (MLE) for parameters of:
** Exponential Distribution
** Geometric Distribution
** Binomial Distribution
** Poisson Distribution
** Uniform Distribution
* Summary
* References

----
----
= Introduction =


The maximum likelihood estimate (MLE) is the value <math>\hat{\theta}</math> which maximizes the function L(θ) given by L(θ) = f (X<sub>1</sub>,X<sub>2</sub>,...,X<sub>n</sub> | θ), where 'f' is the probability density function in the case of continuous random variables and the probability mass function in the case of discrete random variables, and 'θ' is the parameter being estimated.


In other words, <math>\hat{\theta}</math> = arg max<sub>θ</sub> L(θ), where <math>\hat{\theta}</math> is the best estimate of the parameter 'θ'. Thus, '''we are trying to maximize the probability density (in the case of continuous random variables) or the probability mass (in the case of discrete random variables).'''


If the random variables X<sub>1</sub>,X<sub>2</sub>,...,X<sub>n</sub> <math>\in</math> R are Independent and Identically Distributed (I.I.D.), then

<center><math>L(\theta) = \prod_{i=1}^{n} f(x_{i} \mid \theta)</math></center>

We need to find the value <math>\hat{\theta}</math> in the parameter space Θ that maximizes this function.


In the course Purdue ECE 662, Pattern Recognition and Decision Making Processes, we have already looked at the Maximum Likelihood Estimates for Normally distributed random variables and found them to be:


<blockquote style="color: lightgrey; border: solid thin gray;">
<center><math>
\widehat{\mu} = \frac{\sum_{i=1}^{n}x_{i}}{n}
</math></center>

<center><math>
\hat{\sigma{}^{2}} = \frac{1}{n}\sum_{i=1}^{n}(x_{i}-\widehat{\mu})^{2}
</math></center>
</blockquote>


where

X<sub>i</sub>'s are the Normal (Gaussian) Random Variables <math>\in</math> R,

'n' is the number of samples, and

<math>\widehat{\mu}</math> and <math>\hat{\sigma{}^{2}}</math> are the estimated Mean and estimated Variance.
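As a quick numerical illustration of the definition above, the following Python sketch draws Gaussian samples, computes the closed-form estimates <math>\widehat{\mu}</math> and <math>\hat{\sigma{}^{2}}</math>, and confirms by a brute-force grid search that the sample mean is where the log-likelihood peaks. The true parameters, sample size, and random seed are arbitrary choices for the demonstration.

<pre>
import numpy as np

# Illustrative check of the Gaussian MLE formulas above (arbitrary true
# parameters, sample size, and seed).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # i.i.d. Gaussian samples
n = x.size

# Closed-form maximum likelihood estimates
mu_hat = x.sum() / n
sigma2_hat = ((x - mu_hat) ** 2).sum() / n

# Brute-force check: evaluate the Gaussian log-likelihood on a grid of
# candidate means (variance held at sigma2_hat) and locate its maximizer.
def log_likelihood(mu, sigma2):
    return -0.5 * n * np.log(2 * np.pi * sigma2) - ((x - mu) ** 2).sum() / (2 * sigma2)

grid = np.linspace(x.min(), x.max(), 10001)
mu_grid = grid[np.argmax([log_likelihood(m, sigma2_hat) for m in grid])]

print(mu_hat, sigma2_hat, mu_grid)   # mu_grid should essentially equal mu_hat
</pre>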
  
 
----
 
----
= Maximum Likelihood Estimate (MLE) for: =
==1. Exponential Distribution==


Let X<sub>1</sub>,X<sub>2</sub>,...,X<sub>n</sub> <math>\in</math> R be a random sample from the exponential distribution with p.d.f.

<center><math>f(x) = \frac{1}{\theta} e^{-x/\theta}</math></center>


The likelihood function L(θ) is a function of x<sub>1</sub>, x<sub>2</sub>, x<sub>3</sub>,...,x<sub>n</sub>, given by:

<center><math>L(\theta) = \frac{1}{\theta} e^{-x_{1}/\theta} \cdot \frac{1}{\theta} e^{-x_{2}/\theta} \cdots \frac{1}{\theta} e^{-x_{n}/\theta}</math></center>

<center><math>L(\theta) = \frac{1}{\theta^{n}} \exp\left(-\frac{\sum_{i=1}^{n}x_{i}}{\theta}\right)</math></center>


We need to maximize L(θ). The logarithm of this function will be easier to maximize.

<center><math>\ln L(\theta) = -n \ln(\theta) - \frac{1}{\theta}\sum_{i=1}^{n}x_{i}</math></center>


Setting its derivative with respect to the parameter (θ) to zero, we have:

<center><math>\frac{d}{d\theta} \ln L(\theta) = -\frac{n}{\theta} + \frac{\sum_{i=1}^{n}x_{i}}{\theta^{2}} = 0</math></center>


which implies that

<blockquote style="color: black; border: solid thin gray;">
<center><math>\hat{\theta} = \frac{\sum_{i=1}^{n}x_{i}}{n} = \overline{X}</math></center>
<center>= Mean of x<sub>1</sub>, x<sub>2</sub>, x<sub>3</sub>,...,x<sub>n</sub></center>
</blockquote>


This is the maximum likelihood estimate.
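The result <math>\hat{\theta} = \overline{X}</math> is easy to sanity-check with a short simulation; the Python sketch below is illustrative only, with the true parameter, sample size, and seed chosen arbitrarily.

<pre>
import numpy as np

# Illustrative check: for exponential data with mean parameter theta,
# the MLE of theta is the sample mean.
rng = np.random.default_rng(0)
theta_true = 3.0                      # assumed "true" parameter for the simulation
x = rng.exponential(scale=theta_true, size=5000)

theta_hat = x.mean()                  # MLE: sum(x_i)/n
print(theta_hat)                      # expected to be close to 3.0
</pre>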
  
 
-----
 
-----
==2. Geometric Distribution==


Let X<sub>1</sub>,X<sub>2</sub>,...,X<sub>n</sub> <math>\in</math> R be a random sample from the geometric distribution with p.m.f.

<center><math>f(x) = (1-p)^{x-1}\, p</math> ; where x = 1, 2, 3,... and <math>0 \leq p \leq 1</math></center>


The likelihood function is given by:

<center><math>L(p) = (1-p)^{x_{1}-1}\, p \cdot (1-p)^{x_{2}-1}\, p \cdot (1-p)^{x_{3}-1}\, p \cdots (1-p)^{x_{n}-1}\, p</math></center>

<center><math>L(p) = p^{n}\,(1-p)^{\sum_{i=1}^{n}x_{i}-n}</math></center>


The likelihood function is a function of x<sub>1</sub>,x<sub>2</sub>,...,x<sub>n</sub>.

The log-likelihood is:

<center><math>\ln L(p) = n \ln(p) + \left(\sum_{i=1}^{n}x_{i}-n\right) \ln(1-p)</math></center>


Setting its derivative with respect to the parameter (p) to zero, we have:

<center><math>\frac{d}{dp} \ln L(p) = \frac{n}{p} - \frac{\sum_{i=1}^{n}x_{i}-n}{1-p} = 0</math></center>


which implies that

<blockquote style="color: lightgrey; border: solid thin gray;">
<center><math>\hat{p} = \frac{n}{\sum_{i=1}^{n}x_{i}} = \frac{1}{\overline{X}}</math></center>
</blockquote>


This is the maximum likelihood estimate.

This is intuitively correct as well. The Geometric Distribution models a random variable X that counts the number of trials up to and including the first success. So the random variables X<sub>1</sub>, X<sub>2</sub>,...,X<sub>n</sub> together contain n successes in X<sub>1</sub> + X<sub>2</sub> + ... + X<sub>n</sub> trials.

Intuitively, the estimate of 'p' is the number of successes divided by the total number of trials, which matches the maximum likelihood estimate of the parameter 'p' obtained for the Geometric Distribution.
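As a quick check of <math>\hat{p} = 1/\overline{X}</math>, the illustrative sketch below simulates geometric data (counting trials up to and including the first success) and applies the estimate; the chosen p, sample size, and seed are arbitrary.

<pre>
import numpy as np

# Illustrative check: for geometric data counting trials up to the first
# success, the MLE of p is 1 / sample mean.
rng = np.random.default_rng(0)
p_true = 0.25                          # assumed "true" success probability
x = rng.geometric(p_true, size=5000)   # numpy's geometric counts trials, starting at 1

p_hat = 1.0 / x.mean()                 # MLE: n / sum(x_i)
print(p_hat)                           # expected to be close to 0.25
</pre>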
  
 
----
 
----
==3. Binomial Distribution==


Let X<sub>1</sub>,X<sub>2</sub>,...,X<sub>N</sub> <math>\in</math> R be samples obtained from a Binomial Distribution.

The Binomial Distribution models 'x' successes in 'n' Bernoulli trials. Its p.m.f. is given by:

<center><math>f(x) = \frac{n!}{x!(n-x)!}\, p^{x}\, (1-p)^{n-x}</math></center>


The likelihood function L(p) is given by:

<center><math>L(p) = \prod_{i=1}^{N} f(x_{i}) = \prod_{i=1}^{N} \frac{n!}{x_{i}!(n-x_{i})!}\, p^{x_{i}}\, (1-p)^{n-x_{i}}</math></center>


The log-likelihood is:

<center><math>\ln L(p) = \sum_{i=1}^{N} \ln (n!) - \sum_{i=1}^{N} \ln (x_{i}!) - \sum_{i=1}^{N} \ln((n-x_{i})!) + \sum_{i=1}^{N} x_{i}\ln(p) + \left(Nn - \sum_{i=1}^{N} x_{i}\right) \ln(1-p)</math></center>


Setting its derivative with respect to p to zero,

<center><math>\frac{d}{dp} \ln L(p) = \frac{1}{p} \sum_{i=1}^{N} x_{i} - \frac{1}{1-p} \sum_{i=1}^{N} (n - x_{i}) = 0</math></center>


which implies,

<center><math>\frac{1}{p} \sum_{i=1}^{N} x_{i} = \frac{1}{1-p}\left( Nn - \sum_{i=1}^{N} x_{i}\right)</math></center>


giving,

<blockquote style="color: lightgrey; border: solid thin gray;">
<center><math>\hat{p} = \frac{1}{N} \left(\frac{\sum_{i=1}^{N}x_{i}}{n}\right) = \frac{1}{N} \left(\frac{X_{1}}{n} + \frac{X_{2}}{n} + ... + \frac{X_{N}}{n}\right)</math></center>
</blockquote>


which is the maximum likelihood estimate.

This is intuitively correct too, as this is '''the average of the ratio <math>\frac{X_{i}}{n}</math> over the samples X<sub>i</sub>''', i.e., '''the average estimate of 'p' obtained from each X<sub>i</sub>'''.
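The estimate can again be checked numerically; in the illustrative sketch below, the number of Bernoulli trials n per sample, the true p, the number of samples N, and the seed are all arbitrary choices.

<pre>
import numpy as np

# Illustrative check: with N samples from Binomial(n, p), the MLE of p is
# the average of x_i / n over the samples.
rng = np.random.default_rng(0)
n, p_true, N = 20, 0.3, 5000           # assumed trial count, true p, and sample count
x = rng.binomial(n, p_true, size=N)

p_hat = x.mean() / n                   # (1/N) * (sum(x_i) / n)
print(p_hat)                           # expected to be close to 0.3
</pre>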
  
 
-----
 
-----
  
==4. Poisson Distribution==


Let X<sub>1</sub>,X<sub>2</sub>,...,X<sub>n</sub> <math>\in</math> R be a random sample from a Poisson distribution.

The p.m.f. of a Poisson Distribution is:

<center><math>f(x) = \frac{\lambda^{x}e^{-\lambda}}{x!}</math> ; where x = 0, 1, 2,...</center>


The likelihood function is:

<center><math>
L(\lambda) = \prod_{i=1}^{n} \frac{\lambda^{x_{i}}e^{-\lambda}}{x_{i}!}
  = e^{-\lambda n} \frac{\lambda^{\sum_{i=1}^{n}x_{i}}}{\prod_{i=1}^{n}x_{i}!}
</math></center>


The log-likelihood is:

<center><math>\ln L(\lambda) = - \lambda n + \sum_{i=1}^{n} x_{i} \ln(\lambda) - \ln\left( \prod_{i=1}^{n}x_{i}!\right)</math></center>


Setting its derivative with respect to <math>\lambda</math> to zero, we have:

<center><math>\frac{d}{d\lambda}\ln L(\lambda) = -n + \sum_{i=1}^{n}x_{i} \cdot \frac{1}{\lambda} = 0</math></center>


giving,

<blockquote style="color: lightgrey; border: solid thin gray;">
<center><math>\widehat{\lambda} = \frac{\sum_{i=1}^{n}x_{i}}{n} = \overline{X}</math></center>
</blockquote>


which is the maximum likelihood estimate.
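A short simulation check of <math>\widehat{\lambda} = \overline{X}</math> follows; it is illustrative only, with the true rate, sample size, and seed chosen arbitrarily.

<pre>
import numpy as np

# Illustrative check: for Poisson data, the MLE of lambda is the sample mean.
rng = np.random.default_rng(0)
lam_true = 4.5                         # assumed "true" rate for the simulation
x = rng.poisson(lam_true, size=5000)

lam_hat = x.mean()                     # MLE: sum(x_i)/n
print(lam_hat)                         # expected to be close to 4.5
</pre>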
  
 
-----
 
-----
  
==5. Uniform Distribution==


For Uniformly Distributed random variables X<sub>1</sub>,X<sub>2</sub>,...,X<sub>n</sub> <math>\in</math> R, the p.d.f. is given by:

<center><math>f(x_{i}) = \frac{1}{\theta}</math> ; if <math> 0 \leq x_{i} \leq \theta </math></center>

<center><math>f(x) = 0</math> ; otherwise</center>


If the uniformly distributed random variables are arranged in the following order

<center><math> 0 \leq X_{1} \leq X_{2} \leq X_{3} \leq ... \leq X_{n} \leq \theta</math>,</center>

then the likelihood function is given by:

<center><math>L(\theta) = \prod_{i=1}^{n} f(x_{i}) = \prod_{i=1}^{n} \frac{1}{\theta} = \theta^{-n}</math> ; for <math>\theta \geq x_{n}</math></center>


The log-likelihood is:

<center><math> \ln L(\theta) = -n \ln (\theta)</math></center>


Setting its derivative with respect to the parameter <math>\theta</math> to zero, we get:

<center><math>\frac{d}{d\theta} \ln L(\theta) = \frac{-n}{\theta}</math></center>

which is < 0 for <math>\theta</math> > 0.

Hence, L(<math>\theta</math>) is a decreasing function of <math>\theta</math>, so it is maximized by taking <math>\theta</math> as small as the data allow, namely <math>\theta</math> = x<sub>n</sub>, the largest observation.

The maximum likelihood estimate is thus,

<blockquote style="color: black; border: solid thin gray;">
<center>'''<math> \hat{\theta} </math> = X<sub>n</sub>'''</center>
</blockquote>
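The estimate <math>\hat{\theta}</math> = X<sub>n</sub> can likewise be checked by simulation; in this illustrative sketch, the true upper bound, sample size, and seed are arbitrary, and the estimate approaches the true bound from below as the sample grows.

<pre>
import numpy as np

# Illustrative check: for Uniform(0, theta) data, the MLE of theta is the
# largest observation.
rng = np.random.default_rng(0)
theta_true = 7.0                       # assumed "true" upper bound
x = rng.uniform(0.0, theta_true, size=5000)

theta_hat = x.max()                    # MLE: the maximum of the sample
print(theta_hat)                       # slightly below 7.0, approaching it as n grows
</pre>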
  
----
 
----
 
----
  
=Summary=


Using the usual notations and symbols,

'''1) Normal Distribution:'''

<math>f(x,\mu,\sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right)</math> ; X<sub>1</sub>,X<sub>2</sub>,...,X<sub>n</sub> <math>\in</math> R

<blockquote style="color: lightgrey; border: solid thin gray;">
<center><math>\widehat{\mu} = \frac{\sum_{i=1}^{n}x_{i}}{n}</math></center>

<center><math>\hat{\sigma{}^{2}} = \frac{1}{n}\sum_{i=1}^{n}(x_{i}-\widehat{\mu})^{2}</math></center>
</blockquote>

----

'''2) Exponential Distribution:'''

<math>f(x,\lambda) = \frac{1}{\lambda} e^{-x/\lambda}</math> ; X<sub>1</sub>,X<sub>2</sub>,...,X<sub>n</sub> <math>\in</math> R

<blockquote style="color: black; border: solid thin gray;">
<center><math>\widehat{\lambda} = \frac{\sum_{i=1}^{n}x_{i}}{n} = \overline{X}</math></center>
</blockquote>

----

'''3) Geometric Distribution:'''

<math>f(x,p) = (1-p)^{x-1}\, p</math> ; X<sub>1</sub>,X<sub>2</sub>,...,X<sub>n</sub> <math>\in</math> R

<blockquote style="color: black; border: solid thin gray;">
<center><math>\hat{p} = \frac{n}{\sum_{i=1}^{n}x_{i}} = \frac{1}{\overline{X}}</math></center>
</blockquote>

----

'''4) Binomial Distribution:'''

<math>f(x,p) = \frac{n!}{x!(n-x)!}\, p^{x}\, (1-p)^{n-x}</math> ; X<sub>1</sub>,X<sub>2</sub>,...,X<sub>N</sub> <math>\in</math> R

<blockquote style="color: black; border: solid thin gray;">
<center><math>\hat{p} = \frac{1}{N} \left(\frac{\sum_{i=1}^{N}x_{i}}{n}\right) = \frac{1}{N} \left(\frac{X_{1}}{n} + \frac{X_{2}}{n} + ... + \frac{X_{N}}{n}\right)</math></center>
</blockquote>

----

'''5) Poisson Distribution:'''

<math>f(x,\lambda) = \frac{\lambda^{x}e^{-\lambda}}{x!}</math> ; X<sub>1</sub>,X<sub>2</sub>,...,X<sub>n</sub> <math>\in</math> R

<blockquote style="color: black; border: solid thin gray;">
<center><math>\widehat{\lambda} = \frac{\sum_{i=1}^{n}x_{i}}{n} = \overline{X}</math></center>
</blockquote>

----

'''6) Uniform Distribution:'''

For X<sub>1</sub>,X<sub>2</sub>,...,X<sub>n</sub> <math>\in</math> R,

<math>f(x_{i}) = \frac{1}{\theta}</math> ; if <math> 0 \leq x_{i} \leq \theta </math>

<math>f(x) = 0</math> ; otherwise

<blockquote style="color: black; border: solid thin gray;">
<center><math> \hat{\theta} </math> = X<sub>n</sub></center>
</blockquote>
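For quick reference, the estimators collected in this summary can be written as a few small Python functions. This is an illustrative sketch; the function names are arbitrary, and the exponential estimator assumes the mean-parameterized density used above.

<pre>
import numpy as np

# Compact reference implementations of the estimators summarized above.
def mle_normal(x):            # returns (mu_hat, sigma2_hat)
    mu = x.mean()
    return mu, ((x - mu) ** 2).mean()

def mle_exponential(x):       # mean-parameterized exponential
    return x.mean()

def mle_geometric(x):
    return 1.0 / x.mean()

def mle_binomial(x, n):       # n = number of Bernoulli trials per sample
    return x.mean() / n

def mle_poisson(x):
    return x.mean()

def mle_uniform(x):
    return x.max()
</pre>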
  
----
 
----
 
----
  
= References =


1) A module on Maximum Likelihood Estimation - Examples, by Ewa Paszek

2) Lecture on Maximum Likelihood Estimation, by Dr. David Levin, Assistant Professor, University of Utah

3) Partially based on Dr. Mireille Boutin's lecture notes for Purdue ECE 662 - Pattern Recognition and Decision Making Processes


----


= [[Maximum Likelihood Estimation (MLE) for various probability distributions|Questions and comments]] =


If you have any questions, comments, etc. please post them on [[Maximum Likelihood Estimation (MLE) for various probability distributions|this page]].


----
