Revision as of 08:12, 27 April 2014

Maximum Likelihood Estimation (MLE) Analysis for various Probability Distributions
A slecture by Hariharan Seshadri

(partially based on Prof. Mireille Boutin's ECE 662 lecture)

What would be the learning outcome from this slecture?

Basic Theory behind Maximum Likelihood Estimation (MLE)
Derivations for Maximum Likelihood Estimates for parameters of Exponential Distribution, Geometric Distribution, Binomial Distribution, Poisson Distribution, and Uniform Distribution

Outline of the slecture

Introduction
Derivation for Maximum Likelihood Estimates (MLE) for parameters of:
- Exponential Distribution
- Geometric Distribution
- Binomial Distribution
- Poisson Distribution
- Uniform Distribution
Summary
References

Introduction

The maximum likelihood estimate (MLE) is the value $\hat{\theta}$ that maximizes the likelihood function $L(\theta) = f(x_1, x_2, \dots, x_n \mid \theta)$. Here f is the joint probability density function for continuous random variables or the joint probability mass function for discrete random variables, and θ is the parameter being estimated.

In other words, $\hat{\theta} = \arg\max_{\theta} L(\theta)$, where $\hat{\theta}$ is the best estimate of the parameter θ in this sense. Thus, we maximize the probability density (for continuous random variables) or the probability mass (for discrete random variables) of the observed sample, viewed as a function of θ.

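If the random variables $X_1, X_2, \dots, X_n \in \mathbb{R}$ are independent and identically distributed (i.i.d.), then $L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$. We need to find the value $\hat{\theta} \in \Theta$ (the set of admissible parameter values) that maximizes this function. Since ln is strictly increasing, it is usually easier to maximize the log-likelihood $\ln L(\theta) = \sum_{i=1}^{n} \ln f(x_i \mid \theta)$, which has the same maximizer.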
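As a minimal sketch of this definition, the Python code below evaluates both the product-form likelihood and the summed log-likelihood on a grid of candidate θ values and takes the arg max of each. The helper names, the grid, and the sample values are illustrative assumptions, and the exponential density from Section 1 is used only as a concrete model. Because ln is strictly increasing, both searches pick the same θ.

```python
import math
from typing import Callable, Sequence

def likelihood(theta: float, xs: Sequence[float],
               pdf: Callable[[float, float], float]) -> float:
    """L(theta) = prod_i f(x_i | theta) for an i.i.d. sample."""
    result = 1.0
    for x in xs:
        result *= pdf(x, theta)
    return result

def log_likelihood(theta: float, xs: Sequence[float],
                   pdf: Callable[[float, float], float]) -> float:
    """ln L(theta) = sum_i ln f(x_i | theta)."""
    return sum(math.log(pdf(x, theta)) for x in xs)

def argmax_on_grid(objective: Callable[[float], float],
                   grid: Sequence[float]) -> float:
    """Return the grid point with the largest objective value."""
    return max(grid, key=objective)

# Illustrative model (assumption): exponential density with mean theta.
def exp_pdf(x: float, theta: float) -> float:
    return math.exp(-x / theta) / theta

xs = [0.5, 1.2, 2.3, 0.8]                 # made-up sample, mean 1.2
grid = [i / 100 for i in range(1, 501)]   # candidate theta values 0.01 .. 5.00

theta_from_L = argmax_on_grid(lambda t: likelihood(t, xs, exp_pdf), grid)
theta_from_lnL = argmax_on_grid(lambda t: log_likelihood(t, xs, exp_pdf), grid)
print(theta_from_L, theta_from_lnL)       # both 1.2
```

A grid search is used here only because it mirrors the arg max definition directly; the sections below find the maximizer analytically. For large samples the raw product can underflow to 0.0 in floating point, which is a practical reason, besides easier differentiation, to work with ln L(θ).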

In the course Purdue ECE 662, Pattern Recognition and Decision Making Processes, we have already derived the MLE for the normal distribution:

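$\widehat{\mu} = \frac{\sum_{i=1}^{n}x_{i}}{n}$

$\widehat{\sigma^{2}} = \frac{1}{n}\sum_{i=1}^{n}(x_{i}-\widehat{\mu})^{2}$

where the $x_i \in \mathbb{R}$ are samples from the normal (Gaussian) distribution, n is the number of samples, and $\widehat{\mu}$ and $\widehat{\sigma^{2}}$ are the estimated mean and variance. Note that the MLE of the variance divides by n; dividing by $n-1$ instead gives the unbiased sample variance, which is not the maximum likelihood estimate.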
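A short sketch using only the Python standard library (the sample values are made up for illustration) computes the two estimates and compares the variance MLE with `statistics.pvariance`, which divides by n, and `statistics.variance`, which divides by $n-1$:

```python
import statistics

def normal_mle(xs):
    """Return (mu_hat, sigma2_hat), the Gaussian MLEs for an i.i.d. sample."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n  # divide by n, not n - 1
    return mu_hat, sigma2_hat

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_hat, sigma2_hat = normal_mle(xs)
print(mu_hat, sigma2_hat)          # 5.0 4.0
print(statistics.pvariance(xs))    # 4.0, same as the MLE
print(statistics.variance(xs))     # 32/7, about 4.571 (the unbiased estimate)
```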

Maximum Likelihood Estimates (MLE) for:

1. Exponential Distribution

Let X₁, X₂, ..., X_n be a random sample from the exponential distribution with p.d.f.

$f(x;\theta) = \frac{1}{\theta} e^{-x/\theta}, \quad x \geq 0,\ \theta > 0$

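For the observed sample x₁, x₂, ..., x_n, the likelihood function is

$L(\theta) = \frac{1}{\theta}e^{-x_1/\theta} \cdot \frac{1}{\theta}e^{-x_2/\theta} \cdots \frac{1}{\theta}e^{-x_n/\theta}$

$L(\theta) = \frac{1}{\theta^{n}} \exp\left(-\frac{1}{\theta}\sum_{i=1}^{n} x_i\right)$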

We need to maximize L(θ). Its logarithm is easier to maximize and has the same maximizer:

$\ln L(\theta) = -n \ln\theta - \frac{1}{\theta}\sum_{i=1}^{n} x_i$

Setting its derivative with respect to the parameter (θ) to zero, we have:

$\frac{d}{d\theta} \ln L(\theta) = -\frac{n}{\theta} + \frac{1}{\theta^{2}}\sum_{i=1}^{n} x_i = 0$

which implies that

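$\hat{\theta} = \frac{\sum_{i=1}^{n}x_{i}}{n} = \overline{X}$

that is, the mean of x₁, x₂, x₃, ..., x_n.

This is the maximum likelihood estimate. The second derivative of ln L(θ), $\frac{n}{\theta^{2}} - \frac{2}{\theta^{3}}\sum_{i=1}^{n} x_i$, equals $-n/\overline{X}^{2} < 0$ at $\theta = \overline{X}$, so this critical point is indeed a maximum.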
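A quick numerical check, sketched under assumed settings (true θ = 2 and a seeded pseudo-random sample of 10,000 draws): the closed-form estimate is the sample mean, and the derivative of ln L(θ) from the derivation vanishes there. Note that Python's `random.expovariate` takes the rate 1/θ, not the mean θ.

```python
import random

def exp_mle(xs):
    """MLE of theta for f(x; theta) = exp(-x/theta) / theta: the sample mean."""
    return sum(xs) / len(xs)

def exp_score(theta, xs):
    """d/dtheta ln L(theta) = -n/theta + sum(x_i) / theta**2."""
    return -len(xs) / theta + sum(xs) / theta ** 2

rng = random.Random(0)          # fixed seed so the run is reproducible
theta_true = 2.0                # assumed true mean
xs = [rng.expovariate(1 / theta_true) for _ in range(10_000)]  # rate = 1/theta

theta_hat = exp_mle(xs)
print(theta_hat)                 # close to 2.0
print(exp_score(theta_hat, xs))  # essentially 0, up to floating-point round-off
```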

2. Geometric Distribution

Let X₁, X₂, ..., X_n be a random sample from the geometric distribution with p.m.f.

$f(x;p) = (1-p)^{x-1}p$, where $x = 1, 2, 3, \dots$ and $0 < p \leq 1$

The likelihood function is given by:

$L(p) = (1-p)^{x_1-1}p \cdot (1-p)^{x_2-1}p \cdots (1-p)^{x_n-1}p$

$L(p) = p^{n}(1-p)^{\sum_{i=1}^{n} x_i - n}$

Here the observed values x₁, x₂, ..., x_n are held fixed, and L is treated as a function of p.

The log-likelihood is:

$\ln L(p) = n\ln p + \left(\sum_{i=1}^{n} x_i - n\right)\ln(1-p)$

Setting its derivative with respect to the parameter (p) to zero, we have:

$\frac{d}{dp} \ln L(p) = \frac{n}{p} - \frac{\sum_{i=1}^{n} x_i - n}{1-p} = 0$

which implies that

$\hat{p} = \frac{n}{\sum_{i=1}^{n}x_{i}} = \frac{1}{\overline{X}}$

This is the maximum likelihood estimate.

This is intuitively correct as well. The geometric distribution models a random variable X equal to the number of trials needed to obtain the first success, counting the successful trial itself. So the sample x₁, x₂, ..., x_n records n successes in a total of x₁ + x₂ + ... + x_n trials.

A natural estimate of p is therefore the number of successes divided by the total number of trials, $n / \sum_{i=1}^{n} x_i$, which is exactly the maximum likelihood estimate derived above.
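A sketch of this successes-over-trials reading, assuming p = 0.3 and a seeded simulation. The helper `draw_geometric` is hypothetical: it counts Bernoulli trials up to and including the first success, matching the p.m.f. above.

```python
import random

def draw_geometric(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success."""
    trials = 1
    while rng.random() >= p:   # this trial fails, with probability 1 - p
        trials += 1
    return trials

def geometric_mle(xs):
    """p_hat = n / sum(x_i): successes divided by total trials."""
    return len(xs) / sum(xs)

rng = random.Random(1)         # fixed seed so the run is reproducible
p_true = 0.3                   # assumed true success probability
xs = [draw_geometric(p_true, rng) for _ in range(10_000)]
p_hat = geometric_mle(xs)
print(p_hat)                   # close to 0.3
```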

3. Binomial Distribution

Let X₁, X₂, ..., X_N be samples obtained from a binomial distribution with a known number of trials n and an unknown success probability p.

The binomial distribution models the number of successes x in n independent Bernoulli trials. Its p.m.f. is given by:

$f(x) = \frac{n!}{x!(n-x)!} p^{x} (1-p)^{n-x}, \quad x = 0, 1, \dots, n$

The likelihood function L(p) is given by:

$L(p) = \prod_{i=1}^{N} f(x_{i}) = \prod_{i=1}^{N} \frac{n!}{x_{i}!(n-x_{i})!} p^{x_{i}} (1-p)^{n-x_{i}}$

The log-likelihood is:

$\ln L(p) = N\ln(n!) - \sum_{i=1}^{N} \ln(x_{i}!) - \sum_{i=1}^{N} \ln\left((n-x_{i})!\right) + \left(\sum_{i=1}^{N} x_{i}\right)\ln p + \left(Nn - \sum_{i=1}^{N} x_{i}\right)\ln(1-p)$

Setting its derivative with respect to p to zero,

$\frac{d}{dp} \ln L(p) = \frac{1}{p}\sum_{i=1}^{N} x_{i} - \frac{1}{1-p}\sum_{i=1}^{N} (n - x_{i}) = 0$

which implies,

$\frac{1}{p}\sum_{i=1}^{N} x_{i} = \frac{1}{1-p}\left(Nn - \sum_{i=1}^{N} x_{i}\right)$

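Multiplying both sides by p(1 - p) gives $(1-p)\sum_{i=1}^{N} x_i = p\left(Nn - \sum_{i=1}^{N} x_i\right)$, i.e. $\sum_{i=1}^{N} x_i = pNn$, so

$\hat{p} = \frac{1}{N}\left(\frac{\sum_{i=1}^{N}x_{i}}{n}\right) = \frac{1}{N}\left(\frac{x_1}{n} + \frac{x_2}{n} + \dots + \frac{x_N}{n}\right)$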

which is the maximum likelihood estimate.

This is also intuitive: $x_i / n$ is the observed fraction of successes in the i-th sample, so $\hat{p}$ is the average of these per-sample success fractions, which equals the total number of successes divided by the total number of trials, Nn.
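A small sketch, assuming n = 10 trials per sample and made-up success counts for N = 5 samples, that checks the closed form three ways: total successes over total trials, the average of the ratios $x_i/n$, and a grid search over the log-likelihood (which uses $\ln\binom{n}{x_i} = \ln n! - \ln x_i! - \ln (n-x_i)!$, the constant terms in the derivation). `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_mle(xs, n):
    """p_hat = sum(x_i) / (N * n) for N samples, each out of n trials."""
    return sum(xs) / (len(xs) * n)

def binomial_log_likelihood(p, xs, n):
    """ln L(p) = sum_i [ln C(n, x_i) + x_i ln p + (n - x_i) ln(1 - p)]."""
    return sum(math.log(math.comb(n, x)) + x * math.log(p) + (n - x) * math.log(1 - p)
               for x in xs)

n = 10                       # known trials per sample (assumed)
xs = [3, 5, 4, 6, 2]         # illustrative success counts, N = 5
p_hat = binomial_mle(xs, n)
avg_ratio = sum(x / n for x in xs) / len(xs)        # average of x_i / n
grid = [i / 1000 for i in range(1, 1000)]           # p in (0, 1)
p_grid = max(grid, key=lambda p: binomial_log_likelihood(p, xs, n))
print(p_hat, avg_ratio, p_grid)                     # all approximately 0.4
```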

4. Poisson Distribution

Let X₁, X₂, ..., X_n be a random sample from a Poisson distribution with parameter λ > 0.

The p.m.f. of the Poisson distribution is:

$f(x) = \frac{\lambda^{x}e^{-\lambda}}{x!}$, where $x = 0, 1, 2, \dots$

The likelihood function is: $L(\lambda) = \prod_{i=1}^{n} \frac{\lambda^{x_{i}}e^{-\lambda}}{x_{i}!} = e^{-\lambda n} \frac{\lambda^{\sum_{i=1}^{n}x_{i}}}{\prod_{i=1}^{n}x_{i}!}$

The log-likelihood is:

$\ln L(\lambda) = -\lambda n + \left(\sum_{i=1}^{n} x_{i}\right)\ln\lambda - \ln\left(\prod_{i=1}^{n} x_{i}!\right)$

Setting its derivative with respect to λ to zero, we have:

$\frac{d}{d\lambda} \ln L(\lambda) = -n + \frac{1}{\lambda}\sum_{i=1}^{n} x_{i} = 0$

which gives

$\widehat{\lambda} = \frac{\sum_{i=1}^{n}x_{i}}{n} = \overline{X}$

This is the maximum likelihood estimate.
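A minimal sketch with made-up counts, comparing the closed-form $\widehat{\lambda}$ with a grid search over the log-likelihood derived above; `math.lgamma(x + 1)` is used for $\ln(x!)$:

```python
import math

def poisson_mle(xs):
    """lambda_hat = sample mean."""
    return sum(xs) / len(xs)

def poisson_log_likelihood(lam, xs):
    """ln L(lambda) = -n*lambda + sum(x_i) ln(lambda) - sum ln(x_i!)."""
    return (-len(xs) * lam + sum(xs) * math.log(lam)
            - sum(math.lgamma(x + 1) for x in xs))   # lgamma(x + 1) = ln(x!)

xs = [2, 3, 1, 4, 0, 2]                    # illustrative counts
lam_hat = poisson_mle(xs)
grid = [i / 100 for i in range(1, 1001)]   # candidate lambda values 0.01 .. 10.00
lam_grid = max(grid, key=lambda lam: poisson_log_likelihood(lam, xs))
print(lam_hat, lam_grid)                   # 2.0 2.0
```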

5. Uniform Distribution

For random variables X₁, X₂, ..., X_n uniformly distributed on [0, θ], the density function is given by:

$f(x) = \frac{1}{\theta}$ if $0 \leq x \leq \theta$,

$f(x) = 0$ otherwise.

Arrange the observations in increasing order:

$0 \leq x_1 \leq x_2 \leq \dots \leq x_n$

Every observation must lie in [0, θ], so any admissible θ satisfies $\theta \geq x_n$. The likelihood function is therefore

$L(\theta) = \prod_{i=1}^{n} f(x_{i}) = \prod_{i=1}^{n} \frac{1}{\theta} = \theta^{-n}$ for $\theta \geq x_n$, and $L(\theta) = 0$ for $\theta < x_n$.

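For $\theta \geq x_n$, the log-likelihood is:

$\ln L(\theta) = -n \ln\theta$

Its derivative is

$\frac{d}{d\theta} \ln L(\theta) = -\frac{n}{\theta}$, which is negative for all $\theta > 0$,

so it is never zero and the maximum cannot be found by setting the derivative to zero. Instead, note that L(θ) is strictly decreasing for $\theta \geq x_n$ and is zero for $\theta < x_n$, so it is maximized at the smallest admissible value, $\theta = x_n$. The maximum likelihood estimate is thus,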

$\hat{\theta} = x_n = \max_i x_i$
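A minimal sketch of the argument above, with a made-up sample: the likelihood is zero for any θ below the largest observation and decreases for θ above it, so the maximizer sits exactly at max(x_i).

```python
def uniform_likelihood(theta, xs):
    """L(theta) = theta**(-n) if every x_i lies in [0, theta], else 0."""
    if theta <= 0 or any(x < 0 or x > theta for x in xs):
        return 0.0
    return theta ** (-len(xs))

def uniform_mle(xs):
    """theta_hat = largest observation."""
    return max(xs)

xs = [0.4, 2.5, 1.7, 0.9]                  # illustrative sample
theta_hat = uniform_mle(xs)
print(theta_hat)                           # 2.5
print(uniform_likelihood(2.4, xs))         # 0.0: theta below max(x_i) is impossible
print(uniform_likelihood(2.5, xs) > uniform_likelihood(3.0, xs))  # True: L decreases past 2.5
```

Since max(x_i) can never exceed θ, this estimate tends to fall slightly below the true θ, especially for small samples.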

Summary

Using the usual notations and symbols,

1) Normal Distribution:

$f(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right)$

$\widehat{\mu} = \frac{\sum_{i=1}^{n}x_{i}}{n}$

$\widehat{\sigma^{2}} = \frac{1}{n}\sum_{i=1}^{n}(x_{i}-\widehat{\mu})^{2}$

2) Exponential Distribution:

$f(x;\theta) = \frac{1}{\theta} e^{-x/\theta}$, $x \geq 0$

$\hat{\theta} = \frac{\sum_{i=1}^{n}x_{i}}{n} = \overline{X}$

3) Geometric Distribution:

$f(x;p) = (1-p)^{x-1}p$, $x = 1, 2, 3, \dots$

$\hat{p} = \frac{n}{\sum_{i=1}^{n}x_{i}} = \frac{1}{\overline{X}}$

4) Binomial Distribution:

$f(x) = \frac{n!}{x!(n-x)!} p^{x} (1-p)^{n-x}$, for samples $X_1, X_2, \dots, X_N$

$\hat{p} = \frac{1}{N}\left(\frac{\sum_{i=1}^{N}x_{i}}{n}\right) = \frac{1}{N}\left(\frac{x_1}{n} + \frac{x_2}{n} + \dots + \frac{x_N}{n}\right)$

5) Poisson Distribution:

$f(x) = \frac{\lambda^{x}e^{-\lambda}}{x!}$, $x = 0, 1, 2, \dots$

$\widehat{\lambda} = \frac{\sum_{i=1}^{n}x_{i}}{n} = \overline{X}$

6) Uniform Distribution:

$f(x) = \frac{1}{\theta}$ if $0 \leq x \leq \theta$ and $f(x) = 0$ otherwise; with the observations sorted so that $0 \leq x_1 \leq x_2 \leq \dots \leq x_n \leq \theta$,

$\hat{\theta} = x_n = \max_i x_i$

References

1) Ewa Paszek, module "Maximum Likelihood Estimation - Examples"

2) Dr. David Levin, Assistant Professor, University of Utah, lecture on Maximum Likelihood Estimation

3) Dr. Mireille Boutin, lecture notes for Purdue ECE 662 - Pattern Recognition and Decision Making Processes (this slecture is partially based on them)
