Maximum Likelihood Estimation Old Kiwi - Rhea

Advantages of MLE

Complement to Lecture 7: Maximum Likelihood Estimation and Bayesian Parameter Estimation, ECE662, Spring 2010, Prof. Boutin

MLE

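MLE has good convergence properties as the number of training samples increases: under mild regularity conditions, the estimate converges to the true parameter value.
MLE is often simpler than alternative methods of parameter estimation, such as Bayesian parameter estimation.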
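The convergence claim can be illustrated for the Gaussian mean. The sketch below draws n samples from an assumed N(2, 1) population with a fixed seed. For each n it reports how far the sample mean (the MLE derived in Example 1) lands from the true mean. The population parameters, the seed and the helper names are all illustrative assumptions. The run shows one outcome and does not prove anything.

```python
import random

def sample_mean(xs):
    """MLE of a Gaussian mean: the arithmetic average of the samples."""
    return sum(xs) / len(xs)

def mean_error(n, mu=2.0, sigma=1.0, seed=0):
    """Absolute error of the MLE mean from n draws of N(mu, sigma^2).

    mu, sigma and seed are assumed values chosen for illustration only.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    return abs(sample_mean(xs) - mu)

# The error typically shrinks roughly like sigma / sqrt(n).
for n in (10, 100, 10_000, 100_000):
    print(n, mean_error(n))
```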

Parameter Estimation by MLE

Example 1: The Gaussian Case: Unknown $\mu$

Suppose the samples are drawn from a multivariate normal population with mean $\mu$ and covariance matrix $\Sigma$. In this example only the mean is unknown; $\Sigma$ is known. Let $x_k$ be a sample point. Its log-density is

$\ln p(x_k|\mu) = -\frac{1}{2} \ln\left[(2\pi)^d|\Sigma|\right] - \frac{1}{2} (x_k - \mu)^t \Sigma^{-1} (x_k - \mu)$

$\nabla_{\mu} \ln p(x_k|\mu) = \Sigma^{-1}(x_k-\mu)$
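The gradient formula can be checked numerically. The pure-Python sketch below fixes d = 2 so that the inverse and determinant of $\Sigma$ have closed forms. It assumes an arbitrary symmetric positive-definite $\Sigma$ and arbitrary values for $x_k$ and $\mu$, and all helper names are hypothetical. The check compares $\Sigma^{-1}(x_k-\mu)$ with central finite differences of the log-density above. The formula relies on $\Sigma$ being symmetric, which any covariance matrix is.

```python
import math

def det2(S):
    """Determinant of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = S
    return a * d - b * c

def inv2(S):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = S
    det = det2(S)
    return [[d / det, -b / det], [-c / det, a / det]]

def log_density(x, mu, S):
    """ln p(x | mu) for a 2-D Gaussian with known covariance S."""
    P = inv2(S)
    r = [x[0] - mu[0], x[1] - mu[1]]
    quad = sum(r[i] * P[i][j] * r[j] for i in range(2) for j in range(2))
    return -0.5 * math.log((2 * math.pi) ** 2 * det2(S)) - 0.5 * quad

def grad_mu(x, mu, S):
    """Analytic gradient with respect to mu: Sigma^{-1} (x - mu)."""
    P = inv2(S)
    r = [x[0] - mu[0], x[1] - mu[1]]
    return [P[0][0] * r[0] + P[0][1] * r[1],
            P[1][0] * r[0] + P[1][1] * r[1]]

def numeric_grad_mu(x, mu, S, h=1e-6):
    """Central finite differences of log_density with respect to mu."""
    g = []
    for i in range(2):
        up = list(mu)
        up[i] += h
        down = list(mu)
        down[i] -= h
        g.append((log_density(x, up, S) - log_density(x, down, S)) / (2 * h))
    return g

# Assumed example values (Sigma must be symmetric positive definite).
S = [[2.0, 0.5], [0.5, 1.0]]
x, mu = [1.0, -0.5], [0.3, 0.2]
print(grad_mu(x, mu, S), numeric_grad_mu(x, mu, S))
```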

The log-likelihood of $n$ independent samples is $\sum_{k=1}^n \ln p(x_k|\mu)$. Setting its gradient to zero, we get

$\sum_{k=1}^n \Sigma^{-1} (x_k-\hat{\mu}) = 0$

Multiplying by $\Sigma$ and rearranging, we obtain

$\hat{\mu} = \frac{1}{n} \sum_{k=1}^n x_k$

Thus the MLE of the unknown population mean is the arithmetic average of the training samples, called *the sample mean*.
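The estimator can be sketched in a few lines of Python. The sketch assumes made-up 2-D samples and an assumed symmetric matrix `P` standing in for $\Sigma^{-1}$, and the helper names are hypothetical. It computes $\hat{\mu}$ componentwise and confirms that the summed stationarity condition $\sum_k \Sigma^{-1}(x_k-\hat{\mu})$ vanishes there.

```python
def sample_mean_vec(samples):
    """Componentwise average of d-dimensional samples: the MLE mu-hat."""
    n, d = len(samples), len(samples[0])
    return [sum(x[j] for x in samples) / n for j in range(d)]

def score_sum(samples, mu, P):
    """Sum over k of P (x_k - mu), with P standing in for Sigma^{-1}."""
    d = len(mu)
    total = [0.0] * d
    for x in samples:
        r = [x[j] - mu[j] for j in range(d)]
        for i in range(d):
            total[i] += sum(P[i][j] * r[j] for j in range(d))
    return total

samples = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0], [0.0, 2.0]]  # made-up data
P = [[0.8, -0.2], [-0.2, 0.6]]  # assumed symmetric positive-definite Sigma^{-1}
mu_hat = sample_mean_vec(samples)
print(mu_hat)  # [1.5, 2.0]
print(score_sum(samples, mu_hat, P))  # zero up to rounding
```

Because $\Sigma^{-1}$ is invertible, the choice of `P` does not move the solution. This is why multiplying the condition by $\Sigma$ removes it entirely.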

Example 2: The Gaussian Case: Unknown $\mu$ and $\Sigma$

In this example both the mean $\mu$ and the covariance matrix $\Sigma$ are unknown. These unknown parameters constitute the components of the parameter vector $\theta$. Consider the univariate case, with $\theta_1 = \mu$ and $\theta_2 = \sigma^2$.

$\ln p(x_k|\theta) = -\frac{1}{2} \ln (2\pi\theta_2) - \frac{1}{2\theta_2}(x_k - \theta_1)^2$

Taking the gradient with respect to $\theta$ gives

$\nabla_{\theta} \ln p(x_k|\theta) = \begin{bmatrix} \frac{1}{\theta_2}(x_k - \theta_1) \\ -\frac{1}{2\theta_2} + \frac{(x_k-\theta_1)^2}{2\theta_2^2} \end{bmatrix}.$
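The two gradient components can be checked against central finite differences. In this minimal sketch, `t1` and `t2` stand for $\theta_1$ and $\theta_2$. The evaluation point (x = 1.7, $\theta_1$ = 0.5, $\theta_2$ = 2.0) and the function names are arbitrary assumptions.

```python
import math

def log_p(x, t1, t2):
    """ln p(x | theta) for N(theta1, theta2), where theta2 is the variance."""
    return -0.5 * math.log(2 * math.pi * t2) - (x - t1) ** 2 / (2 * t2)

def grad_log_p(x, t1, t2):
    """Analytic gradient [d/dtheta1, d/dtheta2] of log_p."""
    return [(x - t1) / t2,
            -1 / (2 * t2) + (x - t1) ** 2 / (2 * t2 ** 2)]

def numeric_grad(x, t1, t2, h=1e-6):
    """Central finite differences of log_p in theta1 and theta2."""
    d1 = (log_p(x, t1 + h, t2) - log_p(x, t1 - h, t2)) / (2 * h)
    d2 = (log_p(x, t1, t2 + h) - log_p(x, t1, t2 - h)) / (2 * h)
    return [d1, d2]

print(grad_log_p(1.7, 0.5, 2.0), numeric_grad(1.7, 0.5, 2.0))
```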

Summing over the $n$ samples and setting the gradient of the log-likelihood to zero, we get

$\sum_{k=1}^n \frac{1}{\hat{\theta_2}}(x_k-\hat{\theta_1}) = 0$

and, after multiplying the second component by 2,

$-\sum_{k=1}^{n} \frac{1}{\hat{\theta_2}} + \sum_{k=1}^n \frac{(x_k-\hat{\theta_1})^2}{\hat{\theta_2}^2} = 0$

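where $\hat{\theta_1}$ and $\hat{\theta_2}$ are the maximum likelihood estimates of $\theta_1$ and $\theta_2$, respectively. Substituting $\hat{\mu} = \hat{\theta_1}$ and $\hat{\sigma}^2 = \hat{\theta_2}$ and solving (the first equation gives $\hat{\mu}$ directly; substituting it into the second gives $\hat{\sigma}^2$), we obtain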

$\hat{\mu} = \frac{1}{n} \sum_{k=1}^n x_k$

and

$\hat{\sigma}^2 = \frac{1}{n} \sum_{k=1}^n(x_k - \hat{\mu})^2.$
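The closed-form estimates can be computed directly. The sketch below uses a made-up data set and hypothetical helper names. It confirms that both summed gradient components from the equations above vanish at $(\hat{\mu}, \hat{\sigma}^2)$. It also compares the result with Python's `statistics` module: the MLE divides by $n$, which matches `statistics.pvariance`, whereas `statistics.variance` divides by $n-1$.

```python
import statistics

def gaussian_mle(xs):
    """Closed-form MLEs (mu_hat, sigma2_hat) for univariate Gaussian samples."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n  # divides by n, not n - 1
    return mu_hat, sigma2_hat

def summed_gradient(xs, t1, t2):
    """Both components of the summed log-likelihood gradient at (t1, t2)."""
    g1 = sum((x - t1) / t2 for x in xs)
    g2 = sum(-1 / (2 * t2) + (x - t1) ** 2 / (2 * t2 ** 2) for x in xs)
    return g1, g2

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up samples
mu_hat, sigma2_hat = gaussian_mle(xs)
print(mu_hat, sigma2_hat)  # 5.0 4.0
print(summed_gradient(xs, mu_hat, sigma2_hat))  # both components vanish
print(statistics.pvariance(xs), statistics.variance(xs))
```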
