Latest revision as of 10:37, 20 May 2013

Advantages of MLE

for ECE662: Decision Theory

Complement to Lecture 7: Maximum Likelihood Estimation and Bayesian Parameter Estimation, ECE662, Spring 2010, Prof. Boutin


MLE

  1. It has good convergence properties as the number of training samples increases: under mild regularity conditions, the estimate converges to the true parameter value.
  2. It is often simpler than other methods of parameter estimation.
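
The convergence property can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not part of the lecture: it draws samples from a univariate Gaussian with a hypothetical true mean of 3.0 and standard deviation of 2.0, and reports how far the ML estimate of the mean lands from the true value for several sample sizes. The function names, sample sizes, and seed are all made up for the demonstration.

```python
import random

def mle_mean(samples):
    """ML estimate of a Gaussian mean: the arithmetic average of the samples."""
    return sum(samples) / len(samples)

def mean_estimation_errors(true_mu=3.0, true_sigma=2.0,
                           sizes=(10, 1000, 100000), seed=0):
    """Return {n: |mu_hat - true_mu|} for each sample size n.

    true_mu, true_sigma, sizes and seed are illustrative assumptions,
    not values taken from the lecture.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    errors = {}
    for n in sizes:
        samples = [rng.gauss(true_mu, true_sigma) for _ in range(n)]
        errors[n] = abs(mle_mean(samples) - true_mu)
    return errors

errors = mean_estimation_errors()
for n, err in errors.items():
    print(f"n = {n:>6}: |mu_hat - mu| = {err:.4f}")
```

In a single run the error need not shrink monotonically, but for large n it is typically on the order of sigma / sqrt(n).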

Parameter Estimation by MLE

Example 1: The Gaussian Case: Unknown $ \mu $

Suppose the samples are drawn from a multivariate normal population with mean $ \mu $ and covariance matrix $ \Sigma $. In this example only the mean is unknown. Let $ x_k $ be a sample point of dimension $ d $.

$ \ln p(x_k|\mu) = -\frac{1}{2} \ln \left[ (2\pi)^d|\Sigma| \right] - \frac{1}{2} (x_k - \mu)^t \Sigma^{-1} (x_k - \mu) $

$ \nabla_{\mu} \ln p(x_k|\mu) = \Sigma^{-1}(x_k-\mu) $

Summing this gradient over the $ n $ training samples and setting the result to zero, we find that the maximum likelihood estimate $ \hat{\mu} $ must satisfy

$ \sum_{k=1}^n \Sigma^{-1} (x_k-\hat{\mu}) = 0 $

Multiplying by $ \Sigma $ and rearranging, we obtain

$ \hat{\mu} = \frac{1}{n} \sum_{k=1}^n x_k $

Thus the MLE for the unknown population mean is the arithmetic average of the training samples, called *the sample mean*.
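
As a quick numerical check of this result, the sketch below computes the sample mean of a few hypothetical 2-D points (made up for illustration) and confirms that the residuals $ x_k - \hat{\mu} $ sum to zero. Because $ \Sigma^{-1} $ is invertible, this is equivalent to the summed gradient being zero, so the covariance matrix itself is not needed for the check. The helper names are assumptions, not part of the lecture.

```python
def sample_mean(points):
    """ML estimate of the mean vector: component-wise average of the samples."""
    n = len(points)
    d = len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]

def residual_sum(points, mu):
    """Sum over k of (x_k - mu), i.e. Sigma times the summed gradient."""
    d = len(mu)
    return [sum(p[j] - mu[j] for p in points) for j in range(d)]

# Hypothetical 2-D training samples, chosen only for illustration.
points = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0], [6.0, 2.0]]
mu_hat = sample_mean(points)

print(mu_hat)                        # [3.0, 2.0]
print(residual_sum(points, mu_hat))  # [0.0, 0.0]
```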

Example 2: The Gaussian Case: Unknown $ \mu $ and $ \Sigma $

In this example both the mean $ \mu $ and the covariance matrix $ \Sigma $ are unknown. These unknown parameters constitute the components of the parameter vector $ \theta $. Consider the univariate case, with $ \theta_1 = \mu $ and $ \theta_2 = \sigma^2 $.

$ \ln p(x_k|\theta) = -\frac{1}{2} \ln (2\pi\theta_2) - \frac{1}{2\theta_2}(x_k - \theta_1)^2 $

Taking the gradient of this expression with respect to $ \theta $ gives

$ \nabla_{\theta}l = \nabla_{\theta} \ln p(x_k|\theta) = [ \frac{1}{\theta_2}(x_k - \theta_1) ; -\frac{1}{2\theta_2} +\frac{(x_k-\theta_1)^2}{2\theta_2^2}]. $

Summing each component of the gradient over the $ n $ samples and setting it to zero, we get

$ \sum_{k=1}^n \frac{1}{\hat{\theta_2}}(x_k-\hat{\theta_1}) = 0 $

and

$ -\sum_{k=1}^{n} \frac{1}{\hat{\theta_2}} + \sum_{k=1}^n \frac{(x_k-\hat{\theta_1})^2}{\hat{\theta_2}^2} = 0 $

where $ \hat{\theta_1} $ and $ \hat{\theta_2} $ are the maximum likelihood estimates for $ \theta_1 $ and $ \theta_2 $ respectively. Substituting $ \hat{\mu} = \hat{\theta_1} $ and $ \hat{\sigma}^2 = \hat{\theta_2} $ and solving, we obtain

$ \hat{\mu} = \frac{1}{n} \sum_{k=1}^n x_k $

and

$ \hat{\sigma}^2 = \frac{1}{n} \sum_{k=1}^n(x_k - \hat{\mu})^2. $
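
The two estimates can be checked numerically. The following is a minimal sketch, assuming a small set of hypothetical samples made up for illustration: it computes $ \hat{\mu} $ and $ \hat{\sigma}^2 $ from the closed-form expressions, evaluates the summed gradient at those values to confirm that both components vanish, and compares the result with the two variance functions in Python's standard `statistics` module. The function names `gaussian_mle` and `summed_gradient` are assumptions, not part of the lecture.

```python
import statistics

def gaussian_mle(samples):
    """Return (mu_hat, sigma2_hat), the ML estimates for a univariate Gaussian."""
    n = len(samples)
    mu_hat = sum(samples) / n
    # The ML variance estimate divides by n, not n - 1.
    sigma2_hat = sum((x - mu_hat) ** 2 for x in samples) / n
    return mu_hat, sigma2_hat

def summed_gradient(samples, theta1, theta2):
    """Sum over the samples of the gradient of ln p(x_k | theta)."""
    g1 = sum((x - theta1) / theta2 for x in samples)
    g2 = sum(-1 / (2 * theta2) + (x - theta1) ** 2 / (2 * theta2 ** 2)
             for x in samples)
    return g1, g2

# Hypothetical training samples, chosen only for illustration.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_hat, sigma2_hat = gaussian_mle(samples)

print(mu_hat, sigma2_hat)                             # 5.0 4.0
print(summed_gradient(samples, mu_hat, sigma2_hat))   # both components are zero
print(statistics.pvariance(samples))                  # population variance (divides by n)
print(statistics.variance(samples))                   # sample variance (divides by n - 1)
```

The ML estimate agrees with the population variance rather than the $ n - 1 $ sample variance, which means it is a biased estimate of $ \sigma^2 $; the bias shrinks as $ n $ grows.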


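See Also

  * MLE Examples: Exponential and Geometric Distributions
  * MLE Examples: Binomial and Poisson Distributions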


Back to Lecture 7: Maximum Likelihood Estimation and Bayesian Parameter Estimation, ECE662, Spring 2010, Prof. Boutin
