## Revision as of 12:53, 29 April 2014

**Maximum Likelihood Estimators and Examples**

A slecture by Lu Zhang

Partially based on the ECE662 Spring 2014 lecture material of Prof. Mireille Boutin.

# Outline of the slecture

- Introduction
- Derivation for Maximum Likelihood Estimates (MLE)
- Examples
- Summary
- References

## Introduction

Once we have decided on a model (a probability distribution), our next step is often to estimate some information from the observed data. There are generally two parametric frameworks for estimating unknown information from data. We will refer to these two general frameworks as the Frequentist and Bayesian approaches. One very widely used Frequentist estimator is known as the maximum likelihood estimator.

In the frequentist approach, one treats the unknown quantity as a deterministic but unknown parameter vector, $ \theta \in \Omega $. So, for example, after we observe the random vector $ Y \in \mathbb{R}^{n} $, our objective is to use $ Y $ to estimate the unknown scalar or vector $ \theta $. In order to formulate this problem, we will assume that the vector $ Y $ has a probability density function given by $ p_{\theta}(y) $, where $ \theta $ parameterizes a family of density functions for $ Y $. We may then use this family of distributions to determine a function, $ T : \mathbb{R}^{n} \rightarrow \Omega $, that can be used to compute an estimate of the unknown parameter as

$$ \hat{\theta} = T(Y). $$

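Notice that since $ T(Y) $ is a function of the random vector $ Y $, the estimate, $ \hat{\theta} $, is a random vector. The mean of the estimator, $ \bar{\theta} $, can be computed as

$$ \bar{\theta} = E_{\theta}[\hat{\theta}] = \int_{\mathbb{R}^{n}} T(y) \, p_{\theta}(y) \, dy. $$

The difference between the mean of the estimator and the value of the parameter is known as the bias and is given by

$$ \text{bias}_{\theta} = \bar{\theta} - \theta. $$

Similarly, the variance of the estimator is given by

$$ \text{var}_{\theta} = E_{\theta}\left[ \| \hat{\theta} - \bar{\theta} \|^{2} \right], $$

and it is easily shown that the mean squared error (MSE) of the estimate is then given by

$$ \text{MSE}_{\theta} = E_{\theta}\left[ \| \hat{\theta} - \theta \|^{2} \right] = \text{var}_{\theta} + \| \text{bias}_{\theta} \|^{2}. $$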

Since the bias, variance, and the MSE of the estimator will depend on the specific value of $ \theta $, it is often unclear precisely how to compare the accuracy of different estimators. Even estimators that seem quite poor may produce small or zero error for certain values of $ \theta $. For example, consider the estimator which is fixed to the value $ \hat{\theta}=1 $, independent of the data. This would seem to be a very poor estimator, but it has an MSE of 0 when $ \theta=1 $.
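The dependence of bias, variance, and MSE on $ \theta $ can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original lecture material. It assumes i.i.d. Gaussian samples with known standard deviation, and the names `simulate_estimator`, `sample_mean`, and `constant_one` are hypothetical helpers introduced only for this illustration.

```python
import random

def simulate_estimator(estimator, theta, n=20, sigma=1.0, trials=5000, seed=0):
    """Monte Carlo estimates of the bias, variance, and MSE of `estimator` at `theta`."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # Assumed model: n i.i.d. samples Y_k ~ N(theta, sigma^2).
        y = [rng.gauss(theta, sigma) for _ in range(n)]
        estimates.append(estimator(y))
    mean_est = sum(estimates) / trials                             # empirical mean of the estimator
    bias = mean_est - theta                                        # empirical bias
    var = sum((t - mean_est) ** 2 for t in estimates) / trials     # empirical variance
    mse = sum((t - theta) ** 2 for t in estimates) / trials        # empirical MSE
    return bias, var, mse

def sample_mean(y):
    return sum(y) / len(y)

def constant_one(y):
    # The fixed estimator from the text: it ignores the data entirely.
    return 1.0

for theta in (1.0, 3.0):
    for name, est in (("sample mean", sample_mean), ("constant 1", constant_one)):
        bias, var, mse = simulate_estimator(est, theta)
        print(f"theta={theta} {name}: bias={bias:.3f} var={var:.3f} mse={mse:.3f}")
```

Under these assumptions the constant estimator reports zero MSE at $ \theta = 1 $ and a large MSE at $ \theta = 3 $, while the empirical MSE of either estimator equals its empirical variance plus its squared empirical bias up to floating-point rounding.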

An estimator is said to be consistent if for all $ \theta \in \Omega $, the MSE of the estimator goes to zero as the number of independent data samples, $ n $, goes to infinity. If an estimator is not consistent, this means that even with arbitrarily large quantities of data, the estimate will not approach the true value of the parameter. Consistency would seem to be the least we would expect of an estimator, but we will later see that even some very intuitive estimators are not always consistent.
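As a rough illustration of consistency, the sketch below estimates the MSE of the sample mean for increasing $ n $. The Gaussian model, the true value $ \theta = 2 $, and the helper name `mse_of_sample_mean` are assumptions made for this example only.

```python
import random

def mse_of_sample_mean(n, theta=2.0, sigma=1.0, trials=1000, seed=0):
    """Monte Carlo estimate of the MSE of the sample mean from n i.i.d. N(theta, sigma^2) samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        y = [rng.gauss(theta, sigma) for _ in range(n)]
        total += (sum(y) / n - theta) ** 2   # squared error of this trial's estimate
    return total / trials

for n in (10, 100, 1000):
    print(n, mse_of_sample_mean(n))
```

For this model the exact MSE of the sample mean is $ \sigma^{2}/n $, so the printed values should shrink roughly in proportion to $ 1/n $.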

Ideally, it would be best if one could select an estimator which has uniformly low bias and variance for all values of $ \theta $. This is not always possible, but when it is, we have names for such estimators. For example, $ \hat{\theta} $ is said to be an unbiased estimator if for all values of $ \theta $ the bias is zero, i.e., $ \theta = \bar{\theta} $. If, in addition, for all values of $ \theta $ the variance of an estimator is less than or equal to that of all other unbiased estimators, then we say that the estimator is a uniformly minimum variance unbiased (UMVU) estimator.

There are many excellent estimators that have been proposed through the years for many different types of problems. However, one very widely used Frequentist estimator is known as the maximum likelihood (ML) estimator, given by

$$ \hat{\theta} = \arg\max_{\theta \in \Omega} p_{\theta}(Y) $$

$$ = \arg\max_{\theta \in \Omega} \log p_{\theta}(Y) $$

where the notation "argmax" denotes the value of the argument that achieves the global maximum of the function. Notice that these formulas for the ML estimate actually use the random variable $ Y $ as an argument to the density function $ p_{\theta}(y) $. This implies that $ \hat{\theta} $ is a function of $ Y $, which in turn means that $ \hat{\theta} $ is a random variable.
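To make the argmax concrete, here is a small sketch that approximates the ML estimate by evaluating the log-likelihood on a grid of candidate values. It assumes i.i.d. Gaussian data with known variance and a bounded parameter set $ \Omega = [-5, 5] $. The function names and the grid resolution are illustrative choices, not taken from the lecture.

```python
import math
import random

def log_likelihood(theta, y, sigma=1.0):
    """log p_theta(y) for i.i.d. samples y_k ~ N(theta, sigma^2) (assumed model)."""
    n = len(y)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sum((yk - theta) ** 2 for yk in y) / (2 * sigma ** 2))

def ml_estimate_grid(y, lo=-5.0, hi=5.0, steps=2001):
    """Approximate the argmax of the log-likelihood over a uniform grid on [lo, hi]."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda theta: log_likelihood(theta, y))

rng = random.Random(0)
y = [rng.gauss(1.5, 1.0) for _ in range(50)]   # synthetic data, true theta = 1.5
theta_hat = ml_estimate_grid(y)
print("grid ML estimate:", theta_hat)
print("sample mean:     ", sum(y) / len(y))
```

For this model the exact ML estimate is the sample mean, so the grid result should agree with it to within the grid spacing. A grid search is only practical for low-dimensional $ \theta $. It is used here because it mirrors the definition directly.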

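When the density function, $ p_{\theta}(y) $, is a continuously differentiable function of $ \theta $, a necessary condition for the ML estimate, provided it lies in the interior of $ \Omega $, is that the gradient of the likelihood is zero:

$$ \nabla_{\theta} \, p_{\theta}(Y) \big|_{\theta=\hat{\theta}} = 0. $$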
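As a worked illustration of this condition (the Gaussian model is an assumption chosen for simplicity), suppose $ Y_1, \ldots, Y_n $ are i.i.d. $ N(\theta, \sigma^{2}) $ with known $ \sigma^{2} $. Because the logarithm is strictly increasing, the likelihood and the log-likelihood have the same stationary points wherever the density is positive, so it is convenient to differentiate the log-likelihood:

$$ \log p_{\theta}(Y) = -\frac{n}{2} \log(2\pi\sigma^{2}) - \frac{1}{2\sigma^{2}} \sum_{k=1}^{n} (Y_k - \theta)^{2} $$

$$ \frac{d}{d\theta} \log p_{\theta}(Y) \Big|_{\theta=\hat{\theta}} = \frac{1}{\sigma^{2}} \sum_{k=1}^{n} (Y_k - \hat{\theta}) = 0 \quad \Longrightarrow \quad \hat{\theta} = \frac{1}{n} \sum_{k=1}^{n} Y_k $$

The second derivative is $ -n/\sigma^{2} < 0 $ for every $ \theta $, so in this case the stationary point is the global maximum and the ML estimate is the sample mean.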

While the ML estimate is generally not unbiased, it does have a number of desirable properties.