Maximum Likelihood Estimators and Examples
A slecture by Lu Zhang

Partially based on the ECE662 Spring 2014 lecture material of Prof. Mireille Boutin.

# Outline of the slecture

• Introduction
• Derivation of Maximum Likelihood Estimates (MLE)
• Examples
• Summary
• References

## Introduction

Once we have decided on a model (probability distribution), our next step is often to estimate some information from the observed data. There are generally two parametric frameworks for estimating unknown information from data. We will refer to these two general frameworks as the Frequentist and Bayesian approaches. One very widely used Frequentist estimator is known as the Maximum Likelihood estimator.

In the frequentist approach, one treats the unknown quantity as a deterministic, but unknown parameter vector, $\theta \in \Omega$. So for example, after we observe the random vector $Y \in \mathbb{R}^{n}$, then our objective is to use $Y$ to estimate the unknown scalar or vector $\theta$. In order to formulate this problem, we will assume that the vector $Y$ has a probability density function given by $p_{\theta}(y)$ where $\theta$ parameterizes a family of density functions for $Y$. We may then use this family of distributions to determine a function, $T : \mathbb{R}^{n} \rightarrow \Omega$, that can be used to compute an estimate of the unknown parameter as

$\hat{\theta} = T(Y)$

Notice, that since $T(Y)$ is a function of random vector $Y$, the estimate, $\hat{\theta}$, is a random vector. The mean of the estimator, $\bar{\theta}$, can be computed as

$\bar{\theta} = E_{\theta}[\hat{\theta}] = \int_{\mathbb{R}^{n}} T(y)p_{\theta}(y)dy$
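Because $\hat{\theta} = T(Y)$ is itself random, its mean can be approximated by repeating the experiment many times. As a minimal sketch (the Gaussian model, the true value $\theta = 2$, and the sample-mean estimator are all illustrative choices, not part of the text above), a Monte Carlo approximation of $\bar{\theta}$ might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0      # true but "unknown" parameter (assumed for the demo)
n = 50           # observations per experiment
trials = 10000   # Monte Carlo repetitions

# Estimator T(Y): the sample mean of n i.i.d. N(theta, 1) observations.
Y = rng.normal(loc=theta, scale=1.0, size=(trials, n))
theta_hat = Y.mean(axis=1)   # one realization of the estimator per experiment

# Monte Carlo approximation of the estimator mean E_theta[theta_hat]
theta_bar = theta_hat.mean()
print(theta_bar)
```

Here the printed value lands very close to the true $\theta = 2$, reflecting the fact that the sample mean is unbiased for this model.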

The difference between the mean of the estimator and the value of the parameter is known as the bias and is given by

$bias_{\theta} = \bar{\theta} -\theta$

Similarly, the variance of the estimator is given by

$var_{\theta} = E_{\theta}[(\hat{\theta} -\bar{\theta})^2]$

and it is easily shown that the mean squared error (MSE) of the estimate is then given by

$MSE_{\theta} = E_{\theta}[(\hat{\theta}-\theta)^2] = var_{\theta} + (bias_{\theta})^2$
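The "easily shown" step is a one-line computation: add and subtract the estimator mean $\bar{\theta}$ inside the square and note that the cross term vanishes.

```latex
\begin{aligned}
MSE_{\theta}
&= E_{\theta}\!\left[(\hat{\theta}-\bar{\theta}+\bar{\theta}-\theta)^2\right] \\
&= E_{\theta}\!\left[(\hat{\theta}-\bar{\theta})^2\right]
   + 2(\bar{\theta}-\theta)\,E_{\theta}\!\left[\hat{\theta}-\bar{\theta}\right]
   + (\bar{\theta}-\theta)^2 \\
&= var_{\theta} + (bias_{\theta})^2 ,
\end{aligned}
```

where the middle term is zero because $E_{\theta}[\hat{\theta}] = \bar{\theta}$ by definition, and $\bar{\theta}-\theta$ is a deterministic quantity that factors out of the expectation.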

Since the bias, variance, and the MSE of the estimator will depend on the specific value of $\theta$, it is often unclear precisely how to compare the accuracy of different estimators. Even estimators that seem quite poor may produce small or zero error for certain values of $\theta$. For example, consider the estimator which is fixed to the value $\hat{\theta}=1$, independent of the data. This would seem to be a very poor estimator, but it has an MSE of 0 when $\theta=1$.

An estimator is said to be consistent if for all $\theta \in \Omega$, the MSE of the estimator goes to zero as the number of independent data samples, n, goes to infinity. If an estimator is not consistent, this means that even with arbitrarily large quantities of data, the estimate will not approach the true value of the parameter. Consistency would seem to be the least we would expect of an estimator, but we will later see that even some very intuitive estimators are not always consistent.
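Consistency can be illustrated numerically. In the sketch below (again using an assumed $N(\theta, 1)$ model with the sample-mean estimator, for which $MSE_{\theta} = 1/n$ exactly), the simulated MSE shrinks by roughly a factor of ten each time $n$ grows by a factor of ten:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 2.0      # true parameter (assumed for the demo)
trials = 20000   # Monte Carlo repetitions per sample size

# Estimate MSE of the sample mean for increasing sample sizes n.
mses = []
for n in (10, 100, 1000):
    Y = rng.normal(theta, 1.0, size=(trials, n))
    theta_hat = Y.mean(axis=1)
    mses.append(np.mean((theta_hat - theta) ** 2))

print(mses)  # decreases roughly as 1/n
```

An inconsistent estimator would instead show an MSE that plateaus (or stays bounded away from zero) no matter how large $n$ becomes.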

Ideally, it would be best if one could select an estimator which has uniformly low bias and variance for all values of $\theta$. This is not always possible, but when it is we have names for such estimators. For example, $\hat{\theta}$ is said to be an unbiased estimator if for all values of $\theta$ the bias is zero, i.e. $\theta = \bar{\theta}$. If, in addition, for all values of $\theta$ the variance of an estimator is less than or equal to that of all other unbiased estimators, then we say that the estimator is a uniformly minimum variance unbiased (UMVU) estimator.

There are many excellent estimators that have been proposed through the years for many different types of problems. However, one very widely used Frequentist estimator is known as the maximum likelihood (ML) estimator, given by

$\hat{\theta} = \arg\max_{\theta \in \Omega} p_{\theta}(Y)$

That is, the ML estimate is the parameter value that makes the observed data most probable under the assumed family of distributions, with $p_{\theta}(y)$ viewed as a function of $\theta$ (the likelihood) for the fixed observation $Y = y$.
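As a concrete check of the ML idea, the sketch below maximizes the log-likelihood of an assumed $N(\theta, 1)$ model over a grid of candidate $\theta$ values (the data, the grid, and the true value $\theta = 1.5$ are all illustrative assumptions). For this model the maximizer has a well-known closed form, the sample mean, so the grid search can be verified against it:

```python
import numpy as np

rng = np.random.default_rng(2)
theta_true = 1.5                         # assumed true parameter
Y = rng.normal(theta_true, 1.0, size=200)

# Log-likelihood of i.i.d. N(theta, 1) data as a function of theta
def log_likelihood(theta, y):
    return -0.5 * np.sum((y - theta) ** 2) - 0.5 * len(y) * np.log(2 * np.pi)

# ML estimate via a brute-force search over candidate theta values
grid = np.linspace(0.0, 3.0, 3001)
theta_ml = grid[np.argmax([log_likelihood(t, Y) for t in grid])]

# For this model the ML estimate equals the sample mean (up to grid spacing)
print(theta_ml, Y.mean())
```

In practice one would maximize analytically (by setting the derivative of the log-likelihood to zero) or with a numerical optimizer rather than a grid, but the grid search makes the $\arg\max$ definition explicit.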
