Parametric Estimators

This page and its subtopics discuss parametric estimators.

Lectures discussing Parametric Estimators: Lecture 7 and Lecture 8.

More on the MLE

Issues related to the properties and computational efficiency of the Maximum Likelihood Estimator

The maximum likelihood estimator (MLE) is probably the most important parameter estimator in classical statistics. The reason is that the MLE is asymptotically efficient: under standard regularity conditions, its variance approaches the Cramer-Rao bound as the sample size grows. Furthermore, if $\hat \theta$ is the MLE of the parameter $\theta$, then $\sqrt{n}({\hat \theta}-\theta)$ converges in distribution to $\mathcal{N}(0,v(\theta))$, where $v(\theta)$ is the Cramer-Rao bound for a single observation.

But what is an efficient estimator? An estimator ${\hat \theta}$ is efficient if:

$\hat \theta$ is an unbiased estimator.
$\hat \theta$ achieves the Cramer-Rao Lower Bound (CRLB).

The CRLB is a lower bound on the variance of any unbiased estimator of a parameter. An unbiased estimator that achieves the CRLB is referred to as the Minimum Variance Unbiased Estimator (MVUE).
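For reference, the bound can be written out explicitly. The following is a sketch for the scalar case, assuming $n$ independent, identically distributed observations with density $p(x|\theta)$ and the usual regularity conditions. The symbol $I(\theta)$ for the per-observation Fisher information is notation introduced here, not taken from the lectures:

$$I(\theta) = E\left[\left(\frac{\partial}{\partial \theta}\ln p(x|\theta)\right)^2\right], \qquad \mathrm{Var}(\hat \theta) \ge \frac{1}{n\,I(\theta)}$$

With this notation, $v(\theta) = 1/I(\theta)$ in the asymptotic distribution given earlier.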

So the MLE is an important estimator because:

If an MVUE exists, the MLE procedure will find it.
If an MVUE does not exist, the MLE is still asymptotically efficient: its variance approaches the CRLB as the sample size grows.
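The first claim can be checked numerically in a simple case. The sketch below assumes a Gaussian model with known standard deviation, for which the MLE of the mean is the sample mean and the CRLB is $\sigma^2/n$. The function name, sample sizes and seed are arbitrary choices made for illustration.

```python
import random
import statistics

def mle_gaussian_mean(samples):
    """MLE of the mean of a Gaussian with known variance: the sample mean."""
    return statistics.fmean(samples)

random.seed(0)                      # fixed seed so the run is repeatable
mu, sigma, n, trials = 1.0, 2.0, 25, 2000

# Repeat the experiment many times and collect the estimates.
estimates = [
    mle_gaussian_mean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(trials)
]

empirical_bias = statistics.fmean(estimates) - mu
empirical_var = statistics.variance(estimates)
crlb = sigma ** 2 / n               # Cramer-Rao bound for the mean: sigma^2 / n

print(f"bias     = {empirical_bias:+.4f}")
print(f"variance = {empirical_var:.4f} (CRLB = {crlb:.4f})")
```

Up to simulation noise, the empirical bias should be close to zero and the empirical variance close to the CRLB.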

Therefore, if the pdf of the model is known, the MLE is often a good candidate estimator, since it can be computed (although this might not be an easy task) and it is "optimal" for a large enough data set (although how large is large enough is not always easily answered). The MLE does have some disadvantages in practice:

It is not the best method for small data sets and can give highly erroneous results in some cases.
The computation can be extremely difficult and often has to be carried out numerically, with methods such as:
Brute Force Method (e.g. evaluate the likelihood on a very fine grid and pick the maximum). Although it can be done, this is computationally very inefficient.
Iterative Methods (e.g. Newton-Raphson, which does not guarantee convergence; a good initial guess is needed).
Scoring Method
Expectation-Maximization: Guarantees convergence to at least a local maximum. A good method for the complicated vector-parameter cases.
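The first two approaches in the list can be sketched on a toy problem. The example assumes i.i.d. exponential data with unknown rate, chosen only because the MLE has the closed form $n/\sum x_i$ to compare against. The function names, grid limits and starting values are hypothetical.

```python
import math
import random

def log_likelihood(lam, n, total):
    """Exponential log-likelihood: n*log(lam) - lam*sum(x)."""
    return n * math.log(lam) - lam * total

def grid_mle(xs, lo=0.01, hi=10.0, steps=10_000):
    """Brute force: evaluate the log-likelihood on a fine grid, keep the best point."""
    n, total = len(xs), sum(xs)
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return max(grid, key=lambda lam: log_likelihood(lam, n, total))

def newton_mle(xs, lam0, iters=50, tol=1e-12):
    """Newton-Raphson on the score function (derivative of the log-likelihood)."""
    n, total = len(xs), sum(xs)
    lam = lam0
    for _ in range(iters):
        score = n / lam - total      # first derivative
        hessian = -n / lam ** 2      # second derivative
        step = score / hessian
        lam -= step
        if lam <= 0:
            raise ValueError("iterate left the parameter space; "
                             "try a better initial guess")
        if abs(step) < tol:
            break
    return lam

random.seed(0)
data = [random.expovariate(2.0) for _ in range(500)]

closed_form = len(data) / sum(data)
lam_grid = grid_mle(data)
lam_newton = newton_mle(data, lam0=1.0)

print(closed_form, lam_grid, lam_newton)
```

For this model the Newton update sends the iterate to a non-positive rate whenever the starting value is at least twice the MLE (for example `lam0=10.0` here), which illustrates why a good initial guess matters. The grid search is limited by its spacing, while Newton-Raphson reaches the closed-form value in a handful of iterations.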

There are two methods for estimating parameters that are the center of a heated debate within the pattern recognition community. These methods are Maximum Likelihood Estimation (MLE) and Bayesian parameter estimation. Despite the difference in theory between these two methods, they are quite similar when they are applied in practice.

Comparison of MLE and Bayesian Parameter Estimation

Maximum Likelihood (ML) and Bayesian parameter estimation make very different assumptions. Here the assumptions are contrasted briefly:

MLE

Deterministic (single, non-random) estimate of parameters, $\hat \theta_{ML}$
Determining probability of a new point requires one calculation: $P(x|\theta)$
No "prior knowledge"
Estimates of variance and other parameters are often biased (e.g. the ML estimate of a Gaussian variance divides by $n$ rather than $n-1$)
Overfitting solved with regularization parameters

Bayes

Probabilistic (probability density) estimate of parameters, $p(\theta | \mathrm{Data})$
Determining probability of a new point requires integration over the parameter space
Prior knowledge is necessary
With certain specially-designed priors, leads naturally to unbiased estimate of variance
Overfitting is solved by the selection of the prior

In the end, both methods require a parameter to avoid overfitting. The parameter used in Maximum Likelihood Estimation is not as intellectually satisfying, because it does not arise as naturally from the derivation. However, even with Bayesian estimation it is difficult to justify a given prior. For example, what is a "typical" standard deviation for a Gaussian distribution?
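A coin-flip example makes the contrast concrete. The sketch below assumes a Bernoulli model and, for the Bayesian side, a Beta prior (an illustrative choice, with a = b = 1 giving a uniform prior). For this pair the integral over the parameter space has a closed form, so no numerical integration is needed. The function names are hypothetical.

```python
def ml_predictive(heads, flips):
    """Plug-in probability of heads using the ML estimate heads / flips."""
    return heads / flips

def bayes_predictive(heads, flips, a=1.0, b=1.0):
    """Probability of heads averaged over the Beta(a + heads, b + tails) posterior.

    For the Beta-Bernoulli pair the integral reduces to the posterior mean.
    """
    return (heads + a) / (flips + a + b)

# Three heads in three flips: a very small data set.
p_ml = ml_predictive(3, 3)
p_bayes = bayes_predictive(3, 3)

print(p_ml, p_bayes)
```

With three heads in three flips, the ML plug-in assigns probability 1 to heads and therefore probability 0 to tails, which is the overfitting mentioned above. The uniform-prior Bayesian prediction is 0.8, and a different choice of `a` and `b` would give a different answer, which is the difficulty of justifying a prior.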

Parameter Estimation

In the problem of parameter estimation, one is concerned with the task of finding the "best" guess of the underlying parameters that give rise to particular observations. The parameters to be estimated can be of any type and do not necessarily have to be defined in the same functional space as the observations. The only requirement is that there exists a well-defined mathematical relationship between the parameters and the observations. In the following, we discuss some general formulations for solving the estimation problem.

Bayesian Parameter Estimation (BPE)

See details in [Bayesian Parameter Estimation].

In the Bayesian formulation, the parameters to be estimated are treated as random variables. The Bayes estimate is the one that minimizes the Bayes risk by minimizing the posterior cost. Using a different cost function in the Bayesian estimation yields a different estimate. Two popular cost functions are considered below:

Minimum Mean-Square Error (MMSE) Estimation

In MMSE estimation, the goal is to compute the estimate with minimum mean-square error. The MMSE estimate is the conditional mean of the parameter given the observation. Wiener filtering can be implemented to generate the linear MMSE (LMMSE) estimate of the inverse solution in certain medical imaging applications.
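The conditional-mean property can be illustrated with a small example. The sketch below assumes a Gaussian prior on the parameter and Gaussian observation noise with known variance, a case where the posterior mean has a closed form. It then recomputes the same conditional mean by direct numerical integration over a grid. The function names, data values and grid limits are hypothetical.

```python
import math

def posterior_mean_closed_form(xs, mu0, var0, noise_var):
    """Posterior mean for a Gaussian prior N(mu0, var0) and Gaussian noise."""
    precision = 1.0 / var0 + len(xs) / noise_var
    return (mu0 / var0 + sum(xs) / noise_var) / precision

def posterior_mean_numeric(xs, mu0, var0, noise_var,
                           lo=-10.0, hi=10.0, steps=20_000):
    """Conditional mean E[theta | x] by a Riemann sum over a theta grid."""
    num = den = 0.0
    for i in range(steps + 1):
        theta = lo + (hi - lo) * i / steps
        # Unnormalized log posterior = log prior + log likelihood.
        log_post = -(theta - mu0) ** 2 / (2 * var0)
        log_post -= sum((x - theta) ** 2 for x in xs) / (2 * noise_var)
        weight = math.exp(log_post)
        num += theta * weight
        den += weight
    return num / den

xs = [1.2, 0.8, 1.5, 1.1]
mmse_exact = posterior_mean_closed_form(xs, mu0=0.0, var0=1.0, noise_var=0.5)
mmse_numeric = posterior_mean_numeric(xs, mu0=0.0, var0=1.0, noise_var=0.5)

print(mmse_exact, mmse_numeric)
```

The two values should agree to several decimal places. The estimate lies between the prior mean (0) and the sample mean (1.15), pulled toward the data as more observations arrive.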

Maximum A Posteriori Probability (MAP) Estimation: See details in [Maximum A Posteriori]

MAP estimators prove useful in various applications, for example in tomographic reconstruction for modalities such as CT, PET, or SPECT. The estimate is obtained by using a uniform cost function that is 0 if the error of the estimate is small (i.e. below some small value $\delta$) and 1 elsewhere; as $\delta$ shrinks, the resulting estimate is the maximizer of the posterior density. The MAP estimate depends critically on the a priori probability, and proves useful in situations where the observations alone are inadequate. The MAP equation is generally non-linear, and thus requires iterative algorithms to compute a numerical solution, as analytical expressions are not usually available.
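In a few conjugate cases the MAP estimate is available in closed form, which makes the role of the prior easy to see. The sketch below assumes Poisson-distributed counts with a Gamma prior (shape and rate parameters) on the unknown intensity. This is an illustrative toy model, not a tomographic reconstruction, and the function name and numbers are hypothetical.

```python
def poisson_estimates(counts, shape, rate):
    """Return (MLE, MAP, MMSE) estimates of a Poisson intensity.

    With a Gamma(shape, rate) prior the posterior is
    Gamma(shape + sum(counts), rate + n); the MAP estimate is its mode
    and the MMSE estimate is its mean. The mode formula assumes
    shape + sum(counts) >= 1.
    """
    n, total = len(counts), sum(counts)
    mle = total / n
    map_est = (shape + total - 1) / (rate + n)
    mmse = (shape + total) / (rate + n)
    return mle, map_est, mmse

counts = [3, 5, 4]
mle, map_est, mmse = poisson_estimates(counts, shape=2.0, rate=1.0)

print(mle, map_est, mmse)
```

With only three observations the prior visibly pulls the MAP and MMSE estimates away from the sample mean. As the number of counts grows, all three estimates move closer together.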

Maximum Likelihood Estimation (MLE)

See details in [Maximum Likelihood Estimation].

In many situations, we may have nonrandom but unknown parameters, or we may not have sufficient information to assign an a priori probability to the unknown parameter. In these cases, the maximum likelihood estimate (MLE) has been found useful. As with MAP solutions, the MLE solution is typically obtained by using iterative methods.
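For reference, the estimate can be written as follows. This is a sketch assuming $n$ independent observations $x_1,\dots,x_n$ with density $p(x|\theta)$; the notation is introduced here for illustration:

$$\hat \theta_{ML} = \arg\max_{\theta} \prod_{i=1}^{n} p(x_i|\theta) = \arg\max_{\theta} \sum_{i=1}^{n} \ln p(x_i|\theta)$$

When the log-likelihood is differentiable, an interior maximum satisfies the likelihood equation $\sum_{i=1}^{n} \frac{\partial}{\partial \theta} \ln p(x_i|\theta) = 0$, which is the equation that the iterative methods solve.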

Back to ECE662 Spring 2008