Maximum likelihood

Maximum likelihood estimation (MLE) is a popular statistical method used to make inferences about parameters of the underlying probability distribution of a given data set.

The method was pioneered by geneticist and statistician Sir Ronald A. Fisher between 1912 and 1922 (see external resources below for more information on the history of MLE).

Prerequisites

The following discussion assumes that the reader is familiar with basic notions in probability theory such as probability distributions, probability density functions, random variables and expectation. It also assumes familiarity with standard techniques for maximising continuous real-valued functions, such as using differentiation to find a function's maxima.

The philosophy of MLE

Given a probability distribution <math>D</math> with distributional parameter <math>\theta</math>, associated with either a known probability density function (continuous distribution) or a known probability mass function (discrete distribution), denoted <math>f_D</math>, we may draw a sample <math>X_1, X_2, \dots, X_n</math> of <math>n</math> values from this distribution and then use <math>f_D</math> to compute the probability (or, in the continuous case, the probability density) associated with our observed data <math>x_1, x_2, \dots, x_n</math>:

<math>\mathbb{P}(x_1,x_2,\dots,x_n) = f_D(x_1,\dots,x_n \mid \theta)</math>

However, we may not know the value of the parameter <math>\theta</math>, despite knowing (or believing) that our data come from the distribution <math>D</math>. How should we estimate <math>\theta</math>? A sensible idea is to draw a sample of <math>n</math> values <math>X_1, X_2, \dots, X_n</math> and use these data to form an estimate.

Once we have our sample <math>X_1, X_2, \dots, X_n</math>, we may seek an estimate of the value of <math>\theta</math> from that sample. MLE chooses the value of the parameter <math>\theta</math> under which the observed data are most probable (i.e., we maximise the likelihood of the observed data set over all possible values of <math>\theta</math>). This is in contrast to other estimators, such as an unbiased estimator of <math>\theta</math>, which may not maximise the likelihood but which will, on average, neither over-estimate nor under-estimate the true value of <math>\theta</math>.

To implement the MLE method mathematically, we define the likelihood:

<math>\mbox{lik}(\theta) = f_D(x_1,\dots,x_n \mid \theta)</math>

and maximise this function over all possible values of the parameter <math>\theta</math>. The value <math>\hat{\theta}</math> which maximises the likelihood is known as the maximum likelihood estimator (MLE) for <math>\theta</math>.
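As a concrete illustration of the definition above, the following minimal sketch maximises a likelihood numerically and compares the result with the value obtained by differentiation. Everything specific here is an assumption made for the example and is not part of the discussion above: the data are taken to be independent draws from an exponential distribution with rate <math>\theta</math>, the observations are made up, the logarithm of the likelihood is maximised (which has the same maximiser because the logarithm is increasing), and the names `log_likelihood` and `grid_mle` are hypothetical.

```python
import math

def log_likelihood(theta, xs):
    """Log of lik(theta) for independent exponential observations with rate theta."""
    return len(xs) * math.log(theta) - theta * sum(xs)

def grid_mle(xs, lo=0.01, hi=10.0, steps=100_000):
    """Crude maximiser: evaluate the log-likelihood on a grid and keep the best theta."""
    best_theta, best_value = lo, log_likelihood(lo, xs)
    for i in range(1, steps + 1):
        theta = lo + (hi - lo) * i / steps
        value = log_likelihood(theta, xs)
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta

sample = [0.8, 1.9, 0.3, 1.2, 0.6]        # made-up observations (assumption)
theta_grid = grid_mle(sample)              # numerical maximiser
theta_closed = len(sample) / sum(sample)   # from setting the derivative to zero

print(theta_grid, theta_closed)
```

Under these assumptions the two values should agree up to the grid spacing. A grid search is used only because it is easy to read; any standard optimiser could take its place.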

Notes

The likelihood is a function of <math>\theta</math> for fixed values of <math>x_1,x_2,\ldots,x_n</math>.
The maximum likelihood estimator may not be unique, or indeed may not even exist.
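
One way to see non-uniqueness is the following small sketch. It assumes, purely for illustration, that the data are independent draws from a uniform distribution on the interval from <math>\theta</math> to <math>\theta + 1</math>; this model, the made-up observations and the name `likelihood` are not taken from the notes above. The likelihood is then 1 for every <math>\theta</math> whose interval covers all observations and 0 otherwise, so every such <math>\theta</math> is a maximiser.

```python
def likelihood(theta, xs):
    """Joint density of xs under Uniform(theta, theta + 1): 1 if all points fit, else 0."""
    return 1.0 if theta <= min(xs) and max(xs) <= theta + 1 else 0.0

sample = [2.3, 2.7, 2.9]                    # made-up observations (assumption)
candidates = [1.8, 1.95, 2.1, 2.25, 2.4]    # a few candidate values of theta

best = max(likelihood(t, sample) for t in candidates)
maximisers = [t for t in candidates if likelihood(t, sample) == best]

print(maximisers)
```

Here every candidate between the largest observation minus 1 and the smallest observation attains the same maximal likelihood.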

Properties

Functional invariance

If <math>\widehat{\theta}</math> is the maximum likelihood estimator (MLE) for <math>\theta</math>, then the MLE for <math>\alpha = g(\theta)</math> is <math>\widehat{\alpha} = g(\widehat{\theta})</math>. The function <math>g</math> need not be one-to-one. For details, see the proof of Theorem 7.2.10 of Statistical Inference by George Casella and Roger L. Berger.
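
The invariance property can be checked numerically in a simple case. The sketch below assumes independent exponential observations with rate <math>\theta</math> and takes <math>g(\theta) = 1/\theta</math>, the mean; the model, the made-up data and the helper `grid_argmax` are assumptions for illustration only, and this particular <math>g</math> happens to be one-to-one. The likelihood is maximised once in terms of <math>\theta</math> and once directly in terms of <math>\alpha</math>.

```python
import math

def grid_argmax(f, lo, hi, steps=100_000):
    """Return the grid point in [lo, hi] at which f is largest."""
    best_x, best_value = lo, f(lo)
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        value = f(x)
        if value > best_value:
            best_x, best_value = x, value
    return best_x

sample = [0.8, 1.9, 0.3, 1.2, 0.6]   # made-up observations (assumption)
n, total = len(sample), sum(sample)

def loglik_rate(theta):
    """Exponential log-likelihood written in terms of the rate theta."""
    return n * math.log(theta) - theta * total

def loglik_mean(alpha):
    """The same log-likelihood written in terms of the mean alpha = 1 / theta."""
    return -n * math.log(alpha) - total / alpha

theta_hat = grid_argmax(loglik_rate, 0.01, 10.0)
alpha_hat_direct = grid_argmax(loglik_mean, 0.01, 10.0)   # maximise over alpha directly
alpha_hat_invariance = 1.0 / theta_hat                    # apply g to the MLE of theta

print(alpha_hat_direct, alpha_hat_invariance)
```

Both routes should give the sample mean, up to the grid spacing.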

Asymptotic behaviour

Under suitable regularity conditions, maximum likelihood estimators achieve the minimum variance given by the Cramér-Rao lower bound in the limit as the sample size tends to infinity. When the MLE is unbiased, we may equivalently say that it has minimum mean squared error in the limit.

For independent observations, the maximum likelihood estimator is often asymptotically normally distributed.
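
A small simulation can make the asymptotic statements more tangible. The sketch below again assumes independent exponential observations with rate <math>\theta</math>, for which the MLE is the reciprocal of the sample mean and the Cramér-Rao lower bound for an unbiased estimator is <math>\theta^2/n</math>. The sample size, number of replications, seed and function names are arbitrary choices made for illustration. If the MLE is approximately normal with variance close to the bound, the standardised values <math>\sqrt{n}(\hat{\theta} - \theta)/\theta</math> should have mean near 0 and standard deviation near 1.

```python
import math
import random
import statistics

def exponential_mle(xs):
    """MLE of the rate for independent exponential observations."""
    return len(xs) / sum(xs)

def standardised_estimates(theta, n, replications, seed=0):
    """Simulate sqrt(n) * (theta_hat - theta) / theta over repeated samples of size n."""
    rng = random.Random(seed)
    zs = []
    for _ in range(replications):
        xs = [rng.expovariate(theta) for _ in range(n)]
        zs.append(math.sqrt(n) * (exponential_mle(xs) - theta) / theta)
    return zs

zs = standardised_estimates(theta=2.0, n=400, replications=2000)
z_mean = statistics.mean(zs)     # should be close to 0
z_sd = statistics.pstdev(zs)     # should be close to 1

print(z_mean, z_sd)
```

A simulation of this kind only illustrates the limiting behaviour for one model and one sample size; it does not establish it.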

Bias

The bias of maximum-likelihood estimators can be substantial. Consider a case where <math>n</math> tickets numbered from 1 to <math>n</math> are placed in a box and one is selected at random (see uniform distribution). If <math>n</math> is unknown, then the maximum-likelihood estimator of <math>n</math> is the value on the drawn ticket, even though the expectation of that value is only <math>(n+1)/2</math>, so the estimator under-estimates <math>n</math> on average by <math>(n-1)/2</math>. In estimating the highest number <math>n</math>, we can only be certain that it is greater than or equal to the drawn ticket number.
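
The ticket example can be verified exactly for a small box. In the sketch below the box size of 20, the search limit `max_candidate` and the function names are assumptions chosen for illustration. The likelihood of a candidate box size <math>m</math> given the drawn ticket is <math>1/m</math> when <math>m</math> is at least the drawn number and 0 otherwise, so it is maximised by the drawn number itself.

```python
from fractions import Fraction

def likelihood(m, drawn):
    """Probability of drawing ticket `drawn` when the box holds tickets 1..m."""
    return Fraction(1, m) if m >= drawn else Fraction(0)

def mle(drawn, max_candidate=200):
    """Box size in 1..max_candidate that maximises the likelihood of the drawn ticket."""
    return max(range(1, max_candidate + 1), key=lambda m: likelihood(m, drawn))

n = 20  # true number of tickets (assumption for the example)

# Each ticket is drawn with probability 1/n, so average the MLE over all outcomes.
expected_mle = sum(Fraction(mle(x), n) for x in range(1, n + 1))
bias = expected_mle - n

print(expected_mle, bias)
```

For this box size the expected value of the estimator is <math>(n+1)/2 = 21/2</math>, well below the true value of 20.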

External resources

A paper detailing the history of maximum likelihood, written by John Aldrich

External links

Wikipedia article on Maximum likelihood
