Christoph at 04:49, 13 September 2006

2006-09-13T04:49:47Z

Christoph: Started article

2005-12-29T23:01:42Z

Started article

New page

'''Maximum likelihood estimation (MLE)''' is a popular [[statistics|statistical]] method used to make inferences about parameters of the underlying [[probability distribution]] of a given [[data set]].

The method was pioneered by [[geneticist]] and [[statistician]] [[Ronald Fisher|Sir Ronald A. Fisher]] between 1912 and 1922 (see external resources below for more information on the history of MLE).

== Prerequisites ==

The following discussion assumes that the reader is familiar with basic notions in [[probability theory]] such as [[probability distribution]]s, [[probability density function]]s, [[random variable]]s and [[expected value|expectation]]. It also assumes familiarity with standard techniques for maximising [[continuous function|continuous]] [[real number|real-valued]] [[function (mathematics)|function]]s, such as using [[differentiation]] to find a function's [[maxima and minima|maxima]].

==The philosophy of MLE==

Consider a probability distribution <math>D</math> with a known [[probability density function]] (continuous distribution) or a known [[probability mass function]] (discrete distribution), denoted <math>f_D</math>, that depends on a distributional parameter <math>\theta</math>. We may draw a sample <math>X_1, X_2, \ldots, X_n</math> of <math>n</math> values from this distribution and then use <math>f_D</math> to compute the probability (or, for a continuous distribution, the probability density) associated with the observed values <math>x_1, x_2, \ldots, x_n</math>:

:<math>\mathbb{P}(x_1,x_2,\dots,x_n) = f_D(x_1,\dots,x_n \mid \theta)</math>

However, we may not know the value of the parameter <math>\theta</math> even though we know (or believe) that our data comes from the distribution <math>D</math>. How should we estimate <math>\theta</math>? A sensible approach is to draw a sample of <math>n</math> values <math>X_1, X_2, \ldots, X_n</math> and use this data to make an estimate.

Once we have our sample <math>X_1, X_2, \ldots, X_n</math>, we may seek an estimate of the value of <math>\theta</math> from that sample. MLE seeks the value of the parameter <math>\theta</math> under which the observed data set is most probable (i.e., we maximise the ''likelihood'' of the observed data set over all possible values of <math>\theta</math>). This is in contrast to seeking other estimators, such as an [[unbiased estimator]] of <math>\theta</math>, which need not maximise the likelihood but which (on average) neither over-estimates nor under-estimates the true value of <math>\theta</math>.

To implement the MLE method mathematically, we define the ''likelihood'':

:<math>\mbox{lik}(\theta) = f_D(x_1,\dots,x_n \mid \theta)</math>

and maximise this [[function (mathematics)|function]] over all possible values of the parameter <math>\theta</math>. The value <math>\hat{\theta}</math> which maximises the likelihood is known as the '''maximum likelihood estimator''' (MLE) for <math>\theta</math>.
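As a minimal sketch of this maximisation, the following Python example assumes (beyond what is stated above) that the observations are independent draws from a Bernoulli distribution with unknown success probability <math>p</math>, so that the likelihood factorises into a product over the observations. The sample, the function names and the grid resolution are all hypothetical choices made for illustration. The code maximises the logarithm of the likelihood, which has the same maximiser because the logarithm is strictly increasing.

```python
import math

def bernoulli_log_likelihood(p, data):
    """Log of lik(p) = product of p^x * (1 - p)^(1 - x) over 0/1 observations x."""
    successes = sum(data)
    failures = len(data) - successes
    return successes * math.log(p) + failures * math.log(1.0 - p)

def grid_mle(data, steps=1000):
    """Return the grid point in (0, 1) with the largest log-likelihood."""
    # The end points 0 and 1 are excluded to avoid log(0).
    grid = [k / steps for k in range(1, steps)]
    return max(grid, key=lambda p: bernoulli_log_likelihood(p, data))

# Hypothetical sample: 7 successes in 10 trials.
sample = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

p_hat_grid = grid_mle(sample)              # numerical maximiser on the grid
p_hat_closed = sum(sample) / len(sample)   # closed-form MLE: the sample mean

print(p_hat_grid, p_hat_closed)
```

For this model the maximiser can also be found by differentiation, which gives the sample mean; the grid search is shown only to make the "maximise over all possible values" step concrete.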

=== Notes ===
*The likelihood is a function of <math>\theta</math> for fixed values of <math>x_1,x_2,\ldots,x_n</math>.
*The maximum likelihood estimator may not be unique, or indeed may not even exist.

== Properties ==

=== Functional invariance ===
If <math>\widehat{\theta}</math> is the maximum likelihood estimator (MLE) for <math>\theta</math>, then the MLE for <math>\alpha = g(\theta)</math> is <math>\widehat{\alpha} = g(\widehat{\theta})</math>. The function ''g'' need not be one-to-one. For details, see the proof of Theorem 7.2.10 of ''Statistical Inference'' by George Casella and Roger L. Berger.
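Functional invariance can be checked numerically in a simple case. The sketch below assumes a Bernoulli model with a hypothetical sample of 3 successes in 4 independent trials and takes <math>g(p) = p/(1-p)</math> (the odds), which happens to be one-to-one. It compares <math>g(\widehat{p})</math> with the result of maximising the likelihood directly over the odds. All names and the grid are illustrative assumptions.

```python
import math

def log_lik_p(p, successes, n):
    """Bernoulli log-likelihood written in terms of the success probability p."""
    return successes * math.log(p) + (n - successes) * math.log(1.0 - p)

def log_lik_odds(odds, successes, n):
    """The same log-likelihood, reparameterised by odds = p / (1 - p)."""
    p = odds / (1.0 + odds)
    return log_lik_p(p, successes, n)

successes, n = 3, 4  # hypothetical data

# MLE of p (the sample proportion), then apply g to it.
p_hat = successes / n
odds_via_invariance = p_hat / (1.0 - p_hat)

# Maximise directly over a grid of candidate odds values 0.01, 0.02, ..., 10.00.
odds_grid = [k / 100 for k in range(1, 1001)]
odds_direct = max(odds_grid, key=lambda w: log_lik_odds(w, successes, n))

print(odds_via_invariance, odds_direct)
```

Both routes give the same estimate of the odds, as the invariance property predicts.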

=== Asymptotic behaviour ===
Under suitable regularity conditions, maximum likelihood estimators achieve the minimum variance (as given by the [[Cramer-Rao lower bound]]) in the limit as the sample size tends to infinity. When the MLE is unbiased, we may equivalently say that it has minimum [[mean squared error]] in the limit.

For independent observations, the distribution of the maximum likelihood estimator is often approximately [[normal distribution|normal]] when the sample size is large.
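As a sketch of what these statements usually mean, assume independent and identically distributed observations, a true parameter value <math>\theta_0</math>, and the usual regularity conditions (none of which are spelled out in this article). The standard result is then that, as <math>n \to \infty</math>, the following convergence holds in distribution:

:<math>\sqrt{n}\,(\widehat{\theta}_n - \theta_0) \to \mathcal{N}\left(0,\; I(\theta_0)^{-1}\right)</math>

Here <math>\widehat{\theta}_n</math> denotes the MLE computed from <math>n</math> observations and <math>I(\theta_0)</math> denotes the Fisher information of a single observation; both symbols are introduced only for this illustration. In this setting the approximate variance of <math>\widehat{\theta}_n</math> for large <math>n</math> is <math>1/(n\,I(\theta_0))</math>, which is the Cramér-Rao lower bound for unbiased estimators of <math>\theta</math>.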

=== Bias ===
The [[unbiased estimator|bias]] of maximum-likelihood estimators can be substantial. Consider a case where ''n'' tickets numbered from 1 to ''n'' are placed in a box and one is selected at random (''see [[uniform distribution]]''). If ''n'' is unknown, then the maximum-likelihood estimator of ''n'' is the number on the drawn ticket, even though the expected value of that number is only <math>(n+1)/2</math>, so on average the estimator under-estimates ''n'' by <math>(n-1)/2</math>. From a single draw, all we can be certain of is that ''n'' is greater than or equal to the drawn ticket number.
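The ticket example can be worked through exactly. The sketch below uses Python's <code>fractions</code> module, with hypothetical function names and an arbitrary cap of 1000 on the candidate box sizes. It first confirms that the likelihood is maximised at the drawn ticket number, and then computes the expected value and the bias of that estimator for a box of 10 tickets.

```python
from fractions import Fraction

def likelihood(n, drawn):
    """Probability of drawing ticket `drawn` from a box holding tickets 1..n."""
    return Fraction(1, n) if n >= drawn else Fraction(0)

def mle(drawn, max_n=1000):
    """Maximise the likelihood over candidate box sizes 1..max_n (arbitrary cap)."""
    return max(range(1, max_n + 1), key=lambda n: likelihood(n, drawn))

def expected_mle(n):
    """Exact expectation of the MLE when each ticket 1..n is equally likely."""
    return sum(Fraction(mle(k), n) for k in range(1, n + 1))

true_n = 10  # hypothetical box size
expectation = expected_mle(true_n)
bias = expectation - true_n

print(mle(7))       # 7: the likelihood 1/n is largest at the smallest feasible n
print(expectation)  # 11/2, i.e. (n + 1) / 2
print(bias)         # -9/2, i.e. -(n - 1) / 2
```

The likelihood is zero for any box size smaller than the drawn number and decreases as <math>1/n</math> above it, which is why the maximiser is the drawn number itself.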

== See also ==
* The [[mean squared error]] is a measure of how 'good' an estimator of a distributional parameter is (be it the maximum likelihood estimator or some other estimator).

* See the article on the [[Rao-Blackwell theorem]] for a discussion of finding the best possible unbiased estimator (in the sense of having minimal [[mean squared error]]) by a process called Rao-Blackwellisation. The MLE is often a good starting place for the process.

* The reader may be intrigued to learn that the MLE (if it exists) will always be a function of a [[sufficient statistic]] for the parameter in question.

== External resources ==
* [http://projecteuclid.org/Dienst/UI/1.0/Summarize/euclid.ss/1030037906 A paper detailing the history of maximum likelihood, written by John Aldrich]

== External links ==
* [http://en.wikipedia.org/wiki/Maximum_likelihood Wikipedia article on '''Maximum likelihood''']

[[Category:Academic Research]]
[[Category:Statistics]]


Maximum likelihood - Revision history
