# Maximum likelihood

Maximum likelihood estimation (MLE) is a widely used statistical method for estimating the parameters of the probability distribution assumed to underlie a given data set.

The method was pioneered by geneticist and statistician Sir Ronald A. Fisher between 1912 and 1922 (see external resources below for more information on the history of MLE).

## Prerequisites

The following discussion assumes that the reader is familiar with basic notions in probability theory such as probability distributions, probability density functions, random variables and expectation. It also assumes familiarity with standard techniques for maximising continuous real-valued functions, such as using differentiation to find a function's maxima.

## The philosophy of MLE

Consider a probability distribution $D$ with distributional parameter $\theta$, associated with a known function $f_D$: a probability density function if $D$ is continuous, or a probability mass function if $D$ is discrete. We may draw a sample $X_1, X_2, \dots, X_n$ of $n$ values from this distribution and, writing $x_1, x_2, \dots, x_n$ for the observed values, use $f_D$ to compute the probability (or, in the continuous case, the probability density) associated with our observed data:

$\mathbb{P}(x_1,x_2,\dots,x_n) = f_D(x_1,\dots,x_n \mid \theta)$
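As a small illustration of this computation, the following Python sketch evaluates the joint probability of a short sample when $\theta$ is known. It makes two assumptions that are not stated above: the draws are independent, so the joint probability factorises into a product, and $D$ is a Bernoulli distribution with success probability $\theta$. The function name `joint_probability` and the sample values are hypothetical, chosen only for the example.

```python
def joint_probability(sample, theta):
    """Joint probability of independent Bernoulli(theta) observations.

    Each observation is 1 (success) or 0 (failure). Independence is an
    assumption of this sketch, not something the text above requires.
    """
    prob = 1.0
    for x in sample:
        # A success contributes theta, a failure contributes (1 - theta).
        prob *= theta if x == 1 else (1.0 - theta)
    return prob


# Hypothetical observed data: three successes and two failures.
sample = [1, 0, 1, 1, 0]
p_fair = joint_probability(sample, 0.5)    # theta = 0.5
p_biased = joint_probability(sample, 0.6)  # theta = 0.6
print(p_fair, p_biased)
```

The same data receive different probabilities under different values of $\theta$, which is the observation the rest of this section builds on.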

However, we may not know the value of the parameter $\theta$ despite knowing (or believing) that our data come from the distribution $D$. How should we estimate $\theta$? A sensible approach is to draw a sample of $n$ values $X_1, X_2, \dots, X_n$ and use these data to form an estimate.

Once we have our sample $X_1, X_2, \dots, X_n$, we may seek an estimate of the value of $\theta$ from that sample. MLE seeks the value of $\theta$ under which the observed data are most probable (i.e., we maximise the likelihood of the observed data set over all possible values of $\theta$). This is in contrast to seeking other estimators, such as an unbiased estimator of $\theta$, which need not maximise the likelihood but which, on average, neither over-estimates nor under-estimates the true value of $\theta$.
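The contrast with unbiased estimation can be made concrete with a standard example that goes beyond the text above: for a normal distribution with unknown mean, the maximum likelihood estimate of the variance divides the sum of squared deviations by $n$, whereas the usual unbiased estimate divides by $n - 1$. The sketch below uses Python's standard `statistics` module; the data values are made up for illustration, and the choice of a normal model is an assumption of the example.

```python
import statistics

# Hypothetical observations, assumed to come from a normal distribution.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# pvariance divides by n: this matches the maximum likelihood estimate
# of the variance under a normal model with unknown mean.
mle_variance = statistics.pvariance(data)

# variance divides by n - 1: this is the usual unbiased estimate.
unbiased_variance = statistics.variance(data)

print(mle_variance, unbiased_variance)
```

The two estimates differ by the factor $n/(n-1)$, so the gap shrinks as the sample grows.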

To implement the MLE method mathematically, we define the likelihood:

$\mbox{lik}(\theta) = f_D(x_1,\dots,x_n \mid \theta)$

and maximise this function, regarded as a function of $\theta$ with the observed values $x_1, \dots, x_n$ held fixed, over all possible values of the parameter $\theta$. The value $\hat{\theta}$ which maximises the likelihood is known as the maximum likelihood estimator (MLE) for $\theta$.
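A minimal sketch of this maximisation, under assumptions that are not part of the text above: the observations are independent draws from a Bernoulli distribution with success probability $\theta$, and the maximum is located by a simple grid search rather than by differentiation. The names `likelihood` and `grid_mle`, the grid resolution and the sample are all hypothetical.

```python
def likelihood(theta, sample):
    """Likelihood of theta for independent Bernoulli observations (0 or 1)."""
    successes = sum(sample)
    failures = len(sample) - successes
    return (theta ** successes) * ((1.0 - theta) ** failures)


def grid_mle(sample, steps=999):
    """Return the grid point in (0, 1) with the largest likelihood."""
    # Candidate values 0.001, 0.002, ..., 0.999 when steps == 999.
    candidates = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(candidates, key=lambda theta: likelihood(theta, sample))


# Hypothetical observed data: seven successes in ten trials.
sample = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
theta_hat = grid_mle(sample)
sample_mean = sum(sample) / len(sample)
print(theta_hat, sample_mean)
```

For this model the grid search lands on the sample proportion of successes, which is the value obtained analytically by setting the derivative of the likelihood to zero. In practice a numerical optimiser, and the logarithm of the likelihood, would usually replace the grid and the raw product.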