Maximum Likelihood Estimation (MLE)
Probability is about predicting data given a model. Statistics is about inferring the model (its parameters) given the data. MLE is the primary tool for this reversal.
The Core Concept: The Likelihood Function
Imagine you have a coin. You don't know if it's fair. You flip it 10 times and get 8 Heads. What is the most "likely" probability of heads ($p$) for this coin?
In probability, we define the Probability Function $P(x \mid \theta)$, which tells us the chance of seeing data $x$ given parameter $\theta$. In statistics, we flip this into the Likelihood Function $L(\theta \mid x)$:
$$L(\theta \mid x) = P(x \mid \theta)$$
The math is the same, but the perspective changes: the data is now fixed (the 8 heads you saw), and the parameter is the variable we are trying to find.
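The change of perspective can be made concrete with a small sketch. It assumes the binomial model for the coin example; the function names `binom_pmf` and `likelihood` are illustrative, not part of the original text. With the parameter fixed, the probabilities of all possible outcomes sum to 1. With the data fixed, the same formula read as a function of the parameter does not integrate to 1, so a likelihood should not be read as a probability distribution over the parameter.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 10

# Probability view: p is fixed and the data k varies.
# Summing over every possible outcome gives 1.
total_over_data = sum(binom_pmf(k, n, 0.5) for k in range(n + 1))

# Likelihood view: the data (k = 8) is fixed and p varies.
def likelihood(p, k=8, n=n):
    return binom_pmf(k, n, p)

# Midpoint Riemann sum of the likelihood over p in [0, 1].
steps = 10_000
area_over_p = sum(likelihood((i + 0.5) / steps) for i in range(steps)) / steps

print(round(total_over_data, 6))  # 1.0
print(round(area_over_p, 4))      # 0.0909, i.e. 1/11 rather than 1
```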
The Recipe for MLE
Finding the maximum of a function usually involves calculus. However, because the joint probability of independent observations is a product of many terms, the derivatives get messy. We use a trick: the Log-Likelihood, which turns that product into a sum.
- Define the Likelihood: Write the joint probability of your data points.
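- Take the Natural Log: $\ell(\theta) = \ln L(\theta)$, where $\theta$ is the parameter. (Since $\ln$ is a monotonically increasing function, the log-likelihood reaches its maximum at the same $\theta$ as the original likelihood.)
- Differentiate: Find $\frac{d\ell}{d\theta}$.
- Solve for Zero: Set the derivative to zero and solve for $\theta$.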
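The steps above can be sketched numerically for the coin example (8 heads in 10 flips). This is a minimal illustration, not a general-purpose solver: it assumes the binomial model, drops the constant binomial coefficient because it does not affect where the maximum sits, and uses bisection (an arbitrary choice made here) to find where the derivative crosses zero. The names `log_likelihood`, `score`, and `solve_score_zero` are hypothetical.

```python
from math import log

k, n = 8, 10  # data from the coin example: 8 heads in 10 flips

def log_likelihood(p):
    # Step 2: log of p^k (1-p)^(n-k); the constant coefficient is dropped.
    return k * log(p) + (n - k) * log(1 - p)

def score(p):
    # Step 3: derivative of the log-likelihood with respect to p.
    return k / p - (n - k) / (1 - p)

def solve_score_zero(lo=1e-6, hi=1 - 1e-6, tol=1e-10):
    # Step 4: bisection. The score is positive below the maximum
    # and negative above it, so we keep the half that brackets zero.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p_hat = solve_score_zero()
print(round(p_hat, 6))                             # 0.8
print(log_likelihood(p_hat) > log_likelihood(0.5)) # True
```

For this model the equation can also be solved by hand, so the numerical search is only there to show each step of the recipe in action.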
Why "Maximum" Likelihood?
If we assume our coin has probability of heads $p = 0.5$, the chance of getting 8 heads in 10 flips is quite low (~4%). If we assume $p = 0.8$, the chance is much higher (~30%). MLE says: "The best estimate for the world is the one that makes our observed reality most probable."
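The two percentages can be checked directly, assuming the 10 flips are independent so the number of heads follows a binomial distribution. The helper name `prob_heads` is illustrative.

```python
from math import comb

def prob_heads(k, n, p):
    """Binomial probability of k heads in n independent flips."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

fair = prob_heads(8, 10, 0.5)
biased = prob_heads(8, 10, 0.8)

print(f"{fair:.4f}")           # 0.0439
print(f"{biased:.4f}")         # 0.3020
print(f"{biased / fair:.1f}")  # 6.9
```

Under these assumptions, the observed data is roughly seven times more probable with a heads probability of 0.8 than with a fair coin.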
Real-World Connection: The Normal Distribution
If you take a set of measurements $x_1, \dots, x_n$ and assume they come from a Normal Distribution, the MLE for the mean ($\mu$) is simply the Sample Average:
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
This is why we use the average so often: under a Gaussian model, it is the value of the mean that makes the observed data most probable.
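A quick numerical check of this claim, as a sketch: the measurements below are made up for illustration, and the standard deviation is fixed at an assumed value (the location of the maximum over the mean does not depend on it). The log-likelihood is evaluated at the sample average and at two nearby values.

```python
from math import log, pi
from statistics import mean

# Hypothetical measurements, invented for this illustration.
data = [4.8, 5.1, 5.3, 4.9, 5.4]
sigma = 1.0  # assumed known; it does not change which mu is best

def normal_log_likelihood(mu):
    """Log-likelihood of the data under Normal(mu, sigma^2)."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * log(2 * pi * sigma**2) - squared_error / (2 * sigma**2)

mu_hat = mean(data)  # the sample average

for mu in (mu_hat - 0.2, mu_hat, mu_hat + 0.2):
    print(f"mu={mu:.2f}  log-likelihood={normal_log_likelihood(mu):.4f}")
```

The middle row, evaluated at the sample average, has the largest log-likelihood of the three. Moving the mean in either direction increases the sum of squared errors and lowers it.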
Test Your Knowledge
Example: MLE for a Bernoulli Trial
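You flip a coin $n$ times and observe $k$ successes. Find the MLE for the probability of success $p$.
Step-by-Step Solution
- Likelihood: $L(p) = \binom{n}{k} p^{k} (1-p)^{n-k}$
- Log-Likelihood: $\ell(p) = \ln \binom{n}{k} + k \ln p + (n-k) \ln(1-p)$
- Derivative: $\frac{d\ell}{dp} = \frac{k}{p} - \frac{n-k}{1-p}$
- Solve: Set to 0: $\frac{k}{p} = \frac{n-k}{1-p} \;\Rightarrow\; k(1-p) = (n-k)\,p \;\Rightarrow\; \hat{p} = \frac{k}{n}$
The MLE for $p$ is simply the proportion of successes observed.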
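One step the recipe leaves implicit is checking that the zero of the derivative is a maximum rather than a minimum. A short sketch of that check, writing $\ell(p) = \ln \binom{n}{k} + k \ln p + (n-k) \ln(1-p)$ for the log-likelihood of $k$ successes in $n$ flips and assuming $0 < k < n$:

$$\frac{d^{2}\ell}{dp^{2}} = -\frac{k}{p^{2}} - \frac{n-k}{(1-p)^{2}} < 0 \quad \text{for all } 0 < p < 1$$

Because the second derivative is negative everywhere on the interval, the log-likelihood is concave and its single stationary point is the global maximum. In the edge cases $k = 0$ or $k = n$ the derivative has no zero inside the interval; the likelihood is then largest at the boundary ($p = 0$ or $p = 1$), which still agrees with the proportion of successes.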