Bias-Variance Decomposition
Every machine learning model's error can be decomposed into three fundamentally different components. Understanding this decomposition is crucial for diagnosing whether a model is underfitting, overfitting, or simply limited by the quality of the data.
1. The Decomposition Formula
For any model f̂(x) trained on a randomly drawn dataset, the expected test error at a point x can be written as:

Expected Test Error(x) = Bias[f̂(x)]^2 + Var[f̂(x)] + σ^2

Where:
- Bias[f̂(x)]^2: Error from erroneous assumptions in the learning algorithm. High bias leads to Underfitting.
- Var[f̂(x)]: Error from sensitivity to small fluctuations in the training set. High variance leads to Overfitting.
- σ^2: Irreducible Error (Noise). This is the minimum possible error that no model can overcome (e.g., due to measurement errors).
2. The Tradeoff
There is a natural tension between bias and variance:
- Simple Models (e.g., Linear Regression) usually have High Bias and Low Variance. They are consistent but might miss the true complexity.
- Complex Models (e.g., Deep Neural Networks) usually have Low Bias and High Variance. They can fit any shape but are very sensitive to the specific training data used.
The goal is to find the "Sweet Spot" where the total error is minimized.
[Figure: The Bias-Variance Tradeoff. Total expected test error = (Bias)^2 + Variance + Irreducible Noise. As complexity increases, bias decreases but variance increases; the "Sweet Spot" lies at the minimum of the Total Error curve.]
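The tradeoff can be seen directly by sweeping model complexity. This sketch uses illustrative assumptions (noisy sin data, polynomial degree as the complexity knob): training error falls monotonically as degree grows, while test error follows the U-shape described above.

```python
# Sweep polynomial degree (model complexity) on noisy data:
# train error keeps falling, test error traces a U-shape.
# Data, noise level, and degrees are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
x_train = np.sort(rng.uniform(0, np.pi, 15))
y_train = np.sin(x_train) + rng.normal(0, 0.2, 15)
x_test = np.linspace(0.1, np.pi - 0.1, 200)
y_test = np.sin(x_test) + rng.normal(0, 0.2, 200)

results = {}
for deg in (1, 3, 9):            # simple -> balanced -> complex
    coeffs = np.polyfit(x_train, y_train, deg)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    results[deg] = (train_mse, test_mse)
    print(f"degree {deg}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Because higher-degree polynomials contain the lower-degree ones as special cases, training error can only decrease with degree; test error is what reveals where the sweet spot actually is.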
3. Diagnosing the Error
| Train Error | Test Error | Bias | Variance | Likely Issue | Solution |
|---|---|---|---|---|---|
| 15% | 16% | High | Low | Underfitting | Add features, increase complexity |
| 1% | 12% | Low | High | Overfitting | Regularization, more data |
| 14% | 30% | High | High | Very Poor Model | Change algorithm/approach |
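The diagnostic logic in the table can be sketched as a small helper. The function name and thresholds (`target_err`, `gap_tol`) are illustrative assumptions, not standard values; the idea is simply that high training error signals bias and a large train/test gap signals variance.

```python
# Hypothetical helper mirroring the diagnostic table above.
# Thresholds are illustrative assumptions, not standard values.
def diagnose(train_err, test_err, target_err=0.05, gap_tol=0.05):
    """Return a rough diagnosis from train/test error rates (0-1 scale)."""
    high_bias = train_err > target_err                 # can't fit training set
    high_variance = (test_err - train_err) > gap_tol   # large generalization gap
    if high_bias and high_variance:
        return "very poor model: change algorithm/approach"
    if high_bias:
        return "underfitting: add features, increase complexity"
    if high_variance:
        return "overfitting: regularization, more data"
    return "ok: near the sweet spot"

print(diagnose(0.15, 0.16))  # row 1 of the table
print(diagnose(0.01, 0.12))  # row 2
print(diagnose(0.14, 0.30))  # row 3
```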
The "Modern" Exception: Recent research in deep learning has observed a phenomenon called "Double Descent," where increasing model complexity beyond the interpolation threshold can actually lead to a further decrease in test error, challenging the traditional U-shaped bias-variance curve.