Model Selection
Model selection is the process of choosing the best model from a set of candidates. The central tension is balancing goodness of fit against model complexity to achieve the best generalization on new, unseen data.
1. The Validation Set
We cannot use the Test Set to choose our model, as that would "leak" information and lead to over-optimistic results. Instead, we split the training data further:
- Training Set: Used to fit the parameters of the model.
- Validation Set: Used to tune hyperparameters and choose the best model architecture.
- Test Set: Used ONLY once at the very end to estimate real-world performance.
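The three-way split above can be sketched in pure Python. The fractions and the helper name are illustrative choices, not prescribed by the text:

```python
import random

def train_val_test_split(data, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle a dataset and partition it into train/validation/test.

    val_frac and test_frac are illustrative defaults; the seed makes the
    shuffle reproducible so results can be compared across runs.
    """
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    n_test = int(len(data) * test_frac)
    n_val = int(len(data) * val_frac)
    test = [data[i] for i in indices[:n_test]]
    val = [data[i] for i in indices[n_test:n_test + n_val]]
    train = [data[i] for i in indices[n_test + n_val:]]
    return train, val, test

train, val, test = train_val_test_split(list(range(100)))
print(len(train), len(val), len(test))  # → 60 20 20
```

Hyperparameters are then tuned by comparing candidates on `val`, and `test` is only scored once, for the final chosen model.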
2. Cross-Validation
When data is scarce, we use k-Fold Cross-Validation:
- Split the training data into k equally sized "folds".
- Train k times, each time using k − 1 folds for training and the remaining 1 fold for validation.
- Average the validation performance across all runs.
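The three steps above can be sketched as follows. The `fit` and `score` callables are hypothetical placeholders for whatever training and evaluation routine you use:

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Fold sizes differ by at most 1 when n is not divisible by k.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size

def cross_val_score(fit, score, data, k=5):
    """Average the validation score across all k train/validate runs."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(data), k):
        model = fit([data[i] for i in train_idx])
        scores.append(score(model, [data[i] for i in val_idx]))
    return sum(scores) / k

# Demo: 5-fold CV of a mean estimator on hypothetical 1-D data.
data = [2.0, 4.0, 6.0, 8.0, 10.0, 1.0, 3.0, 5.0, 7.0, 9.0]
fit = lambda xs: sum(xs) / len(xs)                              # "train": the mean
score = lambda m, xs: sum((x - m) ** 2 for x in xs) / len(xs)   # validation MSE
print(cross_val_score(fit, score, data, k=5))
```

To select among several candidate models, run `cross_val_score` once per candidate and keep the one with the best averaged score.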
3. Information Criteria
Sometimes we prefer models that are simpler, even if they have slightly higher training error.
- AIC (Akaike Information Criterion): Rewards goodness of fit but penalizes the number of parameters.
- BIC (Bayesian Information Criterion): Similar to AIC but with a stronger penalty for the number of parameters.
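In their standard forms, AIC = 2k − 2 ln(L̂) and BIC = k ln(n) − 2 ln(L̂), where k is the number of parameters, n the sample size, and L̂ the maximized likelihood; lower values are better. A minimal sketch of both (function names are my own):

```python
import math

def aic(log_likelihood, n_params):
    """AIC = 2k - 2 ln(L): fit reward minus a complexity penalty of 2 per parameter."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_samples):
    """BIC = k ln(n) - 2 ln(L): penalty of ln(n) per parameter, heavier once n > e^2 ≈ 7.4."""
    return n_params * math.log(n_samples) - 2 * log_likelihood

# Same fit, same parameter count: BIC penalizes harder for any reasonably large n.
print(aic(-100.0, 5))       # → 210.0
print(bic(-100.0, 5, 100))  # ≈ 223.0
```

Because the BIC penalty grows with n while the AIC penalty is constant, BIC tends to select simpler models than AIC on large datasets.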
4. Visualizing the Selection Curve
As model complexity increases, training error decreases steadily. Validation error (our estimate of generalization error), however, initially decreases and then starts to increase once overfitting begins. The optimal model is located at the point where the validation error is at its minimum.
(Figure: The Model Selection Curve)
Early Stopping: In iterative algorithms like Neural Networks, we can monitor the validation error during training and stop as soon as it begins to rise, effectively "selecting" the best version of the model automatically.
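The early-stopping loop can be sketched as below. The `step` and `val_error` callables are hypothetical stand-ins for one training epoch and a validation-set evaluation; the `patience` parameter is a common practical addition that tolerates brief upticks caused by noise rather than stopping at the very first rise:

```python
def train_with_early_stopping(step, val_error, max_epochs=100, patience=3):
    """Stop training once validation error fails to improve for `patience` epochs.

    Returns the epoch and error of the best model seen, which is the
    version of the model that early stopping effectively "selects".
    """
    best_err = float("inf")
    best_epoch = 0
    for epoch in range(max_epochs):
        step()                      # run one training epoch
        err = val_error()           # evaluate on the validation set
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break                   # validation error has stopped improving
    return best_epoch, best_err

# Demo with a synthetic validation-error curve (hypothetical numbers).
errors = iter([5.0, 4.0, 3.0, 3.4, 3.6, 3.9, 4.2])
epoch, err = train_with_early_stopping(lambda: None, lambda: next(errors),
                                       max_epochs=7, patience=3)
print(epoch, err)  # → 2 3.0
```

In practice one would also checkpoint the model weights at each new best epoch, so the selected version can be restored after the loop exits.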