Log-Linear Models (Softmax Regression)
Log-linear models generalize logistic regression to multiclass classification. Instead of a single sigmoid, they use the softmax function.
1. The Softmax Function
Given K possible classes, the probability that an input x belongs to class k is:

P(y = k | x) = exp(z_k) / Σ_{j=1}^{K} exp(z_j)

where z_k = w_k · x is the linear predictor (logit) for class k.
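A minimal sketch of this formula in NumPy (the names `softmax` and `z` are illustrative, not from the original notes). Subtracting the maximum logit before exponentiating is a standard trick to avoid overflow and does not change the result, since softmax is invariant to adding a constant to all logits:

```python
import numpy as np

def softmax(z):
    """Convert a vector of logits z into a probability distribution."""
    # Shift by max(z) for numerical stability (softmax is shift-invariant).
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])  # example logits for K = 3 classes
p = softmax(z)                  # probabilities, one per class
```

The output `p` is positive everywhere and sums to one, as required of a probability distribution.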
[Interactive figure: softmax probabilities with a temperature slider. The softmax function converts raw scores (logits) into a probability distribution; temperature T controls how sharp or smooth that distribution is (high T = smooth, low T = sharp).]
2. Training: Cross-Entropy Loss
The multiclass cross-entropy loss function is:

L = -(1/N) Σ_{i=1}^{N} Σ_{k=1}^{K} y_{ik} log(p_{ik})

Where:
- y_{ik} is the indicator variable (1 if the i-th sample belongs to class k, 0 otherwise).
- p_{ik} is the predicted probability that the i-th sample belongs to class k.
This is the standard loss function for most modern neural network classifiers.
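The loss above can be computed directly from one-hot labels and predicted probabilities. A short sketch (the names `cross_entropy`, `Y`, and `P` are illustrative):

```python
import numpy as np

def cross_entropy(Y, P, eps=1e-12):
    """Multiclass cross-entropy.

    Y: one-hot label matrix of shape (N, K).
    P: predicted probability matrix of shape (N, K).
    """
    N = Y.shape[0]
    # Clip P away from 0 so log never sees an exact zero; average over samples.
    return -np.sum(Y * np.log(np.clip(P, eps, 1.0))) / N

Y = np.array([[1, 0, 0],
              [0, 1, 0]])           # true classes: 0 and 1
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])     # predicted probabilities
loss = cross_entropy(Y, P)          # -(log 0.7 + log 0.8) / 2
```

Because Y is one-hot, only the log-probability of each sample's true class contributes to the sum, so the loss rewards putting probability mass on the correct class.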
3. Key Properties
- Sum to One: Σ_{k=1}^{K} P(y = k | x) = 1.
- Probabilities: 0 < P(y = k | x) < 1 for every class k.
- Decision Rule: we assign the sample to the class with the highest probability, ŷ = argmax_k P(y = k | x).
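These properties are easy to check numerically. A self-contained sketch (the helper name `softmax` is illustrative):

```python
import numpy as np

def softmax(z):
    """Stable softmax: shift by max(z), exponentiate, normalize."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

p = softmax(np.array([3.0, 1.0, -2.0]))

assert np.isclose(p.sum(), 1.0)       # sum to one
assert np.all((p > 0) & (p < 1))      # each probability lies in (0, 1)
predicted_class = int(np.argmax(p))   # decision rule: highest probability
```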
Softmax and Temperature: Sometimes we divide the linear predictors by a temperature parameter T:

P(y = k | x) = exp(z_k / T) / Σ_{j=1}^{K} exp(z_j / T)

- High T makes the distribution more uniform (higher uncertainty).
- Low T makes the distribution more peaked (higher confidence).
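The effect of T can be seen by evaluating the same logits at several temperatures. A short sketch (names are illustrative):

```python
import numpy as np

def softmax_with_temperature(z, T=1.0):
    """Softmax of logits z after dividing by temperature T."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = [2.0, 1.0, 0.5]
sharp  = softmax_with_temperature(z, T=0.5)  # low T: more peaked
normal = softmax_with_temperature(z, T=1.0)  # standard softmax
smooth = softmax_with_temperature(z, T=5.0)  # high T: closer to uniform
```

As T grows, the largest probability shrinks toward 1/K and the distribution flattens; as T shrinks toward zero, almost all mass concentrates on the argmax class.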