Machine Learning
Linear Classification
Probabilistic Generative Models

Generative models take a different approach to classification: they try to model how the data was generated for each class. Instead of just learning a boundary, they learn the distribution of the features for each class.


1. The MLE Landscape

In Maximum Likelihood Estimation (MLE), we can categorize algorithms based on how they model the likelihood L(\theta; D) = P(D; \theta):

  • Discriminative P(y \mid x): Directly models the mapping from inputs to outputs.
    • Linear: Gaussian noise → Linear Regression.
    • Logistic: Bernoulli noise → Logistic Regression.
  • Generative P(x, y): Models the joint probability. Both x and y are treated as random variables.
    • Example: Naive Bayes.
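
As a sketch of the generative branch, a minimal Bernoulli Naive Bayes fit by MLE on a hypothetical binary dataset (all data values here are illustrative) could look like:

```python
import numpy as np

# Hypothetical toy dataset: 6 samples, 2 binary features.
X = np.array([[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1]])
y = np.array([1, 1, 1, 0, 0, 0])

def fit_naive_bayes(X, y):
    """MLE for Bernoulli Naive Bayes: class priors and per-feature P(x_j = 1 | y)."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])          # P(y = c)
    cond = np.array([X[y == c].mean(axis=0) for c in classes])     # P(x_j = 1 | y = c)
    return classes, priors, cond

def predict(x, classes, priors, cond):
    # Joint P(x, y) = P(y) * prod_j P(x_j | y), under the naive independence assumption.
    likelihood = np.prod(cond**x * (1 - cond)**(1 - x), axis=1)
    joint = priors * likelihood
    return classes[np.argmax(joint)]

classes, priors, cond = fit_naive_bayes(X, y)
print(predict(np.array([1, 0]), classes, priors, cond))  # -> 1
```

Because the model is the joint P(x, y), classification reduces to picking the class with the largest joint probability; normalizing by P(x) would not change the argmax.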

2. The Generative Approach

A generative model learns:

  1. Class-Conditional Densities: P(x \mid y) (How does the data for class y look?)
  2. Class Priors: P(y) (How common is class y?)

To classify a new point x, we use Bayes' Theorem to find the posterior probability P(y \mid x):

P(y \mid x) = \frac{P(x \mid y) P(y)}{P(x)}
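
With hypothetical numbers plugged in (the density and prior values below are made up for illustration), the posterior computation is just this arithmetic:

```python
# Hypothetical two-class example: class-conditional densities evaluated at a point x.
p_x_given_y = {0: 0.05, 1: 0.20}   # P(x | y)
p_y = {0: 0.7, 1: 0.3}             # class priors P(y)

# Evidence P(x) = sum_y P(x | y) P(y)  (law of total probability)
p_x = sum(p_x_given_y[c] * p_y[c] for c in p_y)

# Posterior P(y | x) via Bayes' theorem
posterior = {c: p_x_given_y[c] * p_y[c] / p_x for c in p_y}
print(posterior)  # P(y=1 | x) ≈ 0.632: the likelihood outweighs the smaller prior
```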

Dependent Events and Decomposition

For dependent events x and y, the joint probability is: P(x, y) = P(x \cap y) = P(x \mid y)P(y) = P(y \mid x)P(x)

Generative ML assumes P(D; \theta) = P(x, y; \theta), while Discriminative ML assumes P(D; \theta) = P(y \mid x; \theta).


3. Gaussian Class Densities

In models like Gaussian Discriminant Analysis (GDA), we assume classes follow a Gaussian distribution.

Example: Drawing Samples. If a class has distribution x \sim \mathcal{N}(\mu = 2, \sigma = 1.5), roughly 68% of samples fall within one standard deviation of the mean, i.e. in [2 - 1.5, 2 + 1.5] = [0.5, 3.5] (the high-density region), and about 95% within two.
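
A quick empirical check of this (assuming the 1.5 denotes the standard deviation, not the variance):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 1.5                      # hypothetical class-conditional Gaussian
samples = rng.normal(mu, sigma, size=100_000)

# Fraction of draws within one standard deviation of the mean (~0.68 for a Gaussian)
within_1sigma = np.mean(np.abs(samples - mu) < sigma)
print(round(within_1sigma, 2))  # ~0.68
```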

Linear (LDA)

Assumes both classes share the same covariance. The resulting decision boundary is a straight line (a hyperplane in higher dimensions).

Quadratic (QDA)

Allows each class to have its own covariance. The resulting decision boundary is a curve (a quadric surface in higher dimensions).

  • If all classes share the same covariance matrix \Sigma, the decision boundary is linear (LDA).
  • If classes have different covariance matrices \Sigma_k, the decision boundary is quadratic (QDA).
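
A minimal 1-D sketch of why this happens (means and variances below are illustrative, priors assumed equal): with a shared variance the quadratic terms cancel in the log-ratio of the class densities; with per-class variances they do not.

```python
import numpy as np

# Log-density of a 1-D Gaussian class-conditional.
def log_density(x, mu, var):
    return -0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)

x = np.linspace(-5.0, 10.0, 7)

# LDA: shared variance. The x^2 terms cancel in the log-ratio, leaving a
# linear function of x, so the boundary is a single point (a hyperplane in d-D).
lda_ratio = log_density(x, mu=0.0, var=2.0) - log_density(x, mu=4.0, var=2.0)

# QDA: per-class variances. The x^2 terms no longer cancel, so the log-ratio
# is quadratic in x and the boundary can be two points (a curve in d-D).
qda_ratio = log_density(x, mu=0.0, var=1.0) - log_density(x, mu=4.0, var=9.0)

# Second differences vanish for a linear function but not for a quadratic one.
print(np.allclose(np.diff(lda_ratio, n=2), 0))  # True
print(np.allclose(np.diff(qda_ratio, n=2), 0))  # False
```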

Class-Conditional Densities

Generative models learn the distribution of each class independently, p(x \mid C_k). Bayes' rule is then used to compute the posterior p(C_k \mid x) for classification.
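
Putting the pieces together, a toy 1-D generative classifier fit on hypothetical synthetic data (all parameters below are assumptions for illustration) might look like:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 1-D training data drawn from two Gaussian classes.
x0 = rng.normal(0.0, 1.0, size=200)   # class 0
x1 = rng.normal(4.0, 1.0, size=300)   # class 1

# Learn each class-conditional density independently (MLE: sample mean/variance),
# plus the class priors from the label counts.
mu = [x0.mean(), x1.mean()]
var = [x0.var(), x1.var()]
prior = [len(x0) / 500, len(x1) / 500]

def posterior(x):
    """p(C_k | x) via Bayes' rule with Gaussian class-conditionals."""
    dens = [np.exp(-(x - mu[k]) ** 2 / (2 * var[k])) / np.sqrt(2 * np.pi * var[k])
            for k in range(2)]
    joint = [dens[k] * prior[k] for k in range(2)]
    evidence = sum(joint)                 # P(x), normalizes the posterior
    return [j / evidence for j in joint]

print(np.argmax(posterior(3.5)))  # point near class 1's mean -> 1
```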


Comparison: Generative vs. Discriminative

| Feature              | Generative Models (e.g., GDA, Naive Bayes)  | Discriminative Models (e.g., Logistic Regression) |
| -------------------- | ------------------------------------------- | ------------------------------------------------- |
| Assumption           | P(D; \theta) = P(x, y; \theta)              | P(D; \theta) = P(y \mid x; \theta)                |
| Random Variables     | Both x and y are random variables           | y is a random variable, x is not                  |
| Alternative Notation | -                                           | P(y; x, \theta)                                   |
| Goal                 | Model P(x, y)                               | Model P(y \mid x) directly                        |
| New Samples          | Can generate new data points from P(x \mid y) | Cannot generate new data                        |

Bayes' Rule for Naive Bayes: generative models learn P(x \mid y) and P(y), then use P(y \mid x) = \frac{P(x \mid y)P(y)}{P(x)} to solve the classification problem.