Polynomial Regression

Linear models assume that the target variable is a weighted sum of input features. However, real-world data often exhibits non-linear relationships that a straight line cannot capture.


1. Motivation: Non-Linear Trends

Consider predicting the price of a house from its area. Price generally increases with area, but the rate of increase may accelerate or decelerate. A simple linear hypothesis $h(x) = \theta_0 + \theta_1 x$ cannot bend to follow that curve, so it leaves large, systematic errors.

By adding polynomial terms, we can create a model that captures these curves.


2. Feature Expansion

The key insight of Polynomial Regression is that we can transform our input feature $x$ into a set of higher-degree features. For a polynomial of degree $d$, the hypothesis becomes:

$$h_{\boldsymbol{\theta}}(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots + \theta_d x^d$$

Linear in Parameters

Even though the hypothesis is non-linear with respect to the input feature $x$, it remains linear with respect to the parameters $\boldsymbol{\theta}$. This means we can still use all the estimation and optimization tools from linear regression (MLE, Gradient Descent, the Normal Equation).

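We simply define a new feature vector $\mathbf{x}' = [1, x, x^2, \dots, x^d]^T$, so that $h_{\boldsymbol{\theta}}(x) = \boldsymbol{\theta}^T \mathbf{x}'$ is an ordinary linear model in $\mathbf{x}'$.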
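Here is a minimal sketch of this idea. It assumes NumPy is available, and the helper names `polynomial_features`, `fit_polynomial`, and `predict` are hypothetical. The expansion to $\mathbf{x}'$ is built with `np.vander`, and the parameters are fitted by ordinary least squares:

```python
import numpy as np

def polynomial_features(x: np.ndarray, degree: int) -> np.ndarray:
    """Map each scalar x to the row [1, x, x^2, ..., x^degree] (hypothetical helper)."""
    # increasing=True orders the columns from x^0 up to x^degree
    return np.vander(x, N=degree + 1, increasing=True)

def fit_polynomial(x: np.ndarray, y: np.ndarray, degree: int) -> np.ndarray:
    """Least-squares fit on the expanded features; returns [theta_0, ..., theta_d]."""
    X_prime = polynomial_features(x, degree)
    # lstsq minimizes ||X' theta - y||^2, the same objective the Normal Equation solves
    theta, *_ = np.linalg.lstsq(X_prime, y, rcond=None)
    return theta

def predict(theta: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Evaluate h_theta(x) = theta^T x' for every sample in x."""
    return polynomial_features(x, len(theta) - 1) @ theta

# Usage: noise-free samples of y = 1 - 2x + 0.5x^2 (synthetic, for illustration)
x = np.linspace(-3, 3, 20)
y = 1 - 2 * x + 0.5 * x**2
theta = fit_polynomial(x, y, degree=2)
print(np.round(theta, 6))  # approximately [1, -2, 0.5]
```

`np.linalg.lstsq` minimizes the same squared error as the Normal Equation. It does so without explicitly inverting $X'^T X'$, which helps numerically because high powers such as $x^d$ can make the columns badly scaled.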


3. Expressive Power and Model Capacity

As we increase the degree $d$, the model's Expressive Power increases: it becomes more flexible and can fit more complex shapes. However, there is a fundamental trade-off:

  • Low Degree ($d = 1$): High bias, leads to Underfitting.
  • Optimal Degree: Captures the true underlying pattern.
  • High Degree ($d \gg 1$): High variance, leads to Overfitting (the model follows the noise).
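The trade-off can be illustrated with a small synthetic experiment. The data-generating curve, noise level, and sample size below are assumptions chosen for illustration. For nested least-squares fits, training error cannot increase as $d$ grows. With 10 points, a degree-9 polynomial passes through every one of them, noise included:

```python
import numpy as np

# Synthetic data (an assumption): 10 noisy samples of y = 2x^2 - x on [-1, 1]
rng = np.random.default_rng(0)
x_train = np.linspace(-1, 1, 10)
y_train = 2 * x_train**2 - x_train + rng.normal(scale=0.1, size=x_train.size)

def train_mse(degree: int) -> float:
    """Fit a polynomial of the given degree by least squares; return its training MSE."""
    X = np.vander(x_train, N=degree + 1, increasing=True)  # columns 1, x, ..., x^degree
    theta, *_ = np.linalg.lstsq(X, y_train, rcond=None)
    residuals = X @ theta - y_train
    return float(np.mean(residuals**2))

errors = {d: train_mse(d) for d in (1, 2, 9)}
for d, err in errors.items():
    # d = 1 underfits, d = 2 matches the true trend, d = 9 interpolates all 10 noisy points
    print(f"degree {d}: training MSE = {err:.2e}")
```

A near-zero training error at $d = 9$ is the overfitting symptom, not a success. Choosing the degree requires measuring error on held-out data.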

[Figure: Polynomial Regression & Expressive Power. Shown at degree 1, "Underfitting (High Bias)": the straight line is too rigid and misses the inherent curvature of the data.]


4. Interaction Terms

In addition to powers of a single feature, we can also include interaction terms between different features $x_1$ and $x_2$:

$$h_{\boldsymbol{\theta}}(\mathbf{x}) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1 x_2$$

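Interaction terms allow the model to capture dependencies where the effect of $x_1$ on the target depends on the value of $x_2$: here $\partial h_{\boldsymbol{\theta}} / \partial x_1 = \theta_1 + \theta_3 x_2$, so the slope in $x_1$ shifts as $x_2$ changes.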
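Here is a minimal sketch, assuming NumPy and noise-free synthetic data generated with a known interaction coefficient. The helper `interaction_features` and the coefficient values are hypothetical. Adding the product column $x_1 x_2$ lets ordinary least squares recover $\theta_3$, and the fitted slope in $x_1$ then varies with $x_2$:

```python
import numpy as np

def interaction_features(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
    """Build rows [1, x1, x2, x1*x2], one column per theta_0..theta_3 (hypothetical helper)."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])

# Synthetic data (an assumption): true coefficients 0.5, 1.0, -2.0, 3.0
rng = np.random.default_rng(1)
x1 = rng.uniform(-1, 1, size=50)
x2 = rng.uniform(-1, 1, size=50)
y = 0.5 + 1.0 * x1 - 2.0 * x2 + 3.0 * x1 * x2

X = interaction_features(x1, x2)
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(theta, 6))  # approximately [0.5, 1.0, -2.0, 3.0]

# Slope of h in x1 is theta_1 + theta_3 * x2, so it depends on x2
for x2_value in (-1.0, 0.0, 1.0):
    slope = theta[1] + theta[3] * x2_value
    print(f"x2 = {x2_value:+.1f}: slope in x1 = {slope:.2f}")  # -2.00, 1.00, 4.00
```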


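Domain Knowledge: The choice of polynomial features and interaction terms should ideally be guided by your understanding of the problem domain. Adding features indiscriminately can lead to the "Curse of Dimensionality": the number of terms grows rapidly with both the number of input features and the degree, which increases computational cost and the risk of overfitting.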
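To make the growth concrete: with $n$ input features, a full polynomial expansion of degree $d$ has one term per monomial of total degree at most $d$. That gives $\binom{n+d}{d}$ terms, including the bias. The sketch below computes that count and cross-checks it by brute-force enumeration. The helper names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb

def num_polynomial_features(n_features: int, degree: int) -> int:
    """Count monomials of total degree <= degree in n_features variables (bias included)."""
    return comb(n_features + degree, degree)

def enumerate_monomials(n_features: int, degree: int) -> list[tuple[int, ...]]:
    """Brute force: each monomial is a multiset of feature indices of size 0..degree."""
    return [
        combo
        for k in range(degree + 1)
        for combo in combinations_with_replacement(range(n_features), k)
    ]

for n in (2, 10, 100):
    # prints 10, 286 and 176851 features respectively
    print(f"n = {n:>3}, d = 3: {num_polynomial_features(n, 3)} features")
```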