
The Perceptron

🧠

The Perceptron is one of the oldest neural network models, introduced by Frank Rosenblatt in 1958. It is a linear classifier that attempts to find a hyperplane separating two classes of data points.


1. Core Concepts

If a random variable $X$ represents our input vector and $Y$ our target label, the Perceptron learns a mapping:

$$h: \mathbb{R}^D \rightarrow \{-1, +1\}$$

Where:

  • $\theta \in \mathbb{R}^D$ is the Parameter Vector (determines the orientation of the hyperplane).
  • $b \in \mathbb{R}$ is the Bias (determines the offset from the origin).

The Hypothesis Function

The final prediction is determined by passing a linear combination of the inputs through the sign function (a step function whose outputs are $-1$ and $+1$):

$$h(x) = \text{sign}(\theta^T x + b)$$

The decision boundary is the set of points where $\theta^T x + b = 0$, forming a linear hyperplane.
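As a concrete sketch (assuming NumPy; treating $\text{sign}(0) = +1$ is an assumption here, since the model itself does not fix the sign of zero):

```python
import numpy as np

def predict(theta, b, x):
    """Perceptron hypothesis: sign(theta^T x + b), with sign(0) taken as +1."""
    # Linear combination of the inputs, thresholded by the sign function.
    return 1 if np.dot(theta, x) + b >= 0 else -1
```

For example, with $\theta = [0.2, 0.8]$ and $b = -0.5$, the point $[2, 1]$ scores $0.7$ and is predicted $+1$.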


2. The Perceptron Learning Rule

The Perceptron uses an Online Learning approach, updating its weights for each misclassified sample $(x^{(i)}, y^{(i)})$.

The Update Formula

Whenever $h(x^{(i)}) \neq y^{(i)}$:

$$\theta^{(new)} = \theta^{(old)} + \eta\, y^{(i)} x^{(i)}$$

$$b^{(new)} = b^{(old)} + \eta\, y^{(i)}$$

Where $\eta \in (0, 1]$ is the Learning Rate.
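In code, a single update step might look like this (a minimal sketch assuming NumPy arrays; the default `eta=1.0` is chosen for illustration):

```python
import numpy as np

def perceptron_update(theta, b, x, y, eta=1.0):
    """Apply one Perceptron update for a misclassified sample (x, y), y in {-1, +1}."""
    theta = theta + eta * y * x   # tilt the hyperplane toward (or away from) x
    b = b + eta * y               # shift the offset in the direction of the label
    return theta, b
```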


Interactive Visualization

👀

Visualizing the Update: Imagine a 2D plane where points are red (+1) and blue (-1). Every time the Perceptron misclassifies a point, it "tilts" the decision boundary towards that point to correct the error.

Perceptron Learning (Step-by-Step)

Watch the hyperplane shift to correct its mistakes. Each 'Train' click identifies a misclassified point and rotates the boundary.

Current weights: $w = [0.20,\ 0.80]$, bias $= -0.50$.
Misclassification detected! The model sees the Blue point at (2, 1) and will update the boundary.

Test Your Knowledge

Example: Manual Perceptron Update

Initial weights $\theta = [0, 0]$, bias $b = 0$, and learning rate $\eta = 1$. You receive a training point $x = [1, 2]$ with label $y = -1$. What are the updated weights and bias?

View Step-by-Step Solution
  1. Calculate the current prediction: $h(x) = \text{sign}(\theta^T x + b) = \text{sign}([0, 0] \cdot [1, 2] + 0) = \text{sign}(0) = +1$ (taking the convention $\text{sign}(0) = +1$).

  2. Identify the misclassification: the label is $y = -1$ but the model predicted $+1$, so we apply the update:

    $\theta_{new} = \theta_{old} + \eta y x = [0, 0] + 1 \cdot (-1) \cdot [1, 2] = [-1, -2]$

    $b_{new} = b_{old} + \eta y = 0 + 1 \cdot (-1) = -1$

    The new decision boundary is $-x_1 - 2x_2 - 1 = 0$.
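The arithmetic above can be checked with a few lines of plain Python (a verification sketch, not part of the original exercise):

```python
# Worked example: theta = [0, 0], b = 0, eta = 1, x = [1, 2], y = -1.
theta, b, eta = [0.0, 0.0], 0.0, 1.0
x, y = [1.0, 2.0], -1.0

# Step 1: current prediction, taking sign(0) = +1 by convention.
score = sum(t * xi for t, xi in zip(theta, x)) + b
prediction = 1 if score >= 0 else -1

# Step 2: the prediction (+1) disagrees with y (-1), so update.
theta = [t + eta * y * xi for t, xi in zip(theta, x)]
b = b + eta * y
# theta is now [-1.0, -2.0] and b is -1.0, matching the solution above.
```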


3. Convergence Theorem

A key mathematical property of the Perceptron is its Convergence Theorem: If the data is Linearly Separable (i.e., a hyperplane exists that can perfectly separate the two classes), the Perceptron algorithm is guaranteed to converge to a solution in a finite number of steps.
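The theorem can be observed empirically. Below is a minimal training loop on a small, linearly separable toy set (the data and the epoch cap are illustrative assumptions):

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=1000):
    """Run the Perceptron algorithm until a mistake-free pass or the epoch cap."""
    theta, b = np.zeros(X.shape[1]), 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x_i, y_i in zip(X, y):
            if y_i * (np.dot(theta, x_i) + b) <= 0:   # misclassified (0 counts as an error)
                theta += eta * y_i * x_i
                b += eta * y_i
                errors += 1
        if errors == 0:      # one full pass with no mistakes: converged
            return theta, b, epoch
    return theta, b, max_epochs

# Toy data separable by the line x1 + x2 = 3: +1 above, -1 below.
X = np.array([[2.0, 2.0], [3.0, 3.0], [1.0, 0.5], [0.5, 1.0]])
y = np.array([1, 1, -1, -1])
theta, b, epochs = train_perceptron(X, y)
```

Because the toy data is linearly separable, the loop terminates well before the cap, as the theorem guarantees.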


4. Limitations: The XOR Problem

The Perceptron is a linear model. It can only learn patterns that can be separated by a straight line or hyperplane.

A famous failure of the Perceptron is the XOR (Exclusive OR) Problem, which is not linearly separable. This limitation, highlighted by Minsky and Papert in 1969, contributed to the first "AI Winter" until multi-layer neural networks (which can solve non-linear problems) were developed.

| $x_1$ | $x_2$ | $y$ (XOR) |
|-------|-------|-----------|
| 0     | 0     | 0         |
| 0     | 1     | 1         |
| 1     | 0     | 1         |
| 1     | 1     | 0         |
🚫

Non-Separable Data: If the data is NOT linearly separable, the Perceptron will never stop; it will keep oscillating back and forth as it tries to fix misclassifications.
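This behavior is easy to reproduce: running the same update loop on XOR (labels mapped to $\{-1, +1\}$) never completes a mistake-free pass, so an epoch cap is needed just to make the sketch terminate (the cap of 1000 is an arbitrary assumption):

```python
import numpy as np

# XOR truth table with labels mapped from {0, 1} to {-1, +1}.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1, 1, 1, -1])

theta, b = np.zeros(2), 0.0
converged = False
for _ in range(1000):                 # without this cap, the loop would run forever
    errors = 0
    for x_i, y_i in zip(X, y):
        if y_i * (np.dot(theta, x_i) + b) <= 0:
            theta += y_i * x_i        # eta = 1
            b += y_i
            errors += 1
    if errors == 0:                   # never happens: XOR is not linearly separable
        converged = True
        break
```

`converged` remains `False` no matter how high the cap is set, since no hyperplane separates the two classes.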