The Central Limit Theorem (CLT)

The Central Limit Theorem is arguably the most magical, counter-intuitive, and fundamentally important theorem in all of statistics. It is a major reason why the Normal (Gaussian) distribution appears so often in nature and science: many measured quantities are sums or averages of many small, independent effects.

Core Concept

The theorem states:

If you take sufficiently large random samples from any population with a finite variance, the distribution of the sample means will approximate a Normal Distribution.

Read that carefully: The original population can be shaped like a square (Uniform), it can be heavily skewed to the left, it can be bimodal (two peaks), or it can look like complete random noise.

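It doesn't matter! As long as the variance is finite, if each sample is large enough and you calculate the average of each sample and plot those averages, a close approximation of the Bell Curve will emerge from the chaos. Note that it is the size of each sample that drives the effect; drawing more samples only makes the histogram of averages smoother.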

The Mathematical Definition

Let $X_1, X_2, \dots, X_n$ be a random sample of size $n$ from a population with mean $\mu$ and variance $\sigma^2$.

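Let $\bar{X}$ be the sample mean. For large $n$, the distribution of $\bar{X}$ is approximately Normal:

$$\bar{X} \approx \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)$$

Formally, the standardized mean $\frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ converges in distribution to $\mathcal{N}(0, 1)$ as $n$ approaches infinity.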

This means:

  1. The expected value of the sample mean equals the true population mean ($\mu$), so the sample means are centered on $\mu$.
  2. The variance of the sample mean shrinks as the sample size ($n$) gets bigger ($\frac{\sigma^2}{n}$). This is why larger sample sizes give you more precise estimates!
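Both properties can be checked with a small simulation. The sketch below is illustrative only: it assumes an Exponential population with rate 1 (right-skewed, with $\mu = 1$ and $\sigma^2 = 1$), and the helper names `sample_means` and `draw_exponential` are made up for this example.

```python
import random
import statistics


def sample_means(draw, n, num_samples, rng):
    """Return num_samples sample means, each computed from n independent draws."""
    return [
        statistics.fmean(draw(rng) for _ in range(n))
        for _ in range(num_samples)
    ]


def draw_exponential(rng):
    """One draw from an Exponential(rate=1) population: mu = 1, sigma^2 = 1."""
    return rng.expovariate(1.0)


rng = random.Random(0)  # fixed seed so the run is repeatable

for n in (5, 20, 80):
    means = sample_means(draw_exponential, n, 5_000, rng)
    print(
        f"n={n:3d}  mean of means={statistics.fmean(means):.3f}  "
        f"variance of means={statistics.pvariance(means):.4f}  "
        f"sigma^2/n={1.0 / n:.4f}"
    )
```

In each row the mean of the sample means should sit near 1, and the variance of the sample means should track $\frac{\sigma^2}{n}$ as $n$ grows.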

Why Does This Matter?

  1. A/B Testing: It allows us to compare the conversion rates of two website designs using Normal distribution math, even though clicks are discrete binary events (not continuous curves).
  2. Polling & Elections: We can ask just 1,000 randomly chosen people who they are voting for and use the CLT to calculate a margin of error (roughly ±3 percentage points at 95% confidence) for a population of 300 million.
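The polling figure can be worked out directly from the Normal approximation. This is a minimal sketch that assumes a simple random sample, a 95% confidence level ($z = 1.96$), and the worst-case proportion of 0.5; the function name `margin_of_error` is hypothetical.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Worst case p_hat = 0.5 (an assumption), with the 1,000 respondents from the text.
moe = margin_of_error(0.5, 1000)
print(f"95% margin of error: +/- {100 * moe:.1f} percentage points")
```

The population size of 300 million never enters the formula. Under these assumptions, the precision depends on the sample size, not on how large the population is.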

Interactive Visualization: Seeing the Magic

In the simulation below, we are drawing random numbers from a Uniform Distribution (a perfectly flat, rectangular distribution).

  1. Set Sample Size ($n$) to 1. You will see the raw, flat distribution.
  2. Now, increase the Sample Size to 2. You are now drawing two numbers and plotting their average. The flat rectangle turns into a triangle!
  3. Now, drag the Sample Size to 10 or higher. From a flat rectangle, a shape very close to a Bell Curve emerges. This is the Central Limit Theorem in action.

[Interactive simulation: Central Limit Theorem. Sampling from Uniform(0,1): as n increases, the distribution of means becomes normal-like. Controls: Sample Size (n) = 2, Samples Drawn = 100.]
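The same experiment can be approximated offline. The sketch below is an illustration, not the code behind the simulation: instead of drawing a histogram, it measures how often the mean of $n$ Uniform(0,1) draws lands within one standard deviation of 0.5. A Normal distribution puts about 68.3% of its mass there, while a single Uniform(0,1) draw puts about 57.7% there. The function name `fraction_within_one_sd` is hypothetical.

```python
import math
import random
import statistics


def fraction_within_one_sd(n, num_samples, rng):
    """Fraction of Uniform(0,1) sample means within one theoretical SD of 0.5."""
    sd = math.sqrt(1.0 / 12.0 / n)  # Uniform(0,1) has variance 1/12
    hits = 0
    for _ in range(num_samples):
        mean = statistics.fmean(rng.random() for _ in range(n))
        if abs(mean - 0.5) < sd:
            hits += 1
    return hits / num_samples


rng = random.Random(42)  # fixed seed so the run is repeatable

for n in (1, 2, 10, 30):
    frac = fraction_within_one_sd(n, 20_000, rng)
    print(f"n={n:2d}  within one SD: {frac:.3f}  (Normal: 0.683)")
```

As $n$ grows, the printed fraction should move from the flat-distribution value toward the Normal value.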

Test Your Knowledge

Example: The Power of the CLT

The weight of apples in an orchard is heavily right-skewed with a mean of 150g and a standard deviation of 40g. If you randomly sample 64 apples and take the average weight, what is the probability the sample average is greater than 160g?

Step-by-Step Solution

Because $n = 64 \ge 30$ (a common rule of thumb), the Central Limit Theorem approximation is reasonable here. The distribution of the sample mean $\bar{x}$ will be approximately Normal.

  • Mean of $\bar{x} = \mu = 150$
  • Standard Error (SE) $= \frac{\sigma}{\sqrt{n}} = \frac{40}{\sqrt{64}} = \frac{40}{8} = 5$

Now, calculate the Z-score for $\bar{x} = 160$: $Z = \frac{160 - 150}{5} = \frac{10}{5} = 2.0$

Using the Empirical Rule, the probability of $Z > 2.0$ is about 2.5% (a Z-table gives the more precise value of about 2.28%). Even though individual apple weights are heavily skewed, the average of 64 apples is approximately Normal!
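The arithmetic above can be reproduced in a few lines. This sketch uses only the numbers from the example and computes the standard Normal tail with the complementary error function; the helper name `normal_tail` is hypothetical.

```python
import math


def normal_tail(z):
    """P(Z > z) for a standard Normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


mu, sigma, n, threshold = 150.0, 40.0, 64, 160.0

se = sigma / math.sqrt(n)   # standard error of the sample mean
z = (threshold - mu) / se   # Z-score of the 160 g threshold

print(f"SE = {se:.1f} g, Z = {z:.1f}, P(mean > 160 g) = {normal_tail(z):.4f}")
# prints: SE = 5.0 g, Z = 2.0, P(mean > 160 g) = 0.0228
```

The result is the Normal-approximation answer. For a heavily skewed population the true probability can differ slightly, though with 64 apples per sample the gap should be small.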