Normal (Gaussian) Distribution
The Normal distribution, often called the "bell curve," is arguably the most important probability distribution in statistics. It describes many natural and measured quantities at least approximately, and it underpins much of machine learning, hypothesis testing, and inferential statistics.
Core Concepts
If a random variable $X$ follows a Normal distribution, we write:
$$X \sim \mathcal{N}(\mu, \sigma^2)$$
Where:
- $\mu$ (mu) is the mean (which dictates the center of the bell).
- $\sigma^2$ (sigma squared) is the variance (which dictates the spread). The square root, $\sigma$, is the standard deviation.
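To make the notation concrete, here is a minimal Python sketch that draws samples from a Normal distribution and checks that their mean and standard deviation land near the parameters. The values `mu = 10.0` and `sigma = 2.0`, the seed, and the sample size are illustrative assumptions, not values from the text.

```python
import random
import statistics

# Illustrative parameters (assumptions for this sketch).
mu, sigma = 10.0, 2.0

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [rng.gauss(mu, sigma) for _ in range(50_000)]

sample_mean = statistics.fmean(samples)
sample_sd = statistics.pstdev(samples)

print(f"sample mean: {sample_mean:.2f}")                # should be close to mu
print(f"sample standard deviation: {sample_sd:.2f}")    # should be close to sigma
print(f"sample variance: {sample_sd ** 2:.2f}")         # should be close to sigma squared
```

With this many samples the estimates should sit within a few hundredths of the true parameters, though the exact digits depend on the seed.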
Probability Density Function (PDF)
Don't panic! The equation below looks terrifying, but it perfectly defines the elegant, smooth shape of the bell curve.
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$
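The density can be transcribed almost symbol for symbol into code. The sketch below is one way to do it in Python; the function name `normal_pdf` and its default arguments are choices made for this example.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of a Normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma                                   # center, then scale
    bell = math.exp(-0.5 * z * z)                          # the symmetric drop-off
    constant = 1.0 / (sigma * math.sqrt(2.0 * math.pi))    # makes the total area 1
    return constant * bell

# The curve peaks at the mean and is symmetric around it.
print(f"{normal_pdf(0.0):.4f}")   # height at the mean of the standard Normal, about 0.3989
print(f"{normal_pdf(-1.0):.4f}")  # one standard deviation below the mean
print(f"{normal_pdf(1.0):.4f}")   # one standard deviation above: same height
```

Note that the returned value is a density, not a probability. Probabilities come from areas under the curve.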
The Anatomy of the PDF (The "Why")
Why does the bell curve have $e$ and $\pi$? Every part of this formula serves a specific mathematical purpose:
- The Exponent ($e^{-\frac{1}{2}(\cdot)^2}$): This is the "engine." It creates the symmetric drop-off. As $x$ gets further from the mean, the square grows rapidly, and $e$ raised to a large negative number approaches zero. This is what creates the "bell."
- The Center ($x - \mu$): By subtracting $\mu$ from $x$, we shift the entire curve left or right so that its peak is exactly at the mean.
- The Spread ($\sigma$): By dividing by $\sigma$, we stretch or compress the curve horizontally. This is a basic Linear Transformation.
- The Normalizing Constant ($\frac{1}{\sigma\sqrt{2\pi}}$): This is just a number. Its only job is to ensure that the Total Area under the curve is exactly 1.0. Without this, the height of the curve would be wrong, and it wouldn't be a valid probability distribution. The $\pi$ comes from the Gaussian Integral in multivariate calculus.
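The role of the normalizing constant can be checked numerically. The sketch below integrates the bell shape without the constant using a simple midpoint rule, then shows that multiplying by the constant brings the area to 1. The helper names, the parameter values `mu = 5.0` and `sigma = 2.0`, and the integration range of ten standard deviations either side of the mean are assumptions made for this illustration.

```python
import math

def unnormalized_bell(x: float, mu: float, sigma: float) -> float:
    """The exponential part of the density, without the normalizing constant."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def area_under(f, lo: float, hi: float, steps: int = 100_000) -> float:
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

mu, sigma = 5.0, 2.0  # illustrative values

# Ten standard deviations either side captures essentially all of the area.
raw_area = area_under(lambda x: unnormalized_bell(x, mu, sigma),
                      mu - 10 * sigma, mu + 10 * sigma)
constant = 1.0 / (sigma * math.sqrt(2.0 * math.pi))

print(f"area without the constant: {raw_area:.4f}")
print(f"sigma * sqrt(2 * pi):      {sigma * math.sqrt(2.0 * math.pi):.4f}")
print(f"area with the constant:    {raw_area * constant:.4f}")
```

The first two printed numbers should agree, which is exactly why dividing by $\sigma\sqrt{2\pi}$ rescales the area to 1.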
The Empirical Rule (68-95-99.7)
One of the most useful heuristics in statistics applies to all Normal distributions:
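- Approximately 68% of data falls within 1 standard deviation ($\mu \pm 1\sigma$) of the mean.
- Approximately 95% of data falls within 2 standard deviations ($\mu \pm 2\sigma$) of the mean.
- Approximately 99.7% of data falls within 3 standard deviations ($\mu \pm 3\sigma$) of the mean.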
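The three percentages in the rule are rounded. For any Normal distribution, the probability of landing within $k$ standard deviations of the mean is $\operatorname{erf}(k/\sqrt{2})$, which the short sketch below evaluates with Python's standard library. The function name `prob_within` is a label chosen for this example.

```python
import math

def prob_within(k: float) -> float:
    """Probability that a Normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4%}")
# Roughly 68.27%, 95.45%, and 99.73%, which round to the 68-95-99.7 rule.
```

The result does not depend on $\mu$ or $\sigma$, which is why the rule holds for every Normal distribution.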
Real-World Examples
The Normal distribution shows up so often largely because of the Central Limit Theorem (covered later): sums and averages of many independent random effects tend toward a Normal shape.
- Human Heights: The heights of adult males in a country cluster around a mean and taper off symmetrically.
- Measurement Error: The tiny errors a machine makes when precisely measuring weights or lengths.
- Test Scores: Scores on standardized tests like the SAT or GRE tend to be approximately normally distributed, in part because of how the tests are built and scaled.
Interactive Visualization
Below is an interactive graph.
- Drag the Mean ($\mu$) to shift the entire curve left or right along the x-axis. The shape does not change.
- Drag the Standard Deviation ($\sigma$) to see how variance impacts the curve. A smaller $\sigma$ makes the curve tall and spiked (data is highly predictable). A larger $\sigma$ makes the curve flat and wide (data is highly spread out).
[Interactive graph: Normal Distribution. Drag μ and σ to shift the curve and change its spread.]
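The same behavior can be reproduced without the interactive graph. The sketch below uses `statistics.NormalDist` from Python's standard library (available since Python 3.8) to compare peak heights; the three parameter combinations are arbitrary choices for illustration.

```python
from statistics import NormalDist

narrow = NormalDist(mu=0.0, sigma=0.5)   # small sigma: tall and spiked
wide = NormalDist(mu=0.0, sigma=2.0)     # large sigma: flat and wide
shifted = NormalDist(mu=3.0, sigma=0.5)  # same sigma as narrow, moved right

# The peak of each curve sits at its mean.
print(f"narrow peak height:  {narrow.pdf(narrow.mean):.4f}")
print(f"wide peak height:    {wide.pdf(wide.mean):.4f}")
print(f"shifted peak height: {shifted.pdf(shifted.mean):.4f}")
```

Changing only the mean leaves the peak height untouched, while quadrupling the standard deviation cuts the peak height to a quarter, because the height at the mean is $\frac{1}{\sigma\sqrt{2\pi}}$.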
Test Your Knowledge
Example: Normal Curve Empirical Rule
A population of adult male heights is normally distributed with a mean $\mu = 70$ inches and a standard deviation $\sigma = 3$ inches. What percentage of men are between 64 inches and 76 inches tall?
Step-by-Step Solution
Notice that 64 inches is exactly $70 - 2(3)$ ($\mu - 2\sigma$). Notice that 76 inches is exactly $70 + 2(3)$ ($\mu + 2\sigma$).
According to the Empirical Rule (68-95-99.7), approximately 95% of normally distributed data falls within 2 standard deviations of the mean. So approximately 95% of men are between 64 and 76 inches tall.
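The Empirical Rule answer can be cross-checked against the exact cumulative distribution function. This sketch takes $\mu = 70$ and $\sigma = 3$ inches, the values implied by 64 and 76 inches sitting two standard deviations either side of the mean, and uses `statistics.NormalDist` from the standard library. The variable names are choices made for this example.

```python
from statistics import NormalDist

heights = NormalDist(mu=70.0, sigma=3.0)

# P(64 < X < 76) is the area under the curve between the two bounds.
p_between = heights.cdf(76.0) - heights.cdf(64.0)

print(f"P(64 < height < 76) = {p_between:.4f}")  # about 0.9545
```

The exact figure is slightly above 95%, which is consistent with the rule being a rounded heuristic.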