Correlation and Association

⚠️ Correlation does not imply causation. Two variables can move together perfectly without one causing the other.

Correlation Coefficient

The Pearson correlation coefficient ($r$) measures the strength and direction of the linear relationship between two variables. It ranges from $-1$ to $1$.
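As a rough illustration, the coefficient can be computed directly from its definition. The sketch below uses only the Python standard library; the function name `pearson_r` and the sample data are hypothetical choices made for this example, not part of the lesson.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations.
    # Any 1/(n-1) factor would cancel in the ratio, so it is omitted.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))  # points on an increasing line
print(pearson_r(xs, [10, 8, 6, 4, 2]))  # points on a decreasing line
```

Points that fall exactly on an increasing line give a value at the upper end of the range, and points on a decreasing line give a value at the lower end.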

Rank Correlation

Spearman's Rank Correlation evaluates how well the relationship between two variables can be described using a monotonic function, without requiring it to be perfectly linear.
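One common way to compute Spearman's coefficient is to replace each value with its rank and then take the Pearson correlation of the ranks. The sketch below follows that approach with the standard library only; the helper names (`average_ranks`, `spearman_rho`) and the cubic example data are assumptions made for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # monotonic but not linear
print("Spearman:", spearman_rho(xs, ys))
print("Pearson: ", pearson_r(xs, ys))
```

Because the cubic relationship is strictly increasing, the ranks of the two variables agree perfectly, while the Pearson coefficient on the raw values falls below 1.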

Homoscedasticity vs. Heteroscedasticity

  • Homoscedasticity: The variance of errors is constant across all levels of the independent variable.
  • Heteroscedasticity: The variance of errors changes with the independent variable (often forming a cone shape in a scatter plot), which violates the constant-variance assumption of ordinary least squares regression.
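The difference between the two cases can be seen in a small simulation. The sketch below is only an illustration under an assumed noise model: errors are Gaussian, with a standard deviation that is either constant or grows linearly with |x|. The function names and the specific constants are hypothetical.

```python
import random
import statistics

def simulate_errors(n, heteroscedastic, seed=0):
    """Return n (x, error) pairs under the assumed noise model."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x = rng.uniform(-3.0, 3.0)
        # Assumed model: constant spread, or spread that grows with |x|.
        sd = 0.5 + 0.5 * abs(x) if heteroscedastic else 1.0
        points.append((x, rng.gauss(0.0, sd)))
    return points

def spread_ratio(points):
    """Std. dev. of errors where |x| > 2 divided by std. dev. where |x| < 1."""
    inner = [e for x, e in points if abs(x) < 1.0]
    outer = [e for x, e in points if abs(x) > 2.0]
    return statistics.pstdev(outer) / statistics.pstdev(inner)

print("homoscedastic:  ", spread_ratio(simulate_errors(5000, heteroscedastic=False)))
print("heteroscedastic:", spread_ratio(simulate_errors(5000, heteroscedastic=True)))
```

A ratio near 1 means the errors are spread about equally in both regions of x. A ratio well above 1 corresponds to the cone shape, with wider scatter at the extremes.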

Interactive Correlation Explorer

[Interactive scatter plot: adjust the correlation r (default 0.80; -1 is perfect negative, +1 is perfect positive) to see the cloud tighten or loosen. Toggle heteroscedasticity (default off) to make the variance increase with |x|, producing a cone shape.]
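The effect of changing r can also be reproduced offline. The sketch below is one possible construction, not necessarily the one the explorer uses: it draws standard normal x and independent noise z, then sets y = r·x + √(1 − r²)·z, which gives y unit variance and correlation r with x. The function names are hypothetical.

```python
import math
import random

def correlated_sample(r, n, seed=0):
    """Draw n (x, y) pairs whose population correlation is r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    rng = random.Random(seed)
    noise_weight = math.sqrt(1.0 - r * r)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)  # noise drawn independently of x
        pairs.append((x, r * x + noise_weight * z))
    return pairs

def sample_r(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

for target in (0.0, 0.5, 0.8, 0.95):
    estimate = sample_r(correlated_sample(target, 20000))
    print(f"target r = {target:.2f}, sample r = {estimate:.3f}")
```

As the target r approaches 1, the weight on the independent noise shrinks, which is why the point cloud tightens around a line.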

Test Your Knowledge

Example: Covariance and Pearson r

Two variables $X$ and $Y$ have a covariance of $\mathrm{Cov}(X,Y) = 15$. The standard deviation of $X$ is $\sigma_x = 4$ and the standard deviation of $Y$ is $\sigma_y = 5$. Calculate the Pearson correlation coefficient $r$.

Step-by-Step Solution

The formula for the Pearson correlation coefficient is:

$$r = \frac{\mathrm{Cov}(X,Y)}{\sigma_x \sigma_y}$$

$$r = \frac{15}{4 \times 5} = \frac{15}{20} = 0.75$$

The correlation is 0.75, indicating a strong positive linear relationship between $X$ and $Y$.
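The arithmetic in this example can be checked with a few lines of Python. The helper name `pearson_from_cov` is hypothetical, and the input checks are an added assumption about what counts as valid input.

```python
def pearson_from_cov(cov_xy, sd_x, sd_y):
    """Pearson r from a covariance and two standard deviations."""
    if sd_x <= 0 or sd_y <= 0:
        raise ValueError("standard deviations must be positive")
    r = cov_xy / (sd_x * sd_y)
    # |Cov(X, Y)| can never exceed sd_x * sd_y, so |r| > 1 signals bad inputs.
    if abs(r) > 1.0 + 1e-12:
        raise ValueError("inconsistent inputs: |r| would exceed 1")
    return r

print(pearson_from_cov(15, 4, 5))  # 0.75
```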