Covariance Formula
Reference for covariance Cov(X,Y) = E[(X-μx)(Y-μy)].
Explains positive, negative, and zero covariance with examples for portfolio analysis and data science.
The Formula
Cov(X, Y) = Σ(xᵢ - x̄)(yᵢ - ȳ) / (n - 1)
Covariance measures the direction of the linear relationship between two variables. A positive value means they tend to increase together. A negative value means one tends to increase as the other decreases.
Variables
| Symbol | Meaning |
|---|---|
| Cov(X, Y) | Covariance between variables X and Y |
| xᵢ, yᵢ | Individual data points |
| x̄, ȳ | Mean (average) of X and Y |
| n | Number of data points |
Example 1
Find the sample covariance for X = {1, 2, 3} and Y = {2, 4, 5}.
x̄ = (1+2+3)/3 = 2, ȳ = (2+4+5)/3 ≈ 3.67
Σ(xᵢ - x̄)(yᵢ - ȳ) = (1-2)(2-3.67) + (2-2)(4-3.67) + (3-2)(5-3.67)
= (-1)(-1.67) + (0)(0.33) + (1)(1.33) = 1.67 + 0 + 1.33 = 3.0
Cov(X,Y) = 3.0 / (3-1) = 1.5 (positive: X and Y tend to increase together)
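The arithmetic in Example 1 can be checked with a few lines of Python. This is a minimal sketch that uses only the standard library; the function name `sample_covariance` is an illustrative choice, not part of any package.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    x_bar = sum(xs) / n  # mean of X
    y_bar = sum(ys) / n  # mean of Y
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (n - 1)


# Data from Example 1
cov_xy = sample_covariance([1, 2, 3], [2, 4, 5])
print(round(cov_xy, 4))  # 1.5
```

Working with unrounded means avoids the small rounding drift that comes from using 3.67 in place of 11/3.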
Interpreting Covariance
- Cov > 0: Variables tend to move in the same direction
- Cov < 0: Variables tend to move in opposite directions
- Cov ≈ 0: No clear linear relationship
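The three cases can be seen on small made-up datasets. The sketch below is illustrative only: the data, the helper names, and the tolerance used to decide what counts as "approximately zero" are all assumptions, and a sensible tolerance in practice depends on the scale of the data.

```python
def sample_covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def describe(cov, tol=1e-9):
    """Map the sign of a covariance to the interpretation in the list above."""
    if cov > tol:
        return "same direction"
    if cov < -tol:
        return "opposite directions"
    return "no clear linear relationship"


x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 7, 9]   # grows with x
falling = [9, 7, 5, 4, 2]  # shrinks as x grows
flat = [3, 5, 2, 5, 3]     # no linear trend in x

print(describe(sample_covariance(x, rising)))   # same direction
print(describe(sample_covariance(x, falling)))  # opposite directions
print(describe(sample_covariance(x, flat)))     # no clear linear relationship
```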
When to Use It
Use covariance when:
- Building an investment portfolio (diversification analysis)
- Performing linear regression analysis
- Exploring relationships between variables in data science
- Calculating the correlation coefficient (which normalizes covariance)
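As one illustration of the regression use case: in simple linear regression of Y on X, the least-squares slope equals Cov(X, Y) / Var(X), and the intercept follows from the two means. The sketch below applies that identity to the Example 1 data; the helper names are hypothetical and the code is not taken from any particular library.

```python
def mean(values):
    return sum(values) / len(values)


def sample_covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


def sample_variance(xs):
    """Variance is the covariance of a variable with itself."""
    return sample_covariance(xs, xs)


xs = [1, 2, 3]
ys = [2, 4, 5]

# Least-squares line y = intercept + slope * x
slope = sample_covariance(xs, ys) / sample_variance(xs)
intercept = mean(ys) - slope * mean(xs)

print(round(slope, 4))      # 1.5
print(round(intercept, 4))  # 0.6667
```

The n - 1 divisors cancel in the ratio, so the slope is the same whether sample or population forms are used, as long as both terms use the same one.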
Key Notes
- Covariance magnitude depends on the measurement units and is hard to interpret on its own: a covariance of 50 between height (cm) and weight (kg) becomes 0.5 when height is measured in metres, although the relationship itself is unchanged. Divide by both standard deviations to get Pearson's correlation r ∈ [−1, 1], which is scale-free
- Population covariance divides by n; sample covariance (the formula used here) divides by n−1 (Bessel's correction, which removes the bias introduced by estimating the means from the same data). Always check which form your software uses: NumPy's cov() divides by n−1 by default (equivalent to ddof=1), whereas NumPy's var() and std() divide by n by default (ddof=0)
- Cov(X, X) = Var(X): the covariance of a variable with itself equals its variance. The covariance matrix therefore has the variances on its diagonal, and it is symmetric because Cov(X, Y) = Cov(Y, X). PCA (principal component analysis) uses the eigenvectors of this matrix, which point in the directions of greatest variance
- Zero covariance does not mean independence: two variables can have Cov = 0 despite a strong non-linear relationship (e.g., Y = X² has zero covariance with X when X is distributed symmetrically around 0). Covariance only measures linear association
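Each of the four notes above can be checked numerically. The following is a rough sketch using only the standard library; the height and weight values are invented for illustration, and the `ddof` parameter name is borrowed from NumPy's convention rather than calling NumPy itself.

```python
import math


def covariance(xs, ys, ddof=1):
    """Covariance dividing by n - ddof (ddof=1: sample, ddof=0: population)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (n - ddof)


def correlation(xs, ys):
    """Pearson's r: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# 1. Units change the covariance but not the correlation (hypothetical data).
height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 69, 77, 88]
height_m = [h / 100 for h in height_cm]
cov_cm = covariance(height_cm, weight_kg)  # roughly 100x larger than cov_m
cov_m = covariance(height_m, weight_kg)
r_cm = correlation(height_cm, weight_kg)   # same value as r_m
r_m = correlation(height_m, weight_kg)

# 2. Sample (n - 1) versus population (n) divisor, on the Example 1 data.
sample_cov = covariance([1, 2, 3], [2, 4, 5], ddof=1)      # about 1.5
population_cov = covariance([1, 2, 3], [2, 4, 5], ddof=0)  # about 1.0

# 3. Cov(X, X) equals Var(X).
var_x = covariance([1, 2, 3], [1, 2, 3])  # about 1.0

# 4. Y = X^2 with X symmetric around 0: fully dependent, zero covariance.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
cov_square = covariance(xs, ys)

print(cov_cm / cov_m)
print(r_cm, r_m)
print(sample_cov, population_cov, var_x, cov_square)
```

With only three data points the sample and population results differ by a factor of 1.5, which is why the choice of divisor matters most for small n.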