Regression Formula (Least Squares)

Learn the least squares regression formula for finding the line of best fit.
Includes slope and intercept derivation with examples.

The Formula

ŷ = a + bx

b = [n × Σ(xy) - Σx × Σy] / [n × Σ(x²) - (Σx)²]

a = (Σy - b × Σx) / n

The least squares method minimizes the sum of squared vertical differences (residuals) between the observed y values and the values predicted by the line. This criterion determines a unique line of best fit for any data set with at least two distinct x values.
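The slope and intercept formulas above can be sketched directly in code. This is a minimal pure-Python version; the function name `least_squares` is chosen here for illustration.

```python
def least_squares(xs, ys):
    """Return (a, b) for the regression line ŷ = a + bx,
    using the summation formulas for slope and intercept."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # Σ(xy)
    sum_x2 = sum(x * x for x in xs)               # Σ(x²)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b
```

Note that the denominator n × Σ(x²) − (Σx)² is zero when all x values are equal, in which case no unique line exists.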

Variables

Symbol    Meaning
ŷ         Predicted value of the dependent variable
a         Y-intercept of the regression line
b         Slope of the regression line
n         Number of data points
Σ(xy)     Sum of the products of each x and y pair
Σx, Σy    Sum of all x values; sum of all y values
Σ(x²)     Sum of each x value squared

Example 1

Data: (1, 2), (2, 3), (3, 5), (4, 4), (5, 6). Find the regression line.

n = 5, Σx = 15, Σy = 20, Σ(xy) = 1×2 + 2×3 + 3×5 + 4×4 + 5×6 = 69

Σ(x²) = 1 + 4 + 9 + 16 + 25 = 55

b = (5×69 - 15×20) / (5×55 - 15²) = (345 - 300) / (275 - 225) = 45 / 50 = 0.9

a = (20 - 0.9×15) / 5 = (20 - 13.5) / 5 = 6.5 / 5 = 1.3

ŷ = 1.3 + 0.9x (for each unit increase in x, y increases by 0.9)

Example 2

Using the regression line ŷ = 1.3 + 0.9x, predict y when x = 7.

ŷ = 1.3 + 0.9 × 7

ŷ = 1.3 + 6.3

ŷ = 7.6
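The prediction step in Example 2 is a single evaluation of the fitted line; a small helper (named `predict` here for illustration) makes that explicit:

```python
def predict(x, a=1.3, b=0.9):
    """Evaluate the fitted regression line ŷ = a + bx at x."""
    return a + b * x

print(predict(7))  # ≈ 7.6
```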

When to Use It

Use the least squares regression formula when:

  • Finding a linear trend in a data set
  • Predicting future values based on historical data
  • Quantifying the relationship between two variables
  • Analyzing experimental data in science and business research

Key Notes

  • Simple linear regression: ŷ = b₀ + b₁x (the same model as ŷ = a + bx above, with b₀ = a and b₁ = b): b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² (slope); b₀ = ȳ − b₁x̄ (intercept). The line minimizes the sum of squared residuals (ordinary least squares) and always passes through (x̄, ȳ).
  • R² — coefficient of determination: R² = 1 − SS_res/SS_tot; the proportion of variance in y explained by x. R² = 0: the model explains nothing (predicts the mean for all x). R² = 1: perfect fit. R² is the square of the Pearson correlation coefficient r for simple linear regression.
  • Correlation vs causation: A high R² shows association, not causation. Confounding variables can create spurious correlations. Regression quantifies predictive relationships; causality requires controlled experiments or causal inference methods (instrumental variables, difference-in-differences).
  • Multiple regression: ŷ = b₀ + b₁x₁ + b₂x₂ + …: Each coefficient b_j represents the change in ŷ for a one-unit change in xⱼ holding all other variables constant. Multicollinearity (high correlation among predictors) makes individual coefficients unstable without changing overall fit.
  • Applications: Regression is used in economics (wage and price equations), medicine (dose-response relationships, risk factor analysis), engineering (calibration curves, process optimization), finance (factor models for asset returns), and machine learning (linear regression as a baseline model and in regularized forms like LASSO and Ridge).
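The R² definition from the notes above can be sketched for the Example 1 data and fitted line. The function name `r_squared` is chosen here for illustration:

```python
def r_squared(xs, ys, a, b):
    """R² = 1 − SS_res / SS_tot for the line ŷ = a + bx."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # residual sum of squares
    ss_tot = sum((y - y_bar) ** 2 for y in ys)                    # total sum of squares
    return 1 - ss_res / ss_tot

r2 = r_squared([1, 2, 3, 4, 5], [2, 3, 5, 4, 6], 1.3, 0.9)
print(r2)  # 0.81: the line explains 81% of the variance in y
```

For this data R² = 0.81, which is the square of the Pearson correlation r = 0.9, consistent with the simple-regression identity R² = r² noted above.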
