Understanding Linear Regression: The Mathematical Foundation

Linear regression is one of the most fundamental algorithms in machine learning and statistics. In this post, we'll explore the mathematical foundations and derive the solution from first principles.

The Mathematical Model

Linear regression assumes a linear relationship between the input features and the target variable. For a single feature, the model can be expressed as:

y = \beta_0 + \beta_1 x + \epsilon

Where:

  • y is the dependent variable (target)
  • x is the independent variable (feature)
  • \beta_0 is the y-intercept
  • \beta_1 is the slope
  • \epsilon is the error term
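
To make the model concrete, here is a minimal sketch of how data following this relationship could be generated; the intercept, slope, noise level, and sample size are arbitrary illustrative choices, not anything prescribed by the model itself.

import numpy as np

rng = np.random.default_rng(42)

beta_0, beta_1 = 1.0, 2.0               # assumed true intercept and slope
x = rng.uniform(-3, 3, size=100)        # independent variable (feature)
epsilon = rng.normal(0, 0.5, size=100)  # error term
y = beta_0 + beta_1 * x + epsilon       # dependent variable (target)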

Cost Function

We use the Mean Squared Error (MSE) as our cost function:

J(\beta_0, \beta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\beta(x^{(i)}) - y^{(i)})^2

Where h_\beta(x) = \beta_0 + \beta_1 x is our hypothesis function.
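
As a quick sketch, the cost for a given parameter pair can be evaluated directly with NumPy; the function name is illustrative, and the 1/(2m) scaling simply follows the formula above.

import numpy as np

def mse_cost(beta_0, beta_1, x, y):
    """Compute J(beta_0, beta_1) using the 1/(2m) convention above."""
    m = len(y)
    predictions = beta_0 + beta_1 * x            # hypothesis h_beta(x)
    return np.sum((predictions - y) ** 2) / (2 * m)

# Evaluated at the true parameters of the synthetic data above, the cost
# should come out roughly equal to half the noise variance.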

Normal Equation

For the multivariate case, where \mathbf{X} is the design matrix (with a leading column of ones for the intercept) and \mathbf{y} is the vector of targets, we can derive the closed-form solution using the normal equation:

\boldsymbol{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}

This gives us the optimal parameters directly, without iterative optimization, provided \mathbf{X}^T\mathbf{X} is invertible; in practice a pseudoinverse is often used for numerical stability, as in the implementation below.

Python Implementation

Here's a simple implementation from scratch:

import numpy as np
import matplotlib.pyplot as plt


class LinearRegression:
    def __init__(self):
        self.beta = None

    def fit(self, X, y):
        # Add bias term
        X_with_bias = np.column_stack([np.ones(X.shape[0]), X])
        # Normal equation
        self.beta = np.linalg.pinv(X_with_bias.T @ X_with_bias) @ X_with_bias.T @ y

    def predict(self, X):
        X_with_bias = np.column_stack([np.ones(X.shape[0]), X])
        return X_with_bias @ self.beta


# Example usage
X = np.random.randn(100, 1)
y = 2 * X.flatten() + 1 + 0.1 * np.random.randn(100)

model = LinearRegression()
model.fit(X, y)
predictions = model.predict(X)
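
Since the synthetic target above was generated with intercept 1 and slope 2, the recovered coefficients should land close to those values. One simple sanity check, reusing the variables from the example, is to compare against NumPy's own least-squares fit:

# The fitted parameters should be close to the true [intercept, slope] = [1, 2]
print("Normal equation [intercept, slope]:", model.beta)

# Cross-check with NumPy's built-in least-squares polynomial fit
slope, intercept = np.polyfit(X.flatten(), y, deg=1)
print("np.polyfit      [intercept, slope]:", [intercept, slope])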

Gradient Descent Alternative

We can also minimize the cost iteratively with gradient descent, updating each parameter in the direction of the negative gradient. The partial derivatives are:

\frac{\partial J}{\partial \beta_0} = \frac{1}{m} \sum_{i=1}^{m} (h_\beta(x^{(i)}) - y^{(i)})

\frac{\partial J}{\partial \beta_1} = \frac{1}{m} \sum_{i=1}^{m} (h_\beta(x^{(i)}) - y^{(i)}) x^{(i)}
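
A minimal gradient descent loop built on these two partial derivatives might look like the sketch below; the learning rate, iteration count, and zero initialization are illustrative choices, not recommendations.

import numpy as np

def gradient_descent(x, y, lr=0.05, n_iters=2000):
    """Fit beta_0 and beta_1 by gradient descent on the MSE cost (x is a 1-D array)."""
    m = len(y)
    beta_0, beta_1 = 0.0, 0.0
    for _ in range(n_iters):
        residuals = (beta_0 + beta_1 * x) - y   # h_beta(x) - y
        grad_0 = residuals.sum() / m            # dJ / d(beta_0)
        grad_1 = (residuals * x).sum() / m      # dJ / d(beta_1)
        beta_0 -= lr * grad_0
        beta_1 -= lr * grad_1
    return beta_0, beta_1

Because the MSE cost is convex, a sufficiently small learning rate drives this loop toward the same parameters the normal equation produces.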

Conclusion

Linear regression provides an excellent foundation for understanding more complex machine learning algorithms. Its appeal lies in its simplicity and in the fact that an exact closed-form solution exists.

In future posts, we'll explore regularization techniques like Ridge and Lasso regression, which add penalty terms to prevent overfitting.