Understanding Linear Regression: The Mathematical Foundation

Linear regression is one of the most fundamental algorithms in machine learning and statistics. In this post, we'll explore the mathematical foundations and derive the solution from first principles.

The Mathematical Model

Linear regression assumes a linear relationship between the input features and the target variable. For a single feature, the model can be expressed as:

y = \beta_0 + \beta_1 x + \epsilon

Where:

  • y is the dependent variable (target)
  • x is the independent variable (feature)
  • \beta_0 is the y-intercept
  • \beta_1 is the slope
  • \epsilon is the error term
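
To make the model concrete, here is a minimal sketch of how data following this relationship could be generated; the intercept, slope, noise level, and sample size are arbitrary illustrative choices, not anything prescribed by the model itself.

import numpy as np

rng = np.random.default_rng(42)

beta_0, beta_1 = 1.0, 2.0               # assumed true intercept and slope
x = rng.uniform(-3, 3, size=100)        # independent variable (feature)
epsilon = rng.normal(0, 0.5, size=100)  # error term
y = beta_0 + beta_1 * x + epsilon       # dependent variable (target)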

Cost Function

We use the Mean Squared Error (MSE) as our cost function:

J(\beta_0, \beta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\beta(x^{(i)}) - y^{(i)})^2

Where h_\beta(x) = \beta_0 + \beta_1 x is our hypothesis function.
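
As a quick sketch, the cost for a given parameter pair can be evaluated directly with NumPy; the function name is illustrative, and the 1/(2m) scaling simply follows the formula above.

import numpy as np

def mse_cost(beta_0, beta_1, x, y):
    """Compute J(beta_0, beta_1) using the 1/(2m) convention above."""
    m = len(y)
    predictions = beta_0 + beta_1 * x            # hypothesis h_beta(x)
    return np.sum((predictions - y) ** 2) / (2 * m)

# Evaluated at the true parameters of the synthetic data above, the cost
# should come out roughly equal to half the noise variance.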

Normal Equation

For the multivariate case, where \mathbf{X} is the design matrix (with a leading column of ones for the intercept) and \mathbf{y} is the vector of targets, we can derive the closed-form solution using the normal equation:

\boldsymbol{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}

This gives us the optimal parameters directly, without iterative optimization, provided \mathbf{X}^T\mathbf{X} is invertible; in practice a pseudoinverse is often used for numerical stability, as in the implementation below.

Python Implementation

Here's a simple implementation from scratch:

import numpy as np
import matplotlib.pyplot as plt


class LinearRegression:
    def __init__(self):
        self.beta = None

    def fit(self, X, y):
        # Add bias term
        X_with_bias = np.column_stack([np.ones(X.shape[0]), X])
        # Normal equation
        self.beta = np.linalg.pinv(X_with_bias.T @ X_with_bias) @ X_with_bias.T @ y

    def predict(self, X):
        X_with_bias = np.column_stack([np.ones(X.shape[0]), X])
        return X_with_bias @ self.beta


# Example usage
X = np.random.randn(100, 1)
y = 2 * X.flatten() + 1 + 0.1 * np.random.randn(100)

model = LinearRegression()
model.fit(X, y)
predictions = model.predict(X)
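
Since the synthetic target above was generated with intercept 1 and slope 2, the recovered coefficients should land close to those values. One simple sanity check, reusing the variables from the example, is to compare against NumPy's own least-squares fit:

# The fitted parameters should be close to the true [intercept, slope] = [1, 2]
print("Normal equation [intercept, slope]:", model.beta)

# Cross-check with NumPy's built-in least-squares polynomial fit
slope, intercept = np.polyfit(X.flatten(), y, deg=1)
print("np.polyfit      [intercept, slope]:", [intercept, slope])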

Gradient Descent Alternative

We can also minimize the cost iteratively with gradient descent, updating each parameter in the direction of the negative gradient. The partial derivatives are:

\frac{\partial J}{\partial \beta_0} = \frac{1}{m} \sum_{i=1}^{m} (h_\beta(x^{(i)}) - y^{(i)})

\frac{\partial J}{\partial \beta_1} = \frac{1}{m} \sum_{i=1}^{m} (h_\beta(x^{(i)}) - y^{(i)}) x^{(i)}
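
A minimal gradient descent loop built on these two partial derivatives might look like the sketch below; the learning rate, iteration count, and zero initialization are illustrative choices, not recommendations.

import numpy as np

def gradient_descent(x, y, lr=0.05, n_iters=2000):
    """Fit beta_0 and beta_1 by gradient descent on the MSE cost (x is a 1-D array)."""
    m = len(y)
    beta_0, beta_1 = 0.0, 0.0
    for _ in range(n_iters):
        residuals = (beta_0 + beta_1 * x) - y   # h_beta(x) - y
        grad_0 = residuals.sum() / m            # dJ / d(beta_0)
        grad_1 = (residuals * x).sum() / m      # dJ / d(beta_1)
        beta_0 -= lr * grad_0
        beta_1 -= lr * grad_1
    return beta_0, beta_1

Because the MSE cost is convex, a sufficiently small learning rate drives this loop toward the same parameters the normal equation produces.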

Conclusion

Linear regression provides an excellent foundation for understanding more complex machine learning algorithms. Its appeal lies in its simplicity and in the fact that an exact closed-form solution exists.

In future posts, we'll explore regularization techniques like Ridge and Lasso regression, which add penalty terms to prevent overfitting.