Linear algebra, a branch of mathematics dealing with vectors and the rules for their operations, has many applications in the real world. One such application is in the field of machine learning, particularly in linear regression, a statistical method used to model the relationship between a dependent variable and one or more independent variables.
In this blog post, we’ll dive into the basics of linear regression, highlight the linear algebra concepts involved, and demonstrate a Python implementation with sample data.
1. Linear Algebra Concepts in Linear Regression
a. Vector: A list or column of numbers.
b. Matrix: A 2-dimensional array of numbers.
c. Dot Product: The sum of products of the corresponding entries of two sequences of numbers.
d. Transpose of a Matrix: Flipping a matrix over its main diagonal, so that rows become columns.
e. Matrix Multiplication: Combining two matrices by taking dot products of rows with columns to produce a new matrix.
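To make these definitions concrete, here’s a minimal NumPy sketch (the array values below are arbitrary examples chosen for illustration):

import numpy as np

v = np.array([1, 2, 3])         # a vector: a list of numbers
A = np.array([[1, 2], [3, 4]])  # a matrix: a 2-dimensional array of numbers
print(np.dot(v, v))             # dot product: 1*1 + 2*2 + 3*3 = 14
print(A.T)                      # transpose: rows become columns
print(A.dot(A.T))               # matrix multiplication: [[5, 11], [11, 25]]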
Linear regression essentially revolves around the equation:

Y = Xβ + ε

Where:

- Y is the vector of dependent variable values (what you’re trying to predict),
- X is the matrix of independent variables,
- β is the vector of coefficients, and
- ε is the error term.

The goal is to find the estimate β̂ that minimizes the sum of squared errors. This estimate has a closed-form solution known as the normal equation:

β̂ = (XᵀX)⁻¹XᵀY
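If you’re curious where this closed-form solution comes from: the least-squares objective is the sum of squared errors

SSE(β) = (Y − Xβ)ᵀ(Y − Xβ)

Setting its gradient with respect to β to zero gives −2Xᵀ(Y − Xβ̂) = 0, which rearranges to the normal equation XᵀXβ̂ = XᵀY, and hence β̂ = (XᵀX)⁻¹XᵀY, provided XᵀX is invertible (i.e., the columns of X are linearly independent).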
2. Python Implementation using NumPy
Let’s consider a simple example where we’ll try to predict the price of houses based on their size (in square feet).
Sample Data
import numpy as np
# Sample Data
# Independent Variable (Size in sq.ft)
X = np.array([1000, 1500, 2000, 2500, 3000]).reshape(-1, 1)
# Dependent Variable (Price in $)
Y = np.array([150000, 225000, 290000, 370000, 440000])
# Add a column of ones for the intercept term
X = np.hstack([np.ones((X.shape[0], 1)), X])
# Step 1: Transposition
X_transposed = X.T
# Step 2: Matrix Multiplication
XtX = X_transposed.dot(X)
XtY = X_transposed.dot(Y)
# Step 3: Matrix Inversion
XtX_inverse = np.linalg.inv(XtX)
# Step 4: Calculate beta_hat using the normal equation: beta_hat = (X^T X)^(-1) X^T Y
beta_hat = XtX_inverse.dot(XtY)
print("Intercept:", beta_hat[0])
print("Coefficient for size:", beta_hat[1])
Output:

Intercept: 5000.0
Coefficient for size: 145.0

(The exact printed values may differ in the last few decimal places due to floating-point rounding.)
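As a quick sanity check (not part of the walkthrough above), you can compare the result against NumPy’s built-in least-squares solver, and then use the coefficients to make a prediction for a hypothetical house size:

# Verify against NumPy's built-in least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("lstsq solution:", beta_lstsq)  # should match beta_hat

# Predict the price of a hypothetical 1800 sq.ft house
predicted_price = beta_hat[0] + beta_hat[1] * 1800
print("Predicted price:", predicted_price)  # 5000 + 145 * 1800 = 266000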
3. Linear Algebra Behind the Scenes
The linear regression formula we used, β̂ = (XᵀX)⁻¹XᵀY, combines several of the linear algebra operations introduced earlier:

- Transposition (Xᵀ): Flips the matrix over its diagonal. This is required to match the dimensions for matrix multiplication.
- Matrix Multiplication (XᵀX and XᵀY): Computes the sums of squares of the independent variables and the products of the independent and dependent variables, respectively.
- Matrix Inversion ((XᵀX)⁻¹): Finds the matrix that, when multiplied by XᵀX, yields the identity matrix. This inversion is central to solving for the coefficients.
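A practical caveat worth noting (beyond what the walkthrough above covers): explicitly computing the inverse of XᵀX can be numerically unstable when the columns of X are nearly collinear. A more robust approach is to solve the linear system XᵀXβ = XᵀY directly; here’s a minimal sketch using the XtX and XtY arrays computed earlier:

# Solve (X^T X) beta = X^T Y directly, avoiding an explicit inverse
beta_solve = np.linalg.solve(XtX, XtY)
print("solve solution:", beta_solve)  # matches beta_hat, with better numerical behavior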
Conclusion
Linear algebra, though often introduced as an abstract mathematical discipline, finds expansive applications in the modern world, particularly in the domain of machine learning. Linear regression, a seemingly simple algorithm, leverages these concepts to make predictions based on patterns in data.
As demonstrated, Python, along with the NumPy library, provides an intuitive platform to understand and implement these principles.