Multivariable Linear Regression Using Gradient Descent Is Similar to a Single-Layer Neural Network Model

Multivariable linear regression models and neural network models share several similarities, particularly when it comes to their structure and underlying principles. Here are some key similarities:

Multivariable Linear Regression

A multivariable linear regression model is given by:

$y = w_0 + w_1x_1 + \cdots + w_nx_n$

where $y$ is the dependent variable, $x_1, x_2, \ldots, x_n$ are the independent variables, and $w_0, w_1, \ldots, w_n$ are the coefficients (weights) of the model.

Neural Network Model (Single Layer)

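A single-layer neural network (a single-layer perceptron, which has no hidden layer) can be represented as:

$y = \sigma(w_0 + w_1x_1 + \cdots + w_nx_n)$

where $y$ is the output, $x_1, \ldots, x_n$ are the inputs, $w_0, w_1, \ldots, w_n$ are the weights (with $w_0$ acting as the bias), and $\sigma$ is an activation function. When $\sigma$ is the identity function, $\sigma(z) = z$, this is exactly the linear regression model.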
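To make the comparison concrete, here is a minimal NumPy sketch of a single neuron. The helper names (`identity`, `sigmoid`, `neuron`, `linear_regression`) and the input numbers are illustrative assumptions, not part of the original example. It suggests that with an identity activation the neuron computes the same value as the linear regression prediction, while a sigmoid activation does not.

```python
import numpy as np

def identity(z):
    """Identity activation: returns its input unchanged."""
    return z

def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, activation):
    """Single neuron: w[0] is the bias, w[1:] are the input weights."""
    return activation(w[0] + np.dot(w[1:], x))

def linear_regression(x, w):
    """Linear regression prediction with intercept w[0]."""
    return w[0] + np.dot(w[1:], x)

# Hypothetical inputs and weights, chosen only for illustration.
x = np.array([2.0, -1.0, 0.5])
w = np.array([0.1, 0.4, 0.3, -0.2])

linear_out = linear_regression(x, w)
identity_out = neuron(x, w, identity)
sigmoid_out = neuron(x, w, sigmoid)

print("Linear regression:      ", linear_out)
print("Neuron (identity):      ", identity_out)
print("Neuron (sigmoid):       ", sigmoid_out)
```

The first two printed values agree; the third is squashed into the interval (0, 1) by the sigmoid.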

=-=-=-=-=-=-=-=-

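In multivariable linear regression, the model is represented as:

$h_w(x) = w_0 + w_1x_1 + \cdots + w_nx_n$

The goal is to find the weights $w_0, w_1, \ldots, w_n$ that minimize the cost function, typically the mean squared error (MSE), written here with a factor of $\frac{1}{2}$ that simplifies its derivative:

$J(w) = \frac{1}{2m} \sum_{i=1}^{m} (h_w(x^{(i)}) - y^{(i)})^2$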

where:

$m$ is the number of training examples.
$h_w(x^{(i)})$ is the hypothesis function, $h_w(x) = w_0 + w_1x_1 + \cdots + w_nx_n$.
$y^{(i)}$ is the actual output for the $i$-th training example.

Gradient Descent Algorithm

Gradient descent iteratively updates the weights to minimize the cost function. The update rule for each weight $w_j$ is:

$w_j := w_j - \alpha \frac{\partial J(w)}{\partial w_j}$

where:

$\alpha$ is the learning rate, a small positive number that controls the step size of each update.
$\frac{\partial J(w)}{\partial w_j}$ is the partial derivative of the cost function with respect to $w_j$.

The partial derivative of the cost function with respect to $w_j$ is:

$\frac{\partial J(w)}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} (h_w(x^{(i)}) - y^{(i)}) x_j^{(i)}$

For the intercept $w_0$, take $x_0^{(i)} = 1$ for every training example.
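The gradient formula above follows from the chain rule. As a sketch, assume the cost is $J(w) = \frac{1}{2m} \sum_{i=1}^{m} (h_w(x^{(i)}) - y^{(i)})^2$ (the form computed by the compute_cost function later in this post) and that $x_0^{(i)} = 1$. Differentiating the squared term gives:

$\frac{\partial J(w)}{\partial w_j} = \frac{1}{2m} \sum_{i=1}^{m} 2 (h_w(x^{(i)}) - y^{(i)}) \frac{\partial h_w(x^{(i)})}{\partial w_j}$

Because $h_w(x^{(i)}) = w_0x_0^{(i)} + w_1x_1^{(i)} + \cdots + w_nx_n^{(i)}$ is linear in the weights, $\frac{\partial h_w(x^{(i)})}{\partial w_j} = x_j^{(i)}$. The factor of 2 cancels the $\frac{1}{2}$, leaving:

$\frac{\partial J(w)}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} (h_w(x^{(i)}) - y^{(i)}) x_j^{(i)}$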

Gradient Descent Steps

1. Initialize Weights: Start with initial guesses for the weights, typically zero or small random values.
2. Compute Predictions: Compute the predicted values $h_w(x)$ for all training examples.
3. Compute Cost: Calculate the cost function $J(w)$.
4. Compute Gradients: Calculate the partial derivatives of the cost function with respect to each weight.
5. Update Weights: Update the weights using the gradient descent update rule.
6. Repeat: Repeat steps 2-5 until the cost function converges (i.e., changes very little between iterations) or a specified number of iterations is reached.
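The steps above can be traced by hand for a single iteration. The sketch below does this in NumPy on the small dataset used in the example later in this post, with a learning rate of 0.01. The variable names `cost_before` and `cost_after` are illustrative, and the cost is assumed to be the MSE scaled by 1/2.

```python
import numpy as np

# Tiny dataset: the first column of ones is the intercept feature x_0 = 1.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
alpha = 0.01  # learning rate
m = len(y)    # number of training examples

# Step 1: initialize the weights to zero.
w = np.zeros(X.shape[1])

# Step 2: compute predictions h_w(x) for all training examples.
h = X @ w

# Step 3: compute the cost J(w) = 1/(2m) * sum of squared errors.
cost_before = np.sum((h - y) ** 2) / (2 * m)

# Step 4: compute the gradient (one partial derivative per weight).
gradients = X.T @ (h - y) / m

# Step 5: apply the update rule w_j := w_j - alpha * dJ/dw_j.
w = w - alpha * gradients

# Cost after one update, to confirm that the step reduced it.
cost_after = np.sum((X @ w - y) ** 2) / (2 * m)

print("Cost before:", cost_before)
print("Gradients:  ", gradients)
print("New weights:", w)
print("Cost after: ", cost_after)
```

Starting from zero weights, the cost is 5.5 and the gradient is [-3, -11], so one update moves the weights to [0.03, 0.11] and the cost drops to about 4.28. Step 6 simply repeats steps 2-5.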

=-=-=-=-=-

In this example:

X is the design matrix with a column of ones for the intercept term and other columns for the features.
y is the vector of target values.
w is the vector of weights.
alpha is the learning rate.
num_iterations is the number of iterations for gradient descent.

The gradient_descent function iteratively updates the weights to minimize the cost function and returns the optimized weights and the history of the cost function values.

import numpy as np

# Function to compute the cost
# X: The matrix of input features (including a column of ones for the intercept).
# y: The vector of target values.
# w: The vector of weights (parameters).
# The cost function in linear regression, specifically the mean squared error (MSE), is often written with a factor of 1/2 for
# convenience in mathematical derivations, particularly when applying gradient descent. Differentiating the squared error produces
# a factor of 2, which the 1/2 cancels. Using 1/(2m) in the cost function instead of 1/m does not change the optimization problem
# (the minimizing weights are the same) but simplifies the mathematical expressions involved in gradient descent.

def compute_cost(X, y, w):
    m = len(y)  # Number of training examples
    # Predicted values: dot product of the input matrix X and the weight vector w.
    h = X.dot(w)
    # Cost: half the mean squared error of the predictions.
    cost = (1/(2*m)) * np.sum((h - y)**2)
    return cost

# Function to perform gradient descent

def gradient_descent(X, y, w, alpha, num_iterations):
    m = len(y)
    cost_history = np.zeros(num_iterations)
    for i in range(num_iterations):
        h = X.dot(w)
        gradients = (1/m) * X.T.dot(h - y)
        # Update the weights by subtracting the learning rate alpha times the gradients.
        w = w - alpha * gradients
        cost_history[i] = compute_cost(X, y, w)
    return w, cost_history

# Example data

X = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]])

y = np.array([1, 2, 3, 4, 5])

# Initialize weights

w = np.zeros(X.shape[1])

# Set hyperparameters

alpha = 0.01

num_iterations = 1000

# Perform gradient descent

w, cost_history = gradient_descent(X, y, w, alpha, num_iterations)

print("Weights:", w)

print("Cost history:", cost_history)
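The steps listed earlier mention stopping once the cost converges, whereas the example above always runs a fixed number of iterations. The following is a minimal sketch of a convergence-based variant. The function name `gradient_descent_tol` and the parameters `max_iterations` and `tol` are assumptions introduced here for illustration. As a sanity check, the result is compared with the least-squares solution from `np.linalg.lstsq`.

```python
import numpy as np

def gradient_descent_tol(X, y, w, alpha, max_iterations, tol=1e-12):
    """Gradient descent that stops early once the cost changes by less than tol.

    max_iterations and tol are illustrative parameters; they are not part of
    the gradient_descent function shown above.
    """
    m = len(y)
    cost_history = []
    prev_cost = np.inf
    for _ in range(max_iterations):
        error = X.dot(w) - y
        w = w - alpha * (1 / m) * X.T.dot(error)
        cost = (1 / (2 * m)) * np.sum((X.dot(w) - y) ** 2)
        cost_history.append(cost)
        # Converged: the cost barely changed since the previous iteration.
        if abs(prev_cost - cost) < tol:
            break
        prev_cost = cost
    return w, np.array(cost_history)

# Same example data as above.
X = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]], dtype=float)
y = np.array([1, 2, 3, 4, 5], dtype=float)

w_gd, history = gradient_descent_tol(
    X, y, np.zeros(X.shape[1]), alpha=0.01, max_iterations=100000
)

# Closed-form least-squares solution for comparison.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

print("Gradient descent weights:", w_gd)
print("Least-squares weights:   ", w_ls)
print("Iterations used:         ", len(history))
```

Because the data lie exactly on the line y = x, both methods should end up close to an intercept of 0 and a slope of 1. How close gradient descent gets depends on the chosen `tol` and `alpha`.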