Multivariable linear regression models and neural network models share several similarities, particularly when it comes to their structure and underlying principles. Here are some key similarities:

Multivariable Linear Regression

A multivariable linear regression model is given by:

$$y = w_0 + w_1x_1 + w_2x_2 + \cdots + w_nx_n$$

where $y$ is the dependent variable, $x_1, x_2, \ldots, x_n$ are the independent variables, and $w_0, w_1, \ldots, w_n$ are the coefficients (weights) of the model.

Neural Network Model (Single Layer)

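A simple neural network with a single layer of weights and no hidden layer (a single-layer perceptron, i.e. one neuron) can be represented as:

$$y = \sigma(w_0 + w_1x_1 + w_2x_2 + \cdots + w_nx_n)$$

where $y$ is the output, $x_1, x_2, \ldots, x_n$ are the inputs, $w_0, w_1, \ldots, w_n$ are the weights ($w_0$ is the bias), and $\sigma$ is an activation function. If $\sigma$ is the identity function, $\sigma(z) = z$, this model is exactly the multivariable linear regression model above. The two differ only in the activation applied to the weighted sum.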
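The relationship between the two models can be sketched in a few lines of NumPy. In this minimal example, the function name `neuron`, the sigmoid activation, and the input and weight values are illustrative assumptions. The same weighted sum feeds either an identity activation, which reproduces linear regression, or a sigmoid, which makes the model nonlinear.

```python
import numpy as np

def neuron(x, w, activation):
    """One neuron: activation(w_0 + w_1*x_1 + ... + w_n*x_n).

    x holds the n input features; w holds n + 1 weights, with w[0] the bias (intercept).
    """
    z = w[0] + np.dot(w[1:], x)  # weighted sum, identical to the linear regression model
    return activation(z)

def identity(z):
    # With the identity activation the neuron reduces to linear regression.
    return z

def sigmoid(z):
    # A common nonlinear choice (assumed here); squashes the weighted sum into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([2.0, 3.0])          # two input features (illustrative values)
w = np.array([0.5, 1.0, -0.25])   # w_0 (bias), w_1, w_2 (illustrative values)

linear_output = neuron(x, w, identity)
sigmoid_output = neuron(x, w, sigmoid)
print(linear_output)   # 1.75, since 0.5 + 1.0*2.0 - 0.25*3.0 = 1.75
print(sigmoid_output)
```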

=-=-=-=-=-=-=-=-

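In multivariable linear regression, the model is represented as:

$$h_w(x) = w_0 + w_1x_1 + w_2x_2 + \cdots + w_nx_n$$

The goal is to find the weights $w_0, w_1, \ldots, w_n$ that minimize the cost function. This is typically the mean squared error (MSE), scaled by $\frac{1}{2}$ for convenience:

$$J(w) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_w(x^{(i)}) - y^{(i)}\right)^2$$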
where:

  • $m$ is the number of training examples.
  • $h_w(x^{(i)})$ is the hypothesis function evaluated on the $i$-th training example, where $h_w(x) = w_0 + w_1x_1 + \cdots + w_nx_n$.
  • $y^{(i)}$ is the actual output for the $i$-th training example.
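To make the summation concrete, the sketch below evaluates $J(w)$ twice on a tiny made-up dataset. The first pass uses an explicit loop that mirrors the formula term by term, and the second uses vectorized NumPy. The data, the weights, and the function names `cost_loop` and `cost_vectorized` are illustrative assumptions.

```python
import numpy as np

def cost_loop(X, y, w):
    # Literal translation of J(w) = 1/(2m) * sum_i (h_w(x^(i)) - y^(i))^2
    m = len(y)
    total = 0.0
    for i in range(m):
        h_i = np.dot(X[i], w)      # h_w(x^(i)); X[i, 0] = 1 supplies the intercept w_0
        total += (h_i - y[i]) ** 2
    return total / (2 * m)

def cost_vectorized(X, y, w):
    # Same quantity computed for all m examples at once
    m = len(y)
    residuals = X @ w - y
    return residuals @ residuals / (2 * m)

# Three examples with one feature, plus a leading column of ones for w_0 (made-up data)
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 2.5, 4.0])
w = np.array([0.5, 1.0])   # h_w(x) = 0.5 + 1.0 * x_1

print(cost_loop(X, y, w))        # predictions 1.5, 2.5, 3.5 give 0.5 / 6, about 0.0833
print(cost_vectorized(X, y, w))  # same value
```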

Gradient Descent Algorithm

Gradient descent iteratively updates the weights to minimize the cost function. The update rule for each weight $w_j$ is:

$$w_j := w_j - \alpha \frac{\partial J(w)}{\partial w_j}$$

where:

  • $\alpha$ is the learning rate, a small positive number that controls the step size of each update.
  • $\frac{\partial J(w)}{\partial w_j}$ is the partial derivative of the cost function with respect to $w_j$.
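The word "small" in the description of $\alpha$ matters. For this quadratic cost, gradient descent converges only when $\alpha$ is below 2 divided by the largest eigenvalue of $X^TX/m$. For the five-point example data used in the script later in this post, that bound is about 0.17. The sketch below runs 100 iterations from $w = 0$ with three learning rates, using the vectorized gradient $\frac{1}{m}X^T(Xw - y)$. The helper name `final_cost` and the chosen rates are illustrative assumptions.

```python
import numpy as np

def final_cost(X, y, alpha, num_iterations):
    """Run plain gradient descent from w = 0 and return the final cost J(w)."""
    m = len(y)
    w = np.zeros(X.shape[1])
    for _ in range(num_iterations):
        gradients = X.T @ (X @ w - y) / m   # vector of all partial derivatives dJ/dw_j
        w = w - alpha * gradients           # w_j := w_j - alpha * dJ/dw_j for every j
    return np.sum((X @ w - y) ** 2) / (2 * m)

# Same data as the example script: a column of ones plus one feature
X = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]], dtype=float)
y = np.array([1, 2, 3, 4, 5], dtype=float)

initial_cost = np.sum(y ** 2) / (2 * len(y))  # cost at w = 0
print("initial:", initial_cost)
for alpha in (0.01, 0.1, 0.2):
    print(alpha, final_cost(X, y, alpha, 100))
```

With α = 0.01 and α = 0.1 the cost falls below its starting value, and the larger step makes more progress. With α = 0.2 every update overshoots along the steepest direction, so the cost grows rapidly instead.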

The partial derivative of the cost function with respect to $w_j$ is:

$$\frac{\partial J(w)}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} \left(h_w(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

where $x_0^{(i)} = 1$ for every example, so the same formula also covers the intercept $w_0$.
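The gradient follows from the cost function by a single application of the chain rule. Because $h_w(x)$ is linear in the weights, $\partial h_w(x)/\partial w_j = x_j$ (with $x_0 = 1$ for the intercept):

```latex
\begin{aligned}
\frac{\partial J(w)}{\partial w_j}
  &= \frac{1}{2m} \sum_{i=1}^{m} 2\,\bigl(h_w(x^{(i)}) - y^{(i)}\bigr)\,\frac{\partial h_w(x^{(i)})}{\partial w_j} \\
  &= \frac{1}{m} \sum_{i=1}^{m} \bigl(h_w(x^{(i)}) - y^{(i)}\bigr)\, x_j^{(i)}
\end{aligned}
```

Differentiating the square produces a factor of 2, which cancels the $\frac{1}{2}$ in the cost. This is why the cost uses $\frac{1}{2m}$ while the gradient has only $\frac{1}{m}$.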

Gradient Descent Steps

  1. Initialize Weights: Start with initial guesses for the weights, typically zero or small random values.
  2. Compute Predictions: Compute the predicted values $h_w(x)$ for all training examples.
  3. Compute Cost: Calculate the cost function $J(w)$.
  4. Compute Gradients: Calculate the partial derivatives of the cost function with respect to each weight.
  5. Update Weights: Update the weights using the gradient descent update rule.
  6. Repeat: Repeat steps 2-5 until the cost function converges (i.e., changes very little between iterations) or a specified number of iterations is reached.
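The steps above can be sketched as a loop that stops on either condition in step 6. This is an illustrative variant of the post's approach, not the `gradient_descent` function used in the script. The name `gradient_descent_with_tolerance` and the defaults `max_iterations=10_000` and `tol=1e-9` are assumptions. The usage runs with `alpha=0.05`, which sits well inside the stable range for the example data.

```python
import numpy as np

def gradient_descent_with_tolerance(X, y, w_init, alpha, max_iterations=10_000, tol=1e-9):
    """Gradient descent that stops when the cost changes by less than tol (hypothetical helper)."""
    m = len(y)
    w = np.array(w_init, dtype=float)          # Step 1: start from the given initial weights
    cost_history = []
    for _ in range(max_iterations):
        h = X @ w                              # Step 2: predictions for all m examples
        cost = np.sum((h - y) ** 2) / (2 * m)  # Step 3: cost J(w) at the current weights
        if cost_history and abs(cost_history[-1] - cost) < tol:
            break                              # Step 6: cost barely changed, so stop
        cost_history.append(cost)
        gradients = X.T @ (h - y) / m          # Step 4: all partial derivatives at once
        w = w - alpha * gradients              # Step 5: update every weight simultaneously
    return w, cost_history

X = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]], dtype=float)
y = np.array([1, 2, 3, 4, 5], dtype=float)
w, cost_history = gradient_descent_with_tolerance(X, y, np.zeros(X.shape[1]), alpha=0.05)
print("Weights:", w)
print("Updates performed:", len(cost_history))
```

Stopping on a cost tolerance avoids guessing the iteration count in advance. The `max_iterations` cap still guarantees termination if the tolerance is never met, for example when $\alpha$ is too large.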

=-=-=-=-=-

In this example:

  • X is the design matrix with a column of ones for the intercept term and other columns for the features.
  • y is the vector of target values.
  • w is the vector of weights.
  • alpha is the learning rate.
  • num_iterations is the number of iterations for gradient descent.

The gradient_descent function iteratively updates the weights to minimize the cost function and returns the optimized weights and the history of the cost function values.

 

 

 

import numpy as np


# Function to compute the cost
#   X: matrix of input features (including a leading column of ones for the intercept)
#   y: vector of target values
#   w: vector of weights (parameters)
# The cost is the mean squared error (MSE) scaled by 1/2: J(w) = 1/(2m) * sum((h - y)^2).
# The 1/2 is a convention that simplifies gradient descent: differentiating the square
# produces a factor of 2, which the 1/2 cancels. Using 1/(2m) instead of 1/m does not
# change the optimization problem (the same weights minimize both); it only scales the cost.
def compute_cost(X, y, w):
    m = len(y)
    h = X.dot(w)  # predicted values: dot product of the input matrix X and the weight vector w
    cost = (1 / (2 * m)) * np.sum((h - y) ** 2)  # half the mean squared error of the predictions
    return cost


# Function to perform gradient descent
def gradient_descent(X, y, w, alpha, num_iterations):
    m = len(y)
    cost_history = np.zeros(num_iterations)
    for i in range(num_iterations):
        h = X.dot(w)
        gradients = (1 / m) * X.T.dot(h - y)  # all partial derivatives dJ/dw_j at once
        w = w - alpha * gradients  # subtract learning rate times gradients from the current weights
        cost_history[i] = compute_cost(X, y, w)
    return w, cost_history


# Example data
X = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]])
y = np.array([1, 2, 3, 4, 5])

# Initialize weights
w = np.zeros(X.shape[1])

# Set hyperparameters
alpha = 0.01
num_iterations = 1000

# Perform gradient descent
w, cost_history = gradient_descent(X, y, w, alpha, num_iterations)

print("Weights:", w)
print("Cost history:", cost_history)
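As a sanity check on the gradient descent result, the least-squares problem for this small example can also be solved directly. The sketch below is an illustrative addition, not part of the original script. It computes the exact solution with NumPy's `np.linalg.lstsq` and with the normal equations $(X^TX)w = X^Ty$. It also re-runs a compact gradient descent loop (the helper name `run_gradient_descent` is hypothetical) to show that it approaches the same weights given enough iterations.

```python
import numpy as np

# Same example data: a column of ones (intercept) and one feature
X = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]], dtype=float)
y = np.array([1, 2, 3, 4, 5], dtype=float)

# Exact least-squares solution of min_w ||X w - y||^2 (no learning rate, no iterations)
w_exact, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print("Closed-form weights:", w_exact)

# Normal-equation form (X^T X) w = X^T y; solve() avoids forming an explicit inverse
w_normal = np.linalg.solve(X.T @ X, X.T @ y)
print("Normal-equation weights:", w_normal)

def run_gradient_descent(X, y, alpha, num_iterations):
    # Plain batch gradient descent from w = 0, same update rule as the script above
    m = len(y)
    w = np.zeros(X.shape[1])
    for _ in range(num_iterations):
        gradients = X.T @ (X @ w - y) / m
        w = w - alpha * gradients
    return w

for n in (1000, 20_000):
    w_gd = run_gradient_descent(X, y, 0.01, n)
    print(n, "iterations:", w_gd, "max gap:", np.max(np.abs(w_gd - w_exact)))
```

If the weights printed by the script differ noticeably from the closed-form ones, gradient descent has not yet converged. With `alpha = 0.01`, the slowest-shrinking error component for this data decreases by less than 1% per iteration. Raising `num_iterations`, or `alpha` within its stability limit, closes the gap.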

 
