To demonstrate a simple neural network with 2 input nodes, 2 hidden nodes, and 1 output node, we will walk through forward and backward propagation with ReLU activation, training on the inputs (2, 3) with a target output of 4. We will illustrate how the network learns by adjusting its weights through gradient descent over two iterations.
Network Structure
- Input Layer: 2 nodes
- Hidden Layer: 2 nodes with ReLU activation
- Output Layer: 1 node (linear activation)
Notations
- Inputs: x1, x2
- Weights between Input and Hidden Layer: w11, w12, w21, w22 (wij connects input i to hidden node j)
- Weights between Hidden and Output Layer: w1, w2
- Biases for Hidden Layer: b1, b2
- Bias for Output Layer: bo
- Learning Rate: η
Initial Parameters
Let's initialize the weights and biases with small, arbitrarily chosen values for simplicity:
- w11 = 0.1, w12 = 0.2, w21 = 0.3, w22 = 0.4
- w1 = 0.5, w2 = 0.6
- b1 = 0.1, b2 = 0.1, bo = 0.1
- η = 0.01 (learning rate)
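These values map directly onto a few plain Python variables. The snippet below is a minimal sketch; the names (x1, w11, eta, and so on) are simply chosen to mirror the notation above.

```python
# Inputs, target, and learning rate from the example
x1, x2 = 2.0, 3.0
y_target = 4.0
eta = 0.01

# Weights between input and hidden layer (wij: input i -> hidden node j)
w11, w12, w21, w22 = 0.1, 0.2, 0.3, 0.4

# Weights between hidden and output layer
w1, w2 = 0.5, 0.6

# Biases for the hidden nodes and the output node
b1, b2, bo = 0.1, 0.1, 0.1
```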
Forward Propagation
Hidden Layer Calculation:
z1 = w11⋅x1 + w21⋅x2 + b1
z2 = w12⋅x1 + w22⋅x2 + b2
Apply ReLU activation:
a1 = max(0, z1)
a2 = max(0, z2)
Output Layer Calculation:
ŷ = w1⋅a1 + w2⋅a2 + bo
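These three steps translate directly into code. Here is a minimal sketch; relu and forward are hypothetical helper names introduced for illustration, with the parameters passed in under the same names as the notation above.

```python
def relu(z):
    # ReLU activation: max(0, z)
    return max(0.0, z)

def forward(x1, x2, w11, w12, w21, w22, w1, w2, b1, b2, bo):
    # Hidden layer: weighted sums followed by ReLU
    z1 = w11 * x1 + w21 * x2 + b1
    z2 = w12 * x1 + w22 * x2 + b2
    a1, a2 = relu(z1), relu(z2)
    # Output layer: linear combination of the hidden activations
    y_hat = w1 * a1 + w2 * a2 + bo
    return z1, z2, a1, a2, y_hat
```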
Loss Calculation
Using Mean Squared Error (MSE) as the loss function:
L = (1/2)(ŷ − y)²
where y is the target output.
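In code this is a one-liner. A small sketch, assuming the 1/2 factor shown above (loss_fn is a hypothetical name):

```python
def loss_fn(y_hat, y):
    # Squared-error loss with the 1/2 factor used in this example
    return 0.5 * (y_hat - y) ** 2
```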
Backward Propagation
Output Layer Gradients:
∂L/∂ŷ = ŷ − y
For weights and bias:
∂L/∂w1 = ∂L/∂ŷ ⋅ a1
∂L/∂w2 = ∂L/∂ŷ ⋅ a2
∂L/∂bo = ∂L/∂ŷ
Hidden Layer Gradients:
∂L/∂a1 = ∂L/∂ŷ ⋅ w1
∂L/∂a2 = ∂L/∂ŷ ⋅ w2
ReLU derivative (1 if z > 0, else 0):
∂L/∂z1 = ∂L/∂a1 ⋅ 1(z1 > 0)
∂L/∂z2 = ∂L/∂a2 ⋅ 1(z2 > 0)
For weights and biases:
∂L/∂w11 = ∂L/∂z1 ⋅ x1
∂L/∂w21 = ∂L/∂z1 ⋅ x2
∂L/∂w12 = ∂L/∂z2 ⋅ x1
∂L/∂w22 = ∂L/∂z2 ⋅ x2
∂L/∂b1 = ∂L/∂z1
∂L/∂b2 = ∂L/∂z2
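All of these gradients follow from the chain rule and can be collected in a single function. The sketch below mirrors the derivation step by step; backward is a hypothetical helper name, and it returns the gradients keyed by parameter name.

```python
def backward(x1, x2, z1, z2, a1, a2, y_hat, y, w1, w2):
    # Gradient of the loss with respect to the prediction
    d_yhat = y_hat - y

    # Output-layer weights and bias
    grads = {"w1": d_yhat * a1, "w2": d_yhat * a2, "bo": d_yhat}

    # Backpropagate through the hidden activations and the ReLU
    d_a1, d_a2 = d_yhat * w1, d_yhat * w2
    d_z1 = d_a1 * (1.0 if z1 > 0 else 0.0)
    d_z2 = d_a2 * (1.0 if z2 > 0 else 0.0)

    # Hidden-layer weights and biases
    grads.update({
        "w11": d_z1 * x1, "w21": d_z1 * x2, "b1": d_z1,
        "w12": d_z2 * x1, "w22": d_z2 * x2, "b2": d_z2,
    })
    return grads
```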
Weight and Bias Update
wij = wij − η ⋅ ∂L/∂wij
bi = bi − η ⋅ ∂L/∂bi
wi = wi − η ⋅ ∂L/∂wi
bo = bo − η ⋅ ∂L/∂bo
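Every parameter follows the same rule, so the update can be written once. A minimal sketch (sgd_update is a hypothetical name; grads refers to the dictionary returned by the backward sketch above):

```python
def sgd_update(param, grad, eta):
    # Generic gradient-descent step: p <- p - eta * dL/dp
    return param - eta * grad

# Example: applying the rule to one weight
# w11 = sgd_update(w11, grads["w11"], eta)
```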
Iteration 1 Calculations
Let's perform the calculations for the first iteration.
Forward Propagation
Hidden Layer Calculation:
z1 = (0.1⋅2) + (0.3⋅3) + 0.1 = 1.2
z2 = (0.2⋅2) + (0.4⋅3) + 0.1 = 1.7
a1 = max(0, 1.2) = 1.2
a2 = max(0, 1.7) = 1.7
Output Layer Calculation:
ŷ = (0.5⋅1.2) + (0.6⋅1.7) + 0.1 = 1.72
Loss Calculation
L = (1/2)(1.72 − 4)² = 2.5992
Backward Propagation
Output Layer Gradients:
∂L/∂ŷ = 1.72 − 4 = −2.28
∂L/∂w1 = −2.28 ⋅ 1.2 = −2.736
∂L/∂w2 = −2.28 ⋅ 1.7 = −3.876
∂L/∂bo = −2.28
Hidden Layer Gradients:
∂L/∂a1 = −2.28 ⋅ 0.5 = −1.14
∂L/∂a2 = −2.28 ⋅ 0.6 = −1.368
∂L/∂z1 = −1.14 ⋅ 1 = −1.14
∂L/∂z2 = −1.368 ⋅ 1 = −1.368
∂L/∂w11 = −1.14 ⋅ 2 = −2.28
∂L/∂w21 = −1.14 ⋅ 3 = −3.42
∂L/∂w12 = −1.368 ⋅ 2 = −2.736
∂L/∂w22 = −1.368 ⋅ 3 = −4.104
∂L/∂b1 = −1.14
∂L/∂b2 = −1.368
Weight and Bias Update
w11 = 0.1 − 0.01 × (−2.28) = 0.1228
w21 = 0.3 − 0.01 × (−3.42) = 0.3342
w12 = 0.2 − 0.01 × (−2.736) = 0.22736
w22 = 0.4 − 0.01 × (−4.104) = 0.44104
w1 = 0.5 − 0.01 × (−2.736) = 0.52736
w2 = 0.6 − 0.01 × (−3.876) = 0.63876
b1 = 0.1 − 0.01 × (−1.14) = 0.1114
b2 = 0.1 − 0.01 × (−1.368) = 0.11368
bo = 0.1 − 0.01 × (−2.28) = 0.1228
Iteration 2 Calculations
Let's perform the calculations for the second iteration with the updated weights and biases.
Forward Propagation
Hidden Layer Calculation:
z1 = (0.1228⋅2) + (0.3342⋅3) + 0.1114 = 1.3596
z2 = (0.22736⋅2) + (0.44104⋅3) + 0.11368 = 1.89152
a1 = max(0, 1.3596) = 1.3596
a2 = max(0, 1.89152) = 1.89152
Output Layer Calculation:
ŷ = (0.52736⋅1.3596) + (0.63876⋅1.89152) + 0.1228 ≈ 2.048026
Loss Calculation
L = (1/2)(2.048026 − 4)² ≈ 1.905101
Backward Propagation
Output Layer Gradients:
∂L/∂ŷ = 2.048026 − 4 = −1.951974
∂L/∂w1 = −1.951974 ⋅ 1.3596 ≈ −2.653904
∂L/∂w2 = −1.951974 ⋅ 1.89152 ≈ −3.692198
∂L/∂bo = −1.951974
Hidden Layer Gradients:
∂L/∂a1 = −1.951974 ⋅ 0.52736 ≈ −1.029393
∂L/∂a2 = −1.951974 ⋅ 0.63876 ≈ −1.246843
∂L/∂z1 = −1.029393 ⋅ 1 = −1.029393
∂L/∂z2 = −1.246843 ⋅ 1 = −1.246843
∂L/∂w11 = −1.029393 ⋅ 2 = −2.058786
∂L/∂w21 = −1.029393 ⋅ 3 = −3.088179
∂L/∂w12 = −1.246843 ⋅ 2 = −2.493686
∂L/∂w22 = −1.246843 ⋅ 3 = −3.740529
∂L/∂b1 = −1.029393
∂L/∂b2 = −1.246843
Weight and Bias Update
w11 = 0.1228 − 0.01 × (−2.058786) ≈ 0.143388
w21 = 0.3342 − 0.01 × (−3.088179) ≈ 0.365082
w12 = 0.22736 − 0.01 × (−2.493686) ≈ 0.252297
w22 = 0.44104 − 0.01 × (−3.740529) ≈ 0.478445
w1 = 0.52736 − 0.01 × (−2.653904) ≈ 0.553899
w2 = 0.63876 − 0.01 × (−3.692198) ≈ 0.675682
b1 = 0.1114 − 0.01 × (−1.029393) ≈ 0.121694
b2 = 0.11368 − 0.01 × (−1.246843) ≈ 0.126148
bo = 0.1228 − 0.01 × (−1.951974) ≈ 0.142320
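Before drawing conclusions, the whole two-iteration walkthrough can be reproduced with a short, self-contained script. This is a sketch specific to this example (fixed input (2, 3), target 4, and the initial parameters above), not a general training routine; the printed values should match the hand calculations up to floating-point rounding.

```python
# Reproduce the two hand-worked iterations of this example end to end.
x1, x2, y = 2.0, 3.0, 4.0                 # inputs and target
eta = 0.01                                # learning rate

w11, w12, w21, w22 = 0.1, 0.2, 0.3, 0.4   # input -> hidden weights
w1, w2 = 0.5, 0.6                         # hidden -> output weights
b1, b2, bo = 0.1, 0.1, 0.1                # biases

for it in (1, 2):
    # Forward propagation
    z1 = w11 * x1 + w21 * x2 + b1
    z2 = w12 * x1 + w22 * x2 + b2
    a1, a2 = max(0.0, z1), max(0.0, z2)
    y_hat = w1 * a1 + w2 * a2 + bo
    loss = 0.5 * (y_hat - y) ** 2

    # Backward propagation (chain rule, as derived above)
    d_yhat = y_hat - y
    d_w1, d_w2, d_bo = d_yhat * a1, d_yhat * a2, d_yhat
    d_z1 = d_yhat * w1 * (1.0 if z1 > 0 else 0.0)
    d_z2 = d_yhat * w2 * (1.0 if z2 > 0 else 0.0)
    d_w11, d_w21, d_b1 = d_z1 * x1, d_z1 * x2, d_z1
    d_w12, d_w22, d_b2 = d_z2 * x1, d_z2 * x2, d_z2

    # Gradient-descent updates
    w11, w21 = w11 - eta * d_w11, w21 - eta * d_w21
    w12, w22 = w12 - eta * d_w12, w22 - eta * d_w22
    w1, w2 = w1 - eta * d_w1, w2 - eta * d_w2
    b1, b2, bo = b1 - eta * d_b1, b2 - eta * d_b2, bo - eta * d_bo

    print(f"iteration {it}: y_hat = {y_hat:.6f}, loss = {loss:.6f}")
```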
Conclusion
After two iterations, the output ŷ has moved closer to the target output (4), rising from 1.72 to roughly 2.048. The loss has decreased from 2.5992 to roughly 1.905, indicating that the network is learning and reducing the error. The weight and bias updates show how the network adjusts its parameters based on the gradients computed during backward propagation. Further iterations would continue this process, reducing the loss and bringing the output closer to the target.