Machine Learning “Advent Calendar” Day 17: Neural Network Regressor in Excel

Neural networks are often presented as black boxes.
Layers, activations, gradients, backpropagation… it can feel overwhelming, especially when everything is hidden behind model.fit().
We will build a neural network regressor from scratch in Excel. All calculations will be transparent. All intermediate values will be visible. Nothing will be hidden.
By the end of this article, you will understand how a neural network performs regression, how forward propagation works, and how the model can estimate non-linear functions using just a few parameters.
Before starting, if you haven't read my previous articles, you should first look at the implementations of linear regression and logistic regression.
You will see that a neural network is nothing new. It is a natural extension of these two models.
As always, we will follow these steps:
- First, we will look at how the Neural Network Regressor model works. In the case of neural networks, this step is called forward propagation.
- Then we will train this function using gradient descent. This process is called backpropagation.
1. Forward Propagation
In this part, we will define our model, then use it in Excel to see how prediction works.
1.1 A Simple Data Set
We will use a very simple dataset that I generated. It contains only 12 observations and one feature.
As you can see, the target variable has an inverse relationship with x.
And for this dataset, we will use two neurons in the hidden layer.
1.2 Structure of the Neural Network
Our neural network example has:
- One input layer with the single feature x as input
- One hidden layer with two neurons; these two neurons will allow us to model non-linear relationships
- One output layer, which is simply a linear regression on the hidden neurons
Here is a diagram representing this neural network, along with all the parameters to be learned. There are 7 parameters in total.
Hidden layer:
- a11: weight from x to hidden neuron 1
- b11: bias of hidden neuron 1
- a12: weight from x to hidden neuron 2
- b12: bias of hidden neuron 2
Output layer:
- a21: weight from hidden neuron 1 to output
- a22: weight from hidden neuron 2 to output
- b2: output bias
At its core, a neural network is just a function: a composition of simpler functions.
If you write it out explicitly, there is nothing ambiguous about it.
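Written out with the parameter names listed above, the forward pass is a minimal sketch like this (the parameter names follow the article's diagram; the function names are my own):

```python
import math

def sigmoid(z):
    """Logistic activation used by the two hidden neurons."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, a11, b11, a12, b12, a21, a22, b2):
    """Forward pass of the 1-2-1 network: two sigmoid hidden
    neurons feeding a linear output, 7 parameters in total."""
    h1 = sigmoid(a11 * x + b11)      # hidden neuron 1
    h2 = sigmoid(a12 * x + b12)      # hidden neuron 2
    return a21 * h1 + a22 * h2 + b2  # linear output layer
```

This is the entire model: two weighted sigmoids combined by a small linear regression.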

We usually represent this function with a diagram made of “neurons”.
In my opinion, the best way to describe this diagram is as a visual representation of a composed mathematical function, not as a claim that it literally reproduces the way biological neurons work.

Why does this work?
Each sigmoid behaves like a smooth step.
With two sigmoids, the model can increase, decrease, bend, and flatten the output curve.
By combining them, the network can produce smooth non-linear curves.
That is why, for this dataset, two neurons are already enough. But can you find a dataset for which two neurons would not be enough?
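To see this "smooth step" behaviour concretely, here is a quick sketch with illustrative parameter values of my own choosing (not the fitted values from the spreadsheet). Two steep sigmoids with opposite output weights add up to a smooth "bump": low on the left, high in the middle, low on the right, which no straight line could fit.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bump(x):
    # Weighted sum of two sigmoids with opposite signs: the output
    # rises and then falls again, producing a smooth bump around x = 0.
    return 4.0 * sigmoid(6.0 * x + 3.0) - 4.0 * sigmoid(6.0 * x - 3.0)

# Qualitatively: near zero far to the left, high near 0, near zero far right.
print(bump(-3.0), bump(0.0), bump(3.0))
```

Playing with the weights and biases changes where the curve rises, how steeply, and how high, which is exactly the flexibility the hidden layer gives us.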
1.3 Prediction in Excel
In this section, we will assume that the 7 coefficients have already been found. We can then apply the formula we saw earlier.
To visualize the neural network, we can use new continuous values of x from -2 to 2 with a step of 0.02.
Here's a screenshot, and we can see that the resulting function fits the input data very well.
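In code, the Excel column of continuous x values can be reproduced like this (the parameter values here are placeholders for illustration, not the fitted ones from the spreadsheet):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, a11, b11, a12, b12, a21, a22, b2):
    h1 = sigmoid(a11 * x + b11)
    h2 = sigmoid(a12 * x + b12)
    return a21 * h1 + a22 * h2 + b2

# Grid of x values from -2 to 2 with a step of 0.02, as in the sheet.
xs = [-2.0 + 0.02 * i for i in range(201)]

params = (1.0, 0.0, -1.0, 0.0, 2.0, -2.0, 1.0)  # placeholder values
curve = [predict(x, *params) for x in xs]
```

Plotting `curve` against `xs` gives the smooth prediction curve shown in the screenshot.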

2. Backpropagation (Gradient Descent)
At this point, the model is fully defined.
Since this is a regression problem, we will use the MSE (mean squared error) as the cost function, as in linear regression.
Now we have to find the 7 parameters that minimize the MSE.
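Written out in LaTeX notation, the objective over the n = 12 observations is:

```latex
\mathrm{MSE}(a_{11}, b_{11}, a_{12}, b_{12}, a_{21}, a_{22}, b_2)
  = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2,
\qquad
\hat{y}_i = a_{21}\,\sigma(a_{11}x_i + b_{11})
          + a_{22}\,\sigma(a_{12}x_i + b_{12}) + b_2
```

where σ is the sigmoid function of the hidden neurons.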
2.1 Details of the backpropagation algorithm
The principle is simple. BUT, since the functions are composed and there are several parameters, we have to apply the chain rule carefully.
I won't derive all 7 partial derivatives in detail here. I'll just give the results.
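For reference, here is my reconstruction of those results via the chain rule, writing e_i = ŷ_i − y_i for the error and h_{1,i} = σ(a_{11}x_i + b_{11}), h_{2,i} = σ(a_{12}x_i + b_{12}) for the hidden activations (the spreadsheet's formulas should match up to the constant factor from the MSE convention):

```latex
\frac{\partial \mathrm{MSE}}{\partial b_2}    = \frac{2}{n}\sum_{i} e_i, \qquad
\frac{\partial \mathrm{MSE}}{\partial a_{21}} = \frac{2}{n}\sum_{i} e_i\, h_{1,i}, \qquad
\frac{\partial \mathrm{MSE}}{\partial a_{22}} = \frac{2}{n}\sum_{i} e_i\, h_{2,i}
```

```latex
\frac{\partial \mathrm{MSE}}{\partial a_{11}} = \frac{2}{n}\sum_{i} e_i\, a_{21}\, h_{1,i}(1 - h_{1,i})\, x_i, \qquad
\frac{\partial \mathrm{MSE}}{\partial b_{11}} = \frac{2}{n}\sum_{i} e_i\, a_{21}\, h_{1,i}(1 - h_{1,i})
```

```latex
\frac{\partial \mathrm{MSE}}{\partial a_{12}} = \frac{2}{n}\sum_{i} e_i\, a_{22}\, h_{2,i}(1 - h_{2,i})\, x_i, \qquad
\frac{\partial \mathrm{MSE}}{\partial b_{12}} = \frac{2}{n}\sum_{i} e_i\, a_{22}\, h_{2,i}(1 - h_{2,i})
```

The hidden-layer gradients reuse the sigmoid's convenient derivative σ'(z) = σ(z)(1 − σ(z)).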

As we can see, each gradient contains the error term. So, to run the whole process, we follow this loop:
- initialize the weights,
- compute the output (forward propagation),
- compute the error,
- compute the gradients using the derivatives,
- update the weights,
- repeat until convergence.
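The loop above can be sketched in code as full-batch gradient descent. This is a minimal sketch, not the spreadsheet itself: the starting values, learning rate, and epoch count here are illustrative choices of my own.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, lr=0.5, epochs=5000):
    """Full-batch gradient descent on the 7-parameter network.
    Starting values and learning rate are illustrative choices."""
    a11, b11, a12, b12 = 1.0, 0.0, -1.0, 0.0
    a21, a22, b2 = 1.0, 1.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Forward propagation.
        h1 = [sigmoid(a11 * x + b11) for x in xs]
        h2 = [sigmoid(a12 * x + b12) for x in xs]
        out = [a21 * u + a22 * v + b2 for u, v in zip(h1, h2)]
        err = [o - y for o, y in zip(out, ys)]  # error term
        # Gradients of the MSE via the chain rule.
        g_b2 = 2 / n * sum(err)
        g_a21 = 2 / n * sum(e * u for e, u in zip(err, h1))
        g_a22 = 2 / n * sum(e * v for e, v in zip(err, h2))
        g_a11 = 2 / n * sum(e * a21 * u * (1 - u) * x
                            for e, u, x in zip(err, h1, xs))
        g_b11 = 2 / n * sum(e * a21 * u * (1 - u)
                            for e, u in zip(err, h1))
        g_a12 = 2 / n * sum(e * a22 * v * (1 - v) * x
                            for e, v, x in zip(err, h2, xs))
        g_b12 = 2 / n * sum(e * a22 * v * (1 - v)
                            for e, v in zip(err, h2))
        # Update the weights.
        a11 -= lr * g_a11; b11 -= lr * g_b11
        a12 -= lr * g_a12; b12 -= lr * g_b12
        a21 -= lr * g_a21; a22 -= lr * g_a22; b2 -= lr * g_b2
    return a11, b11, a12, b12, a21, a22, b2
```

Every formula in this loop has a direct counterpart in a spreadsheet column, which is exactly what the following sections build.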
2.2 Implementation
Let's start by putting the input dataset in a columnar format, which will make it easier to use formulas in Excel.

In theory, we can initialize the parameters with random values. But in practice, the number of iterations needed to reach full convergence can be large. And since the cost function is not convex, we can get stuck in a local minimum.
So we have to choose the starting values “wisely”. I have prepared a set of starting values for you. You can make small changes to them to see what happens.

2.3 Forward Propagation
In columns AG to BP, we perform the forward propagation phase. We calculate A1 and A2 first, followed by the output. These are the same formulas used in the forward propagation part above.
To keep the calculation easy to follow, we compute each observation separately. This means we have 12 columns for each hidden neuron (A1 and A2) and for the output layer. Instead of using an aggregation formula, we calculate the values for each observation individually.
To simplify the iteration process during the gradient descent phase, we organize the training dataset into columns, so that the formulas can be extended row by row in Excel.

2.4 Error and Cost Function
In columns BQ through CN, we can now calculate the values of the cost function.

2.5 Partial Derivatives
We will compute the 7 derivatives corresponding to the weights of our neural network. For each partial derivative, we need the values for all 12 observations, resulting in a total of 84 columns. However, we have organized the sheet's layout and formulas to keep this manageable.

We will start with the output layer parameters: a21, a22 and b2. We can find them in columns CO to DX.

Then the parameters a11 and a12 can be found in columns DY to EV:

And finally, for the bias parameters b11 and b12, we use columns EW to FT.

And to wrap it up, we sum the contributions from all 12 observations. These combined gradients are neatly arranged in columns Z to AF. Parameter updates are then performed in columns R to X using these values.

2.6 Visualizing the Convergence
To better understand the training process, we visualize how the parameters change during gradient descent using a graph. At the same time, the decrease of the cost function is tracked in column Y, which makes the convergence of the model clearly visible.

Conclusion
A neural network regressor is not magic.
It is simply a set of basic functions, controlled by a certain number of parameters and trained by minimizing a well-defined mathematical objective.
By building the model explicitly in Excel, every step is visible. Forward propagation, error calculations, partial derivatives, and parameter updates are no longer abstract concepts, but concrete formulas that you can test and adjust.
Our implementation of the neural network, from forward propagation to backpropagation, is now complete. You are encouraged to experiment by changing the dataset, the initial parameter values, or the learning rate, and see how the model behaves during training.
Through this hands-on work, we saw how gradients drive learning, how parameters are iteratively updated, and how a neural network gradually adapts to fit the data. This is exactly what happens inside a modern machine learning library, hidden only behind a few lines of code.
Once you understand it this way, neural networks stop being black boxes.



