
Machine Learning “Advent Calendar” Day 14: Softmax Regression in Excel

With logistic regression, we learned to distinguish between two categories.

Now, what happens if there are more than two categories?

Softmax regression is simply the multiclass extension of this concept. And it is the model we will discuss for the 14th day of my machine learning “Advent Calendar” (follow this link for all the information about the method and files I use).

Instead of one score, we now compute one score for each class. And instead of the sigmoid, we use the softmax function to generate probabilities that sum to 1.

Understanding the Softmax model

Before training the model, let's first understand how it works.

Softmax regression does not start with optimization.
It starts with how predictions are computed.

A small dataset with 3 classes

Let's use a small data set with one feature x and three classes.

As we mentioned earlier, the target value y should not be treated as a number.
It represents categories, not magnitudes.

A common way to represent this is one-hot encoding, where each class is represented by its index.

From this point of view, softmax regression can be seen as three logistic regressions in parallel, one for each class.

Small datasets are ideal for learning.
You can see every formula, every value, and how each part of the model contributes to the final result.

Softmax Regression in Excel – All Images by Author

Model description

So what is the model, exactly?

A score for each class

In logistic regression, the model is a simple expression: score = a * x + b.

Softmax regression does the same, but with a different score for each class:

Score_0 = a0 * x + b0
Score_1 = a1 * x + b1
Score_2 = a2 * x + b2

At this stage, these scores are just real numbers.
They are not probabilities yet.

Converting scores into probabilities: the softmax function

Softmax turns the three scores into three probabilities. Each probability is positive, and the three of them sum to 1.

The computation is straightforward:

  1. Exponentiate each score
  2. Compute the sum of all the exponentials
  3. Divide each exponential by this sum

This gives us P0, P1, and P2 for each row.

These values represent the model's confidence in each class.
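The three steps above can be sketched in a few lines of Python. The coefficients below are illustrative placeholders, not the values fitted in the spreadsheet.

```python
import math

# Illustrative coefficients (a_k, b_k), one pair per class -- placeholders,
# not the spreadsheet's fitted values.
coefs = [(-1.0, 2.0), (0.5, 0.0), (1.5, -4.0)]

def softmax_probs(x):
    # One linear score per class: score_k = a_k * x + b_k
    scores = [a * x + b for a, b in coefs]
    # 1. Exponentiate each score
    exps = [math.exp(s) for s in scores]
    # 2. Compute the sum of all the exponentials
    total = sum(exps)
    # 3. Divide each exponential by this sum
    return [e / total for e in exps]

p = softmax_probs(2.0)
print([round(v, 3) for v in p])  # three positive values that sum to 1
```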

At this point, the model is fully defined.
Training it will simply adjust the coefficients a_k and b_k so that these probabilities match the observed classes as closely as possible.

Softmax Regression in Excel – All Images by Author

Visualization of the Softmax model

The model is now fully defined.

We have:

  • One linear score per class
  • A softmax step that converts these scores into probabilities

Training the model simply involves adjusting the coefficients a_k and b_k so that these probabilities match the observed classes as closely as possible.

Once the coefficients are found, we can visualize the behavior of the model.

To do this, we take a range of input values, for example x from 0 to 7, and compute Score_0, Score_1, Score_2 and the corresponding probabilities P0, P1, P2.

Plotting these probabilities gives three smooth curves, one for each class.
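This sweep over x can be tabulated directly in Python. The coefficients below are illustrative, chosen so that class 0, then class 1, then class 2 dominates as x grows; they are not the spreadsheet's fitted values.

```python
import math

# Illustrative coefficients -- class 0 dominates for small x,
# class 1 in the middle, class 2 for large x.
coefs = [(-1.5, 4.0), (0.2, 0.5), (1.5, -6.0)]

def softmax_probs(x):
    scores = [a * x + b for a, b in coefs]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# One row per value of x, like the visualization sheet.
for x in range(8):
    p = softmax_probs(x)
    print(x, [round(v, 3) for v in p], "-> most likely class:", p.index(max(p)))
```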

Softmax Regression in Excel – All Images by Author

The result is very intuitive.

For small values of x, the probability of class 0 is highest.
As x increases, this probability decreases, while the probability of class 1 increases.
For larger values of x, class 2 dominates.

For every value of x, the three probabilities sum to 1.
The model does not make arbitrary decisions; instead, it reveals how confident it is in each class.

This property makes the behavior of softmax regression easy to understand.

  • You can see how the model transitions smoothly from one class to another
  • Decision boundaries correspond to intersections between the probability curves
  • The model's logic is tangible, not abstract

This is one of the main advantages of building the model in Excel:
you don't just get predictions, you can see how the model thinks.

Now that the model is defined, we need a way to measure how good it is, and a way to improve its coefficients.

Both of these steps reuse ideas we've already seen with logistic regression.

Evaluating the model: cross-entropy loss

Softmax regression uses the same loss function as logistic regression: cross-entropy.

For each data point, we look at the probability assigned to the correct class, and we take its negative logarithm:

Loss = -log(P_true_class)

If the model assigns a high probability to the correct class, the loss is small.
The lower the probability, the greater the loss.

In Excel, this is very easy to implement.

We select the appropriate probability based on the value of y, and take the logarithm:

Loss = -LN(CHOOSE(y + 1, P0, P1, P2))

Finally, we compute the average loss over all rows.
This average loss is the quantity we want to minimize.
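The same computation can be mirrored in Python. The probability rows and labels below are illustrative, and indexing by y plays the role of Excel's CHOOSE(y + 1, ...).

```python
import math

# Illustrative (P0, P1, P2) rows with their true class y -- not the
# spreadsheet's actual data.
rows = [
    ([0.7, 0.2, 0.1], 0),
    ([0.2, 0.5, 0.3], 1),
    ([0.1, 0.3, 0.6], 2),
]

# Excel: Loss = -LN(CHOOSE(y + 1, P0, P1, P2))
# Python: pick the probability of the true class by index, then -log.
losses = [-math.log(probs[y]) for probs, y in rows]
mean_loss = sum(losses) / len(losses)
print(round(mean_loss, 4))
```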

Softmax Regression in Excel – All Images by Author

Computing the residuals

To update the coefficients, we start by computing residuals, one for each class.

For each row:

  • residual_0 = P0 minus 1 if y equals 0, otherwise P0
  • residual_1 = P1 minus 1 if y equals 1, otherwise P1
  • residual_2 = P2 minus 1 if y equals 2, otherwise P2

In other words, for the correct class, we subtract 1.
For the other classes, we subtract nothing.

These residuals measure how far the predicted probabilities deviate from our expectations.
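The residual rule can be sketched as follows, on illustrative probability rows (not the spreadsheet's data): subtract 1 from the probability of the true class, and leave the others unchanged.

```python
# Each row: predicted probabilities (P0, P1, P2) and the true class y.
rows = [
    ([0.7, 0.2, 0.1], 0),
    ([0.2, 0.5, 0.3], 1),
]

for probs, y in rows:
    # residual_k = p_k - 1 for the true class, p_k otherwise
    residuals = [p - (1 if k == y else 0) for k, p in enumerate(probs)]
    print(residuals)  # roughly [-0.3, 0.2, 0.1], then [0.2, -0.5, 0.3]
```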

Computing the gradients

The gradients are obtained by averaging the residuals multiplied by the feature values.

For each class:

  • The gradient of a_k is the average of residual_k * x
  • The gradient of b_k is the average of residual_k

In Excel, this is implemented with simple formulas like SUMPRODUCT and AVERAGE.

At this point, everything is visible:
you can see the residuals, the gradients, and how each data point contributes.
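The gradient formulas translate directly to Python. The data below is illustrative; in Excel, the average of residual_k * x would be a SUMPRODUCT divided by the row count, and the average of residual_k a plain AVERAGE.

```python
# Illustrative feature values, probability rows, and labels.
xs = [1.0, 2.0, 3.0]
probs = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6]]
ys = [0, 1, 2]
n = len(xs)

grad_a, grad_b = [], []
for k in range(3):
    # residual_k per row
    residuals = [p[k] - (1 if y == k else 0) for p, y in zip(probs, ys)]
    # gradient of a_k: average of residual_k * x
    grad_a.append(sum(r * x for r, x in zip(residuals, xs)) / n)
    # gradient of b_k: average of residual_k
    grad_b.append(sum(residuals) / n)

print([round(g, 4) for g in grad_a], [round(g, 4) for g in grad_b])
```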

Softmax Regression in Excel – All Images by Author

Updating the coefficients

Once the gradients are known, we update the coefficients using gradient descent.

This step is the same as the ones we saw before, for linear regression or logistic regression.
The only difference is that we now update six coefficients instead of two.

To visualize the learning, we create a second sheet with one row per iteration:

  • The iteration number
  • The six coefficients (A0, B0, A1, B1, A2, B2)
  • The loss
  • The gradients

Row 2 corresponds to iteration 0, with the initial coefficients.

Row 3 includes the updated coefficients using the gradients from Row 2.

By dragging the formulas down across many rows, we simulate gradient descent over many iterations.

You can then clearly see:

  • The coefficients converge toward stable values
  • The loss decreases iteration after iteration

This makes the learning process visible.
Instead of trusting a black-box optimizer, you can watch the model learn.
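The whole loop, scores, softmax, residuals, gradients, and updates, fits in a short Python sketch. The dataset, the starting coefficients, the learning rate of 0.1, and the 200 iterations are all illustrative assumptions, not the spreadsheet's actual values.

```python
import math

# Tiny illustrative dataset: one feature x, three classes y.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0, 5.5, 6.0, 6.5]
ys = [0, 0, 0, 1, 1, 1, 2, 2, 2]

a = [0.0, 0.0, 0.0]   # A0, A1, A2
b = [0.0, 0.0, 0.0]   # B0, B1, B2
lr = 0.1              # learning rate (assumed; tune as needed)
n = len(xs)

def softmax_probs(x):
    exps = [math.exp(a[k] * x + b[k]) for k in range(3)]
    total = sum(exps)
    return [e / total for e in exps]

def mean_loss():
    return sum(-math.log(softmax_probs(x)[y]) for x, y in zip(xs, ys)) / n

loss_before = mean_loss()
for it in range(200):   # one spreadsheet row per iteration
    grad_a = [0.0] * 3
    grad_b = [0.0] * 3
    for x, y in zip(xs, ys):
        p = softmax_probs(x)
        for k in range(3):
            r = p[k] - (1 if y == k else 0)   # residual_k
            grad_a[k] += r * x / n            # average of residual_k * x
            grad_b[k] += r / n                # average of residual_k
    for k in range(3):
        a[k] -= lr * grad_a[k]
        b[k] -= lr * grad_b[k]

print(round(loss_before, 4), "->", round(mean_loss(), 4))  # loss decreases
```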

Logistic regression as a special case of softmax regression

Logistic regression and softmax regression are often presented as separate models.

In fact, they are the same idea at different scales.

Softmax regression computes one linear score per class and converts these scores into probabilities by comparing them.
When there are only two classes, this comparison depends only on the difference between the two scores.

This difference is itself a linear function of the input, and applying softmax in this case produces exactly the logistic (sigmoid) function.

In other words, logistic regression is simply softmax regression applied to two classes, with the redundant parameters removed.
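A quick numeric check of this claim, with two arbitrary illustrative scores: the two-class softmax probability equals the sigmoid of the score difference.

```python
import math

# Two arbitrary scores for the two classes (illustrative values).
s0, s1 = 0.8, 2.1

# Two-class softmax probability of class 1...
p1_softmax = math.exp(s1) / (math.exp(s0) + math.exp(s1))
# ...equals the sigmoid of the score difference s1 - s0.
p1_sigmoid = 1.0 / (1.0 + math.exp(-(s1 - s0)))

print(p1_softmax, p1_sigmoid)  # the two values match
```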

If this is understood, moving from binary to multiclass classification becomes a natural extension, not a conceptual leap.

Softmax regression does not introduce a new way of thinking.

It shows that logistic regression already contains everything we need.

By computing one score per class and normalizing them with softmax, we move from binary decisions to multiclass probabilities without changing the fundamentals.

The loss is the same idea.
The gradients have the same structure.
The update rule is the same gradient descent we already know.

The only thing that changes is the number of scores.

Another way to handle multiclass classification?

Softmax is not the only way to deal with multiclass problems.

There is another approach, arguably less elegant in theory but very common in practice:
one-vs-rest or one-vs-one decomposition.

Instead of building a single multiclass model, we train several binary models and combine their results.
This trick is used mostly with Support Vector Machines.

Tomorrow, we will look at SVM.
And you will see that it can be defined in a surprising way… and, as always, built directly in Excel.
