
Machine Learning Advent Calendar Day 12: Logistic Regression in Excel

Today's model is Logistic Regression.

If you already know this model, here is your question:

Is logistic regression a regression or a classification model?

Yes, this question is exactly like: is a tomato a fruit or a vegetable?

From the botanist's point of view, a tomato is a fruit, because they look at the structure: seeds, flowers, plant biology.

From a culinary point of view, a tomato is a vegetable, because cooks look at the taste, how it is used in a recipe, whether it goes in a salad or a dessert.

Same thing, two valid answers, because the point of view is different.

Logistic regression is exactly like that.

  • In the statistical / GLM view, it is a regression. There is no concept of “classification” in this framework anyway: there is Gamma regression, logistic regression, Poisson regression…
  • In the Machine Learning view, it is used for classification. So, basically, it is a classifier.

We will return to this later.

For now, one thing is certain:

Logistic regression is used when the target variable is binary, often coded as 0 and 1.

But…

Can a weight-based model be used as a classifier?

So, y can be either 0 or 1.

0 or 1, they're numbers, right?

So we can just consider y as continuous!

Yes, y = ax + b, with y = 0 or 1.

Why not?

Now, you may ask: why this question, now? Why was it not asked before?

Of course, with distance-based models and tree-based models, everything is purely class-based.

There, y is categorical, like “red”, “blue”, or simply 0 and 1:

  • In K-NN, you classify by looking at the neighbors of each class.
  • In centroid models, you compare with the centroid of each class.
  • In a Decision Tree, you count class frequencies in each region.

For all these models:

Class labels are not numbers.
They are categories.
Algorithms never treat them as values.

So the separation is natural and immediate.

But in weight-based models, things work differently.

In a weight-based model, we always compute something like:

y = ax + b

or, later, a more complex function with coefficients.

This means:

The model works with numbers everywhere.

So here is the main idea:

If a model performs regression, then maybe this same model can be used for binary classification.

Yes, we can use linear regression directly for binary classification!

As binary labels, 0 and 1 are already numbers.

And in this special case, we can apply ordinary least squares (OLS) directly to y = 0 and y = 1.

The model will fit a line, and we can use the same closed-form formulas, as we see below.
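As a sanity check outside the spreadsheet, here is a minimal sketch of that closed-form fit in Python. The data points are hypothetical (the article's exact dataset is not reproduced here); they are chosen so the two classes split around x = 9:

```python
# Hypothetical toy data: class 0 below x = 9, class 1 above it
xs = [5, 6, 7, 8, 10, 11, 12, 13]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form OLS for simple linear regression:
# slope = covariance(x, y) / variance(x), intercept from the means
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

print(a, b)  # slope 1/6, intercept -1 for this toy data
```

With these values, the fitted line crosses y = 0.5 exactly at x = (0.5 − b) / a = 9.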

Logistic Regression in Excel – All Images By Author

We can do the same with gradient descent, and it will work perfectly:

And then, to get the predicted class, we simply choose a threshold.
It is usually 0.5 (or 50 percent), but depending on how strict you want to be, you can choose another value.

  • If the predicted y ≥ 0.5, predict class 1
  • Otherwise, predict class 0

This is the basic idea.

And because the model produces a numerical output, we can even find the point where y = 0.5.

This value of x defines the decision boundary.

In the previous example, this happens at x = 9.
With this boundary, we already get a clean separation.

But a problem arises as soon as we introduce a point with a very large value of x.

For example, suppose we add a point with x = 50 and y = 1.

Because linear regression tries to fit a straight line through all the points, this extreme value of x pulls the line.
The decision boundary moves from x = 9 to approximately x = 12.

And now, with this new boundary, we suddenly have two misclassified points.
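The effect is easy to reproduce in code. Below is a sketch on hypothetical toy data (with these particular points the boundary moves less far than in the article's example, but the direction of the effect is the same):

```python
def fit_ols(xs, ys):
    # Closed-form simple linear regression: slope and intercept
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

xs = [5, 6, 7, 8, 10, 11, 12, 13]   # hypothetical data, split around x = 9
ys = [0, 0, 0, 0, 1, 1, 1, 1]

a, b = fit_ols(xs, ys)
boundary = (0.5 - b) / a              # line crosses 0.5 at x = 9.0

a2, b2 = fit_ols(xs + [50], ys + [1]) # add one extreme point with y = 1
boundary2 = (0.5 - b2) / a2           # boundary dragged to the right

print(boundary, boundary2)
```

A single extreme x value is enough to drag the whole boundary, which is exactly the instability described above.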

This indicates a major problem:

Linear regression used as a classifier is very sensitive to extreme values of x. The decision boundary moves too much, and the classification becomes unstable.

This is one of the reasons we need a model that does not shoot off to infinity. A model that stays between 0 and 1, even if x gets very large.

And this is exactly what the sigmoid function will give us.

Logistic Regression

We start with ax + b, as in linear regression.

Then we apply a special function called the sigmoid, or logistic function.

As we can see in the screenshot below, the value of p is always between 0 and 1, so this is perfect.

  • p(x) is the predicted probability that y = 1
  • 1 − p(x) is the predicted probability that y = 0

For classification, we simply say:

  • When p(x) ≥ 0.5, predict class 1
  • Otherwise, predict class 0

From Likelihood to Log Loss

Now, OLS linear regression tries to minimize the MSE (mean squared error).

Logistic regression with a binary target uses the Bernoulli likelihood. For each observation i:

  • When yᵢ = 1, the probability of the data point is pᵢ
  • When yᵢ = 0, the probability of the data point is 1 − pᵢ

Over the whole dataset, the likelihood is the product over all i. In practice, we take the logarithm, which turns the product into a sum.

In the GLM view, we try to maximize this likelihood.

In the machine learning view, we define the loss as the negative log-likelihood and we minimize it. This gives the usual log loss.

And the two views are equivalent. We won't do the full derivation here.
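In code, the negative log-likelihood per point is simply −log p when y = 1 and −log(1 − p) when y = 0. A small sketch with hypothetical predicted probabilities:

```python
import math

def log_loss(ys, ps):
    # Mean negative log-likelihood of a Bernoulli model
    return -sum(math.log(p) if y == 1 else math.log(1 - p)
                for y, p in zip(ys, ps)) / len(ys)

ys = [0, 0, 1, 1]
ps = [0.1, 0.2, 0.8, 0.9]          # hypothetical predicted probabilities

print(log_loss(ys, ps))            # lower than the 0.5 baseline below
print(log_loss(ys, [0.5] * 4))     # predicting 0.5 everywhere gives log(2)
```

Confident, correct probabilities yield a lower loss than the "always 0.5" baseline, which is exactly what minimizing the log loss rewards.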

Gradient Descent for Logistic Regression

Principle

Just like we did with linear regression, we can use gradient descent again here. The concept is always the same:

  1. Start from some initial values of a and b.
  2. Compute the gradient of the loss (its derivatives) with respect to a and b.
  3. Move a and b a little in the direction that reduces the loss.
  4. Repeat.

There is nothing mysterious.
Same mechanical process as before.

Step 1. Computing the gradients

In logistic regression, the gradients of the log loss follow a very simple structure.

This is just calculus.

We'll just give the result below, with a formula we can use in Excel. As you can see, it is very simple in the end, even if the log loss formula can look complicated at first.

Excel can compute these two gradients directly with SUMPRODUCT formulas.
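For reference, the gradients are plain sums, which is exactly what SUMPRODUCT computes. A sketch in Python, assuming the standard log-loss gradients grad_a = Σ (pᵢ − yᵢ)·xᵢ and grad_b = Σ (pᵢ − yᵢ), on hypothetical data (the article's spreadsheet is not reproduced here):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical data and an arbitrary starting point a = b = 0
xs = [5, 6, 7, 8, 10, 11, 12, 13]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
a, b = 0.0, 0.0

ps = [sigmoid(a * x + b) for x in xs]                      # all 0.5 at the start
grad_a = sum((p - y) * x for p, y, x in zip(ps, ys, xs))   # like SUMPRODUCT(p-y, x)
grad_b = sum(p - y for p, y in zip(ps, ys))                # like SUM(p-y)

print(grad_a, grad_b)  # -10.0 0.0
```

Each gradient is one SUMPRODUCT-style formula over the data columns, nothing more.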

Step 2. Parameter update

As soon as the gradients are known, we update the parameters.

This update step is repeated at each iteration.
As the iterations go on, the loss decreases, and the parameters converge to suitable values.

Now we have the whole picture.
You have seen the model, the loss, the gradients, and the parameters are updated.
And with a detailed view of each iteration in Excel, you can actually play with the model: change a value, watch the curve move, and see the loss decrease step by step.

It's surprisingly satisfying to see how clearly everything fits together.

What about multiclass classification?

For distance-based and tree-based models:

No problem at all.
They naturally handle multiple classes because they never interpret labels as numbers.

But for weight-based models?

Here we hit a problem.

If we encode the classes as numbers: 1, 2, 3, etc.

Then the model will interpret these numbers as real numeric values,
which leads to problems:

  • The model assumes that class 3 is “larger” than class 1
  • It assumes that midway between class 1 and class 3 lies class 2
  • Distances between classes become meaningful

But none of this is true for categories.

So:

For weight-based models, we can't just use y = 1, 2, 3 for multiclass classification.

This naive encoding is wrong.
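To see why, here is a tiny illustration with a hypothetical label encoding:

```python
# Hypothetical integer encoding of three unrelated categories
labels = {"red": 1, "blue": 2, "green": 3}

# A weight-based model would take these arithmetic relations seriously:
midpoint = (labels["red"] + labels["green"]) / 2
print(midpoint == labels["blue"])  # True, implying "blue" lies midway between
                                   # "red" and "green", which is meaningless
```

The arithmetic is valid, but the conclusion it suggests about the categories is not, and that is precisely the trap.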

We will see later how to fix this.

Conclusion

Starting from a simple binary dataset, we have seen how a weight-based model can work as a classifier, why linear regression quickly reaches its limits, and how the sigmoid function solves these problems by keeping predictions between 0 and 1.

Then, by expressing the model with the likelihood and the log loss, we obtained a construction that is mathematically sound and easy to use.
And once everything is laid out in Excel, the entire learning process is visible: probabilities, losses, gradients, updates, and finally the evolution of the parameters.

With a detailed iteration table, you can actually understand how the model evolves step by step.
You can change a value, adjust the learning rate, or add a point, and immediately see the curve move and the loss decrease.
This is the true value of doing machine learning in a spreadsheet: nothing is hidden, and all the calculations are transparent.

By building logistic regression this way, you don't just understand the model, you understand how it is trained.
And this intuition will stay with you as we move to more advanced models later in the Advent calendar.
