Machine Learning “Advent Calendar” Day 13: Lasso and Ridge Regression in Excel

One day, a data scientist told me that Ridge Regression was a complex model, because the training formula looks more complicated.
Clarifying exactly this kind of difficulty is the purpose of my machine learning “Advent Calendar”.
So today, we are going to talk about the penalized versions of linear regression.
- First, we will see why a penalty (regularization) is needed, and how the model changes
- Then we will examine the different types of penalties and their effects
- We will also train the models with Gradient Descent and test different hyperparameters
- And we will ask whether other models, such as logistic regression, can be penalized too (confused? You'll see)
Linear Regression and Its Assumptions
When we talk about linear regression, people often say that certain assumptions must be satisfied.
You may have heard statements like:
- Residuals must be Gaussian (sometimes confused with the target being Gaussian, which is false)
- The explanatory variables should not be collinear
In classical statistics, these conditions are required for valid inference on the coefficients. In machine learning, the focus is on prediction, so these assumptions are less central, but the underlying issues still exist.
Here, let's look at an example with two collinear features; in fact, let's make them exactly equal.
So we have the relation: y = x1 + x2, with x1 = x2
I know that if they are perfectly equal, we could just write: y = 2 * x1. But the idea is that two features can be very similar, and we can still build a model using both, right?
So what's the problem?
When the features are collinear, the solution is no longer unique. Here is an example from the spreadsheet below:
y = 10000 * x1 − 9998 * x2
And we can see that the norm of the coefficients is huge.
So the idea is to limit the overall norm of the coefficients.
And after applying the penalty, the form of the model is the same!
That's right: the parameters returned by training change, but the model itself is the same.
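The article builds this in Excel; the same effect can be sketched in Python with scikit-learn (the data, noise scales, and alpha below are illustrative, not the article's spreadsheet values):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=1e-3, size=200)     # almost perfectly collinear
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=1.0, size=200)  # true relation: y = x1 + x2

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("OLS coefficients:  ", ols.coef_)    # often huge and opposite-signed
print("Ridge coefficients:", ridge.coef_)  # both close to 1, small norm
```

The penalty keeps the coefficient norm small, which is exactly the fix described above.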
Different Types of Penalties
So the idea is to combine the MSE and a norm of the coefficients.
Instead of minimizing the MSE alone, we minimize the sum of the two terms.
Which norm? We can use the L1 norm, the L2 norm, or a combination of both.
There are three classic ways to do this, with well-established names for the models.
1. Ridge Regression (L2 Penalty)
Ridge regression adds a penalty on the sum of the squared coefficients.
As a result:
- Large coefficients are heavily penalized (because of the square)
- The coefficients are shrunk towards zero
- But they are never exactly zero
Result:
- All features remain in the model
- The coefficients are smaller and more stable
- It's very effective against multicollinearity
Ridge shrinks, but it does not select.

2. Lasso Regression (L1 Penalty)
Lasso uses a different penalty: the sum of the absolute values of the coefficients.
This small change has a big effect.
With the lasso:
- Some coefficients can become exactly zero
- The model automatically ignores certain features
That's why it is called the Lasso: Least Absolute Shrinkage and Selection Operator.
- Least: derived from Least Squares regression
- Absolute: uses the sum of the absolute values of the coefficients (L1 norm)
- Shrinkage: shrinks the coefficients towards zero
- Selection: it can set some coefficients to zero, performing feature selection
- Operator: refers to the penalty operator added to the loss function
Important nuance:
- The model still has the same number of coefficients
- But some of them are forced to zero during training
The form of the model does not change, but the Lasso effectively removes features by driving their coefficients to zero.
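To make the selection effect concrete, here is a small Python/scikit-learn sketch (hypothetical data where only the first two of five features matter; the alpha value is illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
# only features 0 and 1 actually influence the target
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=1.0).fit(X, y)
print(lasso.coef_)  # the three irrelevant coefficients are driven exactly to zero
```

Note that the two useful coefficients survive but are shrunk towards zero as well; selection and shrinkage come together.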

3. Elastic Net (L1 + L2)
Elastic Net is a combination of Ridge and Lasso.
It uses:
- an L1 penalty (like Lasso)
- an L2 penalty (like Ridge)
Why combine them?
Because:
- Lasso becomes unstable when features are highly correlated
- Ridge handles correlated features well but doesn't select features
Elastic Net offers a balance between:
- stability
- shrinkage
- selection
It is often the most effective choice for real datasets.
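In scikit-learn, the two knobs are exposed directly (a minimal sketch; `alpha` is the overall strength, `l1_ratio` the L1/L2 mix, and the data here is made up):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.1, size=100)

# l1_ratio=1 would be pure Lasso, l1_ratio=0 pure Ridge
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(model.coef_)
```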
What Really Changes: Model, Training, Hyperparameters
Let's look at this from a machine learning perspective.
The model doesn't really change
Of course, for all the penalized variants, the model is still written:
y = ax + b.
- The same number of coefficients
- Same prediction formula
- But, the coefficients will be different.
From this perspective, Ridge, Lasso, and Elastic Net are not really different models.
The training principle is also the same
We still:
- Define the loss function
- Minimize it
- Compute gradients
- Update the coefficients
The only difference is:
- The loss function now includes a penalty term
That's it.
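Written as code, the three penalized losses differ only in the penalty term (a sketch; `lam` and `l1_ratio` stand for the hyperparameters):

```python
import numpy as np

def mse(y, y_pred):
    return np.mean((y - y_pred) ** 2)

def ridge_loss(y, y_pred, coefs, lam):
    # MSE + lambda * sum of squared coefficients (L2 penalty)
    return mse(y, y_pred) + lam * np.sum(coefs ** 2)

def lasso_loss(y, y_pred, coefs, lam):
    # MSE + lambda * sum of absolute coefficients (L1 penalty)
    return mse(y, y_pred) + lam * np.sum(np.abs(coefs))

def elastic_net_loss(y, y_pred, coefs, lam, l1_ratio):
    # weighted mix of the two penalties
    penalty = l1_ratio * np.sum(np.abs(coefs)) + (1 - l1_ratio) * np.sum(coefs ** 2)
    return mse(y, y_pred) + lam * penalty

# quick check with a perfect fit: only the penalty remains
print(ridge_loss(np.array([1., 2.]), np.array([1., 2.]), np.array([3., -4.]), lam=0.1))  # 2.5
```

Everything else in the training loop stays identical.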
Added hyperparameters (this is the real difference)
In plain linear regression, we have no control over the “complexity” of the model.
- Standard linear regression: no hyperparameter
- Ridge: one hyperparameter (lambda)
- Lasso: one hyperparameter (lambda)
- Elastic Net: two hyperparameters
- one for the overall regularization strength
- one for the L1 vs L2 mix
So:
- Standard linear regression does not need tuning
- Penalized regressions do
This is why standard linear regression is often seen as “not really machine learning”, while penalized regressions clearly are.
Implementation with Gradient Descent
We keep the Gradient Descent of OLS regression as a reference, and for Ridge regression, we add the gradient of the penalty term on the coefficient.
We'll use a simple dataset I generated (the same one we already used for linear regression).
We have 3 models that differ in their coefficients. The goal of this chapter is to apply Gradient Descent to all three models and compare them.

Ridge with a penalized gradient
First, we can implement Ridge, where we only have to change the gradient of a.
Note that this does not mean the value of b is unchanged, because at each step the gradient of b depends on a.
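The article implements this step in Excel; here is an equivalent Python sketch (the learning rate, lambda, and data are illustrative, and the intercept b is left unpenalized, as is conventional):

```python
import numpy as np

def ridge_gd(x, y, lam=0.1, lr=0.05, n_iter=2000):
    """Gradient descent for y = a*x + b with an L2 penalty on a."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        y_pred = a * x + b
        # MSE gradient plus the penalty gradient 2*lam*a (b is not penalized)
        grad_a = (-2 / n) * np.sum(x * (y - y_pred)) + 2 * lam * a
        grad_b = (-2 / n) * np.sum(y - y_pred)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

x = np.linspace(0, 1, 50)
y = 3 * x + 1
a_ols, b_ols = ridge_gd(x, y, lam=0.0)  # lam=0 recovers plain OLS
a_pen, b_pen = ridge_gd(x, y, lam=0.5)  # the penalty shrinks a
print(a_ols, a_pen)
```

With lam=0 the loop converges to the OLS slope; with lam>0 the slope is pulled towards zero.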

Lasso with a penalized gradient
Then we can do the same with the lasso.
And the only difference is the gradient of the penalty term: the sign of the coefficient instead of the coefficient itself.
For each model, we can also calculate the MSE and the penalized loss. It's satisfying to see how they shrink over the iterations.
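The same loop works for the Lasso with the L1 subgradient (a sketch under the same illustrative settings; `np.sign` plays the role of the derivative of the absolute value away from zero):

```python
import numpy as np

def lasso_gd(x, y, lam=0.1, lr=0.05, n_iter=2000):
    """Gradient descent for y = a*x + b with an L1 penalty on a."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        y_pred = a * x + b
        # penalty gradient is lam * sign(a): the sign of the coefficient,
        # not the coefficient itself as in Ridge
        grad_a = (-2 / n) * np.sum(x * (y - y_pred)) + lam * np.sign(a)
        grad_b = (-2 / n) * np.sum(y - y_pred)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

x = np.linspace(0, 1, 50)
y = 3 * x + 1
a_ols, _ = lasso_gd(x, y, lam=0.0)
a_pen, _ = lasso_gd(x, y, lam=0.2)
print(a_ols, a_pen)  # the penalized slope is visibly smaller
```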

Comparing coefficients
Now, we can visualize the coefficient a for all three models. To make the difference visible, we use fairly large lambdas.

The lambda effect
With a large value of lambda, we can see that the coefficient a becomes small.
And if the Lasso's lambda becomes too large, we get a value of exactly 0. Numerically, we have to handle the gradient at the origin, since the absolute value is not differentiable at zero.
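This sweep can be reproduced with scikit-learn (illustrative one-feature data; note that scikit-learn calls lambda `alpha`):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
y = 3 * x[:, 0] + rng.normal(scale=0.1, size=100)

for alpha in [0.01, 0.1, 1.0, 10.0]:
    r = Ridge(alpha=alpha).fit(x, y).coef_[0]
    l = Lasso(alpha=alpha).fit(x, y).coef_[0]
    # Ridge shrinks smoothly; Lasso eventually reaches exactly zero
    print(f"alpha={alpha:>5}: ridge={r:.4f}  lasso={l:.4f}")
```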

Logistic regression?
We saw logistic regression yesterday, and one question we can ask is whether it can also be penalized. If so, what are the penalized versions called?
The answer is yes, logistic regression can be penalized.
Of course the same idea applies.
A logistic regression can also be:
- L1-penalized
- L2-penalized
- Elastic-Net-penalized
But there are no special names like “Ridge Regression” in common usage.
Why?
Because the idea is no longer new.
In practice, libraries like scikit-learn simply let you specify:
- the loss function
- the type of penalty
- the regularization strength
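For instance, with scikit-learn (a sketch; note that `C` is the inverse of the regularization strength, and the data is made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # only two features matter

l2_model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)
l1_model = LogisticRegression(penalty="l1", C=1.0, solver="liblinear").fit(X, y)
enet = LogisticRegression(penalty="elasticnet", l1_ratio=0.5, C=1.0,
                          solver="saga", max_iter=5000).fit(X, y)
print(l1_model.coef_)  # the L1 penalty can zero out the irrelevant features
```

The penalty is just a parameter choice, not a differently named model.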
Special names appear when an idea is new.
Today, penalization is just a standard option.
Some questions we can ask:
- Is regularization always helpful?
- How does feature scaling affect penalized linear regression?
Conclusion
Ridge and Lasso do not change the model itself; they change the way the coefficients are learned. By adding a penalty, they produce smaller, more stable, and sometimes sparser solutions, especially when the features are correlated. Seeing the process step by step in Excel makes it clear that these methods are not more complicated, just more controlled.



