Machine Learning Advent Calendar, Day 7: The Decision Tree Classifier

Yesterday we saw that a decision tree regressor chooses its optimal partition by minimizing the mean squared error (MSE).
Today, on Day 7 of the machine learning advent calendar, we continue in the same spirit with the decision tree classifier, the classification counterpart of yesterday's model.
A quick intuition test with two simple datasets
Let's start with the smallest toy dataset I could make: one feature and one target variable with two classes, 0 and 1.
The idea is to split the data into two parts based on a single rule. But what should this rule be? What criterion tells us which split is better?
Even if we don't know the math yet, we can simply look at the data and guess the split points.
At a glance, the split should happen at 8 or at the other visible gap, right?
But the question is which one is better numerically.
If we reason it through:
- With a split at 8:
  - left side: no misclassification
  - right side: one misclassification
- With a split at the other candidate point:
  - right side: no misclassification
  - left side: a few misclassifications
Intuitively, splitting at 8 feels better.
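If you want to check this kind of counting without squinting at a chart, here is a small Python sketch. The toy data below is hypothetical (not the exact points from my spreadsheet); the helper predicts the majority class on each side of a threshold and counts the mistakes:

```python
from collections import Counter

def misclassified(x, y, threshold):
    """Count errors if each side of the split predicts its majority class."""
    errors = 0
    for side in (True, False):
        labels = [yi for xi, yi in zip(x, y) if (xi < threshold) == side]
        if labels:
            majority_count = Counter(labels).most_common(1)[0][1]
            errors += len(labels) - majority_count
    return errors

# Hypothetical toy data with one feature and two classes
x = [1, 2, 3, 5, 9, 10, 12, 13]
y = [0, 0, 0, 0, 1, 1, 0, 1]

print(misclassified(x, y, 8))   # 1 error
print(misclassified(x, y, 11))  # 3 errors
```

On this data, the split at 8 wins: one mistake instead of three.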

Now let's look at an example with three classes. I added some random data points and assigned them to 3 classes.

Here the labels are 0, 1, 2, and I plot them directly.
But we must be careful: these numbers are just class names, not numerical values. They should not be interpreted as "ordered".
So the question is still: how pure is each region after the split?
But the finer distinctions are hard to judge by eye.
Now we need a mathematical way to express this idea.
That is the topic of the next section.
Impurity measures as a splitting criterion for classification
For the decision tree regressor, we already know:
- the prediction for a region is the mean of the region,
- the split quality is measured by the MSE.
For a classification tree:
- the prediction for a region is the majority class of the region,
- the split quality is measured by an impurity measure: Gini impurity or entropy.
Both are common in textbooks, and both are available in scikit-learn; Gini is used by default.
But what is impurity, really?
If you look at the curves of Gini and entropy, both behave the same way:
- they equal 0 when a node is pure (all samples have the same class),
- they reach their maximum when the classes are evenly mixed (50% / 50%),
- the curve is concave, symmetric, and increases with disorder.
This is the defining property of any impurity measure:
impurity is low when groups are pure, and high when groups are mixed.

So we will use these measures to decide which split to make.
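Before moving on, here is a minimal Python sketch of the two measures for a binary node, just to confirm the properties above (zero at a pure node, maximum at a 50/50 mix):

```python
import math

def gini(p):
    """Binary Gini impurity for a node with class-1 proportion p."""
    return 1 - p**2 - (1 - p)**2

def entropy(p):
    """Binary entropy (in bits) for a node with class-1 proportion p."""
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(gini(0.0), gini(0.5))        # 0 at a pure node, maximum 0.5 at a 50/50 mix
print(entropy(0.0), entropy(0.5))  # 0 at a pure node, maximum 1.0 at a 50/50 mix
```

Both curves are concave and symmetric around p = 0.5, exactly as described.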
Splitting on one continuous feature
As with the regression tree, we follow the same procedure.
List all possible splits
As in the regressor version, with a single numeric feature, the only locations we need to check are the midpoints between the sorted x values.
For each split, compute the impurity on each side
Let's take a concrete value, for example x = 5.5.
We split the dataset into two regions:
- Region L: x < 5.5
- Region R: x ≥ 5.5
In each region we:
- count the total number of observations,
- compute the Gini impurity,
- and finally combine the two sides into a weighted impurity for the split.
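As a sanity check outside of Excel, here is the same weighted-impurity computation as a Python sketch (the data points below are hypothetical):

```python
def gini_from_labels(labels):
    """Gini impurity of a region, from its raw class labels."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def weighted_gini(x, y, threshold):
    """Weighted Gini impurity of the split x < threshold vs x >= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi < threshold]
    right = [yi for xi, yi in zip(x, y) if xi >= threshold]
    n = len(y)
    return (len(left) / n) * gini_from_labels(left) + \
           (len(right) / n) * gini_from_labels(right)

# Hypothetical data, split at x = 5.5
x = [1, 2, 4, 5, 6, 7, 9]
y = [0, 0, 0, 1, 1, 1, 1]
print(weighted_gini(x, y, 5.5))
```

The left region is [0, 0, 0, 1] (Gini 0.375) and the right region is pure, so the weighted impurity is 4/7 × 0.375 ≈ 0.214.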

Choose the split with the lowest impurity
As in the regression case:
- list all possible splits,
- compute the impurity of each split,
- the best split is the one with the lowest impurity.

A table of all splits
To automate everything in Excel, we arrange all the calculations in one table, where:
- each row corresponds to one split,
- for each row, we include:
  - the Gini of the left region,
  - the Gini of the right region,
  - the overall weighted Gini of the split.
This table provides a clean, compact overview of all possible splits,
and the best split is simply the lowest value in the last column.
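The same table can be sketched in a few lines of Python (again with hypothetical data): one row per candidate split, and the best split is the row with the smallest weighted Gini.

```python
def gini_from_labels(labels):
    """Gini impurity of a region, from its raw class labels."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def weighted_gini(x, y, threshold):
    """Weighted Gini impurity of the split x < threshold vs x >= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi < threshold]
    right = [yi for xi, yi in zip(x, y) if xi >= threshold]
    n = len(y)
    return (len(left) / n) * gini_from_labels(left) + \
           (len(right) / n) * gini_from_labels(right)

# Hypothetical data; candidate splits are the midpoints between sorted x values
x = [1, 2, 4, 5, 6, 7, 9]
y = [0, 0, 0, 1, 1, 1, 1]
xs = sorted(set(x))
table = [((a + b) / 2, weighted_gini(x, y, (a + b) / 2)) for a, b in zip(xs, xs[1:])]
for threshold, score in table:
    print(f"split at {threshold}: weighted Gini = {score:.4f}")
best = min(table, key=lambda row: row[1])  # the 'lowest value in the last column'
```

On this toy data the classes separate perfectly at 4.5, so that row has weighted Gini 0.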

Multi-class classification
Until now we were working with two classes. But Gini impurity naturally extends to three classes, and so does the idea of splitting on the same feature.
Nothing changes in the structure of the algorithm:
- we list all possible splits,
- we compute the impurity on each side,
- we take a weighted average,
- we choose the split with the lowest impurity.
The only change is that the Gini formula gets one more term.
Gini impurity with three classes
If a region contains the three classes in proportions p1, p2, p3, then the Gini impurity is:

G = 1 − p1² − p2² − p3²

Same idea as before:
it is zero in the "pure" case where one class dominates completely,
and the impurity is larger when the classes are mixed.
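The formula translates directly into code; a quick Python check of the two extreme cases:

```python
def gini3(p1, p2, p3):
    """Three-class Gini impurity: G = 1 - p1^2 - p2^2 - p3^2."""
    return 1 - p1**2 - p2**2 - p3**2

print(gini3(1.0, 0.0, 0.0))     # pure node: impurity 0
print(gini3(1/3, 1/3, 1/3))     # evenly mixed: the three-class maximum, 2/3
```

Note that the maximum is 2/3 rather than 1/2: with more classes, a perfectly mixed node is "more impure".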
Left and right regions
For each split:
- Region L contains some of the observations of classes 1, 2, and 3
- Region R contains the remaining observations
In each region we:
- count how many points belong to each class,
- compute the proportions p1, p2, p3,
- compute the Gini impurity using the formula above.
Everything is exactly the same as in the binary case, just with one extra term.
Summary table of all 3-class splits
As before, we collect everything in one table:
- each row is one split,
- we list the counts of class 1, class 2, and class 3 on the left,
- we list the counts of class 1, class 2, and class 3 on the right,
- we include Gini (left), Gini (right), and the weighted Gini.
The split with the lowest weighted impurity is the one chosen by the decision tree.

We can easily generalize the algorithm to K classes, using the class proportions to compute the Gini impurity or the entropy.
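For completeness, here is a Python sketch of that K-class generalization: one function that computes either measure directly from the raw labels, whatever the number of classes.

```python
from collections import Counter
import math

def impurity(labels, measure="gini"):
    """Gini impurity or entropy of a region, for any number of classes K."""
    n = len(labels)
    probs = [count / n for count in Counter(labels).values()]
    if measure == "gini":
        return 1 - sum(p * p for p in probs)
    return -sum(p * math.log2(p) for p in probs)  # entropy, in bits

labels = [0, 0, 1, 2]  # a hypothetical 3-class node
print(impurity(labels))             # Gini: 1 - (1/2)^2 - (1/4)^2 - (1/4)^2 = 0.625
print(impurity(labels, "entropy"))  # entropy: 1.5 bits
```

The `Counter` does the class counting that we do with COUNTIF in Excel.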

How different are the impurity measures, really?
We keep mentioning Gini and entropy as options, but are they really different? Looking at the mathematical formulas, you might think so.
The answer is no.
In practice, in almost all cases:
- Gini and entropy choose the same split,
- the tree structures are almost identical,
- the predictions are the same.
Why?
Because their curves look very similar.
Both peak at a 50% mix and drop to zero at purity.
The only difference is the shape of the curve:
- Gini is a quadratic function.
- Entropy is a logarithmic function, so it penalizes uncertainty slightly more strongly around 0.5.
But the difference rarely matters in practice, and you can compute both in Excel!
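You can see this agreement on a small example. The sketch below (with hypothetical data) searches all midpoint splits under both criteria and lands on the same threshold:

```python
from collections import Counter
import math

def impurity(labels, measure):
    """Gini impurity or entropy of a region, from raw class labels."""
    n = len(labels)
    probs = [count / n for count in Counter(labels).values()]
    if measure == "gini":
        return 1 - sum(p * p for p in probs)
    return -sum(p * math.log2(p) for p in probs)

def best_split(x, y, measure):
    """Return the midpoint threshold with the lowest weighted impurity."""
    xs = sorted(set(x))
    def score(t):
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        n = len(y)
        return (len(left) / n) * impurity(left, measure) + \
               (len(right) / n) * impurity(right, measure)
    return min(((a + b) / 2 for a, b in zip(xs, xs[1:])), key=score)

x = [1, 2, 3, 5, 6, 8, 9, 10, 11]
y = [0, 0, 0, 1, 0, 1, 1, 1, 1]
print(best_split(x, y, "gini"), best_split(x, y, "entropy"))  # same threshold
```

The two criteria can disagree on contrived datasets, but on typical data like this one they pick the same cut.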
Other impurity measures?
Another natural question: can we invent or use other impurity measures?
Yes, you can define your own, as long as it:
- equals 0 when the node is pure,
- is maximal when the classes are evenly mixed,
- is concave, symmetric, and increases with "disorder".
Example: impurity = 4 · p0 · p1
This is another acceptable impurity measure. In fact, for two classes it is exactly the binary Gini impurity rescaled.
So once again it yields the same splits. If you're not sure, you can check it yourself.
Other measures can be used as well.
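To convince yourself of the rescaling claim, here is a minimal Python check that 4·p0·p1 is exactly twice the binary Gini impurity, so it ranks every split identically:

```python
def gini(p):
    """Binary Gini impurity: 1 - p^2 - (1-p)^2, which simplifies to 2*p*(1-p)."""
    return 1 - p**2 - (1 - p)**2

def custom(p):
    """The custom measure 4*p0*p1, with p1 = p and p0 = 1 - p."""
    return 4 * p * (1 - p)

# The two curves differ only by a constant factor of 2
for p in [0.0, 0.1, 0.3, 0.5, 0.9, 1.0]:
    assert abs(custom(p) - 2 * gini(p)) < 1e-12
print("4*p0*p1 is exactly twice the binary Gini: same ranking of splits")
```

A constant rescaling never changes which split has the minimum, which is why this measure behaves like Gini.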

Exercises in Excel
Testing other criteria and features
Once you've created the first split, you can extend your file:
- try entropy instead of Gini,
- try adding more features,
- try building the next split,
- try changing the depth of the tree and watch for under- and over-fitting,
- try building a confusion matrix for the predictions.
These simple experiments already give you a good sense of how real decision trees behave.
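For the confusion-matrix exercise, the logic is the same whether you build it with COUNTIFS in Excel or in a few lines of Python; here is a small sketch with made-up labels:

```python
def confusion_matrix(actual, predicted, classes=(0, 1)):
    """Rows = actual class, columns = predicted class."""
    return [[sum(1 for a, p in zip(actual, predicted) if a == i and p == j)
             for j in classes]
            for i in classes]

# Hypothetical true labels and tree predictions
actual    = [0, 0, 1, 1, 1, 0]
predicted = [0, 1, 1, 1, 0, 0]
print(confusion_matrix(actual, predicted))  # [[2, 1], [1, 2]]
```

The diagonal holds the correct predictions; everything off the diagonal is a mistake, broken down by type.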
Implementing the rules on the Titanic Survival Dataset
The next natural exercise is to recreate well-known decision rules on the Titanic Survival Dataset (CC0 / Public Domain).
To begin, we can start with just two features: sex and age.
Implementing the rules in Excel is tedious and repetitive, but that is exactly the point: it makes you see what the learned rules actually look like.
A tree is nothing but a sequence of if / else statements, repeated over and over.
This is the true nature of a decision tree: simple rules, stacked on top of each other.
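To make the "sequence of if / else statements" concrete, here is what such hand-written rules look like in Python. The rules and threshold below are illustrative, in the spirit of the classic sex-and-age Titanic tree, not fitted values:

```python
def predict_survival(sex, age):
    """Hand-written decision rules, nested if/else, just like the tree itself.
    The structure and the age threshold are illustrative, not fitted values."""
    if sex == "female":
        return 1            # predict: survived
    else:
        if age < 10:
            return 1        # predict: survived (young boys)
        else:
            return 0        # predict: did not survive

print(predict_survival("female", 30))  # 1
print(predict_survival("male", 40))    # 0
```

Each branch of the tree is one nested `if`; an Excel version is the same thing written with nested IF() formulas.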

Conclusion
Building a classification tree in Excel turns out to be surprisingly approachable.
With a few formulas, it reveals the heart of the algorithm:
- list the splits,
- compute the impurity,
- choose the purest split.

This simple mechanism is the basis of many advanced models, such as boosted trees, which we will discuss later in this series.
Stay tuned for Day 8 tomorrow!



