
Machine Learning Advent Calendar Day 4: k-means in Excel

Welcome to Day 4 of the Machine Learning Advent Calendar.

For the first three days, we studied distance-based models for supervised learning.

In all these models, the idea was the same: we measure distances, and we determine the result based on the nearest points or nearest centroids.

Today, we stay in this same family of ideas, but we use distances in an unsupervised way: k-means.

Now, one question for those who already know this algorithm: which model does k-means resemble more, the k-NN classifier or the nearest centroid classifier?

And if you remember, with all the models we've seen so far, there wasn't really a "training" or tuning step.

  • In k-NN, there is no training at all.
  • In LDA, QDA, or GNB, training is simply a matter of computing means and variances. And there are no real hyperparameters either.

Now, with k-means, we will use a training algorithm that finally looks like "real" machine learning.

We start with a small 1D example. After that we move to 2D.

The principle of k-means

For the training data, there are no original labels.

The goal of k-means is to create labels by grouping points that are close to each other.

Let's look at the figure below. You can clearly see two groups of points. Each centroid (red square and green square) is in the middle of its cluster, and all points are assigned to the closest one.

This gives a very intuitive picture of how k-means discovers clusters using only distances.

And here, k is the number of centroids we are trying to find.

k-means in Excel – image by the Author

Now, let's answer the question: is k-means closer to the k-NN classifier or to the nearest centroid classifier?

Don't be fooled by the k in k-NN and k-means.
They don't mean the same thing:

  • in k-NN, k is the number of neighbors, not the number of classes;
  • in k-means, k is the number of centroids.

k-means is very close to the nearest centroid classifier.

Both models are represented by centroids, and for a new observation, we simply compute the distance to each centroid to determine which one it belongs to.

The difference, of course, is that with the nearest centroid classifier, we already know the centroids, because they come from labeled classes.

With k-means, we don't know the centroids. The whole goal of the algorithm is to find them directly from the data.

The problem setting is completely different: instead of predicting labels, we try to create them.

And in k-means, the value of k (the number of centroids) is unknown. Therefore, it becomes a hyperparameter that we can tune.
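To make the connection concrete, here is a minimal Python sketch (with made-up centroid values) of the prediction rule that both models share: assign a point to its nearest centroid.

```python
# Shared prediction rule of k-means and the nearest centroid classifier:
# given the centroids, a point belongs to the closest one (1D case).
def nearest_centroid(x, centroids):
    # Return the index of the centroid closest to x.
    return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))

centroids = [2, 12]  # hypothetical centroids
print(nearest_centroid(5, centroids))   # closer to 2 -> 0
print(nearest_centroid(10, centroids))  # closer to 12 -> 1
```

The only difference between the two models is where the centroids come from: labeled classes for the classifier, or the training loop below for k-means.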

k-means with one feature

We start with a small 1D example so that everything can be seen on one axis. And we will choose the values in such a simple way that we can immediately see the two centroids.

1, 2, 3, 11, 12, 13

Yes, 2 and 12.

But the computer doesn't know this in advance. The machine will "learn" it by guessing step by step.

This iterative procedure is called Lloyd's algorithm.

We will implement it in Excel with the following loop:

  1. Choose the initial centroids
  2. Compute the distance from each point to each centroid
  3. Assign each point to the nearest centroid
  4. Update the centroids as the average of the points in each cluster
  5. Repeat steps 2 to 4 until the centroids no longer move
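The loop above can be sketched in a few lines of Python (a bare-bones 1D version with k = 2, not what the Excel sheet uses, but the same logic; a poor initialization could leave a cluster empty, which this sketch doesn't handle):

```python
def lloyd_1d(points, c1, c2, max_iter=100):
    """Run Lloyd's algorithm in 1D with k = 2, starting from centroids c1 and c2."""
    for _ in range(max_iter):
        # Steps 2-3: assign each point to the nearest centroid.
        cluster1 = [x for x in points if abs(x - c1) <= abs(x - c2)]
        cluster2 = [x for x in points if abs(x - c1) > abs(x - c2)]
        # Step 4: recompute each centroid as the mean of its cluster.
        new_c1 = sum(cluster1) / len(cluster1)
        new_c2 = sum(cluster2) / len(cluster2)
        # Step 5: stop when the centroids no longer move.
        if (new_c1, new_c2) == (c1, c2):
            break
        c1, c2 = new_c1, new_c2
    return c1, c2

print(lloyd_1d([1, 2, 3, 11, 12, 13], 1.0, 13.0))  # -> (2.0, 12.0)
```

Starting from 1 and 13, the algorithm converges in one update to the centroids we spotted by eye, 2 and 12.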

1. Choose the initial centroids

Choose the initial two centroids, for example:

They must be within the data range (between 1 and 13).

k-means in Excel – image by the Author

2. Compute the distances

For each data point x, compute:

  • the distance to c_1,
  • the distance to c_2.

In 1D, we simply use the absolute distance |x − c|.

Now we have two distance values for each point.

k-means in Excel – image by the Author

3. Assign clusters

For each point:

  • compare the two distances,
  • assign the point to the cluster of the nearest centroid (1 or 2).

In Excel, this is easy with IF or MIN logic.

k-means in Excel – image by the Author

4. Compute the new centroids

For each cluster:

  • take the points assigned to that cluster,
  • compute their mean,
  • this mean becomes the new centroid.

k-means in Excel – image by the Author

5. Iterate until convergence is reached

Now in Excel, thanks to the formulas, we can simply paste the new centroid values into the cells holding the initial centroids.

The update is fast, and after doing this a few times, you will see that the values stop changing. This is when the algorithm has converged.

k-means in Excel – image by the Author

We can also record each step in Excel, so we can see how the centroids and clusters evolve over time.

k-means in Excel – image by the Author

k-means with two features

Now let's use two features. The process is exactly the same; we just use the Euclidean distance in 2D.
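In 2D, the only change in the loop is the distance formula: d = sqrt((x1 − c1)² + (x2 − c2)²). A small sketch of the assignment step, using made-up 2D points and centroids:

```python
import math

def euclidean(p, c):
    # Euclidean distance between a 2D point p and a 2D centroid c.
    return math.sqrt((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2)

def assign(points, centroids):
    # Assignment step in 2D: index of the nearest centroid for each point.
    return [min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
            for p in points]

points = [(1, 1), (2, 1), (10, 9), (11, 10)]   # hypothetical 2D data
centroids = [(1.5, 1.0), (10.5, 9.5)]          # hypothetical centroids
print(assign(points, centroids))  # -> [0, 0, 1, 1]
```

The update step is also unchanged in spirit: each new centroid is the mean of its cluster, computed feature by feature.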

You can copy-paste the new centroids as values (with just a few cells to refresh),

k-means in Excel – image by the Author

or you can show all steps in between to see the full evolution of the algorithm.

k-means in Excel – image by the Author

Visualizing the centroids in Excel

To make the process more intuitive, it is useful to create plots that show how the centroids move.

Unfortunately, Excel and Google Sheets are not ideal for this type of visualization, and the data tables can quickly become a bit complicated to edit.

If you want to see a full example with detailed plots, you can read this article I wrote about three years ago, where each step of the centroid movement is shown clearly.

k-means in Excel – image by the Author

As you can see in this picture, that worksheet was quite disorganized, especially compared to the previous table, which was very clean.

k-means in Excel – image by the Author

Choosing K: The Elbow Method

So now, you might try k = 2 and k = 3, run k-means for each, and compute the inertia in each case. Then we simply compare the values.

We can even start with k = 1.

For each value of k:

  • we run k-means until convergence,
  • we compute the inertia, which is the sum of squared distances between each point and its assigned centroid.

In Excel:

  • For each point, take the distance to its centroid and square it.
  • Sum all these squared distances.
  • This gives the inertia for this k.

For example:

  • For k = 1, the centroid is simply the mean of x1 and x2,
  • for k = 2 and k = 3, we take the converged centroids from the sheets where we ran the algorithm.

Then we can plot the inertia as a function of k (for k = 1, 2, 3).

For this data:

  • from 1 to 2, the inertia decreases significantly,
  • from 2 to 3, the improvement is very small.

The "elbow" is the value of k after which the decrease in inertia becomes marginal. In this example, it suggests that k = 2 is sufficient.
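On the 1D data from earlier, the inertia computation can be sketched like this (the k = 3 centroids shown are just one possible converged result, included to illustrate the elbow):

```python
def inertia(points, centroids):
    # Sum of squared distances from each point to its nearest centroid (1D).
    return sum(min((x - c) ** 2 for c in centroids) for x in points)

data = [1, 2, 3, 11, 12, 13]
print(inertia(data, [7.0]))            # k = 1: centroid is the global mean -> 154.0
print(inertia(data, [2.0, 12.0]))      # k = 2 -> 4.0
print(inertia(data, [1.5, 3.0, 12.0])) # one possible k = 3 result -> 2.5
```

The drop from 154 to 4 is huge; the drop from 4 to 2.5 is marginal. The elbow is at k = 2.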

k-means in Excel – image by the Author

Conclusion

k-means is a very intuitive algorithm when you see it step by step in Excel.

We start with initial centroids, compute distances, assign points, update the centroids, and repeat. Now we can see how "machines learn", right?

Yes, this is just the beginning; we will see that different models "learn" in very different ways.

And here's a teaser for tomorrow's article: the unsupervised version of the nearest centroid classifier is actually k-means.

So what would be the unsupervised version of LDA or QDA? We will answer this in the next article.

k-means in Excel – image by the Author
