Bonus 1 Machine Learning “Advent Calendar”: AUC in Excel

In this bonus article, we will build AUC in Excel.
AUC is often used as a performance metric for classification tasks.
But we start with the confusion matrix, because that's where everyone starts in practice. Then we will see why one confusion matrix is not enough.
And we will answer these questions:
- AUC means Area Under the Curve, but which curve is it under?
- Where does that curve come from?
- Why does that area mean anything?
- Can AUC be read as a probability? (Yes, it has a probabilistic interpretation)
1. Why the confusion matrix is not enough
1.1 Scores from models
A classifier usually gives us scores, not final decisions. The decision comes later, when we choose a threshold.
If you've read previous “Advent Calendar” articles, you've seen that “score” can mean different things depending on the model family:
- Distance-based models (such as k-NN) usually calculate the proportion of neighbors of a certain class (or a distance-based confidence), which becomes a score.
- Density-based models evaluate the likelihood under each class, then normalize to get the final (posterior) probabilities.
- Tree-based models tend to output the fraction of a given class among the training samples within a leaf (that's why many observations share the same score).
- Weight-based models (linear models, kernels, neural networks) calculate a weighted sum as a raw score, and sometimes apply a squashing step (sigmoid, softmax, Platt scaling, etc.) to map it to a probability.
So either way, we end up in the same situation: a score for each observation.
Then, in practice, we pick a threshold, usually 0.5, and convert the scores into predicted classes.
And this is exactly where the confusion matrix comes into play.
1.2 Confusion matrix at one threshold
Once the threshold is selected, every observation becomes a binary decision:
- predicted positive (1) or predicted negative (0)
From there, we can count four numbers:
- TP (True Positive): predicted 1 and actually 1
- TN (True Negative): predicted 0 and actually 0
- FP (False Positive): predicted 1 but actually 0
- FN (False Negative): predicted 0 but actually 1
This 2 × 2 count table is a confusion matrix.
Then we typically calculate ratios like these:
- Precision = TP / (TP + FP)
- Recall (TPR) = TP / (TP + FN)
- Specificity = TN / (TN + FP)
- FPR = FP / (FP + TN)
- Accuracy = (TP + TN) / Total
So far, everything is clean and accurate.
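Outside Excel, the same ratios can be checked with a few lines of code. This is just a sketch; the counts below are made up purely for illustration.

```python
# Hypothetical counts from a confusion matrix at one threshold.
TP, FP, FN, TN = 8, 2, 3, 7

precision   = TP / (TP + FP)                  # of predicted positives, how many are real
recall      = TP / (TP + FN)                  # TPR: of real positives, how many we caught
specificity = TN / (TN + FP)                  # of real negatives, how many we kept out
fpr         = FP / (FP + TN)                  # complement of specificity
accuracy    = (TP + TN) / (TP + FP + FN + TN)

print(precision, recall, specificity, fpr, accuracy)
```

Note that all five numbers change the moment the threshold (and therefore the four counts) changes.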
But there is a hidden limitation: all these values depend on the threshold. So the confusion matrix evaluates the model at one operating point, not the model itself.
1.3 When one threshold breaks everything
This is an extreme example, but it makes the point very clear.
Assume the threshold is set to 0.50, and all scores are below 0.50.
Then the classifier predicts:
- Predicted positive: none
- Predicted negative: everyone
So you get:
- TP = 0, FP = 0
- FN = 10, TN = 10

This is a perfectly valid confusion matrix. It also produces some very strange readings:
- Precision becomes #DIV/0!, because no positives are predicted.
- Recall is 0%, because you didn't capture a single positive.
- Accuracy is 50%, which sounds “not too bad”, even though the model found nothing.
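The degenerate case above can be reproduced in a few lines (a sketch mirroring the 10-positive, 10-negative example; Python raises where Excel shows #DIV/0!).

```python
# Threshold 0.50, every score below it: everything predicted negative.
TP, FP, FN, TN = 0, 0, 10, 10

recall   = TP / (TP + FN)                      # 0.0: nothing captured
accuracy = (TP + TN) / (TP + FP + FN + TN)     # 0.5: sounds "not too bad"

try:
    precision = TP / (TP + FP)                 # 0 / 0 — Excel's #DIV/0!
except ZeroDivisionError:
    precision = None

print(recall, accuracy, precision)
```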
There is nothing wrong with the confusion matrix. The issue is the question we asked it to answer.
The confusion matrix answers: “How good is the model at this particular threshold?”
If the threshold is chosen poorly, the confusion matrix can make the model look useless, even if the scores contain real separation.
And in the sheet, the separation is visible: positives tend to score around 0.49, negatives around 0.20 or 0.10. The model is not useless; the threshold is simply too strict.
That is why one threshold is not enough.
What we need instead is a way to evaluate the model across all thresholds, not just one.
2. The ROC curve
First we have to build the curve itself: AUC stands for Area Under a Curve, so we have to understand that curve before measuring its area.
2.1 What ROC means (and what it is)
The first question everyone should ask is: which curve is the AUC under?
The answer is:
AUC is the area under the ROC curve.
But this raises another question.
What is the ROC curve, and where does it come from?
ROC stands for Receiver Operating Characteristic. The name is historical (early signal detection), but the idea is modern and simple: it describes what happens when you vary the decision threshold.
The ROC curve is a two-dimensional plot:
- x-axis: FPR (False Positive Rate)
FPR = FP / (FP + TN)
- y-axis: TPR (True Positive Rate), also called Recall or Sensitivity
TPR = TP / (TP + FN)
Each threshold gives one point (FPR, TPR). When you connect all the points, you get the ROC curve.
One detail is important in this section: the ROC curve is not observed directly; it is constructed by sweeping the threshold over the ordered scores.
2.2 Constructing an ROC curve with scores
Each score can itself be used as a threshold (and of course, we can also define custom thresholds).
For each threshold:
- we compute TP, FP, FN, TN for the confusion matrix
- then we compute FPR and TPR
So the ROC curve is simply the collection of all these (FPR, TPR) pairs, ordered from strict thresholds to permissive thresholds.
This is exactly what we will use in Excel.

At this point, it is worth noting something that sounds almost too simple. When we construct an ROC curve, the actual numerical values of the scores do not matter. All that matters is their order.
If one model outputs scores between 0 and 1, another outputs scores between -12 and +5, and a third outputs only two distinct values, the ROC works the same way. As long as higher scores tend to correspond to the positive class, the threshold sweep will generate the same sequence of decisions.
That's why the first step in Excel is always the same: sort by score from highest to lowest. Once the rows are in the right order, the rest is just counting.
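A quick sketch of this order-only property: two hypothetical score scales that rank the observations identically produce exactly the same threshold sweep.

```python
# Hypothetical labels with two score scales that agree on the ranking.
labels  = [1, 0, 1, 0, 1]
probs   = [0.90, 0.20, 0.75, 0.10, 0.60]   # scores in [0, 1]
margins = [4.0, -8.0, 2.5, -12.0, 1.0]     # scores in [-12, +5]

def sweep_order(labels, scores):
    """Labels read off from highest score to lowest.
    This sequence alone determines the ROC curve."""
    return [lab for _, lab in sorted(zip(scores, labels), reverse=True)]

print(sweep_order(labels, probs))
print(sweep_order(labels, margins))   # same sequence, so same ROC
```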
2.3 Building the ROC curve in Excel
In an Excel sheet, the construction becomes more concrete.
Sort the observations by score, from highest to lowest. Then walk down the list. At each row, you pretend the threshold is set at that point, which means: everything at or above it is predicted positive.
That lets Excel keep a cumulative count:
- how many positives you have accumulated so far
- how many negatives you have accumulated so far
From these cumulative counts and the dataset totals, we calculate TPR and FPR.
Now every row is a single ROC point.
Why the ROC curve looks like a staircase
- If the next row down is a positive, TP increases, so TPR rises while FPR stays the same.
- If the next row down is a negative, FP increases, so FPR rises while TPR stays the same.
That's why, for finite real data, the ROC curve is a staircase. Excel makes this visible.
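The cumulative sweep above can be sketched in a few lines. The labels are hypothetical and assumed already sorted by descending score, exactly as in the Excel sheet.

```python
# Labels sorted by descending score (hypothetical data).
labels = [1, 1, 0, 1, 0, 0]
P = sum(labels)              # total positives
N = len(labels) - P          # total negatives

tp = fp = 0
roc = [(0.0, 0.0)]           # the curve starts at the origin
for lab in labels:
    if lab == 1:
        tp += 1              # vertical step: TPR rises
    else:
        fp += 1              # horizontal step: FPR rises
    roc.append((fp / N, tp / P))

print(roc)
```

Each appended pair is one row of the sheet: a vertical step for a positive, a horizontal step for a negative.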
2.4 Reference shapes you should recognize
A few reference shapes help you read the curve quickly:
- A perfect separator: the curve goes straight up (TPR reaches 1 while FPR stays at 0), then straight across to (1, 1).

- A random separator: the curve is always close to the diagonal line from (0,0) to (1,1).

- An inverted classifier: the curve falls “below” the diagonal, and the AUC becomes smaller than 0.5. In that case we can simply replace the score with 1 - score. In theory this is only a hypothetical case; in practice, it often happens when scores are misinterpreted or class labels are swapped.

These are not just theories. They are visual anchors. Once you have them, you can interpret any real ROC curve quickly.
3. ROC AUC
Now that we have the curve, what do we do with it?
3.1 Computing the area with trapezoids
Once the ROC curve exists as a sequence of (FPR, TPR) points, the AUC is pure geometry.
Between two consecutive points, the added area is the area of a trapezoid:
- width = change in FPR
- height = average TPR of the two points
In Excel, this becomes the “delta column” method:
- compute dFPR between consecutive rows
- multiply it by the average TPR
- sum everything up
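The same “delta column” method, sketched in code with a hypothetical set of ROC points (assumed sorted by FPR):

```python
# Hypothetical ROC points (FPR, TPR), sorted by FPR.
roc = [(0.0, 0.0), (0.0, 0.5), (0.25, 0.75), (0.5, 1.0), (1.0, 1.0)]

auc = 0.0
for (x0, y0), (x1, y1) in zip(roc, roc[1:]):
    auc += (x1 - x0) * (y0 + y1) / 2   # trapezoid: width x mean height

print(auc)
```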

Reference cases:
- perfect classifier: AUC = 1
- random classifier: AUC ≈ 0.5
- inverted classifier: AUC < 0.5
So AUC is a single-number summary of the entire ROC curve.
3.2 AUC as a probability
AUC does not depend on any threshold choice.
It answers a very simple question:
If I randomly pick one positive example and one negative example, what is the probability that the model gives the positive one a higher score?
That is all.
- AUC = 1.0 means perfect ranking (positives always score higher)

- AUC = 0.5 means random ranking (it's basically a coin flip)

- AUC < 0.5 means the ranking is inverted (negatives tend to get higher scores)
This definition is very useful, because it restates the key point:
AUC depends only on the ranking of the scores, not on their absolute values.
This is why ROC AUC works even when the “scores” are not calibrated probabilities. They can be raw scores, margins, leaf ratios, or any monotonic confidence value. As long as higher means “more likely positive”, AUC can evaluate the ranking quality.
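This probabilistic reading can be verified directly by counting pairs. The scores below are hypothetical; ties count for half, which matches the trapezoidal AUC on the staircase curve.

```python
# Hypothetical scores, split by true class.
pos_scores = [0.9, 0.7, 0.7]
neg_scores = [0.8, 0.4, 0.7]

wins = ties = 0
for p in pos_scores:
    for n in neg_scores:
        if p > n:
            wins += 1        # positive outranks negative
        elif p == n:
            ties += 1        # tie counts for half

auc = (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))
print(auc)
```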
Conclusion
The confusion matrix evaluates the model at one threshold, but classifiers produce scores, not decisions.
ROC and AUC evaluate the model across all thresholds by focusing on ranking rather than on a single decision.
Ultimately, AUC answers a simple question: how often does a positive example score higher than a negative one?
Seen this way, ROC AUC is a ranking metric, and a spreadsheet is enough to make every step transparent.


