Your Classifier Is Broken, But It's Still Working | by David Lindelöf | January, 2025

When you run a binary classifier over a population, you get an estimate of the proportion of positive cases in that population. This proportion is known as the prevalence.
But that estimate is biased, because no classifier is perfect. For example, if your classifier flags 20% of cases as positive but its precision is known to be only 50%, you would expect the true prevalence to be 0.2 × 0.5 = 0.1, i.e. 10%. That, however, assumes perfect recall (every true positive gets flagged). If the recall is less than 1, the classifier has missed some true positives, so you also need to divide the prevalence estimate by the recall.
This leads to a general formula for recovering the true prevalence Pr(y=1) from the predicted positive rate Pr(ŷ=1), the precision P and the recall R:
Pr(y=1) = Pr(ŷ=1) × P / R
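As a quick sanity check, here is the example above in R; the 20% positive rate and 50% precision come from the text, while the recall of 0.8 is only an assumption for illustration:
# predicted positive rate and precision from the example above
predicted_rate <- 0.2
precision <- 0.5
# hypothetical recall, assumed only for illustration
recall <- 0.8
# adjust: scale by precision, then divide by recall
predicted_rate * precision / recall  # 0.125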
But suppose you want to use the classifier more than once. For example, you may want to run it periodically to detect trends in the prevalence. You can no longer use this formula, because the precision itself depends on the prevalence. To apply it you would have to re-estimate the precision every time (say, with a human evaluation), at which point you might as well estimate the prevalence directly.
How do we get out of this circular reasoning? It turns out that binary classifiers have other performance metrics (besides precision) that do not depend on the prevalence. These are the recall R and the specificity S, and they can be used to adjust Pr(ŷ=1) into an unbiased estimate of the true prevalence with this formula (sometimes called prevalence correction):
Pr(y=1) = (Pr(ŷ=1) - (1 - S)) / (R - (1 - S))
where:
- Pr(y=1) is the true prevalence
- S is the specificity
- R is the sensitivity, also known as the recall
- Pr(ŷ=1) is the predicted positive rate
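As a minimal sketch, this correction can be wrapped in a small R helper (the function name adjusted_prevalence is my own, not from the post):
# apply the prevalence correction formula above
# predicted_rate = Pr(ŷ=1), recall = R, specificity = S
adjusted_prevalence <- function(predicted_rate, recall, specificity) {
  (predicted_rate - (1 - specificity)) / (recall - (1 - specificity))
}
# example with made-up numbers: 30% flagged positive, recall 0.7, specificity 0.9
adjusted_prevalence(0.3, 0.7, 0.9)  # 0.333...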
The proof is straightforward. By the law of total probability,
Pr(ŷ=1) = Pr(ŷ=1 | y=1) Pr(y=1) + Pr(ŷ=1 | y=0) Pr(y=0) = R Pr(y=1) + (1 - S) (1 - Pr(y=1))
Solving for Pr(y=1) gives the above formula.
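A quick numerical check of that identity, with prevalence, recall and specificity values chosen arbitrarily for illustration:
# pick arbitrary values for the true prevalence, recall and specificity
prev <- 0.35; R <- 0.8; S <- 0.9
# law of total probability gives the predicted positive rate
p_hat <- R * prev + (1 - S) * (1 - prev)
# applying the correction formula recovers the true prevalence
(p_hat - (1 - S)) / (R - (1 - S))  # 0.35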
Note that this formula blows up when the denominator R - (1 - S) becomes 0, that is, when the recall R equals the false positive rate 1 - S. But remember what a typical ROC curve looks like:
An ROC curve like this one plots R (the true positive rate) against the false positive rate 1 - S, so a classifier for which R = 1 - S falls on the diagonal of the ROC diagram. Such a classifier is no better than a random guess: true and false cases are equally likely to be flagged positive, so the classifier carries no information whatsoever, and you will learn nothing from it, least of all the true prevalence.
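To see this degenerate case in numbers, here is an illustrative simulation (not from the original post) of a coin-flip classifier: its recall is close to its false positive rate, so the denominator of the correction is near zero and the adjusted estimate is meaningless.
# a coin-flip classifier that ignores the data entirely
y <- runif(10000) < 0.3       # true labels, 30% prevalence
y_hat <- runif(10000) < 0.5   # random guesses
recall <- mean(y_hat[y])      # about 0.5
fpr <- mean(y_hat[!y])        # also about 0.5, so recall - (1 - specificity) is near 0
(mean(y_hat) - fpr) / (recall - fpr)  # unstable, says nothing about the true 0.3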
Enough theory, let's see how this works in practice:
# randomly draw some covariate
x <- runif(10000, -1, 1)
# transform to a probability (inverse logit) and draw the outcome
p <- plogis(x)
y <- runif(10000) < p
# fit a logistic regression model
m <- glm(y ~ x, family = binomial)
# make some (deliberately bad) predictions: flag as positive when the
# predicted probability falls below an absurdly low threshold of 0.3
y_hat <- predict(m, type = "response") < 0.3
# get the recall (aka sensitivity) and specificity
cm <- caret::confusionMatrix(factor(y_hat), factor(y), positive = "TRUE")
recall <- unname(cm$byClass['Sensitivity'])
specificity <- unname(cm$byClass['Specificity'])
# get the adjusted prevalence
(mean(y_hat) - (1 - specificity)) / (recall - (1 - specificity))
# compare with actual prevalence
mean(y)
In this simulation I get a recall of 0.049 and a specificity of 0.875. The predicted prevalence is a ridiculously biased 0.087, but the adjusted prevalence comes out essentially equal to the true prevalence (0.498).
Summary: this shows how, given the recall and specificity of a classifier, you can adjust the predicted prevalence to track the true prevalence over time, provided the recall and specificity remain stable. You cannot do this with precision and recall, because precision depends on the prevalence, while recall and specificity do not.
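As a final illustration (my own sketch, not from the original post, with made-up helper names classify and metrics): the same decision rule applied to two populations with different prevalences keeps roughly the same recall and specificity, while its precision shifts with the prevalence.
# one fixed decision rule: flag as positive when the covariate exceeds 0
classify <- function(x) x > 0
# precision, recall and specificity of that rule on a population with a given prevalence
metrics <- function(prev, n = 100000) {
  y <- runif(n) < prev                    # true labels at the given prevalence
  x <- rnorm(n, mean = ifelse(y, 1, -1))  # positives score higher on average
  y_hat <- classify(x)
  c(precision   = mean(y[y_hat]),
    recall      = mean(y_hat[y]),
    specificity = mean(!y_hat[!y]))
}
metrics(prev = 0.5)  # precision, recall and specificity all around 0.84
metrics(prev = 0.1)  # recall and specificity barely move, precision drops sharply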