
Keeping Probabilities Honest: Jacobian Correction

Introduction

Suppose you are modeling customer frustration at waiting times. Calls arrive randomly, so the wait time X follows an exponential distribution—most waits are short, a few are painfully long.

Now, I would argue that irritation is not linear: a 10-minute wait feels more than twice as bad as a 5-minute wait. So you decide to model the “irritation units” as \( Y = X^2 \).

Easy, right? Just take the pdf of X, replace x with \( \sqrt{y} \), and you're done.

The resulting curve even looks reasonable—high near zero, long tail.

But what if you actually integrated it over all possible values? You would expect 1, right?

The result? 2.

A short snippet confirms this:
import numpy as np
from scipy.integrate import quad

# Wrong (naive) pdf for Y = X²: just substitute x = sqrt(y)
def wrong_pdf(y):
    return np.exp(-np.sqrt(y))

# Quick numerical check of the integral over [0, inf)
integral, err = quad(wrong_pdf, 0, np.inf)
print(f"Numerical integral ≈ {integral:.3f} (should be 1, but it's 2)")

Your new “distribution” claims the total probability of all possible outcomes is 2—double what it should be.

That's impossible… but it happened because you skipped one small fix.

This “correction” is a Jacobian—a scaling factor that compensates for how the transformation stretches or compresses the axis at different points. Skip it, and your probabilities are wrong. Include it, and everything adds up again.

In this post, we'll build intuition, work through the math step by step, visualize the stretching and shrinking, verify the correction through simulation, and see it arise naturally in histogram equalization.


Intuition

To understand why the Jacobian correction is necessary, let's use a concrete analogy: think of a probability distribution as a fixed amount of sand—1 pound, to be exact—spread out along a number line, where the height of the sand pile at each point represents the probability density. The total sand always adds up to 1 pound, representing 100% probability.

Now, when you change a random variable (say, from X to Y = X²), it's like grabbing that number line—a flexible rubber sheet—and warping it according to the transformation. You do not add or remove sand; you only stretch or compress different parts of the sheet.

In regions where the transformation compresses the sheet (a wide stretch of the original line is squeezed into a short stretch of the new Y-axis), the same amount of sand now occupies less horizontal space. To keep the total fixed, the pile must grow taller—the density increases. For example, near Y = 0 in the squaring transformation, the X values from 0 to 0.1 are crammed into the tiny Y interval from 0 to 0.01, so the density increases sharply.

Conversely, in regions where the transformation stretches the sheet (a short stretch of the original line is drawn out longer on the Y-axis), the sand spreads over a wider area, making the pile shorter and flatter—the density decreases. For large X (say, from 10 to 11), Y goes from 100 to 121—a much wider interval—so the density decreases there.

Bottom line: the total sand is always exactly 1 pound, no matter how you warp the sheet. Without accounting for this stretching and compression of the axis, your new density will be inconsistent—like claiming you have 2 pounds of sand after the warp. The Jacobian is the mathematical factor that automatically adjusts the height everywhere to keep the total at exactly 1.
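This conservation can be checked directly by simulation (a minimal sketch; the interval endpoints are arbitrary choices of mine): the probability mass in any X-interval [a, b] equals the mass in the mapped Y-interval [a², b²].

```python
import numpy as np

# Sand is conserved: the mass in an X-interval [a, b] equals the mass
# in the mapped Y-interval [a^2, b^2] under Y = X^2 (for X >= 0).
rng = np.random.default_rng(0)
x = rng.exponential(scale=1, size=1_000_000)
y = x**2

for a, b in [(0.0, 0.5), (0.5, 1.0), (2.0, 3.0)]:
    mass_x = np.mean((x >= a) & (x < b))
    mass_y = np.mean((y >= a**2) & (y < b**2))
    print(f"P({a} <= X < {b}) = {mass_x:.4f}   P({a**2} <= Y < {b**2}) = {mass_y:.4f}")
```

The two columns match: warping the axis moves the sand around but never creates or destroys it.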


Mathematics

Let's formalize the interval-matching argument with the example \( Y = g(X) = X^2 \), where \( X \) has pdf \( f_X(x) = e^{-x} \) for \( x \geq 0 \) (Exponential with rate 1).

Consider a small interval around \( x \) with width \( \Delta x \).
The probability in that interval is approximately \( f_X(x) \Delta x \).

After the transformation, this maps to an interval around \( y = x^2 \) with width
\( \Delta y \approx \left| g'(x) \right| \Delta x = |2x| \Delta x \).

Probability is conserved:
$$ f_Y(y) \Delta y \approx f_X(x) \Delta x, $$
so
$$ f_Y(y) \approx \frac{f_X(x)}{\left| g'(x) \right|}. $$

In the limit as \( \Delta x \to 0 \), this becomes exact:
$$ f_Y(y) = f_X(x) \left| \frac{dx}{dy} \right|, $$
where \( x = \sqrt{y} \) (the inverse map) and
\( \frac{dx}{dy} = \frac{1}{2\sqrt{y}} \).

Plugging in:
$$ f_Y(y) = e^{-\sqrt{y}} \cdot \frac{1}{2\sqrt{y}} \quad \text{for } y > 0. $$
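We can sanity-check the interval-matching at a single point (a minimal sketch; the point x = 2 and the width 1e-4 are arbitrary choices):

```python
import numpy as np

# Interval matching at x = 2, y = x^2 = 4: the probability mass in a small
# X-interval should equal the mass in the mapped Y-interval.
x0 = 2.0
dx = 1e-4
y0 = x0**2
dy = 2 * x0 * dx                                  # |g'(x)| * dx

f_x = np.exp(-x0)                                 # density of X at x0
f_y = np.exp(-np.sqrt(y0)) / (2 * np.sqrt(y0))    # corrected density of Y at y0

print(f_x * dx)   # mass in the X-interval
print(f_y * dy)   # mass in the mapped Y-interval (same value)
```

The two masses agree: the corrected density is exactly what makes \( f_Y(y)\,\Delta y = f_X(x)\,\Delta x \).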

Without the Jacobian term \( \frac{1}{2\sqrt{y}} \), the naive
\( f_Y(y) = e^{-\sqrt{y}} \)
integrates to 2:

Let \( u = \sqrt{y} \), \( y = u^2 \), \( dy = 2u \, du \):
$$ \int_0^\infty e^{-\sqrt{y}} \, dy $$

$$ = \int_0^\infty e^{-u} \cdot 2u \, du $$

$$ = 2 \int_0^\infty u e^{-u} \, du $$

$$ = 2\,\Gamma(2) = 2 \cdot 1 = 2. $$

The Jacobian correction ensures $$ \int_0^\infty f_Y(y) \, dy = 1. $$
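A minimal numerical check (using SciPy's quad, as in the intro snippet) confirms the corrected density integrates to 1:

```python
import numpy as np
from scipy.integrate import quad

# Corrected pdf for Y = X^2 with X ~ Exp(1), including the Jacobian 1/(2*sqrt(y))
def correct_pdf(y):
    return np.exp(-np.sqrt(y)) / (2 * np.sqrt(y))

# The 1/sqrt(y) singularity at 0 is integrable; quad handles it
integral, err = quad(correct_pdf, 0, np.inf)
print(f"Integral of corrected pdf ≈ {integral:.6f}")
```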

Note on \( \Gamma \)

\( \Gamma \) extends the factorial to real numbers: $$ \Gamma(n) = (n-1)! \quad \text{for positive integers } n. $$
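A quick check of this identity for small integers, using the standard library:

```python
import math

# Gamma extends the factorial: Gamma(n) = (n-1)! for positive integers n
for n in range(1, 6):
    print(f"Gamma({n}) = {math.gamma(n):.0f}   ({n}-1)! = {math.factorial(n - 1)}")
```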

This scaling factor \( \left| \frac{dx}{dy} \right| \) is exactly what compensates for the local stretching and shrinking of the axis.

Standard Form

Let \( Y = g(X) \), where \( g \) is a monotonic (strictly increasing or decreasing) differentiable function, and \( X \) has pdf \( f_X(x) \).

We want the pdf \( f_Y(y) \) of \( Y \).

Consider a small interval around \( x \) with width \( \Delta x \).
The probability in that interval is approximately \( f_X(x) \Delta x \).

After the transformation \( y = g(x) \), this interval maps to an interval around \( y \) with width
\( \Delta y \approx \left| g'(x) \right| \Delta x \).

Going back to the equation we derived earlier:
$$ f_Y(y) = f_X(x) \left| \frac{dx}{dy} \right|, $$
where we use the inverse relation \( x = h(y) = g^{-1}(y) \), and
\( \frac{dx}{dy} = h'(y) = \frac{1}{g'(x)} \).

So the general formula is
$$ f_Y(y) = f_X(h(y)) \left| h'(y) \right|. $$
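As a sketch, the general formula translates directly into code; the helper name `transform_pdf` and its arguments are my own, not a standard API, and I assume SciPy for the exponential pdf:

```python
import numpy as np
from scipy.stats import expon

# General change of variables: f_Y(y) = f_X(h(y)) * |h'(y)|, where h = g^{-1}
def transform_pdf(f_x, h, h_prime):
    """Build f_Y from f_X, the inverse map h, and its derivative h'."""
    def f_y(y):
        return f_x(h(y)) * np.abs(h_prime(y))
    return f_y

# For g(x) = x^2 on x >= 0: h(y) = sqrt(y), h'(y) = 1 / (2*sqrt(y))
f_y = transform_pdf(expon.pdf, np.sqrt, lambda y: 1 / (2 * np.sqrt(y)))
print(f_y(1.0))  # e^{-1}/2 ≈ 0.1839
```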


Empirical evidence

Simulating stretching and shrinking

The best way to “feel” the stretching and compression is to zoom into two regions separately: near zero (where compression occurs) and farther out (where stretching dominates).

We will make four plots:

1. The original X histogram, zoomed to small values (X < 1), with equal intervals of width 0.1 — to show where the compression originates.

2. The corresponding Y = X² histogram, zoomed around zero — showing how those small X intervals shrink into even smaller Y intervals (compression).

3. The original X histogram for larger values (X > 1), with equal intervals of width 1 — to show where the stretching originates.

4. The corresponding Y histogram for large values — showing how those X intervals explode into wide Y intervals (stretching).

The code
import numpy as np
import matplotlib.pyplot as plt

# Generate large sample for clear visuals
n = 50000
x = np.random.exponential(scale=1, size=n)
y = x**2

fig = plt.figure(figsize=(16, 10))

def plot_histogram(ax, data, bins, density, color, alpha, title, xlabel, ylabel):
    ax.hist(data, bins=bins, density=density, color=color, alpha=alpha)
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)

# Plot 1: X small values (compression source)
ax1 = fig.add_subplot(2, 2, 1)
plot_histogram(ax1, x[x < 1], bins=50, density=True, color='skyblue', alpha=0.7,
               title='Original X (zoomed X < 1)', xlabel='X', ylabel='Density')

# Equal-width intervals of 0.1 on small X
small_x_lines = np.arange(0, 1.01, 0.1)
for line in small_x_lines:
    ax1.axvline(line, color='red', linestyle='--', alpha=0.8)

# Plot 2: Y near zero (showing shrink/compression)
ax2 = fig.add_subplot(2, 2, 2)
plot_histogram(ax2, y[y < 1], bins=100, density=True, color='lightcoral', alpha=0.7,
               title='Y = X² near zero (compression visible)', xlabel='Y', ylabel='Density')

# Mapped small intervals on Y (very narrow!)
small_y_lines = small_x_lines**2
for line in small_y_lines:
    ax2.axvline(line, color='red', linestyle='--', alpha=0.8)

# Plot 3: X larger values (stretching source)
ax3 = fig.add_subplot(2, 2, 3)
plot_histogram(ax3, x[(x > 1) & (x < 12)], bins=50, density=True, color='skyblue', alpha=0.7,
               title='Original X (X > 1)', xlabel='X', ylabel='Density')

# Equal-width intervals of 1 on larger X
large_x_starts = [1, 3, 5, 7, 9, 11]
large_x_lines = large_x_starts + [s + 1 for s in large_x_starts]
for line in large_x_lines:
    if line < 12:
        ax3.axvline(line, color='red', linestyle='--', alpha=0.8)

# Plot 4: Y large values (showing stretch)
ax4 = fig.add_subplot(2, 2, 4)
plot_histogram(ax4, y[(y > 1) & (y < 150)], bins=80, density=True, color='lightgreen', alpha=0.7,
               title='Y = X² large values (stretching visible)', xlabel='Y', ylabel='Density')

# Mapped large intervals on Y (huge gaps!)
large_y_lines = np.array(large_x_lines)**2
for line in large_y_lines:
    if line < 150:
        ax4.axvline(line, color='red', linestyle='--', alpha=0.8)

# Update annotation styles with modified font style
fig.text(0.5, 0.75, "X-axis shrinking → Density increases\nY-axis higher near zero", 
         ha='center', va='center', fontsize=12, color='black', 
         fontstyle='italic', bbox=dict(facecolor='#f7f7f7', edgecolor='none', alpha=0.9))

fig.text(0.5, 0.25, "X-axis stretching → Density decreases\nY-axis lower for large values", 
         ha='center', va='center', fontsize=12, color='black', 
         fontstyle='italic', bbox=dict(facecolor='#f7f7f7', edgecolor='none', alpha=0.9))

fig.set_dpi(250)
plt.tight_layout()
plt.show()
Depending on the region, the transformation stretches or compresses the axis, and the density rises or falls to compensate.

Simulating the Jacobian correction

To see the Jacobian correction in action, let's simulate data from the Exponential(1) distribution for X, compute Y = X², and plot the empirical histogram of Y against the theoretical pdfs for increasing sample sizes n. As n grows, the histogram should converge to the corrected pdf, not the naive one.

The code
import numpy as np
import matplotlib.pyplot as plt

def correct_pdf(y):
    return np.exp(-np.sqrt(y)) / (2 * np.sqrt(y))

def naive_pdf(y):
    return np.exp(-np.sqrt(y))

# Sample sizes to test
sample_sizes = [100, 1_000, 10_000]

fig, axs = plt.subplots(1, len(sample_sizes), figsize=(15, 5))

y_vals = np.linspace(0.01, 50, 1000)  # Range for plotting theoretical pdfs

for i, n in enumerate(sample_sizes):
    # Sample X ~ Exp(1)
    x = np.random.exponential(scale=1, size=n)
    y = x**2
    
    # Plot histogram (normalized to density)
    axs[i].hist(y, bins=50, range=(0, 50), density=True, alpha=0.6, color='skyblue', label='Empirical Histogram')
    
    # Plot theoretical pdfs
    axs[i].plot(y_vals, correct_pdf(y_vals), 'g-', label='Correct PDF (with Jacobian)')
    axs[i].plot(y_vals, naive_pdf(y_vals), 'r--', label='Naive PDF (no Jacobian)')
    
    axs[i].set_title(f'n = {n}')
    axs[i].set_xlabel('Y = X²')
    axs[i].set_ylabel('Density')
    axs[i].legend()
    axs[i].set_ylim(0, 0.5)  # For consistent viewing
    axs[i].grid(True)  # Add grid to each subplot

# Set the figure DPI to 250 for higher resolution
fig.set_dpi(250)

plt.tight_layout()
plt.show()

And the result is what we expect.

Proof of the Jacobian correction: the green curve (with the Jacobian) fits the sampled data well; the naive red curve does not.

Histogram Equalization: a real-world application

Summary plot of histogram equalization, from my earlier post on the topic.

A classic example where the Jacobian correction appears naturally is histogram equalization in image processing.

We treat the pixel intensity X (usually in \([0, 255]\)) as samples from some distribution whose pdf \( f_X \) is given by the image histogram.

The goal is to transform to a new intensity Y so that Y is approximately uniform on \([0, 255]\) — this spreads out the values and improves contrast.

The transformation used is the (scaled) cumulative distribution function (CDF) of X:

$$ Y = 255 \cdot F_X(X) $$

where \( F_X(x) = \int_{-\infty}^{x} f_X(t) \, dt \).

Why does this work? It is a straightforward application of the Probability Integral Transform (PIT):

If \( Y = F_X(X) \) and \( X \) is continuous, then \( Y \sim \) Uniform([0, 1]).

Scaling by 255 gives Uniform([0,255]).
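The PIT is easy to verify by simulation (a minimal sketch; the seed and sample size are arbitrary): push exponential samples through their own CDF and test the result for uniformity.

```python
import numpy as np
from scipy.stats import kstest, expon

# Probability Integral Transform: Y = F_X(X) should be Uniform(0, 1)
rng = np.random.default_rng(42)
x = rng.exponential(scale=1, size=10_000)
u = expon.cdf(x)  # Y = F_X(X)

# Kolmogorov-Smirnov test against the uniform distribution
stat, pvalue = kstest(u, 'uniform')
print(f"KS statistic vs Uniform(0,1): {stat:.4f} (p = {pvalue:.3f})")
```

A small KS statistic (and non-tiny p-value) means the transformed samples are indistinguishable from uniform, as the PIT predicts.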

Now see the Jacobian at work:

Let \( g(x) = L \cdot F_X(x) \) (with \( L = 255 \)).

The derivative is \( g'(x) = L \cdot f_X(x) \) (since the derivative of the CDF is the pdf).

Applying the change-of-variables formula:

$$ f_Y(y) = \frac{f_X(x)}{|g'(x)|} = \frac{f_X(x)}{L \, f_X(x)} = \frac{1}{L}. $$

\( f_X(x) \) cancels perfectly, leaving a constant (uniform) density.

The Jacobian factor \( 1 / |g'(x)| \) flattens the distribution automatically by compensating for regions where the original density was high or low.

For discrete images, quantization makes the result only approximately uniform, but the principle is the same.
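As a sketch of the discrete version (a synthetic low-contrast "image" and the empirical CDF; the variable names are my own), equalization takes only a few lines of NumPy:

```python
import numpy as np

# Synthetic low-contrast 8-bit image: intensities bunched in a narrow band
rng = np.random.default_rng(0)
img = np.clip(rng.normal(loc=100, scale=15, size=(64, 64)), 0, 255).astype(np.uint8)

# Empirical CDF of pixel intensities, then Y = 255 * F_X(X) per pixel
hist = np.bincount(img.ravel(), minlength=256)
cdf = np.cumsum(hist) / img.size
equalized = np.round(255 * cdf[img]).astype(np.uint8)

print("original range: ", img.min(), "-", img.max())
print("equalized range:", equalized.min(), "-", equalized.max())
```

The equalized image spans nearly the full \([0, 255]\) range, which is exactly the contrast boost the Jacobian cancellation predicts.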

For a deeper dive into histogram equalization with examples, see my previous post: here.


In conclusion

The Jacobian correction is one of those quiet pieces of math that feels unnecessary—until you skip it and suddenly your probabilities don't add up to 1 anymore. Whether you're modeling wait times, deriving energy from speed, or flattening image histograms, a transformation doesn't just change the values but also how probability is distributed across them. The factor \( \left| \frac{dx}{dy} \right| \) (or its multivariate cousin, the Jacobian determinant) is exactly the compensation that keeps total probability at 1 by accounting for local stretching and compression.

The next time you transform a random variable, remember the sand on the rubber sheet: warp the axis all you want, but the total amount of sand must not change. The Jacobian correction is the bookkeeping that makes sure it doesn't.


The code

Link to the Colab notebook


References and further reading

A few things I found helpful.

