Python 3.14 and its New JIT Compiler

0 1 9 minutes read

Python 3.14 marks a turning point in the development of the world's most popular programming language. Although Python has long been known for its readability and large ecosystem, its execution speed has often been the “elephant in the room.”

With 3.14, the CPython development team has brought not one, but two of the most anticipated features of recent times.

The end of the GIL

I have written about this before. True concurrency is now available in Python if you want it. If you want more details about GIL-free Python, I will leave a link to my article about it at the end.

The Just-In-Time (JIT) compiler

This experimental feature is now bundled directly into the official installers, and that's what we'll focus on here. It's the result of years of architectural work by the Python core team and others, aimed at making Python “automatically faster” without breaking the C-extension ecosystem that powers everything from data science to web backends.

In this article, we'll lift the hood of the new JIT, examine how it differs from previous optimization efforts, and walk through some benchmarks to help you decide if it's time to try the JIT on your own workloads.

What is Python's New Just-In-Time (JIT) compiler?

To understand the 3.14 JIT, we need to understand how Python traditionally runs code. Standard Python (CPython) is an interpreted language. When you run a script, your code is compiled into bytecode, which is a set of instructions executed by the CPython virtual machine.
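To make the bytecode step concrete, the standard-library dis module can list the instructions CPython generates for a function. This is only a small sketch: the add function is a made-up example, and the exact instruction names differ between Python versions.

```python
import dis

def add(a, b):
    # A trivial function whose bytecode is short and easy to read.
    return a + b

# Collect the names of the bytecode instructions CPython compiled for add().
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)

# dis.dis(add) prints the same information as a formatted listing.
```

These instructions are what the interpreter loop normally executes one at a time, and what the JIT translates when the code becomes hot.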

The JIT changes this flow. Instead of simply interpreting the bytecode instruction by instruction, the JIT monitors which parts of your code are executed the most (“hot” paths). When a function or loop is considered “hot,” the JIT translates its bytecode into native machine code (instructions that the CPU understands directly). The next time that code runs, no interpretation is needed; the machine code is executed directly. This can be a great time saver, as we will see later.

How JIT fits into CPython

The Python 3.14 JIT is not a total rewrite. It is designed as an add-on component that works alongside the existing interpreter. It uses a so-called “copy-and-patch” approach, which keeps the JIT lightweight and portable across many CPU architectures without requiring a large, complex compiler backend like LLVM at runtime.
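The idea behind copy-and-patch can be sketched in a few lines. This is a toy model, not CPython's implementation: the STENCILS table, the operation names, and the byte values are all hypothetical placeholders standing in for pre-built machine-code templates with "holes" for operands.

```python
import struct

# Hypothetical stencils: byte templates with an 8-byte hole (zeros) where an
# operand is written later. Real stencils contain machine code; these bytes
# are placeholders for illustration only.
STENCILS = {
    "LOAD_CONST": (b"\x01" + bytes(8), 1),  # (template, offset of the hole)
    "ADD_CONST": (b"\x02" + bytes(8), 1),
}

def copy_and_patch(ops):
    """Build a code buffer by copying each stencil and patching its hole."""
    out = bytearray()
    for name, operand in ops:
        template, hole = STENCILS[name]
        start = len(out)
        out += template                                      # copy step
        struct.pack_into("<q", out, start + hole, operand)   # patch step
    return bytes(out)

code = copy_and_patch([("LOAD_CONST", 40), ("ADD_CONST", 2)])
print(code.hex())
```

Because the templates are prepared ahead of time, the work left for runtime is mostly copying bytes and filling in values, which is why this style of JIT can stay small.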

What Changed in Python 3.14?

Python 3.13 had a basic, experimental JIT, but it was disabled by default. If you wanted to test it, you had to build CPython from source with a configure flag such as --enable-experimental-jit.

With Python 3.14, that changed. The JIT is now included in the official Windows and macOS installers, which means you no longer need a C compiler on your machine to try it. Although it is still “experimental,” its inclusion in the official binaries indicates that the core team believes the JIT is stable enough for public testing.

Getting Python 3.14

Go to the official Python website and, near the top, you will see a download option for 3.14. Click that, and follow the instructions.

Alternatively, if you have the uv tool installed, you can run the following command.

PS C:\> uv python install 3.14

Enabling the JIT

By default, the JIT is disabled. This is a safety measure: because it's experimental, the Python Steering Council wants to ensure that users don't face unexpected regressions in stability or memory usage without explicitly opting in.

To enable the JIT, you set an environment variable. This tells the CPython runtime to initialize the JIT at startup.

On Windows (PowerShell):

$env:PYTHON_JIT=1
python my_script.py

On macOS/Linux (Bash/Zsh):

PYTHON_JIT=1 python my_script.py
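To confirm from inside a script whether the variable took effect, you can query the interpreter. The sketch below assumes the sys._jit introspection namespace that ships with Python 3.14; it is a private, underscore-prefixed API, so the lookup is guarded and the helper (jit_status is a name chosen for this example) falls back to None on interpreters that lack it.

```python
import sys

def jit_status():
    """Report JIT state, or None values if this interpreter lacks sys._jit."""
    jit = getattr(sys, "_jit", None)  # assumption: present on Python 3.14+
    if jit is None:
        return {"available": None, "enabled": None}
    return {"available": jit.is_available(), "enabled": jit.is_enabled()}

print(sys.version_info[:2], jit_status())
```

Running this once with PYTHON_JIT=1 and once without is a quick way to check that your benchmark is really comparing two different configurations.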

Once enabled, CPython does not compile everything immediately with the JIT. It uses a tiering system. Basically, it tries to run the code as cheaply as possible first, and only spends compilation/optimization effort on the parts that prove to be hot.

Tier 0: Standard interpretation.
Tier 1: Specialized bytecode (introduced in 3.11).
Tier 2 (JIT): Machine code generation for frequently executed code paths.
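The tiers above can be pictured with a toy model. This is purely illustrative and is not how CPython is implemented: the TieredFunction class, the threshold value, and the two square functions are hypothetical, and the "fast" path is just ordinary Python standing in for generated machine code.

```python
class TieredFunction:
    """Toy tier-up model: use a slow path until a call counter passes a threshold."""

    def __init__(self, slow, fast, threshold=3):
        self.slow, self.fast = slow, fast
        self.threshold = threshold  # hypothetical value, not CPython's
        self.calls = 0
        self.tier = 0

    def __call__(self, *args):
        self.calls += 1
        if self.tier == 0 and self.calls > self.threshold:
            self.tier = 2  # the code is now "hot": switch implementations
        impl = self.fast if self.tier == 2 else self.slow
        return impl(*args)

def slow_square(x):
    return sum(x for _ in range(x))  # deliberately roundabout

def fast_square(x):
    return x * x

square = TieredFunction(slow_square, fast_square)
results = [square(5) for _ in range(5)]
print(results, "tier:", square.tier)
```

The point of the model is the shape of the decision: cold code never pays the compilation cost, and hot code pays it once.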

Measuring the Impact of JIT

When testing the JIT, you can't simply wrap a single function call in time.time(). JITs need time to warm up. The first few iterations of a loop may be slower than usual while the JIT profiles the code, but subsequent iterations can be much faster.
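One way to account for warm-up when timing inside a single process is to run the function a few times untimed before measuring. The helper below is a minimal sketch, not part of the benchmark suite in this article: time_with_warmup, busy_loop, and the warmup/repeats defaults are names and values chosen for illustration.

```python
import time

def time_with_warmup(fn, warmup=3, repeats=5):
    """Call fn a few times untimed, then return per-call timings in seconds."""
    for _ in range(warmup):
        fn()  # untimed calls give the JIT a chance to warm up
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - t0)
    return timings

def busy_loop(n=50_000):
    # Hypothetical stand-in for a pure-Python hot loop.
    total = 0
    for i in range(n):
        total += i * i
    return total

timings = time_with_warmup(busy_loop)
print(f"best of {len(timings)}: {min(timings):.6f}s")
```

time.perf_counter() is used instead of time.time() because it is a monotonic, high-resolution clock intended for measuring short durations.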

The Benchmark Suite

Below is a small test program designed to exercise different aspects of the JIT, from heavy arithmetic to complex object management.

File 1: workloads.py

This file contains three separate CPU-bound tasks.

1/ The Mandelbrot function iterates the Mandelbrot formula over a pixel grid and returns a checksum: the sum of every pixel's iteration count.

2/ The Dijkstra function builds a deterministic (seeded) random weighted graph, runs Dijkstra's algorithm from node 0, and returns how many nodes were visited.

3/ The Levenshtein function generates N pairs of random strings and returns the sum of their Levenshtein distances.

from __future__ import annotations

import random
import heapq

# Workload 1: Mandelbrot (CPU + math loops)
def mandelbrot(width: int = 1000, height: int = 1000, iters: int = 500) -> int:
    checksum = 0
    for y in range(height):
        cy = (y / height) * 2.4 - 1.2
        for x in range(width):
            cx = (x / width) * 3.2 - 2.2
            zx, zy, count = 0.0, 0.0, 0
            while zx * zx + zy * zy <= 4.0 and count < iters:
                zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
                count += 1
            checksum += count
    return checksum

# Workload 2: Dijkstra (heap + list + logic)
def dijkstra(n: int = 10000, edges_per_node: int = 50, seed: int = 123) -> int:
    rng = random.Random(seed)
    graph = [[] for _ in range(n)]
    for u in range(n):
        for _ in range(edges_per_node):
            v = rng.randrange(n)
            if v != u:
                graph[u].append((v, rng.randrange(1, 30)))

    dist = [10**12] * n
    dist[0] = 0
    pq = [(0, 0)]
    visited = 0

    while pq:
        d, u = heapq.heappop(pq)
        if d != dist[u]:
            continue
        visited += 1
        for v, w in graph[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(pq, (nd, v))

    return visited

# Workload 3: Levenshtein distance (dynamic programming)
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(cur[j - 1] + 1, prev[j] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def levenshtein_batch(n: int = 10000, seed: int = 7, k: int = 50) -> int:
    """
    Deterministic batch: fixed RNG seed, fixed alphabet, fixed string length.
    Returns the sum of distances.
    """
    rng = random.Random(seed)
    alphabet = "abc"
    total = 0
    for _ in range(n):
        a = "".join(rng.choices(alphabet, k=k))
        b = "".join(rng.choices(alphabet, k=k))
        total += levenshtein(a, b)
    return total

File 2: benchmark.py

This script runs each workload in a fresh Python process with the JIT disabled and enabled, and compares the timings.

import os
import time
import json
import subprocess
from pathlib import Path

PYTHON_EXE = r"C:\Users\thoma\AppData\Local\Programs\Python\Python314\python.exe"
PROJECT_DIR = Path(__file__).resolve().parent

# Original workloads (statement prints a result for sanity)
WORKLOADS = [
    ("mandelbrot", 'from workloads import mandelbrot; print(mandelbrot())'),
    ("dijkstra", 'from workloads import dijkstra; print(dijkstra())'),
    ("levenshtein_batch", 'from workloads import levenshtein_batch; print(levenshtein_batch())'),
]

N_RUNS = 10  # average of ALL runs (set to 6/10/20 as you like)
OUTFILE = PROJECT_DIR / "results_avg.json"

def run_once(stmt: str, jit_val: int) -> tuple[float, str]:
    env = os.environ.copy()
    env["PYTHON_JIT"] = str(jit_val)

    # Ensure local workloads.py is importable in subprocess
    env["PYTHONPATH"] = str(PROJECT_DIR) + (os.pathsep + env.get("PYTHONPATH", ""))

    t0 = time.perf_counter()
    p = subprocess.run(
        [PYTHON_EXE, "-c", stmt],
        env=env,
        cwd=str(PROJECT_DIR),
        capture_output=True,
        text=True,
    )
    t1 = time.perf_counter()

    if p.returncode != 0:
        raise RuntimeError(
            f"Run failed (PYTHON_JIT={jit_val})\n\n"
            f"Statement:\n{stmt}\n\n"
            f"STDOUT:\n{p.stdout}\n\nSTDERR:\n{p.stderr}"
        )

    return (t1 - t0, p.stdout.strip())

def summarize(times: list[float]) -> dict:
    return {
        "avg": sum(times) / len(times),
        "min": min(times),
        "max": max(times),
        "runs": times,
    }

def bench_workload(name: str, stmt: str) -> dict:
    results = {}
    outputs = {}

    for jit_val in (0, 1):
        times = []
        outs = []
        print(f"  PYTHON_JIT={jit_val}: running {N_RUNS} times...")
        for i in range(1, N_RUNS + 1):
            dt, out = run_once(stmt, jit_val)
            times.append(dt)
            outs.append(out)
            print(f"    run {i}/{N_RUNS}: {dt:.6f}s")

        results[jit_val] = summarize(times)
        outputs[jit_val] = outs

    avg0 = results[0]["avg"]
    avg1 = results[1]["avg"]
    speedup = avg0 / avg1 if avg1 else float("inf")
    delta_pct = (avg1 - avg0) / avg0 * 100.0 if avg0 else 0.0

    return {
        "workload": name,
        "jit0": results[0],
        "jit1": results[1],
        "speedup_jit0_over_jit1": speedup,
        "delta_pct_jit1_vs_jit0": delta_pct,
        "outputs": outputs,  # sanity: should be stable
    }

def main() -> int:
    all_results = []
    print(f"Using Python: {PYTHON_EXE}")
    print(f"Project dir: {PROJECT_DIR}")
    print(f"Runs per setting (avg of all runs): {N_RUNS}\n")

    for name, stmt in WORKLOADS:
        print(f"=== {name} ===")
        r = bench_workload(name, stmt)
        all_results.append(r)

        print(f"\n  Averages:")
        print(f"    JIT=0 avg: {r['jit0']['avg']:.6f}s (min {r['jit0']['min']:.6f}, max {r['jit0']['max']:.6f})")
        print(f"    JIT=1 avg: {r['jit1']['avg']:.6f}s (min {r['jit1']['min']:.6f}, max {r['jit1']['max']:.6f})")
        print(f"    Speedup (JIT=0 / JIT=1): {r['speedup_jit0_over_jit1']:.3f}×  (Δ={r['delta_pct_jit1_vs_jit0']:+.2f}%)\n")

        # Optional: warn if outputs vary across runs (nondeterminism)
        if len(set(r["outputs"][0])) != 1:
            print("  !! WARNING: JIT=0 output differs across runs (nondeterministic workload?)")
        if len(set(r["outputs"][1])) != 1:
            print("  !! WARNING: JIT=1 output differs across runs (nondeterministic workload?)")

    OUTFILE.write_text(json.dumps(all_results, indent=2), encoding="utf-8")
    print(f"Wrote: {OUTFILE}")
    return 0

if __name__ == "__main__":
    raise SystemExit(main())

Here are my results.

C:\Users\thoma\projects\python_jit>C:\Users\thoma\AppData\Local\Programs\Python\Python314\python.exe benchmark.py
Using Python: C:\Users\thoma\AppData\Local\Programs\Python\Python314\python.exe
Project dir: C:\Users\thoma\projects\python_jit
Runs per setting (avg of all runs): 10

=== mandelbrot ===
  PYTHON_JIT=0: running 10 times...
    run 1/10: 6.890924s
    run 2/10: 6.950737s
    run 3/10: 7.265357s
    run 4/10: 6.947150s
    run 5/10: 6.932333s
    run 6/10: 6.939378s
    run 7/10: 7.194705s
    run 8/10: 6.995550s
    run 9/10: 6.902696s
    run 10/10: 7.256164s
  PYTHON_JIT=1: running 10 times...
    run 1/10: 5.216740s
    run 2/10: 5.241888s
    run 3/10: 5.350822s
    run 4/10: 5.246767s
    run 5/10: 5.294771s
    run 6/10: 5.273295s
    run 7/10: 5.272135s
    run 8/10: 5.617062s
    run 9/10: 5.251656s
    run 10/10: 5.239060s

  Averages:
    JIT=0 avg: 7.027499s (min 6.890924, max 7.265357)
    JIT=1 avg: 5.300420s (min 5.216740, max 5.617062)
    Speedup (JIT=0 / JIT=1): 1.326×  (Δ=-24.58%)

=== dijkstra ===
  PYTHON_JIT=0: running 10 times...
    run 1/10: 0.235401s
    run 2/10: 0.227603s
    run 3/10: 0.244492s
    run 4/10: 0.232971s
    run 5/10: 0.249589s
    run 6/10: 0.232229s
    run 7/10: 0.229422s
    run 8/10: 0.238399s
    run 9/10: 0.230657s
    run 10/10: 0.235772s
  PYTHON_JIT=1: running 10 times...
    run 1/10: 0.238862s
    run 2/10: 0.239266s
    run 3/10: 0.240312s
    run 4/10: 0.231413s
    run 5/10: 0.232692s
    run 6/10: 0.233783s
    run 7/10: 0.230016s
    run 8/10: 0.237760s
    run 9/10: 0.240895s
    run 10/10: 0.246033s

  Averages:
    JIT=0 avg: 0.235653s (min 0.227603, max 0.249589)
    JIT=1 avg: 0.237103s (min 0.230016, max 0.246033)
    Speedup (JIT=0 / JIT=1): 0.994×  (Δ=+0.62%)

=== levenshtein_batch ===
  PYTHON_JIT=0: running 10 times...
    run 1/10: 2.176256s
    run 2/10: 2.171253s
    run 3/10: 2.171834s
    run 4/10: 2.170444s
    run 5/10: 2.149874s
    run 6/10: 2.162820s
    run 7/10: 2.171975s
    run 8/10: 2.199151s
    run 9/10: 2.168398s
    run 10/10: 2.167821s
  PYTHON_JIT=1: running 10 times...
    run 1/10: 1.575666s
    run 2/10: 1.612615s
    run 3/10: 1.571106s
    run 4/10: 1.584650s
    run 5/10: 1.579948s
    run 6/10: 1.582633s
    run 7/10: 1.593924s
    run 8/10: 1.573608s
    run 9/10: 1.581427s
    run 10/10: 1.578553s

  Averages:
    JIT=0 avg: 2.170983s (min 2.149874, max 2.199151)
    JIT=1 avg: 1.583413s (min 1.571106, max 1.612615)
    Speedup (JIT=0 / JIT=1): 1.371×  (Δ=-27.06%)
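To see how the speedup and Δ figures follow from the averages, here is the same arithmetic the script performs, applied to the averages printed above. The compare helper is just a name used for this sketch.

```python
# Averages copied from the benchmark output above, in seconds: (JIT=0, JIT=1).
averages = {
    "mandelbrot": (7.027499, 5.300420),
    "dijkstra": (0.235653, 0.237103),
    "levenshtein_batch": (2.170983, 1.583413),
}

def compare(jit0, jit1):
    """Return (speedup, percentage change in runtime with the JIT on)."""
    speedup = jit0 / jit1
    delta_pct = (jit1 - jit0) / jit0 * 100.0
    return speedup, delta_pct

for name, (jit0, jit1) in averages.items():
    speedup, delta = compare(jit0, jit1)
    print(f"{name}: {speedup:.3f}x ({delta:+.2f}%)")
```

Note that the two numbers measure different things: a 1.326× speedup corresponds to a 24.58% reduction in runtime, not a 32.6% one.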

Interpreting the Results

As you can see, the results are a mixed bag. This is common in JIT testing.

10–30% Speedup: Typical of “pure Python” loops (such as the Mandelbrot or Levenshtein tests), where the JIT can avoid the overhead of the bytecode dispatch loop.
0% Speedup: Common for I/O-bound code or code that leans heavily on C extensions. The Dijkstra workload didn't speed up because its runtime is dominated by heap/tuple operations and allocation-heavy work that the current CPython JIT doesn't optimize, so any savings in the interpreter are lost in the noise.
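A rough way to check the "lost in the noise" claim is to compare the difference between the two averages with the run-to-run spread. The sketch below uses the Dijkstra timings from the output above; comparing against one standard deviation is a simple heuristic chosen for this example, not a formal significance test.

```python
import statistics

# Per-run Dijkstra timings copied from the benchmark output, in seconds.
jit0 = [0.235401, 0.227603, 0.244492, 0.232971, 0.249589,
        0.232229, 0.229422, 0.238399, 0.230657, 0.235772]
jit1 = [0.238862, 0.239266, 0.240312, 0.231413, 0.232692,
        0.233783, 0.230016, 0.237760, 0.240895, 0.246033]

diff = abs(statistics.mean(jit1) - statistics.mean(jit0))
spread = statistics.stdev(jit0)  # sample standard deviation of the baseline

print(f"difference of means: {diff:.6f}s, baseline stdev: {spread:.6f}s")
print("within noise" if diff < spread else "larger than run-to-run spread")
```

When the gap between the two configurations is smaller than the variation between identical runs, the honest conclusion is that the benchmark cannot tell them apart.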

When to use the Python 3.14 JIT

JIT is a powerful tool, but it is not a “magic button.” In my experience, you should try JIT if you have…

CPU-Bound Logic: Your application performs complex calculations, data processing, or complex logic in pure Python.
Long-Running Processes: Web servers (Gunicorn/Uvicorn) or background workers (Celery) run for hours, giving the JIT more time to warm up and optimize hot code paths.
Pilot Testing: You want to prepare your codebase for future versions of Python (3.15+), where the JIT will likely be more aggressive.

And avoid it if you have…

I/O-Bound Applications: If your application is just waiting for database queries or API responses, JIT will not help.
Memory-Constrained Environments: Small serverless (Lambda) functions or small containers may suffer from the extra memory used by the JIT's code cache.
Short-lived CLI tools: A script that runs in less than a second doesn't need a JIT.

Future directions: Beyond 3.14

The CPython core team views 3.14 as a “baseline year.” Future iterations (Python 3.15 and 3.16) are expected to include:

Advanced Optimization Passes: Using type information gathered at runtime to generate more aggressive machine code.
Better Heuristics: Smarter decisions about when to compile, reducing the “warm-up” penalty.
Lower Overhead: Refining the copy-and-patch method to reduce memory usage.

Summary

The JIT for Python 3.14 is more than just a performance patch. It is a statement of intent. It shows that Python is committed to closing the performance gap with languages like Java or Go while maintaining the “batteries-included” simplicity that made it popular.

For many developers, JIT is just another tool worth exploring. If performance is important to your projects, it's worth testing Python 3.14 against your existing workload. A few benchmarks on your most important code paths may reveal performance gains you didn't expect.

Here is a link to my previous article on GIL-free Python, which I mentioned earlier.


nimda 3 hours ago
