
Python 3.14 and the End of the GIL

The most anticipated release in recent times is finally here, and for good reason: several exciting improvements have been implemented in this release, including:

Sub-interpreters. These have been available inside CPython for over 20 years, but to use them, you had to drop down to C code. Now they can be used directly from Python itself.
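
For the curious, here is a minimal sketch of what that looks like, assuming the PEP 734 API (the new concurrent.interpreters module) landed in 3.14 as specified; treat the exact names as provisional:

# A minimal sub-interpreter sketch (PEP 734, assumed API).
# Each sub-interpreter has its own module state and, in the
# free-threaded build, can run independently of the others.
from concurrent import interpreters

interp = interpreters.create()
interp.exec("print('hello from a sub-interpreter')")
interp.close()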

T-strings. Template strings are a new mechanism for custom string processing. They use the same syntax as f-strings but, unlike f-strings, they return an object that represents both the static and the interpolated parts of the string, instead of a plain string.
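
Here is a minimal t-string sketch, based on PEP 750 (the attribute names below are as given in the PEP):

# A t-string evaluates to a string.templatelib.Template, not a str,
# so custom processing (escaping, logging, SQL, etc.) can inspect
# the static and interpolated parts separately.
name = "world"
greeting = t"Hello {name}!"

print(greeting.strings)                             # static parts: ('Hello ', '!')
print([i.value for i in greeting.interpolations])   # interpolated values: ['world']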

A just-in-time (JIT) compiler. This is still an experimental feature and should not be used in production applications; however, it promises significant speed-ups for certain use cases.

There are many more improvements in Python 3.14, but this article isn't about any of those.

Instead, we will be discussing what is probably the most anticipated feature of this release: free-threaded Python, also known as GIL-free Python. Note that standard Python 3.14 still runs with the GIL enabled, but you can download (or build) a separate free-threaded version. I will show you how to download and install it and, with several code examples, compare running times between standard and free-threaded Python 3.14.

What is the GIL?

Many of you will be familiar with the Global Interpreter Lock (GIL) in Python. The GIL is a mutex (a mutual-exclusion mechanism) used to synchronize access to resources, and in Python, it ensures that only one thread executes bytecode at a time.

On the one hand, this has several advantages, including simplified threading and memory management, the avoidance of race conditions, and easier integration of Python with C/C++ libraries.

On the other hand, the GIL limits concurrency. With the GIL in place, true parallelization of CPU-bound tasks across CPU cores within a single Python process is not possible.

Why is this important?

In a word, “performance”.

Because free-threaded execution can use all available cores in your system at the same time, code tends to run faster. For data scientists and ML or data engineers, this applies not only to your own code but also to the code that makes up the systems, frameworks, and libraries you rely on.

Many machine learning and data science tasks are CPU-intensive, especially during model training and data processing. Removing the GIL can lead to significant performance improvements for these CPU-bound tasks.

Many popular Python libraries face constraints because they have had to work around the GIL. Its removal could lead to:

  • Simpler and potentially faster implementations of these libraries
  • New opportunities to optimize existing libraries
  • Development of new libraries that can take full advantage of parallel processing

Installing the free-threaded version of Python

If you're a Linux user, the only way to get free-threaded Python is to build it yourself (the relevant configure flag is --disable-gil). If, like me, you're on Windows (or macOS), you can install it using the official installer from the Python website. During the process, you will have the option to customize your installation. Look for the check box to install the free-threaded binaries. This installs a separate interpreter that you can use to run your code without the GIL. I will show how the installation works on a 64-bit version of Windows.

To get started, go to the downloads page on the official Python website and scroll down until you see a table that looks like this.

Image from the Python website

Now, click on the Windows (64-bit) link. Once the installer has downloaded, open it and, on the first installation screen shown, click the Customize installation link. Note that I also ticked the 'Add python.exe to PATH' check box.

On the next screen, select the optional extras you want to add to the installation and click Next. At this point, you should see a screen like this:

Image from the Python installer

Confirm that the check box next to 'Download free-threaded binaries' is selected. I also ticked the 'Install Python 3.14 for all users' option.

Click the install button.

When the installation is complete, look in the installation folder for the Python executable with a 't' at the end of its name. This is the free-threaded version of Python. The executable simply called python.exe is the standard interpreter. In my case, the GIL-free Python was called python3.14t. You can check that it is installed correctly by typing this on the command line.

C:\Users\thoma>python3.14t

Python 3.14.0 free-threading build (tags/v3.14.0:ebf955d, Oct  7 2025, 10:13:09) [MSC v.1944 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> 

If you see this, you're all set. If not, check that the installation location has been added to your PATH environment variable, and/or double-check your installation steps.

Since we are going to compare the GIL-free Python run times with the standard Python run times, we should also make sure that the standard version is installed correctly.

C:\Users\thoma>python
Python 3.14.0 (tags/v3.14.0:ebf955d, Oct  7 2025, 10:15:03) [MSC v.1944 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
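
Incidentally, you can also check from inside a script which build you are running. Here is a minimal sketch (sys._is_gil_enabled() was added in Python 3.13, hence the hasattr guard):

import sys
import sysconfig

# Build-time flag: 1 on a free-threaded build, 0 otherwise
# (None on builds that predate the flag).
print("Py_GIL_DISABLED:", sysconfig.get_config_var("Py_GIL_DISABLED"))

# Runtime check: even on a free-threaded build, the GIL can be re-enabled,
# e.g. by an incompatible extension module or by setting PYTHON_GIL=1.
if hasattr(sys, "_is_gil_enabled"):
    print("GIL enabled:", sys._is_gil_enabled())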

GIL vs GIL-free Python

Example 1 – Finding prime numbers

Type the following into a Python file, e.g. example1.py.

#
# example1.py
#

import threading
import time
import multiprocessing

def is_prime(n):
    """Check if a number is prime."""
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def find_primes(start, end):
    """Find all prime numbers in the given range."""
    primes = []
    for num in range(start, end + 1):
        if is_prime(num):
            primes.append(num)
    return primes

def worker(worker_id, start, end):
    """Worker function to find primes in a specific range."""
    print(f"Worker {worker_id} starting")
    primes = find_primes(start, end)
    print(f"Worker {worker_id} found {len(primes)} primes")

def main():
    """Main function to coordinate the multi-threaded prime search."""
    start_time = time.time()

    # Get the number of CPU cores
    num_cores = multiprocessing.cpu_count()
    print(f"Number of CPU cores: {num_cores}")

    # Define the range for prime search
    total_range = 2_000_000
    chunk_size = total_range // num_cores

    threads = []
    # Create and start threads equal to the number of cores
    for i in range(num_cores):
        start = i * chunk_size + 1
        end = (i + 1) * chunk_size if i < num_cores - 1 else total_range
        thread = threading.Thread(target=worker, args=(i, start, end))
        threads.append(thread)
        thread.start()

    # Wait for all threads to complete
    for thread in threads:
        thread.join()

    # Calculate and print the total execution time
    end_time = time.time()
    total_time = end_time - start_time
    print(f"All workers completed in {total_time:.2f} seconds")

if __name__ == "__main__":
    main()

The is_prime function tests whether a given number is prime.

The find_primes function finds all the prime numbers within a given range.

The worker function is the target of each thread; it finds the primes within a specific sub-range.

The main function coordinates the multi-threaded search:

  • It divides the total range into chunks, one per CPU core the system has (32 in my case).
  • It creates and starts 32 threads, each searching a small part of the range.
  • It waits for all threads to complete.
  • It calculates and prints the total execution time.

Timing results

Let's take a look at how long this takes using standard Python.

C:\Users\thoma\projects\python-gil>python example1.py
Number of CPU cores: 32
Worker 0 starting
Worker 1 starting
Worker 0 found 6275 primes
Worker 2 starting
Worker 3 starting
Worker 1 found 5459 primes
Worker 4 starting
Worker 2 found 5230 primes
Worker 3 found 5080 primes
...
...
Worker 27 found 4346 primes
Worker 15 starting
Worker 22 found 4439 primes
Worker 30 found 4338 primes
Worker 28 found 4338 primes
Worker 31 found 4304 primes
Worker 11 found 4612 primes
Worker 15 found 4492 primes
Worker 25 found 4346 primes
Worker 26 found 4377 primes
All workers completed in 3.70 seconds

Now, with the GIL-free version:

C:\Users\thoma\projects\python-gil>python3.14t example1.py
Number of CPU cores: 32
Worker 0 starting
Worker 1 starting
Worker 2 starting
Worker 3 starting
...
...
Worker 19 found 4430 primes
Worker 29 found 4345 primes
Worker 30 found 4338 primes
Worker 18 found 4520 primes
Worker 26 found 4377 primes
Worker 27 found 4346 primes
Worker 22 found 4439 primes
Worker 23 found 4403 primes
Worker 31 found 4304 primes
Worker 28 found 4338 primes
All workers completed in 0.35 seconds

That's an impressive start: a 10x improvement in run time.

Example 2 – Reading multiple files at once

In this example, we will use the concurrent.futures module to read multiple text files simultaneously and count and display the number of lines and words in each.

Before we can do that, we need data files to process. You can use the following Python code for that. It generates 1,000,000 random sentences per file, spread across 20 different files named sentences_01.txt, sentences_02.txt, and so on.

import os
import random
import time

# --- Configuration ---
NUM_FILES = 20
SENTENCES_PER_FILE = 1_000_000
WORDS_PER_SENTENCE_MIN = 8
WORDS_PER_SENTENCE_MAX = 20
OUTPUT_DIR = "fake_sentences" # Directory to save the files

# --- 1. Generate a pool of words ---
# Using a small list of common words for variety.
# In a real scenario, you might load a much larger dictionary.
word_pool = [
    "the", "be", "to", "of", "and", "a", "in", "that", "have", "i",
    "it", "for", "not", "on", "with", "he", "as", "you", "do", "at",
    "this", "but", "his", "by", "from", "they", "we", "say", "her", "she",
    "or", "an", "will", "my", "one", "all", "would", "there", "their", "what",
    "so", "up", "out", "if", "about", "who", "get", "which", "go", "me",
    "when", "make", "can", "like", "time", "no", "just", "him", "know", "take",
    "people", "into", "year", "your", "good", "some", "could", "them", "see", "other",
    "than", "then", "now", "look", "only", "come", "its", "over", "think", "also",
    "back", "after", "use", "two", "how", "our", "work", "first", "well", "way",
    "even", "new", "want", "because", "any", "these", "give", "day", "most", "us",
    "apple", "banana", "car", "house", "computer", "phone", "coffee", "water", "sky", "tree",
    "happy", "sad", "big", "small", "fast", "slow", "red", "blue", "green", "yellow"
]

# Ensure output directory exists
os.makedirs(OUTPUT_DIR, exist_ok=True)

print(f"Starting to generate {NUM_FILES} files, each with {SENTENCES_PER_FILE:,} sentences.")
print(f"Total sentences to generate: {NUM_FILES * SENTENCES_PER_FILE:,}")
start_time = time.time()

for file_idx in range(NUM_FILES):
    file_name = os.path.join(OUTPUT_DIR, f"sentences_{file_idx + 1:02d}.txt")
    
    print(f"nGenerating and writing to {file_name}...")
    file_start_time = time.time()
    
    with open(file_name, 'w', encoding='utf-8') as f:
        for sentence_idx in range(SENTENCES_PER_FILE):
            # 2. Construct fake sentences
            num_words = random.randint(WORDS_PER_SENTENCE_MIN, WORDS_PER_SENTENCE_MAX)
            
            # Randomly pick words
            sentence_words = random.choices(word_pool, k=num_words)
            
            # Join words, capitalize first, add a period
            sentence = " ".join(sentence_words).capitalize() + ".n"
            
            # 3. Write to file
            f.write(sentence)
            
            # Optional: Print progress for large files
            if (sentence_idx + 1) % 100_000 == 0:
                print(f"  {sentence_idx + 1:,} sentences written to {file_name}...")
                
    file_end_time = time.time()
    print(f"Finished {file_name} in {file_end_time - file_start_time:.2f} seconds.")

total_end_time = time.time()
print(f"nAll files generated! Total time: {total_end_time - start_time:.2f} seconds.")
print(f"Files saved in the '{OUTPUT_DIR}' directory.")

Here is what the start of sentences_01.txt looks like:

New then coffee have who banana his their how year also there i take.
Phone go or with over who one at phone there on will.
With or how my us him our sad as do be take well way with green small these.
Not from the two that so good slow new.
See look water me do new work new into on which be tree how an would out sad.
By be into then work into we they sky slow that all who also.
Come use would have back from as after in back he give there red also first see.
Only come so well big into some my into time its banana for come or what work.
How only coffee out way to just tree when by there for computer work people sky by this into.
Than say out on it how she apple computer us well then sky sky day by other after not.
You happy know a slow for for happy then also with apple think look go when.
As who for than two we up any can banana at.
Coffee a up of up these green small this us give we.
These we do because how know me computer banana back phone way time in what.

OK, now how long does it take to read those files? Here is the code we will test. It simply reads each file, counts the lines and words, and outputs the results.

import concurrent.futures
import os
import time

def process_file(filename):
    """
    Process a single file, returning its line count and word count.
    """
    try:
        with open(filename, 'r') as file:
            content = file.read()
            lines = content.split('\n')
            words = content.split()
            return filename, len(lines), len(words)
    except Exception as e:
        return filename, -1, -1  # Return -1 for both counts if there's an error

def main():
    start_time = time.time()  # Start the timer

    # List to hold our files
    files = [f"./data/sentences_{i:02d}.txt" for i in range(1, 21)]  # Assumes 20 files named file_1.txt to file_20.txt

    # Use a ThreadPoolExecutor to process files in parallel
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        # Submit all file processing tasks
        future_to_file = {executor.submit(process_file, file): file for file in files}

        # Process results as they complete
        for future in concurrent.futures.as_completed(future_to_file):
            file = future_to_file[future]
            try:
                filename, line_count, word_count = future.result()
                if line_count == -1:
                    print(f"Error processing {filename}")
                else:
                    print(f"{filename}: {line_count} lines, {word_count} words")
            except Exception as exc:
                print(f'{file} generated an exception: {exc}')

    end_time = time.time()  # End the timer
    print(f"Total execution time: {end_time - start_time:.2f} seconds")

if __name__ == "__main__":
    main()

Timing results

Standard Python first.

C:\Users\thoma\projects\python-gil>python example2.py

./data/sentences_09.txt: 1000001 lines, 14003319 words
./data/sentences_01.txt: 1000001 lines, 13999989 words
./data/sentences_05.txt: 1000001 lines, 13998447 words
./data/sentences_07.txt: 1000001 lines, 14004961 words
./data/sentences_02.txt: 1000001 lines, 14009745 words
./data/sentences_10.txt: 1000001 lines, 14000166 words
./data/sentences_06.txt: 1000001 lines, 13995223 words
./data/sentences_04.txt: 1000001 lines, 14005683 words
./data/sentences_03.txt: 1000001 lines, 14004290 words
./data/sentences_12.txt: 1000001 lines, 13997193 words
./data/sentences_08.txt: 1000001 lines, 13995506 words
./data/sentences_15.txt: 1000001 lines, 13998555 words
./data/sentences_11.txt: 1000001 lines, 14001299 words
./data/sentences_14.txt: 1000001 lines, 13998347 words
./data/sentences_13.txt: 1000001 lines, 13998035 words
./data/sentences_19.txt: 1000001 lines, 13999642 words
./data/sentences_20.txt: 1000001 lines, 14001696 words
./data/sentences_17.txt: 1000001 lines, 14000184 words
./data/sentences_18.txt: 1000001 lines, 13999968 words
./data/sentences_16.txt: 1000001 lines, 14000771 words
Total execution time: 18.77 seconds

Now for the GIL-free version:

C:\Users\thoma\projects\python-gil>python3.14t example2.py

./data/sentences_02.txt: 1000001 lines, 14009745 words
./data/sentences_03.txt: 1000001 lines, 14004290 words
./data/sentences_08.txt: 1000001 lines, 13995506 words
./data/sentences_07.txt: 1000001 lines, 14004961 words
./data/sentences_04.txt: 1000001 lines, 14005683 words
./data/sentences_05.txt: 1000001 lines, 13998447 words
./data/sentences_01.txt: 1000001 lines, 13999989 words
./data/sentences_10.txt: 1000001 lines, 14000166 words
./data/sentences_06.txt: 1000001 lines, 13995223 words
./data/sentences_09.txt: 1000001 lines, 14003319 words
./data/sentences_12.txt: 1000001 lines, 13997193 words
./data/sentences_11.txt: 1000001 lines, 14001299 words
./data/sentences_18.txt: 1000001 lines, 13999968 words
./data/sentences_14.txt: 1000001 lines, 13998347 words
./data/sentences_13.txt: 1000001 lines, 13998035 words
./data/sentences_16.txt: 1000001 lines, 14000771 words
./data/sentences_19.txt: 1000001 lines, 13999642 words
./data/sentences_15.txt: 1000001 lines, 13998555 words
./data/sentences_17.txt: 1000001 lines, 14000184 words
./data/sentences_20.txt: 1000001 lines, 14001696 words
Total execution time: 5.13 seconds

Not quite as impressive as our first example, but still very good, showing a more than 3x improvement. This makes sense: file reads release the GIL even in standard Python, so the gains here come mostly from parallelizing the CPU-bound line and word counting.

Example 3 – Matrix multiplication

We will use the threading module in this case. Here is the code we will run.

import threading
import time
import os

def multiply_matrices(A, B, result, start_row, end_row):
    """Multiply a submatrix of A and B and store the result in the corresponding submatrix of result."""
    for i in range(start_row, end_row):
        for j in range(len(B[0])):
            sum_val = 0
            for k in range(len(B)):
                sum_val += A[i][k] * B[k][j]
            result[i][j] = sum_val

def main():
    """Main function to coordinate the multi-threaded matrix multiplication."""
    start_time = time.time()

    # Define the size of the matrices
    size = 1000
    A = [[1 for _ in range(size)] for _ in range(size)]
    B = [[1 for _ in range(size)] for _ in range(size)]
    result = [[0 for _ in range(size)] for _ in range(size)]

    # Get the number of CPU cores to decide on the number of threads
    num_threads = os.cpu_count()
    print(f"Number of CPU cores: {num_threads}")

    chunk_size = size // num_threads

    threads = []
    # Create and start threads
    for i in range(num_threads):
        start_row = i * chunk_size
        end_row = size if i == num_threads - 1 else (i + 1) * chunk_size
        thread = threading.Thread(target=multiply_matrices, args=(A, B, result, start_row, end_row))
        threads.append(thread)
        thread.start()

    # Wait for all threads to complete
    for thread in threads:
        thread.join()

    end_time = time.time()

    # Just print a small corner to verify
    print("Top-left 5x5 corner of the result matrix:")
    for r_idx in range(5):
        print(result[r_idx][:5])

    print(f"Total execution time (matrix multiplication): {end_time - start_time:.2f} seconds")

if __name__ == "__main__":
    main()

The code performs a matrix multiplication of two 1000 × 1000 matrices in parallel using multiple CPU cores. It divides the result matrix into chunks of rows, assigns each chunk to a different thread (one per CPU core), and each thread calculates its assigned part of the multiplication. Finally, it waits for all the threads to finish and reports the total execution time, showing how you can use multithreading to speed up CPU-bound tasks.

Timing results

Standard Python:

C:\Users\thoma\projects\python-gil>python example3.py
Number of CPU cores: 32
Top-left 5x5 corner of the result matrix:
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
Total execution time (matrix multiplication): 43.95 seconds

GIL-free Python:

C:\Users\thoma\projects\python-gil>python3.14t example3.py
Number of CPU cores: 32
Top-left 5x5 corner of the result matrix:
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
Total execution time (matrix multiplication): 4.56 seconds

Once again, we get roughly a 10x improvement using GIL-free Python. Not too shabby.

GIL-free is not always better

An interesting point to note is that for this final test, I also tried a multiprocessing version of the code. It turned out that standard Python was noticeably faster (by around 28%) than GIL-free Python. I won't reproduce the exact code I used, just the results.
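
For context, a typical multiprocessing variant of the threaded matrix example looks roughly like the sketch below. This is an illustrative reconstruction, not the exact code behind the timings that follow; note that each task ships full pickled copies of A and B to a worker process, exactly the kind of overhead that can tip the balance.

import multiprocessing
import time

def multiply_chunk(args):
    """Compute rows [start_row, end_row) of A @ B in pure Python."""
    A, B, start_row, end_row = args
    cols, inner = len(B[0]), len(B)
    return [
        [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(start_row, end_row)
    ]

def main():
    size = 1000
    A = [[1] * size for _ in range(size)]
    B = [[1] * size for _ in range(size)]

    num_procs = multiprocessing.cpu_count()
    chunk = size // num_procs
    # Each task carries its own copy of A and B, which must be pickled
    # and sent to the worker process; a real cost for large inputs.
    tasks = [
        (A, B, i * chunk, size if i == num_procs - 1 else (i + 1) * chunk)
        for i in range(num_procs)
    ]

    start_time = time.time()
    with multiprocessing.Pool(num_procs) as pool:
        parts = pool.map(multiply_chunk, tasks)
    result = [row for part in parts for row in part]

    print(f"Done in {time.time() - start_time:.2f} seconds")
    print(result[0][:5])  # should be [1000, 1000, 1000, 1000, 1000]

if __name__ == "__main__":
    main()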

Timing results

Standard Python first (multiprocessing).

C:\Users\thoma\projects\python-gil>python example4.py
Number of CPU cores: 32
Top-left 5x5 corner of the result matrix:
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
Total execution time (matrix multiplication): 4.49 seconds

GIL-free version (multiprocessing):

C:\Users\thoma\projects\python-gil>python3.14t example4.py
Number of CPU cores: 32
Top-left 5x5 corner of the result matrix:
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
[1000, 1000, 1000, 1000, 1000]
Total execution time (matrix multiplication): 6.29 seconds

As always in these situations, it's important to test and profile your own particular workload.

Remember that these examples are just tests to show the difference between GIL and GIL-free Python. Using an external library such as NumPy for the matrix multiplication would be at least an order of magnitude faster than pure Python.
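
For reference, a minimal NumPy version of the same multiplication looks like this (timings will vary by machine and BLAS backend):

import time
import numpy as np

# Same 1000 x 1000 all-ones matrices as the pure-Python examples.
size = 1000
A = np.ones((size, size))
B = np.ones((size, size))

start = time.time()
result = A @ B  # dispatches to an optimized BLAS routine
print(f"NumPy matrix multiplication: {time.time() - start:.4f} seconds")
print(result[:5, :5])  # top-left 5x5 corner; every entry is 1000.0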

Another point to note, if you decide to use free-threaded Python for your projects, is that not all the third-party libraries you may want to use are compatible with it. The list of incompatible libraries gets smaller with each release, but it's something to keep in mind. To view the list, please follow the link below.

Summary

In this article, we discussed arguably the headline feature of Python 3.14: the introduction of a free-threaded build, which removes the Global Interpreter Lock (GIL). The GIL is a mutex in standard Python that simplifies memory management by ensuring that only one thread executes Python bytecode at a time. While this is useful in some cases, it prevents true parallelism across CPU cores for CPU-bound tasks.

The removal of the GIL in the free-threaded build is aimed mainly at improving performance. This can be very helpful for data scientists and machine learning engineers, whose work often involves CPU-bound tasks such as data processing and model training. The change allows Python code to use all available CPU cores simultaneously within a single process, resulting in significant speed improvements.

To illustrate the impact, the article presents several practical comparisons:

  • Finding prime numbers: a multi-threaded prime search saw a roughly 10x performance increase, dropping the run time from 3.70 seconds in standard Python to 0.35 seconds in the GIL-free version.
  • Reading multiple files at once: an I/O-bound task using a thread pool to process large text files ran more than 3x faster, completing in 5.13 seconds compared to 18.77 seconds with the standard interpreter.
  • Matrix multiplication: a pure-Python multi-threaded matrix multiplication also saw a roughly 10x speedup, with the GIL-free version finishing in 4.56 seconds compared to 43.95 seconds for the standard version.

However, I also explained that the GIL-free build is not a panacea. Surprisingly, the multiprocessing version of the matrix multiplication code ran faster with standard Python (4.49 seconds) than with the GIL-free version (6.29 seconds). This highlights the importance of testing and profiling your specific application, as the overhead of process management in the GIL-free version is one of the few cases where it can negate its benefits.

I also mentioned the caveat that not all third-party Python libraries are compatible with GIL-free Python yet, and pointed to a link where you can view a list of incompatible libraries.
