How to call Rust from Python

Python is fast enough, especially if you lean on NumPy, Polars, or other well-tuned libraries written in compiled languages like C. But sooner or later you end up with a hot loop that won't vectorize: maybe you're walking through a list of strings to clean them up, or you're parsing messy text where every character matters. You profile, confirm the bottleneck, and stare at a loop that eats up half of your runtime. This is when Rust shines.
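As a rough illustration of the "profile first" step, the sketch below uses the standard library's cProfile and pstats modules to show which function dominates a run. The functions clean_tokens and pipeline are hypothetical stand-ins for your own code, not something taken from this article's examples.

```python
import cProfile
import io
import pstats


def clean_tokens(text: str) -> list:
    """Hypothetical hot loop: character-by-character cleanup."""
    word, out = [], []
    for c in text:
        if c.isalnum():
            word.append(c.lower())
        elif word:
            out.append("".join(word))
            word = []
    if word:
        out.append("".join(word))
    return out


def pipeline(texts: list) -> int:
    """Hypothetical driver that calls the hot loop once per input."""
    return sum(len(clean_tokens(t)) for t in texts)


def profile_report(texts: list, top: int = 10) -> str:
    """Profile pipeline() and return the top entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    pipeline(texts)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(["Hello, World! 123"] * 20_000))
```

If one small function accounts for most of the cumulative time in a report like this, it is a plausible candidate for moving to Rust.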

Rust gives you predictable performance, tight control over memory, and fearless concurrency, without the hassle of manual memory management. If you're thinking "not another language to learn!", the good news is that you don't need to leave Python to use Rust. You can keep your orchestration, your notebooks, and your tests, and move only the hot inner loops to Rust. This keeps the amount of Rust you need to learn very small.

In this article, I will show how to call Rust from Python and compare the performance differences between using pure Python and a combination of Python / Rust. This is not going to be a tutorial on Rust programming, as I assume you know at least the basics of that.

Why bother?

Now, you might be thinking: if I know Rust, why would I bother integrating it with Python? Why not just write the whole program in Rust?

Well, first, I would say that knowing Rust does not automatically make it the best language for your entire application. For many applications, e.g. ML, AI, scripting, and web backends, Python has become the language of choice.

Second, most code is not performance-critical. For the parts that are, you usually need only a small subset of Rust to make a real difference, so a little Rust knowledge goes a long way.

Finally, the Python ecosystem is hard to replace. Even if you're familiar with Rust, Python gives you quick access to tools like these:

  • pandas
  • NumPy
  • scikit-learn
  • Jupyter
  • Airflow
  • FastAPI tooling
  • a large number of scripting and automation libraries

Rust may be faster, but Python often wins on ecosystem access and ease of development.

Hopefully, I've done enough to convince you to give combining Rust and Python a chance. That being said, let's get started.

Rust and Maturin

For our use cases, we need two things: Rust and a tool called maturin.

Most of you will know about Rust: a fast compiled language that has come to prominence in recent years. You may not have heard of maturin, though.

Maturin is basically a tool for building and packaging Python extensions written in Rust (using PyO3 or rust-cpython). It helps us to do the following:

Build your Rust code into a Python module

  • It takes your Rust crate and compiles it into a shared library (.pyd on Windows, .so on Linux, .dylib on macOS) that Python can import.
  • It automatically sets the correct compiler flags for release or debug builds and for the Python version you are targeting.
  • It works with PyO3's extension-module feature, so Python can import the compiled library as a normal module.
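A quick way to see which filenames your own interpreter will accept for a compiled module is to ask Python itself. This is a small sketch using only the standard library; the helper names are mine, and the exact suffixes printed depend on your platform and Python build.

```python
import importlib.machinery
import sysconfig


def extension_suffixes() -> list:
    """Filename suffixes this interpreter accepts for compiled extension modules."""
    return list(importlib.machinery.EXTENSION_SUFFIXES)


def expected_filename(module_name: str) -> str:
    """Most specific filename the interpreter would look for, e.g. tagged with its ABI."""
    suffix = sysconfig.get_config_var("EXT_SUFFIX") or extension_suffixes()[0]
    return f"{module_name}{suffix}"


if __name__ == "__main__":
    print(extension_suffixes())
    print(expected_filename("hello_rust"))
```

Whatever file the build step produces for your module has to end in one of these suffixes, or the import will fail.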

Builds wheels for distribution

  • Wheels are the .whl files that you upload to PyPI (precompiled binaries).
  • Maturin supports building manylinux, macOS, and Windows wheels that work across Python versions and platforms.
  • It cross-compiles when needed, or runs inside a Docker image to satisfy PyPI's "manylinux" rules.

Publishes to PyPI

  • With a single command, maturin can build your Rust extension and upload it.
  • It handles credentials, metadata, and platform tags automatically.

Integrates Rust into Python packaging

  • Maturin generates a pyproject.toml that defines your project so Python tools like pip know how to build it.
  • It supports PEP 517, so pip install works even if the user doesn't have maturin installed.
  • It works seamlessly with setuptools when you combine Python and Rust code into a single package.

OK, enough with the theory, let's start writing, running, and timing some code samples.

Setting up the development environment

As usual, we will set up a separate development environment to do our work. That way, our work won't interfere with any other projects we may have on the go. I'm using uv for this, on WSL2 Ubuntu under Windows.

$ uv init pyrust
$ cd pyrust
$ uv venv pyrust
$ source pyrust/bin/activate
(pyrust) $

Installing Rust

Now we can install Rust with this simple command.

(pyrust) $ curl --proto '=https' --tlsv1.2 -sSf  | sh

After a moment, three options will be displayed on your screen, like this.

Welcome to Rust!

This will download and install the official compiler for the Rust
programming language, and its package manager, Cargo.
...
...
...

1) Proceed with standard installation (default - just press enter)
2) Customize installation
3) Cancel installation

Press 1 and then Enter (or just press Enter) to accept the default installation. To verify that Rust is installed correctly, run the following command.

(pyrust) $ rustc --version

rustc 1.89.0 (29483883e 2025-08-04)
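If you prefer to check the whole toolchain from Python in one go, something along these lines can help. It is only a sketch: the helper names are mine, and it assumes that each tool is on your PATH and answers to --version, which rustc, cargo, and maturin all do.

```python
import shutil
import subprocess
from typing import Optional


def tool_version(tool: str) -> Optional[str]:
    """Return the first line of `<tool> --version`, or None if the tool is unavailable."""
    path = shutil.which(tool)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


def check_toolchain(tools=("rustc", "cargo", "maturin")) -> dict:
    """Map each tool name to its reported version string (or None if missing)."""
    return {tool: tool_version(tool) for tool in tools}


if __name__ == "__main__":
    for name, version in check_toolchain().items():
        print(f"{name}: {version or 'not found'}")
```

A None entry means that tool still needs to be installed, or that its install directory is not on your PATH in the current shell.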

Example 1 – Hello World equivalent

Let's start with a simple example of calling Rust from Python. Create a new subfolder and add these three files.

Cargo.toml

[package]
name = "hello_rust"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["cdylib"]

[dependencies]
pyo3 = { version = "0.25", features = ["extension-module"] }

pyproject.toml

[build-system]
requires = ["maturin>=1.5,<2"]
build-backend = "maturin"

[project]
name = "hello_rust"
version = "0.1.0"
requires-python = ">=3.9"

Finally, our Rust source file goes in src/lib.rs, inside a src subfolder.

use pyo3::prelude::*;

/// A simple function we'll expose to Python
#[pyfunction]
fn greet(name: &str) -> PyResult<String> {
    Ok(format!("Hello, {} from Rust!", name))
}

/// The module definition
#[pymodule]
fn hello_rust(_py: Python<'_>, m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(greet, m)?)?;
    Ok(())
}

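Once the module has been built and installed into the active environment (for example with pip install . or uv pip install ., either of which uses maturin as the build backend declared in pyproject.toml), run it with: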

(pyrust) $ python -c "import hello_rust as hr; print(hr.greet('world'))"


# Output
Hello, world from Rust!

We put our Rust code in src/lib.rs to follow the convention that Rust library code goes there, rather than in src/main.rs, which is reserved for standalone Rust executables.

Maturin and PyO3 look inside src/lib.rs for the #[pymodule] function, which registers your Rust functions so that Python can call them.
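On the Python side, one common pattern is to import the compiled module if it is available and fall back to a pure-Python version otherwise. This is a sketch of that idea, not something the example requires; the fallback greet and the BACKEND flag are hypothetical names introduced here.

```python
try:
    # The compiled extension from this example, if it has been built and installed.
    from hello_rust import greet
    BACKEND = "rust"
except ImportError:
    # Hypothetical pure-Python stand-in so that calling code still runs.
    def greet(name: str) -> str:
        return f"Hello, {name} from Python!"

    BACKEND = "python"


if __name__ == "__main__":
    print(f"[{BACKEND}] {greet('world')}")
```

This keeps the rest of your code unchanged whether or not the Rust build is present, which can be handy on machines without a Rust toolchain.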

Example 2 — Python Loops vs Rust Loops

Consider something deliberately mundane but representative: you have a list of sentences and you need to normalize them. By normalize, I mean converting them to a standard, consistent form before further processing.

Let's say we want to lowercase everything, discard punctuation, and split into tokens. This is difficult to vectorize well because the logic branches on every character.

In pure Python, you might write this:

# ------------------------
# Python baseline
# ------------------------
def process_one_py(text: str) -> list[str]:
    word = []
    out = []

    for c in text:
        if c.isalnum():
            word.append(c.lower())
        elif c.isspace():
            # whitespace ends the current token
            if word:
                out.append("".join(word))
                word = []
        # anything else (punctuation) is dropped

    if word:
        out.append("".join(word))

    return out

# Run the above for many inputs
def batch_process_py(texts: list[str]) -> list[list[str]]:
    return [process_one_py(t) for t in texts]

So, for example, calling

batch_process_py(["Hello, World! 123", "This is a test"])

would return

[['hello', 'world', '123'], ['this', 'is', 'a', 'test']]

Here is what the Rust version might look like.

// src/lib.rs

use pyo3::prelude::*;
use pyo3::wrap_pyfunction;

/// Process one string: lowercase, drop punctuation, split on whitespace
fn process_one(text: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut word = String::new();

    for c in text.chars() {
        if c.is_alphanumeric() {
            word.push(c.to_ascii_lowercase());
        } else if c.is_whitespace() {
            if !word.is_empty() {
                out.push(std::mem::take(&mut word));
            }
        }
        // ignore punctuation entirely
    }

    if !word.is_empty() {
        out.push(word);
    }
    out
}

/// Run process_one over every input string
#[pyfunction]
fn batch_process(texts: Vec<String>) -> PyResult<Vec<Vec<String>>> {
    Ok(texts.iter().map(|t| process_one(t)).collect())
}

#[pymodule]
fn rust_text(_py: Python<'_>, m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(batch_process, m)?)?;
    Ok(())
}
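Before timing anything, it is worth checking that the two implementations agree. The sketch below compares a reference batch function against any candidate and reports the first input on which they differ. The helper first_mismatch and the SAMPLES list are illustrative names of my own. The samples are ASCII-only on purpose: the Rust code uses to_ascii_lowercase, which leaves non-ASCII letters unchanged, whereas Python's str.lower() is Unicode-aware, so the two would be expected to diverge on accented text.

```python
def process_one_py(text: str) -> list:
    """Reference tokenizer: lowercase, drop punctuation, split on whitespace."""
    word, out = [], []
    for c in text:
        if c.isalnum():
            word.append(c.lower())
        elif c.isspace():
            if word:
                out.append("".join(word))
                word = []
    if word:
        out.append("".join(word))
    return out


def batch_process_py(texts: list) -> list:
    return [process_one_py(t) for t in texts]


def first_mismatch(reference, candidate, samples: list):
    """Return the first sample on which the two batch functions disagree, else None."""
    expected = reference(samples)
    actual = candidate(samples)
    if len(expected) != len(actual):
        raise ValueError("batch functions returned different numbers of results")
    for text, exp, act in zip(samples, expected, actual):
        if exp != act:
            return text
    return None


# ASCII-only inputs covering empty strings, punctuation, and mixed whitespace.
SAMPLES = ["Hello, World! 123", "This is a test", "", "   ",
           "tabs\tand\nnewlines", "don't stop"]

if __name__ == "__main__":
    try:
        import rust_text  # the compiled extension, if it has been built
    except ImportError:
        print("rust_text is not built; nothing to compare")
    else:
        print(first_mismatch(batch_process_py, rust_text.batch_process, SAMPLES))
```

A result of None means the candidate matched the reference on every sample; anything else is the first input to investigate.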

Okay, let's run these two programs with a large input (500,000 documents) and see what the difference in runtime is. For that, I wrote a benchmark Python script as follows.

from time import perf_counter
from statistics import median
import random
import string
import rust_text  # the compiled extension

# ------------------------
# Python baseline
# ------------------------
def process_one_py(text: str) -> list[str]:
    word = []
    out = []
    for c in text:
        if c.isalnum():
            word.append(c.lower())
        elif c.isspace():
            if word:
                out.append("".join(word))
                word = []
        # ignore punctuation
    if word:
        out.append("".join(word))
    return out

def batch_process_py(texts: list[str]) -> list[list[str]]:
    return [process_one_py(t) for t in texts]

# ------------------------
# Synthetic data
# ------------------------
def make_texts(n=500_000, vocab=10_000, mean_len=40):
    words = ["".join(random.choices(string.ascii_lowercase, k=5)) for _ in range(vocab)]
    texts = []
    for _ in range(n):
        L = max(3, int(random.expovariate(1/mean_len)))
        texts.append(" ".join(random.choice(words) for _ in range(L)))
    return texts

texts = make_texts()

# ------------------------
# Timing helper
# ------------------------
def timeit(fn, *args, repeat=5):
    runs = []
    for _ in range(repeat):
        t0 = perf_counter()
        fn(*args)
        t1 = perf_counter()
        runs.append(t1 - t0)
    return median(runs)

# ------------------------
# Run benchmarks
# ------------------------
py_time = timeit(batch_process_py, texts)
rust_time = timeit(rust_text.batch_process, texts)

n = len(texts)
print("\n--- Benchmark ---")
print(f"Python     median: {py_time:.3f} s | throughput: {n/py_time:,.0f} texts/s")
print(f"Rust 1-thread median: {rust_time:.3f} s | throughput: {n/rust_time:,.0f} texts/s")

As before, we need to compile our Rust code so Python can import it. In the previous example, maturin was indirectly used as the backend for building with pyproject.toml. Here, we call it directly from the command line:

(pyrust) $ maturin develop --release

And now we can simply run our code like this.

(pyrust) $ python benchmark.py

--- Benchmark ---
Python     median: 5.159 s | throughput: 96,919 texts/s
Rust 1-thread median: 3.024 s | throughput: 165,343 texts/s

That was a reasonable speedup (roughly 1.7x) for not much effort. There is one more thing we can use to further reduce the running time.

Rust has access to a parallelism library called Rayon, which makes it easy to spread work across multiple CPU cores. In short, Rayon:

  • Lets you replace sequential iterators (iter()) with parallel ones (par_iter()).
  • Automatically breaks your data into chunks, distributes the work across CPU threads, and aggregates the results.
  • Removes the complexity of thread management and synchronization.
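To make the split, distribute, and aggregate pattern concrete, here is a rough Python analogy using concurrent.futures. It only illustrates the shape of the idea: Rayon's actual scheduling is more sophisticated than fixed chunks, and in standard GIL-enabled CPython, threads running pure-Python CPU-bound code will not give a comparable speedup. The function names and the chunk size are arbitrary choices of mine.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items: list, size: int) -> list:
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def tokenize(text: str) -> list:
    """Stand-in for the per-item work (simplified: lowercase and split)."""
    return text.lower().split()


def process_chunk(chunk: list) -> list:
    return [tokenize(t) for t in chunk]


def parallel_map_chunks(texts: list, chunk_size: int = 2, workers: int = 4) -> list:
    """Break the input into chunks, hand them to worker threads, merge in order."""
    chunks = chunked(texts, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the same order as the input chunks.
        results = list(pool.map(process_chunk, chunks))
    return [tokens for chunk_result in results for tokens in chunk_result]


if __name__ == "__main__":
    print(parallel_map_chunks(["A b", "C", "d E f", "g", "H i"]))
```

With Rayon, all of this bookkeeping collapses into swapping iter() for par_iter(), and the worker threads run native code in parallel.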

Example 3 - Adding parallelism to our existing Rust code

This is straightforward. Looking at the Rust code in the previous example, we only need to make the following three small changes (marked with comments below). Rayon must also be listed under [dependencies] in Cargo.toml.

// src/lib.rs
use pyo3::prelude::*;
use pyo3::wrap_pyfunction;

// Add this line - change 1
use rayon::prelude::*;

/// Process one string: lowercase, drop punctuation, split on whitespace
fn process_one(text: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut word = String::new();

    for c in text.chars() {
        if c.is_alphanumeric() {
            word.push(c.to_ascii_lowercase());
        } else if c.is_whitespace() {
            if !word.is_empty() {
                out.push(std::mem::take(&mut word));
            }
        }
        // ignore punctuation entirely
    }

    if !word.is_empty() {
        out.push(word);
    }
    out
}

#[pyfunction]
fn batch_process(texts: Vec<String>) -> PyResult<Vec<Vec<String>>> {
    Ok(texts.iter().map(|t| process_one(t)).collect())
}

// Add this function - change 2
#[pyfunction]
fn batch_process_parallel(texts: Vec<String>) -> PyResult<Vec<Vec<String>>> {
    Ok(texts.par_iter().map(|t| process_one(t)).collect())
}

#[pymodule]
fn rust_text(_py: Python<'_>, m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(batch_process, m)?)?;
    // Add this line - change 3
    m.add_function(wrap_pyfunction!(batch_process_parallel, m)?)?;
    Ok(())
}

In our Python benchmark code, we only need to add a call to the parallel Rust function and print the new results.

...
...

# ------------------------
# Run amended benchmarks
# ------------------------
py_time = timeit(batch_process_py, texts)
rust_time = timeit(rust_text.batch_process, texts)
rust_par_time = timeit(rust_text.batch_process_parallel, texts)

n = len(texts)
print("\n--- Benchmark ---")
print(f"Python     median: {py_time:.3f} s | throughput: {n/py_time:,.0f} texts/s")
print(f"Rust 1-thread median: {rust_time:.3f} s | throughput: {n/rust_time:,.0f} texts/s")
print(f"Rust Rayon median:   {rust_par_time:.3f} s | throughput: {n/rust_par_time:,.0f} texts/s")

Here are the results of running the modified benchmark.

--- Benchmark ---
Python median: 5.171 s | throughput: 96,694 texts/s
Rust 1-thread median: 3.091 s | throughput: 161,755 texts/s
Rust Rayon median: 2.223 s | throughput: 224,914 texts/s

The parallel Rust code shaved about 28% off the single-threaded Rust time and was more than twice as fast as the plain Python code. Not too shabby.
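For anyone who wants to redo the arithmetic, the ratios can be worked out directly from the median times in the results above. This is a small sketch; the helper names are mine, and the constants are simply the three medians as printed.

```python
def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


def time_saved_pct(baseline_s: float, candidate_s: float) -> float:
    """Percentage of the baseline runtime that the candidate removes."""
    return 100.0 * (baseline_s - candidate_s) / baseline_s


# Median times in seconds, copied from the benchmark output above.
PY_S, RUST_S, RAYON_S = 5.171, 3.091, 2.223

if __name__ == "__main__":
    print(f"Rust vs Python:  {speedup(PY_S, RUST_S):.2f}x")
    print(f"Rayon vs Python: {speedup(PY_S, RAYON_S):.2f}x")
    print(f"Rayon vs Rust:   {time_saved_pct(RUST_S, RAYON_S):.1f}% less time")
```

On these figures, single-threaded Rust comes out at roughly 1.67x the Python baseline, Rayon at roughly 2.33x, and the parallel version takes about 28% less time than the single-threaded one. Your own numbers will vary with hardware and core count.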

Summary

Python is usually fast enough for most tasks. But if profiling shows a slow section that can't be vectorized and really affects your runtime, you don't have to abandon Python or rewrite your entire project. Instead, you can move just the performance-critical parts to Rust and leave the rest of your code as is.

With PyO3 and maturin, you can compile Rust code into a Python module that works well with your existing libraries. This allows you to keep most of your Python code, tests, packaging, and workflows, while getting the speed, memory safety, and concurrency benefits of Rust where you need them most.

The simple examples and benchmarks here show that rewriting just a small part of your code in Rust can make a Python program significantly faster. Adding Rayon parallelism improves performance even more, with few code changes and no complicated tooling. This is an efficient and easy way to speed up your Python workflow without switching your entire project to Rust.
