Speed up your Python code by up to 80x using the Cython library

Python is a great language for rapid prototyping and code development, but one thing I often hear people say about it is that it's slow. This is a particular pain point for data scientists and ML engineers, since they routinely perform computationally intensive tasks such as matrix multiplication, image processing, and other numerical work.
Over time, Python has tried to address some of these issues internally by introducing new features into the language, such as multi-threading and rewriting existing functionality for speed. However, Python's use of the Global Interpreter Lock (GIL) often hampers efforts like this.
Many external libraries have also been written to bridge this performance gap between Python and compiled languages such as Java. Perhaps the most widely used and best known of these is the NumPy library. Implemented in C, NumPy was designed from the ground up for speed and fast array operations.
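As a quick illustration of the kind of gap NumPy closes (a minimal sketch of my own for context; it is not one of the benchmarks in this article), compare an interpreted Python loop with the equivalent vectorised NumPy call:

import numpy as np

# Pure-Python loop: every iteration runs in the interpreter
values = list(range(1_000_000))
py_total = sum(v * v for v in values)

# Vectorised NumPy equivalent: the loop runs in compiled C code
arr = np.arange(1_000_000, dtype=np.int64)
np_total = int((arr * arr).sum())

assert py_total == np_total  # same answer, very different runtimes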
There are other contenders in this space, too. In a recent TDS article, I wrote about another external library that, in many cases, outperforms NumPy. If you're interested in learning more, I'll include a link to that article at the end of this one.
Another external library that performs very well is Numba. Numba uses a just-in-time (JIT) compiler on Python, which translates a subset of Python and NumPy code into fast machine code at run time. It is designed to speed up numerical and scientific functions by leveraging the LLVM (Low Level Virtual Machine) compiler infrastructure.
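To give a flavour of how Numba is used (a minimal sketch of my own, assuming Numba is installed; it is not one of this article's benchmarks), decorating a plain Python function is often all that is required:

from numba import njit

@njit  # compile this function to machine code the first time it is called
def sum_of_squares_numba(n):
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * i + j * j
    return total

sum_of_squares_numba(100)          # the first call triggers the JIT compilation
print(sum_of_squares_numba(20000)) # subsequent calls run the compiled code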
In this article, I want to discuss yet another runtime-improving external library: Cython. It is one of the most performant Python libraries, yet also one of the least understood and least used. I think that is at least partly because you have to get your hands slightly dirty and make some changes to your original code. But if you follow the simple four-step plan I'll set out below, the performance benefits you can achieve will make it more than worthwhile.
What is Cython?
If you have never heard of Cython, it is a superset of Python designed to provide C-like performance for code written mostly in Python. It allows you to convert Python code into C code, which can then be compiled into shared libraries that are imported into Python just like regular Python modules. This process gives you the speed of C while retaining the readability of Python.
I will demonstrate the specific performance benefits you can achieve by converting your code to use Cython, look at three example use cases, and set out the four steps needed to modify your existing Python code.
Setting up a development environment
Before continuing, we should set up a separate development environment so we can keep our project dependencies isolated. I'll be using WSL2 on Windows and a Jupyter Notebook for code development. I use the UV package manager to set up my development environment, but feel free to use whatever tools and methods suit you.
$ uv init cython-test
$ cd cython-test
$ uv venv
$ source .venv/bin/activate
(cython-test) $ uv pip install cython jupyter numpy pillow matplotlib
Now type jupyter notebook into your command prompt. You should see a Jupyter Notebook open in your browser. If that doesn't happen automatically, you will see a screenful of information after the jupyter notebook command runs. Near the end of it, there will be a URL to copy and paste into your browser to launch the Jupyter Notebook.
Your URL will be different from mine, but it should look something like this:
Example 1 – Speeding up loops
Before we start using Cython, let's begin with a regular Python function and time how long it takes to run. This will be our benchmark baseline.
We will choose a simple looping function that takes a few seconds to run, then use Cython to speed it up and measure the difference in runtime between the two approaches.
Here is our baseline Python code.
# sum_of_squares.py
import timeit

# Define the standard Python function
def slow_sum_of_squares(n):
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * i + j * j
    return total

# Benchmark the Python function
print("Python function execution time:")
print("timeit:", timeit.timeit(
    lambda: slow_sum_of_squares(20000),
    number=1))
On my system, the above code produces the following result.
Python function execution time:
13.135973724005453
Let's see how Cython fares on it.
The four-step plan for effective Cython use
Using Cython to improve your code's runtime in a Jupyter Notebook is a simple four-step process.
Don't worry if you're not a notebook user; I'll show how to convert regular Python .py files to use Cython later on.
1/ In the first cell of your notebook, load the Cython extension by typing this command.
%load_ext Cython
2/ In any subsequent cells containing the Python code you wish to speed up with Cython, add the %%cython magic command before the code. For example,
%%cython
def myfunction():
    # function body goes here
    ...
3/ Function definitions that take parameters should have those parameters given appropriate C types.
4/ Finally, all relevant variables should be typed appropriately using the cdef keyword. Also, where it makes sense, use functions from the C standard library via the libc.stdlib cimport mechanism.
Taking our original Python code as an example, this is what it needs to look like to run in a notebook cell using Cython, after applying all four steps above.
%%cython
def fast_sum_of_squares(int n):
    cdef int total = 0
    cdef int i, j
    for i in range(n):
        for j in range(n):
            total += i * i + j * j
    return total

import timeit
print("Cython function execution time:")
print("timeit:", timeit.timeit(
    lambda: fast_sum_of_squares(20000),
    number=1))
As I hope you can see, converting your code is much simpler in practice than the four-step process might suggest.
The runtime of the above code was also impressive. On my system, this new Cython code produced the following result.
Cython function execution time:
0.15829777799808653
That is an 80x speed-up.
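As a side note (not one of the article's required steps), the %%cython cell magic accepts an -a (annotate) flag. A minimal sketch: rerunning the same cell with that flag produces an annotated view of the generated code, highlighting any lines that still fall back to slow Python object operations, which is a handy way to check your typing work.

%%cython -a
# Same function as above; the annotated output highlights any lines
# that still interact with the Python interpreter.
def fast_sum_of_squares(int n):
    cdef int total = 0
    cdef int i, j
    for i in range(n):
        for j in range(n):
            total += i * i + j * j
    return total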
Example 2 – Estimating Pi using Monte Carlo methods
For our second example, we'll examine a slightly more complex use case, one that forms the basis of many real-world applications.
An area where Cython can deliver substantial performance improvements is in the implementation of numerical algorithms, particularly those involving heavy computation, such as Monte Carlo (MC) simulations. Monte Carlo simulations involve running many iterations of a random process to estimate the properties of a system. MC is used across a wide range of fields, including climate and materials science, computer graphics, finance, and quantitative analysis. It is almost always a very computationally intensive process.
As an example, we will use Monte Carlo in a simplified way to estimate the value of Pi. This is a well-known example where we take a square with a side length of one unit and inscribe a quarter circle of unit radius inside it, as shown here.
The ratio of the quarter circle's area to the square's area is, clearly, Pi/4.
Therefore, if we generate a large number of random (x, y) points inside the square and count the proportion that lie inside or on the quarter circle, we can estimate Pi as 4 times that proportion.
Here is the standard Python code that models this.
import random
import time

def monte_carlo_pi(num_samples):
    inside_circle = 0
    for _ in range(num_samples):
        x = random.uniform(0, 1)
        y = random.uniform(0, 1)
        if (x**2) + (y**2) <= 1:
            inside_circle += 1
    return (inside_circle / num_samples) * 4

# Benchmark the standard Python function
num_samples = 100000000
start_time = time.time()
pi_estimate = monte_carlo_pi(num_samples)
end_time = time.time()

print(f"Estimated Pi (Python): {pi_estimate}")
print(f"Execution Time (Python): {end_time - start_time} seconds")
Running this produced the following timing result.
Estimated Pi (Python): 3.14197216
Execution Time (Python): 20.67279839515686 seconds
Now, here is the Cython implementation we arrive at by following our four-step process.
%%cython
import cython
import random
from libc.stdlib cimport rand, RAND_MAX

@cython.boundscheck(False)
@cython.wraparound(False)
def monte_carlo_pi(int num_samples):
    cdef int inside_circle = 0
    cdef int i
    cdef double x, y
    for i in range(num_samples):
        x = rand() / RAND_MAX
        y = rand() / RAND_MAX
        if (x**2) + (y**2) <= 1:
            inside_circle += 1
    return (inside_circle / num_samples) * 4

import time
num_samples = 100000000

# Benchmark the Cython function
start_time = time.time()
pi_estimate = monte_carlo_pi(num_samples)
end_time = time.time()

print(f"Estimated Pi (Cython): {pi_estimate}")
print(f"Execution Time (Cython): {end_time - start_time} seconds")
And here is the new result.
Estimated Pi (Cython): 3.1415012
Execution Time (Cython): 1.9987852573394775 seconds
Again, that's a very respectable 10x speed-up for Cython.
One extra thing we did in this code example was to import an external function from the C standard library. That was this line,
from libc.stdlib cimport rand, RAND_MAX
The cimport command is a Cython keyword used to import C functions, variables, constants, and types. Here, we used it to pull in fast C replacements for Python's random.uniform() function.
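The same mechanism works for other parts of the C standard library. As a minimal sketch of my own (not part of this article's benchmarks), here is a notebook cell that cimports sqrt from libc.math:

%%cython
from libc.math cimport sqrt   # C math function, called without Python overhead

def c_hypotenuse(double a, double b):
    return sqrt(a * a + b * b)

print(c_hypotenuse(3.0, 4.0))   # prints 5.0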
Example 3 – Sharpening an image
For our final example, we will perform image convolution, a very common operation in image processing. There are many use cases for image convolution. We will use it to sharpen the image of the Taj Mahal shown below.

First of all, here is the standard Python code.
from PIL import Image
import numpy as np
from scipy.signal import convolve2d
import time
import os
import matplotlib.pyplot as plt

def sharpen_image_color(image):
    # Start timing
    start_time = time.time()
    # Convert image to RGB in case it's not already
    image = image.convert('RGB')
    # Define a sharpening kernel
    kernel = np.array([[0, -1, 0],
                       [-1, 5, -1],
                       [0, -1, 0]])
    # Convert image to numpy array
    image_array = np.array(image)
    # Debugging: Check input values
    print("Input array values: Min =", image_array.min(), "Max =", image_array.max())
    # Prepare an empty array for the sharpened image
    sharpened_array = np.zeros_like(image_array)
    # Apply the convolution kernel to each channel (assuming RGB image)
    for i in range(3):
        channel = image_array[:, :, i]
        # Perform convolution
        convolved_channel = convolve2d(channel, kernel, mode='same', boundary='wrap')
        # Clip values to be in the range [0, 255]
        convolved_channel = np.clip(convolved_channel, 0, 255)
        # Store back in the sharpened array
        sharpened_array[:, :, i] = convolved_channel.astype(np.uint8)
    # Debugging: Check output values
    print("Sharpened array values: Min =", sharpened_array.min(), "Max =", sharpened_array.max())
    # Convert array back to image
    sharpened_image = Image.fromarray(sharpened_array)
    # End timing
    duration = time.time() - start_time
    print(f"Processing time: {duration:.4f} seconds")
    return sharpened_image

# Correct path for WSL2 accessing Windows filesystem
image_path = '/mnt/d/images/taj_mahal.png'
image = Image.open(image_path)

# Sharpen the image
sharpened_image = sharpen_image_color(image)

if sharpened_image:
    # Show using PIL's built-in show method (for debugging)
    #sharpened_image.show(title="Sharpened Image (PIL Show)")
    # Display the original and sharpened images using Matplotlib
    fig, axs = plt.subplots(1, 2, figsize=(15, 7))
    # Original image
    axs[0].imshow(image)
    axs[0].set_title("Original Image")
    axs[0].axis('off')
    # Sharpened image
    axs[1].imshow(sharpened_image)
    axs[1].set_title("Sharpened Image")
    axs[1].axis('off')
    # Show both images side by side
    plt.show()
else:
    print("Failed to generate sharpened image.")
The output of this is,
Input array values: Min = 0 Max = 255
Sharpened array values: Min = 0 Max = 255
Processing time: 0.1034 seconds

Let's see whether Cython can beat that runtime of 0.1034 seconds.
%%cython
# cython: language_level=3
# distutils: define_macros=NPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION
import numpy as np
cimport numpy as np
import cython

@cython.boundscheck(False)
@cython.wraparound(False)
def sharpen_image_cython(np.ndarray[np.uint8_t, ndim=3] image_array):
    # Define sharpening kernel
    cdef int kernel[3][3]
    kernel[0][0] = 0
    kernel[0][1] = -1
    kernel[0][2] = 0
    kernel[1][0] = -1
    kernel[1][1] = 5
    kernel[1][2] = -1
    kernel[2][0] = 0
    kernel[2][1] = -1
    kernel[2][2] = 0
    # Declare variables outside of loops
    cdef int height = image_array.shape[0]
    cdef int width = image_array.shape[1]
    cdef int channel, i, j, ki, kj
    cdef int value
    # Prepare an empty array for the sharpened image
    cdef np.ndarray[np.uint8_t, ndim=3] sharpened_array = np.zeros_like(image_array)
    # Convolve each channel separately
    for channel in range(3):  # Iterate over RGB channels
        for i in range(1, height - 1):
            for j in range(1, width - 1):
                value = 0  # Reset value at each pixel
                # Apply the kernel
                for ki in range(-1, 2):
                    for kj in range(-1, 2):
                        value += kernel[ki + 1][kj + 1] * image_array[i + ki, j + kj, channel]
                # Clip values to be between 0 and 255
                sharpened_array[i, j, channel] = min(max(value, 0), 255)
    return sharpened_array

# Python part of the code
from PIL import Image
import numpy as np
import time as py_time  # Renaming the Python time module to avoid conflict
import matplotlib.pyplot as plt

# Load the input image
image_path = '/mnt/d/images/taj_mahal.png'
image = Image.open(image_path).convert('RGB')

# Convert the image to a NumPy array
image_array = np.array(image)

# Time the sharpening with Cython
start_time = py_time.time()
sharpened_array = sharpen_image_cython(image_array)
cython_time = py_time.time() - start_time

# Convert back to an image for displaying
sharpened_image = Image.fromarray(sharpened_array)

# Display the original and sharpened image
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.imshow(image)
plt.title("Original Image")
plt.subplot(1, 2, 2)
plt.imshow(sharpened_image)
plt.title("Sharpened Image")
plt.show()

# Print the time taken for Cython processing
print(f"Processing time with Cython: {cython_time:.4f} seconds")
The output is,

Both programs produced the same sharpened image, but the Cython version was about 25 times faster.
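As an aside, Cython also offers typed memoryviews as an alternative to the np.ndarray buffer syntax used above. The following is a minimal sketch of my own (not benchmarked here); the function name and its behaviour, inverting pixel values, are purely illustrative of the style of fast element access you get.

%%cython
import numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def invert_image(unsigned char[:, :, :] img):
    # A typed memoryview gives the same direct, C-speed element access
    # as the np.ndarray buffer syntax, with a simpler declaration.
    cdef int h = img.shape[0]
    cdef int w = img.shape[1]
    cdef int i, j, c
    out = np.empty((h, w, 3), dtype=np.uint8)
    cdef unsigned char[:, :, :] out_view = out
    for i in range(h):
        for j in range(w):
            for c in range(3):
                out_view[i, j, c] = 255 - img[i, j, c]
    return out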
What about running code outside of a notebook?
Up to now, everything I've shown has assumed you are running your code inside a Jupyter Notebook. I've done it that way because it is the easiest way to introduce Cython and get some code up and running quickly. While the notebook environment is hugely popular among Python developers, a large amount of Python code still lives in regular .py files that are run from a terminal using the python command.
If that is your primary mode of coding and running Python scripts, the %load_ext and %%cython magic commands won't work, since those are understood only by Jupyter/IPython.
So, here is how to adapt my four-step process for when you run your code as a regular Python script.
Let's take our earlier sum_of_squares example to demonstrate.
1/ Create a .pyx file instead of using %%cython
Move your Cython-enhanced code into a file named, for example:
sum_of_squares.pyx
# sum_of_squares.pyx
def fast_sum_of_squares(int n):
    cdef int total = 0
    cdef int i, j
    for i in range(n):
        for j in range(n):
            total += i * i + j * j
    return total
All we did was delete the %%cython directive and the timing code (which will live in the calling script instead).
2/ Create a setup.py file to compile your .pyx file
# setup.py
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="cython-test",
    ext_modules=cythonize("sum_of_squares.pyx", language_level=3),
    py_modules=["sum_of_squares"],  # Explicitly state the module
    zip_safe=False,
)
3/ Compile the .pyx file by running setup.py with this command,
$ python setup.py build_ext --inplace
running build_ext
copying build/lib.linux-x86_64-cpython-311/sum_of_squares.cpython-311-x86_64-linux-g
4/ Create a regular Python module to call our Cython code, as shown below, and run it.
# main.py
import time, timeit
from sum_of_squares import fast_sum_of_squares

start = time.time()
result = fast_sum_of_squares(20000)

print("timeit:", timeit.timeit(
    lambda: fast_sum_of_squares(20000),
    number=1))
$ python main.py
timeit: 0.14675087109208107
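As a side note, Cython also ships a cythonize command-line tool that can replace the setup.py step for simple single-module cases. A minimal sketch, assuming the cythonize script is on your PATH after installing Cython, would be an in-place build like this:

$ cythonize -i sum_of_squares.pyx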
Summary
Hopefully, I have convinced you that the Cython library deserves a place in your code. Although it may seem daunting at first sight, with a little effort you can achieve impressive improvements over regular Python runtimes, even when you're already using optimised libraries such as NumPy.
I described a four-step process for converting your regular Python code to use Cython when working in a Jupyter Notebook. In addition, I explained the steps required to run Cython code from the command line, outside of a notebook environment.
Finally, I reinforced the above with three examples showing how to convert standard Python code to use Cython.
For the three examples I showed, we achieved 80x, 10x, and 25x speed-ups, which is not too shabby at all.
As promised, here is a link to my previous TDS article on using another external library to speed up Python code.