Why Is My Code Running So Slowly? A Py-Spy Python Profiling Guide

The most frustrating debugging problems in data science code are not syntax errors or logical errors. Instead, they come from code that does exactly what it's supposed to do, but takes its sweet time doing it.
Code that works but doesn't work efficiently can be a huge bottleneck in a data science workflow. In this article, I will give a brief introduction to and walkthrough of py-spy, a powerful tool designed for profiling your Python code. It can pinpoint exactly where your program is spending most of its time so inefficiencies can be identified and corrected.
An example of a problem
Let's set up a simple research question to write some code for:
“Of all the flights between US states and territories, which departure airport has the longest flights on average?”
Below is a simple Python script to answer this research question, using data obtained from the Bureau of Transportation Statistics (BTS). The dataset covers all flights within US states and territories between January and June 2025, with information on origin and destination airports, and contains about 3.5 million rows.
The script calculates the Haversine distance, the shortest distance between two points on a sphere, for each flight. It then aggregates the results by origin airport to get an average distance and reports the top five.
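For reference, the Haversine distance described above can be written out as a formula. The symbols are my own labels rather than the article's: φ₁, φ₂ are the two latitudes and λ₁, λ₂ the two longitudes (all in radians), and R is the Earth's radius, taken as 6371 km in the script.

```latex
d = 2R \arcsin\left(
      \sqrt{\sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
            + \cos\varphi_1 \cos\varphi_2
              \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right)}
    \right)
```

The haversine() function in the script implements this expression term by term.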
import pandas as pd
import math
import time


def haversine(lat_1, lon_1, lat_2, lon_2):
    """Calculate the Haversine Distance between two latitude and longitude points"""
    lat_1_rad = math.radians(lat_1)
    lon_1_rad = math.radians(lon_1)
    lat_2_rad = math.radians(lat_2)
    lon_2_rad = math.radians(lon_2)

    delta_lat = lat_2_rad - lat_1_rad
    delta_lon = lon_2_rad - lon_1_rad

    R = 6371  # Radius of the earth in km

    return 2*R*math.asin(math.sqrt(math.sin(delta_lat/2)**2 + math.cos(lat_1_rad)*math.cos(lat_2_rad)*(math.sin(delta_lon/2))**2))


if __name__ == '__main__':
    # Load in flight data to a dataframe
    flight_data_file = r"./data/2025_flight_data.csv"
    flights_df = pd.read_csv(flight_data_file)

    # Start timer to see how long analysis takes
    start = time.time()

    # Calculate the haversine distance between each flight's start and end airport
    haversine_dists = []
    for i, row in flights_df.iterrows():
        haversine_dists.append(haversine(lat_1=row["LATITUDE_ORIGIN"],
                                         lon_1=row["LONGITUDE_ORIGIN"],
                                         lat_2=row["LATITUDE_DEST"],
                                         lon_2=row["LONGITUDE_DEST"]))
    flights_df["Distance"] = haversine_dists

    # Get result by grouping by origin airport, taking the average flight distance and printing the top 5
    result = (
        flights_df
        .groupby('DISPLAY_AIRPORT_NAME_ORIGIN').agg(avg_dist=('Distance', 'mean'))
        .sort_values('avg_dist', ascending=False)
    )
    print(result.head(5))

    # End timer and print analysis time
    end = time.time()
    print(f"Took {end - start} s")
Running this code gives the following output:
                                        avg_dist
DISPLAY_AIRPORT_NAME_ORIGIN
Pago Pago International              4202.493567
Guam International                   3142.363005
Luis Munoz Marin International       2386.141780
Ted Stevens Anchorage International  2246.530036
Daniel K Inouye International        2211.857407
Took 169.8935534954071 s
These results make sense, as the airports listed are in American Samoa, Guam, Puerto Rico, Alaska, and Hawaii, respectively. These are all places outside the contiguous United States where one would expect long average flight distances.
The problem here is not the results, which are valid, but the execution time: about three minutes! While three minutes may be tolerable for a single run, it becomes a productivity killer during development. Imagine this script as one step in a longer data pipeline. Every time a parameter is adjusted, a bug is fixed, or a cell is rerun, you are forced to sit idle while the program runs. That friction breaks your flow and turns a quick analysis into an all-afternoon affair.
Now let's see how py-spy can help us identify which lines of code are taking so long.
What is Py-Spy?
To understand what py-spy does and the benefits of using it, it helps to compare py-spy with Python's built-in profiler, cProfile.
- cProfile: This is a tracing profiler, which works like a stopwatch on every function call. The time between each function call and its return is measured and reported. Although very accurate, this adds significant overhead, as the profiler must constantly pause and record data, which can slow the script down considerably.
- py-spy: This is a sampling profiler, which works like a high-speed camera watching the whole program at once. py-spy lives completely outside of the running Python script and takes high-frequency snapshots of the program's state. It looks at the entire call stack, so we can see exactly which line of code is running and which function called it, all the way up to the top level.
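To make the tracing approach concrete, here is a minimal sketch of profiling a function with cProfile from the standard library. The slow_sum workload and the profile_report helper are hypothetical names made up for this illustration; they are not part of the flight script.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately simple workload: sum the squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_report(n=100_000, top=5):
    """Run slow_sum under cProfile and return the text report."""
    profiler = cProfile.Profile()
    profiler.enable()    # start timing every function call and return
    slow_sum(n)
    profiler.disable()   # stop tracing

    # Format the collected stats, sorted by cumulative time
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

Note that the profiler has to be wired into the code (or invoked with python -m cProfile), whereas py-spy observes the process from the outside.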
Running Py-spy
To run py-spy on a Python script, the py-spy library must be installed in the Python environment.
pip install py-spy
Once the py-spy library is installed, our script can be profiled by running the following command in the terminal:
py-spy record -o profile.svg -r 100 -- python main.py
Here's what each part of the command actually does:
- py-spy: Calls the tool.
- record: This tells py-spy to use its "record" mode, which continuously monitors the program while it runs and saves the data.
- -o profile.svg: This specifies the output file name and format, telling py-spy to write the results to an SVG file named profile.svg.
- -r 100: This specifies the sampling rate, set here to 100 times per second. This means that py-spy will check what the program is doing 100 times per second.
- --: This separates the py-spy command from the Python script command. It tells py-spy that everything after this flag is the command to execute, not an argument to py-spy itself.
- python main.py: This is the command that runs the Python script to be profiled by py-spy, in this case main.py.
Note: Depending on your operating system and security settings, py-spy may need to be run with elevated privileges (for example, with sudo), because it reads the memory of another process.
After this command has finished running, the output file profile.svg will appear, which lets us dig deeper into which parts of the code are taking the longest.
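If you would rather trigger the profiling run from Python, for example as one step in an automation script, the same command can be assembled as an argument list. This is only a sketch: build_py_spy_command and profile_script are hypothetical helper names, and it uses sys.executable in place of the literal python so the current interpreter is the one profiled.

```python
import shutil
import subprocess
import sys


def build_py_spy_command(script, output="profile.svg", rate=100):
    """Assemble the 'py-spy record' invocation from the article as an argument list."""
    return [
        "py-spy", "record",
        "-o", output,       # output file name (SVG)
        "-r", str(rate),    # samples per second
        "--",               # everything after this is the command to profile
        sys.executable, script,
    ]


def profile_script(script, output="profile.svg", rate=100):
    """Run py-spy on a script; raises if py-spy is missing or exits with an error."""
    if shutil.which("py-spy") is None:
        raise RuntimeError("py-spy not found; install it with 'pip install py-spy'")
    subprocess.run(build_py_spy_command(script, output, rate), check=True)
    return output


if __name__ == "__main__":
    # Only prints the command; call profile_script("main.py") to actually run it
    print(build_py_spy_command("main.py"))
```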
Py-spy Output
Opening the output file profile.svg reveals the visualization that py-spy creates to show how much time our program spends in different parts of the code. This is known as an icicle graph (or sometimes a flame graph if the y-axis is inverted) and is interpreted as follows:
- Bars: Each colored bar represents a specific function that was called during program execution.
- X-axis (Population): The horizontal axis represents the collection of all samples taken during profiling. They are grouped so that the width of a particular bar represents the proportion of total samples in which the program was inside the function represented by that bar. Note: This is not a timeline; the ordering does not represent when a function was called, only the total amount of time spent in it.
- Y-axis (Stack Depth): The vertical axis represents the call stack. The top bar, labeled "all", represents the entire program, and the bars below it represent the functions called from "all". This continues downward recursively, with each bar divided into the functions that were called while it was executing. The bottommost bar shows the function that was actually running on the CPU when the sample was taken.
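To illustrate how bar widths arise from samples, here is a toy sketch that aggregates call-stack snapshots into proportions. The sample stacks are invented for illustration, and frame_widths is a hypothetical helper, not how py-spy is implemented internally.

```python
from collections import Counter


def frame_widths(samples):
    """Return, for each call-stack prefix, the share of samples that contain it."""
    if not samples:
        return {}
    counts = Counter()
    for stack in samples:
        # A sample contributes to every bar on its path, from "all" downward
        for depth in range(1, len(stack) + 1):
            counts[tuple(stack[:depth])] += 1
    total = len(samples)
    return {prefix: n / total for prefix, n in counts.items()}


# Four made-up snapshots of the call stack, outermost frame first
samples = [
    ("all", "main", "iterrows"),
    ("all", "main", "iterrows"),
    ("all", "main", "iterrows"),
    ("all", "main", "haversine"),
]

widths = frame_widths(samples)
for prefix, share in sorted(widths.items()):
    print(" > ".join(prefix), f"{share:.0%}")
```

In this toy data the iterrows bar would span three quarters of the graph because it appears in three of the four samples, regardless of the order in which the samples were taken.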
Working with the Graph
Although the picture above is static, the actual .svg file created by py-spy is fully interactive. If you open it in a web browser, you can:
- Search (Ctrl+F): Highlight specific functions to see where they appear in the stack.
- Zoom in: Click on any bar to zoom in on that specific function and its children, allowing you to isolate complex parts of the call stack.
- Hover: Hovering over any bar shows the specific function name, file path, line number, and the exact percentage of time it took.
The most important rule for reading an icicle graph is simple: the wider the bar, the more often that function was sampled. If a function's bar spans 50% of the graph width, the program was executing that function for 50% of the total runtime.
Diagnosis
From the icicle graph above, we can see that the bar representing the Pandas iterrows() function is noticeably wide. Hovering over that bar when viewing the profile.svg file reveals that the actual proportion for this function was 68.36%. So more than 2/3 of the runtime was spent in the iterrows() function. Intuitively this bottleneck makes sense, as iterrows() creates a Pandas Series object for every single row in the loop, resulting in a large overhead. This gives us a clear target for reducing the script's runtime.
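The per-row overhead can be seen on a small synthetic DataFrame. This is a rough sketch with placeholder values rather than the BTS data, and the timings it prints will vary by machine.

```python
import time

import pandas as pd

# Small synthetic frame; the values are placeholders, not BTS data
df = pd.DataFrame({
    "LATITUDE_ORIGIN": [10.0] * 10_000,
    "LONGITUDE_ORIGIN": [20.0] * 10_000,
})

# Row by row: every iteration builds a new pandas Series for that row
start = time.perf_counter()
row_types = {type(row) for _, row in df.iterrows()}
loop_seconds = time.perf_counter() - start

# Column-wise: a single operation over the whole column
start = time.perf_counter()
column_total = df["LATITUDE_ORIGIN"].sum()
column_seconds = time.perf_counter() - start

print(row_types)
print(f"iterrows: {loop_seconds:.4f} s, column sum: {column_seconds:.4f} s")
```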
Optimizing the Script
The clearest way to improve this script, based on what we learned from py-spy, is to stop using iterrows() to loop over every row when calculating the Haversine distance. Instead, the calculation should be vectorized using NumPy, which computes the distance for every row with a single function call. The changes that need to be made are:
- Rewrite the haversine() function to use efficient, vectorized NumPy functions that accept entire arrays rather than one set of coordinates at a time.
- Replace the iterrows() loop with a single call to this newly vectorized haversine() function.
import pandas as pd
import numpy as np
import time


def haversine(lat_1, lon_1, lat_2, lon_2):
    """Calculate the Haversine Distance between two latitude and longitude points"""
    lat_1_rad = np.radians(lat_1)
    lon_1_rad = np.radians(lon_1)
    lat_2_rad = np.radians(lat_2)
    lon_2_rad = np.radians(lon_2)

    delta_lat = lat_2_rad - lat_1_rad
    delta_lon = lon_2_rad - lon_1_rad

    R = 6371  # Radius of the earth in km

    # np.arcsin works on all NumPy versions (np.asin is an alias added in NumPy 2.0)
    return 2*R*np.arcsin(np.sqrt(np.sin(delta_lat/2)**2 + np.cos(lat_1_rad)*np.cos(lat_2_rad)*(np.sin(delta_lon/2))**2))


if __name__ == '__main__':
    # Load in flight data to a dataframe
    flight_data_file = r"./data/2025_flight_data.csv"
    flights_df = pd.read_csv(flight_data_file)

    # Start timer to see how long analysis takes
    start = time.time()

    # Calculate the haversine distance between each flight's start and end airport
    flights_df["Distance"] = haversine(lat_1=flights_df["LATITUDE_ORIGIN"],
                                       lon_1=flights_df["LONGITUDE_ORIGIN"],
                                       lat_2=flights_df["LATITUDE_DEST"],
                                       lon_2=flights_df["LONGITUDE_DEST"])

    # Get result by grouping by origin airport, taking the average flight distance and printing the top 5
    result = (
        flights_df
        .groupby('DISPLAY_AIRPORT_NAME_ORIGIN').agg(avg_dist=('Distance', 'mean'))
        .sort_values('avg_dist', ascending=False)
    )
    print(result.head(5))

    # End timer and print analysis time
    end = time.time()
    print(f"Took {end - start} s")
Running this code gives the following output:
                                        avg_dist
DISPLAY_AIRPORT_NAME_ORIGIN
Pago Pago International              4202.493567
Guam International                   3142.363005
Luis Munoz Marin International       2386.141780
Ted Stevens Anchorage International  2246.530036
Daniel K Inouye International        2211.857407
Took 0.5649983882904053 s
These results are the same as the results before the code was improved, but instead of taking about three minutes to process, it only took half a second!
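When swapping a loop for a vectorized version, it is worth checking that the two implementations agree. Below is a minimal sketch of such a check. The coordinates are hypothetical values chosen only for illustration, and haversine_scalar and haversine_vectorized are compact restatements of the two functions above rather than the exact code from the scripts.

```python
import math

import numpy as np

R_KM = 6371  # Earth radius in km, the same value used in the scripts


def haversine_scalar(lat_1, lon_1, lat_2, lon_2):
    """Haversine distance for a single pair of points, using the math module."""
    lat_1, lon_1, lat_2, lon_2 = map(math.radians, (lat_1, lon_1, lat_2, lon_2))
    a = (math.sin((lat_2 - lat_1) / 2) ** 2
         + math.cos(lat_1) * math.cos(lat_2) * math.sin((lon_2 - lon_1) / 2) ** 2)
    return 2 * R_KM * math.asin(math.sqrt(a))


def haversine_vectorized(lat_1, lon_1, lat_2, lon_2):
    """Haversine distance for whole arrays of points, using NumPy."""
    lat_1, lon_1, lat_2, lon_2 = map(np.radians, (lat_1, lon_1, lat_2, lon_2))
    a = (np.sin((lat_2 - lat_1) / 2) ** 2
         + np.cos(lat_1) * np.cos(lat_2) * np.sin((lon_2 - lon_1) / 2) ** 2)
    return 2 * R_KM * np.arcsin(np.sqrt(a))


# Hypothetical (latitude, longitude) pairs in degrees, for illustration only
origins = np.array([[40.64, -73.78], [33.94, -118.41], [61.17, -149.99]])
dests = np.array([[33.94, -118.41], [21.32, -157.92], [47.45, -122.31]])

vectorized = haversine_vectorized(origins[:, 0], origins[:, 1], dests[:, 0], dests[:, 1])
scalar = np.array([haversine_scalar(*o, *d) for o, d in zip(origins, dests)])

# Both versions should agree to within floating-point tolerance
print(np.allclose(vectorized, scalar))
```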
Looking Forward
If you're reading this in the future (late 2026 or so), check whether you're using Python 3.15 or newer. Python 3.15 is expected to introduce a native sampling profiler to the standard library, providing functionality similar to py-spy without requiring an external installation. For anyone on Python 3.14 or older, py-spy remains the gold standard.
This article explored a tool for tackling a common frustration in data science: a script that works as intended but is written inefficiently and takes a long time to run. An example script was provided to find which US airports have the longest average flight distance, measured as Haversine distance. This script worked as expected, but took about three minutes to run.
Using the py-spy Python profiler, we were able to learn that the cause of the inefficiency was the use of the iterrows() function. By replacing the iterrows() loop with an efficient vectorized computation of the Haversine distance, the runtime was reduced from about three minutes to just over half a second.
See my GitHub Repository for code from this article, including preprocessing of raw data from BTS.
Thanks for reading!
Data Sources
Data from the Bureau of Transportation Statistics (BTS) is a work of the US Federal Government and is in the public domain under 17 USC § 105. It is free to use, share, and adapt without copyright restrictions.



