
Speed up Pandas Code with NumPy | by Thomas Reid | January, 2025

But I can't vectorise this, can I? … Yes, it is possible!

In one of the first articles I wrote on Medium, I talked about using the apply() method on Pandas dataframes and said it should be avoided, if possible, on large dataframes. I'll put a link to that article at the end of this one if you want to check it out.
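For readers who haven't seen that earlier article, here is a minimal sketch of the kind of row-wise apply() pattern it warned against. The columns (`amount`, `customer`) and the discount rules are hypothetical. They were chosen only to illustrate an if/then/else condition and are not taken from the original article.

```python
import pandas as pd

# Hypothetical example data: order amounts and customer types
df = pd.DataFrame({
    "amount": [50.0, 150.0, 600.0, 20.0],
    "customer": ["new", "returning", "returning", "new"],
})

def discount(row):
    # Hypothetical if/then/else rules, evaluated in Python once per row
    if row["amount"] > 500:
        return 0.10
    elif row["customer"] == "returning":
        return 0.05
    else:
        return 0.0

# axis=1 passes each row to discount() as a Series; this is the slow path
df["discount"] = df.apply(discount, axis=1)
print(df)
```

Because `discount()` is an ordinary Python function, Pandas has to call it separately for every row. That per-row overhead is what makes apply() expensive on large dataframes.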

Although I talked a little about alternatives, namely vectorisation, I didn't give many examples of it, so I intend to correct that here. Specifically, I want to show how NumPy and a couple of its lesser-known functions (where and select) can be used to speed up Pandas operations involving complex if/then/else conditions.
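Here is a minimal sketch of both functions, using hypothetical data (an `amount` column, a `customer` column, and made-up discount rules that are not from the original article). np.where handles a single if/else. np.select handles an if/elif/else chain. Neither needs a Python-level loop over the rows.

```python
import numpy as np
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "amount": [50.0, 150.0, 600.0, 20.0],
    "customer": ["new", "returning", "returning", "new"],
})

# np.where: one vectorised if/else over the whole column
df["big_order"] = np.where(df["amount"] > 500, "yes", "no")

# np.select: conditions are checked in list order and the first True one wins;
# rows that match none of them get the default value (the final "else")
conditions = [
    df["amount"] > 500,              # if
    df["customer"] == "returning",   # elif
]
choices = [0.10, 0.05]
df["discount"] = np.select(conditions, choices, default=0.0)
print(df)
```

np.select evaluates the conditions in list order, so the order of `conditions` plays the same role as the order of the `if`/`elif` branches in a row-wise function.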

In the context of Pandas, vectorisation means applying an operation to a whole block of data at once rather than iterating over it row by row or element by element. This is possible because Pandas is built on NumPy, which provides highly optimised vectorised operations implemented in C. When you use vectorised operations in Pandas, such as arithmetic operators or functions applied to DataFrame or Series objects, each operation is applied to many data elements in a single call instead of once per element.
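The sketch below makes the distinction concrete by comparing an element-by-element Python loop with the equivalent vectorised expression. The `price` and `qty` columns and the random data are hypothetical. Timings depend on your machine, so none are claimed here.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical numeric data: 100,000 random prices and quantities
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "price": rng.uniform(1, 100, size=n),
    "qty": rng.integers(1, 10, size=n),
})

# Element by element: a Python-level loop over every pair of values
start = time.perf_counter()
loop_total = [p * q for p, q in zip(df["price"], df["qty"])]
loop_secs = time.perf_counter() - start

# Vectorised: one expression over whole columns, executed in NumPy's compiled code
start = time.perf_counter()
df["total"] = df["price"] * df["qty"]
vec_secs = time.perf_counter() - start

# Both approaches produce the same values; only the speed differs
assert np.allclose(loop_total, df["total"])
print(f"loop: {loop_secs:.4f}s, vectorised: {vec_secs:.4f}s")
```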
