
A Beginner’s Guide to Optimizing Pandas Code for Speed - chasedehan
https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6
======
stevesimmons
This is a good point to plug my talks on 'Pandas from the Inside' (A), 'Big
Pandas' (B) and 'Pandas 2.0' (C) that I presented at various PyData
conferences over the last 18 months:

\- PyData London 2016 - (A), 60 mins, videos online

\- PyData Washington DC 2016 - (A), 90 mins, videos online

\- PyData Amsterdam 2017 - (A) and (B), 3 hours

\- PyData Berlin 2017 - (A) and (B), compressed 90 mins, video probably online

\- PyCon UK 2017 - (A), (B) and (C), 2 hours

PDF slides for (A) are here: [https://github.com/stevesimmons/pydata-ams2017-pandas-and-da...](https://github.com/stevesimmons/pydata-ams2017-pandas-and-dask-from-the-inside/raw/master/slides-1-pandas-from-the-inside.pdf)

Others are in my other repos on github:
[https://github.com/stevesimmons](https://github.com/stevesimmons)

------
deepsun
Doesn't NumPy use SIMD instructions? There's no mention of that in the
article.

~~~
VHRanger
It does under certain installations. Anaconda tends to favor Intel MKL on most
x86 systems.
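
If you want to see what your own install was built against, `np.show_config()` prints the build and BLAS/LAPACK details. A minimal sketch: the `uses_mkl` flag is just a crude substring check made up for illustration, and the output format differs between NumPy versions, so treat it as a hint rather than a reliable test.

```python
import contextlib
import io

import numpy as np

# np.show_config() prints to stdout, so capture it to inspect the text.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    np.show_config()
config_text = buf.getvalue()

# Hypothetical heuristic: look for "mkl" anywhere in the reported config.
uses_mkl = "mkl" in config_text.lower()

print(config_text)
print("MKL mentioned in build config:", uses_mkl)
```

This mostly tells you about the linear-algebra backend; it says little about how ordinary elementwise operations are compiled.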

------
VHRanger
The main performance tip is to avoid copying data where possible:

\- use .loc[]

\- use inplace=True

~~~
audiometry
Can you elaborate on using `.loc[]`? What is the defective approach it
replaces?

~~~
sceadu
I would assume it replaces `df[['colA', 'colB']]` for projection/column selection?
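
Another possible reading (an assumption, nobody in this thread confirms it) is chained indexing on assignment, where a filter followed by a second `[]` may write into a temporary object instead of the original frame. A small sketch with made-up data, reusing the `colA`/`colB` names from above:

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({"colA": [1, 2, 3, 4], "colB": [10, 20, 30, 40]})

# Chained indexing: two separate indexing calls. The filter produces an
# intermediate object first, so the assignment may not reach df and pandas
# may emit SettingWithCopyWarning.
# df[df["colA"] > 2]["colB"] = 0

# Single .loc call: row mask and column label in one indexing operation
# applied directly to df.
df.loc[df["colA"] > 2, "colB"] = 0

result = df["colB"].tolist()
print(result)  # [10, 20, 0, 0]
```

For plain column selection, both spellings return a new object as far as I know, so any benefit there is less clear.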

~~~
sceadu
Also, I would caution about using inplace=True. See:
[https://tomaugspurger.github.io/method-chaining.html](https://tomaugspurger.github.io/method-chaining.html) (ctrl+F: "Inplace?")

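For anyone unsure what the alternative to `inplace=True` looks like, here is a rough sketch of the method-chaining style the linked post is named after. The data and column names are invented; the point is only that each step returns a new object, so the steps compose and the original frame is left alone.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({"colA": [3, 1, 2], "colB": [30.0, None, 20.0]})

# Each method returns a new DataFrame, so the calls can be chained
# instead of mutating df step by step with inplace=True.
cleaned = (
    df.dropna(subset=["colB"])      # drop the row where colB is missing
      .sort_values("colA")          # order the remaining rows by colA
      .reset_index(drop=True)       # renumber rows 0..n-1
)

print(cleaned)
print(len(df))  # the original still has all 3 rows
```

Whether this is faster or slower than the `inplace=True` version depends on the operation, so it is worth timing on your own data.
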
------
setr
There's something really offensive about attributing "premature
optimization..." to xkcd, even as a joke.

------
L_226
step 1 - don't use pandas

~~~
bunderbunder
Pandas isn't perfect, but for small- to medium-size datasets, I haven't seen
much that matches its performance, and I haven't seen anything that matches
its combination of performance and ease of use.

~~~
closed
As a person who deeply enjoys developing with python, I'd have to reluctantly
say R's tidyverse is a delight to use and often faster than pandas in my
experience.

~~~
bunderbunder
Ah, good to know. I haven't touched R in a while.

Does tidyverse fix up the mess that is string handling in R?

~~~
closed
Sorry for the late reply. stringr does a pretty good job (though I like how
python handles strings better).

