
Pandas got 3x faster - techviz
https://prakhartechviz.blogspot.com/2019/01/faster-pandas-even-on-your-laptop.html
======
chrisaycock
To be clear, this doesn't speed up pandas, per se. It uses a different library
(Modin) as a drop-in replacement:

[https://github.com/modin-project/modin](https://github.com/modin-project/modin)

Modin uses Ray, a distributed computation library. There was a similar article
on HN a year ago that hyped "making pandas faster" by replacing it with Ray:

[https://news.ycombinator.com/item?id=16510610](https://news.ycombinator.com/item?id=16510610)

~~~
jabberthemutt
And to remove any mystique, this mostly does so by parallelising things to all
CPU cores.
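
As a rough illustration of what "parallelising to all CPU cores" means in general, here is a minimal standard-library sketch: split the data into chunks, reduce each chunk in its own process, then combine the partial results. This is not Modin's or Ray's actual implementation; the function names (`split_into_chunks`, `chunk_sum`, `parallel_sum`) are hypothetical and exist only for this example.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(values, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal slices."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def chunk_sum(chunk):
    """Work done independently by one worker: reduce a single chunk."""
    return sum(chunk)


def parallel_sum(values, workers=None):
    """Sum `values` by reducing each chunk in a separate process."""
    workers = workers or os.cpu_count() or 1
    chunks = split_into_chunks(values, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(chunk_sum, chunks))
    return sum(partials)  # combine the per-chunk results


if __name__ == "__main__":
    data = list(range(100_000))
    assert parallel_sum(data) == sum(data)
```

The speed-up from this pattern is bounded by the number of cores, and the cost of starting processes and copying chunks to them is paid on every call.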

------
wenc
Why link to a blog post instead of the Modin [1] project directly, which is
the reason for the speed improvement?

Also the title "Pandas got 3X faster" seems to contradict the conclusion in
the article, which reports the result was < 2x faster.

[1] [https://github.com/modin-project/modin](https://github.com/modin-project/modin)

------
xiphias2
The Modin project should work on merging the implementation into the original
pandas project, so that parallel / non-parallel algorithms could be mixed.
Drop-in replacements don't work, as they are not 100% compatible with the
original project.

I would prefer to pass an optional parallel=true parameter to some functions
in the API, or have a configuration setting that can fall back to a
non-parallel implementation.
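
To make the idea concrete, here is a minimal sketch of an opt-in flag with a serial fallback, using only the standard library. Nothing here is a pandas or Modin API: `sum_of_squares`, its `parallel` and `workers` parameters, and the `MIN_PARALLEL_SIZE` cutoff are all hypothetical names and values chosen for illustration.

```python
from concurrent.futures import ProcessPoolExecutor

# Hypothetical cutoff: below this many items, process start-up and data
# copying are assumed to cost more than the parallel work saves.
MIN_PARALLEL_SIZE = 100_000


def _sum_of_squares(chunk):
    """Serial implementation; also the per-chunk work in the parallel path."""
    return sum(x * x for x in chunk)


def sum_of_squares(values, parallel=False, workers=2):
    """Return sum(x*x), using worker processes only when asked and worthwhile."""
    if not parallel or workers < 2 or len(values) < MIN_PARALLEL_SIZE:
        return _sum_of_squares(values)  # non-parallel fallback
    size = -(-len(values) // workers)  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(_sum_of_squares, chunks))


if __name__ == "__main__":
    data = list(range(200_000))
    assert sum_of_squares(data, parallel=True) == sum_of_squares(data)
```

With this shape the caller always gets the same result, and the flag only changes how it is computed, so parallel and non-parallel calls can be mixed freely in one program.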

------
pcmoritz
If you are interested in learning more about this, there is also a recent
PyData talk:
[https://rise.cs.berkeley.edu/blog/modin-talk-at-pydata-nyc-2...](https://rise.cs.berkeley.edu/blog/modin-talk-at-pydata-nyc-2018/)

------
sqidyyy
Imagine my disappointment when I found out this isn't about real pandas.

~~~
olliej
Right? I was super interested until I saw it was about software :(

------
kumarvvr
Can someone shed light on what the trade-offs of using this are?

~~~
Foivos
I just learned about Modin as well. Based on this blog post [1], it seems to
be a bit slower for small datasets. But that was 6 months ago. Now the
documentation claims that it is suitable for datasets from 1kB to 1TB.

[1] [https://rise.cs.berkeley.edu/blog/pandas-on-ray-early-lesson...](https://rise.cs.berkeley.edu/blog/pandas-on-ray-early-lessons/)

------
test6554
As someone who has never heard of pandas, the logo is hilarious.

