cuDF – GPU DataFrame Library (github.com/rapidsai)
107 points by tosh 10 months ago | hide | past | favorite | 29 comments



cuDF is the most impressive DataFrame implementation I've seen, and I've been recommending it for years. The API is exceptionally close to Pandas (just a couple of different function arguments here and there), much more so than PyArrow's or Modin's. Throughput and energy efficiency were often 10x that of PyArrow running on a comparable-TDP SotA CPU two years ago [1].

[1]: https://www.unum.cloud/blog/2022-09-20-pandas
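The API parity can be sketched in a few lines. The script below runs under plain pandas; the claim is that on a CUDA machine the same code works after swapping the import (or loading cuDF's pandas accelerator mode). The column names and data here are made up for illustration.

```python
# Same code under pandas or cuDF: the DataFrame calls below are unchanged.
# Run here with pandas; on a CUDA machine the idea is `import cudf as pd`.
import pandas as pd  # or: import cudf as pd

df = pd.DataFrame({
    "key": ["a", "b", "a", "b", "a"],
    "val": [1, 2, 3, 4, 5],
})
out = df.groupby("key", sort=True)["val"].sum()
print(out.to_dict())  # {'a': 9, 'b': 6}
```

The point is that the groupby/aggregate surface is identical, so existing pandas code mostly moves over without rewrites.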


Being limited by VRAM is a huge constraint for me. Even if it's slower, being able to load 100GB+ into RAM without any batching headaches is worth a lot.

Unless cudf has implemented some clever dask+cudf kind of situation which can intelligently push data in/out of GPU as required?


There is dask-cudf, which gets a lot of the way there.

https://docs.rapids.ai/api/dask-cudf/stable/


Last time I used it, Dask was a lot worse than simple manual batching.
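The manual batching the parent describes can be sketched with pandas' chunked CSV reader: stream fixed-size chunks and fold them into a running aggregate, so the full file never has to be resident at once. With cuDF the same loop applies, moving one chunk at a time onto the GPU. The data here is synthetic for illustration.

```python
# Manual-batching sketch: read in chunks, aggregate partial results.
import io
import pandas as pd

csv = io.StringIO("x\n" + "\n".join(str(i) for i in range(10)))

total = 0
for chunk in pd.read_csv(csv, chunksize=4):  # 3 batches: 4 + 4 + 2 rows
    total += chunk["x"].sum()

print(total)  # 45
```

For aggregations that decompose over partitions (sums, counts, min/max), this is all the out-of-core machinery you need; joins and sorts are where Dask-style schedulers earn their keep.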


This is huge, this was my only gripe with cudf!


I did a conversion of 500GB of data using dask_cudf on a GTX 1060 with 6GB of VRAM and was able to do it faster than a 20 node m3.xlarge Cluster.

What you can do on even consumer GPUs is mind-blowing.


How does it perform when it comes to plotting these large data points? Can I use matplotlib?
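Matplotlib can't read GPU memory directly, so the usual pattern is: reduce or downsample on the GPU, then copy the small result to the host and plot that. A minimal sketch, with pandas standing in for cuDF (on cuDF you'd add a `.to_pandas()` after the aggregation); column names and data are made up.

```python
# Aggregate first (cheap to copy), then hand the small result to matplotlib.
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"bucket": [0, 0, 1, 1, 2], "y": [1.0, 3.0, 2.0, 4.0, 5.0]})
summary = df.groupby("bucket")["y"].mean()  # on cuDF: ...mean().to_pandas()

fig, ax = plt.subplots()
summary.plot.bar(ax=ax)
out_path = os.path.join(tempfile.mkdtemp(), "summary.png")
fig.savefig(out_path)
```

Plotting millions of raw points is slow regardless of where the DataFrame lives; aggregating to a few hundred bars or bins first sidesteps both the device-to-host copy and matplotlib's rendering cost.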


Correct me if I'm wrong, but with recent GPUs you should just be able to memory map a file on "disk" to the GPU address space?

Open, mmap, and then import the host pointer via the GPU API (CUDA or Vulkan). Paging should work as expected, but "stupid" access patterns hurt throughput.

This requires a GPU, a CPU, and a kernel driver that support it. E.g. my laptop's Intel GPU exposes the host-pointer-import Vulkan API, but last I checked it does not support pointers imported from mmap.
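The host-side half of that recipe (open + mmap, demand-paged access) can be sketched with the Python stdlib alone; the device-side pointer import (e.g. `cudaHostRegister` in CUDA, or `VK_EXT_external_memory_host` in Vulkan) is platform-specific and omitted here. File path and contents are made up.

```python
# Host side of the pattern: mmap a file so pages fault in lazily on access
# instead of being read up front. Importing this pointer into a GPU API is
# the platform-specific step the parent comments discuss.
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(bytes(range(256)))

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    first, last = mm[0], mm[255]  # each access may page-fault the OS cache

print(first, last)  # 0 255
```

Sequential access keeps the page-fault cost amortized; random access over a file larger than RAM is where the "stupid access patterns hurt throughput" warning bites.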


There's a profiler cell magic for notebooks which helps identify if you run out of VRAM (it reports what ran on the CPU vs. the GPU). There's an open PR to turn on low-VRAM reporting as a diagnostic. cuDF is impressive, but getting a working setup can be a PITA (and then if you upgrade libraries...). Personally, I think it fits in the production pipeline for obvious bottlenecks on well-tended configurations; using it in the R&D flow might cost diagnostic time getting and keeping it working (YMMV, etc.).


Does this accelerate on an M1? I know this says it is for cuda and that obviously means Nvidia GPU, but lots of ML projects have a port to Apple silicon. I would love to try this on my Mac and see what kind of acceleration my pandas tools get for free.


I wish we could commit to not conflating NVIDIA with GPU. It wouldn't hurt a soul to call it "cuDF - NVIDIA DataFrame Library." To answer your question, it will probably run on the CPU.


It requires an NVIDIA GPU.



Jax-metal on Apple M-series GPUs is barely usable in my opinion. It's not possible to invert a matrix, for example, because Apple has not implemented the necessary triangular solve operator. It's also not possible to sample points from a normal distribution, because the Cholesky decomposition operator is not yet implemented. Apple hasn't responded to either of these issues for the past year. It's difficult to take a numerical computing framework seriously if one cannot invert a matrix.

[1]: https://github.com/google/jax/issues/16321 [2]: https://github.com/google/jax/issues/17490


From the readme, no. It says NVIDIA drivers are required.


I should have been more precise to say that I wondered if there was anyone trying to port it to the M1. It seems like every other ML project has some pull request trying that. Or, perhaps there is another project that is attempting to do the same thing for pandas on the M1. It's a noble goal.


How does this compare to duckdb/polars? I wonder if a GPU-based compute engine is a good idea. GPU memory is expensive and limited, and the bandwidth between the GPU and main memory isn't very high either.
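The bandwidth concern can be put in rough numbers. The figures below are ballpark assumptions (PCIe 4.0 x16 at ~32 GB/s, data-center GPU HBM at ~2000 GB/s), not measurements:

```python
# Back-of-envelope for the host<->device transfer concern.
dataset_gb = 100
pcie_gbps = 32    # host -> GPU over PCIe 4.0 x16 (assumed)
hbm_gbps = 2000   # on-GPU memory bandwidth (assumed)

transfer_s = dataset_gb / pcie_gbps  # one full pass over the bus
scan_s = dataset_gb / hbm_gbps       # one full scan once resident

print(f"{transfer_s:.2f} s over PCIe vs {scan_s:.2f} s in HBM")
```

In other words, a single transfer costs tens of on-device scans, so GPU DataFrames pay off when the data stays resident across many operations, and much less for one-shot queries.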


The same group (Nvidia/Rapids) is working on a similar project but with Polars API compatibility instead of Pandas. It seems to be quite far from completion, though.

See discussion: https://news.ycombinator.com/item?id=39930846


Thanks for the heads up. This is amazing.

I've been watching CUDA since its introduction, and Polars since I had an intern porting our Pandas code to it a couple of years ago, but I had no idea Polars would go this far, this fast!


This and Rapids.ai are a big reason that NVIDIA is the leader in AI.

They made GPU processing at scale accessible to everyone. I have been a long-term user of Rapids, and found that even as a data engineer I can do things on an old consumer GPU that would otherwise require a 20+ node cluster to do in the same time.


Even though other tools might be "better" than Pandas, its ubiquity is why I suggest it.

The ability to run code 100-1000x faster with this is just icing on the cake.

(I've run through this with most of my Pandas training material and it just works with no code changes.)


Out of interest, what other tools might be "better" than Pandas?

I like pandas, and python.


I’m not who you were replying to but I’ve had positive experiences working with polars: https://pola.rs/


Polars is one (I'm a huge fan, I even wrote a book about it), SQL, R, Excel...


Polars is great. Most of my Python work involves wrangling data with pandas and polars was integral in getting me out of rust tutorial city and into rust production.


Is this GPU thing some kind of magic?

We have like 12 different types of it in the wild. I think it's time we came up with one or two GPU hardware standards, similar to what we have for CPUs.


Is there something comparable to this for Matplotlib?


This is actually a good callout. While pipeline speedups of transforms are hugely important, lots of other fundamental Python tools for viz, model examination, etc. are built on a different foundation and aren't sped up by pandas improvements.

I get it: some of these are legacy, others are hand optimized python since default pandas is so slow. But I'm hoping that, over time, we'll improve the runtime of the other stages of analysis too.


There are some integrations for stuff like https://docs.rapids.ai/visualization :

HoloViews, hvPlot, Datashader, Plotly, Bokeh, Seaborn, Panel, PyDeck, cuxfilter, node-RAPIDS



