How does this compare to DuckDB/Polars? I wonder whether a GPU-based compute engine is a good idea. GPU memory is expensive and limited, and the bandwidth between the GPU and main memory isn't great either.
The same group (NVIDIA/RAPIDS) is working on a similar project, but with Polars API compatibility instead of Pandas. It seems to be quite far from completion, though.
I've been watching CUDA since its introduction, and Polars since I had an intern port our Pandas code to it a couple of years ago, but I had no idea Polars would go this far, this fast!