
I work heavily with pandas and dask (for when you want to use multiple cores), using parquet files for storage. We see a lot of benefit in selectively bringing duckdb into the mix. For instance, joins are extremely slow with both pandas and dask and require a lot of memory. That's a situation where using duckdb reduces the memory needs and speeds things up a lot.

And we may not want to upload the data into postgres or another database. We can just work with parquet files and run in-process queries.



