Hacker News new | past | comments | ask | show | jobs | submit login

> With frameworks like Python pandas, you always end up having to manually partition your data if it doesn’t fit in memory.

"Pandas Docs > Pandas Ecosystem > Out of Core" lists a number of solutions for working with datasets that don't fit into RAM: Blaze, Dask, Dask-ML (dask-distributed; Scikit-Learn, XGBoost, TensorFlow), Koalas, Odo, Ray, Vaex https://pandas-docs.github.io/pandas-docs-travis/ecosystem.h...

The dask API is very similar to the pandas API.

Are there any plans for ROOT to gain support for Apache Parquet, and/or Apache Arrow zero-copy reads and SIMD support, and/or https://RAPIDS.ai (Arrow, numba, Dask, pandas, scikit-learn, XGboost, spark, CUDA-X GPU acceleration, HPC)? https://arrow.apache.org/

Applications are open for YC Winter 2020

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact