If you are working within the Apache Spark ecosystem you can use DeltaLake https:... | Hacker News


seddonm1 on April 28, 2020 | on: We Need DevOps for ML Data

If you are working within the Apache Spark ecosystem, you can use Delta Lake (https://delta.io/) to create 'merge' datasets that are transactional, versioned, and allow time travel by both version number and timestamp.
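A toy model may make those three properties concrete. The sketch below is plain Python, not Delta Lake; `ToyVersionedTable` and its methods are hypothetical names. Each `merge` upserts rows by key and commits a complete new snapshot in one step (standing in for a transactional commit), every snapshot is kept (versioning), and reads can pick a snapshot by version number or by the latest commit at or before a given timestamp (time travel). It assumes commit timestamps only increase. In Delta Lake itself the corresponding operations are `DeltaTable.merge(...)` and the `versionAsOf` / `timestampAsOf` read options of `spark.read.format("delta")`.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Row = Dict[str, object]


class ToyVersionedTable:
    """Hypothetical in-memory model of a versioned table with merge and time travel.
    Illustrative only; this is not how Delta Lake is implemented."""

    def __init__(self) -> None:
        # Each committed version: (commit timestamp, snapshot keyed by primary key).
        # Assumes commits arrive with increasing timestamps.
        self._versions: List[Tuple[datetime, Dict[object, Row]]] = []

    def merge(self, updates: List[Row], key: str, ts: datetime) -> int:
        """Upsert rows by `key` and commit the result as a new version.
        Returns the new 0-based version number."""
        snapshot = dict(self._versions[-1][1]) if self._versions else {}
        for row in updates:
            # Matched key -> update; unmatched key -> insert
            snapshot[row[key]] = dict(row)
        # The new snapshot becomes visible in a single append: all or nothing
        self._versions.append((ts, snapshot))
        return len(self._versions) - 1

    def as_of_version(self, version: int) -> List[Row]:
        """Read the table exactly as it was at `version`."""
        return list(self._versions[version][1].values())

    def as_of_timestamp(self, ts: datetime) -> List[Row]:
        """Read the latest version committed at or before `ts`."""
        earlier = [snap for commit_ts, snap in self._versions if commit_ts <= ts]
        if not earlier:
            raise ValueError("no version committed at or before the given timestamp")
        return list(earlier[-1].values())
```

Storing a full snapshot per version keeps the sketch short; a real table format records each commit in a transaction log instead of copying every row.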

jamesblonde on April 29, 2020

Another alternative to Delta Lake is Apache Hudi, which also offers Bloom filters as an index for time-travel queries, so that files that cannot match the supplied time constraint are excluded efficiently. Z-order indexing is not yet available in open-source Delta Lake, only in the Databricks version.
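The file-skipping idea can be sketched with one Bloom filter per data file. This is a generic illustration in plain Python, not Hudi's implementation; `BloomFilter`, `build_file_index` and `files_to_scan` are hypothetical names, and the bit-array size and hash count are arbitrary. Because a Bloom filter answers only exact-membership questions, the sketch assumes an equality lookup on a key value (for example a record key) rather than a time range. A negative answer guarantees the value is absent, so that file can be skipped; a positive answer may be a false positive, so that file still has to be read.

```python
import hashlib
from typing import Dict, Iterator, List


class BloomFilter:
    """Minimal Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions by hashing the item with k different salts
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # True only if every derived bit is set
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_file_index(files: Dict[str, List[str]]) -> Dict[str, BloomFilter]:
    """Build one Bloom filter per data file over the key values that file holds."""
    index: Dict[str, BloomFilter] = {}
    for name, keys in files.items():
        bf = BloomFilter()
        for key in keys:
            bf.add(key)
        index[name] = bf
    return index


def files_to_scan(index: Dict[str, BloomFilter], key: str) -> List[str]:
    """Return the files that might contain `key`; all others can be skipped."""
    return [name for name, bf in index.items() if bf.might_contain(key)]
```

For example, with `files = {"part-0": ["k1", "k2"], "part-1": ["k3"]}`, `files_to_scan(build_file_index(files), "k1")` is guaranteed to include `"part-0"` and will usually exclude `"part-1"`.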
