
peytoncasper on Aug 24, 2021 | on: Apache Arrow Datafusion 5.0.0 release

I might be wrong on this, but I don't believe this is a replacement for Spark. Rather, it is similar to the Spark SQL execution engine. I don't believe there is any focus on providing a distributed execution environment; instead, platforms like Spark and Flink could integrate DataFusion as an implementation and expose the API for Apache Arrow operations.

houqp on Aug 25, 2021

DataFusion, and by extension Ballista, also provides a DataFrame API that lets you construct queries programmatically. It also has preliminary support for UDFs.

We also have community members implementing Spark native executors using DataFusion, which showed significant speed improvements in the initial PoC.
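
To illustrate what "construct queries programmatically" means for a DataFrame-style API, here is a minimal toy sketch in plain Python. It is not DataFusion's or Ballista's actual API: `ToyDataFrame` and its `filter`, `select`, and `collect` methods are hypothetical names made up for this example. The idea it shows is only that each call appends a step to a plan, and nothing runs until the result is requested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDataFrame:
    """Hypothetical lazy frame: records plan steps, runs them on collect()."""
    rows: tuple          # in-memory source data, one dict per row
    plan: tuple = ()     # ordered (operation, argument) pairs

    def filter(self, predicate):
        # Return a new frame with a filter step appended; no work is done yet.
        return ToyDataFrame(self.rows, self.plan + (("filter", predicate),))

    def select(self, *columns):
        # Return a new frame with a projection step appended.
        return ToyDataFrame(self.rows, self.plan + (("select", columns),))

    def collect(self):
        # Execute the recorded steps in order against the source rows.
        out = list(self.rows)
        for op, arg in self.plan:
            if op == "filter":
                out = [r for r in out if arg(r)]
            elif op == "select":
                out = [{c: r[c] for c in arg} for r in out]
        return out


df = ToyDataFrame(rows=(
    {"name": "a", "size": 10},
    {"name": "b", "size": 25},
    {"name": "c", "size": 40},
))

# Roughly equivalent to: SELECT name FROM t WHERE size > 20
query = df.filter(lambda r: r["size"] > 20).select("name")
result = query.collect()
print(result)
```

A real engine would optimize the recorded plan before executing it, which this sketch skips entirely.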
