
Parquet + Arrow reminds me of a fast SQL engine for data lakes called Dremio.


They also have an OSS version on GitHub.

They're heavily into Arrow. A few years ago they contributed Gandiva, an LLVM expression compiler for super fast processing. https://arrow.apache.org/blog/2018/12/05/gandiva-donation/
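For anyone unfamiliar with what an "expression compiler" buys you: engines like this evaluate expressions column-at-a-time over whole Arrow arrays instead of row-by-row. Below is a minimal std-only Rust sketch of that column-at-a-time evaluation. It is a plain interpreter, not LLVM code generation, and the `Expr` enum and `eval` function are hypothetical illustrations, not Gandiva's API. Gandiva's contribution is compiling such a tree into native code so the interpretive overhead disappears.

```rust
/// A tiny expression tree over f64 columns (hypothetical, for illustration only).
enum Expr {
    Column(usize),
    Literal(f64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Evaluate `expr` over a batch of equal-length columns, producing one output column.
/// Each node processes a whole column in a tight loop (column-at-a-time).
/// This sketch simply interprets the tree; a JIT like Gandiva would instead
/// generate native code for the whole expression.
fn eval(expr: &Expr, columns: &[Vec<f64>], len: usize) -> Vec<f64> {
    match expr {
        Expr::Column(i) => columns[*i].clone(),
        Expr::Literal(v) => vec![*v; len],
        Expr::Add(a, b) => {
            let (l, r) = (eval(a, columns, len), eval(b, columns, len));
            l.iter().zip(&r).map(|(x, y)| x + y).collect()
        }
        Expr::Mul(a, b) => {
            let (l, r) = (eval(a, columns, len), eval(b, columns, len));
            l.iter().zip(&r).map(|(x, y)| x * y).collect()
        }
    }
}

fn main() {
    // Two input columns, e.g. price and quantity.
    let columns = vec![vec![1.5, 2.0, 4.0], vec![2.0, 3.0, 0.5]];
    // price * quantity + 1.0
    let expr = Expr::Add(
        Box::new(Expr::Mul(Box::new(Expr::Column(0)), Box::new(Expr::Column(1)))),
        Box::new(Expr::Literal(1.0)),
    );
    let out = eval(&expr, &columns, 3);
    println!("{:?}", out); // prints [4.0, 7.0, 3.0]
}
```

Note that each intermediate node materializes a full column here; avoiding those temporaries by fusing the whole expression into one loop is exactly the kind of thing a compiled approach can do.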

It's one of the reasons I like being all in on Arrow. Why do everything ourselves when a ton of other smart people are working on this too?

Speaking of Gandiva, here's something that's up for grabs: https://issues.apache.org/jira/browse/ARROW-5315 (creating Gandiva bindings for Rust).

I think DataFusion is mature enough that we could plug Gandiva into it.

Disclaimer: I work on the Arrow Rust implementation

Ah yes, of course, hi Nevi :). Thank you again for all your work on the Rust implementation. We're obviously big fans.

Gandiva bindings are definitely something we should look into, but I'm guessing there's much lower-hanging fruit within DataFusion in terms of optimization, particularly for our use case.

Thanks Paul :)

I think with compute functions/kernels we're sitting under a grapevine of low-hanging fruit, so we'll be able to add a lot to Arrow before we need Gandiva bindings. The Rust stdsimd [0] work should also let us make better use of SIMD about a year from now (I hope).

[0] https://github.com/rust-lang/stdsimd
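To make "compute kernels" concrete: an Arrow primitive array is a contiguous values buffer plus a validity bitmap (one bit per slot, least-significant bit first, set bit = non-null), and a kernel is a function that works over those buffers directly. The std-only Rust sketch below shows an element-wise `add` kernel. `PrimitiveI32` and `add_kernel` are hypothetical names, not the arrow crate's API. Even without stdsimd, a branch-free loop like this is something LLVM can often auto-vectorize; explicit portable SIMD would make that vectorization reliable rather than up to the optimizer.

```rust
/// Simplified Arrow-style primitive array (hypothetical type, not the arrow crate).
/// `validity` holds one bit per slot, least-significant bit first;
/// a set bit means the slot is valid (non-null), as in the Arrow columnar format.
struct PrimitiveI32 {
    values: Vec<i32>,
    validity: Vec<u8>,
}

impl PrimitiveI32 {
    /// Check slot `i`'s validity bit.
    fn is_valid(&self, i: usize) -> bool {
        self.validity[i / 8] & (1 << (i % 8)) != 0
    }
}

/// Element-wise addition kernel.
/// Values are added across every slot, including null ones, so the loop has no
/// branches and is a good candidate for auto-vectorization. `wrapping_add`
/// avoids overflow panics on whatever values happen to sit in null slots.
/// The output validity is the bitwise AND of the input bitmaps, 8 slots per byte.
fn add_kernel(a: &PrimitiveI32, b: &PrimitiveI32) -> PrimitiveI32 {
    assert_eq!(a.values.len(), b.values.len(), "arrays must have equal length");
    let values = a
        .values
        .iter()
        .zip(&b.values)
        .map(|(x, y)| x.wrapping_add(*y))
        .collect();
    let validity = a
        .validity
        .iter()
        .zip(&b.validity)
        .map(|(x, y)| x & y)
        .collect();
    PrimitiveI32 { values, validity }
}

fn main() {
    // Slot 1 of `a` is null (bit 1 cleared).
    let a = PrimitiveI32 { values: vec![1, 0, 3, 4], validity: vec![0b0000_1101] };
    // Slot 3 of `b` is null (bit 3 cleared).
    let b = PrimitiveI32 { values: vec![10, 20, 30, 0], validity: vec![0b0000_0111] };
    let c = add_kernel(&a, &b);
    for i in 0..c.values.len() {
        if c.is_valid(i) {
            println!("{}", c.values[i]);
        } else {
            println!("null");
        }
    }
    // prints 11, null, 33, null (one per line)
}
```

Computing values for null slots and masking afterwards is a deliberate trade-off: it wastes a little arithmetic but keeps the hot loop free of per-slot branches.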
