The HN submission title is much more clickbait than the actual content deserves. Any algorithmic improvement (in terms of big-O runtime complexity) can be made to show a 100x speedup just by making the data set large enough.
-- Numbers are a red herring to me: HPC libs plugged into the columnar data should destroy LLVM-generated code, including with SIMD (not sure if that's being done yet). HPC libs should cover the typical algorithm shapes anyway (e.g., everything in the article), and GPU code should generally beat whatever is here. For a fun time, look into SIMD polyhedral compilation vs. skeleton & hand-written GPU code.
The numbers we really care about here are expression compile time: will the system respond quickly to an analyst or an adaptive system generating code? Is the overhead < 10ms, < 100ms, < 1s, < 5s, ...?
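To make the latency question concrete, here's a minimal sketch of how you'd measure per-expression compile overhead. It uses Python's built-in `compile()` as a toy stand-in for an LLVM-based expression compiler like Gandiva; `compile_expr` and the batch layout are hypothetical, not Gandiva's actual API.

```python
import time

def compile_expr(src):
    """Compile a columnar expression string into a per-batch function.
    Toy stand-in for an LLVM-based compiler; all names here are made up."""
    code = compile(src, "<expr>", "eval")
    def run(batch):  # batch: dict mapping column name -> list of values
        cols = list(batch)
        n = len(batch[cols[0]])
        # Evaluate the compiled expression row by row against the columns.
        return [eval(code, {c: batch[c][i] for c in cols}) for i in range(n)]
    return run

# Measure the compile overhead an analyst (or code-generating system) would feel.
t0 = time.perf_counter()
run = compile_expr("price * qty")
compile_ms = (time.perf_counter() - t0) * 1000
print(f"compile overhead: {compile_ms:.3f} ms")
print(run({"price": [1.5, 2.0], "qty": [4, 3]}))  # -> [6.0, 6.0]
```

A real JIT pays far more than `compile()` does here (IR generation, optimization passes, codegen), which is exactly why the < 10ms vs. < 5s distinction matters for interactive use.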
-- All about the UDFs: So why do we care? A big problem w/ the algorithm template libs is that they stink at accepting friendly user-defined functions (UDFs). Fast systems are increasingly adopting LLVM to solve this. Speaking of SIMD: MapD does exactly this, columnar SQL expressions -- LLVM --> CUDA. Our team has been thinking about how to do stuff like js/python/pandas UDFs -> CUDA, and pragmatically, most roads led to this approach (vs., say, Numba's internals). So Gandiva providing a modular bit here is quite cool, esp. for the growing Arrow community!
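The UDF pain point is easy to show with a toy: register a user's function by name, then compile expressions that call it over columnar batches. This is a hedged illustration of the shape of the problem, not Gandiva's actual registration API (`udf`, `compile_udf_expr`, and the batch format are all invented for this sketch; a real system would lower the whole thing to LLVM IR instead of calling back into Python).

```python
UDFS = {}

def udf(fn):
    """Register a user-defined function by name (hypothetical decorator,
    loosely analogous to registering an external function with a JIT)."""
    UDFS[fn.__name__] = fn
    return fn

@udf
def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def compile_udf_expr(src):
    """Compile an expression that may call registered UDFs into a
    column-at-a-time function."""
    code = compile(src, "<expr>", "eval")
    def run(batch):
        n = len(next(iter(batch.values())))
        env = dict(UDFS)  # make registered UDFs visible to the expression
        return [eval(code, env, {c: batch[c][i] for c in batch}) for i in range(n)]
    return run

run = compile_udf_expr("clamp(x, 0, 10)")
print(run({"x": [-5, 3, 42]}))  # -> [0, 3, 10]
```

The hard part the LLVM-based systems solve is doing this without the per-row interpreter dispatch above: the UDF gets inlined into generated machine (or CUDA) code.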
-- For framework devs: Likewise, there are some cool choices here, like code caching. Users shouldn't directly interact with this, but framework devs will. We've eaten cycles here on similar projects, and most Gandiva consumers would have to as well if it weren't built in. So just as Apache Calcite is easing development of a lot of similar systems, it's cool to see the team is not just building the modular system but, seemingly, building it well!