The thing I would have liked to know is why they don't use an existing fast SQL ...

robbie-c · 2026-06-24T20:04:23 1782331463

Our SQL is very similar to ClickHouse SQL: we used ClickHouse SQL as a starting point because ClickHouse is our underlying DB. We needed our own parser so that we could add language features on top.

bonzini · 2026-06-24T22:03:14 1782338594

I think you should clarify whether, even though you didn't look at the generated code, you are actually going to adjust it in the future.

How did the two approaches compare in terms of code readability?

robbie-c · 2026-06-25T12:21:50 1782390110

The previous parser is mostly a declarative grammar file, which is extremely readable. It codegens a C++ parser, which is hard to read. It depends on which of those you count as the previous parser's source code!

In the future, we'd make changes by modifying the ANTLR parser first, then using the same approach as in the blog post to get the new parser to parity. We have no plans to get rid of the C++ parser as an oracle!
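A minimal sketch of what using the old parser as an oracle could look like, assuming both parsers can be called as functions that return a comparable AST and raise `SyntaxError` on invalid input. The names `find_mismatches`, `toy_oracle`, and `toy_candidate` are hypothetical stand-ins for illustration, not the actual harness from the blog post.

```python
def outcome(parse, query):
    """Run one parser and normalize the result so failures are comparable too."""
    try:
        return ("ok", parse(query))
    except SyntaxError:
        # Assumption: both parsers signal invalid input with SyntaxError.
        return ("error", None)


def find_mismatches(oracle_parse, candidate_parse, queries):
    """Return (query, expected, actual) for every query where the parsers disagree."""
    mismatches = []
    for query in queries:
        expected = outcome(oracle_parse, query)
        actual = outcome(candidate_parse, query)
        if expected != actual:
            mismatches.append((query, expected, actual))
    return mismatches


# Toy stand-ins so the sketch runs on its own. In practice the oracle would
# wrap the ANTLR-generated parser and the candidate the hand-written one.
def toy_oracle(query):
    words = query.split()
    if not words or words[0].upper() != "SELECT":
        raise SyntaxError("expected SELECT")
    return [word.upper() for word in words]


def toy_candidate(query):
    words = query.split()
    if not words:
        raise SyntaxError("empty query")
    # Deliberate bug: accepts statements that do not start with SELECT.
    return [word.upper() for word in words]


if __name__ == "__main__":
    for query, expected, actual in find_mismatches(
        toy_oracle, toy_candidate, ["select 1", "delete 1", ""]
    ):
        print(f"{query!r}: oracle={expected} candidate={actual}")
```

Comparing rejections as well as accepted ASTs matters here: a new parser that accepts input the oracle rejects is just as far from parity as one that builds a different tree.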

bonzini · 2026-06-25T14:06:44 1782396404

Sorry, by two approaches I meant the two parsers generated by the LLM, recursive descent and graph-based; not the ANTLR one.

Sort of unexpected that you're keeping the old one as an oracle, but a very good idea for anyone who writes such a complicated parser, indeed!

nijave · 2026-06-25T02:10:02 1782353402

Yeah, curious why they didn't use Presto/Trino, DuckDB, or ClickHouse SQL directly, with UDFs and views to augment it

Zuora exposes a Trino-based data warehouse which is quite nice and powerful

Besides the parser side, existing dev tools and docs automatically work, too

__s · 2026-06-24T21:49:32 1782337772

This is pretty much the case with every SQL dialect

-warren · 2026-06-24T20:34:53 1782333293

I think that's exactly what indirectly happened. This guy didn't optimize the parser. Someone else did -- years ago. That work was pulled into the LLM and made it look like magic.

bonzini · 2026-06-24T21:58:14 1782338294

Note that it's not a particularly optimized algorithm: recursive descent + specialized subparser for expressions is simply the standard way to write parsers by hand. It's ANTLR which is super flexible but also dog slow.
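For readers who haven't seen the pattern, here is a minimal sketch of recursive descent for statements plus a precedence-climbing subparser for expressions. The toy grammar (`SELECT ... FROM ... [WHERE ...]`), the operator table, and the tuple-based AST are all assumptions made for illustration; this is not the parser from the blog post.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

# Binding power of each binary operator; higher binds tighter.
PRECEDENCE = {"or": 1, "and": 2, "=": 3, "<": 3, ">": 3,
              "+": 4, "-": 4, "*": 5, "/": 5}


def tokenize(text):
    """Split text into numbers, lowercased words, and one-character symbols."""
    tokens = []
    for number, word, symbol in TOKEN_RE.findall(text):
        if number:
            tokens.append(int(number))
        elif word:
            tokens.append(word.lower())  # toy simplification: case-insensitive
        elif symbol.strip():
            tokens.append(symbol)
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expect(self, value):
        if self.peek() != value:
            raise SyntaxError(f"expected {value!r}, got {self.peek()!r}")
        self.pos += 1

    def parse_select(self):
        """Recursive descent: one method per grammar rule."""
        self.expect("select")
        columns = [self.parse_expr()]
        while self.peek() == ",":
            self.advance()
            columns.append(self.parse_expr())
        self.expect("from")
        table = self.advance()
        where = None
        if self.peek() == "where":
            self.advance()
            where = self.parse_expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return {"select": columns, "from": table, "where": where}

    def parse_expr(self, min_prec=1):
        """Precedence climbing: the specialized subparser for expressions."""
        left = self.parse_atom()
        while PRECEDENCE.get(self.peek(), 0) >= min_prec:
            op = self.advance()
            # Parsing the right side one level tighter makes operators
            # left-associative.
            right = self.parse_expr(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left

    def parse_atom(self):
        token = self.advance()
        if token == "(":
            inner = self.parse_expr()
            self.expect(")")
            return inner
        if token is None or token in PRECEDENCE or token in (")", ","):
            raise SyntaxError(f"unexpected token {token!r}")
        return token


if __name__ == "__main__":
    print(Parser("SELECT a + b * 2 FROM t WHERE x = 1 AND y > 2").parse_select())
```

The statement-level methods mirror the grammar one rule at a time, while a single loop driven by the precedence table handles every binary operator, which is why this split is the usual choice for hand-written parsers.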

robbie-c · 2026-06-24T22:32:20 1782340340

Yeah, one of the interesting parts to me while working on this is that the breakpoint for when it's worth writing your own parser vs accepting ANTLR's slowness has shifted massively. Previously it would have been someone's full-time job to maintain. Now with this approach you can get the best of both worlds.