Thoughts on DuckDB's Grammar Patching Thing

retrocryptid · 2024-12-03T15:01:39

Is this "a new generation discovers DSLs"? I had my 15 minutes of fame inside IBM in the '80s for saying "The good news about FORTH is you can extend the parser to create your own Domain Specific Language. The bad news is the guy down the hallway already has."

There are a couple of things going on here:

1. DSLs aren't "bad," but they may require more forethought than you've typically had to apply to ordinary programming tasks.

2. Doesn't Perl 6 (now Raku) do something similar? It was about the only thing about Perl 6 I liked. Insert a reference to your favourite dynamic grammar system here: Icon? FORTH? Some Lisps?

3. Something that's sorta new to think about: SQL is supposed to be a declarative language, and behind the scenes there's a planner that knows what to do to put a particular record in a particular state. And yeah, you're doing something similar here: changing the grammar rules to produce an AST, which previously written code then uses to determine the semantics of whatever you wrote in the new grammar. But that's essentially what the OP said here.

4. I agree with the author that maybe PEGs aren't the most awesome thing in the world, but they seem to be well understood, and actually doing something is better than trying to make things perfect.

5. I liked the author's write-up, but as an old programmer I take umbrage at the idea that changing your parser in the middle of a program is "crazy." We used to do this... well, maybe not all the time... but with greater frequency than we do today.
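Item 3's point, that a grammar extension only has to produce an AST the existing machinery already understands, can be sketched in a few lines of Python. Everything here is hypothetical: the `FROM ... PICK ...` syntax, the `Select` node, and the toy `evaluate` function stand in for a real parser extension and planner, and none of it is DuckDB's actual extension API.

```python
from dataclasses import dataclass


# Existing AST node type: the previously written code only knows about this.
@dataclass(frozen=True)
class Select:
    table: str
    columns: tuple[str, ...]


# Existing semantics: a toy evaluator over in-memory tables (hypothetical).
def evaluate(node: Select, db: dict[str, list[dict]]) -> list[tuple]:
    rows = db[node.table]
    return [tuple(row[c] for c in node.columns) for row in rows]


# New surface syntax, e.g. "FROM users PICK name, age" (hypothetical).
# The extension only parses text into the existing Select node; it adds no semantics.
def parse_from_pick(text: str) -> Select:
    tokens = text.replace(",", " ").split()
    if len(tokens) < 4 or tokens[0].upper() != "FROM" or tokens[2].upper() != "PICK":
        raise SyntaxError(f"expected FROM <table> PICK <columns>: {text!r}")
    return Select(table=tokens[1], columns=tuple(tokens[3:]))


db = {"users": [{"name": "ada", "age": 36}, {"name": "alan", "age": 41}]}
print(evaluate(parse_from_pick("FROM users PICK name, age"), db))
```

The thing to notice is that `parse_from_pick` contributes nothing beyond a tree; whatever `evaluate` does with a `Select` is what the new syntax means.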

benesch · 2024-12-03T17:42:46

> I liked the author's write-up, but as an old programmer I take umbrage at the idea that changing your parser in the middle of a program is "crazy." We used to do this... well, maybe not all the time... but with greater frequency than we do today.

I think Justin addresses that point, though! He writes:

> The development of programming languages over the past few decades has been, at least in part, a debate on how best to allow users to express ways of building new functionality out of the semantics that the language provides: functions, generics, modules.

And indeed, by modern PL standards, patching the parser at runtime is very unusual.

The "modern" language I've worked in that comes closest is Ruby, since the combination of monkey patching and optional parentheses in method calls is well suited to constructing DSLs. But most teams I've worked with that use Ruby eventually adopted a strict "no monkey patching" rule, based on lived experience. At scale, letting developers invent DSLs on the fly via monkey patching made programs as a whole too complicated to reason about: it was too hard to move between modules in the codebase when every module essentially had its own syntax to learn.

I suppose describing this as "dark, demonic pathways" is a bit overstated for comedic effect, but "change the language syntax at runtime" does seem to be generally accepted these days as bad software engineering practice. It works fine at small scale, but doesn't age well as a team and codebase grow.
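Python has no equivalent of Ruby's parenthesis-free method calls, so it can't fake new syntax as convincingly, but the action at a distance that pushes teams toward a "no monkey patching" rule is easy to sketch. The `Money` class and `report` function below are hypothetical; the point is only that one runtime patch silently changes what unrelated code produces.

```python
# A shared class that many modules in a codebase might use (hypothetical).
class Money:
    def __init__(self, cents: int):
        self.cents = cents

    def __str__(self) -> str:
        return f"${self.cents / 100:.2f}"


def report(m: Money) -> str:
    # "Module B": written against the original behaviour of Money.__str__.
    return f"Total: {m}"


before = report(Money(1234))

# "Module A": a monkey patch applied at runtime, e.g. to display euros.
# Every caller in the process now sees the new behaviour, including report().
Money.__str__ = lambda self: f"EUR {self.cents / 100:.2f}"

after = report(Money(1234))
print(before)
print(after)
```

Nothing in `report` changed, yet its output did; multiply that by every module that patches something and you get the "too hard to reason about" problem described above.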

noelwelsh · 2024-12-03T15:14:13

Agreed in general. What DuckDB seems to be missing is phases/staging. It seems like Someone Else's code could install an extension midway through your program's execution and change its meaning. This is bad. What you want is to be able to pin down exactly which language your program runs with. Racket does this (and has extensible parsing).
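The staging described here could look something like this hypothetical Python sketch: the program is parsed against a frozen snapshot of the grammar taken at a phase boundary, so an extension installed later can't change what already-pinned code means. The `Grammar`, `install`, `freeze`, and `parse` names are made up for illustration; this is not how DuckDB or Racket actually implement it (in Racket, each module declares the language it is written in, e.g. with a `#lang` line).

```python
from types import MappingProxyType
from typing import Callable, Mapping

# A rule turns source text into a result, or returns None if it doesn't match.
Rule = Callable[[str], object]


class Grammar:
    """A mutable registry that extensions can install rules into (hypothetical)."""

    def __init__(self) -> None:
        self._rules: dict[str, Rule] = {}

    def install(self, name: str, rule: Rule) -> None:
        self._rules[name] = rule

    def freeze(self) -> Mapping[str, Rule]:
        # Copy first, then wrap read-only, so later install() calls can't reach it.
        return MappingProxyType(dict(self._rules))


def parse(rules: Mapping[str, Rule], text: str) -> object:
    # Try rules in installation order; the first match wins.
    for rule in rules.values():
        result = rule(text)
        if result is not None:
            return result
    raise SyntaxError(f"no rule matches {text!r}")


grammar = Grammar()
grammar.install("int", lambda s: int(s) if s.isdigit() else None)

# Phase boundary: the program is pinned to the language as it exists right now.
pinned = grammar.freeze()

# Someone Else's code installs an extension midway through execution.
grammar.install("shout", lambda s: s.upper() + "!" if s.isalpha() else None)

print(parse(grammar.freeze(), "hello"))  # the live grammar now accepts this
print(parse(pinned, "42"))               # the pinned grammar is unchanged
```

Copying the dict before wrapping it in `MappingProxyType` is what actually provides the isolation; a read-only proxy over the live dict would still see every later `install`.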