Hacker News | kaby76's comments

Development on Antlr4 has terminated. The "official ANTLR" successor, Antlr5, was intended to let ANTLR run in a browser, to replace more than a half-dozen runtime targets with a single unified runtime, and to add LSP services. But development on Antlr5 stopped after a few months, a year and a half ago, and I don't see when, if ever, it will be restarted.

Antlr-ng is Mike Lischke's port of Antlr4, which he likely undertook because ANTLR is used at Oracle for a MySQL product. It's not "official ANTLR," but Terence Parr granted him the use of the "ANTLR" name and allowed a fork that ports the existing Antlr4 code to TypeScript.

Mike's Antlr-ng port of the Antlr4 code began with a Java-to-TypeScript translator he wrote. Along the way, he made some improvements to the TypeScript target.

But Antlr-ng uses ALL(*), so it shares the same performance issues as Antlr4. I'm not sure where Mike wants to take Antlr-ng to address that.

ANTLR is presented as a generator for small, fast parsers. ALL(*) probably can't deliver that, and many grammars people write are pathological for ANTLR. People hand-write parsers, reverse-engineer the EBNF from the implementation as an afterthought, drop the critical semantic predicates from the EBNF, and then refactor it into something else. The Java Language Specification is an example.
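The point about dropped semantic predicates can be made concrete with the classic C-style declaration/expression ambiguity. This tiny Python sketch is purely illustrative (the function name and token shapes are mine, not from any real grammar or ANTLR API); it shows a statement whose correct parse depends on symbol-table information that a predicate-free EBNF cannot express:

```python
# In a C-like language, "a * b ;" is a pointer declaration when `a`
# names a type, and a multiplication expression otherwise. A grammar
# stripped of that semantic predicate is ambiguous on this input.
# classify_statement is a toy stand-in for the predicate.
def classify_statement(tokens, type_names):
    ident, star, _rhs, semi = tokens
    if star != "*" or semi != ";":
        raise SyntaxError("expected: IDENT '*' IDENT ';'")
    # The "semantic predicate": consult the symbol table.
    if ident in type_names:
        return "declaration"   # e.g. `T *b;`
    return "expression"        # e.g. `a * b;`

print(classify_statement(["T", "*", "b", ";"], {"T"}))   # declaration
print(classify_statement(["a", "*", "b", ";"], set()))   # expression
```

An EBNF that omits the `ident in type_names` check admits both parses, which is why specs that drop such predicates no longer describe the language the implementation actually accepts.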


I have a fundamental question: what happened to the idea of specifying the syntax of a language using a "formal" grammar, however flawed and imperfect that has been, for decades, and still is? It seems that needing a grammar, e.g., to write a programming language tool, is now considered "old school". If a grammar is published, it is a secondary artifact because it is usually out of sync with the implementation: C# is soon to be version 10, yet the latest "spec", a draft at that, covers version 6. Some languages don't even bother publishing a grammar, e.g., Julia. Why would the authors of Julia not publish a "formal" grammar of any kind, and instead just offer a link to 2500+ lines of Scheme-like code? Are we now supposed to write tools that machine-scrape the grammar from the hand-written code, or derive the grammar from an ML model of the compiler?


Who is going to put in the time to maintain it if the language is defined by its implementation? In the past, people wrote huge specs because multiple companies needed to implement compilers for their various computers. Now there is enough centralization that a single implementation of a language can suffice. Why spend the time supporting six compilers when 99.99% of users will use the big one that the core language developers work on? Labor is a finite resource, and now that the multiple-implementations requirement is gone, the extra work it entailed is discarded as well. If someone were willing to write a full spec and maintain it for ten years, I am sure any open-source language would accept it, but for some reason nobody decides to do that with their free time.


One big reason is that it is highly likely they not only don't have one, but that one couldn't even exist in any meaningful way: the primary implementation is some Turing-complete monstrosity whose only "formal" spec is the code itself, and there is nothing formal about that. The cost of not using parser generators is essentially the same as the cost of using dynamic typing. It not only means you probably have inconsistency bugs all over the place, even if it felt easier to code without having to prove your program correct all the time; it also tends to lead you to rely on dynamic-typing features in ways that make the resulting program no longer fit any known formal type model, even if it works perfectly.


There are many reasons why there are not many PL startups. From my perspective, as a PL tool developer for 40 years, a central one is that we still do not have reliable formal grammars describing the syntax and static semantics of a given programming language. That is the crux: it is difficult to write a tool for a language when we cannot even define what that language is.

When you ask where to find a grammar for a language, the default answer is to just use an "official" implementation. That often requires a large amount of time to find and understand an API, and then locks you into that implementation. Some of the popular languages publish a grammar derived by a human reading the implementation after the fact, which is prone to errors. For C#, the standards committee is still trying to describe version 6 of the language (https://github.com/dotnet/csharplang/tree/main/spec), and we are now on version 10. An Antlr4 grammar for C# is mechanically reverse-engineered from the IR, but it does not work out of the box: it contains several mutual left recursions and does not correctly describe operator precedence and associativity. Julia does not even have a published formal grammar; you cannot refactor a formal grammar that does not exist, so you have to write one from scratch.

Often, what is published is out of sync with the implementation, not versioned, and tied to a particular parser generator and target language through semantic predicates. The quality is suspect because you are unsure whether it even follows a spec. No one bothers to enumerate the refactorings involved in modifying the grammar to fit a parser generator, or to optimize it, e.g., for speed or memory. So yes, I can agree: "[t]here's definitely a need for SOMETHING, as developers have so much pain."
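Left recursion and precedence are exactly the places where hand-written parsers and reverse-engineered grammars diverge. A minimal precedence-climbing sketch in Python (the names and operator table are mine, illustrative only, not taken from any C# grammar) shows how a hand-written parser encodes the precedence and left associativity that a naively extracted left-recursive EBNF fails to capture:

```python
import operator
import re

# Toy grammar, written left-recursively:
#   expr : expr ('+'|'-') expr | expr ('*'|'/') expr | INT | '(' expr ')'
# That shape breaks classic top-down parsing; hand-written parsers
# instead use precedence climbing, putting precedence/associativity
# in a table rather than in the grammar's structure.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}   # higher binds tighter

def tokenize(src):
    return re.findall(r"\d+|[-+*/()]", src)

def evaluate(src):
    tokens, pos = tokenize(src), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def atom():
        tok = advance()
        if tok == "(":
            value = expr(0)
            assert advance() == ")", "unbalanced parentheses"
            return value
        return int(tok)

    def expr(min_prec):
        lhs = atom()
        while peek() in PREC and PREC[peek()] >= min_prec:
            op = advance()
            rhs = expr(PREC[op] + 1)   # +1 makes operators left-associative
            lhs = OPS[op](lhs, rhs)
        return lhs

    return expr(0)

print(evaluate("2 - 3 - 4"))   # -5: left associativity preserved
print(evaluate("2 + 3 * 4"))   # 14: '*' binds tighter than '+'
```

A grammar mechanically pulled out of such code sees none of the `PREC` table, which is one way a reverse-engineered grammar ends up with mutual left recursions and no usable statement of precedence or associativity.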

