
Why not to use (f)lex, yacc or bison - rajnathani
https://tomassetti.me/why-you-should-not-use-flex-yacc-and-bison/
======
kqr
Is it not the case that most serious parser implementations are hand-written?
In part because it makes it so much easier to provide good diagnostic
messages. I feel like every other project moves from parser generators to
hand-rolled parsers and lexers.

~~~
pansa2
For general-purpose languages, yes, but there are some exceptions. Ruby uses
yacc, and Python uses a custom LL(1)-ish parser generator, which is in the
process of being replaced by a custom PEG parser generator [0].

[0] [https://github.com/gvanrossum/pegen](https://github.com/gvanrossum/pegen)

~~~
regularfry
Ruby's yacc parser is my usual go-to for telling people how not to do it. That
thing's terrifying.

~~~
jknoepfler
To be fair, ruby is not a fun language to machine-parse.

~~~
regularfry
Oh absolutely. As a _user_, I love the Ruby syntax. As a potential
contributor to the codebase? NOPENOPENOPE.

------
xalqpp
This whole article is very polemical and promotes an ANTLR book. It is futile
to comment in detail because nearly every statement is false.

~~~
ftomassetti
..wow, this is a bit strong. I would encourage you to be more respectful of
people who work in this very specific field and share their ideas. There are
always people behind the work you insult so easily.

You know, over 2M people have read our articles. A few hundred also bought a
book or a video course from us, but the vast majority just got some
information for free, and we like it that way. I do not think we got so many
people interested in what we do by lying.

If you have different professional experience, I would be happy to learn from
it.

~~~
pqlao
It is irrelevant how many people read your articles; the critics will also
show up as readers in the statistics.

Let's start the professional experience with another person's view:

[https://research.swtch.com/yyerror](https://research.swtch.com/yyerror)

"Seibel: And are there development tools that just make you happy to program?

Thompson: I love yacc. I just love yacc. It just does exactly what you want
done. Its complement, lex, is horrible. It does nothing you want done.

Seibel: Do you use it anyway or do you write your lexers by hand?

Thompson: I write my lexers by hand. Much easier."

I happen to like both bison and flex, which are relatively easy to use and
bug-free in my experience. Yet your article spreads hundreds of lines of FUD
about these tools, a strategy that many ANTLR people use.

I have used ANTLR. It is not intuitive, the documentation is horrible, and if
you happen to find some advice on Stack Overflow it is likely to be for
another one of the incompatible versions.

I suppose if you use ANTLR long enough, these problems go away. But bison or
Menhir don't have these problems in the first place.

------
mhd
It's been a while, but I had some fun using the "lemon" parser generator[1]
used by SQLite. It worked alright for something where regexes weren't enough
and I couldn't be bothered to write a custom parser. It probably has the same
cons that lex/yacc would get, according to the post.

I do have the ANTLR book, but to be honest, if I had the task of doing
something more involved where integration isn't as important (i.e. it could
be an isolated command, not a module for a huge Java/C# app), I'd probably be
more inclined to dig deeper into SML/OCaml than ANTLR.

[1]: [https://www.hwaci.com/sw/lemon/](https://www.hwaci.com/sw/lemon/)

------
brutuz
I have seen multiple projects run into maintenance problems due to large
grammars thousands of lines long and ANTLR 3/4 incompatibility.

Generally, a parser generator adds a layer of complexity/constraints which may
be significant if you need full control of the lexing, parsing, semantic
processing, and error recovery of your language.

After working on a few language projects myself (including a core language
and a web-based editor with syntax coloring and auto-completion), I would
strongly suggest a hand-rolled parser for any serious language endeavour
(general purpose or DSL).

~~~
ftomassetti
I am a bit surprised by this, as I have had the opposite experience. I find
ANTLR grammars much more maintainable than the hand-written parsers I have
encountered. Indeed, I have been asked to port hand-written parsers to ANTLR
for maintainability. Also, ANTLR4 seems to me to produce grammars which are
clearer than ANTLR3's.

The weak point is error recovery, in my opinion. While ANTLR offers a sort-
of-decent error recovery strategy for free, one has to customize it to get
great error messages.

I think that many maintainability issues are due to poor usage of ANTLR. What
we do is: 1) limit semantic actions, and 2) for complex languages, do tree
transformations after parsing.

With this approach we got pretty decent results.

Personally I find hand-rolled parsers too costly to build and harder to
maintain, but I have limited experience with them. I guess it also depends on
the context: if you are designing a DSL while writing the parser, you want to
be able to evolve it very quickly, and in that context I think a parser
generator is very useful. If instead I needed to build an industrial-grade
parser for a general-purpose language, with a large budget, then I would go
for a hand-rolled parser.

~~~
mxz3000
My experience has been similar to yours. The company I work at has very
complex business logic written as a custom DSL parsed/executed by a crazy pile
of Perl code.

Recently we had requirements to come up with a way of providing on-the-fly
validation in a web editor. This was only possible by ditching the old
implementation and re-writing the grammar using ANTLR. While the old
implementation is unmaintainable and (probably, who knows?) buggy, the ANTLR
implementation is trivial to work on, test and add new features to.

If you're working with a limited time budget, which is common when your main
job isn't to maintain the language, then parser generators such as ANTLR are a
godsend. ANTLR even enables you to generate parsers in different languages
depending on where it needs to be executed. Need something to run client-side,
in the user's browser? Generate a JavaScript parser and you're done. Need it to
run in the Java backend? Generate it in Java and call it a day.

While it's true that error handling isn't the best, it's already better than
nothing, and, as you say, can probably be improved with a bit of
customization.

------
akwoq
> Can you even imagine the quality of code that was written and fixed for 30
> years?

Total nonsense. Bison is very reliable, Flex can be a bit finicky to set up
but is also reliable.

PostgreSQL uses bison ...

~~~
quietbritishjim
Indeed, PostgreSQL is a good example of fairly old software (>20 years) that
is highly reliable and widely used. How many people who need a database
complain that they _have to_ have something that was written, and rewritten,
in recent years?

------
nottorp
"ANTLR instead is more actively developed. It has been re-written from scratch
a few times during the years, so the code has is of good quality."

... rewrites improve code quality ... right ... right???

~~~
cmrdporcupine
I first learned Antlr way back about 20 years ago and found it fairly
impressive. So when I go to write a new parser or lexer I always go take a
look again...

But every time I take a new look at Antlr there's a pile of issues stemming
from the fact that there's a major new rewrite deprecating the old versions
while the new version is not ready: entirely new ways of doing things with
missing documentation and examples, incomplete or missing language backends,
and missing packaging for various Linux distributions.

What I get from it is: Antlr is a very powerful project... if you're in the
Java ecosystem and if you're willing to use old, no-longer-supported versions.

Otherwise, questionable choice.

~~~
tonyarkles
Your experience mirrors mine quite a bit. I remember a few years ago playing
with ANTLR precisely as an alternative to flex/bison for a project I was
working on. Just recently I had an embedded project (quite resource
constrained, like 4kB of RAM) that needed to parse some data coming in over a
serial port, and I thought “hey, I should look at ANTLR again”.

I fired it up, played around a bit, and did a calculator example or something
like that. The next step, I figured, was to get it to generate some C code so
I could test it on the device. Turns out ANTLR4 has dropped the C backend
entirely! C++ is, currently, a no-go for the project, so... I guess I’m out of
luck there.

Pleasantly, in a discussion with a friend, he asked the silly question:
“couldn’t you parse those strings with sscanf()?”. I blinked in disbelief,
wrote the tiniest parser to split the input on newlines, and sscanf() did the
trick.
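
For the curious, it was roughly this shape -- the line format and field names
here are invented for illustration, not the actual protocol:

```c
#include <stdio.h>
#include <string.h>

/* Parse one newline-terminated record like "TEMP,23.5,41".
   The tag/value/sequence format is invented for illustration. */
static int parse_line(const char *line)
{
    char tag[8];
    float value;
    unsigned seq;

    /* %7[^,] reads the tag up to the first comma, width-limited so it
       can't overflow the 8-byte buffer. sscanf returns the number of
       fields assigned, so anything other than 3 means a malformed line. */
    if (sscanf(line, "%7[^,],%f,%u", tag, &value, &seq) != 3)
        return -1;

    printf("tag=%s value=%.1f seq=%u\n", tag, value, seq);
    return 0;
}

int main(void)
{
    char buf[] = "TEMP,23.5,41\nHUM,55.0,42\n";

    /* The "tiniest parser": split on newlines, let sscanf do the rest. */
    for (char *line = strtok(buf, "\n"); line; line = strtok(NULL, "\n"))
        if (parse_line(line) != 0)
            fprintf(stderr, "bad line: %s\n", line);
    return 0;
}
```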

~~~
JoeAltmaier
Yeah sscanf can do quite a bit. But it doesn't have any informative error
messages, can't recover (skip to the next semicolon or whatever), doesn't keep
a symbol table, can't do any parsing more complex than "the next line looks
exactly like this pattern" etc.

So after years of rolling my own, I finally decided I would never do that
again. All the off-by-one errors, difficult-to-diagnose failures and endless
fiddling is out of my life now. I just use a parser tool.

------
skrebbel
Last I checked, the interface between lex and yacc (resp. flex and bison) is
by mutating static global variables. Is that still the case? It always struck
me as exceptionally bad design.
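
To illustrate what I mean, here's a toy, self-contained imitation of that
interface -- not real flex/yacc output (YYSTYPE is normally generated from
the %union in the .y file; a dummy is defined here so the sketch compiles):

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy imitation of the classic (non-reentrant) lex/yacc contract. */
typedef union { int ival; } YYSTYPE;

YYSTYPE yylval;       /* global: semantic value of the last token */
char    yytext[64];   /* global: text of the last token           */
enum { TOK_EOF, TOK_NUM };

static const char *input = "42 7";

/* The lexer returns a token code, but communicates everything else
   by mutating the globals above -- that's the design in question. */
int yylex(void)
{
    while (*input == ' ')
        input++;
    if (!isdigit((unsigned char)*input))
        return TOK_EOF;              /* toy: anything else ends input */

    int n = 0;
    while (isdigit((unsigned char)*input) && n < 63)
        yytext[n++] = *input++;
    yytext[n] = '\0';

    yylval.ival = atoi(yytext);      /* the parser reads yylval later */
    return TOK_NUM;
}

int main(void)
{
    /* A real yyparse() would pull tokens by calling yylex() like this. */
    while (yylex() != TOK_EOF)
        printf("NUM \"%s\" -> %d\n", yytext, yylval.ival);
    return 0;
}
```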

~~~
larschdk
I don't think you are particularly restricted to doing this, but lex/yacc
were conceived and designed at a time (the 1970s) when memory was limited,
and the speed and the size of code that could be parsed were valued much more
highly than structured code and good error messages. You simply can't do a
recursive descent parser with an in-memory AST in 64KiB of RAM while still
being able to parse any meaningfully sized source file. You had to serialize
the problem and output a sequential AST/intermediate opcodes on the fly.
FSA-based lexers and parsers are beautiful in how little RAM they actually
require and how fast they are, albeit clunky and very low-level by modern
standards.
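
To make that concrete, here's a toy table-driven lexer in the same spirit --
the whole machine is a few dozen bytes of tables plus one integer of state,
with zero allocation (the token set is invented for illustration):

```c
#include <stdio.h>

/* States and character classes of a tiny lexer FSA that recognizes
   integers and identifiers. */
enum state { S_START, S_NUM, S_IDENT, S_ERR, N_STATES };
enum cls   { C_DIGIT, C_ALPHA, C_OTHER, N_CLS };

/* The entire "grammar": a 4x3 transition table, 12 bytes of ROM. */
static const unsigned char next[N_STATES][N_CLS] = {
    /*            digit    alpha    other */
    [S_START] = { S_NUM,   S_IDENT, S_ERR },
    [S_NUM]   = { S_NUM,   S_ERR,   S_ERR },
    [S_IDENT] = { S_IDENT, S_IDENT, S_ERR },
    [S_ERR]   = { S_ERR,   S_ERR,   S_ERR },
};

static enum cls classify(int c)
{
    if (c >= '0' && c <= '9') return C_DIGIT;
    if ((c | 32) >= 'a' && (c | 32) <= 'z') return C_ALPHA;
    return C_OTHER;
}

int main(void)
{
    const char *tok = "x42";
    int s = S_START;

    /* One table lookup per input byte; the state is a single int. */
    for (const char *p = tok; *p; p++)
        s = next[s][classify(*p)];

    printf("\"%s\" -> %s\n", tok,
           s == S_NUM ? "NUMBER" : s == S_IDENT ? "IDENT" : "ERROR");
    return 0;
}
```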

------
kyeb
I tried to read this article, but unfortunately the lack of grammar was too
distracting and I just couldn't get through it.

A lot of these are things that Google Docs or Microsoft Word will notice and
ask you to change. I highly recommend using one of those to write (especially
if English is not your first language) and trying to understand the
suggestions it makes. Your articles will come out much more readable to
others.

~~~
a3n
Counterpoint: I used to "and I stopped reading" when I ran into grammar
issues. I gradually realized that it's worth the effort to focus on the
underlying ideas and evaluate _that_, rather than someone's 2nd-language
ability. People are smart, even when they talk funny, y'all.

(The worth/validity of the underlying ideas is a separate matter.)

Counterpoint 2: Automated grammar corrections are risky unless you're a native
speaker. Ironic.

~~~
hinkley
I think it’s helpful to recall that the language is called English, not
American, and the people who are experts on it think we don’t talk good.

~~~
zdragnar
Most English-speaking people everywhere speak with some variation on "formal"
English. Even Received Pronunciation in south England and London is
relatively modern.

If anything, some American accents are closer to the English of the late 18th
century than many British ones are, and the "official" Received Pronunciation
is not without its fair share of critics.

------
fishnchips
Maybe I was holding it wrong but in my experience ANTLR’s performance (on JVM)
was abysmal.

~~~
samatman
Do you mean the performance of ANTLR, or the performance of ANTLR-generated
parsers?

~~~
fishnchips
I mean ANTLR-generated parsers. Apologies for the ambiguity.

------
stepvhen
I would have to check, but the resources on lex/yacc that I remember would
make a point of their usability at scale. Generally it would be unlikely that
you could write a faster lexer than what lex produces, but very likely that
you could write a better parser than yacc. However, writing a small lexer
and/or parser _quickly_ would be easier with those tools than hand-rolling
one, almost every time. It's great that parser generators are still being
developed and improved, and it's fine to point out issues, but this article
misses the point by a wide margin.

------
nitnelave
Just a few points that I'd like to raise in response to this article:

- One of the primary concerns of Bison is the speed of the generated parsers.
  In the section about performance, they say that ALL(*) is 135x faster than
  GLR. Digging into the original paper, we find that:
  - the 135x is their best case against the worst GLR parser they could find
  - they compare different tools, not different algorithms in the same tool
  - the grammar they use is optimized for their tool
  - the best GLR tool they're testing against isn't even running on the same OS
  - moreover, they mention that the biggest chunk of their performance comes
    from caching temporary results, so theoretically adding a cache to a GLR
    parser could make it even faster
- Since there is no grammar analysis, we get back to the classic "static vs
  dynamic" debate, where Bison tells you at build time if you have conflicts,
  and ANTLR relies on your good test coverage.
- They only mention in passing that their formalism is strictly less powerful
  than GLR, since they don't handle non-direct left recursion.

------
tomp
Last time I checked ANTLR, it sucked, and after a quick skim now, it still
sucks.

- no way to specify operator precedence explicitly (instead, it’s based on
the ordering of alternatives -- for contrast, see the bison sketch below)

- no explanation of conflicts/ambiguities in the grammar (again, ambiguities
are silently resolved based on the ordering of alternatives)

- might, or might not, properly handle left recursion (by “handle” I mean
that the parser always halts)
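
To illustrate the first point, this is what "explicit" looks like in
yacc/bison: precedence and associativity are declared once, up front, instead
of being encoded in whatever order you list the alternatives. A minimal toy
grammar fragment (lexer, actions, and trailing C code omitted):

```yacc
/* Toy expression grammar: precedence and associativity are declared
   explicitly, not inferred from the ordering of alternatives. */
%token NUM

%left '+' '-'          /* lowest precedence, left-associative  */
%left '*' '/'
%right UMINUS          /* highest precedence, for unary minus  */

%%

expr : expr '+' expr
     | expr '-' expr
     | expr '*' expr
     | expr '/' expr
     | '-' expr %prec UMINUS   /* override: use UMINUS's precedence */
     | '(' expr ')'
     | NUM
     ;

%%
```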

TL;DR: LL parsing is a dead end, don’t use it.

------
akdor1154
I have used ANTLR to write a parser for a simple language and found it kind of
confusing, but put it down to being new to ANTLR and to parser generators in
general. The reason I chose it was for its ability to output parsers in
multiple implementation languages for a single grammar - lemon et al seem to
be tied to C. Are there any other systems the HN crowd would recommend for
this use case?

~~~
UncleEntity
I've been poking at coco/r recently and it seems to have a nice collection of
backends.

Not sure how well-maintained it is; I spent an afternoon getting the Python
version ported over to py3 (which wasn't really all that hard) to learn how
it works.

As TFA states, you can't reuse grammars for multiple languages because the
actions are declared inline, but a few of the other complaints aren't an
issue due to the way the generated parser does its thing -- quite well
designed, IMHO.

------
hoseja
This is just an advert for ANTLR. I have no real experience with either but
the article is really grating to read beyond the introduction.

~~~
ftomassetti
I am sorry you got this impression. Sure, we use ANTLR a lot, for commercial
and open-source projects, and we have had to work with parsers written with
(f)lex, yacc, and bison, so we shared our experiences as people who work all
day long with parsers. We offer a lot of free resources on ANTLR and have no
specific interest in advertising ANTLR. For us it is a tool we use and love,
just that.

~~~
bmn__
Almost anything fares better than flex/bison, this is shooting the proverbial
fish in a barrel.

It comes across very strangely that your brother writes as if y'all are
unaware of the existence of parser generators other than ANTLR. This makes me
sad. In a fair comparison, ANTLR really is not a great tool; its capabilities
are eclipsed by its marketing.

~~~
ftomassetti
We have several articles comparing tens of other tools. That said of all the
tools we worked with, ANTLR was the one we were most efficient with. It is
true, there is a learning curve, but in our experience learning it was a great
investment. I can understand other could have different preferences

------
jasonlhy
I think they are learning tools for people to start and to play around. It is
hard for an application developer with no background knowledge to understand
what a parser does because it is quite abstract, for example lexer, tokens and
expression tree.

However, to build a realistic and comprehensive compiler, I believe to build
it manually is a much better option because it is less error prone, more
flexible to tweak around and favours unit testing. Perhaps parser generator is
better? I am investigating these kind of tools for work because we need to
make our own unique formula implementation. I did use yacc for a school
project, and I know its limitation. Since I am busy at my staff so I just let
my colleague to make their decision and they decide to use jison. It turns out
the product owners want see a much better error message in invalidating the
formula, also we need to define our own function so we have to switch to
another implementation in the next phrase.

------
acqq
The paper:

[https://www.antlr.org/papers/allstar-techreport.pdf](https://www.antlr.org/papers/allstar-techreport.pdf)

has the important detail about why it scored well in benchmarks against other
parsers:

"7.3 Effect of lookahead DFA on performance

The lookahead DFA cache is critical to ALL(*) performance. To demonstrate the
cache’s effect on parsing speed, we disabled the DFA and repeated our Java
experiments. Consider the 3.73s parse time from Figure 9 to reparse the Java
corpus with pure cache hits. With the lookahead DFA cache disabled completely,
the parser took 12 minutes (717.6s)."

I'd still most probably hand-roll, at least when making something for a
long-term project and not some "demo" -- there, stability and ease of
maintenance are more important than fast availability of initial results.

------
adontz
What I don't like about any of these parser generators, including ANTLR, is
that for real languages you get not a clean EBNF grammar but a terrible mix
of declarative and imperative statements. I was pretty sure PEG was the
modern way to go, but it looks like it's not. So what is, then?

~~~
carapace
Broadly speaking, yah, PEG. (See also Prolog DCG.) The other neat thing
happening is parser combinators. (IMO)

------
the_arun
I recollect using lex/bison to write a compiler for COBOL on a mainframe,
using SAS/C, to list variable names in COBOL programs (remember Y2K!). Later
in life I used ANTLR to do something fun in Java. I never thought of
comparing them; they both have their strong points. If you are coming from
the lex/bison world you will understand the philosophy behind ANTLR, but
there is a bit more of a learning curve with ANTLR. It felt slightly
complicated.

------
upofadown
If the new thing is actually better, you should not have to spend the bulk of
an article tearing down the old thing to make the new thing look better in
comparison.

------
spider-mario
That ANTLR example with the sum does little to convince me that the code
should be separate from the grammar. It being separate means that you need to
look at the grammar to know what tokens you have access to anyway.

(And, really, returning something of type “Any”?)

------
pansa2
> ANTLR uses an algorithm [...] called ALL.

Why is the tool called ANT _LR_ if it uses an LL algorithm? Did earlier
versions use LR, or is it just a confusing name?

~~~
xalqpp
I think officially it is "ANother Tool for Language Recognition", unofficially
it is "Anti LR", which usually is the mindset of ANTLR adherents.

~~~
kqr
Do they have a reason for this? I have written small LL and LR parsers and the
latter seem to make much more sense to me.

------
vogre
TL;DR: use ANTLR instead.

