GNU Bison 3.3 released (gnu.org)
73 points by jrepinc 52 days ago | 43 comments

"There are also many smaller improvements, including a fix for a bug which is at least 31 years old."

It's weird that web pages can succumb to link rot so quickly, and apps succumb to bit rot almost as quickly, but a bug will live on.

In a thousand years' time, will archaeologists study us through the bugs left behind in Linux 1300.05 and Windows (30)95? Do you think there will still be jobs for COBOL programmers?

> but a bug will live on.

Sometimes, yes. But it has also happened often to me that an old bug I was trying to fix was simply gone, either because the error mode has gone away, or the feature that the bug related to has been removed.

> But it has also happened often to me that an old bug I was trying to fix was simply gone, either because the error mode has gone away, or the feature that the bug related to has been removed.

And this is why groups like the OpenBSD team and the Linux kernel team are so eager to remove support for old hardware. Removing support for the 80386 cleaned up code which no longer had to have special cases to handle opcodes and features the 80386 doesn't have but later x86 processors do.

It will be the "junk DNA" of the source code -- because of workarounds and baked in assumptions it will be impossible to eradicate.

Something similar is actually a plot point in Vernor Vinge's "A Fire Upon The Deep".

I wish it were easier to have more target languages. (And that there were more target languages.)

I know the docs say that additions are welcome, but I'm still in my infancy of working out how to :)

Congrats though! I love it when these tried-and-true tools continue to perform and improve!

I always kind of wonder what people use lex and yacc for. 20 years ago I made something to pretty print random C structs, but I have no idea what people do with these tools. Historically, these have been billed as "compiler compilers", and while that's certainly a use case, it never struck me as a very popular use case.

The main advantage, to me, is that the input of Yacc is a (mostly) self-contained, (mostly) declarative representation of the grammar to be parsed. This format is portable between projects.

You can, for example, take Postgres’s SQL syntax parsing “engine” out of Postgres and use it in your own project, because it’s not actually an “engine”, but rather just a Yacc grammar.

How often are yacc/lex grammars actually portable? Truly portable too — usable in different threading contexts, for example.

You say they are declarative. My experience is admittedly very limited, but I would call them anything but declarative. There are declarative elements, but the actual payload of your code is very, very often imperative. (A lot of this is shaped by the API, imo).

I compare this to parser combinators (like parsec) and PEGs (like luapeg and rust-peg). I find these other options are more naturally side-effect free, both internally and externally.
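To illustrate the side-effect-free style, here's a minimal parser-combinator sketch in Python (toy helpers written for this comment, not parsec or rust-peg themselves): each combinator returns a pure function from input to a (value, rest) pair, so grammars compose declaratively with no mutable state.

```python
# A parser is a function: str -> (value, remaining) or None on failure.

def char(c):
    """Parser that matches a single literal character."""
    def parse(s):
        return (c, s[1:]) if s.startswith(c) else None
    return parse

def seq(p, q):
    """Run p, then q on the remaining input; pair the results."""
    def parse(s):
        r1 = p(s)
        if r1 is None:
            return None
        v1, rest = r1
        r2 = q(rest)
        if r2 is None:
            return None
        v2, rest2 = r2
        return ((v1, v2), rest2)
    return parse

def alt(p, q):
    """Try p; if it fails, try q on the same input."""
    def parse(s):
        return p(s) or q(s)
    return parse

# "ab" or "ac" -- composed declaratively, no side effects
ab_or_ac = alt(seq(char("a"), char("b")), seq(char("a"), char("c")))
print(ab_or_ac("ac"))   # (('a', 'c'), '')
print(ab_or_ac("ax"))   # None
```

Real combinator libraries add backtracking control and error messages on top, but the core idea is just this: parsers as values you combine.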

They are used to build a number of tools you probably rely on or use regularly, including ruby, golang, bash, MySQL and postgresql.

golang uses a handwritten recursive descent parser and a handwritten lexer.

https://github.com/golang/gofrontend/blob/master/go/parse.cc https://github.com/golang/gofrontend/blob/master/go/lex.cc

I can't think of a single major tool that uses ANTLR or bison. I say that being a huge ANTLR fan. Parser/lexer generators are really easy to use for DSLs but for real languages and tools, they're too heavyweight and constraining.

> Parser/lexer generators are really easy to use for DSLs but for real languages and tools, they're too heavyweight and constraining.

Haskell and OCaml are examples of real languages that use parser generators.

GHC: https://github.com/ghc/ghc/blob/master/compiler/parser/Parse...

Ocaml: https://github.com/ocaml/ocaml/blob/trunk/parsing/parser.mly

GCC used to use bison. I don't think GCC became a major project recently. But you're correct that recursive descent is more efficient. Efficiency isn't everything, though: parsers are usually ~1% of compile time, since they're dirt cheap compared to the code generator, and especially the static analyzer and optimizer. Bison makes parsing extremely simple. I can't see why anyone would use recursive descent unless they're a super major project and need to squeeze out every ms.

A big part of why tools move away from Bison and ANTLR isn’t performance, but UX (especially error reporting)

In my experience recursive descent doesn't have any better error reporting than Bison or ANTLR. Parser combinators are much better if error reporting is what you're aiming for. Also, for parser generators like Bison or ANTLR, the canonical trick is to encode common errors in the grammar itself, which really isn't a huge deal. E.g., in pseudocode:

    suite: stmt ';' stmt {/* ok */}
         | one_line_stmt NEWLINE stmt {raise("you need a ';' there buddy");}
         | // other rules
Obviously this won't be useful in every case (otherwise why require ';' at all?), but in most cases it'll give a helpful error.

Some more powerful parsing algorithms, like Earley, have much worse error reporting.

Context-sensitive grammars.

You can deal with context-sensitive grammars with the "lexer trick" or multiple-pass parsers.
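The "lexer trick" is usually called the lexer hack, after the classic C typedef ambiguity: the lexer consults a symbol table populated by the parser, so the same spelling tokenizes differently depending on context. A toy sketch in Python (hypothetical names, just to show the feedback loop):

```python
# The lexer consults a symbol table (normally populated by the
# parser) to decide whether an identifier is a type name --
# the classic C typedef ambiguity.

typedef_names = set()  # the parser adds names here on `typedef ...;`

def lex(source):
    tokens = []
    for word in source.split():
        if word == "typedef":
            tokens.append(("TYPEDEF", word))
        elif word in typedef_names:
            tokens.append(("TYPE_NAME", word))  # context-sensitive!
        elif word.isidentifier():
            tokens.append(("IDENT", word))
        else:
            tokens.append(("PUNCT", word))
    return tokens

# Before the parser has seen `typedef int foo;`:
print(lex("foo x"))        # [('IDENT', 'foo'), ('IDENT', 'x')]

typedef_names.add("foo")   # parser feeds back into the lexer
print(lex("foo x"))        # [('TYPE_NAME', 'foo'), ('IDENT', 'x')]
```

The cost is that lexing and parsing are no longer independent stages, which is exactly why people call it a hack.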

That's the gccgo frontend; the more usual Go compiler is here: https://github.com/golang/go/tree/master/src/cmd/compile/int...

It's still handwritten though so your point stands.

Golang also has its own linker and assembler (going as far as having different notation for asm instructions), so it doesn't really count in these comparisons.

PHP appears to. It was the first language that came up in my Github search, and I stopped looking there.

You use it when you need a parser or a lexer. Even GCC used to use it (they switched to a recursive descent parser a few years ago). Not sure what's surprising here. Maybe you're not familiar with parsers?

Yacc is a very important part of UNIX because it allows one to create a parser for a language with limited knowledge of parsing. Of course, if a language is very complicated, it is better to create a special purpose parser, but with yacc one can easily create parsers for most simple DSLs.

> Yacc is a very important part of UNIX because it allows one to create a parser for a language with limited knowledge of parsing.

This isn't true. You basically need to know the entire algorithm Yacc uses to use it. What does someone with limited knowledge of parsing do when Yacc tells them this?

    5: reduce/reduce conflict (reduce 3, reduce 4) on c
    state 5
            B : x y .  (3)
            E : x y .  (4)

I can't see the problem here. To use stdlib `rand` you need to understand that you should first call `srand(seed)` and why, and that the seed needs to be unique for each run. Then, in order to use it correctly (e.g. not in a physics simulation) you should understand its algorithm. Using programming tools requires reading the manual, and sometimes understanding algorithms.

> I can't see the problem here.

I didn't say it was a problem.

This unfortunately all comes down to what you mean by "basically" and what they meant by "limited".

I think what they said is closer to the truth. I've used flex/bison a few times to create simple languages, and as long as they're parsed correctly (which is quick and easy to roughly check with a REPL; mine were in that form), conflicts can be ignored. Mine worked fine. It was a joy to use: with both the flex and bison program windows open, plus bash to run the makefile, I could add a feature and test it within seconds. I'd say I have very limited knowledge of parsing, but I've read a few flex/bison books, enough to grasp what that message means. It's not necessary to have no conflicts, from what I understand. And it seems you have a more... professional language in mind, like in a commercial situation.

But then again, Yacc/bison isn't very important because it allows etc.... (i.e. that's not the reason it's important)

One thing is knowing about parsers, another completely different thing is knowing how to implement an efficient parser. This is the difference, e.g., between using a machine learning algorithm and implementing one from scratch.

They figure it out, I assume? - LR parsing is accountancy, not rocket science. You don't need to be a genius.

Open Shading Language (from Sony) uses it to parse the, umm, shading language.

Pixie (the Renderman renderer) uses two or three bison grammars.

I personally like SQLite's lemon parser generator for my grammar poking around when I'm not messing with my C++ Earley parser implementation -- which I can never seem to get working properly with empty rules since the papers on the subject are less than clear.

But I'm probably a bit of a strange one in that I'll spend days getting a random grammar working then drop it for some shiny new thing to play with.

Hand-coding {object code, codecs, parsers, ...} is a disaster. That's why compilers exist. It's also why code generators exist. That's what lex/yacc (flex/bison) are: code generators specialized for generating parsers. They are not compiler-compilers, but tools used in the construction of compilers (and other things that aren't compilers too).

In the C family of languages, all the notable compilers either never used a parser generator (using simple recursive descent instead) or moved away from using one. It's easier to write, debug, and extend an RD (or table-driven precedence-climbing) parser than to work with the (large amounts of) generated code that tools like Bison create.
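To show how little machinery "easier to write" implies, here is a minimal hand-written recursive descent parser in Python, one function per grammar rule (toy example with a trivial one-character-per-token lexer, not any compiler's actual code):

```python
# Recursive descent for:
#   expr   : term (('+'|'-') term)*
#   term   : factor (('*'|'/') factor)*
#   factor : NUMBER | '(' expr ')'
# One function per nonterminal; lookahead is just tokens[pos].

def parse_expr(tokens, pos=0):
    val, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in "+-":
        op, pos = tokens[pos], pos + 1
        rhs, pos = parse_term(tokens, pos)
        val = val + rhs if op == "+" else val - rhs
    return val, pos

def parse_term(tokens, pos):
    val, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in "*/":
        op, pos = tokens[pos], pos + 1
        rhs, pos = parse_factor(tokens, pos)
        val = val * rhs if op == "*" else val / rhs
    return val, pos

def parse_factor(tokens, pos):
    if tokens[pos] == "(":
        val, pos = parse_expr(tokens, pos + 1)
        assert tokens[pos] == ")", "expected ')'"
        return val, pos + 1
    return float(tokens[pos]), pos + 1

tokens = list("2*(3+4)")   # trivial lexer: one char per token
print(parse_expr(tokens))  # (14.0, 7)
```

The grammar's structure maps one-to-one onto the call structure, which is also why custom error messages are easy to drop in at any point.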

In general, I don't think so. It's one thing if you parse a well understood grammar like C. But many people who use recursive descent from the start for a new language or DSL just don't bother about the grammar, and when someone takes a look it is suddenly LL(4) and relies on whatever the hand-written parser does.

So what? I personally am a fan of recursive descent parsing for most programming languages, and if it works with recursive descent, why would it be troubling that it's actually an LL(4) grammar?

Now it would only be troubling if the grammar itself is intended to be widely usable across different languages, say JSON or XML. A parser generator really shines because it can generate code to parse this grammar in multiple languages. For programming languages intended to be written by humans, I value things like good error reporting far more.

I'd assumed VimScript was written using Flex/Bison; turns out it's written directly in C: https://github.com/vim/vim/blob/master/src/eval.c

I would guess that a lot of your cell data are processed by code generators that use the lex/yacc toolchain. While perhaps they're not commonly used in implementations, they have a large number of deployments.

I think a lot of people just use parser combinators, which I personally find a lot easier to use and much more powerful. Code as data, data as code, yadda yadda.

I mean, they're basically used whenever you need either a lexer or a parser, or both. So yeah, there is quite a number of possible use cases.

lex and yacc are also used by the ircd-hybrid and ircu families of IRCds to parse their configuration files. OpenBSD also uses yacc for the configuration files for pf, doas, httpd, etc.

yecc, the Erlang port of yacc, is used by the Elixir parser.

SQL parsers, like those of Postgres and MySQL.
