
I had an interesting thread here with somebody who wrote a pretty complete Ruby parser in Ruby:


The "canonical" documentation of the Ruby parser is basically a 6000+ line Bison file where most of the linecount is taken up with handling exceptions... It's awful.

That sounds very much like bash, where parse.y is 6227 lines. Almost all of it is C code; it's not actually using very much of yacc...

That'd be me... Though with the caveat that my parser (part of this project [1]) still has fairly substantial gaps, so it will grow, but certainly not to the size of MRI's parse.y.

It's actually worse these days: it turns out parse.y for Ruby is 11348 lines [2], though in fairness it's worth noting that it's written in a quite "linefeed heavy" style, and that count includes the code that provides a Ruby API to the parser.

[1] http://hokstad.com/compiler

[2] https://github.com/ruby/ruby/blob/trunk/parse.y

Yikes! 11K lines is a lot. I guess my conclusion is the same as last time: human brains are good at parsing very elaborate languages. People like all that subtle "expressiveness".

I think my bash parser is pretty complete, and it's about 4K lines in Python now, including the lexer. Changing it to produce a lossless syntax tree instead of an AST blew it up by a few hundred lines.

However, it does more than the 6K lines in bash's parse.y, because it parses in a single pass. Bash does more parsing in the 9700-line file subst.c, so it's hard to make a direct comparison.

But yeah I'm about to write my Oil parser now, as opposed to the OSH parser... I really want to avoid writing another 4K lines of code!!! That might be the most practical way though. :-(

