Hacker News new | past | comments | ask | show | jobs | submit login
Write a parser with JavaScript (lihautan.com)
114 points by bpierre 6 months ago | hide | past | favorite | 24 comments

I love articles about writing parsers by hand - it's really fun and teaches you a lot.

Last year I decided to write a complete compiler in Javascript, completely by hand. The idea was to compile a vaguely C-like toy language to WASM without any external dependencies.

Part of the motivation was that while I've written many simple parsers, I mostly used parser generators with BNF grammars and I didn't feel like I had a good sense of how the code I ended up generating actually worked for something with complex syntax and semantics. I felt like I was writing specs for a parser, rather than writing a parser.

My toy language has vaguely C-like syntax with block scope and infix operators with various precedences, so it was a bit more complicated than JSON, but I ended up using something like Pratt parsing/Precedence Climbing (see https://www.oilshell.org/blog/2017/03/31.html) and wrote the whole thing in a way that's - hopefully - pretty easy to read for folks interested in wrapping their head around parsing complex syntax (e.g. with scope and name resolution). The lexer, parser and language definition ended up being about 1000 lines of JS (counting some pretty extensive comments).

Any JS programmers that are interested in really getting into the nitty-gritty of writing your own parser/compiler should check it out. The source is here: https://github.com/j-s-n/WebBS (relevant files for parsing are in /compiler - start with lexer.js, parser.js and syntax.js).

If you want to play around with the language and inspect the generated ASTs and WASM bytecode, there's an interactive IDE with example code here: https://j-s-n.github.io/WebBS/index.html#splash

I like parsers too! I started a few years ago with the classic calculator https://caub.github.io/misc/calculator

Acorn (JS AST parser) is an interesting codebase https://github.com/acornjs/acorn/tree/master/acorn/src

engine262 (JS AST parser and evaluator) too is interesting, here's how JSON parser is handled: https://github.com/engine262/engine262/blob/master/src/intri...

That's a heck of an interview question. I like what he was tempted to write better. I really don't want engineers toiling for hours over something a language or framework already provides.

I get the purpose, of course, but geez.

One of my interview questions was to write a toy lisp interpreter (only needs to do addition and multiplication, that kinda thing). Not nearly this crazy, only took like 20 lines of Python, but I could see why it would be helpful.

But an entire JSON parser? That's nuts!

I develop a parsing library for fun and I don't think that would be an easy or useful interview question.

Maybe it makes sense if you assume the input has already been tokenized so you are not expected to deal with the minutiae of string literal escape sequences and such and can focus on the high level design/flow...

I'm obsessed with plt and parsers... What would be better interview questions.?

My intuition as someone who writes compilers, when handed a BNF grammar, is to reach for a parser-generator like yacc, so that I can keep the BNF grammar as the canonical representation of the grammar, and have everything else derived automatically and kept in sync. (Even though, yes, in this case, JSON is a very static spec that hasn’t changed since it was introduced.)

Presuming for a moment that there isn’t a parser-generator utility that emits JavaScript, though, I’d be led to wonder whether it wouldn’t still be a more efficient use of my time (compared to hand-rolling a parser, usually a pretty finicky task) to just use yacc itself to emit C, and then use Emscripten to get WASM, and then call that from JavaScript. I’d probably do that first—it’d take about an hour—and then profile the resulting parser (including the JS-WASM overhead.) Hopefully, it’d fall within the tolerances for parsing time. Only if it didn’t would I sigh and begin to hand-roll a parser.

Luckily there are many Parsing Libraries in the JavaScript eco-system, So we won't have to find out if this convoluted approach is worthwhile :)

- https://tomassetti.me/parsing-in-javascript/

BNF sounds like the standard approach. Are there any BNF parsers + generators usable from Javascript?

Some of the top of my Head:

  - Antlr (https://github.com/antlr/antlr4/blob/master/doc/javascript-target.md)

  - PegJS (https://github.com/pegjs/pegjs)

  - Nearley (https://github.com/kach/nearley)

PEG is another popular approach

jison works fairly well, and is mostly compatible with bison.

The main reason for using a parser genarator tool is being able to use the canonical BNF version of the grammar, as you said. Another good thing is that LR parser generations can also help detect ambiguities in the grammar -- those appear as shift-reduce or reduce-reduce conflicts.

But there can also be good reasons to use a hand-rolled recursive descent parser. It simplifies the build system, and it can also be easier to produce good error messages.

If you're into this stuff and want to keep it all in the JS ecosytem, PEG.js isn't a bad choice. It powered CoffeeScript Redux, among other widely used projects.

I've been looking at the WASM side myself and suspect it's only a matter of time before there's a JS/TS PEG library written mostly in Rust.

I think the (not maintained, I presume from the activity) jison[0] project was trying to achieve was auto generating a parser in JavaScript based on a similar idea.

[0] https://github.com/zaach/jison

The problem is... those tools are usually context-free, and almost every language is context-sensitive.

For those interested, there is also jison, a flex+bison ~1:1 reimplementation in js. I used it for a domain-specific format, it is really nice, easy and go-to, compared to traditional alternatives.


Very nice. Here's a similar post but for Python.


A great parser framework is Chevrotain by SAP

You can find it At: https://github.com/SAP/chevrotain

Wait this is a JSON parser, not a JavaScript parser! Still nice work! There is something "raw" about writing parsers by hand (even if they are recursive descent). Teaches one a lot about clean design and modularity etc!

It's a parser written in JavaScript, so it is technically a JavaScript parser :)

I'm surprised Ohm isn't mentioned, it is spectacular.

Anybody interested in parsers is gonna love it:


I wonder what happened to that C++ grandmaster course http://cppgm.org

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact