
Write a parser with JavaScript - bpierre
https://lihautan.com/json-parser-with-javascript/
======
catpolice
I love articles about writing parsers by hand - it's really fun and teaches
you a lot.

Last year I decided to write a complete compiler in Javascript, completely by
hand. The idea was to compile a vaguely C-like toy language to WASM without
any external dependencies.

Part of the motivation was that while I've written many simple parsers, I
mostly used parser generators with BNF grammars and I didn't feel like I had a
good sense of how the code I ended up generating actually worked for something
with complex syntax and semantics. I felt like I was writing specs for a
parser, rather than writing a parser.

My toy language has vaguely C-like syntax with block scope and infix operators
with various precedences, so it was a bit more complicated than JSON, but I
ended up using something like Pratt parsing/Precedence Climbing (see
[https://www.oilshell.org/blog/2017/03/31.html](https://www.oilshell.org/blog/2017/03/31.html))
and wrote the whole thing in a way that's - hopefully - pretty easy to read
for folks interested in wrapping their head around parsing complex syntax
(e.g. with scope and name resolution). The lexer, parser and language
definition ended up being about 1000 lines of JS (counting some pretty
extensive comments).

Any JS programmers that are interested in really getting into the nitty-gritty
of writing your own parser/compiler should check it out. The source is here:
[https://github.com/j-s-n/WebBS](https://github.com/j-s-n/WebBS) (relevant
files for parsing are in /compiler - start with lexer.js, parser.js and
syntax.js).

If you want to play around with the language and inspect the generated ASTs
and WASM bytecode, there's an interactive IDE with example code here:
[https://j-s-n.github.io/WebBS/index.html#splash](https://j-s-n.github.io/WebBS/index.html#splash)

~~~
11235813213455
I like parsers too! I started a few years ago with the classic calculator
[https://caub.github.io/misc/calculator](https://caub.github.io/misc/calculator)

Acorn (JS AST parser) is an interesting codebase
[https://github.com/acornjs/acorn/tree/master/acorn/src](https://github.com/acornjs/acorn/tree/master/acorn/src)

engine262 (JS AST parser and evaluator) too is interesting, here's how JSON
parser is handled:
[https://github.com/engine262/engine262/blob/master/src/intri...](https://github.com/engine262/engine262/blob/master/src/intrinsics/JSON.mjs#L72-L210)

------
cabaalis
That's a heck of an interview question. I like what he was tempted to write
better. I really don't want engineers toiling for hours over something a
language or framework already provides.

I get the purpose, of course, but geez.

~~~
jedimastert
One of my interview questions was to write a toy lisp interpreter (only needs
to do addition and multiplication, that kinda thing). Not nearly this crazy,
only took like 20 lines of Python, but I could see why it would be helpful.

But an entire JSON parser? That's nuts!

~~~
bd82
I develop a parsing library for fun and I don't think that would be an easy or
useful interview question.

Maybe it makes sense if you assume the input has already been tokenized so you
are not expected to deal with the minutiae of string literal escape sequences
and such and can focus on the high level design/flow...

~~~
agumonkey
I'm obsessed with plt and parsers... What would be better interview
questions.?

------
derefr
My intuition as someone who writes compilers, when handed a BNF grammar, is to
reach for a parser-generator like yacc, so that I can keep the BNF grammar as
the canonical representation of the grammar, and have everything else derived
automatically and kept in sync. (Even though, yes, in this case, JSON is a
very static spec that hasn’t changed since it was introduced.)

Presuming for a moment that there isn’t a parser-generator utility that emits
JavaScript, though, I’d be led to wonder whether it wouldn’t still be a more
efficient use of my time (compared to hand-rolling a parser, usually a pretty
finicky task) to just use yacc itself to emit C, and then use Emscripten to
get WASM, and then call that from JavaScript. I’d probably do that first—it’d
take about an hour—and then profile the resulting parser (including the JS-
WASM overhead.) Hopefully, it’d fall within the tolerances for parsing time.
Only if it didn’t would I sigh and begin to hand-roll a parser.

~~~
bd82
Luckily there are many Parsing Libraries in the JavaScript eco-system, So we
won't have to find out if this convoluted approach is worthwhile :)

\- [https://tomassetti.me/parsing-in-
javascript/](https://tomassetti.me/parsing-in-javascript/)

~~~
asdasdasdasdwd
BNF sounds like the standard approach. Are there any BNF parsers + generators
usable from Javascript?

~~~
bd82
Some of the top of my Head:

    
    
      - Antlr (https://github.com/antlr/antlr4/blob/master/doc/javascript-target.md)
    
      - PegJS (https://github.com/pegjs/pegjs)
    
      - Nearley (https://github.com/kach/nearley)

------
wruza
For those interested, there is also jison, a flex+bison ~1:1 reimplementation
in js. I used it for a domain-specific format, it is really nice, easy and go-
to, compared to traditional alternatives.

[https://zaa.ch/jison/](https://zaa.ch/jison/)

------
eatonphil
Very nice. Here's a similar post but for Python.

[http://notes.eatonphil.com/writing-a-simple-json-
parser.html](http://notes.eatonphil.com/writing-a-simple-json-parser.html)

------
TomMarius
A great parser framework is Chevrotain by SAP

~~~
bd82
You can find it At:
[https://github.com/SAP/chevrotain](https://github.com/SAP/chevrotain)

------
flashgordon
Wait this is a JSON parser, not a JavaScript parser! Still nice work! There is
something "raw" about writing parsers by hand (even if they are recursive
descent). Teaches one a lot about clean design and modularity etc!

~~~
PopeDotNinja
It's a parser written in JavaScript, so it is technically a JavaScript parser
:)

------
marknadal
I'm surprised Ohm isn't mentioned, it is spectacular.

Anybody interested in parsers is gonna love it:

[https://github.com/harc/ohm](https://github.com/harc/ohm)

------
zerr
I wonder what happened to that C++ grandmaster course
[http://cppgm.org](http://cppgm.org)

