
How to implement a programming language in JavaScript - tambourine_man
http://lisperator.net/pltut/
======
nnq
Cool... but nitpick about the parsing part (only glanced at it): _why do
people always forget about parser combinators and don't present them as
obviously 'the first thing to try' when making a new language?!_

They seem the simplest and most intuitive tool for an unbiased newbie
designing a new language, and you can even introduce them by skipping the
theory with something like this: [http://theorangeduck.com/page/you-could-
have-invented-parser...](http://theorangeduck.com/page/you-could-have-
invented-parser-combinators) .
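
To give a flavor of why they feel so natural (a minimal sketch in JavaScript, not code from the linked article): a parser is just a function from input to a result, and small parsers compose into bigger ones.

```javascript
// A parser is a function: (input, pos) => { value, pos }, or null on failure.
const char = (c) => (input, pos) =>
  input[pos] === c ? { value: c, pos: pos + 1 } : null;

// Run parsers one after another, collecting their values.
const seq = (...parsers) => (input, pos) => {
  const values = [];
  for (const p of parsers) {
    const r = p(input, pos);
    if (r === null) return null;
    values.push(r.value);
    pos = r.pos;
  }
  return { value: values, pos };
};

// Try alternatives until one succeeds.
const alt = (...parsers) => (input, pos) => {
  for (const p of parsers) {
    const r = p(input, pos);
    if (r !== null) return r;
  }
  return null;
};

// Apply a parser zero or more times.
const many = (p) => (input, pos) => {
  const values = [];
  let r;
  while ((r = p(input, pos)) !== null) {
    values.push(r.value);
    pos = r.pos;
  }
  return { value: values, pos };
};

// Example: a number is one or more digits.
const digit = alt(..."0123456789".split("").map(char));
const number = (input, pos) => {
  const r = many(digit)(input, pos);
  return r.value.length ? { value: Number(r.value.join("")), pos: r.pos } : null;
};

console.log(number("42abc", 0)); // { value: 42, pos: 2 }
```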

------
chvid
If anyone is interested in making a language in JS I can recommend the
"Nearley" parser/lexer:

[http://nearley.js.org](http://nearley.js.org)

Very easy parser/lexer to use, accepts a larger class of grammars than
yacc/lex and is reasonably fast.

I use it for the language in my project:
[http://emunotes.com](http://emunotes.com) and Nearley is fast enough to parse
stuff as the user is entering text.

It may be tempting to see if you can get away with not using a parser
generator (as the author does here) but I found it is not worth the trouble it
generates down the road.
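
For anyone curious what a Nearley grammar looks like, here's a minimal sketch (illustrative only, not code from emunotes):

```
# arithmetic.ne -- compile with `nearleyc arithmetic.ne -o grammar.js`, then from JS:
#   const nearley = require("nearley");
#   const grammar = require("./grammar.js");
#   const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));
#   parser.feed("1+2+3");  // parser.results holds the parse(s)
expr -> expr "+" num {% ([a, , b]) => a + b %}
      | num          {% id %}
num  -> [0-9]:+      {% ([ds]) => Number(ds.join("")) %}
```

Note that the left-recursive `expr -> expr "+" num` rule is fine here; Nearley's Earley algorithm handles grammars that would require rewriting for many other parser generators.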

~~~
joshmarinacci
Emunotes is really cool. A magical word processor. Is there a way to make
functions that connect to expanded services?

~~~
chvid
Thanks! If by expanded services you mean integration with external pieces of
user-defined JavaScript or data sources (such as a YQL query), that's a good and
very potent idea, and something I'd like to do a little further down the road.
In the near future I'm focusing on graphics and collaborative features.

------
moretti
This is also a really good resource: [https://github.com/thejameskyle/the-
super-tiny-compiler](https://github.com/thejameskyle/the-super-tiny-compiler)

------
avmich
Immediately thought about another awesome resource about how to write a
language in JavaScript:

[http://nathansuniversity.com/](http://nathansuniversity.com/)

------
peter303
UNIX/Linux has had off-the-shelf tools for doing this for nearly 40 years. Two
of them are called lex and yacc.

Kind of reinventing the wheel for those unfamiliar with history.

~~~
eudox
Parsing is the smallest part of writing a compiler.

~~~
deadowl
It's also one of the most conceptually difficult parts, which makes it a good
learning opportunity.

~~~
greglindahl
One common criticism of compiler classes is that the first semester spends all
of its time on parsing, so the students don't learn very much about compilers.
And btw, there's an infinite number of conceptually difficult parts when it
comes to optimization.

------
anqurvanillapy
It would be nice to read some books like Language Implementation Patterns for
deeper thoughts after following some easy tutorials. That book is good for
designing both DSL and general-purpose languages and explains multiple
patterns with pros and cons, tradeoffs, etc.

And also, it's better to check out some metasyntax notations, e.g. EBNF
[https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_F...](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form)
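
As a taste of the notation, a tiny arithmetic grammar in ISO-style EBNF might look like this (illustrative only):

```
expr   = term , { ( "+" | "-" ) , term } ;
term   = factor , { ( "*" | "/" ) , factor } ;
factor = number | "(" , expr , ")" ;
number = digit , { digit } ;
digit  = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
```

Here `,` is concatenation, `|` is choice, `{ ... }` is zero-or-more repetition, and `;` terminates a rule. Many tools use lighter variants of this notation, but the ideas carry over.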

------
matthewtoast
Cool! This is a great, clear tutorial.

I strongly recommend this activity to almost any programmer. No matter what,
you'll end up with a deeper appreciation for great system design. You realize
how almost every design decision boils down to a compromise between
convenience and flexibility. For better or worse, you'll also get new respect
for the cruel reasoning behind more arcane, but also more flexible APIs. Red
pill blue pill kind of thing.

And, just to share: Last year I took a crack at creating a JavaScript-
interpreted Lisp and this is the result -
[https://github.com/matthewtoast/runiq](https://github.com/matthewtoast/runiq)
\- I haven't added anything since then but I might pick it up again, as it was
a helluva lot of fun and also quite mind-expanding.

------
zachrose
Is there such a thing as a language that compiles to a value rather than an
executable program?

I've done a really hacky version of something like this, where a user would input a
succinct, high-level description of a sequence of colored lighting, and my
program would parse* that and build the tedious, low-level JSON representation
of that sequence for yet another program to perform. Is there a fancy or
unfancy name for that?

*Ok really it was a little bit of munging and then eval.
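
A non-eval version of that idea fits in a few lines. This sketch assumes a hypothetical mini-language like `"red:2 blue:1.5"` (color:seconds pairs) compiled into a low-level JSON cue list:

```javascript
// Compile a hypothetical lighting mini-language into a JSON value.
// "red:2 blue:1.5" => [{ color, durationMs }, ...]
function compileCues(src) {
  return src.trim().split(/\s+/).map((token) => {
    const [color, seconds] = token.split(":");
    return { color, durationMs: Number(seconds) * 1000 };
  });
}

console.log(JSON.stringify(compileCues("red:2 blue:1.5")));
// [{"color":"red","durationMs":2000},{"color":"blue","durationMs":1500}]
```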

~~~
nostrademons
Well, at some level _every_ language compiles to a value rather than an
executable program - the parse tree of a source module is a data structure
describing the program, and the executable is a sequence of bytes that is
interpreted by the processor.

There's a continuum between this and what we think of as a "value" in
programming terms (eg. an int or a string), but the boundary is fuzzy. It was
pretty common at Google to write DSLs that compiled to a protobuf that'd be
loaded into the server as configuration, for example. CUDA compiles programs
into a buffer that is then sent to the GPU. HTML compiles into the DOM, which
is a value that can be manipulated with Javascript.

------
passiveobserver
The dude writes good code. He's the author of Uglify.

------
pier25
So this is how magic happens

------
ralphc
The new hotness seems to be transpiling, turning an enhanced language into
Javascript. Any good resources for this?

~~~
ehaliewicz2
It's the same as any other compiler: just have the backend generate JavaScript
instead of machine code.
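
Concretely, the code generator is just a recursive walk over the AST that returns JavaScript source. A toy sketch, assuming a small hypothetical expression AST:

```javascript
// Toy transpiler backend: turn an expression AST into JavaScript source.
// Node shapes assumed for illustration: num, var, binary, lambda.
function emit(node) {
  switch (node.type) {
    case "num":
      return String(node.value);
    case "var":
      return node.name;
    case "binary":
      return `(${emit(node.left)} ${node.op} ${emit(node.right)})`;
    case "lambda":
      return `(${node.params.join(", ")}) => ${emit(node.body)}`;
    default:
      throw new Error("unknown node type: " + node.type);
  }
}

const ast = {
  type: "lambda",
  params: ["x"],
  body: {
    type: "binary", op: "+",
    left: { type: "var", name: "x" },
    right: { type: "num", value: 1 },
  },
};

console.log(emit(ast)); // (x) => (x + 1)
```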

------
mikekchar
It would be nice if, when saying things like "don't use regexps for parsing",
it were accompanied by an explanation (possibly I didn't read far enough to see
the explanation...)

There are only a few actual computer science theory things that I think all
programmers should know and this is one of them. There are classifications of
grammars (1). Without going into detail, regular expressions can only be used
to parse regular grammars.

The problem is that most grammars for programming languages, file formats,
communication protocols, etc, are not regular grammars. With a regular grammar
you can look at the current state and the next input symbol to determine the
next state. With context free grammars you need to have a stack of states.
With context sensitive grammars you need to have a stack of stacks of states.
With unrestricted grammars you are essentially screwed ;-)

So your first task when you are designing a language or file format or
communication protocol or whatever is to choose a simple grammar. If you
choose a regular grammar then you _can_ (and probably should) use regular
expressions to parse it. With context free or context sensitive grammars you
can often do lexical analysis (creating symbols that you then pass to your
parser) with regular expressions, but you need something more complex for
parsing the stream of symbols (i.e., you need to be able to put parser states
on a stack).
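
The classic illustration is nested brackets, a context-free language: no single regular expression can track arbitrary nesting depth, but a stack of states handles it trivially. A sketch (not from the article):

```javascript
// Check that brackets nest correctly. A regex can't do this in general,
// because it can't count arbitrary nesting depth; a stack of expected
// closing brackets can.
function balanced(src) {
  const closers = { "(": ")", "[": "]", "{": "}" };
  const stack = [];
  for (const ch of src) {
    if (ch in closers) {
      stack.push(closers[ch]); // remember which closer we now expect
    } else if (ch === ")" || ch === "]" || ch === "}") {
      if (stack.pop() !== ch) return false; // wrong closer (or none)
    }
  }
  return stack.length === 0; // everything opened was closed
}

console.log(balanced("([a]{b})")); // true
console.log(balanced("([a)]"));    // false
```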

The problem that you often see is that people design things and have
absolutely no idea what kind of grammar it is. They use regular expressions
(or some ad-hoc code) and then try to keep track of state in global variables.
If they happen to have a context sensitive grammar then chances are their
parser simply will not work correctly no matter what they do.

You may be wondering why people choose to use more complex grammars if it is
harder to parse. The main reason is that complex grammars give you more
options for expression. Sometimes it is extremely difficult or even impossible
to represent something with a regular grammar. Having said that, though, you
should almost always try to keep your grammars at least context free. Once you
get into context sensitive grammars, the difficulty of parsing will either
make your parser very difficult to implement or buggy as hell (usually both).
Usually it is better to remove functionality than it is to move from a context
free to a context sensitive grammar.

In the past I have often seen file formats that have unrestricted grammars
(converting file formats used to be my job). People who do this should be
replaced with programmers who know what they are doing :-P

(1) -
[https://en.wikipedia.org/wiki/Chomsky_hierarchy](https://en.wikipedia.org/wiki/Chomsky_hierarchy)

~~~
groovy2shoes
This is all true, for sure, but I'd like to point out that PEGs are a
formalism that isn't much harder to learn than regular expressions, and
totally appropriate for parsing a fairly large class of languages. I've also
seen PEG-based (packrat) parser generators for just about every programming
language (some better than others, of course). Again, these tend to be not
much harder to learn than regular expressions, and because of the "ordered
choice" operator, many programmers seem more comfortable reasoning about PEGs
than a Yacc specification, for example.
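
The ordered-choice operator is easy to demonstrate in a few lines (a sketch of the idea, not any particular PEG library): alternatives are tried in order and the first success wins, so there is never ambiguity to resolve.

```javascript
// PEG-style "ordered choice": try alternatives in order; first match wins.
const lit = (s) => (input, pos) =>
  input.startsWith(s, pos) ? { value: s, pos: pos + s.length } : null;

const choice = (...alts) => (input, pos) => {
  for (const p of alts) {
    const r = p(input, pos);
    if (r !== null) return r; // earlier alternatives take priority
  }
  return null;
};

// "<=" must be listed before "<", otherwise "<=" could never match.
const op = choice(lit("<="), lit("<"));
console.log(op("<=x", 0)); // { value: '<=', pos: 2 }
console.log(op("<x", 0));  // { value: '<', pos: 1 }
```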

PEGs aren't the only formalism that attempts to make writing certain classes
of parsers easy. You can get a long way with parser combinators, and Matthew
Might's "Yacc is Dead" paper discusses another (extremely general) approach,
which I have unfortunately not seen many implementations of.

Anyway, enough of an addendum! Keep on spreading the Good Word ;)

~~~
mikekchar
Thanks for addending :-) I wasn't aware of PEGs. That definitely does look
promising for a lot of cases.

~~~
corysama
I recommend checking out
[https://en.m.wikipedia.org/wiki/OMeta](https://en.m.wikipedia.org/wiki/OMeta)
and [https://github.com/cdglabs/ohm](https://github.com/cdglabs/ohm)

~~~
joshmarinacci
This. OMeta and Ohm are awesome.

------
tumurinn
Just wondering if it's true that most modern language parsers are written in C?

~~~
rtpg
I don't think this is true. Loads of FP languages are self-hosted.

~~~
sjrd
Not only FP. Lots of languages are self-hosted. So actually most modern
parsers are _not_ written in C, since they are written in the language they
parse.

