Hacker News new | past | comments | ask | show | jobs | submit login
How to Implement a Programming Language in JavaScript (lisperator.net)
99 points by cristiantincu on March 12, 2015 | hide | past | favorite | 34 comments

I've written multiple parsers/interpreters in both JS and Python -- Python being my favorite language. From that experience, I've come around to the fact that they're both the wrong language for writing languages -- lexers, parsers, interpreter loops, compilers.

I have a good analogy to explain this. Take this OCaml program.

    let sum_file filename =
      let file = In_channel.create filename in
      let numbers = List.map ~f:Int.of_string (In_channel.input_lines file) in
      let sum = List.fold ~init:0 ~f:(+) numbers in
      In_channel.close file;
This is roughly equivalent to:

    def sum_file(filename):
      with open(filename) as f:
        return sum([int(x) for x in f])
Now think of the degree to which OCaml is awkward here. That's exactly how awkward Python and JS are for writing languages :)

OCaml is based around ML-style typed records, which are exactly what you need for manipulating languages.

I am not a type safety guy, but when you write languages, there are tons of nested conditionals. In Python in JS or C, you end up with bugs in those corners. It is quite easy to crash any non-trivial parser, like the CPython parser, or Clang's parsers, etc. The type safety that ML offers helps a lot with this.

The code for writing parsers and interpreters is just SHORTER in OCaml than in Python. I used to think Python was the most concise language. But no, it depends on the problem domain. Try it and you'll see.

Someone saying the same thing here:


I 100% agree that JS and Python are the wrong language for writing languages. Haskell, however gets you the best of both worlds. Your sum_file function in Haskell would look like this:

    sumFile :: FilePath -> IO Int
    sumFile file = sum . map read . lines <$> readFile file
I don't think many people would dispute that Haskell is at least as good as OCaml for writing languages. And strangely enough, the most advanced Perl 6 implementation is even written in Haskell!

And strangely enough, the most advanced Perl 6 implementation is even written in Haskell!

That hasn't been true for a few years, but it was true for quite a while. At this point Pugs is fairly out of date and (I believe) abandoned, as it's main developer discontinued development. For a long while it was the most advanced Perl 6 compiler though, and from what I understand a lot of it's rapid advancement was attributed to it being written in Haskell (Perl 6 and Haskell share a good portion of advanced features, so that may have helped).

The current state of support of the various Perl 6 compilers can be seen here[1].

1: http://perl6.org/compilers/features

Ahh cool, thanks for the correction.

Right, I was referring to all ML-based languages -- so SML, OCaml, Haskell, F#, and possibly even Rust.

Do you know how this would look in Haskell?

    # Return K most common lines in a file

    def top_k(f, k):
      counts = collections.defaultdict(int)
      for line in f:
        counts[line] += 1

      return sorted(counts.items(), key=lambda x: x[1], reverse=True))[:k]
I was actually looking for the OCaml example which does this. I think it was in "Real World OCaml", and I remember it being horribly ugly compared to Python. I couldn't find it though.

I would do it something like this:

    top xs = map fst $ sortBy (compare `on` Down . snd) $
        M.toList $ foldr (\x -> M.insertWith (+) x 1) mempty xs
This has the type `top :: Ord a => [a] -> [a]`. You might object that I didn't include your k parameter. Because Haskell is a lazy language, I can get your behavior very simply by doing `take n . top` and it will lazily only calculate the first n values. This gives you more abstractive power and lets you write more composable code.

Yeah, so this sort of proves my point :)

I don't begrudge anyone if they want to use Haskell or OCaml for everything.

But personally I am interested in using ML-like languages for language engineering and not general purpose programming (maybe Rust could change that, but I'm not convinced). I program in 5+ different languages regularly so I've dealt with all the friction involved, mainly by explicitly designing systems as heterogeneous collections of Unix processes.

Hmmm, I'm not sure what you're saying. What point does it prove? I've been using Haskell for general purpose programming for five years now and I've found it to be exceptionally well-suited.

The ocaml version of this code would be actually quite similar, since you can even use imperative hash tables if you want. The syntax is a bit fugly and the stdlib doesn't have some of the helpers python has but I wouldn't go as far as say that its "horribly ugly". Beauty is in the eye of the beholder and Ocaml has the advantage of being much less "magical" than python (as per the "explicit is better than implicit" mantra)

* In ocaml you need to pass some "comparator" parameters around if you want top_k to be polymorphic. Its similar to haskell type classes but you need to be explicit... * Instantiating a polymorphic hash table is a bit ugly but thats more about the standard library... * There is no "default dict" function so you would need to initialize the counts yourself * Looping the file is not hard. Most data structures will have some sort of "fold" or "iter" function you can use. * The sorting function doesn't take keys, so you need to use a comparator instead. Its not that bad though - the function to convert a hash table into a list already returns a list of tuples.

Meh, I don't buy it. Can you show it?

It's conceptually similar, but just plain uglier in practice. A lot of day to day programming is just FILLED with stuff that is very elegant in Python, and not so much in OCaml.

Language engineering is something is very elegant in OCaml.

In my opinion, EVERY language is a DSL. I stopped looking for the ultimate language -- it doesn't exist.

Elegance is subjective... While its undeniable that you often need to type less characters in Python and list/dictionary comprehensions are wonderful syntactic sugar, its kind of nice that there is much less magic going on under the hood in Ocaml. In a highly dynamic language such as Python or Ruby you never know exactly what your code is doing: any method call can can be overloaded and even things like function calls and array indexing can be overloaded too. On the other hand, in Ocaml almost everything is statically dispatched and the static typing is really great (I'm sure Python wouldn't look as nice if you added the required modifications necessary to make it statically types as ocaml is)

While you might not see this dynamism as a big problem in Python, its something that I always found confusing about Ruby. Over there, people everything is a method call and people are super happy to monkeypatch extra methods to classes, which makes it very hard for me to understand the program by reading it. At some point, things get so dynamic that static analysis in your head gets too hard and you are better off just running the program and seeing if it worked...

In any case, I can't give you a concrete example right now because my laptop doesn't have ocaml installed on it but in my other post I already listed some of the bigger differences that the ocaml version has. The tldr is that some of the features you are using in Python aren't available in the ocaml stdlib (so you would need to write the sorting comparator the long way or create the helper function yourself) and that creating the polymorphic hash table is not very beginner friendly (this is a cost we pay for stronger typing). What I wanted to say is that the big things about the Python function (hash tables and generic sorting) are both things you can do in Ocaml just fine.

> Do you know how this would look in Haskell?

I'm curious, how did you think it would look in haskell? Can you give a code example of what was in your head?

I don't know Haskell, so there wasn't anything in my head. I basically know the ML subset of OCaml and I guess Haskell, and that doesn't really apply at all to the top K problem.

I was saying that OCaml was awkward for this problem, and the reply was that OCaml is but Haskell isn't. I'm not convinced after seeing the Haskell code. Python is both semantically and syntactically a better fit for that problem.

> I don't know Haskell, so there wasn't anything in my head.

Then what were you impling by saying:

> Do you know how this would look in Haskell?

Because that sentence seems to imply it would be very convoluted in Haskell. How can you imply it would look convoluted in Haskell if as you said:

> I don't know Haskell, so there wasn't anything in my head.

Anyway, onwards...

> I'm not convinced after seeing the Haskell code. Python is both semantically and syntactically a better fit for that problem.

You aren't convinced of what after seeing the Haskell code?

How can you come to the conclusion that Python is both semantically and syntactically a better fit for that problem if you don't know both Python and Haskell?

Can you give me the criteria you are using to come to that conclusion?

Sorry, but I can't agree with this.

Here is a (more or less) identical implementation in Rust:

    fn top_k(f: Lines, k: usize) -> Vec<(&str, usize)> {
        let mut counts = HashMap::new();
        for line in f {
            *counts.entry(line).get().unwrap_or_else( |v| v.insert(0) ) += 1;
        let mut counts = Vec::from_iter(counts);
        counts.sort_by( |x, y| y.1.cmp(&x.1) );
These are the things that make it more verbose than the Python that are not related to ML:

* Curly brace syntax. Not a property of ML languages--Rust and Swift are pretty much alone in the ML family in having curly brace syntax.

* Having to write & explicitly. Not a property of ML languages at all--it's a C++ism. In the type signature, the & is because Rust has first-class references--it deliberately does not abstract over them, unlike nearly every other ML (and most other languages, for that matter). The second case is a stylistic choice: Rust is (somewhat inconsistently) explicit about when you are taking a reference. There have been some proposals to remove this annotation burden.

* Rust doesn't have a find_or_insert method in the standard library (or a "HashMap with Default" a la defaultdict). Not a property of ML languages; in fact, Rust used to have this (see http://web.mit.edu/rust-lang_v0.11/doc/std/collections/hashm...). It was removed because the Entry API is seen as more general. It would be quite easy to add such methods or types back as sugar in a library (if someone hasn't already).

* By convention, Rust APIs that mutate things don't return the original. Not a property of ML languages, just a Rust thing (in Rust, you can get sugar for this a chain! macro, as demonstrated at http://www.reddit.com/r/rust/comments/2wssch/function_chaini...).

* Having to write out ".take(k).truncate" instead of just being able to use slice syntax like [:k] (in Rust, [..k]). This is not an ML thing; the reason you can't do this is because slices in Rust do not create copies, but references into the original vector. If Rust were garbage collected, this would be fine, but in Rust the counts vector would be deallocated at the end of the function, leaving the references dangling. We could clone it, but it wouldn't be any shorter and it would be wasteful compared to just calling truncate.

* Having to specify that the counts be represented as a `Vec` to turn the counts item list into a vector. This is not an ML thing; it is because in Rust, vectors are not privileged over other types of containers. Most MLs have plenty of syntactic sugar for lists; this is again something more C++ish.

* Using '::' instead of '.' as the path separator. Most MLs just use ., this is a C++ism. I believe it is related to Rust's desire to remain LL(1), but I can't remember the particular reason for this one.

* A big one that might seem like an MLism, but isn't: explicitly writing out the types. While MLs vary in how much type inference they actually support, most will infer them. Rust is somewhat unusual in requiring function-level type annotations. In particular, a language with global inference should be able to infer this signature for the function (given in Rust syntax):

    fn top_k<I: IntoIterator>(f: I, k: usize) -> Vec<(I::Item, usize)> where I::Item: Eq + Hash
As far as I can tell, these are the things that make it verbose than the Python that are related to ML:

* Having to write "let" and "let mut". I don't think there is any ML language that doesn't differentiate between variable creation and variable assignment. Similarly, I can't think of a ML-family language that doesn't differentiate between mutable and immutable bindings (however, there was a quite serious proposal by one of the core Rust developers to do this: http://smallcultfollowing.com/babysteps/blog/2014/05/13/focu..., and it could be made less boiler-platey for the common case, as Swift does, by having a separate var keyword, at the cost of losing power while pattern matching).

Thus, here is a hypothetical implementation in an MLish language that switches the non ML-ish Rust-isms to quite reasonable ML-ish alternatives (and has a garbage collector, presumably, but if you don't like that just pretend slice is a copying operation in this language):

    fn top_k(f, k):
        var counts = HashMap.new()
        for line in f:
            counts.find_or_insert(0) += 1

        from_iter(counts).sort_by( |x, y| y.1.cmp(x.1) )[..k]
I guess you could argue that all I've done here is cherry-picked the most Python-like features from all the ML languages (not the shortest, obviously, as the Haskell example demonstrates :)), but my point is that most of the syntactic concerns people associate with ML have little to do with ML itself, but unrelated decisions of the languages in question. Personally, I would love to use a language like the above for similar tasks to what I use Python for, and maybe someday someone will create one :) (perhaps someone already has!)

Sorry, line four of the hypothetical function should be counts.find_or_insert(line, 0) += 1. Would have edited, but HN won't let you edit after the procrastination settings kick in (but curiously, does let you reply).

Your comparison isn't really fair. With similar functions from Core's In_channel you can write something like:

  let sum_file filename =
    with_file filename 
      (fold_lines ~init:0 ~f:(fun a l -> a + int_of_string l))
"Using the right tool for the job" when it comes to functional vs. "mainstream" languages is a popular meme, but it doesn't hold up to scrutiny. You can always write your own higher-level code for a given domain with a reasonably expressive language. Adding a useful type system, ADTs, etc. to a language like Python is much, much harder (though that's exactly what Microsoft and Google are trying with Javascript).

FWIW I took it straight out of Real World OCaml, assuming that that's idiomatic OCaml.

See my other comment -- how does top K lines in OCaml look? I recall that was in the book too, but couldn't find it. I remember it being fantastically ugly.

Here's one way to write down the K lines example:

  let top_k chan k =
    let incr_count m l = 
      let n = try Map.find_exn m l with Not_found -> 0 in
      Map.add m ~key:l ~data:(n + 1)
    In_channel.input_lines chan
    |> List.fold ~init:String.Map.empty ~f:incr_count
    |> Map.to_alist
    |> List.sort ~cmp:(fun (_,a) (_,b) -> compare b a)
    |> List.sub ~pos:0 ~len:k
You could probably code-golph it down to a couple of lines but I find the 'pipe' operator leads to very readable code.

Idiomatic ocaml would be to use one of those "with" functions whenever possible. Keep in mind that RWO is an introductory book and that they have to show the basics before moving to the larger abstractions...

I had just stumbled on this C# 1.1 parser written in F# yesterday. I'm not sure if a syntax-tree and parser in 350 lines is decent, but I can at least see what you're talking about in regards to ML/OCaml being well-suited for the task.


While the result is probably correct, your argument is, odd. :)

Hammers in nails with an orange, and this folks, is how Black and Decker makes better drills than Makita.

The single reason that OCaml excels at compilers is due to pattern matching. It _removes_ all those explicit conditionals by making them implicit.


Python has excellent affordances for writing compilers, they just have to be used.



It's not just pattern matching -- it's also ADTs and the functional paradigm. (e.g. in Python, functools.partial is uglier than OCaml's application, just as working with dictionaries is uglier in OCaml than Python)

Do those Python libraries offer type safety? I don't think there is any way they can. If not, you might as well use OCaml.

As I said, Python is my favorite language, but there is no reason to write OCaml in Python or even shell scripts in Python. Why not use the real thing? The Python code still won't be shorter than OCaml, even if you use these fancy libraries.

I agree with you in principle. After all, "ML" got its name "metalanguage" from the fact that it was explicitly designed to be a domain-specific language for implementing programming languages. It's hard to be a DSL at its job.

At the same time, that has to be balanced with the ecosystem value of implementing your language in a language lots of others knows.

It's a worthwhile investment to implement your language with 4x the code in Python/Java/JS/etc. if you get more than 4x contributions because of it.

> It's a worthwhile investment to implement your language with 4x the code in Python/Java/JS/etc. if you get more than 4x contributions because of it.

Maybe, but I'd say it depends on the quality of the contributions.

That's definitely a consideration, depending on your goals. For many language projects, I don't think you need more than one person for the front end. For libraries and code gen, you definitely need contributions.

IMO it's actually advantageous to have a compact description of a lexer, parser, and AST that one person edits or very few people edit. That's your language design.

At one point I thought: "If ML is so great, then why don't you see it more often in the real world?" And then the revelation was that Python essentially uses ML to describes its AST:


This is Zephyr ASDL, an ML-inspired DSL for describing ASTs. A lot of ML people may recognize Andrew Appel from the authors list:


So my favorite language's AST is described with ML!!! And has been for 10+ years. (Python has a Grammar file as well in a different syntax).

I'm actually interested in a hybrid architecture: an interpreter where OCaml generates the byte code, and then C++ executes it. I think you can produce extremely compact and flexible interpreters with this architecture, and there are several other advantages to dividing it this way.

The downside is perhaps a more complex build process, but the OCaml toolchain is quite nice actually, and I have compiled it from scratch. It's head and shoulders above Haskell in that regard. OCaml can produce .o files and link with C/C++, so you still get one executable.

I looked at your Wren language which I like a lot. I did wish the front end was more high level and not in C, but that's just me :)

There are a lot of people coming around to OCaml. Facebook is using it for Flow and pfff language manipulation:

https://github.com/facebook/pfff (I suspect Google's code analysis tools would be cut down in size by 5x if written like this in OCaml.)

And Hack, the statically typed PHP, is written in OCaml. And there are several more minor languages like HaXe and another one that are OCaml.

So if you're looking for contributions from language experts, there's definitely a lot that are familiar with OCaml.

Another really good course to learn the basic about programming language design:


And if you want to actually build a "real" programming language, I advise you to try llvm - it really takes away the pain of generating bytecode, and gives you everything you need to deal with the actual design of your language.

Interesting read.

Possibly more digestable, I recommend the walkthrough of building "Egg" in the fantastic (and free!) Eloquent JavaScript by Marijn Haverbeke: http://eloquentjavascript.net/11_language.html

Given the domain name I was hoping they'd first implement a Lisp in 12 lines of Javascript, and then implement the new language in that lisp, that would've been an interesting tutorial.

I get that it's a beginners tutorial and they have some constraints, but it starts off with "let's dream up a language" and then presents a super standard language, it's basically just Javascript with obligatory semicolons..

... and then it gets continuations, and that's where it becomes interesting.

Any case, the intent was purely didactic, but I did implement a Scheme dialect based on half of that code. To be released some day...

I'm glad they didn't write a tutorial that made another Lisp - I'd your audience is javacript developers, a lisp isn't all that appealing usually.

Anyways, this is such a detailed series of tutorials I may be distracted for days bringing my half finished programming language back from the dead.

I do a lot of JavaScript and Lisp is extremely appealing to me. JavaScript is a functional language, but not a very good one. Lisps are much better at doing functional programming than JavaScript. So a Lisp that translates to JavaScript is right up my alley.

You might enjoy wisp: https://github.com/Gozala/wisp

I remember reading this the first time as a JS rookie. Every time the author mentions "this is same in JS" a light went on in my head and all the JS quirks found earlier became just natural. It still deserves its place in the Christmas tree bookmark section.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact