The syntax is of course very limited with a one-line lexer, so more sophistication here would expand the lexer somewhat, though if it remains centered around regexes (which it probably will), it can stay pretty small.
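For a sense of scale, here's a minimal sketch of what such a one-line regex lexer can look like; the token classes (numbers, identifiers, single-char operators) are assumptions for illustration:

```javascript
// One-line lexer: a single global regex splits the source into tokens.
// Unmatched non-space characters fall through via \S so nothing is lost.
const lex = (src) => src.match(/\d+|[A-Za-z_]\w*|[-+*/()]|\S/g) || [];

console.log(lex("add(2, 3*x)"));
// → ["add", "(", "2", ",", "3", "*", "x", ")"]
```

Everything past this point (keywords, string literals, comments) means growing the regex, but the driver stays one line.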
When expanding the language, the tree-building code can probably stay pretty small as well, if the syntax of your new constructs is chosen wisely. (I foresee control structures as prefix operators, which will be interesting to say the least.)
The evaluator and transpiler would grow significantly more complicated when adding control structures to the language: I think the most code-expanding task would be handling code blocks that aren't directly evaluated. In the case of conditionals, you can probably get away with the ?: operator, especially since JS also has the comma operator. But in the case of loops, you need a completely different structure in the output; whether that stays small would depend on the implementation.
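The ?: trick keeps conditionals expression-shaped, so the emitter needs no special block handling. A hedged sketch (the AST shape {type, cond, then, else} is an assumption for illustration):

```javascript
// Transpile a conditional to JS's ?: operator: both branches stay
// expressions, so no statement-level code block is ever needed.
const emit = (node) => {
  switch (node.type) {
    case "num":  return String(node.value);
    case "name": return node.name;
    case "if":   // branches remain unevaluated until the host JS runs
      return `(${emit(node.cond)} ? ${emit(node.then)} : ${emit(node.else)})`;
  }
};

console.log(emit({
  type: "if",
  cond: { type: "name", name: "x" },
  then: { type: "num", value: 1 },
  else: { type: "num", value: 2 },
}));
// → "(x ? 1 : 2)"
```

Loops don't compose this way, which is exactly where the output structure has to change.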
Of course, adding control structures is only useful if there's some way of defining names, be it variables (producing an imperative language) or recursion-capable functions (producing a functional language). It might be an interesting exercise to see how far you can push this language while still keeping the compiler under, say, 100 lines. :P
Protip if you do that: ditch the evaluator. A compiler doesn't need an interpreter to compile. ;)
As Java had just been released by the time my class got to do the same assignment, we ended up using Java with JavaCC and JJTree.
While not as easy as Lisp, ML or Prolog, it was still much easier than the Jurassic yacc/lex that still prevails in some circles.
Reminds me of this quote (Greenspun's tenth rule): "Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."
Everyone who is into compilers and parsing should try recursive descent at least once.
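The appeal is how directly the grammar maps to code: one function per rule. A minimal sketch in JS, assuming the usual expression grammar (expr := term ('+' term)*, term := factor ('*' factor)*, factor := NUMBER | '(' expr ')'):

```javascript
// Recursive descent: each grammar rule becomes a function that consumes
// tokens and returns a value (here we evaluate as we parse).
function parse(src) {
  const toks = src.match(/\d+|[+*()]/g) || [];
  let pos = 0;
  const peek = () => toks[pos];
  const next = () => toks[pos++];

  function expr() {                               // expr := term ('+' term)*
    let v = term();
    while (peek() === "+") { next(); v += term(); }
    return v;
  }
  function term() {                               // term := factor ('*' factor)*
    let v = factor();
    while (peek() === "*") { next(); v *= factor(); }
    return v;
  }
  function factor() {                             // factor := NUMBER | '(' expr ')'
    if (peek() === "(") { next(); const v = expr(); next(); /* skip ')' */ return v; }
    return Number(next());
  }
  return expr();
}

console.log(parse("2+3*(4+1)")); // → 17
```

Operator precedence falls out of the call structure for free, which is the "aha" moment most people get from doing it once.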
From my experience, most people don't use parser generator tools simply because they don't know such tools exist.
The first one isn't such a big deal for most programming languages; the second one is. Throw in better error handling and tricks like sub-file incremental and/or heuristic compilation (for IDEs), and the generators don't make much sense anymore.
I don't think you can make such a sweeping generalisation. Both JavaCC and Antlr have saved me a lot of time when doing DSLs in the past and both are well documented and easy to use. It would have been much more work for me to write the parsers by hand in Java. Both could also generate my own hand-rolled AST easily and so I do not see why integration would be a problem for most projects. I agree that generators do not scale to large compilers or projects with special needs, but that is not most projects.
Not so much when prototyping, when it isn't even clear if the language is going to get any users beyond the design team.
Interestingly, most mature industrial languages (e.g. Java, GCC, clang) use hand-coded recursive descent parsers. And of course such parsers can be implemented very easily from first principles using parser combinators in FP.
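To illustrate the combinator point: a parser is just a function from input to a result plus remaining input, and a few higher-order functions compose them. A hedged sketch in JS (the names lit/seq/alt are assumptions, not from any particular library):

```javascript
// Parser combinators: a parser maps a string to {value, rest} or null.
const lit = (s) => (inp) =>
  inp.startsWith(s) ? { value: s, rest: inp.slice(s.length) } : null;

// seq: run p, then q on what's left; fail if either fails.
const seq = (p, q) => (inp) => {
  const a = p(inp);
  if (!a) return null;
  const b = q(a.rest);
  return b && { value: [a.value, b.value], rest: b.rest };
};

// alt: try p, fall back to q.
const alt = (p, q) => (inp) => p(inp) || q(inp);

const ab = seq(lit("a"), alt(lit("b"), lit("c")));
console.log(ab("ac")); // → { value: ["a", "c"], rest: "" }
```

The resulting parser reads almost like the grammar itself, which is why this style is so popular in FP circles.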
I've used both JavaCC and Antlr in the past for small DSLs; both are very good tools when working with Java.
Pattern matching and FP/LP-based algorithms are quite convenient compared to doing it in an imperative style.
Hand coded parsers win on the error reporting front.
Compilers often use graphs as intermediate representation, and functional languages often are not a good fit for dealing with them.
That said, functional languages can and do work with graphs when needed, but yes it is much more awkward than trees.
Why is that?
A similar project: https://github.com/thejameskyle/the-super-tiny-compiler
Every programmer should implement a simple Lisp from scratch at least once, just to understand compilers and interpreters aren’t magic.
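And it really does fit in a screenful. A hedged sketch of such an evaluator in JS, working over already-parsed nested arrays (the supported forms — if, lambda, application — are my choice for illustration):

```javascript
// A tiny Lisp evaluator: numbers are literals, strings are variable
// lookups, arrays are special forms or function application.
function evalL(x, env) {
  if (typeof x === "number") return x;        // literal
  if (typeof x === "string") return env[x];   // variable lookup
  const [op, ...args] = x;
  if (op === "if")                            // (if cond then else)
    return evalL(evalL(args[0], env) ? args[1] : args[2], env);
  if (op === "lambda")                        // (lambda (params) body)
    return (...vals) => {
      const scope = Object.create(env);       // lexical scope via prototype chain
      args[0].forEach((p, i) => (scope[p] = vals[i]));
      return evalL(args[1], scope);
    };
  const f = evalL(op, env);                   // function application
  return f(...args.map((a) => evalL(a, env)));
}

const env = { "+": (a, b) => a + b, "*": (a, b) => a * b };
console.log(evalL(["*", ["+", 1, 2], 4], env)); // → 12
```

Once you've written this, "eval is just a recursive function over the AST" stops being a slogan and becomes obvious.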
It's called "How to implement a programming language"; it's in JS and it's made by the guy behind uglifyJS.
But we're just arguing about semantics here :) My comment was more directed towards the interpreter part of things.
I think you have it backwards. Every transpiler is a compiler, but the reverse is not necessarily true.
It is not black and white; compiling to IL or bytecode qualifies as compiling. Or perhaps "compiling" means compilation to a binary executable format?
It is all a matter of how the CPU executes it: direct mapping of machine code onto gates, or a micro-coded translation layer.
It would help squelch the confusion for future forum bike shedding.
Someone seems to try and point this out every time it comes up, as if they're winning some sort of pedantry points, but I don't see how they could be. Please illuminate me so I understand the next time someone says this, and possibly join la resistencia.