That will get you started ridiculously quickly and give you things like GC and JIT for free. You can always implement it outside of Racket once it starts to take further shape
This archives the state of the system when we made the switch:
Some things, like the grammar (https://github.com/brownplt/pyret-lang/blob/5f22ec7c8affde15...) have survived largely intact from that prototype for years.
http://wiki.c2.com/?ProgrammingLanguagesAnInterpreterBasedAp... was very fun and surveys a wider variety of languages, though it's also very dated now.
The second and third editions feel less "hard core" somehow.
The book doesn't seem that great for self-study. Would you confirm or reject that assessment?
I've been hesitant at starting these books because I was thinking too that without the exercises, it would be a lot less useful, and so it would be a bigger time/effort commitment than I want to get into.
EoPL took about as much work as a good college class, minus the sitting-in-lectures-and-tests part. Some parts aren't depended on later, like the OO chapter IIRC, so they're up to whether you're interested.
1. Avoid long discussions on the theory of parsing, how to transform regular expressions into table-driven DFAs, how to build an LR(1) parser, etc. These topics can be interesting later on, but for the benefit of a person who just wants to learn to write a language, the author should instead focus on how to write the scanner and the parser by hand using predictive recursive descent. The value of this approach is three-fold: (1) the book is shorter, (2) it's a simple approach that works with any language without specialized tools (useful if someone wants to write such a tool for vim or Emacs, say), (3) it feels more concrete, more "real", to write a parser by hand rather than create a few grammar rules and let an external tool generate the code that does the parsing.
2. "Breadth-first" rather than "depth-first". In a typical compiler textbook, each chapter exhausts almost all that there is to say about a topic before moving on. I think a more practical book will avoid such deep discussions and try to go as quickly as possible from source language to executable program. After the initial implementation is complete, the author can go back and add more details. Bob's approach of having two interpreters, one AST-walker and one that builds bytecode, fits that idea quite well.
3. Multiple implementation languages. The implementation of a language is largely guided by its host language: if using ML or Haskell, then sum types will be quite handy; in Java we can rely on the library's excellent collections. It's cool that Bob decided to write an interpreter in Java and another one in C; the reader will get to see how some decisions in the former (e.g., using exceptions) are "translated" in the latter.
4. A full implementation in the book. Many compiler textbooks use pseudo-code and steer clear of "pedestrian" topics like good error handling. By having a full implementation, Bob ensures that the dirty little details are also addressed rather than swept under the rug and left for readers to discover and struggle with.
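To make point 1 concrete: a hand-written scanner plus a predictive recursive-descent parser for arithmetic fits in well under a page. Here's a toy sketch in Python (the token set, grammar, and class names are mine, purely for illustration):

```python
import re

# Grammar (one level of precedence per function):
#   expr   -> term (('+'|'-') term)*
#   term   -> factor (('*'|'/') factor)*
#   factor -> NUMBER | '(' expr ')'
TOKEN = re.compile(r"\s*(?:(\d+)|(.))")

def tokenize(src):
    tokens = []
    for num, op in TOKEN.findall(src):
        tokens.append(("NUM", int(num)) if num else ("OP", op))
    tokens.append(("EOF", None))
    return tokens

class Parser:
    def __init__(self, src):
        self.tokens = tokenize(src)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def next(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expr(self):
        # Left-associative: loop instead of recursing on the left.
        value = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            _, op = self.next()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            _, op = self.next()
            rhs = self.factor()
            value = value * rhs if op == "*" else value // rhs  # integer division
        return value

    def factor(self):
        kind, val = self.next()
        if kind == "NUM":
            return val
        if (kind, val) == ("OP", "("):
            value = self.expr()
            assert self.next() == ("OP", ")"), "expected ')'"
            return value
        raise SyntaxError(f"unexpected token {val!r}")

print(Parser("2 * (3 + 4)").expr())  # 14
```

This one evaluates as it parses; building an AST instead is a mechanical change (return tuples instead of numbers).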
I really look forward to the day when I can order a copy of Crafting Interpreters; I have no doubt that it will be a terrific book.
I'm looking forward to Crafting Interpreters too.
The second part would be choosing a platform to run on (so you only have to write a compiler frontend). There are many great technologies to host a language, but I'd advocate two specific ones here:
* https://github.com/zetavm/zetavm (shoutout to @love2code)
* LLVM, which is used in many low-level / near-metal languages like C, C++, Rust, and so on.
Neither has to be your final target; your compiler frontend can be moved to target another platform like the JVM later.
As a last part, implement your compiler frontend. This involves lexing/parsing source code, type checking if needed and generating your target's intermediate representation.
I personally prefer hand-written recursive descent parsers or a parser combinator framework (like https://github.com/Geal/nom) over parser generators.
For further processing like type checking, there is the common monolithic approach and the nanopass framework/approach: https://www.youtube.com/watch?v=Os7FE3J-U5Q
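For anyone unfamiliar with it, the nanopass idea is roughly: instead of one monolithic pass, the compiler is a long pipeline of tiny passes, each doing one well-defined transformation over a small tree language. A toy illustration in Python (this is not the actual nanopass framework; the pass names and tuple-based AST are invented for the example):

```python
def desugar_neg(node):
    """Pass 1: rewrite ('neg', e) into ('-', 0, e)."""
    if isinstance(node, tuple):
        if node[0] == "neg":
            return ("-", 0, desugar_neg(node[1]))
        return (node[0],) + tuple(desugar_neg(c) for c in node[1:])
    return node

def fold_constants(node):
    """Pass 2: evaluate operators whose operands are all literals."""
    if isinstance(node, tuple):
        op, *args = node
        args = [fold_constants(a) for a in args]
        if all(isinstance(a, int) for a in args):
            return {"+": lambda a, b: a + b,
                    "-": lambda a, b: a - b,
                    "*": lambda a, b: a * b}[op](*args)
        return (op, *args)
    return node

# The "compiler" is just a list of passes applied in order.
PASSES = [desugar_neg, fold_constants]

def compile_expr(ast):
    for p in PASSES:
        ast = p(ast)
    return ast

print(compile_expr(("+", ("neg", 3), ("*", 2, 5))))  # 7
```

The payoff is that each pass is small enough to test and reason about in isolation, and the intermediate language between passes is explicit.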
Also there's a great series on compiler frontends by Alex Aiken: https://www.youtube.com/watch?v=sm0QQO-WZlM&list=PLFB9EC7B8F...
Hope this gives an overview. :)
>using a parser combinator framework (like https://github.com/Geal/nom) over parser generators.
Handwritten parsers are most commonly used because most tools have trouble making things like proper error reporting good enough to be worthwhile.
Parser combinators are really just a fancy way of composing a parser from higher order functions instead of writing out a function body. If well done, they basically provide a DSL to build the parsers, and the way this works by composing small fragments is pretty much what we do with recursive descent anyway, so it's a good fit.
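For illustration, here's a minimal sketch of the combinator style in Python (nom itself is a Rust library; the `char`/`digits`/`seq`/`alt` helpers here are invented for the example). A parser is a function from (string, position) to a (value, new position) pair, or None on failure:

```python
def char(c):
    """Parser matching a single literal character."""
    def parse(s, i):
        if i < len(s) and s[i] == c:
            return c, i + 1
        return None
    return parse

def digits(s, i):
    """Parser matching one or more digits, yielding an int."""
    j = i
    while j < len(s) and s[j].isdigit():
        j += 1
    return (int(s[i:j]), j) if j > i else None

def seq(*parsers):
    """Run parsers one after another; fail if any fails."""
    def parse(s, i):
        values = []
        for p in parsers:
            r = p(s, i)
            if r is None:
                return None
            v, i = r
            values.append(v)
        return values, i
    return parse

def alt(*parsers):
    """Try alternatives in order; first success wins."""
    def parse(s, i):
        for p in parsers:
            r = p(s, i)
            if r is not None:
                return r
        return None
    return parse

# A sum of two numbers, or a lone number.
pair = seq(digits, char("+"), digits)
expr = alt(pair, digits)

print(expr("1+2", 0))  # ([1, '+', 2], 3)
print(expr("7", 0))    # (7, 1)
```

The composed `expr` reads almost like the grammar rule it implements, which is exactly the appeal.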
There are lots of small, comprehensible compilers and interpreters there, and there is an interpreter and runtime construction kit, look for "S9 core" in http://t3x.org/s9fes/. You can also order books explaining all the stuff. And yes, I'm the author! :)
The framework has since evolved into ohm, but I think ometa-js is more fun for playing around with.
Here's what some programming luminaries had to say (lifted from the website):
“The book I want to read.” — Matz, creator of the Ruby language
“I really love this book.” — Jeremy Ashkenas, creator of the CoffeeScript language
While I'm here, does anyone know of any practical resources on how to implement LLVM languages with GC? The documentation leaves something to be desired.
Once you've gone through it, you'll probably want to use the magic tools as they're well tested at this point, and make your life easier. Still, understanding what they're doing for you is a massive boon.
I should note that it uses Go, but not in an idiomatic way: strings instead of an error type, etc. This is ostensibly to make it easier to follow along in another language, but I still wasn't a fan of that aspect.
I'm also super curious about your "not in an idiomatic way" comment. If you have a minute, feel free to send me a longer version - me at thorstenball.com. I'd really love to hear what you'd do differently and what could be more idiomatic.
> I'm happy to hear that you appreciate the book exactly in the way it was intended to - little magic, lots of code and unit tests.
Yeah, that's exactly what I wanted out of it, and my coworker recommended it at the best possible time on top of that.
It is built in Java and uses a recursive descent parser to parse sentences like "draw a red circle at 300, 300", and is well commented because the intention was for advanced students at my education program to be able to easily extend and hack on it.
A useful question: you want to talk to a computer conversationally, and get beyond the Alexa/Siri level of tasks. How would you do that? Fully general natural language requires strong AI. Could you come up with some more restricted form with enough expressive power but understandable by a computer? The computer might have to rephrase your questions and ask "did you mean...?" Come up with a way to converge when the understanding is poor, and you'll have something.
http://www.craftinginterpreters.com/ is coming along really nicely.
Reading academic papers can give you some excellent ideas once you get into the weeds (e.g. garbage collection, compiler optimization).
Otherwise, get your hands dirty with a parser generator (PEG parser generators tend to be fairly forgiving). It is pretty easy to get started making an interpreter that way, and it is quick to prototype with.
Also, if you're really only interested in the language then you should think about targeting LLVM IR or the JVM.
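One nice property of LLVM IR as a target is that it's plain text, so a minimal frontend can simply emit strings and feed the result to clang or llc. A hypothetical sketch (the emitter function is made up for this example; the IR itself is standard):

```python
def emit_add_function(name="add"):
    """Emit LLVM IR text for a function computing a + b over 32-bit ints."""
    return "\n".join([
        f"define i32 @{name}(i32 %a, i32 %b) {{",
        "  %sum = add i32 %a, %b",
        "  ret i32 %sum",
        "}",
    ])

ir = emit_add_function()
print(ir)
# Save as module.ll, then compile with e.g.:  clang module.ll -c -o module.o
```

A real frontend would of course use the LLVM C/C++ API or a binding rather than string concatenation, but emitting the textual form is a perfectly good way to bootstrap and debug.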
It's really simple, so I was able to build it with CoffeeScript in just 4 hours while I was a student.
Once it becomes a little more mature, ZetaVM may also be a good target.
Edit: It looks like Snabel is a Forth-inspired concatenative language written in C++ with some Perl-like features. That's pretty cool. If you ever get the chance, I think I'd enjoy it if you made some video tutorials explaining the design and some of the code choices.
I've never written a standard Forth program in my life though, never installed any other implementation. The second I was introduced to Forth, it clicked; Lisp took me much, much longer to get by comparison.
Like I said, I'm not very much into rebuilding what has already been built better by someone else. Anyone who's written any amount of code is bound to have their own ideas, and the nice thing about Forth as a substrate is that it doesn't come with many ideas of its own.
I've written a few blog posts explaining design choices (https://github.com/andreas-gone-wild/blog/blob/master/forthy...), there's more interesting stuff to come now that the pieces are falling into place; I've only been working on Snabel for a couple of months.
We over-complicate things, for different reasons. The truth is that nothing is very complicated once you understand it well enough to break it down to its core.
For threading, this is probably a good start: https://www.complang.tuwien.ac.at/forth/gforth/Docs-html/Thr...
EDIT: I don't know how well you know Forth, but it's probably a waste of time trying to implement it before you grok it from userland, if that's where you are. Understanding first the concatenative paradigm, and then (more importantly) the deep metaprogramming, runtime-is-a-fundamental-part-of-your-application paradigm (which Forth shares with Lisp and Smalltalk), is super important.
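For readers who've never touched Forth, a toy evaluator makes the concatenative idea concrete: tokens are either literals pushed onto a stack, or words applied to it. This sketch is nothing like a real Forth (no dictionary access from the program, no compile-time words; the built-ins are my own picks):

```python
def forth_eval(source, stack=None):
    """Evaluate a whitespace-separated string of integers and words."""
    stack = [] if stack is None else stack
    words = {
        "+":    lambda s: s.append(s.pop() + s.pop()),
        "*":    lambda s: s.append(s.pop() * s.pop()),
        "dup":  lambda s: s.append(s[-1]),
        "swap": lambda s: s.extend([s.pop(), s.pop()]),
    }
    for tok in source.split():
        if tok in words:
            words[tok](stack)   # a word consumes/produces stack items
        else:
            stack.append(int(tok))  # anything else is a literal
    return stack

print(forth_eval("2 3 + dup *"))  # [25]
```

Notice there's no grammar to speak of: the "parser" is `str.split`. That flatness is a big part of why Forths are so easy to implement and so open to metaprogramming.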
Don't start off with YACC/Bison as they hide a lot of stuff under the hood. It's cool learning things from scratch. The most commonly-suggested book on compilers is known as the Dragon Book and if you want to take this endeavour seriously, you should really get a copy.
Building the language itself is the easy part...
This book will stop you from making easy mistakes in the design of whatever DSL or language you're trying to create. Also see the commentary he keeps on it, which changes often: http://www.cs.cmu.edu/~rwh/pfpl/commentary.pdf
If you are building an interpreter too this was a great inspiration: http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInst...
Here's the language I'm building: https://github.com/kevwu/kythera
Not free, but his screencasts are excellent, so worth a look if you see a topic that interests you.
you can continue with the links from there.
do you want to target something in particular?
implementing a lisp (scheme) or a forth is a good starting point.
If you aren't yet committed to any language, you can start building a parser with PyParsing. It's really easy.
If you want to take a quick (albeit expensive) class on it, Dave Beazley offers one:
Really, if you're coming to programming language design with the thought "I'm going to make an imperative, object oriented language" (with parallelism as an afterthought), you're doing it wrong. The world has enough of those already and you're going to invent something worse than what's already there.
Probably, instead of inventing a new language (say, Matlab for matrix operations or Prolog for logical reasoning), you'd be better off implementing a library that handles the same concepts and embeds into another language (which is really what happened with TensorFlow or MapReduce, to think of two examples).
(Grune's book "Parsing Techniques" is a great reference on parsing crap, but the secret is that if you design your grammar to be LL(1) you can parse your language using recursive descent: you only need a fancy parser if you designed a more complicated grammar (why'd you do that?))
Recommended book: The Reasoned Schemer. It's a cute (maybe too cute) book that shows how to implement a logic programming language (~datalog) using Scheme as a base language. The Wizard book (Structure and Interpretation of Computer Programs) also has really cool examples that I think 60% of programmers I've worked with in industry don't fully appreciate.
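The core trick in a Reasoned-Schemer-style logic language is unification: walking two terms under a substitution and extending the substitution until the terms match. The book builds this in Scheme; here's a compressed (and much simplified) sketch in Python:

```python
class Var:
    """A logic variable; identity is what matters, not a name."""
    def __repr__(self):
        return f"_{id(self) % 1000}"

def walk(term, subst):
    """Follow variable bindings until we hit a non-variable or an unbound var."""
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term

def unify(a, b, subst):
    """Return a substitution making a and b equal, or None on failure."""
    a, b = walk(a, subst), walk(b, subst)
    if isinstance(a, Var):
        return subst if a is b else {**subst, a: b}
    if isinstance(b, Var):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):       # unify element-wise, threading subst
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return subst if a == b else None

x, y = Var(), Var()
s = unify(("father", x, "mary"), ("father", "tom", y), {})
print(walk(x, s), walk(y, s))  # tom mary
```

A full miniKanren then layers goals, streams, and fair interleaving on top of this, but unification is the part that does the "magic".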
OP asks for resource to build a language. The Dragon book is a very good book to cover the whole process.
I agree! It's hard to criticize a book rightly regarded as a classic, but I think it solves the wrong problems, or at least emphasizes the wrong areas.
if you design your grammar to be LL(1) you can parse your language using recursive descent
Yep, that's the way to do it.