For example, a sequence like this:
1. variable assignment
2. commands of one parameter
3. if / branch statements
4. much later: nested expressions, function calls, multiple stacks, environments
Basically: start with an abstraction of 3-parameter opcodes, then slowly build features up the abstraction layers, and slowly build down the optimization layers, step by step.
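To make the first two stages concrete, here is a minimal sketch in Python (the opcode set, the tuple encoding, and names like run are all invented for illustration):

# Minimal sketch of the early stages: variable assignment plus 3-operand opcodes.
# The encoding and names here are made up; this is just to show the shape of it.

OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def run(program):
    env = {}  # variable environment
    for op, dst, a, b in program:
        if op == "set":       # variable assignment: dst := a
            env[dst] = a
        else:                 # 3-operand form: dst := a OP b
            env[dst] = OPS[op](env[a], env[b])
    return env

# x = 2; y = 3; z = x * y + x
print(run([
    ("set", "x", 2, None),
    ("set", "y", 3, None),
    ("mul", "t0", "x", "y"),
    ("add", "z", "t0", "x"),
]))

From there, "if / branch statements" would just be a jump opcode and an instruction pointer instead of the linear loop.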
I will put it on GitHub if I ever get around to writing one myself.
I am thinking of something targeting / including a toy version of QEMU.
eval(X,X) :- number(X).
eval(X+Y,Z) :- eval(X,X1), eval(Y,Y1), Z is X1+Y1.
eval(X*Y,Z) :- eval(X,X1), eval(Y,Y1), Z is X1*Y1.
main :- write("Enter an arithmetic expression followed by a dot \".\" and a newline\n"),
repl :- read(Input),
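If you save this as eval.pl, something like swipl -g main eval.pl should load it and start the loop (-g runs a goal after loading; gprolog users would consult the file and call main from the toplevel instead).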
And when I look at the credits for lex and yacc today, I see some very capable programmers. Why do these programs exist? I may be somewhat biased, as I use them regularly for small tasks, but my question is an honest one.
Also, someone else mentioned swi-prolog. I think gprolog is just as easy and is noticeably faster. Curious what HN readers think are the advantages of swi-prolog.
Tools like ANTLR, SableCC, JavaCC, and MPS, and approaches like attribute grammars and parsing expression grammars, are much better suited to modern needs.
I don't know gprolog, but SWI has GUI tooling and compilation, and I was already using it in 1997, so it is a very sound toolchain.
 - http://www.antlr.org/
 - https://github.com/SableCC/sablecc
 - https://javacc.java.net/doc/JJTree.html
 - https://wiki.haskell.org/Attribute_grammar
 - https://en.wikipedia.org/wiki/Parsing_expression_grammar
 - https://www.jetbrains.com/mps
Give me the 70's tools any day.
I just listed a few examples.
I'd at least say that it is using the lex/yacc approach, which I actually like.
The short answer might be that LL is simpler, and that handwritten recursive descent can be faster than tooled LALR parsers (especially if lexing and later stages are integrated directly into the parser). LR, on the other hand, is more expressive and can directly handle more grammars, left-recursive rules in particular.
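To illustrate the LL side, here is a toy recursive-descent expression parser in Python (the tokenizer and grammar are deliberately minimal); note how the left-recursive rule expr -> expr '+' term has to be rewritten as a loop, which is exactly the kind of grammar an LR parser accepts directly:

import re

# Recursive descent for:  expr -> term ('+' term)* ; term -> NUMBER ('*' NUMBER)*
# Left recursion is rewritten as iteration; an LR parser would take it as-is.

def tokenize(s):
    return re.findall(r"\d+|[+*]", s)

def parse_expr(toks, i=0):
    val, i = parse_term(toks, i)
    while i < len(toks) and toks[i] == "+":
        rhs, i = parse_term(toks, i + 1)
        val += rhs
    return val, i

def parse_term(toks, i):
    val = int(toks[i]); i += 1
    while i < len(toks) and toks[i] == "*":
        val *= int(toks[i + 1]); i += 2
    return val, i

print(parse_expr(tokenize("2+3*4")))   # (14, 5): value and tokens consumed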
That's not to say that I don't agree with the general argument that there's no sense in shunning parser generators all the time. I've made good use of yacc (and ocamlyacc, and menhir, and Lua LPEG, etc.) in the past, and the parser for our commercial compiler at work is written with ANTLR to generate an LL(*) parser in C++. Such metaprogramming tools let us work at a higher level of abstraction, and we get all the usual benefits of working at a higher level.
I think that, in most* cases, there's really no good reason not to use a parser generator (or, alternatively, parser combinators). I don't even suspect that you learn anything in particular that you wouldn't learn by learning how to formalize your grammar and specify semantic actions for a parser generator (unless perhaps you aren't familiar with mutual recursion, that is...). However, there are some good reasons not to use a generator, and, as with anything, the trade-offs need to be carefully considered before deciding.
This all reminds me -- I've been meaning to play with Treetop for Ruby lately :)
There's also something to be said for not having a dependency on an extra tool, especially since such tools tend to be complicated and difficult to debug or extend. As such, when you encounter a language construct that's not easily definable for the generator, you may have an easier time resorting to some hacks to get what you want rather than extending the tool.
It's not any more complicated with a parser generator (if it's a PEG-based one). In fact, it's much easier: you avoid a lot of boilerplate that way.
> There's also something to be said for not having a dependency on an extra tool
Do not have a dependency on an extra tool. Use a meta-language, in which a parser generator can be embedded as a first class language feature.
True. PEGs are effing great!
> Do not have a dependency on an extra tool. Use a meta-language, in which a parser generator can be embedded as a first class language feature.
This is ideal, but unfortunately not always possible. For example, if you're stuck writing a language processor in C, you don't get the necessary tools for linguistic abstraction without an external dependency. In general, though, I agree with the sentiment.
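For languages that do support that style, parser combinators give a flavor of it: the "generator" is just ordinary code in the host language, no external tool involved. A toy Python sketch (lit/seq/alt are made-up names, and ordered choice is the only backtracking here):

# Toy parser combinators: each parser is a function text -> (value, rest) or None.

def lit(s):
    def p(text):
        return (s, text[len(s):]) if text.startswith(s) else None
    return p

def seq(*parsers):
    def p(text):
        values = []
        for q in parsers:
            r = q(text)
            if r is None:
                return None
            v, text = r
            values.append(v)
        return values, text
    return p

def alt(*parsers):
    def p(text):
        for q in parsers:
            r = q(text)
            if r is not None:
                return r
        return None
    return p

greeting = seq(alt(lit("hi"), lit("hello")), lit(" "), lit("world"))
print(greeting("hello world"))   # (['hello', ' ', 'world'], '')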
I have always really enjoyed just reading the source code for programming languages. As I learn more and more, I seem to take away a bit more each time.
Personally I've enjoyed reading through the source code for Go, since it is hand-written in Go. Being hand-written, it can be a little repetitive to read through, but I find it pretty easy to read and understand.
Also, I have read (at least parts of) the book "Engineering a Compiler"; being a novice in the subject, some of it goes over my head, but I think it does a better job of outlining the topic than any other book I have read.
I am also currently developing a "real" language that is based on the same lexing and parsing machinery:
For a simple language, a compiler is nothing more than the chain "parse -> strip off syntax sugar -> resolve lexical scope -> propagate types -> flatten the code -> emit target code", with each step being independent and trivial: nothing more than a set of tree rewrite rules.
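As an illustration of one such step, here is what a "strip off syntax sugar" pass might look like as a tree rewrite in Python (the tuple AST shape and the += rule are invented for the example):

# One desugaring pass as a bottom-up tree rewrite over tuple-shaped AST nodes.
# Rule: ("+=", x, e)  =>  ("=", x, ("+", x, e)).  Node shapes are made up.

def desugar(node):
    if not isinstance(node, tuple):
        return node
    node = tuple(desugar(child) for child in node)  # rewrite children first
    if node[0] == "+=":
        _, target, value = node
        return ("=", target, ("+", target, value))
    return node

print(desugar(("block", ("+=", "x", 1), ("+=", "y", ("*", "x", 2)))))

Each pass is just a function from tree to tree like this one, which is what keeps the steps independent.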
There are implementations of Ruby, JS, R, Python, etc., using it.
I have a blog about the Ruby effort: http://chrisseaton.com/rubytruffle/
(Though my changes are ugly).
If/When you get to implementing functions things will get really useful :)
If you look in the main file, huo.c, you will see something called store_defs. That function takes the ASTs of defined functions and stores them in a key-value store. If someone invokes a user-defined function, I just grab the AST, replace the variables with the values they passed in, and execute it. That code is in process_defs and is invoked by execute.
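That technique (store the AST, substitute arguments, evaluate) is easy to sketch. The following Python is not huo's actual code (which is C); the node shapes and names are made up to show the idea:

# Sketch of interpretation by substitution: function ASTs live in a dict,
# a call substitutes argument values into the body, then evaluates it.

defs = {}  # name -> (params, body AST)

def store_def(name, params, body):
    defs[name] = (params, body)

def substitute(node, bindings):
    if isinstance(node, str) and node in bindings:
        return bindings[node]
    if isinstance(node, tuple):
        return tuple(substitute(c, bindings) for c in node)
    return node

def execute(node):
    if isinstance(node, tuple):
        op, *args = node
        if op == "+":
            return execute(args[0]) + execute(args[1])
        if op == "call":
            params, body = defs[args[0]]
            bindings = dict(zip(params, [execute(a) for a in args[1:]]))
            return execute(substitute(body, bindings))
    return node  # numbers evaluate to themselves

store_def("double", ["n"], ("+", "n", "n"))
print(execute(("call", "double", ("+", 1, 2))))   # 6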
You may want to take a look at Bison/Flex. While they don't suit everyone's tastes, it's good to know how to work with them. For example, the Bash parser is written using those tools.