
Writing a compiler in Ruby, bottom up - qhoxie
http://www.hokstad.com/compiler
======
apgwoz
I realize the author's intention and the fact that this was written in a
"bottom-up" style, but one has to be careful when using "bottom-up" in the
world of compilers. It can also refer to the type of algorithm used in
parsing. For instance, yacc generates parsers for LR grammars in a bottom-up
style, meaning it builds the parse tree upon discovery, not upon expectation
such as a top-down parser for an LL grammar.

------
sb
though this is a nice project in itself--and i don't want, and neither intend
to, to insult anybody, i think that this implementation is not actually well
suited for anybody new to compilers to understand what's going on.

AFAICT, newcomers are much better off learning the art of simplicity w.r.t.
compilers from niklaus wirth's book on compiler construction (the pdf can be
downloaded from his homepage); it is an easy read, and pascal is also a
worthwhile language to learn (IMHO, i sure did in the mid-90s and it was no
waste of time). there is also much benefit in knowing the tools of the trade
if you have to design something other than a simple language, i.e. knowing
bison/yacc/cup/antlr/burg etc. is definitiely a plus. finally, the traditional
top-down-approach is also very useful if you're designing, say a DSL, yourself
-- it can still be interpreted within the language of your choice (i.e. ruby,
perl, python have all nice parser generators for that task) -- no need to
directly get to "shores of hell" (again so mid-90s ;), ahem, code
generation...

~~~
vhokstad
Hi, I'm the one writing the series, and I halfway agree with you.

I started learning to write compilers by getting Turbo Pascal and seeing the
whole syntax expressed in BNF in a page or two of the manual, and the Pascal
family (Modula, Oberon etc.) are all great starting points if you want to
start writing a compiler top down.

Apart from Wirth's text, there's of course also the great series by Crenshaw.

The reason I chose to go in the opposite direction, though, was in large part
because starting with the syntax tree etc. has been done to death, and a lot
of people hit the wall long before getting something that generates a program
that can run.

I also get the impression that code generation is seen like something of a
dark art ("shores of hell"? It's not _that_ bad :) ), and that a lot of the
time the choice of writing an interpreter instead of a compiler comes down to
fear of the amount of work involved with generating code.

Witness how many interpreters don't even generate bytecode (Matz Ruby
interpreter, I'm looking at you...) and suffer tremendously for it.

It's not that hard. Generating really fast code is hard, but generating code
that beats the pants of an interpreter for the same language is usually fairly
easy. Though some languages make it tricker than others to get something
efficient, in particular due to the level of dynamic features (Ruby, I'm
looking at you).

Another reason was that I want to play with something close to Ruby, and the
Ruby syntax is an absolute _nightmare_ to parse when you're the kind of person
who appreciates clean, compact syntax descriptions... Who am I kidding? Ruby
is a nightmare to parse unless you're the kind of mental masochist who enjoys
doing his tax calculations on paper in binary just for the hell of it.

I decided to chicken out and do the "easy bits" first, particularly since a
lot of my motivation was to get a platform to experiment with efficient ways
of generating code for a language with the level of dynamism of Ruby.

It's as much an experiment for myself, since I've always built compilers top
down myself as well. The reason I decided to start posting it was because I
realized how quickly one can get to something that's actually semi-usable.

Anyway, a few more parts and I'll get to the parser. It will be based on a
parser generator, though a more "Rubyish" one.

(And before anyone asks: No, don't expect a Ruby compiler from me, at least
not in the next 5 years ;), I don't have the time, though I will explore
topics related to it, such as a Ruby-like dynamic object model)

~~~
sb
hi there, pascal can't be that bad after all ;)

no, seriously -- your points totally make sense, i just wanted to provide
additional material for the top-down approach. the "shores of hell" were also
more thought of being tied to the 90s analogy and not directly with code
generation. i agree that it's considered to be a black-art, but the entrance
barrier to understanding is usually pretty high (i have taken a course in
advanced code generation at my university as part of a phd curriculum); i
found that muchnicks 1997 book is a good reference and rewarding experience --
notation set aside...

one stats on interpreters/compilers: a paper called "the structure of
efficient interpreters" (or at least some title close to it) has an
interesting finding: the difference between a standard interpreter and an
efficient/optimized interpreter is a factor of 100, whereas the difference
between an optimized interpreter an a (AFAIR non-optimized) compiler is a
factor of 10. hence the tradeoff varies.

finally i have to agree with you that the dynamism plays a crucial role for
compilers--but then again, it is also very hard for optimizations in
interpreters ;)

~~~
parenthesis
"The Structure and Performance of Efficient Interpreters":

<http://news.ycombinator.com/item?id=266900>

------
snorkel
Interesting series. Here's a different approach to code parsing which some
people may find easier using Top Down Order Precedence:

<http://javascript.crockford.com/tdop/tdop.html>

The example explained on that page by Douglas Crockford (Yahoo's Javascript
guru) is a TDOP parser written in Javascript thats parses Javascript (it's a
concept demo, not a practical use case) What's nice about his example is he
covers lots of different features like scope, functions, and objects, which
are sometimes glossed over in other texts about how compilers work. This is
not a complete compiler because it only produces a syntax tree, but going from
syntax tree to interpreter/compiler is the easier part.

~~~
Psyonic
In my experience going from syntax tree to interpreter/compiler is definitely
the harder part. Parsing is a relatively straightforward task.

~~~
procrastitron
That is my experience as well, but keep in mind that the design of the source
language affects this. For instance, writing a parser for C++ would be a
nightmare full of bookkeeping, but a basic translation to machine code would
be relatively straight forward.

------
qhoxie
This is a really impressive exercise in both Ruby and compilers. He takes care
to give enough information to help those with limited C and assembler
knowledge.

