AFAICT, newcomers are much better off learning the art of simplicity w.r.t. compilers from niklaus wirth's book on compiler construction (the pdf can be downloaded from his homepage); it is an easy read, and pascal is also a worthwhile language to learn (IMHO; i sure did in the mid-90s and it was no waste of time). there is also much benefit in knowing the tools of the trade if you have to design something other than a simple language, e.g. knowing bison/yacc/cup/antlr/burg etc. is definitely a plus.
finally, the traditional top-down approach is also very useful if you're designing, say, a DSL yourself -- it can still be interpreted within the language of your choice (e.g. ruby, perl, and python all have nice parser generators for that task) -- no need to go straight to the "shores of hell" (again, so mid-90s ;), ahem, code generation...
I started learning to write compilers by getting Turbo Pascal and seeing the whole syntax expressed in BNF in a page or two of the manual, and the Pascal family (Modula, Oberon etc.) are all great starting points if you want to start writing a compiler top down.
Apart from Wirth's text, there's of course also the great series by Crenshaw.
The reason I chose to go in the opposite direction, though, was in large part because starting with the syntax tree etc. has been done to death, and a lot of people hit the wall long before getting something that generates a program that can run.
I also get the impression that code generation is seen as something of a dark art ("shores of hell"? It's not _that_ bad :) ), and that a lot of the time the choice of writing an interpreter instead of a compiler comes down to fear of the amount of work involved in generating code.
Witness how many interpreters don't even generate bytecode (Matz's Ruby interpreter, I'm looking at you...) and suffer tremendously for it.
It's not that hard. Generating really fast code is hard, but generating code that beats the pants off an interpreter for the same language is usually fairly easy. Though some languages make it trickier than others to get something efficient, in particular due to the level of dynamic features (Ruby, I'm looking at you).
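To give a feel for how little machinery naive code generation actually needs, here's a minimal sketch: it "compiles" nested-array expressions into stack-machine-style x86 assembly strings. The class name, the array-based expression format, and the AT&T-syntax opcodes are all illustrative assumptions, not code from the series.

```ruby
# Minimal sketch of naive code generation (hypothetical, not from the series):
# compiles expressions like [:add, 1, [:mul, 2, 3]] to a list of
# stack-machine style x86 instructions in AT&T syntax.
class NaiveCompiler
  def initialize
    @asm = []
  end

  def compile(exp)
    if exp.is_a?(Integer)
      @asm << "pushl $#{exp}"        # literal: just push it
    else
      op, left, right = exp
      compile(left)                  # leaves left operand on the stack
      compile(right)                 # leaves right operand on the stack
      @asm << "popl %ebx"            # right operand -> %ebx
      @asm << "popl %eax"            # left operand  -> %eax
      case op
      when :add then @asm << "addl %ebx, %eax"
      when :sub then @asm << "subl %ebx, %eax"
      when :mul then @asm << "imull %ebx, %eax"
      end
      @asm << "pushl %eax"           # result back on the stack
    end
    @asm
  end
end

puts NaiveCompiler.new.compile([:add, 1, [:mul, 2, 3]])
```

The generated code is wasteful (everything round-trips through the stack), but even this beats re-walking a tree for every evaluation, which is the point.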
Another reason was that I want to play with something close to Ruby, and the Ruby syntax is an absolute nightmare to parse when you're the kind of person who appreciates clean, compact syntax descriptions... Who am I kidding? Ruby is a nightmare to parse unless you're the kind of mental masochist who enjoys doing his tax calculations on paper in binary just for the hell of it.
I decided to chicken out and do the "easy bits" first, particularly since a lot of my motivation was to get a platform to experiment with efficient ways of generating code for a language with the level of dynamism of Ruby.
It's as much an experiment for me, since I've always built compilers top down as well. The reason I decided to start posting it was that I realized how quickly one can get to something that's actually semi-usable.
Anyway, a few more parts and I'll get to the parser. It will be based on a parser generator, though a more "Rubyish" one.
(And before anyone asks: No, don't expect a Ruby compiler from me, at least not in the next 5 years ;), I don't have the time, though I will explore topics related to it, such as a Ruby-like dynamic object model)
If there's any reason to write a compiler, that's a good one. The more examples, the better.
no, seriously -- your points totally make sense, i just wanted to provide additional material for the top-down approach. the "shores of hell" were also meant to be tied to the 90s analogy rather than to code generation directly. i agree that it's considered a black art, but the entrance barrier to understanding is usually pretty high (i took a course in advanced code generation at my university as part of a phd curriculum); i found that muchnick's 1997 book is a good reference and a rewarding experience -- notation aside...
one stat on interpreters vs. compilers: a paper called "the structure of efficient interpreters" (or at least some title close to it) has an interesting finding: the difference between a standard interpreter and an efficient/optimized interpreter is a factor of 100, whereas the difference between an optimized interpreter and a (AFAIR non-optimized) compiler is a factor of 10. hence the tradeoff varies.
finally i have to agree with you that the dynamism plays a crucial role for compilers -- but then again, it also makes optimization very hard in interpreters ;)
Also, bristling a little bit at the comment that you need parser generators for anything but "toy" languages; read Hanson's LCC book for a design, implementation, and justification of a hand-coded recursive descent parser. Without having written a hand-coded parser, you might also fall into the trap of thinking you need an "RD-parser" tool to build one.
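To back up the point that hand-coding is perfectly practical: a recursive descent parser for a small expression grammar fits in a screenful. This is an illustrative sketch (names and grammar are mine, not from LCC or the series), parsing into the kind of nested-array tree Ruby makes convenient:

```ruby
# Hand-coded recursive descent parser for the toy grammar
#   expr   ::= term   { "+" term }
#   term   ::= factor { "*" factor }
#   factor ::= number | "(" expr ")"
# One method per nonterminal -- the grammar maps directly onto the code.
class RDParser
  def initialize(src)
    @tokens = src.scan(/\d+|[+*()]/)
    @pos = 0
  end

  def peek
    @tokens[@pos]
  end

  def next_token
    tok = @tokens[@pos]
    @pos += 1
    tok
  end

  def parse
    tree = expr
    raise "trailing input: #{peek}" if peek
    tree
  end

  def expr
    node = term
    node = [:add, node, term] while peek == "+" && next_token
    node
  end

  def term
    node = factor
    node = [:mul, node, factor] while peek == "*" && next_token
    node
  end

  def factor
    if peek == "("
      next_token
      node = expr
      raise "expected )" unless next_token == ")"
      node
    else
      Integer(next_token)   # a number, or an error on bad input
    end
  end
end

p RDParser.new("1+2*(3+4)").parse   # => [:add, 1, [:mul, 2, [:add, 3, 4]]]
```

Operator precedence falls out of the grammar structure for free: `term` binds tighter than `expr`, so `*` nests below `+` without any precedence tables.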
Finally, this article covers code generation, something most people totally skip. "So mid-90's" --- yeah, well that attitude may explain why we're still evaluating ASTs and bytecode arrays in our most popular languages.
I loved this series.
Just to extend your point -- most production compilers (GCC and Clang, among others) use hand-written recursive-descent parsers instead of parser generators.
The link on the author's homepage seems to be broken.