It still amazes me how many programmers I meet think that a compiler is something they could never write. A production compiler might be too much work for just one person, but in its basic building blocks a compiler is actually relatively easy to write.
I say this having tried to write one just to overcome the mystery, even though I'll probably never need to write my own professionally.
You can find my attempt at:
Might I offer a suggestion? I would argue that most of the work that goes into a compiler is not the front-end functionality but rather the "middle-" and back-end functionality, i.e. making the generated code fast. It would be very insightful for those outside the compilers community to see what it's actually like to work on one by going over the basic optimizations: global data-flow analysis (reaching definitions, live variables, partial redundancy elimination, constant propagation, etc.), loop unrolling, and so on.
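To give a taste of what's meant here, a minimal sketch of constant propagation plus folding over a tiny straight-line IR. Real compilers do this as a global data-flow analysis over a control-flow graph; this toy only handles one basic block, and the IR format and opcode names ("add", "mul", "li") are my own illustrative inventions:

```python
def fold_constants(block):
    """block: list of (dest, op, arg1, arg2) tuples.
    Args are int literals or variable names; ops are "add"/"mul"."""
    consts = {}   # names currently known to hold a compile-time constant
    out = []
    for dest, op, a, b in block:
        a = consts.get(a, a)   # propagate known constants into operands
        b = consts.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            val = a + b if op == "add" else a * b
            consts[dest] = val                 # fold: dest is now a constant
            out.append((dest, "li", val, None))  # "li" = load-immediate
        else:
            consts.pop(dest, None)   # dest no longer a known constant
            out.append((dest, op, a, b))
    return out

program = [
    ("a", "add", 1, 2),      # both operands constant: foldable
    ("b", "mul", "a", 4),    # a is now known to be 3, so this folds too
    ("c", "add", "b", "n"),  # n is unknown at compile time, left alone
]
print(fold_constants(program))
# [('a', 'li', 3, None), ('b', 'li', 12, None), ('c', 'add', 12, 'n')]
```

The interesting part is how folding one instruction enables folding the next - that cascade is exactly what the data-flow framework formalizes across branches and loops.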
Personally, I think he's going about teaching it in a way that makes the things you mention above more complex than they perhaps need to be. But perhaps that's just how I was taught to write compilers... fundamentals are often the hardest to "un-learn".
There is not even one similar project for the back-end that I am aware of. And it's a shame, because I think the back-end is way cooler than the front-end.
Personally, I learnt how to write compilers using a mix of FLEX (the Fast Lexical Analyzer - not by Adobe!) and CUP. With those you write a LALR(1) grammar, which parses the input into a tree (if you choose) that you then traverse depth-first, generating code/assembly/whatever. I would say this is actually EASIER to write and to understand than doing it all from scratch - you concentrate purely on the grammar and the tree generation (which leads to code generation or interpretation - e.g. BASIC - code analysis, etc.).
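The whole pipeline fits in a page if you hand-roll a toy version of it. A sketch of the three stages - lex characters into symbols, apply a grammar to build a tree, walk the tree depth-first emitting code. (This uses recursive descent rather than the LALR(1) tables a generator like CUP would build for you, and all names are illustrative):

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")

def lex(src):
    """Turn characters into symbols: NUM tokens and single-char operators."""
    tokens = []
    for num, op in TOKEN_RE.findall(src):
        tokens.append(("NUM", int(num)) if num else ("OP", op))
    tokens.append(("EOF", None))
    return tokens

def parse(tokens):
    """Apply the grammar   expr -> term (('+'|'-') term)* ;  term -> NUM."""
    pos = [0]
    def peek(): return tokens[pos[0]]
    def eat():
        tok = tokens[pos[0]]; pos[0] += 1; return tok
    def term():
        kind, val = eat()
        assert kind == "NUM", "expected a number"
        return ("num", val)
    def expr():
        node = term()
        while peek() in (("OP", "+"), ("OP", "-")):
            _, op = eat()
            node = (op, node, term())   # builds a left-associative tree
        return node
    tree = expr()
    assert peek()[0] == "EOF", "trailing junk after expression"
    return tree

def emit(node, out):
    """Depth-first traversal generating stack-machine 'assembly'."""
    if node[0] == "num":
        out.append(f"PUSH {node[1]}")
    else:
        op, left, right = node
        emit(left, out)
        emit(right, out)
        out.append("ADD" if op == "+" else "SUB")
    return out

code = emit(parse(lex("1 + 2 - 3 + 40")), [])
print(code)
# ['PUSH 1', 'PUSH 2', 'ADD', 'PUSH 3', 'SUB', 'PUSH 40', 'ADD']
```

Swap the `emit` pass for an evaluator and you have an interpreter; the front two stages don't change - which is exactly why it pays to keep them separate.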
This MAY seem complex, as there is a bit of "black box" going on behind the scenes (how does it lexically turn characters into symbols? How does it apply a grammar to those symbols to eventually build a working product?), but once you understand the grammar language (lexical analysis is easy) you find that compilers aren't that difficult. It's a matter of turning code into symbols and then applying a grammar to them. From there, tree building, code generation, or interpretation is easy. In fact, I would put lexical analysis and grammar handling into a separate toolkit that a compiler developer _uses_ - though that may just be my experience: I welcome other compiler developers to interject.
Learning how to write a compiler definitely made me a better programmer - especially in terms of OO languages and understanding how languages are built (i.e. learning a new language isn't an issue when you know how its internals are likely to work). I highly recommend it for anyone looking to improve their skill set. Unfortunately, the university course I took that taught this (and which WAS available free on the internet) has now died. It's quite sad that it has disappeared, as it was hands down the most useful paper I ever took. With these skills I've since written compilers for companies converting legacy code to modern (and native, etc.) code - it's much more versatile than just generating assembly/machine code!
Anyway, I digress: I admire what they are trying to do, but I would recommend that others learn FLEX (or JFLEX, CSFLEX, etc.) and CUP (CSCUP, etc.) instead of doing all the heavy lifting themselves. If they want to write their own lexical analyzer or grammar parser, then that is a different journey...
Removing left-recursion from a grammar by hand! Woo!
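For anyone who hasn't had that particular pleasure: left recursion is fine for an LALR parser but fatal for recursive descent, so you rewrite it as right recursion with a helper nonterminal. The standard textbook transformation, on the classic expression-grammar example (illustrative, not from any particular course):

```
Left-recursive (fine for LALR(1), infinite loop for recursive descent):
    expr -> expr '+' term
          | term

Equivalent right-recursive form:
    expr  -> term expr'
    expr' -> '+' term expr'
           | ε
```

Same language, but the parse tree now leans the other way - which is why you then have to fix up associativity when you walk the tree. Doing this across a whole grammar by hand gets old fast.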
On a side note - The first thing I thought about when I saw the name was copper.