Cucu: a compiler you can understand (zserge.com)
67 points by LiveTheDream 1854 days ago | 11 comments

It is a very good idea to write a small compiler for teaching purposes. However, I think it would have been better to have clean code where the reader can understand what happens on each line. Here the heavy use of global variables makes the code harder to follow. Also, a lot of functions return 0 or 1 to signal success or failure, but their return values are always ignored at the call sites. And the way the *_expr functions work in part 2 is really, really weird and unintuitive (though this is due to the all-in-global-variables design). I could go on with other issues, but all this is to point out the big caveat: one should not take the code from the article and try to grow it into a bigger compiler, because doing so will lead to horrible spaghetti code. And that really is too bad, because extending such a small compiler would have been an even better exercise for beginners.
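
To make the critique above concrete, here is a minimal sketch of the alternative style, written in Python and not taken from the article's code. The `Parser` class, `ParseError`, and the toy grammar (integers, `+`, parentheses) are all assumptions made up for illustration. Parser state lives in an object instead of globals, and failures are raised as exceptions so a call site cannot silently ignore them.

```python
class ParseError(Exception):
    """Raised on bad input, so failures cannot be silently ignored."""


class Parser:
    """Hypothetical parser for a toy grammar: expr := primary ('+' primary)*"""

    def __init__(self, tokens):
        # All parser state is held here rather than in module-level globals.
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise ParseError(f"expected {token!r}, got {self.peek()!r}")
        self.pos += 1

    def primary(self):
        tok = self.peek()
        if tok == "(":
            self.expect("(")
            value = self.expr()
            self.expect(")")
            return value
        if isinstance(tok, int):
            self.pos += 1
            return tok
        raise ParseError(f"unexpected token {tok!r}")

    def expr(self):
        # Evaluates while parsing to keep the sketch short.
        value = self.primary()
        while self.peek() == "+":
            self.expect("+")
            value += self.primary()
        return value
```

With this shape, `Parser([1, "+", "(", 2, "+", 3, ")"]).expr()` evaluates the expression, and a truncated input such as `[1, "+"]` raises `ParseError` instead of returning a status code nobody checks.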

Very cool. I think more articles like this, about how a compiler is not magic, should be written.

It still amazes me how many programmers I meet think that a compiler is something that they could never write. I say a production compiler might be too much work for just one person, but in its basic building blocks it's actually relatively easy to write.

I say this after trying to write one just to overcome the mystery, even though I probably won't need to ever write my own professionally.

You can find my attempt at: http://www.codedemigod.com/jack-compiler/ https://github.com/alaasalman/jackcompiler

Doing something like this before tackling the Dragon book [1] is a great idea. Formal theory will make a lot more sense once you've run into some of the practical problems yourself.

[1] http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniqu...

Great work!

Might I offer a suggestion? I would argue that most of the work that goes into a compiler is not the front-end functionality but rather the "middle-" and back-end functionality, i.e. making the generated code fast. Going over the basic optimizations, such as global data-flow analysis (reaching definitions, live variables, partial redundancy elimination, constant propagation, etc.), loop unrolling, and so on, would give those outside the compilers community real insight into what it's actually like to work on one.
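
As a rough illustration of one of the optimizations named above, here is a minimal sketch of constant propagation with folding in Python. It is deliberately local: it handles a single straight-line basic block, not the global data-flow version. The tuple-based three-address IR `(dest, op, arg1, arg2)` and the function name are assumptions invented for this example.

```python
import operator

# Hypothetical IR: (dest, op, arg1, arg2); args are ints or variable names.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def propagate_constants(block):
    """Constant propagation and folding over one straight-line basic block."""
    known = {}  # variable name -> constant value known at this point
    out = []
    for dest, op, a, b in block:
        # Substitute variables whose value is a known constant.
        if isinstance(a, str):
            a = known.get(a, a)
        if isinstance(b, str):
            b = known.get(b, b)
        if op == "copy":
            if isinstance(a, int):
                known[dest] = a
            else:
                known.pop(dest, None)
            out.append((dest, "copy", a, None))
        elif isinstance(a, int) and isinstance(b, int):
            # Both operands are constants: fold at compile time.
            value = OPS[op](a, b)
            known[dest] = value
            out.append((dest, "copy", value, None))
        else:
            # dest now holds a value we cannot know statically.
            known.pop(dest, None)
            out.append((dest, op, a, b))
    return out
```

Given `x = 2; y = x * 3; z = y + n`, the second instruction becomes a plain copy of 6 and the third reads the constant 6 directly. Doing the same across branches and loops is where the real data-flow machinery comes in.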

I'd tend to disagree: I think fundamental language design is very important. Optimizations after the language tree has been built are an entirely different concept. Important nonetheless, but perhaps different from what the author is trying to achieve.

Personally, I think he is going about teaching it in a non-trivial manner making such things as you mentioned above more complex than they perhaps need to be. But, perhaps that's just how I've been taught how to write compilers... fundamentals often are the hardest to "un-learn".

It seems to me that there are a lot of efforts around making the front-end easier to understand. I think there needs to be at least one project that touches upon the back-end in a way that doesn't trivialize it.

There is not even one similar project for the back-end that I am aware of. And it's a shame, because I think the back-end is way cooler than the front-end.

Interesting concept, though I'm not sure quite what they're trying to achieve. Are they trying to teach lexical analysis? Their `typename` method seems awfully verbose for my liking - in my opinion, definitely harder than the LALR(1) syntax that CUP promotes.

Personally, I learnt how to write compilers using a mix of FLEX (the Fast Lexical Analyzer - not by Adobe!) and CUP. From that you create a LALR(1) grammar which then compiles down to a tree (if you choose) which you traverse depth first generating code/assembly/whatever. I would say that this is actually EASIER to write and to understand than trying to do it from scratch - you concentrate purely on the grammar and the tree generation (which leads to code generation or interpretation - e.g. BASIC, code analysis etc).

This MAY seem complex, as there is a bit of "black box" going on behind the scenes (how does it lexically turn characters into symbols? How does it turn those symbols into a grammar which eventually builds a working product?). However, once you understand the grammar language (lexical analysis is easy), you find that compilers aren't that difficult. It's a matter of turning code into symbols which you then apply a grammar to. From there, the rest (tree/code generation/interpretation) is easy. I would in fact put lexical analysis and grammar generation into a separate toolkit which a compiler developer _uses_, though that may just be my experience: I welcome other compiler developers to interject.
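
The "traverse the tree depth first, generating code" step described above can be sketched in a few lines of Python. This is only an illustration: the `Num` and `BinOp` node classes and the stack-machine mnemonics are made up, and the tree is built by hand here where a generated parser would normally construct it.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: object
    right: object


# Hypothetical stack-machine instruction names.
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL"}


def gen(node, out):
    """Post-order (depth-first) walk: emit both operands, then the operator."""
    if isinstance(node, Num):
        out.append(f"PUSH {node.value}")
    else:
        gen(node.left, out)
        gen(node.right, out)
        out.append(MNEMONIC[node.op])
    return out


# Tree for 1 + 2 * 3, as a parser would build it from the grammar.
tree = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))
code = gen(tree, [])
# code == ["PUSH 1", "PUSH 2", "PUSH 3", "MUL", "ADD"]
```

Swapping `gen` for a function that evaluates the nodes instead of emitting instructions turns the same tree into an interpreter, which is the versatility the comment is getting at.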

Learning how to write a compiler definitely made me a better programmer - especially in terms of OO languages and also understanding how languages are built (i.e. learning new languages isn't an issue when you know how the internals are likely to work). I highly recommend it for anyone looking to improve their skill set. Unfortunately the course I took at university (which WAS available free on the internet) teaching this concept has now died. It's quite sad that it has disappeared, as that paper was hands down the most useful paper I ever took. I've now written compilers for companies converting legacy code to modern code (and native etc) from these skills - it's much more versatile than just generating assembly/machine code!

Anyway, I digress: I admire what they are trying to do, however I would recommend that others learn FLEX (or JFLEX, CSFLEX etc) and CUP (CSCUP etc) instead of trying to do all the heavy lifting themselves. If they want to write a lexical analyzer or a grammar parser then that is a different journey...

This is a really great article and really similar to the compilers class I got to take in college.

Removing left-recursion from a grammar by hand! Woo!
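
For anyone who hasn't done it by hand: the textbook rule rewrites `A -> A α | β` as `A -> β A'` and `A' -> α A' | ε`. Below is a minimal Python sketch of that rewrite for immediate left recursion only. The function name and the list-of-symbols grammar representation are assumptions for illustration, and it assumes the grammar has no degenerate `A -> A` production.

```python
def remove_immediate_left_recursion(nonterminal, productions):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon.

    productions is a list of right-hand sides, each a list of symbols.
    An empty list stands for epsilon.
    """
    # Split into left-recursive alternatives (keeping only alpha) and the rest.
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nonterminal]
    betas = [rhs for rhs in productions if not rhs or rhs[0] != nonterminal]
    if not alphas:
        return {nonterminal: productions}  # nothing to rewrite
    if not betas:
        raise ValueError("every alternative is left-recursive")
    tail = nonterminal + "'"  # the new helper nonterminal A'
    return {
        nonterminal: [beta + [tail] for beta in betas],
        tail: [alpha + [tail] for alpha in alphas] + [[]],
    }
```

For example, `E -> E + T | T` comes out as `E -> T E'` with `E' -> + T E' | ε`, which a recursive-descent parser can handle without looping forever.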

I like it, though I can only understand part of it. It's really nice to see tutorials about compilers.

The site seems to be down.

The site was down for me also.

On a side note - The first thing I thought about when I saw the name was copper.
