Hacker News new | past | comments | ask | show | jobs | submit login

There's no reason why implementing a language in itself would be special, it's not dealing with fringe situations in any sense of the word. It is traditional to write compilers in the very language they compile, except for slow interpreted languages, and a compiler is a very well-studied type of program that includes a variety of different types of code such as straight text processing, error reporting, and recursive data structures. Compilers are often used as test beds for new language features, famously, Niklaus Wirth would measure language improvements based on how much they would improve the language's compiler.

Think about it this way: a TypeScript compiler takes a bunch of text as input, and produces a bunch of text as output. That's not really special or weird, is it?

(The only hard part is the “bootstrapping problem” which is what happens when you want to write a compiler for language X using language X, but you can’t compile it because you don’t have a compiler yet.)

a TypeScript compiler takes a bunch of text as input, and produces a bunch of text as output. That's not really special or weird, is it?

In the case of a compiler used for production, it may have lots of optimizations, most of the time, this optimizations are non-idiomatic code that's hard to understand and its purpose its not obvious.

Can you give an example of this? Most of the compilers I've seen have excellent code bases, with good separation between the different components, probably because compilers are so well-studied. It can be surprising what the most productive part of the compiler is to optimize. Often the lexer is heavily optimized due to its outsized impact on performance, but even then, the code looks idiomatic to me.

- Scanner.Scan() https://github.com/golang/go/blob/master/src/go/scanner/scan...

- scan() https://github.com/Microsoft/TypeScript/blob/master/src/comp...

- Lexer::LexTokenInternal() https://github.com/llvm-mirror/clang/blob/master/lib/Lex/Lex...

All three examples are heavily optimized and hand-written but they’re still clear and easy to follow.

Interested in that bootstrapping process. Is it just that the first version has to be compiled with some other language. From then on, you just use the previous version that you have compiled already?

Yes, that is normal. If you can imagine, this process dates back to the first assemblers, which were written in assembly and then converted by hand to machine code.

From what I recall, start by writing a minimal compiler for language X in some other language, Y, and compile it. Now you have an X-compiler written in Y.

Rewrite the minimal compiler in X, and compile it using the one that was written in Y. That gives you an X-compiler written in X compiled from one written in Y.

To be sure _that one_ works, recompile your X source code with that last compiler. Now you have a compiler written in X and built with one also written in X.

Continue to add more features written in minimal-X to get X-compilers with more X features.

Just as small improvement, for the first step an interpreter is a better approach, just to ease the bootstraping process by not having to write a compiler backend twice.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact