
Show HN: How to write a tiny compiler - viebel
http://blog.klipse.tech/javascript/2017/02/08/tiny-compiler-intro.html?tiny
======
akkartik
Semi-tangential rant: I went to grad school for compilers, and yet I've been
struggling for several months to get started building something real-world. I
can build interpreters and parsers in my sleep at this point, but the crucial
next step nobody talks about is having a deep knowledge and understanding of
the target language that your compiler will be emitting. For a conventional
native ahead-of-time compiler that requires knowledge of x86, the ELF format,
OS loading and more. I suppose this is why toolkits like LLVM are so popular
:/

~~~
chubot
Have you looked at the Go toolchain at all? I don't know much about it, but
they implemented their back ends in a manner that is much smaller than LLVM
and GCC. And it sounds like the architecture of the compiler is significantly
different.

I watched this talk and it was pretty interesting:

[https://www.youtube.com/watch?v=KINIAgRpkDA](https://www.youtube.com/watch?v=KINIAgRpkDA)

[https://talks.golang.org/2016/asm.slide#1](https://talks.golang.org/2016/asm.slide#1)

If I am understanding right, the claim is that they have a single assembly
language for all Go architectures, based on something Ken Thompson wrote in
the 90's for some National Instruments (?) chip. He actually says that all
assembly language looks the same now.

I was sort of surprised by that claim. I would like to hear a critique of this
from other compiler writers. Does the fact that Go uses a single assembly
language make the code slower? I imagine taking advantage of target-specific
knowledge is useful, but I don't know how much.

LLVM has this pretty elaborate TableGen system to express target-specific
knowledge.

He also makes the claim that they will be able to auto-generate a new backend
from the PDF description of the architecture. And he says somewhere that they
reduce a lot of the work to simple "text processing" (symbol manipulation) and
didn't require opening up any processor manuals.

I can see how arithmetic and bit shifting are the same among all architectures.
But I would think that even loads and stores have differences, at least if you
want to use the processor efficiently. But he says everything kinda looks the
same.

Also, another thought is that compiling to WebAssembly might be simpler than
native executables?

~~~
GFK_of_xmaspast
Looking at those slides and watching the first part of the talk, it sounds as
if he's re-invented LLVM IR, and poorly.

~~~
77pt77
Why LLVM specifically?

There are many intermediate representations.

~~~
GFK_of_xmaspast
First one I thought of and also you can write it directly.

------
eliben
Kudos for writing a compiler-construction walk-through without running out of
steam after the parsing part :-) It's a shame this is where most tutorials
end, since codegen is often much more interesting.

~~~
viebel
So you made the whole journey until the last station without taking any
breaks. Right?

~~~
eliben
Nope, I was just curious so I looked at the blog archive and saw you actually
published them all

~~~
viebel
Ah I see! So what station are you at for the moment?

------
Msurrow
This is great, thank you! I've been looking for something like this: a simple
small project that goes through all the involved phases and
explains/demonstrates the concepts, that I can mess around with on the
weekends. It shouldn't have to have real-life applications, be serious or
academically correct -- it should just be fun.

I once found a blog post with an example of defining a grammar for a small
language and then building an interpreter (in Python, I think) for it. It was
really great, but I lost the link and can't seem to find it again.

The same goes for OSes - a simple but "end to end" walkthrough would be
awesome.

Anyway, cheers!

~~~
jonjacky
> I once found a blog post ... but I lost the link ...

Maybe this one? Python: Writing a Compiler and Interpreter in 160 lines of
code

[http://www.jroller.com/languages/entry/python_writing_a_comp...](http://www.jroller.com/languages/entry/python_writing_a_compiler_and)

~~~
Msurrow
Not the one, it was in multiple parts. BUT thanks - I'll read that one as well
:-)

------
camus2
Mandatory link to lisperator:

[http://lisperator.net/pltut/](http://lisperator.net/pltut/)

------
monokrome
Hopefully it isn't intentional to suggest that you need to be a genius "guy"
to do this. To be fair, this is about where I closed the article so maybe I
missed the point.

~~~
OJFord
The actual sentence is:

> _You probably have to be one of those genius guys ..._

Used in the plural, 'guys' usually just means 'people'/'folks' - "alright
guys, shall we get going?" - Mac Dictionary agrees.

Happy people don't go around looking to be insulted.

------
vog
I like the idea!

However, I feel the source and target language are too similar. Won't this
discourage people? The result code looks like it could have been done in a
simpler way, without all the compiler voodoo, by just moving around some
parentheses.
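
For readers weighing that objection, here is a minimal sketch of the
tokenize / parse / emit pipeline. It is in Python rather than the article's
JavaScript, and it assumes the source language is Lisp-style calls such as
(add 1 (subtract 4 2)) and the target is C-style calls. The function names and
the AST shape are assumptions for illustration, not the article's code. The
output does look like rearranged parentheses, but once an AST exists, only
emit() has to change to target something less similar.

```python
import re


def tokenize(source):
    # Split the input into parentheses and atoms (numbers or names).
    return re.findall(r"[()]|[^\s()]+", source)


def parse(tokens, pos=0):
    """Parse one expression starting at tokens[pos].

    Returns (ast_node, position_after_the_expression).
    Unbalanced input is not handled gracefully in this sketch.
    """
    token = tokens[pos]
    if token == "(":
        name = tokens[pos + 1]
        pos += 2
        args = []
        while tokens[pos] != ")":
            node, pos = parse(tokens, pos)
            args.append(node)
        return {"type": "call", "name": name, "args": args}, pos + 1
    if token.isdigit():
        return {"type": "number", "value": token}, pos + 1
    raise SyntaxError(f"unexpected token {token!r}")


def emit(node):
    # Code generation: walk the AST and print C-style call syntax.
    if node["type"] == "number":
        return node["value"]
    args = ", ".join(emit(arg) for arg in node["args"])
    return f"{node['name']}({args})"


def compile_expr(source):
    ast, _ = parse(tokenize(source))
    return emit(ast)


print(compile_expr("(add 1 (subtract 4 2))"))
```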

Also, the implementation seems to make quite heavy use of external JS
libraries. Maybe it is just me, but I'd like to add one NoScript exception for
your site, and be done with it, rather than adding exceptions for so many
different CDNs.

~~~
viebel
I see. My blog is powered by the klipse plugin (that I have developed):
[https://github.com/viebel/klipse](https://github.com/viebel/klipse)

The klipse plugin indeed loads a couple of external scripts - mostly from my
github pages domain: viebel.github.io

------
popey456963
Is my browser messing something up or does the small div at the end seem to be
in a really weird font that seems to be way too heavy (on font-weight)? When I
read the page I get something resembling:

[https://puu.sh/tVQAL/ec7bd4a05c.png](https://puu.sh/tVQAL/ec7bd4a05c.png)

Apart from that however, I'm thoroughly enjoying this tutorial! It's
remarkably similar to a tokenizer & parser I built when I was experimenting
with a language that had no syntax, but was simply an AST that could be easily
represented as Haskell/Python in the IDE of your choice. Really love how
simplistic your code is though, mine just ended up being a mess of callbacks!

~~~
viebel
Oh. This font is so ugly. What browser are you on?

------
no_protocol
> a string of code and break it down into an array

break -> breaks

> There a three kinds

a -> are

> starts with a " and end with a "

end -> ends

> couple of tokenizer for a

tokenizer -> tokenizers

> a generic function that tokenize a single

tokenize -> tokenizes

> by means or regular expressions

or -> of
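
The quoted sentences describe a regex-driven tokenizer: a generic function
that tokenizes a single token, wrapped in a loop that breaks a string of code
down into an array. A rough Python sketch of that idea follows. The token
kinds, patterns, and function names are assumptions for illustration and are
not taken from the article's JavaScript.

```python
import re

# Hypothetical token kinds and patterns, chosen only for this sketch.
TOKEN_PATTERNS = [
    ("paren", re.compile(r"[()]")),
    ("number", re.compile(r"\d+")),
    ("string", re.compile(r'"[^"]*"')),  # starts with a " and ends with a "
    ("name", re.compile(r"[A-Za-z_]\w*")),
]
WHITESPACE = re.compile(r"\s+")


def tokenize_one(kind, pattern, source, pos):
    """Generic helper: try to match a single token of `kind` at `pos`.

    Returns (token, new_pos) on success, or (None, pos) if nothing matched.
    """
    match = pattern.match(source, pos)
    if match is None:
        return None, pos
    return {"type": kind, "value": match.group(0)}, match.end()


def tokenize(source):
    """Break a string of code down into a list of tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        ws = WHITESPACE.match(source, pos)
        if ws:
            pos = ws.end()  # skip whitespace between tokens
            continue
        for kind, pattern in TOKEN_PATTERNS:
            token, new_pos = tokenize_one(kind, pattern, source, pos)
            if token is not None:
                tokens.append(token)
                pos = new_pos
                break
        else:
            # No pattern matched at this position.
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
    return tokens


for token in tokenize('(concat "foo" 42)'):
    print(token)
```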

~~~
mrspeaker
Did you just tokenize that blog post?

~~~
viebel
What do you mean?

------
AstroJetson
Look at Small C from the 70's. You could change the back end to target a
different computer.
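
A rough sketch of what a swappable back end means in practice: the front end
lowers an expression tree to target-independent operations, and each back end
turns those operations into text for its own machine. This is not Small C's
actual design. Both targets, their mnemonics, and all names below are
hypothetical and chosen only for illustration.

```python
def lower(node):
    """Front end: flatten an expression tree into target-independent stack ops.

    A node is either an int or a tuple (op, left, right).
    """
    if isinstance(node, int):
        return [("push", node)]
    op, left, right = node
    return lower(left) + lower(right) + [(op, None)]


class StackMachineBackend:
    """Hypothetical stack-machine target."""

    def emit(self, ops):
        return [f"PUSH {arg}" if op == "push" else op.upper() for op, arg in ops]


class RegisterMachineBackend:
    """Hypothetical target with registers r0, r1, ... and two-operand ops."""

    def emit(self, ops):
        lines = []
        depth = 0  # number of live values, each held in its own register
        for op, arg in ops:
            if op == "push":
                lines.append(f"mov r{depth}, {arg}")
                depth += 1
            else:
                depth -= 1
                # Combine the top two values; the result stays in the lower register.
                lines.append(f"{op} r{depth - 1}, r{depth}")
        return lines


def compile_expr(tree, backend):
    # The same front end feeds whichever back end is passed in.
    return backend.emit(lower(tree))


tree = ("add", 1, ("mul", 2, 3))
print(compile_expr(tree, StackMachineBackend()))
print(compile_expr(tree, RegisterMachineBackend()))
```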

