That's one problem with these multipart blog posts. It becomes ambiguous whether the submission is "part 1" or the head of the list ("part 1" "part 2" "part 3").
If you want to make a single long version of the page that includes all three parts, I'd be happy to arrange a repost. It would be best to email hn@ycombinator.com about it.
I'm sorry to be negative here, but this article, like most articles about compilers, is bike shedding.
For those not aware of the bike shedding metaphor, it's the assertion that when discussing the design of a nuclear power plant, everyone will want to discuss the color of the shed where the workers store their bikes because they understand it. Meanwhile, nobody will want to discuss the nuclear reactor itself, because that's complicated and they don't really understand what's going on with it.
In compilers, parsing is the bike shed, and code emitters are the nuclear reactor. I've read probably hundreds of articles on "compilers" at this point, and they're all actually just about parsers. I can't point to a single one that actually emitted working assembly.
Ah, okay. It looks like you're taking a fairly C-like language and transforming it to C--I'd call this a transpiler rather than a compiler. You're not wrong to call it a compiler, but it's a pretty noncentral example of a compiler.
Congrats on at least having an emitter, but I'm still searching for an article that shows how to emit assembly of any kind.
I used to work on a static analyzer that did taint analysis, model checking, buffer bounds checking, and so on -- a bit like a compiler backend on steroids. If there's a specific topic you'd like an explanation on, I could be convinced to write something up.
My favorite was always context-sensitive, interprocedural points-to analysis. And dataflow analysis in the presence of higher-order controlflow constructs.
Honestly, I'd just like to see anything that outputs assembly I can run through an assembler to create an executable. I've fumblingly written a bit of this code myself, but always felt hampered by a lack of knowledge.
Well, what you just posted already highlights one difficulty: when you come to the PRINT statement, you have to emit the string and the instructions to two different places, so we're already talking about having two different emitters, or some other way of handling this. And we need to generate different non-conflicting names/addresses depending on architecture.
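One way to picture the two-places problem is a minimal Python sketch in which the emitter keeps separate buffers for string data and for instructions, and a counter hands out non-conflicting labels. Everything here is an assumption for illustration: the `Emitter` class, its method names, and the x86-64 Linux target (GNU assembler, Intel syntax, calling libc's `puts`) are not from the article, and the assembly text has not been checked against any particular toolchain.

```python
class Emitter:
    """Hypothetical emitter with one buffer per output section."""

    def __init__(self):
        self.data = []      # string constants, destined for .data
        self.text = []      # instructions, destined for .text
        self.next_id = 0    # counter used to build unique labels

    def new_label(self, prefix):
        label = f"{prefix}{self.next_id}"
        self.next_id += 1
        return label

    def emit_print(self, message):
        # Assumes message has no quotes or backslashes (no escaping is done).
        label = self.new_label("str")
        self.data.append(f'{label}: .asciz "{message}"')
        self.text.append(f"    lea rdi, [rip + {label}]")
        self.text.append("    call puts@PLT")
        return label

    def render(self):
        # The two buffers are only stitched together at the very end.
        lines = [".intel_syntax noprefix", ".data"]
        lines += self.data
        # push rbp is assumed here to keep the stack 16-byte aligned for calls.
        lines += [".text", ".globl main", "main:", "    push rbp"]
        lines += self.text
        lines += ["    xor eax, eax", "    pop rbp", "    ret"]
        return "\n".join(lines) + "\n"


emitter = Emitter()
emitter.emit_print("HELLO")
emitter.emit_print("WORLD")
print(emitter.render())
```

The point of the design is that each PRINT appends to both buffers as it is encountered, and the sections are only laid out once the whole program has been seen.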
You said in your other post that this can be done with minor modifications, but I can already foresee a few modifications that would need to be made which aren't minor.
And then there's the problem that you may want to target more than one architecture. We can write two completely different code generators, but it would be nice if there were an architecture that could share some of the code.
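A common way to share code between targets is to have the front end produce target-neutral operations and keep only a thin lowering step per architecture. The sketch below assumes a made-up three-operation IR and hypothetical function names; the instruction strings assume small constants and are illustrative, not checked against a real assembler.

```python
# Hypothetical target-neutral operations produced by a shared front end:
#   ("load", n)  put constant n in the result register
#   ("add", n)   add constant n to the result register
#   ("ret",)     return to the caller

def lower_x86_64(op):
    kind = op[0]
    if kind == "load":
        return f"    mov eax, {op[1]}"
    if kind == "add":
        return f"    add eax, {op[1]}"
    if kind == "ret":
        return "    ret"
    raise ValueError(f"unknown op: {op!r}")


def lower_aarch64(op):
    kind = op[0]
    if kind == "load":
        return f"    mov w0, #{op[1]}"
    if kind == "add":
        return f"    add w0, w0, #{op[1]}"
    if kind == "ret":
        return "    ret"
    raise ValueError(f"unknown op: {op!r}")


BACKENDS = {"x86_64": lower_x86_64, "aarch64": lower_aarch64}


def generate(ops, target):
    # Everything before this call (lexing, parsing, building ops) is shared.
    lower = BACKENDS[target]
    return [lower(op) for op in ops]


program = [("load", 40), ("add", 2), ("ret",)]
for target in BACKENDS:
    print(target)
    print("\n".join(generate(program, target)))
```

Only the two `lower_*` functions differ per target here; a real code generator would also have to deal with registers, calling conventions, and instruction limits that this toy IR ignores.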
I honestly can't tell if you are just trolling now and I'm falling for it, but you seem to think this 2000-word set of tutorials on the basics of compiler design (lexing, parsing, emitting) is supposed to be the one and only document you will ever need to create the next C++.
> You said in your other post that this can be done with minor modifications
And it probably can, depending on the flavor of assembly you want to use, there are dozens (hundreds?) of them; I'm sure some will allow you to inline the string declaration. The example I gave probably doesn't even work, since I haven't programmed in 8086 in close to 20 years and I don't even remember how to set up data blocks and code blocks in it anymore.
> And then there's the problem that you may want to target more than one architecture.
This is a toy compiler written by a professor of computer science meant to teach you the basics of building a compiler (lexing, parsing, emitting). This isn't a tutorial on building the next GCC.
> I honestly can't tell if you are just trolling now and I'm falling for it, but you seem to think this 2000-word set of tutorials on the basics of compiler design (lexing, parsing, emitting) is supposed to be the one and only document you will ever need to create the next C++.
That's a fair criticism.
I'm frustrated with the lack of material on emitting assembly, but it wasn't right of me to take that out on the author of this post. I apologized in a different post.
> And it probably can, depending on the flavor of assembly you want to use, there are dozens (hundreds?) of them
How about one I can run on my machine? There are maybe 5 that are useful targets I can think of:
* x86 or ARM (depending on your machine)
* LLVM
* GCC RTL
* WebAssembly
* Parrot? Maybe the JVM has some low-level bytecode?
There's certainly equivalent assembly, but depending on the architecture you're targeting, producing it can be a pretty monumental task. Even a single function call can be north of a hundred lines or something (I'm making this up :D).
I guess that's why we have things like LLVM, which let you generate an intermediate representation that gets converted to a bunch of different instruction sets.
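As a rough illustration of that idea, a generator can print LLVM's textual IR and leave instruction selection to the LLVM tools. This is a minimal sketch with a hypothetical `emit_add_main` helper; it assumes an LLVM toolchain such as clang is installed to turn the resulting text into an executable for whichever target it supports.

```python
def emit_add_main(a, b):
    """Return LLVM IR text for a main() that returns a + b."""
    return "\n".join([
        "define i32 @main() {",
        "entry:",
        f"  %sum = add i32 {a}, {b}",
        "  ret i32 %sum",
        "}",
        "",
    ])


print(emit_add_main(40, 2))
```

Nothing in the generator mentions a register or an instruction set; that part is handed off to the backend.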
For sure, I'm just telling the GP that their request for something that outputs assembly could be done with minor modifications, depending on the assembly language they want to output.
Gotcha. I also get the feeling that GP was more interested in being right than in actually knowing how to get the emitter to emit assembly. It felt like the followup was just gonna be ... well WHAT ABOUT BINARY?
nand2tetris is a great course that shows you how to design a computer from the chip level, build an assembler, and write a high level language + compiler that eventually produces the assembly code for the machine architecture _you_ built. I highly recommend it.
Please don't be a jerk on HN. I'm sure you don't mean to be, but comments like this can really come across the wrong way. That's one reason the site guidelines include: "Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."
I don't personally think pointing out that there are hundreds of other articles that already cover the same topic in greater depth is a shallow dismissal, and I think that could teach people something. But I'm open to the possibility that I maybe could have communicated this in a kinder way. Do you have any suggestions on how?
I guess I was technically wrong in that it does have an emitter, but I wasn't wrong in that it still doesn't emit assembly. gcc (or whatever you're using to compile your C) is still doing all of the nuclear reactor part.
Sorry that I'm coming across as critical of your work here--that wasn't my intent but I didn't do enough to avoid it. Your article isn't the problem, it's just proximal to the problem I'm describing. I should have been more clear about that in my post.
To address your post: I'm not aware of any widely used compilers that compile from a general-purpose language to C. There are a handful of DSLs that do, and there may be some mainstream general-purpose language compiler that does that I don't know of.
Compiling to LLVM or GCC's RTL has a lot more in common with compiling to assembly than it does with compiling to C.
Compiling to LLVM/RTL/assembly is fundamentally different from compiling to another high-level language. When compiling to C for example, you get to compile your for loops into C for loops--it's a fairly easy one-to-one conversion. Compiling to conditional jumps is a much more complicated endeavor, requiring more architecture.
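To make the contrast concrete, here is a minimal Python sketch that lowers the same counted loop two ways. The function names, the label scheme, and the x86-style mnemonics are assumptions for illustration, not taken from the article or from any real compiler.

```python
import itertools

_label_ids = itertools.count()


def fresh_label(prefix):
    # Every loop needs its own labels, or nested loops would collide.
    return f".L{prefix}{next(_label_ids)}"


def lower_for_to_c(var, start, stop, body):
    # Targeting C: the loop maps onto C's own for statement.
    lines = [f"for (int {var} = {start}; {var} < {stop}; {var}++) {{"]
    lines += [f"    {stmt}" for stmt in body]
    lines.append("}")
    return lines


def lower_for_to_jumps(reg, start, stop, body):
    # Targeting assembly: the loop becomes labels, a compare, and jumps.
    top = fresh_label("top")
    end = fresh_label("end")
    return [
        f"    mov {reg}, {start}",   # initialise the counter
        f"{top}:",
        f"    cmp {reg}, {stop}",    # test the loop condition
        f"    jge {end}",            # leave once reg >= stop
        *body,
        f"    inc {reg}",            # step the counter
        f"    jmp {top}",            # back to the condition
        f"{end}:",
    ]


print("\n".join(lower_for_to_c("i", 0, 10, ["total += i;"])))
print("\n".join(lower_for_to_jumps("ecx", 0, 10, ["    add eax, ecx"])))
```

Even in this toy version, the jump-based lowering has to manage fresh labels and pick a register, which is bookkeeping the C target never sees.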
I know what the bike shedding metaphor is, and frankly you're stretching it a bit, because all this person is trying to do is educate, and they're not even saying this is the only way to do it.
Seems odd to pull out the bike shed metaphor for every case where there's an abundance of technical articles on a subject. There are lots of tutorials on for loops in X language. Do you consider that bike shedding?
I'm not gonna argue with you over whether the metaphor I used was applicable or not--the point of the metaphor was to communicate and it's clear I succeeded in communicating. It's clear you understood what I meant, because you provided another example. Yes, whether or not you agree the bike shedding metaphor applies, you have to agree that having the 100th tutorial on how to do something that's in the docs that ship with the language is equally pointless.
I think you succeeded in communicating your thoughts and feelings in spite of the metaphor.
There is plenty of documentation in the form of tutorials for nearly every aspect of every programming language I can think of. I think it's important to keep in mind that we all learn differently and sometimes one explanation can make no sense while another makes a lot of sense.
That said, I think I get your frustration in that sometimes the process of _finding_ the explanation that clicks for whatever your question is (how to emit assembler?) can be really painful because all the explanations are shallow.
However, I don't think it's fair to take that frustration out on the writers (I got the impression you were, but maybe you weren't and just felt like venting). I for one encourage engineers to write, if for no other reason than to better cement their own understanding.
I do wonder, though, if maybe there are some improvements we can make to how we filter / search for long-form technical articles outside of Google so that the content is more relevant.
> That said, I think I get your frustration in that sometimes the process of _finding_ the explanation that clicks for whatever your question is (how to emit assembler?) can be really painful because all the explanations are shallow.
While I agree with your overall point about different learning styles, that's not really the problem I'm describing. Introductory material on emitting assembly or similar doesn't exist for any learning style, as far as I know. The best that I know of are some dead-tree books, and their emitters target dead or obscure architectures. None of the code examples are ones I could run on my machine.
> However, I don't think it's fair to take that frustration out on the writers (I got the impression you were, but maybe you weren't and just felt like venting).
That's a fair criticism: I apologized to the writer in a different post.