Amusing anecdote: In the very early '90s, I was writing a lot of highly optimized code (3D engines) in assembler.
I used assembler because, well, that was the de facto way to get speed. At the time I had forgotten about the possibility
of writing things in C (and I had no access to a C compiler) -- a bit dumb of me. So one day, tired of writing tedious assembler,
and at the same time discovering C, I rewrote one of my assembler programs in C. BIG surprise: the C code was much
faster than my assembler. So I disassembled the generated code and figured out what the compiler had done. First, it used
very clever bit manipulations (unknown to me) to rewrite some of my comparisons. Second, it realized that
because my code was just a test that didn't produce any output, it didn't need to generate any code for it. So
a big chunk of my C code was simply not compiled (and rightly so), and the resulting program mostly did nothing.
It was the first time I saw what an optimizing compiler could do, and I was totally blown away. I had been beaten
at the optimization game by a program :-)
The other thing I understood that day is that an optimizer is not magic. It just carefully applies small
pieces of optimization knowledge to small cases here and there. It basically automates what one would have
done to optimize some assembler code by hand (except for algorithmic optimization). It also means that
one can sleep well knowing that 90% of the chores of optimization are done automatically, in a reliable
way. We all take that for granted now, but when I was around 20 years old, it was just super exciting.
The experience was both frustrating and elating. In the end, the "optimized" assembly code I could come up with was pretty much identical to the code generated by the compiler. So I missed that rush of showing the compiler who's boss. This is what led me to abandon the project very quickly.
On the other hand, I was very relieved that my hand-written code was at least not worse than what the compiler could come up with. So at least I could get out with my dignity intact.
To be fair, those small functions were so small and simple (no more than three or four instructions) that there really was only one obvious way to implement them in x86 assembly. So maybe my problem set was a bad choice. It was fun, though.
I tried #3 first, but landed on #4 in the interest of time. It's not the cleanest approach -- that'd be either using the LLVM API from .NET or implementing a proper backend (assuming there's some decent implementation of a .NET binary writer, since that's a mess) -- but it's stupid simple and effective.
It's quite pretty code. Thanks for making it available.
The "metadata" mentioned in the article is LLVM trying to figure out when these complications can be safely ignored, so that the "real" optimisation/transformation can begin.
It's interesting to compare with something like Haskell's GHC compiler, which can do a crazy amount of rearranging and optimising due to the language itself having fewer of these complications (although, of course, programs written in the language may sometimes end up more complicated by "simulating" such features/effects).
Also interesting to compare with languages like Python, which are "so dynamic" that we can infer almost nothing about a piece of code ahead of time (since there's always the possibility that, e.g., some class will get its methods replaced at runtime by another thread, and other such reasoning-killers). Hence an (ahead-of-time) Python compiler can't do much optimising.
> Seibel: When do you think was the last time that you programmed?
> Allen: Oh, it was quite a while ago. I kind of stopped when C came out. That was a big blow. We were making so much good progress on optimizations and transformations. We were getting rid of just one nice problem after another. When C came out, at one of the SIGPLAN compiler conferences, there was a debate between Steve Johnson from Bell Labs, who was supporting C, and one of our people, Bill Harrison, who was working on a project that I had at that time supporting automatic optimization...The nubbin of the debate was Steve's defense of not having to build optimizers anymore because the programmer would take care of it. That it was really a programmer's issue....
> Seibel: Do you think C is a reasonable language if they had restricted its use to operating-system kernels?
> Allen: Oh, yeah. That would have been fine. And, in fact, you need to have something like that, something where experts can really fine-tune without big bottlenecks because those are key problems to solve. By 1960, we had a long list of amazing languages: Lisp, APL, Fortran, COBOL, Algol 60. These are higher-level than C. We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine. This is one of the reasons compilers are ... basically not taught much anymore in the colleges and universities.
mov edx, dword ptr [rdi + 4*rcx]
cmp edx, dword ptr [rdi + 4*rcx + 4]