Hacker News new | past | comments | ask | show | jobs | submit login
I wrote a peephole optimizer for QBE (briancallahan.net)
45 points by ingve on April 3, 2022 | hide | past | favorite | 5 comments



I really enjoyed this. It's always surprising how focussing on a teeny tiny part of a topic can clarify so much around it. There's so much complexity in and around compilers, it's very intimidating


Why is this Compiler Back End called QBE rather than CBE?


> Notably, O will fail to optimize this:

          movl %ecx, %eax
          movl %ebx, %edi
          movl %edi, %ebx
          movq %r12, %rdi
Easily handled by TXR:

  $ txr opt.txr data
          movl %ecx, %eax
          movl %ebx, %edi
          movq %r12, %rdi

  $ cat opt.txr 
  @(repeat)
  @  (cases)@; eliminate wasteful reverse move
          movl @x, @y
          movl @y, @x
  @    (output)
          movl @x, @y
  @    (end)
  @  (or)@; fallback: output one line, advance by one line
  @    line
  @    (do (put-line line))
  @  (end)
  @(end)
The setup here is not general, because we want to iterate on the pattern matching; not just match a pattern, consume that part of the input, transform it, and move on.

The transformed output has to be pushed back into the input. (Iterating the tool on the entire file to a fixed point is possible, but inefficient.)

In the TXR Lisp compiler (which does this kind of optimizing on List structures, not text), I wrote a little macro called rewrite-case for this: this repeatedly applies multiple pattern matching to the start of a list, rewriting the matched part of the list. When a fixed point is reached, it then deposits one item to the output list, and marches down by one item.

https://www.kylheku.com/cgit/txr/tree/stdlib/optimize.tl?h=t...

> It would be better if we printed out just the first line after discovering that the first three lines do not satisfy any optimization patterns, then check lines two through four for any optimization patterns; if we did that, we could optimize the last three lines.

That's an absolute requirement; if no patterns match, the peephole must move by one instruction.

It's exactly like a lexical analyzer throwing away one character if it doesn't match any token. (Lex will do that by default and print the character to standard output.)

> Interestingly, we cannot actually say with certainty if moving %rdi into %r12 and back again is useless.

Well, yes; peephole optimization is generally done on basic blocks, not on entire programs.

You can peephole a program containing jump labels, but you have to make sure your patterns do not blindly match across them.

For instance, say we want to peephole out this wasteful register reverse move, but one instruction away:

  $ cat data2
          movl %ecx, %eax
          movl %ebx, %edi
          xorl %esi, %eax
          movl %edi, %ebx
          movq %r12, %rdi

  $ cat data3
          movl %ecx, %eax
          movl %ebx, %edi
  label:
          xorl %esi, %eax
          movl %edi, %ebx
          movq %r12, %rdi
We can peephole it for data2, but not data3.

  $ txr opt.txr data2
          movl %ecx, %eax
          movl %ebx, %edi
          xorl %esi, %eax
          movq %r12, %rdi
  $ txr opt.txr data3
          movl %ecx, %eax
          movl %ebx, %edi
  label:
          xorl %esi, %eax
          movl %edi, %ebx
          movq %r12, %rdi

How I did it was by adding a pattern match for a generic two-opcode instruction in between the two, and doing some checks on the register:

  @(repeat)
  @  (cases)@; eliminate wasteful reverse move
          movl @x, @y
          movl @y, @x
  @    (output)
          movl @x, @y
  @    (end)
  @  (or) @; like above, but with unrelated insn in between
          movl @x, @y
          @opc @a, @b
          movl @y, @x
  @    (require (nequal a x))
  @    (output)
          movl @x, @y
          @opc @a, @b
  @    (end)
  @  (or)@; fallback: output one line
  @    else
  @    (do (put-line else))
  @  (end)
  @(end)
Here we add the require constraint that the destination operand of the unrelated instruction cannot be clobbering the destination of the first movl.

This will be subtly wrong if there are assembly language instructions which move data in the opposite direction.

We could add a

   @(require (memqual opc '("movl" "foo" "bar ...))
i.e. constrain to a specific set of instructions that we can match over.


why not put it into qbe proper then?


Indeed, the xor-with-itself bit is so basic, it barely counts as an optimization (Clang, in fact, does this even at -O0). It should have been trivially possible to do at initial codegen.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: