
54-line if condition in gcc's reload.c - mgdo
https://github.com/mirrors/gcc/blob/7057506456ba18f080679b2fe55ec56ee90fd81c/gcc/reload.c#L1056-L1110
======
DannyBee
For the curious, reload was written by Richard Kenner, who, while a wonderful
guy, was not generally familiar with modern compiler architecture (graph
coloring register allocation goes back to 1982, reload was started in the late
80's).

It essentially took the place of spill placement/code/legalization. Over the
years, it grew rematerialization, instruction combination, copy coalescing,
stack slot sharing and all sorts of other interesting scope creep.

While I believe it has finally been replaced by LRA for some targets, there
were many people who spent years of their life trying to replace reload with
separate, smaller pieces of architecture. No one has yet completely succeeded.

It is essentially an interesting object lesson in what happens when you just
incrementally improve architecture to achieve performance goals without a
stop-loss point for requiring new design.

~~~
probably_wrong
As someone moderately curious, I hope I'm not the only one when I say: I
didn't understand a word after "the late 80's". And that's only because I
searched for "graph coloring register allocation" first.

~~~
rayiner
For simplicity, a compiler manipulates an internal representation in terms of
virtual registers. A register allocator assigns physical registers to these
virtual ones, under the principle that different virtual registers may be
assigned to the same physical one if they are not live (i.e., hold a value that
will be used in the future) at the same time.

Spill code generation is necessary because at a program point, there may not
be enough physical registers for all the live values, and code must be
generated to spill some of them to the stack and reload them when needed.
Legalization performs code transformations to eliminate target independent
operations in the IR that can't be represented on the target. Copy coalescing
eliminates the need for copies between virtual registers by assigning both to
the same physical register. Rematerialization takes advantage of values that
can be recomputed cheaply, and instead of holding them in a register for a
long time, recomputes them where needed.

What makes this all so complex is that 1) most of these don't have polynomial
time algorithms that produce optimal solutions; 2) the individual problems are
mostly coupled. E.g. spill code itself uses registers, and so the interference
graph (which summarizes whether virtual registers are simultaneously live)
might need to be recomputed or modified.

If you're interested in this sort of thing, Keith Cooper at Rice has posted
the lecture notes for his graduate compilers class online:
[http://www.cs.rice.edu/~keith/512/Lectures](http://www.cs.rice.edu/~keith/512/Lectures).
Register allocation is lectures 26-27.

~~~
tptacek
Thanks everyone for watching another exciting episode of "Two Lawyers Explain
Compiler Theory". :)

~~~
DannyBee
To be fair, I still manage the compiler team as well :)

------
acqq
Please note that:

1) The goals of Reload are _very_ complex.

[http://gcc.gnu.org/wiki/reload](http://gcc.gnu.org/wiki/reload)

"Reload does everything, and probably no one exactly knows how much that is.
But to give you some idea:

Spill code generation

Instruction/register constraint validation

Constant pool building

Turning non-strict RTL (Register Transfer Language, a very low level
intermediate representation used in the backends of GCC) into strict RTL
(doing more of the above in evil ways).

Register elimination--changing frame pointer references to stack pointer
references

Reload inheritance--essentially a builtin CSE (Common Subexpression
Elimination) pass on spill code"

Reload has achieved them for the last 25 years(!)

2) There are more modern approaches to reach such goals, but knowing 1) it is
_a lot_ of work before the goals can be achieved by some alternative code for
all platforms (I don't know how far the developers have gotten at the moment)

3) "Local Register Allocator Project" presentation by Vladimir Makarov,
working for Red Hat:

[http://gcc.gnu.org/wiki/cauldron2012?action=AttachFile&do=ge...](http://gcc.gnu.org/wiki/cauldron2012?action=AttachFile&do=get&target=Local_Register_Allocator_Project_Detail.pdf)

------
AYBABTME
Trivia: This code is older than many of the readers.

[https://github.com/mirrors/gcc/blame/7057506456ba18f080679b2...](https://github.com/mirrors/gcc/blame/7057506456ba18f080679b2fe55ec56ee90fd81c/gcc/reload.c#L1056-L1110)

~~~
bdcravens
I'm impressed that the blame log has survived intact. What SCM was originally
used?

~~~
b0z0
Probably Git? You know that's been around much longer than GitHub has, right?

~~~
bdcravens
Of course, but even without checking Wikipedia, I remember a time not too
terribly long ago when git, and even svn, didn't exist.

~~~
misnome
And I'm pretty sure that GCC was around at that point!

------
nardi
I was going to take a crack at breaking this apart into component functions,
but I realized that I don't know where to draw the lines, or what to name the
functions, without understanding the internals of this part (and probably
other parts) of the compiler.

And that's why this will be here forever.

~~~
ajenner
Actually reload has been replaced by LRA for x86/x86_64 in GCC 4.8 and work is
ongoing to bring LRA to other targets. So it shouldn't be forever.

------
_davidchambers
There's an off-by-one error in this thread's title. ;)

------
goblin89
Why put the logical operator at the start and not the end of each line?

I.e., this style (used in this case)

    
    
          && (CONSTANT_P (SUBREG_REG (in))
              || GET_CODE (SUBREG_REG (in)) == PLUS
              || strict_low
              || (((REG_P (SUBREG_REG (in))
    

versus this style:

    
    
          (CONSTANT_P (SUBREG_REG (in)) ||
              GET_CODE (SUBREG_REG (in)) == PLUS ||
              strict_low ||
              (((REG_P (SUBREG_REG (in)) &&
    

I don't have a personal preference here, just looking for any practical pros
and cons I may not be aware of.

More on topic, I don't see why one would keep such a big conditional rather
than moving it out into its own function(s) (but this has been asked already
in this thread).

~~~
mkl
I used to use the latter, in both code and maths, but I have switched to the
former. My main reason is that having the operators (logical and otherwise) at
the start of the line means they're more likely to be in the same horizontal
position, so it's easier to see which lines are continuations of the previous
ones.

~~~
drglitch
It is also easier to comment out lines of code if you need to test something.
This is especially useful in SQL, where you can quickly -- a condition or a
column, without editing commas etc on other lines, eg

    
    
        SELECT
            some_col
            , another_col
            --, and_another
            , and_more
        FROM
            blah
    

~~~
shawnz
Isn't this behaviour exactly the same as with commas at the end? Here, to
comment out the first line you would need to edit the second. With commas at
the end, to comment out the last line you would need to edit the second last.
Other than that, no editing of other lines is required.

In fact, in a language like Javascript where extra trailing commas are
allowed, it seems that this argument makes even more sense for commas at the
end than it does for commas at the beginning. Then, there would never be a
situation where you would have to edit a line besides the one you were
commenting out.

~~~
klibertp
Trailing comma in SQL is a syntax error. It's also not allowed in JSON. And
editing second last line can be a pain if you're testing something and are
commenting/uncommenting the last line repeatedly.

------
resist_futility
I'm more concerned with the fact that the function is more than 700 lines
long.

~~~
weavie
1996 lines to be precise.

But, to be fair, it is well commented.

------
chatmasta
To be fair, the length is not _strictly_ 54 lines, since blocks of it depend
on evaluation of the preprocessor (the #ifdef stuff). Still, that's pretty
long. :)

~~~
adamnemecek
That makes it even worse.

------
userbinator
The rest of that file also has a ton of redundancy and verbosity in its
logic that could be simplified considerably; e.g. I see this pattern a lot
(lines 428-435):

    
    
        x && y || !x && z
    

In the absence of side effects, this basically implements a 2-input
multiplexer and is identical to

    
    
        x ? y : z
    

In the 54-line condition the first obvious thing I'd factor out is
SUBREG_REG(in) and GET_MODE(SUBREG_REG(in)), and then work out what else is
duplicated from there. Here's my attempt at making this a little more
readable. It's only 2 lines less, but this gets rid of all the repeated
uppercase:

[http://pastebin.com/9zpr7Cd5](http://pastebin.com/9zpr7Cd5)

~~~
wfunction
The way it was for me was that I only "got" the hang of the ternary operator
once I learned what a multiplexer is and how it's implemented. Before that, it
was something I used now and then (certainly didn't have any trouble
understanding it or anything like that), but I never thought about it as some
way to channel data through based on a condition. I just always thought of it
as some handy form of conditional evaluation. Things "clicked" a lot when I
learned about how digital hardware and muxes work.

------
wglb
This reminds me of a comment in a particularly hairy section of code in the
MWC 8086 compiler written by dgc (among other things, author of MicroEmacs). I
don't have the exact wording but it was something to the effect of

 _I am frankly embarrassed by the number of bugs that have been tracked to
this function._

This was in the day before register coloring.

Earlier, I had the chance to write a code generator for an implementation
language targeting the 8085 (!) and even that was sufficiently hairy that the
team that took it over complained about the difficulty of improving the code.

[Edit] Time sequence correction.

------
dokem
Why not break it up into variables or functions so that the logic can be
followed while reducing the likelihood that a bug creeps in? What is the
excuse for horrible code like this?

~~~
raldi
It's been getting the job done for decades, in perhaps the most popular
compiler of all time. I'm willing to trust the maintainers' judgement on this
one.

If they'd spent all their time needlessly fixing what ain't broke, gcc
would've gone the way of GNU/Hurd.

~~~
MaulingMonkey
> It's been getting the job done for decades, in perhaps the most popular
> compiler of all time. I'm willing to trust the maintainers' judgement on
> this one.

I don't trust anyone - myself included - writing code 1/10th as convoluted,
age-of-product be damned. Code is not wine, old doesn't mean good. Neither do
the maintainers, methinks: Looking at blame shows some refactoring.

> If they'd spent all their time needlessly refactoring, gcc would've gone the
> way of GNU/Hurd.

And the lack of needful refactoring may very well send it the way of COBOL -
with everyone merely wishing it had gone the way of GNU/Hurd. I've seen clang
and LLVM replace gcc in both of my vendor toolchains that used gcc -
suggesting they've already wished and then done something about that wish.

(Either that or licensing, but code like this stomps on the scales a bit...)

~~~
richardwhiuk
Old code is better - it's got bug fixes.

~~~
MaulingMonkey
> Old code is better - it's got bug fixes.

It's also got prototypes that made it into production, ball of mud designs
that encourage usage bugs, and C++ thrown together by that short lived intern
who took a few Java classes, wrote everything assuming there was a garbage
collector around, and took great care to avoid class trees with less than 3
layers of inheritance lest he be shamed for lack of 1337ness... then topped it
off with a few __try/__catch blocks to deal with that one rare crash that
nobody could find a repro case for.

Re-implementing bug fixes is a toll... sometimes a very worthwhile one,
though.

~~~
qu4z-2
To be fair, new code frequently has those attributes too.

------
GnarfGnarf
I will never have the skills to maintain let alone write a compiler. However,
in a situation like this, I would break up the condition into a set of sub-
conditions, set intermediate booleans, and progressively build upon the
condition tests so as to make the code more intelligible and maintainable. No
need for additional functions.

(Amusing incongruity: "boolean" fails the spellchecker in an IT forum :o)

~~~
valarauca1
>booleans

C doesn't support native booleans, C99 does-ish. And you can hack it in with

    
    
        typedef enum { false, true } bool;
    

C uses ints, all the way down, for everything, until you hit turtles.

------
__boomslang
Needs more goto's.

------
shadykiller
At least it's not all the conditions in a single line :)

------
zoner
At least it's a proof that GCC is smart enough to understand and compile this
condition :)

------
SixSigma
#ifdef should be shot

~~~
frozenport
There is no other mechanism to choose between <windows.h> and <linux.h>

~~~
SixSigma
Plan 9 manages to build the whole OS and userland for multiple architectures
without using ifdefs, even cross compiling across CPUs with different
endianness

------
adamnemecek
No wonder llvm is getting popular (I realize that there are other reasons
besides codebase quality).

~~~
webkike
Compilers are complex, and these conditionals had to be evaluated in some way.
Sure, the author could have split it up into multiple if statements, but would
that really have helped?

~~~
adamnemecek
Yes, yes it would have, there is literally no question about that. It also
appears that some of the parts of the conditions are repeated, so those could
have been refactored.

~~~
laichzeit0
I'm sure a good optimizing compiler like GCC would optimize out repeated
expressions in conditions ;) (perhaps not if they're declared volatile(?))

~~~
smcl
_Definitely_ not if they're declared volatile

