
GCC: Improve on memory cost in coloring pass of register allocator - vmorgulis
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01811.html
======
jimrandomh
Summarizing the interesting bit for non-compiler-specialists: This is an
improvement to a part of GCC's compiler optimizations which will make
compiled-with-GCC programs slightly faster. The SPEC INT benchmark mean score
went from 3717 to 3730 (about 0.35%), and SPEC FP from 4752 to 4775 (about
0.48%). Which
doesn't seem like much, but making a good compiler in practice tends to mean
stacking lots and lots of improvements of this magnitude on top of each other.

~~~
chrisseaton
But is 0.5% statistically significant? How do I know it isn't natural
variation or measurement error?

He must have the most stable setup in the world if he has confidence in those
results.

~~~
iheartmemcache
You know way way more about compilers than I do, but the key is in this:

Furthermore, there exists a path from h to a use of v which does not go
through d. For every node p in the loop, since the loop is strongly connected
and node is a component of the CFG, there exists a path, consisting only of
nodes of L from p to h. Concatenating these two paths proves that v is live-in
and live-out of p.

I have no idea how to formally prove that assertion (it's logically sound,
afaik; someone like Walter Bright would have to jump in, he probably knows GCC
inside-out), but if correct, the code change is semantically correct. The
computational gain can be trivially exhibited (again, no idea how to 'prove'
it) by throwing code at it and looking at the RTL emitted. (I agree it lacks
'rigor' in the academic sense, but the list he submitted his code-change to,
and the fact that he's at Xilinx and sent it to his colleagues, make me
confident enough that the change is sound.)

Edit: Fair enough. You've convinced me. (Though the SPEC2000 numbers still
have me in the 'maybe' category - your argument below is more sound than my
ignorance.) Anyone reading this should defer to Danny's posts rather than mine
for the time being (after a day or two, read the responses on the mailing list
to see the expert analyses). Keeping this post intact solely for the reader's
continuity. RE: Walter - I'd assume he keeps up with the competition, and the
RTL hasn't changed extensively in the last decade, so he knows enough to form
a valid opinion on the changes. Either way, gracefully upvoting Danny's two
posts ;)

~~~
DannyBee
"someone like Walter Bright will have to jump in, he probably knows GCC
inside-out"

FWIW, walter has not contributed a single patch to GCC i can find in the past
10 years (It may be longer, i stopped looking) :)

Not that this makes him bad in any way, mind you, he's just not who i'd go to
for gcc expertise. That would be someone like Richard Henderson, who is pretty
much never talked about, but has touched pretty much all parts of the compiler
and always does great work.

Also note that embedded developers in general do not have a good reputation
among compiler people for doing "sound work". Again, not that this is a
characterization of their engineering skills - They often have very tight
deadlines, so the amount of time they can spend figuring out things instead of
applying bandaids tends to be pretty limited. So they patch something, submit
it, and move on.

~~~
bionsuba
"walter has not contributed a single patch to GCC i can find in the past 10
years (It may be longer, i stopped looking) :)"

It's worse than that. Because LLVM and GCC are copy-lefted and his compiler
has a proprietary backend, he forces himself not to even look at their code in
order to make the lawyers happy.

~~~
WalterBright
Haha, I once got sued by a company claiming I stole code from them, and their
lawyer got sent packing when it turned out they had stolen the code from me.
(I had a registered copyright on it.)

One thing I absolutely adore about Github is the audit trail about who
contributed what and when. It's a solid gold defense, and a way to limit the
damage if someone does check in bad code.

------
DannyBee
So, this is actually only true for natural loops :)

In particular: "Consideration of only enter_freq is based on the concept that
live Out of the entry or header of the Loop is live in and liveout throughout
the loop."

This is demonstrably false for loops with >1 exit, or multiple latches.

IE it assumes that there is a single loop exit and latch that everything goes
through.

That said, most compilers, including GCC, just don't care about loops with
these kinds of weird control flow structures, and either duplicate nodes, or
just don't consider them loops
(https://gcc.gnu.org/onlinedocs/gccint/Loop-representation.html)

More interestingly, it's not clear why the patch should make any difference at
all, nor is it explained. It just makes an assertion that it is better to do
it this way ("This increases the updated memory most and more chances of
reducing the spill and fetch and better assignment.")

(Expanded on this a bit to make it more clear).

When calculating the cost of spills/moves/etc, usually you want to take into
account how often a block is executed. This is because you often have multiple
places you can put spills/moves/etc to get a register. Multiplying by
entry/exit frequencies tells the compiler "If you need a register, it is more
expensive to spill something that occurs in a block that happens all the time,
than something that occurs in a block that happens infrequently". As an aside,
calculating predictions on loops and nested-loops statically is non-trivial
(it requires estimating the trip count of each loop).

In any case, this patch changes the move/spill cost calculation to avoid
taking into account the exit frequency of blocks (well, really edges, but
let's ignore that) in loops, under the argument that it should be the same as
the entry frequency.

The problem with that is exactly the last sentence. It shouldn't matter. This
change should, in practice, be a no-op, or at best, is a random cost model
change.

From another perspective, it just divides all the previous frequencies of
things in loops by 2 compared to what they were before (assuming the exit
frequencies are sane. If they are not, that is a problem to be solved instead
of this one). There is no reason given that this is necessarily a good thing
to do :).

Basically, i'm not sure this patch will go in, because it's not clear it's
really solving whatever underlying problem exists.

I expect someone to come along and tell them to provide better analysis of
where exactly it's helping and what the cost numbers are that this is changing
"for the better".

~~~
boulos
Why do you think this is a no-op? Looking at the diff, the code was summing
the entry and exit counts. So, by the same argument, that means certain things
were being double counted (considered higher cost) and now aren't.

I assume (like you), that someone will come along and point out that this
doesn't really apply to all loops.

~~~
DannyBee
It's a no-op in most cases because the cost of spilling a given loop variable
is still the same relative to other loop variables. But yes, the only effect
it should have is to possibly value loop variables low enough to spill them
instead of non-loop variables (note that non-loop variables are calculated
using both exit and entry frequencies). That is almost certainly a bad idea
in general; execution frequencies are what you should use, whether in loops
or not. Most likely the real issue is that a bunch of loops are having their
execution frequencies (relative to non-loop blocks) highly overestimated,
when in reality the loops don't execute much.

------
bjourne
IME, optimal register allocation isn't so important anymore. In normal code,
function bodies are so small that all local variables can be kept in
registers. I.e., if the number of local variables is small, say 5, then the
register allocation is optimal by default. It would be interesting to see
before-and-after assembly from C functions that GCC generates.

~~~
gpderetta
At least C++ depends for its performance on very aggressive inlining of small
(often one-line) functions.

It is not unlikely that your hundreds of tiny functions become one massive
"function" after high level optimization.

