
Forcing code out of line in GCC and C++11 - luu
http://xania.org/201209/forcing-code-out-of-line-in-gcc
======
userbinator
It's interesting to see techniques that are pretty common in Asm programming
appearing in higher-level languages - putting the error-handling code
"somewhere else" or otherwise trying to keep it out of the way is one of
these. The other one related to this, which I haven't seen a compiler do yet,
is "chaining the error path" which looks something like this:

            ; some code that sets CY on error
            jc some_error
            ...
            ; more code that sets CY on error
        some_error:
            jc some_error1
            ...
            ; etc.
        some_error1:
            jc some_error2
            ...
        some_error2:
            jc some_error3
            ...
    
        some_errorN:
            ; error handling code goes here
    

The reasoning is that conditional jumps to nearby targets (within +/-127
bytes) are far shorter (2 bytes vs. 6), and the error path is rarely taken.
When the error-handling code lies ahead of the branch, the common "always not
taken" and "forward not taken" static-prediction heuristics predict it as not
taken, which matches its behaviour of being taken only on error. This pattern
is a bit hard to express in C, however - and chained gotos, which would be
somewhat equivalent, aren't so "high level" anymore.

~~~
ot
There are a few things compilers can be really bad at, another is register
allocation. A very interesting example is the LuaJIT2 interpreter, as
explained by Mike Pall:

[http://article.gmane.org/gmane.comp.lang.lua.general/75426](http://article.gmane.org/gmane.comp.lang.lua.general/75426)

~~~
gumby
Hah, funny, as the original motivation for register-rich RISC was that
compilers can be more diligent than humans (in other words "the computer has
more fingers than a person does").

But different languages are appropriate for different problems and there is no
shame in dropping down into assembly code for pathological cases like the lua
example you cite. In the end a good programmer understands the semantics and
execution case far better (i.e. at a higher level) than the computer is able
to.

In fact this article is a good example of how you can provide intentionality
to a modern compiler so that you _don't_ have to drop into assembly and
provide the semantics by extension.

------
grundprinzip
I'm not sure, but the assembly code generated when using __builtin_expect
looks almost identical.

~~~
Scaevolus
Yes, unlikely()/likely() is a much better way to give the compiler hints about
block linearization.

Here it is on godbolt: [http://goo.gl/iPCiSy](http://goo.gl/iPCiSy)

~~~
gaze
This is so strange! If something is unlikely to execute, it is unlikely to
execute due to a condition, so you can clue in the branch predictor that the
branch is unlikely and have the layout machinery move it out of line. Why
would something be unconditionally unlikely to execute?! I don't understand
what a lambda buys you here.

~~~
nkurz
In theory, the lambda combined with the noinline attribute forces the
compiler to use a call/ret to a separate function rather than a local jump.
Since the function will usually have alignment requirements, the error
handling often ends up in a different 64-byte cache line. If the error never
occurs, that cache line never occupies space in the instruction cache
(i-cache), and useful instructions can be held there instead.
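
A minimal sketch of the trick being discussed (the function and message are
illustrative; the attribute placement on the lambda is GCC's extended
syntax): the cold path is wrapped in an immediately-invoked lambda marked
noinline, so it is emitted as a separate function reached via call/ret
instead of sharing the caller's cache lines.

```cpp
#include <cstdio>

int read_value(const int* p) {
    if (p == nullptr) {
        // Immediately-invoked noinline lambda: the error path goes
        // out of line into its own function.
        [&]() __attribute__((noinline)) {
            std::fprintf(stderr, "read_value: null pointer\n");
        }();
        return -1;
    }
    return *p;   // the hot path stays compact
}
```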

In practice, I'd be surprised if you can come up with a case where this makes
a significant difference. If you are on a hot enough path for this to matter,
on a modern processor you are probably running out of the even lower-level
decoded µop cache, which doesn't cache µops for branches that are not taken.
If you aren't running from that cache, your efforts are probably better spent
making that happen.

Edit: ot's comment about how this affects the size of the parent function and
whether it will be inlined is a good point, and might well make a measurable
difference in the cases where it is true.

~~~
mattgodbolt
Author here. I use this trick for a general 'bail out with error code' throw
macro. By definition it's the exceptional case and so I'm happy to take the
hit of going out of line. It's a macro used all over a very large codebase and
so it made sense to do this. And yes, in some cases it helps the compiler
inline more aggressively where it does matter.
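
A macro along these lines (a sketch under stated assumptions, not the
author's actual code; the macro name and exception type are hypothetical)
might look like:

```cpp
#include <stdexcept>

// The throw site is wrapped in a noinline lambda, so the cold throw
// machinery is emitted out of line at every use of the macro.
#define THROW_ERROR(msg)                                  \
    do {                                                  \
        [&]() __attribute__((noinline)) {                 \
            throw std::runtime_error(msg);                \
        }();                                              \
    } while (0)

int must_be_positive(int x) {
    if (x <= 0) THROW_ERROR("expected a positive value");
    return x;
}
```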

------
tacos
Even the best coders often get this wrong. Let the profiler do it.

[http://llvm.org/docs/BranchWeightMetadata.html](http://llvm.org/docs/BranchWeightMetadata.html)

~~~
nkurz
The link is helpful for background, but is it really useful to tell a
programmer who is worried about i-cache contention to be guided by an
automatic profiler rather than forcing the compiler to do what they want and
then profiling that? I'm intrigued --- what prompts you to give this advice?
It's the opposite of my instinct. Have you personally experienced issues with
this?

~~~
tacos
Programmers are horrible at guessing branch prediction. It's a famous truism.
And the only thing worse than guessing the branch prediction wrong would be
moving the code farther away and then adding a jump to it and back.

The next level of mistake is gathering the profile data from the unit tests.
(Hey! Free Input!) Ugh.

Hardcoding "case 5 is the most common" in a library also bit me. If you're
giving me the source, let me guide the paths, thanks. Maybe you use case 5,
maybe my code doesn't. (And I just stepped through Clang doing exactly the
right thing on a switch statement with inlining.)

~~~
userbinator
_Programmers are horrible at guessing branch prediction._

Not for error cases like the ones described in the article.

