This is one of the best technical articles I've ever read, thanks. It wasn't cle...

BeeOnRope · on June 11, 2019

Yes, it is confusing: it should probably say "organize the code so that the likely branch outcome is the fall-through one".

Does that make sense?

Basically the "jump" should be the unusual case. You can always organize the code like that, but sometimes the compiler needs a little help.

Consider the following function, which just sets negative values to zero.

    void zero_threshold(int *data, size_t len) {
        for (size_t i = 0; i < len; i++) {
            int x = data[i];
            if (x < 0) data[i] = 0;
        }
    }

On gcc this compiles to something like:

    .L4:
        mov     edx, DWORD PTR [rdi]
        test    edx, edx
        jns     .L3
        mov     DWORD PTR [rdi], 0
    .L3:
        add     rdi, 4
        cmp     rdi, rax
        jne     .L4

The `jns` is the jump that reflects the if (x < 0) statement and it jumps in the case the number is >= 0 (i.e., non-negative). gcc organizes it like this because it involves a only a single jump. However if you expect the non-negative case to be the common one, this is will limit the throughput of the code. It would be better to compile it like this:

  .L4:
        mov     edx, DWORD PTR [rdi]
        test    edx, edx
        js      .L10
  .L3:
        add     rdi, 4
        cmp     rdi, rax
        jne     .L4
  .L1:
        ret
  .L10:
        mov     DWORD PTR [rdi], 0
        jmp     .L3

This now involves two jumps when the number is negative: it has to jump out to the extra bit of code outside the main body of the function at .L10, and then jump back, but the common case is now "fall through" so this could execute at close to 1 per cycle, twice as fast as the other version.

You can get gcc to produce the second code by changing the condition to __builtin_expect(x < 0, 0):

https://godbolt.org/z/2Qgk7p

Of course, this code would be much off vectorized, but it's just an example (and the same principle applies to jumps in vectorized code).

craz8 · on June 12, 2019

I’ve worked in 1 company where we had tools that would re-organize the code to make the common code path be the fall through as much as possible

This was done before linking, based on analysis of the common paths from a running program

What this also did - which was the important bit at the time - was move all the error code to pages that wouldn’t be loaded for most users

So it moved error handling out of common code pages and changed branch instructions to mostly fall through

This was many years ago - do similar tools exist in the OSS community?

BeeOnRope · on June 12, 2019

All good compilers do this today. They use heuristics to guess what the likely outcome of branches, but you can provide them specific information with profile guided optimization, or with manual branch hits (the __builtin_expect I used in the example above is one such hint).

Similarly, you can annotate certain functions or paths "hot" or "cold", and compilers even understand that functions like abort() are cold (they can be called at most once, after all) and hence paths leading up to them are also cold.

Similarly, JIT compilers use the runtime observed branching behavior to organize branches so that fall-through is the common case.

Twirrim · on June 12, 2019

Grab a look at the FDO and AutoFDO sections from GCC's wiki:

https://gcc.gnu.org/wiki/AutoFDO/Tutorial

You'll also find it's referred to as Profile Guided Optimization, there's docs here for clang: https://clang.llvm.org/docs/UsersManual.html#profile-guided-...

RivieraKid · on June 11, 2019

It makes complete sense now, appreciate the explanation.