
Optimizations in C++ Compilers - guodong
https://queue.acm.org/detail.cfm?id=3372264
======
orangepanda
> I went home that evening and created Compiler Explorer.

Nice try. You can’t escape being known as a verb now.

Everyone knows the tool as godbolt.

~~~
2bluesc
Interesting tool I've never used, for the uninitiated:

[https://godbolt.org/](https://godbolt.org/)

~~~
dr_zoidberg
Interesting, but broken in Firefox. Could not get it to run, then thought
"maybe in Chrome..." and got it to work.

~~~
Smar
What did you do to get it to work?

~~~
dr_zoidberg
Opening in Chrome.

------
usefulcat
Compiler Explorer (aka godbolt) is awesome; I use it at least weekly.

It's amazing how many more code generation questions occur to me now that
there's so much less friction in getting the answers.

~~~
beached_whale
I host it locally for myself too and use the CLion plugin; it is amazing.
Seeing actual codegen in projects with multiple files is really something, as
is having synchronized scrolling between source and assembly.

------
jedbrown
The floating point comment leaves out that one can use

    #pragma omp simd reduction(+:res)

as a more precise way to achieve vectorization in the reduction (compile with
-fopenmp-simd to only use it for SIMD without linking an OpenMP library):
[https://godbolt.org/z/17oTz1](https://godbolt.org/z/17oTz1)

Unfortunately, the pragma is not supported with the new-style class iterators
in a released compiler, though it works in clang-trunk:
[https://godbolt.org/z/hbP11W](https://godbolt.org/z/hbP11W) Note that Clang
disables floating point contraction by default (so no vfmadd instructions),
despite them being more accurate. One usually wants this globally
(-ffp-contract=fast) except when trying to bitwise reproduce software
compiled for pre-Haswell.

------
manch23
> I hope that some of these optimizations are a pleasant surprise and will
> factor in your decisions to write clear, intention-revealing code and leave
> it to the compiler to do the right thing.

This was my key takeaway from this article. Writing clear code that is easier
to maintain will have good enough performance most of the time. I was
particularly impressed with the devirtualization optimizations and will be
less likely to shy away from using polymorphism in future due to performance
concerns.

------
fluffything
> Tail call removal. A recursive function that ends in a call to itself can
> often be rewritten as a loop, reducing call overhead and reducing the chance
> of stack overflow.

Most important: this optimization enables pipelined execution.

When people talk about a CPU executing an integer add instruction in ~1 cycle,
what they actually mean is that the add has this latency when the CPU
pipelines are full.

If you have an 11 stage pipeline... the add can often have a latency of ~11
cycles... if you write the _wrong_ code for it.

~~~
gpderetta
FWIW, at least on Intel CPUs, call instructions are fully pipelined and
effectively zero latency (although, like all jump instructions, there is a
limit of one call every other cycle).

------
amelius
That's a cool bag of tricks. But I'm impressed when compilers start optimizing
programs in the big-O sense.

~~~
stephenbennyhat
Did you see the sumToX() example in the paper?

------
mrlonglong
Godbolt sounds like a top quality brand of quidditch broom :-D

