
GodBolt: Enter C, get Assembly - setra
http://godbolt.org/
======
rzimmerman
Try with gcc 6.2 options -Ofast -march=native:

    
    
      int square(int num) {
        int a = 0;
        for (int x = 0; x < num; x+=2) {
          if (!(x % 2)) {
            a += x;
          }
        }
        return a;
      }
    

All kinds of loop unrolling and vector instructions. Now remove the "!"

~~~
im3w1l
GCC is messing up here. Compare with clang!

[https://godbolt.org/g/HCifdP](https://godbolt.org/g/HCifdP)

~~~
frederikvs
Wait, that contains no loop. Is clang using mathematical rules for series
summation? Awesome!

~~~
janekm
It gets even better after playing with the code a bit:
[https://godbolt.org/g/kUMCpl](https://godbolt.org/g/kUMCpl)

magic constants galore... where does 3435973837 come from? ;)

~~~
frederikvs
It's 0xCCCCCCCD in hex, which looks a lot more structured at least. Though how
exactly this makes sense is still unclear :-)

[edit] And playing around with the increment of x, it keeps throwing in
magical values which are a repeating pattern in hex, usually with the LSB off
by one...

~~~
im3w1l
I thought 0xCCCCCCCD was a very nice number so I asked myself why it would
come out like that. I wondered whether

0.CCCCCCCC.... = 4/5 had anything to do with it.

Using hexadecimal we have

    
    
        5 * 3 = 10 - 1, so
        5 * C = 40 - 4, so
        5 * (1 + C + C0 + C00 + C000...) = 5 - 4 + 40 - 40 + 400 - ... where the last term is = 0 mod 2^32
             -----
               D
    

This relates to

    
    
        0.CCCCCC.... = 4 / 5
    

Because in order for that to come out right, multiplying 0.CCCC... by 5 has
to have all intermediate terms cancel, leaving only a 4.

So why would it take its digits from 4 / 5 and not 1 / 5? Because it takes
x=4 to make the 5-x subtraction in 5 - x + x0 - x0 + x00 come out to 1.

So this gives us a general rule for inversion (for numbers less than 16). Take
the hexadecimal expansion of (n-1) / n, truncate to get enough hex chars, add
1 to the least significant hex digit.

E.g. the inverse of 3 would be 0xAAAAAAAB.

------
xroche
This site is extremely valuable to produce good quality reports of GCC bugs.
For
[https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67283](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67283),
I was able to track multiple regressions on a GCC optimization over the
different versions of GCC, within few minutes. Doing the same manually would
have been extremely tiresome. Kudos to the website authors!

~~~
jawilson2
We use this all the time at work. Also, Matt Godbolt sits about 3 rows away
from me, which helps as well!

------
JoshTriplett
One optimization I've always found impressive:

    
    
        #include <stdint.h>
    
        uint32_t testFunction(uint32_t x)
        {
            return ((x & 0xff) << 24) | ((x & 0xff00) << 8) | ((x & 0xff0000) >> 8) | ((x & 0xff000000) >> 24);
        }
    

compiles into:

    
    
        testFunction(unsigned int):
            mov     eax, edi
            bswap   eax
            ret
    

Another fun one, that only works with clang:

    
    
        #include <stdint.h>
    
        int testFunction(uint64_t x)
        {
            int count;
            for (count = 0; x; count++)
                x &= x - 1;
            return count;
        }
    

compiles into:

    
    
        testFunction(unsigned long):
            popcnt  rax, rdi
            ret

~~~
gratilup
As a compiler engineer, those are some of the least impressive things done by
modern compilers. Why? Because there isn't anything smart behind them, it's
pure pattern matching.

~~~
acid__
Now I'm curious, what are the most impressive things modern compilers do?

~~~
wolf550e
probably this: [http://polyhedral.info/](http://polyhedral.info/)

------
samlittlewood
Also, look at Matt's write up of how this works:

[http://xania.org/201609/how-compiler-explorer-runs-on-amazon](http://xania.org/201609/how-compiler-explorer-runs-on-amazon)

as a pragmatic example (incl. all tools & configs) of how to build an auto
scaling & deploying site, without overdosing on kool-aid.

------
smitherfield
I like the highlighting – I finally understand why people say compilers
generate faster code than a human could write. Pasted in a pretty
straightforward and already fairly optimized function I'd written and of
course got back something also pretty straightforward, then I put in "-Ofast
-march=native" and, wow. Completely rearranged everything in a totally non-
linear way and generated all sorts of bizarre (to me) instructions. I think
I'd need to study it for months if not years to understand what the compiler
did.

~~~
dom0
Do realize that no Linux distribution ships such binaries. They do the
equivalent of -march=$worstSupportedProcessor (e.g. x86_64). To actually get
any benefit from the smart compiler on these you'll need to use FMV and
proprietary compiler extensions (e.g. GCC per-function options), write runtime
dispatch code, etc. (GCC also has proprietary extensions for that).

~~~
arthur2e5
Solus Linux builds some libraries into /usr/$LIB/avx and modified glibc's ld
loader to load these libraries if the CPU happens to have AVX2.
[https://github.com/AOSC-Dev/aosc-os-abbs/issues/522#issuecomment-266824590](https://github.com/AOSC-Dev/aosc-os-abbs/issues/522#issuecomment-266824590)

(In that issue I am trying to figure out some FMV stuff.)

------
jdub
Cool! Certainly quicker than loading up cross compilers (even when they're so
easy to get on Debian/Ubuntu), building a binary, and running the right
version of objdump.

The Rust Playground at [https://play.rust-lang.org/](https://play.rust-lang.org/)
has a similar function, letting you check ASM, LLVM IR, and MIR
(Rust's mid-level intermediate representation) output for current versions of
the Rust compiler.

~~~
masklinn
FWIW godbolt also does rust
([http://rust.godbolt.org](http://rust.godbolt.org)) which is convenient
because play.rust-lang.org doesn't perform the aggressive cleanup/removal of
assembly godbolt does.

~~~
kibwen
Though it looks like godbolt only has a Rust compiler up to version 1.11
(current stable is 1.13), so it may not be including optimizations in the
current Rust release.

~~~
steveklabnik
[https://github.com/mattgodbolt/gcc-explorer/issues/139](https://github.com/mattgodbolt/gcc-explorer/issues/139)

------
jerven
Really cool. It surprised me to see even trivial code give very different
results in gcc, icc and clang.

    
    
      int retNum(int num, int num2) {
          return (num * num2)/num;
      }
    

gives this in clang

    
    
      retNum(int, int):     # @retNum(int, int)
            mov     eax, esi
            ret
    

While icc and gcc give

    
    
      retNum(int, int):
            mov     eax, esi
            imul    eax, edi
            cdq
            idiv    edi
            ret
    
      retNum(int, int):
            imul      esi, edi                                      
            mov       eax, esi                                     
            cdq                                                     
            idiv      edi                                           
            ret
    

The clang version at first sight seems right. But then, thinking about it,
this is integer math:

    
    
      4/3 := 1
      1 * 3 := 3
    

3 != 4, so I believe gcc and icc returning 3 there is correct, while clang
returning 4 is not. Maybe someone more versed in C integer semantics can tell
us which are acceptable (knowing C, both might be OK).

~~~
anonymouz
Funny, if you swap the order of the operands in the multiplication, GCC 7.0
with -O3 does optimize it away:

    
    
      int retNum(int num, int num2) {
            return (num2 * num)/num;
      }
    

now becomes

    
    
      retNum(int, int):
              mov     eax, esi
              ret

~~~
harpocrates
It would be an interesting project to take in some (small) source program, and
try all sorts of algebraic rearrangements (maybe just associativity and
commutativity of addition and multiplication) to see if there is any
noticeable performance change.

~~~
matt_d
STOKE, a stochastic superoptimizer, is pretty interesting in this context:
[http://stoke.stanford.edu/](http://stoke.stanford.edu/) &
[https://github.com/StanfordPL/stoke](https://github.com/StanfordPL/stoke)

------
wibr
I know C but no Assembler, so this looks like a good way to get more familiar
with what's going on under the hood. It would be really neat if you could
click on each instruction/register to get a short summary, though.

~~~
rocky1138
This is such a good suggestion.

I'd love to see Motorola 68000 support.

------
geofft
Note that "GodBolt" isn't some clever Zeus-inspired service name, "Godbolt" is
just this fellow's last name.

~~~
AndrewOMartin
But the fellow himself may have some divine inspiration, as his work is often
next to godliness.

------
pjmlp
For D, [https://d.godbolt.org/](https://d.godbolt.org/)

Also, this is actually a repost from one year ago:

[https://news.ycombinator.com/item?id=11671730](https://news.ycombinator.com/item?id=11671730)

~~~
suprjami
Was gonna say, pretty sure I know about this site because of HN.

------
it
There's a Go version too: [https://go.godbolt.org](https://go.godbolt.org).

~~~
asymmetric
Yes, and the title is actually wrong: Godbolt is the last name of the creator,
not the project's name (which I believe is Compiler Explorer).

------
cfv
Can anyone please explain why this naive pow() implementation is so freaking
huge? I lack the chops to figure this out properly
[https://godbolt.org/g/YFvNWa](https://godbolt.org/g/YFvNWa)

~~~
jcranmer
The loop was unrolled with a large factor.

The loop was also vectorized, so that each multiply is a 4-element vector
multiplication, with the final result being a horizontal reduction.

The rest of the mess is trying to account for all of the cases where the input
power is not a multiple of 128.

------
zitterbewegung
I put in Duff's Device and it looked interesting.
[https://godbolt.org/g/SVhqKP](https://godbolt.org/g/SVhqKP)

[https://en.wikipedia.org/wiki/Duff's_device](https://en.wikipedia.org/wiki/Duff's_device)

------
beardog
Is this just a toy or does it have any place at all in real assembly projects?
(I don't know assembly besides having tinkered with it a bit)

Cool project regardless.

~~~
jdub
It's really useful to see what key pieces of code look like in assembly
(though easy enough to do locally), and reaaaaaaaallllly useful to see what
they look like across compilers and architectures.

Instead of finding (or worse, building) a compiler or cross compiler locally
and doing all the boring steps to compile properly (harder than it sounds) and
disassemble the code, you can just splat some code into this and take a peek.

Less about "assembly projects", more about breadth of info available to a
developer writing C/C++.

------
nowne
This is an amazing tool, I use it almost daily. Whenever I want to test an
idea, or see what is going on in the assembly, I go straight to godbolt.

~~~
source99
What kind of work do you do?

~~~
chrisseaton
I also use it at least weekly - working on a Ruby JIT I want to see what
assembly code is generated from simple C code and compare that to what I'm
getting.

------
syphilis2
I can't edit the code samples using a mobile Firefox browser. Attempting to
delete text and then type new text results in the deleted text reappearing
appended to whatever new text I typed.

~~~
mattgodbolt
Sadly the mobile support is pretty bad. It's on my list to fix at some point
but requires a bunch of changes in the underlying window library (golden-
layout.com), plus some work to reconfigure the layout for mobile.

------
Roboprog
Cool. I missed the target option list on the right the first time I looked at
the page.

The closest I've ever come to this back in the day was running (Borland) Turbo
Debugger's assembly view after building with Turbo C (w/ or w/out -O...), as
either 8086 or 80286 output - "x16". Yeah, that was a while back :-)

------
flukus
That's really cool. Is it actually compiling in the background? Is this a tool
you wrote?

~~~
pvg
It is running the specified compilers. The github page has more info on how it
works.

[https://github.com/mattgodbolt/gcc-explorer](https://github.com/mattgodbolt/gcc-explorer)

------
ndesaulniers
folks interested in doing this locally should play around with objdump

------
rawnlq
Xcode has a disassembly view that's really similar (and works with lldb!).

After moving back to Linux I haven't found a replacement IDE that's as nice
for doing C++ development.

~~~
hitlin37
I have heard good things about CLion. CodeLite works fine too for basic needs.

------
finbar
Awesome project. And just in time for an assembly-intensive computer
organization final.

------
nwmcsween
What would be incredibly useful is if I could do this within vim, in a
side-by-side split.

------
selckin
Java version @ [https://javap.yawk.at/](https://javap.yawk.at/)

~~~
the8472
That's just generated bytecode, not the assembly output on a particular
platform generated by the optimizing JIT.

Most of the optimizing magic happens in the JITs, not javac.

------
rado
Reminds me of SPHINX C--, which was great.

------
anthk

       square(int):
            mul     r0, r0, r0
            bx      lr
    

Mips is damn awesome.

------
teddyuk
We need a DevilBolt: Enter Assembly, get C!

------
cptank420
C

------
AstralStorm
Pity it isn't a recompiler.

------
poseid
Really nice project - I was playing with compilers and JavaScript a while ago,
see [http://embeddednodejs.com/compiler](http://embeddednodejs.com/compiler)
- as a simple experiment to learn about JavaScript's role in compilers.

------
Annatar
Should be "get assembler". Assembly or assembling is the process of using an
assembler to assemble assembler code, which is actually mnemonics.

~~~
bbcbasic
I like to call it machine code.

~~~
kaoD
But it isn't. Machine code is for computers, assembly for humans.

Although assembly is machine code represented with mnemonics, they're not the
same: assemblers take assembly and produce machine code.

