Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Then my results disagree with yours. Do you have the source code to your benchmark ?


By the way, I typically see #2 slightly and consistently faster than #1, which doesn't make sense to me. But I wouldn't expect a 3x difference in favor of #1.



The compiler is entirely eliminating your loop because you never do anything with the sum variable. Also, signed integer overflow is UB so you should use unsigned integers or compile with the non-standard -fwrapv option.


I'm not convinced of this. I modified the code to do something with the sums. #2 is now slightly slower than #1, but the absolute times and the ratios are otherwise about the same.

http://pastebin.com/vbUSdKGc


With your modified code I am now seeing #1 run about 3x faster than #2. I also still strongly suspect you weren't measuring anything at all before your modifications. Benchmarking is tricky and its hard to say if you are really measuring what you think you are. This is especially true with UB in your loop. It would be best to separate the program into 3 and then investigate the assembly.


Also make sure you are compiling with optimizations on.


Now working in an Ubuntu VM, since I know the tools better than I do XCode. I reduced TRIALS to 20, to reduce testing time. I also replaced summation with xor. My current code is here: http://pastebin.com/QcNu2A69

Here is what I see with default optimization:

    jao@minizack:/tmp$ gcc -std=c99 x.c
    jao@minizack:/tmp$ ./a.out
    INTERNAL time/scan (usec): 29608.000000		xor: 1722783152
    EXTERNAL UNSORTED time/scan (usec): 26688.105263		xor: 1722783152
    EXTERNAL SORTED time/scan (usec): 279771.105263		xor: 1722783152
    SORTED/UNSORTED: 10.482989
and with -O3:

    jao@minizack:/tmp$ gcc -std=c99 -O3 x.c
    jao@minizack:/tmp$ ./a.out
    INTERNAL time/scan (usec): 3636.421053		xor: 1722783152
    EXTERNAL UNSORTED time/scan (usec): 12114.263158		xor: 1722783152
    EXTERNAL SORTED time/scan (usec): 242141.263158		xor: 1722783152
    SORTED/UNSORTED: 19.988113
The INTERNAL time is 8x faster. EXTERNAL UNSORTED is 2x faster. EXTERNAL SORTED hasn't sped up much at all. My conclusions:

- With -O3, the difference between INTERNAL (#1) and EXTERNAL UNSORTED (#2) is close to what you are seeing.

- Loops cannot be optimized away in this version.

- The fact that -O3 speeds up EXTERNAL UNSORTED by 2x but EXTERNAL SORTED only a little makes sense if the latter is slow due to cache misses.


Well, benchmarking without optimizations enabled isn't very useful :)




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: