Hacker News new | past | comments | ask | show | jobs | submit login

Thanks for putting this together!

My immediate question was whether another compiler might be more competitive with the assembly. On my Sandy Bridge processor, I found that although GCC was the slowest, the assembly was still the clear winner:

  gcc 4.8.0: 0.801596113 seconds
  icc 14.0.1: 0.739297534 seconds
  clang 3.2: 0.706446818 seconds
  assembly: 0.104038212 seconds
I was surprised by Clang here. It's an older version than the other two, yet fastest. A quick glance at 'perf stat' (which I used for the timing) says that although it's executing fewer instructions per cycle, it's managing to use fewer instructions than the other two.

Although it doesn't seem to make much of a difference here, those are odd flags for Sandy Bridge. Core2 is a previous generation, and it supports AVX which came out after SSE4.2. If you did to specify SB, you'd want the unwieldy "-march=corei7-avx". But probably better just to use '-mavx' or 'march=native', along with -Ofast or -O3.




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: