By the way, I typically see #2 slightly and consistently faster than #1, which doesn't make sense to me. But I wouldn't expect a 3x difference in favor of #1.
The compiler is entirely eliminating your loop because you never do anything with the sum variable. Also, signed integer overflow is undefined behavior (UB), so you should use unsigned integers or compile with the non-standard -fwrapv option.
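To illustrate both points, here is a minimal sketch, not the code from the question. The names `sum_to`, `run_once` and `sink` are hypothetical. It accumulates into an unsigned integer, so overflow wraps instead of being UB, and it stores the result through a `volatile` object so the compiler cannot treat it as unused.

```c
#include <stdint.h>

/* Hypothetical stand-in for the loop being benchmarked.
 * uint32_t arithmetic wraps modulo 2^32, so overflow here is well defined. */
uint32_t sum_to(uint32_t n) {
    uint32_t sum = 0;
    for (uint32_t i = 0; i < n; i++) {
        sum += i;
    }
    return sum;
}

/* A volatile object: every store to it is an observable side effect,
 * so the computed value cannot simply be thrown away. */
volatile uint32_t sink;

/* Runs the loop once and publishes the result through the volatile sink. */
uint32_t run_once(uint32_t n) {
    sink = sum_to(n);
    return sink;
}
```

Note that this only keeps the result alive. An optimizer is still free to replace a loop this simple with a precomputed or closed-form value, so checking the generated assembly remains worthwhile.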
I'm not convinced of this. I modified the code to do something with the sums. #2 is now slightly slower than #1, but the absolute times and the ratios are otherwise about the same.
With your modified code I am now seeing #1 run about 3x faster than #2. I also still strongly suspect you weren't measuring anything at all before your modifications. Benchmarking is tricky, and it's hard to say whether you are really measuring what you think you are. This is especially true with UB in your loop. It would be best to separate the program into three separate programs and then investigate the assembly of each.
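One way to isolate a single variant might look like the sketch below. It is an assumption about how such a split could be done, not the code from the question: `variant_under_test` is a hypothetical placeholder for one of the loops, `time_variant` is a hypothetical helper, and `__attribute__((noinline))` is a GCC/Clang extension used so the function shows up as its own symbol in the assembly. Timing uses POSIX `clock_gettime`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* request clock_gettime from <time.h> */
#endif
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical placeholder for ONE variant; the real loop would go here.
 * noinline (GCC/Clang extension) keeps it as a separate function. */
__attribute__((noinline))
uint32_t variant_under_test(const uint32_t *data, size_t n) {
    uint32_t acc = 0;                 /* unsigned, so wraparound is defined */
    for (size_t i = 0; i < n; i++) {
        acc += data[i];
    }
    return acc;
}

/* Times a single call and hands the result back to the caller,
 * so the computation has a visible use. Returns elapsed seconds. */
double time_variant(const uint32_t *data, size_t n, uint32_t *result_out) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *result_out = variant_under_test(data, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

With one such file per variant, compiling with `-S` (for example `gcc -O2 -S`) writes the assembly to a `.s` file, which makes it possible to confirm that the loop is actually present before trusting any timing.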
I'm now working in an Ubuntu VM, since I know the tools there better than I know Xcode. I reduced TRIALS to 20 to cut testing time, and I replaced summation with xor. My current code is here: http://pastebin.com/QcNu2A69