Hacker News
Go vs. C vs. pypy/Python loop performance comparison (karlheinzniebuhr.github.io)
5 points by karlheinz_py on Sept 29, 2015 | 7 comments



Looks like the Go code is using an int, but the C code declares a long.


It's also buggy: it declares a signed long, but prints it with an unsigned conversion (%lu).

My money is on the time being spent printing the result, by the way, not in the loop. The C program parses a format string and produces more output (the characters 'sum: '). Buffering, or the lack of it, may also play a role.


I'll take your money.

First, a change because the optimizer will pre-compute the sum:

  #include <stdio.h>
  #include <stdlib.h>
   
  int main (int argc, char *argv[])
  {
    long a, bound=10000000;
    long sum = 0;
    if (argc > 1) {
      bound = atol(argv[1]); /* atol, since bound is a long */
    }
    /* for loop execution */
    for( a = 0; a < bound; a++ ) {
      sum += a;
    }
    printf("sum: %lu\n", sum);
    return 0;
  }
I'll time the performance across a range of values.

  % cc tmp.c && time ./a.out 1000000
  sum: 499999500000
  0.005u 0.000s 0:00.00 0.0%	0+0k 0+0io 1pf+0w
  % cc tmp.c && time ./a.out 10000000
  sum: 49999995000000
  0.030u 0.000s 0:00.03 100.0%	0+0k 0+0io 1pf+0w
  % cc tmp.c && time ./a.out 100000000
  sum: 4999999950000000
  0.303u 0.000s 0:00.33 90.9%	0+0k 0+0io 1pf+0w
You'll notice that my timing for 10000000, at 0.03 seconds, is comparable to the timings in the essay.

If the printf overhead were the dominant cost, we would expect to see little variance across the timings. Instead, there's a linear increase with the bound, which is exactly what we expect if the loop is the primary cost.

So no, most of the time is not spent in printing the result.

I'll also test with optimizations enabled:

  % cc -O3 tmp.c && time ./a.out 1000000
  sum: 499999500000
  0.000u 0.000s 0:00.00 0.0%	0+0k 0+0io 1pf+0w
  % cc -O3 tmp.c && time ./a.out 10000000
  sum: 49999995000000
  0.000u 0.001s 0:00.00 0.0%	0+0k 0+0io 1pf+0w
  % cc -O3 tmp.c && time ./a.out 100000000
  sum: 4999999950000000
  0.000u 0.000s 0:00.00 0.0%	0+0k 0+0io 1pf+0w
  % cc -O3 tmp.c && time ./a.out 1000000000
  sum: 499999999500000000
  0.000u 0.000s 0:00.00 0.0%	0+0k 0+0io 1pf+0w
The optimizer does pretty well on this code.


"First, a change because the optimizer will pre-compute the sum:"

You are changing the rules; you will not get my money :-)

And thanks for the educational reply.


To be fair, you didn't say how much or how I would get it. I imagine if I were to show up at your doorstep with cap in hand, I might get a penny off of you.

If the optimizer is enabled and the upper bound is hard-coded, the compiler pre-computes the loop. This is the entire code from llvm-gcc:

  _main:
  0000000100000f10	pushq	%rax
  0000000100000f11	leaq	72(%rip), %rdi ## literal pool for: sum: %lu
  0000000100000f18	movabsq	$49999995000000, %rsi
  0000000100000f22	xorb	%al, %al
  0000000100000f24	callq	0x100000f34 ## symbol stub for: _printf
  0000000100000f29	xorl	%eax, %eax
  0000000100000f2b	popq	%rdx
  0000000100000f2c	ret
Which means you are right - it's hard to beat a pre-computed constant.

I think I owe you a penny.


Thanks for this insight. I've countered the problem with command-line args: now the C compiler can't guess how many loop iterations there will be, and the test shows the same speed as C without optimisation. But Go keeps a consistent speed, faster than C. I'm curious what kind of optimisation the Go compiler applies. See the update for further details.


Edit: I thought C optimisation wasn't working, but I repeated the tests and C keeps the same speed with optimisation despite the number being passed as a command-line argument. This also suggests that the value is not calculated at compile time.



