Hacker News
Count to 1B in Ruby, Python, R, JavaScript and C++ (github.com/stevecondylios)
4 points by nomilk 3 months ago | 12 comments



Some compilers should be able to optimize the loop away and just display the final value when the code runs.

Edit: discussed previously https://news.ycombinator.com/item?id=35398298


Great find! I think actually running the loop is essential; otherwise it's an unfair comparison. But TIL that compilers can do that, which is very interesting. I'm now wondering whether there's a way to force compilers not to do it (so as to make for fair performance comparisons) — e.g. it sounds like Rust may do the calculation at compile time and simply print a constant at run time.


The first thing to do is pass the iteration count as an argument.

That makes it less likely that a compiler optimizes this, but doesn't guarantee it won't: it is still fairly easy for the compiler to prove how often the loop runs and what the value of n is afterwards.

Edit: for getting an idea about what compilers can do, read https://kristerw.blogspot.com/2019/04/how-llvm-optimizes-geo... for examples of what LLVM (the backend of various language implementations, including clang) does.


Also, what is he really trying to benchmark? Would it be informative to generate 10^9 random numbers and add them up, or is there a specific goal of finding how fast a counter is incremented?


It’s an arbitrary benchmark, inspired by this: https://youtube.com/watch?v=VioxsWYzoJk


I got curious and wrote it in assembly (disclaimer: I haven't written any assembly since university; it was mostly ChatGPT). See the attached gist. This should be the gold standard for a "fair" loop — one that's actually looping in raw assembly. https://gist.github.com/rbitr/0649972657f5d0d959eeb71a5b3e02...


Found this fun comparison, but this time measuring development time: https://www.youtube.com/watch?v=3PcIJKd1PKU


That removes the binary-to-decimal conversion of the result, which could easily take a significant fraction of the running time when using a smart compiler.


I don't think that's material; it definitely isn't for code that's actually running the loop. I just tried timing it with the pipe to hexdump and, of course, it makes no difference. One could do the same, or modify the code to include the conversion, if worried about the impact.


Results:

  c++ 1.129 seconds
  python 41.674 seconds
  ruby 8.730 seconds
  R 13.602 seconds
  js (node) 0.481 seconds

What surprises me: Python slower than R and ruby. js faster than c++.


> What surprises me: Python slower than R and ruby. js faster than c++.

Nowadays Ruby includes a JIT compiler.


They compile the "C++" (really one particular implementation of C++, likely Apple clang 15.0.0) without any optimizations:

  g++ -std=c++14 1bn.cpp
  time ./a.out
  1000000000
When I run this on a MacBook Air M2 without optimizations versus with "-O3", wall-clock running time decreases from about 700 ms to 6 ms (sometimes less than that).

(Also, with “-O3”, changing the loop limit doesn’t affect running time, so clang optimizes away that loop)



