

Benchmarking Ruby with GCC and Clang - p8952
https://p8952.info/ruby/2014/12/12/benchmarking-ruby-with-gcc-and-clang.html

======
arthursilva
Similar tests (and results) for Postgres
[http://blog.pgaddict.com/posts/compiler-optimization-vs-
post...](http://blog.pgaddict.com/posts/compiler-optimization-vs-postgresql)

------
yxhuvud
It would have been nice to see whether the differences are actually noticeable
or not. Yes, one version may be faster than another, but simply ordering a set
of tests like that doesn't show whether it is worth the trouble of doing
anything about it.

~~~
p8952
I have linked the data next to the graph; it's called "Raw Data". I'm unsure
how best to represent it to show real differences, though. Plotting the raw
scores for each test on its own graph would be the most accurate way, but not
very easy to read.

------
lorenzhs
> All tests were run on AWS from an m3.medium EC2 instance

So you're comparing running times measured on a shared host? That is generally
not considered best practice if you want meaningful numbers.

~~~
p8952
m3.medium is shared in the sense that it is a virtualized instance on top of a
hypervisor; however, unlike the t2 instance range, it provides fixed resources
rather than burstable ones. Perhaps not best practice, but it's not an uncommon
way of doing benchmarks. The TechEmpower Framework Benchmarks[1], for instance,
are run on AWS.

1:
[http://www.techempower.com/benchmarks/#section=data-r9&hw=ec...](http://www.techempower.com/benchmarks/#section=data-r9&hw=ec2&test=json)

~~~
DannyBee
While not uncommon, the performance variation of most of these virtualized
instances is so high that your numbers are probably within the margin of
error.

(For reference, I have a team that does nothing but maintain a performance
benchmarking harness and test lab, so I know these things pretty cold :P)

I also don't see that the benchmarks were even run multiple times. I tried to
follow the ruby-benchmark-suite all the way down the rabbit hole, but I don't
see anywhere that it runs the benchmarks multiple times, let alone reports the
variability.
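
A minimal sketch of what repeated runs with variance reporting might look
like, assuming Python and a hypothetical benchmark command (not the actual
ruby-benchmark-suite harness):

    import statistics
    import subprocess
    import time

    # Hypothetical benchmark command; substitute the real invocation.
    CMD = ["ruby", "bm_app_fib.rb"]
    RUNS = 10

    samples = []
    for _ in range(RUNS):
        start = time.perf_counter()
        subprocess.run(CMD, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)

    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    print(f"mean {mean:.3f}s  stdev {stdev:.3f}s  ({stdev / mean:.1%} of mean)")

If the standard deviation is comparable to the differences between compilers,
the ranking is mostly noise.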

------
krisdol
Unsurprised at the superior performance of GCC, but I am surprised that Ruby
ships with -O3. Why would they choose that optimization level?

~~~
mhd
No automatic build system that tries different configurations against standard
benchmarks for each release?

If that isn't done, then, well, it's one louder.

~~~
desdiv
Even better, use different optimization levels _per file_ and benchmark them.

If you have N files, then that's only 3^N builds to test out each possible
combination of -O1, -O2, and -O3. That shouldn't take too long. /s
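
The arithmetic behind the sarcasm is easy to check: three levels per file gives
3^N builds. A toy sketch in Python, with made-up file names:

    from itertools import product

    # Made-up file names; three optimization levels per file.
    files = ["gc.c", "vm.c", "string.c", "array.c"]
    levels = ["-O1", "-O2", "-O3"]

    combos = list(product(levels, repeat=len(files)))
    print(len(combos))                    # 3**4 = 81 builds for just four files
    print(dict(zip(files, combos[0])))    # one candidate assignment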

~~~
vinkelhake
Why settle for just -O levels? You could fiddle with all the flags that might
affect performance. You could even write a program that automates this and
employs genetic algorithms to reach good results quicker...

[http://stderr.org/doc/acovea/html/acoveaga.html](http://stderr.org/doc/acovea/html/acoveaga.html)
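
ACOVEA itself is a C++ tool; a toy Python sketch of the same idea (evolving a
set of flags with a genetic algorithm) might look like the following, with an
invented fitness function standing in for a real build-and-benchmark cycle:

    import random

    # A handful of real GCC flags; the weights below are invented.
    FLAGS = ["-funroll-loops", "-fomit-frame-pointer", "-finline-functions",
             "-ftree-vectorize", "-ffast-math"]
    WEIGHTS = [0.4, 0.1, 0.3, -0.2, 0.05]

    def fitness(genome):
        # Stand-in for "rebuild Ruby with these flags and run the benchmark".
        return sum(w for gene, w in zip(genome, WEIGHTS) if gene)

    def crossover(a, b):
        cut = random.randrange(1, len(FLAGS))
        return a[:cut] + b[cut:]

    def mutate(genome, rate=0.1):
        return [gene ^ (random.random() < rate) for gene in genome]

    population = [[random.randint(0, 1) for _ in FLAGS] for _ in range(20)]
    for _ in range(30):
        population.sort(key=fitness, reverse=True)
        parents = population[:10]
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(len(parents))]
        population = parents + children

    best = max(population, key=fitness)
    print([flag for flag, gene in zip(FLAGS, best) if gene])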

------
Sjlver
This ranking uses a method called Borda Count
([https://en.wikipedia.org/wiki/Borda_count](https://en.wikipedia.org/wiki/Borda_count)).
It can lead to quite arbitrary results, for a number of reasons. One example
is that being 9th is three times better than being 11th, whereas being first
is only marginally better (relatively speaking) than being third.

Better methods are readily available, for example the Schulze method
([https://en.wikipedia.org/wiki/Schulze_method](https://en.wikipedia.org/wiki/Schulze_method)).
I wonder how much these rankings would change...
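
A minimal Borda tally makes the distortion concrete; the candidate names here
are placeholders:

    # Each ranking lists candidates from best to worst; points are n-1 down to 0.
    def borda(rankings):
        scores = {}
        for ranking in rankings:
            n = len(ranking)
            for position, candidate in enumerate(ranking):
                scores[candidate] = scores.get(candidate, 0) + n - 1 - position
        return sorted(scores.items(), key=lambda kv: -kv[1])

    # With 12 candidates, 1st place earns 11 points and 3rd earns 9 (ratio 1.2),
    # while 9th earns 3 and 11th earns 1 (ratio 3.0).
    print(borda([[f"compiler-{i}" for i in range(12)]]))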

------
jrapdx3
One thing I'm wondering about is whether running the tests on FreeBSD 10.1
would make a difference.

The core FreeBSD 10 system is compiled with Clang. Since Ruby uses system
libraries, the question is whether a Clang- vs. GCC-compiled Ruby runtime
would produce different results in a Clang-compiled environment versus a
GCC-compiled one.

Hard to know how it would matter, but it does seem conceivable that it might.

~~~
krick
Benchmarking on another OS would be much more than _benchmarking compilers_. A
different OS is a different OS: it manages memory differently and schedules
processes differently. There is no way it would be a "cleaner" benchmark.

If you want to use as little code produced by a different compiler as
possible, you can just link the libs you need statically, compiling them with
whatever compiler you want (of course it's possible you'll run into obscure
compiler-specific errors that way, but it's doable if you want it that much).

------
masklinn
Would be interesting to see if -Os improves performance over -O2, especially
for GCC 4.9.

------
wbhart
What compiler optimisation levels were used for Clang?

~~~
p8952
Sorry, I've updated that now. -O2 was used for all Clang variants.

------
Someone
Doing the comparison as a ranking is bad, as the result can change if you add
or remove compilers. For example, with two tests T and U and four compilers C,
D, E and F:

    
    
      T: C D E F
      U: D E F C
    

Looking at all four, D is better than C. If you hadn't looked at E and F, the
two would tie.
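
A quick check of the numbers, assuming the Borda-style points-per-rank tally
described above:

    # Two tests, four compilers, ranked best to worst.
    T = ["C", "D", "E", "F"]
    U = ["D", "E", "F", "C"]

    def borda(rankings):
        scores = {}
        for ranking in rankings:
            for position, name in enumerate(ranking):
                scores[name] = scores.get(name, 0) + len(ranking) - 1 - position
        return scores

    print(borda([T, U]))   # {'C': 3, 'D': 5, 'E': 3, 'F': 1} -- D beats C
    # Ignoring E and F, each compiler wins one test: a tie.
    print(sum(t.index("C") < t.index("D") for t in (T, U)), "of 2 tests won by C")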

