Benchmarking Ruby with GCC and Clang (p8952.info)
57 points by p8952 on Dec 17, 2014 | 17 comments

Similar tests (and results) for Postgres http://blog.pgaddict.com/posts/compiler-optimization-vs-post...

It would have been nice to see if the changes actually were noticeable or not. Yes, one version may be faster than another, but simply ordering a set of tests like that doesn't show if it is worth the trouble doing anything about it.

I have linked the data next to the graph; it's called "Raw Data". I'm unsure how best to represent it to show real differences, though. Putting the raw scores for each test on their own graph would be the most accurate way, but it would not be very easy to read.

> All tests were run on AWS from an m3.medium EC2 instance

So you're comparing running times measured on a shared host? This is generally not considered a best practice if you want meaningful numbers.

m3.medium is shared in the sense that it is a virtualized instance on top of a hypervisor; however, unlike the t2 instance range, it provides fixed resources rather than burstable ones. Perhaps not the best practice, but it's not an uncommon way of doing benchmarks. The TechEmpower Framework Benchmarks[1] are run on AWS, for instance.

1: http://www.techempower.com/benchmarks/#section=data-r9&hw=ec...

While not uncommon, the performance variation of most of these virtualized instances is so high that the differences between your numbers are probably within the margin of error.

(For reference, I have a team that does nothing but maintain a performance benchmarking harness and test lab, so I know these things pretty cold :P)

I also don't see that the benchmarks were even run multiple times. I tried to follow the ruby-benchmark-suite all the way down the rabbit hole, but I don't see anywhere that it runs benchmarks multiple times, let alone reports the variability between runs.
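To make the point about repeated runs concrete, here is a minimal Python sketch of timing a workload several times and reporting the spread. It is not how ruby-benchmark-suite works; `time_runs`, `summarize`, and `example_workload` are hypothetical names, and the workload is only a stand-in for a real benchmark.

```python
import statistics
import time


def time_runs(workload, runs=10):
    """Run `workload` several times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return the mean, sample standard deviation, and coefficient of variation."""
    mean = statistics.mean(durations)
    stdev = statistics.stdev(durations)  # requires at least two samples
    return {"mean": mean, "stdev": stdev, "cv": stdev / mean}


def example_workload():
    # Hypothetical stand-in for one benchmark from the suite.
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    stats = summarize(time_runs(example_workload, runs=10))
    print(f"mean={stats['mean']:.4f}s stdev={stats['stdev']:.4f}s cv={stats['cv']:.1%}")
```

If the coefficient of variation on a given host is larger than the gap between two compilers, the ordering of those two compilers probably says very little.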

Unsurprised at the superior performance of GCC, but I am surprised that Ruby ships with -O3. Why would they choose that optimization level?

No automatic build system that tries different configurations against standard benchmarks for each release?

If that isn't done, then, well, it's one louder.

Even better, use different optimization levels per file and benchmark them.

If you have N files, then that's only 3^N builds to test out each possible combination of O, O2, and O3. That shouldn't take too long. /s
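For a sense of scale, a rough Python sketch that enumerates every per-file assignment of the three levels. The file names are hypothetical, and nothing is actually compiled here; it only counts and lists the configurations.

```python
from itertools import product

# The three levels named in the comment above.
LEVELS = ("-O", "-O2", "-O3")


def build_configurations(files, levels=LEVELS):
    """Yield one {file: level} mapping for every possible per-file assignment."""
    for combo in product(levels, repeat=len(files)):
        yield dict(zip(files, combo))


# Hypothetical source files, purely for illustration.
files = ["a.c", "b.c", "c.c"]
configs = list(build_configurations(files))

print(len(configs))  # 27, i.e. 3**3
print(3 ** 20)       # 3486784401 builds for just 20 files
```

Three files are manageable; twenty already mean billions of builds, which is the joke.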

Why settle for just -O levels? You could fiddle with all the flags that might affect performance. You could even write a program that automates this and employs genetic algorithms to reach good results quicker...


One thing I'm wondering about is whether running the tests on FreeBSD 10.1 would make a difference.

The core FreeBSD 10 system is compiled with Clang. Since Ruby uses system libraries, the question is whether a Clang-compiled vs. GCC-compiled Ruby runtime would produce different results on a Clang-compiled system than on a GCC-compiled one.

Hard to know how it would matter, but it does seem conceivable that it might.

Benchmarking on another OS would measure much more than the compilers. A different OS is a different OS: it manages memory differently and schedules processes differently. There is no way it would be a "cleaner" benchmark.

If you want to use as little code produced by a different compiler as possible, you can just statically link the libs you need, compiling them with whatever compiler you want (of course it's not impossible that you'll run into obscure compiler-specific errors doing that, but whatever, it's doable if you want it that much).

This ranking uses a method called Borda Count (https://en.wikipedia.org/wiki/Borda_count). It can lead to quite arbitrary results, for a number of reasons. One example is that being 9th is three times better than being 11th, whereas being first is only marginally better (relatively speaking) than being third.

Better methods are readily available, for example Schulze's (https://en.wikipedia.org/wiki/Schulze_method). I wonder how much these rankings would change...
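The Borda arithmetic above can be checked with a small Python sketch. It assumes 12 ranked entries and one common scoring convention (points = number of entries minus rank); both are assumptions picked because they reproduce the "three times better" figure, not details confirmed by the article.

```python
def borda_points(rank, n_entries):
    """Points for finishing at `rank` (1 = best), assuming points = n_entries - rank."""
    return n_entries - rank


N_ENTRIES = 12  # assumed number of ranked compiler configurations

# 9th place earns 3 points and 11th earns 1, so 9th counts three times as much.
ratio_9_vs_11 = borda_points(9, N_ENTRIES) / borda_points(11, N_ENTRIES)

# 1st place earns 11 points and 3rd earns 9, a ratio of only about 1.22.
ratio_1_vs_3 = borda_points(1, N_ENTRIES) / borda_points(3, N_ENTRIES)

print(ratio_9_vs_11)           # 3.0
print(round(ratio_1_vs_3, 2))  # 1.22
```

The same two-place gap is worth a factor of 3 near the bottom of the table and about 1.22 at the top, even though the underlying timing differences could be anything.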

It would be interesting to see whether Os improves performance over O2, especially for 4.9.

What compiler optimisation levels were used for Clang?

Sorry, I've updated that now. O2 was used for all Clang variants.

Doing the comparison as a ranking is bad, as the result can change if you add or remove compilers. For example, with two tests T and U and four compilers C, D, E and F:

  T: C D E F
  U: D E F C

Looking at all four, D is better than C. If you hadn't looked at E and F, the two would tie.
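The T/U example can be worked through in a few lines of Python. This sketch assumes a Borda-style score where, within each test, a compiler gets one point for every compiler ranked below it; `borda_totals` is a hypothetical helper, not the article's code.

```python
def borda_totals(rankings, compilers):
    """Sum Borda points over all tests, counting only the given compilers."""
    totals = {c: 0 for c in compilers}
    for order in rankings.values():
        # Drop compilers that are not part of this comparison, keeping the order.
        kept = [c for c in order if c in compilers]
        for position, c in enumerate(kept):
            totals[c] += len(kept) - 1 - position
    return totals


# Orderings from the example above, best first.
rankings = {
    "T": ["C", "D", "E", "F"],
    "U": ["D", "E", "F", "C"],
}

all_four = borda_totals(rankings, ["C", "D", "E", "F"])
just_two = borda_totals(rankings, ["C", "D"])

print(all_four)  # {'C': 3, 'D': 5, 'E': 3, 'F': 1}
print(just_two)  # {'C': 1, 'D': 1}
```

With all four compilers D scores 5 to C's 3; with E and F removed they tie at 1 each, although nothing about C or D themselves has changed.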
