
When competing with C, fudge the benchmark - FrozenXZeus
https://medium.com/@n0mad/when-competing-with-c-fudge-the-benchmark-16d3a91b437c
======
gus_massa
I like the article, but I don't agree with the conclusion.

> _When profiling, it is important to always run your code over an extended
> period of time, preferably seconds, in order to smooth out any
> irregularities that may occur._

I agree. Moreover, for very small differences (1%-5%) I prefer the benchmark
to be long enough that the difference is at least 200 ms, or preferably
1 second.

> _When running extremely short pieces of code, in this case, around 500
> nanoseconds (The original measurements did not have units, this is an
> educated guess)._

I looked at the other article [https://markkarpov.com/post/migrating-text-
metrics.html](https://markkarpov.com/post/migrating-text-metrics.html). The
graphs of the times don't look like the graphs of a benchmark of 500 ns ≈ 1 µs.
They are too clean: a graph of measurements of a few ms each is very noisy,
with lots of crazy values that make the results impossible to compare. The
graphs show two nice sets of bars that apparently follow a smooth curve (not
drawn in the graph). So it was definitely not a measurement of a few 0.5 µs
runs; he was repeating it many times to get a long enough total time.

> _In my benchmarks, I’ve made the C code and the Haskell code run on inputs
> of 100 thousand, 1 million, and 10 million long strings to get some reliable
> measurements. Each measurement is done 1000 times to make sure we can obtain
> some decent results._

For comparison, the original article uses strings of length 8-160, i.e. 1000
times shorter. So he _is_ now changing the benchmark. Perhaps Haskell has a
sweet spot with smaller strings, where the compiler can be smarter. Perhaps
the memory use of the algorithm is different. So this is definitely a
different benchmark. (A 10-100 character string is probably more usual in an
application of the Hamming distance.)

He should have benchmarked a 100-character string with 1,000,000 repetitions,
instead of a 100,000-character string with 1,000 repetitions.

The run time is not linear in the length of the string; it looks quadratic.
So I guess it would be better to benchmark a 100-character string with
1,000,000,000 repetitions. Either way, some tweaks are necessary to make the
comparison fair.
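
A minimal sketch of such a harness in C (the `hamming_distance` here is a
hypothetical stand-in, since the original benchmark code isn't linked):

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the function under test: the Hamming
   distance of two equal-length strings. */
static int hamming_distance(const char *a, const char *b, size_t len) {
    int d = 0;
    for (size_t i = 0; i < len; i++)
        d += (a[i] != b[i]);
    return d;
}

int main(void) {
    /* Two 100-character strings that differ in every position. */
    char a[101], b[101];
    for (int i = 0; i < 100; i++) {
        a[i] = 'a' + i % 26;
        b[i] = 'A' + i % 26;
    }
    a[100] = b[100] = '\0';

    const long reps = 1000000; /* many repetitions of a short input */
    volatile long sink = 0;    /* keep the compiler from deleting the loop */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < reps; i++)
        sink += hamming_distance(a, b, 100);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%ld calls in %.3f s (%.1f ns/call, sink=%ld)\n",
           reps, secs, secs / reps * 1e9, (long)sink);
    return 0;
}
```

The per-call figure is only meaningful because the total run is long; timing a
single ~0.5 µs call directly would be swamped by timer resolution and noise.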

One of the problems is that the other article doesn't have a link to the
complete code for the benchmark, so you must guess the details. How are the
strings generated? Which distribution? How many repetitions? This article has
a link to the code on GitHub, so you can see all the small hidden assumptions.

~~~
bjourne
For such small strings, the overhead will dominate, in particular the time to
CALL and RETurn from the function. Haskell actually has an advantage there
because it can use tail-call optimization, obviating the need for CALL/RET,
something a C function compiled in isolation can't do.

Which is why you should always use the C99 "inline" keyword for performance-
sensitive functions.
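
A minimal sketch of the idea, reusing the same hypothetical hamming_distance
(with `static inline` in a header, the usual portable form of C99 inline, the
compiler can paste the body into each call site and skip the CALL/RET
entirely):

```c
/* hamming.h: hypothetical header for the function under test.
   `static inline` gives every file that includes this header its
   own inlinable copy, so the optimizer can replace each call with
   straight-line code instead of a CALL/RET pair. */
#ifndef HAMMING_H
#define HAMMING_H

#include <stddef.h>

static inline int hamming_distance(const char *a, const char *b, size_t len) {
    int d = 0;
    for (size_t i = 0; i < len; i++)
        d += (a[i] != b[i]);
    return d;
}

#endif /* HAMMING_H */
```

Without something like this (or link-time optimization), a C function compiled
in its own translation unit always pays the call overhead, which matters when
the whole job is only a few hundred nanoseconds.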

