
Benchmarking correctly is hard (and techniques for doing it better) - ingve
http://jvns.ca/blog/2016/07/23/rigorous-benchmarking-in-reasonable-time/
======
gus_massa
A few more recommendations for informal benchmarks:

Each run should take between 5 and 10 seconds. (With more than 10 seconds it
gets boring.) With very short times there is a lot of noise, and you can
confuse the noise with a small signal. You can do shorter benchmarks, but then
you must open a statistics book and read it carefully before reaching
conclusions.
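
A minimal Python sketch of that idea (the function name and the 5-second
target are just for illustration, not something the article prescribes):

    import time

    def time_per_call(fn, target_seconds=5.0):
        # Keep doubling the iteration count until a single timed run lasts
        # at least ~target_seconds, so timer noise and startup effects are
        # a tiny fraction of the total.
        iterations = 1
        while True:
            start = time.perf_counter()
            for _ in range(iterations):
                fn()
            elapsed = time.perf_counter() - start
            if elapsed >= target_seconds:
                return elapsed / iterations  # average seconds per call
            iterations *= 2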

Repeat it at least 5 times. (Preferably in an interleaved order like
ABABABABAB, not AAAAABBBBB.) With 5 repetitions you can get a rough estimate
of the variation, and if the variation is much smaller than the difference,
then perhaps you can skip the statistics book. Otherwise, increase the run
time or the number of repetitions and use statistics.
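
A rough Python sketch of an interleaved ABAB comparison with 5 repetitions
(the harness and names below are my own illustration, not a standard tool):

    import statistics
    import time

    def timed_run(fn, iterations):
        start = time.perf_counter()
        for _ in range(iterations):
            fn()
        return time.perf_counter() - start

    def compare_abab(fn_a, fn_b, iterations, repetitions=5):
        # Alternate A and B (ABAB...) so slow drift, e.g. thermal
        # throttling or background load, hits both versions roughly equally.
        times_a, times_b = [], []
        for _ in range(repetitions):
            times_a.append(timed_run(fn_a, iterations))
            times_b.append(timed_run(fn_b, iterations))
        for name, times in (("A", times_a), ("B", times_b)):
            print(f"{name}: mean {statistics.mean(times):.4f}s, "
                  f"stdev {statistics.stdev(times):.4f}s")
        return times_a, times_b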

At least once in a while, run your benchmark method against two copies of the
same code. Just make two copies of the function and benchmark the difference
between them. The measured difference won't be exactly zero because of noise,
but it should be small. If your method can "prove" that one of the two exact
copies is much faster than the other, then your benchmarking method is wrong.
(This is very instructive; it's much easier to learn about the extent of
benchmarking noise by doing a few experiments than by reading all the warnings
in the books.)
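
A self-contained Python sketch of this null test (the workload and the
iteration counts are arbitrary, chosen only for illustration):

    import statistics
    import time

    def measure(fn, iterations=20_000, repetitions=5):
        runs = []
        for _ in range(repetitions):
            start = time.perf_counter()
            for _ in range(iterations):
                fn()
            runs.append(time.perf_counter() - start)
        return statistics.mean(runs), statistics.stdev(runs)

    # Two byte-for-byte identical copies of the code under test.
    def copy_a():
        return sum(i * i for i in range(1_000))

    def copy_b():
        return sum(i * i for i in range(1_000))

    mean_a, sd_a = measure(copy_a)
    mean_b, sd_b = measure(copy_b)
    print(f"copy A: {mean_a:.4f}s +/- {sd_a:.4f}s")
    print(f"copy B: {mean_b:.4f}s +/- {sd_b:.4f}s")
    # If a benchmarking method "proves" one identical copy is much faster
    # than the other, the method itself is producing the difference.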

------
joshstrange
> I don't point this out to make fun of the researchers for coming up with an
> incorrect result. I'm pretty sure they're way better at performance analysis
> than I am. Instead, I think this is a really good illustration that
> benchmarking programs and figuring out which one is faster is really hard --
> much much harder than you might think.

And this is how you point out a problem and then offer a solution. While not
connected at all, it brings to mind this story [0] (and my response [1]) from
a couple of days ago, mainly for the stark contrast.

[0] https://news.ycombinator.com/item?id=12135484

[1] https://news.ycombinator.com/item?id=12137433

------
eatbitseveryday
Tim Harris gave a talk[1] earlier this year illustrating the pitfalls of
measurements and analysis in systems research.

[1] https://timharris.uk/misc/2016-nicta.pdf

------
acd
Some mobile phones optimise for certain popular benchmarks, for example by
overclocking during testing. So if you rerun the test over and over, you get
different results due to thermal throttling.

In application space, as far as I know the best way to test is to compare real
workloads over prolonged periods of time. Getting real test data can be hard
but not impossible, for example by recording alpha/beta users using the app.

------
0xmohit
Previously posted at https://news.ycombinator.com/item?id=12151146

~~~
chrisseaton
There are no comments there, so there's not much point in linking to it.

~~~
0xmohit
Does that imply that HN creates a new link if a previous submission has no
comments?

The previous one was posted some 12 hours before this one.

~~~
dang
HN creates a new link after 8 hours if the previous post didn't get
significant attention. It also treats different URLs as different stories,
even if they only differ a little.

This approach lets a lot of dupes through, but we do it that way because it's
more important to let good stories have multiple chances at attention.
Otherwise it's mostly a lottery which ones achieve liftoff from /newest.

It's true that the convention here is only to post links to previous
submissions when there are comments on them (or in rare cases, no comments but
many points), since it's pointless to look at them otherwise.

------
mrb
This reminds me of what I wrote a few years ago: "Many SSD Benchmark Reviews
Contain Flaws" -
http://blog.zorinaq.com/many-ssd-benchmark-reviews-contain-flaws/

