

Dyeing the cheese orange – Beware of benchmarks - bslatkin
http://www.onebigfluke.com/2013/12/dyeing-cheese-orange-beware-of.html

======
mistercow
The name for this is a "false proxy", and its confounding effects are
pervasive just about anywhere you look. The problem is that any time you use a
proxy to make decisions, you create an incentive to falsify that proxy.

The worst version of the trap is when someone manipulates a proxy while honestly
believing that they are helping.

~~~
Sniffnoy
See also:
[https://en.wikipedia.org/wiki/Goodhart%27s_law](https://en.wikipedia.org/wiki/Goodhart%27s_law)

------
noelherrick
This analogy does not work. He's trying to make the point that we don't know
what caused PayPal's application to speed up and Node.js may not be the root
cause at all. The cheese analogy is not correct since he's claiming their
benchmark is like the orange color - a false proxy for what you really want:
good tasting cheese. You do really want your application to go faster, though.
It's not a false proxy - it's what this team was optimizing for. They had a
measurable increase in success - they made better cheese.

~~~
raganwald
Yes!

There are two blog posts here, both interesting, but not deeply connected. One
is about false proxies, which is specifically about people manipulating
perceptions.

The other is about getting "measurably better cheese" but not understanding
why it's measurably better. Speed is not a proxy for speed, it's just speed.
The meme here is that correlation does not equal causation.

~~~
bslatkin
The point is speed is a false proxy for the quality of Node.js. The results
don't necessarily justify using it for everything going forward.

~~~
dragonwriter
> The point is speed is a false proxy for the quality of Node.js.

Well, no, speed is the actual measure of interest. The point is that
reimplementations that change language _also_ often change other design
features, and that without knowing what else changed one cannot validate the
attribution of the post-change speed improvement to Node.js.

~~~
bslatkin
I think we're saying the same thing :)

------
jamesaguilar
Most of the nodejs success stories I've read have taken a serial, blocking
system and made it parallel and non-blocking. The wins in this context are
hardly surprising, and say little about the value of nodejs specifically.
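To make that point concrete, here is a minimal sketch in Python (deliberately not Node.js). It assumes each backend call is pure waiting, simulated with `asyncio.sleep`. The 0.1 s latency and the names `fake_io_call`, `serial`, `concurrent` and `compare` are hypothetical. The same ten calls run one after another take roughly ten times as long as when their waits overlap, and nothing about the language changed:

```python
import asyncio
import time

# Hypothetical stand-in for a backend call (DB query, HTTP request).
# The 0.1 s latency is an assumption chosen for illustration only.
SIMULATED_LATENCY = 0.1


async def fake_io_call(i: int) -> int:
    await asyncio.sleep(SIMULATED_LATENCY)  # pure waiting, no CPU work
    return i


async def serial(n: int) -> float:
    start = time.perf_counter()
    for i in range(n):
        await fake_io_call(i)  # each call waits for the previous one to finish
    return time.perf_counter() - start


async def concurrent(n: int) -> float:
    start = time.perf_counter()
    # All n waits are in flight at once, so total time is about one latency.
    await asyncio.gather(*(fake_io_call(i) for i in range(n)))
    return time.perf_counter() - start


def compare(n: int = 10) -> tuple[float, float]:
    """Return (serial_seconds, concurrent_seconds) for n simulated calls."""
    return asyncio.run(serial(n)), asyncio.run(concurrent(n))


if __name__ == "__main__":
    s, c = compare(10)
    print(f"serial: {s:.2f}s  concurrent: {c:.2f}s")
```

A rewrite that only switched languages, while keeping the serial structure, would not get this win. A rewrite that kept the language but overlapped the waits would.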

~~~
dllthomas
That the latter is idiomatic use of nodejs may itself be of value.

~~~
jamesaguilar
It's idiomatic in a bunch of languages. There are also good ways to do
blocking IO. "We remove the capability to use an entire category of IO" is
probably the weakest feature claim of any language out there.

~~~
dllthomas
Sure. I never made any claim that it's unique, or that nodejs is great
generally.

------
mathattack
Good point. Any benchmark can be co-opted; that's why it's useful to have
multiple benchmarks. There's a Dilbert that tells the story well. [1]
This is true for a lot of fields. If you over-optimize on individual player
scoring in basketball, your team might play worse even if there is a
correlation between individual scoring and team scoring.

[1]
[http://search.dilbert.com/comic/10%20Dollars%20Bug%20Fix](http://search.dilbert.com/comic/10%20Dollars%20Bug%20Fix)

~~~
teddyh
[http://dilbert.com/fast/1995-11-13/](http://dilbert.com/fast/1995-11-13/)

[http://dilbert.com/fast/1995-11-14/](http://dilbert.com/fast/1995-11-14/)

------
dded
We run a number of compute servers, and we mostly run just a couple of
different types of jobs. (Sort of a traditional 80/20 split: 80% of compute
time is spent on a couple of job types, 20% on a very large variety of
jobs.)

We benchmark our two important loads whenever we're buying new servers.
Surprisingly, some machines perform significantly better on one load, and
others perform better on the other. To us, it's non-obvious why that would be
so, given what we know about the software running (this software is not
written by us, ours falls in the 20% bin).

In any case, this very simplistic try-the-actual-load benchmarking serves us
very well.

------
greatsuccess
I have never known PayPal to be a company that improves their software. This
may be the first rewrite since the 90s for all I know. Yes, they have a
business, but as a consumer and merchant on PayPal the overriding feeling I
have always gotten is that they are on cruise control as far as engineering
goes.

So when they say they changed to Node and things got better, I just take it
as: OMG, PayPal tried to improve something and it improved.

I'm sure it has nothing to do with Node.

