
Averages Can Be Misleading: Try a Percentile (2014) - donbox
https://www.elastic.co/blog/averages-can-dangerous-use-percentile
======
baq
IMHO plotting the distribution should be the first step before trying to
compute its statistics. If you know the shape, you can understand the values -
otherwise it's guesswork.
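A quick way to get that shape without any tooling is a text histogram; a minimal Python sketch (latency samples invented for illustration):

```python
from collections import Counter

# Hypothetical latency samples in ms (invented for illustration).
samples = [12, 14, 13, 15, 14, 13, 12, 95, 14, 13, 15, 14, 102, 13, 14]

# Bucket into 10 ms bins and draw a crude text histogram.
bins = Counter((s // 10) * 10 for s in samples)
for lo in sorted(bins):
    print(f"{lo:3d}-{lo + 9:<3d} {'#' * bins[lo]}")
```

The two stragglers around 100 ms jump out immediately, while the arithmetic mean (~24.9 ms) lands where almost no sample actually is.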

~~~
jmngomes
Agreed, this is actually demonstrated by Anscombe's quartet, a set of "four
datasets that have nearly identical simple descriptive statistics, yet appear
very different when graphed"
([https://en.wikipedia.org/wiki/Anscombe%27s_quartet](https://en.wikipedia.org/wiki/Anscombe%27s_quartet))

~~~
stargazer-3
... although the median and percentiles would be different within Anscombe's
quartet.
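For instance, with the first two y-series of the quartet (well-known published values), the means agree to two decimals while the medians already differ; a Python sketch:

```python
import statistics

# First two y-series of Anscombe's quartet (x-values are identical for both).
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

# Means agree to two decimal places...
print(round(statistics.mean(y1), 2), round(statistics.mean(y2), 2))  # 7.5 7.5

# ...but the medians already tell the two datasets apart.
print(statistics.median(y1), statistics.median(y2))  # 7.58 8.14
```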

~~~
Sean1708
But there are other datasets that would do the same thing with medians.

------
Rafuino
This topic always leads me to think about this great talk from Gil Tene on how
NOT to measure latencies (basically, don't use averages!).

[https://www.youtube.com/watch?v=lJ8ydIuPFeU](https://www.youtube.com/watch?v=lJ8ydIuPFeU)

I'm also a huge fan of how Dormando showed latency distributions in one of his
recent Memcached Extstore posts. The default is the 95th percentile, but you can
change the percentile to whatever matters to you (the 99th, if you ask me!).
Scroll down to see what he did and play with it.

[https://memcached.org/blog/nvm-multidisk/](https://memcached.org/blog/nvm-multidisk/)

~~~
scott_s
I felt a major point of his talk was that you should also look at your _max_
latency.

~~~
Rafuino
Yeah good point. Max and the various 9s that matter to your organization.

------
cromulent
There's a great story on _99% Invisible_ about averages, particularly when
used to design cockpits for the average pilot.

[https://99percentinvisible.org/episode/on-average/](https://99percentinvisible.org/episode/on-average/)

------
sohkamyung
Check out this comic on "Why Not to Trust Statistics" [1]. The author's book,
"Math With Bad Drawings" [2], has a chapter on statistics and why you shouldn't
trust any single statistical measure on its own.

[1] [https://mathwithbaddrawings.com/2016/07/13/why-not-to-trust-statistics/](https://mathwithbaddrawings.com/2016/07/13/why-not-to-trust-statistics/)

[2] [https://mathwithbaddrawings.com/2018/05/23/math-with-bad-drawings-the-book/](https://mathwithbaddrawings.com/2018/05/23/math-with-bad-drawings-the-book/)

------
camel_gopher
Percentiles can be misleading, try a histogram -
[https://www.circonus.com/2018/11/the-problem-with-percentiles-aggregation-brings-aggravation/](https://www.circonus.com/2018/11/the-problem-with-percentiles-aggregation-brings-aggravation/)
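The aggregation trap the article describes is easy to reproduce; a sketch with invented per-node samples (averaging each node's p95 is not the p95 of the combined traffic):

```python
import statistics

def p95(xs):
    # 95th percentile with linear interpolation ("inclusive" method).
    return statistics.quantiles(xs, n=20, method="inclusive")[-1]

# Hypothetical per-node latency samples in ms.
node_a = list(range(1, 101))  # a degraded node: 1..100 ms
node_b = [1.0] * 100          # a healthy node answering in 1 ms

avg_of_p95s = (p95(node_a) + p95(node_b)) / 2
true_p95 = p95(node_a + node_b)

print(avg_of_p95s)  # 48.025 - looks tolerable
print(true_p95)     # 90.05  - the fleet's actual tail
```

Merging the raw samples (or mergeable histogram sketches, as the article suggests) before taking the percentile avoids this.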

------
novaleaf
My own solution, which might be useful to those using javascript (nodejs or
browser):

I use mathjs.quantileSeq() and log 0%, 25%, 50%, 75%, and 100%. This seems to
be good for "casual metric logs".

I've found that this gives a good shape of the data, as well as the absolute
min/max values. If you use 1% or 99% you'll miss the absolute worst
performers, and I want to be at least aware of what the worst performance
numbers are.

[https://mathjs.org/](https://mathjs.org/)

[https://mathjs.org/docs/reference/functions/quantileSeq.html](https://mathjs.org/docs/reference/functions/quantileSeq.html)
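For comparison, a similar five-number summary can be sketched with Python's standard library alone (data invented for illustration):

```python
import statistics

# Hypothetical response times (ms) from a casual metric log.
data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 97]

# quantiles() with n=4 yields the 25/50/75% cut points; add min/max for 0/100%.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
summary = [min(data), q1, q2, q3, max(data)]
print(summary)  # [1, 3.0, 5.0, 6.0, 97]
```

Keeping the 0% and 100% endpoints makes the absolute worst performer (97 ms here) impossible to miss, which is exactly the point above.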

------
LiamPa
Site Reliability Engineering goes over this in a lot more detail.

[https://landing.google.com/sre/books/](https://landing.google.com/sre/books/)

~~~
spenthil
Specifically Chapter 4, under "Aggregation"
[https://landing.google.com/sre/sre-book/chapters/service-level-objectives/](https://landing.google.com/sre/sre-book/chapters/service-level-objectives/)

------
phosfox
Reminds me of “Don’t cross a river if it is four feet deep on average.” —
Nassim Nicholas Taleb

~~~
mitchtbaum
thx.. good summary: [http://greatesthitsblog.com/the-black-swan-nassim-nicholas-taleb/](http://greatesthitsblog.com/the-black-swan-nassim-nicholas-taleb/)

------
mikorym
I've used Elasticsearch + Kibana for agricultural data and similarly
"expanded" the view out from averages to time series.

People in agriculture love averages, and they make a lot of sense in financial
data, since averages preserve totals, e.g.:

50 ton / ha average over 100 ha = 5 000 tons

At the same time summing each individual ha gives you 5 000 tons total.
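That total-preserving property is just mean × count = sum; a tiny sketch with invented per-hectare yields:

```python
# Hypothetical yield in tons for each of 100 one-hectare blocks.
yields = [48, 52, 50, 49, 51] * 20  # 100 values averaging 50 t/ha

average = sum(yields) / len(yields)         # tons per hectare
total_from_average = average * len(yields)  # scale back up by the area

print(average)             # 50.0
print(total_from_average)  # 5000.0, identical to sum(yields)
```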

But once you realise that you can expand on this, things get _really_
interesting. I don't know of other people working on the same problems that I
am working on, but they are relevant both economically (in the sense of making
money) and environmentally (in the sense of improving efficiency and managing
climate).

------
SketchySeaBeast
More knowledge is always better, but percentiles can be a little misleading as
well - a 99th percentile of 867 ms latency makes you have a moment of panic,
but when you see that the 95th is 60 ms, you realize how few of your visitors
are actually experiencing the slow response. Might it be a problem? Possibly,
and it has brought awareness to that potential, but it can also blow things
out of proportion if you don't look at the rest of the data.

Edit: I'm not saying averages are better, just that percentiles can be
misleading as well.

~~~
Pfhreak
Having multiple percentiles is key, but I can't think of a time when I've ever
found average to be useful.

~~~
dragontamer
The average (or more specifically, the arithmetic mean) has a number of key
properties that allow for advanced analysis.

If you have two pieces of data - say, the roll of a D6 (6-sided die) plus the
roll of a D20 (20-sided die) - you wouldn't be able to do anything with
percentiles.

The 90th percentile roll of a D20 is 18. The 90th percentile roll of a D6 is
5(ish). But the sum of these numbers (18 + 5 == 23) tells us... nothing.

In contrast, the mean / average roll of a D20 is 10.5, while the average roll
of a D6 is 3.5. The average D20 roll plus the average D6 roll is 14.

You can add and subtract averages just fine, combining separate pieces of data
into a larger conclusion. You cannot do the same with percentiles.
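With dice this is small enough to check exhaustively; a sketch (using interpolated percentiles, hence p90 of a D6 coming out at 5.5 rather than "5(ish)"):

```python
import statistics
from itertools import product

def p90(xs):
    # 90th percentile with linear interpolation ("inclusive" method).
    return statistics.quantiles(xs, n=10, method="inclusive")[-1]

d6, d20 = range(1, 7), range(1, 21)
sums = [a + b for a, b in product(d6, d20)]  # all 120 equally likely outcomes

# Means add exactly:
print(statistics.fmean(d6) + statistics.fmean(d20))  # 14.0
print(statistics.fmean(sums))                        # 14.0

# Percentiles do not:
print(p90(d6) + p90(d20))  # 23.6
print(p90(sums))           # 22.0
```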

\----------

There's nothing "misleading" about averages. It's just that most people are
awful at statistics. The real lesson is: learn statistics, so that you can use
these tools correctly.

Averages, and standard deviation / variance, are excellent for combining data.
It's a blurry picture, but it's still mathematically correct. Percentiles /
quartiles / graphs are more precise and allow for deeper conclusions... but
it's not always possible to create a percentile graph, especially if you cannot
directly measure some attribute. "Indirect measurement", by way of arithmetic,
averages, and standard deviation, is quite useful.

\----------

EDIT: While I'm talking statistics, don't forget about the three kinds of
mean: the arithmetic mean is the most common, but often meaningless. You also
have to understand the geometric mean and the harmonic mean.

Multiplicative problems should use the geometric mean. The harmonic mean is
used when comparing speeds (roughly): going 60 MPH for 10 miles and then
120 MPH for 10 miles is averaged with the harmonic mean.

Benchmarks should typically use the harmonic mean; the arithmetic mean is
meaningless in the scope of benchmarks. Etc. etc.
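For the equal-distance speed example, the standard library has this built in; a sketch:

```python
import statistics

# 10 miles at 60 MPH, then 10 miles at 120 MPH.
harmonic = statistics.harmonic_mean([60, 120])  # ~80 MPH: the true average speed
arithmetic = statistics.mean([60, 120])         # 90 MPH: wrong for this question

# Sanity check from first principles: 20 miles over (10/60 + 10/120) hours.
first_principles = 20 / (10 / 60 + 10 / 120)    # ~80 MPH

print(harmonic, arithmetic, first_principles)
```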

But this is all statistics that, for some reason, is rarely taught well at the
US high school level. IMO it's an issue with our school system... we really
should be teaching more statistics, especially given how common data analysis
is in today's world.

~~~
coldtea
> _There's nothing "misleading" about averages_

Well, the fact that the average of my wealth and Bill Gates' wealth is dozens
of billions of dollars shows why averages are misleading -- and frequently
used to paint a false picture with statistics.

Misleading doesn't necessarily mean mathematically or factually wrong.
Something can be totally true and still give a false impression, and averages
do that.

And saying "people should learn statistics" won't change this. Even knowing
statistics, the average doesn't tell me much.

~~~
SketchySeaBeast
> Even knowing statistics, the average doesn't tell me much.

Depends on what you're doing. Averaging Bill Gates' wealth and mine won't tell
me anything, but taking my average speed on the highway will give me a good
estimate of when I'll get to my destination - in this case the 99th percentile
is only interesting to the highway patrol, and won't help me estimate when
I'll get home. It depends on what you want out of the information.

~~~
coldtea
> _but taking my average speed on the highway will give me a good estimate of
> when I'll get to my destination_

As in the Bill Gates example: if you had a clear highway and went 100 mph for
the first hour, then hit busy traffic and are doing 20 mph for the last 5
minutes (with no foreseeable end in sight), the average won't help you much.

It's only really useful for this purpose when the speed is constant (as in
school math problems), or when there are few extremes (or any extremes neatly
cancel out).

See how average can be misleading?

~~~
SketchySeaBeast
If the last 5 minutes of a trip are incredibly slow, there's no chance you'll
be able to use estimates to determine when you'll get there anyway; at that
point you're not actually estimating, you're practically there.

~~~
coldtea
Which is neither here nor there.

This is not about literally seeing into the future, and knowing about
unforeseen events.

This is about making the best prediction with the data you already have.

And if (as per example) you had "1000 miles of 100mph and 100 miles of 5mph in
some traffic jam" with 500km left to go, the average will be misleading as it
will lose too much info to be able to help you give the best prediction.

In that situation I wouldn't reach for the average, to make a prediction to
the person waiting for me. Would you?

~~~
dragontamer
> And if (as per example) you had "1000 miles of 100mph and 100 miles of 5mph
> in some traffic jam" with 500km left to go, the average will be misleading
> as it will lose too much info to be able to help you give the best
> prediction.

This is literally the use case for the harmonic mean, a type of average. The
(distance-weighted) harmonic mean gives you the accurate estimate in this case.

As I stated before: it's about using the correct calculation for the correct
reasons. It requires an understanding of statistics, and of the five different
kinds of average (mode, median, arithmetic mean, geometric mean, harmonic
mean), to pick the correct one for any particular use case.

The arithmetic mean in this case is simply wrong. Just... wrong. It is
meaningless. You have to use the harmonic mean.

\--------

It takes experience and practice to know which "average" to use. But anybody
can learn it if they put forth the effort. It is definitely teachable to high
school, and even middle school, children. (It's just not commonly taught in
America, for some reason.)

------
Lightbody
One of my favorite (short) talks on this topic. Well worth a few minutes of
your time:

[https://www.youtube.com/watch?v=coNDCIMH8bk](https://www.youtube.com/watch?v=coNDCIMH8bk)

~~~
Aengeuad
I know it's in the spirit of the talk, but the histogram at 10:45 - and the
related discussion of how latency improved for most users while the average
latency increased, meaning a worse experience for some - reminds me of an
anecdote from a Google engineer about YouTube's rollout of the HTML5 player.
The responsiveness of the page had improved, but the average latency graphs
went up. This wasn't because it was a bad update, or because some users got a
worse experience - not really, anyway - but because the switch to the HTML5
player let a wider audience start using YouTube who couldn't before. A change
that increases average latency, even on a histogram, isn't necessarily a bad
change. Look at your data indeed.

