

The danger of using averages alone in web analytics - pathdependent
http://highscalability.com/blog/2012/5/23/averages-web-performance-data-and-how-your-analytics-product.html

======
NyxWulf
I agree with many of the points in the article, but it conflates two
unrelated things. Using simple averages in isolation is almost always a
mistake. The real story in a system almost always lies in its variation, not
in its average. There are many ways to look at that; one of them is a
histogram, although I tend to prefer time-series graphs with moving ranges.
One of my favorites is the XmR chart, or any of a number of statistical
process control charts. Using those you can get a much better understanding
of a system over time.
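For anyone curious, here's a minimal sketch of the XmR (individuals /
moving-range) calculation mentioned above. The 2.66 and 3.267 constants are
the standard factors for moving ranges of size 2; the data is made up for
illustration.

```python
def xmr_limits(values):
    """Natural process limits for an XmR (individuals) chart."""
    mean = sum(values) / len(values)
    # Moving ranges: absolute difference between consecutive points.
    mrs = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(mrs) / len(mrs)
    return {
        "center": mean,
        "lcl": mean - 2.66 * mr_bar,   # lower natural process limit
        "ucl": mean + 2.66 * mr_bar,   # upper natural process limit
        "mr_ucl": 3.267 * mr_bar,      # upper limit for the mR chart
    }

# Example: response times (ms) with one out-of-control spike.
times = [120, 118, 125, 119, 122, 310, 121, 117]
limits = xmr_limits(times)
outliers = [t for t in times if not limits["lcl"] <= t <= limits["ucl"]]
```

Points outside the limits (like the 310 ms spike here) signal a change in
the system rather than routine variation, which is exactly what a simple
average would smear away.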

The transition to real user metrics is not at all related to using an
average. All recording and instrumentation processes benefit from using more
sophisticated and nuanced tools than a simple average. Real user metrics are
an important data point, but they lack certain information you can get from
external monitoring systems. We use both Pingdom and Catchpoint; by far my
favorite is Catchpoint, because I can see things like which ISP is involved
in a slow request, what geographic region, etc. I can also get scatter plots
and nice statistical graphs around the median, geometric mean, and 75th,
95th, and 99th percentiles.
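Those percentile summaries are easy to compute yourself. Here's a rough
nearest-rank sketch (monitoring products may use interpolated percentiles
instead, so exact values can differ):

```python
import math

def percentile(sorted_vals, p):
    """Nearest-rank percentile (0 < p <= 100) over pre-sorted data."""
    k = max(0, math.ceil(p / 100 * len(sorted_vals)) - 1)
    return sorted_vals[k]

# Made-up load times (ms) with a long tail, sorted ascending.
samples = sorted([95, 102, 110, 120, 135, 150, 180, 240, 400, 1200])
summary = {
    "median": percentile(samples, 50),
    "p75": percentile(samples, 75),
    "p95": percentile(samples, 95),
    "p99": percentile(samples, 99),
}
```

Note how the 95th/99th percentiles surface the 1200 ms tail that a mean of
these values would dilute.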

So, in short: the main points are good, simple averages are misleading, and
capturing end-user performance data is good. But skipping external
monitoring isn't a good idea, because there are a number of things you can
only identify if you have that insight.

~~~
joshfraser
While you're right that they are two separate topics, they are quite closely
related. Both are about the search for truth and getting a clearer picture of
what's actually happening on your site.

Looking at your 95th percentile with RUM data means something totally
different than looking at your 95th percentile with synthetic data. With RUM
you are looking at your actual visitors across every page on your site. With
synthetic testing (Keynote, Catchpoint, etc) you're looking at a small sample
of pages from random nodes in random locations around the world. The problem
with synthetic testing is that it makes all sorts of assumptions about your
visitors like their browser, geography, connection speed, state of browser
cache, etc. This doesn't mean that synthetic testing isn't useful (it is!),
but it's important to recognize the shortcomings of your methodology whether
that's from looking at an average or looking only at synthetic data.

------
bluesmoon
Disclaimer: My company also does web performance analytics.

I did a talk a couple of years ago on the statistics of web performance where
I cover things like median, arithmetic mean, geometric mean, margin of error
and sample sizes to carry out proper data analysis. Slides available here:
<http://www.slideshare.net/bluesmoon/index-3441823>

To zashapiro's point about the geometric mean: while it tends to be superior
in the ideal case where the distribution is perfectly log-normal, in
practice most distributions deviate from a perfect log-normal. The median
gives you a slightly better measure of central tendency in that case.

Secondly, there's the problem of user perception of the geometric standard
deviation (and consequently the margin of error). Unlike the arithmetic
standard deviation, which is additive (+-), the geometric standard deviation
is multiplicative (*/), which means it's not visually symmetric... humans
have an easier time visualizing additive symmetry than multiplicative
symmetry.
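To make the multiplicative point concrete, here's a small sketch (with
made-up load times) showing that the gm/gsd .. gm*gsd interval is not
symmetric around the geometric mean on a linear axis:

```python
import math

def geo_stats(vals):
    """Geometric mean and geometric standard deviation (multiplicative)."""
    logs = [math.log(v) for v in vals]
    mu = sum(logs) / len(logs)
    var = sum((x - mu) ** 2 for x in logs) / (len(logs) - 1)
    return math.exp(mu), math.exp(math.sqrt(var))

times = [100, 150, 200, 400, 800]  # illustrative load times (ms)
gm, gsd = geo_stats(times)
# The "one geometric sigma" band divides and multiplies rather than
# subtracting and adding, so the upper arm is wider than the lower one.
lower, upper = gm / gsd, gm * gsd
```

That asymmetry is exactly why these bands are harder to eyeball on a chart
than a plain +- error bar.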

At LogNormal.com, we track the median, arithmetic mean, geometric mean, a
whole bunch of percentiles, margins of error, and a complete distribution
curve.

~~~
joshfraser
Philip, thanks for sharing the link to those slides. I didn't think it made
sense to go that deep on my article, but it's a nice reference for anyone who
wants to really dig in.

~~~
bluesmoon
True, it was hard enough explaining those concepts in person ;)

------
jmduke
Or, put more simply.

The averages of [0,100,200] and [99,100,101] are both 100. And yet these two
data sets are clearly different.

Measures of central tendency should always be supported by measures of
dispersion (range, standard deviation, etc.). Not just with web analytics.
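The example above is easy to check with the standard library:

```python
import statistics

a = [0, 100, 200]
b = [99, 100, 101]

# Identical central tendency...
mean_a, mean_b = statistics.mean(a), statistics.mean(b)
# ...wildly different dispersion (~81.6 vs ~0.8).
sd_a, sd_b = statistics.pstdev(a), statistics.pstdev(b)
```

Reporting the mean alone makes these two data sets indistinguishable; one
number for dispersion immediately tells them apart.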

~~~
shabble
For a compelling example, see Anscombe's Quartet:
<https://en.wikipedia.org/wiki/Anscombe%27s_quartet>

------
zashapiro
I'm surprised you didn't mention geometric means. Seems like that would also
be a relevant way to look at performance data.

~~~
pathdependent
Sometimes that's better.

I think the more important point is that knowledge of the underlying
distribution -- normal (Gaussian), log-normal, exponential, power law,
Weibull, etc. -- is very important. The more skewed the underlying
distribution, the less relevant the mean becomes -- and it can cause you to
make some very bad inferences.
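A quick simulated illustration of that skew effect (parameters here are
arbitrary, just chosen to give a plausible long-tailed load-time shape):

```python
import math
import random

random.seed(42)

# A log-normal sample, a common shape for page-load times: most requests
# are fast, but a long tail is very slow.
sample = [math.exp(random.gauss(mu=5.0, sigma=1.0)) for _ in range(10_000)]

sample.sort()
median = sample[len(sample) // 2]
mean = sum(sample) / len(sample)
# For a log-normal, the mean sits well above the median, dragged up by the
# tail -- reporting the mean alone overstates the "typical" request.
```

For these parameters the mean lands roughly 65% above the median, so any
inference of the form "the typical user waits `mean` ms" would be badly off.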

