
A summary of how not to measure latency - juanrossi
http://bravenewgeek.com/everything-you-know-about-latency-is-wrong/
======
cortesoft
Some of this is good, but the idea that you can determine your likelihood of
experiencing a 99th-percentile latency on a webpage with the naive probability
calculation shown (1 - .99^n, where n is the number of objects requested on a
page) is silly. That calculation assumes latency is independently and randomly
distributed across all objects and all clients of a page.

That is simply not true. Latency depends heavily on the client doing the
requesting and the object being requested. You are going to get clustering,
not an even distribution.
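
A back-of-the-envelope sketch of the difference (all the numbers below are
made up): under the independence assumption, 1 - 0.99^40 says roughly a third
of 40-object page loads should see at least one 99th-percentile request; if
slowness instead clusters by session, far fewer page loads are affected even
at a broadly similar per-request tail rate.

    // Hypothetical numbers; a TypeScript/Node sketch contrasting the naive
    // independence assumption with a crude "slowness clusters by session" model.
    const n = 40; // objects fetched per page load (assumed)

    // Naive model: every request is an independent 1%-tail coin flip.
    const pTail = 0.01;
    const naive = 1 - Math.pow(1 - pTail, n);
    console.log(`independent: ${(100 * naive).toFixed(1)}% of page loads see a slow request`);

    // Clustered model: 1% of sessions are "slow" and hit tail latency on about
    // half their requests; the rest almost never do. The overall per-request
    // tail rate stays under 1%, but the page-level impact looks very different.
    const trials = 100_000;
    let affected = 0;
    for (let t = 0; t < trials; t++) {
      const perRequestP = Math.random() < 0.01 ? 0.5 : 0.0005;
      for (let i = 0; i < n; i++) {
        if (Math.random() < perRequestP) { affected++; break; }
      }
    }
    console.log(`clustered:   ${(100 * affected / trials).toFixed(1)}% of page loads see a slow request`);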

~~~
JaRail
+1. The majority of those requests on major websites don't matter. They are
optional elements/scripts and prefetches for possible actions.

Now, if the point is that _something_ will be delayed, that's true. And it's
true that many people don't realize that. There's the classic example: even if
everyone tries to be five minutes early, your group is still going to be late,
because it only takes one straggler.

The real lesson is to analyse your critical path to death and ensure it is as
resilient as possible. And, where possible, real data is buckets more
meaningful than conventional load tests.

I also don't see anything in here about how to get meaningful metrics from
real users. The W3C Navigation Timing API has really shed a ton of light on
things commonly forgotten.
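
For example, a minimal sketch of pulling real-user timings via the Navigation
Timing API (Level 2 interface); the "/rum" collector endpoint is a placeholder,
not a real service:

    // Runs in the browser. Field names come from PerformanceNavigationTiming;
    // "/rum" is a stand-in for your own collector endpoint.
    const [nav] = performance.getEntriesByType("navigation") as PerformanceNavigationTiming[];
    if (nav) {
      const metrics = {
        dns: nav.domainLookupEnd - nav.domainLookupStart,
        tcp: nav.connectEnd - nav.connectStart,
        ttfb: nav.responseStart - nav.requestStart,
        download: nav.responseEnd - nav.responseStart,
        domInteractive: nav.domInteractive,
        pageLoad: nav.loadEventEnd,
      };
      // Ship per-user timings to your own backend so percentiles come from
      // real traffic rather than synthetic load tests.
      navigator.sendBeacon("/rum", JSON.stringify(metrics));
    }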

------
jonaf
Doesn't this assume a single-threaded application? The example of a clerk's
service time and people waiting in line is oversimplified. Modern systems have
maybe 100 clerks per store, and many stores; how do you perform a "Ctrl+Z"
test in that case? Even if you had a perfectly divided line of people waiting
at each cashier in each store (machine), the worst case would be experienced
by the people in line for the store or clerk whose service time degraded.
Thus, for accuracy, you would need to measure the queue depth at the moment of
maximum latency per thread (clerk) and add that latency to each subsequent
request until you have served the number of requests in your queue. This kind
of math requires constant sampling that would slow down any system so
dramatically it would defeat the purpose. I think this becomes even clearer
when you consider that most such systems have load-balancing strategies that
further mitigate queue depths, intentionally distributing requests based on
which backend services have shown the lowest historical latencies (and yes, I
realize these algorithms are likely plagued by the same "coordinated omission"
mentioned -- but they certainly don't uniformly distribute requests).

In summary, let's focus on the max latency, home in on which backend exhibited
said latency, identify the depth of the queue at the time that latency was
experienced, and use that information to model the impact on users. From that,
I expect you can draw some meaningful latency percentiles without having to
collect more data points than is feasible -- which would itself add latency.
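
Something like this rough sketch of that modelling step (my own
interpretation, not from the article; the uniform service time and the example
numbers are assumptions):

    // Given the worst observed service time on one backend and the queue depth
    // behind it at that moment, synthesise the latencies the queued requests
    // would have experienced, so they can be folded back into the percentiles.
    function backfillQueuedLatencies(maxLatencyMs: number,
                                     queueDepth: number,
                                     serviceTimeMs: number): number[] {
      const latencies: number[] = [];
      for (let position = 1; position <= queueDepth; position++) {
        // Each queued request waits out the stalled request, plus the service
        // time of everyone ahead of it in the queue.
        latencies.push(maxLatencyMs + position * serviceTimeMs);
      }
      return latencies;
    }

    // Example: one request stalled for 2s with 10 requests queued behind it
    // and a normal per-request service time of 5ms.
    console.log(backfillQueuedLatencies(2000, 10, 5));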

Am I misunderstanding something? I'm no math whiz, this is mostly intuition.

