Imagine a simple load testing tool that sends a request, measures time-to-respon...

Imagine a simple load testing tool that sends a request, measures time-to-response, waits a second, and sends another request. Such a tool might report a "99 percentile latency" by simply taking the second-worst time-to-response from 100 samples.

However, this statistic is misleading. Imagine a system that responds in 1 second to 98 of the requests, in 2 seconds to the second-worst request ("99th percentile latency") and in 100 seconds to the worst request. If this system were to process a continuous stream of requests - e.g. 100 requests from visitors to a web site - half of all visitors would arrive while the system was processing the worst-case request (which takes 100s of a total 200s of run time). So the customer with the 99th-percentile worst experience will experience about 100s of latency, not 2s!

E.g. the author's https://github.com/giltene/wrk2 tries to avoid this "coordinated omission" problem by sending requests at a constant rate, instead of constant-time-after-response.

(Azul Systems competes against HotSpot JVMs with garbage collectors tuned for throughput, which suffer from occasional long pauses; Azul sells a rather impressive JVM which largely avoids such long pauses, and needs to educate its customers on how to benchmark it properly.)