But wouldn't this approach mean that once there is a small glitch in the system, say due to higher-than-normal load, the number of requests doubles and the system goes down completely?
I don't mean to be negative about the solution, I am just curious.
You are still bounded by at most 2x the number of requests, so no, this cannot take the system down; at worst it slows things down a bit. In practice not even that, since you should be provisioning enough capacity to handle more than 2x the normal load anyway.
However, in my experience latencies are not static: they depend on how far away the request is sent, the type of resource requested, its size, the current network load in that direction, and other factors. That gets tricky quickly. At some point you need to keep recent latency history per size group, per resource type, per node, and dynamically compute the 90th-percentile latency from it. But then things like response size may not be predictable up front, so you may need to cap responses at a sufficiently small value. And so on.
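Just to make the bookkeeping concrete, here is a minimal sketch of that kind of tracker. The names (`LatencyTracker`, `key`, the size-bucket field) are my own, not anything from a real library: it keeps a bounded window of recent samples per (node, resource type, size bucket) and computes the 90th percentile on demand using the nearest-rank method.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// key identifies one latency distribution: which node we talked to,
// what kind of resource we asked for, and a coarse size bucket.
type key struct {
	node       string
	resource   string
	sizeBucket int // e.g. log2 of the expected response size
}

// LatencyTracker keeps the most recent samples per key.
type LatencyTracker struct {
	mu      sync.Mutex
	maxKeep int
	samples map[key][]time.Duration
}

func NewLatencyTracker(maxKeep int) *LatencyTracker {
	return &LatencyTracker{maxKeep: maxKeep, samples: make(map[key][]time.Duration)}
}

// Record appends one observation, dropping the oldest once the window is full.
func (t *LatencyTracker) Record(k key, d time.Duration) {
	t.mu.Lock()
	defer t.mu.Unlock()
	s := append(t.samples[k], d)
	if len(s) > t.maxKeep {
		s = s[len(s)-t.maxKeep:]
	}
	t.samples[k] = s
}

// P90 returns the 90th-percentile latency for a key, or false if there is
// no history yet (in which case the caller needs some fallback timeout).
func (t *LatencyTracker) P90(k key) (time.Duration, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	s := t.samples[k]
	if len(s) == 0 {
		return 0, false
	}
	sorted := append([]time.Duration(nil), s...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	idx := (len(sorted)*90 + 99) / 100 // ceil(0.9 * n), nearest-rank method
	return sorted[idx-1], true
}

func main() {
	tr := NewLatencyTracker(1000)
	k := key{node: "node-a", resource: "thumbnail", sizeBucket: 14}
	for i := 1; i <= 10; i++ {
		tr.Record(k, time.Duration(i)*10*time.Millisecond)
	}
	if p90, ok := tr.P90(k); ok {
		fmt.Println("p90:", p90) // 90ms with this toy data
	}
}
```

And this is only one key; multiply it by every node, resource type, and size bucket you care about, which is exactly where the complexity piles up.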
If your responses are small, it's easier to just always send two requests in parallel to different servers and take whichever comes back first.
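A rough sketch of that "send two, take the fastest" pattern, assuming a hypothetical `fetch(ctx, server)` RPC (here simulated with a random sleep), not anyone's actual implementation: both requests start immediately and the slower one is cancelled as soon as the first result arrives.

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"time"
)

// fetch stands in for the real RPC; here it just sleeps a random amount.
func fetch(ctx context.Context, server string) (string, error) {
	delay := time.Duration(rand.Intn(200)) * time.Millisecond
	select {
	case <-time.After(delay):
		return fmt.Sprintf("response from %s after %v", server, delay), nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// fastestOfTwo fires the same request at two servers and returns whichever
// answers first, cancelling the other.
func fastestOfTwo(ctx context.Context, a, b string) (string, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // stops the loser once we have a winner

	type result struct {
		val string
		err error
	}
	results := make(chan result, 2) // buffered so the losing goroutine never blocks

	for _, server := range []string{a, b} {
		go func(s string) {
			v, err := fetch(ctx, s)
			results <- result{v, err}
		}(server)
	}

	// Take the first answer; fall back to the second only if the first failed.
	first := <-results
	if first.err == nil {
		return first.val, nil
	}
	second := <-results
	return second.val, second.err
}

func main() {
	out, err := fastestOfTwo(context.Background(), "server-1", "server-2")
	if err != nil {
		fmt.Println("both failed:", err)
		return
	}
	fmt.Println(out)
}
```

The cost is the 2x request volume discussed above, but there is no latency model to maintain at all, which is why it is attractive when responses are cheap to produce.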