But wouldn't this approach mean that once there is a small glitch in the system, say due to higher-than-normal load, the number of requests doubles and the system goes down completely?
I don't mean to be negative about the solution, I am just curious.
You are still bounded by at most 2x the number of requests, so no, this cannot take the system down; at worst it slows things down a bit. In practice not even that, since you should be provisioning enough capacity to handle more than 2x the normal load anyway.
However, in my experience latencies are not static: they depend on how far away the request is sent, the type of resource requested, its size, the current network load in that direction, and other factors. That gets tricky quickly. At some point you need to keep recent latency history per size group, per resource type, per node, and dynamically compute the 90th-percentile latency from it. But then things like response size may not be predictable up front, so you may need to cap responses at a sufficiently small value. And so on.
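Just to make the bookkeeping concrete, here is a minimal sketch of that kind of tracker. The names (`LatencyTracker`, `key`, the size-bucket field) are my own, not anything from a real library: it keeps a bounded window of recent samples per (node, resource type, size bucket) and computes the 90th percentile on demand using the nearest-rank method.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// key identifies one latency distribution: which node we talked to,
// what kind of resource we asked for, and a coarse size bucket.
type key struct {
	node       string
	resource   string
	sizeBucket int // e.g. log2 of the expected response size
}

// LatencyTracker keeps the most recent samples per key.
type LatencyTracker struct {
	mu      sync.Mutex
	maxKeep int
	samples map[key][]time.Duration
}

func NewLatencyTracker(maxKeep int) *LatencyTracker {
	return &LatencyTracker{maxKeep: maxKeep, samples: make(map[key][]time.Duration)}
}

// Record appends one observation, dropping the oldest once the window is full.
func (t *LatencyTracker) Record(k key, d time.Duration) {
	t.mu.Lock()
	defer t.mu.Unlock()
	s := append(t.samples[k], d)
	if len(s) > t.maxKeep {
		s = s[len(s)-t.maxKeep:]
	}
	t.samples[k] = s
}

// P90 returns the 90th-percentile latency for a key, or false if there is
// no history yet (in which case the caller needs some fallback timeout).
func (t *LatencyTracker) P90(k key) (time.Duration, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	s := t.samples[k]
	if len(s) == 0 {
		return 0, false
	}
	sorted := append([]time.Duration(nil), s...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	idx := (len(sorted)*90 + 99) / 100 // ceil(0.9 * n), nearest-rank method
	return sorted[idx-1], true
}

func main() {
	tr := NewLatencyTracker(1000)
	k := key{node: "node-a", resource: "thumbnail", sizeBucket: 14}
	for i := 1; i <= 10; i++ {
		tr.Record(k, time.Duration(i)*10*time.Millisecond)
	}
	if p90, ok := tr.P90(k); ok {
		fmt.Println("p90:", p90) // 90ms with this toy data
	}
}
```

And this is only one key; multiply it by every node, resource type, and size bucket you care about, which is exactly where the complexity piles up.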
If your responses are small, it's easier to just always send two requests in parallel to different servers and take whichever comes back first.
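A rough sketch of that "send two, take the fastest" pattern, assuming a hypothetical `fetch(ctx, server)` RPC (here simulated with a random sleep), not anyone's actual implementation: both requests start immediately and the slower one is cancelled as soon as the first result arrives.

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"time"
)

// fetch stands in for the real RPC; here it just sleeps a random amount.
func fetch(ctx context.Context, server string) (string, error) {
	delay := time.Duration(rand.Intn(200)) * time.Millisecond
	select {
	case <-time.After(delay):
		return fmt.Sprintf("response from %s after %v", server, delay), nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// fastestOfTwo fires the same request at two servers and returns whichever
// answers first, cancelling the other.
func fastestOfTwo(ctx context.Context, a, b string) (string, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // stops the loser once we have a winner

	type result struct {
		val string
		err error
	}
	results := make(chan result, 2) // buffered so the losing goroutine never blocks

	for _, server := range []string{a, b} {
		go func(s string) {
			v, err := fetch(ctx, s)
			results <- result{v, err}
		}(server)
	}

	// Take the first answer; fall back to the second only if the first failed.
	first := <-results
	if first.err == nil {
		return first.val, nil
	}
	second := <-results
	return second.val, second.err
}

func main() {
	out, err := fastestOfTwo(context.Background(), "server-1", "server-2")
	if err != nil {
		fmt.Println("both failed:", err)
		return
	}
	fmt.Println(out)
}
```

The cost is the 2x request volume discussed above, but there is no latency model to maintain at all, which is why it is attractive when responses are cheap to produce.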