The justification for using LIFO (vs FIFO) queues for requests is interesting: at no/low load it makes no difference, while at high load the requests least likely to time out get serviced first.
Is that a common architectural decision in reverse proxies or queuing systems in general?
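To make the LIFO argument concrete, here's a rough toy simulation (my own sketch, not how any particular proxy does it). It assumes a single server, a fixed service time, and a server that can't tell when a client has already given up, so stale requests still cost work. `count_on_time` and all the numbers are made up for illustration.

```python
from collections import deque


def count_on_time(arrivals, service_time, timeout, lifo):
    """Count requests that finish within `timeout` of their arrival.

    Toy model (assumption): one server, fixed service time, and every
    queued request is eventually processed even if its client gave up.
    `arrivals` must be sorted in ascending order.
    """
    pending = deque()
    now = 0.0
    next_idx = 0
    on_time = 0
    while next_idx < len(arrivals) or pending:
        # Admit everything that has arrived by the current time.
        while next_idx < len(arrivals) and arrivals[next_idx] <= now:
            pending.append(arrivals[next_idx])
            next_idx += 1
        if not pending:
            # Server is idle: jump ahead to the next arrival.
            now = arrivals[next_idx]
            continue
        # LIFO takes the newest request, FIFO the oldest.
        arrived = pending.pop() if lifo else pending.popleft()
        now += service_time
        if now - arrived <= timeout:
            on_time += 1
    return on_time


# Low load: one request every 2 time units, service takes 1.
low_load = [2.0 * k for k in range(10)]
# Overload: two requests per time unit, service still takes 1.
overload = [0.5 * k for k in range(40)]

for name, arrivals in (("low load", low_load), ("overload", overload)):
    fifo = count_on_time(arrivals, service_time=1.0, timeout=3.0, lifo=False)
    lifo = count_on_time(arrivals, service_time=1.0, timeout=3.0, lifo=True)
    print(f"{name}: FIFO on time = {fifo}, LIFO on time = {lifo}")
# low load: FIFO on time = 10, LIFO on time = 10
# overload: FIFO on time = 5, LIFO on time = 21
```

In this model the two orderings are identical when the queue never builds up. Under overload, FIFO keeps working on requests that are already doomed, while LIFO spends its capacity on the ones that can still make their deadline.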
In queuing theory, it's well known that always serving the request with the lowest service time first (shortest-job-first) gives the lowest average waiting time across the requests.
This is usually not done because it can starve the larger jobs.
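A quick sketch of the arithmetic, assuming all jobs are already queued at time zero and run back to back on one server (the job sizes and the `average_wait` helper are made up for illustration):

```python
from itertools import permutations


def average_wait(service_times):
    """Average time each job waits before it starts, for a given run order."""
    clock = 0.0
    total_wait = 0.0
    for service in service_times:
        total_wait += clock  # this job waited for everything before it
        clock += service
    return total_wait / len(service_times)


jobs = [8, 1, 2]  # service times, in arrival order (hypothetical values)

arrival_order = average_wait(jobs)         # waits 0, 8, 9 -> 17/3
shortest_first = average_wait(sorted(jobs))  # waits 0, 1, 3 -> 4/3

# Brute-force check over every ordering of this small example.
best = min(average_wait(order) for order in permutations(jobs))

print(arrival_order, shortest_first, best)
```

The starvation problem is visible even here: the size-8 job goes last, and if short jobs kept arriving it would keep getting pushed back.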
Interesting. I wonder if that choice would lead to a noticeable impact on perceived latency (from the perspective of a human end-user) under some high-load/pathological scenario.
It's definitely one of my favorite techniques that I keep in my "toolbox". Really simple, but not (at least to me) intuitive.
[1] https://brooker.co.za/blog/2012/01/17/two-random.html