
With a pull architecture you wouldn't fix a request queue depth up front. Rather, each incoming connection is held until a worker attaches to, e.g., the TCP stream. If you're serving a webpage, the client wouldn't see anything other than a little extra load lag, especially if your load balancer handled the TLS handshake before pausing to wait for a worker.

So if one worker can handle every incoming request, that worker gets them all. Otherwise, each worker pops the next request off the stack as it becomes free. And if you need to service more concurrent requests, you just add more workers.
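
As a rough sketch of that pull model (in Go, with made-up names like handleRequest; not anything from the thread): workers block on a shared channel, and whichever one is free takes the next accepted connection, so a connection simply waits until some worker pulls it.

    package main

    import (
        "log"
        "net"
    )

    func main() {
        ln, err := net.Listen("tcp", ":8080")
        if err != nil {
            log.Fatal(err)
        }

        // Unbuffered: no fixed queue depth chosen up front. An accepted
        // connection just sits here until a worker pulls it off.
        conns := make(chan net.Conn)

        const numWorkers = 4 // add more workers to serve more concurrent requests
        for i := 0; i < numWorkers; i++ {
            go worker(conns)
        }

        for {
            c, err := ln.Accept()
            if err != nil {
                log.Print(err)
                continue
            }
            conns <- c // blocks until some free worker takes the connection
        }
    }

    // Each worker pulls the next waiting connection as soon as it is free,
    // so on a quiet system one worker ends up handling everything.
    func worker(conns <-chan net.Conn) {
        for c := range conns {
            handleRequest(c) // hypothetical application-level handler
            c.Close()
        }
    }

    func handleRequest(c net.Conn) {
        // placeholder: read the request and write a response
        c.Write([]byte("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"))
    }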




> So if one worker can handle every incoming request, that worker gets them all. Otherwise, each worker pops the next request off the stack as it becomes free

As I alluded to, that works great if "can handle" and "being free" are clear-cut binary properties. But for complex applications, when you are driving for high utilization while keeping latency down, those questions become complicated; a worker might have some free capacity to handle requests, but that doesn't mean it would produce a response as quickly as another, more idle worker.

In other words, the problem is not just assigning requests to workers that can handle them, but assigning requests to the workers that can handle them with the lowest latency.
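
One common way to approximate "lowest latency" without a global view of every worker is the power-of-two-choices trick: sample two workers at random and send the request to the one with fewer requests in flight. A minimal sketch in Go, using in-flight count as a stand-in for expected latency (Worker, pickWorker, and dispatch are hypothetical names, not from the thread):

    package main

    import (
        "math/rand"
        "sync/atomic"
    )

    // Worker tracks how many requests it currently has in flight; that count
    // stands in for "how quickly this worker would produce a response".
    type Worker struct {
        inflight atomic.Int64
    }

    // pickWorker samples two workers and returns the less loaded one
    // ("power of two choices"): no global scan, and far less herding than
    // always sending to the single least-loaded worker.
    func pickWorker(workers []*Worker) *Worker {
        a := workers[rand.Intn(len(workers))]
        b := workers[rand.Intn(len(workers))]
        if b.inflight.Load() < a.inflight.Load() {
            return b
        }
        return a
    }

    func dispatch(workers []*Worker, handle func(*Worker)) {
        w := pickWorker(workers)
        w.inflight.Add(1)
        defer w.inflight.Add(-1)
        handle(w) // hypothetical: actually hand the request to this worker
    }

    func main() {
        workers := []*Worker{{}, {}, {}, {}}
        for i := 0; i < 100; i++ {
            dispatch(workers, func(w *Worker) {
                // placeholder for doing the work on the chosen worker
            })
        }
    }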


TCP is a layer well below what we're discussing here.

"Handling a request" is something defined at the application layer.




