
With a pull architecture you wouldn't fix a request queue depth up front. Rather, each incoming connection is held until a worker attaches to, e.g., the TCP stream. If you're serving a webpage, the client wouldn't see anything other than a little extra load lag, especially if your load balancer handled the TLS handshake before pausing to wait for a worker.

So if one worker can handle every incoming request, that worker gets them all. Otherwise, each worker pops the next request off the stack as it becomes free. And if you need to service more concurrent requests, you just add more workers.
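
As a rough sketch of that pull model (in Go, with made-up names like handleRequest; not anything from the thread): workers block on a shared channel, and whichever one is free takes the next accepted connection, so a connection simply waits until some worker pulls it.

    package main

    import (
        "log"
        "net"
    )

    func main() {
        ln, err := net.Listen("tcp", ":8080")
        if err != nil {
            log.Fatal(err)
        }

        // Unbuffered: no fixed queue depth chosen up front. An accepted
        // connection just sits here until a worker pulls it off.
        conns := make(chan net.Conn)

        const numWorkers = 4 // add more workers to serve more concurrent requests
        for i := 0; i < numWorkers; i++ {
            go worker(conns)
        }

        for {
            c, err := ln.Accept()
            if err != nil {
                log.Print(err)
                continue
            }
            conns <- c // blocks until some free worker takes the connection
        }
    }

    // Each worker pulls the next waiting connection as soon as it is free,
    // so on a quiet system one worker ends up handling everything.
    func worker(conns <-chan net.Conn) {
        for c := range conns {
            handleRequest(c) // hypothetical application-level handler
            c.Close()
        }
    }

    func handleRequest(c net.Conn) {
        // placeholder: read the request and write a response
        c.Write([]byte("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"))
    }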




> So if one worker can handle every incoming request, that worker gets them all. Otherwise, each worker pops the next request off the stack as it becomes free

As I alluded to, that works great if "can handle" and "being free" are clear-cut binary properties. But for complex applications, when you are driving for high utilization while keeping latency down, those questions become complicated; a worker might have some free capacity to handle requests, but that doesn't mean it would produce a response as quickly as another, more idle worker.

In other words, the problem is not just assigning requests to workers that can handle them, but assigning requests to the workers that can handle them with the lowest latency.
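
One common way to approximate "lowest latency" without a global view of every worker is the power-of-two-choices trick: sample two workers at random and send the request to the one with fewer requests in flight. A minimal sketch in Go, using in-flight count as a stand-in for expected latency (Worker, pickWorker, and dispatch are hypothetical names, not from the thread):

    package main

    import (
        "math/rand"
        "sync/atomic"
    )

    // Worker tracks how many requests it currently has in flight; that count
    // stands in for "how quickly this worker would produce a response".
    type Worker struct {
        inflight atomic.Int64
    }

    // pickWorker samples two workers and returns the less loaded one
    // ("power of two choices"): no global scan, and far less herding than
    // always sending to the single least-loaded worker.
    func pickWorker(workers []*Worker) *Worker {
        a := workers[rand.Intn(len(workers))]
        b := workers[rand.Intn(len(workers))]
        if b.inflight.Load() < a.inflight.Load() {
            return b
        }
        return a
    }

    func dispatch(workers []*Worker, handle func(*Worker)) {
        w := pickWorker(workers)
        w.inflight.Add(1)
        defer w.inflight.Add(-1)
        handle(w) // hypothetical: actually hand the request to this worker
    }

    func main() {
        workers := []*Worker{{}, {}, {}, {}}
        for i := 0; i < 100; i++ {
            dispatch(workers, func(w *Worker) {
                // placeholder for doing the work on the chosen worker
            })
        }
    }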


TCP is a layer well below what we're discussing here.

"Handling a request" is something defined at the application layer.




