Hacker News | tanveergill's comments

I agree that queues can cause problems, especially when misconfigured. But some amount of queuing is necessary to absorb short spikes of demand over capacity. Queues also make it possible to reorder requests based on criticality, which isn't possible with a zero-size queue: each request must be immediately dropped or admitted without considering its priority.

I think it is worth rethinking how we tune queues. Instead of setting a queue size, we should tune the maximum permissible latency in the queue, which is what a request timeout actually expresses. That way, you stay within the acceptable response-time SLA while keeping only the still-serveable requests in the queue.

Aperture, an open-source load management platform, takes this approach. Each request specifies a timeout for how long it is willing to wait in the queue, and a weighted fair queuing scheduler then allocates the capacity (a request-rate quota or a maximum number of in-flight requests) based on each request's priority and tokens (an estimate of its heaviness).
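A minimal sketch of the timeout-bounded-queue idea (hypothetical names, not Aperture's actual API): the queue is bounded by each request's remaining patience rather than by a size, and requests whose timeout elapsed while waiting are shed on dequeue. Ordering by priority/tokens is omitted here for brevity.

```python
import heapq
import time

class Request:
    def __init__(self, name, timeout_s):
        self.name = name
        # Latest instant at which serving this request is still useful.
        self.deadline = time.monotonic() + timeout_s

class DeadlineQueue:
    """Queue bounded by per-request wait time instead of a fixed size."""
    def __init__(self):
        self._heap = []   # (deadline, seq, request), earliest deadline first
        self._seq = 0     # tie-breaker keeps FIFO order among equal deadlines

    def enqueue(self, req):
        heapq.heappush(self._heap, (req.deadline, self._seq, req))
        self._seq += 1

    def dequeue(self):
        # Lazily drop requests whose timeout elapsed while they waited.
        now = time.monotonic()
        while self._heap:
            deadline, _, req = heapq.heappop(self._heap)
            if deadline >= now:
                return req
        return None  # nothing serve-able is left
```

The nice property: the queue automatically stays within the response-time SLA, because anything that would miss its deadline is never handed to the server.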

Read more about the WFQ scheduler in Aperture: https://docs.fluxninja.com/concepts/scheduler

Link to Aperture's GitHub: https://github.com/fluxninja/aperture

Would love to hear your thoughts on our approach!


I am in the same school of thought as you: fair queuing is the optimal approach while capacity is constrained. This is what we did in Aperture, an open-source load management system. Its Scheduler implements a variant of WFQ (Weighted Fair Queuing) to enforce the desired capacity allocation across workloads (groups of requests at the same priority level) and SWFQ (Stochastic Weighted Fair Queuing) to ensure fairness across users within each workload: https://docs.fluxninja.com/concepts/scheduler
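To make the WFQ idea concrete, here is a toy sketch (my own simplification, not Aperture's implementation): while the system is backlogged, each workload's requests are stamped with virtual finish times spaced inversely to the workload's weight, and requests are served in finish-time order, so throughput splits proportionally to weight.

```python
import heapq

class WFQ:
    """Toy weighted fair queuing: serve by ascending virtual finish time."""
    def __init__(self):
        self._heap = []
        self._seq = 0
        self._finish = {}  # workload -> last assigned virtual finish time

    def enqueue(self, workload, weight, tokens=1.0):
        # Heavier requests (more tokens) and lighter weights push the
        # finish time further out, so they get served less often.
        start = self._finish.get(workload, 0.0)
        finish = start + tokens / weight
        self._finish[workload] = finish
        heapq.heappush(self._heap, (finish, self._seq, workload))
        self._seq += 1

    def dequeue(self):
        if not self._heap:
            return None
        _, _, workload = heapq.heappop(self._heap)
        return workload
```

With weights 9:1, a backlog of interleaved arrivals is served critical-first in a 9:1 long-run ratio; a real implementation also tracks a global virtual clock so that idle workloads don't accumulate credit.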


Aperture (https://github.com/fluxninja/aperture) takes a slightly opinionated stance on queue size. Instead of defining queue sizes, you put a timeout on each request, which is the amount of time the request is willing to wait in the queue. A weighted fair queuing algorithm then ensures relative allocation across workloads, e.g. 90% of capacity for critical requests, while allowing any workload to burst beyond its share when overall demand is low.
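The bursting behavior is the work-conserving property of fair queuing. A toy scheduler (hypothetical, just to show the property, not Aperture's code): shares follow the weights only while every workload has demand; an idle workload's share is redistributed rather than wasted.

```python
from collections import deque

def schedule(queues, weights, slots):
    """Work-conserving weighted scheduling sketch.

    queues:  name -> deque of pending requests
    weights: name -> share of capacity (e.g. {"crit": 9, "be": 1})
    slots:   how many requests to serve
    """
    served = []
    credits = dict(weights)
    while len(served) < slots and any(queues.values()):
        # Serve the non-empty queue with the most remaining credit;
        # empty queues simply aren't candidates, so their share is free.
        name = max((q for q in queues if queues[q]), key=lambda q: credits[q])
        served.append(queues[name].popleft())
        credits[name] -= 1
        if all(c <= 0 for c in credits.values()):
            credits = dict(weights)  # refill once every credit is spent
    return served
```

With both workloads backlogged the split follows the 9:1 weights; with the critical queue empty, best-effort takes 100% of the slots.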


I agree that fair queuing is a good solution to this problem! A request scheduler based on weighted fair queuing is also central to Aperture, an open-source load management system my team has been building for the past two years. Priorities and tokens (request weights) can be provided to Aperture when scheduling requests. Its weighted fair queuing allocates capacity in proportion to each request's tokens and priority. It also ensures fair allocation of capacity across users within the same priority class, so that no single user can starve the others.
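Per-user fairness within a priority class is commonly done with a stochastic-fair-queuing style trick; a toy sketch (my own, not Aperture's actual code): users hash into a fixed set of buckets that are drained round-robin, so a flood from one user only delays its own bucket (modulo hash collisions).

```python
from collections import deque

class StochasticFairQueue:
    """Users hash to one of N buckets; buckets are drained round-robin."""
    def __init__(self, n_buckets=8, key=hash):
        self._buckets = [deque() for _ in range(n_buckets)]
        self._key = key

    def enqueue(self, user, request):
        self._buckets[self._key(user) % len(self._buckets)].append(request)

    def drain(self):
        # One item per non-empty bucket per pass: a flooding user cannot
        # push other users' requests back, only their own.
        out = []
        while any(self._buckets):
            for bucket in self._buckets:
                if bucket:
                    out.append(bucket.popleft())
        return out
```

Even if one user enqueues ten requests while two others enqueue one each, the other users are served on the very first pass.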

Would love to hear your thoughts about this project: https://github.com/fluxninja/aperture


We have been building an open-source platform called Aperture to solve this very problem. Our approach is to let developers decide how long a request may wait in the queue via a timeout parameter; if the timeout is hit, they can retry with exponential backoff or load-shed. While in the queue, requests are prioritized by a weighted fair queuing algorithm. If the app has tiering, a policy can allocate the majority of capacity to the paid customer tier over the free one. This allocation is not static, though: if there is free capacity in the system, the free tier can take all of it. This is just like how Linux allocates CPU time based on nice values; even low-priority processes can use all of the CPU when demand is low. Beyond relative allocation across tiers, Aperture's request scheduler can also ensure fairness across individual users within each tier, so that no single user hogs all of the server capacity.
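For the retry path, the usual shape is exponential backoff with jitter; a small sketch (names are mine, not an Aperture API):

```python
import random

def backoff_delays(attempts, base=0.1, cap=5.0, jitter=random.random):
    """Delay before retry k is drawn from [0, min(cap, base * 2**k)]
    ("full jitter"), so synchronized clients don't retry in lockstep."""
    return [jitter() * min(cap, base * 2 ** k) for k in range(attempts)]
```

With jitter disabled (`jitter=lambda: 1.0`) this is the familiar doubling sequence capped at `cap`; with full jitter, each client sleeps a random fraction of that bound.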

Demand and capacity are expressed as a request-rate quota or a maximum number of in-flight requests.
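Capping in-flight requests is essentially a counting semaphore; a minimal sketch (a hypothetical helper, not Aperture's API):

```python
import threading

class InFlightLimiter:
    """Caps concurrent requests; try_acquire fails fast at the limit,
    which is the point where a scheduler would queue or shed instead."""
    def __init__(self, limit):
        self._sem = threading.BoundedSemaphore(limit)

    def try_acquire(self):
        # Non-blocking: returns False when the limit is reached.
        return self._sem.acquire(blocking=False)

    def release(self):
        self._sem.release()
```

A real load manager would place the fair-queuing scheduler in front of this limit, deciding which waiting request gets the next freed slot.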

Would love the community here to check us out on GitHub and provide feedback: https://github.com/fluxninja/aperture


