Beigels, software, and why you should control queues instead of cycle times (lucasfcosta.com)
63 points by lucasfcosta on July 7, 2022 | 19 comments



This queue analogy doesn't quite reflect how queues of development tasks behave compared with queues of customers waiting to get a bagel.

The bagel queue is strictly FIFO, while work queues generally operate more like a priority queue. This means that a long queue of development tasks may not actually be all that damaging to the business, as many of the items may be low-priority and the high-impact items may still have low cycle times.
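A minimal sketch of that point, assuming a single engineer who works on one task at a time without preemption and needs exactly one time unit per task (the function name, priorities and numbers are all hypothetical): twenty low-priority tasks are already queued when one urgent task arrives at t = 5. Served FIFO, the urgent task waits behind the whole backlog; served from a priority queue, it only waits for the task in progress to finish.

```python
import heapq
from collections import deque

def simulate(tasks, use_priority):
    """Single worker, non-preemptive, every task takes 1 time unit.

    tasks: list of (arrival_time, priority, name); a lower priority number
    means more urgent. Returns {name: cycle time}, where cycle time is
    finish time minus arrival time.
    """
    pending = sorted(tasks)          # ordered by arrival time
    heap, fifo = [], deque()         # ready tasks; only one is used per policy
    cycle, t, i, seq = {}, 0, 0, 0
    while i < len(pending) or heap or fifo:
        # Admit every task that has arrived by time t.
        while i < len(pending) and pending[i][0] <= t:
            arrival, prio, name = pending[i]
            if use_priority:
                # seq breaks ties so equal priorities stay first-come, first-served
                heapq.heappush(heap, (prio, seq, arrival, name))
            else:
                fifo.append((arrival, name))
            seq += 1
            i += 1
        if heap:
            _, _, arrival, name = heapq.heappop(heap)
        elif fifo:
            arrival, name = fifo.popleft()
        else:
            t = pending[i][0]        # nothing ready: idle until the next arrival
            continue
        t += 1                       # do one unit of work on the chosen task
        cycle[name] = t - arrival
    return cycle

backlog = [(0, 5, f"low{k}") for k in range(20)] + [(5, 0, "urgent")]
print(simulate(backlog, use_priority=False)["urgent"])  # 16
print(simulate(backlog, use_priority=True)["urgent"])   # 1
```

The cost lands on the tail of the backlog: under the priority policy the last low-priority task finishes at t = 21 instead of t = 20.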

Also, the "specification rot" issue is not always a problem. The priority of a task may not be well known when it arrives in the queue; over time its priority becomes clearer (e.g. it turns out the feature/task isn't actually very important), so unnecessary work can be avoided.

Under-utilization is rarely a problem either. Software engineers won't just sit around doing nothing when the official business-facing work queue is empty. There are always implicit low-priority tasks they will tackle instead: refactoring, research, build-system improvements, source-code hygiene, etc.


Author here. That's a very good comment. Thank you.

With regard to task queues not being FIFO: indeed, they aren't. However, even when the queue isn't FIFO, a long queue with shifting priorities will make cycle times even more unpredictable, because you'll keep reshuffling the queue over time.

You should definitely reshuffle and ruthlessly prioritise, but you should also "tail drop" old items and try to create new ones "just in time".
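One possible reading of "tail drop old items", sketched under assumptions: the backlog has a hypothetical cap (`max_len`) and a hypothetical maximum age (`max_age`, in days). Items older than `max_age` are discarded, and a new item is refused when the backlog is still full, which is what "tail drop" means for network queues. The class and its parameters are illustrative only, not anything from the article.

```python
from dataclasses import dataclass, field

@dataclass
class Backlog:
    max_len: int            # hypothetical cap on queued items
    max_age: float          # hypothetical age limit, e.g. in days
    items: list = field(default_factory=list)  # (created_at, title), oldest first

    def expire(self, now):
        """Discard items older than max_age; return the discarded ones."""
        stale = [it for it in self.items if now - it[0] > self.max_age]
        self.items = [it for it in self.items if now - it[0] <= self.max_age]
        return stale

    def offer(self, now, title):
        """Tail drop: refuse the new item if the backlog is still full after expiry."""
        self.expire(now)
        if len(self.items) >= self.max_len:
            return False
        self.items.append((now, title))
        return True

b = Backlog(max_len=2, max_age=30)
print(b.offer(0, "A"), b.offer(10, "B"), b.offer(20, "C"))  # True True False
print(b.offer(35, "D"), [title for _, title in b.items])    # True ['B', 'D']
```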


Which reminds me of the old engineering trick of saying something will take longer than it really does, since managing expectations is an art best explained by Scotty: https://www.youtube.com/watch?v=8xRqXYsksFg


Thanks for the thought-provoking article. I think I agree with the conclusions even if the analogy isn't perfect. Under-utilization and over-provisioning often carry negative connotations, particularly in management, but as you point out, I think they are necessary to provide any sort of quality of service.

Qualitatively/subjectively, when I think about the times I've worked as an engineer in an "always busy" environment, I believe that I was often less productive - it's like you need space to breathe. It was certainly less rewarding as it was never possible to be as responsive as you'd like to requests.


This reminded me of an old classic, “Starbucks does not use two-phase commit”: https://www.enterpriseintegrationpatterns.com/ramblings/18_s...

Previous discussion of it on HN (2010): https://news.ycombinator.com/item?id=1554126


In the long run, queueing systems tend to follow Little's law:

  L = λW

where, for a stationary system, L is the long-term average number of customers in the system, λ is the long-term average arrival rate of customers, and W is the average time a customer spends in the system (waiting plus being served).

In software development, you can reduce W through best practices, standards, better specs, improved testing, increasing team size, etc. You can control λ by triaging tickets.
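A worked example with hypothetical numbers, reading W as the average time a ticket spends in the system from arrival to done: a team that receives 10 tickets a week, each spending 3 weeks in the system on average, will carry about 30 open tickets. Rearranged, the same law estimates W from an observed board.

```python
def avg_in_system(arrival_rate, time_in_system):
    """Little's law: L = λW."""
    return arrival_rate * time_in_system

def avg_time_in_system(avg_wip, arrival_rate):
    """Little's law rearranged: W = L / λ."""
    return avg_wip / arrival_rate

# Hypothetical team: 10 tickets/week arrive, each spends 3 weeks in the system.
print(avg_in_system(10, 3))        # 30
# Triage halves the arrival rate; if W stays at 3 weeks, the open-ticket count halves too.
print(avg_in_system(5, 3))         # 15
# Observed board: 30 open tickets with 10 arriving per week -> 3 weeks each on average.
print(avg_time_in_system(30, 10))  # 3.0
```

The law only relates long-run averages of a stationary system; it says nothing about how those 3 weeks are spread across individual tickets.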

This works as long as the customers (in the s/w case tickets or PRs) are all independent.

A lot of the complexity in s/w development results from interdependencies between tickets. These cause service times to vary, which pushes the long-term W higher, not to mention undermining the stationarity assumption.
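To make the variability point concrete, here is a sketch using a standard result the comment doesn't mention: the Pollaczek–Khinchine formula for an M/G/1 queue (Poisson arrivals, one server, independent service times with any distribution). It assumes independent tickets, which is exactly the idealization the comment warns about, so treat it only as an illustration of the direction of the effect: with the same arrival rate and the same mean service time, more variable service times mean a longer time in the system. The numbers are hypothetical.

```python
def mg1_time_in_system(arrival_rate, mean_service, service_variance):
    """Pollaczek–Khinchine mean value for an M/G/1 queue.

    Returns W = Wq + E[S], where Wq = λ·E[S²] / (2(1 − ρ)) and ρ = λ·E[S].
    Only valid when ρ < 1 (the server keeps up on average).
    """
    rho = arrival_rate * mean_service
    if rho >= 1:
        raise ValueError("unstable: arrival rate must be below service rate")
    second_moment = service_variance + mean_service ** 2   # E[S²]
    wait_in_queue = arrival_rate * second_moment / (2 * (1 - rho))
    return wait_in_queue + mean_service

# Hypothetical: 0.8 tickets/day arrive, each takes 1 day of work on average.
print(round(mg1_time_in_system(0.8, 1.0, 0.0), 6))  # 3.0 (every ticket takes exactly 1 day)
print(round(mg1_time_in_system(0.8, 1.0, 1.0), 6))  # 5.0 (exponential service, variance 1)
```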


Doesn't lambda get controlled via sales and marketing?


Yes, it could. In product teams, this could be feature requests, whatever their source. Ultimately, someone is responsible for flagging that the arrival rate of tickets/PRs is higher than the system can handle without compromising on W (and, indirectly, on the backlog L and the customer experience). So it could be a product manager too.

No different from a triage nurse (the field where the term originated).

Deciding what practical action to take such as increasing the overall system capacity, managing the customer's expectation, doing nothing and taking a hit on service time etc., is then a business call.


Underlying assumption: μ > λ, where μ is the service rate.
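A quick way to see why μ > λ matters, sketched as a seeded discrete-time simulation (the per-step probabilities and the step count are arbitrary assumptions): each step a ticket arrives with probability λ, and if anything is queued, one ticket is finished with probability μ. When λ < μ the backlog keeps returning to near zero; when λ > μ it grows roughly linearly, by about λ − μ per step under this model.

```python
import random

def final_backlog(arrival_p, service_p, steps=10_000, seed=1):
    """Queue length after `steps` steps of a single-server discrete-time queue."""
    rng = random.Random(seed)
    queue = 0
    for _ in range(steps):
        if rng.random() < arrival_p:                  # a ticket arrives (rate λ)
            queue += 1
        if queue > 0 and rng.random() < service_p:    # a ticket is finished (rate μ)
            queue -= 1
    return queue

print(final_backlog(0.5, 0.8))  # λ < μ: stays small
print(final_backlog(0.9, 0.8))  # λ > μ: grows, roughly (0.9 - 0.8) * 10_000 ≈ 1000
```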


The bagel shop probably _wants_ to keep the queue of people as long as possible in front of their shop, it shows greater demand, psychological tricks and whatnot...


Yes, that is a very valid point, and it does move into the realms of marketing and psychology. Still, there will always be a proportion of customers on whom that doesn't work, which is why I like ticketing systems: they let the customer go do something else and pop back in X minutes. Not everyone wants to spend those ten minutes waiting in a queue when they could get other tasks done in that time.

Giving the customer the choice can offer the best of both, though smartphones have helped in that respect, as they let people do that whilst still in the queue. But time management is an art form unto itself.


I doubt it - the place is already incredibly famous and serves an extremely good product. It makes more sense for them to churn people through as fast as possible during a busy time.


There'd be an economic trade-off: the longer the queue, the more likely the competition can serve those customers, if products are somewhat fungible.


I’d be very surprised to learn that you have both been there and had the opinion expressed in this comment.


I never wait in line to spend my money, for any product.

If the bagel shop was mine however, I would have no problem making people wait as long as they were willing to tolerate; it's good marketing, and means I wouldn't have to employ as many people.


Since the author is monitoring this discussion, one minor point: in the first "choice", the text says:

For that, they must either:
1. Hire more servers
2. Increase the rate at which employees serve beigels
3. Do both

However, the text then refers to option 1 as increasing the rate at which employees serve beigels.


Tangentially, their customer service policy also hasn't been updated since the 1970s. Takes you back to a time when everybody wasn't trying hard for 5 star reviews.


Sojourn time is a much better name for "cycle time".


As with many things in life, and with this example, chaos gets in the way.

As we all know from supermarket checkouts and the dilemma of which queue to join, there are many factors that can make what appears to be a shorter, faster queue suddenly bottleneck and make you regret your choice.

1) Customer packs slowly: there are many reasons for this.
2) Customer pays slowly: digging out their payment method, or an issue with the payment method.
3) Customer has a question that adds another queue, waiting for a manager: a return, or an error at checkout (a reduction not processed that needs a manager to use their magic key to authorize the change).

Those are just 3 examples, and choosing which queue to join becomes an ongoing art form. Of note, I just stick with the queue I'm in unless there are signs that a new checkout will suddenly open. Recognising those signs, and which checkout it will be, is another skill; it can often be a case of joining no queue at all, knowing a new checkout will open imminently and which one it will be.

Equally, I'm sure many have their own strategy for which lane of traffic to join, and I guarantee everyone has had the experience of switching to a faster-moving lane only for that lane to slow down and the lane they left to suddenly become the one moving.

Software is no different, as it interacts with users, and we humans all add our own level of chaos to the equation. It comes down to managing the peaks and the resources to handle those peaks. However well you plan, there will always be that chaos moment, such as suddenly being at the top of HN, a fine example of a sudden, exceptional spike.

Sure, you can plan and budget for every possible spike, but when those resources are idle you are paying the price, so again it comes down to a fine balance. After all, for every planning rule you put in place, there will always be an exception that bites you.

Being able to manage expectations, smooth out those spikes and inform people of delays can help a bit, though for some it can be just as bad. We have all been on hold in some call system, told we are X in the queue, and watched that position move unevenly: we are 10th in the queue, after a couple of minutes we are 3rd, then those 3 take ages, so after ten minutes we are only 2nd. So whilst feedback on expected waits can be helpful, it sets a level of expectation that can become detrimental in such situations.

For those, callback systems work best, though in software, and more so in client interaction, that can be hard. Imagine if you went to a site and, instead of an error page, it said: sorry, too busy currently, but we will push the content to you once we are able. For many, that wouldn't work on many levels, let alone the code to handle and manage it. So you manage and scale as best you can, but being able to gracefully handle the moment those limits break is just as important as, if not more important than, planning and resourcing those queues in the first place.
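The "you are X in the queue" feedback is often some variant of a naive estimate like the one below. This is a hypothetical sketch (the function and the numbers are made up), not how any particular call system works: it multiplies the caller's position by the average recent handling time, so a few unusually long calls ahead of you can make the estimate jump even as your position improves.

```python
from statistics import mean

def estimated_wait(position, recent_handle_minutes, agents=1):
    """Hypothetical naive estimate: callers ahead × mean recent handling time ÷ agents."""
    return position * mean(recent_handle_minutes) / agents

# Early on: 10th in the queue, recent calls took about a minute each.
print(estimated_wait(10, [1, 1, 1]))   # 10.0 (minutes)
# Later: 3rd in the queue, but the last few calls ran long.
print(estimated_wait(3, [2, 10, 12]))  # 24.0 (minutes)
```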

So being able to manage expectations is just as important, if not more so. Maybe that's why the old engineering art of saying something will take 3x longer than it will holds up so well: if people are told about a delay up front, they may be upset, but they will still understand, and if the quoted 10-minute delay turns out to be only 3 minutes, they will come away more positive. Conversely, if you tell them what you actually expect, say 3 minutes, and it turns out to be 10 minutes, or you keep pushing the delay back with notices (delays upon delays), the overall experience will be far more tainted than if you had over-estimated in the first place.



