Queues invert control flow but require flow control (enterpriseintegrationpatterns.com)
78 points by alexzeitler 38 days ago | 17 comments



I always wish more metaphors in these discussions were built around conveyor belts. It helps drive home that you have to pay attention to which queue/belt you load, and that it deserves a lot of thought.

Granted, I'm probably mostly afraid of diving into factorio again. :D


Playing Factorio has indeed given me a much better understanding of how congestion works in complex systems.


The Spintronics mechanical circuits game is sort of like conveyor belts.

Electrons are not individually identified like things on a conveyor belt.

Electrons in conductors, semiconductors, and superconductors do behave like fluids.

Turing tapes: https://hackaday.com/2016/08/18/the-turing-tapes/

Theory of computation > Models of computation: https://en.wikipedia.org/wiki/Theory_of_computation


Like Lucy in the chocolate factory.


What an iconic episode.


This article is weird because it conflates queues with pub/sub systems. Even to the point of comparing Amazon SQS with Google Pub/Sub rather than the closer analogue, Cloud Tasks.

I've pushed queues from Amazon and Google to thousands of qps and never found their limits. Flow control is unnecessary; the queue either grows or shrinks, and you adjust processing to compensate (ideally by autoscaling). If the queue depth grows continuously and you can't keep up, you work on the architecture (or budget) problems.
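FWIW the "watch the depth, adjust the workers" loop can be tiny. A rough sketch with boto3 (the queue URL, the 100-messages-per-worker ratio, and the cap are all made up; feed the result to whatever actually scales your workers):

  import boto3

  sqs = boto3.client("sqs")
  QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/work"  # hypothetical

  def desired_workers(max_workers=50):
      # ApproximateNumberOfMessages is the visible backlog (an estimate, not exact).
      attrs = sqs.get_queue_attributes(
          QueueUrl=QUEUE_URL,
          AttributeNames=["ApproximateNumberOfMessages"],
      )
      backlog = int(attrs["Attributes"]["ApproximateNumberOfMessages"])
      target = max(1, backlog // 100)  # aim for ~100 queued messages per worker
      return min(target, max_workers)  # hand this to your ASG/ECS scaling hook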

If you're storing so much data in queues that you're hitting the storage limits of something like SQS, you are using the wrong tool for the job. It's not a database.


I’ve worked with on-prem systems and now AWS. Even being on AWS, I think the first part of the article is great: nice visuals and description. When we were on-prem, scaling was an issue during event storms, but by then we at least had enough visibility to know what the problem was.

Scaling the number of messages is no longer an issue for us with SQS, but message size is; there are still constraints. We end up passing references instead: “details created, find them here.” On-prem we could dump around 25 MB into a message without issue, and could go to 50 MB, but it wasn’t safe.

I get your point about queues and pub/sub not being the same. Just to note, we do a lot of hybrid and expect others do too: publish to a topic and bridge to queues. Fan-out is easy and gives you choice. I don’t know Google’s version, but with TIBCO EMS this was easy to manage and clear to everyone: if you wanted to listen to everything on prod.alerts.* you could, or if you wanted to process just prod.alerts.devices.headend you could queue that and process it all.
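The same topic-bridged-to-queues setup looks about like this on any AMQP broker. A sketch with pika against RabbitMQ (exchange and queue names are made up; AMQP's '#' plays the role of TIBCO's '>' for multi-level wildcards, while '*' matches a single token):

  import pika

  ch = pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()
  ch.exchange_declare(exchange="events", exchange_type="topic", durable=True)

  # One queue sees every alert...
  ch.queue_declare(queue="all-alerts", durable=True)
  ch.queue_bind(queue="all-alerts", exchange="events", routing_key="prod.alerts.#")

  # ...another only the headend feed.
  ch.queue_declare(queue="headend-alerts", durable=True)
  ch.queue_bind(queue="headend-alerts", exchange="events",
                routing_key="prod.alerts.devices.headend")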

We use queues like storage for long outages so that senders don’t have to change anything. Not a great use, but people were sure happy to know we could help by holding all the events while they dealt with their mess. We never got close to any limit doing this on-prem.

Never used it but isn’t the idea of Kafka to hold all events like a database? Love the idea. Seems so lazy and useful at the same time. Now that I write this I can see the danger too. You become a transaction system of record. Ugh. That’s someone else’s problem ;).


I do not know if SNS is actually based on Kafka, but Kafka has log retention, which is, as far as I know, enabled by default. So your events disappear after some time. So no, I would not agree that it holds events like a database. Database-ish :) Especially with all the layers on top, like SQL.
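The default retention for a Kafka topic is 7 days, and retention.ms can be set to -1 to keep everything; the database-ish part is mostly that consumers can replay whatever is still retained. A sketch with confluent-kafka (broker address, topic, and group id are made up):

  from confluent_kafka import Consumer

  c = Consumer({
      "bootstrap.servers": "localhost:9092",  # hypothetical broker
      "group.id": "replay-demo",
      "auto.offset.reset": "earliest",        # start from the oldest retained offset
  })
  c.subscribe(["events"])

  while True:
      msg = c.poll(1.0)
      if msg is None or msg.error():
          continue
      print(msg.topic(), msg.offset(), msg.value())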

Now I wonder if Athena can actually run on SNS topics, hmmm.


It also assumes the producer and consumer are monolith-bound data streams. Almost all AMQP back ends include pattern-based message routing that pre-sorts the data, which acts as a kind of implicit sharding. This pattern is also particularly performant with static route definitions.

Unless one is using Elixir/Erlang channels, eventually one must glue on an equivalent queue structure to handle concurrency spikes. Note that this has less to do with bandwidth peaks; rather, the OS networking stack (ISO/OSI) limitations become the bottleneck for concurrency.

If your data traffic is truly separable, then it is trivial to load-balance across multiple pattern-matching routing consumers. Thus you get pre-sharded, localized data and simply avoid the store's state-syncing cost on back-channel links.
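A toy version of the "route by pattern, keep state local" idea in plain Python (shard count and names are made up; a real setup would hang a separate queue consumer off each shard):

  import hashlib

  NUM_SHARDS = 4
  # Each consumer owns its own counters; nothing is shared across shards.
  shard_state = [dict() for _ in range(NUM_SHARDS)]

  def shard_for(routing_key: str) -> int:
      return hashlib.md5(routing_key.encode()).digest()[0] % NUM_SHARDS

  def consume(routing_key: str, payload: dict) -> None:
      state = shard_state[shard_for(routing_key)]  # touch only this shard's state
      state[routing_key] = state.get(routing_key, 0) + 1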

If one has over 40k users, then they should already have rediscovered this phenomenon. Simply repeat the mantra "concurrent consumers should never share state". =3


How does Elixir/Erlang mitigate the issue?

My mental model is that all queueing systems with pub/sub features are OTP-lite or OTP-wannabes.

Are you saying that Erlang just does it more efficiently to the extent that scale isn't an issue for the concern/solution?


For modern AMQP back ends, people usually implement a quorum queue based on this paper:

https://raft.github.io/raft.pdf

An overview of why this replaced many traditional non-durable queue use-cases in RabbitMQ:

https://www.youtube.com/watch?v=wuZC7m6dCDA

https://www.rabbitmq.com/docs/quorum-queues
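In practice that boils down to declaring the queue with a different type, e.g. with pika (queue name made up):

  import pika

  ch = pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()
  # x-queue-type=quorum gives a Raft-replicated queue instead of a classic one.
  ch.queue_declare(queue="orders", durable=True,
                   arguments={"x-queue-type": "quorum"})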

What are Elixir/Erlang channels:

https://hexdocs.pm/phoenix/channels.html

What types of problems channels are used to solve:

https://felt.com/blog/pheonix-channel-routing-patterns

Have a great day, =3


You make a good point: if we’re always able to scale out the consumer, then we never need flow control to slow down the producer.


> Flow control is unnecessary

> compensate (ideally by autoscaling)

Isn't that flow control of a different form? It's a new form enabled by the cloud more so than by any previous technology, but it really does seem like a version of "egress-based flow control" to me, except we're widening the output rather than narrowing it.

Queue failures are just a destructive form of back pressure.


> Queue failures are just a destructive form of back pressure.

Depends on how it fails. Could just as well fail by throwing away entries.


> Isn't that flow control of a different form?

Eh, maybe if you generalize the term "flow control" beyond any useful meaning. The author has specific ideas (Backpressure, TTL, and Tail Drop), all of which might be narrowly tailored to his/her specific scenario but aren't things people generally need, as per the title of TFA.
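For reference, the three are easy to picture with a bounded in-process queue; a toy Python sketch (the size and TTL are arbitrary, nothing cloud-specific):

  import queue, time

  q = queue.Queue(maxsize=1000)  # bounded buffer
  TTL_SECONDS = 30

  def produce(item):
      try:
          q.put_nowait((time.monotonic(), item))  # tail drop: reject when full
      except queue.Full:
          pass  # a blocking q.put(...) here would be backpressure instead

  def consume():
      enqueued_at, item = q.get()
      if time.monotonic() - enqueued_at > TTL_SECONDS:
          return None  # TTL: discard entries that waited too long
      return item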


The article states that "Other event sources like SNS use so-called Event Source Mapping", but (at least to my knowledge) SNS directly triggers the Lambda. SQS and a few other "stream and queue-based services" use Event Source Mapping.

SNS = topics, push. SQS = queues, poll.
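The wiring reflects that split: you subscribe the function to the SNS topic (push), while for SQS you create an event source mapping that Lambda polls on your behalf. Roughly, with boto3 (ARNs are made up, and the SNS path also needs a lambda:InvokeFunction permission for sns.amazonaws.com):

  import boto3

  FN = "arn:aws:lambda:us-east-1:123456789012:function:handler"  # hypothetical
  TOPIC = "arn:aws:sns:us-east-1:123456789012:events"            # hypothetical
  QUEUE = "arn:aws:sqs:us-east-1:123456789012:work"              # hypothetical

  # SNS: push delivery straight to the function.
  boto3.client("sns").subscribe(TopicArn=TOPIC, Protocol="lambda", Endpoint=FN)

  # SQS: Lambda's managed poller, configured via an event source mapping.
  boto3.client("lambda").create_event_source_mapping(
      EventSourceArn=QUEUE, FunctionName=FN, BatchSize=10)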


Backpressure matters.



