
One downside of SQS is that it doesn't support fan-out, e.g. S3 -> SQS -> multiple consumers. The recommendation instead seems to be to first push to SNS and then hook up SQS/other consumers to it. Kinesis/Kafka would appear to be better suited for this (since they support fan-out like SNS and are pull-based like SQS), but they aren't as well supported as SNS/SQS (you can't push S3 events directly to Kinesis, for example). Can someone from AWS comment on why that is? Also, related: when can we expect GA for Kafka (MSK)?



Kinesis is not necessarily well suited to fan-out. It is very well suited for fan-in (many producers, a single consumer).

Each shard allows at most 5 GetRecords calls per second, shared across all consumers reading it. If you want to fan out to many consumers, you will hit that limit quickly and have to accept a significant latency/throughput tradeoff to make it work.

For API limits, see: https://docs.aws.amazon.com/kinesis/latest/APIReference/API_...
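
To make that tradeoff concrete, here's a rough back-of-the-envelope sketch (plain arithmetic, not an AWS API call) of how the shared per-shard call budget stretches each consumer's polling interval as you add consumers:

```python
# Back-of-the-envelope sketch: the 5 GetRecords calls/sec/shard budget is
# shared by every consumer polling that shard, so each extra consumer
# stretches the minimum polling interval.
SHARD_READ_CALLS_PER_SEC = 5  # documented per-shard GetRecords limit

def min_poll_interval_seconds(num_consumers: int) -> float:
    """Smallest interval each consumer can poll at without the group
    collectively exceeding the shard's call budget."""
    return num_consumers / SHARD_READ_CALLS_PER_SEC

for n in (1, 5, 10, 20):
    print(f"{n:2d} consumers -> poll every {min_poll_interval_seconds(n):.1f}s")
```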


I do S3 -> SNS -> SQS. I don't see why I would use Kinesis instead. The SNS bit is totally invisible to the consumers (you can even tell SNS not to wrap the inner message in the SNS boilerplate); downstream consumers just know they have to listen to a queue.

I don't see a downside to this approach. Perhaps some increased latency?
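
For the curious, here's a rough boto3 sketch of what wiring one more consumer queue onto the topic looks like (topic ARN, queue name, and account ID are placeholders):

```python
# Hedged boto3 sketch of adding one more consumer queue to an existing
# SNS topic that receives the S3 events. Names/ARNs are placeholders.
import json
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

topic_arn = "arn:aws:sns:us-east-1:123456789012:s3-events"  # placeholder
queue_url = sqs.create_queue(QueueName="my-consumer-queue")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Let the topic deliver into this queue.
sqs.set_queue_attributes(
    QueueUrl=queue_url,
    Attributes={"Policy": json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    })},
)

# RawMessageDelivery is the "don't wrap the inner message" setting mentioned above.
sns.subscribe(
    TopicArn=topic_arn,
    Protocol="sqs",
    Endpoint=queue_arn,
    Attributes={"RawMessageDelivery": "true"},
)
```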


If you wanted multiple pull-based consumers for the stream, wouldn't you need a separate SQS queue per consumer, with each queue hooked up to SNS? Perhaps I'm mistaken, but that seems brittle to me. With Kinesis/Kafka, you only need to register a new appName/consumer group against the single stream/topic to fan out. Plus, both are FIFO by default, at least within a partition.
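
For contrast, with Kafka a new consumer is just a new group id on the same topic; an illustrative kafka-python sketch (broker address, topic, and group names are made up):

```python
# Illustrative kafka-python sketch: a new fan-out consumer is just a new
# group_id on the same topic; no extra queue to provision.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "s3-events",                      # the single shared topic
    bootstrap_servers="broker:9092",  # placeholder broker address
    group_id="thumbnail-service",     # a new group gets its own copy of the stream
    auto_offset_reset="earliest",
)

for record in consumer:
    print(record.partition, record.offset, record.value)
```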


That's exactly how you do it. To me, it's the opposite of brittle - every consumer owns a queue, and is isolated from all other consumers. Clients are totally unaware of other systems, and there's no shared resource under contention.


I feel like the create/delete queue semantics hint that a queue should be a long-lived thing that consumers are configured to connect to. When I saw suggestions to have one queue per consumer and have that consumer create/delete the queue during its execution lifecycle, the idea of one-queue-per-consumer started making more sense to me.
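
A minimal boto3 sketch of that lifecycle, with placeholder names (the queue access policy that lets SNS deliver into the queue is omitted for brevity): the consumer creates its queue and subscription at startup, drains it, and tears both down on exit.

```python
# Sketch of the queue-per-consumer lifecycle: create at startup, delete on exit.
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:s3-events"  # placeholder

def handle(message_body: str) -> None:
    print("got:", message_body)  # stand-in for the real work

queue_url = sqs.create_queue(QueueName="worker-42-inbox")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]
sub_arn = sns.subscribe(
    TopicArn=TOPIC_ARN, Protocol="sqs", Endpoint=queue_arn,
    ReturnSubscriptionArn=True,
)["SubscriptionArn"]

try:
    while True:  # long-poll until the process is told to stop
        resp = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            handle(msg["Body"])
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
except KeyboardInterrupt:
    pass
finally:
    sns.unsubscribe(SubscriptionArn=sub_arn)
    sqs.delete_queue(QueueUrl=queue_url)
```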


I think the word "consumer" here really means "consumer group".

For example, an AWS Lambda triggered from SQS can lead to thousands of concurrent executions, each invocation pulling messages from SQS.

But another consumer group, maybe a group of load balanced EC2 instances, will have a separate queue.

In general, I don't know of cases where you want a single message duplicated across a variable number of consumer groups - services are not ephemeral things, even if their underlying processes are. You don't build a service, deploy it, and then tear it down the next day and throw away the code.
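
To make "consumer group" concrete, here's a minimal sketch of the Lambda side: the function is wired to its own queue as an event source and gets a batch of records per invocation (this assumes raw message delivery is on, so each body is the original S3 event).

```python
# Minimal sketch of the Lambda handler for one consumer group's queue.
import json

def handler(event, context):
    for record in event["Records"]:        # one entry per SQS message in the batch
        s3_event = json.loads(record["body"])
        for r in s3_event.get("Records", []):
            key = r["s3"]["object"]["key"]
            print("processing", key)       # this group's real work goes here
```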


Hmm. It seems a bit awkward if you have a variable number of consumers?


I haven't run into that myself; when would you want a variable number of consumers? Usually the way I have it is that a service, which is itself a cluster of processes, owns one queue. For example, an AWS Lambda triggered by that queue.

Then any new Lambdas or other services that want to subscribe to the messages each get another queue, and another, and so on.

I haven't had a case where service groups come and go; I'm struggling to think of a use case.


Yeah, I find this setup really convoluted and unnecessarily complex. Now I have to learn the particulars of two AWS services to do a job that ought to be handled by one.

Google Cloud really outshines AWS here with its serverless Pub/Sub: it's trivial to fan out, it's low latency, it has similar delivery semantics (I think), and IMHO better, easier APIs. It's a really impressive service.
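
A minimal google-cloud-pubsub sketch of what that fan-out looks like (project, topic, and subscription names are placeholders):

```python
# Hedged google-cloud-pubsub sketch: one topic, one subscription per consumer.
from google.cloud import pubsub_v1

project = "my-project"
publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()

topic_path = publisher.topic_path(project, "object-events")
sub_path = subscriber.subscription_path(project, "thumbnail-service")

# Fan-out is just another subscription on the same topic.
subscriber.create_subscription(name=sub_path, topic=topic_path)

def callback(message):
    print(message.data)
    message.ack()

# Streaming pull runs in the background until the future is cancelled.
future = subscriber.subscribe(sub_path, callback=callback)
```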


I have been working with Google Pub/Sub and was excited about its push delivery, which posts messages to subscribed endpoints/webhooks.

But its only method of throttling is to scale delivery up and down based on failures, and that has been very unpredictable for me.

Even though my webhook started failing and timing out on requests, Pub/Sub just kept hammering my servers until it brought them completely to their knees. Logs on Google's end showed 1,500 failed attempts per second and 0.2 successes per second. It kept up that rate for half an hour.

Seems like their Push option really needs some work.


Generally they do GA at the next re:Invent after a service is announced, so probably by end of year. But I wouldn't be sure about MSK. It is extremely limited right now; last time I checked, the API had no way to even change the number of nodes.




