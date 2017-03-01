
Interactive Queries in Apache Kafka Streams (codecentric.de)
15 points by krallistic 1 hour ago | 5 comments





I've been eyeing Kafka for a long time now - tried to set it up on a smaller DO box and had memory issues. Is it worth using this for smaller operations (<100k requests per day)? The stream processing seems extremely useful for my use case. Is there something else that may be a better fit?

If they're evenly distributed, 100K requests per day is barely more than 1/sec (100,000 / 86,400 ≈ 1.16). It's peak throughput that you should be planning for. At 1/sec you can use just about anything.
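To make the arithmetic concrete, here is a rough sketch of the average-versus-peak calculation. The function names and the "half the day's traffic in one hour" scenario are illustrative assumptions, not figures from the original poster.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(requests_per_day: float) -> float:
    """Requests per second if traffic is spread evenly over the day."""
    return requests_per_day / SECONDS_PER_DAY


def peak_rate(requests_per_day: float, peak_fraction: float,
              peak_window_seconds: float) -> float:
    """Requests per second if `peak_fraction` of the daily traffic
    arrives within a window of `peak_window_seconds`."""
    return requests_per_day * peak_fraction / peak_window_seconds


if __name__ == "__main__":
    daily = 100_000
    print(f"average: {average_rate(daily):.2f} req/s")
    # Hypothetical burst: half of the day's traffic lands in a single hour.
    print(f"peak:    {peak_rate(daily, 0.5, 3600):.2f} req/s")
```

Even under that assumed burst the rate stays in the low tens of requests per second, which is why the peak, not the daily total, is the number to size for.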

That aside, if you're not sure whether you should be using it, I'm going to guess the answer is "No, you shouldn't". It's very cool tech but most likely overkill for whatever you're doing. Also, until recently securing it was a particular pain due to the lack of TLS and authentication (I think this may have been resolved now).

All right, that makes sense. It is likely that we'll see a peak in the tens of thousands at a time, if that. Probably more as we move into the future.

I want to avoid deploying a piece of software now that we'll need to swap out in a year since it does not provide what we need anymore, though I suppose that is a natural part of scaling.

What would you recommend as an alternative to Kafka that does adequate real-time processing? The real-time-ness of it is important for our use-case. Maybe even some kind of time-series database system could work.

It really depends on what you need to do. That's around 1 request per second. I think Kafka might be overkill, but if you plan on needing to increase capacity then it might be worth it. It might just be easier to use RabbitMQ or even Redis at that scale though.

Okay, thank you. I've been looking into Rabbit quite a bit. I like the stream processing aspect of Kafka, where you can peek into things as they happen, rather than having just a plain message queue.

Some of our stuff is on GCP, so we have also been investigating Pub/Sub and Dataflow, which seem more viable since we don't have the up-front cost of hardware and maintenance and only pay for the resources we use. If we don't use much, we should not pay much.

