
GCP and Confluent partner to deliver a managed Apache Kafka service - jto1218
https://cloud.google.com/blog/big-data/2018/05/google-cloud-platform-and-confluent-partner-to-deliver-a-managed-apache-kafka-service
======
georgewfraser
Why would someone want to use Kafka rather than PubSub? For a nonexpert, it
seems like PubSub is just a tighter abstraction---same basic operations, fewer
knobs to turn. Are there important features in Kafka, that aren't in PubSub
and can't be easily built on top of PubSub?

~~~
SmirkingRevenge
I upvoted your question - not sure why it's been downvoted, as it seems like a
good-faith question, not to mention a very reasonable one.

Subtle differences in the semantics of pub/sub and message passing services
can have really significant consequences for their use-cases.

Google Pubsub is at-least-once delivery, and best-effort ordering. That means
any consumer pipelines need to be able to tolerate duplicates and out-of-order
messages - in many cases, that's not trivial to handle. If the order of your
messages isn't that important, and if you can make the operations of consumers
idempotent - pub/sub is awesome. But more often than not, it becomes just
another message passing service to add to your plethora of message passing
services, because its limitations mean it can't cover all of your use-cases.
(I really want one ring to rule them all - Kafka gets the closest, IMHO)
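
To make the "idempotent consumer" point concrete, here is a minimal sketch,
assuming every message carries a unique ID. The class and field names are
hypothetical and this is not the Pub/Sub client API; it only shows the
dedup idea.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentConsumer:
    """Applies each message at most once, keyed by a message ID."""

    seen_ids: set = field(default_factory=set)
    total: int = 0

    def handle(self, message_id: str, amount: int) -> bool:
        # A redelivered message carries the same ID, so it is skipped.
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        self.total += amount
        return True


consumer = IdempotentConsumer()
# Simulated at-least-once delivery: "m2" arrives twice, and before "m1".
deliveries = [("m2", 5), ("m1", 10), ("m2", 5), ("m3", 1)]
applied = [consumer.handle(mid, amt) for mid, amt in deliveries]
```

This only works out because addition doesn't care about order. Once the
operation is order-sensitive, dedup alone is not enough, and the set of seen
IDs also has to be bounded or persisted somewhere in a real system.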

Kafka is basically pub/sub on top of an ordered, append-only log, and
consumers read the log stream very much like a single process reads/seeks on a
file handle - using offsets. Given infinite storage - your entire data stream
can be replayed, and any data can be recreated from scratch - that's a pretty
awesome thing.
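
A toy sketch of that model, assuming a single in-memory partition. `Log` and
`Consumer` are made-up names for illustration, not Kafka's client classes;
the point is just that each consumer owns an offset and can seek back to
replay.

```python
class Log:
    """Append-only list of records; a record's index is its offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the record just written

    def read(self, offset, max_records=10):
        # Reading never removes anything from the log.
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own position in the log, like a file handle."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        self.offset = offset


log = Log()
for event in ["a", "b", "c"]:
    log.append(event)

consumer = Consumer(log)
first_pass = consumer.poll()
consumer.seek(0)  # rewind to the start of the log
replayed = consumer.poll()
```

Because reads don't consume anything, a second consumer could start at offset
0 later and rebuild the same state from scratch.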

~~~
manigandham
>> I really want one ring to rule them all

Try Apache Pulsar:
[https://pulsar.incubator.apache.org/](https://pulsar.incubator.apache.org/)

------
lbradstreet
I love Kafka and the log-oriented streaming model, but I often have to think
twice before recommending it to clients who would have to manage the ops
themselves. Having a managed service on GCP, alongside Confluent's existing
cloud offering on AWS, really brings down the barrier to entry. There aren't
really any AWS/GCP serverless equivalents (Kinesis has a 7-day retention
maximum, no key compaction, and less surrounding tooling such as
KStreams/KSQL).

~~~
regnerba
May I ask why you wouldn't recommend it to teams that have to manage it
themselves? I haven't used it myself but my team is currently looking at using
it internally. The first project will be integrating it into our log pipeline
between nodes and our logstash instances.

~~~
beepbeepbeep1
There is absolutely no reason other than the overhead: you need to self-manage
the service like you would self-manage any other internal service.

If you are comfortable with operations you'll be fine. Some people are not
good at ops, so outsourcing the problem and making the ops side someone else's
issue can also be useful.

Self-hosting will offer far more options when it comes to scaling and
tweaking. Overall, on bare hardware it's cheaper and faster, although up-front
costs will be higher.

Kafka use cases are rarely elastic, so they don't gain that advantage in the
cloud. Also, Kafka's lack of tiered storage makes it expensive for storing big
volumes of data.

