Not really related to your bigger point, which I have no opinion on, but Kafka and PubSub have different delivery contracts; Kafka's are generally stricter. That makes comparing the scalability of the two somewhat problematic.
Can you elaborate on that? PubSub is a fully-managed service, which means that Google SREs are on call making sure things stay up. In addition, PubSub has "guaranteed at-least-once message delivery". In a sense, Google's SREs guarantee delivery.
PubSub is also a GLOBAL service. Not only are you protected from zone downtime, you are protected from regional downtime. Is there an equivalent to this level of service anywhere in the world?
I'm not too familiar with Kafka's fully managed service, but Kafka-on-VM is a whole other ball game. YOU manage the service. YOU guarantee delivery, not Kafka.
Kafka promises strictly ordered delivery (within a partition); PubSub promises mostly ordered delivery. The difference between those promises is what drives PubSub's ability to scale throughput and offer global availability.
From an availability standpoint, I don't disagree with anything you mention, but the difference between the consistency models means that PubSub is solving a different set of problems than Kafka, thus my opinion that comparing them is problematic.
That's a fair point. But remember, Kafka promises this as long as the underlying VM infrastructure is alive and well. PubSub completely removes this worry, or even the concept of VMs.
There are several ways to look at it, but I'd opine that a "mostly ordered" fully-managed truly-global service that's easy to unscramble on the receiving end is more "guaranteed" than something that is single-zone and relies on the health of underlying VMs that YOU have to manage.
edit: Kafka and PubSub have a lot of overlap, but they each have qualities the other one doesn't. I suppose you gotta choose which qualities are more important for you.
If you can design your protocol such that it can work in a mostly ordered fashion, I'd highly recommend that you do. It opens up your choices for technology stack tremendously. But, if you require ordered delivery, your choices start shrinking dramatically.
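To make "mostly ordered" concrete: if the publisher stamps each message with a monotonically increasing sequence number (PubSub doesn't provide one for you, so that convention is part of your protocol, not the service), unscrambling on the receiving end is a small amount of code. A minimal sketch in Python, using a single global sequence for simplicity:

```python
import heapq

class ReorderBuffer:
    """Re-sequences "mostly ordered", at-least-once deliveries.

    Assumes the publisher stamps each message with a monotonically
    increasing sequence number starting at 0 -- that's your protocol's
    job, not something PubSub provides.
    """

    def __init__(self):
        self.next_seq = 0   # next sequence number to hand to the app
        self.pending = []   # min-heap of (seq, payload) that arrived early

    def receive(self, seq, payload):
        """Feed in one delivery; yields payloads in strict publish order."""
        if seq >= self.next_seq:
            heapq.heappush(self.pending, (seq, payload))
        # Anything with seq < next_seq is a duplicate redelivery: ignore it.
        while self.pending and self.pending[0][0] <= self.next_seq:
            s, p = heapq.heappop(self.pending)
            if s == self.next_seq:  # skip duplicates still sitting in the heap
                self.next_seq += 1
                yield p
```

You'd drive it with something like `for msg in buf.receive(seq, payload): handle(msg)` on each delivery. The catch, and the reason strict ordering shrinks your choices: a single sequence implies a single logical writer. In practice you'd keep one sequence per entity (per user, per account, etc.) so independent streams can still fan out.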
Also, just so we are on the same page: Kafka is a software product that you run yourself on hardware or VMs, not a managed service. Possibly you are thinking of Amazon Kinesis, which does offer a managed service with strict (per-shard) ordering.
No confusion on the second point. My argument was that Kafka adds significant complexity and delivery risk because it's software you must run on hardware/VMs, rather than a fully-managed service. You have to pay a whole lot of eng time to make Kafka truly "guaranteed delivery" because there's always a risk of the underlying hardware/VM/LB dying.
PubSub guarantees delivery regardless of what happens with the underlying infrastructure. In a sense, the bar has been raised dramatically.
> PubSub is also a GLOBAL service. Not only are you protected from zone downtime, you are protected from regional downtime. Is there an equivalent to this level of service anywhere in the world?
Could you point to some documentation that describes its reliability model and SLA in more detail? I glanced through the documentation and couldn't find any information about this.
It seems like a service with this kind of global availability would have to trade off latency on writes, and potentially reads. If it's a multi-region service, then all writes need to block until they're acknowledged by at least a second region, right? That will add latency to every request and may not necessarily be a good thing. Similarly, at read time, latency could fluctuate depending on which region you query, and whether your usual region has the data yet.
I'm just speculating, though, not having read any more about the service. It does sound nice to have the choice to fall back to another region and take the latency hit instead of an outage. On the other hand, regions are already highly available at existing cloud providers (with zones being the more common failure point).
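A back-of-envelope version of that worry, with made-up numbers (I have no idea what PubSub actually does internally):

```python
# Illustrative numbers only -- not measured PubSub figures.
local_write_ms = 5          # publish + ack within the local region
inter_region_rtt_ms = 100   # round trip to a second region

# Synchronous replication: every publish waits for a remote ack,
# so it pays at least one inter-region round trip.
sync_publish_ms = local_write_ms + inter_region_rtt_ms   # ~105 ms

# Asynchronous replication: the publisher sees only local latency,
# but a reader in the other region may not see the message yet.
async_publish_ms = local_write_ms                        # ~5 ms
```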
Is PubSub mature? The FAQ suggests that you should verify that requests to your HTTPS endpoint came from Google by adding a secret parameter, rather than by relying on any form of HTTP-level authentication.
> If you additionally would like to verify that the messages originated from Google Cloud Pub/Sub, you could configure your endpoint to only accept messages that are accompanied by a secret token argument, for example, https://myapp.mydomain.com/myhandler?token=application-secre....
This feels rather haphazard. If I'm exposing an HTTPS endpoint in my application that will trigger actual behavior upon the receipt of an HTTP request, then of course I "would like to verify that the messages originated from Google Cloud Pub/Sub", so that they're not coming from some random bot or deliberate attacker who happened to learn my URL.
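For concreteness, the FAQ's suggestion boils down to something like the following. This is just a sketch of the scheme as I read it, assuming a Flask push handler; the path and secret value are placeholders:

```python
import hmac
from flask import Flask, request, abort

app = Flask(__name__)

# Placeholder -- in practice this would come from config or secret storage.
PUSH_TOKEN = "application-secret"

@app.route("/myhandler", methods=["POST"])
def pubsub_push():
    # The FAQ's scheme: reject any request that doesn't carry the shared
    # secret as a query parameter. compare_digest avoids timing leaks.
    token = request.args.get("token", "")
    if not hmac.compare_digest(token, PUSH_TOKEN):
        abort(403)
    envelope = request.get_json()          # push deliveries arrive as a
    message = envelope.get("message", {})  # JSON envelope wrapping the message
    # ... process message ...
    return ("", 204)
```

It works, but the "credential" is a static secret embedded in a URL, which is exactly the kind of thing that ends up in access logs and referrer headers; hence the "haphazard" reaction.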
I didn't see anything in the docs that touches on those subjects in detail (I did skim the docs for sections and pages that might answer my questions before posting), but if you know of a page that does, please point me to it; I'd be interested to read it! I trust that your perceptions and information are accurate, but citable, referenceable information is also valuable.
"For the most part Pub/Sub delivers each message once, and in the order in which it was published. However, once-only and in-order delivery are not guaranteed: it may happen that a message is delivered more than once, and out of order."