That being said, I'm getting tired of managing these clunky, memory-hungry JVM-based systems that rely on external dependencies like Apache ZooKeeper and BookKeeper. They may be necessary if you're Yahoo, but I would argue that for the vast majority of companies, these complex systems create so many administration headaches that the net productivity impact is negative. I've spent days or weeks debugging Kafka issues. I almost feel like "big clunky heavy JVM enterprise software" has become synonymous with Apache projects.
If you don't absolutely need the messaging guarantees I would recommend looking into NATS (https://nats.io/) - it's a brokered system that's significantly more lightweight and super easy to deploy and play around with. You can get persistence and delivery guarantees with NATS Streaming, but that part is optional and a bit more early-stage.
Well, the Pulsar brokers are (kinda) stateless, because they are essentially a caching layer in front of BookKeeper. But where is your data actually stored, then? In BookKeeper bookies, which are stateful. Killing and replacing/restarting a BookKeeper node requires the same redistribution of data as in Kafka's case. (Additionally, BookKeeper needs a separate data recovery daemon to be run and operated, https://bookkeeper.apache.org/archives/docs/r4.4.0/bookieRec...)
So the comparison of 'Pulsar broker' vs. 'Kafka broker' is very misleading because, despite identical names, the respective brokers provide very different functionality. It's an apples-to-oranges comparison, like comparing memcached (Pulsar broker) with Postgres (Kafka broker).
Bookies can seamlessly be added.
Unless there has been a KIP since I stopped paying attention, Kafka doesn't do this.
I remember that adding brokers to Kafka and taking advantage of them on existing topics meant repartitioning, which, if I recall correctly, breaks the golden ordering contract that most devs bank on: the data written to a partition will always be in order for that partition itself.
Kafka brokers can seamlessly be added, too.
> I remember that adding brokers to Kafka and taking advantage of them on existing topics meant repartitioning, which, if I recall correctly, breaks the golden ordering contract that most devs bank on.
Adding brokers to Kafka does not require repartitioning. It requires data rebalancing ('migrate' some data to the new brokers), which does not break any ordering contract. I suppose the words sound sufficiently similar that they are easy to mix up. :)
(For what it's worth, BookKeeper requires the same data rebalancing process.)
> The data written to the partition will always be in the order for that partition itself.
Yes, for Kafka, all log segments that make up a topic-partition are always stored -- or, when rebalancing, moved -- in unison on the same broker. Or, brokers (plural) when we factor in replication. Kafka's approach has downsides but also upsides: data is always stored in a contiguous manner, and can thus also be read by consumers in a contiguous manner, which is very fast.
In comparison, BookKeeper has segmented storage, too. But here the segments -- called ledgers -- of the same topic-partition are spread across different BK bookies. Also, because of how BookKeeper's protocol works (https://bookkeeper.apache.org/docs/4.10.0/development/protoc...), what bookies store are actually not contiguous 'ledgers', but in fact 'fragments of ledgers' (see link). As mentioned elsewhere in this discussion, one downside of this approach is that BK suffers from data fragmentation. (Remember Windows 95 disk defragmentation? Quite similar.)
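A toy sketch of that placement difference (illustrative only -- this is not either system's real placement logic, and the `broker-*`/`bookie-*` names and the ensemble size are made up):

```python
def kafka_placement(segments, brokers, partition):
    # Kafka: every log segment of a topic-partition lives on the same
    # broker (the partition's leader), so reads stay contiguous on one node.
    leader = brokers[partition % len(brokers)]
    return {seg: leader for seg in segments}

def bookkeeper_placement(segments, bookies, ensemble_size=2):
    # BookKeeper-style: each fragment gets its own (here, rotating) ensemble
    # of bookies, so one partition's data is spread across many nodes.
    placement = {}
    for i, seg in enumerate(segments):
        ensemble = [bookies[(i + j) % len(bookies)] for j in range(ensemble_size)]
        placement[seg] = ensemble
    return placement

segments = [f"seg-{n}" for n in range(6)]
k = kafka_placement(segments, ["broker-0", "broker-1", "broker-2"], partition=0)
b = bookkeeper_placement(segments, ["bookie-0", "bookie-1", "bookie-2", "bookie-3"])

print(set(k.values()))  # a single broker holds every segment
print(b)                # fragments rotate across the bookies
```

Running it shows the trade-off in miniature: the Kafka-style layout keeps all six segments on one node (contiguous, fast sequential reads), while the BK-style layout scatters them (easy to absorb new nodes, but reads jump around).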
No approach is universally better than the other one. As often, design decisions were made to achieve different trade-offs.
Let's say we have a cluster with a sort of "global" topic called "incoming-events" and it has 10 partitions.
We'll most likely end up with a "hot" write partition eventually, because I rarely see a perfectly even distribution in event streams.
I'd like to seamlessly add capacity to remove this hot spot.
With Pulsar you're using BK, which means just spinning up more bookies; they will take on new segments, and rebalancing will move some existing segments off.
With Kafka I don't know what the option is aside from spinning up a larger and larger broker and rebalancing onto the bigger box. What I typically see in companies is that they repartition from 10 to 20 partitions to avoid expensive one-off boxes.
Nobody likes non-uniform resources, because they are a nightmare to manage. Imagine a k8s deployment where you have replicas: 10 but have to handle a custom edge case for resource allocation on one pod that differs from the other 9 brokers.
(Just assumed 10 partitions to 10 pods)
If one Kafka broker or BookKeeper bookie node cannot keep up with the write load (e.g. network or disk too slow, CPU util too high), you must add more partitions. For Kafka, for the reasons you already mentioned. For BookKeeper (and Pulsar), because only a single ledger of a topic-partition is open for writes at any given time.
You can add as many as you want, but they take no traffic.
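A back-of-the-envelope sketch of that cap. The per-leader number is invented for illustration; the point is only the shape of the formula: one leader broker (Kafka) or one open ledger (BK/Pulsar) per partition means write capacity scales with partitions, not with idle nodes.

```python
PER_LEADER_CAP = 50_000  # msgs/sec one leader can absorb (assumed, not measured)

def max_write_throughput(num_partitions, num_brokers):
    # At most one active write leader per partition; brokers beyond that
    # sit idle for writes to this topic.
    active_leaders = min(num_partitions, num_brokers)
    return active_leaders * PER_LEADER_CAP

print(max_write_throughput(1, 3))    # extra brokers take no traffic
print(max_write_throughput(10, 3))   # capped by broker count
print(max_write_throughput(10, 10))  # more partitions = more capacity
```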
Think: repartitioning changes the logical layout of the data, which can impact app semantics depending on your application; whereas data (re)balancing just shuffles around stored bytes "as is" behind the scenes, without changing the data itself. The confusion stems probably from the two words sounding very similar.
For Kafka, you use tools like Confluent's Auto Data Balancer (https://docs.confluent.io/current/kafka/rebalancer/index.htm...) or LinkedIn's Cruise Control (https://github.com/linkedin/cruise-control) that automatically rebalance the data in your Kafka cluster in the background. Pulsar has its own toolset to achieve the same.
> The data of a given topic is spread across multiple Bookies. The topic has been split into Ledgers and the Ledgers into Fragments and with striping, into calculatable subsets of fragment ensembles. When you need to grow your cluster, just add more Bookies and they’ll start getting written to when new fragments are created. No more Kafka-style rebalancing required. However, reads and writes now have to jump around a bit between Bookies.
When reads need to go to BookKeeper there are caches there too, with read-aheads to populate the cache to avoid going back to disk regularly.
Even when having to go to disk, there are further optimizations in how data is laid out on disk to ensure as much sequential reading as possible.
Also note that the fragments aren't necessarily that small either.
Full disclaimer: I worked on building a client library for it while it was in its early stages, and from my limited time with it, it was super easy to operate, very light on resource usage and stupid-fast.
One of my pet peeves with Kafka were CLI tools written in JVM languages. Slow JVM startup time was killing my focus.
I would argue that building up on existing systems (especially things like a distributed consensus system like Zookeeper) is generally good practice. Being able to just point a bunch of self-contained JARs/WARs at a Zookeeper/etcd cluster and not worry about startup sequencing or cluster bootstrapping makes me happy.
KIP-500 is the proposal/PR (?) for ZK-free Kafka which I understand has been merged in and work is now progressing on follow up tasks.
These startup benchmarks are typically not run with the JVM and application jars being pulled from NFS or AFS on a machine with a lot of dirty pages in the page cache. Some small sh/bash app using curl or wget to hit a REST API is going to have a lot less network and disk I/O in that case, even if bash and curl are being pulled from NFS/AFS. Many companies run JVMs from NFS/AFS to simplify deployment and management.
Also, if the OP is logged into a server running the JVM-based CLI app, they may be running it on a server JVM and getting much more eager JITting.
Maybe it's not fair to the JVM that there are lots of circumstances where things are accidentally tuned poorly for the JVM. On the other hand, there is a lot to be said for tiny apps that are pretty resilient to poor conditions. Sometimes size still matters, even on big servers with plenty of RAM and cores.
I’d argue that being able to herd your JVM procs like cattle makes them good candidates for k8s because you can always just set resource limits so they get purged when the heap becomes too large.
I run a prod Pulsar cluster using helm charts. All containers & kubernetes, zero issue.
NATS itself has never been a stable product; it has been completely rewritten several times. And if you want the same level of functionality, it's not on par, or you have to use yet another rewrite of it that's in deep alpha.
Just try to do a basic negative acknowledgement on NATS :) it's not there.
What I like about NATS is their dev-focused approach. The clients shine.
Pulsar, on the other hand, doesn't have decent Node.js support.
It is amazing how quickly it went from "what is nats" to "that feature that required a message broker is complete".
Good job nats team.
Will check out NATS.
When I see something like NATS - A few thousand lines of code and a 40MB image, that makes me happy. It's a lightweight thing that "just works."
That's why I stay away from anything JVM. And I acknowledge that I am totally biased and may be wrong in some cases, but that has been my experience.
But it is not a "distributed log". Pulsar (like Kafka) is built on a distributed ledger/log. People confuse the 'semantics' of messaging with a "message broker", and so equate various products that support 'messaging semantics'.
To your point, and to doubters, simply try building an event sourcing system (complete with replays to recover) on RabbitMQ and see how that works out!
(Another clarification - Technically Pulsar client is a thick client. It allows both client and broker to co-manage the flow control. It is more than a push based queue. It does data streaming.)
Most of the time, it just works.
Here is a performance test program I wrote to see what I could squeeze out of it:
Running with official helm charts in AWS managed kubernetes and using EBS.
Currently passing about 20k/s in and 20k/s out.
Got any specific questions just let me know.
Just as in RabbitMQ, Apache Kafka, and many other distributed systems, writes go through an elected leader, which can then ensure ordering guarantees.
Specifically with Apache Pulsar, each topic has an owner broker (leader) who accepts writes and serves readers.
It should be noted that Apache Pulsar supports shared subscriptions which allow two or more consumers to compete for the same messages, like having two consumers on a RabbitMQ queue. Here FIFO order cannot be guaranteed for all kinds of reasons.
Basically, they have the same ordering guarantee as Kafka: FIFO is guaranteed for messages from the same producer, within the same partition. If you need producer-level FIFO, then you can only use 1 partition.
There's no ordering guarantee for messages coming from different producers.
In a distributed system, an ordering guarantee by producer (and optionally partition "key" which in the end is like an extension of the topic key) is pretty much all you need and all you can get.
If you have two producers, then how would one decide which one sent a message first? Go by some timestamp? Clocks are unreliable for these purposes, so it comes down to consensus. And letting the queue decide which one came first is equivalent. Once a message is acknowledged by the queue, its ordering cannot change anymore.
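A toy model of that guarantee: two producers write to the same partition, the "broker" (here just a list) fixes one total order at acknowledgement time, each producer's own messages stay in send order, but the interleaving between producers depends on arrival timing. This is a simulation, not any real client API:

```python
import random

def broker_merge(producer_a, producer_b, seed=0):
    # Simulate nondeterministic arrival: randomly interleave two streams
    # while preserving each producer's internal send order.
    rng = random.Random(seed)
    a, b, log = list(producer_a), list(producer_b), []
    while a or b:
        src = a if (a and (not b or rng.random() < 0.5)) else b
        log.append(src.pop(0))
    return log

a = [("A", i) for i in range(5)]
b = [("B", i) for i in range(5)]
log = broker_merge(a, b)

# Per-producer order survives in the partition log...
assert [m for m in log if m[0] == "A"] == [("A", i) for i in range(5)]
assert [m for m in log if m[0] == "B"] == [("B", i) for i in range(5)]
# ...but a different seed (different arrival timing) gives a different interleaving.
print(log)
```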
It has a lot of promise, but adopting it right now is going to require you spending a lot of time reading source code when you, like I did, find the docs are out of date, or have giant holes.
Things I liked:
Tiered storage - the ability to offload old data to S3 / Azure / HDFS, and then have it reloaded transparently for a consumer, is awesome. (Albeit with some delay, but their reasoning is that historical reads don't need millisecond latency.)
Inter-cluster replication is a lot easier than running up a Kafka Connect cluster to run MirrorMaker 2, but there's a slight caveat that it doesn't work for chained clusters (A -> B -> C): a message from A won't be replicated to C by B. Pulsar assumes A is going to replicate to both of them if they need it.
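A toy model of that caveat (my reading of the behavior, not Pulsar's actual replication code): a cluster replicates its own messages to its configured peers, but does not forward messages it merely received via replication.

```python
def delivered_to(origin, peers):
    # One-hop delivery: the origin's message reaches its direct peers only;
    # peers do not re-forward replicated messages.
    seen = {origin}
    for peer in peers.get(origin, []):
        seen.add(peer)
    return seen

chained = {"A": ["B"], "B": ["C"]}  # A -> B -> C
print(delivered_to("A", chained))   # C never sees A's message

fanout = {"A": ["B", "C"]}          # A replicates to both directly
print(delivered_to("A", fanout))    # everyone gets it
```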
Schema registry baked in is nice. Pulsar's replication automatically replicating the schema to other clusters is verrrrry nice for consumers on a different cluster. It's doable with Confluent's Schema Registry, but it's another thing to manage.
Load balancing brokers - haven't seen it in action, but they attempt to redistribute partition ownership based on load.
The only downsides were that the docs were sparse or out of date in places, especially the BookKeeper ones. There was some configuration fun, odd errors from the BK command-line tools (which could never tell me which bookie was the auditor), and two scripts that do very similar things, but not quite the same. That said, having an auditor process that automatically checks for under-replicated segments is nice, once it's working.
There's several tools for Kafka that do similar, but nice to have it out of the box.
Can't really comment on its capacities to act as a message queue, but it has that too.
But yeah, I reckon I'll look again in a year.
On an interesting note, Pulsar's design is very similar to Twitter's pub/sub system: near-stateless, easily scalable brokers as one layer, with BK as the storage layer underneath. They moved to Kafka, but then, they have Twitter-sized problems, so YMMV. https://blog.twitter.com/engineering/en_us/topics/insights/2...