Apache Pulsar is an open-source distributed pub-sub messaging system (apache.org)
587 points by LinuxBender on Jan 2, 2020 | 237 comments

I just finished rolling out Pulsar to 8 AWS regions with geo-replication. Message rates are currently about 50k msgs/sec, and we are still in the process of migrating many more applications. We run on top of Kubernetes (EKS).

It took about 5 months for our implementation, with a chunk of that work spent figuring out how to integrate our internal auth, as well as using HashiCorp Vault as a clean, automated way to get auth tokens for an AWS IAM role.

Overall, we are very pleased and the rest of the engineering org is very excited about it and planning to migrate most of our SQS and Kinesis apps.

Ask me anything in the thread and I'll try to answer questions. At some point we will do a blog post on our experience.

Alrighty, a few questions:

- what k8s definitions do you use, e.g. do you use the official Helm Chart, or have you written your .yaml's from scratch?

- have you practiced disaster recovery scenarios in the context of k8s? Can you describe them briefly?

- how do you upgrade/redeploy the Pulsar k8s components, i.e. does this cause the Bookies to trigger a cluster rebalance, or does it trigger the Autorecovery?

- for the Bookies, do you use AWS EBS volumes with the EKS or just local instance storage (that is, if you use persistent topics)

- do you use the Proxy pod's EKS k8s pod IPs as exposed on the AWS network, or do you use a NodePort type of service for the Proxy components (using the EKS node IPs)

- have you been bitten by the recent EKS k8s network plugin bug (loss of pod connectivity), and/or how do you maintain your EKS cluster

- do you run your EKS nodes in a multi-AZ setting?

For the k8s definitions, we started with the Helm chart, rendered the template, and then moved it into kustomize, as that is our tool of choice ATM. IDK if I would recommend that approach for everyone (we expect we might move to Helm v3 at some point), but it was a good choice for us.

We have practiced some disaster recovery, but it isn't 100% exhaustive (is it ever?). It is also aided by how Pulsar is designed. We have killed bookie nodes as well as lost all our state in ZooKeeper. The first is pretty easily handled by the replication factor of BookKeeper data; for ZooKeeper we do an extra backup step and just dump the state to S3 so we can restore it. What we haven't tested in practice, but know how to do in theory, is restoring a k8s stateful set from EBS volume snapshots. However, we see that as a real edge case. In Pulsar, we offload our data to S3 after a few hours, so we only need to worry about potentially losing a few hours of data in BK, as the ZooKeeper state is very easy to snapshot and restore from S3. In other words, we are still working on getting more and more confident with the data, and we don't yet recommend teams use it for mission-critical non-recoverable data, but there are a ton of use cases for it now and we can continue to improve on the DR front.

We have done multiple upgrades, and we deploy all the time. Because BookKeeper nodes are in a stateful set and we don't do automated rollouts, we have a manual process to replace the BK nodes. However, they don't trigger a rebalance, as each bookie closes gracefully and then re-attaches the EBS volume from the stateful set.

We use EBS volumes: provisioned-IOPS (PIOPS) volumes for the journal and a larger, slower volume for the ledger store. This is one of the great parts of the BookKeeper design: the two disk pools are separate, so we just need a small chunk of really fast storage, and the journaled data is copied over to the ledger volume by a background process. We figure for really high write throughput we could use instance storage for the journal volume and EBS for the ledger, though that would have some complications on recovery; still easier than having to rebuild the whole ledger data.
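The journal/ledger split described above can be sketched in a few lines. This is a toy model (my own illustration, not BookKeeper code): writes are acknowledged once they hit the small fast journal, and a background step later moves entries to the larger, slower ledger store.

```python
# Toy sketch of BookKeeper's two-tier write path (not real BookKeeper code).

class ToyBookie:
    def __init__(self):
        self.journal = []   # imagine: small PIOPS EBS volume, fsync per write
        self.ledger = {}    # imagine: large, cheaper EBS volume, written lazily

    def add_entry(self, ledger_id, entry):
        # Ack as soon as the journal write is durable: low write latency.
        self.journal.append((ledger_id, entry))
        return "acked"

    def flush(self):
        # Background process: copy journaled entries to the ledger store,
        # after which the journal space can be reclaimed.
        for ledger_id, entry in self.journal:
            self.ledger.setdefault(ledger_id, []).append(entry)
        self.journal.clear()

b = ToyBookie()
b.add_entry(1, "m1")
b.add_entry(1, "m2")
b.flush()
print(b.ledger[1])  # ['m1', 'm2']
```

The design point is that the journal only needs to be big enough to cover the flush lag, which is why a small chunk of very fast storage suffices.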

We use the pulsar proxy and expose it via a k8s service with the AWS specific NLB annotations.

We haven't had any issues with the k8s plugin and haven't really had any issues with EKS version upgrades. We just add new nodes when we migrate the kubelets

Yes, we have automation (via terraform) to allow us to add many different pools of compute and we use labels and taints to get specific apps mapped to specific pools of compute. For Pulsar, we run all the components multi-AZ

Oh forgot one aspect about DR: for critical data, we can easily turn on geo-replication (with a single API call) and have that data now in another region purely for DR purposes (or for cross region use cases)

Do you have a link to the eks networking bug?

AWS was sending "ACTION REQUIRED" email to all EKS users about this. The Github issue is: https://github.com/aws/amazon-vpc-cni-k8s/issues/641

On behalf of everyone here, thanks a lot for answering every single question being asked. Highly appreciate it.

I have questions myself:

1. Did it reduce (TCO) costs or increase it versus using Kinesis and SQS/SNS?

1a. Interestingly, there's no global-replication with those AWS services. Why did you require global-replication with the move to Apache Pulsar?

2. Since you mention internal auth: Weren't Cognito / KMS / Secrets Manager up to the job? Given these are integrated out-of-the-box with EC2?

3. Was it ever under-consideration to roll out pub/sub on top of Aurora for Postgres with Global Replication? https://layerci.com/blog/postgres-is-the-answer/

Thanks again.

1. On a short time horizon, not as sure; back of the napkin, it took ~12 dev-months (5 months with 2.5 people on it, on average). However, our cost per 1,000 msgs/sec is much lower (like 1/4 the cost of Kinesis), so we fully expect that investment to pay off over time, assuming adoption by the rest of the org continues and we don't find a ton of issues.

1a. You are correct we didn't require geo-replication for existing use cases, however, initially, we saw geo-replication as an easy way to improve DR and we have an internal requirement for a DR zone in another region. Now that we have done the work, we are starting to see multiple places where we can simplify some things with geo-replication, so we think long term the feature will be really valuable

2. We split up auth into two main components: auth of users (where we use Okta) and auth of services. For Okta, we just wrote a small webapp that users can log into via Okta to generate credentials. For apps/services, we already had HashiCorp Vault in place and wanted to just piggyback off our existing form of identity (IAM roles). Essentially, a user associates an IAM role with a Pulsar role, and we generate and drop off credentials into a per-role unique shared location in Vault that the IAM role can access (across multiple AWS accounts).

3. Once again, geo-replication wasn't really a hard requirement initially, but more of something that we really like now that we have it. I think the biggest reason why not Postgres is that we have combined message rates (not everything is migrated yet) on the order of 300k msgs/sec across a few dozen services. Pulsar is designed to scale horizontally and also has really great organizational primitives, as well as an ecosystem of tools. While I think you could maybe figure that out with some PG solution, having something purpose-built really pays big dividends when you are trying to make a solution that can easily integrate into a complex ecosystem of many teams and many different apps/use cases.


One more:

For replication across regions, do you peer VPCs via Transit Gateways or some such, or do it over the public Internet? I ask because a lot of folks complain about exorbitant AWS bandwidth charges for cross-AZ and cross-region communication (esp over the Internet versus over AWS' backbone): At 300k msgs/sec, the bandwidth costs might add up quickly?

Consequently, maintaining a multi-region, multi-AZ VPC peering might have been complicated without Transit Gateway, so I'm curious how the network side of things held up for you.

In this case, we just use straight VPC peering with a full mesh of all our regions. We may eventually migrate to building on our VPN-based mesh (we do that in other places).

Bandwidth is certainly a concern, and one of the nice bits about Pulsar is that not everything is replicated. You mark a namespace for replication by adding the additional clusters it should replicate to. We don't expect to replicate everything, just the things teams care about.

When we did this, Transit Gateway was limited to a single region. At re:Invent they announced cross-region Transit Gateway, which we will look at moving to as well, but for now it is just a full mesh of VPC peers, which for 8 regions isn't bad... but certainly gets worse with each new region we add.

For exposing the service into other VPCs in the same region, we use PrivateLink endpoints to avoid needing to do even more peering.

Really interested why you chose Pulsar over RabbitMQ and others?

I have used (and deployed) rabbitmq in the past and really love it for pub/sub, but for our needs, we keep needing retention, particularly long retention that we process with Flink for computing views. Having one system to do both is great for us.

Sorry, I'd missed that Pulsar was a streaming log system (like kafka), as well as a pub/sub system. HN title misled me :)

Isn't using Kubernetes kind of an anti-pattern due to failover and rebalancing logic clashing? If Kubernetes is killing and re-starting nodes and the cluster's brokers are detecting dead brokers and rebalancing partitions as a result, it seems counterproductive.

This is one of the main benefits of Pulsar: because state is split between brokers and BookKeeper, and BookKeeper doesn't need to be rebalanced (due to its segment-based architecture, where you choose new bookies with each new segment), we really don't have to worry about rebalancing storage (in general, not just in the case of failover). It is true that topics map to a single broker, but generally Pulsar has really good limits on memory, so we don't see nodes getting killed by limits, and we only really see rescheduling for real issues.

While there certainly are some aspects you need to be aware of, generally Pulsar is much more "cloud native" and maps quite well to k8s primitives.
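The segment-placement point above can be illustrated with a toy model (my own sketch of the idea, not Pulsar/BookKeeper internals): because every new segment picks its own ensemble of bookies, a freshly added bookie starts absorbing new writes immediately, and already-written segments never have to move.

```python
import random

# Toy illustration: segment-based placement means no data rebalancing.
def place_segments(bookies, num_segments, ensemble_size=3, seed=0):
    """Pick a fresh ensemble of bookies for each new segment."""
    rng = random.Random(seed)
    return [rng.sample(bookies, ensemble_size) for _ in range(num_segments)]

# Segments written while the cluster had 3 bookies:
old = place_segments(["b1", "b2", "b3"], 4)

# Add bookie b4: existing placements are untouched; only *new* segments
# consider the expanded set. Contrast with partition-based systems, where
# adding a node means physically moving existing partition data.
new = place_segments(["b1", "b2", "b3", "b4"], 2, seed=1)
print(len(old), "old segments unchanged;", len(new), "new segments placed")
```

The key property is that scaling out is a metadata-only decision applied to future segments, which is why broker/bookie replacement doesn't trigger a rebalance.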

It's a good pattern because it regularly forces you to deal with pods/nodes going away so that the system is designed to handle this well without human intervention. There is no system where nodes don't go away because of hardware errors/updates/decommissions, so you might as well establish it as unexciting routine from the start.

To the degree that that conflict exists in their implementation I would think that it's possible to account for all of that.

We are developing a social app with features such as messaging, notifications etc. We decided to use Yedis [0] (Yugabyte Redis) which is a distributed Redis with persistence backed by RocksDB. Yugabyte supports multiple datacentre distribution. Yedis's pub/sub is distributed as well. We are already running a Yugabyte cluster for data storage in Cassandra. So we didn't have to do anything extra to get our distributed pub/sub up and running. Would you recommend using Pulsar instead?

0: https://docs.yugabyte.com/latest/yedis/

Redis pub/sub is simplistic and ephemeral. There's no persistence or really anything beyond sending some data to any listeners who may be active at that moment.

Redis Streams now offers a capable persistent message stream solution and if/when Yugabyte supports that then it might be a good competitor. Otherwise Kafka/Pulsar offer much for persistent streams.
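The ephemerality point above can be shown in a few lines. This is a toy model of fire-and-forget pub/sub semantics, not the Redis protocol:

```python
# Toy model of plain (Redis-style) pub/sub: delivery only to subscribers
# connected at publish time, with no persistence of any kind.

class EphemeralPubSub:
    def __init__(self):
        self.subscribers = {}  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self.subscribers.setdefault(channel, []).append(callback)

    def publish(self, channel, message):
        # No storage: if nobody is listening, the message is simply gone.
        for cb in self.subscribers.get(channel, []):
            cb(message)

bus = EphemeralPubSub()
bus.publish("chat", "lost forever")   # no subscribers yet, so this is dropped

received = []
bus.subscribe("chat", received.append)
bus.publish("chat", "hello")
print(received)  # ['hello'] -- the earlier message was never retained
```

A log-based system like Pulsar or Kafka instead appends every message to durable storage, so a late subscriber can replay from an earlier position.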

There's also KeyDB which is a forked Redis that comes with multithreading and persistence to support larger-than-RAM workloads and active/active failover. https://github.com/JohnSully/KeyDB

I am not familiar enough with either Yedis or your use case to make a recommendation, but I can say that Pulsar has a great set of features, particularly if you need long-term retention, that make it very attractive. I've also been impressed with the community and development pace.

Given that the project is a top-level Apache project, has adoption by quite a few different companies, and has a number of corporate sponsors, the future of the project is a pretty safe bet.

Normally I don't plug my own work, but this is super related. Did you ever check out Stream? https://getstream.io/ We power chat and feeds for >500 million end users. The tech is Go, RocksDB & Raft based.

Yes, we did indeed consider Stream, but figured we could save some money by deploying and running our own system. We are very hopeful to get a couple million users shortly after our launch, and that would have run up our costs with Stream quickly.

That's nice to hear. Best of luck with your project.

We've had some really large companies move to Stream from their in-house tech and save 30-70% comparing Stream's monthly fees vs their in-house hosting. (the difference gets much larger if you add the engineering & maintenance cost of their in-house systems). If your team is in the USA & funded it's pretty difficult to build in-house with a good ROI.

Not questioning your judgement but interested to know about the factors moving you away from Kinesis.

Biggest pain points with Kinesis:

- ordering is really hard: you don't get guaranteed ordering unless you write one message at a time or take on a lot of complexity on writes (see https://brandur.org/kinesis-order), and the shards are simply too small for many of our ordered use cases

- cost, we just don't send some data right now because it would just be too much relative to the utility of the data (we would need like 250 shards)

- retention, long term, we want to store data in Pulsar with up to unlimited retention so we can rebuild views. There is still some complexity there (like getting parallel access to segments in a topic for batch processing) but it is much further along than any other options

- client APIs for consumers. We are a polyglot shop, and really the only languages where consuming Kinesis isn't terrible are Java and other JVM languages. For every other language we use Lambda, and while Lambda is great, it is still a distinct deploy and management process from the rest of the app. Being able to deploy a simple consumer just as part of the app is really nice
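The shard sizing behind the cost point above can be sketched with back-of-napkin math. The per-shard ingest limits (1,000 records/sec or 1 MiB/sec) are Kinesis's documented quotas; the ~$0.015/shard-hour rate is an assumption based on historical us-east-1 pricing, and PUT payload units would cost extra on top:

```python
import math

# Back-of-napkin Kinesis shard math (shard-hour price is an assumption).
def shards_needed(records_per_sec, bytes_per_sec):
    """Shards required, limited by 1,000 records/sec OR 1 MiB/sec per shard."""
    return max(math.ceil(records_per_sec / 1000),
               math.ceil(bytes_per_sec / (1024 * 1024)))

def monthly_shard_cost(shards, price_per_shard_hour=0.015, hours=730):
    return shards * price_per_shard_hour * hours

# e.g. 250k small msgs/sec hits the record-rate limit at 250 shards
s = shards_needed(250_000, 50 * 1024 * 1024)
print(s, f"shards, ~${monthly_shard_cost(s):,.0f}/month in shard-hours alone")
```

This is also why ordered use cases hurt: per-key ordering confines each key to one shard, so a hot key is capped at a single shard's tiny throughput no matter how many shards you pay for.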

Is there any reason you've decided not to use Dynamo to manage your materialized views?

Do you need to be able to generate a materialized view for a specific time window?

It feels weird to me to use your pub sub system to handle your persistent storage for views, but I am definitely missing context into pulsar and your use case

Pulsar wouldn't be the thing that stores the view, it would be the canonical data store used to rebuild the views.

The idea is that you have an event-sourced view and can rebuild it at any time by resetting the stream pointer to the beginning.
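The rebuild-by-replay idea can be shown concretely. This is a generic event-sourcing sketch (account balances are just an example domain, not from the thread): the retained log is the canonical store, and the view is a pure fold over it.

```python
# Event sourcing in miniature: a materialized view is a fold over the log.
def rebuild_view(events):
    """Replay an event log from the beginning into a view (here: balances)."""
    view = {}
    for e in events:
        view[e["account"]] = view.get(e["account"], 0) + e["amount"]
    return view

log = [
    {"account": "a", "amount": 100},
    {"account": "b", "amount": 50},
    {"account": "a", "amount": -30},
]

# Lose or change the view? Reset the cursor to offset 0 and replay.
print(rebuild_view(log))  # {'a': 70, 'b': 50}
```

With unlimited retention (e.g. via tiered storage), the log never has to be truncated, so any derived view remains rebuildable indefinitely.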

Makes sense, so in this case realistically it would be replacing the data lake (likely S3) which currently in an AWS event driven architecture requires a lambda function to push to S3 and is vulnerable to network issues.

Seems like a cool use case!

Thanks for the write up. We are moving away from Kafka to Kinesis, and will consider these points mentioned.

Kinesis is very expensive in the long run. There's almost always an intersection point on AWS where you need to consider moving away from AWS services/"managed services" and bring it in house.

Why did you choose Pulsar?

The main driver for Pulsar is that we have a number of different messaging use cases, some more "pub/sub" like and some that are more "log" like. Pulsar really does unify those two worlds while also being a ton more flexible than any hosted options.

For example, Kinesis is really limiting, with its short retention, and it makes doing any real ordering at scale very difficult due to the really tiny size of each shard.

Similarly, SQS does pub/sub well, but we keep finding that we need to use the data beyond the first delivery. Instead of having multiple systems where we store that data, we have one.

As for why we didn't go with Kafka, the biggest single reason is that Pulsar is easier operationally: there is no need to rebalance, and the awesome tiered-storage feature (offloading) lets us actually have topics with unlimited retention. Perhaps more important for adoption, though, is that pub/sub is much easier with Pulsar, and the API is just much easier for developers to reason about than all the complexity of consumer groups, etc. There are a ton of other nice things: topics are cheap enough that we can have hundreds of thousands of them, plus all of the built-in multi-tenancy features, geo-replication, the flexible ACL system, Pulsar Functions and Pulsar IO, and many other things that really have us excited about all the capabilities.
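The subscription-centric API mentioned above can be modeled in a few lines. Exclusive and shared are two of Pulsar's real subscription types (there are others, e.g. failover and key_shared), but the dispatch logic here is my own simplified toy, not Pulsar code:

```python
import itertools

# Toy dispatch model: the delivery mode is a property of the subscription,
# with no consumer-group/partition-assignment bookkeeping on the client.
def dispatch(messages, consumers, mode):
    out = {c: [] for c in consumers}
    if mode == "exclusive":
        # One consumer gets every message, strictly in order.
        for m in messages:
            out[consumers[0]].append(m)
    elif mode == "shared":
        # Messages are spread round-robin across consumers (queue semantics).
        rr = itertools.cycle(consumers)
        for m in messages:
            out[next(rr)].append(m)
    return out

msgs = ["m1", "m2", "m3", "m4"]
print(dispatch(msgs, ["c1"], "exclusive"))     # log-style: c1 sees everything
print(dispatch(msgs, ["c1", "c2"], "shared"))  # queue-style: work is spread
```

The appeal is that one topic can serve both "log" consumers and "queue" consumers at once, each via its own subscription, which is the unification the parent comment describes.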

> able to have topics be so cheap

For GDPR, a lot of us have to do exportable 'user activity'. Can you, in theory, have a topic per user (we had like 50 million users) and publish any user activity to that topic?

The Pulsar docs indicate "millions" of topics, but IDK what 50 million would look like; from what I know, I would be a bit nervous about it :)

It might be worth chatting with Pulsar devs on their slack community (https://apache-pulsar.herokuapp.com/).

Most commonly, what I hear people doing for this is one of two approaches (or a combination of both):

- encrypt the user data and delete the key; eventually the user data will effectively be removed

- regularly compact the topic (Pulsar has a compaction feature) and write in a tombstone record, which will remove any user data after compaction
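The compaction-plus-tombstone approach can be sketched as follows. This is a toy model of log compaction semantics (keep only the latest value per key; a null payload deletes the key), not Pulsar's implementation:

```python
# Toy log compaction: latest value per key wins; None acts as a tombstone.
def compact(topic):
    latest = {}
    for key, value in topic:           # replay in order; later entries win
        latest[key] = value
    # Drop tombstoned keys: this is how per-user data gets erased.
    return {k: v for k, v in latest.items() if v is not None}

topic = [
    ("user-1", {"clicked": "a"}),
    ("user-2", {"clicked": "b"}),
    ("user-1", {"clicked": "c"}),
    ("user-2", None),                  # deletion request -> tombstone record
]
print(compact(topic))  # {'user-1': {'clicked': 'c'}}
```

After the compaction pass, no trace of user-2 remains in the compacted view, which is what makes this pattern workable for GDPR-style erasure.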

Thanks for answering so many questions on this! One more from me: Did you consider Google Cloud PubSub [1]? In general I'd be interested in your rationale for moving from a managed solution to something you maintain yourself, because in my experience the down the line costs of maintaining your own solution are often underestimated.

[1] https://cloud.google.com/pubsub/docs/overview

That's amazing, thank you for sharing.

I understand why you chose Pulsar over RabbitMQ, but wouldn't have Kafka been a good choice as well?

There are numerous places in the discussion where reasons for not choosing Kafka are elaborated.

Was NATS a consideration for your use cases? At work, we are currently standardizing on NATS as our messaging system, and I would like to know if there is a valid comparison.

NATS is not a replacement for Pulsar or RabbitMQ. It is a message-passing system designed to pass lots of messages live; however, if nobody is there to receive them, they are lost and gone forever. There is a streaming layer, but that is closer to Kafka and still does not provide the typical message model with an ack/nack API.

I have used NATS in several different ways, but since it can be lossy, it's never been considered as a replacement for a pub/sub message queue on my end. We used it for a chat message layer, and that worked pretty well.

As for its message-passing layer, that can be interesting, but you end up writing all the retry and failure logic anyway, so it's usually just better to use an existing message layer that handles all of that for you without all the funky abstractions.

Again, it's interesting, but nowhere near close to being a Rabbit or Pulsar replacement if reliability is a goal.

There’s also Liftbridge, which is a Kafka-like pub-sub API built on top of NATS. It is still in development but looks quite promising IMO.

Thank you for the observation.

Sounds like zeromq to me

Semi-correct observation. You could think of it as a ZeroMQ broker that supports the pub/sub pattern, yes. The difference is that it's centralized (with support for HA), adds security, gives you monitoring, and supports multi-tenancy and horizontal scaling, allowing you to connect multiple clusters. A benefit of using a broker is that service discovery is a no-op.

Also, NATS has client libraries for many languages that add request/response semantics on top of the pub/sub semantics.

I think the other reply captures most of it for core NATS, but we also looked at NATS Streaming a bit. It seems to be pretty immature (though promising) and doesn't check all the boxes around integrations with the streaming ecosystem like Pulsar does (Pulsar Functions, Pulsar IO).

I am interested to see where NATS goes but for where we are today Pulsar was a much more obvious choice.

Thank you. I should have imagined it had been asked already.

What is the SQS-based system you are migrating from?

I'm currently building a data processing system that is backed by S3 -> SQS based events, for persistent message passing.

We have a number of systems, some use SQS, some use Kinesis. Part of the draw of Pulsar is having one piece of tech that we can unify everything over and offer more baseline features, like infinite retention via storage offloading or Pulsar IO connectors that standardize common operations. We aren't really targeting one use case, instead, we looked for the system that offered a broad set of features that other developers in the company want and is operationally doable with just a few people.

What's your plan on disaster recovery? Do your workers track their own cursors, and if so, how does that work across regions?

In Pulsar, offsets are tracked by the service as part of the BookKeeper data (unless you use the reader API, which is only really needed for advanced use cases like Flink). That means we just need to do DR for BookKeeper, which I touch on in another response, but the tl;dr is that we have a 3x replication factor as well as EBS snapshots.
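The server-side cursor model can be sketched as follows. This is a toy (my own illustration, not Pulsar internals): the worker just receives and acks, while the service owns the durable cursor, so workers stay stateless and DR reduces to protecting the stored cursors alongside the message data.

```python
# Toy model of a broker-tracked subscription cursor.
class Subscription:
    def __init__(self, log):
        self.log = log
        self.cursor = 0          # persisted by the service, not the client

    def receive(self):
        """Return (position, message) at the cursor, or None if caught up."""
        if self.cursor < len(self.log):
            return self.cursor, self.log[self.cursor]
        return None

    def ack(self, position):
        # Advance the durable cursor past the acknowledged message.
        self.cursor = max(self.cursor, position + 1)

sub = Subscription(["m0", "m1", "m2"])
pos, msg = sub.receive()
sub.ack(pos)
print(sub.receive())  # (1, 'm1') -- a restarted worker resumes right here
```

Because the cursor lives with the service, a replacement worker in the same region (or a DR region with replicated state) picks up exactly where the last ack left off.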

I have no idea what "pub-sub" is used for outside of its academic definitions, and I have no idea how Hashicorp Vault works - Don't you need a secret/password in cleartext at some point, for a given service or definition?

You don't have to answer my questions, I am just shouting into the void. I'm glad it works for y'all.

Pub/sub messaging is super common.

Most IOT devices that aren’t running HTTP stacks are using MQTT.

Event based systems -> which almost every microservice is nowadays.

Or chatapps

Or for IOT

How does python stream processing work. Are the modules running in the JVM ? Jython ?

The user code is all running in a native CPython interpreter.

The first question I have is why? SQS seems like such a simple thing to keep hosted.

They said they are currently doing 50k messages a second and they aren't even done migrating everything over.

50k messages a second would cost around $50k a month for AWS SQS (math could be wrong, didn't double-check).

Plus, with sqs, you get what they have. No customizations.
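The $50k figure above checks out roughly. Assuming SQS's historical standard-queue price of about $0.40 per million requests (an assumption; FIFO queues and the receive/delete calls each message also needs would push it higher):

```python
# Back-of-napkin SQS cost check for 50k msgs/sec, counting sends only.
MSGS_PER_SEC = 50_000
SECONDS_PER_MONTH = 30 * 24 * 3600       # 2,592,000
PRICE_PER_MILLION = 0.40                 # assumed standard-queue rate

monthly_msgs = MSGS_PER_SEC * SECONDS_PER_MONTH
send_cost = monthly_msgs / 1_000_000 * PRICE_PER_MILLION
print(f"{monthly_msgs:,} msgs -> ${send_cost:,.0f}/month just for sends")
# 129,600,000,000 msgs -> $51,840/month just for sends
```

Since every delivered message also incurs at least a ReceiveMessage and a DeleteMessage request, the real bill would be a multiple of this, which is why self-hosting starts to pencil out at this scale.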

Even if you were off by an order of magnitude, that expenditure level is not justified to run a message broker service.

5k a month would be an amazing deal.

I sincerely doubt they are sustaining 50k msgs/second. Likely that's the MAXIMUM throughput.

No way they would actually hit that sustained throughput for the entire month.

Even the other justifications, about wanting to reference messages after delivery, do not, to me, justify migrating off SQS/Kinesis, especially not at the cost of 5 months of development effort.

Why is this hard to believe?

Maybe they have chatty IoT devices. Maybe a ton of sensors monitoring manufacturing plants for multiple different metrics in real-time. Maybe they just have large scale.

We do 100k msgs/sec minimum in our streaming platform. This is on the low end for adtech and other industries.

The biggest thing is that we find ourselves needing to retain this data beyond initial delivery, and also for use cases where we want to use the data more like a log and need ordering guarantees. It isn't just our current SQS use cases; it is being able to have one tech that does SQS-like stuff and Kinesis-like stuff in one place.

When we ran into a similar use case we went with writing to S3 and using S3 SNS notifications and consumers could subscribe their SQS queues to that.

Having said that, assuming Pulsar has a feature similar to Kafka's log compaction, I can definitely see the appeal!

Also, the fact that SQS FIFO doesn't integrate with this setup is super annoying.

Edit: Log compaction not key compaction

We tried to adopt this but found the documentation very lacking, plus a severe lack of quality client libraries for our language of choice (Go). The "official" one had race conditions in the code as well as "TODO"s for key pieces littered throughout. There is another from Comcast, which is abandoned. We had a serious discussion about picking up ownership of the library or writing our own, but as a small startup we didn't feel we could do it and still develop the product. I'll continue to keep an eye on Pulsar, but for now Kafka is the clear go-to IMO. It's well documented, has great SaaS offerings (Confluent), and there are tons of books and training courses for it.

We're close to releasing a new "officially supported" native Go client library: https://github.com/apache/pulsar-client-go

We provide a SaaS offering of Apache Pulsar in AWS, Azure, and GCP: https://kafkaesque.io/

Cool name. That's one of those company names where it seems like someone thought of the name first and found it so fitting they built a company around it.


I didn't find this when looking; thanks, I will take a deeper look.

> found the documentation very lacking

Really? It is one of the few open source projects that we've felt has had modern documentation. How long ago was this?

> As a small startup

You'll spend more time & money on the OpEx cost with Kafka than picking up the client library for Pulsar.

It was about 6 months ago.

I completely disagree with the opex of picking up kafka vs developing a whole client library. Please could you try and explain how you came to this conclusion?

> Please could you try and explain how you came to this conclusion?

1. Stateless brokers

With Kafka any time a broker goes down you need to be aware of the kafka broker id. Yes, this can be fixed by creating your entire infrastructure as code and keeping track of state.

This is a source of great OpEx. I've seen few people successfully automate this; Netflix is one of the few. The rest just get around it with a manual process plus tooling: a pager, Kafka tooling to spawn a replacement node with the looked-up broker id, etc.

2. Kafka MirrorMaker

Granted, I have not used v2, which recently came out (in Kafka 2.4), but dear gosh v1 was so bad that Uber wrote their own replacement from the ground up called uReplicator. The amount of time wasted on replication breaking across regions is disgusting.

3. Optimization & Scaling

Kafka bundles compute & storage. There's no way that I know of to split this (though there may be an upcoming KIP). This means you'll waste time on the ops side deciding on tradeoffs between your broker throughput and your broker disk space.

Worse yet time & money will be wasted here. I'd just rather hire more people than waste time on silly things like this. This is where I justify taking on the expense of client libs.

4. Segments vs Partitions

The major time wasters are when you end up in a situation where the cluster is utterly getting destroyed. It will happen; it isn't a question of if but when, unless the company goes belly up first and nobody cares.

It's 3 AM, the producer is getting backpressure, you get a page, and now you have to deal with adding write capacity to avoid a hot spot. Don't forget you can't just simply do a rebalance in Kafka, or you'll break the contract with every developer who has developed under the golden rule of "your partition order will always be the same".

You'll pay the cost of upgrading the entire cluster and then spend 3 days coming up with a way to rebalance without making all your devs riot against you when you break that golden contract.
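The "golden contract" breakage above comes from Kafka-style key routing: producers pick a partition as hash(key) % num_partitions, so changing the partition count silently remaps keys and reorders per-key streams. A toy demonstration (using a simple stand-in hash, not Kafka's actual murmur2):

```python
# Why adding partitions breaks per-key ordering in hash-routed systems.
def toy_hash(key):
    return sum(ord(c) for c in key)       # stand-in for Kafka's murmur2

def partition_for(key, num_partitions):
    return toy_hash(key) % num_partitions

keys = ["user-1", "user-2", "user-3", "user-4"]
before = {k: partition_for(k, 6) for k in keys}
after = {k: partition_for(k, 8) for k in keys}   # scaled 6 -> 8 partitions

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)}/{len(keys)} keys changed partition after scaling")
```

Any key that moves now has messages in two partitions, so a consumer can observe them out of order; segment-based storage sidesteps this because write capacity grows without remapping keys.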

RIP Kafka

Having spent a couple of years dealing with Kafka, I'm sorry to burst people's bubbles, but it is dead. Even Confluent doesn't have a good enough story these days to keep you from switching to Pulsar; they're going to sell you the same consulting BS: "We're more mature", "We've got better tooling", "Better support"...

Yes, of course, it has been in the open source community 5 years longer and the company has been also around longer for that time. Kafka is dead, long live Pulsar.

I think what is dead is Confluent Cloud, b/c Amazon MSK and Azure HDInsight will be close to feature parity at much less cost.

Damn, I got lazy on my reply & just hoped nobody went further, but well played on digging deeper.

5. Kafka is silly expensive

Pulsar supports message acks with subscriptions. The worst case with Pulsar is that you're storing the entire retention period.

Let's say you have a 4 day retention window, to cover an outage happening on Friday and not having to deal with it until Monday. This is pretty typical with what I see in the Kafka world for small-mid size companies who don't want to pay the 1.5x OT on call.

So, with Pulsar you're at worst storing the 4 days of data, but at best you're only storing the messages within the lag window, i.e. until all subscriptions have acknowledged them.

Now, without getting too deep into Pulsar's feature set, even that overstates the cost, because Pulsar has tiered storage as a first-class citizen. The messages could be shipped off to S3 after the four days if we wanted, or even within 1 day depending on the use case, and this is all built into Pulsar; no OpEx tooling required. Even accessing the messages from S3 through Pulsar is abstracted; there's no tooling required to pull them back in if you want.

Now, with Kafka our worst case is simply 4 days of retention data. This can get very expensive, as compute & storage are tied together; it means scaling up all the brokers (even though we don't need the throughput) for the storage increase. Now, yes, MSK basically abstracts all this from you, but you're paying for it.
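The backlog point above can be sketched simply. With ack-based subscriptions, everything below the slowest subscription's cursor is eligible for deletion or offload, so best-case storage is the lag of the slowest subscriber rather than the full retention window (a toy model, not Pulsar's accounting):

```python
# Toy reclaim rule: data below the slowest subscription cursor is reclaimable.
def reclaimable_up_to(subscription_cursors):
    """Everything before the slowest cursor can be deleted or offloaded."""
    return min(subscription_cursors.values())

# Hypothetical subscriptions on one topic (message positions are made up):
cursors = {"billing": 9_500, "analytics": 7_200, "audit": 9_900}
print(reclaimable_up_to(cursors))  # 7200 -- only analytics' lag pins storage
```

Contrast with a pure time-based retention window, where all 4 days are stored regardless of whether every consumer finished long ago.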

6. AWS Managed Service are not equal citizens to EC2 standalone

Managed services right now don't fall under the new Saving Plan: https://aws.amazon.com/blogs/aws/new-savings-plans-for-aws-c...

This costs you the 30-60% discount on your entire Kafka bill.

7. Excel Life

If I look at the numbers for what I'm doing, it would have cost ~$4M for Kafka vs ~$1M for Pulsar.

While bare-metal Kafka really does come bundled with lots of OpEx trouble, have you ever tried using an orchestrator to manage it?

The DC/OS implementation easily takes care of 1 and 2.

3 and 4 are valid points, but I think in real life these scenarios are usually related to cloud service cost optimization, and I would never recommend anyone run Kafka in a cloud for these reasons.

There's one more reason, which was not cited, but which poses a real killer for the cloud Kafka dream ATM: clouds, being prone to all kinds of network interruptions, are not well suited to running ZooKeeper ensembles with decent uptime.

Disclaimer: I have never tried or used Apache Pulsar, and just examining its documentation after spotting this thread.

> "using an orchestrator to manage it"

These can be just as fragile, and now you have to learn how to manage the orchestrator. Even Confluent's own Kubernetes operator has issues. There are just too many issues with Kafka's design that hinder easy operations.

> "I would never recommend anyone running Kafka in a cloud"

That's a major problem, considering that's where most computing is heading. At this point, running in noisy, overloaded cloud environments is a good test of the reliability and durability of a software system. Kafka fails massively here.

I recently did a talk covering a lot of what I wrote: https://www.youtube.com/watch?v=jLruEmh3ve0

> You'll spend more time & money on the OpEx cost with Kafka than picking up the client library for Pulsar.

Could you elaborate why this would be the case?

Not the OP, but I think they were exaggerating a bit. In practice, operating kafka is a major PITA, because it means you have to

(1) choose a "flavor" wrapper (confluent seems to be a popular one), because the base project isn't easy to develop against

(2) write your own wrappers of those wrappers, to keep your developers from shooting themselves in the foot with wacky defaults

(3) suffer the immense pain that is authenticating topic write/reads, if that's even possible???

(4) stand up zookeeper... and probably lose some data along the way.

(5) suffer zookeeper outages due to buggy code in kafka/zk (I've experienced lost production data due to unpredictable bugs in kafka/zk, but obviously YMMV).

Based on my naive assessment, the kafka/zookeeper ecosystem is maybe 10x as complicated as the problem it's solving, and that shows up in the OpEx. I personally doubt that Pulsar is that much better, but it might be.

These are also valid. I wrote the reply explaining some of the OpEx here: https://news.ycombinator.com/item?id=21938463

What do you mean by 1 and 2? I'm guessing you're referring to the kafka-clients API? The defaults for producer and consumer conf are quite sensible these days.

I wasn’t around to make those decisions at my company, but I imagine that the “these days” component was the cause? There are a lot of configurations, new ones appear and old ones disappear or change names, etc.

In this churny environment, where you want to keep on the latest versions (necessitated by the bugs mentioned above), you need abstractions to protect you somewhat from the churn.

Confluent also seems to have a fair amount of churn, so you need wrappers for that, that you can update all at once for your developers.

Sorry, when I say these days, I mean >= Kafka 1.0. Things like auto commit offset in 0.8 days were something like 1 minute, as opposed to 5 seconds onwards, max fetch bytes was set significantly higher etc.

My biggest problems with it were when developers who didn't really understand Kafka started setting properties that had promising names to bad values to "ensure throughput" - let's set max.poll.records to 1 to ensure we always get a record as soon as one is available!
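To make that misconfiguration concrete, here's a hedged sketch of the relevant consumer properties (values shown are the modern defaults, for illustration only, not tuning advice):

```properties
# Illustrative Kafka consumer properties, not recommendations.
# max.poll.records caps how many records a single poll() call returns.
# The default (500) is fine; setting it to 1 does NOT make records
# arrive sooner -- it only throttles throughput under load.
max.poll.records=500
# "Deliver as soon as a record is available" is already the default
# behaviour, governed by fetch.min.bytes (default 1 byte).
fetch.min.bytes=1
```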

That might be my biggest issue with Kafka - it requires a decent amount of knowledge of Kafka to use it well as a developer. I'm not sure if Pulsar removes that cognitive burden for devs or not, but I'm interested in finding out.

And yeah, the wrappers to remove that burden were written in our company too - but then proved quite limiting for the varying use cases for a Kafka client in our system. sigh

If you're a Go shop, Gazette is worth a look (https://gazette.dev).

Sidenote question :

Are we heading toward a split between apache/java/zookeeper stacks and go/etcd on the other ? I've seen an issue related to that question on pulsar, and this got me investigating the distributed KV part of the stack.

It seems by looking at some benchmark that etcd is much more performant than zookeeper, and that to some people, operating two stacks seems like an operation maintenance cost a bit too high. Is that a valid concern ?

Also, i've seen that kafka is working on removing the dependency to zookeeper, is pulsar going to take the same road ?

This sounds about right. Apart from maybe the original Apache HTTP server, most Apache projects are in Java.

Looking at the codebase of Pulsar, it looks like a typical Apache-style sprawling Java project with more than a thousand directories, many thousands of files, and more than a hundred dependencies. As a comparison, NATS, which is in Go, has a few hundred files, fewer than a hundred directories, and about a dozen or so dependencies.

NATS is an amazing project, I just wanted to take the opportunity to highlight it for those first hearing about it in this comment. It's so brilliantly simple, yet changed the way I design distributed systems. I handle almost anything in regards to the standard messaging guarantees that a Kafka-like system offers at the endpoints now. As a result, systems are much simpler, and diagnosability of bugs or edge cases are much more straightforward.

NATS is amazing but note that it makes different promises than Pulsar. NATS doesn't offer true durability (in exchange for amazing performance and great simplicity) whereas Pulsar and similar are meant to survive certain partition or failure situations and not lose data.

It's not one or the other, they're just different tools.

There is nats-streaming-server as well which offers true durability (via file or SQL store) and a streaming model very similar to Kafka and Pulsar. It can also run as a raft cluster or in fault tolerance mode. It still has very good performance and is very simple to deploy and operate (I use it for event sourcing for real time IoT data at my day job).

NATS Streaming has major scalability problems even if it's simple to deploy. It only offers high availability if you use the Raft clustering, but that was bolted onto the original project and isn't really well designed.

The team is working on an entirely new system called Jetstream to eventually replace it.

This sounds interesting, what exactly do you mean by 'endpoint' in this scenario? I looked into a few alternatives before settling for pulsar, and disregarded nats because it didn't seem to support message persistence. I didn't look into it too deeply though, maybe i should have. How do you guarantee no message is lost with NATS?

In my thinking, I think of an endpoint as something at either end of the communication channel (NATS in this case) where it is effectively terminal. Usually this is where the application logic lies. Derek Collison (creator of NATS) brings this up in many of his talks about NATS, but I think the source of his thinking might come from “End-to-End Arguments in System Design” by Saltzer, Reed, & Clark.

The core of it is this point:

"Functions placed at low levels of a system may be redundant or of little value when compared with the cost of providing them at that low level."

That is, in order to get that message redundancy, or exactly-once delivery, or message persistence, you pay a high cost, and you may be better off delegating to the endpoints.
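One way to read "delegate to the endpoints" in practice: the broker gives you at-least-once delivery, and the consuming endpoint makes redelivery harmless by deduplicating on a message ID. A toy sketch (the message shape and handler here are made up for illustration):

```python
# Toy endpoint-side deduplication for at-least-once delivery.
# The broker may redeliver; the endpoint keeps processing idempotent
# by remembering which message IDs it has already handled.

processed_ids = set()          # in production this would be persisted
results = []

def handle(msg):
    """Apply the message's effect at most once per message ID."""
    if msg["id"] in processed_ids:
        return False           # duplicate redelivery: ack and drop
    results.append(msg["body"])
    processed_ids.add(msg["id"])
    return True

# Simulated at-least-once stream: message 1 is redelivered.
stream = [
    {"id": 1, "body": "a"},
    {"id": 1, "body": "a"},    # redelivery after a lost ack
    {"id": 2, "body": "b"},
]
for m in stream:
    handle(m)

print(results)                 # duplicates filtered at the endpoint
```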

This blog provides a good overview


Here is the original paper


Thanks, much appreciated!

Did you check this?


"..Message/event persistence - NATS Streaming offers configurable message persistence: in-memory, flat files or database. The storage subsystem uses a public interface that allows contributors to develop their own custom implementations."


"At-least-once-delivery - NATS Streaming offers message acknowledgements between publisher and server (for publish operations) and between subscriber and server (to confirm message delivery). Messages are persisted by the server in memory or secondary storage (or other external storage) and will be redelivered to eligible subscribing clients as needed."

Also check out Liftbridge (https://liftbridge.io), which is a Kafka-like API on top of NATS.

Disclaimer: I'm the author and former core contributor of NATS and NATS Streaming.

I looked at Liftbridge when choosing a streaming platform for event sourcing, but the FAQ says it's not production ready. Is that still accurate?

Yes, for more on that see my reply here: https://news.ycombinator.com/item?id=21946939

No, i missed that. I think ive seen 'nats streaming', but didn't realize that it is its own distinct thing. All this makes more sense now to me, thanks!

Outbox pattern and NATS streaming

Additionally, you could always have reliable delivery over NATS using the request/response pattern using acking and retries.

Is it me or does NATS looks like it's aimed at an actor-based style of distributed system ?

It's not necessarily aiming at that.

It's more than one project in the repo.

It's an interesting observation.

I think that the modern approach to distributed systems is moving towards golang style microservices and lightweight / simple system design with RPC communication, reconcile type loops for state reconciliation, and backing CP databases. I think this is the influence of k8s (and maybe google's approach to distributed systems).

I will almost certainly get downvoted for this (as I always seem to when I criticize the JVM), but Apache/JVM style architecture feels REALLY long in the tooth to me. I think you are committing to an outdated and very expensive approach to building software if you use anything running on the JVM, especially Apache based anything. Cassandra is a great example of this - out of the box it's a terribly performing database that is extremely expensive to run and tune. Throw enough resources and time at it and you can get it to acceptable scalability - but running on the JVM which is a huge memory hog will always make it expensive to run (and even then, you will always get terrible latency distributions with the JVM's awful GC).

If I was building a business I would run far far away from any JVM based solution. The only thing it has going for it is momentum. If you need to hire 100s of engineers off the street for a large project, then a JVM based stack is about your only option unfortunately.

> the JVM's awful GC

This just makes it seem like you are trolling. JVM devs have done more to advance state of art in this area than any other language. The problem is that most JVM apps just produce too much garbage, not necessarily that the algo itself is awful.

Either way, there's no such thing as an optimal GC algorithm, just different trade-offs depending on your use case. Not everyone cares about latency.

> The problem is that most JVM apps just produce too much garbage,

This is a strange way of saying that Java the language, and most frameworks around it, force apps to generate this much garbage.

No, it's a way of saying that Java != the JVM. The JVM's GC is not "awful" precisely because so many Java frameworks are awful.

I stand by that - I really think the JVM has a bad GC, and it has cost businesses billions probably. There is no trade off here - the JVM & Java have incredibly high memory requirements due to poor design - and this leads to all kinds of issues (like the bad GC latencies). No other language has this kind of problem as badly as the JVM does, even other GC languages. And we shouldn't accept this anymore - look at how much better golang's GC is for real world usage.

I understand your architecture criticism, and think it has merit, but I'm not sure why Apache gets dragged into that. Apache Airflow is in Python. Apache Arrow is in C. CouchDB is Erlang.

There's a ton of projects Apache Foundation hosts that fit your description but it's a mistake, I think, to confuse individual projects with Apache in general. Bad enough that people confuse the license with the foundation.

Sorry, you are right. It would probably be more accurate to use Spring actually. I was thinking of Cassandra/Zookeeper specifically when I said Apache.

For what it's worth, I have a Java microservice running on 13 MB of RAM.

Without knowing what that service does, or how many users/requests it serves per second/day, I can only assume it's just an HTTP listener up on some port that returns "Hello {username}" when someone sends a GET request.

Nah, it's full Java EE. It's fully possible to run low-profile Java.


Though criticizing the JVM is fair game, this sounds too absolutist. I like and dislike some things about C++, Java, Scala, Python, Bash, SQL, etc. I bet you do too, no?

Nowadays Java based micro-services can compile to native thanks to Graal native-image and frameworks like: https://quarkus.io/.

I can't wait for projects to ditch ZooKeeper. Apache Bookkeeper, which Apache Pulsar uses for its state, already supports Etcd as a consensus store (though I believe this is still alpha? beta? quality). Pulsar is also working on supporting Etcd.

And I believe Kafka has a KIP in progress to remove its reliance on Zookeeper altogether by having the brokers communicate with each other directly.

It seems to be the trend. There will be more and more Go equivalents in the foreseeable future. There will be a split.

I generally don't use Java or Go in my applications, but Go components are usually more lightweight and easier to use if you're not majorly working with those runtimes. JVM has a big overhead, and Java applications usually make things worse if they use things like dependency injection.

It depends. We have a ton of Java apps running atop Kubernetes. All of them use zk, but every team operates its own mini zk cluster deployed on k8s. It's worked fine except for certain hard-to-debug problems that hit a few teams occasionally.

I guess my point is that k8s lets you shift the operational burden to dev teams if they need it. If you have a centralized operations team running a giant, common zk/etcd, yeah, this would be additional operational burden.

> is pulsar going to take the same road ?

Yes, it's in the works

Related Kafka KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A...

It doesn't look like it's going to be ready anytime soon though.

Just to confirm, Pulsar has on its roadmap to remove its dependency on Zookeeper. Is that correct?

That's correct, we're moving to have a pluggable metadata store and coordination service.

Note that this is a different goal from the KIP referenced above, which is to entirely remove any dependency on an external configuration service. The idea of “pluggable consensus” is explicitly rejected in this KIP.

Zookeeper sux. If you are in the Java world you can often roll something better out using Hazelcast.

I keep seeing new message queue solutions pop up over the years and it's just been my impression at least that this is one area where silicon valley really is way behind the trading industry.

Reliable pub/sub that supports message rates over 100k/sec (even up to the millions) has been available for a while now and with a great deal of efficiency (eg; the Aeron project). The incredible amount of effort to support complex partitions, extreme fault tolerance (instead of more clever recovery logic), etc. add a lot of overhead. To the point of talking about "low latency" overhead in the order of 5ms instead of microseconds or even nanoseconds as is expected in trading.

Worse, many startups try to adopt these technologies where their message rates are minuscule. To give you some context, even two beefy machines with an older message queue solution like ZeroMQ can tolerate throughput in excess of what most companies produce.

This is not to discredit the authors of Pulsar or Kafka at all... but it's just a concerning trend where easy to use horizontally scalable message queues are being deployed everywhere. Similar to how everyone was running hadoop a few years back even when the data fit in memory.

Using wild heavy-duty (or faux-heavy-duty but just as hard to manage) solutions where 2-3 colo'd servers running boring services + Cloudflare would do is so well-accepted as normal practice in Startup Land that it's not worth fighting. Just take the free résumé sugar and don't rock the boat. You won't get anywhere anyway, and on the off chance you do win all you're doing is ensuring that you, personally, are to blame for any problems that come up. Meanwhile the costs and problems of Kafka and Kubernetes and all that jazz are no-one's fault, because that's "industry standard".

[EDIT] in fact it's pretty much the norm outside startup land, too, as soon is you're involved with any kind of bigco "innovation" or greenfield-development division.

Aeron[1] is something I have been looking at as well.

I like that it is adaptable to run without an external broker (using embedded media driver option), yet can add that for scalability/redundancy.

Our use cases do not require a query on top of streams. Instead all data would go into timescaleDB [2].

[1] https://github.com/real-logic/aeron/wiki/Java-Programming-Gu...

[2] https://github.com/timescale/timescaledb

Worth noting that Kafka is not a queue, but an append-only log.

ZeroMQ is not a message queue, it's a networking library.

I'll bet he is aware. The problem with the trading industry is that they have hundreds of users with bespoke solutions catering to extreme performance criteria rather than hundreds of thousands like NATS. They will keep reinventing the wheel every time a nanosecond can be saved by the latest hardware stack.

This looks promising. Is there such a thing as a generalized SQL query engine that runs over any key-value store that provides certain minimal core operations?

For example, say you have a KV Store with basic mathematical Set operations like GET, SET, UNION, INTERSECT, EXCEPT, etc. The Engine would parse the SQL and then call the low-level KV Store Set operations, returning the result or updating KV pairs. This explains how Join relates to Set operations:


Another thing I'd like is if KV stores exposed a general purpose functional programming language (maybe a LISP or a minimal stack-based language like PostScript) for running the same SQL Set operations without ugly syntax. I don't know the exact name for this. But if we had that, then we could build our own distributed databases, similar to Firebase but with a SQL interface as well, from KV stores like Pulsar. I'm thinking something similar to RethinkDB but with a more distinct/open separation of layers.

The hard part would be around transactions and row locking. A slightly related question is if anyone has ever made a lock-free KV store with Set operations using something like atomic compare-and-swap (CAS) operations. There might be a way to leave requests "open" until the CAS has been fully committed. Not sure if this applies to ledger/log based databases since the transaction might already be deterministic as long as the servers have exact copies of the same query log.
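For the compare-and-swap idea, the core loop looks roughly like this. A toy in-memory sketch, where the `CasStore` class and its lock-based `cas` are stand-ins for whatever atomic primitive the real KV store would expose:

```python
import threading

class CasStore:
    """Toy KV store exposing an atomic compare-and-swap.
    The lock here stands in for the storage-level atomicity
    a real store would provide."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        return self._data.get(key)

    def cas(self, key, expected, new):
        """Set key to new only if it currently holds expected."""
        with self._lock:
            if self._data.get(key) != expected:
                return False   # another writer won; caller retries
            self._data[key] = new
            return True

def add_to_set(store, key, member):
    """Set-add built on CAS: read, modify, attempt, retry on conflict."""
    while True:
        expected = store.get(key)
        current = expected if expected is not None else frozenset()
        if member in current:
            return             # already present, nothing to do
        if store.cas(key, expected, current | {member}):
            return             # committed atomically
        # lost the race: someone changed the value; loop and retry

store = CasStore()
add_to_set(store, "names", "Bob")
add_to_set(store, "names", "Jane")
print(store.get("names"))
```

The request stays "open" simply because the retry loop doesn't return until its CAS succeeds, which matches the idea in the comment above.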

Edit: I wrote this thinking of something like Redis, but maybe Pulsar is only the message component and not a store. So the layering might look like: [Pulsar][KV Store (like Redis)][minimal Set operations][SQL query engine].

Spark [1], Presto [2], and Drill [3] can all do that with connectors to different data sources and varying support for advanced SQL.

Pulsar has support for Presto: https://pulsar.apache.org/docs/en/sql-overview/

Pulsar isn't a KV store though, it's a distributed log/messaging system that supports a "key" for each message that can be used when scanning or compacting a stream. GET and SET aren't individual operations but rather scans through a stream or publishing a new message.

If you just want to have a SQL interface to KV stores or messaging systems that support a message key then Apache Calcite [4] can be used as a query parser and planner. There are examples of it being used for Kafka [5].

1. https://spark.apache.org/

2. https://prestodb.io/

3. https://drill.apache.org/

4. https://calcite.apache.org/

5. https://github.com/rayokota/kareldb

Regarding the generic sql engine - it looks like this is what Apache Calcite was designed for.


One of the challenges with layering SQL on top of a KV store is query performance.

The most obvious way to model a secondary index on top of a pure KV store is to map indexed values to keys. For example, given the (rowID, name) tuples (123, "Bob"), (345, "Jane"), (234, "Zack"), you can store these as keys:
  name:Bob:123
  name:Jane:345
  name:Zack:234

At this point you don't need or even want values, so this is effectively a sorted set.

Now you can easily find the rowID of Jane by doing a key scan for "name:Jane:", which should be efficient in a KV store that supports key range scans. You can do prefix searches this way ("name:Jane" finds all keys starting with "Jane"), as well as ordinal constraints ("age > 32", which requires that the age index is encoded to something like:
  age:0000000032:345

To perform a union ("name = 'Bob' OR name = 'Jane'"), you simply do multiple range scans, performing a merge sort-ish union operation as you go. To perform an intersection ("name = 'Bob' AND age > 10"), you find the starting point for all the terms and use that as the key range, then do the merge sort.

This is what TiDB and FoundationDB's record layers do, which both have a strict separation between the stateless database layer and the stateful KV layer.

The performance bottleneck will be the network layer. Your range scan operations will be streaming a lot of data from the KV store to the SQL layer, and potentially you'll be reading a lot of data that is discarded by higher-level query layers. This is why TiKV has "co-processor" logic in the KV store that knows how to do things like filter; when TiDB plays your query, it pushes some query operators down to TiKV itself for performance.

Unfortunately, this is not possible with FoundationDB. This is why FoundationDB's authors recommend you co-locate FDB with your application on the same machine. But since FDB key ranges are distributed, there's no way to actually bring the query code close to the data (as far as I know!).

I'm sure you could do something similiar with Redis and Lua scripting, i.e. building query operators as Lua scripts that worked on sorted sets. I wouldn't trust Redis as a primary data store, but it can be a fast secondary index.

FoundationDB's client bindings have a locality API which allows you to query the client's metadata cache of which key ranges are on which storage processes. This would allow you to build that feature of routing a query to the data.

I didn't know that. Very cool, thanks!

I disagree that an index is a set rather than a map: it is by definition a map from keys to row ids.

As for a generic relational layer over K/V stores, I think it’s a superficially appealing idea that would be impossible to optimize adequately in practice. Honestly, if you want to implement a distributed relational database, I would recommend starting with Postgres as your local storage engine and pushing down as much relational logic as possible to that layer. I have worked on such a system in the past and it produced very good results for minimal development effort.

I'm not talking about just any index, I'm talking about using a plain KV store as an index.

The row ID can be encoded into the key. From my example, a basic mapping might be:

  name:Bob:123 => <empty value>
If your KV store is optimized for range scans, as they usually are, then there's no reason to store anything in the value, because a key range scan can efficiently jump to the first instance of a key prefix.

For example, if I want to search for "name = 'Bob'", then I simply start at the key "name:Bob:" and pluck the row ID from each key, scanning sequentially until I reach the end of my range.

This works great for multiple values. For example:

  name:Bob:123 => <empty value>
  name:Bob:124 => <empty value>
  name:Bob:125 => <empty value>
Finding all rows matching "Bob" is a matter of just scanning by prefix.
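A minimal sketch of that prefix scan over a sorted key space (pure Python, with `bisect` standing in for the KV store's ordered index; keys are the toy `name:<value>:<rowID>` format from the comment above):

```python
import bisect

# Sorted key space of a toy KV secondary index: "name:<value>:<rowID>"
keys = sorted([
    "name:Bob:123",
    "name:Bob:124",
    "name:Bob:125",
    "name:Jane:345",
    "name:Zack:234",
])

def scan_prefix(keys, prefix):
    """Range-scan all keys starting with prefix, plucking row IDs."""
    i = bisect.bisect_left(keys, prefix)   # jump to first candidate
    out = []
    while i < len(keys) and keys[i].startswith(prefix):
        out.append(int(keys[i].rsplit(":", 1)[1]))  # row ID suffix
        i += 1
    return out

print(scan_prefix(keys, "name:Bob:"))   # [123, 124, 125]
```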

If you store row IDs in the value, you'll risk read/write contention on the value. Let's say there are multiple rows with "Bob", you'll end up storing something like:

  name:Bob => [123, 124, 125]
To add or remove row IDs you'll now have to merge values of unrelated rows, and make sure this happens atomically. That usually means locking.

This also puts a constraint on the number of row IDs you can fit in a single value. KV stores typically co-locate value data with keys, so now you might not be able to efficiently scan a large range without also loading that data. You can do tricks like introducing an indirection, where you don't store "row IDs" but "row page IDs", where each page holds a maximum of N row IDs, allowing you to sidestep the size limit on values. But that comes with other costs.

I'm not counting in-memory stores like Redis here. Implementing in-memory indexes is a completely different ball game to something that needs to live on disk.

As for "superficially appealing idea that would be impossible to optimize adequately in practice", your assertion is demonstrably false: TiDB and CockroachDB both implement performant relational databases on top of general-purpose KV stores.

e.g. how about a complex event processing engine? Something like that will do a lot of the above, but the inference database stays manageable since old data will fall out of the windows.

Take a look at the Apache Flink CEP library, which operates over unbounded streams: https://ci.apache.org/projects/flink/flink-docs-stable/dev/l...

I usually don't associate CEP with this, but it makes sense. Are they meant to operate at this level (rather than at a higher level of abstraction)?

Which ones would you recommend looking at?

For a moment I thought Bajaj and TVS came together.

Captain here.

TVS and Bajaj are major motorbike manufacturers in India, and TVS had a model named "Apache" and Bajaj had a model named "Pulsar".

Flies away

Thank you for the explanation.

It's not obvious if you're not from India, so thank you for the explanation :)

This might be entirely off topic, but I'm having issues using RabbitMQ whereby durability suffers because messages are sent to remote hosts thus exposing them to both the network and remote host availability. On a previous platform I used an MSMQ based system which didn't have this problem since it uses a local store and forward service. So all sends are to localhost and are not affected by the network or the receiver availability. The MSMQ system was my first and only experience with messaging up to now, so I was surprised that any system would not work that way. How is this dealt with in other systems? Is it just a feature that exists or not and you just decide if it's important? And maybe just to shoe horn it to be on topic, does Pulsar use a local service?

That's an inherent issue with distributed solutions and is impossible to solve. The only way to deal with it is using various techniques like acknowledgements, retries, local storage, idempotency, etc. MSMQ handles that stuff behind the scenes but the problem itself will always exist if there's a network boundary.

These other systems are designed to be remote with a network interface. You can use the client drivers to handle acknowledgements/retries/local-buffering in your own app or use something like Logstash [1], FluentD [2], or Vector [3] for message forwarding if you want a local agent to send to. You might have to wire up several connectors since none of them forward directly to Pulsar today.

Also RabbitMQ is absolute crap. There are better options for every scenario so I advise using something else like Redis, NATS, Kafka, or Pulsar.

1. https://www.elastic.co/products/logstash

2. https://www.fluentd.org/

3. https://vector.dev/

> Also RabbitMQ is absolute crap. There are better options for every scenario so I advise using something else like Redis, NATS, Kafka, or Pulsar

Yikes, that's a bit harsh! I've been using RabbitMQ on multiple projects for several years, and I think it's a great pub/sub system. It also has a large userbase and community around it, as well as lots of plugins available.

I've never heard of using Redis for pub/sub - were you suggesting rolling your own on top of Redis? I'm not familiar with NATS (but it's been mentioned several times here, so I will definitely learn more!), but Kafka is a streaming log/event system, not a pub/sub system like RabbitMQ (different use cases).

I will say though that if something is wrong in your RabbitMQ config, the stack traces that Erlang produces when it crashes are a nightmare to decipher!

Just because it's widely used doesn't mean it's good (see PHP). It's slow, with single-threaded topics that usually max out around 25k msgs/sec; fragile, with dropped connections, stalled queues, and corrupted data; has terrible clustering that breaks often and doesn't support sharding; has a silly HiPE mode where you can choose to compile the Erlang code for more performance, which turns startup time into minutes; etc.

It's poor architecture and implementation. One of the worst products built with Erlang. It's hard to work on (for new contributors) while not really using any of the natural advantages of the language.

Redis supports pub/sub channels. It's very fast and if you already use it then it saves running another system.

Streaming log vs pub/sub is a fuzzy distinction and basically has no difference at this point. Publishers send to a topic and consumers listen. You can do the ephemeral pub/sub in Kafka by reducing retention to seconds, using a single partition per pub/sub topic, and having consumers listen from latest without a consumer group. Or use Pulsar which does both much more naturally.
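Roughly, the knobs involved in that ephemeral-pub/sub-on-Kafka setup (a sketch of standard topic/consumer properties; the retention value is arbitrary):

```properties
# Topic config: keep messages only briefly
# (create the topic with --partitions 1)
retention.ms=10000

# Consumer config: no group.id, so no consumer group and no committed
# offsets; start from the latest offset to behave like ephemeral pub/sub
auto.offset.reset=latest
enable.auto.commit=false
```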

Also I didn't mention but there are great commercial messaging products like AMPS [1] or Solace [2] if you need much more advanced features and support.

1. https://www.crankuptheamps.com/

2. https://solace.com/

I didn't know about Redis pub/sub, I'll check it out! With a quick glance, I'm not sure it supports message durability (persisting messages to storage)?

IME, any Erlang system is difficult for noobs to contribute to; Erlang's syntax alone is... frightening :)

I really haven't hit any of the issues you mentioned with rabbit. Several clusters in production for years have been rock solid, connections are stable (and when they do drop, the client libraries, of which there are a multitude, can handle reconnection), throughput is excellent, and I've never once seen data corruption (and combined we must have processed many billions of messages).

The whole HiPE debacle I can at least agree on though! (I seem to recall it's deprecated now, but I might have dreamed that)

Redis pub/sub is ephemeral. Send to a topic and any active listeners will receive.

If you need persistence then Redis v5 has a new data type called Streams which is similar to Kafka/Pulsar. Push messages to a stream and read with multiple consumers (using consumer groups) that can ack each msg to track read state.
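The read/ack cycle that Streams consumer groups give you looks roughly like this. This is a toy in-memory model, not the real Redis API; with a Redis client the corresponding commands are XADD, XREADGROUP, and XACK:

```python
class ToyStream:
    """In-memory model of the Redis Streams consumer-group cycle:
    entries are appended, a group reads them, and an entry stays
    pending until acked -- so a crashed consumer's messages remain
    claimable instead of being silently lost."""
    def __init__(self):
        self.entries = []    # (id, payload), append-only log
        self.delivered = 0   # last-delivered position for the group
        self.pending = {}    # id -> payload awaiting ack

    def xadd(self, payload):
        entry_id = len(self.entries) + 1
        self.entries.append((entry_id, payload))
        return entry_id

    def xreadgroup(self, count):
        batch = self.entries[self.delivered:self.delivered + count]
        self.delivered += len(batch)
        for eid, payload in batch:
            self.pending[eid] = payload
        return batch

    def xack(self, entry_id):
        self.pending.pop(entry_id, None)

s = ToyStream()
for p in ["temp=21", "temp=22", "temp=23"]:
    s.xadd(p)

batch = s.xreadgroup(count=2)       # group reads two entries
s.xack(batch[0][0])                 # ack only the first
print(sorted(s.pending.values()))   # ['temp=22'] is still pending
```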

You're free to have the queue and workers run on the same machine; just bind to loopback. As soon as you deal with more than one machine, which is required in HA scenarios, you're dealing with a networked (distributed) system. I might not have understood your question correctly though ...

Edit: Maybe you're looking for acks/confirms? https://www.rabbitmq.com/confirms.html

I have many machines, each of which have one or many applications that send messages. And I have one machine with an instance of Rabbit to which all messages are sent. If the network is down or the Rabbit machine is down, the messages are gone along with their data.

Clustering the Rabbit machine helps one particular failure scenario, but it's not a solution to the problem.

I haven't worked with RabbitMQ in years, but IIRC I solved this with federation. My sends were all to localhost or a VM on the same machine.

That solves the problem of lost messages if you have e.g. a network problem between servers. I don't know what to do about RabbitMQ being down; all you can really do about that is move the exact same problem somewhere else by making things much more complicated, such that it's probably not worth it. If the place you're storing outbound messages is broken, you (obviously) can't store them, whether that's PostgreSQL, Redis, a remote RabbitMQ server, a local RabbitMQ, whatever.

Look up : outbox pattern

Was going to suggest this too.

I'm building a distributed system with RabbitMQ just now, where producers may be offline due to transient networking issues. I write messages to a local SQLite database, with another thread responsible for sending them to RabbitMQ and deleting them on successful delivery.
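That local-outbox approach can be sketched in a few lines. The table name and `send` function below are made up for illustration; a real RabbitMQ publish with publisher confirms would replace `send`:

```python
import sqlite3

# Local outbox: messages are written durably first, then a forwarder
# sends them to the broker and deletes them on confirmed delivery.
db = sqlite3.connect(":memory:")   # a file path in production
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, body TEXT)")

def enqueue(body):
    # Durable local write; survives broker/network outages.
    with db:
        db.execute("INSERT INTO outbox (body) VALUES (?)", (body,))

sent = []
def send(body):
    """Stand-in for the broker publish + confirm round trip."""
    sent.append(body)
    return True                    # pretend the broker acked

def forward_once():
    """Forwarder thread body: send oldest first, delete on success."""
    rows = db.execute("SELECT id, body FROM outbox ORDER BY id").fetchall()
    for msg_id, body in rows:
        if send(body):
            with db:
                db.execute("DELETE FROM outbox WHERE id = ?", (msg_id,))
        else:
            break                  # broker unreachable: keep for retry

enqueue("reading-1")
enqueue("reading-2")
forward_once()
print(sent, db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0])
```

Note this gives at-least-once delivery: a crash between `send` and the DELETE means the message is forwarded again on restart, so consumers should deduplicate.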

Well, Postgres has an event source system.

NoSql for this seems to be a better use-case.

I also depend on nats.io instead of RabbitMQ

Not sure what you're getting at with NoSql here - how does NoSql vs relational affect the choice of message broker/queue?

My (very limited, and purely gained through this thread!) understanding of nats suggests it doesn't handle persistence, instead leaving it up to producers and consumers - but I need to not lose any messages in the event of an outage. Waiting for consumers to signal completion of processing isn't practical, as it will sometimes take several minutes or even hours before messages are processed; I think the real solution here is persistence at the broker?

I'm talking about the backend for storing the messages/events in the outbox pattern.

That's how to persist in case of a network failure.

The outbox pattern removes the sent message from its datastore once it has been sent and acknowledged.

Persistence is different from the outbox pattern: persistence covers what happens after the message broker has received the message, while the outbox pattern handles failures before the message reaches the broker.

NoSQL and event stores normally have methods to migrate data to the latest version of the model; e.g. Postgres (which can be used as NoSQL via jsonb) supports JS V8 transformations. This is useful for edge cases in the outbox pattern.

In short, I was more or less talking about the tech used: RabbitMQ vs NATS, and SQLite vs NoSQL (in my case, using Postgres as NoSQL).

PS: NATS can be persistent, as others mentioned in this thread. Look up: NATS Streaming.

I really enjoyed the talk given by Quentin Adam & Steven Le Roux at Devoxx 2019. It gave a great overview of Apache Pulsar. Hope someone else can find it useful too!

[0] https://www.youtube.com/watch?v=De6avNyQUMw

Splunk just acquired Streamlio and most of the core devs got sucked up. While Pulsar is a great product, are you not concerned that these guys are getting paid bank to do something else now?

spoiler: we're still working on Pulsar

That’s good news! I guess it would be very helpful to formally address this concern. Is there something that has been written / published to that effect?

There is another company StreamNative, founded by core Pulsar/BookKeeper devs 1 year ago. :)

How does this compare to Redis Pub-Sub or RabbitMQ?

Very different. Pulsar is primarily a Kafka competitor.

- it is much more performant than RabbitMQ
- it's a commit log as well, not just a pub-sub system, i.e. it is a good candidate as the storage backend for event sourcing
- it supports geo-distributed and tiered storage (e.g. some data on NVMe drives, some in coldline storage)
- it's persistent, not in-memory (primarily)

.. and so on.

What about ZeroMQ?

Why use RabbitMQ and Kafka if you can use ZeroMQ? Meaning, isn’t it far more performant and distributed?

Maybe I am missing something here.

Message queues and message logs do different things. The idea of the log is that subscribers can show up after the log is written and read or reread it from the beginning. In an event sourced architecture you use the log as the source of truth and all consumers can replay the log against a local store to reconstruct a view of the system’s state. You also can use a log for pubsub, but if that’s all you need one of the MQ solutions is usually a better fit.
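A toy illustration of the replay idea, with a plain list standing in for the broker's log (all names invented for the example):

```python
# The log is the source of truth: an append-only sequence of events.
log = [
    {"type": "deposit", "account": "a", "amount": 100},
    {"type": "withdraw", "account": "a", "amount": 30},
    {"type": "deposit", "account": "b", "amount": 50},
]

def replay(events):
    # A consumer that shows up after the log was written can still rebuild
    # its local view of system state by reading from the beginning.
    balances = {}
    for e in events:
        delta = e["amount"] if e["type"] == "deposit" else -e["amount"]
        balances[e["account"]] = balances.get(e["account"], 0) + delta
    return balances
```

Any number of consumers can derive their own independent views this way, each tracking its own read position; a queue, by contrast, typically deletes messages once consumed.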

> Why use RabbitMQ and Kafka if you can use ZeroMQ?

They are totally different, you're comparing apples with oranges.

ZeroMQ gives you basic, very fast tooling to communicate between distributed processes. ZeroMQ does not provide tooling for e.g. maintaining a strictly ordered, multi-terabyte event log. And so on.

Yes but isn’t this a bit like comparing git / bitkeeper vs subversion / perforce?

Basically, one is decentralized and you can set up a massively parallel architecture, with e.g. each topic or subthread having its own pub-sub.

The other is a monolithic centralized pubsub architecture.

You could argue that git in large institutional projects converges to a monolithic repo so at that point it’s less efficient even than svn.

But for most use cases, ZeroMQ would allow far more flexible distributed systems topologies and solutions. No?

Edit: HN and Google are both awesome: https://news.ycombinator.com/item?id=9634925

ZeroMQ is just a bit of sugar on top of TCP sockets. It isn't a message queue or anything close. You would be wasting a ton of time reimplementing a lot of basic features like retries, persistence, service discovery, dead-letter queues, priorities, and a ton of other stuff.

> You could argue that git in large institutional projects converges to a monolithic repo so at that point it’s less efficient even than svn.

Not true. Facebook and Google do not use Git. Microsoft does not use vanilla Git for their monorepo. They created this extension to make it scalable https://en.wikipedia.org/wiki/Virtual_File_System_for_Git

I went to https://pulsar.apache.org but didn't find a "Why Pulsar and not Kafka" -- is there an answer to that, or is this another Kafka competitor with the same strengths and no specific differentiator?

Here is a two-part blog post I wrote on why Pulsar and not Kafka: https://kafkaesque.io/5-more-reasons-to-choose-apache-pulsar...

Thanks! And Part 1 seems like a good place to start: https://kafkaesque.io/7-reasons-we-choose-apache-pulsar-over...

Why is one Apache project competing with another one?

Plenty of competing Apache projects exist; Pulsar and Kafka aren't unique in that regard.

I don't think Apache cares if it's maintaining similar projects.

It's a persistent store, so that would be different from Redis Pub-Sub. Compared to RabbitMQ, Pulsar seems to favor strong ordering and protection from message loss.

This blog post offers some more info and leaders to other posts comparing Pulsar to RabbitMQ and Kafka: https://jack-vanlightly.com/blog/2018/10/2/understanding-how...

It’s closer to Redis Streams, except that, like Kafka, you can scale topics beyond the limits of a single server because they can be distributed. You couldn't run the Twitter firehose over Redis Streams, but you can run it over Pulsar or Kafka, given enough hardware.

This scales well to multi-datacenter deployments and has strong multi-tenancy support; if memory serves, Yahoo runs a single cluster for all of their properties.

Or Firebase?

Or Kafka?

Kafka is primarily designed for streaming; Pulsar is designed for both streaming and queueing.

Firebase is a completely different animal.

Read the other comment thread: https://news.ycombinator.com/item?id=21936523

Scales much better: the storage layer is separate from the brokers, so you can scale them independently.

I'm still on the fence with these distributed log/queue hybrids. From a theoretical perspective they seem excellent. I just have this nagging suspicion that architectures based on these systems will harbor some even-worse problem. This kind of ambivalence is something I find myself having to battle more and more in my career as I age. Most of the time the hype around new design/development patterns leads to a worse situation; very rarely does it lead to a significant improvement. I dislike that my first impression looking at a system like this is risk aversion.

Your risk aversion seems justified. It seems reasonable to estimate that very few teams are in the position of needing the kind of scale/scalability that something like Apache Pulsar offers. They are much more likely to be in either a state where they will not put Pulsar through its paces or where they already have a solution in place that serves their scale/scalability needs.

When a team you are on starts discussing switching over to a technology like Pulsar because of its amazing benefits, unless your pants are on fire, it is much more likely than not that you do not stand to gain much from the benefits that such software brings but you are accepting the maintenance burden that it represents.

I totally get where you're coming from, as I feel like that a lot too. But the fact that you are thinking about risk and business value over working on cool new tech should be a positive for the ventures you are a part of.

A lot is said or referenced in this conversation about why people chose Pulsar over Kafka. I'm not an expert in this area but are there use cases where Kafka is still better?

As someone with a few years of Kafka and the ecosystem under my belt, but no experience of using Pulsar in anger, the areas I can see where Pulsar is behind are mainly ancillary, and will likely be caught up by the community given a year or two.

Kafka Streaming - Pulsar functions don't intend to (by the looks of it) provide all of the functionality available in Kafka Streaming. E.g., joining streams, treating a stream as a table (a log with a key can be treated as a table), aggregating over a window. They seem to be more focused on do X or Y to a given record. That said, you don't need Kafka Streaming for that, other streaming products like Spark Streaming can do it also (although last I checked, Spark Structured Streaming still had some limitations compared to Kafka Streaming - can't do multiple aggregations on a stream etc.) A use case I have and love Kafka Streaming's KTables for is enriching/denormalising transaction records on their way through the pipeline.

Kafka Connect - Pulsar IO will get there with time, but currently KC has a lot more connectors. For example, Pulsar IO's Debezium source is nice (Debezium was built to use Kafka Connect, but can run embedded), but you may not want to publish an entire change-event stream onto a topic; you might just want a view of a given database table available - so KC's JDBC connector is a lot more flexible in that regard, and Pulsar IO currently doesn't have a JDBC source. It also looks like its JDBC sink only supports MySQL and SQLite (according to the docs) - KC's JDBC connector as a sink supports a wider range of DBs, and can take advantage of functionality like Postgres 9.5+'s UPSERT. Likewise, there are no S3 sources or sinks - the tiered storage Pulsar offers is really nice, but you may only want to persist a particular topic to S3.

KSQL - KSQL lets your BAs etc. write SQL queries for streams. That said, I do like Pulsar SQL's ability to query your stored log data. When I've needed to do this with Kafka, I've had to consume the data with a Spark job, which adds overhead for troubleshooting.

So yeah, that's the main areas I can see, but it's really a function of time until Pulsar or community code develops similar features.

The only other major difference I can see is that at the current time, it's comprised of three distributed systems (Pulsar brokers, Zookeeper, Bookkeeper) which is one more distributed system to maintain with all the fun that entails.

That said, I'll be keeping my eye on this, and trialling it when I get some spare time, as I've found that people will inevitably use Kafka like a messaging queue, and that is a bit clunky. Plus I'm a little over having people ask me how many partitions they need :D

Do they usually provide similar throughput on the same hardware?

I couldn't say tbh; I'm keen on running a trial with Pulsar alongside Kafka in production, and might write it up when I've done so.

I'm sure Pulsar is worth it if you use most of what they're offering, but the Java client library is crusty, throws exceptions for control flow. I'm looking at a persistence mechanism built on top of NATS to replace it. The NATS layer would make it simpler to decouple the gateways from the persistence layer, and support our bulk computing needs.

From reading their documents, I really like the design of Pulsar. However, Kafka has been working so well for us and has much better integration with other components of our stack (Flink, Spark, NiFi, etc) that there's no compelling reason to switch.

I think Pulsar should really focus on integration with the rest of the Apache stack if they want to gain traction.

Being able to scale the durable-storage layer independently has a lot of advantages. More thoughts here: https://twitter.com/breckcs/status/1203736751681896449.

Most of the comments are just pro-Pulsar but what's the architectural trade-off? (Non-architectural trade-off is that Pulsar is a new system to learn for folks familiar with maintaining and using Kafka.)

Pulsar is better designed than Kafka in every way with the main trade-off being more moving pieces. That's why the recommended deployment is Kubernetes which can manage all that complexity for you.

Pulsar also lacks in the size of the community and ecosystem where Kafka has much more available.

Someone linked some benchmarks here (on mobile and can’t find) that showed a single node Kafka outperforms but as soon as you start scaling Pulsar pulls ahead. I’m not familiar enough with the nitty gritty to comment beyond that.

How does this compare with NATS?

NATS is a simpler PUB/SUB system that delivers in the UNIX spirit of small composable parts. Apache Pulsar or Apache Kafka deliver the banana, the ape holding it and the rest of the jungle.

Check out Liftbridge (https://liftbridge.io) as a way to add these capabilities to NATS.

Your FAQ still says it's not production-ready. Is this still the case? I've been keeping my eye on this project.

It's getting very close. I had wanted to make a production-ready 1.0 release before the end of the year, but we're in the process of switching from protobuf to flatbuffers. Once that is complete, a stable release will be made.

NATS is ephemeral pub/sub only. There is no persistence or replay; it focuses instead on high performance and messaging patterns like request/reply.

Kafka and Pulsar persist every message and different consumers can replay the stream from their own positions. Pulsar also supports ephemeral pub/sub like NATS with a lot more advanced features.

NATS does have the NATS Streaming project for persistence and replay, but it has scalability issues. They're working on a new project called JetStream to replace it in the future.

"Scalability issues" makes it sound like a blocer. I'd rephrase is as "doesn't support horizontal scaling which might be a limit to some".

How does it compare to Kafka?

Most of the flaws of Kafka were carefully studied and fixed in Apache Pulsar. I have written a blog post about why we went ahead with Pulsar: https://medium.com/@yuvarajl/why-nutanix-beam-went-ahead-wit...

> when consumers are lagging behind, producer throughput falls off a cliff because lagging consumers introduce random reads

I am confused by this. The format of Kafka's log files is designed to allow reading and sending to clients directly using sendfile, in sequential reads of batches of messages. http://kafka.apache.org/documentation/#maximizingefficiency

Kafka brokers handle connections to consumers and data storage. This creates contention as the primaries for each partition have to service the traffic and handle IO. Consumers that aren't tailing the stream will cause slowdowns because Kafka has to seek to that offset from files which aren't cached in RAM.

Pulsar separates storage into a different layer (powered by Apache Bookkeeper) which allows consumers to read directly from multiple nodes. There's much more IO throughput available to handle consumers picking up anywhere in the stream.

Kafka works best when the data it is returning to consumers is in the page cache.

When consumers fall behind, they start to request data that might not be in the page cache, causing things to slow down.

> However, we can’t use Kafka as a queuing system because the maximum number of workers you can have is limited by the number of partitions, and because there is no way to acknowledge messages at the individual message level. You would need to manually commit offsets in Kafka by maintaining a record of individual message acknowledgments in your own data store, which adds a lot of extra overhead — too much overhead in my opinion.

I just want to clarify this - you're limited to N concurrent consumers for N partitions per consumer group.
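As a toy sketch of that limit (a simplified round-robin assignment, not Kafka's actual assignor): each partition is owned by exactly one consumer in the group, so consumers beyond the partition count simply sit idle.

```python
def assign(partitions, consumers):
    # At most len(partitions) consumers in a group can be doing work at
    # any given time; the rest receive no partitions.
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment
```

With 3 partitions and 4 consumers, one consumer ends up with nothing to consume, which is why adding workers past the partition count doesn't increase parallelism.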

Separates storage from brokers for better scaling and performance. Millions of topics without a problem and built-in tenant/namespace/topic hierarchy. Kubernetes-native. Per-message acknowledgement instead of just an offset. Ephemeral pub/sub or persistent data. Built-in functions/lambda platform. Long-term/tiered storage into S3/object storage. Geo-replication across clusters.

You might want to check out this blog post I wrote comparing Kafka to Pulsar: https://kafkaesque.io/5-more-reasons-to-choose-apache-pulsar...

If you have an O'Reilly subscription, you can also check out this detailed report comparing Pulsar and Kafka: https://learning.oreilly.com/library/view/apache-pulsar-vers...

This is one of the best overviews on Pulsar with comparisons to Kafka: https://jack-vanlightly.com/blog/2018/10/2/understanding-how...

Is there a reason you went with Pulsar over Kafka? How is the pulsar community? Where are you turning when you have support issues?

This is great! What would be the easiest way to run a 3-node cluster?

The standalone mode will let you get started as a developer: grab the tar.gz, uncompress it, and run standalone.sh.

There are helm charts for running an actual cluster.

Would Pulsar be suitable for IoT messaging? An alternative to MQTT?

"high-level APIs for Java, C++, Python and GO", no love for Node.js? :(

However, it currently lacks the ability to listen for messages and run an event handler when one comes in: https://github.com/apache/pulsar-client-node/pull/56

You have to manually call ".receive()" to attempt to receive a message.

Using `.receive()` will occupy a worker thread from node until it returns. Having multiple consumers waiting on receive will clog up the worker threadpool, preventing anything that uses it from running. If you want to use the consumer right now, I would suggest always using a timeout on the receive call, and waiting between timed-out calls to receive. This is extremely important if you have multiple consumers.
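The suggested pattern, sketched generically in Python rather than Node (the `StubConsumer` is a made-up stand-in for the real client object, and `receive` returning `None` models a timed-out call; real clients signal timeouts differently):

```python
import time

def poll(consumer, handle, timeout_ms=500, idle_sleep_s=0.1, max_empty=3):
    # Always pass a timeout to receive() so a quiet topic can't block a
    # worker thread forever, and sleep between timed-out calls so several
    # consumers don't monopolize the thread pool.
    empty = 0
    while empty < max_empty:
        msg = consumer.receive(timeout_ms)
        if msg is None:  # timed out: back off briefly before retrying
            empty += 1
            time.sleep(idle_sleep_s)
            continue
        empty = 0
        handle(msg)
        consumer.acknowledge(msg)

class StubConsumer:
    # Minimal fake consumer for illustration only.
    def __init__(self, msgs):
        self.msgs = list(msgs)
        self.acked = []
    def receive(self, timeout_ms):
        return self.msgs.pop(0) if self.msgs else None
    def acknowledge(self, msg):
        self.acked.append(msg)
```

In a real deployment the `max_empty` cutoff would be replaced by a shutdown flag, but the timeout-plus-sleep shape of the loop is the important part.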


This is one of the best designed pub/sub messaging systems available, but you don't have to use it if you don't want to.

Care to elaborate? I just started using the standalone version of Pulsar for a project. It looked better designed than Kafka, and resource usage looks quite acceptable so far, but I don't have much experience with any solutions in this space, so I'm not sure which problems I'm going to run into. Any suggestions for good tech/strategies for a streaming-type setup like this? Or what to do alternatively? Should I look into rolling my own?

Not really; people legitimately need this for truly big data. You can't reliably process 7 trillion events per day with a purely C/Unix CLI stack.

I would bet that the vast, vast majority of production Kafka deployments do not see 7 trillion events per day. I bet many, if not most, do not even see 7 billion events per day.

Please do elaborate

Why is Apache developing all these servers that are only useful to a handful of companies rich enough to build them themselves? How about building something that individuals can use, like, I dunno, the Apache server itself?

If you had bothered to read the linked page, you would have understood that it's a Yahoo project handed over to Apache for stewardship, like many of Apache's projects.

Yeah, I'm talking more generally about their full list here: https://www.apache.org/

There are a lot of projects that were handed to Apache to manage. Kafka, for example, was initially created by LinkedIn. So yeah, you're right: big corps are actually creating those tools and, on top of that, giving them away as open source to the public.

The best part is that these tools are put into production before being open-sourced. In other words, they actually work.
