Kafka is dead, long live Kafka (warpstream.com)
392 points by richieartoul on Aug 7, 2023 | 287 comments



> In our experience, Kafka is one of the most polarizing technologies in the data space. Some people hate it, some people swear by it, but almost every technology company uses it.

(emphasis added)

Surely that's false?

Or, I mean, neither of us are providing any evidence here... For my part, 0 of the last 6 companies I've worked for used it. The company before that did (I drove its adoption), but we later abandoned it.

LinkedIn built Kafka for massive-scale problems that 99% of us don't have. Though technologists have a well-earned reputation for using tech they don't need, my perception is that most of us are succeeding in avoiding the use of Kafka.


I’m not sure how anyone can hate Kafka? It does what it says on the tin - move data from A to B with publish/subscribe semantics.

It’s quite easy to just use it as a dumb message broker with no retention if that’s all you need but if you do want to do something funky with persistence then go down that route.

I’m not sure how anyone could have a negative feeling towards a vanilla, but rock solid and wildly popular open source tool. If they do then it will be about some niche feature or use case.

I actually think this message reflects badly on the vendor here. Criticise or compete with Kafka on its technical merits if you like, but this is just a misrepresentation of their position in the market.


> I’m not sure how anyone can hate Kafka?

My experience has been that such a question has two implied audiences in it: those who consume Kafka and those who have to keep the PoS alive and healthy

The whole ambiguity around whether ZK is really still needed or not <https://kafka.apache.org/documentation/#zk> means keeping two distributed systems alive and healthy, but don't worry, you can't move your production clusters off of it anyway <https://kafka.apache.org/documentation/#kraft_zk_migration>. It's a mess


ZooKeeper is rock solid. Moving off it is a mistake, IMO.

My tinfoil hat theory is that the whole impetus for KRaft is that Confluent Cloud's multi-tenanted clusters have so many partitions that they start to exceed ZK's capacities, so Confluent have built KRaft for Confluent.

And yeah, the migration approach is nutso. Also very annoying, the KRaft metadata topics being changed to be super-secret for... ...some good reason, I'm sure.

But it entirely removes the ability to respond to changed cluster metadata that you have with ZK, where you can watch znodes.

I'm not at all a fan tbh.


Not my experience at all.

We’ve been running a 3 node cluster for several years, and a significant minority of the times I’ve been paged is because ZK got into a bad state that was fixed by a restart (what bad state exactly? Don’t know, don’t care, don’t have two spare weeks to spend figuring it out). Note that we have proper liveness checks on individual instances, so the issue is more complicated than that.

Migrated to 3.3 with KRaft about half a year ago, and we haven’t had a single issue since. It just runs and we resize the disks from time to time.


> Migrated to 3.3 with KRaft about half a year ago

Did you follow their migration guide, or did you just rebuild the cluster and then start using KRaft? I wasn't sure how "migration" was being used in that context


We built a new cluster and very carefully rolled over our producers and consumers. It wasn’t simple, but it’s possible to do without downtime.


> ZooKeeper is rock solid

That has not been my experience. I've been running several small clusters (3 and 5 node), Confluent-packaged, for the last 3 years, and ZooKeeper has gotten into this state ~20 times where a node isn't in the cluster, and the way to "fix" it is to restart the current leader node. Usually I have to play "whack-a-mole" until I've restarted enough leaders that it comes up. Sometimes I've not been able to get the node back into the cluster without shutting down the whole cluster and restarting it.

Once it's running it's fine, until updates are done. But this getting into a weird state sure doesn't sit well with me.


> ZooKeeper is rock solid

lol. lmao. never falls over or causes incidents, in the same way C is rock solid and never SIGSEGVs or causes security problems


This thread is an excellent example of the author's point: Kafka is polarizing.

Personally, in my experience with Kafka and Zookeeper at Airbnb back in the day (we also used ZK for general-purpose service discovery), they both were... temperamental. They'd chug along just fine for a bit, seemingly handling outages that e.g. RDS would have thrown a fit over, and then suddenly they'd be cataclysmically down in extremely complicated ways and be very difficult to bring back up. Even just using them required teaching a more complex mental model than most cloud-hosted offerings of similar things, and you ended up in this path dependency trap of "we already invested so much in Kafka, so if you want to send a message, use Kafka" when for like 95+% of use cases something easy like SQS would've been fine and simpler. TBQH I don't think either Kafka or ZK ever quite paid back their operational overhead cost, and personally I wouldn't recommend using either unless you absolutely need to.

Warpstream looks really cool in that light!


Some people say "well, it could be worse" but my go-to version is "at least it's not fucking etcd"


Fair, that's just been in my experience. It's still better than KRaft though.


> ZooKeeper is rock solid. Moving off it is a mistake, IMO.

I’m agnostic about Kafka but ZooKeeper is problematic for many use cases based on personal experience and I wouldn’t recommend it. It can be “rock solid” and still not very good. I’ve seen ZK replaced with alternatives at a few different organizations now because it didn’t work well in practice, and what it was replaced with worked much better in every case.

ZooKeeper works, sort of, but I wouldn’t call it “good” in some objective sense.


To be fair, a lot of people use ZK wrong, then complain about it.

For example, if you use it like a general purpose KV store like Redis, you'll have a bad time.

Another often-encountered mistake is that people, thinking it doesn't need to store much data, deploy ZK to a server with a slow disk/network. Big mistake, as every write to ZK needs to be broadcast and synced to disk; a bottleneck in disk or network IOPS will kill your ensembles.


This has also been my experience when I saw unreliable ZKs; they're sharing the OS, ZK, and maybe even some other services on the same disk, and sometimes they're even running software RAID or something on top of that.

I don't think teams who can't run ZK will have much luck running other distributed systems. (Maybe KRaft, if they're Kafka experts.) Most of the alternatives proposed here have been "let someone else run the hard part." (Which isn't a bad choice, but it's not technically a solution.)


(WarpStream co-founder)

I'm not sure what you mean. Message persistence is a fundamental feature of Kafka that almost everyone using it relies on; it's not some esoteric feature no one uses. We're each coming from our own network bias here, but in my experience a lot of people are really unhappy with the operational toil associated with running Kafka at scale in production.


Yes it’s fundamental, but it’s not generally that significant to users of Kafka.

As a developer, Kafka is a place to publish and subscribe to data with reliability and performance.

As a developer, the fact that messages are persistent is nothing more than a cool feature in that I can replay messages if I need to.

Things like consumer groups and offsets are features of the API, but they aren’t complex. Every similar tool whether it be RabbitMQ or IBM MQ has its own API abstractions and features. Likewise, I need to learn about failover semantics, but that’s the same with any API dependency.

It seems that you and the other posters here have a consensus that it’s hard to operate. Rather than saying that Kafka is dead or a polarising technology, a better line of argument is that it’s simply hard or expensive to operate at scale. (I personally think that’s par for the course with a technology like this, but that’s an aside.)

You have to remember that for everyone operating Kafka, there will be on average tens or hundreds of developers using it. And the vast, vast majority of those will not find it to be particularly polarising. Instead, they’ll find it a de-facto choice.


I work at a place that has a whole devops department maintaining it (so not even my pain), and I still hate how overengineered the overall system ended up; it's more busywork programming for it, debugging it, etc. No reason not to use SQS or Rabbit or whatever unless you have a very special use case, or begin to hit X messages/second. Or just like to spend lots of time writing boilerplate and configuring.


Easy: I hate how complex it is, compared to something much simpler like https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQS...


Kafka is closer to a persistent WAL than a message queue. If your work doesn't need a WAL, it's almost certainly overkill and you will hate it. If your work needs a WAL then it'll be your favorite tool ever.


For those like me who aren't used to that abbreviation, it's short for Write-ahead Logging [0].

[0] https://en.wikipedia.org/wiki/Write-ahead_logging


Why? It’s quite easy to use Kafka as a messaging queue without even thinking about the write-ahead log semantics. It’s there if you need it, but Kafka scales down to being a message broker fairly well in my opinion.
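For what it's worth, "dumb broker" usage really is just a produce loop and a consume loop. A minimal sketch, assuming the confluent-kafka Python client, a local broker, and a made-up topic name:

    from confluent_kafka import Producer, Consumer

    producer = Producer({"bootstrap.servers": "localhost:9092"})
    producer.produce("events", value=b"hello")   # fire-and-forget publish
    producer.flush()                             # block until delivered

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "worker-pool",           # consumer group = competing consumers
        "auto.offset.reset": "latest",       # broker-ish behavior: only new messages
    })
    consumer.subscribe(["events"])
    while True:
        msg = consumer.poll(1.0)             # returns None on timeout
        if msg is None or msg.error():
            continue
        print(msg.value())                   # handle the message here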


Because operationalizing Kafka is difficult from an infrastructure (Scala/Java, ZooKeeper, durable disk management, lots of moving parts), learning, and code perspective (pointer tracking, partition management, delegation, etc.) relative to the other pubsub/MQ tools.

So if you don't have it operationalized and your use case is simple, it makes most sense to use a simpler tool (RabbitMQ/AMQP, cloud pub/sub, NSQ, etc., perhaps even Redis)


1) scala/java ... is that fundamentally difficult?

2) zookeeper is being eliminated as a dependency from kafka

3) durable disk management ... I mean, it's data, and it goes on a disk.

Look, do you want a distributed fault-tolerant system that doesn't run on specialized / expensive hardware? Well, sorry, those systems are hard. I get this a lot for Cassandra.

You either have the stones for it as a technical org to run software like that, or you pay SAAS overhead for it. A Go binary is not going to magically solve this.

EVEN IF you go SaaS, you still need monitoring and a host of other aspects (perf testing, metrics, etc) to keep abreast of your overall system.

And what's with pretending that S3 doesn't have ingress/egress charges? Last I checked those were more or less in line with EBS networking charges and inter-region costs, but I haven't looked in like a year.

And if this basically ties you to AWS, then why not just ... pay for AWS managed Kafka from Confluent?

The big fake sell from this is that it magically makes Kafka easy because it ... uses Go and uses S3. From my experience, those and "disk management" aren't the big headaches with Kafka and Cassandra masterless distributed systems. They are maybe 5% of the headaches or less.


> 1) scala/java ... is that fundamentally difficult?

It's certainly at least more so as you have a highly configurable VM in-between where you're forced to learn java-isms to manage (can't just lean on your unix skills)

> 3) durable disk management ... I mean, it's data, and it goes on a disk.

Most MQs don't store things on disk beyond flushing memory so they can recover from a crash; in most cases the data is cleared as soon as the message is acked/expired.

Look, I'm not saying not to use Kafka, I'm just pointing out the evaluation criteria. There are certainly better options if you just want a MQ, especially if you want to support MQ patterns like fanout.

The reality is if you're doing <20k TPS on a MQ (most are) and don't need replay/persistence, then ./redis-server will suffice and operationally it will be much much easier.
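As a rough sketch of how little that takes, assuming redis-py and a local redis-server (the queue name is made up):

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # producer side
    r.lpush("jobs", b"resize-image-42")

    # consumer side: BRPOP blocks until a job arrives
    while True:
        _key, job = r.brpop("jobs")
        print(job)   # do the work; note nothing survives a crash mid-processing

No replay, no persistence guarantees, but for a simple work queue at modest throughput that's often all you need.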


But... Go is GC'd as well. Most JVM gripes are about the knobs on GC, but Go is still a fundamentally GC'd language, so you'd have issues with that.

So... Go was the rewrite? Scylla at least rewrote Cassandra in C++ with some nice close-to-hardware improvements. Rust? ok. C++? ok. Avoid the GC pauses and get thread-per-core and userspace networking to bypass syscall boundaries.

And look, this thing is not going to steal the market share of Kafka. Kafka will continue to get supported, patched, and whenever the next API version of AWS comes out (it needs one), will this get updated for that?

Yeah, Kafka is "enterprisey" because ... it's java? Well no, Kafka is scalable, flexibly deployable (there's a reason big companies like the JVM), has a company behind it, is tunable, has support options, can be SaaS'd, has a knowledge database (REEEAAALLLLY important for distributed systems).

All those SQLite/RocksDB projects that slapped a raft protocol on top of them are in the same boat compared to Scylla or Cassandra or Dynamo. Distributed systems are HARD and need a mindshare of really smart experienced people that sustain them over time. Because when Kafka/Cassandra type systems get properly implemented, they are important systems moving / storing / processing a ton of data. I've seen hundred node Cassandra systems, those things aren't supposed to go down, ever. They are million dollar a year (maybe month) systems.

The big administration lifts in them like moving clouds, upgrading a cluster, recovering from region losses or intercontinental network outages are known quantities. Is some Go binary adhoc rewrite going to have all that? Documented with many people that know how to do it?


If I could get away with a vendor cloud queue I wouldn't move to Kafka for the hell of it, but if I needed higher volume data shipping I've never found the infra as hard as people make it out to be. Unless you're doing insane volumes in single clusters, most of the pieces around it can work OK on default mode for a surprisingly long time.

You can cost footgun yourself like the blog here talks about with cross-AZ stuff (but that doesn't feel like the right level to do that at for me for most cases anyway), and anytime you're doing events or streaming data at all you're gonna run into some really interesting semantic problems compared to traditional services (but also new capacities that are rarely even attempted in that world, like replaying failed messages from hours ago), so it's good to know exactly what you're getting into, but I've spent far less time fighting ZK than Kafka and far less time fighting either than getting the application semantics right.

I imagine a lot of pain comes from "I want events, I know nothing about events, I don't know how to select a tool, now I'm learning both the tool and the semantics of events and queues both on the fly and making painful decisions along the way" which I've seen several places (and helped avoid in some of the later places after learning some hard, not-well-discussed-online lessons). I think the space just lets you do so many more things, so figuring out what's best for YOU is way more difficult the first time you as traditional-backend-online-service-developer start asking questions like "but what if we reprocess the stuff that we otherwise would've just black-hole-500'd during that outage after all" and then have to deal with things like ordering and time in all its glory.


Besides the operational concerns mentioned in the sibling comment, Kafka is simply not a great queue. You can't work-steal, you can't easily retry out-of-order, you can't size retention based on "is it processed yet", and you may need to manually implement DLQ behavior.

If you already have Kafka for other (more WAL-y, or maybe older log-shippy) reasons it can be an OK queue, especially if you've got a team that is comfortable with Kafka as a WAL and can easily work around most of the downsides of using it as a queue. But I wouldn't take it as a first choice.
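To make the DLQ point concrete, here's a hand-rolled sketch (confluent-kafka client assumed; topic names are illustrative, and nothing like this is built into Kafka itself):

    from confluent_kafka import Consumer, Producer

    def process(payload):
        ...   # real handler goes here

    consumer = Consumer({"bootstrap.servers": "localhost:9092",
                         "group.id": "orders-workers",
                         "enable.auto.commit": False})
    producer = Producer({"bootstrap.servers": "localhost:9092"})
    consumer.subscribe(["orders"])

    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        try:
            process(msg.value())
        except Exception:
            # No per-message nack/redelivery in Kafka, so park the failure on
            # a separate topic and keep the partition moving.
            producer.produce("orders.dlq", value=msg.value(), key=msg.key())
            producer.flush()
        consumer.commit(message=msg)   # only now advance the committed offset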


Additionally, you can't easily increase/decrease consumer counts such that all consumers quickly get assigned roughly equivalent workloads.

It will be interesting to watch progress on KIP-932 as the Kafka community thinks about adding message queue behavior: https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A...


Great point. The basic semantics are very different too. In MQs you partition/namespace/channel (whatever you want to call it) based on how data flows in your application (e.g. fanout). In Kafka you're tied more to a persistence model, so you end up with fat linear topics and the "filtering"/flow management happens on the consumer's side.


I work as a contractor so I move between places. I have found a few companies trying to introduce kafka, and every time it has been a solution in search of a problem.

I don't doubt that it has a good use case but I have so far only encountered the zealots who crowbar it into any situation and that has left a residual bad taste in my mouth. So I fall into the "hate it" side.


> and every time it has been a solution in search of a problem.

To refine that a bit: in my experience at the last two jobs, the queue problem is there, but the Kafka solution is based solely on the "enterpriseyness" of Kafka, not any practical reason. RabbitMQ is highly performant, SQS is really easy. Both are great queues. Kafka is much more, yet Kafka is chosen because "it's enterprise."


Kafka isn't even a queue. I've done consulting on Kafka, and several times my recommendation is "You don't want or need Kafka".

A classic sign of "you wanted an MQ" is when a consumer writes a message to a topic to let the producer know it read the message the producer wrote...


> A classic sign of "you wanted an MQ" is when a consumer writes a message to a topic to let the producer know it read the message the producer wrote...

Oof. Queued RPC is such a siren song; so many developers either stumble into this pattern or seek it out. And it's such a pain. Suddenly the latency of (often user-sensitive) operations is contingent on the latency of a queue consumer plus the time it takes to process everything in the queue before the RPC was issued. Goodbye, predictable turnaround times.


"but MQ is, like, so 1998, we want new and cool and GCP told us we needed Kafka"


Right. Kafka is a database. ;-)


Anything is a database if you're brave enough :D


The bravery is in using what has traditionally been called a database. ;-)

https://youtu.be/05mVvkp6f2M


Maybe "it's enterprise" means that's what the enterprise standardized on. There are a couple of practical reasons that come to mind on why that's the case - a) it's more resilient and durable than messaging platforms, and b) it is a platform of dumb pipes, so to make it a central data bus managed by platform teams means that they don't have to get into the detail of which queues perform what functions, have what characteristics, etc. Rather the client teams in the various business units can take care of all of their "smarts" the way they want. It also handles log/telemetry ingestion, data platform integration, and interservice comms use cases which is pretty multi-functional. That's the primary reason why Kafka has become such a pervasive and common platform, it's not because it's trendy, in fact most operations teams would rather not even have to operate the kafka platform.


That is, oddly, precisely the reason I've pushed for Kafka in certain environments... and the push back was that "Kafka isn't enterprise". ;-)


RabbitMQ is "highly performant" is a handwave. The words tell me nothing, just like any other tech/software that is described as "powerful".

In my last two major gigs, RabbitMQ was already being run in a clustered config, and it was not going well. Both places were in the process of making architecture changes to move to Kafka.

It seems like something that works great in a big scaled node and you can go to big nodes these days, but I don't think it is ready for cloud/distributed durability.

I'm not aware of Jepsen testing of RabbitMQ in distributed mode, for example, and I wouldn't consider any distributed/clustered product that hasn't let Jepsen embarrass it yet.

Cassandra and Kafka are frequent examples of YAGNI overengineering (although the fault tolerance can be nice without scale), but the reality is that pumping up single-node solutions for too long is a big trap. Projects that start to stress single nodes (I'm thinking like a 4xlarge anything on AWS) should probably get to thinking about the need for jumping to dynamodb/cassandra/bigtable/kafka/etc.

RabbitMQ --> Kafka is a pretty easy lift if your messaging has good abstractions.

relational DB --> Cassandra is a lot bigger headache because of the lack of joins.


I have had to make that clustered RabbitMQ to Kafka move myself, as the failure modes from RabbitMQ were very scary. The scariest thing in the entire infrastructure, at that financial institution, levels of scary. It's not that it failed much, but you don't need many middle-of-the-night calls with no good SOP to get the cluster back to health before migrating is in the cards.

Kafka is not operationally cheap. You probably want a person or two who understand how the JVM works, which might be something you already have plenty of, or an unfortunate proposition. But it does what it says on the tin. And when you are running fleets of 3+ digits worth of instances, very few things are more important.


I have a dim view of almost all inherently single-node datastores that advertise a clustered hack (and they are hacks) as a bolt-on (yes, even PostgreSQL). Sure it will work in most cases, but the failure modes are scary for all of them.

A distributed database will have network failures, will have conflicting writes, and will have to either pick between being down if any of the network is down (CP) or a "hard/complex" scheme for resolving conflicts (AP). Cassandra has tombstones, cell timestamps, compaction, repair, and other annoying things. Other databases use vector clocks, which are more complex and space intensive than the cell timestamps.

It's tiring to have move fast break things attitudes applied to databases. Yeah, sure your first year of your startup can have that. But your database is the first thing to formalize, because your data is your users/customers, you lose your data, you lose your users/customers. And sorry, but scaling data is hard, it's not a one or two sprint "investigate and implement". In fact, if you do that, unless you are doing a database the team has years of former experience with in admin and performance, you are doing it wrong.

"AWS/SaaS will eliminate it for me"

Hahahahaha. No it won't. It will make your life easier, but AWS DOESN'T KNOW YOUR DATA. So if something is corrupted or wrong or there is a failure, AWS might have more of the recovery options turnkeyed for you, but it doesn't know how to validate the success for your organization. It is blind trust.

AWS can provide metrics (at a cost), but it doesn't know performance or history. You will still need, if your data and volumes are at any scale, to know how to analyze, replicate, performance test, and optimize your usage.

And here's a fun story, AWS sold its RDS as "zero downtime upgrades". Four or five years later, a major version upgrade was forced by AWS .... but it wasn't zero downtime. Yeah, it was an hour or so and they automated it as much as they could. But it was a lie. And AWS forced the upgrade, you had no choice in the matter.

Most clustering vendors don't advertise (or don't even know) what happens in the edge cases where a network failure occurs in the cluster but the writes don't propagate in the "grey state" to all nodes. Then the cluster is in a conflicted write state. What's the recovery? If you say "rerun the commit log on the out of sync nodes" you don't understand the problem, because deletes are a huge wrench in the gears of that assumption.

From my understanding of Cassandra, which Kafka appears (from the numerous times I've looked) to be similar to, with quorums and the like, it's built on a lot of the partition-resilient techniques.

And, kafka has undergone Jepsen: https://aphyr.com/posts/293-jepsen-kafka

For those that don't know, aphyr will embarrass any distributed system given enough time. What is important is that

1) the distributed system is willing to subject itself to him and

2) they have a satisfactory response.

For an example of an unsatisfactory response, I give you MongoDB:

https://jepsen.io/analyses/mongodb-4.2.6

Note the "updates" section doesn't actually have them retry/repeat the testing. MongoDB just ran from the report. They claim it was fixed.

Anyway, if a system doesn't do that (submit to jepsen testing), then IMO it is hiding some big big big red flags.


Not your main point, but MongoDB didn't commission Kyle to do that report as they had in the past, he did it on his own time. That's why his report doesn't mention repeat testing. They do actually run his tests in their CI and those new tests were used to isolate that specific bug. Moreover, some of the complaints about weak durability defaults for writing were later fixed: https://www.mongodb.com/blog/post/default-majority-write-con.... They still do default to a weak read concern, but writes are fully durable unless you specifically change the behavior. For what it's worth I agree with Kyle that they should have stronger defaults, but I don't really see a problem with MongoDB's response to the report because there is room to disagree on that.


Do you have a source for this? I got the impression at the time that there was some commissioning of his services, but that they didn't like the report. But he publishes work, and released the report, which forced them to deal with it.

Every distributed tech fails when he tests it, but the tenor and nature of the report for MongoDB was different. It basically said between the lines "do not use this product".

MongoDB has a history of really crappy persistence decisions and silently failed writes, and as soon as it gets publicized saying "we fixed it in the next release". The same thing happened here of course. I simply don't trust the software or the company.

MySQL has the same annoying pattern in its history, although I have more confidence in the software because of the sheer number of users.

Still, I would probably pick PostgreSQL for both relational and document stores.


Source for which claim? Kyle was paid for work testing 3.4.0-rc3[1] and 3.6.4[2] which analyzed single document concurrency in a sharded configuration. Those tests run in their CI [3]. MongoDB had some somewhat misleading copy on their website about the result of those tests, so Kyle decided to test the new multi-document transactions feature for 4.2.6 and found some bugs.

It's fair to not trust the database or company, I don't blame you for that. But I think Kyle's MongoDB 4.2.6 report was not nearly as concerning as his PostgreSQL 12.3 report which found serializability bugs in a single instance configuration, among other surprising behaviors. MongoDB's bugs were at least in a new feature in a sharded configuration. I don't think his most recent report was actually as negative as it may read to you. I say this as someone who mostly runs PostgreSQL, by the way!

As a side note I believe there are consistency bugs existing right now in both MongoDB and PostgreSQL (and MySQL and Cassandra and Cockroachdb and...) waiting to be discovered. I'm a jaded distributed systems operator :)

[1] https://jepsen.io/analyses/mongodb-3-4-0-rc3

[2] https://jepsen.io/analyses/mongodb-3-6-4

[3] https://github.com/search?q=repo%3Amongodb%2Fmongo+jepsen&ty... (note: not an expert in when or what suites it runs, just have seen it running before as a demo)


I always find the "it's enterprise" statement so humourous, given how much time I've had to invest in convincing enterprises that Kafka wasn't some weird fly-by-night technology that couldn't provide for the demanding enterprise.


It’s amazing how far that one word can go.

In a former life, I even heard this one:

- We use CentOS.

Why?

- RedHat is enterprise.

(But you’re not even paying for enterprise support)


The biggest issue for me is people using Kafka for MQTT; MQTT is a pub/sub broker already. The other issue is thinking of Kafka as some kind of "innovative" data ingestion tool, so now instead of 50 extract jobs per day, you've got to reconcile millions of events in real time. I think message brokers make sense, but they are message brokers, nothing else, no?


It's really good when you're producing a lot of data fast that you don't want to lose, and you want multiple consumers to read.

It's a complex tool that solves a complicated problem. But if you don't actually have that problem, then that's a whole lot of complexity for no gain.


Hypothetical use case I've been thinking about:

Say I want to log all http requests to the server (I know I said a keyword of log) and then process those logs into aggregates, stick them in a time series.

Would it be insane to "log" everything into kafka? Or what would be the more "correct" tool for that job?


No, that's a common use case. Ultimately Kafka is just a big dumb pipe. Only question I'd ask is how much data are you expecting?


Same finding.

That's why that sentence in the article, "but almost every technology company uses it," should be rephrased to "but almost every technology company does not need it"


I disagree. I certainly think it's possible people might be looking to fit Kafka into things that simply don't need it (perhaps driven by the system design theory focus in hiring), but for the applications where you have event streaming, Kafka is still the top choice. Analytics, messaging, sensors, etc.

From my side, I agree with the author about the "Accidental SRE" points. But Kafka is a solid technology, so much so that there's no shortage of "Kafka but better" tools out there (e.g. Redpanda).

Also you kind of drift off the point there at the end - even if it wasn't used extensively (a point of contention), that has nothing to do with whether it is polarizing or not? The statement about it being loved or hated is still relevant to those solving the 1% scaling problems you mentioned, even if 99% aren't.

It's like saying that the statement "lamborghinis are polarizing" is false because most of us don't have one? The author explicitly says "in the data space" too, effectively restricting the people he's talking about.


that's a little bit of a stretch. when you say "no shortage" - outside of redpanda what product exists that actually competes in all deployment modes?

it's a misconception that redpanda is simply a better kafka. the way to think about it is that it is a new storage engine, from scratch, that speaks the kafka protocol. similar to all of the pgsql companies in a different space, i.e.: big table pgsql support is not a better postgres, fundamentally different tech. you can read the src and design here: https://github.com/redpanda-data/redpanda. or an electric car is not the same as a combustion engine, but only similar in that they are cars that take you from point a to point b.


I've been impressed with Apache Pulsar, with its Kafka wire-compatibility layer enabled.


(WarpStream co-founder)

That's fair, that statement is probably significantly colored by my own personal network / work experience.


A lot of companies use a product that uses Kafka under the hood. I was running Graylog a few years ago for months before I knew Kafka lay under the hood.

They have a list: https://kafka.apache.org/powered-by


Ironically, I have a fundamental hatred for LinkedIn and its sluggishness. It’s one of the slowest websites I frequent. I have few connections all things considered. Putting my feed together cannot be rocket science (its contents stay quite static for sometimes weeks at a time).


I think many that use it might want to use something like ActiveMQ instead. I think such middleware can also be interesting for smaller applications. Especially if they manage messages and data streams between two different companies that like to communicate directly but have arcane software components on each side that fit together like fire and water.


To me a technology company is not just a company that uses tech (every company does that) but one whose core value proposition is fundamentally technical. And I think most serious companies doing that have a need for highly available data storage, for which Kafka is the least bad option.

What are the alternatives? Cassandra is just as operationally complex and harder to fit your dataflow into. The various efforts to build proper master-master HA on MySQL or PostgreSQL or similar tend to be flaky, expensive, and vendor-locked-in. BigTable can work if you're all-in on Google Cloud, but that's quite a risk.

As far as I can tell there are mostly companies that use Kafka and companies that have a SPOF PostgreSQL/MySQL database (with some read replicas, and maybe some untested perl scripts that are supposed to be able to promote a replica to master) and stick their fingers in their ears.


> As far as I can tell there are mostly companies that use Kafka and companies that have a SPOF PostgreSQL/MySQL database

I haven't seen that at all, across the many companies I've worked at, consulted with, and talked with others about.

Kafka is usually an ancillary system added to companies with a strong culture around one or more pre-existing datastores (from PG/MySQL to Dynamo/Cassandra to Mongo/Elastic). When Kafka's actually needed, it handles things those pre-existing stores can't do efficiently at high volumes.

Are you really seeing companies use Kafka for their main persistence layer? As in, like, KQL or the equivalent for all/most business operations?

Even the CQRS/ES zealots are still consuming from Kafka topics into (usually relational) databases for reads.


> Are you really seeing companies use Kafka for their main persistence layer?

I'm seeing kafka-streams-style event processing as the primary data layer used by most business operations, although only in the last couple of years.

> As in, like, KQL or the equivalent for all/most business operations?

> Even the CQRS/ES zealots are still consuming from Kafka topics into (usually relational) databases for reads.

Yeah, I'm not seeing KQL, and I'm still seeing relational databases used for a lot of secondary views and indices. But the SQL database is populated from the Kafka, not vice versa, and can be wiped and regenerated if needed, and at least in theory it can't be used for live processing (so an SQL outage would take down the management UI and mean customers couldn't change their settings, it would be a big deal and need fixing quickly, but it wouldn't be an outage in the primary system).


I think if you dismiss HA setups of SQL dbs as "you won't get around to operating it properly", the same ops culture will also end up getting many fewer 9's of availability than aspired to with Kafka.

(But also of course lots of applications are also fine with the availability that you get from fate-sharing with a single db server)


> I think if you dismiss HA setups of SQL dbs as "you won't get around to operating it properly", the same ops culture will also end up getting many fewer 9's of availability than aspired to with Kafka.

Up to a point. IME Kafka is a lot easier to operate in true HA form than SQL dbs, and a lot more commonly operated that way; Kafka has a reputation for being harder to operate than a typical datastore, but that's usually comparing a HA Kafka setup with a single-node SQL db. And I don't know why, but many otherwise high-quality ops teams seem to have a bizarre blind spot around SQL dbs where they'll tolerate a much lower level of resilience/availability than they would for any other part of the stack.


We standardized on Clickhouse for everything. (With its own set of surprising and/or horrifying ops issues.) But at least it is a proper high-load, high-availability solution, unlike Kafka, Cassandra, et al.


> We standardized on Clickhouse for everything. (With its own set of surprising and/or horrifying ops issues.)

With Clickhouse I admittedly haven't personally seen quite as much operational unpleasantness as with Greenplum or Galera, but at this point I'm dubious of anything in that bucket.

> But at least it is a proper high-load, high-availablity solution, unlike Kafka, Cassandra, et al.

What went wrong with those for you? In my experience the setup stage is cumbersome, but once you've got them running they work well and do what you expect; most complaints you see come down to they're not relational/not SQL/not ACID (true, but IME more of an advantage than a disadvantage).


Not the parent, but I have some ClickHouse experience. ClickHouse is surprisingly easy to deploy and setup, talks both mysql and postgresql wire protocols (so you can query stuff with your existing relational tools), the query language is SQL (including joins with external data sources, such as S3 files, external relational databases and other clickhouse tables), and it is ACID on specific operations. It assumes your dataset is (mostly) append-only, and inserts work well when done in batch. It is also blazingly fast, and very compact when using the MergeTree family of storage engines.

Development is very active, and some features are experimental. One of the common mistakes is to use latest releases for production environments - you will certainly find odd bugs on specific usage scenarios. Stay away from the bleeding edge and you're fine. Clustering (table replication and sharding of queries) is also a sort-of can of worms by itself, and requires good knowledge of your workload and your data structure to understand all the tradeoffs. Thing is, when designing from scratch, you can often design in such a way where you don't need (clustered) table replication or sharding - again, this also has a learning curve, for both devs and devops.

You can easily spin it on a VM or on your laptop, load a dataset and see for yourself how powerful ClickHouse can be. Honestly, just the data compression alone is good enough to save a s**load of money on storage on an enterprise, compared to most solutions. Couple this with tiered storage - your hot data is eg. in ssd, your historical data is stored on s3, and rotation is done automatically, plus automated ingestion from kafka, and you have a data warehousing system at a fraction of the price of many common alternatives.


"Linkedin built Kafka for massive-scale problems that 99% of us don't have."

What a tool is built for is not the same as what it is good for is not the same as what it is used for... and just like people spend an inordinate amount of time worrying about what happens if they get rich, companies spend a lot of time future proofing for scenarios where they are hugely successful. If nothing else is true about the tech industry, it's certainly true that people misjudge tools and misapply them with alarming regularity, to the point where it is at least as likely that the tool being used is a bad fit for the problem as it is a good fit.


What do you use instead? Polling APIs? A queue instead of an event stream?

Event based architectures definitely add infrastructural overhead, but are positive at a certain scale and/or architectural complexity (multiple decoupled subscribers).


We use a table called "Messages" in our SQL Server database. Everyone talks to the same database. Turns out we don't really need to push extreme message rates or meet aggressive single-digit millisecond budgets, so this works out well in practice. It is also the easiest thing on earth to develop & debug, because you can monitor the table/log and instantly understand the state of the whole system and how it got there. Oh - it is also magically included in our DR strategy. No extra infra. We don't have to have a totally separate procedure to recover this thing.

We primarily use it as a backhaul between parts of our infrastructure in order to perform RPC. The approach is for the users of the broker (our services) to poll it at whatever rate is required. This is actually a little bit clever if you think about it - Users that don't really care about liveness can poll for their messages every minute or so. Users that are in the hot path of a web UI could poll every 50~100ms.

Polling sounds kinda shitty (at least to me) but I argue it's the best default engineering solution until proven otherwise (assuming it's not somehow harder than the other magic async event bubbling things). We don't have a lot of services doing this so contention isn't really a problem for us. Even if it did get to that point, I would reach for a read replica before I refactored how all of messaging worked. Most of polling is just a read operation that does nothing, so we can go horizontal on that part pretty easily.


For PG users: you can use NOTIFY to avoid polling.
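A sketch of that pattern with psycopg2 (the channel name is made up):

    import select
    import psycopg2

    conn = psycopg2.connect("dbname=app")
    conn.autocommit = True
    cur = conn.cursor()
    cur.execute("LISTEN new_messages;")

    while True:
        # wait up to 60s for the socket to become readable, then drain notifications
        if select.select([conn], [], [], 60) == ([], [], []):
            continue   # timeout; loop and wait again
        conn.poll()
        while conn.notifies:
            note = conn.notifies.pop(0)
            print(note.channel, note.payload)

On the write side, the producer just runs NOTIFY new_messages, 'some payload' (or pg_notify(...) inside a trigger), so readers wake up only when there's actually something to fetch.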


Ah yes, just starting with SQL is really the right choice for a lot, if not most, things. I work on things that are both too high scale and too organizationally complex for the simple solutions.


fyi, SQL Server has a message broker built-in.

https://learn.microsoft.com/en-us/sql/database-engine/config...


Not available in SQL Server Hyperscale, unfortunately. We think we can ride this one all the way.


How do you design to avoid message queues? Or do you use other alternatives to Kafka for these things?


Most of my designs include a message queue, and I would not use Kafka unless there was a strong need.

Right now my preference varies a bit depending on the rest of the tech stack, but for the most part I use Redis or RabbitMQ.

If the stack is already hard dependent on AWS or another cloud, then SQS or whatever is also fine.

I also wouldn't overlook just using your existing DB (like Postgres)! At low and even medium scale this can be totally fine, and it also comes with lots of benefits like single-source-of-truth, normal relational DB constraints, transaction wrapping, and more. One of the highest-scale apps I've worked on uses Postgres for queueing. It's taken a number of optimizations over the years as performance starts to fall due to scale, but it's doable.
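For the Postgres-as-queue route specifically, SELECT ... FOR UPDATE SKIP LOCKED is the usual trick so concurrent workers never grab the same row. A sketch with psycopg2 (the "jobs" table is illustrative):

    import psycopg2

    conn = psycopg2.connect("dbname=app")

    def claim_and_process_one():
        with conn:                        # one transaction per job
            with conn.cursor() as cur:
                cur.execute("""
                    SELECT id, payload FROM jobs
                    WHERE processed_at IS NULL
                    ORDER BY id
                    LIMIT 1
                    FOR UPDATE SKIP LOCKED
                """)
                row = cur.fetchone()
                if row is None:
                    return False          # queue is empty
                job_id, payload = row
                print(payload)            # do the actual work here
                cur.execute("UPDATE jobs SET processed_at = now() WHERE id = %s",
                            (job_id,))
                return True

If the worker dies mid-job the transaction rolls back and the row becomes claimable again, which covers a lot of the "queue" behavior people actually want.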


There was recently an article about distributed systems that showed up here. (Harry Doyle: Christ, I can't find it. To hell with it!)

And the author made a very interesting point about message queues. Simply, any problem that could be resolved by a message queue could be resolved by load balancing or persistence, and, therefore, message queues were actually kind of a bad idea.

There were two basic issues.

The first is that because of the nature of message queues, they're either empty or full. The second is that for many of the ways that queues are used, the unit putting the request on the queue in the first place may well be waiting for the response related to the request. So you've just turned an everyday synchronous request into a more complicated, out-of-band, multi-party asynchronous request.

If your queues are not empty, then they are filling up. And queue capacity is a twofold problem. One is that you simply run out of space. But, more likely, referring to the earlier point about waiting for a response, is that you run out of time: the response does not return fast enough to meet your response window to the unit making the request.

This is a load balancing problem. If the queue is filling you simply don't have the capacity to handle the current traffic. It's also a mechanically simpler thing to send out a request and wait for the response than to do the dance via a message queue.

The second part is that if you're throwing items onto a message queue, and you "don't care" about them, that is, it's a true "fire and forget" kind of request and direct response time is not a concern, then what does the queue gain you over simply posting it to a database table? If the request is Important, you certainly don't want to trust it to a message queue, a device not really designed for the storage of messages. Messages in a queue are kind of trapped in no man's land, where the easiest way to get to a message is to dig through the ones piled in front of it.

They're interesting insights and worth scratching your chin and going "Hmmm" over.


> It's also a mechanically simpler thing to send out a request and wait for the response than to do the dance via a message queue.

Yeah, this is simpler for the requester, but not for the counterpart that has to respond. Because now, the responder has to have 100% uptime and better not fail during the request, otherwise things get lost.

Let's take sending emails as an example. You have a server A that can send emails, you have a server B that fulfills requests/actions by a user. Let's just assume that this is the (legacy) setting we are dealing with.

Now, what do you do if you want to send the user an email on a certain action, e.g. a password reset or changing payment information, etc.? Is B then making a synchronous request to A? What if server A is currently down due to maintenance? What if A uses another entity to send the emails, which itself is down? How do you observe the current state of the system and e.g. detect that A is overloaded?

With a message queue you get all those things for free and A doesn't even have to have any persistence layer for "retries" in case of crashes etc.

While it's true that those issues can all be resolved by "by load balancing or persistence" it just means that you now traded one issue (having a message queue) for multiple issues (having database(s), having load balancer(s) and essentially re-implementing a part of a message queue).

In most cases a message queue seems like a good trade-off.


This is where the "or persistence" bit comes in, but the trade-off isn't as you describe. One way to approach it would be for B to set a flag against the user in the database they must already have (because they have users), so that A can process password resets on its own schedule. It's "add a column" versus "add a message queue". Databases already come with locking primitives and well understood patterns, no reinvention needed.


You are now assuming that they indeed have a database and store users in there, and don't use a 3rd-party system that might not even allow attaching metadata (or at least makes it difficult).


In which case why is the password reset my problem at all?

But even if I've handed off identity management entirely, I almost certainly do have some per-user state. Otherwise... what on earth am I doing with users in the first place?


Well, maybe because the 3rd party service is not reliable or you want to send an additional mail.

But okay, fair point - password-reset is maybe not the greatest example. But it doesn't invalidate the general point (I gave a couple of other examples).

> Otherwise... what on earth am I doing with users in the first place?

Maybe just making sure that the user is known and has paid for plan X (if the 3rd party service offers that).


Kafka is less a message queue (RabbitMQ, SQS) and more an ordered stream/write-ahead log (a la Kinesis)

> The first is that because of the nature of message queues, they're either empty or full.

... Wat?

> that is it's a true "fire and forget" kind of request and direct response time is not a concern, then what does the queue gain you over simply posting it to a database table?

Performance is why. The fire-and-forget aspect is like UDP in the sense that you don't need to ensure ordering of messages (packets) or hard persistence to the database. Also, dead letter queues exist for a reason.

Message queues are super useful. The highest-performance systems I've seen use the message queue + pool of workers paradigm, as it allows you to better smooth your load (unlike immediate republishing like in SNS, which requires hardware available to accept) with minimal guarantees (unlike a write-ahead log such as Kinesis). The buffer is also great because it allows you a bit more time to scale up both your message queue fleet and worker fleet when you get a load spike.


> > The first is that because of the nature of message queues, they're either empty or full.

> ... Wat?

I interpreted it as "they are either trending towards empty or full". The statement doesn't seem well thought through.

That might be true (either empty or full) most of the time (maybe) _if you squint_, but the entire point of the Message Queue is to provide buffering from the transient state (somewhere between empty and full) trending toward the empty state.


Yea, I feel like these people never got a cron daily data dump (times N customers). Scaling is dead simple, and there isn't a need (or ability) to instantaneously handle large bursty workloads in like 99% of cases.

Longer SLAs mean it's also easier to hit those SLAs. Giving the clients realistic SLAs is super important.


> If the request is Important, you certainly don't want to trust it to a message queue, a device not really designed for the storage of messages.

That's false. Virtually all MQ systems are designed to persist (often with replication/redundancy) and store data. Most MQs also support non-persistent delivery, with the cost/benefit (ephemerality/performance) that entails, but that doesn't mean that durable storage is any less well-supported.

Sure, folks have plenty of operational war stories regarding failures of persistence in their MQ broker/cluster/whatnot. Same as the DBAs who manage relational databases.




If I'm thinking of the same article: was it about going from message queues to state machines for more operational robustness?


They may still use some sort of event/message system. Kafka is lower level than other sorts of message queue systems and requires more work to get correct (dedupe, ordering, retry logic), but has great performance. It's often easier to choose a different messaging system though.


Kafka ain't a MQ, it's a distributed log.

Pulsar is a Kafkaesque system that can act as a distributed log _or_ an MQ.


RabbitMQ, ActiveMQ, MQTT

All of those are fine if you only need pub/sub


Just to clarify, MQTT is a pub/sub protocol, while ActiveMQ and RabbitMQ are message broker implementations. As an example, ActiveMQ implements MQTT as one of its optional protocols. Though if anyone's looking for a simple MQTT implementation I'd recommend Mosquitto. My 2 cents as someone who's worked with ActiveMQ pub/sub for quite a few years.


How well does RabbitMQ do in terms of availability nowadays? Because one thing a message queue should offer is high availability - otherwise it loses one of its most compelling benefits.


Rabbit's quorum queues are an improvement on the extremely poor HA/clustering system they provided previously. Users can now choose between the two.

Rabbit's defaults are still unfortunate, in my opinion: queues and messages are not disk-persisted by default, though this can easily be enabled. As a result, many folks run and benchmark a "high availability rabbit" only to discover that they're benchmarking distributed state stored in memory, not disk.

https://www.rabbitmq.com/quorum-queues.html
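A sketch of what opting in looks like with pika (the queue name is made up):

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()

    channel.queue_declare(
        queue="tasks",
        durable=True,                           # queue definition survives restarts
        arguments={"x-queue-type": "quorum"},   # Raft-replicated quorum queue
    )
    channel.basic_publish(
        exchange="",
        routing_key="tasks",
        body=b"hello",
        properties=pika.BasicProperties(delivery_mode=2),  # persist the message too
    )
    connection.close()

Leave those settings out and you're back to the in-memory defaults described above.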


The one thing that made me pull back from RabbitMQ years ago was that using it between datacentres was a bad plan, because all the clustering was based on Erlang's underlying cluster implementation and the advice on that was not to use it between geographically distinct locations. I don't know if it's since improved or if that advice no longer holds, but working under an environment where we needed cross-DC redundancy made it impossible to select, for that reason.


Those are also message queues. They even use MQ in the name.


Couple questions:

1. aren't you going to get murderous S3 API call bills if you're pushing each message directly into S3? How're you buffering / queuing / coalescing messages durably without local storage?

2. what's the problem with "just" running a kafka cluster in each AZ and not replicating data between AZs until it's time to ETL the data to wherever? AZ1 clients push to AZ1 clusters; AZ2 clients push to AZ2 clusters, etc.

3. What's done to preserve order-of-operation within a kafka partition?


[WarpStream co-founder and CTO here]

1. Each WarpStream Agent flushes a file to S3 with all the data for every topic-partition it has received requests for in the last ~100ms or so. This means the S3 PUT operation cost scales with the number of Agents you run and the flushing interval, not the number of topic-partitions (rough math sketched after point 3 below). We do not acknowledge Produce requests until data has been durably persisted in S3 and our cloud control plane.

2. We think people shouldn't have to choose between reliability and costs. WarpStream gives you the reliability and availability of running in three AZs but with the cost of one.

3. We have a custom metadata database running in our cloud control plane which handles ordering.
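To put rough numbers on point 1 (my own back-of-envelope, assuming S3 standard list pricing of about $0.005 per 1,000 PUTs, not an official WarpStream figure):

    agents = 10
    flushes_per_sec = 10                    # one flush per agent per ~100ms
    puts_per_day = agents * flushes_per_sec * 86_400
    cost_per_day = puts_per_day / 1_000 * 0.005
    print(puts_per_day, round(cost_per_day, 2))
    # 8640000 PUTs/day, ~$43/day -- independent of how many partitions exist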


It sounds like there's a sweet spot here. If you are not ACKing Produce requests for 100ms then there's a huge amount of latency. If the user wants to reduce that latency from 100ms to say 1ms then their S3 GET request costs just went up by 100x.


[WarpStream co-founder here]

We've done lots of customer research here and, combined with the experience my co-founder and I have, we can confidently say most Kafka users (especially high-throughput users) would happily make a trade off of increased end-to-end latency in exchange for a massive cost reduction and the operational simplicity provided by WarpStream.


Replying here instead of below because we hit depth limit. WarpStream definitely isn’t magical, it makes a very real trade off around latency.

On the read side, the architecture is such that you’ll have to pay for 1 GET request for every 4 MiB of data produced for each availability zone you run in. If you do the math on this, it is much cheaper than manually replicating data across zones and paying for interzone networking.
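Doing that math with assumed list prices (roughly $0.0004 per 1,000 GETs and $0.01/GB in each direction for cross-AZ transfer; back-of-envelope, not official numbers):

    gets_per_gib = 1024 / 4                       # 4 MiB per GET -> 256 GETs per GiB
    get_cost = gets_per_gib / 1_000 * 0.0004      # ~$0.0001 per GiB read, per AZ
    cross_az = 1.074 * (0.01 + 0.01)              # ~$0.021 to ship 1 GiB across AZs
    print(round(get_cost, 5), round(cross_az, 4)) # roughly a 200x difference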

RE:deletes. Deleting files in S3 is free, it can just be a bit annoying to do but the WarpStream agents manage that automatically. It’s creating files that is expensive, but the WarpStream storage engine is designed to minimize this.

I will do a future blog post on how we keep S3 GET costs minimal, it’s difficult to explain in a HN comment on mobile. Feel free to shoot us an email at founders@warpstreamlabs.com or join our slack if you care for a more in depth explanation later!


Very interesting trade-off! I was curious what you and Ryan were cooking post DDOG. "cost-effective serverless kafka" is a very interesting play. And congrats on the public announcement for "shipping Husky", finally. --Marc


It could be easy to operate when everything is fine, but what about incidents? If I understand correctly, there is a metadata database (BTW, is it multi-AZ as well?). But what if there is a data loss incident and some metadata is lost? Is it possible to recover from S3? If this is possible, then I guess it can't be very simple and should require a lot of time, because S3 is not that easy to scan to find all the artefacts needed for recovery.

Also, this metadata database looks like a bottleneck. All writes and reads have to go through it, so it could be a point of failure. It's probably distributed, and in that case it has its own complex failure modes and it has to be operated somehow.

Also, putting things from different partitions into one object is something I'm not very keen on. You're introducing a lot of read amplification and S3 bills for egress. So if the object/file has data from 10 partitions and I only need 1, I'm paying for 10x more egress than I need to. The doc mentions fanout reads from multiple agents to satisfy a fetch request. I guess this is the price to pay for this. This also affects the metadata database. If every object stores data from one partition, the metadata can be easily partitioned. But if the object can have data from many partitions, it's probably difficult to partition. One reason why Kafka/Redpanda/Pulsar scale very well is that the data and metadata can be easily partitioned and these systems do not have to handle as much metadata as I think WarpStream has to.


[WarpStream CTO here]

I'm not going to respond to your comment directly (we've already solved all the problems you've mentioned), but I thought I should mention for the sake of the other readers of this thread that you work for Redpanda which is a competitor of ours and didn't disclose that fact. Not a great look.

https://github.com/Lazin


I'm not asking anything on behalf of any company and just genuinely curious (and I don't think that we're competitors, both systems are designed for totally different niches). I'm working on tiered-storage implementation btw. Looks like the approach here is the total opposite of what everyone else is doing. I see some advantages but also disadvantages to this. Hence the question.


I don’t want to disagree with the research here, but what is not evident from the article is that this is not a magical solution that improves upon Kafka hands down, but rather a solution that addresses trade offs someone might be willing to entertain. I think on the query side things may be quite suboptimal in this setup if I understand it correctly. Correct me if I am wrong but if two agents write on a single topic, I would need to read two files to consume. Also I remember infamous stories about the cost of deleting data from S3, how do you tackle that if you have that many individual files? With these trade offs how does the solution compare to using Aurora?


Is it possible to have a 'knob' here? some topics might need low latency even if most don't. My sense, reading this, is that while most topics / use cases will be fine on Warpstream, that some will not be.


Yes kafka is definitely in an awkward latency spot.


Won't that be a problem for high-traffic topics? Kafka latency is usually in single-digit milliseconds. For a topic with high throughput, a typical Java client instance can send thousands of messages per second. If the acknowledgement latency increases to 1000ms, the producer client would need multiple threads to handle the blocking calls. Either the producer will have to scale to multiple instances, or it risks crashing with out-of-memory errors.


(WarpStream cofounder)

Yeah, you have to produce in parallel and use batching, but it works well in practice. We've tested it up to 1 GiB/s of throughput without issue.
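
To sketch what "produce in parallel and use batching" looks like in practice, here is a toy example with a hypothetical Producer interface (not a real client API): keeping several batches in flight hides the high per-request latency.

    package produce

    import (
        "context"
        "sync"
    )

    // Producer is a hypothetical stand-in for an async-capable Kafka client;
    // the interface and method name are illustrative only.
    type Producer interface {
        // ProduceBatch blocks until the batch is acknowledged.
        ProduceBatch(ctx context.Context, records [][]byte) error
    }

    // SendAll keeps up to `inflight` batches outstanding at once. With ~400ms
    // ACK latency, 4 MiB batches and 8 in flight, that's roughly 80 MiB/s from
    // a single producer despite the high per-request latency. The parallelism
    // here is across partitions; within one partition you'd still serialize
    // sends if strict ordering matters.
    func SendAll(ctx context.Context, p Producer, batches <-chan [][]byte, inflight int) error {
        var (
            wg       sync.WaitGroup
            sem      = make(chan struct{}, inflight)
            mu       sync.Mutex
            firstErr error
        )
        for batch := range batches {
            sem <- struct{}{} // blocks once `inflight` batches are un-ACKed
            wg.Add(1)
            go func(b [][]byte) {
                defer wg.Done()
                defer func() { <-sem }()
                if err := p.ProduceBatch(ctx, b); err != nil {
                    mu.Lock()
                    if firstErr == nil {
                        firstErr = err
                    }
                    mu.Unlock()
                }
            }(batch)
        }
        wg.Wait()
        return firstErr
    }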


Does WarpStream guarantee correct order inside partition only for acknowledged messages or also among the acknowledged messages (in different batches)? If so how do you keep clocks synchronized between the agents?


(WarpStream founder)

It guarantees correct ordering inside a partition for all acknowledged messages, regardless of which batch they originated from. We don't synchronize clocks; the agents call out to our cloud control plane, which runs a per-cluster metadata store that assigns offsets to messages at commit time. Effectively, "committing" data involves three steps:

1. Write a file to S3.

2. "Commit" that file to the metadata store, which then assigns the offsets at commit time.

3. Return the assigned offsets to the client.
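
A rough sketch of that flow, with hypothetical interfaces standing in for S3 and the metadata store (not WarpStream's actual code):

    package warpsketch

    import "context"

    // ObjectStore and MetadataStore are hypothetical interfaces standing in
    // for S3 and the cloud metadata store described above.
    type ObjectStore interface {
        Put(ctx context.Context, key string, data []byte) error
    }

    type MetadataStore interface {
        // Commit registers the file and returns the offsets assigned to each
        // record at commit time.
        Commit(ctx context.Context, fileKey string, recordCount int) ([]int64, error)
    }

    func produceBatch(ctx context.Context, store ObjectStore, meta MetadataStore, fileKey string, records [][]byte) ([]int64, error) {
        var data []byte
        for _, r := range records {
            data = append(data, r...)
        }
        // 1. Write the file to object storage.
        if err := store.Put(ctx, fileKey, data); err != nil {
            return nil, err
        }
        // 2. Commit the file to the metadata store; offsets are assigned here,
        //    so ordering is decided centrally, not by clocks on the agents.
        offsets, err := meta.Commit(ctx, fileKey, len(records))
        if err != nil {
            return nil, err
        }
        // 3. Return the assigned offsets to the client in the produce ACK.
        return offsets, nil
    }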


Then the order is on batch level? Say batch1 is committed with a smaller timestamp than batch2, then all messages in batch1 are considered prior/earlier than any message in batch2?


It's getting late and I'm not 100% sure I understand what you're asking, but I believe the answer to your question is yes. When you produce a message/batch you get back offsets for each message you produced. If you produce again after receiving that acknowledgement, the next set of offsets you receive are guaranteed to be larger than any of the offsets you received from your previous request.


> We have a custom metadata database running in our cloud control plane which handles ordering.

This is the secret sauce, right there. Anybody can host a bunch of topics and artefacts on S3, and have them locally replicated if they want. However, there is no world where you can have that without a synchronisation service that ensures cursors are unique and properly ordered.


Flushing every 100ms means you would end up with lots of tiny files (bytes in size) in S3, unless you have something out of process rewriting them into larger blobs, similar to Delta Lake's OPTIMIZE?

Lots of tiny files would be really inefficient from a throughput and API-call perspective in blob storage.

With the acks, you have up to 100ms waiting for the buffer to fill, plus the S3 PUT request, plus your metadata request/response. For high throughput, doesn't that mean very high latency, putting backpressure on partitions?


Behind the scenes, if they're sinking to S3 using Iceberg, it handles compaction via its maintenance API.


This is continuing the trend of cloud pricing driving system designs more than the underlying hardware. AWS overcharges for inter-AZ traffic between EC2 instances, but undercharges for inter-AZ traffic between EC2 instances and S3.

It makes perfect sense to design this way, and as your blog post mentions, people have made similar realizations for columnar databases, and map-reduce frameworks.


Related to 1: if I understood correctly, the agent generates a single object per flush interval containing all the data it has received across all topics. Does this mean that when reading, a consumer needs to fetch data from multiple partitions simultaneously just to access a single partition? And regarding scaling consumers horizontally, how does the WarpStream agent handle horizontal partitioning of the stream on the consuming side?


[WarpStream co-founder here]

That is correct about flushing. RE: consuming, the TL;DR is that the agents in an availability zone cluster with each other to form a distributed file cache, such that no matter how many consumers you attach to a topic, you will almost never pay for more than 1 GET request per 4MiB of data, per zone. Basically, when a consumer fetches a block of data for a single partition, that triggers an "over-read" of up to 4MiB of data that is then cached for subsequent requests. This cache is "smart" and will deduplicate all concurrent requests for the same 4MiB blocks across all agents within an AZ.

It's a bit difficult to explain succinctly in an HN comment, but hopefully that helps.
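
For readers who want a mental model: the per-block request deduplication described above is similar in spirit to Go's singleflight pattern. A toy sketch (illustrative only, not WarpStream's implementation):

    package blockcache

    import (
        "context"
        "fmt"
        "sync"

        "golang.org/x/sync/singleflight"
    )

    // fetchFunc stands in for a ranged S3 GET of one 4 MiB block.
    type fetchFunc func(ctx context.Context, object string, block int) ([]byte, error)

    // BlockCache collapses concurrent requests for the same block into a
    // single upstream GET and caches the result.
    type BlockCache struct {
        mu    sync.Mutex
        data  map[string][]byte // a real implementation would bound/evict this
        group singleflight.Group
        fetch fetchFunc
    }

    func New(fetch fetchFunc) *BlockCache {
        return &BlockCache{data: make(map[string][]byte), fetch: fetch}
    }

    func (c *BlockCache) Get(ctx context.Context, object string, block int) ([]byte, error) {
        key := fmt.Sprintf("%s#%d", object, block)

        c.mu.Lock()
        if b, ok := c.data[key]; ok {
            c.mu.Unlock()
            return b, nil
        }
        c.mu.Unlock()

        // Only one of N concurrent callers actually issues the upstream GET.
        v, err, _ := c.group.Do(key, func() (interface{}, error) {
            b, err := c.fetch(ctx, object, block)
            if err != nil {
                return nil, err
            }
            c.mu.Lock()
            c.data[key] = b
            c.mu.Unlock()
            return b, nil
        })
        if err != nil {
            return nil, err
        }
        return v.([]byte), nil
    }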


Is there a reason you built that cache layer yourself (rather than each node "just" running its own sidecar MinIO instance, that write-throughs to the origin object store?)


(WarpStream co-founder)

The cache is for reads, not writes. There is no cache for writes.

We built our own because it needed to behave in a very specific way to meet our cost/latency goals. Running a MinIO sidecar instance means that every agent would effectively have to download every file in its entirety, which would not scale well and would be expensive. We also have a pretty hard-and-fast rule about keeping WarpStream deployment as simple as rolling out a single stateless binary.


Does Azure’s Append Blob support (and AWS’ lack thereof) provide any inherent performance advantages for Azure vs. AWS?


The way Kafka works, it naturally buffers & coalesces messages even before they get to the brokers, so yes, of course the messages are being coalesced.

There is no problem with "just" running a Kafka cluster in each AZ and only replicating data between AZs until it's time to pull it all together. It's just that when presented with a distributed system and AZs, engineers (and in fairness the business requirements) are more than likely to go with a multi-AZ solution. Same goes for regions. So the vast majority of Kafka clusters are multi-AZ but probably shouldn't be, and Kafka gets the bill for that, even though it shouldn't.

The Kafka protocol doesn't really preserve order-of-operation within a Kafka partition. It preserves the order of operations within a producer-partition pair (and even then, only if you configure it a certain way). The standard implementation does this by preserving the order-of-broker-receipt-of-messages from producers, but from an external system's vantage point, it really only means that (if configured the right way) messages with any given key, from any given producer, will be preserved in the order they are received.


The architect mentioned a 100ms (or 10 times per second) buffer/flush rate; presumably there are some windowing settings so I can flush at 10,000 records or after 1/10th of a second, and can choose 10 records or 1/100th of a second if I don't mind the bill.

Kafka's nice because there are a lot of knobs to adjust how you prioritize availability or durability or latency. Kafka's tedious because there are a lot of knobs... I was curious as to the nature of the knobs on this product.

I probably should have also asked "and by S3 do you mean S3 or any S3 like object store?" probably that's answered elsewhere.


[WarpStream CTO here]

WarpStream flushes after 4MiB of data or a configurable amount of time. Flushes can also happen concurrently.

In general, we'd prefer to not introduce many knobs. We're running a realistic throughput testing workload in our staging and production environments 24/7, so we've configured most of the knobs already to reasonable defaults.

We just added support for other S3-compatible storage systems today: https://docs.warpstream.com/warpstream/reference/use-the-age...


The impression I have is that they have deliberately removed a lot of the knobs, so I'd speculate that you can't tweak the buffer/flush rate. However, that's just my speculation.


I'm also especially interested in 3) - from the arch overview it sounds like all agents are actively writing and actively compacting, how do they coordinate which topic-partitions to compact? Is the Cloud Metadata Store essentially responsible for handing out the offsets?


(WarpStream founder here) Yes exactly. The cloud metadata store assigns offsets, does compaction planning/scheduling, handles service discovery, etc


I see - this also explains the tiny consumer group / topic-partition limit of the free plan, then...


[WarpStream co-founder here]

What do you think would be a good limit for the free plan?

This isn't actually an architectural constraint for us. We just didn't want to promise unlimited usage forever so we picked a somewhat arbitrary number to start with.


I'm probably the wrong person to ask, sorry! My interest is purely academic, both for data governance and technical reasons I'll probably never be able to use this. I also don't mean to imply they're unfair, free is free - just that it explains why things which are pretty cheap in Kafka proper trigger billing (vs. of course things that are expensive in Kafka now being really cheap).

I do think a lot of people using Kafka compaction topics as their source-of-truth are defaulting to 256-~1000 partitions, if you want to entice them I'd probably offer at least 1024. (But I suspect your target market is not using these - rather they're salivating over the possibility of the automatic format transcoding you mentioned in another comment...)


Re #1 I don’t think Amazon charges for data transfers from within AWS, assuming Kafka is hosted there.


There are per-API-call charges. It's just not per-byte.


(WarpStream founder) Exactly, and we designed WarpStream's storage system to minimize the number of object storage API calls it has to make while still maintaining relatively low latency (P99 of ~1s end-to-end)


S3 requests are paid regardless of source (TL;DR: GETs are cheaper than PUTs). You're thinking of data transfer within a VPC.


I'm Ryan Worl, co-founder and CTO of WarpStream. We're super excited to announce our Developer Preview of our Kafka protocol compatible streaming system built directly on top of S3 with no stateful disks/nodes to run, no rebalancing data, no ZooKeeper, and 5-10x cheaper because of no cross-AZ bandwidth charges.

If you have any questions about WarpStream, my co-founder (richieartoul) and I will be here to answer them.


Congrats! "The SQLite of Kafka" is an item from my side projects pile I'm happy to delete.

One reason I never built it is because it felt paradoxical that users might want a scaled down Kafka rather than using SQLite directly if the scale didn't matter. But you may find out that people enjoy the semantics of the Kafka protocol or are already using Kafka and have learned they don't have the scale they thought they did to warrant the complexity. Best of luck!


> it felt paradoxical that users might want a scaled down Kafka rather than using SQLite directly if the scale didn't matter.

I don't need to push very many messages (not enough to justify running Kafka), but each of the messages that I do push are both 1. very important and must be cross-AZ durable, and 2. very urgent and must not be blocked by e.g. contended writes in a regular RDBMS.

Currently, the winner of this use-case for IaaS customers is "whatever cloud-native message-queue service your IaaS offers." (And those customers would also be the extent of WarpStream's Total Addressable Market here, given that WarpStream's architecture fundamentally relies on having a highly-replicated managed object store available.)

I'm therefore curious: in what ways does WarpStream win vs. Amazon SQS / Google Cloud Pub/Sub / Azure Queue Storage?


I can't speak to GCP or Azure but the semantics of a log offer replayability whereas SQS does not.
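
For example, with stock Kafka tooling you can rewind a consumer group to a point in time and reprocess history, which has no SQS equivalent:

    kafka-consumer-groups.sh --bootstrap-server broker:9092 \
      --group my-group --topic my-topic \
      --reset-offsets --to-datetime 2023-08-01T00:00:00.000 --execute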


Does it support S3-compatible services, notably Cloudflare R2? I heard that special handling might be needed for each S3-compatible provider, due to slightly different API behavior, different consistency models, etc.

If it supports Cloudflare R2, it would be great for multi-cloud too.


The blog post mentions that partitions are too low-level an abstraction to program against. Does that mean WarpStream doesn't use partitions?

Do you provide any ordering guarantees like Kafka does at the partition level?


(WarpStream founder) No, WarpStream has partitions internally and provides the same ordering guarantees Kafka does at the partition level. We're just saying that we think for most streaming applications this is not a great programming model, and we think there is an opportunity to do something better (but we haven't done that yet).


Do you still provide low-level control over partition subscriptions and offset management? Any plan to support Kafka transactions?

That's all required to build exactly-once systems on top of Kafka (like the stateful stream processing engine I work on) even if it's not the easiest interface for normal application-level development.


[WarpStream CTO here]

WarpStream is Kafka protocol compatible, so we do support topic-partitions and consumer groups. We do not expose support for transactions or idempotent producing today, but the internals of the system support that and we will probably work on the idempotent producer sometime in the next month, with transactions coming shortly after, depending on demand from the Developer Preview users.


1. Don't producers now have much higher latency, since they have to wait for writes to S3?

2. If the "5-10x cheaper" is mostly due to cross-AZ savings, isn't that offered by AWS's MSK offering too?


(WarpStream founder)

1. Yeah, we mention at the end of the post that the P99 produce latency is ~400ms.

2. MSK still charges you for networking to produce into the cluster and consume out of it if follower fetch is not properly configured. Also, you still have to more or less manage a Kafka cluster (hot spotting, partition rebalancing, etc). In practice we think WarpStream will be much cheaper to use than MSK for almost all use cases, and significantly easier to manage.
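
(For reference, "properly configured" follower fetch means KIP-392 rack-aware fetching, roughly along these lines:)

    # broker
    broker.rack=us-east-1a
    replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
    # consumer
    client.rack=us-east-1a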


How does the cost compare if follower fetch is properly configured?


1. what payload size and flush interval is that latency measured against?


1. By payload size do you mean record size? They're ~1KiB.

2. Flush interval was 100ms; the agent defaults to 250ms though, I believe.


How do you replace ZooKeeper?


Kafka replaced ZooKeeper with Kafka itself already a few years ago https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A...


And first announced to be "production ready" in October 2022.


(WarpStream founder) WarpStream has a completely different architecture than Kafka: https://docs.warpstream.com/warpstream/background-informatio...

That said it does require a lot of metadata to orchestrate all the different concurrent operations over S3. We handle this with a custom metadata store that we run in our cloud control plane.


What was used to generate the diagrams in the post?


Do you have a reference documentation for S3 data layout?


(WarpStream founder here) Not currently. One of the things we're looking to do next is make it so any topic can be "automatically" turned into a standard format in S3, something like Parquet/Iceberg/Delta Lake, so it's easier to consume for applications that don't particularly care about the Kafka protocol.


That would be a killer feature. Once we switched to MSK back at a previous gig, our biggest care and feeding task was adding new event types to our kafka-to-redshift ETL thing. Well, that and dealing with scaling our http wrapper...


That would be awesome!


Kafka itself no longer requires zookeeper.


Well, one thing's for sure. Running Kafka on discrete VMs on a cloud provider "by the book" is ludicrously expensive.

I remember having a very simple discussion with quite a few customers about both Kafka and Hadoop that boiled down to this: Why replicate data at the VM/disk level when those disks are already provided as a fully redundant system? (In this case it was Azure storage, which provides locally redundant, AZ-redundant, or globally redundant storage, most of which are available to run managed disks upon.)

This is why properly designed Hadoop/Kafka cloud managed services employ storage adapters to leverage the provider's baked in redundancy. And why some cloud providers have Kafka-compatible event brokers.

The rest of what WarpStream does is just icing on the cake (although I'm curious as to the internals and how they avoid inter-AZ charges).

(full disclosure: I work at Microsoft, but built Hadoop/Spark/Kafka clusters before joining nearly a decade ago.)


In the good ol' days, we used to implement redundant systems such that there was a tie-breaker process that used only a fraction of the resources of the true processes. Some Raft implementations allow for nodes that are voters but cannot be quorum leaders (example: the branch office where all traffic is funneled through an asymmetric VPN tunnel should never be elected leader, but knows which candidates it can and cannot see).

So the table stakes for running a cluster was not 3x as much hardware but closer to 2.2x, which is a huge deal for solutions in the small and developer sandboxes. It also matters when 3 shards don't quite cover your load but 5 is too many. Or 6 vs 7.

The problem is that with geographic replication, this doesn't fix either of the problems articulated as part of the thesis of this article:

1. Cloud economics – by design, Kafka’s replication strategy will rack up massive inter AZ bandwidth costs.

2. Operational overhead – running your own Kafka cluster literally requires a dedicated team and sophisticated custom tooling.

Still, we need this functionality back for cloud, particularly as the pendulum swings back to self hosting, which it always has in the past.


Or, you know, use ephemeral data storage with your brokers like the good folks intended...


> Why replicate data at the VM/disk level when those disks are already provided as a fully redundant system?

That's easy. EBS and similar solutions come at a price: they're very expensive, especially when you need a lot of IOPS. You may be saving on cross-AZ traffic, but you will pay a ridiculous amount of money for storage. If you have replication, you can use attached storage, which is way cheaper.


Azure Managed Disks are inherently replicated. But there are other storage solutions (like redundant WebHDFS) you can take advantage of. At the time, the customers I was talking to wanted to have identical deployments in the cloud (to what they had on-premises, down to the number and size of disks).


> Why replicate data at the VM/disk level when those disks are already provided as a fully redundant system?

Azure disk replication is for the durability of the data, not the availability of the data from a Kafka perspective.


Many systems might just be using Kafka to drive async batch transaction processing (think: sending emails, charging credit cards), and therefore don't care at all about availability.


From a read perspective I would agree, but what about the write perspective if the partition is not available? (genuinely asking)


Think Unix pipelines: writes that can't immediately complete block the producer. (Probably with a bit of an in-memory buffer, but still.) Pile up enough un-ACKed messages to push, and the whole producer should stop consuming its own input end until the consumer on its output end comes back to start pumping messages again. The whole pipeline from the blocked stage back to the start receives backpressure and temporarily stalls out — which is fine, because, again, async batch processing.

And yes, this means that you need to have logic all the way back at the original sender (the one triggering the async message-send as part of some synchronous business-logic), to be able to refuse / abort / revert the entire high-level business-logic operation if the async-message-send's message-accept fails. (A user shouldn't be considered signed up if you can't remember to send them a verification email; a subscription should not be created if you can't remember to charge the card; etc.)

In essence, you can think of this as "semi-async": each stage is doing a synchronous RPC call to an "accept and buffer this batch of async messages" endpoint on the broker — which might synchronously fail (if the broker is unavailable, or if the consumer of a bounded-size(!) queue has blocked to create backpressure and therefore the queue has filled and the broker has in turn stopped accepting sends to that queue.)

With such an API, rather than pretending that there's some magic reliable-delivery system you can "fire and forget" messages onto, these failures get bubbled up to the caller on the send side, like any other failure of a synchronous RPC call.

Take this to its fullest extent, and you get Google's "effectively synchronous" RPC philosophy, where you have event brokers for routing and discoverability (think k8s Services), but async messages are always queued in either the sender process's [bounded] outbox or the recipient process's [bounded] inbox, with no need for a broker-side queue, because everything is designed with backpressure + graceful handling of potential accept failure in mind, including the initial clients knowing to retry pushing the initial message-send. (If you're familiar with the delivery semantics of Golang channels — it's basically that, but distributed rather than process-internal. There's a reason that particular language feature came out of a language designed at Google.)
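
A minimal single-process illustration of that bounded-buffer backpressure, using Go channels as suggested above:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        // A bounded "outbox": sends block once 8 messages are un-ACKed,
        // which is the backpressure described above.
        outbox := make(chan string, 8)
        done := make(chan struct{})

        // Slow consumer standing in for the downstream stage / broker accept.
        go func() {
            defer close(done)
            for msg := range outbox {
                time.Sleep(100 * time.Millisecond) // simulate slow processing
                fmt.Println("processed", msg)
            }
        }()

        for i := 0; i < 32; i++ {
            // Blocks whenever the buffer is full, stalling the producer
            // (and, transitively, whatever feeds it) until the consumer
            // catches up.
            outbox <- fmt.Sprintf("msg-%d", i)
        }
        close(outbox)
        <-done // wait for the consumer to drain
    }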

---

Mind you, there's also the "truly async" batch-processing semantics — the kind ATMs have, where if even the initial client doing a synchronous operation (think: withdrawing cash) can't get in contact with the server/broker to push the async message-sends, then you just append the message to a big ol' local log file, and proceed as if the async sends already succeeded; and then later, when you come back online, you dump your whole built-up log of messages to the broker, and all events in the log are inherently accepted — but there are higher-level semantics that might generate additional revert events in response to some of them (i.e. if the ATM user overdrew their account), that get backfed into the system. But you, as the initial producer of messages, don't have to worry about collating those against your messages or anything like that.


Managed disks reside on redundant and high availability storage.


Dear richieartoul, your blog post is a tad oversalted.

Kafka doesn't inherently require dedicated teams of experts and millions of dollars until you're running very large clusters.

But fully agree that 3-AZ stretch clusters suck money through those inter-AZ xfer fees. Which is how AWS sells MSK, the inter-AZ xfer is "free"! That is, already priced in...


This looks great but reading the accidental SRE gave me two questions:

We've had bare metal for a very long time now, and it seems that managing your own bare metal hasn't become much easier. If it were very easy, we'd see more of these sorts of things managed by the end user. That being said, how are you managing this service? On a cloud provider or bare metal?

Both you and Ryan have a lot of experience with FoundationDB, which you generally manage yourself. Speaking of which, did you go with that for your metadata store again? Why or why not?


(WarpStream co-founder)

WarpStream's current offering is a hybrid BYOC approach. The customer runs the agents in their cloud account, and we manage the metadata store for them remotely. This keeps all the customer data in their cloud account and their S3 bucket where we can't see it or touch it. It does mean that the customer has to run the WarpStream agents themselves, but they're just stateless containers that are pretty easy to manage.

We considered using FoundationDB for our metadata store, but ended up not using it in the end. In order to make our free tier cost effective we really need to make our metadata store as efficient as possible for this specific use-case, which required something a bit more custom. That said, FoundationDB is a fantastic piece of technology. Best distributed database I've ever used, and I've used many :)


> how many partitions do I need? Unclear, but better get it right because you can never change it!

That is.. simply wrong. You can change the number of partitions.

Plus I really don't get the "you need an entire team of engineers to operate Kafka" claim you keep repeating. That is simply not true, speaking from experience. It is certainly expensive to run, but doesn't require a lot of engineering hours in our team.
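
For the record, increasing the partition count is a one-liner with the stock tooling (you can only ever increase it, and existing keyed messages will hash to different partitions afterwards):

    kafka-topics.sh --bootstrap-server broker:9092 \
      --alter --topic my-topic --partitions 24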


This is super interesting @richieartoul. I specced out something similar myself and was going to implement it in Zig https://github.com/fremantle-industries/transit.

FWIW I came to a similar conclusion that a lot of the power in Kafka comes from the API, and that much of the complexity of managing the cluster will eventually be abstracted away with multiple implementations. I also felt that if I could implement Kafka persistence over the S3 keyspace, then I could start with persistence direct to S3, like you've done with WarpStream, and then layer on a faster hot-disk and in-memory tiering mechanism to eventually lower end-to-end latencies.

I love where you're going with this so hit me up on twitter if you ever want to chat more in-depth https://twitter.com/rupurt.


DMd you on twitter


We've built something probably very similar to this product at my previous job. We had double-digit-TB daily ML traffic that didn't require realtime latency, so we moved it all onto S3 and also saw roughly 90% cost savings. This was built on top of the JVM and still used a 6-broker Kafka cluster to keep metadata (vs. probably 300 brokers when it was originally all on Kafka). Kafka's compute and storage model doesn't scale too well for the extreme use cases that can tolerate latency, and the Apache Pulsar model sort of worked better (though at that time Pulsar wasn't stable enough for us to use in prod). One of the keys to cost efficiency for us was that the size of the data was large enough that we didn't need to wait too long before hitting an economic file size to upload. I'm trying to imagine how a pipeline with less than 10 MB/s would work with this efficiently.


(WarpStream co-founder)

Yeah we’ve run into a number of people who’ve rolled their own solution in this space. The “push pointers to S3 through traditional Kafka” approach is a very practical one.

Was this memq at Pinterest, or something else?


Yeah it is, sort of glad that it's actually known. I've since left the ingestion side of data infra, so I'm not very familiar with the landscape after a year away.


The title of the article should be "Kafka is dead. Long live Warpstream." The long live part refers to the successor.


(WarpStream CTO here) This is a bit subtle I will admit, but we view the Kafka protocol as a successor here because it will outlive Kafka the implementation.


You're right! I've only heard it being used as a contradictory phrase though. https://en.wikipedia.org/wiki/The_king_is_dead,_long_live_th...!


As per your link, it's only "seemingly" contradictory - that is, it emphasizes that as soon as the (previous) king dies, the (new) king is automatically immediately in office to be wished a long life, without needing to wait for any coronation ceremony. This is the rule in many monarchies past and present, though not all. So, imagine that the (old) king dies at the end of the first clause and therefore the (new) king takes office at the beginning of the second.


Exciting work! Some questions:

1. Any plans on open-sourcing this?

2. Why not have a tiered architecture that can provide lower latency? p99 of 1s can be too high for some use-cases.

3. Related to 2, how does WarpStream compare to tiered storage in Pulsar?

[Edit 1] Added (3)


Yeah really kind of surprised the free tier isn't the self hosted kind. That will keep me looking.


(WarpStream Cofounder)

FWIW we’re considering a version where you can host the metadata yourself for enterprise users. For the free tier though we didn’t think it made sense since for a workload that could fit into our free tier, it didn’t seem like anyone would want to be responsible for the metadata layer themselves. Would love your feedback on that.


I'm also quite surprised there's no option for self-hosting metadata on the free tier. To be fair, in my experience running a managed metadata server for Prefect (which has a similar bring-your-own-server model) is quite hard to get right. But in the end we decided to keep maintaining it because our company prefers having all the data on our own servers (including metadata).

But I'm still excited to try this, since at least now I can play around with (and learn from) a partially Kafka-compatible system, without the burden of maintaining all of Kafka's parts (and costs). Thanks!


It's refreshing to see cloud computing costs matter. Hopefully it'll cut down on needless cloud complexity and the proliferation of indirection in software abstractions of the last decade.


(WarpStream co-founder)

Cheers! We really want to drive the incremental cost (both in terms of $ and management overhead) per GiB down as low as possible so people can start using these systems for more use-cases.


I feel like people misuse the X is dead, long live X construction, but maybe I’m missing something.

I’ve always taken it to be a reference to the stability and near-programmatic nature of British royal succession. So while “the king is dead” in some places historically would be a disaster with conflict surely following, in Britain it automatically and instantly becomes “long live the king” for the next king and there is no panic.


Hi @richieartoul, does warpstream support the standard Kafka Admin API?

We build an admin console / dev tooling for Kafka (https://kpow.io) that supports Kafka 1.0+, including Redpanda due to their fairly strict adherence to those APIs.

Warpstream seems like a cool idea, I'd like to see what happens if we plug Kpow on top of it, if that's possible.


Our kafka protocol support is still incomplete. We’ve documented our progress here: https://docs.warpstream.com/warpstream/reference/kafka-proto...

That said we’d be stoked to get this working as an additional tool for people. Do you want to shoot me an email or join our slack so we can discuss further? I can probably prioritize whatever protocol features were missing to get it working.

founders@warpstreamlabs.com


I think the focus on the $/GB is wrong. The main goal of Kafka is providing certain processing guarantees and connecting consumers with producers of events.

What Kafka is not for is dealing with an enormous number of events or an enormous amount of data. Not that it is particularly slow at it (actually, it's quite fast), but I see a lot of people needlessly try to push more data and events than they need and then complain they have problems.

One easy trick to deal with enormous numbers of events and amounts of data is batching. Just batch some number of events into a single event so that a consumer can pick it up in one go. Removes a lot of cost of transferring the event.

Too much data to transfer? Don't be stupid and try to transfer those huge documents through the Kafka topic -- there is no need for this. Just upload them to S3 and pass a reference through. Or even better, take 10k events, extract the data, zip it up, push it to S3, and THEN pass a single event that describes the 10k large events.
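
That "pass a reference" idea is sometimes called the claim-check pattern. A minimal sketch with hypothetical interfaces for the object store and the topic:

    package claimcheck

    import (
        "context"
        "fmt"
    )

    // ObjectStore and Topic are hypothetical stand-ins for an S3 client and a
    // Kafka producer; the names here are illustrative only.
    type ObjectStore interface {
        Put(ctx context.Context, key string, data []byte) error
    }

    type Topic interface {
        Publish(ctx context.Context, event []byte) error
    }

    // PublishLarge uploads the heavy payload to object storage and publishes
    // only a small pointer event through the topic.
    func PublishLarge(ctx context.Context, store ObjectStore, topic Topic, id string, payload []byte) error {
        key := fmt.Sprintf("events/%s", id)
        if err := store.Put(ctx, key, payload); err != nil {
            return err
        }
        ref := []byte(fmt.Sprintf(`{"id":%q,"s3_key":%q,"bytes":%d}`, id, key, len(payload)))
        return topic.Publish(ctx, ref)
    }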


You've described the tricks WarpStream does in their implementation, buffering data until a large enough chunk can be written to S3.


I didn't know that. I've worked on some pretty large-scale projects, and solutions like this are my daily bread and butter to the point where they seem obvious to me, but I have no idea what is or isn't obvious to the general population of developers.


This comment from 4 months ago helped me understand when you might want to use kafka: https://news.ycombinator.com/item?id=35160555

(The comment is by rad_gruchalski, who I see is also active in this thread. Thanks, rad_gruchalski!)


I wonder, will there be any solution for local development? Or will we always depend on a closed control plane?


(WarpStream co-founder) We're thinking about making the `warpstream playground` command run an in-memory local version of control plane. Come chat with us in our Slack or Discord if you want to discuss more!


This article seems to be spreading some FUD, in particular this comment:

> Cloud economics – by design, Kafka’s replication strategy will rack up massive inter AZ bandwidth costs.

You're no more or less forced to put Kafka replicas in different AZs than you would be with an alternative.


Of course, as soon as an AZ goes down that had all of your Kafka nodes in it, you get eviscerated on HN for running such a setup.


3 AZ Kafka is the safest, but I've run several 2.5 clusters just fine. (cluster stretched across 2 AZs with a tiebreaker ZK in a 3rd AZ to maintain quorum in the event that you lose an AZ)


(WarpStream Founder) That's fair, but we think people shouldn't have to choose between multiple AZs and costs. All cloud providers recommend that critical workloads running in cloud environments should run in more than one AZ.


It's a neat idea for a SaaS to just host the control plane/metadata and allow customers to write to their own data store via S3. I can't think of too many other systems that work that way - it certainly is operationally simpler, but I wonder if it also gets you some compliance wins.

I wonder if we'll eventually standardize on a cloud-agnostic S3-like protocol but for locking and fast transactional writes so we can just have companies building pure software again but using standardized Cloud instead of POSIX APIs.


Hey Richie, As you seem to support the Kafka + Kinesis(?) API: We have developed Kadeck (kadeck.com), a control plane and collaboration layer for Kafka & Kinesis teams. I have looked at the documentation and reckon we should be supporting a lot of Warpstream already. Is there any quick way to test this out? Would love to share this with our users!

Whether dead or alive, there's no denying that Kafka, or data streaming in general, is one of the most exciting fields today. Happy to welcome a new member!


(WarpStream Cofounder)

https://docs.WarpStream.com is the best document we have right now. This sounds interesting though, can you jump in our slack or shoot me an email at founders@warpstreamlabs.com ? Happy to support you any way we can!


I’m so happy to see that someone has finally built such thing. I’ve been asking for tiered storage in Kafka since 2016 (https://gruchalski.com/posts/2016-05-08-the-case-for-kafka-c...) but this approach is so much better.

Kafka itself could solve this with tiered storage or, at least, by allowing volumes to be added at runtime. But neither is possible.

Very cool, have to find some time to take this for a test drive.


Just fyi, tiered storage is coming at some point in Kafka: https://cwiki.apache.org/confluence/plugins/servlet/mobile?c...


It’s been like that for a while.


The Uber devs are working hard at it.


I can't edit the above, but it's actually AirBnB.


(WarpStream co-founder)

Thanks for the kind words! I'll just add that tiered storage is not quite the same thing because it means you still have to manage local disks and replication carefully, however briefly. Please reach out with questions any time!


Indeed. The way I was thinking about it was to implement a different file system for it.


> WarpStream is 5-10x cheaper than Kafka

<sigh> I'm guessing they mean 1/10 to 1/5 the cost? "1x cheaper" would be free.


Okay, I hate to be that guy but I'm given two options for pricing

> Free but with only 24 Hours of Retention

or

> Contact us.

Pricing is hard for sure, but if I'm being honest neither of these options make me want to try the product.


(WarpStream co-founder)

I know that is super frustrating, I’m sorry. We’re just really early and unsure how to begin even discussing public pricing.

We are 100% committed to that free tier. We designed our entire architecture around making that free tier so cost effective that we could offer it for free.

We want to get to a point where we can have transparent pricing on our website, it will just take us some time to figure that out after speaking with initial customers. We would love to hear your feedback though. You won’t get handed off to a sales guy or anything, it’ll just be a direct conversation with me (CEO) and my cofounder Ryan (CTO)


So the idea is that S3 offers practically unlimited throughput, and we can therefore use S3 for a scalable queue? I wish the open-source community had something competing with S3, but S3 has made it really hard for companies to have a convincing justification to build their own alternative[1].

[1] There are a few alternatives, such as Apache Ozone, Ceph, and good ol' HDFS, but none seems to have accelerating momentum for S3-style workloads.


The criticism I have of Kafka is the "at-least-once" semantics.

How do you manage exactly-once semantics? If Kafka performance is based on reading small batches of 50 messages, then in case of a crash of the consumer, some of them will be processed twice. Depending on your business logic this may be OK, or it may create a new problem that must be solved further down the pipeline by adding an external data store.


You can't have exactly-once in a distributed system. Crashes happen, the network or power goes down, and your ack gets lost[0].

The best one can do is accept that and make your processing idempotent.
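
A minimal sketch of what "idempotent processing" usually means in practice (hypothetical SeenStore, e.g. a unique-keyed table in the same database that receives the side effect):

    package consume

    import "context"

    // SeenStore is a hypothetical durable set of processed message IDs. Ideally
    // the "mark" and the side effect share one transaction.
    type SeenStore interface {
        MarkIfNew(ctx context.Context, msgID string) (fresh bool, err error)
    }

    // Handle applies process() at most once per msgID, so redelivered batches
    // after a consumer crash/rebalance become harmless no-ops.
    func Handle(ctx context.Context, seen SeenStore, msgID string, process func(context.Context) error) error {
        fresh, err := seen.MarkIfNew(ctx, msgID)
        if err != nil {
            return err
        }
        if !fresh {
            return nil // duplicate delivery: already processed
        }
        return process(ctx)
    }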

[0] The most interesting case I've witnessed was power going down because a hot air balloon crashed into power line.


Is WarpStream considering hiring Aphyr to do a Jepsen test?


WarpStream relies on a proprietary metadata store hosted within their internal network to operate, so it's pretty unlikely that Jepsen tests could cover that.

If you're ok with the externally hosted metadata stores as well as the high per-request latencies (p99 of 400ms, according to WarpStream), it's highly likely that things like liveness and safety properties are pretty far from your mind. So, I wouldn't bank on them submitting to a Jepsen test. :)


I think that WarpStream relying on a proprietary metadata store isn't an issue for Jepsen tests. If I understand correctly, Jepsen tests treat the distributed databases (or logs like Kafka) pretty much as a black box. Jepsen tests introduce partitions and look for missing/unexpected items against the items that were acknowledged as written successfully by the system.

If you look at Kyle's blog post, https://aphyr.com/posts/293-jepsen-kafka, there is no mention of looking into a broker's storage or any storage for that matter.


(WarpStream co-founder)

FWIW we subject WarpStream to continuous chaos/fault injection in our integration tests and staging environment to verify correctness and liveness properties. I wouldn't say they're far from our mind, we've just made a big trade off around latency that we think will make sense for a lot of people.


I've seen both your FoundationDB talks, curious if you are using FoundationDB under the hood or how much of the metadata store is homegrown?

Also, how did you implement the parsing of the Kafka protocol? From scratch or pieced together with open source Go bits? Having a nice programmable API for building Kafka-compatible servers could be a huge boon to the Kafka community (wink wink).


As someone who's been doing async-fanout physical-replication of Postgres instances for the longest time using pgBackRest's S3 repo support (the primary writes to S3; the replicas read from S3; the primary's uplink doesn't get saturated serving the writes) I've always wondered where the equivalent for message-queue systems was. Glad to see it :)


> requires teams to operate it

I mean, managed Kafka has been a thing for a long while now. I’ve used it plenty and haven’t had any issues…


How do Pulsar and Kafka compare these days?


OK, so instead of providing commit log guarantees on their own, they rely on S3.

I'm not sure if S3 can actually provide durable commit log guarantees.

There is a timestamp-based last-write-wins policy for concurrent writes, so I'm not sure if this thingy can actually replace Kafka in all the use cases while providing the same guarantees.


(WarpStream co-founder) We use S3 for data replication, availability, and durability, but not for the log ordering guarantees. For that, we have our own custom metadata store that "imposes" ordering guarantees on top of the data that is written to S3.

We don't rely on any kind of timestamp-based last write wins policy for concurrent writes. In fact the agents will never write a file to S3 with the same name more than once or overwrite an existing file.


So, in fact it's not really stateless, you have your locking/ordering layer and you implement some consensus for that layer?


It's stateless from the customer's perspective, since they don't have to manage the consensus layer (which is only possible because we separated data from metadata).

Obviously there is state somewhere since the cluster is storing data; it's just offloaded to S3 + our control plane, while still keeping all the data in the customer's cloud account.


I think you should write some detailed docs about your design and consistency guarantees.


The object storage backend is interesting and I’m wondering if Kafka will have a native option for that, eventually.

Confluent hasn’t outright advertised it, but one of their engineers mentioned in a q&a that their Cloud offering is using object storage for their own backend now.


They always have.. you get it but you don't know it.


The blog post mentions "P99 of ~1s of producer-to-consumer latency". What about just producer latency, i.e. the message successfully received into the queue, ready to be picked up? S3 writes seem to be in the low 100s of ms, so I assume that's part of the quoted end-to-end latency.


(WarpStream co-founder)

Yeah our P99 for producer latency is ~400ms right now.


Sounds like you're currently optimized for throughput, not latency. (I work with customers for whom 10ms of latency for a 128kB write is too high.) What are your plans to improve latency?


(WarpStream co-founder)

None currently, if you need latency that low WarpStream is not a good fit and you should probably stick with something more traditional like a very well tuned Kafka cluster.


Have you considered making a plain HTTPS frontend like Confluent's REST proxy? Or even just designating a bucket as 'incoming messages' and letting people use the S3 API to write messages?

Kafka is great but if you're targeting Amazon customers, you're competing with SQS too.


[WarpStream co-founder here]

Please sign up for our mailing list! We'll have some interesting things to announce related to this soon.


Will this only exist as a managed service (SaaS) or will there be an open source (or open core) version?


Oh man, I can't wait until these people get their first S3 bill for the number of GET requests.


[WarpStream co-founder here]

My co-founder and I worked at Datadog for over 3 years where we built Husky, an event storage and query system built directly on top of S3 as well. We know what we're doing here, I promise ;)

https://www.datadoghq.com/blog/engineering/introducing-husky...


"My co-founder and I worked at Datadog"

If there's one company that's known for blowing up budgets as you scale, it's Datadog. Hoping WarpStream doesn't follow that playbook.

Best wishes!


What happens if I pull the plug on a server that hasn't flushed to S3 in the configured time? Do I lose that event?


With an appropriately configured client (i.e. one that retries and waits for requests to be acknowledged), another Agent would receive the retry and the event would be written to the topic at that time.
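
(For reference, the relevant stock Kafka producer settings look roughly like this:)

    acks=all                    # don't treat a send as successful until it's durable
    retries=2147483647          # retry against another agent/broker on failure
    delivery.timeout.ms=120000  # how long to keep retrying before surfacing an error
    # note: without idempotence, retries can still introduce duplicates downstream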


(WarpStream co-founder) Check the end of our blog post, we brought receipts :)


I didn't check the site. I have nightmares about debugging Kafka from years ago, and it's only gotten bigger since. No thank you, give me Apache ActiveMQ or RabbitMQ, even IBM's MQ, and it's all good - and don't try to tell me I need a streaming thingamajig.


What's to stop Confluent/Kafka from simply implementing a feature like this themselves?


their recent blog post doesn't sound that far off from the OP's architecture: https://www.confluent.io/blog/cloud-native-data-streaming-ka... (discussed: https://news.ycombinator.com/item?id=36870278 )


As a hobbyist I probably don’t need this but I’m curious. I haven’t used any of the S3-compatible data stores. Any ideas which would be easiest for messing around with?

Also, what databases are there built on S3? I think there is sqlite replication?


Haven’t seen it mentioned yet so I’ll just say we use Kinesis Data Streams with enhanced fanout at work and it seems very minimal in terms of operational overhead if it fits your needs.


(WarpStream co-founder) Yeah if you can fit your workload into it, it's pretty solid from an operational perspective since AWS manages the whole thing (except scaling up shards and stuff like that).

It gets pretty pricey though for large volumes of Data.


Is the plan to eventually open source this and provide paid support?


Pretty cool. I'm definitely going to check this out.


It'd be great if there was an rss feed for your blog


Care to summarize the 'contact us' pricing?


We're aiming for per-GB usage-based pricing that is significantly cheaper than the alternatives, but the BYOC model combined with our extremely efficient cloud control plane gives us a lot of flexibility here. That's why we can offer a free tier at all.

We're mostly just not sure yet, so your input would be appreciated.


I keep seeing these HN titles about technologies named after other concepts, thinking they're about those other concepts, without any indication that they're about some obscure technology. Then I'm disappointed it's just about some random, badly named technology.

So, in honor of the actual zombie Kafka, here's a short news blurb about Kafka International Airport being named Most Alienating Airport: https://www.youtube.com/watch?v=gEyFH-a-XoQ


Seriously. I ordered Windows and all I got was a piece of cardboard with some random letters and numbers on it, and not a frame with glass.


I am the founder of RisingWave (http://risingwave.com/), an open-source SQL streaming database. I am happy to see the launch of Warpstream! I just reviewed the project and here's my personal opinion:

* Apache Kafka is undoubtedly the leading product in the streaming platform space. It offers a simple yet effective API that has become the golden standard. All streaming/messaging vendors need to adhere to Kafka protocol.

* The original Kafka only used local storage to store data, which can be extremely expensive if the data volume is large. That's why many people are advocating for the development of Kafka Tiered Storage (KIP-405: https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A...). To my best knowledge, there are at least five vendors selling Kafka or Kafka-compatible products with tiered storage support:

-- Confluent, which builds Kora, the 10X Kafka engine: https://www.confluent.io/10x-apache-kafka/;

-- Aiven, the open-source tiered storage Kafka (source code: https://github.com/Aiven-Open/tiered-storage-for-apache-kafk...

-- Redpanda Data, which cuts your TCO by 6X (https://redpanda.com/platform-tco);

-- DataStax, which commercializes Apache Pulsar (https://pulsar.apache.org/);

-- StreamNative, which commercializes Apache Pulsar (https://pulsar.apache.org/).

* WarpStream claims to be "built directly on top of S3," which I believe is a very aggressive approach that has the potential to drastically reduce costs, even compared to tiered storage. The potential tradeoff is system performance, especially in terms of latency. As a new technology, WarpStream brings novelty, but it also definitely needs to convince users that the service is robust and reliable.

* BYOC (Bring Your Own Cloud) is becoming the default option. Most of the vendors listed above offer BYOC, where data is stored in customers' cloud accounts, addressing concerns about data privacy and security.

I believe WarpStream is a new technology in this market, and I would encourage the team to publish some detailed numbers to confirm its performance and efficiency!


If you're still within the edit window for your comment, the ");" in them are attached to the URLs, making them problematic to click

ed

-- Confluent, which builds Kora, the 10X Kafka engine: https://www.confluent.io/10x-apache-kafka/

-- Aiven, the open-source tiered storage Kafka (source code: https://github.com/Aiven-Open/tiered-storage-for-apache-kafk...

-- Redpanda Data, which cuts your TCO by 6X https://redpanda.com/platform-tco

-- DataStax, which commercializes Apache Pulsar https://pulsar.apache.org/

-- StreamNative, which commercializes Apache Pulsar https://pulsar.apache.org/


Question: does this architecture differ in any way regarding ordering or durability of messages, compared to Kafka?


(WarpStream co-founder)

Nope, we provide the same durability and ordering guarantees as Kafka, albeit with higher latency. We never acknowledge produce requests until they've been durably persisted in object storage (P99 ~400ms), so it's the same guarantee as running a 3-AZ Kafka cluster with fsync enabled.


Running Kafka is an exercise in sadomasochism. This on the other hand sounds pretty neat.


What? It’s pretty easy to set up and get going


The last time I was running a cluster it was 2014-2015 and it was not too bad to get up and running, but it felt so heavy handed for what it was. The zookeeper requirement was frustrating. The fact that consumers needed to maintain offset state was frustrating (although... understandable). For the right use case I suppose it is 100% worthwhile but more often than not I have seen it used in places where other tools would be more appropriate.


They've made Zookeeper optional now, although they're being understandably conservative about that (e.g. only declared it production-ready a year or two ago).


Getting a distributed set up plugged into your k8s/whatever framework can be a pain.


The issue there is running it on k8s. On bare metal or even VMs it’s not at all hard to run even at scale.


Well when all of your services are in k8s, and your service discovery is on k8s, etc, you tend to put things like Kafka into K8s.


Have you tried Strimzi?


I've heard good things about Red Panda[1], but I haven't tried it.

[1] https://redpanda.com/


company competing with kafka says kafka is dead, is that accurate?


Kafka the Apache open source project is kind of dead/legacy for Confluent.

Confluent rewrote the backend of Kafka for their SaaS product and uses the public Kafka wire protocol for public integration: https://www.confluent.io/blog/cloud-native-data-streaming-ka...

I don't think the Apache Kafka distribution is going away; Confluent will still have paying customers with support contracts wanting features, and the protocol will likely still need to evolve. But the open source version will likely be a burden on Confluent given their own private implementation, so it'll be interesting to see how things look in a few years.


No


Can this be installed locally and run on Minio?


Will make this possible soon, oversight on our part that the agent doesn't expose a way to let you do that right now


I was hoping this would be about Franz.


Is there any source available for this?


Kafka's been dead almost 100 years.


Can this be used without S3?


(WarpStream co-founder)

It requires some form of object storage (S3, GCS, Azure blob storage, etc). You could use minio if you want to host the storage yourself, although I think I need to make a few changes to make that configurable in the agent.

For local development you can use an in-memory or local file storage, but not for "real" usage.


Okay - yes, it would make a lot of sense as a Kafka alternative to those of us who prefer to stay away from the public clouds.


Can I use minio?


(WarpStream co-founder)

That does not work currently because there is no way to configure the agent with a Minio URL. That's just an oversight on our part though, I'll get that working within a few days.



