sijieg's comments | Hacker News


Ursa is available across all major cloud providers (AWS, GCP, Azure). It also supports pluggable write-ahead-log storage. For latency-relaxed workloads, we use object storage to get the cost down, so it works with AWS S3, GCP GCS, and Azure Blob Storage. For latency-sensitive workloads, we use Apache BookKeeper, a low-latency replicated log store. This allows us to support workloads ranging from milliseconds to sub-second latency, and you can tune it based on your latency and cost requirements.
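To make the tradeoff concrete, here is a minimal sketch of how a stream might pick a WAL backend from a latency budget. The function name, backend labels, and the 100 ms threshold are all hypothetical illustrations, not Ursa's actual API or defaults:

```python
# Hypothetical sketch: picking a WAL backend from a latency budget.
# Backend names and the threshold are illustrative, not Ursa's real API.

def choose_wal_backend(p99_latency_budget_ms: float) -> str:
    """Pick a write-ahead-log backend for a stream.

    Tight budgets call for a replicated log (Apache BookKeeper);
    relaxed budgets can write straight to object storage
    (S3 / GCS / Azure Blob) and trade latency for cost.
    """
    if p99_latency_budget_ms < 100:   # millisecond-class: low-latency log storage
        return "bookkeeper"
    return "object-storage"           # sub-second-class: cheaper, higher latency

# Example: a payments stream vs. an analytics ingestion stream.
assert choose_wal_backend(10) == "bookkeeper"
assert choose_wal_backend(500) == "object-storage"
```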


There seems to be some confusion here.

Pulsar has been widely adopted in many mission-critical, business-facing systems like billing, payments, and transaction processing, or used as a unified platform that consolidates an enterprise's diverse streaming and messaging use cases. It has seen a lot of adoption, from F500 companies and hyperscalers to startups.

Kafka is widely used in data ingestion and streaming pipelines. The Kafka protocol itself is great; however, the implementation has its own challenges.

Both Pulsar and Kafka are great open source projects and their protocols are designed for different use cases. We have seen many different companies use both technologies.

Ursa is the underlying streaming engine that we re-implemented to be leaderless and lakehouse-native, so that we can better leverage current cloud infrastructure and natively integrate with the broader lakehouse ecosystem. It is the engine we use to support both protocols in our product offerings.


There are a few things unlocked by Ursa:

1. It is leaderless by design, so there is no single lead broker that traffic must be routed through. This eliminates the majority of inter-zone traffic.

2. It is lakehouse-native by design. It not only uses object storage as the storage layer, but also uses open table formats for storing data, so streaming data can be made available in open table formats (Iceberg or Delta) after ingestion. One example is the integration with S3 Tables: https://aws.amazon.com/blogs/storage/seamless-streaming-to-a... This simplifies the Kafka-to-Iceberg integration.
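To illustrate point 1: with a partition leader, a client in one availability zone often has to forward writes to a leader sitting in another zone; leaderless, any broker can accept the write, so the client stays in-zone. A toy model of the traffic difference (zone names and the even leader placement are assumptions for illustration, not a description of Ursa's internals):

```python
# Toy model: inter-zone hops for leader-based vs. leaderless writes.
# Zone names and leader placement are illustrative only.

ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]

def leader_based_hops(client_zone: str, leader_zone: str) -> int:
    # The client must reach the partition leader, wherever it lives.
    return 0 if client_zone == leader_zone else 1

def leaderless_hops(client_zone: str) -> int:
    # Any broker can accept the write, so the client writes in-zone.
    return 0

# With leaders spread evenly across 3 zones, 6 of the 9 possible
# client/leader pairings cross a zone boundary (~2/3 of writes).
crossings = sum(leader_based_hops(c, l) for c in ZONES for l in ZONES)
assert crossings == 6
assert all(leaderless_hops(z) == 0 for z in ZONES)
```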


They were asking about changes that enabled Ursa itself.


We also love Kafka as a protocol. However, the implementation can be evolved to adopt current cloud infrastructure and rethought around the modern lakehouse paradigm. That was one of the reasons we created Ursa.


I am one of the co-founders of StreamNative.

Currently Ursa is only available in our cloud service. But we do plan to open-source the core soon. Stay tuned.


Can't wait for you guys to open source this stuff. If I may ask, what license are you thinking of? I'm hoping to someday make a living as a developer working on open source too, but it's a tough line between getting no sponsors with an MIT license, and being called non-FOSS and accused of all sorts of crimes on HN because you used a license like SSPL or some custom license.

The sad reality is that most people in open source want stuff for free and won't pay anything back, and that sucks. So what are your thoughts on this? I am genuinely curious.

The second part: as someone noted in a reply to the parent comment you are responding to, the code is not the most important part here. How much do you agree with that statement? Because to me, if I can self-host the open-source version on AWS directly without going through your cloud service, that might well be cheaper than using the cloud service.


People are suspicious of being rug-pulled. There have been many such instances in the past, where companies advertised themselves as FOSS but didn't mean it. A proper FOSS license, plus clear and reassuring communication about the long-term freedoms associated with the product, are important.


Ah, I just happened to see those two blog posts together in the same place. They are really good posts explaining the replication scheme of Apache BookKeeper.

One thing to add to Flavio's blog post: readers of a log (ledger) agree on the LastAddConfirmed (LAC), which can be thought of as the 'commit' message in most consensus protocols. In a replicated log, commit means making data visible to readers.

BookKeeper doesn't enforce 'commit' the way other consensus protocols do. Instead, it exposes the core elements of a consensus protocol as primitives and lets applications decide things such as when to commit and how often. Readers can use the API (readLastConfirmed) to catch up to the latest 'committed' data. Controlling when to commit is how DistributedLog uses BookKeeper to tune end-to-end latency for different types of workloads: for latency-sensitive workloads like databases, it commits aggressively; for analytics workloads, it commits periodically to get batching benefits (such as reducing bandwidth through compression).
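The LAC visibility rule described above can be modeled in a few lines. This is a simplified single-process model of the invariant, not BookKeeper's actual client API (the class and method names here are made up for illustration):

```python
# Simplified model of BookKeeper's LastAddConfirmed (LAC) visibility rule.
# Not the real BookKeeper client API; just the core invariant:
# readers only see entries at or below the LAC.

class Ledger:
    def __init__(self):
        self.entries = []
        self.lac = -1            # id of the last add-confirmed entry

    def add_entry(self, data: bytes) -> int:
        self.entries.append(data)
        return len(self.entries) - 1

    def commit(self, up_to: int) -> None:
        # The application decides when/how often to advance the LAC:
        # aggressively for databases, periodically for analytics.
        self.lac = max(self.lac, up_to)

    def read_last_confirmed(self) -> int:
        return self.lac

    def read(self, entry_id: int) -> bytes:
        if entry_id > self.lac:
            raise ValueError("entry not yet confirmed/visible")
        return self.entries[entry_id]

ledger = Ledger()
e0 = ledger.add_entry(b"txn-1")
e1 = ledger.add_entry(b"txn-2")
ledger.commit(e0)                        # only entry 0 becomes visible
assert ledger.read_last_confirmed() == 0
assert ledger.read(e0) == b"txn-1"
try:
    ledger.read(e1)                      # written but not yet committed
except ValueError:
    pass
```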

