deniscoady's comments

> If Apache Foundation is where open source projects go to die ...

I can't think of a better place for longevity of open source projects than Apache (maybe I'm out of the loop?).

Compare it to the Linux Foundation, where everything is a project sponsored by a single commercial vendor. At least Apache requires independent governance and a diverse ecosystem before the project graduates.

Am I missing something with the Apache Foundation?


No, you're right. That's why I said it's a bit unfair to say that. But that's the meme.


I'm an employee at Redpanda.

> Redpanda recently introduced leader pinning, but this only benefits setups where producers are confined to a single AZ—not applicable to our multi-AZ benchmark.

Redpanda has leadership pinning (producers) and follower fetching (consumers). I suspect a significant amount of the cost comes from improperly shaped traffic.

> Interzone traffic - replication: 10GB/s * $0.02/GB(in+out) * 3600 = $720

With follower fetching you shouldn't have cross-AZ charges on read, only on replication. In 15 seconds of looking at this piece I cut out $360/hour...no offense but this reeks of bad faith benchmarketing...
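
For anyone who wants to check the arithmetic in the quoted line, here is a rough sketch. The function name and the `cross_az_fraction` parameter are mine (hypothetical, not from the benchmark); the 10 GB/s rate and the $0.02/GB in+out price are the figures quoted above.

```python
import math

SECONDS_PER_HOUR = 3600


def hourly_transfer_cost(gb_per_second: float,
                         usd_per_gb: float,
                         cross_az_fraction: float = 1.0) -> float:
    """Hourly cost of traffic, billing only the share that crosses AZs.

    cross_az_fraction is an illustrative assumption: 1.0 means every byte
    crosses an AZ boundary, 0.0 means none does (e.g. reads served by a
    replica in the consumer's own AZ via follower fetching).
    """
    return gb_per_second * usd_per_gb * SECONDS_PER_HOUR * cross_az_fraction


# The quoted line: 10 GB/s * $0.02/GB (in+out) * 3600 s
replication = hourly_transfer_cost(10, 0.02)
assert math.isclose(replication, 720.0)
print(f"replication: ${replication:.2f}/hour")

# Reads that never leave the consumer's AZ are not billed as cross-AZ.
local_reads = hourly_transfer_cost(10, 0.02, cross_az_fraction=0.0)
print(f"same-AZ reads: ${local_reads:.2f}/hour")
```

The point is only that the cross-AZ share of each traffic class drives the bill, so how producers and consumers are placed matters as much as the raw throughput.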


Disclaimer: I currently work for Redpanda.

It's just my polite way of saying it's safe enough for most use cases and that you're wrong.

Low volume data can be some of the most valuable data on the planet. Think SEC reporting (EDGAR), law changes (Federal Register), court judgements (PACER), new cybersecurity vulnerabilities (CVEs), etc. Missing one record can be detrimental if it's the one record that matters.

Does everyone need durability by default? Probably not, but Redpanda users get it for free because there is a product philosophy of default-safe behavior that aligns with user expectations - most folks don't even know how this stuff works, why not protect them when possible?

> The fsync thing is complete FUD by RedPanda.

You want durability? Pay the `fsync()` cost. Otherwise recognize that acknowledgement and durability are decoupled and that the data is sitting in unsafe volatile memory for a bit.
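
To make the decoupling concrete, here is a minimal sketch in Python of the two write paths. This is not how Redpanda or Kafka implement their logs; the function names are made up for illustration, and the only real calls are `file.flush()` and `os.fsync()` from the standard library.

```python
import os
import tempfile


def append_durably(path: str, record: bytes) -> None:
    """Append a record and return only after asking the OS to persist it."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # move data from Python's buffer to the OS page cache
        os.fsync(f.fileno())  # ask the OS to push the page cache to the device


def append_buffered(path: str, record: bytes) -> None:
    """Append a record without fsync; it may sit in volatile memory for a while."""
    with open(path, "ab") as f:
        f.write(record)
    # Closing the file flushes Python's buffer, but the bytes can still live
    # only in the OS page cache until the kernel writes them back.


log_path = os.path.join(tempfile.mkdtemp(), "segment.log")
append_durably(log_path, b"acked-after-fsync\n")
append_buffered(log_path, b"acked-before-fsync\n")

with open(log_path, "rb") as f:
    print(f.read().decode(), end="")
```

Both records read back fine while the machine stays up. The difference only shows after a power loss or kernel crash, which is exactly the window where an acknowledgement without `fsync()` was a promise the disk had not yet kept.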

> They later introduce write caching[1] and call it an innovation[2].

There are legitimate cases where customers don't care about durability and want the fastest possible system. We heard from these folks and responded with a feature they can selectively opt-in for that behavior _knowing the risks_. Again the idea is to be safer by default, and allow folks to opt-in to more risky behaviors.

> those that are super concerned with safety usually run with an RF of 5 (e.g banks)

Going above RF=3 does not guarantee "more nines" since you need more independent server racks, independent power supplies or UPSs, etc.; otherwise the extra replicas just share the same failure domains. This greatly drives up costs. Disks and durability are just cheaper and simpler. Worst case, you pull the drives and pull the data off them: not fun and not easy, but possible, unlike with in-memory copies.

> And you can configure Kafka to fsync as often as you want[3]

Absolutely! But nobody changes the default, which is the issue: expectations of new users are not aligned with actual behavior. The same thing happened during the early MongoDB days. Either there needs to be better documentation/education so people understand what the durability guarantees actually are, or the defaults need to change.


I agree that data can be valuable and even one record loss can be catastrophic.

I agree that there needs to be better documentation.

I just don't agree that losing 3 replicas at once, each living in a different DC, is a realistic concern. The ones that would truly be concerned about this issue would do one of two things: run RF>3 (yes, it costs more) or set up some disaster recovery strategy (e.g. run in multiple regions; yes, that costs more).

Because truth be told - losing 3 AZs at once is a disaster. And even if you durably persisted to disk - all 3 disks may have become corrupt anyway.


If you are willing to accept multiverses then isn't this solved by the Anthropic principle where the speed of causality (or mass of the proton) is what it is, simply because it can sustain the evolution of eventual observers?

I don't think the numbers independently are valuable, but together the constants of physics are tuned to support life. To be honest, I dislike the Anthropic principle as a generalizable cop-out, but it nonetheless works.


Disclaimer: I work for Redpanda and formerly Cloudera.

I've worked with Apache Kafka at massive (50+ Gbps) scales. It's a proper nightmare. When it breaks – it breaks fast and violently.

But the problem is that Apache Kafka (and more modern Kafka-compatible alternatives like Redpanda < obligatory mention) solves a need for a durable streaming log that other systems cannot offer. The access patterns, requirements, use cases, ecosystem, etc., are different from those of traditional databases and require a proper streaming solution.

Streaming from a traditional database is kinda a solved problem. Why not just use a managed Kafka provider with a change data capture (CDC) capability if you don't want to deal with Kafka yourself? At least then you get to use all of the tools in the vibrant Kafka ecosystem.


Hey Denis, I haven’t run into you before. But hi, this is Luis, Ambar’s founder. Nice to meet a fellow data streamer.

When I started writing Ambar I thought streaming from a database was a solved problem. But in operational use cases where ordering and delivery guarantees are assumptions developers need, it isn’t a solved problem. The first version of Ambar was just Debezium under the hood, but guess what, it failed and failed hard. Like you described Kafka. Hence we built Ambar :)

FYI we’ve considered using Redpanda under the hood instead of Kafka, but didn’t dare make the jump yet.


Ah okay, so is Ambar more of a way to finally replace Debezium then?


Yes, for operational use cases. Eg event driven microservices communication. Keeping in mind we replace the sink as well, which allows us to do cool things such as https://ambar.cloud/blog/optimal-consumption-with-adaptive-l...

For analytics (eg copy your PG database to Snowflake), Debezium is still relevant.


Disclaimer, I work at Redpanda.

Redpanda Community Edition is licensed with the Redpanda Business Source License (BSL). The core features are free and source-available.

This license was inspired by MariaDB and CockroachDB and "for 99.999% of users, restrictions will not apply". The big restriction is that users "cannot provide Redpanda as a commercial streaming or queuing service to others" which is primarily to deter large cloud vendors from taking our work and impacting our ability to operate as a business.

Here are some links for more details on how we came to this decision and how we license all our products and features.

https://redpanda.com/blog/bsl-source-available-license

https://docs.redpanda.com/docs/get-started/licenses/

