Hacker News | chenyang's comments

Please check out AutoMQ: https://github.com/AutoMQ/automq

AutoMQ is a cloud-first alternative to Kafka that decouples durability onto S3 and EBS. It is 10x more cost-effective, autoscales in seconds, and delivers single-digit-millisecond latency.


Hello, I am the author of this article. Any questions about AutoMQ (https://github.com/AutoMQ/automq) are welcome.


Just like WarpStream or Confluent Freight clusters, but AutoMQ is source-available. You can find the key S3 WAL implementation here: https://github.com/AutoMQ/automq/blob/main/s3stream/src/main...


S3Stream is a shared streaming storage library that provides a unified interface for reading and writing streaming data to cloud object storage services like Amazon S3, Google Cloud Storage, and Azure Blob Storage. EBS is utilized here for its low-latency capabilities. It is designed to be used as the storage layer for distributed systems like Apache Kafka, Apache RocketMQ, etc.
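To make the "unified interface" idea concrete, here is a minimal conceptual sketch. The class and method names below are illustrative assumptions, not the actual S3Stream API; the in-memory backend merely stands in for S3/GCS/Azure Blob plus an EBS WAL.

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of a unified stream-storage interface; these names
# are illustrative and do not reflect the real S3Stream API.
class StreamStorage(ABC):
    @abstractmethod
    def append(self, stream_id: int, record_batch: bytes) -> int:
        """Append a batch; return its offset once the write is durable."""

    @abstractmethod
    def fetch(self, stream_id: int, start_offset: int, max_bytes: int) -> list[bytes]:
        """Read batches from start_offset, up to roughly max_bytes."""

    @abstractmethod
    def trim(self, stream_id: int, new_start_offset: int) -> None:
        """Release data below new_start_offset for reclamation."""

# Toy in-memory backend, standing in for object storage in this sketch.
class InMemoryStreamStorage(StreamStorage):
    def __init__(self) -> None:
        self._streams: dict[int, list[bytes]] = {}

    def append(self, stream_id: int, record_batch: bytes) -> int:
        batches = self._streams.setdefault(stream_id, [])
        batches.append(record_batch)
        return len(batches) - 1

    def fetch(self, stream_id: int, start_offset: int, max_bytes: int) -> list[bytes]:
        out: list[bytes] = []
        size = 0
        for batch in self._streams.get(stream_id, [])[start_offset:]:
            if out and size + len(batch) > max_bytes:
                break
            out.append(batch)
            size += len(batch)
        return out

    def trim(self, stream_id: int, new_start_offset: int) -> None:
        batches = self._streams.get(stream_id, [])
        for i in range(min(new_start_offset, len(batches))):
            batches[i] = b""  # reclaimed; offsets stay stable
```

A system like Kafka or RocketMQ would sit on top of an interface of this shape, mapping its partitions or queues onto streams.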


How does the performance of S3Stream compare with other streaming storage systems?


Here is a benchmark report for your reference: https://docs.automq.com/docs/automq-opensource/IJLQwnVROiS5c...

Compared to Apache Kafka, here are some highlighted conclusions:

- a 300-fold efficiency gain in partition reassignment
- a 200-fold improvement in cold-read efficiency
- twice the throughput limit
- one-eleventh of the billing cost


In fact, EBS is entirely a cloud storage solution and operates as shared storage. It is not a local disk system.


In addition to the high cost of S3 Express, using WarpStream to write three replicas to S3 Express and later compact them into S3 Standard could result in quadruple the network/outbound traffic cost. With two consumer groups involved, this could increase to six times the network/outbound traffic.

Considering a c5.4xlarge instance with 16 vCPUs and 32 GB of memory, which offers a baseline network bandwidth of only 5 Gbps, it is limited to a maximum produce throughput of roughly 100 MiB/s.
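To make that bandwidth arithmetic explicit, here is a back-of-the-envelope sketch. It assumes the 5 Gbps baseline and the ~6x traffic amplification described above (3 replica writes + 1 compacted write + 2 consumer groups); the numbers are illustrative, not a benchmark.

```python
# Back-of-the-envelope estimate of sustainable produce throughput.
# Assumption: a c5.4xlarge baseline network bandwidth of 5 Gbps.
baseline_gbps = 5.0
baseline_mib_s = baseline_gbps * 1e9 / 8 / 2**20   # ≈ 596 MiB/s

# Assumption: each produced byte leaves the instance ~6 times:
# 3 replica writes to S3 Express + 1 compacted write to S3 Standard
# + reads by 2 consumer groups.
amplification = 3 + 1 + 2

max_produce_mib_s = baseline_mib_s / amplification  # ≈ 99 MiB/s
print(f"max sustainable produce throughput ≈ {max_produce_mib_s:.0f} MiB/s")
```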

Therefore, I have reservations about the cost-effectiveness of your low-latency solution, given these potential expenses.


Couldn't agree more! S3 will be the modern data storage primitive. The move towards shared storage and separating compute from storage is also a key trend in cloud-native architecture, enhancing scalability and cost-efficiency.


Indeed, operating Kafka can be challenging and complex due to its nature as a stateful and distributed system.

However, AutoMQ has adopted a cloud-native architecture, offloading storage to EBS and S3 (https://docs.automq.com/docs/automq-s3kafka/Q8fNwoCDGiBOV6k8...), eliminating the need for replication and making the broker stateless, which simplifies operations significantly.


Hi, thank you for your interest in AutoMQ. If you have any questions regarding cost-related data, here is an analysis report for your reference: https://docs.automq.com/docs/automq-s3kafka/EJBvwM3dNic6uYkZ...


Yes, thank you for the clarification. AutoMQ has replaced the topic-partition storage with the cloud-native S3Stream (https://github.com/AutoMQ/automq/tree/main/s3stream) library, thereby harnessing the benefits of cloud EBS and S3.


One thing that isn't made clear is when writes are acknowledged.

Specifically is a write acknowledged when it's written to Delta WAL or when it's uploaded to object storage?

If writes are acknowledged when written to Delta WAL, is it possible to lose acknowledged writes when an EBS volume becomes unavailable, or does that whole partition become unwritable until the volume comes back? Or is Delta WAL itself replicated in a similar fashion to traditional Kafka storage?


You might be interested in our strategies for managing different types of failures:

- In case of an EC2 instance failure, we take advantage of EBS's ability to be attached to multiple instances (https://docs.aws.amazon.com/ebs/latest/userguide/ebs-volumes...). This allows us to quickly mount the EBS volume from the failed EC2 instance onto another broker, enabling a seamless failover.
- For failures that affect an entire availability zone, we utilize regionally replicated disks, which are available in Azure and GCP: https://cloud.google.com/compute/docs/disks/regional-persist...
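The instance-failure path can be sketched as a simple ownership handover. Everything below is a toy model of the idea, not AutoMQ's actual code: the dictionaries stand in for cluster metadata, and the comments mark where real EBS multi-attach and WAL recovery would happen.

```python
# Toy model of multi-attach failover: when a broker dies, a standby takes
# over its WAL volume, recovers it, and reopens the affected partitions.
# Names and structures here are illustrative assumptions.
volumes = {"vol-wal-1": "broker-a"}      # WAL volume -> current owner
partitions = {"topic-0": "broker-a"}     # partition -> serving broker

def failover(failed_broker: str, standby: str) -> None:
    # Reassign the failed broker's WAL volumes to the standby.
    for vol, owner in volumes.items():
        if owner == failed_broker:
            volumes[vol] = standby       # real code: EBS multi-attach + fencing
    # Reopen the failed broker's partitions on the standby after WAL recovery.
    for part, owner in partitions.items():
        if owner == failed_broker:
            partitions[part] = standby

failover("broker-a", "broker-b")
```

In a real deployment, fencing matters: the standby must ensure the failed broker can no longer write to the volume before replaying its WAL.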


Ok yeah, multi-attach was the magic I was looking for to handle failure of instances.

Thanks!


Yes, acknowledgments for writes occur once the data is committed to the EBS WAL, with each write operation bypassing the cache via Direct IO. Data is then asynchronously uploaded to S3.

Given that EBS already ensures various levels of data durability, AutoMQ does not replicate data. Addressing your last question regarding the scenario when an EBS volume becomes unavailable:

- AutoMQ maintains a minimal amount of data on EBS, for example, only 500MB, which can be easily cached in memory. If an EBS volume goes offline, we promptly upload all data to S3 and close all partitions on the affected broker. Subsequently, we redistribute the closed partitions to other brokers.

