AutoMQ is a cloud-first alternative to Kafka that decouples durability to S3 and EBS. 10x more cost-effective. Autoscales in seconds. Single-digit ms latency.
S3Stream is a shared streaming storage library that provides a unified interface for reading and writing streaming data to cloud object storage services like Amazon S3, Google Cloud Storage, and Azure Blob Storage. EBS is utilized here for its low-latency capabilities. It is designed to be used as the storage layer for distributed systems like Apache Kafka, Apache RocketMQ, etc.
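To make the shape of such a library concrete, here is a minimal sketch of what a shared streaming-storage interface over object storage could look like. The names below (StreamStorage, append, fetch, trim) are my own illustrative assumptions, not the actual S3Stream API:

    import java.nio.ByteBuffer;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;

    // Hypothetical shared streaming-storage interface; offsets are per-stream and monotonic.
    public interface StreamStorage {
        // Append a record batch; the future completes with the assigned offset
        // once the write is durable.
        CompletableFuture<Long> append(long streamId, ByteBuffer recordBatch);

        // Fetch record batches starting at startOffset, up to maxBytes in total.
        CompletableFuture<List<ByteBuffer>> fetch(long streamId, long startOffset, int maxBytes);

        // Trim (delete) everything before newStartOffset, e.g. to enforce retention.
        CompletableFuture<Void> trim(long streamId, long newStartOffset);
    }

A Kafka-like system would map each topic-partition onto one such stream, which is roughly the substitution described further down in this thread.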
Compared to Apache Kafka, here are some highlighted conclusions:
* a 300-fold improvement in partition reassignment efficiency.
* a 200-fold improvement in cold-read efficiency.
* twice the throughput limit.
* one-eleventh of the billing cost.
In addition to the high cost of S3 Express, using WarpStream to write three replicas to S3 Express and later compact them to S3 Standard could result in quadruple the network/outbound traffic costs. With two consumer groups involved, this could rise to six times the network/outbound traffic.
Considering a c5.4xlarge instance with 16 vCPUs and 32 GB of memory, which offers a baseline network bandwidth of only 5 Gbps, it's limited to a maximum produce throughput of about 100 MiB/s.
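If I follow the arithmetic, this is my own back-of-the-envelope check rather than anything from the AutoMQ or WarpStream docs:

    // Back-of-the-envelope traffic and throughput estimate (assumed factors, not vendor numbers).
    public final class TrafficEstimate {
        public static void main(String[] args) {
            double writeAmplification = 3 + 1;                   // 3 replica writes to S3 Express + 1 compaction write to S3 Standard = 4x
            double totalAmplification = writeAmplification + 2;  // plus reads by 2 consumer groups = 6x
            double baselineMiBps = 5e9 / 8 / (1024 * 1024);      // 5 Gbps baseline NIC bandwidth ~= 596 MiB/s
            System.out.printf("max produce throughput ~= %.0f MiB/s%n",
                    baselineMiBps / totalAmplification);         // prints ~99 MiB/s
        }
    }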
Therefore, I have reservations about the cost-effectiveness of your low-latency solution, given these potential expenses.
Couldn't agree more! S3 will be the modern data storage primitive. Also, the move towards shared storage and separating compute from storage is a key trend in cloud-native architecture, enhancing scalability and cost-efficiency.
Indeed, operating Kafka can be challenging and complex due to its nature as a stateful and distributed system.
However, AutoMQ has adopted a cloud-native architecture, offloading storage to EBS and S3 (https://docs.automq.com/docs/automq-s3kafka/Q8fNwoCDGiBOV6k8...), eliminating the need for replication and rendering the broker stateless, which simplifies operations significantly.
Yes, thank you for the clarification. AutoMQ has replaced topic-partition storage with the cloud-native S3Stream (https://github.com/AutoMQ/automq/tree/main/s3stream) library, thereby harnessing the benefits of cloud EBS and S3.
One thing that isn't made clear is when writes are acknowledged.
Specifically, is a write acknowledged when it's written to the Delta WAL or when it's uploaded to object storage?
If writes are acknowledged when written to the Delta WAL, is it possible to lose acknowledged writes when an EBS volume becomes unavailable, or does that whole partition become unwritable until the volume comes back? Or is the Delta WAL itself replicated in a similar fashion to traditional Kafka storage?
Yes, acknowledgments for writes occur once the data is committed to the EBS WAL, with each write operation bypassing the cache via Direct IO. Data is then asynchronously uploaded to S3.
Given that EBS already provides data durability (volumes are replicated within their Availability Zone), AutoMQ does not replicate data itself. Addressing your last question regarding the scenario when an EBS volume becomes unavailable:
- AutoMQ maintains a minimal amount of data on EBS, for example only 500 MB, which can be easily cached in memory. If an EBS volume goes offline, we promptly upload all data to S3 and close all partitions on the affected broker. Subsequently, we redistribute the closed partitions to other brokers.
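For readers following along, here is a rough sketch of that write path. The interface names (WriteAheadLog, ObjectStorage) are made up for illustration and only capture the ordering described above, not AutoMQ's actual internals:

    import java.nio.ByteBuffer;
    import java.util.concurrent.CompletableFuture;

    public final class DeltaWalWritePath {
        // EBS-backed WAL whose appends bypass the page cache (Direct IO).
        interface WriteAheadLog { CompletableFuture<Void> append(ByteBuffer record); }
        // Background uploader to object storage (S3).
        interface ObjectStorage { void uploadAsync(ByteBuffer data); }

        private final WriteAheadLog ebsWal;
        private final ObjectStorage s3;

        public DeltaWalWritePath(WriteAheadLog ebsWal, ObjectStorage s3) {
            this.ebsWal = ebsWal;
            this.s3 = s3;
        }

        public CompletableFuture<Void> append(ByteBuffer record) {
            CompletableFuture<Void> durableInWal = ebsWal.append(record);
            // Kick off the background upload once the WAL write completes;
            // this does not gate the acknowledgement returned below.
            durableInWal.thenRun(() -> s3.uploadAsync(record.duplicate()));
            // The producer is acknowledged as soon as the record is durable in the EBS WAL.
            return durableInWal;
        }
    }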