The design has been heavily influenced by these traits: multi-tenancy, millions of topics, strong-durability, geo-replication, low publish latency (even when reading at max IO capacity), maintaining consumer position and, finally, operability (we need to replace machines, shift traffic and increase capacity seamlessly, without any user impact).
Here is a quick comparison of Kafka and Pulsar:
- Kafka is a complete streaming platform vs a messaging system which is what Pulsar is. Through Kafka Connect (http://www.confluent.io/blog/announcing-kafka-connect-buildi...), it has support for connectors to stream data between various sources and systems. Through Kafka Streams (http://www.confluent.io/blog/introducing-kafka-streams-strea...), it has support to do stream processing and transformations over Kafka topics.
- Broad adoption base: Kafka is very widely adopted across thousands of companies worldwide. https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
- Tunable durability and consistency knobs on the producer: The Kafka producer API allows the application to either wait until a message is fully committed across all replicas or just the leader. This allows applications to make the right tradeoffs for throughput vs durability. One size does not fit all.
- Performance and efficiency: Kafka supports zero-copy consumption allowing the consumers to read large amounts of data at high throughput. To the extent that I understand, Pulsar with its legder-broker model does not support zero-copy consumption.
- A lot of the reasons quoted for creating Pulsar are features that exist in Kafka and are used in production:
-- Kafka has multi-tenancy support through user-defined quotas (See this http://www.confluent.io/blog/sharing-is-caring-multi-tenancy...)
-- Kafka has support for authentication, authorization, user-defined ACLs (See this http://www.confluent.io/blog/apache-kafka-security-authoriza...)
-- Kafka has support for geo replication. In fact, that is the most common use case for Kafka in several companies. (See this https://engineering.linkedin.com/kafka/running-kafka-scale)
-- Latency: The end-to-end latency from publish to consume can be very low in Kafka (<10ms).
- Support for millions of topics: To the extent that I understand, both Pulsar and Kafka use ZooKeeper for metadata management. That is the main bottleneck for supporting a large number of topics and likely the same tradeoffs apply to both Kafka and Pulsar as a result.
- Storage model: The length of a partition in BookKeeper and
hence in Pulsar is not bounded by the capacity of a server. So you have the ability to add servers to accommodate a workload spike.
This is merely a quick overview. There might be more aspects of this comparison that I'm missing.