

BookKeeper: High-availability scalable distributed logging - mad44
http://muratbuffalo.blogspot.com/2014/09/paper-summary-high-availability.html

======
kylequest
Kafka can be compared to HedWig, which uses BookKeeper internally to store
data.

HedWig is closer to the traditional message broker model, where the broker (the
Hub, in HedWig's case) keeps track of the subscriptions and what has been
consumed so far. Kafka, on the other hand, uses a stateless broker model where
the consumers themselves track what they have consumed.
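
To make the "stateless broker" idea concrete, here is a rough toy sketch. It is not Kafka's actual API; the class and method names (Log, Consumer, poll) are made up for illustration. The point is only that the log side stores messages and nothing else, while each consumer carries its own read position.

```python
class Log:
    """Toy append-only log. It keeps no per-consumer state at all."""

    def __init__(self):
        self._entries = []

    def append(self, msg):
        self._entries.append(msg)
        return len(self._entries) - 1  # offset assigned to the new message

    def read(self, offset, max_msgs=10):
        # Anyone may read from any offset; the log does not remember who asked.
        return self._entries[offset:offset + max_msgs]


class Consumer:
    """The consumer, not the broker, remembers how far it has read."""

    def __init__(self, log):
        self.log = log
        self.offset = 0  # consumer-held position (hypothetical field name)

    def poll(self):
        batch = self.log.read(self.offset)
        self.offset += len(batch)
        return batch


log = Log()
for m in ["a", "b", "c"]:
    log.append(m)

fast, slow = Consumer(log), Consumer(log)
first_batch = fast.poll()   # fast has now read everything
second_batch = fast.poll()  # nothing new for fast
late_batch = slow.poll()    # slow is unaffected by fast's progress
```

Adding another consumer costs the log nothing here, since there is no subscription table to update on the broker side.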

HedWig Hubs keep track of all subscriptions, and once all consumers have
"consumed" a given message, the Hubs delete it. Kafka doesn't do that. It
allows its consumers to start all over again even if the messages have already
been consumed (as long as the message is not too old).
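
A minimal sketch of the Hub-side bookkeeping described above, assuming a simple per-subscriber cursor. This is not HedWig's real implementation or API; Hub, subscribe, publish, consume and stored are hypothetical names. A message is dropped as soon as every subscriber has moved past it, so nobody can replay it afterwards.

```python
class Hub:
    """Toy stateful broker: tracks each subscriber's position and
    deletes a message once all subscribers have consumed it."""

    def __init__(self):
        self._messages = {}  # sequence number -> payload
        self._cursors = {}   # subscriber name -> next sequence number to deliver
        self._next_seq = 0

    def subscribe(self, name):
        # New subscribers start at the next message to be published.
        self._cursors[name] = self._next_seq

    def publish(self, payload):
        self._messages[self._next_seq] = payload
        self._next_seq += 1

    def consume(self, name):
        seq = self._cursors[name]
        if seq >= self._next_seq:
            return None  # nothing new for this subscriber
        payload = self._messages[seq]
        self._cursors[name] = seq + 1
        self._gc()
        return payload

    def _gc(self):
        # Everything below the slowest subscriber's cursor is fully consumed.
        low = min(self._cursors.values())
        for seq in [s for s in self._messages if s < low]:
            del self._messages[seq]

    def stored(self):
        return len(self._messages)


hub = Hub()
hub.subscribe("a")
hub.subscribe("b")
hub.publish("m0")
hub.publish("m1")

got_a = hub.consume("a")
after_a = hub.stored()   # "b" has not read m0 yet, so it is kept
got_b = hub.consume("b")
after_b = hub.stored()   # both have read m0, so it is gone
```

A retention-based log would skip the deletion step and instead expire messages by age, which is what lets a consumer go back and re-read.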

HedWig is also slower because of its focus on high durability. Earlier
versions of Kafka didn't care about durability as much, so Kafka was much
faster.

HedWig is also designed to work with a large number of topics and a few
consumers for those topics. Kafka can do a better job supporting a large
number of consumers (given its stateless broker design).

------
t1m
The post mentions Tango, which has a novel consistency algorithm, but doesn't
mention how BookKeeper differs from Kafka (also an Apache project). Can anyone
comment on the difference?

