
Kafka brokers handle both computation (partition/topic management, sequencing, assignments, etc.) and storage. This coupling creates scaling and operational challenges, which LogDevice removes by separating the layers. Storage nodes can be as simple as object stores (but optimized for appending to files), and placement is randomized: a given piece of data can land on multiple non-deterministic locations. The nodes read, write and recover data very quickly by working together in a mesh.
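To make the randomized placement concrete, here is a minimal sketch, not LogDevice's actual code. The name `pick_copyset`, the node labels and the replication factor are all made up for illustration; the only idea taken from the above is that each record's copies go to a randomly chosen subset of storage nodes.

```python
import random

def pick_copyset(storage_nodes, replication, rng=random):
    """Pick `replication` distinct storage nodes at random for one record.

    Hypothetical helper: real systems also weigh racks, capacity and
    node health, all of which are ignored here.
    """
    if replication > len(storage_nodes):
        raise ValueError("not enough storage nodes for the replication factor")
    # random.sample returns distinct elements, so no node is picked twice.
    return rng.sample(storage_nodes, replication)

# Each record gets its own independently chosen set of nodes.
nodes = ["N0", "N1", "N2", "N3", "N4", "N5"]
for record_id in range(3):
    print(record_id, pick_copyset(nodes, replication=3))
```

Because every record picks its nodes independently, losing one node leaves its data spread across many peers, which is roughly why recovery can be done by the whole mesh in parallel.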

Meanwhile the compute layer becomes very lightweight and almost stateless, which makes it easy to scale. In LogDevice, the Sequencers are potential bottlenecks, but generating a series of incrementing numbers is about the fastest thing you can do, so it'll outpace any actual data ingest to a single log while giving you a total order of all entries within that log. The numbers (LSNs) follow the Hi/Lo sequence pattern: if a Sequencer fails, another one takes its place with a greater "High" number, which guarantees that all of its LSNs will be greater than those issued by the previous Sequencer. This also provides a built-in buffer: if a Sequencer fails, messages can still be accepted and assigned their permanent LSNs after recovery.
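A rough sketch of the Hi/Lo idea, under stated assumptions: the LSN is treated as a 64-bit number with the "High" part (an epoch) in the upper 32 bits and a per-epoch counter in the lower 32 bits. The class and function names are hypothetical, and the failover here simply bumps the epoch in memory, whereas a real system would have to get the new epoch from some durable store.

```python
LOW_BITS = 32                      # assumption: 32-bit "Lo" counter
LOW_MASK = (1 << LOW_BITS) - 1

def make_lsn(epoch: int, offset: int) -> int:
    """Pack the Hi (epoch) and Lo (offset) parts into one integer."""
    return (epoch << LOW_BITS) | offset

class Sequencer:
    """Hypothetical sequencer: hands out increasing LSNs within one epoch."""

    def __init__(self, epoch: int):
        self.epoch = epoch
        self.offset = 0

    def next_lsn(self) -> int:
        self.offset += 1
        if self.offset > LOW_MASK:
            raise OverflowError("Lo counter exhausted; a new epoch is needed")
        return make_lsn(self.epoch, self.offset)

def fail_over(old: Sequencer) -> Sequencer:
    """Replace a failed sequencer with one that has a greater Hi number."""
    return Sequencer(old.epoch + 1)

s1 = Sequencer(epoch=1)
a, b = s1.next_lsn(), s1.next_lsn()
s2 = fail_over(s1)                 # s1 "fails", s2 takes over
c = s2.next_lsn()
print(a < b < c)                   # ordering holds across the failover
```

Since the epoch sits in the high bits, any LSN from the new Sequencer compares greater than every LSN from the old one, regardless of how far the old counter had advanced.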

Apache Pulsar is similar to LogDevice but goes further: brokers manage connections, routing and message acknowledgements, while data is sent to a separate layer of Apache BookKeeper nodes, which store it in append-optimized log files.




Interesting. Microsoft's Tango paper had some interesting things to say about sequencers/sequences as well.


Amazing! We had lots of operational issues because of the coupling you mentioned.

One question though: will Presto support querying from LogDevice directly? :)



