Decentralized cluster membership in Rust (quickwit.io)
169 points by evanxg852000 on April 28, 2022 | 19 comments



Here is some nice background reading on SWIM, Serf, and Raft by HashiCorp.

https://www.hashicorp.com/resources/everybody-talks-gossip-s...


Thanks a lot.


Hi,

It's a good read. I have a few questions/comments:

- Given the description of the rumor-mongering approach vs. the anti-entropy approach, it looks like the anti-entropy approach:

-- has significant overhead in terms of network traffic/messages sent (since nodes are always chatting even when there are no changes in the cluster).

-- is slower to propagate a change of cluster state to all the nodes. Does that mean that in case of a node failure/shutdown, the cluster will be unstable for longer (since dead nodes will still receive queries)?

- The article mentions a "seed node" but doesn't define what that is.

- On a dynamic Quickwit setup (one that often downscales/upscales depending on load), it seems that the size of the metadata on each node will keep increasing, since the state of dead nodes is kept, unless the same node_unique_id can be reused after a downscale/upscale (but I don't know enough about how Kubernetes works to tell whether that's the case).


This is correct, yes. Rumor-mongering is more efficient both in terms of detecting/propagating a node failure and in terms of network overhead.

We stopped using SWIM because it is too hard to get right, and because scuttlebutt allows all nodes to expose a bunch of meta-information about themselves.

It took HashiCorp a lot of time and effort to patch memberlist into what it is today.
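
Roughly, the two gossip styles can be contrasted like this. This is a minimal Rust sketch with made-up types and function names, not chitchat's actual API:

```rust
use std::collections::HashMap;

// Hypothetical types for illustration only; not chitchat's API.
type NodeId = String;
type Version = u64;

/// Rumor-mongering: only freshly-changed entries ("rumors") are pushed,
/// and each rumor stops spreading after a few forwarding rounds.
struct Rumor {
    key: String,
    value: String,
    remaining_rounds: u8, // stop spreading once this reaches zero
}

fn rumor_round(rumors: &mut Vec<Rumor>, send: impl Fn(&Rumor)) {
    for rumor in rumors.iter_mut() {
        if rumor.remaining_rounds > 0 {
            send(rumor);
            rumor.remaining_rounds -= 1;
        }
    }
    // An idle cluster has no rumors, hence (almost) no traffic.
    rumors.retain(|r| r.remaining_rounds > 0);
}

/// Anti-entropy: every round, exchange a digest (node -> highest version seen)
/// with a random peer, then transfer only the entries the peer is missing.
/// This runs even when nothing changed, but the digest itself is small.
fn anti_entropy_round(
    local_versions: &HashMap<NodeId, Version>,
    send_digest: impl Fn(&HashMap<NodeId, Version>),
) {
    send_digest(local_versions);
    // The peer answers with the entries whose version is newer than ours.
}

fn main() {
    let mut rumors = vec![Rumor {
        key: "grpc_port".into(),
        value: "7281".into(),
        remaining_rounds: 3,
    }];
    rumor_round(&mut rumors, |r| println!("push rumor {} -> {}", r.key, r.value));

    let versions: HashMap<NodeId, Version> = HashMap::from([("node-1".into(), 42)]);
    anti_entropy_round(&versions, |digest| println!("digest: {digest:?}"));
}
```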


- In terms of message overhead there is not much difference. One detail is that each node keeps incrementing its heartbeat counter continuously, making it a state change that needs to be propagated.

- "Slower" here depends on the number of nodes selected to gossip with. If you choose a long gossip period, yes, but in a real system the gossip period is short enough not to notice it. So "slower" means slower compared to the rumor-mongering style, not as in having an inconsistent cluster state.

- (Thanks) I will probably amend this as a note: a seed node can be any node in the cluster that we know is reliable and will almost always be there; kind of like the ace within your cluster. There can be many of them.

- Garbage collection: "... We mitigate this by periodically garbage collecting obsolete nodes."
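
To make those two points concrete, here is a rough Rust sketch of a per-node heartbeat counter and periodic garbage collection; the types and names are hypothetical, not chitchat's actual implementation:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical structures for illustration only.
struct NodeState {
    heartbeat: u64,        // incremented by the node itself every gossip round
    last_advance: Instant, // when we last saw the heartbeat grow
}

struct ClusterView {
    nodes: HashMap<String, NodeState>,
}

impl ClusterView {
    /// Record a heartbeat value gossiped for `node_id`. Because the counter
    /// keeps incrementing, there is always *some* state change to propagate,
    /// which is how liveness piggybacks on the regular state sync.
    fn observe_heartbeat(&mut self, node_id: &str, heartbeat: u64) {
        let now = Instant::now();
        let entry = self.nodes.entry(node_id.to_string()).or_insert(NodeState {
            heartbeat,
            last_advance: now,
        });
        if heartbeat > entry.heartbeat {
            entry.heartbeat = heartbeat;
            entry.last_advance = now;
        }
    }

    /// Periodic garbage collection: drop nodes whose heartbeat has not
    /// advanced within the grace period, so per-node metadata does not
    /// grow forever as instances come and go.
    fn garbage_collect(&mut self, grace_period: Duration) {
        let now = Instant::now();
        self.nodes
            .retain(|_, state| now.duration_since(state.last_advance) < grace_period);
    }
}

fn main() {
    let mut view = ClusterView { nodes: HashMap::new() };
    view.observe_heartbeat("searcher-1", 7);
    view.observe_heartbeat("searcher-1", 8); // counter advanced: node is alive
    view.garbage_collect(Duration::from_secs(3600));
    println!("known nodes: {}", view.nodes.len());
}
```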


And it's open-source, MIT License.

Crate: https://crates.io/crates/chitchat

Github: https://github.com/quickwit-oss/chitchat


The source itself is commercial AGPL v3.0. I'm guessing this was an accident, so I opened https://github.com/quickwit-oss/chitchat/issues/42 to clarify.


Indeed, thanks for spotting that. We are fixing this.


Awesome!!!


Is this related to [Secure] Scuttlebutt (SSB), or just using the same name?


Unfortunately, there exist two different distributed protocols named scuttlebutt. The one referenced here is a gossip algorithm; the one you mention is a social-network protocol.

They appeal to different enough spheres, and scuttlebutt is such a good name that I can't fault whoever moved second on that one.


No, it's not.


This looks great. I might be giving this a shot for a use case we have. My main concern is that the docs.rs-generated documentation is hard to use: I don't exactly know how things fit together. I'm sure I could figure it out via tests, but more docs and usage examples would help a lot.

As for your search for SWIM in the Rust ecosystem: I found a pretty good (well documented and tested) crate: https://crates.io/crates/foca


Thanks a lot. You can always file an issue on https://github.com/quickwit-oss/chitchat. We can improve the documentation, add more examples, and, above all, learn from your use case.


Is there a more active way to indicate that a node expects to die? Say, a shutdown handler that manually evicts the node from the cluster with a last "I'm dead" state. Would this be useful?


Usually that's modeled in the application layer. In scuttlebutt you'd store a higher-level state as one of the node-specific parameters. The gossip protocol doesn't need to know that the node knows it's about to die, though there are optimizations to be made there.

Despite what the sibling commenter claims, in a production system you absolutely need that distinction -- not in terms of the reliability of the low-level protocol, but in terms of the application features you want to expose. A node about to be removed needs a chance to drain active connections.
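
For example, here is a sketch of how a "draining" status could be modeled as an ordinary gossiped key/value at the application layer. All the names and types here are hypothetical, not chitchat's API:

```rust
use std::collections::HashMap;

// Made-up representation of each node's gossiped key/value state.
type NodeKeyValues = HashMap<String, String>;

/// Before shutting down, a node flips its own status key. The change
/// propagates through the normal gossip path like any other update.
fn announce_draining(self_state: &mut NodeKeyValues) {
    self_state.insert("status".to_string(), "draining".to_string());
}

/// Peers routing queries simply skip nodes that have announced they are
/// draining, giving them time to finish in-flight work before they vanish.
fn routable_nodes<'a>(
    cluster: &'a HashMap<String, NodeKeyValues>,
) -> impl Iterator<Item = &'a String> {
    cluster
        .iter()
        .filter(|(_, kv)| kv.get("status").map(String::as_str) != Some("draining"))
        .map(|(node_id, _)| node_id)
}

fn main() {
    let mut me = NodeKeyValues::new();
    me.insert("grpc_port".into(), "7281".into());

    let mut cluster = HashMap::from([("searcher-1".to_string(), me.clone())]);

    // Shutdown handler: mark ourselves as draining, then let gossip spread it.
    announce_draining(cluster.get_mut("searcher-1").unwrap());

    let remaining: Vec<_> = routable_nodes(&cluster).collect();
    println!("routable nodes: {remaining:?}");
}
```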


Then you have to differentiate between accidental and intentional death. Much more robust not to make that distinction.


Depending on the need, it can be implemented on the client side as well as in chitchat. We actually thought of this feature as a way for a node to cleanly exclude itself from the cluster (e.g. for an application update). We are still evaluating how necessary this is, because a node crash or a normal shutdown can just work as-is. Future experience will surely tell us.




