SDPaxos: Building efficient semi-decentralized geo-replicated state machines (muratbuffalo.blogspot.com)
56 points by mattdemon 9 days ago | 13 comments

After going through the article, I'm not seeing anything that supports or justifies the phrase "decentralized" or even "semi-decentralized". The terminology should be "distributed".

Claiming this is semi-decentralized is confusing, and it seems to borrow from the recent success of decentralized systems (like IPFS, ours, GUN, and others) without being honest: their system is distributed, not decentralized. In the same way, "Serverless" totally requires using servers.

Otherwise, very good article.

I'm not sure the distinction that you're making between "distributed" and "decentralized" is commonly accepted in the research community (or broader technology community). In this context, the authors appear to be using "decentralized" to contrast with the "centralized" nature of leader-based state machine replication protocols.

Can anyone familiar with the algorithm help answer a few questions?

1- Why can a C-instance come from any node without a Paxos phase 1a Prepare message? Is it because each node (R0 to R4) has its own distinct replication log for C-instances?

2- When the sequencer receives a C-Accept, why is it safe to assume this value was successfully accepted by the other replicas without receiving a Paxos phase 3 Commit?

3- When replicating large values, are the values only sent in C-instance messages and not in O-instance messages?

1) Yes.

1 & 2) In fact, the C-instance messages do not conflict with each other and get accepted immediately. These messages do not even need a ballotnum, but the ballotnum used is that of the O-instance, to denote a sort of epoch: which sequencer the sender thinks is still in charge.

3) If replication messages are large, you can just order the "commands" referring/pointing to them via Paxos, and not necessarily the data itself.
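
To make points 1 and 2 concrete, here is a minimal Python sketch, not the SDPaxos implementation. It assumes each replica keeps one C-instance log per proposing replica, so slots owned by different replicas cannot collide, and that a replica rejects a C-Accept carrying a ballot older than the one it already knows. The names (`Replica`, `on_c_accept`) and the rejection rule are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


class Replica:
    """Toy acceptor for C-instances (hypothetical names, not from the paper)."""

    def __init__(self):
        # One log per proposing replica: c_logs[owner][slot] = command.
        self.c_logs = defaultdict(dict)
        # Highest O-instance ballot seen; used here only as an epoch marker.
        self.ballot = 0

    def on_c_accept(self, owner, slot, command, ballot):
        # Assumption: a stale ballot means the sender follows an old sequencer.
        if ballot < self.ballot:
            return False
        self.ballot = ballot
        # Only `owner` writes to its own log, so there is no competing
        # proposal for this slot and no Prepare round is modeled.
        self.c_logs[owner][slot] = command
        return True


replica = Replica()
ok_r0 = replica.on_c_accept(owner="R0", slot=0, command="put x", ballot=1)
ok_r1 = replica.on_c_accept(owner="R1", slot=0, command="put y", ballot=1)
stale = replica.on_c_accept(owner="R2", slot=0, command="put z", ballot=0)
```

R0 and R1 both use slot 0 without conflict because each writes to its own log, while the message with the older ballot is turned away.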

For 3), I mean: in the O-instance, is the command itself replicated, or just the node id?

Yes, for O-instance it could be as small as the command id.
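
A small Python sketch of that idea, under assumptions: payloads are stored out of band and only a short id goes through the ordering step. The `PayloadStore` and `OrderedLog` names, and the use of a SHA-256 digest as the command id, are hypothetical choices for illustration; nothing here says how SDPaxos itself derives ids.

```python
import hashlib


class PayloadStore:
    """Holds large values outside the ordering path (hypothetical helper)."""

    def __init__(self):
        self._blobs = {}

    def put(self, payload: bytes) -> str:
        # Assumption: a content hash serves as the command id.
        command_id = hashlib.sha256(payload).hexdigest()
        self._blobs[command_id] = payload
        return command_id

    def get(self, command_id: str) -> bytes:
        return self._blobs[command_id]


class OrderedLog:
    """Stands in for the ordering step; it only ever sees ids."""

    def __init__(self):
        self.entries = []

    def append(self, command_id: str) -> int:
        self.entries.append(command_id)
        return len(self.entries) - 1


store = PayloadStore()
log = OrderedLog()
big_value = b"x" * 1_000_000
index = log.append(store.put(big_value))
# Replaying the log resolves each id back to its payload.
replayed = [store.get(cid) for cid in log.entries]
```

The ordered entry is a 64-character hex string regardless of how large the value is.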

Curious, why Paxos over Raft?

Note that there are 3-10 variants of Paxos depending on who you ask and how many research papers you've read.

Paxos has different performance characteristics, different implementations, and more maturity.

There is no simple answer to your question unless you make it more specific.

Considering we're talking about distributed state machines, I am curious why you would choose Paxos, a more complex algorithm, over Raft. Raft, to my knowledge, provides the same guarantees as Paxos and is simpler to understand and implement.

I am no expert on Paxos - just hoping for an explanation.

Paxos itself, as opposed to something built on top of it like Multi-Paxos, is in fact super simple compared to Raft.

Where it gets complicated is in trying to build a practical replicated log system using Paxos. Raft just happens to clearly and completely define this use case.

What SDPaxos or EPaxos try to achieve is good performance over WAN, something that Raft and any Paxos variant that relies on a stable leader are very bad at.

This can't be easily added to Raft, because the main reason the Raft algorithm is simpler is that it assumes a stable leader in every operation except leader election.

AFAIK, Raft is a form of Paxos, spec-ed out to do log replication.

Yes, and we're talking about distributed state machines, right? A Raft append-only log usually contains state machine commands.

Actually, Raft is just a simplified version of Multi-Paxos that minimizes the state space needed.
