
Yes, a system is available if one node doesn't respond and you can contact another. But that node will be unable to guarantee consistency.

If a node you can contact is required to guarantee consistency, there will be times when it has to refuse your request because other nodes cannot be contacted.

The author's point was that in any distributed system there is a non-zero probability of a network failure. While both clients and server nodes can retry connections, there is a non-zero probability that the problem will persist longer than your "availability agreement" allows. In that case, you have a choice: return potentially inconsistent data or refuse the request.
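To make that choice concrete, here is a minimal sketch of a replica's read path once retries are exhausted. The names `read`, `read_from_quorum`, `prefer_consistency` and `Unavailable` are hypothetical, and it assumes the quorum helper raises `TimeoutError` when the other replicas cannot be reached within the allowed time:

```python
from typing import Callable, Dict, Optional


class Unavailable(Exception):
    """Raised when a replica refuses a request instead of risking an inconsistent answer."""


def read(key: str,
         local_store: Dict[str, str],
         read_from_quorum: Callable[[str], Optional[str]],
         prefer_consistency: bool) -> Optional[str]:
    try:
        # Hypothetical helper: returns the agreed-upon value, or raises
        # TimeoutError once the retry/availability budget is used up.
        return read_from_quorum(key)
    except TimeoutError:
        if prefer_consistency:
            # Refuse the request: no answer rather than a possibly stale one.
            raise Unavailable(f"cannot confirm the latest value of {key!r}")
        # Answer from local state, which may be out of date.
        return local_store.get(key)
```

With a healthy network both settings return the same value; the flag only matters once the failure outlasts the retries, which is exactly the case described above.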

What you seem to be arguing is that the probabilities of failure, in particular of repeated failure, while non-zero, are effectively zero. The author would disagree: as they point out, the probabilities compound exponentially as the number of nodes increases. I think he's right and that you are wrong.
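A rough way to see the compounding effect, assuming (as a simplification) that each of n components fails independently with the same probability p over some time window: the chance that everything is fine is (1 - p)^n, which shrinks exponentially with n, so the chance of at least one failure grows quickly. The function name `p_any_failure` and the value p = 0.001 are illustrative only:

```python
def p_any_failure(p: float, n: int) -> float:
    """Probability that at least one of n independent components fails,
    assuming each fails with probability p over the same time window."""
    # All n components survive with probability (1 - p) ** n.
    return 1.0 - (1.0 - p) ** n


# Illustrative failure rate; real rates and correlations will differ.
for n in (1, 10, 100, 1000):
    print(n, round(p_any_failure(0.001, n), 4))
```

Real failures are often correlated (shared switches, racks, power), so this independence model is a simplification rather than a prediction.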

Paxos is the quintessential example of a highly available, consistent system. It is available as long as more than half of the nodes are up and able to communicate with each other. It remains consistent, regardless of the failure pattern. You really do only have to worry about a true network partition. This isn't a probabilistic argument in any sense.
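The majority rule itself is deterministic, so the sketch below keeps it separate from any probability estimate. `has_quorum` encodes "more than half of the nodes are up and able to communicate"; `p_quorum_available` is a hypothetical helper that only holds under the extra assumption that each node is independently reachable with probability q:

```python
from math import comb  # Python 3.8+


def has_quorum(reachable: int, n: int) -> bool:
    """Paxos-style majority: strictly more than half of the n nodes."""
    return reachable > n // 2


def p_quorum_available(q: float, n: int) -> float:
    """Probability that a majority of n nodes is up and connected,
    assuming each node is independently reachable with probability q."""
    # Binomial sum over every count k of reachable nodes that forms a majority.
    return sum(comb(n, k) * q**k * (1 - q) ** (n - k)
               for k in range(n + 1) if has_quorum(k, n))
```

Under that assumption, adding nodes raises the chance that a majority is reachable, which is the opposite trend from the "any single failure" calculation: a majority system does not depend on every node being reachable.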

As you have yourself pointed out, Paxos will in some cases (when half or fewer of the nodes are up) become unavailable, but it remains consistent in all cases where it is available.

So it is tolerant of partition and it sacrifices availability in favour of consistency. So it is CP, not CA.

Of course it will become either unavailable or inconsistent (or both) during a network partition. That's the essence of the CAP theorem.

But what does it mean to tolerate a partition? As if the system had a choice?

Any CA system is claiming to be consistent and available as long as the network doesn't partition. That's the strongest statement you can make under the CAP theorem, and Paxos certainly falls in that camp.

My problem with the original article was that it claimed that any individual network or node failure was a partition affecting the consistency or availability of the system. Paxos is a clear counterexample: it tolerates individual node and network failures, as long as a majority of nodes can still communicate, without sacrificing consistency or availability.

Once the network actually partitions (or half or more of the nodes become unreachable), then you are correct. The CAP theorem comes into play again and we must sacrifice either C or A, and Paxos sacrifices A.

* it never becomes inconsistent (C)
* it always returns either success or failure (A)
* sufficiently severe partitions kill it dead (!P)

Or does any system subject to hardware or power failure fail to count as "available"?
