> I'm not denying that Dynamo features more consistent availability, but it does so at the cost of either temporal consistency or a much larger amount of resources.

The trouble is that your article outlines problems with pretty much all quorum systems, including multi-Paxos: the setup where there's an elected leader -- elected by the first round of Paxos -- which then performs subsequent writes using a single-round-trip second round.
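
For concreteness, here is a minimal, hypothetical Python sketch of that write path: phase 1 (prepare) runs once to establish leadership, after which every write is a single phase-2 (accept) round trip to a majority of acceptors. The Acceptor/Leader classes and method names are illustrative, not taken from any real library.

    # Hypothetical multi-Paxos write path: phase 1 once, phase 2 per write.
    class Acceptor:
        def __init__(self):
            self.promised_ballot = 0
            self.log = {}                          # slot -> (ballot, value)

        def prepare(self, ballot):
            if ballot > self.promised_ballot:      # phase 1: promise to ignore lower ballots
                self.promised_ballot = ballot
                return True
            return False

        def accept(self, ballot, slot, value):
            if ballot >= self.promised_ballot:     # phase 2: accept while the ballot still holds
                self.log[slot] = (ballot, value)
                return True
            return False

    class Leader:
        def __init__(self, acceptors, ballot):
            self.acceptors = acceptors
            self.ballot = ballot
            self.next_slot = 0
            self.majority = len(acceptors) // 2 + 1

        def elect(self):
            # Phase 1 is paid once, not per write.
            return sum(a.prepare(self.ballot) for a in self.acceptors) >= self.majority

        def write(self, value):
            # One accept round trip per write while leadership holds.
            slot, self.next_slot = self.next_slot, self.next_slot + 1
            acks = sum(a.accept(self.ballot, slot, value) for a in self.acceptors)
            return acks >= self.majority

    acceptors = [Acceptor() for _ in range(5)]
    leader = Leader(acceptors, ballot=1)
    assert leader.elect() and leader.write("x=1")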

Incidentally, that is very close to what you've proposed, except you've chosen not to perform quorum reads. That is perfectly fine, but there are also hidden costs -- not only do you need a leader election, you also need synchronization barriers. Before a node can handle writes or serve the latest reads, it must back-fill its transaction log up to the last entry (which it does by reading from peers).
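
Roughly, that barrier looks like the sketch below (assuming hypothetical log_tail()/read_entry() peer interfaces): a newly promoted node back-fills its log from a quorum of peers up to the highest entry any of them reports, and only then starts serving.

    # Hypothetical catch-up barrier run by a newly elected leader before serving.
    def catch_up(local_log, peers, majority):
        tails = sorted(p.log_tail() for p in peers)   # highest slot each reachable peer knows
        if len(tails) < majority:
            raise RuntimeError("no quorum reachable; stay unavailable")
        target = tails[-1]                            # highest entry seen anywhere
        for slot in range(len(local_log), target + 1):
            for p in peers:
                entry = p.read_entry(slot)
                if entry is not None:
                    local_log.append(entry)
                    break
            else:
                local_log.append(None)                # gap: no peer had this slot
        return local_log                              # only now accept writes / serve reads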

So while you are saving on read traffic (online reads only go to the master), you are now decreasing availability (contrary to your stated goal), and increasing system complexity.

You also hurt performance by requiring all writes and reads to be serialized through a single node: unless you plan to trigger a leader election whenever the node fails to meet a read SLA (which is going to end in disaster -- I am speaking from personal experience), you will have to accept that you're bottlenecked by a single node. With a Dynamo-style quorum (for either reads or writes), a single straggler will not drive up whole-cluster latency.
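
By contrast, a Dynamo-style quorum read fans out to all replicas and returns as soon as R of them have answered, so one slow replica stays off the critical path. A hedged sketch (the replica get() API and versioned responses are assumptions, not any particular client library):

    # Hypothetical Dynamo-style read: wait for R of N responses, ignore stragglers.
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def quorum_read(replicas, key, r):
        pool = ThreadPoolExecutor(max_workers=len(replicas))
        futures = [pool.submit(rep.get, key) for rep in replicas]
        responses = []
        for f in as_completed(futures):
            try:
                responses.append(f.result())          # assumed to be a (version, value) pair
            except Exception:
                continue                              # a failed replica is simply skipped
            if len(responses) >= r:
                break                                 # do not wait for the straggler
        pool.shutdown(wait=False)                     # slow requests finish in the background
        if len(responses) < r:
            raise RuntimeError("read quorum not met")
        return max(responses, key=lambda vv: vv[0])   # highest-versioned response wins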

The core point of Dynamo is low latency, availability, and handling of all kinds of partitions: clean partitions (long-term single-node failures), transient failures (garbage collection pauses, slow disks, network blips, etc.), or even more complex dependent failures.

The reality, of course, is that availability is neither the sole nor the principal concern of every system. It's perfectly fine to trade off availability for other goals -- you just need to be aware of that trade-off.

You may want to note the evolution of Google's services: BigTable -> Megastore -> Spanner. Essentially, they started out with a non-HA system (BigTable), found that every team had begun implementing HA on their own, then added quorum-based replication (Paxos), and finally added an optimization (TrueTime) to reduce read latency.




One quick point -- AFAIK Megastore uses quorum-based replication across data centers but uses BigTable within a data center, so it is master-based there. Is that not true? If so, your point does need to be qualified -- cross-DC failure models are (I assume) different from those within a DC.



