The paper includes an evaluation of multi-data-center replication in Figure 12. We assume that the clients are web servers, so each client is always close to one of the replicas, but not to all of them. The result we found is basically that TAPIR performs better in the multi-data-center case except when the leader is in the same data center as the client. So it depends on whether you can always guarantee that the leader is in the same data center as the client.
The abort rate continues to essentially track the commit latency, so TAPIR reduces the abort rate compared to OCC because it reduces the commit latency. At very high contention, locking is likely to make slightly more progress, but no system with strong consistency will be able to provide high performance there. If you are interested in other ways to optimize for the high-contention case, take a look at our work on Claret: http://homes.cs.washington.edu/~bholt/projects/claret.html
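To make the relationship concrete, here's a toy model (my own sketch, not from the paper) of why abort rate tracks commit latency: the longer a transaction's read-to-commit window, the more likely a conflicting write lands inside it and forces an OCC abort.

```python
import math
import random

def occ_abort_rate(commit_latency, write_rate, trials=10_000, seed=0):
    """Toy model of OCC on a single hot key: a transaction reads the key,
    then takes `commit_latency` time units to validate and commit. It
    aborts if any conflicting write arrives in that window, with writes
    modeled as a Poisson process at `write_rate` writes per time unit."""
    rng = random.Random(seed)
    aborts = 0
    for _ in range(trials):
        # P(no conflicting write in the window) = exp(-write_rate * latency)
        if rng.random() > math.exp(-write_rate * commit_latency):
            aborts += 1
    return aborts / trials
```

In this model, cutting the commit latency (e.g. by removing a wide-area round trip, as TAPIR does) shrinks the conflict window exponentially, so the abort rate drops with it.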
We also tested with high clock skew. The paper notes, "with a clock skew of 50 ms, we saw less than 1% TAPIR retries." Since the clients can use the retry timestamps to sync their clocks, it only adds an extra round-trip, so it still leaves TAPIR with the same latency as a conventional system, even in cases of extremely high clock skew.
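A hypothetical sketch of that retry path (class and method names are mine, not TAPIR's API): when the client's proposed timestamp is rejected, it retries at the server's proposed timestamp and uses the difference to correct its clock offset, so later commits take the fast path again.

```python
import time

class Server:
    """Toy timestamp server: accepts a prepare only if the proposed
    timestamp is not behind the latest timestamp it has seen."""
    def __init__(self):
        self.latest = time.time()

    def prepare(self, ts):
        if ts < self.latest:
            return False, self.latest  # reject, and propose a usable timestamp
        self.latest = ts
        return True, ts

class Client:
    """Toy client whose clock may be skewed relative to the servers."""
    def __init__(self, skew_s=0.0):
        self.offset = skew_s  # clock skew, in seconds

    def now(self):
        return time.time() + self.offset

    def commit(self, server):
        ts = self.now()
        ok, server_ts = server.prepare(ts)
        if not ok:
            # One extra round trip: retry at the server's proposed timestamp,
            # and use it to correct our clock offset for future commits.
            self.offset += server_ts - ts
            ok, _ = server.prepare(server_ts)
        return ok
```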
The full paper is here: http://delivery.acm.org/10.1145/2820000/2815404/p263-zhang.p...
TCP itself is a reliable transport over unreliable media, so running TCP on top of TCP means running two sets of reliability algorithms, ultimately doing more work than is needed. Running TCP over UDP (where UDP is unreliable) means you still get reliability from the TCP overlay, but you don't need to worry about the UDP layer: it can drop packets and the TCP overlay's algorithms will fix up the data stream.
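A minimal sketch of that layering (my own toy, not real TCP): the lower layer may silently drop packets, and the overlay's retransmission loop repairs the stream, so no reliability is needed underneath.

```python
import random

def udp_like_send(channel, pkt, loss, rng):
    """Lower layer ('UDP'): best-effort delivery, packets may vanish."""
    if rng.random() >= loss:
        channel.append(pkt)
        return True  # stands in for an ack from the receiver
    return False

def overlay_reliable_send(channel, data, loss=0.9, seed=0, max_tries=50):
    """Overlay layer ('TCP on top'): retransmit until the packet gets
    through. Stop-and-wait sketch; real TCP adds sequence numbers,
    acks, windows, and timers."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        if udp_like_send(channel, data, loss, rng):
            return True
    return False
```

The point of the sketch is that all retransmission logic lives in the overlay; the channel underneath is free to be lossy.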
"Consistent replication" would be using a protocol like Paxos to have the replicas decide on a single order of operations.
Which is almost always the case. Except, of course, if you build your data as a growing "collection of knowledge", where the order of facts doesn't matter. But this is cheating, since you're implicitly bolting an ordering mechanism on top of the system in that case.
The other interesting conclusion is that there are other workloads like this. For example, it is possible to build a reliable and better performing lock server in this way as well (and there is an implementation in the github repo). So, you'd get something similar to Chubby, but where the latency to the server is only a single round-trip in the common case.
Oh, hell yeah! Now that's great stuff. Can't wait to see the next step done by this or another team: building an alternative to the F1 RDBMS that Google built on Spanner. That would give CockroachDB some competition.
I've just been watching the talk on this at https://www.youtube.com/watch?v=yE3eMxYJDiE. GoshawkDB has a very similar messaging and replication design. In fact, in some places, GoshawkDB's design appears to be a little simpler.
There are obviously many differences too: for example, GoshawkDB runs transactions on the client, GoshawkDB uses Paxos Synod instead of 2PC, and GoshawkDB clients only connect to one server, so there are 2 extra hops; but that's a constant, so from a scaling point of view it should behave the same.
One of the biggest differences is GoshawkDB uses Vector Clocks (that can grow and shrink) rather than loosely synchronized clocks.
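For readers unfamiliar with vector clocks, here's a minimal dict-based sketch (my own, not GoshawkDB's implementation; dict entries can be added or dropped, loosely matching the "grow and shrink" point, though the actual grow/shrink policy is GoshawkDB-specific and not shown):

```python
def vc_merge(a, b):
    """Element-wise max of two vector clocks (dicts of node -> counter)."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def vc_happened_before(a, b):
    """True iff a < b: every entry of a is <= the matching entry of b,
    and at least one is strictly less. If neither a < b nor b < a,
    the two events are concurrent."""
    keys = a.keys() | b.keys()
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))
```

Unlike loosely synchronized physical clocks, this ordering is exact but only partial: concurrent events simply compare as unordered.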
This TAPIR work does look great - I had no idea that it was ongoing. I'll read through the paper too, but it's great that GoshawkDB has so many design ideas in common.
So, I knew it was interesting and possibly great work but didn't have any time to look at it. I'll move it up the backlog a bit. Maybe at least give the papers and stuff a good skimming tonight. :)
Note: parallel, ongoing, similar work going unnoticed is a huge problem in both IT and INFOSEC. I have a meme here about how we constantly reinvent the same solutions, make the same mistakes, or move at a slower pace due to lack of connections with similar work. I keep proposing something like a free and open combo of ACM or IEEE, with members who are academics, FOSS coders, and pros... people who contribute at least in writing. Stash papers, code, and forums there, so the odds of serendipity increase. Thoughts?
I'm about to leave for work, but a quick question. What I want to see is an open equivalent of Google's F1 RDBMS: the best one. Does yours already provide its attributes, is it more like Spanner instead, or what? Aside from CockroachDB, where is OSS on an F1 competitor?
I'm not sure if it's worth trying to compare anything to Spanner or F1 because unless I'm mistaken, no one outside of Google can use F1 or Spanner - they're not released. So who knows what the performance of these things actually is? There's no way to verify the claims made in any Google paper.
"I'm not sure if it's worth trying to compare anything to Spanner or F1 because unless I'm mistaken, no one outside of Google can use F1 or Spanner - they're not released. So who knows what the performance of these things actually is? There's no way to verify the claims made in any Google paper."
I think it's worthwhile for these reasons:
1. They describe it in enough detail in their papers for comparisons or maybe clones to be made.
2. That's led to competitors or open-source clones of tech before. Remember map reduce?
3. They've deployed it operationally.
4. Since when can Google not deliver a backend tech it claims to have built? I thought their rep was solid on that stuff.
So, Google already has a strongly-consistent, scalable, multi-datacenter RDBMS with NoSQL-like performance. If it's good on enough workloads, that's the best thing I've heard of in that category since NonStop SQL. The GPS thing, while a brilliant hack, might be hard for adoption. That's an improvement area the CockroachDB people are already targeting. A full clone or competitor could rock the DB world as a great alternative to Oracle clusters or NonStop for mission-critical workloads where some delays are tolerable.
How does it compare with Redis, Aerospike, Tarantool, Couchbase/Membase, Memcached, VoltDB, LevelDB, Kyoto Cabinet, Riak, Cassandra, RocksDB, LMDB, Neo4j, HBase, ArangoDB, Voldemort, FoundationDB?