
Slog: Cheating the low-latency vs. strict serializability tradeoff - evanweaver
http://dbmsmusings.blogspot.com/2019/10/introducing-slog-cheating-low-latency.html
======
taldo
How is Slog different from (or better than) multiple asynchronously-replicated
regional deployments of Spanner? Or, even simpler, a single Spanner deployment
where different sets of groups have different replication configurations. That
is, in fact, a fairly common deployment of Spanner internally at Google, with
different sets of voting replicas for data that's most frequently accessed
from different places. As an example: Paxos groups that host European users'
data have a quorum of voting replicas in the EU, while groups that host US
users' data have a quorum of voting replicas in the US. Data can be explicitly
moved across these different sets of groups, e.g., in response to a user
changing the location of their accesses. All the normal location resolution
methods still work as in the simple case of uniform replication configurations
(see Directories and Placement section in the original OSDI 2012 paper).

~~~
teraflop
The paper [1] compares Slog against Spanner, both in theoretical terms and
using benchmarks. If I'm understanding the paper correctly, I think it makes
sense why Slog is better -- or at least, can do better in theory.

Both systems operate very similarly for _local_ transactions that only touch
data "owned" by a single master region; they just relay the transaction to be
executed by the master. For multi-region transactions, Spanner uses a
coordinator to perform two-phase commit, which acquires locks on all regions
before allowing the transaction to proceed.
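To make the cost concrete, here's a toy sketch of coordinator-driven two-phase commit across regions. All names (`Region`, `prepare`, `commit`, `two_phase_commit`) are invented for illustration and are not Spanner's actual API; the point is just that locks are held across an inter-region round trip.

```python
# Toy two-phase commit sketch (invented names, not Spanner's API).
# Locks acquired in phase 1 are held across a WAN round trip, so
# conflicting transactions are blocked for the full inter-region latency.

class Region:
    def __init__(self, name):
        self.name = name
        self.locks = set()

    def prepare(self, txn_id, keys):
        """Phase 1: try to acquire locks on the keys this region owns."""
        if self.locks & set(keys):
            return False  # conflict: vote to abort
        self.locks.update(keys)
        return True

    def commit(self, txn_id, keys):
        """Phase 2: apply the transaction's writes and release the locks."""
        self.locks.difference_update(keys)

def two_phase_commit(regions, txn_id, keys_by_region):
    votes = [r.prepare(txn_id, keys_by_region[r.name]) for r in regions]
    if all(votes):
        for r in regions:
            r.commit(txn_id, keys_by_region[r.name])
        return "committed"
    # Abort path: release locks at any region that voted yes.
    for r, ok in zip(regions, votes):
        if ok:
            r.commit(txn_id, keys_by_region[r.name])
    return "aborted"
```

The `prepare`/`commit` calls each stand in for a cross-region message, so a conflicting transaction can be blocked for two WAN round trips per commit.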

Slog does something similar, but effectively _pipelines_ the locking to
achieve higher throughput. First there's a global coordination step that
globally-orders the transaction, without any locking (which means this step
can use batching for high throughput). Then, each region's master
independently acquires _local_ locks in that global order, and replicates
those locks as transactions so that replicas can deterministically apply them
in the same order. Finally, each replica independently executes the
transaction once it sees that all of the locks have been acquired. So a lock
blocks the _execution_ of conflicting transactions, but it doesn't block their
_replication_. Once the replication is done, the locking overhead of actually
executing the transactions should be comparable to a non-distributed DB.
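A minimal sketch of that pipelined idea, with invented names (`RegionScheduler`, `run_log`) and none of SLOG's real machinery: a global log fixes the transaction order, each region enqueues its local lock requests strictly in that order without blocking, and a transaction "executes" once it sits at the head of every lock queue it touches.

```python
from collections import deque

# Illustrative sketch (all names made up, not SLOG's code): a global log
# fixes the transaction order; each region grants its *local* locks in that
# order. A lock blocks the execution of conflicting transactions, but not
# their replication, which has already happened before execution starts.

class RegionScheduler:
    def __init__(self, owned_keys):
        self.owned = set(owned_keys)
        self.lock_queues = {}   # key -> FIFO of txn ids; head holds the lock

    def ingest(self, txn_id, keys):
        """Called in global-log order; enqueues lock requests, never blocks."""
        for k in keys & self.owned:
            self.lock_queues.setdefault(k, deque()).append(txn_id)

    def holds_all_locks(self, txn_id, keys):
        return all(self.lock_queues[k][0] == txn_id for k in keys & self.owned)

    def release(self, txn_id, keys):
        for k in keys & self.owned:
            assert self.lock_queues[k].popleft() == txn_id

def run_log(schedulers, log):
    """Deterministically execute a global log of (txn_id, keys) entries."""
    for txn_id, keys in log:            # same order at every region/replica
        for s in schedulers:
            s.ingest(txn_id, set(keys))
    executed = []
    for txn_id, keys in log:
        # In the real system this check happens concurrently per transaction;
        # here we just verify each txn holds its locks before "executing" it.
        assert all(s.holds_all_locks(txn_id, set(keys)) for s in schedulers)
        executed.append(txn_id)
        for s in schedulers:
            s.release(txn_id, set(keys))
    return executed
```

Because the FIFO queues are filled in global-log order, every region grants locks in the same order, so replicas that replay the log independently reach the same serial schedule without any commit-time voting.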

All of this communication has a latency penalty, of course; there's no
avoiding that for a consistent distributed DB. But the point is that it
provides better throughput for transactions with conflicts. For transactions
that only touch one region, the latency is still just a single round-trip to
the master region, and that can be very fast if your client locality is high.

The benchmark results are heavily normalized, since it wasn't possible to do
an apples-to-apples comparison on the same replication topology. So they don't
demonstrate convincingly that Slog is _faster_ than Spanner, in numerical
terms. However, they do show that Spanner's throughput drops off much more
quickly with increasing contention, compared to Slog.

[1]: http://www.vldb.org/pvldb/vol12/p1747-ren.pdf

------
jumpingmice
"Cheating" is doing all of the work here. They send transactions to their
"home location", which in the worst case is on the other side of the planet,
and then count only the time to apply the transaction at that location, not
the latency between the client and the homed frontend. So it's low latency if
you draw the system boundaries that way, but from the client's perspective
it's still high latency.

~~~
abadid
Latency is measured from the client. In the example in the post (and more
details in the paper), you see the latency tail from when clients access data
that is far from them. The challenge is to make multi-home transactions no
worse than regular Paxos latency. In previous systems, this required multiple
rounds of communication across the homes that are involved in that
transaction. In systems like PNUTS, they would disable such transactions
altogether. SLOG's ability to handle such transactions with latency no worse
than Paxos is a big step forward.

------
kcolford
It feels like they've snuck in an element of deterministic databases. This
would explain why they don't pay the penalty of the 2-phase commit round-
trips. What's more interesting is how they've implemented this: they simply
wait for enough replicated updates to appear in the database to be confident
the result is strictly serializable. Of course, if those updates don't make
it because of a network partition, then the transaction will hang until the
partition heals. Hopefully they really can count on there never being a
network partition. Then again, a network partition would halt all related
multi-homed transactions anyway, so I guess it's a moot point.
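The deterministic-database element can be shown with a toy example (my own illustration, not SLOG's implementation): if execution is deterministic, every replica that applies the same ordered log reaches the same state, so replicas only need to agree on the log order, never on per-transaction outcomes.

```python
# Toy illustration (not SLOG's code): deterministic replay of an ordered log.
# With no clocks, randomness, or external reads, identical input order
# guarantees identical final state at every replica.

def apply_log(log):
    state = {}
    for op, key, value in log:
        if op == "put":
            state[key] = value
        elif op == "add":
            state[key] = state.get(key, 0) + value
    return state

log = [("put", "x", 1), ("add", "x", 2), ("put", "y", 9)]
replica_a = apply_log(log)
replica_b = apply_log(list(log))  # an independent replay of the same log
assert replica_a == replica_b == {"x": 3, "y": 9}
```

This is also why a partition only stalls execution rather than corrupting it: a replica that hasn't yet received part of the log just waits, and catches up deterministically once the entries arrive.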

~~~
abadid
Yes --- the post explicitly states that deterministic execution is a
prerequisite.

SLOG is CP from CAP, so indeed suffers from unavailability in the event of a
network partition.

------
karmakaze
CockroachDB has a different cheat where the voting nodes "follow the sun",
since different regions are most active at different times of a 24h period.
This has since been generalized to "follow the work", where each range keeps
its voting nodes near the location of highest activity. I'd like to see how
this benchmarks against Slog.

~~~
abadid
See what I wrote below regarding Spanner. The same thing applies to the
CockroachDB solution. If you run 2PC for multi-region transactions, that is
very slow (increases latency) and prevents conflicting transactions from
running for longer periods of time (decreases throughput).

