
Spanner: Google's Globally-Distributed Database - SriniK
http://research.google.com/archive/spanner.html
======
ChuckMcM
This was an interesting project at Google; it started when I was there, and it
was breaking things when I left. It is too bad that Ken Thompson didn't get at
least acknowledged for his role in making it happen.

I don't think it will be as influential as the original GFS was, but it's an
important piece of work that folks should study.

~~~
Locke1689
No, I think it's critical. I worked on one of the first services to ever use
Spanner when I was an intern. Lock-free read transactions are a game changer.
Short answer -- if your database system can't do lock-free reads, your
database is broken. That one feature allows one to do some incredible
performance optimizations.

~~~
linuxhansl
>if your database system can't do lock-free reads, your database is broken

Yep.

~~~
rdtsc
I know CouchDB doesn't do read locking. What are other ones out there?

~~~
coolestuk
Not doing read locking is not a game-changer.

Firebird doesn't do read locking. Neither does Lotus Notes. Both have been
around about 20 years.

~~~
Locke1689
Not doing read locking _alone_. Combine it with a planet-scale data storage
system...

------
linuxhansl
I work on HBase (the Apache version of BigTable). It makes me sad to see how
far ahead Google is compared to the rest of the world. :)

The notion of uncertain time is ingenious.

~~~
zaphar
I think that's more a factor of Google's scaling needs vs the rest of the
world. We needed to invent it first so we did.

~~~
state
That's a nice way to put it. That's exactly why these inventions are so
interesting: they seem to give insight into problems of another order of
magnitude.

------
lsb
Interestingly, the data storage seems similar to Rich Hickey's Datomic: "data
is versioned, and each version is automatically timestamped with its commit
time; old versions of data are subject to configurable garbage-collection
policies; and applications can read data at old timestamps."
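
To make the quoted behaviour concrete, here is a minimal sketch of a single multi-version cell. It is a toy illustration only, not how Spanner, BigTable, or Datomic are implemented; the class name `VersionedCell` and its methods are hypothetical, and timestamps are plain numbers.

```python
import bisect


class VersionedCell:
    """Hypothetical multi-version cell: each write is kept with its commit timestamp."""

    def __init__(self):
        self._timestamps = []  # commit timestamps, kept sorted
        self._values = []      # values, parallel to _timestamps

    def write(self, commit_ts, value):
        # Insert the new version at its sorted position by commit timestamp.
        i = bisect.bisect_right(self._timestamps, commit_ts)
        self._timestamps.insert(i, commit_ts)
        self._values.insert(i, value)

    def read(self, at_ts):
        # Return the newest version committed at or before at_ts, or None.
        i = bisect.bisect_right(self._timestamps, at_ts)
        return self._values[i - 1] if i else None

    def gc(self, keep_after_ts):
        # Assumed policy: drop versions older than keep_after_ts, but keep the
        # newest version at or before it so reads at keep_after_ts still work.
        i = bisect.bisect_right(self._timestamps, keep_after_ts)
        cut = max(i - 1, 0)
        del self._timestamps[:cut]
        del self._values[:cut]
```

After writing "a" at timestamp 10 and "b" at timestamp 20, a read at 15 sees "a" and a read at 25 sees "b"; once `gc(20)` has run, the read at 15 no longer finds a version.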

~~~
tdg
That's exactly like BigTable[1]. It makes sense that they built on top of
that.

[1]
[http://static.googleusercontent.com/external_content/untrust...](http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf)

~~~
akkartik
But you can mutate BigTable cells. Datomic seems dramatically different in
that respect.

~~~
Evbn
Can you? Or do some apps just always ask for the latest timestamped version
when they read?

~~~
akkartik
You _could_, but it's not enforced. In practice, teams at Google seem to use
the time axis in myriad ways, and seldom the way Datomic does.

Also, always reading the most recent timestamp doesn't use time like Datomic
does. You aren't querying by time and so on.

------
Nitramp
I think the major contribution in this paper is how to do consistent snapshot
reads in a distributed system without a common reference clock, i.e. the use
of True Time.

Many databases use some sort of MVCC, but they operate on a single node or in
a closely connected cluster. This paper shows how to achieve the same
properties in a system spanning continents.

------
linuxhansl
Another observation that struck me when I read this (and after reading the
Percolator and Megastore papers) is how there is a convergence of the
"traditional" relational DB world and the "new NoSQL" world. Relational
Databases are becoming more scalable, partially with new technology, partially
by shedding features in some scenarios. And the NoSQL stores are becoming
less so (it was never really about "NoSQL" anyway, but that's a different story).
All of these stores have layers or features that bring them closer to the
traditional SQL/relational model.

Spanner appears to strike a nice middle ground.

------
hellooo
Is Spanner written in cc or Java?

~~~
kaib
cc

------
moondowner
Another research publication from Google that's more-than-worth reading.

These just pile up, I must find time and get my hands on them...

------
sudhirj
This looks like the High-Replication Datastore which is now the default in App
Engine - Paxos replication, a choice between strong and eventual consistency
and tablet sharding. Interesting that they've already built it and it's
available for everyone to use.

------
tete
Fun fact: Spanner means voyeur in German slang.

Anyway, looks like a very exciting project. One could come up with so many
applications.

------
kleiba
Interestingly, "Spanner" is German for "voyeur". Coming from Google it's
almost kind of ironic.

~~~
dmayle
Even more interesting, "Spanner" is English for "something that spans", as in
a database spanning the world.

Maybe it's a bit snarky, but I really don't see how you can read into
something like that. It reminds me of the following Jack Handey quote:

Maybe in order to understand mankind, we have to look at the word itself:
"Mankind". Basically, it's made up of two separate words - "mank" and "ind".
What do these words mean? It's a mystery, and that's why so is mankind. - Jack
Handey

~~~
huxley
A spanner is also British English for what North Americans call a wrench.

~~~
regularfry
Also, colloquially, for an idiot.

------
pwpwp
Transactions don't scale. They really need to use NoSQL.

~~~
Evbn
Did you read the first page? BigTable has no transactions, and scales, but is
a pain for apps that need consistency. Spanner adds transactions for apps that
need it, at scale, charging a tax in the form of latency.

Using two different clock technologies per node (GPS and atomic!) and light
speed networking helps make this manageable.

Fault-tolerant _time_!

~~~
zaphar
We read about this at work at Google a few months ago in a reading group
(perks of the job), and we spent almost the entire time talking about the
timestamps. It is perhaps the most important piece of this paper. Fault
tolerant time is right.

~~~
tonyarkles
Yeah, I've been doing research in distributed systems and the timestamp part
of this paper is incredibly interesting to me. It's awesome that I might
actually get to cite something more recent than Lamport.

