
Google Spanner's Most Surprising Revelation: NoSQL is Out and NewSQL is In - aespinoza
http://highscalability.com/blog/2012/9/24/google-spanners-most-surprising-revelation-nosql-is-out-and.html
======
nostrademons
It's really funny to watch tech journalists try to write about Google
infrastructure from the outside, based only on one paper...

Hell, it's usually really funny just to watch tech journalists try to write.

~~~
pron
High Scalability is a serious and informative blog, and I wouldn't dismiss it.
This isn't TechCrunch. Its focus is less on rigorous scientific analysis of a
narrow field than on a good, moderately deep overview of anything to do with
data scaling. And it's doing a darn good job.

~~~
chc
In case you aren't aware, nostrademons is a Googler. He is pretty well-
qualified to dismiss it in this particular case.

~~~
ArbitraryLimits
Is he similarly well-qualified to dismiss an entire profession?

~~~
nostrademons
The second sentence is admittedly a cheap-shot and a bit overgeneral. I
thought about editing it out, but people had already responded to it and I
hate when people ninja-edit the part of a post that I'm responding to.

~~~
ArbitraryLimits
Well, thanks.

------
sigil
Buzzword headline aside, the Spanner paper is great and worth your time. As are
the BigTable paper, the Dremel paper, and the Paxos Made Live paper.

I read the Google whitepapers and wonder, is there anywhere else one can go to
work on real solutions to distributed systems problems? At smaller scales you
can cheat -- you don't need Paxos, you can get away with non-consensus-based
master / slave failover. You can play the odds with failure modes. At Google's
scale, you can't: behavior under normally unlikely failures matters,
probability matters, CAP matters.

~~~
espeed
Look at Titan (<http://thinkaurelius.github.com/titan>), a new distributed
OLTP graph database that has a storage layer that adds distributed
transactions to pluggable backends, such as HBase and Cassandra. It's by the
team that created Tinkerpop Blueprints and Gremlin, the graph traversal
language.

You can read more about it in Matthias's PhD dissertation
([http://www.knowledgefrominformation.com/category/publication...](http://www.knowledgefrominformation.com/category/publication/)).

Also see Calvin:

"Calvin can run 500,000 transactions per second on 100 EC2 instances in
Amazon’s US East (Virginia) data center, it can maintain strongly-consistent,
up-to-date 100-node replicas in Amazon’s Europe (Ireland) and US West
(California) data centers---at no cost to throughput."

"Calvin is designed to run alongside a non-transactional storage system,
transforming it into a shared-nothing (near-)linearly scalable database system
that provides high availability and full ACID transactions. These transactions
can potentially span multiple partitions spread across the shared-nothing
cluster. Calvin accomplishes this by providing a layer above the storage
system that handles the scheduling of distributed transactions, as well as
replication and network communication in the system. The key technical feature
that allows for scalability in the face of distributed transactions is a
deterministic locking mechanism that enables the elimination of distributed
commit protocols."

[http://cs.yale.edu/homes/thomson/publications/calvin-
sigmod1...](http://cs.yale.edu/homes/thomson/publications/calvin-sigmod12.pdf)
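
To make the quoted idea concrete, here is a minimal sketch of deterministic ordering, not Calvin's actual code. It assumes the replicas have already agreed on one global transaction log, and it leaves out the lock manager, partitioning, and networking entirely. The names `transfer` and `run_replica` and the account data are made up for illustration. Because every transaction's outcome depends only on the state it sees, replicas that run the same log in the same order end up identical without a distributed commit round.

```python
def transfer(src, dst, amount):
    """Build a transaction whose outcome depends only on the state it sees."""
    def txn(state):
        if state[src] < amount:
            return False  # deterministic abort: every replica decides the same
        state[src] -= amount
        state[dst] += amount
        return True
    return txn


def run_replica(initial, log):
    # Each replica executes the same agreed-upon log in the same order.
    state = dict(initial)
    outcomes = [txn(state) for txn in log]
    return state, outcomes


initial = {"a": 100, "b": 0, "c": 0}
log = [transfer("a", "b", 70), transfer("a", "c", 50), transfer("b", "c", 20)]

# Two hypothetical replicas, named after the regions in the quote.
virginia = run_replica(initial, log)
ireland = run_replica(initial, log)
print(virginia == ireland, virginia)
```

The second transfer aborts on both replicas for the same reason, so no replica has to ask another whether to commit.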

And Omid (<https://github.com/yahoo/omid>) is another somewhat similar system,
but it only works with HBase. Here's a comparison from the Omid team:
[https://groups.google.com/d/msg/omid-
project/BTue2jAH1iQ/ZP3...](https://groups.google.com/d/msg/omid-
project/BTue2jAH1iQ/ZP3wIDFyREsJ)

------
cgs1019
This article comes across as really cynical and entirely lacking in the kind
of rigor and detail I have previously found on highscalability. Spanner is
really mind-blowingly cool tech. I thought this article was much more
informative and worth the time to read:
<http://news.ycombinator.com/item?id=4562546>

~~~
efuquen
"Another complicating factor is that as Masters of Disk it’s not surprising
Google ..."

Masters of Disk? He seriously that that was a good line? One of the many
things that annoyed me about that post. Thanks for the links.

~~~
josephcooney
"He seriously that that was a good line?"

Ironic.

~~~
apu
Muphry's law

------
Dave_Rosenthal
Disclaimer: I'm a co-founder of a database company (FoundationDB) building a
scalable, ACID database.

I couldn't agree more with the main quote that they pulled from the paper,
expressing the difficulty of (even great) programmers having to "code around
the lack of transactions." Ease of development is one of the biggest benefits
of transactions.

However, another huge benefit that didn't get much play in the article is the
freedom that transactions afford you to build abstractions and other data
models on top of whatever you are given. In our product's case, a low-level
ordered K/V store is used for a storage layer and several different data
models are exposed on top (see <http://foundationdb.com/#layers>).
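
As a rough sketch of what a layer on top of a transactional ordered K/V store can look like: the `TinyKV` class and its `transact` and `range` methods below are toy stand-ins invented for this example, not FoundationDB's API, and the user/city data model is hypothetical. The point is only that a record and its secondary index can be updated together because the store applies a transaction all-or-nothing.

```python
import copy


class TinyKV:
    """Toy in-memory ordered K/V store with all-or-nothing transactions."""

    def __init__(self):
        self.data = {}

    def transact(self, fn):
        # Run fn against a private copy; publish it only if fn succeeds.
        working = copy.deepcopy(self.data)
        result = fn(working)
        self.data = working
        return result

    def range(self, prefix):
        # Ordered scan over tuple keys that start with the given prefix.
        return sorted(k for k in self.data if k[:len(prefix)] == prefix)


def put_user(kv, user_id, city):
    """A tiny 'layer': a user record plus a by-city index, kept in sync."""
    def txn(d):
        old = d.get(("user", user_id))
        if old is not None:
            del d[("by_city", old, user_id)]  # drop the stale index entry
        d[("user", user_id)] = city
        d[("by_city", city, user_id)] = b""
    kv.transact(txn)


def users_in(kv, city):
    return [k[2] for k in kv.range(("by_city", city))]


kv = TinyKV()
put_user(kv, "ann", "Paris")
put_user(kv, "bob", "Paris")
put_user(kv, "ann", "Oslo")  # moving ann updates record and index together
print(users_in(kv, "Paris"), users_in(kv, "Oslo"))
```

If the function passed to `transact` raises, the working copy is discarded, so a reader never sees a record without its matching index entry.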

I think the future of databases has a diversity of data models and query
languages (including SQL, document, K/V, columnar, etc.). I also think the
future of databases is ACID. It seems like more and more of the NoSQL early
adopters (and creators) are coming to the same conclusion.

~~~
dkhenry
I wonder how you see the publishing of the Spanner paper affecting your product.
Does it validate your product, seeing as there is a production system
with similar features to yours already in use, or is there a risk that Google
might provide this system as a service to other companies, possibly competing
with you?

~~~
Dave_Rosenthal
Well, it doesn't seem like it's their ambition, but there's no doubt that if
Google released all of their in-house data tools, including Spanner, that it
would drastically change the market.

We love to see this stuff, though. When we started building our product over
three years ago, the idea of a distributed, ACID database was sort of laughed
at. (I think the CAP theorem scared a lot of people off from building really
useful products.) FoundationDB isn't the same as Spanner, but they share some
of the same goals. We see that as a huge validation.

Thanks for the question.

~~~
akldfgj
What's not their ambition?

<http://cloud.google.com/products> /compute-engine.html /cloud-storage.html
/big-query.html /more-products.html

------
nolok
NewSQL ? Seriously ? Do we really need another low-quality buzzword for people
to re-use everywhere ?

~~~
mwexler
I don't know, I think it's helpful. It changes the conversation from being
about the access language to being focused on how a technology processes data
at scale. Well, almost...

Notice that even with the shift from "NoSQL" to "NewSQL", mentions of joins
continue to be conspicuously absent from many of these discussions. So, it's
worth noting that many of these NewSQL things are "almost but not quite SQL",
hence the value of a new word.

~~~
bcoates
I don't think Spanner is one of these fake-relational databases, it's an ACID
datastore that's used as underlying storage for the real RDBMS F1.

From the F1 paper (<http://research.google.com/pubs/pub38125.html>), it looks
like it's mostly intended as an improvement to sharded MySQL, by putting the
sharding where it belongs, down at the physical storage level, instead of up
at the client access level.

Maybe a better buzzword would be NoMySQL (NOracle?)

------
realrocker
Imagine an amateur programmer walking into this whole debacle.

~~~
wmf
Isn't that called Hacker News?

~~~
realrocker
I guess I should clarify my statement. But I won't.

------
aklein
Here is a Google engineer giving a keynote on Spanner:

<http://vimeo.com/43759726>

------
neovive
With some of the best programming minds in the world, it's interesting to see
that Google finds it more efficient for programmers to solve the performance
issues vs reimplementing core db functionality, such as transactions.

I like the direction of moving closer to original database theory concepts and
allowing the creative energy to focus on solutions to performance problems at
high-scale.

~~~
dmix
> best programming minds

They probably have a lot of great minds at the top of Google but there's still
going to be a significant amount of average ones, just like in any
company/structure.

------
kyt
"Maybe this time Open Source efforts should focus elsewhere, innovating rather
than following Google?"

There's a ton of innovative projects in the open source community, but it's
difficult to convince people to use them. Developing a clone of a Google tech
has a built in marketing advantage: "Google uses something like this."

------
cageface
So the correct and responsible thing for Google to do now would be to patent
the shit out of this and then sue back into the stone age anybody that
implements anything even vaguely similar, right?

I mean, the social fabric depends on companies protecting their innovations,
right?

------
zaidf
I am in favor of _any_ label that replaces "NoSQL" with something else. NoSQL
always rubbed me the wrong way. Besides not being a huge fan of the "No", it
really says nothing about what it _is_ (if it isn't SQL).

------
exabrial
So... transactions are cool again? Thank goodness!

~~~
jbigelow76
Transactions have always been cool, it's just that businesses that don't need
them have always been cooler in the world of tech journalism, so they
don't get much mention. If something goes wrong at Instagram, nobody really
cares if they don't roll back posting the picture of your pastrami
sandwich just because the #lolcatz tag got applied by mistake.

~~~
rbranson
I'd love to have actual distributed transactions that could scale indefinitely
and not create availability issues. We actually get a steady stream of user
complaints about inconsistencies between counter caches and what appears in
results. Worse are the inconsistencies that can happen between graph edges that
you want to partition in two different manners (e.g. following vs followers).

~~~
jaylevitt
Likewise, consistency's more important in facebook-y scenarios than you'd
think. The canonical example I've heard is that "defriend my boss" MUST always
be seen before "post 'I'm quitting!'"
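
A minimal sketch of why ordered commits give you that guarantee. The `VersionedStore` class is a toy invented for this example, and its single in-process counter is an assumption standing in for globally ordered commit timestamps. A reader picks a snapshot timestamp and sees every write committed at or before it, so any snapshot that includes the post also includes the earlier defriend.

```python
import itertools


class VersionedStore:
    """Toy multi-version store: every commit gets an increasing timestamp."""

    def __init__(self):
        self._clock = itertools.count(1)
        self._versions = []  # (commit_ts, key, value), in commit order

    def commit(self, key, value):
        ts = next(self._clock)
        self._versions.append((ts, key, value))
        return ts

    def read(self, key, at_ts):
        # Latest value of key among commits with timestamp <= at_ts.
        value = None
        for ts, k, v in self._versions:
            if ts > at_ts:
                break
            if k == key:
                value = v
        return value


store = VersionedStore()
store.commit("friends:me", {"boss", "alice"})
t_defriend = store.commit("friends:me", {"alice"})
t_post = store.commit("posts:me", "I'm quitting!")


def boss_sees_post(at_ts):
    # Both reads use the same snapshot timestamp.
    friends = store.read("friends:me", at_ts) or set()
    post = store.read("posts:me", at_ts)
    return post is not None and "boss" in friends


# No snapshot, early or late, shows the post while the boss is still a friend.
print(any(boss_sees_post(ts) for ts in range(0, t_post + 1)))
```

The bad outcome only becomes possible when the two reads are served from replicas at different points in time, which is the situation an eventually consistent store allows.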

------
peterwwillis
So is Spanner basically just a distributed mostly-relational database? It
seems like to find the record you want you just search the indices of nodes
like a B-tree and in a couple queries are left at the node with the records
you want. The downside is all that lost write performance but since it's
synchronous you can probably mitigate some of the rebalancing by having many
indices of indices, so only a couple nodes have to actually rebalance anything
on a write.

~~~
zaphar
Don't mistake Transactional with Relational. Just because a database has
transactions doesn't automatically mean it is Relational too.

------
mjs
"A complicating factor for an Open Source effort is that Spanner includes the
use of GPS and Atomic clock hardware." (!)

Can anyone explain why such an accurate clock is helpful? I can see that it's
needed if you create a document on the East Coast at about the same time as
you create a document on the West Coast, and you absolutely need to know which
was created first, but for most applications can't you just go with whatever
time the system that got the insert thinks it is??

~~~
jpollock
Distributed state machines are all about deciding in what order things
happened: that A happened before B. If you had a clock that everyone agreed
on, many of the problems would go away. The problem is that we don't have one
that everyone can agree on. Protocols like Paxos allow us to guarantee an
order will be agreed, but it can be slow. They seem to be using the clocks
with an error bound to perform a first pass ordering to see if a collision is
possible at all, or if they can skip a few steps. Caveat: I don't fully
understand that part of the paper yet...
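
The error-bound idea can be sketched roughly like this. It is an illustration, not Spanner's code: the `Interval` type, the helper names, and the 7 ms uncertainty are all assumptions made up for the example. Each clock reading is treated as an interval that is guaranteed to contain the true time, and two events can be ordered by clock alone only when their intervals do not overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A clock reading with an error bound: true time is in [earliest, latest]."""
    earliest: float
    latest: float


def read_clock(local_time: float, epsilon: float) -> Interval:
    # Hypothetical helper: widen a local reading by the assumed uncertainty.
    return Interval(local_time - epsilon, local_time + epsilon)


def definitely_before(a: Interval, b: Interval) -> bool:
    # a certainly happened before b only if a's latest possible time
    # is earlier than b's earliest possible time.
    return a.latest < b.earliest


def order(a: Interval, b: Interval) -> str:
    if definitely_before(a, b):
        return "a<b"
    if definitely_before(b, a):
        return "b<a"
    # Intervals overlap: the clocks alone cannot decide, so fall back to a
    # slower step (for example a consensus round, or waiting out epsilon).
    return "uncertain"


EPS = 0.007  # assumed 7 ms uncertainty, purely illustrative
east = read_clock(100.000, EPS)
west_far = read_clock(100.050, EPS)   # 50 ms later: intervals are disjoint
west_near = read_clock(100.005, EPS)  # 5 ms later: intervals overlap
print(order(east, west_far))
print(order(east, west_near))
```

The tighter the error bound, the rarer the "uncertain" case, which is where the GPS and atomic clock hardware earns its keep.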

------
gbog
> few organizations need to support transactions on a global scale

That doesn't ring true to me. My previous job was in the field of services to
administrations, nothing very uncommon, and we needed global transactions very
badly. I suppose any bank, insurance and even plane ticket trader would love
to have global transactions.

------
mooneater
Why does Google so often give details on their production infrastructure? I
would think they could get a bigger lead by keeping quiet.

------
novaleaf
a bit late, but bigtable has transactions... we use it a lot (but it limits
you to 5 objects at a time)

------
philthom
Yeah Silver Bullets Are Always Plated

------
effinjames
i like the way elasticsearch uses json to express queries (query-DSL)

~~~
effinjames
what's with the downvote? do i really care? the thing is, json could be used as
a replacement for sql, a safer sql.
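
a small sketch of the "safer" part, assuming a hypothetical index with `title` and `year` fields. the `build_query` helper is made up here; only the `bool`, `match` and `range` clauses come from the elasticsearch query DSL. the query is built as a data structure, so user input stays a plain string value and is never spliced into query text.

```python
import json


def build_query(user_text, min_year):
    # Hypothetical fields "title" and "year"; the clause names are query DSL.
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": user_text}}],
                "filter": [{"range": {"year": {"gte": min_year}}}],
            }
        }
    }


# Input that would be dangerous if concatenated into a SQL string.
hostile = 'spanner" OR 1=1 --'
body = json.dumps(build_query(hostile, 2012))
print(body)

# The serializer escapes the quote; the structure of the query is unchanged.
roundtrip = json.loads(body)
```

parameterized sql gets you the same protection, so this is more about the default being safe than about json being special.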

