* There is a native SQL interface in Spanner, rather than relying on a separate upper-layer SQL layer, a la F1 
* Spanner is no longer on top of Bigtable! Instead, the storage engine seems to be a heavily modified Bigtable with a column-oriented file format
* Data is resharded frequently and concurrently with other operations -- the shard layout is abstracted away from the query plan using the "distributed union" operator
* Possible explanation for why Spanner doesn't support SQL DML writes: writes are required to be the last step of a transaction, and there is currently no support for reading uncommitted writes (this is in contrast to F1, which does support DML)
* Spanner supports full-text search (!)
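The "distributed union" idea from the third bullet can be sketched in a few lines (a hypothetical illustration, not Spanner's actual operator or API): the plan contains one logical union whose shard list is resolved only at execution time, so resharding never invalidates the plan.

```python
def distributed_union(current_shards, subquery):
    """Fan `subquery` out to every shard that currently exists and merge results."""
    rows = []
    for shard in current_shards:   # shard list resolved at run time, not plan time
        rows.extend(subquery(shard))
    return rows

# Toy example: a table split across three shards, with a filter pushed
# below the union so it runs shard-locally.
shards = [
    [{"id": 1, "score": 10}],
    [{"id": 2, "score": 55}, {"id": 3, "score": 80}],
    [{"id": 4, "score": 95}],
]
hits = distributed_union(shards, lambda s: [r["id"] for r in s if r["score"] > 50])
assert sorted(hits) == [2, 3, 4]
```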
My understanding from the 2012 paper was that it doesn't support nested transactions, even at the storage layer.
Can anybody with insider knowledge say whether it was even a feature requested by Google devs internally?
> Overcoming this limitation requires supporting reading the uncommitted results within an active transaction. While we have seen comparatively little demand for this feature internally, supporting such semantics improves compatibility with other SQL systems and their ecosystems and is on our long-term radar.
No internal demand, but on "long-term radar".
For me the fascinating thing is looking at the list of authors and recognizing so many from the 2005-2012 Microsoft SQL Server team. Folks I know personally as exceptional performers. Same when I look at the Aurora papers. I see this as the result of Ballmer's famous HR initiatives and the massive brain drain that occurred at Microsoft around 2010 or so.
Add to this the lack of vision and direction, catastrophic acquisitions, and dismal flagship product releases. At the time there were running jokes about the inbox filling up with "After 15 years, it's time to send that email" subject lines...
This engineer was clearly a cheerleader for the product, so I'm dubious as to whether that will actually happen, but it's clear that they have quite a bit of confidence in it.
> we made the service available externally to all Azure Developers in the form of Azure DocumentDB. Azure Cosmos DB is the next big leap in the evolution of DocumentDB and we are now making it available for you to use. As a part of this release of Azure Cosmos DB, DocumentDB customers (with their data) are automatically Azure Cosmos DB customers.
From my understanding, DocumentDB ≈ Dynamo, while Cosmos DB would be closer to Cloud Spanner.
Google's internal data will be stored on / migrated to Spanner, I expect sooner rather than later. We'll be using Spanner whenever we use Google. Furthermore, it's likely we'll use Spanner whenever we use a planet-scale system built on GCP, say Spotify.
As a developer I can't bring myself to code against something I can't install on my laptop. And as for enterprises, they won't use any data store that can't run their financials/HR/LDAP/SharePoint stack.
So, who uses them and for what?
Cloud Spanner, though, being fairly new and unusual (SQL, but no INSERT/UPDATE), doesn't yet have a big-name customer. Jda.com and quizlet.com were their reference customers.
Innovation is being kept from scrutiny, hidden behind closed doors. The kind of thing patents were meant to prevent, back when the system wasn't broken.
Google is one of the better players in this regard, at least telling the world what they're up to. Try to figure out how something like Amazon's systems work and you'll run into a deafening wall of silence.
Funny that we're so willing to trust these "clouds" when we know next to nothing about their internal workings. I don't think the honeymoon will last forever. Somebody will eventually abuse their position and within a few years everyone will be "on prem" again.
More or less agreed, but you'd probably have to concede that the existence of papers like this one and some of Google's other Spanner publications are admirable; they're being more open about the system's design than they have to be.
Of course Google knows that the system's secret sauce is not the concept itself, but the cost of its implementation, the infrastructural harness to support it, and the resources to reliably operate it. Even with a rough understanding on how Spanner works, it's still going to be difficult to ever migrate off it for practical reasons alone — who else is going to be able to build and run an alternative?
I have huge respect for what the Spanner team is doing, but this is a reason that Citus is also very interesting to me right now. You could conceivably start out with nothing but Postgres and migrate into a Citus cluster when (and if) you need to.
If a point comes where you realize that you need out, you could either (1) see if you can scale back down to simple Postgres, (2) host your own cluster on your own infrastructure with the Citus source code, or (3) migrate onto your own Postgres sharding scheme à la Instagram. At no point do you lock yourself into custom GRPC APIs which are going to be ~impossible to get off of.
GCP and AWS provide hugely useful foundational IaaS, but they're incentivized to move beyond that layer and provide more custom solutions that (1) provide better margins, and (2) lock you into their services. As people and companies building on top of these clouds, we should be looking for whatever opportunities we can to keep our stacks generic so that AWS <-> GCP <-> Azure migrations are possible, even if a last resort.
To Google's credit, they are actually moving to a more open cloud environment. They started with Google App Engine, which definitely had lock-in with its own custom API. But now they are pushing container services and Kubernetes, which are really easy to move to other clouds or run on your own servers.
I wonder what would happen if the open source community built a viable alternative to the cloud IaaS. Like OpenStack but not a failure :). OpenFlow has shown promise and could form the core for an open IaaS. Network virtualization is the hardest part.
I would think that the hardware itself is the hardest part. Companies move to the cloud because it reduces their internal ops team, and you can scale hardware up and down extremely easily.
VMware comes close to offering the same thing on premises, but Oracle is too obsessed with wringing money from it to let it reach its full potential.
If my memory serves, it's the second paper they've ever published (the other one being the famous Dynamo paper).
I think overall this has been a good thing. Had they open-sourced, we'd all be using their stuff, but there wouldn't have been the many competing open source big data projects. The fact that the rest of the world was forced to reimplement what Google did created a beehive of open source technologies that is going places Google never did.
It looks to me like recent history contradicts this point. Amazon is responsible for the Dynamo paper, a mandatory thing to know if you're going to deal with open source NoSQL DBs. All the big players have released how they work internally, giving us Storm and Heron, Luigi, LevelDB, and so many others. Surely you have heard of Kafka? Coming from a big one as well.
I seriously can't understand where you get this from. The amount of internals that is shared is really interesting.
If you want something open source, CockroachDB is the closest right now.
Google's inter-dc traffic flows entirely on private links rather than on the public internet, which is very hard for any other company to match on a global scale.
But you don't need an atomic clock to get Spanner's guarantees. CSACs are more convenient in terms of setup and requirements, but GPS clock sources will do just fine for Spanner's 6ms quantum. You can buy a commercial-off-the-shelf GPS NTP appliance (https://www.microsemi.com/products/timing-synchronization-sy...) and run signal lines from it to all your machines; or cobble a similar solution together yourself, in true ham-radio fashion, using a GPS antenna + a Linux box + a UHF software-defined-radio card (i.e. a TV-tuner card) + GPSd + NTPd. (Or you could buy cheap USB GPS receivers and hook them up to each and every server in your DC -- but you'd need a very thin ceiling for that to work.)
Amusingly, the practicality of that last approach also means that you could run Spanner just fine on a cluster of Android phones. :)
I always hear this from CockroachDB folks and fans, but no details. What are the downsides?
As per my understanding, if a server/replica can't stay in sync time-wise, it can be identified and usually marked as a temporary replica failure -- which is the maximum extent to which Hybrid Logical Clocks can help us. The rest of the system has to deal with the consequences: a new replica must take its place to maintain the fault-tolerance level and start making a new copy -- work on the order of the amount of data stored in the out-of-sync replica. I assume this puts a lot of different design requirements on the storage engine; also, the make-a-copy and cancel-copying operations would eat network traffic, and thus throughput.
IIRC Google Spanner uses atomic clocks on only a few servers per data center, because cross-data-center latencies (over the internet) are much higher and more erratic. So CockroachDB would have a much higher rate of temporary failures due to clocks going out of sync, with the associated downsides. It would be helpful if the CockroachDB guys could shed some light on this.
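To make that failure mode concrete, here is a toy Hybrid Logical Clock in Python (the names, structure, and the 250ms MAX_OFFSET are my own illustrative choices, not CockroachDB's actual code): timestamps combine a wall-clock reading with a logical counter, and a peer whose wall clock is ahead by more than the tolerated offset is exactly the node that gets flagged as failed.

```python
import time

MAX_OFFSET_NS = 250_000_000  # assumed tolerated clock offset (~250ms under NTP)

class HLC:
    """Toy Hybrid Logical Clock: (wall time, logical counter) pairs."""

    def __init__(self, physical=time.time_ns):
        self.physical = physical  # injectable wall-clock source (nanoseconds)
        self.wall = 0
        self.logical = 0

    def now(self):
        """Timestamp a local or send event."""
        pt = self.physical()
        if pt > self.wall:
            self.wall, self.logical = pt, 0
        else:
            self.logical += 1
        return (self.wall, self.logical)

    def update(self, remote):
        """Merge the timestamp carried on an incoming message."""
        rwall, rlogical = remote
        pt = self.physical()
        if rwall > pt + MAX_OFFSET_NS:
            # The sender's clock is out of sync beyond the tolerated offset:
            # this is where a replica gets flagged as (temporarily) failed.
            raise RuntimeError("remote clock exceeds max offset")
        new_wall = max(self.wall, rwall, pt)
        if new_wall == self.wall == rwall:
            self.logical = max(self.logical, rlogical) + 1
        elif new_wall == self.wall:
            self.logical += 1
        elif new_wall == rwall:
            self.logical = rlogical + 1
        else:
            self.logical = 0
        self.wall = new_wall
        return (self.wall, self.logical)

# With a frozen physical clock, causality is still preserved by the counter:
hlc = HLC(physical=lambda: 100)
assert hlc.now() == (100, 0)
assert hlc.now() == (100, 1)
assert hlc.update((150, 3)) == (150, 4)  # remote ahead, but within the bound
```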
> > But you don't need an atomic clock to get Spanner's guarantees.
This comment continues "...but GPS clock-sources will do just fine for Spanner's 6ms quantum". Providing Spanner's guarantees with reasonable performance requires specialized hardware, but there are more options for that specialized hardware than just atomic clocks.
Note that Spanner itself uses both atomic and GPS time sources according to Google's publications; when we talk about "atomic clocks" we're usually talking about the entire category of specialized time-keeping hardware instead of distinguishing atomic clocks from GPS clocks.
> I always hear this from CockroachDB folks and fans, but no details. What are the downsides?
As we describe in our blog post (https://www.cockroachlabs.com/blog/living-without-atomic-clo...), CockroachDB on commodity hardware provides a slightly weaker consistency model than Spanner (serializable instead of linearizable), and latency is sometimes higher as we need to account for the larger clock offsets in certain situations.
If you do have a high-quality time source available, we have an experimental option to use a Spanner-like linearizable mode.
The pertinent part was: "Spanner always waits on writes for a short interval, whereas CockroachDB sometimes waits on reads for a longer interval. How long is that interval? Well it depends on how clocks on CockroachDB nodes are being synchronized. Using NTP, it’s likely to be up to 250ms. Not great, but the kind of transaction that would restart for the full interval would have to read constantly updated values across many nodes. In practice, these kinds of use cases exist but are the exception."
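The quoted trade-off can be illustrated with a toy sketch (my own, not Spanner or CockroachDB code), in integer milliseconds: a Spanner-style writer sleeps out its clock uncertainty at commit time, while a CockroachDB-style reader restarts only when it encounters a value timestamped inside its uncertainty window.

```python
def commit_wait_ms(commit_ts, uncertainty, now):
    """Spanner-style: block the write until commit_ts is definitely in the past."""
    return max(0, (commit_ts + uncertainty) - now)

def read_restart_needed(read_ts, value_ts, uncertainty):
    """CockroachDB-style: a value inside (read_ts, read_ts + uncertainty] might
    causally precede the read, so the read must restart at a later timestamp."""
    return read_ts < value_ts <= read_ts + uncertainty

# With TrueTime-class hardware (~7ms uncertainty), every write waits ~7ms:
assert commit_wait_ms(100_000, 7, 100_000) == 7
# Under NTP (~250ms offsets), only reads hitting a recent, uncertain write restart:
assert read_restart_needed(100_000, 100_100, 250)      # inside the window: restart
assert not read_restart_needed(100_000, 100_400, 250)  # clearly later: no restart
assert not read_restart_needed(100_000, 99_000, 250)   # clearly earlier: no restart
```

This matches the quote's point: the write-side wait is short but unconditional, while the read-side restart is long but only triggered by contended, freshly written data.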
This means you only need a few time receivers per datacenter, along with careful monitoring and a careful implementation of your time distribution; that distribution can happen through your Ethernet switches.
But you need your datacenters to stay synced even if you lose GPS. You can use an atomic clock for this (Cs or Rb). But, for the rest of us, a good GPS-disciplined double-oven crystal oscillator (OCXO) can get you within Spanner's time-sync requirements, IIRC. For example, this little one: https://www.microsemi.com/document-portal/doc_download/13341...
will do ±7 microseconds / 24h holdover. ("holdover" == operating when it's lost GPS.)
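A quick back-of-the-envelope check (my arithmetic, not from the datasheet) on how that holdover spec relates to the ~6ms Spanner uncertainty quantum mentioned upthread:

```python
drift_per_day_us = 7   # the quoted holdover drift: ±7 microseconds per 24h
budget_us = 6_000      # Spanner's ~6ms uncertainty quantum, in microseconds

# Days of total GPS outage before oscillator drift alone exhausts the budget:
days = budget_us / drift_per_day_us
print(round(days))  # 857 -- holdover drift is nowhere near the limiting factor
```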