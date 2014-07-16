Hacker News
Introducing Cloud Spanner – A Global Database Service (googleblog.com)
469 points by wwilson 3 hours ago | 184 comments





Congratulations to the Spanner team for becoming part of the Google public cloud!

And for those wondering, this is why Oracle wants billions of dollars from Google for "Java Copyright Infringement" because the only growth market for Oracle right now is their hosted database service, and whoops Google has a better one now.

It will be interesting if Amazon and Microsoft choose to compete with Google on this service. If we get to the point where you have databases, compute, storage, and connectivity services from those three at equal scale, well that would be a lot of choice for the developers!

> It will be interesting if Amazon and Microsoft choose to compete with Google on this service. If we get to the point where you have databases, compute, storage, and connectivity services from those three at equal scale, well that would be a lot of choice for the developers!

There are also plenty of choices evolving for developers who aren't looking for hosted solutions (which can sometimes be a showstopper for enterprise on-prem deployments). There's a growing ecosystem of distributed open-source databases to look out for too.

Take Citus, for instance – a Postgres-compatible distributed store which automatically parallelizes normal SQL queries across machines. It's as easy to set up as adding an extension, and people are doing some staggering things in prod with it.

Different audience from BigQuery and Spanner, but no less exciting.

Disclaimer: no professional association, but love their product and the team.

Craig from Citus here. Thanks for the kind words. We've seen a lot of people scale-out transactional workloads with Citus as well. In particular, we've seen a lot of multi-tenant apps that need to keep scaling beyond a single node when they're running into memory or compute issues.

If you are looking for something that is more Postgres flavored (meaning we're just an extension to it so you get all the good stuff of Postgres such as JSONB, PostGIS, etc.) then we hope we'd be a good fit. And we run a managed service on top of AWS as well (https://www.citusdata.com/product/cloud) built by the team that built Heroku Postgres. If curious on pricing you can find it at https://www.citusdata.com/pricing/

It would be very interesting to have your product at the $200-300 price point. Currently, the lowest tier starts at almost $2000 per month for the high availability version.

I'm not trying to compare on a per-MB level, but it would be nice for smaller scale workloads.

Helpful feedback, we do have a development plan for $99, but it's not really intended for production workloads. If you only have 10 GB of data we'd heavily recommend going with something like RDS or Heroku Postgres. At that amount of data single node Postgres works great.

As someone who works (in part) in the MS SQL field, is it irrational to be a bit worried about the effects some of these platform advances might have on one's career?

For example, being a MSSQL performance tuning expert requires years of experience and probably pays very well, but just the other day I read an anecdotal story where someone switched a large BI database to use columnar indexes, allowing them to replace very complex (extreme manual tuning to achieve acceptable performance) queries with just standard SQL with comparable performance.

How long until the scale, pricing, and now transparent & full(?) SQL compliance offered by these cloud platforms start to make traditional RDBMS platforms a niche?

Microsoft has a history of sales and support that will allow them a certain longevity. They also have less "brand hate" than Oracle. I don't think MSSQL is going to be like Sybase any time soon, but I probably wouldn't focus on that stack starting now if you are into the startup or California scene. For many places in the USA, MS is the way to go.

EDIT: Also, most DB users don't need global-scale databases.

People are abusing databases like MSSQL to do things they may not be good at. Large scale analytics is an example where databases like Infobright give amazing performance.

I write software as a developer. This is how I earn my livelihood.

Four years ago, I determined that while development work might seem to be near the top of the food chain, there will come a point where my work will be replaced by AIs.

This is not so different from how word processors replaced the specialist job of typesetters. Word processors make "good enough" typesetting. You can still find typesetters practicing their craft; the rest of us use word processors and don't even think about it.

At the time, I was learning to put the Buddhist ideals of emptiness and impermanence to practice, and to become more emotionally aware: the _main_ reason I had thought I would never be replaced by an AI writing software has more to do with wishful thinking and attachment than any clear-sighted look at this.

I also made a decision to work on the technologies to accelerate this. Rather than becoming intoxicated by the worry, anxiety, and existential anguish, I decided to try to face it. Fears are inherently irrational, but just because they are irrational does not mean it is not what you are experiencing. Fears are not so easily banished by labeling them as irrational. Denial is a form of willful ignorance.

Now, having said all that, whether our tech base will come to that, who can say?

Since then, I have been tracking things like:

Viv - a chat assistant that can write its own queries

DeepMind's demonstration of creating a Turing-complete machine with deep learning using a memory module.

I watched a tech enthusiast write a chat bot. He does not write software professionally. Talking with him over the months while he tinkered with it in his spare time, I realized that in the future, you won't have as many software engineers writing code; you would learn how to _train_ AIs when they become sufficiently accessible to the masses. Skills in coaching, negotiation, and management become more important than some of the fundamental skills supporting software engineering. And like typesetting, I can see development work being pushed down the eco-ladder.

It's not surprising to me to see that Wired article about coding becoming blue collar work. And even that will eventually be pushed down even further.

It's not surprising to me about Google's site-reliability engineering book, branding, and approach. I have done system admin work in the past, and I can already see traditional, manual sysadmin work being replaced.

It's easy to get nihilistic about this, but that isn't my point here either. I know the human potential is incredible, but I think we have to let go of our self-serving narratives first.

It'll be interesting to see how well customers will adopt this. When I was at one of the two companies you mentioned above, we tried adding global snapshots (a la TrueTime, which is the real innovation in Spanner, not the clocks) and demoed it to our DBAs/MVPs. They didn't understand what on earth was going on. They wanted something that worked with existing clients. We just gave them classic 2PC and they went home happy. I think that's the reason why Oracle will keep on chugging. There just aren't that many workloads that need this sort of scale. It is real cool technology though, and we always used to wonder why Google wasn't offering Spanner as a service.

>well that would be a lot of choice for the developers!

A sad choice though. The centralization of computation is likely not a good thing in the long run.

> The centralization of computation is likely not a good thing in the long run.

I agree. It only makes sense if you need special data for statistics, AI training, etc.

In all other cases the classic way of programming on pc and notebook is smarter. If you do everything in the cloud, what if you lose Internet connection? I had that experience several times over the last years.

The absolute level of computation available isn't changing at the consumer level. What's happening for the next decade is the destruction of businesses hosting their own IT infrastructure and moving it to a couple of core centers.

So, the computational "Gini index" is increasing, but no one is being thrown into computational poverty.

The parent comment doesn't seem to specify "consumer level" and the loss of businesses having their own infrastructure is equally troubling. Everyone is putting a lot of eggs in a very small number of baskets.

I would disagree about the character of the situation. This isn't about people putting eggs in a few baskets, it's that it's more efficient to have centralized chicken coops instead of every family in the world owning their own chickens.

Now, you could play with that analogy further and see some issues as well, but I don't think the issue here is centralized failure; all these data centers/"clouds" are at least good. The Cloud is about businesses focusing on core business and not supporting functions.

[Disclosure, I work on the Google Cloud team, I'm biased]

Monocultures are efficient, but not healthy ecosystems in the long term.

> the only growth market for Oracle right now is their hosted database service

This is not true. Oracle is far more than a database company nowadays in the same way that Microsoft is more than Windows. Oracle has been acquiring high-growth startups at a significant rate.

I was going by this document: http://s1.q4cdn.com/289076952/files/doc_financials/quarterly...

Which has their 'cloud services' doubling their contribution to revenue year over year and licenses losing 50% of their contribution to revenue year over year.

Their 'cloud' collateral is pretty opaque though.

Yes, the "cloud" includes all of their non-database offerings too which have been the focus of recent growth/acquisitions.

As a bit of a veteran in the database industry, I concur (at least about the impact on Oracle's database business). There is a lot of pent-up demand for anything that offers distributed consistency.

We've been seeing this demand at Fauna. FaunaDB offers distributed consistency, based on Raft and the Calvin protocol instead of depending on specific networking and clock hardware. We've seen a big part of our appeal is the ability to run FaunaDB across multiple cloud services.

What is the monetisation plan? Purely SAAS with on-premise or an open source version with support like postgres/mysql?

Amazon's Aurora databases seem to be solving the same problem, and are MySQL or Postgres compatible to boot.

Aurora is a 'better MySQL mousetrap', IMO.

This is a globally-available, nearly-CAP-beating datastore that powers one of the biggest websites on the internet.

It's not quite apples and oranges, but this is definitely a different problem they are solving.

That's vague. AWS also powers huge websites and Amazon is recommending Aurora as the "default choice" for most workloads.[1] There are certainly significant architectural differences but I would say we can definitely make a direct practical comparison.

[1] http://www.computerworld.com/article/2953299/cloud-computing...

Aurora is very cool but won't help you much after you vertically scale your master and still need more write capacity. With Cloud Spanner you get horizontal write scalability out of the box. Critical difference.

So if I'm understanding you, with Aurora all writes go to one master and you're constrained by the biggest instance AWS offers. Is that right?

Do you have a sense of what that limit is?

There's a pretty big price difference between Spanner and Aurora at the entry level so it's useful to explore this.

Yes, Aurora has a single write master, though it does have automatic write failover -- i.e. if the Aurora primary dies, one of your read replicas is promoted to the primary and reads/writes are directed to the new instance. That does constrain your primary's capabilities to the largest instance size (currently a db.r3.8xlarge).

I don't have a good idea what the upper limit is for an Aurora database setup.

While Aurora doesn't provide true horizontal scalability, the same-node scalability seems so strong it might allow many companies to stay single-node for quite a while.

For example, see this benchmark:

http://2ndwatch.com/wp-content/uploads/2016/09/Graph-3.jpg

from this article:

http://2ndwatch.com/blog/benchmarking-amazon-aurora/

Thoughts?

Aurora's other zone replicas are read-only. Probably no atomic clocks and GPS for time synchronization.

To be fair, Spanner's cross-region service is coming "later 2017".

It is not close to equivalent. But I do want to get a better feel for whether Google really has figured out how to do basically the impossible. I want to see if this truly scales horizontally, but if it does then competitors better hope for a much more detailed paper :)

The team here at Quizlet did a lot of performance testing on Spanner with one of our MySQL workloads to see if it's an option for us. Here are the test results: https://quizlet.com/blog/quizlet-cloud-spanner

What's the SQL and wire compatibility level? MySQL?

EDIT: Found quite a bit of my answers in your linked article:

> Cloud Spanner uses a SQL dialect which matches the ANSI SQL:2011 standard with some extensions for Spanner-specific features. This is a SQL standard simpler than that used in non-distributed databases such as vanilla MySQL, but still supports the relational model (e.g. JOINs). It includes data-definition language statements like CREATE TABLE. Spanner supports 7 data types: bool, int64, float64, string, bytes, date, timestamp[20].

> Cloud Spanner doesn't, however, support data manipulation language (DML) statements. DML includes SQL queries like INSERT and UPDATE. Instead, Spanner's interface definition includes RPCs for mutating rows given their primary key[21]. This is a bit annoying. You would expect a fully-featured SQL database to include DML statements. Even if you don't use DML in your application you'll almost certainly want them for one-off queries you run in a query console.

> Though Cloud Spanner supports a smaller set of SQL than many other relational databases, its dialect is well-documented and fits our use case well. Our requirements for a MySQL replacement are that it supports secondary indices and common SQL aggregations, such as the GROUP BY clause. We've eliminated most of the joins we do, so we haven't tested Cloud Spanner's join performance.

This seems like it'd prevent any kind of easy switch over to Spanner.

Just to be clear, the JOINs were removed for the vertical sharding prior to looking at Cloud Spanner. Cloud Spanner fully supports complex JOINs of many types (e.g. INNER, OUTER).

Details - https://cloud.google.com/spanner/docs/query-syntax#join-type...

Disclaimer: I work on Cloud Spanner

Sorry for the confusion but I meant the DML portion.

It sounded like you can only modify by primary key? Can you make a transaction that contains a query and a bunch of updates by PK ?

And yeah it makes it sound like writing an OEM adapter will be much more difficult.

Seems more like they want all the data that ever existed for the database.

>> Cloud Spanner doesn't, however, support data manipulation language (DML) statements. DML includes SQL queries like INSERT and UPDATE. Instead, Spanner's interface definition includes RPCs for mutating rows given their primary key[21].

Does this mean I need to rewrite my application?

My application uses an ORM and it typically converts my logic to SQL statements and fires them off to Postgres. Would I need to change it such that it doesn't issue INSERT / UPDATE statements?

> We've eliminated most of the joins we do, so we haven't tested Cloud Spanner's join performance.

The join performance is by far the most interesting part of this to me. A more traditional NoSQL solution sounds like it would have worked just as well for you, sans all the atomic clock fanciness. Joining across geographically disparate data is a real trick, and it seems like there would be some physical performance limits?

> So a query that accesses 10 rows in disparate parts of the primary key space will take longer than one where the keys reside on the same splits. This is expected with a distributed system.

No, why? Query can be executed in parallel.

BTW, isn't 20k/sec very low throughput for a 30-node installation? Cassandra can handle 50k+ (both writes and reads) on a single node. When most of your queries are trying to collect data from many nodes, it will scale almost linearly.

I don't think that comparison holds. It's easy to push 50k+ on a single node, you're basically only resource bound on that machine. Pushing 20k+ on something that's globally consistent spread out over so many instances is a different exercise entirely. It also depends on the level of consistency you're asking from Cassandra. You'd probably need to set this to EACH_QUORUM or ALL to mimic the behaviour Spanner gives you.

And yes Cassandra will scale linearly-ish as long as you're in the same datacenter. Try running a geo-distributed 30-node Cassandra ring and it's a whole different story at that level of consistency and availability.
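
To make the consistency-level point concrete, here is a minimal sketch of the usual quorum-overlap rule: a read is guaranteed to touch at least one replica that holds the latest acknowledged write only when the read and write replica counts together exceed the replication factor. The function names and the replication factor of 3 are made up for illustration; this is not Cassandra's API.

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def reads_see_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set must intersect every write set (r + w > n)."""
    return r + w > n


N = 3  # assumed replication factor, illustrative only

# Writing and reading a single replica: fast, but a read can miss the write.
print(reads_see_latest_write(N, w=1, r=1))

# Quorum writes plus quorum reads always overlap in at least one replica.
print(reads_see_latest_write(N, w=quorum(N), r=quorum(N)))
```

The cheap single-replica setting is the one that tends to produce the big per-node throughput numbers, which is why the comparison depends so much on the consistency level chosen.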

> Not every application can handle Spanner's ~5ms minimum query time, but if you can, then you can have that latency for a very high-throughput workload

This is the tradeoff we've all been looking for. Cool product, anyway!

Really a CP system but with the Availability being five 9s or better (less than one failure in 10^6)

How: 1) Hardware - Gobs and Gobs of Hardware and SRE experience

"Spanner is not running over the public Internet — in fact, every Spanner packet flows only over Google-controlled routers and links (excluding any edge links to remote clients). Furthermore, each data center typically has at least three independent fibers connecting it to the private global network, thus ensuring path diversity for every pair of data centers. Similarly, there is redundancy of equipment and paths within a datacenter. Thus normally catastrophic events, such as cut fiber lines, do not lead to partitions or to outages."

2) Ninja 2PC

"Spanner uses two-phase commit (2PC) and strict two-phase locking to ensure isolation and strong consistency. 2PC has been called the “anti-availability” protocol [Hel16] because all members must be up for it to work. Spanner mitigates this by having each member be a Paxos group, thus ensuring each 2PC “member” is highly available even if some of its Paxos participants are down."
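
A rough way to picture the quoted idea is a toy model where each 2PC participant is a replica group that can still vote as long as a majority of its replicas is up. This is only a sketch under that assumption: it does not implement Paxos, and the class and function names are hypothetical rather than anything from Spanner.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicaGroup:
    """One 2PC participant backed by several replicas (hypothetical model)."""
    name: str
    replicas_up: int
    replicas_total: int

    def has_majority(self) -> bool:
        return self.replicas_up > self.replicas_total // 2

    def prepare(self) -> bool:
        # The group can vote "yes" only if a majority can record the vote.
        return self.has_majority()


def two_phase_commit(groups: List[ReplicaGroup]) -> str:
    # Phase 1: collect a vote from every participant.
    votes = [g.prepare() for g in groups]
    # Phase 2: commit only on a unanimous "yes", otherwise abort.
    return "commit" if all(votes) else "abort"


# One replica down in group "a": the group still has a majority, so 2PC proceeds.
healthy = [ReplicaGroup("a", 2, 3), ReplicaGroup("b", 3, 3)]
print(two_phase_commit(healthy))

# Two replicas down in group "a": no majority, so the whole transaction aborts.
degraded = [ReplicaGroup("a", 1, 3), ReplicaGroup("b", 3, 3)]
print(two_phase_commit(degraded))
```

With plain single-node participants, losing any one machine blocks the commit; in this model it takes losing a majority inside one group.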

> with the Availability being five 9s or better (less than one failure in 10^6)

Anyone know how exactly this is defined for them? (Time? Queries? Results?)
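
For a sense of scale, here is a quick back-of-the-envelope sketch of what five 9s would mean under two of those readings, time-based and request-based. Which definition Google actually uses is not stated here, so both functions are just illustrative arithmetic.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Time-based reading: minutes per year the service may be unavailable."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def failed_requests(availability: float, total_requests: int) -> float:
    """Request-based reading: requests allowed to fail out of total_requests."""
    return (1.0 - availability) * total_requests


five_nines = 0.99999
print(round(downtime_minutes_per_year(five_nines), 2))  # about 5.26 minutes
print(round(failed_requests(five_nines, 1_000_000)))    # about 10 requests
```

By the request-based reading, one failure in 10^6 would correspond to six 9s rather than five, so the two figures in the quoted line are not quite the same claim.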

This release shows the different philosophies of Google vs Amazon in an interesting way.

Google prefers building advanced systems that let you do things "the old way" but making them horizontally scalable.

Amazon prefers to acknowledge that network partitions exist and try to get you to do things "the new way" that deals with that failure case in the software instead of trying to hide it.

I'm not saying either system is better than the other, but doing it Google's way is certainly easier for Enterprises that want to make the move, which is why Amazon is starting to break with tradition and release products that let you do things "the old way" while hiding the details in an abstraction.

I've always said that Google is technically better than AWS, but no one will ever know because they don't have a strong sales team to go and show people.

This release only solidifies that point.

This isn't entirely accurate. BigTable was Google's earlier cloud database, and it's certainly non-traditional, and you have to build your application without traditional consistency guarantees, the way you describe.

Spanner doesn't exactly hide the details, but it lets you make transactions that span multiple shards. You still eat the cost of the transaction, you're just free from having to implement it at the application level, which is a more difficult and error-prone way of doing things. The bottom line is that if you need consistency, it needs to be implemented somewhere in your stack. If you don't need consistency (analytics workloads come to mind) then you have more flexibility with your database.

Disclosure: Google employee, reconstructing what I know from published information.

>and release products that let you do things "the old way" while hiding the details in an abstraction.

However, by 'abstracting' this away, you're not being forced to think about failure domains. If there is ever a massive country-wide connectivity break to the wider Internet (feasible for lots of people inside censored countries), you'll be pretty pissed when you can't use the DB services for your servers in the Google-local datacenter that you still have connectivity to because it can't get quorum.

Cloud Spanner is currently a regional service, not a global service. So you would only lose availability for failures within the region.

The "old way" was sacrificing functionality such as transactions and joins to get scalability (BigTable, DynamoDB).

Google tried that a decade ago and found it lacking, this is why Spanner exists in the first place.

Well, I'd say the "old way" is SQL with joins and schemas and transactions, and the "new way" is KV with eventual consistency.

Chronologically, we have: SQL -> NoSQL -> NewSQL

You're both right.

I'm not sure what you said applies. They have severe restrictions, and Spanner offers a subset of MySQL functionality, which is already bare compared to other databases. Changes can be done by primary key only, so it almost feels like a KV store that can do joins...

I don't think it's easy to port existing applications to use it and in the end you will still need to accommodate shortcomings in your application.

Either way, they are trying to abstract away having to think about eventual consistency with this offering.

People used to make similar comparisons between the Russian and American space programs.

Presumably a reference to the classic "Pencil myth": Americans when faced with a need to write things down spent millions of dollars inventing a low gravity ballpoint pen and Russians just used a pencil.

As cutesy of a sentiment as it is, it's also full of misconceptions. The pens were invented by an American corporation that wanted better pens to sell in general (a smoother flow in a pen, regardless of gravity/orientation, is a better pen), and they saw a good opportunity to market the pen to NASA for use in space. Both NASA and the Russians used pencils in space, but the problem with pencils is that the flakes can pollute an environment pretty quickly in low gravity, and the pens turned out to be a much better solution. (So far as I've heard, every space agency these days buys similar pens.)

I wonder how this will affect adoption of CockroachDB [1], which was inspired by Spanner and supposedly an open source equivalent. I'd imagine that Spanner is a rather compelling choice, since they don't have to host it themselves. As far as I know, CockroachDB currently does not support providing CockroachDB as a service (but it is on their roadmap) [2].

[1] https://www.cockroachlabs.com/docs/frequently-asked-question...

[2] https://www.cockroachlabs.com/docs/frequently-asked-question...

(Cockroach Labs CTO here)

Google launching Spanner is generally a positive thing for our industry and our product. It's more proof that what we're aiming for is possible and that there's demand for it. We expect that in five years, all tech companies will be deploying technology like ours.

One of the big differences is that Spanner only uses SQL for read-only operations, with a custom API for writes. We use standard SQL for both reads and writes, which means we also work with major ORMs like GORM, SQLAlchemy, and Hibernate (docs should be live today or tomorrow). Spanner's custom write API will make it difficult to work with existing frameworks, or to convert an existing application to Spanner.

Cloud Spanner only works on Google Cloud and is a black-box managed service. CockroachDB is open source and can be run on-prem or in any cloud on commodity hardware. (We don't offer CockroachDB as a service yet, but may in the future)

At this point, both products are still in beta and are still missing features like back-up and restore (according to the Quizlet blog post). We plan to launch CockroachDB 1.0 with back-up / restore enabled.

* For anyone wanting to know more about how we make CockroachDB work without TrueTime, see our blog post: https://www.cockroachlabs.com/blog/living-without-atomic-clo...

The main sales pitch of Cloud Spanner is Google's network infrastructure.

No startup will be able to replicate that anytime soon, a lot of time (and money) has been put into it by a lot of people over a long time.

Curious: is there any company in the world that could replicate its breadth, performance, and reliability in the next decade?

Could any government? Has any government?

My impression is that, infrastructure wise, Google is genuinely in a class of size one.

CockroachDB can be hosted on any cloud for a fraction of the cost, I'd think that's a huge advantage for small/solo startups.

https://www.cockroachlabs.com/blog/living-without-atomic-clo...

> A simple statement of the contrast between Spanner and CockroachDB would be: Spanner always waits on writes for a short interval, whereas CockroachDB sometimes waits on reads for a longer interval. How long is that interval? Well it depends on how clocks on CockroachDB nodes are being synchronized. Using NTP, it’s likely to be up to 250ms. Not great, but the kind of transaction that would restart for the full interval would have to read constantly updated values across many nodes. In practice, these kinds of use cases exist but are the exception.

CockroachDB is waiting for time keeping hardware to improve.
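
The "Spanner always waits on writes" half of that contrast can be sketched roughly as follows: pick a commit timestamp at the upper edge of the clock's uncertainty interval, then hold the commit until even the lower edge has passed it. The simulated clock, the 7 ms uncertainty bound, and all names below are assumptions for illustration, not the real TrueTime API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FakeClock:
    """Simulated clock with bounded error; a stand-in for a TrueTime-like API."""
    true_time_ms: float = 0.0
    epsilon_ms: float = 7.0  # assumed uncertainty bound, illustrative only

    def now(self) -> Tuple[float, float]:
        # Returns (earliest, latest): the true time lies somewhere in between.
        return (self.true_time_ms - self.epsilon_ms,
                self.true_time_ms + self.epsilon_ms)

    def advance(self, ms: float) -> None:
        self.true_time_ms += ms


def commit_with_wait(clock: FakeClock, step_ms: float = 1.0) -> Tuple[float, float]:
    """Choose a commit timestamp, then wait until it is definitely in the past."""
    _, commit_ts = clock.now()
    waited = 0.0
    while clock.now()[0] <= commit_ts:
        clock.advance(step_ms)  # stand-in for sleeping
        waited += step_ms
    return commit_ts, waited


clock = FakeClock()
ts, waited = commit_with_wait(clock)
print(f"commit timestamp {ts} ms, waited {waited} ms")
```

The wait comes out at roughly twice the uncertainty bound, which is why a tighter clock bound translates directly into lower write latency.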

I imagine the globally distributed database market is big enough for more than one winner. The presence of competitors can sometimes even be a boon, increasing the visibility of a market's goods relative to other similar goods.

The white paper is available here: http://static.googleusercontent.com/media/research.google.co...

for anyone interested

Oh and the newer white paper from today: https://cloud.google.com/spanner/docs/whitepapers/SpannerAnd...

I bet there will be a lot about CAP theorem in the comments:-)

I wonder why they charge a minimum of $0.90 per node-hour when they offer VMs for as little as $0.008/hr. This is hugely useful even for single-person startups, so why charge a minimum of ~$8,000 per year?
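
For reference, the yearly figure follows from the hourly rates quoted above. A small sketch, assuming a single node running around the clock for a 365-day year (the helper name is made up):

```python
HOURS_PER_YEAR = 24 * 365  # 8,760


def yearly_cost(hourly_rate_usd: float, nodes: int = 1) -> float:
    """Cost of running `nodes` instances continuously for one year."""
    return hourly_rate_usd * HOURS_PER_YEAR * nodes


print(round(yearly_cost(0.90)))      # one Spanner node at $0.90 per node-hour
print(round(yearly_cost(0.008), 2))  # the $0.008/hr VM mentioned above
```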

Hugely useful but also hugely different from an engineering/coverage perspective, perhaps.

Companies with more data than can fit in a single-instance RDBMS system (like >3TB of hot data, more throughput than a single node can handle) but still seeking transactional consistency are a clear use case. Single-person startups could definitely benefit, but it's a less-likely scenario that they would require the level of coverage Spanner provides.

To be fair, you shouldn't even run MySQL on an f1-micro. This is more on par with an 18 vcpu raw server, before you'd even consider any value that the software provides.

I'd certainly love to see us get to a world where we can split up a single spanner "install" in an isolated, multitenant manner, but even for a small company, $8k/year is admittedly a small fraction of one engineer. At a company with several, you can share your single Spanner instance just like you would any other database.

Disclosure: I work on Google Cloud (but not Spanner).

Concur. The pricing makes sense for their "target market" of folks who currently have a "bursting at the seams" MySQL or PgSQL instance, but it locks out folks just getting started with a tiny database and low load. This seems like bad positioning: the "bursting" folks will have to decide between the cost of re-hosting their whole system on Cloud Spanner and trying to incrementally keep their current platform running; the small folks who would like to organically grow on a platform without scaling limits are locked out of the low end, so by the time they are big enough to need Cloud Spanner... they too will be forced into the "re-host or muddle on" decision.

> The pricing makes sense for their "target market" of folks who currently have a "bursting at the seams" MySQL or PgSQL instance

My intuition, which I hope is wrong, suggests this is a small market.

Because you're paying not for cost plus, but value added. To make a meaningful cost comparison, the best alternative must be considered, which is you spending your own labor to engineer and build a similar system yourself.

because if you aren't spending 8k/yr on your database then you don't need this level of scale

Having your database problem solved, however, is one less thing to worry about if you grow bigger.

A cheap VM is not likely to give the same type of performance.

Why should useful things be cheap?

> Does this mean that Spanner is a CA system as defined by CAP? The short answer is “no” technically, but “yes” in effect and its users can and do assume CA.

It's somewhat ironic that Brewer, the original author of the CAP theorem, is making this sort of marketing-led bending of the CAP theorem terminology. I think what he really should be saying is something in more nuanced language like this: https://martin.kleppmann.com/2015/05/11/please-stop-calling-...

But perhaps Google's marketing department needed something in the more popular "CP or AP?" terminology. I don't see what would be wrong with "CP with extremely high availability" though.

It's certainly wacky to be claiming that a system is "CA", since as the post admits it's technically false; to me this makes it clear that CP vs. AP (vs. CA now?) does not convey enough information. I'd prefer "a linearizably-consistent data store, with ACID semantics, with a 99.999% uptime SLA". Not as snappy as "CA" (I will never have a career in marketing I suppose), but it makes the technical claims more clear.

Global Spanner looks like a different beast, though. It looks like Google has configured a database for master-master(-master?) replication, across regions and even continents. They seem to be pulling it off by running only their own fiber, each master being a Paxos cluster itself, GPS, atomic clocks and a lot of other whiz-bangery.

From the technical blog post

> Does this mean that Spanner is a CA system as defined by CAP? The short answer is “no” technically, but “yes” in effect and its users can and do assume CA. The purist answer is “no” because partitions can happen and in fact have happened at Google, and during some partitions, Spanner chooses C and forfeits A. It is technically a CP system. However, no system provides 100% availability, so the pragmatic question is whether or not Spanner delivers availability that is so high that most users don't worry about its outages. For example, given there are many sources of outages for an application, if Spanner is an insignificant contributor to its downtime, then users are correct to not worry about it.

Basically, the underlying system is CP, but A is so high (because of the custom fiber, paxos etc) that they're rounding it off to 100% and calling it CAP.

reply


Except that the A in CAP has nothing to do with the overall system's availability over time, and using it as such is just confusing.

reply


But it makes sense in this case. The system guarantees CP. But as a customer it looks like you're getting CA as well, because A is so high. If you drink the kool-aid, you get C & A & P.

The kool-aid isn't too bad, though: if they can measurably guarantee A > 99.999999%, I'm happy to round off to 100% and call it CAP.

reply


The availability is "only" 99.999%, which IMO is still really high!

(I work for Google Cloud)
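
For a sense of scale, here is a small Python sketch (not from the thread) that converts an availability percentage into allowed downtime per year. The function name is made up for illustration, and a year is taken as 365 days.

```python
# Convert an availability percentage into allowed downtime per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes per year a service may be down at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.2f} min/year")
```

At 99.999% this works out to a little over 5 minutes of downtime per year.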

reply


So essentially a marketing buzz. "It's always available because we spent a lot of money to make it available"

CAP is about how the system deals with partitions, not whether it has partitions or not.

reply


> Aurora... supports master-master replication in the same region

I don't believe that's true, but I could be mistaken?

reply


How does this compare to AWS Aurora in terms of pricing and performance?

With Aurora the basic instance is $48/month and they recommend at least two in separate zones for availability, so it's about $96/month minimum. Storage is $.10/GB and IO is $.20 per million requests. Data transfer starts at $.09/GB and the first GB is free.[1]

Spanner is a minimum of $650/mo (nearly 7X the Aurora minimum), storage is $.30/GB (3X), and data transfer starts at $.12/GB (1.3X).

Of course with Aurora you have to pick your instance size and bigger faster instances will cost more. Also there's the matter of multi-region replication, although it appears that aspect of Spanner is not priced out yet. So maybe as you scale the gap narrows, but it's interesting to price out the entry point for startups.

[1] https://aws.amazon.com/rds/aurora/
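
The multiples above can be checked with a few lines of Python. This is only a sketch using the list prices quoted in this comment; the dictionary keys are illustrative names, not anything from either provider's API.

```python
# Entry-level prices as quoted in the comment above (USD).
aurora = {"minimum_per_month": 96.0, "storage_per_gb": 0.10, "transfer_per_gb": 0.09}
spanner = {"minimum_per_month": 650.0, "storage_per_gb": 0.30, "transfer_per_gb": 0.12}


def price_ratios(baseline: dict, other: dict) -> dict:
    """Return other/baseline for every price item, rounded to 2 decimals."""
    return {item: round(other[item] / baseline[item], 2) for item in baseline}


ratios = price_ratios(aurora, spanner)
for item, ratio in ratios.items():
    print(f"{item}: Spanner is {ratio}x Aurora")
```

On these numbers the entry-point ratio comes out at about 6.8X, with storage at 3X and data transfer at about 1.3X.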

reply


Sure, but to compare accurately we should look at where that impacts performance in practice. How many writes can Aurora's largest instance handle? What's the write latency from other parts of the globe?

reply


Forgive my ignorance, but could someone explain in layman's terms in which situation this would be helpful? E.g. if I have 1TB of data would I use this? If I have 1GB with a growth rate of 25GB/daily would I use this?

reply


Amazon likes to respond to Google with its own price drops and product launches. It's telling that their announcements are orthogonal instead of direct competition with Spanner.

When Google announced Spanner back in 2012, I'm sure Amazon and Microsoft started teams to reproduce their own versions.

Spanner is not just software. The private network reduces partitions. GPS and atomic clocks for every machine help synchronize time globally. There won't be a Hadoop equivalent for Spanner, unless it includes the hardware spec.

reply


Amazon already has Aurora: https://aws.amazon.com/rds/aurora/details/

You're right that there's literally nothing else out there that has tight synchronization using atomic clocks, though.

And because of that, Aurora's multi-zone replicas are read-only.

I just noticed Google says the cross-region feature is coming later in 2017. Amazon might be planning to announce a similar change for Aurora in the coming months.

reply


While everyone is puzzling over how Spanner seems to be claiming to be CA, I would like to take this opportunity to bring up PACELC[1].

The idea is that the A-or-C choice in CAP only applies during network partitions, so it's not sufficient to describe a distributed system as either CP or AP. When the network is fine, the choice is between low latency and consistency.

In the case of Spanner, it chooses consistency over availability during network partitions, and consistency over low latency in the absence of partitions.

1: http://cs-www.cs.yale.edu/homes/dna/papers/abadi-pacelc.pdf

reply


https://cloudplatform.googleblog.com/2017/02/inside-Cloud-Sp...

reply


I think A is wrongly implied, tried to explain it here: https://news.ycombinator.com/item?id=13645925

I said "seems to", since a lot of people seem to think it :P

reply


Link to the actual OSDI paper (not the simpler whitepaper) https://static.googleusercontent.com/media/research.google.c...

reply


Interesting reading: Spanner Whitepaper

https://cloud.google.com/spanner/docs/whitepapers/SpannerAnd...

reply


> Today, we’re excited to announce the public beta for Cloud Spanner, a globally distributed relational database service that lets customers have their cake and eat it too: ACID transactions and SQL semantics, without giving up horizontal scaling and high availability.

This is a bold claim. What do they know about the CAP theorem that I don't?

Separately, (emphasis mine):

> If you have a MySQL or PostgreSQL system that's bursting at the seams, or are struggling with hand-rolled transactions on top of an eventually-consistent database, Cloud Spanner could be the solution you're looking for. Visit the Cloud Spanner page to learn more and get started building applications on our next-generation database service.

From the rest of the article it seems like the wire protocol for accessing it is MySQL. I wonder if they mean to add a PostgreSQL compatibility layer at some point.

> This is a bold claim. What do they know about the CAP theorem that I don't?

It's right there in the article:

"Remarkably, Cloud Spanner achieves this combination of features without violating the CAP Theorem. To understand how, read this post by the author of the CAP Theorem and Google Vice President of Infrastructure, Eric Brewer."

The post they are referring to: https://cloudplatform.googleblog.com/2017/02/inside-Cloud-Sp...

Why? Strong consistency isn't mutually exclusive with scalability. Google has written about it at length[1][2][3].

Furthermore, there are already more than a few attempts underway to build scalable relational databases ("NewSQL") outside Google.[4]

1: https://research.google.com/pubs/pub36971.html

2: https://research.google.com/archive/spanner.html

3: http://datascienceassn.org/sites/default/files/F1%20A%20Dist...

4: http://db.cs.cmu.edu/papers/2016/pavlo-newsql-sigmodrec2016....

reply


You may be interested in CockroachDB[1] and TiDB[2], which are open-source NewSQL databases inspired by Spanner and F1.

1 - https://www.cockroachlabs.com 2 - https://github.com/pingcap/tidb

reply


Eric Brewer who is the author of the CAP theorem works at Google now. He has a post here: https://cloudplatform.googleblog.com/2017/02/inside-Cloud-Sp....

reply


I don't know your expertise on the subject, but they do have a post on this topic.

Some highlights:

"Does this mean that Spanner is a CA system as defined by CAP? The short answer is “no” technically, but “yes” in effect and its users can and do assume CA."

"The purist answer is “no” because partitions can happen and in fact have happened at Google, and during some partitions, Spanner chooses C and forfeits A. It is technically a CP system."

https://cloudplatform.googleblog.com/2017/02/inside-Cloud-Sp...

"High availability" is not the same thing as the "A" in the CAP theorem. Spanner chooses consistency over availability, but it is a HA system in the sense that it can tolerate datacenter outages.

reply


> From the rest of the article it seems like the wire protocol for accessing it is MySQL. I wonder if they mean to add a PostgreSQL compatibility layer at some point.

It looks like the wire protocol is Protocol Buffers and client libraries will likely use GRPC: https://cloud.google.com/spanner/docs/reference/rpc/google.s...

reply


They write that it's a CP system. So A is not guaranteed, but in practice they are able to provide A most of the time, with a private network and only 1 failure in all of 2016 (whatever that means).

reply


> What do they know about the CAP theorem that I don't?

Their statement, for what it's worth: https://cloudplatform.googleblog.com/2017/02/inside-Cloud-Sp...

reply


I doubt it. Spanner doesn't even offer full MySQL capabilities, so it's unlikely to support any advanced PG SQL.

reply


There is a link at the end of the article to a blog post on exactly this topic:

https://cloudplatform.googleblog.com/2017/02/inside-Cloud-Sp...

> The purist answer is “no” because partitions can happen and in fact have happened at Google, and during some partitions, Spanner chooses C and forfeits A. It is technically a CP system.

> However, no system provides 100% availability, so the pragmatic question is whether or not Spanner delivers availability that is so high that most users don't worry about its outages.

> To understand how, read this post by the author of the CAP Theorem and Google Vice President of Infrastructure, Eric Brewer.

Seems like they might know a lot :)

reply


Theoretically it means they are giving up on being Partition Tolerant. There was a popular post a while ago about how the P can't be sacrificed. Because if it is... everything else will fail.

Being Google they are probably prideful enough to think their servers could never have an outage. Which yes, I agree with you, that is a very scary claim.

reply


This is thoroughly wrong. Cloud Spanner sacrifices the "A", not the "P." The cool thing being accomplished here is that the sacrifice to the A is greatly reduced (five or more 9s). There are several documents on the subject linked right off that page and elsewhere in these same comments, like this one: https://cloud.google.com/spanner/docs/whitepapers/SpannerAnd...

The "choose two" of the CAP theorem is a bit misleading. By creating a distributed system, you've chosen P. So it's really choose one: A or C.

reply


> Today, we’re excited to announce the public beta for Cloud Spanner, a globally distributed relational database service that lets customers have their cake and eat it too: ACID transactions and SQL semantics, without giving up horizontal scaling and high availability.

This sounds too good to be true. But it's Google, so maybe not. Time to start reading whitepapers...

One thing to note is Spanner's transactions are different compared to what you get with a traditional RDBMS. See https://cloud.google.com/spanner/docs/transactions#ro_transa...

An example: the rows you get back from a query like "select * from T where x=a" can't be part of a RW transaction, I believe because they don't have the timestamp associated with them. So you have to re-read those rows via primary key inside a RW transaction to update them. This can be a surprise if you are coming from a traditional RDBMS background. If you are thinking about porting your app from MySQL/PostgreSQL to Spanner, it will be more than just updating query syntax.

Disclaimer: I used F1 (built on top of Spanner, https://research.google.com/pubs/pub41344.html) a few years ago.

reply


Thomas Watson in 1943 and his famous quote: “I think there is a world market for about five computers".

If he were alive, he could say these computers are Google, Apple, Microsoft, Amazon and Facebook.

reply


I beat you by 7 years :)

https://arstechnica.com/civis/viewtopic.php?f=21&t=1109206

reply


Looks cool, but the pricing seems a bit non-cloud-native (or at least non-GCP-native).

"You are charged each hour for the maximum number of nodes that exist during that hour."

We've been educated by Google to consider per-minute, per-instance/node billing normal - and presumably all the arguments about why this is the right, pro-customer way to price GCE apply equally to Cloud Spanner.

Per-minute billing is an advantage when you are scaling up and down rapidly. If you use VMs for just 5 minutes, per-minute pricing is 12x cheaper than hourly billing (5 minutes billed instead of 60).

However, with a database it is rare to scale up and down rapidly. Rather, you expect change on the order of days. Imagine you go from 10 instances to 15 instances over a week. Per-minute billing saves at most about 5 instance-hours over the week compared to hourly billing, which is a saving of less than 1%.
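
To put a number on that, here is a rough Python simulation of the 10-to-15-instance week. The scale-up schedule is a made-up worst case for hourly billing (each scale-up lands in the last minute of an hour), and hourly billing is modeled as charging the maximum node count seen in each hour, as the pricing quote above describes.

```python
HOURS_PER_WEEK = 7 * 24
MINUTES_PER_WEEK = HOURS_PER_WEEK * 60

# Hypothetical schedule: five scale-ups, 28 hours apart, each in the
# last minute of an hour so hourly billing charges the full hour.
SCALE_UP_MINUTES = [k * 28 * 60 + 59 for k in range(1, 6)]


def instances_at(minute: int) -> int:
    """Instances running at a given minute: 10 plus completed scale-ups."""
    return 10 + sum(1 for t in SCALE_UP_MINUTES if t <= minute)


def per_minute_instance_hours() -> float:
    """Instance-hours billed when every minute is charged at its real count."""
    return sum(instances_at(m) for m in range(MINUTES_PER_WEEK)) / 60


def hourly_instance_hours() -> int:
    """Instance-hours billed when each hour is charged at its maximum count."""
    return sum(
        max(instances_at(m) for m in range(h * 60, (h + 1) * 60))
        for h in range(HOURS_PER_WEEK)
    )


per_minute = per_minute_instance_hours()
hourly = hourly_instance_hours()
saving = (hourly - per_minute) / hourly
print(f"per-minute: {per_minute:.1f} h, hourly: {hourly} h, saving: {saving:.2%}")
```

Even in this worst case the difference stays just under 5 instance-hours out of roughly 2,100, well below 1%.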

reply


> clients can do globally consistent reads across the entire database without locking

How is this possible across data centres? Does it send data everywhere at once?

Seems too good to be true of course but if it works and scales it might be worthwhile just not having to worry about your database scaling? Still I don't believe it ;-)

EDIT: further info...

> Spanner mitigates this by having each member be a Paxos group, thus ensuring each 2PC “member” is highly available even if some of its Paxos participants are down. Data is divided into groups that form the basic unit of placement and replication.

So it's SQL with Paxos that presumably never gets confused, but during a partition will presumably not be consistent.

reply


I would expect more from Brewer.

"CA except when there are partitions" is CP. It's not "effectively CA".

reply


No, he's saying it's effectively CAP because the A downtime is so small.

It's one thing to do that for a key-value store. Entirely another to support joins on a globally distributed database. This ain't just one availability zone. Spanner is amazing.

It took them a few years to make it a service, but when they announced its use internally a few years ago, it seemed like the nail in the coffin for in-house database hosting.

I understand what he's saying. It's marketing.

There's nothing wrong with saying "it's CP, but since we control everything, partitions are extremely rare." Then he can show availability numbers (which he kinda does).

Saying it's "effectively CA" defeats the point of the CAP theorem, which says you have to make tradeoffs. See: https://codahale.com/you-cant-sacrifice-partition-tolerance/

reply


Another point is that since all records are globally timestamped, you can do a read that is consistent at a timestamp in the past (i.e. read data as the database was 1 second ago, or something like that).

If data from other places has synchronized to your zone, you may be able to do this globally-consistent read while only touching your local datacenter (because TrueTime guarantees that no other records anywhere in the system will be created at the time you are querying).
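
A toy Python sketch of what "read at a timestamp in the past" means in a multi-version store. This is not Spanner's API; the class and method names are invented for illustration, and plain integers stand in for TrueTime commit timestamps.

```python
import bisect


class MultiVersionStore:
    """Keeps every committed version of a key, ordered by commit timestamp."""

    def __init__(self):
        self._timestamps = {}  # key -> sorted list of commit timestamps
        self._values = {}      # key -> values, parallel to _timestamps

    def write(self, key, value, commit_ts):
        ts_list = self._timestamps.setdefault(key, [])
        val_list = self._values.setdefault(key, [])
        idx = bisect.bisect_right(ts_list, commit_ts)
        ts_list.insert(idx, commit_ts)
        val_list.insert(idx, value)

    def read_at(self, key, read_ts):
        """Return the newest value committed at or before read_ts, else None."""
        ts_list = self._timestamps.get(key, [])
        idx = bisect.bisect_right(ts_list, read_ts)
        if idx == 0:
            return None
        return self._values[key][idx - 1]


store = MultiVersionStore()
store.write("balance", 100, commit_ts=10)
store.write("balance", 80, commit_ts=20)
print(store.read_at("balance", 15))  # sees the version committed at ts=10
```

Because old versions are never overwritten, a read at timestamp 15 needs no locks: later writes cannot change what it returns.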

Note: I work at Google, but I don't know more about Spanner than the Spanner paper.

reply


Given that CockroachDB is based on Spanner and F1, this DBaaS sounds like it will compete directly with them.

reply


While the product is compelling (ACID-compliant, horizontally scaling DB), it does seem expensive.

If you use 2 nodes, cost = (2 * $0.90) * 24 * 31 ≈ $1,340/month, not accounting for storage and network charges.
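
The estimate above appears to be two nodes at $0.90 per node-hour over a 31-day month. A quick Python check, where the $0.90 rate is an assumption inferred from the figures quoted in this thread rather than taken from an official price list:

```python
def monthly_node_cost(nodes: int, price_per_node_hour: float = 0.90, days: int = 31) -> float:
    """Node charges only; storage and network are billed separately."""
    return nodes * price_per_node_hour * 24 * days


for n in (1, 2, 3):
    print(f"{n} node(s): ${monthly_node_cost(n):,.2f}/month")
```

Two nodes come to about $1,339 for a 31-day month, and a single node to about $670, which matches the single-node figure mentioned elsewhere in the thread.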

reply


> If you have a MySQL or PostgreSQL system that's bursting at the seams

PostgreSQL? How does this work for people migrating from traditional SQL databases? Typically people use an ORM. How would this fit in with, say, Rails or SQLAlchemy?

There is JDBC support, so if you're willing and able to connect that way, you could hope for an easier migration. Any move from one database to another is a migration, even if only to deal with dialect-specific things.

Disclosure: I work on Google Cloud (but not Spanner).

reply


Oh this looks really compelling! Though I'm guessing this is targeted to companies? I'd love to use this for some personal projects but the pricing seems really high. Am I reading it right that a single node being used at least a tiny bit every hour is about $670 a month?

Maybe I'm misunderstanding how the pricing works here. Any clarification would be highly welcomed :)

It's targeted to large datasets more than companies. There isn't really any advantage of a single node Cloud Spanner instance over Cloud SQL. Cloud Spanner becomes worthwhile when you have more data/throughput than a single node system can support, at which point the pricing is competitive with other options.

reply


"What if you could have a fully managed database service that's consistent, scales horizontally across data centers and speaks SQL?"

Looks like Google forgot to mention one central requirement: latency.

This is a hosted version of Spanner and F1. Since both systems are published, we know a lot about their trade-offs:

Spanner (see OSDI'12 and TODS'13 papers) evolved from the observation that Megastore guarantees - though useful - come at a performance penalty that is prohibitive for some applications. Spanner is a multi-version database system that, unlike Megastore (the system behind the Google Cloud Datastore), provides general-purpose transactions. The authors argue: "We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions." Spanner automatically groups data into partitions (tablets) that are synchronously replicated across sites via Paxos and stored in Colossus, the successor of the Google File System (GFS). Transactions in Spanner are based on two-phase locking (2PL) and two-phase commits (2PC) executed over the leaders for each partition involved in the transaction. In order for transactions to be serialized according to their global commit times, Spanner introduces TrueTime, an API for high-precision timestamps with uncertainty bounds based on atomic clocks and GPS. Each transaction is assigned a commit timestamp from TrueTime, and using the uncertainty bounds, the leader can wait until the transaction is guaranteed to be visible at all sites before releasing locks. This also enables efficient read-only transactions that can read a consistent snapshot for a certain timestamp across all data centers without any locking.
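
The "wait out the uncertainty" step can be sketched in a few lines of Python. This is a toy model, not Spanner code: `FakeTrueTime` is an invented stand-in that reports an interval of plus or minus `epsilon_ms` around a simulated clock, and all times are integer milliseconds.

```python
from dataclasses import dataclass


@dataclass
class TTInterval:
    earliest: int  # true time is guaranteed to be >= earliest
    latest: int    # true time is guaranteed to be <= latest


class FakeTrueTime:
    """Toy clock: true time lies within +/- epsilon_ms of the local clock."""

    def __init__(self, epsilon_ms: int):
        self.epsilon_ms = epsilon_ms
        self.clock_ms = 0  # simulated local clock

    def now(self) -> TTInterval:
        return TTInterval(self.clock_ms - self.epsilon_ms,
                          self.clock_ms + self.epsilon_ms)

    def advance(self, ms: int) -> None:
        self.clock_ms += ms


def commit(tt: FakeTrueTime, tick_ms: int = 1):
    """Pick a commit timestamp, then wait until it is surely in the past."""
    commit_ts = tt.now().latest
    waited_ms = 0
    while tt.now().earliest <= commit_ts:  # commit wait
        tt.advance(tick_ms)
        waited_ms += tick_ms
    return commit_ts, waited_ms  # only now would locks be released


tt = FakeTrueTime(epsilon_ms=4)
commit_ts, waited_ms = commit(tt)
print(f"commit_ts={commit_ts} ms, waited {waited_ms} ms")
```

In this model the wait is roughly twice the clock uncertainty, which is why tight clock bounds matter so much for write latency.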

F1 (see VLDB'13 paper) builds on Spanner to support SQL-based access for Google's advertising business. To this end, F1 introduces a hierarchical schema based on Protobuf, a rich data encoding format similar to Avro and Thrift. To support both OLTP and OLAP queries, it uses Spanner's abstractions to provide consistent indexing. A lazy protocol for schema changes allows non-blocking schema evolution. Besides pessimistic Spanner transactions, F1 supports optimistic transactions. Each row bears a version timestamp that is used at commit time to perform a short-lived pessimistic transaction to validate a transaction's read set. Optimistic transactions in F1 suffer from the abort rate problem of optimistic concurrency control, as the read phase is latency-bound and the commit requires slow, distributed Spanner transactions, increasing the vulnerability window for potential conflicts.
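
The read-set validation idea can be illustrated with a toy in-memory store in Python. This is a sketch of optimistic concurrency control in general, not F1's implementation; `ToyStore` and its methods are invented names, and a simple counter stands in for row version timestamps.

```python
class ToyStore:
    """Rows carry a version timestamp that is bumped on every write."""

    def __init__(self):
        self._rows = {}    # key -> (value, version_ts)
        self._next_ts = 1  # logical clock standing in for commit timestamps

    def put(self, key, value):
        self._rows[key] = (value, self._next_ts)
        self._next_ts += 1

    def read(self, key):
        """Return (value, version_ts); the caller remembers version_ts."""
        return self._rows[key]

    def commit_optimistic(self, read_set, writes):
        """Apply writes only if every row read is still at the version seen."""
        for key, seen_ts in read_set.items():
            if self._rows[key][1] != seen_ts:
                return False  # someone else wrote the row: abort
        for key, value in writes.items():
            self.put(key, value)
        return True


store = ToyStore()
store.put("budget", 100)

# Two clients read the same row before either one commits.
value_a, ts_a = store.read("budget")
value_b, ts_b = store.read("budget")

first = store.commit_optimistic({"budget": ts_a}, {"budget": value_a - 10})
second = store.commit_optimistic({"budget": ts_b}, {"budget": value_b - 20})
print(first, second, store.read("budget")[0])
```

The second client aborts because its read is stale. The longer the gap between read and commit, the more often this happens, which is the abort-rate problem described above.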

While Spanner and F1 are highly influential system designs, they do come at a cost Google does not mention in its marketing: high latency. Consistent geo-replication is expensive even for single operations. Both optimistic and pessimistic transactions increase these latencies even further.

It will be very interesting to see first benchmarks. My guess is that operation latencies will be in the order of 80-120ms and therefore much slower than what can be achieved on database clusters distributed only over local replicas.

What is TrueTime really? Are their Distributed Systems 'sharing a global clock'?

reply


From the Spanner paper:

> The underlying time references used by TrueTime are GPS and atomic clocks. TrueTime uses two forms of time reference because they have different failure modes... TrueTime is implemented by a set of time master machines per datacenter and a timeslave daemon per machine. The majority of masters have GPS receivers with dedicated antennas; these masters are separated physically to reduce the effects of [GPS] antenna failures, radio interference, and spoofing. The remaining masters (which we refer to as Armageddon masters) are equipped with atomic clocks. An atomic clock is not that expensive: the cost of an Armageddon master is of the same order as that of a GPS master.

Source: https://static.googleusercontent.com/media/research.google.c...

Here is a good article: http://www.bluetreble.com/2015/10/time-travel/

reply


Spanner's paper: https://static.googleusercontent.com/media/research.google.c...

Does it support spatial objects / can it replace PostGIS?

reply


No. https://cloud.google.com/spanner/docs/data-types

reply


> This leads to three kinds of systems: CA, CP and AP,

What is a distributed system that is CA? Can you build a distributed system which will never have a partition?

reply


> For distributed systems over a “wide area,” it's generally viewed that partitions are inevitable, although not necessarily common. If you believe that partitions are inevitable, any distributed system must be prepared to forfeit either consistency (AP) or availability (CP), which is not a choice anyone wants to make. In fact, the original point of the CAP theorem was to get designers to take this tradeoff seriously. But there are two important caveats: First, you only need to forfeit consistency or availability during an actual partition, and even then there are many mitigations. Second, the actual theorem is about 100% availability; a more interesting discussion is about the tradeoffs involved to achieve realistic high availability.

reply


> If you believe that partitions are inevitable, any distributed system

How does that answer it? Are they implying that partitions will not happen if you don't believe in them?

reply


A memcache cluster might qualify. If there's a partition, just forget about the missing nodes. It's just a cache anyway.

reply


Trivially, by never allowing either half of a partition to make progress while the partition is in place. Since the CAP theorem, by itself, doesn't put a cap (oof) on latency, it is valid to consider a system CA if it is always available to listen to requests while partitioned, but never able to fulfill them.

This, of course, is effectively useless in practice, and is dependent on an infinite buffer of pending operations, etc.

https://brooker.co.za/blog/2014/07/16/pacelc.html

> Trivially, by never allowing either half of a partition to make progress while the partition is in place.

Doesn't availability mean getting a response, on success or failure? If during a partition there is no response on success or failure, how is the system available? It seems that rewriting a term like "x will happen" to "x will happen after an infinite timeout" should not be valid.

reply


It does, but within what bounds? The CAP theorem doesn't specify. One could assume that it means before the partition is restored, but that is only one possible valid interpretation. The PACELC theorem, which is by no means the last word on the story, clarifies this well:

https://en.wikipedia.org/wiki/PACELC_theorem

"PACELC builds on the CAP theorem. Both theorems describe how distributed databases have limitations and tradeoffs regarding consistency, availability, and partition tolerance. PACELC however goes further and states that a trade-off also exists, this time between latency and consistency, even in absence of partitions, thus providing a more complete portrayal of the potential consistency tradeoffs for distributed systems."

And I would take that argument one step further and say that latency and partitioning are effectively identical: from the point of view of any given operation, it is impossible to say whether the system is in a partitioned state until the max latency (timeout) has elapsed, because failure to make progress within the timeout is the only meaningful definition of partition-induced unavailability.

Very interesting. How does this pricing compare to AWS Aurora? https://aws.amazon.com/rds/aurora/pricing/

reply


Not sure. If you need to scale beyond a single master, Aurora won't help in the same way Spanner does though. You can dial up the number of nodes in Spanner dynamically under load with good results.

reply


Doesn't seem possible to use this yet. No client libraries and no samples: https://cloud.google.com/spanner/docs/tutorials

Have they documented the wire protocol? I couldn't find it.

I work on Cloud Spanner and client libraries are rolling out right now, but API definitions are available.

RPC: https://cloud.google.com/spanner/docs/reference/rpc/ Rest: https://cloud.google.com/spanner/docs/reference/rest/

reply


Anyone working on a Rust lang client-library?

reply


We're still working on rolling out a few docs throughout the day. For example - here's the node.js lib:

https://github.com/GoogleCloudPlatform/google-cloud-node#clo...

Is this similar to AWS Aurora or is this something else completely different?

reply


Aurora is not globally distributed. Spanner is, and is based on Google research which takes advantage of atomic clocks installed in each server: https://research.google.com/archive/spanner-osdi2012.pdf

Edit: not every server has an atomic clock; see replies by Google employees

reply


There are only atomic clocks in some master servers: "TrueTime is implemented by a set of time master machines per datacenter and a timeslave daemon per machine. The majority of masters have GPS receivers with dedicated antennas; these masters are separated physically to reduce the effects of antenna failures, radio interference, and spoofing. The remaining masters (which we refer to as Armageddon masters) are equipped with atomic clocks. An atomic clock is not that expensive: the cost of an Armageddon master is of the same order as that of a GPS master."

The timeslave daemons running on each machine keep them synchronized with the master time servers, and maintain tight bounds on their inaccuracy.

(Disclaimer: I work at Google)

reply


Each datacenter not server :).

SqlAlchemy engine please :) ?

reply


Actually it is most closely related to the new ["Standard SQL" in BigQuery](https://cloud.google.com/bigquery/docs/reference/standard-sq...).

reply


> Unlike most wide-area networks, and especially the public internet, Google controls the entire network and thus can ensure redundancy of hardware and paths, and can also control upgrades and operations in general

I know this is a single system, but I'll still say it. This seems like another step in a scary trend for our internet.

I am an employee, and have my biases, but I’ll always prefer a customer's data stay on our backbone and not be passed through the public internet. Our customers also prefer it, and it's not something other Cloud providers can fully cover.

reply


Why is it a scary trend that Google has made their part of the internet more resilient?

The parent comment author seems to be disturbed by the phrase "controls the entire network." Setting aside his unwarranted paranoia, it does make me wonder which organization has the largest and most reliable corner of the internet. Google certainly qualifies by sheer volume of hardware and network infra.

reply


This is no different from a myriad of other companies that have private networks on private fiber that span large geographical paths. Google's is probably bigger than most (both in terms of geography and throughput), but it's nothing new to have private fiber to ensure latency/throughput/reliability.

Given the CAP theorem I wonder what trade-offs they make and how much visibility they give you into these trade-offs.

In any case this is much better than Amazon's offerings... when they actually ship it. :)

reply


Check out this post mentioned in the original post:

https://cloudplatform.googleblog.com/2017/02/inside-Cloud-Sp...

Of note:

They say Spanner is "both consistent and highly available despite operating over a wide area". So not 100% availability but they've got it to "more than five 9s of availability (less than one failure in 10^6)."

I didn't pick up on it until just a moment ago, but when you say "They say", it's actually Eric Brewer saying that -- who's most known for coming up with the CAP theorem. I think they've got a pretty good understanding of it!

reply


I wonder how many people will get a seizure from that red-blue blinking rectangle in the video :(

Upd: Downvoting this warning will only increase that number.

I see there's "data layer encryption" but the data is still readable by Google. Why would anyone want to keep feeding the Google beast with more data?

Software is about separating concerns, and decentralizing authority. Responsible engineers shouldn't be using this service.

