And for those wondering, this is why Oracle wants billions of dollars from Google for "Java Copyright Infringement": the only growth market for Oracle right now is their hosted database service, and whoops, Google has a better one now.
It will be interesting if Amazon and Microsoft choose to compete with Google on this service. If we get to the point where you have databases, compute, storage, and connectivity services from those three at equal scale, well that would be a lot of choice for the developers!
There are also plenty of choices evolving for developers who aren't looking for hosted solutions (which can sometimes be a showstopper for enterprise on-prem deployments). There's a growing ecosystem of distributed open-source databases to look out for too.
Take Citus, for instance – a Postgres-compatible distributed store which automatically parallelizes normal SQL queries across machines. It's as easy to set up as adding an extension, and people are doing some staggering things in prod with it.
Different audience from BigQuery and Spanner, but no less exciting.
Disclaimer: no professional association, but love their product and the team.
If you are looking for something that is more Postgres flavored (meaning we're just an extension to it, so you get all the good stuff of Postgres such as JSONB, PostGIS, etc.), then we hope we'd be a good fit. We also run a managed service on top of AWS (https://www.citusdata.com/product/cloud) built by the team that built Heroku Postgres. If you're curious about pricing, you can find it at https://www.citusdata.com/pricing/
I'm not trying to compare on a per-MB level, but it would be nice for smaller-scale workloads.
Though hopefully you'll find many more useful ones about Citus and Postgres broadly on our blog: https://www.citusdata.com/blog/
Imagine: RDS is literally the ONLY place where you can buy a 10 GB multi-AZ replicated, snapshotted, and managed PostgreSQL database.
It's pretty much a monopoly, now that Google seems to have officially closed the book on ever supporting PostgreSQL.
Not true. I was looking for a hosted Postgres provider and discovered these two:
https://aiven.io/postgresql (tried it, worked excellently)
I would be hesitant to say that this is a fact.
Uh, how? I wouldn't be surprised to see a Cloud SQL-like managed Postgres service from Google.
While there's obviously some overlap in the potential market for any relational datastore service, Spanner doesn't really overlap with a cloud Postgres service as much as Cloud SQL does.
The issue is that the migration path of self-hosted MySQL to Cloud SQL to Spanner is pretty well defined. I don't see PostgreSQL being strategically important or relevant to Google for anything.
If I were a startup deciding on my database, there are far fewer compelling reasons to choose PostgreSQL from the point of view of long-term viability.
Hell, I can pretty much do a back-of-the-envelope calculation of how much it will cost me to support 100 million users on MySQL.
Is it safe to think that Evernote and Snapchat, startups that are giant success stories, are hosted on Google MySQL in some form (maybe even Spanner)?
So Uber, Snapchat, Google, Evernote, and a clear-cut path for upward scale.
I have very little hope for PostgreSQL on Google Cloud.
What does Cloud Spanner have to do with MySQL? It's neither API nor SQL-dialect compatible with MySQL. If there are MySQL bits used somewhere in the implementation, they are well hidden, and irrelevant to users.
> the issue is that the migration path of self hosted mysql to cloud sql to spanner is pretty well defined.
So what? Were there a Cloud SQL-like Postgres offering, the same would be true; Spanner is no closer to MySQL than to Postgres. (If anything, its SQL dialect is a little closer to Postgres's dialect than MySQL's, though not so much that you'll get away without doing substantial conversion going from either.)
Er, that's a bet I'd strongly suggest you didn't make.
If anything, hosted Postgres from Google Cloud will be priced in a way that makes Spanner somewhat more attractive, as a way to get conversions to Spanner in the long run.
How many nodes are you looking to run for $300/month? Unless you have more than 150 GB of data per node, you don't really need a distributed database, which is what Citus is for.
Please note that what I'm paying for is availability and reliability, not a database per se.
And I'm not even talking about Aurora. That stuff is going to blow every other price point out of the water, probably with better reliability metrics.
For example, being a MSSQL performance tuning expert requires years of experience and probably pays very well, but just the other day I read an anecdotal story where someone switched a large BI database to use columnar indexes, allowing them to replace very complex (extreme manual tuning to achieve acceptable performance) queries with just standard SQL with comparable performance.
How long until the scale, pricing, and now transparent and full(?) SQL compliance offered by these cloud platforms start to make traditional RDBMS platforms a niche?
EDIT: Also, most DB users don't need global-scale databases.
That is not necessarily a foregone conclusion.
In tandem with their marketing of Azure, MS is pushing SQL Server on Linux heavily.
"SQL Server is Windows-only" is no longer a valid argument for choosing another RDBMS if a startup uses lots of MS tooling but deploys on Linux servers.
There's a reason why regulated (including self-regulated) professions have continuing education requirements; progress happens and you become obsolete if you don't keep up with it.
Just because tech isn't regulated doesn't mean it's any more sensible to expect to remain valuable without keeping up with progress in the field.
That being said, MSSQL experts will likely have good-paying opportunities for quite a while, for the same reason that's the case for any well-established enterprise technology: lots of systems are going to be around using it long after it has become distressingly uncool to spend time learning.
Four years ago, I determined that while development work might seem to be near the top of the food chain, there will come a point where my work is replaced by AIs.
This is not so different from how word processors replaced the specialist job of typesetters. Word processors make "good enough" typesetting. You can still find typesetters practicing their craft; the rest of us use word processors and don't even think about it.
At the time, I was learning to put the Buddhist ideals of emptiness and impermanence to practice, and to become more emotionally aware: the _main_ reason I had thought I would never be replaced by an AI writing software has more to do with wishful thinking and attachment than any clear-sighted look at this.
I also made a decision to work on the technologies to accelerate this. Rather than becoming intoxicated by the worry, anxiety, and existential anguish, I decided to try to face it. Fears are inherently irrational, but just because they are irrational does not mean they are not what you are experiencing. Fears are not so easily banished by labeling them as irrational. Denial is a form of willful ignorance.
Now, having said all that, whether our tech base will come to that, who can say?
Since then, I have been tracking things like:
Viv - a chat assistant that can write its own queries
DeepMind's demonstration of creating a Turing-complete machine with deep learning using a memory module.
I watched a tech enthusiast write a chat bot. He does not write software professionally. Talking with him over the months as he tinkered with it in his spare time, I realized that in the future, you won't have as many software engineers writing code; you would learn how to _train_ AIs when they become sufficiently accessible to the masses. Skills in coaching, negotiation, and management become more important than some of the fundamental skills supporting software engineering. And like typesetting, I can see development work being pushed down the eco-ladder.
It's not surprising to me to see that Wired article about coding becoming blue-collar work. And even that will eventually be pushed down even further.
Google's site-reliability engineering book, branding, and approach don't surprise me either. I have done system admin work in the past, and I can already see traditional, manual sysadmin work being replaced.
It's easy to get nihilistic about this, but that isn't my point here either. I know the human potential is incredible, but I think we have to let go of our self-serving narratives first.
The second idea that interests me is this idea of very high technology. It is built upon layer after layer of very clever tech year after year that I wonder how long it would take to start again from scratch if some disaster rendered a large part of one of these layers unusable.
For instance, if you were on a desert island, could you (would you want to?) build some piece of tech? An electric generator would be useful, perhaps. How long would it take to build? You'd need knowledge, raw materials, plant, fuel, etc. It's not an easy solve. And that's way down the tech stack, before you start talking about AIs. I suppose what I'm saying is that the AI layer is built on such high tech that it is inherently fragile, because it is so hard to do.
I don't know! :-D
I don't know what society would look like from a purely technological point of view. From a spiritualist point of view, though, it could either go very well or very badly. When everything is automated, would people have enough time and space to really start asking the really big questions? Or would it accelerate and intensify existential anguish?
> There are a small number of people reaping the benefits, and huge swathes of the population being marginalised and disenfranchised as a result.
Yeah. Arguably, this has already happened.
> The second idea that interests me is this idea of very high technology. It is built upon layer after layer of very clever tech year after year that I wonder how long it would take to start again from scratch if some disaster rendered a large part of one of these layers unusable.
The stuff of sci-fi :-D Among them, alt-history novels (what happens when someone drops into a lower-tech era; you'd have to start from 0 ... literally, 0, as in Arabic numerals).
Open Source Ecology is trying to preserve some of this tech base. I find their aims awesome, though I am not sure how effective it is.
The flip side is what's being said well outside the techno-sphere (by shamans and mystics, for example): the perspective that the further evolution of human consciousness will, at some point, no longer require technology or artifacts. Technology is seen as the last crutch. The collapse of a high-tech civilization then sets the stage for the removal of that crutch, and humans learn to stand on their own two feet (so to speak).
This isn't a fair argument against the point made above; however, I believe we will find the next big challenge for software to solve as soon as traditional problems are commoditized/automated and considered solved. Also, just knowing how to code is not going to be enough. You must complement it with domain expertise to solve challenging unsolved real-world problems.
We've been seeing this demand at Fauna. FaunaDB offers distributed consistency, based on Raft and the Calvin protocol instead of depending on specific networking and clock hardware. We've seen a big part of our appeal is the ability to run FaunaDB across multiple cloud services.
On-premises is licensed by core.
We have a developer edition you can use on your local machine, but we don't currently have plans to open source FaunaDB itself.
Edit: Looks like maybe you're referring to the recommended 3 node minimum in production mentioned at https://cloud.google.com/spanner/docs/instance-configuration
It may be recommended as protection against an availability issue on an instance, though, which is, after all, a big reason why you'd want a distributed DB in production.
A sad choice though. The centralization of computation is likely not a good thing in the long run.
The advantages of owning your own hardware will never go away, but soon this will be made quite intentionally impossible as the big players coalesce and continue building their walled gardens.
This is already happening. All the big players own their hardware and rent it out to everyone else, while trying to convince everyone it's not worth owning your own hardware at the same time.
These companies have already begun closing off server platforms by developing custom hardware and software systems that cannot be bought for any price, only rented. These systems represent a new breed of technology with unbreakable vendor lock in.
These same companies compete with each other and countless other companies across the space. Take for example a start-up that wants to run their own app store. Google, Amazon, and Microsoft all run app stores. Where will this company go for cloud services? Their only big-name options are to host their software on the hardware of a direct competitor. Their host has full visibility into how their system works, and control over the pricing and reliability of their machines.
It's laughable to think their "cloud partner" will give them any chance to compete if they enter the same market.
We've seen UEFI BIOS and un-unlockable mobiles enter the market in droves the last few years. A lot of new PCs can't run anything except Windows. A lot of new phones can only run the carrier's version of Android. We have all these general purpose CPUs that can no longer run general purpose programs because "security", and a lot of lobbyists pushing to make it actually illegal to run your own software on these with "anti tampering" laws, again for "security". Soon the big guys (same companies, MS and Google) will make it impossible to run your own software on any reasonably inexpensive devices and the walled market will be complete.
Mark my words, I've never seen an industry with a couple of big players where growth and innovation don't eventually turn into collusion, higher prices, and market stagnation. Once MS, Google and Amazon have their slice of the pie and they've killed off everyone else, we will see the death of general purpose computers and mobile devices. Everything you buy will be an "Android computer", a "Windows computer", or an "Apple computer". Anything general purpose will be massively more expensive because individual companies can't get the kind of volume discount of the giant behemoths that increasingly control large swaths of the world's computing power. We've already seen the endgame, with Amazon trialing an "on-premises" version of their compute platform, which is basically a super locked down server that you can't buy, only rent endlessly. The future of on-premises will be a cloud in a black box if these companies have anything to do with it. Why? Because once they've got you locked in it makes no sense to sell you anything for keeps. Why keep improving their product so you buy the new version when they can just make it incompatible with everything else and force you to rent it forever, for whatever price they feel like charging?
One day running your own servers will be like running your own ISP: massively impractical because the free market has been manipulated to the point that it effectively no longer exists.
What? People use cloud computing because it already is massively impractical to run your own servers. Hardware is hard to run and scale on your own and experiences economies of scale. This principle is seen everywhere and can hardly be viewed as something controversial. Walmart for instance can sell things at a really low price because of the sheer volume of their sales. Similarly, data centers also experience economies of scale.
As someone who cares about offering the best possible, most reliable user experience, I see cloud computing as absolutely the next logical step from bare metal on-prem servers. When your system experiences load outside the constraints of what it can handle, a properly designed app built from independently scaling microservices scales horizontally.
Even if you had the state of the art microservice architecture running on a kubernetes cluster on your own hardware, you still wouldn't be able to source disk/CPU fast enough if your service happens to experience loads beyond what you provisioned.
And there is the rub, buying your own hardware costs money, and no one wants to buy hardware they may not ever use. Another advantage of cloud computing.
You are seeing the peak of free market right now, because of cloud computing, which enables people with little upfront cash to invest to form real internet businesses and scale massively.
You think a game like Pokemon Go could exist and do the release they did without cloud computing?
Secondly, software companies like Microsoft, Google and IBM might know a thing or two about running data centers. Due to economies of scale, these companies are inherently in a better position to supply hardware at scale.
> If entire region is down do you think other regions can handle the load. If you think so you're kidding yourself
Netflix routinely does just this to test the resilience of their systems. They pick a random AWS region, and they evacuate it. All the traffic is proxied to the other regions and eventually via DNS the traffic is routed entirely to the surviving regions. No interruption of service is experienced by the users.
Here's a visualization of Netflix simulating a failure on the US-east-1 region and failing over to US-west-1/US-west-2
The top right node is the one that fails. As the error rate climbs, traffic starts getting proxied over to the surviving nodes, until a DNS switch redirects all traffic to the surviving nodes. Netflix does this monthly, in production. They also run https://github.com/Netflix/SimianArmy on production.
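The evacuation described above only works if the surviving regions have headroom. A minimal sketch of that check, assuming made-up load and capacity figures; the `evacuate` function and all numbers are hypothetical and are not Netflix's tooling:

```python
def evacuate(load, capacity, failed):
    """Move the failed region's load onto the survivors.

    The extra load is split in proportion to each survivor's spare
    capacity. Raises ValueError if the survivors cannot absorb it.
    """
    survivors = [r for r in load if r != failed]
    if load[failed] == 0:
        return {r: load[r] for r in survivors}
    spare = {r: capacity[r] - load[r] for r in survivors}
    total_spare = sum(spare.values())
    if total_spare < load[failed]:
        raise ValueError("surviving regions lack capacity for the failed region's load")
    return {r: load[r] + load[failed] * spare[r] / total_spare for r in survivors}


# Hypothetical requests/sec and capacity per region.
load = {"us-east-1": 100, "us-west-1": 60, "us-west-2": 40}
capacity = {"us-east-1": 200, "us-west-1": 110, "us-west-2": 90}

after = evacuate(load, capacity, "us-east-1")
print(after)
```

With these numbers the two western regions end up exactly at capacity, which is the point of rehearsing the failover regularly: the exercise shows whether the spare capacity is really there.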
The cloud enables fault tolerance, resiliency and graceful degradation.
No, tooling to failover and spin up new instances does that. An enterprise with 3 data centers can do that.
"the cloud" is just doing it on someone else's hardware.
One person, with maybe 3 hours a week of time investment after a few weeks of setup and hardware purchase. Using containers I can move between the cloud and my own servers seamlessly, as long as I never bite the golden apple and use any of the cloud's walled garden "services" like S3. If I need more power I can spin up some temporary servers at any cloud provider in a few hours. For me the cloud is a nice thing because I don't use too much of it. If AWS disappeared tomorrow it would be a mild inconvenience, not devastating like it would be to many newer unicorns.
Go ahead and try to use the cloud you're paying for as a CDN or DDoS shield, or anything amounting to a bastion of free speech. You'll quickly find out that your cloud provider doesn't like you to use all the bandwidth and CPU you pay for, and they don't like running your servers when they disagree with your views. They quietly overprovision everything, pulling the same crap as consumer ISPs where they sell you a 100 Mbps line and punish you if you use more than 10 of that on average. That's the main reason the cloud is so cheap.
Hardware is cheap, colos are cheap, software is largely easy to manage. The economy of scale they enjoy is from vendor lock-in and overprovisioning more than anything else.
Is it really that hard to double the number of servers you own every few weeks? No! If you're using containers or managed KVM you can mirror nodes basically for free over the network as soon as the Ethernet is plugged in. Your time amounts to what it takes to put the thing in a rack, plug in the Ethernet, and hit the "on" button. Everybody in SV land thinks you have to use the cloud to "scale massively" but they forget that all of today's technology behemoths were built years ago when the cloud didn't exist. Oh yeah, they all still run all of their own hardware too and have from the early days. Using their model as a template, you should own every single server you use and start selling your excess capacity once you get big enough.
Did you ever read about how Netflix tried to run their own hardware but couldn't because they have so much data in AWS that it would basically bankrupt them to extract it? Look at how these cost models work. Usually inbound bandwidth is extremely cheap or free but outbound is massively more expensive than a dedicated line at a datacenter, 50-100 times the cost if you're saturating that line 24/7. The removal fees from a managed store like S3 or Glacier are even more ludicrous. The cloud is like crack and as soon as you start using it more than a few times a year you will get locked in and unable to leave without spending massive $$$. Usually companies figure out this shell game once they're large enough, but by then it's far too late to do anything about it.
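The comparison between metered egress and a flat-rate line can be worked through with a few lines of arithmetic. This is only a sketch: the per-GB price and the flat monthly price below are placeholder assumptions, not quoted rates from any provider, and the resulting ratio depends entirely on what you plug in.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # a 30-day month


def monthly_gb(mbps, utilization=1.0):
    """Data volume in GB (10^9 bytes) moved in 30 days at the given line rate."""
    bytes_per_second = mbps * 1e6 / 8 * utilization
    return bytes_per_second * SECONDS_PER_MONTH / 1e9


def cost_ratio(mbps, price_per_gb, flat_monthly, utilization=1.0):
    """Metered egress cost divided by the cost of a flat-rate line."""
    return monthly_gb(mbps, utilization) * price_per_gb / flat_monthly


volume = monthly_gb(1000)  # a saturated 1 Gbps line
# Placeholder prices, chosen only to show the calculation.
ratio = cost_ratio(1000, price_per_gb=0.05, flat_monthly=300.0)
print(f"{volume:.0f} GB/month, metered/flat ratio {ratio:.1f}")
```

Lowering `utilization` shows the other side of the argument: a line that sits mostly idle makes metered pricing look much better.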
Why are they marketing these things so heavily to startups? Because lock-in is how they make their money. They make little or nothing on pure compute power, but since you don't have low-level hardware access they can charge whatever the hell they want for things like extra IPs, DDoS protection, DC-to-DC peering, load balancing, and auto scaling. You give massive discounts to new players using these systems and inevitably some of these will become the next Uber or Netflix. Then you are free to charge whatever exorbitant rates you please once it's so impractical to move that it would require a major redesign of the business.
I see it a lot like franchising. By building on Amazon's cloud services you become "Uber company brought to you by Amazon". Like franchising, your upside is limited because any owner with a significant share of total franchises will begin to put pressure on the service owner itself.
But any "lock in" is totally up to you. Take a look at this:
You can architect your system in a way that it'll run on any cloud provider. All the major Cloud Providers support kube for orchestration.
To be honest I don't think you know what you're talking about. You should refrain from posting uninformed opinions on Hacker News, especially from a throwaway.
Where did you read this? You can have Amazon send you a truck full of hard drives. I doubt it costs more than Netflix can afford.
Also, the truck is for data in, not data out. Getting data out of AWS is far more expensive than putting it in. That's the lock in.
> Also, the truck is for data in, not data out. Getting data out of AWS is far more expensive than putting it in. That's the lock in.
This is also not true. The bulk transfer service is bi-directional.
They cache popular content close to the users, they don't manage their catalog at the edges.
The most unique thing about spanner is the use of globally synchronized clock timestamps to guarantee "comes before" consistency without the need to actually synchronize everything.
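The timestamp idea can be sketched in a few lines. This follows the commit-wait rule from the published Spanner paper as I understand it: the clock reports an interval rather than a single instant, a transaction takes the upper bound as its timestamp, and then waits until that timestamp is guaranteed to be in the past on every node. The `UncertainClock` class, the 5 ms uncertainty, and the 4 ms skew are assumptions for illustration, not Google's API.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    earliest: float
    latest: float


class UncertainClock:
    """Hypothetical clock that reports time as an interval of width 2 * epsilon."""

    def __init__(self, epsilon, source=time.monotonic):
        self.epsilon = epsilon
        self.source = source

    def now(self):
        t = self.source()
        return Interval(t - self.epsilon, t + self.epsilon)


def commit(clock, sleep=time.sleep):
    """Pick a commit timestamp, then wait until it is definitely in the past."""
    ts = clock.now().latest
    while clock.now().earliest <= ts:
        sleep(clock.epsilon / 4)
    return ts


clock_a = UncertainClock(epsilon=0.005)
# A second node whose clock runs 4 ms behind, still within epsilon.
clock_b = UncertainClock(epsilon=0.005, source=lambda: time.monotonic() - 0.004)

first = commit(clock_a)
second = commit(clock_b)
print(first < second)
```

The two nodes never talk to each other, yet the later commit always gets the larger timestamp, as long as the real clock error stays within `epsilon`. The cost is the wait itself, which is why a tighter time source matters so much.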
There is nothing stopping startups and open source developers from building the same thing in a few years. The missing ingredient is highly stable GPS and local time sources, which will hopefully be available on cloud instances sometime soon. This is a new piece of hardware, so it will be interesting to see if cloud providers make one available or use the opportunity to sell their own branded "service" version you can't buy. Unfortunately I think we'll see the latter far before the former, if it ever even exists. Without a highly stable time source, doing what Spanner does will be completely impossible.
Yes, Spanner is special right now, but that's even more reason to not go near it. Google has a complete monopoly on it, the strongest vendor lock-in you can possibly have.
Only "new" in the sense that it is currently not commonly offered; the devices themselves have been available for ages. (If you are a large enough customer you apparently can get at least some colo facilities to provide you with the roof access and cabling needed for the antennas.) If cloud providers make precise time available I don't see much potential for locking you in with their specific way of providing it, as long as it ends up as precise system time in some way.
I know GPS time sources have been available forever, but a fault-tolerant database needs a backup. The US GPS is incredibly reliable but there have been multiple issues with both GLONASS and Galileo.
It sounds like Google has an additional time source making this possible, probably a highly miniaturized atomic clock, possibly on a single chip. There's no way they're running on GPS alone.
So, the computational "Gini index" is increasing, but no one is being thrown into computational poverty.
Yes, and this will be disadvantageous over the long run for people that want to run things themselves. Ultimately companies like AMD/Intel go where the big money is at. As things centralize further and further, there will only be 3 customers they care about in the server market.
Maybe not, but consumers increasingly use centralized computation resources. I would guess that by now most applications used by consumers run in their web browser, such as Facebook.
Now, you could play with that analogy further and see some issues as well, but I don't think the issue here is centralized failure; all these data centers/"clouds" are at least good. The Cloud is about businesses focusing on core business and not supporting functions.
[Disclosure, I work on the Google Cloud team, I'm biased]
Having a devops team with the necessary expertise in Google Cloud or AWS is still a supporting function. You've just traded one skill (managing physical servers) for another (managing proprietary virtual resources).
Hence we get caching. There's the farms, then the inbound warehouses, then the distribution centers, then the grocer, then our refrigerators by the dozen or dozen and a half. When your local cache is empty of eggs, though, it requires a trip back out to the grocer to get an egg even if you need nothing else that trip. Then you generally have to buy at least half a dozen if not a dozen or more eggs just to get the one you wanted.
If I have my own couple of hens, I can go out into the yard and get an egg. If that's the whole of my fetch list, it's much more efficient for this single egg to have the hens laying right out back.
This whole few-baskets metaphor breaks down from another point of view, though, when we consider that by the very nature of using a globally distributed hosted service we're actually eliminating a single-basket problem. Yes, there's not much choice among just Google, Amazon, and Microsoft. (That they are the only options is a bit of a strawman, but let's grant this one legs.) However, putting just your own employees in charge of all your infrastructure in just your own datacenter(s) in just PostgreSQL or just MySQL is another single-basket problem. Spreading it out so that someone else gets to manage the hardware and the service and replicating your data widely within that service is from that point of view more baskets. You get more datacenter baskets, more employee baskets, and more software baskets. Using standard SQL means you can move among compliant software later, too, so you're not as tied to those baskets.
Now, back to your coop analogy. What's stopping me from having my application talk to Cloud Spanner and a local database proxy (or a work queue that sits between the app and the DB or whichever) so I can use Google's reliability for transactions and my local cached replica for query speed when I'm querying older data? Why can't I keep a few eggs around?
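A minimal sketch of the "keep a few eggs around" idea, assuming a read-through cache sitting in front of a remote store. The `ReadThroughCache` class and the `fetch_remote` callable are hypothetical stand-ins; a real setup would call the hosted database's client library there, and this only stays correct for older rows that no longer change.

```python
class ReadThroughCache:
    """Serve repeat reads locally; go to the remote store only on a miss."""

    def __init__(self, fetch_remote):
        self._fetch_remote = fetch_remote  # stand-in for a remote database read
        self._local = {}
        self.remote_reads = 0

    def get(self, key):
        if key not in self._local:
            self.remote_reads += 1
            self._local[key] = self._fetch_remote(key)
        return self._local[key]


# A plain dict plays the part of the hosted database here.
remote_store = {"order-2016-001": "shipped", "order-2016-002": "refunded"}
cache = ReadThroughCache(remote_store.__getitem__)

cache.get("order-2016-001")
cache.get("order-2016-001")  # second read is served from the local copy
print(cache.remote_reads)
```

Writes and reads of fresh data would still go straight to the remote database, so the transactional guarantees stay where they are strongest.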
Also, why would I be scared of Google or Amazon "having my data"? Why would I put sensitive data into my own database in plaintext and then replicate it among multiple datacenters that way?
Only if the owner of the chicken-coop has everyone else's best interests in mind. Protip: They don't.
The Cloud isn't about efficiency, it's about data control. Getting people's systems and data into Google/AWS/etc. helps with data mining, vendor lock-in, and so on. Oftentimes that can be efficient, but just as often it isn't.
They could tinker with the binaries, something many did with game binaries. But your point is well taken; open source is also very valuable to innovation.
The apps on top, Facebook, Snapchat, etc., are not so open and much of what they do is out of reach from the user.
Also, I meant to add above: People could tinker with data files (e.g., Word docs), configurations, etc. The whole system was local and accessible. You could write local code, such as VB or for Windows, that integrated with those systems.
Putting your savings under the mattress instead of in a bank account wouldn't have prevented the Great Depression either.
The only thing it would have accomplished is making your savings easier to steal.
I agree. It only makes sense if you need special data for statistics, AI training, etc.
In all other cases the classic way of programming on a PC or notebook is smarter. If you do everything in the cloud, what if you lose your Internet connection? I've had that experience several times over the last few years.
* Most Internet usage is via smartphone
* Computers are much more stable than they used to be
* Much of the world lives in places with less stable connections
* The most expensive spec in an Internet connection is availability. You can get a low-end 15 Mbps connection with no availability guarantee for $40/month; a T1 is one-tenth the speed and costs 10 times as much (all numbers are rough estimates).
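Taking the rough numbers from the last bullet at face value, the price per Mbps works out as below. The $400 and 1.5 Mbps figures for the T1 are derived from "10 times as much" and "one-tenth the speed", so they inherit the same rough-estimate caveat.

```python
def price_per_mbps(monthly_cost, mbps):
    """Monthly cost in dollars for each Mbps of line rate."""
    return monthly_cost / mbps


consumer = price_per_mbps(40, 15)        # low-end line, no availability guarantee
t1 = price_per_mbps(40 * 10, 15 / 10)    # one-tenth the speed at ten times the cost

ratio = t1 / consumer
print(f"consumer ${consumer:.2f}/Mbps, T1 ${t1:.2f}/Mbps, ratio about {round(ratio)}x")
```

So on these estimates the availability guarantee costs roughly a hundred times more per Mbps, which is the bullet's point about availability being the expensive part.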
Only a handful of heavy industries definitely need some sort of infrastructure (probably not the master, though) to be local.
It could end up that way, but lacking INSERT and UPDATE will likely limit this to a niche market for now.
Do you have a sense of what that limit is?
There's a pretty big price difference between Spanner and Aurora at the entry level so it's useful to explore this.
Per their pricing page, it looks like the largest instance available is a "db.r3.8xlarge", which is a special naming of the "r3.8xlarge" instance type, with 32 CPUs and 244 GB of memory.
That's a hell of a lot of capacity to exhaust, especially if you're using read replicas to reduce it to only/mostly write workloads. Obviously it's possible to use more than this, but the "sheer scale" argument is a bit of a flat one.
Let me present a quote to you:
640K ought to be enough for anybody
I don't have a good idea what the upper limit is for an Aurora database setup.
See here for more info: https://aws.amazon.com/rds/aurora/faqs/#high-availability-an...
For the former, I don't think they specify beyond "automatic".
For the latter, "service is typically restored in less than 120 seconds, and often less than 60 seconds": http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora...
This is a globally-available, nearly-CAP-beating datastore that powers one of the biggest websites on the internet.
It's not quite apples and oranges, but this is definitely a different problem they are solving.
Doesn't strike me as a business with complex logic.
And their needs are reasonably complex. They use machine learning and big data analytics to generate the list of videos that you should be watching. In order for those to work they need to capture a whole raft of end user metrics e.g. at what point you paused video X.
Netflix was given as an example of scale. I guess for another example, spanner could be used to store every visa transaction
For example, see this benchmark:
from this article:
To be fair, Spanner's cross-region service is coming "later 2017".
It's equivalent, with different (unknown) constraints. Aurora is specifically for scaling workloads in the same way. You can say it's horizontal (machine) over vertical (resource) but it's all a matter of accounting.
The big nono is the Spanner pricepoint. I will stick with Aurora for scaling based on traffic I use, over pricey timeslices.
You would have to have quite a load to justify the switch from cheaper du jour solutions right now (AWS). Relying on the few that do is a risk.
This pendulum swings. We're pretty near the apex now. A little work on ergonomics and these tools could be turn-key, and back we go to decentralized hardware.
That's what the ExaDatas are supposed to be.
This is not true. Oracle is far more than a database company nowadays in the same way that Microsoft is more than Windows. Oracle has been acquiring high-growth startups at a significant rate.
Which has their 'cloud services' doubling their contribution to revenue year over year and licenses losing 50% of their contribution to revenue year over year.
Their 'cloud' collateral is pretty opaque though.
You're still buying the same stuff, but you're outsourcing your dev ops to them on top of it (which may not be a bad thing).
It's actually hard to beat MySQL for a lot of things. I was skeptical about this when I joined Google, but as an SRE on the MySQL team around this time, I gained a lot of respect for it.
I don't know if anyone put it to them that way but as Spanner was just getting started when I left I know that one of its success criteria was to be able to be a scalable replacement for MDB. Given the white paper and other papers on their results, I'm sure it managed that requirement.
 MDB, Machine Data Base, used throughout the org but especially in Platforms and SRE to keep track of machines and their parts.
Everything you need to know is here: https://research.google.com/pubs/pub38125.html
it walks through the architecture of the Ads DB, the issues with replacing MySQL, and some of the heroic efforts to implement it via Spanner.
At the time, I asked the team why they used MySQL instead of Postgres (which I prefer) and the short answer was: MySQL replication worked at the time.
1) Hardware - Gobs and gobs of hardware and SRE experience
"Spanner is not running over the public Internet — in fact, every Spanner packet flows only over Google-controlled routers and links (excluding any edge links to remote clients). Furthermore, each data center typically has at least three independent fibers connecting it to the private global network, thus ensuring path diversity for every pair of data centers. Similarly, there is redundancy of equipment and paths within a datacenter. Thus normally catastrophic events, such as cut fiber lines, do not lead to partitions or to outages."
2) Ninja 2PC
"Spanner uses two-phase commit (2PC) and strict two-phase locking to ensure isolation and strong consistency. 2PC has been called the “anti-availability” protocol [Hel16] because all members must be up for it to work. Spanner mitigates this by having each member be a Paxos group, thus ensuring each 2PC “member” is highly available even if some of its Paxos participants are down."
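To put rough numbers on why making each 2PC member a Paxos group helps, here is a minimal sketch. It assumes replicas fail independently and are each up 99% of the time, with three 2PC members and five replicas per group; those figures and the function names are illustrative assumptions, not Spanner's actual configuration.

```python
from math import comb


def majority_available(replicas: int, p_up: float) -> float:
    """Probability that a strict majority of independent replicas is up."""
    need = replicas // 2 + 1
    return sum(
        comb(replicas, k) * p_up**k * (1 - p_up) ** (replicas - k)
        for k in range(need, replicas + 1)
    )


def two_phase_commit_available(members: int, member_availability: float) -> float:
    """2PC needs every member up, so member availabilities multiply."""
    return member_availability**members


P_UP = 0.99      # assumed uptime of a single replica (illustrative)
MEMBERS = 3      # assumed number of 2PC participants (illustrative)
GROUP_SIZE = 5   # assumed replicas per Paxos group (illustrative)

# Plain 2PC: each member is a single machine.
plain = two_phase_commit_available(MEMBERS, P_UP)

# 2PC over Paxos groups: each member is up while a majority of its replicas is up.
grouped = two_phase_commit_available(MEMBERS, majority_available(GROUP_SIZE, P_UP))

print(f"plain 2PC availability:      {plain:.6f}")
print(f"2PC over Paxos availability: {grouped:.6f}")
```

Under these assumptions the plain protocol is less available than any single machine, while the grouped version is more available than any single machine. Correlated failures (a whole datacenter going dark) would break the independence assumption, which is where the network redundancy in point 1 comes in.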
Anyone know how exactly this is defined for them? (Time? Queries? Results?)
Five-9s means 5 minutes of downtime per year.
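For anyone who wants to check that figure, the arithmetic is short. This is a small sketch assuming a 365.25-day year; the function name is made up for illustration.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumes an average year of 365.25 days


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for n in range(1, 6):
    print(f"{n} nine(s): {downtime_minutes_per_year(n):10.2f} minutes/year")
```

Five nines comes out to roughly 5.26 minutes a year, so "5 minutes" is a fair rounding. One nine, by contrast, allows more than 36 days of downtime.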
>> This feature is not covered by any SLA
So I would guess that you don't get _any_ guarantees. Not five nines and not even one nine.
MTTR is bounded by reelection latency, rather than replica recovery, although you still may eat a write amplification cost for rereplication.
Write amplification is 3-5x that of a non-quorum-backed 2PC system, depending on replication ensemble size.
Google further multiplies write amplification with geo-redundancy, so bump that WA by another 3x+.
It's an insanely high cost to pay for availability, but for an advertising company it's important to count the beans accurately.
EDIT: Found quite a bit of my answers in your linked article:
> Cloud Spanner uses a SQL dialect which matches the ANSI SQL:2011 standard with some extensions for Spanner-specific features. This is a SQL standard simpler than that used in non-distributed databases such as vanilla MySQL, but still supports the relational model (e.g. JOINs). It includes data-definition language statements like CREATE TABLE. Spanner supports 7 data types: bool, int64, float64, string, bytes, date, timestamp.
> Cloud Spanner doesn't, however, support data manipulation language (DML) statements. DML includes SQL queries like INSERT and UPDATE. Instead, Spanner's interface definition includes RPCs for mutating rows given their primary key. This is a bit annoying. You would expect a fully-featured SQL database to include DML statements. Even if you don't use DML in your application you'll almost certainly want them for one-off queries you run in a query console.
> Though Cloud Spanner supports a smaller set of SQL than many other relational databases, its dialect is well-documented and fits our use case well. Our requirements for a MySQL replacement are that it supports secondary indices and common SQL aggregations, such as the GROUP BY clause. We've eliminated most of the joins we do, so we haven't tested Cloud Spanner's join performance.
This seems like it'd prevent any kind of easy switch over to Spanner.
Disclaimer: I work on Cloud Spanner
And yeah it makes it sound like writing an OEM adapter will be much more difficult.
>> Cloud Spanner doesn't, however, support data manipulation language (DML) statements. DML includes SQL queries like INSERT and UPDATE. Instead, Spanner's interface definition includes RPCs for mutating rows given their primary key.
Does this mean I need to rewrite my application?
My application uses an ORM and it typically converts my logic to SQL statements and fires them off to Postgres. Would I need to change it such that it doesn't issue INSERT / UPDATE statements?
The join performance is by far the most interesting part of this to me. A more traditional NoSQL solution sounds like it would have worked just as well for you, sans all the atomic clock fanciness. Joining across geographically disparate data is a real trick, and it seems like there would be some physical performance limits?
No, why? Query can be executed in parallel.
BTW, isn't 20k/sec very low throughput for a 30-node installation? Cassandra can handle 50k+ (both writes and reads) on a single node. When most of your queries collect data from many nodes, it will scale almost linearly.
And yes Cassandra will scale linearly-ish as long as you're in the same datacenter. Try running a geo-distributed 30-node Cassandra ring and it's a whole different story at that level of consistency and availability.
In most cases I've seen, latency can affect throughput plenty, so I doubt your assertion that it won't affect throughput quite a bit. Even more so for anything that relies on TCP/IP.
I'd highly recommend reading the Bigtable and Spanner papers first and maybe then we can have a sensible and fruitful argument.
Google prefers building advanced systems that let you do things "the old way" but making them horizontally scalable.
Amazon prefers to acknowledge that network partitions exist and try to get you to do things "the new way" that deals with that failure case in the software instead of trying to hide it.
I'm not saying either system is better than the other, but doing it Google's way is certainly easier for Enterprises that want to make the move, and why Amazon is starting to break with tradition and release products that let you do things "the old way" while hiding the details in an abstraction.
I've always said that Google is technically better than AWS, but no one will ever know because they don't have a strong sales team to go and show people.
This release only solidifies that point.
Spanner doesn't exactly hide the details, but it lets you make transactions that span multiple shards. You still eat the cost of the transaction, you're just free from having to implement it at the application level, which is a more difficult and error-prone way of doing things. The bottom line is that if you need consistency, it needs to be implemented somewhere in your stack. If you don't need consistency (analytics workloads come to mind) then you have more flexibility with your database.
Disclosure: Google employee, reconstructing what I know from published information.
Disclosure: Also a Google employee, also reconstructing.
Google: Make unique products that push the boundaries of what was previously thought possible.
Amazon: Don't care about inefficiencies and usage. Inefficiencies can be handled by charging more to the clients, usage doesn't matter because the users are mostly the clients and they don't feel their pain.
Google: Had to make all their core technologies efficient, performant, scalable and maintainable or they couldn't sustain their business.
Amazon's philosophy is being 'close to the metal' to allow Enterprise customers to migrate 'regular apps' into a 'regular environment' in the cloud.
Most of Google's offerings are (at least were) novel, but proprietary ways of doing specific things.
Amazon is not a laggard: they have provided a number of interesting and useful 'helper' things to facilitate IaaS - as well as a number of 'pure cloud' type things.
Amazon is very, very customer focused. Their products come from customer demands.
Google often takes 'cool things they've done internally' and exposes them, hoping that they might have some use-case in the rest of the world.
Google and Amazon are equally interested in profit.
Which Amazon totally didn't have to do with their firehose of cash?
Amazon runs nothing, it's an outsourcing firm. They needed to make services "good enough" to be sold. If a service is somewhat inefficient, it just charges the clients more to cover the costs.
Technologies reflect the business they were created in.
What the fuck are you talking about. It's one thing to say AWS services are "good enough" but "Amazon runs nothing" is a ridiculous statement.
I suspect that Google knows this, and their reputation for having poor customer support and sales comes from that knowledge.
AWS prioritizes building blocks that support very high throughput and avoid leaky abstractions at all costs, and they're happy to push forward as long as these criteria are met. IMO they really succeed at this goal. Minus specific bugs that they're generally good about acknowledging, their services reliably do what they say they're going to. And they definitely solve a lot of problems for you, even if sometimes you're still required to get further into the weeds than you might want.
I'll buy that Google Cloud is better at questioning underlying assumptions and sometimes succeeds in releasing higher-level abstractions than AWS without any leakiness (a great example of this now being Spanner vs Aurora). It also feels to me that with releases like this Google is leveraging the full value of their own experiences running their services, and seems to be more advanced than Amazon in some areas, so this has a lot of value, whereas AWS seems to build a broader range of products with a specific customer in mind which is not necessarily themselves (e.g. all of their "move your on-prem stuff to the cloud" helpers).
If you consider Spanner vs Dynamo, it definitely matches up as Google wrapping the old way and Amazon forcing a new way (though to be fair, Dynamo was released 5 years earlier). But on the other hand, considering Spanner vs Aurora, Amazon is the one embracing the old way with full MySQL & Postgres compatibility, whereas Spanner sounds like a pretty dramatically different subset of SQL in not supporting insert and update statements. It's a very reasonable compromise for basically getting to ignore the CAP theorem, but it is a significant difference that every developer will have to learn.
(I work on Google Cloud)
Google tried that a decade ago and found it lacking, this is why Spanner exists in the first place.
You're both right.
I don't think it's easy to port existing applications to use it and in the end you will still need to accommodate shortcomings in your application.
Either way, they are trying to abstract away having to think about eventual consistency with this offering.
The thing that's really different here is that Google is basically saying: here's this awesome system, yes it has obvious risks from partitioning, and we are going to stake our reputation on those partitions not happening.
In contrast, AWS is saying: this is DynamoDB, it's really limiting, but because of those limitations it should be pretty reliable as long as you write your application correctly.
It will be interesting to see if Microsoft and Amazon have to follow Google's lead here.
No, they said this in the F1 RDBMS or Spanner papers. They originally did the NoSQL, eventual-consistency type of stuff. This required app developers to do a lot of work to avoid the problems that model can create. Apparently, even their bright people had enough problems with it that they decided to eliminate or simplify that situation with stronger consistency. It took some brilliant engineering, but now they have a database as easy to use as the old model with the advantages of the newer ones.
If anything, they learned some hard lessons with a good solution to them. Now, they're offering it to others. I was hoping they'd do this instead of keep it internal only. F1 and Spanner are amazing tech that could benefit many companies.
As cutesy of a sentiment as it is, it's also full of misconceptions. The pens were invented by an American corporation that wanted better pens to sell in general (a smoother flow in a pen, regardless of gravity/orientation, is a better pen), and they saw a good opportunity to market the pen to NASA for use in space. Both NASA and the Russians used pencils in space, but the problem with pencils is that the flakes can pollute an environment pretty quickly in low gravity, and the pens turned out to be a much better solution. (So far as I've heard, every space agency these days buys similar pens.)
I meant the differences in design philosophy that permeate aerospace engineering on both sides. Russian, built ugly but for strength and longevity. American, built for high capability with finesse and finer tolerances. The emergent properties of these different principles explain why Soyuz is still a preferred launch vehicle, but it was the Americans who got to the moon and operated the STS.
Amazon is more like the Russians: built in the knowledge that things fail, but less magical as a result. Google is more like the Americans: remarkable technology, you just need a herd of geniuses to run it.
It is an interesting analogy, this dichotomy you see in the design philosophies (both between the space programs and the mega-corporations), but perhaps my point, if I were attempting one, is to beware of false dichotomies.
However, by 'abstracting' this away, you're not being forced to think about failure domains. If there is ever a massive country-wide connectivity break to the wider Internet (feasible for lots of people inside censored countries), you'll be pretty pissed when you can't use the DB services for your servers in the Google-local datacenter that you still have connectivity to because it can't get quorum.
That's the motivation behind both Spanner and F1: take the experience of how painful it is to do transactions on a Regional or Global level, and never make individual teams do it again.
I see it a bit like "Don't roll your own crypto". Clearly some people are exempted from it, but you better be able to tell me why you get an exception.
Disclosure: I work on Google Cloud and want you to pay for Spanner :).
1. Defining high availability in terms of how a system is used: "In turn, the real litmus test is whether or not users (that want their own service to be highly available) write the code to handle outage exceptions: if they haven’t written that code, then they are assuming high availability. Based on a large number of internal users of Spanner, we know that they assume Spanner is
2. Ensuring that people don't become too dependent on high availability: "Starting in 2009, due to “excess” availability, Chubby’s Site Reliability Engineers (SREs) started forcing periodic outages to ensure we continue to understand dependencies and the impact of Chubby failures."
I think 2 is really interesting. Netflix has Chaos Monkey to help address this (https://github.com/Netflix/SimianArmy/wiki/Chaos-Monkey). There's also a book called Foolproof (https://www.theguardian.com/books/2015/oct/12/foolproof-greg...) which talks about how perceived safety can lead to bigger disasters in lots of different areas: finance, driving, natural disasters, etc.
I became a way better winter driver when I started intentionally fishtailing in snow and ice (in low risk situations).
Google launching Spanner is generally a positive thing for our industry and our product. It's more proof that what we're aiming for is possible and that there's demand for it. We expect that in five years, all tech companies will be deploying technology like ours.
One of the big differences is that Spanner only uses SQL for read-only operations, with a custom API for writes. We use standard SQL for both reads and writes, which means we also work with major ORMs like GORM, SQLAlchemy, and Hibernate (docs should be live today or tomorrow). Spanner's custom write API will make it difficult to work with existing frameworks, or to convert an existing application to Spanner.
Cloud Spanner only works on Google Cloud and is a black-box managed service. CockroachDB is open source and can be run on-prem or in any cloud on commodity hardware. (We don't offer CockroachDB as a service yet, but may in the future)
At this point, both products are still in beta and are still missing features like back-up and restore (according to the Quizlet blog post). We plan to launch CockroachDB 1.0 with back-up / restore enabled.
* For anyone wanting to know more about how we make CockroachDB work without TrueTime, see our blog post: https://www.cockroachlabs.com/blog/living-without-atomic-clo...
No startup will be able to replicate that anytime soon, a lot of time (and money) has been put into it by a lot of people over a long time.
Could any government? Has any government?
My impression is that, infrastructure wise, Google is genuinely in a class of size one.
How much more infrastructure do they have besides AWS? How much does Google have besides GCP?
I don't think you can draw any definitive conclusions from this, but calling it a class of size 1 or 2 is probably an overstatement of Google (+/- Amazon)'s advantage over Microsoft at least.
NSA's annual budget is $50bn. U.S. military budget is about $600bn.
Google's revenue is $90bn and they don't spend all of it.
> A simple statement of the contrast between Spanner and CockroachDB would be: Spanner always waits on writes for a short interval, whereas CockroachDB sometimes waits on reads for a longer interval. How long is that interval? Well it depends on how clocks on CockroachDB nodes are being synchronized. Using NTP, it’s likely to be up to 250ms. Not great, but the kind of transaction that would restart for the full interval would have to read constantly updated values across many nodes. In practice, these kinds of use cases exist but are the exception.
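The "sometimes waits on reads" half of that quote can be sketched roughly as follows. This is a simplified model of an uncertainty window, not CockroachDB's actual code: the `Version` type, the `read` function and the timestamps are hypothetical, and only the 250ms bound comes from the quote.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_CLOCK_OFFSET_MS = 250  # NTP-style bound taken from the quote above


@dataclass
class Version:
    timestamp_ms: int  # timestamp assigned by the writer's (possibly skewed) clock
    value: str


def read(
    versions: List[Version],
    read_ts_ms: int,
    max_offset_ms: int = MAX_CLOCK_OFFSET_MS,
) -> Tuple[Optional[str], Optional[int]]:
    """Return (value, restart_ts). restart_ts is None when the read is safe."""
    # A version stamped slightly "in the future" may really have been written
    # before this read began, because clocks can disagree by up to max_offset_ms.
    uncertain = [
        v for v in versions
        if read_ts_ms < v.timestamp_ms <= read_ts_ms + max_offset_ms
    ]
    if uncertain:
        # Ambiguous ordering: ask the caller to retry at a later timestamp.
        return None, max(v.timestamp_ms for v in uncertain)

    visible = [v for v in versions if v.timestamp_ms <= read_ts_ms]
    if not visible:
        return None, None
    return max(visible, key=lambda v: v.timestamp_ms).value, None


history = [Version(100, "a"), Version(300, "b")]
print(read(history, 200))  # "b" falls inside the uncertainty window -> restart
print(read(history, 300))  # nothing uncertain -> reads the latest value
```

In this model a restart only happens when a write lands within the offset window of a read, which matches the quote's claim that the full wait is the exception and mostly hits reads of constantly updated values.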
CockroachDB is waiting for time keeping hardware to improve.