IMO, what Google realized they needed was a technology to prevent people from getting more dependent on AWS. Being open source is key to that goal - if they just offer another proprietary service, it's hard to go to someone and tell them not to use an AWS proprietary service but to instead use a GCP one. However, with Kubernetes being open, users can host their own cluster, which makes it easy to move between cloud providers. For customers on AWS, this provides a bit of an incentive not to use too many AWS proprietary services, or they lose the advantage of mobility. For everyone else, GCP gets to offer the strongest Kubernetes implementation and draw customers in that way.
It's pretty win-win for Google. The article frames the decision to open source Kubernetes as some sort of decision for the betterment of the world. But if that were really the only reason, why not open source Spanner or BigTable? That's not to say that Google did anything wrong - corporations are designed to make decisions that are in their own best interests. What I think is interesting is that this is a really powerful example of how things that are generally beneficial to the world AND business interests sometimes line up, and when they do, kinda cool things can happen.
Because it's genuinely difficult to open source things like that. They're too tightly tied to Google and have many closed-source dependencies on internal libraries and services. I think they regret missing the opportunity on some of these, and that may be why Google has started open sourcing more 'core' things like gRPC and Bazel - it might make it easier next time.
At a high level Google is open sourcing things that have genuine value, but also make integration with their money making services easier.
We do use gRPC internally, but the bulk of services were built prior to its existence/maturity and use its predecessors. My expectation is that services will eventually migrate, but that's a long way out. I'd say it's less about performance and more that there's no clear reason for teams to migrate yet, and several compelling reasons not to.
Bazel is mostly a subset of Blaze, and many packages would "just work" if pulled into a Bazel workspace. Those that wouldn't are mostly broken because they rely on proprietary (but in many cases deprecated) Blaze features. I see a handful of changes go by every week migrating packages to newer Bazel-friendly definitions (a big one here is the historical handling of proto_library rules).
Kubernetes is used for some things built on GCP, but that's mostly suitable for green-field work (it's easier to integrate with existing things on Borg if you just run on Borg).
Google SWEs, like most software engineers, have a lot of work to do, and “learn a new framework that is sort of the same as the framework you’re using now and convert the entire code base to it” is pretty low on anyone’s list. We all have to do it from time to time, but you usually wait until you’ve had a few new employees learn the new framework and use it on a couple of smaller new projects, and then wait until you’re doing a big enough rewrite that the framework change is noise.
My guess is that the major differences between an open source Google project and an equivalent internal one are (1) mistakes that are too deep in the original design are corrected, (2) some edge-case features are not immediately available, (3) the software is written to be run without a team of 10 SREs managing it, and (4) some novel bugs are created that will take a few iterations to correct.
I’d be surprised if there was a significant “performance” difference; the people writing it will probably be able to benefit from the experience on the first implementation to not make too many mistakes (and they have solid performance targets to shoot for, as opposed to not knowing what an efficient implementation should look like).
Everyone complains about (2) and (4), which are the inevitable result of developing software in the open, but you cannot imagine how wonderful (1) is. I use k8s for my projects at home and borg for my projects at work, and dear god, the configuration language differences alone. (Don’t spit into the wind; don’t tug on Superman’s cape; don’t become the borg configuration expert for your team.)
Are you sure about that?
As far as I know gRPC is literally just as good as Stubby, which is why Google is even migrating to it internally.
For instance this comment agrees with me: https://news.ycombinator.com/item?id=12348286
Honestly though? It'd take a _very_ demanding workload such that your RPC system was the bottleneck (so long as they're within constant factors of each other). There are services like that, but they're the exception and not the norm. Most services don't need to do 100kQPS/task. Even then, at that point you're spending a lot of time on serialization/deserialization, auth, logging, etc.. Your service is more than its communication layer, even if that's important to optimize it's still just a minor constant factor.
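The "minor constant factor" claim above is easy to sanity-check with a back-of-envelope calculation. All numbers here are hypothetical, chosen only to illustrate the shape of the argument: when the handler's own work (serialization, auth, logging, business logic) dominates, even doubling the RPC framework's per-call overhead barely moves the end-to-end cost.

```python
# Back-of-envelope sketch with made-up numbers: how much does doubling
# RPC framework overhead matter when the handler dominates the request?
handler_us = 2000      # hypothetical per-request app work, in microseconds
system_a_us = 50       # hypothetical framework overhead of the old RPC system
system_b_us = 100      # assume the new system is 2x worse per call

total_a = handler_us + system_a_us   # 2050 us per request
total_b = handler_us + system_b_us   # 2100 us per request

# End-to-end regression despite a 2x framework-level slowdown.
regression = (total_b - total_a) / total_a
print(f"end-to-end regression: {regression:.1%}")  # ~2.4%
```

Under these assumptions a 2x framework regression shows up as only ~2.4% end to end, which is why the bottleneck argument mostly applies to the exceptional 100kQPS/task services.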
The real problem is inertia. There's a lot of code/tools/patterns built up around Stubby and the semantics of Stubby (including all its features which likely haven't been ported to gRPC yet) and that's difficult to overcome.
Our #1 use of gRPC so far, I would imagine, is at the edge. gRPC is making its way into Android apps, since it's pretty trivial for translating proxies to more or less 1:1 convert gRPC calls to Stubby calls.
Totally agree that world-facing APIs will all be gRPC and that makes perfect sense to me.
I'm not sure where I said that, but yes, that's part of the switching cost.
> The fact is that the highly demanding services have the huge majority of the resources, and are the most sensitive to performance issues. If your service uses 10% of Google's datacenter space, you won't accept a 5% or even 1% regression just so you can port to gRPC,
The thrust of my statement was that for many services, RPC overhead is minimal. So even a 2x or 3x increase in RPC overhead is still minimal. I agree, a 5% increase in resource utilization for a large service is something that would be weighed. But let's explore that idea for a moment:
> because at that scale your team can just staff someone or even several people to maintain the pre-gRPC system forever and still come out ahead on the budget.
Not necessarily. Engineers are expensive and becoming ever more expensive while computing resources are becoming increasingly cheaper. Not only that, but engineers tend to be more specialized and so you can't just task anyone to maintain the previous system, it tends to be people with deep expertise already. And those people also have career aims to do more than long-term support of a deprecated system, so there's retention to be considered.
Pretending for a moment that all your services except a small handful moved from some system A to some system B: if the maintenance burden of system A starts to eclipse the resource cost of moving to system B (a cost which decreases all the time due to improvements in system B, the increasing cost of maintaining system A, and the monotonic reduction in computing resource cost), then you might well just swallow the 5%-10% increase in resources, either permanently or temporarily, and come out ahead in the end.
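To make the tradeoff concrete, here's a rough sketch with entirely made-up numbers (headcount, fully-loaded engineer cost, and compute spend are all assumptions, not anything Google has published):

```python
# Hypothetical cost comparison: staffing engineers to keep legacy
# system A alive vs. eating a resource regression by moving to system B.
engineer_cost = 500_000        # assumed fully-loaded cost per engineer-year
maintainers = 3                # assumed headcount to keep system A healthy
keep_a_cost = maintainers * engineer_cost          # $1.5M/yr

service_compute = 20_000_000   # assumed yearly compute bill for the service
regression = 0.05              # the 5% resource hit from moving to system B
move_to_b_cost = service_compute * regression      # $1.0M/yr

print(f"keep A: ${keep_a_cost:,}/yr vs move to B: ${move_to_b_cost:,.0f}/yr")
```

With these particular assumptions, staying on system A already costs more than the 5% regression, and the gap only widens as system B improves and compute gets cheaper while engineers get more expensive.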
Additionally, as system B moves on, staying on system A becomes increasingly risky: security improvements, features, layers which don't know about system A anymore all threaten the stability of your service. If you've checked out the SRE book, you'll know that our SLOs are more important than any one resource. If nobody trusts your service to operate, then they won't use it and then you won't have to worry about resources anymore since the users will have moved on.
To reiterate the point above, these roles tend to be fairly specialized and hard to staff. Arguably these same engineers are better tasked making system B good enough to switch to so you can thank system A for its service and show it the door.
Bringing this back to Stubby vs. gRPC, it's a pretty academic argument so far. They're both here to stay. And honestly, when we say "Stubby" there's already different versions of Stubby which interoperate with each other and gRPC will not be any different. Likewise, we still use proto1 in addition to proto2 and proto3 (the public versions) since that just takes time and energy to fix.
We do make these kinds of decisions every day, and it's not always in favor of reduced resources. If we cared for nothing other than resource utilization, we'd be completely C++, no Java, no Python. Realistically, the cost of maintaining systems with equivalent roles can often lead to one or the other winning out, usually in favor of maintainability so long as their feature sets are roughly equivalent. We're fortunate to be in a position that we can choose code health and uniformity of vision over absolute minimum resource utilization. And again, even if we choose system B (higher resources) over system A, perhaps due to the differences in architecture or design choices the absolute bar for performance of that system will be greater than system A, despite starting lower. Sometimes it takes a critical mass of adopters to really shake out all those issues.
I know that quotes from Knuth are often trotted out during these kinds of discussions, but it's true: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."
That 3% is where we choose to spend our effort, and that critical 3% includes the ability of our engineering force to make forward progress and not be hindered by too much debt. It also includes real data - see Google-Wide Profiling.
> Totally agree that world-facing APIs will all be gRPC and that makes perfect sense to me.
Probably not all. We still fully support HTTP/JSON APIs, but at least in our little corner of the world we've chosen to take full advantage of gRPC.
Anyways, thanks for letting me stand on my soapbox for a bit.
My point isn't that Google could directly open source those technologies. Nor is it that they should create open source versions of them. I'm more just musing on how when the selfish reasons to do something line up with the good reasons to do something, you tend to get cool results.
At a previous job, we had a process that everyone was supposed to do via a script. Doing the process via the script meant we got the logging that we wanted and such. Everyone was supposed to do it this way. However, there were no technical controls preventing anyone from doing the process by hand. We just told people not to. So, one day, a colleague asks me how to make sure that everyone does the process the right way. And my answer is that everyone already does - the script is way easier to run than doing it by hand. The good thing to do (run it the way you were told to) lined up with the selfish reason (it was way easier that way). Making sure those two things were in alignment was effectively as powerful as making it impossible to do it the wrong way. (Since then, the process has been significantly changed.)
I feel like it's a similar case here. The good thing to do was to create a cool technology and make it free for everyone. The selfish thing to do was to create a technology that made business sense. The result is that Kubernetes was created as open source (at least, that's my theory). If it wasn't the good thing to do, maybe the person writing the article doesn't talk to the VP on the shuttle about it and it never happens. If it wasn't the selfish thing to do (no business case), maybe the VP wouldn't approve it. But getting those two things aligned produces positive reinforcement and results.
>We always believed that open-sourcing Kubernetes was the right way to go, bringing many benefits to the project. For one, feedback loops were essentially instantaneous — if there was a problem or something didn’t work quite right, we knew about it immediately. But most importantly, we were able to work with lots of great engineers, many of whom really understood the needs of businesses who would benefit from deploying containers (have a look at the Kubernetes blog for perspectives from some of the early contributors). It was a virtuous cycle: the work of talented engineers led to more interest in the project, which further increased the rate of improvement and usage.
Of course, open sourcing software was never about charity - but I do think that it was always a community thing, and big companies choosing to be a part of that community are neither selfish nor selfless for simply participating.
Strongly agree. And to be clear, I didn't use the term "selfish" as a pejorative - I think it's pretty reasonable to expect any entity to analyze how a decision will directly benefit them and to act accordingly.
Google open sourcing spanner would have helped tremendously.
At the time of k8s' inception, GCP was behind AWS in UX and features. App Engine had (very arguably) failed, if only relative to its well-deserved potential. The allure of Kubernetes tided people over while Google played catch-up. Now many of Google's services are superior to AWS's (very arguably). Developers notice the better network performance, the improved GCP console UI, the GCP mobile app, etc.
Now, with Kubernetes maturing to the point of production usability, Google's sales team is starting to outperform AWS's. AWS will have dominant market share for many years to come, but GCP is in the driver's seat.
The generous free tier of GCP (both the $300 credits and the free tiers of various services) is a big help as well.
I've never liked that Google feels the need to couch everything in making the world better.
Their open source strategy has mainly been strategic business moves to stop competitive threats. Android was about stopping Apple from dominating mobile OSs. Kubernetes was about stopping AWS from dominating cloud. TensorFlow was about making sure Nvidia didn't dominate deep learning.
Also, they learned a lesson from Hadoop. If they'd open-sourced Google MapReduce, the cloud industry might look very different.
Almost all their big open source projects are to mitigate existing threats, and it’s working out very well for them.
It’s annoying to see them trying to frame it as a “public service”, but guess that’s how to do proper PR.
> with the launch of our Infrastructure-as-a-Service platform Google Compute Engine, we noticed an interesting problem: customers were paying for a lot of CPUs, but their utilization rates were extremely low because they were running VMs. We knew we had an internal solution for this.
> most importantly, we were able to work with lots of great engineers, many of whom really understood the needs of businesses who would benefit from deploying containers
That does not read like a decision to pursue this for the betterment of the world. It sounds like a decision to pursue this for the betterment of their customers. That's capitalism, not benevolence, and they're pretty transparent about it.
Leaving out the major business rationale for k8s makes the article seem childish.
There's a lot that goes into turning LevelDB into a real database. :)
This whole article reads like "I talked to my VP and convinced him to open source an internal tool. We at Google are the best, everything we have built is the best, and here's the link for a free trial."
And how the heck did you get the "number of years" worth of coding figure?
> But if we do that and make people believe that containers - the massive, heavy, broken abstraction - are the future, and provide complicated infrastructure that will magically fix the problems of containers, and then also provide this complicated infrastructure managed - this will be our way to beat AWS. It will be difficult and clumsy to set it up on your own - you need many machines to set up the cluster so that it's "Google-scalable" and "fault-tolerant" so for most people and companies it will be way too much hassle and too expensive to manage their own cluster purely on VM or physical instances. We "just" provide you with the best managed infra, because c'mon - it will be open source and even given up control on paper - but everyone will associate it with us, we will make sure to have podcasts, and blogs, and marketing that talk about containers, the future and how G created this project. Best people will help us build it "out in the open" and then when we hire them, it will be easy to teach them this Borg monstrosity that we have here. So you know win-win-win - devs think they solved their Docker-is-shit problem with the magic of Kubernetes (yeah, I got a name for it already) so G is now savior; it's a win for GCP; it's a win for hiring.
> [Urs]: Now we talkin...
This seems like a red flag for management at Google, if the best way to pitch ideas up the chain is hoping you can ambush someone important on the company shuttle.
That changed over the past few years; now I would easily recommend that startups funded by GV use Google Cloud if they wanted to.
The last thing you want is to be fiddling with k8s when you have 3 whole users and $0 revenue.
Unless your niche strongly requires fancy architecture start with a $5 VPS and take it up from there.
You will never look back and think "shit if only I had used k8s on day one it wouldn't have failed".
More often it's closer to "shit if I had focused on non-tech stuff more maybe it would have gone somewhere ... instead I spent 2 months fiddling with YAML files".
Also, if you host a bunch of side projects, k8s is actually better in terms of resource utilization and separation of projects: it dynamically schedules pods based on resource requests across the nodes it has at hand, so you can host several fully separate projects on a single node or a handful of nodes, depending on requirements.
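The resource requests driving that scheduling live in the pod spec. A minimal sketch (the names and numbers here are made up for illustration; `requests` is what the scheduler bin-packs on, `limits` is the enforcement ceiling):

```yaml
# Hypothetical side-project deployment: the scheduler places this pod
# on whichever node can satisfy its resource requests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: side-project-blog     # made-up project name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: side-project-blog
  template:
    metadata:
      labels:
        app: side-project-blog
    spec:
      containers:
      - name: web
        image: nginx:1.25     # stand-in image
        resources:
          requests:           # used for scheduling / bin-packing
            cpu: 100m
            memory: 128Mi
          limits:             # hard ceiling, keeps projects isolated
            cpu: 250m
            memory: 256Mi
```

Several such deployments with modest requests can share one small node, which is where the utilization win for side projects comes from.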
The opposite of regrets. I suggest k8s or serverless for almost anyone...
Edit: Though Google could be credited with introducing containers to Linux, as they added the main missing piece, cgroups, to the Linux kernel.