
From Google to the world: the Kubernetes origin story
https://cloudplatform.googleblog.com/2016/07/from-Google-to-the-world-the-Kubernetes-origin-story.html
======
dagenix
It's an interesting article. Though, I kinda feel like we're missing a good
chunk of the story. As dominant as AWS is now, it was probably even more
dominant in the time period we're talking about, having gotten a big head
start. And every time that AWS rolled out some new proprietary feature, it
tied users even more tightly to the AWS platform.

IMO, what Google realized they needed was a technology to prevent people from
getting more dependent on AWS. Being open source is key to that goal - if they
just offer another proprietary service, it's hard to go to someone and tell
them not to use an AWS proprietary service but to instead use a GCP one.
However, with Kubernetes being open, users can host their own cluster, which
makes it easy to move between cloud providers. For customers on AWS, I feel
like this provides a bit of an incentive not to use too many AWS proprietary
services, or else they lose the advantage of mobility. For everyone else, GCP
gets to offer the strongest Kubernetes implementation and draw customers in
that way.

It's pretty win-win for Google. The article talks about the decision to open
source Kubernetes as some sort of decision for the betterment of the world.
But if that were really the only reason, why not open source Spanner or
BigTable? And that's not to say that Google did anything wrong - corporations
are designed to make decisions that are in their own best interests. What I
think is interesting is that this is a really powerful example of how
sometimes things that are beneficial to the world AND to business interests
line up together, and when they do, kinda cool things can happen.

~~~
jchw
>why not open source Spanner or BigTable?

Because it's genuinely difficult to open source things like that. They're way
too tied to Google and have many closed-source dependencies on internal
libraries and services. I think they regret missing the opportunity on some of
these, and that may be why Google has started open sourcing more 'core' things
like gRPC and Bazel - it might make it easier in the future.

~~~
Game_Ender
Google does not use gRPC internally; like k8s, it is an open re-implementation
of an internal Google technology. Also, like k8s & Bazel, it is less advanced
and lower performance than their internal technology.

At a high level, Google is open sourcing things that have genuine value but
also make integration with its money-making services easier.

~~~
lhecker
> Also, like k8s & Bazel, [gRPC] is less advanced and lower performance than
> their internal technology.

Are you sure about that? As far as I know, gRPC is just about as good as
Stubby, which is why Google is even migrating to it internally.

For instance, this comment agrees with me:
[https://news.ycombinator.com/item?id=12348286](https://news.ycombinator.com/item?id=12348286)

~~~
ebikelaw
I think you are misreading that comment. He’s saying that Stubby is fast, not
gRPC. In fact, performance is the big unknown with gRPC adoption within
Google. It definitely isn’t on par with Stubby today, and it has to get there
before anyone significant will switch to it.

~~~
Xorlev
I can't comment on raw numbers (because I simply don't have them) but at least
for the service I work on, replacing Stubby with gRPC wouldn't really move the
needle even if it was 2-3x slower (it might be faster, this is just for
illustration) -- we spend our time waiting on IO from other services or
crunching numbers in the CPU. Being a Java service, gRPC/Java might well be
just as fast or faster than Stubby Java, but I could understand that Stubby
C++ has been hyperoptimized over the years vs. gRPC C core which might have a
ways to go. By the latest performance dashboard [1, 2], gRPC/Java is leading
the pack but gRPC C++ doesn't seem like it's slouching too much either. I seem
to remember the C++ impl crushing Java at performance a while back, so I'm
sure that'll change in the future.

Honestly though? It'd take a _very_ demanding workload for your RPC system to
be the bottleneck (so long as the two systems are within constant factors of
each other). There are services like that, but they're the exception, not the
norm. Most services don't need to do 100kQPS/task. Even then, at that point
you're spending a lot of time on serialization/deserialization, auth, logging,
etc. Your service is more than its communication layer; even if that layer is
important to optimize, it's still just a minor constant factor.

The real problem is inertia. There's a lot of code/tools/patterns built up
around Stubby and the semantics of Stubby (including all its features which
likely haven't been ported to gRPC yet) and that's difficult to overcome.

Our #1 use of gRPC so far, I would imagine, is at the edge. gRPC is making its
way into Android apps, since it's pretty trivial for translating proxies to
convert gRPC calls to Stubby calls more or less 1:1.

[1] [https://performance-dot-grpc-testing.appspot.com/explore?das...](https://performance-dot-grpc-testing.appspot.com/explore?dashboard=5636470266134528)

[2] [https://performance-dot-grpc-testing.appspot.com/explore?das...](https://performance-dot-grpc-testing.appspot.com/explore?dashboard=5652536396611584&widget=1735297033&container=1012810333)

~~~
ebikelaw
You and I seem to be using a different denominator to quantify "most"
services. I'm thinking of it as "most" in terms of who has all the resources /
budget. You seem to be thinking of it in terms of sheer number of services or
engineers working on them. The fact is that the highly demanding services have
the huge majority of the resources, and are the most sensitive to performance
issues. If your service uses 10% of Google's datacenter space, you won't
accept a 5% or even 1% regression just so you can port to gRPC, because at
that scale your team can just staff someone or even several people to maintain
the pre-gRPC system forever and still come out ahead on the budget.

Totally agree that world-facing APIs will all be gRPC and that makes perfect
sense to me.

~~~
Xorlev
> You seem to be thinking of it in terms of sheer number of services or
> engineers working on them.

I'm not sure where I said that, but yes, that's part of the switching cost.

> The fact is that the highly demanding services have the huge majority of the
> resources, and are the most sensitive to performance issues. If your service
> uses 10% of Google's datacenter space, you won't accept a 5% or even 1%
> regression just so you can port to gRPC,

The thrust of my statement was that for many services, RPC overhead is
minimal. So even a 2x or 3x increase in RPC overhead is still minimal. I
agree, a 5% increase in resource utilization for a large service is something
that would be weighed. But let's explore that idea for a moment:

> because at that scale your team can just staff someone or even several
> people to maintain the pre-gRPC system forever and still come out ahead on
> the budget.

Not necessarily. Engineers are expensive and getting more expensive, while
computing resources keep getting cheaper. Not only that, but engineers tend
to be specialized, so you can't just task anyone with maintaining the previous
system; it tends to be people who already have deep expertise. And those
people have career aims beyond long-term support of a deprecated system, so
there's retention to consider.

Pretending for a moment that all your services except a small handful moved
from some system A to some system B: if the maintenance burden of keeping
system A alive starts to eclipse the resource cost of moving to system B (a
cost which decreases all the time, thanks to improvements in system B, the
increasing cost of maintaining system A, and the monotonic decline in the cost
of computing resources), then you might well just swallow the 5%-10% increase
in resources, either permanently or temporarily, and come out ahead in the
end.

Additionally, as system B moves on, staying on system A becomes increasingly
risky: security improvements, features, layers which don't know about system A
anymore all threaten the stability of your service. If you've checked out the
SRE book, you'll know that our SLOs are more important than any one resource.
If nobody trusts your service to operate, then they won't use it and then you
won't have to worry about resources anymore since the users will have moved
on.

> because at that scale your team can just staff someone or even several
> people to maintain the pre-gRPC system forever and still come out ahead on
> the budget.

To reiterate the point above, these roles tend to be fairly specialized and
hard to staff. Arguably these same engineers are better tasked making system B
good enough to switch to so you can thank system A for its service and show it
the door.

Bringing this back to Stubby vs. gRPC, it's a pretty academic argument so far.
They're both here to stay. And honestly, when we say "Stubby" there are
already different versions of Stubby which interoperate with each other, and
gRPC will be no different. Likewise, we still use proto1 in addition to proto2
and proto3 (the public versions), since moving off it just takes time and
energy.

We do make these kinds of decisions every day, and it's not always in favor of
reduced resources. If we cared about nothing other than resource utilization,
we'd be completely C++: no Java, no Python. Realistically, when two systems
fill equivalent roles, the cost of maintaining both often leads to one or the
other winning out, usually in favor of maintainability so long as their
feature sets are roughly equivalent. We're fortunate to be in a position where
we can choose code health and uniformity of vision over absolute minimum
resource utilization. And again, even if we choose system B (higher resource
usage) over system A, differences in architecture or design choices may mean
that system B's ultimate performance ceiling is higher than system A's,
despite starting lower. Sometimes it takes a critical mass of adopters to
really shake out all those issues.

I know that quotes from Knuth are often trotted out during these kinds of
discussions, but it's true: "We should forget about small efficiencies, say
about 97% of the time: premature optimization is the root of all evil. Yet we
should not pass up our opportunities in that critical 3%."

That 3% is where we choose to spend our effort, and that critical 3% includes
the ability of our engineering force to make forward progress and not be
hindered by too much debt. It also includes real data; check out Google-Wide
Profiling [1].

> Totally agree that world-facing APIs will all be gRPC and that makes perfect
> sense to me.

Probably not all. We still fully support HTTP/JSON APIs, but at least in our
little corner of the world we've chosen to take full advantage of gRPC.

Anyways, thanks for letting me stand on my soapbox for a bit.

[1] [https://storage.googleapis.com/pub-tools-public-publication-...](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/36575.pdf)

~~~
ebikelaw
Interesting that you allude to the coexistence of C++, Java, Python, and Go,
because I think it bolsters my point. The overwhelming majority of services
at Google are in C++. There are individual C++ services that consume more
resources than all Java products combined. I think this speaks to the appetite
for performance and efficiency within the company, since C++ is demonstrably
the most difficult of these languages.

------
telltruth
I am a bit tired of these VP-speak articles. So what really was the discussion
with Eric Brewer? What arguments made him suddenly agree to open source it?
Why had Urs rejected this before? Why are you not allowed to write about this?

This whole article is like “I talked to my VP and convinced him to open source
an internal tool. We at Google are the best, everything we have built is the
best, and here’s the link for a free trial.”

And how the heck did you get the “number of years” worth of coding figure?

~~~
mlazos
Yeah, I felt like barfing when he said “Good ideas usually win out at Google,
and we were convinced this was a good idea.” It’s like, dude, you’re backing
that up with an anecdote about running into a VP on a shuttle ride? Google is
such a magical, non-political place.

------
pi-squared
> ... And, on top of that, you want to open source it?

(speculative continuation)

> But if we do that and make people believe that containers - the massive,
> heavy, broken abstraction - are the future, and provide complicated
> infrastructure that will magically fix the problems of containers, and then
> also offer this complicated infrastructure fully managed - this will be our
> way to beat AWS. It will be difficult and clumsy to set up on your own - you
> need many machines to set up the cluster so that it's "Google-scalable" and
> "fault-tolerant" - so for most people and companies it will be way too much
> hassle and too expensive to manage their own cluster purely on VMs or
> physical instances. We "just" provide you with the best managed infra,
> because c'mon - it will be open source, and we'll even give up control on
> paper - but everyone will associate it with us; we will make sure to have
> podcasts, and blogs, and marketing that talk about containers, the future,
> and how G created this project. The best people will help us build it "out
> in the open", and then when we hire them, it will be easy to teach them this
> Borg monstrosity that we have here. So, you know, win-win-win - devs think
> they solved their Docker-is-shit problem with the magic of Kubernetes (yeah,
> I got a name for it already), so G is now the savior; it's a win for GCP;
> it's a win for hiring.

> [Urs]: Now we talkin...

~~~
tnolet
That made me chuckle. I’m afraid it might even be half true. Someone should
write a good blog post on how Kubernetes is a tech marketing success story.
Same could be said for Docker of course.

------
keypusher
_A turning point was a fateful shuttle ride where I found myself sitting next
to Eric Brewer, VP of Cloud, [...] Soon after, we got the green light from
Urs._

This seems like a red flag for management at Google, if the best way to pitch
ideas up the chain is hoping you can ambush someone important on the company
shuttle.

~~~
traskjd
You say red flag, others might say 'how the world really works'. There's a lot
more luck and talking to the right person than we'd like to admit in this
world.

~~~
mpweiher
To me "this is a red flag" and "how the world really works" are not
incompatible statement.

------
eloff
The strategy appears to be working: GitLab announced a migration from Azure to
GCP, and the reason given is better Kubernetes support.
[https://about.gitlab.com/2018/06/25/moving-to-gcp/](https://about.gitlab.com/2018/06/25/moving-to-gcp/)

~~~
ATsch
I'm sure the fact that Google Ventures is, AFAIK, GitLab's biggest investor
has a lot to do with it too.

~~~
williamchia
GV operates independently of Google (part of why the name officially changed
from “Google Ventures” to GV). GitLab partners with and supports multiple
clouds - e.g. GitLab announced official support for Amazon EKS, and IBM Cloud
runs GitLab Core for version control.

~~~
dekhn
GV isn't truly independent. The best way to describe it is that multiple
companies in Alphabet have access to Google's tech stack and resource manager.
In the early days, I advised against GV trying to get its funded companies
onto GCP, because it wasn't mature enough and I wanted the startups to just be
successful, not be political statements about Google's cloud.

That changed over the past few years; now I would easily recommend that
startups funded by GV use Google Cloud _if they wanted to_.

------
rhymenoceros
Title needs a (2016).

------
iamgopal
I am soon to venture into the SaaS business. Should I be interested in
Kubernetes? Is the old way of doing things still relevant?

~~~
borplk
If you are indie avoid it until you have 100 users and some revenue.

The last thing you want is to be fiddling with k8s when you have 3 whole users
and $0 revenue.

Unless your niche strongly requires fancy architecture, start with a $5 VPS
and take it up from there.

You will never look back and think "shit if only I had used k8s on day one it
wouldn't have failed".

More often it's closer to "shit if I had focused on non-tech stuff more maybe
it would have gone somewhere ... instead I spent 2 months fiddling with YAML
files".

~~~
Arqu
I strongly disagree. Initially I was reluctant myself; however, once I read a
bit more about it, it's actually fairly simple for simple things and only a
bit more complex for very complex stuff. It's no harder or more involved than
setting up a VM on your own. The cost is also pretty much the same at small
scale, as you basically get free k8s masters by now.

Also, if you host a bunch of side projects, it's actually better in terms of
resource utilization and separation of projects: it dynamically schedules pods
based on their resource requests across the nodes it has at hand, so you can
host several projects, fully separated, on a single node or a handful of nodes
depending on requirements.
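
For illustration, a minimal pod spec of the kind I mean - just a sketch, the
name, image, and numbers are all made up - showing the resource requests the
scheduler packs onto nodes and the limits it enforces:

    apiVersion: v1
    kind: Pod
    metadata:
      name: side-project            # hypothetical name
    spec:
      containers:
      - name: web
        image: example/web:latest   # hypothetical image
        resources:
          requests:                 # what the scheduler bin-packs on
            cpu: 100m
            memory: 128Mi
          limits:                   # hard caps on actual usage
            cpu: 500m
            memory: 256Mi

The scheduler places the pod on any node with 100m of CPU and 128Mi of memory
still unreserved, which is what lets several small projects share one node
cleanly.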

------
jonny_eh
Wait, so Kubernetes is Google's take on Docker, invented independently?

~~~
tybit
No, it’s Google’s take on container/compute task scheduling: deciding when and
on which machines a task (e.g. a Docker container) should run. Docker Swarm is
Docker’s offering in the same category as Kubernetes.

Edit: Though Google could be credited with introducing containers to Linux, as
they added the main missing piece, cgroups, to the Linux kernel.

~~~
throw2016
Cgroups are not container-specific; they are used to limit the resources (CPU,
memory, etc.) available to processes. Namespaces are what make Linux
containers possible.
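
Roughly: namespaces change what a process can _see_, cgroups cap what it can
_use_. A rough Go sketch of the two primitives (Linux-only, needs root; the
cgroup path and memory limit are illustrative):

    // Namespaces vs. cgroups, in miniature.
    package main

    import (
        "os"
        "os/exec"
        "strconv"
        "syscall"
    )

    func main() {
        // Namespaces: start a shell with its own hostname (UTS) and PID
        // number space - isolating what it sees. No cgroup involved.
        cmd := exec.Command("/bin/sh")
        cmd.SysProcAttr = &syscall.SysProcAttr{
            Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
        }
        cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
        if err := cmd.Start(); err != nil {
            panic(err)
        }

        // Cgroups: cap the shell's memory via the cgroup v2 filesystem -
        // limiting what it uses. This works for any process, container
        // or not.
        os.MkdirAll("/sys/fs/cgroup/demo", 0755)
        os.WriteFile("/sys/fs/cgroup/demo/memory.max", []byte("64M"), 0644)
        os.WriteFile("/sys/fs/cgroup/demo/cgroup.procs",
            []byte(strconv.Itoa(cmd.Process.Pid)), 0644)

        cmd.Wait()
    }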

