What I just can't decide is whether we'll successfully put another layer of container-container-containers on top of Kubernetes, or whether the whole effort will eventually collapse and we'll extract the good parts out into something simpler.
Not really. It's pretty much the same level of complication if you use the same components. You could use all the same tools: Chef, Puppet, Ansible, etc. Once you have it available, though, other applications are easier to deploy.
At any rate, this tool provides something entirely different. It lets you image the entire data center and reproduce it somewhere else. Not sure how you would've done that before.
Some of this complexity is definitely incidental: the components don't always coordinate well with each other, e.g. Docker and the API server, or the networking layer during upgrades.
On the other hand, a lot of this complexity is essential: K8s is a distributed system with a database, a networking layer, and a container orchestration layer, solving a hard problem.
It's already been done, and a couple of solutions exist: Docker Swarm and Nomad, depending on your stack's complexity level.
The problem you've described arises when you try to use the wrong tool for the job, e.g. using Kubernetes for simple projects with small teams and no dedicated ops or SRE staff.
When you have large and complex infrastructure (e.g. GitLab) with complex networking and load balancing, which is where Kubernetes-like tools bring the most value, you actually win by making your ops team work like Uber drivers (a little exaggerated): making standard decisions in a standard environment. You just check the licence (certificates) and your infrastructure just works. No need for customised solutions anymore.
It’s also doing a ton of orchestration that many people simply weren’t doing before, or had a human at a keyboard doing. There’s a lot of value in that.
All of the packaging and deployment stuff is moving very quickly because the variety of organizations using k8s is growing fast, and they're using it for new use cases.
Things like application distribution were never a focus of Borg, because Google doesn’t really distribute applications.
They worked very well for 30 years. And the new container-based tools inevitably end up making the same mistakes and rediscovering the same solutions, and we'll go back to the beginning.
> turns out Kubernetes itself is substantially more complicated to package and deploy than the old solutions
Exactly, and you can't solve a problem by making it more complex.
Sure you can. A resource scheduler is more complex than not having one, but it solves the single-point-of-failure problem and the bin-packing CPU+memory problem.
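For what it's worth, the unit the scheduler bin-packs is just each pod's declared resource requests. A generic sketch (not any particular app, names are made up):

    apiVersion: v1
    kind: Pod
    metadata:
      name: example-app            # hypothetical name
    spec:
      containers:
      - name: web
        image: example/web:1.0     # hypothetical image
        resources:
          requests:                # what the scheduler fits onto nodes
            cpu: 500m
            memory: 256Mi
          limits:                  # hard ceiling, enforced at runtime
            cpu: "1"
            memory: 512Mi

The scheduler places pods so the sum of requests fits within each node's allocatable CPU and memory, which is exactly the placement work nobody wants to do by hand.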
A more complex infrastructure means you can have dumber apps.
And there are lots of areas where this is true: TCP is a complex protocol which makes it easier to build reliable communication, CPUs have complex caches which make simple code faster, RAID makes multiple disks behave like a single disk to improve reliability or performance, compression is very complex (esp for audio/video) but dramatically reduces the size...
The implementation of kubernetes may be flawed, but the idea of kubernetes makes a lot of sense. It solves real problems.
And yet various FAANGs choose not to use a smart scheduler, because it does not improve efficiency and reliability enough to justify its complexity, and it scales poorly.
Netflix uses Titus and Mesos.
I'm not sure what Amazon uses, but I'm sure they have some sort of system to do this. They offer plenty of managed solutions for customers (including EKS).
Apple's more of a product company, but they seem to use Kubernetes for some things.
And finally, Facebook apparently has something called Tupperware.
So all the FAANGs use something like Kubernetes to manage infrastructure.
> So all the FAANGs use something like Kubernetes to manage infrastructure.
No, you cannot just hand-wave that they are "like Kubernetes".
At this point, I think the fundamental problems are:
1) People desire to give you a "works out of the box" experience, so they write a packaging system that can install everything. The app depends on Postgres? Fuck it, we'll install Postgres for you! This is where things start to go wrong because self-hosting your own replicated relational database instance is far from trivial, and it requires you to dial in a lot of settings. And, of course, it requires even more settings to say "no no, I already have an instance, here is the secret that contains the credentials and here is its address."
2) Installing software appears to require answering questions that nobody knows the answers to. How much CPU and memory do I give your app? "I dunno we didn't load test it just give it 16 CPUs and 32G of RAM, if it needs more than that check your monitoring and give it more." "I only have 2 CPUs and 4G of RAM per node." "Oh, well, maybe that's enough or maybe it isn't. It won't blow up until you give it a lot of users though, so you will get to load test it for us while your users can't do any work. Report back and let us know how it goes!"
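In practice, the escape hatches for both of these end up as chart values. A rough sketch of the usual shape (key names here are purely illustrative, not from any specific chart):

    # values.yaml -- illustrative key names; they vary from chart to chart
    postgresql:
      enabled: false                 # point 1: "no no, I already have an instance"
    externalDatabase:
      host: db.internal.example.com  # hypothetical address
      existingSecret: myapp-db-creds # secret holding the credentials
    resources:                       # point 2: the numbers nobody knows
      requests:
        cpu: "2"
        memory: 4Gi

Which of course only helps if the chart author thought to expose those knobs in the first place.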
I also noticed that when security people get at the project, it tends to become unusable. I used to be a big fan of Kustomize for manifest generation. Someone decided to build it into kubectl by default, and that it should support pulling config from random sites on the Internet. So now if you use it locally, you can't refer to resources that are in ../something, because what if a remote manifest specified ../../../../../etc/shadow as the config source? Big disaster! So now it doesn't work. (They also replaced what I thought was the best documentation in the world, a kustomization.yaml file that simply used every available setting, with comments, with a totally unusable mass of markdown files that don't tell you how to use anything.)
Obviously security is a problem, but they should have said "just git clone the manifest yourself and review it" instead of "you can't use ../ on your local manifests that are entirely written by your own company and exist all inside the same git repository that you fully control". But they didn't, and now it sucks to use.
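To make it concrete, the pattern that now trips the default load restrictor is roughly this (a sketch; paths are made up and the exact error wording varies by version):

    # overlays/prod/kustomization.yaml
    patchesStrategicMerge:
    - ../shared/replicas-patch.yaml  # plain file above this kustomization root
    configMapGenerator:
    - name: app-config
      files:
      - ../shared/app.properties     # same problem: outside the root

Both get rejected with something like "file is not in or below" the kustomization directory, even though everything lives in the same repo. Standalone kustomize can relax this with --load-restrictor LoadRestrictionsNone (if I remember the flag right), but whether your kubectl path exposes that depends on the version, so in a plain repo layout you simply lose the ability to share files across overlays.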
2. How is Gravity's solution (and/or approach) different from the similar offering by Replicated?
EDIT: Oops, sorry, just found the answer to my Q1: about halfway down on the following page: https://gravitational.com/gravity. However, some information on pricing structure and approximate numbers would be appreciated. Replicated is more transparent in this regard. :-) Q2 still stands. Looking forward to hearing from you.
I love Teleport but don't use k8s anywhere.
What is the workflow for baking a machine? Are you using packer under the covers or some other tooling? What on-prem machine image formats are supported?
For baking the machine, we use Ansible under the covers and have a set of Lisp scripts to manage everything. As for image formats: qcow2, raw, vhd (and vmdk in the .ova file).
Not trying to hijack Gravitational's thread, please contact me (email in profile, or 'aw-' on FreeNode) if you want to discuss more.
We use Mesos rather than Kubernetes, but the point is that a central infrastructure team (maybe 15 people) manages the cluster itself. It provides service owners (several thousand people) with a small and straightforward abstraction to provision, upgrade, and decommission their particular apps.
If each service team had to deal with its own cluster scheduling, these systems would usually be massive timewasters relative to the old ways (Puppet, shell scripts, etc).
Yes it is and yes, it does.
Kubernetes has a fairly soft tenancy model, so in practice, multi-cluster is becoming a big deal, despite the cost in utilisation efficiency.
Various attempts are being made to recover the utilisation efficiencies, such as Virtual Kubelet or Project Pacific. I think that the original sin of Kubernetes was the lack of a firm, first-class, top-down, mandatory access control model of tenancy.
Disclosure: I work for VMware, which is responsible for Project Pacific.
Gravity enables a somewhat unique scenario: companies taking their complex microservices stack and delivering it as a software application installer. From the end users' perspective, they don't even know it's a k8s cluster; they consume an application that consists of multiple components.
However, Gravity also supports the scenario where multiple applications are installed into the cluster.
It's something we support right now, but we're still polishing the UX; the idea is to add support for Helm 3.0 in the future, in addition to the Helm 2.0 support we already have.
    tele build helm-package -> tarball
    gravity app install tarball
May have to check this out again; hopefully the quick-start experience has improved.
Sorry if this wasn't clear, but Helm actually isn't required at all; it's just that a majority of our examples are written to use Helm due to Helm's popularity. The installation hooks really just boil down to Kubernetes Jobs, so anything that can be represented as a Kubernetes Job can be used for any of the hooks. This can be a simple script, a helm command, or a complicated custom-built application.
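As a sketch of how bare that can be (this is just a plain Kubernetes Job, not Gravity's exact manifest schema around it; the name and image are hypothetical):

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: install-hook                   # hypothetical name
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: install
            image: example/installer:1.0   # hypothetical image; could run a script, a helm command, etc.
            command: ["/bin/sh", "-c", "./install.sh"]

Anything you can express in that shape works as a hook.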
The only feature really tied specifically to helm is the catalog feature, which is for building additional applications to be installed on top of an existing gravity cluster. That feature was built around a helm chart as a building block.
The CLI should only need to invoke a browser when doing third-party authentication flows, i.e. to use GitHub for login. Using the 'gravity users invite' command will also generate a link to send to the enrolling user, so they can set their own password, set up 2FA, etc. through the web interface.
We've also been trying to use our community site https://community.gravitational.com/ as a resource for being able to search and ask questions.
Disclaimer: I'm a developer on gravity.
Then there was YARN, then Tez, Spark, Flink, and Drill, and various other projects that added to the hype (Aerospike, RamSQL, Kafka + Storm).
And every new system had to be built as if it would be web scale from day one. Instagram was acquired in 2012, just 18 months after launch, and everyone knew that meant every new even barely "social" thing would blow up even faster than that. So you absolutely had to plan ahead: scale, scale, scale.
Compared to that, people seem to be a bit more wary of k8s, especially because it's targeted at ops folks, who are naturally predisposed to oppose changes they don't understand.
But that's just my - probably ridiculously non-representative - take on this :)