If you run OS X, TheNewNormal's solo and cluster Kube setup is a breeze, and it includes Deis Workflow as a one-command install, so you get full local parity with deployment.
When I started with straight Docker, figuring out the transition to the real world was really daunting. Getting into a real production environment that I could tear down and rebuild quickly as I learned the paradigms was invaluable. Articles like this seem to buzzword their way into making the path muddier than it actually is.
Another big help for me: any time I needed to set something up, the first thing I'd do was look at how the bleeding edge was being done with Alpine. In practice you might not always run that light, but that community has all the answers.
I evaluated Flynn, Deis and Dokku and wasn't entirely satisfied with any of them, opting for Heroku instead.
I've promised people a blog post on this, and I will write it once I get time to evaluate RancherOS too.
I also started with Deis and moved on to k8s later on.
I'm curious how those relate to each other. Are you using SaltStack to bootstrap Kubernetes?
End result will probably look something like Terraform -> SaltStack -> Kubernetes.
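A hedged sketch of what that pipeline might look like from the operator's seat (tool invocations only; the Terraform config and the `kubernetes` Salt state name are illustrative, and state files, pillar data, etc. are assumed to exist):

```shell
# Provision the infrastructure: VMs, networks, load balancers
terraform apply

# Configure the machines and install the Kubernetes components
# ("kubernetes" is a hypothetical Salt state)
salt-ssh '*' state.apply kubernetes

# Sanity-check that the cluster came up
kubectl get nodes
```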
This is needed because K8s is not mature enough to host all the things. In particular, stateful sets are still in beta. Not sure I would trust K8s to run our production databases even with stateful sets. We've had K8s kill pods for unknown reasons, for example, and the volume handling has also historically been a bit flaky. Fine for completely redundant, stateless containers; less fine for stateful ones.
Before GCP, we set up a staging cluster on AWS. It was okay. The biggest pain point is that AWS's VPC does not match Kubernetes' requirement for per-pod IPs, so anyone who installs on AWS ends up setting up their own overlay network (such as Flannel or Calico). That's a big downside, because VPC isn't fun to deal with.
You don't really notice how antiquated AWS is until you move to GCP. Everything — networking, UI, CLI tools, etc. — feels more modern, sleeker and less creaky.
One area where GCP is particularly superior is networking. Google gives you a Layer 3 SDN (virtual network) that's more flexible than the rigid subnet-carving you need to do with AWS VPC. The tag-based firewall rules are also a breath of fresh air after AWS's weird, aging "security group" model.
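For example (network and tag names are made up), a tag-based rule looks roughly like this; any instance carrying the `web` tag picks it up, with no subnet carving required:

```shell
gcloud compute firewall-rules create allow-web \
  --network my-net \
  --allow tcp:80,tcp:443 \
  --target-tags web \
  --source-ranges 0.0.0.0/0
```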
It's not all innovation, of course. Some services are carbon-copy clones of AWS counterparts: Pub/Sub is essentially SQS, Cloud Storage is S3, and so on, with only minor improvements along the way. Cloud Storage, for example, doesn't fix S3's lack of queryability. I've also been distinctly unimpressed with the "StackDriver"-branded suite of services, which do things like logging and metrics. I don't know why anyone in this day and age would bother making something that doesn't compare favourably to Prometheus.
I should add that the security situation on GKE could be better:
* GKE's Docker containers run in privileged mode.
* There's still no role-based authentication.
* Containers end up getting access to a lot of privileged APIs because they inherit the same IAM role as the machine.
* You can't disable the automatic K8s service account mount.
Another scary thing, unrelated to GKE, is that the VMs run a daemon that automatically creates users with SSH keys. As a team member, you can SSH into any box with any user name. Not sure if real security weakness, but I don't like it.
I love the CLI tools ("gcloud" and so on), much nicer than awscli etc, and hopefully much friendlier to junior devs.
You can disable the mount by mounting an "emptyDir" on top, but that makes K8s think the pod has local data, which causes the autoscaler to refuse to tear down the node. Fortunately there's an option coming to disable the service account mount.
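For reference, the mount-over trick is roughly this (pod, container, and image names are illustrative; only the mount path is the real token location):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-sa-token
spec:
  containers:
  - name: app
    image: myorg/app:latest
    volumeMounts:
    # Shadow the auto-mounted service account token with an empty volume
    - name: no-token
      mountPath: /var/run/secrets/kubernetes.io/serviceaccount
  volumes:
  - name: no-token
    emptyDir: {}
```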
GKE is basically a wizard that just runs a pre-built image for the master and node VMs, and comes with some support for upgrades. There are very few settings aside from the machine type. So it's pretty rudimentary. You'd think that GKE would come with a flashy dashboard for pods, deployments, user management, autoscaling and so on, but you're actually stuck with kubectl + running the Dashboard app as a pod, which, while nice enough, is not integrated into the Google Cloud Platform web UI at all. Kubernetes runs fine, but GKE itself feels unfinished.
Anyway, a lot of people were asking for privileged mode back in 2015, and it kind of looks like they turned it on by default rather than developing a setting for it.
Interesting. So the master and the VMs are deployed in KVM or Docker containers running inside of KVMs then?
This is just wrong:
> Cloud Storage, for example, doesn't fix S3's lack of queryability.
And this is now fixed in S3.
Is S3's queryability fixed? Are you referring to the Inventory service or Athena? Because the former is just a big CSV file generated daily, and the latter is for querying content. You still can't query the inventory of objects (e.g. find all objects created yesterday whose key contains the letter "x").
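To illustrate: the closest the CLI gets is a client-side JMESPath filter, which still pulls the full listing over the wire rather than querying server-side (bucket name is made up):

```shell
aws s3api list-objects-v2 --bucket my-bucket \
  --query "Contents[?contains(Key, 'x')].Key"
```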
SQS is a "Simple Queue", Pub/Sub is not. There are no topics in SQS, just distinct queues. There is no 1-to-many. There is no push. And so on...
Aside from that, it's not a bad overview. I particularly like this quote:
“Mesos and Kubernetes are largely aimed at solving similar problems of running clustered applications; they have different histories and different approaches to solving the problem. Mesos focuses its energy on very generic scheduling and plugging in multiple different schedulers.” In contrast, says Burns, “Kubernetes was designed from the ground up to be an environment for building distributed applications from containers. It includes primitives for replication and service discovery as core primitives, whereas such things are added via frameworks in Mesos. … Swarm is an effort by Docker to extend the existing Docker API to make a cluster of machines look like a single Docker API.”
Mesos is useful when you have requirements that don't fit k8s' rather opinionated world view. If you can use k8s, it's great, but it's also changing rapidly, so be aware you're on the bleeding edge. Docker Swarm is Docker finding a business model. (I'm biased here.)
Conversely... I'm deeply suspicious of excited discussions of distributed systems these days, because it doesn't seem like a lot of people are truly running them "at scale". There aren't many write-ups about how people handle 300+ node clusters, and, probably more importantly, how their internal users actually use them.
This isn't really true any more; it's a reference to the original version of Swarm. The version of Swarm that came with Docker 1.12 is completely rethought and not constrained in the same way.
- it is impossible to trigger a rescheduling / rebalancing. When a new node comes in (via an AutoScaling policy or whatever), Kubernetes doesn't do anything, so the new nodes can sit there doing nothing.
- once a pod has been scheduled onto a node, it is never rescheduled anywhere else. The node may be experiencing problems that affect the pod, and k8s does nothing to heal that; it could simply delete the pod so it gets rescheduled somewhere else.
- docker itself constantly ships with a lot of bugs. To this day (we are on Docker 1.12.6), we regularly have problems with the Docker daemon hanging or becoming unresponsive. I'm not sure if k8s can do much about this, but I feel like it should, since we don't directly control Docker.
- it doesn't integrate tightly with the OS / cloud provider. For example, it could perform health checks and decide whether a node should be restarted, terminated, or left idle.
All of our services are stateless, so it would be nice to have all of the above as options, especially since k8s started out as the solution for stateless apps.
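Until any of that lands, the workaround is manual: cordon the sick node and delete its pods so the controller recreates them on a healthy node (node and pod names below are placeholders):

```shell
kubectl cordon node-3             # stop new pods landing on the node
kubectl delete pod my-app-x1c2v   # the deployment / replication controller
                                  # recreates it elsewhere
kubectl uncordon node-3           # once the node recovers
```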
Docker is always packed full of bugs; we had to increase redundancy everywhere in our stack. It has consistently been the reason I get paged at night, and by now it's the first thing I look at when there is a new issue. Compare that with components you'd expect to fail more easily: our etcd and Consul clusters have been chugging along for more than a year, even though they solve a much harder problem than Docker does. Docker has been "production-ready" for years now, but I would not recommend it. The developers always fix that critical, production-affecting bug in the next release, but it's been disappointing for some time. I look forward to rkt + Kubernetes getting more mature.
> - once a pod has been scheduled onto a node, it never reschedules it anywhere else. The node may be experiencing problems, thus the pod is affected. k8s doesn't do anything to heal that -- it could simply delete the pod so it is rescheduled somewhere else.
IIRC, if a node goes from NodeReady to NodeNotReady, pods are drained from it.
I'm really looking forward to rkt so that we can finally have a solid alternative to docker.
To be fair, sometimes these problems are due to the kernel. Specifically, the infamous unregister_netdevice ref count issue (https://github.com/docker/docker/issues/5618) has been around for years. One of the comments from a kubernetes dev says they're bypassing the cause and don't see it in GKE production.
Can you elaborate on this? When you use replication controllers or deployments, don't they drive the state to the desired/goal state, which is N replicas of a pod? So when a node shuts down, I'd guess it should reschedule those dead pods somewhere else to satisfy the goal state?
One common issue we have is a pod getting stuck in a restart loop (for whatever reason, including resource starvation). k8s just keeps restarting it on that node for days, instead of simply rescheduling it after X restarts or some other condition.
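Today that means a manual sweep, something like the following (pod and namespace names are placeholders): find the pods stuck in CrashLoopBackOff and delete them so they get scheduled fresh.

```shell
# Spot the offenders
kubectl get pods --all-namespaces | grep CrashLoopBackOff

# Then, for each one, delete it so the controller recreates it
# (possibly on a different node)
kubectl delete pod my-app-x1c2v --namespace my-ns
```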
Are you saying this doesn't work?
Granted we were using mesos not k8s, but I suspect a similar approach could work here too.
Can anybody who has made the switch give me a convincing argument for why I should switch (or not)?
My feeling is that Docker is great if you are VP of Engineering at Netflix, but probably not the best thing if you are starting a startup and just need to get things done.
Disclaimer: I'm not religious about this and I'm totally open to being convinced that I'm wrong.
I successfully run lots of Docker microservices in production, and I strongly advise you to keep your app on Heroku as long as you can. :-)
Microservices make sense in two circumstances:
1. You have multiple teams of developers, and you want them to have different release cycles and loosely-coupled APIs. In this case, you can let each team have a microservice.
2. There's a module in your app which is self-contained and naturally isolated, with a stable API. You could always just make this a separate Heroku app with its own REST API.
But in general, microservices add complexity and make it harder to do certain kinds of refactorings. You can make them work, if you know what you're doing. For example, ECS + RDS + ALBs is halfway civilized, especially if you manage the configuration with Terraform, and set up a CI server to build Docker images and run tests. But it's still a lot more complex than a single, well-refactored app on Heroku.
The next thing I realise is that I'm very happy dealing with infrastructure now in a way I wasn't before. I've looked through the Dockerfiles and know what they install and why, and if anything goes wrong it gives me an immediate go-to: add more servers, or rebuild the environment and switch over (should there be more users or more load than expected). Docker will buy you time here.
Docker removes the temptation to start editing stuff on servers in the event of issues.
In terms of doing a startup, I think other people here are better placed to advise; if you're at the MVP stage, no, but for anything bigger than that I think it'll pay off.
Managing secrets is still an absolute pain though...
Seriously though, knowing Docker really well is more likely to improve my career, and having the ability to remove the devops issues associated with setting up dev environments is awesome. My Mac broke this week and I was able to switch to a different machine in 30 minutes because of it.
Does Ansible provide isolation of different dev environments? I think not.
Unfortunately true -- for now. However, your career would be even better served by gaining experience in non-fad technologies.
Also, "career development" is an offensive reason to deploy a technology for your employer. I recognize that it is common, but it's still improper to prioritize resume points over the employer's long-term stability and interests. Personally, when a candidate gives off that vibe to me, I pass on them every time.
>Does Ansible provide isolation of different dev environments? I think not.
I don't understand. Anything you can script in Docker, you can script in Ansible. They both allow the user to pass in arbitrary shell commands and execute anything they want on the target. How does this not accommodate "isolation of different dev environments"?
Maybe you mean that since you can execute a Docker container on your Mac, you don't need to set up a "local" env? Docker transparently uses a virtual machine to run a Linux kernel on the Mac. You can execute an Ansible script against a normal VM the same way, optionally using something like Vagrant to get more simplistic, Docker-like (which is really Vagrant-like) CLI management.
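That workflow is roughly the following (box name, inventory, and playbook name are illustrative; Vagrant also has a built-in Ansible provisioner that folds the last step into `vagrant up`):

```shell
vagrant init ubuntu/xenial64                 # write a minimal Vagrantfile
vagrant up                                   # boot the local VM
ansible-playbook -i inventory dev-env.yml    # provision it like any other host
```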
When you start to deploy many applications, and they need to talk to each other, and you need service discovery, automatic restart, rolling upgrades, horizontal scaling, etc - then Kubernetes brings a lot of value.
Of course, if you've just started, and are getting an MVP out the door, don't worry about docker just yet. And also don't listen to the microservices people. It'll be like putting the cart before the horse.
PWS is based on Cloud Foundry, which allows routing by path. So as an intermediate step towards decomposing your app into services, you can deploy different copies and have them respond to particular routes.
Cloud Foundry uses Heroku's buildpack code, with very minor changes, with additional testing. Your app will stage and run identically.
If you decide to switch to docker images, Cloud Foundry can run those too.
Cloud Foundry is a complete platform, rather than a collection of components. Test-driven, pair programmed, all that jazz. More mature and production-tested than any alternative that I'm aware of.
I think it's awesome, but I'm biased. Feel free to email me.
"The grass is always greener on the other side"
There are plenty of upsides to the distributed approach. But there are downsides to distributed too which don't get discussed as much. Things like communication between nodes, fault tolerance, monitoring and handling failure. Same case with having many microservices. Also this stuff becomes time consuming if you are a solo dev / small dev team.
IMO one approach isn't better than the other for all cases. Maybe I'm a bit of a laggard here, but I still like Heroku and believe in just doing enough infrastructure to support where your app is / is going in the near future.
Both Docker and k8s change quickly and both lack functionality that most people would consider pretty basic. Google may have transcended into a plane where persistent storage and individually-addressable servers are a thing of the past, but the rest of us haven't. Many things that an admin takes for granted on a normal setup are missing, difficult, convoluted, or flat out impossible on k8s/Docker.
We're converting our 100+ "traditional" cloud servers into a Docker/k8s cluster now, and it's a nightmare. There's really no reason for it. The biggest benefit is a consistent image, but you can get that with much less ridiculous tooling, like Ansible.
My opinion on the long-term: containers will have a permanent role, but I don't think it will be nearly as big as many think. Kubernetes will be refined and become the de-facto "cluster definition language" for deployments and will take a much larger role than containers. It will learn to address all types of networked units (already underway) and cloud interfaces/APIs will likely just be layers on top of it.
The hugely embarrassing bugs and missing features in both k8s and Docker will get fleshed out and fixed over the next 2-3 years, and in 5 years, just as the sheen wears off of this containerland architecture and people start looking for the next premature fad to waste millions of dollars blindly pursuing, it will probably start to be reasonable to run some stable, production-level services in Docker/k8s. ;) It will never be appropriate for everything, despite protests to the contrary.
I think the long-term future for k8s is much brighter than the future for Docker. If Docker can survive under the weight of the VC investments they've taken, they'll probably become a repository management company (and a possible acquisition target for a megacorp that wants to control that) and the docker engine will fall out of use, mostly replaced by a combination of container runtimes: rkt, lxc, probably a forthcoming containerization implementation from Microsoft, and a smattering of smaller ones.
The important thing to remember about Docker and containers is that they're not really new. Containers used to be called jails, zones, etc. They didn't revolutionize infrastructure then and I don't think they will now. The hype is mostly because Docker has hundreds of millions of VC money to burn on looking cool.
If Docker has a killer feature, it's the image registry that makes it easy to "docker pull upstream/image". But the Dockerfile spec itself is too sloppy to really provide the simplicity people think they're getting, the security practices are abysmal (there will be large-scale pwnage because of them sometime in the not-too-distant future), and the engine's many quirks, bugs, and stupid behaviors do no favors to either the runtime or the company.
If Docker can nurse the momentum from the registry, they may have a future, but the user base for Docker is pretty hard to lock in, so I dunno.
tl;dr Don't use either. Learn k8s slowly over the next 2 years as they work out the kinks, since it will play a larger role in the future. In 5 years, you may want to use some of this on an important project, but right now, it's all a joke, and, in general, the companies that are switching now are making a very bad decision.
I've been playing around with the runv project in Docker (definitely not production-ready), running containers in optimized virtual machines... and it just seems like the better model? Which, following the logic through, really means I just want to make my VMs fast with a nice interface for users. And I can: with runv I can spin up KVM fast enough not to notice it.
Basically...I'd really rather just have VMs, and pour effort into optimizing hypervisors (and there's been some good effort along these lines - the DAX patches and memory dedupe with the linux kernel).
1. We could run sanity tests, in production, prior to a release going live. Our deploy scripts would bring up the Docker container and, prior to registering it as live with Consul, make a few different requests to ensure that the app had started up cleanly and was communicating with the database. Only after the new container was up and handling live traffic would the old Docker container be removed from Consul and stopped. This only caught one bad release, but that's a short bit of unpleasantness we spared our customers.
2. Deployments were immutable, which made rollbacks a breeze. Just update Consul to indicate that a previous image ID is the current ID and re-deployment would be triggered automatically. We wrote a script to handle querying our private Docker registry based on a believed-good date and updating Consul with the proper ID. Thankfully, we only had to run the script a couple of times.
3. Deployments were faster and easier to debug. CI was responsible for running all tests, building the Docker image and updating Consul. That's it. Each running instance had an agent that was responsible for monitoring Consul for changes and the deploy process was a single tens-of-megabytes download and an almost-instantaneous Docker start. Our integration testing environment could also pull changes and deploy in parallel.
4. Setting up a separate testing environment was trivial. We'd just re-run our Terraform scripts to point to a different instance of Consul, and it would provision everything just as it would in production, albeit sized (number/size of instances) according to parameterized Terraform values.
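The rollback in point 2 boiled down to something like this sketch (the KV key layout and image tag are made up; in our setup each instance's agent watched the key and redeployed on change):

```shell
# Point the "current image" key back at a known-good build;
# the per-instance agents notice the change and redeploy
consul kv put deploy/myapp/image-id registry.example.com/myapp:known-good
```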
Docker also made a lot of things easier on the development side. We made a development version of our database into a Docker image, so the instructions for setting up a completely isolated, offline-capable dev environment were literally: install Docker and Fig (this was before Docker had equivalent functionality), clone the repo, and tell Fig to start (then wait for Docker to pull several gigs of images, but that was a one-time cost).
As I see it, the main thing that Kubernetes and the rest of the container managers will buy you is better utilization of your hardware/instances. We had to provision instances that were dedicated to a specific function (i.e. web tier) or make the decision to re-use instances for two different purposes. But the mapping between docker container and instance/auto-scaling group was static. Container managers can dynamically shift workloads to ensure that as much of your compute capacity is used as possible. It was something we considered, but decided our AWS spend wasn't large enough to justify the dev time to replace our existing setup.
Having not used Heroku, I can't say how much of this their tooling gives you, but I think it comes down to running your own math around your size and priorities to say whether Docker and/or any of the higher-level abstractions are worth it for your individual situation. Containers are established enough to make a pretty good estimate for how long it will take you to come up to speed on the technologies and design/implement your solution. For reference, it took 1 developer about a week for us to work up our Docker/Consul/Terraform solution. If you look at the problems you're solving, you should be able to make a pretty good swag at how much those problems are costing you (kinda the way that we did when we decided that Kubernetes wouldn't save us enough AWS spend to justify the dev time to modify our setup). Then compare that to the value of items on your roadmap and do whatever has the highest value. There's no universally correct answer.
I'm always leery of statements like that. My experience with cluster managers (YARN, Mesos) and distributed systems in general is that they are almost never "a snap" to run once you move past trivial workloads and requirements.
Also, since I wrote that, k8s has done a lot of work to simplify installation with kubeadm and other improvements.
For Mesos, if anyone is interested and you are an Ansible shop, you can look at Ansible Shipyard to simplify an install.
Kubeadm looks interesting. Thanks.
A key part of Kubernetes is bringing the state of the system to that of the spec, through a series of watches, and in some places polling (config map watching in kubectl as of 1.4.x). This may look like magic, but it's not fixing problems per se; it's that your desired state of the system (the spec) is different from what is observed (the status). This is not a disadvantage. Kubernetes is not for someone sitting at the terminal expecting a bell or a stop-the-world when problems happen, although I guess you could configure it that way and fight against the system.
Mesosphere looks like it works similar to Nanobox (http://nanobox.io)
If you are searching for a product in this space, then K8s is the way to go. It has a huge community.
If, on the other hand, you are looking for a small, non-monolithic, resource scheduler + some basic frameworks (i.e. Apache Mesos + Marathon) then DC/OS is not that.
But since Mesosphere is no longer supporting work on Marathon outside of DC/OS, or indeed any framework outside of DC/OS, Mesos/Marathon is effectively dead: while Mesos is open source, the only major supporter of frameworks for it is Mesosphere, and you won't get them unless you use DC/OS.
For some relatively small projects, k8s is sufficient. But pretty quickly you end up needing more and more functionality and you end up in a larger stack like OpenShift.
That's the place that Mesos and DC/OS (IMHO) shine: when you're working with large clusters (1000+) running very different workloads. Because in that scenario, k8s is still pretty immature and you're going to inevitably need to solve all the other problems that OpenShift, DC/OS, Rancher or other stacks solve. Just like Linux distros, in theory, anyone can do it. In practice, it's a pain and you want to pick up a standard distro. That's what these stacks provide: a pre-configured suite of open source tools that get you way more than k8s or Mesos or Marathon on their own provide.
And again, if you are going to be running DC/OS at that scale, then you are going to be running the enterprise version, because you aren't going to be running that much magical shit on million dollar hardware without someone being paid to troubleshoot.
That might be me in a few years, except that Mesosphere seems to be attempting to kill me in the short term by killing non-DC/OS mesos/marathon while crippling non-enterprise DC/OS. So looks like we'll be migrating to K8s and hoping that in a few years K8s scales. The "K8s doesn't scale" argument is losing ground with every new release btw.
Of course you are right that in DC/OS there is a lot more besides Mesos that could fail at scale. That is why we are carefully scale-testing DC/OS in its entirety -- under lab conditions (internally) as well as under real-world conditions (large production environments). We will be more transparent about this in the future, but one example I can give is that we regularly run DC/OS on 10^3 nodes for testing purposes.
> if you are going to be running DC/OS at that scale, then you are going to be running the enterprise version
I am not convinced by that argument. Still, just to clarify, Mesosphere Enterprise DC/OS does not have better scaling characteristics than (open) DC/OS. We are trying to land all corresponding goodness in DC/OS.
> killing non-DC/OS mesos/marathon
I understand that brutal decisions have been made. But if you look at Mesos and Marathon as of today (btw., there is no such thing as "non-DC/OS mesos/marathon"), progress is being made at an impressive rate. I observe more "animating" than "killing".
> while crippling non-enterprise DC/OS
I only see additions, no removals. Also, if you were looking at the core technology stack, the delta between Enterprise DC/OS and DC/OS will probably appear to be surprisingly small to you.
> The "K8s doesn't scale" argument is losing ground with every new release btw.
The "scale argument" is insignificant compared to other aspects in most of the cases anyway. As we all know :-).
Source: I worked on the team, although we had talked about this at mesoscon in the past.
Disclaimer: I work for MS
Disclaimer: I work at Mesosphere.
Marathon isn't the only scheduler - Twitter (and some others) have been using Aurora (http://aurora.apache.org/). Currently we are deployed on Mesos/Marathon, and originally I thought Aurora may become more popular because it was Apache "blessed".
(I am a Mesosphere employee.)
I'm afraid I don't know what the plans are for the project (not working on Marathon myself) but being the most popular open source project run and maintained by Mesosphere, I have no doubt that we will continue adding new features and functionality.
There are three ways to install it:
1) Cloud provider specific templates (e.g. CloudFormation / Azure Resource Manager; GCP support is on the backlog)
2) The ssh installer (this has a UI and a CLI). I believe this doesn't work well for larger installations because of the one-to-many issue.
3) The advanced install method. This generates a binary that is copied to each node. You can integrate this with Puppet, Chef, Ansible and so on.
https://dcos.io/install/ and https://dcos.io/docs/ has more details on this.
>Dear Marathon Community,
>Because of our focus on integrating the experience of using Marathon and DC/OS, we aren’t planning on updating the old UI further.
And by "old UI" they mean the one that works without DC/OS.
So there is no Apache Mesos + Marathon. There's Mesos. With no actively developed frameworks (Chronos is dead too). And then there's DC/OS.
Marathon is still being developed as an open source Mesos framework. The _UI_ is going to stall out a bit, but it's just the UI.
Chronos is still being developed, but Mesosphere is taking Metronome and folding it into Marathon, making it more like Aurora. Moreover there's Singularity, PaaSTA, and a whole lot more:
Yes, DC/OS is intended to be a (mostly) full stack, but even then, it's still just Mesos under the hood and you can run any Mesos framework on it. The open source edition of DC/OS is fairly full featured, while the enterprise version gives you better account security, networking and integrated secrets (vault) support.
Jaaron: "The _UI_ is going to stall out a bit, but it's just the UI."
My options are:
a) running latest marathon backend without the user interface
b) running latest marathon backend and clone/maintain the user interface
c) migrate the whole thing to enterprise DC/OS (since I need authentication)
d) migrate to K8s
I view the lack of option "(e) keep using marathon-the-product as it is now", as a breach of trust, so (c) is off the table (whereas, before, it was my expected endpoint). YMMV.
We have to support the company first, which has a more integrated solution that actually has to make money at the end of the day. We are also a pretty damn small team with a huge backlog to deliver, so it sucks that we had to abandon the UI outside of DC/OS. We hope that the components of the DC/OS UI for Marathon can become the native UI for Marathon, but again, it's a balance of priorities.
Marathon by itself has a lot more coming in the future; some of it will be restricted to DC/OS, but not everything. It's a balancing act. Given our history of changing course publicly (I wasn't involved in those decisions), I'm waiting to share our plans for 1.5 until I'm confident we're committing to them.
Just a quick two cents.
I wish you had forked it. If you're resource-constrained and need to meet goals for LargeCorp, then fork the entire front-end/back-end. Ditching the UI half of the app while iterating on the same backend, but with the DC/OS UI, is really what's causing the grief here.
When I started more than 5 years ago, few of the options mentioned in the article were available, so I just used Xen running on bare metal. I just wanted to get things done as opposed to forever experimenting with infrastructure. Not to mention the load was nonexistent, so everything was easy to manage.
After I launched the mobile app, I decided to spend some time on the infrastructure. Two years ago I experimented with Marathon, which was originally developed at Twitter, probably by a bunch of former Google employees. The reason is that Marathon felt very much like Borg: you could see the jobs you launched in a pretty nice web interface, including their log files, resource utilization and so on. Deploying it on bare-metal machines, however, was an exercise in frustration, since Marathon relied heavily on Apache Mesos. For a former Googler, Mesos had some weird terminology, and it was truly difficult to understand what the heck was going on. Running Marathon on top of it had major challenges: when long-running services failed, you could not figure out why things were not working. So after about 2 weeks I gave up on it and went back to manually managed Xen VMs.
Around the same time I spent a few days with Docker Swarm. If you're familiar with Borg/Kubernetes, Swarm is very different, since, at least when it started, you had to allocate services to machines by hand. I wrote it off quickly, since allocating dockerized services to physical machines was not my idea of cluster management.
Last year I switched to Kubernetes (version 1.2), since it's the closest to what I expect from a cluster management system. The version I've been using in production has a lot of issues: high availability (HA) for its components is almost non-existent. I had to set up Kube in such a way that its components have some semblance of HA. In the default configuration, Kubernetes installs its control components on a single machine; if that machine fails or is rebooted, your entire cluster disappears.
Even with all these issues, Kubernetes solves a lot of problems for you. The flannel networking infrastructure greatly simplifies the deployment of docker containers, since you don't need to worry about routing traffic between your containers.
Even now Kubernetes doesn't do HA:
Don't be fooled by the title of the bug report: the same component used in kubectl to implement HA could be used inside Kube's servers for the same reason. I guess these days the Google engineers working on Kubernetes have no real experience deploying large services on Borg inside Google. Such a pity!
Using DNS might work for the K8s services, but at least in version 1.2, SkyDNS was an add-on to Kubernetes. This should really be part of the deployed K8s services. Hopefully newer versions fixed that, I didn't check.
Preferably, the base K8s services would implement HA natively. Deploying a separate load balancer is just a workaround for the problem.
FYI Google's Borg internal services implement HA natively. Seems to me the Kubernetes team just wanted to build something quick, and never got around to doing the right thing. But I think it's about time they do it.
kube-dns is an application on k8s, so it uses scale-out and k8s services for HA, like applications do. And I agree that it is important, I don't know of any installations that don't include it.
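Concretely, HA for kube-dns is just ordinary replica scaling (namespace and deployment name as in a stock install; a real setup might use an autoscaler instead of a fixed count):

```shell
kubectl --namespace kube-system scale deployment kube-dns --replicas=3
```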
I think the right things have been built. We do need to do a better job documenting this though!