Convergence to Kubernetes (medium.com/pingles)
207 points by kiyanwang on June 24, 2018 | 122 comments


Can anyone who used Docker Swarm/Mesos/Nomad and then switched to Kubernetes comment on anything that was done better by Swarm/Mesos/Nomad?

I invested in Kubernetes early and always meant to give the others a try (so I could at least know the differences), but never got a chance to.


Thoughts on Nomad after admittedly minimal dabbling: I pushed for using Nomad at my job without success. The managerial perception of Kubernetes as "the consensus" is a self-fulfilling prophecy. Nobody wants to pick a technology with a fraction of the buy-in of Kubernetes or be forced into paying boku bucks for an enterprise contract. HashiCorp's insistence on releasing premium closed-source features doesn't work when its biggest competitor is fully free and open.

I found it dead simple to get up and running with Nomad, but it is (perhaps intentionally) missing a lot of features of Kubernetes. For instance, if you want load balancing and auto-scaling, you need to rig it up yourself. In K8s, you set up a service and horizontal pod autoscaling and you're done.
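
Roughly, that looks like this (just a sketch with made-up names, assuming a Deployment labelled app: web already exists):

    apiVersion: v1
    kind: Service
    metadata:
      name: web                    # hypothetical name
    spec:
      selector:
        app: web                   # matches the pods of an existing Deployment
      ports:
      - port: 80
        targetPort: 8080           # traffic is spread across all matching pods
    ---
    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: web
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web
      minReplicas: 2
      maxReplicas: 10
      targetCPUUtilizationPercentage: 70   # scale out when average CPU crosses 70%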


>be forced into paying boku bucks

Boku is Japanese for "I" (cf Watashi).

Beaucoup is French for "a lot".



Having used both DC/OS (mesos & marathon) and kubernetes in production.

The one thing DC/OS had absolutely nailed was the web interface and bootstrap. The DC/OS interface is brilliant as it allows you to explore all the possibilities and actually have an overview of what is happening.

Also, it was a lot easier to reason about because everything is contained in a single Marathon job. No need to split everything up into deployments / services / ingresses. A single JSON file is all you need for DC/OS. Less to think about.

The downside of all this is that DC/OS feels like a solution for a theoretical problem, while kubernetes is the solution to practical problems.


It seems to me that running Kubernetes as a service on DC/OS might be a decent path to take. Mesosphere seem to be pushing this pretty hard, too.

What did you think of DC/OS as a base platform? From an ops perspective, I’m finding a lot to like at least conceptually. Especially the idea of managing one “kind” of cluster that then manages many additional kinds for you. But I have yet to actually use it.


It sounds great, but I've always found the DC/OS marketplace not really flexible enough. I've looked at many services, but in most cases the requirements for running a service from the marketplace were for extremely high throughput situations. I wanted to experiment with ArangoDB but it required a total of 16GB memory (IIRC).

Practically, I think kubernetes with helm charts will get you at least 90% of what DC/OS offers.

Also, running Kubernetes itself is complicated enough, running it on a different scheduler will just expose you to the pains of both schedulers.


Mesos means "middle", and it was designed to handle declarative infrastructure provisioning for different schedulers: container orchestration schedulers like Aurora (Youtube/Twitter), Titus (Netflix), Marathon and Kubernetes (DC/OS), and analytics schedulers like Spark, TensorFlow, etc.

The idea was always that you should manage this infrastructure just like they are managing containers.

Disclosure: I am the PM of Kubernetes on DC/OS.


FWIW, while the resources themselves are still split up, you can combine any number of resource definitions into a single YAML file for Kubernetes. Just separate each with ---
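
A tiny made-up illustration; running kubectl apply -f on the file creates everything in it in one go:

    # app.yaml -- several resources in one file, each separated by ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: example                # hypothetical
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: example-config
      namespace: example
    data:
      LOG_LEVEL: info
    # ...and the Deployment, Service, Ingress etc. can follow in the same file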


Couple things: Mesos had a more sophisticated scheduler with better isolation for disk IO. It could also run arbitrary binaries in cgroups, e.g. a JVM-based application without having to run it in a Docker container.

Uber took advantage of this by running thousands of nodes of Cassandra in DC/OS.


I would love to see a comparison between Docker Swarm and Kubernetes. From talking to peeps I've gathered that Kubernetes is better for a large number of nodes? I self host all my side projects with Docker Swarm and it's been so good I haven't needed to look into other container management solutions (but I've only got eight nodes).


At the moment I'd say that Kubernetes is only worth the effort if you have a bunch of idle capacity in your nodes and some tens of machines at least.

The setup can get a bit complex quite early and won't be worth the effort to manage 8 nodes. When you begin to scale to around 20 nodes running a bunch of different workloads (batch jobs, web services, etc.) and can avoid provisioning on the application side, then k8s begins to shine more and pay back the investment.


Why should node count matter?

Two clicks to get 1-1000 nodes on GKE. The work is to learn the YAML syntax and the way to deploy to GKE... but with most apps you need to learn something about how they will be deployed anyway (be it how to use Ansible to deploy, vs how to set up on k8s, vs how to use serverless). And you need to do that whether you have 1 or 1000 nodes, so you may as well just do it once...


I'd say that if you're on swarm and it's doing what you want, you probably want to stick with it.

Kubernetes is more flexible and more powerful, but it's also a lot more complex. You either go with one of the managed distros, which take on some of that complexity but also reduce your flexibility, or you manage the whole thing yourself.


I believe this does a good job comparing without going too deep, e.g. for people just getting started on Swarm/k8s:

https://platform9.com/blog/kubernetes-docker-swarm-compared/

Edit: updated url to a more recent version of the article


Mesos is a two layer system. That is what is better about it. Unfortunately, the most common second layer for reliable service execution is Marathon, and it caused us no end of trouble.


Spot on. The two layer system is both a blessing and a curse.

Debugging problems can get really hairy because of the two layer split, and it's just not worth it if one is running Marathon as the only framework.

I came to the conclusion that Mesos' biggest strength is giving people the option to write their own framework. It's not difficult, and gives you a lot of power over execution.

Marathon itself appears to be very simple, but has some really weird shortcomings. For instance, it's not possible to submit a deployment and then check whether everything went well, except if you manually compare the deployment before and after (this was still true as of 1.4.x).

Furthermore, Mesosphere abandoned the Marathon GUI, and I've got no clue how the work of transitioning that part into the hands of volunteers is coming along, but this was essentially what tipped the scales for us.

(We're currently migrating from Mesos to K8s.)


While Mesos always supported a number of container orchestration layers (e.g. Netflix's Titus), you are right in that Marathon used to be the only container orchestrator for DC/OS up until 2017. Mesosphere DC/OS has had a Kubernetes package in GA since March and we've had 6 releases since then to make sure it always shipped the newest version of Kubernetes and to add features (Strict Mode).

Disclaimer: I am the PM at Mesosphere.


I did take a look at Nomad in an attempt to convince someone who was "scared" of the complexity Kubernetes would bring to our setup.

Unfortunately nice features like "Resource quotas per node" are not available in the open source version, this made Nomad a no-go.


This is pretty eye-opening... If they intended to charge for extensions with functionality like that, there's no way they can compete with Kubernetes -- it has way more development happening, and features like that are already baked into the platform...

I don't know where nomad could compete to generate income, but it definitely wasn't there...


I haven't used Kubernetes yet since I have a Mesos scheduler that runs heavy workloads on-demand and wants to use all unused resources for a few minutes. I can't see a way to configure a k8s service to use either zero or as many pods as it possibly could, without writing something to watch the cluster's metadata and update the service's config files.


Kubernetes has resource limiting built in, in the form of Resource Quotas[0] and more practically (or more straightforward to use) resources for pods[1]. The system's not perfect, so it's likely possible to overcommit machines, but there are QoS controls for critical workloads.

Kubernetes is built in a way that it's easy for people to write something that watches cluster metadata and performs actions (controller pattern), and as such a lot of functionality is built in gradually over time just like that. I'm not sure when resource management first came on the scene but it's been around for a while.

You can also use Kubernetes to manage resources completely unrelated to Kubernetes by bringing Custom Resource Definitions ("CRDs") into play -- that's when you create a "fake" Kubernetes resource (e.g. VirtualMachine) that manages some resource that's actually on the machine. The combination of CRDs and controllers to manage them is called the "Operator pattern"[2] and it's gaining a lot of hype right now as people wrestle with it conceptually, but it's been around the whole time.

[0]: https://kubernetes.io/docs/concepts/policy/resource-quotas/

[1]: https://kubernetes.io/docs/concepts/configuration/manage-com...

[2]: https://coreos.com/blog/introducing-operators.html
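
To make the requests/limits bit of [1] concrete, a minimal sketch (names and numbers are hypothetical); setting requests equal to limits is what gives a pod the Guaranteed QoS class:

    apiVersion: v1
    kind: Pod
    metadata:
      name: example                # hypothetical
    spec:
      containers:
      - name: app
        image: example/app:1.0     # hypothetical image
        resources:
          requests:                # what the scheduler reserves on a node
            cpu: 250m
            memory: 256Mi
          limits:                  # hard ceiling; exceeding memory gets the container killed
            cpu: "1"
            memory: 512Mi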


Yeah with bare mesos I don't have to do any of that, I just accept resource offers and release them when I'm done. The scheduler decides how much it will scale, not mesos.

We like to create a new framework for each instance of the job (so we can track slow runs, trial-run new versions), so we can have anywhere between 0 and 50 similar frameworks all running and vying for 100% of the cluster.

Naturally, we have cluster-level scaling, so we add nodes until we hit our max spend per hour, and then jobs take a bit longer to complete.


It was kind of hard to understand what you were describing without reading the Mesos Architecture documentation[0], but I think I get it now. When you say "you" just accept resource offers, you mean the frameworks (in mesos terminology) you're deploying correct?

Weirdly enough, Mesos resembles a system I was building in my head that I thought could compete with Kubernetes... Taking the resource supplier (agents in mesos-speak) and consumer (frameworks in mesos-speak) paradigm to the extreme.

[0]: https://mesos.apache.org/documentation/latest/architecture/


I've been running a reasonably large kubernetes cluster (~600 cores) in production for the last 6 months or so. I've had to deal with hotspots in IO, CPU and load (which could be either). Getting requests and limits dialed in has been a bit of a challenge.

I worked with Mesos a bit last year and miss the sophistication of the scheduler. Granted, I never ran it at the same scale, but I felt like the approach was much easier to grok.

I've seen a few projects floating around meant to help dynamically tune resource utilization in k8s. By this time next year, I imagine those will be fairly commonplace. I'm definitely looking forward to using descheduler:

https://github.com/kubernetes-incubator/descheduler


Essentially, yeah. You don't really deploy a framework, you just start its scheduler. I recommend the technical paper, it's clear and short[0].

0: https://people.eecs.berkeley.edu/~alig/papers/mesos.pdf


We are currently migrating from Mesos/Marathon to Kubernetes. One of the things I miss the most is the auto-refresh in the web interface.

I know that's a minor issue, but after getting used to it, it's really weird having to do manual refreshes.

Also, the Kubernetes UI feels too cluttered, but that's probably just because I'm not used to it.


WeaveCloud makes it a whole lot simpler to deploy, monitor, and manage your K8s cluster. http://cloud.weave.works. Give it a shot. Easy to connect to your K8s cluster and get going.

Disclosure: I'm Director of Product for Weaveworks.


Try OpenShift's UI. It has auto-refresh as well as very good UX design.


The Kubernetes UI is deprecated anyway.


I think he meant Kubernetes Dashboard.


I think Nomad is a good solution when you're unable to containerize your application. I used it a bit at my last job in a prototype we were building to manage Windows services, and it seemed to work pretty well.


Swarm:

- works out of the box

- has much less overhead in getting something started

- tracks dependencies between services


Dumb question from someone who doesn't use Kubernetes (or Docker) in production: don't routine security updates mean you're constantly rebuilding and redeploying these images? And if so, how is that more efficient than just using Puppet / Chef / Ansible and a 'real' server?


I think one of the powerful things that something like containers offer is a happy path that's also the easy path, and it meshes well with the principles you need for a scalable or micro-service infrastructure. With "real" servers you can build infrastructure that operates and scales in similar ways but the happy path isn't as clear or enforced. Containers make you think about persistence and state as being distinct from your compute at/near the beginning. You're punished for not doing so fairly early, even on your local machine. Your AWS instance can go away, but months down the line you're not wondering what you did wrong, you're wondering what Amazon or your ops team did wrong. So the happy path that was easier to deviate from before becomes much harder to deviate from with containers.

It's also easier and faster to update a container image than servers. With something like Kubernetes you can even do it in stages. Yes, you still have to keep the underlying servers updated but they just need to run containers. No testing dependencies and prerequisites. And that decoupling of the app runtime makes updating the servers themselves easier. You can create a new machine image and replace outdated ones trivially.


> don't routine security updates mean you're constantly rebuilding and redeploying these images?

It means you should be constantly rebuilding and redeploying these images.

Very few of the people I've seen use Docker actually do that.

The answer I've heard most commonly so far is "uh hmm ... right, given that I see new CVEs fixed every day, scrolling by in the `apt-get dist-upgrade` I do daily on my desktop, we should probably be doing that for our Docker as well ..."

Many then refer to per-push CI/CD, not realising that holes need to be fixed regularly, not just whenever you happen to push.

Fewer still have automation set up that rebuilds and redeploys when the upstream Docker image changes.

Plain Debian/Ubuntu servers have the benefit of `unattended-upgrades`, but with Docker you have to take on that task yourself and build automation for it.

Also have a look at https://hub.docker.com/r/library/ubuntu/tags/

At the time of writing, every single image is labelled with "This image has vulnerabilities".


You probably shouldn’t be using Ubuntu base images anyways. The closer you can get to “scratch” the better, and the fewer security related issues you’ll have. For most use cases I think Alpine is a much better base image.


That's more about trust and faith in the maintainers of the distribution (that they won't screw up).


It's about having the minimal number of system packages and libraries that your app actually needs.

Alpine is leaner so attack surface is thinner.


This. If you start adding stuff to "FROM scratch", you are creating your own obscure Linux distro. When you screw up, there are no other customers to report bugs to you, much less other maintainers to help.


You really shouldn't be adding stuff to "FROM scratch" unless you have one statically linked binary and maybe some config files or something like that. If you actually need packages, you should use something like Alpine if possible.


> At the time of writing, every single image is labelled with "This image has vulnerabilities".

The majority of those are not applicable in any way to your average container. I would highly encourage you to not attempt to make points off such bad data. See below for why I think it's bad data.

I'd also like to point out that those CVEs are even more pointless to make such a point with, since almost all of them aren't fixed in upstream Ubuntu... which is to say an Ubuntu server with 'unattended upgrades' would be just as 'vulnerable' as these docker containers, except more so because more of them would actually be relevant in such an environment.

Your overall point is valid, but your reference to those 'vulnerabilities' is egregiously misleading.

Let me analyse as a human all the so-called "critical vulnerabilities" listed there for Xenial:

1. glibc 2.23-0ubuntu10 - CVE-2018-6485

This is exploited by C code which intentionally calls posix_memalign or aligned_alloc (instead of malloc) with unusually large arguments. There are very few codebases out there which make use of that in the first place.

If your container is not running untrusted code which links against libc, you have little to fear. This CVE will not impact the average container running some ruby or nodejs application.

Ubuntu also offers no update yet, so it's not actionable.

2. ncurses 6.0+20160213-1ubuntu1 - CVE-2017-10684, CVE-2017-10685

Only affects you if you're piping un-sanitized user input into an ncurses application which then displays it. I doubt there are many, if any, server applications that do this.

Also not actionable.

3. shadow 4.2-3.1ubuntu5.3 - CVE-2017-12424

I'm sure there are plenty of containers out there shelling out to "newusers" with totally unvalidated input. Highly critical I'm sure.

4. cryptsetup 1.6.6-5ubuntu2.1 - CVE-2016-4484

This one only impacts the initrd of luks encrypted setups.... literally impossible to accomplish in a container. Completely garbage listing, no value, cannot possibly impact a docker container.

5. systemd 229-4ubuntu21.2 - CVE-2018-6954

This requires 'systemd-tmpfiles' to run after an attacker has manipulated the filesystem. 'systemd-tmpfiles' is not run in docker containers, and even if it did, since it starts from a fresh root filesystem each time it's a moot point since any changes the attacker makes won't persist to the next time the container "reboots" and tmpfiles runs (assuming the rare case it's run for some reason as part of the container's boot up).

Another non-applicable one.

6. util-linux 2.27.1-6ubuntu3.4 - CVE-2018-7738

This one's a bug in bash-completions for umount, which aren't installed in the container by default. This one's a false positive because the vulnerable code (the bash-completions script) is not present in the docker image.


You are right in all you're saying here.

I was not suggesting that `docker build` executed at a given time point is less secure than `unattended-upgrades`. My point in referencing the vulnerabilities was to simply show that there is a constant stream of vulnerabilities that you need to keep patching, and that picking a new base image "every now and then" isn't enough. `unattended-upgrades` just makes it trivial to automate following this constant stream of updates, while with Docker you have to manage that yourself.

Yes, most CVEs don't affect your use case and operations, independent of via Docker or full OSs. But every now and then there is a severe CVE in that stream that affects you. You don't know when it's coming.

There are two ways to be safe: Automatic upgrades, or reading through / subscribing to the CVE stream and analysing everything that passes by (as you demonstrated here; that takes real effort and you need to be awake when it happens). Most people don't do the latter.


who has an app that they don’t deploy daily?


Some of us have contractual obligations not to deploy during the week, which require us to notify and receive permission from our customers for out-of-band deployments.


Many - and our deploys on weekends are almost zero.


I highly doubt that anyone would deploy everything daily. Your own code, maybe. But what about dependencies? Do you upgrade your Postgres container daily?


Full-disclosure: I’m a Consulting Architect for Red Hat focused on OpenShift.

No.

In Kubernetes and OpenShift you can control whether builds and deployments are automatic or manual, and which events trigger them [0][1]. Combined with the fact that each application's config is an independent object, this allows an admin to host hundreds of apps on a single cluster node. Usually you're following IaC practice and storing each app's config in a git repo or next to the app in git.

[0] https://docs.openshift.com/container-platform/3.9/dev_guide/...

[1] https://docs.openshift.com/container-platform/3.9/dev_guide/...


OpenShift has an answer to what GP was asking, but build triggers are not it.

Security vulnerabilities in images are the concern. OpenShift handles this through ImageStreams - parent images are tracked in the integrated registry and when one is updated, all dependent images are updated.

A good example: the dotnet image in our OpenShift cluster was updated late last week - all of our .NET Core projects were automatically rebuilt with the latest image with no intervention from the developers. It doesn't handle your application INSIDE the container, or if you build the image yourself via Dockerfile, so you'll still need some dependency scanning tools and/or release notifications to keep your own stuff up-to-date and secure.

To be honest ImageStream is one of the best value-add features OpenShift has over vanilla Kubernetes, I don't have to worry if developers are keeping their images up-to-date when some vulnerability in a CentOS package or application runtime gets patched.
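
For anyone who hasn't seen one, roughly what an ImageChange-triggered build looks like -- a sketch with hypothetical repo and image names, not the exact setup described above:

    apiVersion: build.openshift.io/v1
    kind: BuildConfig
    metadata:
      name: my-dotnet-app                              # hypothetical
    spec:
      source:
        type: Git
        git:
          uri: https://git.example.com/my-dotnet-app.git   # hypothetical repo
      strategy:
        type: Source
        sourceStrategy:
          from:
            kind: ImageStreamTag
            name: dotnet:latest                        # the tracked parent image
      output:
        to:
          kind: ImageStreamTag
          name: my-dotnet-app:latest
      triggers:
      - type: ImageChange                              # rebuild when dotnet:latest is updated
      - type: ConfigChange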


Kubernetes itself is not a PaaS though. It won't do all of that for you, which has the advantage of being more flexible but the disadvantage that nothing happens if you don't code it.


You can automate the build process and the deploy process. Given that you can do that with any orchestration software it gives you the ability to roll back changes in your environment.

Is it more efficient than a real server? I guess that's only the case if you automate almost everything and use CI and CD.

It sure shows a different viewpoint on servers, focusing more on services or containers than on servers.

Comparable with Functional Programming and Object-Oriented Programming: while you can achieve the same functionality with both, it gives you another mindset to solve problems.


I don't think people primarily choose Docker because it's more efficient(?) to deploy than a real server. The benefits of Docker, to me, are having a single artifact and reproducible builds where a developer can run the docker image locally, it then gets built once on Jenkins, and the same image is deployed to staging, production, etc. It eliminates an entire class of problems related to the operating system, installed dependencies, what language the application was written in, etc. That being said, those are also benefits when deploying, because everything is homogeneous at the deployment layer.

As far as constantly rebuilding and deploying things, we would be doing that anyway. We do dozens of deploys a day to push new code, that means building new images every time, if one of those deploys picks up a security fix that was recently merged upstream, great.


> having a single artifact and reproducible builds where a developer can run the docker image locally

This is the key thing here. As a team grows it’s easy to get various kinds of learned helplessness. Docker, for its faults, is mostly simple enough that you can expect/insist that the team use it. Which means fewer kinds of surprises at deployment time.


We enforced that by giving our devs Linux workstations with no sudo rights (ISO 27001 requirement), but they have access to the Docker daemon and MUST install everything they need in Docker images. It's massively increased tool sharing between teams, and forced people to learn Docker. We have some devs starting to use Kubernetes locally using minikube.


Sorry for being off-topic but do you have any links or tips on how to achieve this? My understanding was that adding a user to the 'docker' group gives them 'sudo'-equivalent rights.


You can set it up so that the user doesn't have to type sudo docker. But they still effectively have root access via docker.

I guess it gives some social pressure not to do superuser things?


It looks like a package manager choice. Dpkg/Apt isn't doing by default what they actually need, so they use Docker instead.


Your devs can pass --privileged to docker run, that gives them root.


Depends who's doing the choosing. Where it takes months to provision a 'real' server, Docker saves an awful lot of time and effort.

But most people choose Docker because it's well-known, well supported, and the experience outside corporate environments is a joy.


Docker containers are not reproducible.

If you built the same content on machine A and built the same content on machine B, the hash of the image would differ, and the timestamp probably too.


Yes, but if you build the image and push it to a repository, everybody working on it can pull it and run it.


If you're referring to security updates in the Docker images rather than Kubernetes itself, then not really. The rebuild should be automated for you, so it's as simple as triggering a Jenkins master build.

Kubernetes handles the deployment for you, bringing down the old pods and upping the new ones without any connection loss. It makes it a lot simpler to deploy these updates.
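
For the curious, the zero-connection-loss part comes from the Deployment's rolling update strategy plus readiness probes; a sketch with hypothetical names:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-service                      # hypothetical
    spec:
      replicas: 3
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 0                 # never drop below the desired pod count
          maxSurge: 1                       # bring up one new pod at a time
      selector:
        matchLabels:
          app: my-service
      template:
        metadata:
          labels:
            app: my-service
        spec:
          containers:
          - name: app
            image: registry.example.com/my-service:1.2.3   # hypothetical image
            readinessProbe:                 # old pods are only removed once new ones pass this
              httpGet:
                path: /healthz
                port: 8080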


> don't routine security updates mean you're constantly rebuilding and redeploying these images?

Yes and no; yes, because you should be rebuilding upon upstream changes, but no, because there will be fewer upstream changes. You can reduce the amount of software that is baked into a Docker image, compared to a full traditional distribution, thus reducing your attack surface and reducing the frequency with which the image needs to be rebuilt.

> And if so, how is that more efficient than just using Puppet / Chef / Ansible and a 'real' server?

"Real" servers are always (exception: NixOS) stateful, even if they are only used to host stateless services. Uninstalled packages leave behind old service users, buggy packages depend on specific user/group IDs that conflict in production but not where they were built, conflicting ports that are already bound. Designing for immutable infrastructure makes it easier to reason about state in production and therefore makes production easier to manage.


Basically just less rolling your own solutions. It's more political than technical.

"Devops" for me is just shorthand for we want the sysadmins on our team (or on a platform our team controls) versus a separate sysadmin org we can't control. K8S is one of many things that makes that more achievable.

That, of course, has its obvious benefits and shortfalls.


Yes, but it's less of an issue mainly because there is an orchestration layer over the top of the deployment layer.

Basically there is a thing that spins up machines and connects them to a thing that places programs on them to run. The scheduler then starts your program and maps in storage and outside-world connections (an HTTP reverse proxy, or a bit more advanced, some layer 7 routing [i.e. /v1 goes to program a, /v2 goes to b]).

When it works it means that the developer really doesn't need to think too much about _where_ a program runs.

Basically it's a feature-incomplete mainframe clone.


One point that's not been addressed in your other replies is that Docker images should be a lot smaller than a "real" server so there should be fewer issues to patch.

If you base off a very minimal image (and make use of things like multi-stage builds to remove dev. tools from the image) you can get the package numbers down a lot.

For the ultimate in low dependencies of course you can build off scratch and put a single statically linked binary into the image.


> One point that's not been addressed in your other replies is that Docker images should be a lot smaller than a "real" server so there should be fewer issues to patch.

Could you quantify this?

It's possible to strip down a "real" server much farther than even what distros' "minimal" packages specify, which is why I wonder just how much smaller "a lot smaller" really is.

I commented on a short sub-thread [1] discussing dependency bloat/radii of distros. One commenter quantified it in bytes, though perhaps a more meaningful number in this context would be the number of packages.

[1] https://news.ycombinator.com/item?id=17345982


> don't routine security updates mean you're constantly rebuilding and redeploying these images?

In my experience, no. Dockerfiles tend to obfuscate deeper dependencies. Either there are stale dependencies or uncontrolled versioning. In both cases you're running mystery meat in production.

> And if so, how is that more efficient than just using Puppet / Chef / Ansible and a 'real' server?

The fundamental DNA of configuration managers is to take a closed world - a server - as given and try to make it into the world as desired. But this assumes you can come up with a safe, terminating and not-too-long plan for doing so. Very frequently that is not the case.

What gets done in such cases? The server gets backed up, wiped and rebuilt from a clean state.

And that's more or less what you get from a container-centric substrate over a server-centric substrate. Just as I don't care about how the JVM allocates or collects my objects, I don't care about how a container orchestrator builds my processes.

The idea predates containers, but many ideas predate their economical realisation. I am most familiar with BOSH, which does this at a whole-VM-centric level (and which, like Kubernetes, was inspired by Borg). Or the 12 Factor App, which focused on this idea from a devops perspective. But I would be unsurprised to find carvings from the 4th Dynasty outlining a similar idea.


Docker is a dangerous gamble and you can get more of an automated build system, with less devops effort, from Terraform and Packer. Avoid containers and stick with real servers “baked” by Packer:

http://www.smashcompany.com/technology/docker-is-a-dangerous...


There's no reason you can't build containers in an automated, routine fashion and use those to run your applications and services on. You don't have to run containers like joeblow/randomservice - start with Alpine from the Alpine maintainers (or CentOS or whatever) and write a custom Dockerfile to build your stuff.


Docker aside, your arguments regarding fat binaries are kinda off the mark. Python has been able to package an app and all of its dependencies into a single zip file - which is what a jar file really is - since before Go and Clojure even appeared (see PEP 273). Single-file packaging in JS is also common - very much used for browser deployment.

But "let's change languages because we don't have a reliable way to copy more than one file" sounds a bit insane to me. My solution was Debian packages, not Docker, though.


Why would I want the overhead of a hypervisor when I can just use the process isolation features built into the kernel?

Re: your essay

I couldn't quite tell but is it correct to say that you consider Docker a "dangerous gamble" because Docker Inc. may cease to exist in its current form at some point?


I've not been impressed by docker outside of the local development experience (primarily, creating sandboxes and avoiding installing stuff on the main OS, and not having the latency of starting up a VM).

This discussion is about Kubernetes though. Docker is mostly only incidental to Kubernetes. It's a much better thought through system than anything I've seen come out of the Docker world. Docker images are adequate; Dockerfile as a build mechanism for images is deeply flawed, requiring so many kludges and duct tape to accommodate shared dependencies, rebuild on upstream change, shrinking after building, etc.

Hopefully a better system for building images to run on Kubernetes will crop up.


That was a good read, thank you.


From Saturday's posts: http://catern.com/posts/docker.html An assessment of what Docker and similar actually are, and why they are unneeded, redundant, insecure technology.


"The result was a system composed of many wavefronts of change: some systems were automated with Puppet, some with Terraform, some used ECS and others used straight EC2.

In 2012 we were proud to have an architecture that could evolve so frequently, letting us experiment continually, discovering what worked and doing more of it.

In 2017, however, we finally recognised that things had changed.

AWS is significantly more complex today than when we started using it. It provides an incredible amount of choice and power but not without cost. Any team that interacts with EC2 today must now navigate decisions on VPCs, networking and many, many more."

Of course, this time, it is different.


I do certainly wonder if the ever-increasing levels of complexity in the layers of abstraction will backfire in some way soon.

It seems the trend has accelerated recently.


Luckily there's an easy fix for that: adding more layers of abstraction :D


I'll bet you have an "easy" fix for Social Security, too :)

Joking aside, I certainly understand the benefits of abstraction. As someone always points out in any discussion about ORMs, for example, abstractions are leaky. Whenever one has to learn about the inner workings of what the abstraction is hiding, some of that ease evaporates.


> [...] this time, it is different.

Literally a quote from my managers (not related to k8s or aws alone, though). They also made sure to repeat that phrase a few times so it becomes more believable.


"What's different this time?" is a classic question to ask yourself when evaluating tech choices. Sometimes the answer is that something does make it different this time, it's not just a rhetorical device to dismiss things that have been tried before.

Most things don't take off the first time they're tried. Think of all the current mainstream or trendy stuff, like virtualization, massively parallel coprocessors (now called GPUs), AI, multicore processors, etc. They all tried to come to market decades ago and had just niche success.


Is there a recommended way to handle database migrations in kubernetes? Is there a best practice or a tool for that?


This depends highly on the desired availability of your service and how you're shipping migrations. Some people are using init containers[0] to execute migrations before bringing up Pods in their Deployments. The deployment for Quay.io, for example, requires zero downtime, so for that a series of container images are rolled into production in order. Basically, you want different phases of the migration to have different read/write policies to new and old code-paths as you copy data into a new table. The tedious part of this is actually just writing the migration into these phases, not deploying them.

[0]: https://kubernetes.io/docs/concepts/workloads/pods/init-cont...
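
A sketch of the init container approach (image and command are hypothetical). Worth noting that every replica runs the init container, so the migration needs to be idempotent and safe to run concurrently:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web                                 # hypothetical
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          initContainers:
          - name: migrate
            image: example/web:1.4.0            # typically the same image as the app
            command: ["./manage.py", "migrate"] # hypothetical migration command
          containers:
          - name: web
            image: example/web:1.4.0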


We've been packaging migrations into a container and just shipping it as a Kubernetes Job. It executes once, and if it doesn't succeed (non-zero exit code), Kubernetes will reschedule it.
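
For reference, a minimal sketch of that kind of Job (names and command are made up); a non-zero exit marks the pod failed and the Job retries it up to backoffLimit times:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: db-migrate-v42                       # hypothetical, e.g. one Job per release
    spec:
      backoffLimit: 3                            # retry a few times before giving up
      template:
        spec:
          restartPolicy: Never                   # a failed run surfaces as a failed Job
          containers:
          - name: migrate
            image: example/migrations:v42        # hypothetical image with the migration scripts
            command: ["./migrate", "--to", "latest"]   # hypothetical command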


So if you want to migrate a particular env from schema version 3 to version 6, do you deploy 3 different init containers manually, or is there a single migration container that always contains all migrations and by itself understands from which level to which level the migration should be run? Schema migration failure is a stop-the-world scenario for our entire stack.


> Schema migration failure is a stop the world a scenario for our entire stack.

That's the big problem, and I don't believe that Kubernetes can help with that. It's too intimately tied to your application internals. The recommended practice for migrations is to do them in multiple zero-downtime steps, e.g. first deploy code that handles both old and new schema, then migrate, then deploy to remove transitional code.


Sorry, I forgot to mention that it is acceptable to stop the world. We do in-house deployments and a lot can fail on the customer's side (IT disabling the DB Server, network failure, etc.). We'd rather have the stop and be focused on fixing it ASAP, than be tricked into thinking that the issue may resolve itself.

What I need is a recipe on how to handle db migrations, which are quite frequent; every customer has a different current version of the app and is therefore on a different db schema. We have our own tool to do that, and I suppose we could wrap it into a kubernetes-something. The only special thing is that it should run after the old version containers are brought down and before the new version containers are brought up, run only a single time, include backing up the db, and stop the entire process on failure.


If you're using Helm charts you can add hooks to a few points in the deployment process to give you database migrations. Currently I'm using an install hook to create the DB and upgrade hooks to migrate on deploy.
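
Something along these lines (a sketch of a chart template with hypothetical names and image, not the poster's actual chart; the helm.sh/hook annotations are what make it a hook):

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: "{{ .Release.Name }}-db-migrate"
      annotations:
        "helm.sh/hook": pre-install,pre-upgrade        # run before installs and upgrades
        "helm.sh/hook-delete-policy": hook-succeeded   # clean up the Job once it succeeds
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: migrate
            image: "example/app:{{ .Chart.AppVersion }}"   # hypothetical image
            command: ["./migrate"]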


This is also something I've been wondering about. I've been trying to nudge my workplace to start moving to a containerized infrastructure, but this has been one of my nagging questions I've had. Admittedly I haven't done much research on the subject though.

I currently have a swarm cluster on DigitalOcean that I use for self hosting apps. Only one of those required migrating the database before it could be deployed, so to get it running I just spun it up on one of the nodes, connected the database container only to the data volume I was going to have it use in the swarm, migrated the db, killed the containers, and then deployed it to the swarm. But as I was doing it I realized this would hardly be a maintainable/enjoyable method of handling this in any non-personal/production environment.


We "kubectl run" the migration in CI as a simple bare pod. It will shutdown itself after migrations are done.

For getting the resource configuration, we render the "helm template", take out the workload's podspec and supply additional parameters to invoke the migration.


What do you mean when you say 'database migration'?

In my mind, there could be a few things:

1) Migrating the database from server A to server B where the server is on Kubernetes

A1) Don't do this. Don't run a (traditional) database server in Kubernetes. Sure, you can do this - there are all kinds of volume support for all kinds of things. Everything I've ever heard and read has told me that containers aren't a good fit for this type of long-term service that gets refreshed infrequently.

2) Migrating the database from server A to server B where you have to tell all of the clients what the database is

A2) This should probably be done via service discovery or even just by a short TTL on a CNAME in DNS.

3) Something else?


Database migrations typically refer to schema/ddl changes to an application along side deployment of a new version.

> Everything I've ever heard and read has told me that containers aren't a good fit for this type of long-term service that gets refreshed infrequently.

That was a safe rule of thumb 5 years ago. Since then support for scheduling a persistent volume to be available along side your long running container has become a bog standard and boring feature.


> Don't run a (traditional) database server in Kubernetes.

With node affinities and persistent volumes this might not be A Bad Thing, per se... For something mission critical I'd keep the DB on a VM, though.

One interesting development in the Kubernetes space is plugins which expose VMs outside of the cluster to Kubernetes as though they were just more containers. That may provide the best of both worlds: a traditional VM and automated scheduling.


Spring Boot microservices using Liquibase as the schema evolver: we are using Helm hooks to run a pre-deploy job that leverages a Spring Boot profile which runs the evolutions. If it fails, it returns a non-zero exit code and the deploy doesn't continue. The only pain point we've run into with this model is that the configmaps the service depends on (we use the same docker image in both the deployment and the evolution job) aren't created before the hook is called, so we need to duplicate part of the config with -D startup params.


On premises I wouldn't suggest running your DB inside Kubernetes. It's ugly, because on-prem storage is not really solved yet (and Google has no interest in changing that).

And in the cloud what would be a migration scenario? (serious question, never did it in the cloud but would suspect there are fewer scenarios in which migration is necessary)


The steady-state architecture reminds me of this HN story about the Lava Layer anti-pattern:

https://news.ycombinator.com/item?id=8772641


Heads up, this article is from the future, apparently:

> In late 2017 all teams ran all their own AWS infrastructure. [...] In a little over a year that’s changed for all teams.


> We have close to 30 teams that run some or all of their workloads on our clusters. Approximately 70% of all HTTP traffic we serve is generated from applications within our Kubernetes clusters.

Sounds big. But then per wikipedia:

> uSwitch.com [...] allows consumers to compare prices for a range of energy, personal finance, insurance and communications services.

And:

> On 30 April 2015, the property website firm Zoopla agreed to purchase uSwitch from LDC for £160 million

So... a low bandwidth business (we're hardly talking Netflix here!) doing maybe, what, $10M in revenue annually and not growing fast enough to justify venture investment or IPO funding (they were a private acquisition!)...

Seriously, I'm sure they like it. But do they really, truly need Kubernetes? This really sounds like the kind of scale that can be achieved with 2-3 hand-managed servers, or maybe twice that number of AWS boxen.


Should any business that has SLAs and needs to be reliable ever rely on “2-3 hand-managed servers”? No.

Kubernetes isn’t only about scale. It also provides rolling upgrades and rollbacks. And failover. And DNS based service discovery. And there’s more. You can find solutions to these without Kubernetes but a lot of Kubernetes use is to get these, not simply for scale issues.


> And there’s more

Being able to reliably create your entire infrastructure on another platform in 'minutes', for example. Not to mention applications architected from the ground up around cloud-friendly and scale-friendly primitives...

2-3 hand-managed servers are great, but will absolutely warp your application and will slowly accrue configuration cruft. That's not terrible, but for many Real World issues portability and freedom to fire up wholly valid test-environments are game-changers. Even the acquisition stories are nicer.


Full-disclosure: I’m a Consulting Architect at Red Hat focused on OpenShift.

I like to say Kubernetes & OpenShift focus on availability of the cluster and applications as their primary concern. Many other concerns my customers want to impose are actually detrimental to the goal(s) they’re trying to achieve.


30 teams' services fit on 2-3 servers? We don't have anywhere near 30 teams, but at my last count at the end of last year there were ~250 repos, and every day I get auto-emailed telling me I've been subscribed to a new repo. No way could we have only 2-3 servers.


I'm perpetually curious about what kind of resource usage you'd expect given those 250 repos, and which runtime you're deploying.

I've recently been playing with Go and looking at existing JVM-based services (Spring Boot + starters + our microservices). We've seen rewrites of small services change from ~500MB of RAM to ~10MB of RAM. 250 Go-based services would probably fit very comfortably on a small cluster of cheap-ish servers, depending on what they're doing and how much traffic they're handling.


Can you go into detail about what things changed in the rewrites that enabled this reduction in RAM usage?


The JVM is traditionally memory-hungry. There are lines of engineering underway which will change that quite a bit (GraalVM looks particularly promising), but for now seeing a JVM process merrily consuming hundreds of Mb of RAM as the base case is not unusual.


That's exactly it. Plus add in all of the Spring Boot stuff (autodiscovery/autowiring, behaviour that gets added simply because a .jar is on the classpath, etc etc) and you end up with a pretty large footprint for a pretty tiny app. For reference too, the deployment artifacts with Go are ~5MB vs. a 70MB .war file.

We've also just used really simple packages (the built-in net/http server for Go vs. embedded Tomcat, gorilla/mux vs. the Spring class annotations for routing), and I suspect that dramatically cuts down on the footprint too.


> 30 teams' services fit on 2-3 servers?

Why would system load scale with the number of engineers? Yeah. 30 "teams" worth of work can totally fit on one system. The sum total of all software engineering ever done before, I dunno, 1979 can fit on one box.

Really, that's my point. People are far too enthused with the "feeling" of working on a "big" project and not being sufficiently reasonable or conservative about the technologies they try to use.

Cluster deployment for an application this size just isn't needed or appropriate. It's cargo cult engineering, because all the cool kids are using k8s and we want to read about cool kids.


You can't see other reasons for running more servers than just system load?

We are a bit smaller than uSwitch (no Kubernetes though), but we have maybe a thousand VMs spread across a couple of dozen physical servers and some cloud stuff.

We have multiple environments across multiple sites for each of multiple enterprise customers with multiple server roles. There are no single points of failure in any server role, and there are contractual/legal/privacy requirements to keep data (and staff access) separate between different 'zones'. There are integration points with 3rd party partners, logging systems, monitoring systems, bastion hosts, internal CI build clusters, internal business tools etc etc.

Beyond a certain size and complexity, you need to separate stuff out to stop people treading all over each other. You need to be able to contain breakage by ideally running a single service on each VM. None of this duplication was due to people wanting a big project - it grew over time out of necessity even with conservative 'enterprisey' attitudes to technology.


Scale isn't everything. Removing "hand-managed" from the entire process is an even bigger benefit.

It's really not that complicated: it's clustering software that lets you encapsulate your applications into containers and just write simple declarative YAML files, while it takes care of actually running them as specified and keeping them running regardless of what happens to the hardware.

Given all the managed offerings now with free master nodes, why would you purposely take on more ops overhead?


> simple declarative YAML files

If the config files and underlying infrastructure are changing all the time and have varying degrees of documentation, that's just shifting know-how from established tools to the newest fad, especially when k8s know-how is extremely scarce/expensive, and will leave you in a trial-and-error situation with unclear diagnostics if anything goes wrong due to the sheer complexity. Automation is of course not limited to k8s at all. I've witnessed moving perfectly running Terraform-like setups to more expensive k8s setups by admins just so that k8s appears on their resume. There are valid reasons to use k8s, mesos or openshift in big shops, but at this point k8s is just oversold and overhyped IMHO, and not a good match for startups ("our k8s guy comes next tuesday"). Especially when it only allows Docker containers, which is another "political" marchitecture solution with awkward technical constraints to compensate rather than a solution based on merit IMHO.


> k8s is just oversold and overhyped

Yes, like anything else is, but it also removes almost all ops overhead and is much faster and easier to maintain at the container level than recreating images and redeploying VMs.

Since the vast majority of startups just want to run apps and aren't doing any sophisticated lower-level infrastructure, trading terraform/chef/puppet for k8s yaml is a net win, especially since it also replaces several other accessory software like load balancers, service discovery, rolling deployments, etc. that you might need otherwise.


> easier to maintain at the container level than recreating images and redeploying VMs

I'm not sure I understand how that could be, since the best practice alluded to elsewhere in the thread is to recreate and redeploy (as part of ones normal CI/CD process, assuming that even exists) containers as a way of keeping up on security updates. How is that different, let alone easier, than doing so with VMs?

> startups just want to run apps and aren't doing any sophisticated lower-level infrastructure

That may be important at the very earliest stages, but that kind of concerted ignorance can be dangerous.

> it also replaces several other accessory software

The "also" being in addition to abstracting away all that tedious, "accessory" software. The problem is that, abstractions are leaky. At some point, it may well become important for someone in that startup to understand how/why the accessory works because it suddenly became critical to the business.

Of course, that point may never come, if the startup doesn't survive that long, so why bother thinking that far ahead?


Kubernetes is a solution for 90% of the effort and services needed for distributed apps running and interacting with each other over a cluster of servers. Those accessory services aren't abstracted, I never used that word. They are instead provided out of the box by K8S, leading to fewer individual components to run, maintain and monitor yourself. You can switch out components at any level, from a single container running your favorite webserver, to a full service mesh, to your own custom controllers.

Containers are much smaller than VMs, all the way down to just a single binary for your application if you want. Definition files are smaller. Restarting or replacing a container is much faster.

Your comment amounts to a big "what if", but this isn't a complex topic. Focusing on what you actually need is what leads to success. Any competent technical leadership will plan ahead and care about low-level detail when necessary. But you definitely don't need to unnecessarily worry about it, and that's what Kubernetes helps with.

Of course you can skip K8S altogether and have your setup, but that is rarely needed because it can now be solved with a common industry framework instead of bespoke solutions or special PaaS provider APIs. None of this is revolutionary, and no different than any other cost/benefit analysis for build/buy of any other component.


> They are instead provided out of the box by K8S, leading to fewer individual components to run, maintain and monitor yourself. You can switch out components at any level, from a single container running your favorite webserver, to a full service mesh, to your own custom controllers.

Fair enough, though I was misled by the word "replaces", which, to my mind, means something different than merely providing out of the box.

> Containers are much smaller than VMs

I don't dispute that. I'm also well aware that size matters. However, your initial assertion was that it "removes almost all ops overhead" and that it is "much faster and easier to maintain" (emphasis mine), which merely increasing performance (as performed by computers, not humans), no matter the degree, doesn't do.

I'm still not seeing how the maintenance (done by humans) is any easier (or, for that matter, faster) than with VMs.

> Your comment amounts to a big "what if"

Only the last sentence asks such a question. The rest of the comment asks different questions, some of which you've addressed above.

> Any competent technical leadership

This is a bit too "true Scotsman" to be useful. The question is what actually happens with actual leadership once a particular tool is in place. Does it encourage (inadvertently or otherwise) long-term dependence on the tool's ecosystem to the exclusion of those bespoke changes, or does it naturally peel away the leaky abstractions once they no longer hold value?

There are lessons to be learned from ORMs. Even early on, there was little (no?) controversy that the best practice would be to start off using them everywhere initially and replace them as needed. I don't believe this practice was followed, despite the pain, at least partially due to the perceptions of ease and consistency in sticking with ORM-everywhere.

> But you definitely don't need to unnecessarily worry about it

You may be conflating premature optimization (a strawman you detail in the next paragraph, which I won't address) and merely considering the future, the "what if we succeed?" question, which is what I'm advocating.

I say it's quite necessary to at least think about the consequences of a tool choice in the context of eventual success, especially if the initial investment is high, but even if not.

All that said, I hope you've noticed, I'm not actually making any assertions about Kubernetes, just asking questions (and challenging assertions). I tend to take a position of skepticism with anything that seems to gain popularity through network effect, as it becomes extremely difficult to separate facts from hype, even in (especially in?) anecdotes.


"removes ops overhead" is about K8S, not containers. Less stuff in a container on a bare VM means less to manage and maintain, thus easier.

I don't get your point in "considering the future" - what are you saying exactly? Yes people should plan for it. You have to be competent enough to do that, and if you're not then that's a different discussion.

However, even if you have a poor team, then Kubernetes still helps because it's well designed, flexible, reliable, and can grow with your business. As stated, it's incredibly customizable. The entire system is designed around interfaces, like the CRI for container runtimes, CNI for networking, CSI for storage, and more. In fact the extensibility APIs like metacontrollers and CRDs are so powerful now that some of the main primitives (like StatefulSets) can even be recreated in just a few lines of code.

Given that, your question of "what if we succeed" (which must be tempered by the fact that very few need high scale, and fewer still will outgrow the capabilities of K8S) is answered by "you upgrade the things you need". It really is that simple. Kubernetes is well-designed, documented, battle-tested, and supported by a large community and major vendors, so it's a better choice for both early and late stage deployments.

Considering the major investments by every cloud vendor and the benefits detailed by hundreds of companies both large and small, I believe this has proven itself beyond just a hype cycle.


> 2-3 hand-managed servers

I think too many people in this sub-thread are harping on this excerpt, especially with regard to the "hand-managed" part, and failing to take the most charitable reading, as exhorted by the guidelines.

What commenters are, perhaps, failing to grasp, is that if there are only 3 (or even 6) servers in the environment, and each one has a unique configuration, then they're all, by some definition, hand-managed.

No amount of "automation" will reduce that fundamental administrative/cognitive burden.

Sure, there are some sensible best practices, even for that scenario, like, at minimum, storing config in version control. However, going all-in on a CM system (or, perhaps, Kubernetes) for 100 servers when all you have is 3 is, at best, premature optimization. There's a well-known aphorism in tech about that.


Heh, yeah. I realized after the third reply that my phrasing had turned into a downvote magnet.

There's also the problem that the cluster itself becomes a failure point. It too needs to run on physical servers that need to be provisioned and managed. Anyone who can fat-finger a regular iron server configuration can muck up a kubernetes deployment.

Ultimately, the top of the stack is always "hand-managed" in some sense. Even AWS has had user-induced failures.


Your tone is snarky and dismissive, but I kinda understand what you are saying and it is very relevant in essence. Kubernetes as fashion is a thing and for many teams probably a trap where the time/money invested in the IT infrastructure has no relation to the business value it brings.


Interesting to see this post downvoted. In my 1.5 years hands on experience with k8s I found that most of the problems we faced (even in much bigger scenarios) would have been solved quicker with simple Linux administration.

We also found that most people don't just run k8s but run it on VMs, with each k8s node exactly being one VM. So the overhead is even bigger to employ k8s in these scenarios (instead of reducing overhead which is one of the key claims of using containers vs VMs).


Hand-managed servers should not exist anymore outside of proof-of-concept work. There is no shortage of tools to automate provisioning and deployment, and Kubernetes is one of many options. Bringing up servers should be as automated and reproducible as your CI pipeline for building software -- when you discover problems, go ahead and fix them manually, but then you should update your automated processes to include them.


On the other hand, operating a bunch of price comparison sites sounds like the kind of thing that has tons of small moving parts for data import and such, which seems like a reasonable fit for an orchestration system like this. Especially if there's 30 teams making pieces of it.


I've seen way too many companies who started out with 2-3 hand-managed servers and suddenly found themselves with 100 hand-managed servers.

Configuration management and deployment is one of these things you should get right from the start.




