What I found wrong in Docker 1.12

andrewguenther · on Aug 26, 2016

Disclaimer: I work at AWS, but on a product which does not compete with Docker or its orchestration tools in any way, shape, or form. My opinions are my own.

I wouldn't even limit this to just the swarm feature. We've been running Docker in production for a year, using it in dev environments a year before that, and we've had major problems nearly every release. We had to upgrade directly from Docker 1.7 to 1.11 because every release in between was too unstable or had glaring performance regressions. We ended up running a custom build and backporting features which were worth the risk.

Speaking of 1.12, my heart sank when I saw the announcement. Native swarm adds a huge level of complexity to an already unstable piece of software. Dockercon this year was just a spectacle to shove these new tools down everyone's throats and really made it feel like they saw the container parts of Docker as "complete." One of the keynote slides literally read "No one cares about containers." I get the feeling we'll be running 1.11 for quite some time...

tinco · on Aug 26, 2016

To provide some weight the other way, we've been using Docker in production for about 3 years now, and have not had any big issues. Obviously, you guys probably have a bit more extreme use cases at AWS. Things that bug us are generally missing features, but those gradually get added in over the course of the years, though some get less love than others.

For example, for some reason it's still not possible to ADD something and change its permissions/ownership in one layer, which basically doubles the size of such layers.

I wouldn't go as far as saying it's in any kind of a 'sad' state though. It's a neat wrapper over some cool Linux kernel features, and it's been that way since before 1.0.

I'm curious how you even get performance issues from Docker. Which feature caused performance issues for you?

andrewguenther · on Aug 26, 2016

Always fun to hear experiences from other production veterans. Glad to hear things are working well for you guys.

Our use case involves rapid creation and destruction of containers. Granted, this use case was pretty unheard of when we first adopted Docker, but it is becoming much more common.

Before Docker moved over to containerd, the docker daemon was riddled with locks which resulted in frequent dead-locking scenarios under load as well as poor performance. Thankfully, containerd now uses an event loop and is lock-free. This was a huge motivating factor for us to move forward to Docker 1.11.

To me, the sad state has more to do with Docker the company pushing new features out as quickly as possible and leaving stabilization to contributors. There are some days where it really feels like Docker is open-source so that Docker Inc can get free QA. To most users things may not feel in a sad state, but it can really suck for contributors.

byroot · on Aug 26, 2016

> the docker daemon was riddled with locks which resulted in frequent dead-locking scenarios under load as well as poor performance

I second this. We use Docker in a similar scenario for a distributed CI. So we spawn between 70k and 90k containers every day. Up to very recently we were running 1.9 and got a staggering 9% of failures due to diverse Docker bugs.

It's getting better though, since we upgraded to 1.12 a few days ago we're down to a more manageable 4%, but I'd still consider this very unreliable for an infrastructure tool.

edit: my metrics were slightly flawed, we're down to 4% not 0.5%

andrewguenther · on Aug 26, 2016

You were likely seeing the bug that kept us from deploying 1.9, which was related to corruption of the bit mask that managed IP address allocation. We saw failure rates very similar to yours with that issue.

liveoneggs · on Aug 26, 2016

how is this acceptable?

byroot · on Aug 26, 2016

You have to design for those failures. In our case we spawn 200 containers for one build; if 9% of those crash, we still have a satisfactory experience.

In the end, at this scale even with three or four nines of reliability, you'd still have to deal with 80 or 8 failures every day. So we would have to be resilient to those crashes anyway.
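The arithmetic behind those figures can be checked with a quick sketch. It assumes 80,000 containers a day (the midpoint of the 70k to 90k mentioned earlier in the thread); the function name is made up for illustration. At that volume, 80 and 8 failures a day correspond to three and four nines respectively.

```python
def expected_daily_failures(containers_per_day: int, nines: int) -> float:
    """Expected failed containers per day at a reliability of `nines` nines."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    # N nines of reliability means a failure rate of 1 in 10**N.
    return containers_per_day / 10 ** nines


# Assumption: 80,000 containers/day, the midpoint of the 70k-90k figure.
CONTAINERS_PER_DAY = 80_000

for nines in (3, 4, 5):
    failures = expected_daily_failures(CONTAINERS_PER_DAY, nines)
    print(f"{nines} nines -> about {failures:g} failed containers per day")
```

Even at five nines the expected count is still close to one failure a day, so the retry logic is needed either way.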

However, it's a lot of wasted computing and performance that we'd love to get back. But even with those drawbacks our Docker-based CI still runs 2 to 3 times faster than our previous one because containers make heavy CI parallelism quite trivial.

Now maybe another container technology is more reliable, but at this point our entire infrastructure works with Docker because, besides those warts, it gives us other advantages that make the overall thing worth it. So we stick with the devil we know ¯\_(ツ)_/¯.

zeveb · on Aug 26, 2016

> In our case we spawn 200 containers for one build, if 9% of those crashes, we still have a satisfactory experience.

You spawn 200 containers for one build‽ Egad, we really are at the end of days.

> But even with those drawbacks our Docker based CI still run 2 to 3 times faster than our previous one because containers make heavy CI parallelism quite trivial.

Since containers are just isolated processes, wouldn't just running processes be just as fast (if not slightly faster), without requiring 200 containers for a single build?

byroot · on Aug 26, 2016

> wouldn't just running processes be just as fast

The applications we test with this system have dependencies, both system packages and datastores. Containers allow us to isolate the test process with all the dependent datastores (MySQL, Redis, ElasticSearch, etc.).

If we were to use regular processes we'd both have to ensure the environment is properly set up before running the tests, and also fiddle with tons of port configurations so we can run 16 MySQLs and 16 Redises on the same host.

See my other comment for more details https://news.ycombinator.com/item?id=12366824

dominotw · on Aug 26, 2016

CI can just recover from these errors by retrying/restarting containers.

falsedan · on Aug 26, 2016

Not Dead containers (which failed their post-shutdown cleanup).

segmondy · on Aug 26, 2016

"move fast and break shit" philosophy.

nogox · on Aug 26, 2016

Where do you run the CI containers? AWS?

byroot · on Aug 26, 2016

Yes, on a pool of c4.8xlarge EC2 instances with up to 16 containers per instance.

But very few of our failures are attributable to AWS; restarting the Docker daemon "fixes" most of them.

maxavant · on Aug 26, 2016

For a newbie, what is the reason you didn't use hosted CI, like Travis CI?

byroot · on Aug 26, 2016

Initially we were using a hosted CI (which I won't name), but it had tons of problems we couldn't fix, and we were against the wall in terms of performance.

To put it simply when you run a distributed CI your performance is:

    setup_time + (test_run_time / parallelism)

So when you have a very large test suite, you can speed up the `test_run_time` part by increasing the parallelism, but the `setup_time` is a fixed cost you can't parallelize.

By setup_time I mean installing dependencies, preparing the DB schema and similar things. On our old hosted CI, we would easily end up with jobs spending 6 or 7 minutes setting up, and then 8 or 9 minutes actually running tests.
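To make the formula concrete, here is a minimal sketch of it as a function. The numbers below (6 minutes of setup, 120 minutes of serial test time) are hypothetical and only illustrate how the fixed setup cost comes to dominate as parallelism grows; they are not measurements from our system.

```python
def ci_wall_time(setup_time: float, test_run_time: float, parallelism: int) -> float:
    """Wall-clock time of a distributed CI run:
    setup_time + (test_run_time / parallelism)."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return setup_time + test_run_time / parallelism


# Hypothetical figures, in minutes.
SETUP_MINUTES = 6.0
SERIAL_TEST_MINUTES = 120.0

for workers in (1, 10, 100):
    total = ci_wall_time(SETUP_MINUTES, SERIAL_TEST_MINUTES, workers)
    share = SETUP_MINUTES / total
    print(f"{workers:>3} workers: {total:.1f} min total, setup is {share:.0%} of it")
```

Past a certain point adding workers barely helps, which is why shrinking the setup step itself matters.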

Now with our own system, we are able to build and push a docker image with the entirety of the CI environment in under 2 minutes, then all the jobs can pull and boot the docker image in 10-30 seconds and start running tests. So we were both able to make the setup faster, and to centralize it, so that our workers can actually spend their time running tests and not pointlessly installing the same packages over and over again.

In the end for pretty much the same price we made our CI 2 to 3 times faster (there is a lot of variance) than the hosted one we were using before.

But all this is for our biggest applications; our small ones still use a hosted CI for now as it's much lower on maintenance for us, and I wouldn't recommend anyone go through this unless CI speed becomes a bottleneck for your organization.

initdaemon · on Aug 29, 2016

You didn't include the maintenance cost of managing your infrastructure and container platform, which you don't need to worry about with a hosted service.

byroot · on Aug 29, 2016

Even with those it was still worth it. A couple people maintaining the CI is nothing if you can make the build of the 350 other developers twice as fast.

Also it's not like hosted CI is without maintenance, if you want it to not be totally sluggish, you have to use some quite complex scripts and caching strategies that need to be maintained.

nolite · on Aug 26, 2016

> "To me, the sad state has more to do with Docker the company pushing new features out as quickly as possible and leaving stabilization to contributors."

Side note.. I'm a production AWS user, with no plans to change, but I feel like AWS does this exact same thing with each re:Invent. The announced products actually become available 6-12 months later, and actually "usable" and reliable 2 yrs later...

softawre · on Aug 26, 2016

You can spin this as they release "MVP" software and let early users drive direction. I mean, that's what I've heard.

nolite · on Aug 26, 2016

Yeah.. Except that's not how they spin it.

In practice, they wrap it in marketing speak to paint it as something to revolutionize your stack.

Then you jump in spending several days of engineering time diving into it, only to find late in the game that the one (or several) critical details you can't find in the documentation that are essential to making an end-to-end production ready pipeline, are not actually implemented yet...

And won't be for many months

tinco · on Aug 26, 2016

Alright that makes sense, most of our containers are long running, usually months. Only the containers that have our apps that are under active development will see multiple rollovers per day.

Now that I think about it we did have one semi-serious bug in Docker, though that was also our own fault. Our containers would log a lot, and we hadn't configured our rsyslog very well so under some circumstances its buffers would fill up and the log writes would become blocking and be real slow. When this would happen some commands like `docker ps` would totally lock up, which messed with our (hand rolled) orchestration system. It wasn't until one of us noticed the logs would be minutes behind that we discovered killing rsyslog would make docker responsive again and thus found out what was happening.

Since it didn't actually affect our service I didn't remember it as particularly bad, but I can imagine that if our service depended on having fast interactions with Docker that would have hurt bad. IIRC they did recognize the severity of the issue and quickly had a fix ready.

I bet Docker Inc. has a tough mission, building out Docker services far enough to compete with the dozens of platforms that integrate Docker such as AWS or OpenStack so they can actually make money off the enterprise.

frostyfrog · on Aug 26, 2016

If you don't mind me asking, what is your use case? The company that I work for is also spinning up and destroying containers constantly and we've had to develop a "spinup daemon" in order to deal with docker's slow spinup time (1-2 seconds is unacceptable to me).

I'm curious if it'd be worth it to create some shim layer over runC (or adding the functionality) in order to have a copy-on-write file-system that could be used to discard all changes when you're done with the container. Similar to how you can do a "docker run --rm -v /output:/output/34 mycontainer myapp" and all changes except those within the mounted volume get thrown away.

The use-case at my job needs the security of SELinux + CGroup/filesystem/network isolation. At a first glance, it looks like runC may handle most of the containerization bits, but not the copy-on-write filesystem stuff that I currently need. :s

andrewguenther · on Aug 26, 2016

I can't go into details on our use case, but if it can work for you, I highly recommend the new --tmpfs flag. If you know exactly where your application writes data and are okay with it being in memory, you can reuse your containers with a simple stop and start rather than waiting for the full setup of a new container.

With runC you can mount whatever filesystem you want, but it is up to you to set up that filesystem. So yes, you would need some kind of shim to set up your filesystem.

piva00 · on Aug 26, 2016

I've been using Docker in production for the past 2 and a half years (in two different companies) and even with no extreme use cases we've had problems with: performance of volumes/devicemapper, random breaking bugs: daemon would restart without warning or errors in 1.4, randomly killing containers in 1.9, having to restart the daemon in 1.8 when it hung pulling images (consequently killing the containers in the process).

I still like Docker and can see myself, team and company using it for a long time if nothing MUCH better shows up (rkt is promising to take some of the complexity pain away but we are not diving into it yet), but I can't say I've not been bitten enough to completely avoid upgrading Docker if it isn't needed. We follow a rule to only upgrade to ".1" releases, as most of our problems have been with ".0" ones.

SEJeff · on Aug 26, 2016

My favorite was that `docker exec -it $container bash` would cause a nil pointer dereference in Docker 1.6.0 and kill the Docker daemon. We've seen gobs of bugs since, but that was the most WTF gnarly one.

cyphar · on Aug 26, 2016

I'd recommend looking at using runC (which is the underlying runtime underneath Docker). Currently we're heading for a 1.0, and the Open Containers Initiative is working on specifications that will make container runtimes interoperable and eventually provide tooling that works with all OCI runtimes. If you have anything to contribute, I would hope that you can give us a hand. :D

andrewguenther · on Aug 26, 2016

I'm a huge fan of the work being done on runC and would love to give you guys a hand! You'll probably see me around soon :)

mtanski · on Aug 26, 2016

Same experiences here. We switched to using rkt, supervised by upstart (and now systemd).

We have an "application" state template in our Salt config, and with every Docker update something would cause all of them to fail. Thankfully, the "application" state template abstracted running containers well enough that we switched from Docker to rkt under the covers without anybody noticing, except that now we no longer fear container software updates.

mtanski · on Aug 26, 2016

An example of changing behavior that broke us not too long ago: https://github.com/docker/distribution/issues/1662 . By the time this happened we were already working on the transition, so it was just more motivation.

pescerosso · on Aug 26, 2016

Hi mtanski,

How did you replace docker with rkt? Do you have a howto that you can share?

flexd · on Aug 26, 2016

I haven't replaced Docker with rkt on a big scale (or run Docker on a big scale), but I recently changed over some Docker containers to rkt.

First off, this and the rest of the rkt docs are a good starting point: https://coreos.com/rkt/docs/latest/rkt-vs-other-projects.htm...

Second, rkt runs Docker images without modifications, so you can swap over really easily https://coreos.com/rkt/docs/latest/running-docker-images.htm...

rkt uses acbuild (which is part of the application container specification, see https://github.com/appc/spec) to build images, and I had a very tiny Docker image just running a single Go process.

I just created a shell script that ran the required acbuild commands to get a similar image.

A good place to get started is the getting started guide https://coreos.com/rkt/docs/latest/getting-started-guide.htm...

Docker runs as a daemon, and rkt doesn't (which is one of the benefits). I just start my rkt container using systemd, so I have a systemd file with 'ExecStart=/usr/bin/rkt run myimage:1.23.4', but you can start the containers with whatever you want.

It's also possible to use rkt with Kubernetes, but I have not tried that yet. http://kubernetes.io/docs/getting-started-guides/rkt/

RRRA · on Aug 26, 2016

Not to mention that 1.11's restart timer was never reset to 0, even after the container ran well for more than 10 seconds (i.e. after a few restarts, your container would be waiting hours to start!).

This and I can attest to 1.12 problems listed in this article.

Can't remember the specifics with 1.10, but basically, nothing really ever works as promised, which first makes people waste a lot of time trying to make something work when it can't, and second, doesn't give much trust in the product's stability.

I really wish they would collaborate a lot more and split their solutions into smaller modules while keeping everything simple. I think they have a great product, but too many growing pains.
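To illustrate the restart timer behaviour described in this comment, here is a minimal sketch of a doubling restart delay that resets once a container has stayed up long enough. The function names, the 0.1 second starting delay, and the absence of an upper cap are assumptions chosen for illustration, not Docker's actual implementation; the 10 second threshold is the figure mentioned above. Passing an infinite `reset_after` models the bug, where the delay is never reset.

```python
import math


def next_restart_delay(previous_delay, uptime, base=0.1, reset_after=10.0):
    """Return the delay in seconds to wait before the next restart.

    `uptime` is how long the container ran before it exited this time.
    """
    if previous_delay is None or uptime >= reset_after:
        # First restart, or the container ran long enough: start over.
        return base
    # Otherwise back off by doubling the previous delay.
    return previous_delay * 2


def simulate(uptimes, base=0.1, reset_after=10.0):
    """Return the sequence of restart delays for a list of container uptimes."""
    delays = []
    delay = None
    for uptime in uptimes:
        delay = next_restart_delay(delay, uptime, base, reset_after)
        delays.append(delay)
    return delays


# Three quick crashes, one healthy 30 second run, then another quick crash.
uptimes = [1, 1, 1, 30, 1]
print("with reset: ", simulate(uptimes))
print("never reset:", simulate(uptimes, reset_after=math.inf))
```

With these assumed numbers, seventeen quick crashes in a row already push the never-reset delay past an hour and a half, which matches the "waiting hours to start" experience.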

lacker · on Aug 26, 2016

If I were you I would add a disclaimer when criticizing Docker, mentioning that you work on AWS, since the products are competitive in some ways like EC2 Container Registry vs Docker Hub. It would be great for AWS if Docker simply focused on open source bug-fixing and let AWS provide the profitable services....

andrewguenther · on Aug 26, 2016

Added a disclaimer, however, the product I work on does not compete with Docker in any way. We actually rely on Docker quite heavily. No conspiracy here.

paulddraper · on Aug 26, 2016

"It'd be great if Docker wasn't profitable."

I sort of agree, but that's not entirely realistic :)

lcarlson · on Aug 26, 2016

Why the anti-capitalistic sentiment? We programmers want to get paid for our work, right?

brianwawok · on Aug 26, 2016

No. Programmers want huge paychecks but everyone ELSE should be FOSS and code for us for free

cyphar · on Aug 26, 2016

ahem Free software does not need to be gratis. There are several examples of companies which charge money for free software.

softawre · on Aug 26, 2016

ahem That's obviously not what is being discussed here.

> It would be great for AWS if Docker simply focused on open source bug-fixing and let AWS provide the profitable services....

cyphar · on Aug 26, 2016

I was responding to the specific, sarcastic, wording of this line "Programmers want huge paychecks but everyone ELSE should be FOSS and code for us for free".

brianwawok · on Aug 26, 2016

And I said FOSS not gratis with intent. Perhaps could have made it FOSS and Gratis.

Devs are mad iTunes is closed source. Mad Windows is closed source. But happy to get a big paycheck if they work at Microsoft or Apple.

cyphar · on Aug 26, 2016

> But happy to get a big paycheck if they work at Microsoft or Apple.

I wouldn't ever want to work for a proprietary software company. But I admit that I'm on the extreme end on this debate.

StreamBright · on Aug 26, 2016

Would rkt be an alternative worth trying?

andrewguenther · on Aug 26, 2016

I really haven't looked at rkt as much as I should, but we're more likely to invest in looking at lower level tools like runC moving forward.

StreamBright · on Aug 26, 2016

Amazing, thank you! I need to choose a containerization tech in the next month and I am pretty worried about going with Docker because I hear many stories about how it is not really production ready. Thanks for mentioning runC, I will check it out.

andrewguenther · on Aug 26, 2016

All depends on your use case. RunC could be way too low level for what you need and Docker may be production ready for your specific use case.

StreamBright · on Aug 26, 2016

No, it is perfectly covering my use case. I need a _reliable_ containerization app that does not run any additional service on my boxes. I am working on an extremely low overhead orchestration for our cluster so we can avoid Swarm entirely.

hartem_ · on Aug 27, 2016

Disclaimer: I work at Mesosphere.

There are alternative runtime implementations (such as Mesos/Mesosphere DC/OS) that let you have the best of both worlds: developers can still use Docker and produce Docker images, but you use production-grade container orchestration (and those same Docker images) without using the Docker daemon for your actual service deployment.

colemickens · on Aug 26, 2016

runC isn't an orchestration solution. It's a low-level component that can be (or is already) used by higher-level orchestration technologies.

cyphar · on Aug 26, 2016

We're actually working on getting OCI support into Kubernetes. It's a long way away, but we're very determined to get large orchestration engines to provide support for OCI runtimes (runC being the canonical example of such a runtime).

StreamBright · on Aug 26, 2016

Great, I do not need any orchestration solution at all. I need a container-running thing that can encapsulate any software that we are developing (Java, C#, Node.js, etc.). And now we are approaching the question of what my problem with Docker is. I believe it is a misconception to compete with already existing tools like systemd. I especially do not want any mediocre orchestration solutions in my infrastructure that introduce big overhead and complexity that I do not need at all. One thing I learned along the way of managing large clusters (5K+ nodes) is that Swarm-like frameworks are extremely error prone. If you flip the problem and build a startup script that pulls down the container configuration from S3, for example, and the container itself has code that attaches the instance to the right service (ELB, HAProxy, etc.), you can achieve the same without introducing services whose sole purpose is to maintain a state that you do not need.

galdosdi · on Aug 26, 2016

If you want a container-like technology that already has the large cluster management, scaling built in and is ideal for software whose source code you control think about trying kubernetes (and/or any similar competitors).

jacques_chester · on Aug 27, 2016

Sounds like you want a PaaS.

Cloud Foundry is currently running real applications with 10k+ containers per installation. We are on track to test it with 250k app instances.

Plus it's been around for, in internet terms, eternity. Garden predates Docker, Diego predates Kubernetes, BOSH predates Terraform or CloudFormation and so on. Used by boring F1000 companies, which is why it's not talked about much on HN.

Disclosure: I work for Pivotal, we are the majority donors of engineering to Cloud Foundry.

sjellis · on Aug 26, 2016

I really do wonder what it would take to get the ecosystem to get behind rkt or something else. The present situation feels to me like it's held together by a shared desire to keep the Docker brand going, and that conflicts between Docker Inc. and basically everybody else just won't stop, because there is a lot of money involved for all sides.

StreamBright · on Aug 26, 2016

For me the question is more like: why should we bundle containerization together with anything else? Why couldn't we follow the Unix philosophy and have the containers work together with the orchestration software without being tied to it? CoreOS seems to keep it fairly independent. Our biggest blocker is the lack of rkt RPMs for CentOS/Red Hat.

dominotw · on Aug 26, 2016

> I need to chose a containerization tech

You are more likely to run into issues with containerization itself (cgroups, namespaces, etc.) than with the abstractions on top of it.

Unless you are doing some sort of orchestration on top of containers, you can't go wrong with any of the container abstractions.

fuzzy2 · on Aug 26, 2016

Hm, I don’t think so. It just doesn’t have enough momentum and as such there aren’t many containers available. Of course, if you want to roll your own, that may not be relevant.

I tried it with the nginx and php-fpm Docker containers, but it wouldn’t work – because those containers assume specific process hierarchies (to log to the console where you issued `docker run`) that just aren’t present when using rkt. The advertised Docker compatibility only goes so far.

I still think rkt is a great idea, but I’m too lazy to develop my own containers. The documentation isn’t that good either.

sjellis · on Aug 26, 2016

To be fair, you really have to roll your own containers for applications, and it's not hard at all if you already know how the applications are hosted on a Linux server.

I've found the single-service Docker containers from the Hub are useful for development (MySQL, Redis, etc.), but the "official" language run-time Docker containers that I've looked at are basically demoware. They are built to give you something that runs with the minimum of effort, rather than being efficient or anything else.

cookiecaper · on Aug 26, 2016

They need to implement support for the Dockerfile format if they want to win. People value inertia. The switching costs have to be low if you expect anyone to switch; this only stops being true when the incumbent becomes intolerably useless to the general userbase.

philips · on Aug 26, 2016

rkt can run any docker image built by a Dockerfile: https://coreos.com/rkt/docs/latest/running-docker-images.htm...

I agree that we need a better ecosystem of build tools and that is something we are looking to help build out. But with rkt what we are trying to do is build an excellent runtime, and we think the build side is an important and orthogonal problem.

nullcipher · on Aug 26, 2016

There is no proper ecosystem for rkt. All I have seen is marketing hype and I don't know anyone who uses it. Just go to any meetup.

cookiecaper · on Aug 26, 2016

We're in the process of converting our 100+ cloud nodes to Docker+k8s and I have a lot of the same reservations -- the space is very immature and the tooling has a lot of kinks to work out, not only functionally but also aesthetically. It's already been a nightmare and we're not even deployed to prod yet.

justinsb · on Aug 26, 2016

If "cloud" is AWS, you should join the kubernetes slack sig-aws channel. Lots of community people figuring out those kinks together.

jacques_chester · on Aug 27, 2016

It's obligatory for me to recommend Cloud Foundry here. We've already built the platform, there's no need to build and maintain your own.

It just works. Really well, actually.

Disclosure: I work for Pivotal, we donate the majority of engineering to Cloud Foundry.

kordless · on Aug 26, 2016

If you haven't heard of Giant Swarm, I encourage you to contact them. They have a scalable microservices provisioning solution that can use either Docker or Kubernetes. German company. Disclaimer: I worked for them last year. Holler if you need an intro.

InTheArena · on Aug 26, 2016

There are a couple of things here. 1) Right now everyone is afraid that Docker will emulate VMware, and crowd them out of the container space, much like VMware killed most of their competitors. 2) To this end, I have heard that Google and Redhat have massive marketing budgets, and that the marching orders have been over and over - don't say docker, say k8s. 3) The real battle is where the money is - large scale distributed systems. Companies want to freeze docker out, because Docker controls the lowest point of access - the container runtime itself. 4) hence google is trying to push "docker compatible" ideas that are just the OCI standard - nothing to do with Docker itself.

AWS doesn't want to support Swarm, because it gives people portability off of their cloud. Google doesn't want to support swarm, because K8s is a trojan for Google Cloud. No one else wants to support swarm because it competes with their products.

That said, what's happening right now, if we are not careful, will fragment the container ecosystem and make it impossible for single containers to target multiple runtimes.

Docker is the only one who can deliver a universal set of functionality that is leveraged by all. From a technology point of view, Docker is going in the right direction. We got burned with Redhat in Openshift 1 & 2 land, and that's left us with a point of view that the only thing we can depend on is a container runtime itself, and 12fa applications.

K8s does not really work that way. It's huge and it's heavy, and it expects every app to be written its way.

The technical direction here for Docker is good. But the implementation and early release are ridiculous. I was impressed by the first RC release, and then terrified that they released an RC as production.

wstrange · on Aug 26, 2016

> Docker is the only one who can deliver a universal set of functionality that is leveraged by all.

Why do you say that? I have quite a bit more faith in the design chops of the folks behind Kubernetes (Google, Redhat, CoreOS, and many others) than Docker Inc.

Swarm really only touches the surface of the requirements for large scale distributed container orchestration.

Kubernetes is complex because the problem it attempts to solve is complex.

I'd also add that Kubernetes is dead simple to use. The difficulty is in setting it up - but even that is getting much better.

InTheArena · on Aug 26, 2016

Good question. K8s has a network mode that is incompatible with Swarm, Mesos and Nomad. Swarm only touches the very top of the requirements for complex deployments, but going into K8s, the way they do things pretty much prevents separate container orchestration systems from working in parallel.

For it to be universal, it has to live in the container runtime.

atombender · on Aug 26, 2016

> K8s does not really work that way. It's huge and it's heavy, and it expects every app to be written it's way.

I disagree. Kubernetes is quite lightweight, and its architecture is nicely modular. The core of Kubernetes is just four daemons. You can also deploy most of its processes on Kubernetes, which greatly eases the operational side.

> and it expects every app to be written it's way.

Kubernetes makes zero assumptions about how an app is written, as long as it can run as a Docker (or rkt) image.

It imposes certain requirements, such as that pods are allocated unique IP addresses and share networking between containers, but that doesn't really impact how apps are written.

> K8s is a trojan for Google Cloud

Doubt it very much. For one, the Kubernetes experience on GCloud (GKE) isn't particularly good at all — the "one click" setup uses the same Salt ball of spaghetti that kube-up.sh uses, the upgrade story isn't great, alpha/beta features are disabled, you can't disable privileged containers, ABAC disabled, the only dashboard is the Kubernetes Dashboard app (which is still a toy), and GCloud still doesn't have internal load balancers. Setting it up from scratch is preferable, even on GCE.

Additionally:

* Kubernetes has excellent support for AWS as well as technologies such as Flannel for running on providers with less flexible networking.

* Google makes a lot of effort to help you to set it up on other providers (also see kube-up).

* Projects like Minikube let you run it locally.

If Kubernetes is a "trojan" of anything, it's to improve the containerization situation generally, because this is an application deployment model where they can compete with AWS, which doesn't have a good container story at all (ECS is pretty awful).

parasubvert · on Aug 30, 2016

Arguably the whole reason Google is sponsoring K8s is to promote GCE and GKE. It's their main long-term game plan vs. AWS (moving the world to containers instead of VMs).

joefern1 · on Aug 26, 2016

Disclaimer: I work for Red Hat on OpenShift.

I apologize for your experience with Red Hat OpenShift 1 & 2. OpenShift 3, which has been out for more than a year now, is natively built around both docker and kubernetes. Red Hat developers are among the top contributors to docker, kubernetes, and OCI. With OpenShift we seek to provide an enterprise-ready container platform, built on standard open source technologies, available as both software and public cloud service. I hope you will give us another look!

jacques_chester · on Aug 27, 2016

I work for what is a Red Hat competitor in this space, Pivotal.

Like this fellow says, OpenShift 3 is lightyears ahead of 1 & 2.

(Obviously, my horse in this race is Cloud Foundry)

thesandlord · on Aug 26, 2016

I work for Google Cloud (though my opinions are my own).

If people want to run Swarm or Nomad or Rancher on Compute Engine, then more power to them!

In fact, I even open sourced deployment templates to run Swarm on GCE and hopefully will add autoscaling and load balancing soon: https://github.com/thesandlord/google-cloud-swarm

cookiecaper · on Aug 26, 2016

I agree with you that lock-in is a big motivator here. It's always been king in the software space. As you point out, k8s exists as a public project specifically to diminish AWS's lock-in and make it simple to deploy out to other cloud providers (Google Cloud specifically).

cmcluck · on Aug 26, 2016

Disclaimer: I work at Google and was a founder of the Kubernetes project.

In a nutshell, yes. We recognized pretty early on that fear of lock-in was a major influencing factor in cloud buying decisions. We saw it mostly as holding us back in cloud: customers were reluctant to bet on GCE (our first product here at Google) in the early days because they were worried about betting on a proprietary system that wasn't easily portable. This was compounded by the fact that people were worried about our commitment to cloud (we are all in for the record, in case people are still wondering :) ). On the positive side we also saw lots of other people who were worried about how locked in they were getting to Amazon, and many at the very least wanted to have two providers so they could play one off against the other for pricing.

Our hypothesis was pretty simple: create a 'logical computing' platform that works everywhere, and maybe, if customers liked what we had built they would try our version. And if they didn't, they could go somewhere else without significant effort. We figured at the end of the day we would be able to provide a high quality service without doing weird things in the community since our infrastructure is legitimately good, and we are good at operations. We also didn't have to agonize about extracting lots of money out of the orchestration system since we could just rely on monetization of the basic infrastructure. This has actually worked out pretty well. GKE (Google Container Engine) has grown far faster than GCE (actually faster than any product I have seen) and the message around zero lock-in plays well with customers.

jacques_chester · on Aug 27, 2016

Not speaking in an official capacity, but the analogy I've seen used is that big companies don't want to relive the RDBMS vendor lock-in experience.

I'm speaking about something other than k8s (Cloud Foundry), but the industry mood is the same. Folk want portability amongst IaaSes. Google are an underdog in that market, so it behooves them to support that effort -- to the point that there are Google teams helping with Cloud Foundry on GCP.

Disclosure: I work for Pivotal, we donate the majority of engineering to Cloud Foundry.

roman_sf · on Aug 29, 2016

k8s is essentially "AWS in a box", and it's a product that locks you in. As soon as a k8s cluster is running in GKE, it is not that portable at all, due to operational complexity as well as being tied to the Google infra.

user5994461 · on Aug 30, 2016

> That said, what's happening right now, if we are not careful, will fragment the container ecosystem, and it makes it impossible for single containers to target multiple runtimes.

Not a chance. There is Packer [0] to get rid of all potential lock-in and monopoly. It's a universal image/container creation tool.

- It re-uses your ansible/chef/puppet/shell/whatever scripts for setting up the image.

- It outputs Docker containers, Amazon AMIs, Google images, VMware images, VirtualBox images. Whichever you like, with the same configuration.

[0] https://www.packer.io/

kozikow · on Aug 26, 2016

I wish that Docker would adopt more of a Unix philosophy and focus on doing one thing well. Why does everyone have to compete with everyone rather than create a set of tools that work well together?

I see docker-machine and docker-swarm as distractions. Here is why doing all those other things, instead of focusing on containerisation and packaging, may be harmful for Docker itself:

- Bundling-in the orchestration with docker makes k8s or Mesos more inclined to fork docker and pull out unnecessary cruft.

- Churning out half-ready features causes Docker to be known as unreliable and leads to posts with titles like this. Such a reputation tends to stay long after bugs are fixed. SV-esque launch and iterate works for web apps, but IMO not for back-end software.

rglullis · on Aug 26, 2016

"In infantry battles, he told us, there is only one strategy: Fire and Motion. You move towards the enemy while firing your weapon. The firing forces him to keep his head down so he can't fire at you. (That's what the soldiers mean when they shout "cover me." It means, "fire at our enemy so he has to duck and can't fire at me while I run across this street, here." It works.) The motion allows you to conquer territory and get closer to your enemy, where your shots are much more likely to hit their target. If you're not moving, the enemy gets to decide what happens, which is not a good thing. If you're not firing, the enemy will fire at you, pinning you down."

From one of Spolsky's finest, Fire and Motion: http://www.joelonsoftware.com/articles/fog0000000339.html

JonnieCache · on Aug 26, 2016

"If only the Generals had not been content to fight machine-gun bullets with the breasts of gallant young men, and think that that was waging war."

- Churchill 1931

SixSigma · on Aug 26, 2016

I want you to think very seriously over this question of poison gas. I would not use it unless it could be shown either that (a) it was life or death for us, or (b) that it would shorten the war by a year.

It is absurd to consider morality on this topic when everybody used it in the last war without a word of complaint from the moralists or the Church. On the other hand, in the last war bombing of open cities was regarded as forbidden. Now everybody does it as a matter of course. It is simply a question of fashion changing as she does between long and short skirts for women.

- Churchill 1944

randylahey · on Aug 26, 2016

“War,” writes von Clausewitz, “is an act of violence intended to compel our opponent to fulfil our will…This is the way in which the matter must be viewed, and it is to no purpose, it is even against one’s own interest, to turn away from the consideration of the real nature of the affair because the horror of its elements excites repugnance.”

gaius · on Aug 26, 2016

Supreme excellence consists of breaking the enemy's will, without fighting -- Sun Tzu

gerbilly · on Aug 26, 2016

This!

This analogy perfectly captures why in software, the second best thing always wins :-) [1]

The utopian 'ideal' systems that can only be built slowly and methodically get crowded out by the systems that start with some scruffy code and just keep moving.

[1] UNIX vs Multics, Windows vs OS/2, MongoDB vs ?

fapjacks · on Aug 26, 2016

In the infantry, we learn "Shoot, move, communicate".

joerg84 · on Aug 26, 2016

"- Bundling-in the orchestration with docker makes k8s or Mesos more inclined to fork docker and pull out unnecessary cruft."

Mesos 1.0 already introduced the universal containerizer, allowing you to run many Docker images natively without the Docker daemon:

https://www.youtube.com/watch?v=rHUngcGgzVM&index=14&list=PL...

http://mesos.apache.org/documentation/container-image/

solatic · on Aug 26, 2016

Because shipping features is king. When a single vendor ships many products, those products are more likely to work well together than with products from other vendors because they were built under the same roof. Better integration = saving time while reducing risk = more time to build and ship features.

The downside, of course, is vendor lock-in. But that's only a problem if a) the vendor stops updating their products, which is unlikely if those products are popular, like Docker, or b) the vendor raises prices beyond what can be justified to remain that vendor's customer. But that's a problem for whoever takes over your project next year, not for you.

sjellis · on Aug 26, 2016

I would say that the worst problem is the vendor changing the product in some incompatible way as part of a monetization strategy, so that Product X still exists, but the install-base splits.

Docker Inc. themselves make a big play of the facts that containers have to be standardized to be portable, and the portability is the key value. We could have done a lot of this stuff years ago if the virtualization vendors had a totally portable format for transferring VMs between different systems.

Swarm etc. is part of the monetization strategy - other vendors in the ecosystem have already backed Kubernetes, or MesoSphere, or whatever, and do not want or need this stuff tied to the Docker run-time itself. Fortunately, Docker Inc. can add these without breaking compatibility of images or damaging the core features enough that a fork becomes necessary, but it does create market confusion.

helloworldkitty · on Aug 28, 2016

This attitude is destroying their ecosystem. Many integrating authors (not going to name names, but plugins, et al) are feeling the heat and focusing their efforts on Kube and Mesos because Docker will just replace your shit with a half-baked thing in 6 months and everyone will flock to that.

Conversely, it's not been a surprise for those of us embedded in this community for a long time to see Kube and rkt join forces. There are a ton of both technical and political decisions behind this and unfortunately most of the political barriers end in the name Hykes.

kozikow · on Aug 26, 2016

Everyone tries to get 100% of the orchestration pie. The pie gets smaller as the barrier to entry gets higher due to the fragmentation.

If those companies would focus on securing position in the pie, rather than owning the whole pie, the pie would grow quicker and thus the absolute returns of each player could be better.

louis-paul · on Aug 26, 2016

As a very well-funded startup, they can't just build tools that do one job. They have to build a complete platform.

cryptica · on Aug 26, 2016

Yes maybe the Docker team should just have joined forces with Kubernetes (K8s) instead of going out on their own and building Swarm from scratch.

K8s is far ahead of Swarm - K8s has practically built its own language using YAML files - Swarm is still at a stage that all the configs for a service have to fit into a single command (and the options are much more restrictive than K8s).

To be fair, I do like some things about Swarm better than K8s (based on the docs), but in practice, Swarm is behind and they should tell you that up front. When I was just starting out, I literally had to install all of them; Swarm, Mesos and K8s to be able to make an informed decision because, in the case of Docker, the docs are like 6 months ahead of reality. I didn't realize that the Docker 'service' command didn't even exist until v1.12 and I couldn't install v1.12 on my machine (last time I tried, installation was failing - Obviously not yet stable).

I think Swarm has potential but they need to accept that they're just not going to be the first to market.

siegecraft · on Aug 26, 2016

To be fair, I don't think they built Swarm from scratch, I think it is/was a rebranding of an acquired product. That being said, Docker and swarm in particular move too fast for their docs to keep up (let's change the syntax for some important commands between the final RC and the release?) and it feels like the only way to be well informed is to scour the github issues, which seems wrong for something that's touted as a stable, commercially-supported product.

majewsky · on Aug 26, 2016

> move too fast for their docs to keep up

That's not a matter of moving too fast, it's a matter of broken processes. A user-visible change that does not update the docs ought to be rejected during review. If they don't have these basic development processes nailed down, that does not instill confidence in me regarding the quality of their shipped code. And that, of course, fits nicely with the reports of buggy .0 releases.

justincormack · on Aug 26, 2016

Swarmkit was built from scratch, based on lessons learned in the previous non integrated Swarm.

lcalcote · on Aug 30, 2016

This is true. The team packed a lot into this new codebase. Many miles to go, however.

drdaeman · on Aug 26, 2016

Swarm works (more or less) if one has just 2-3 hosts, uses Docker for packaging, and wants a semi-unified view of those machines.

Kubernetes seems to be quite highly opinionated toward "clouds" and "microservices". I just wasn't able to wrap my head around how its concepts apply to my "I just have one server that uses Docker for packaging, and now want to throw in another, for resiliency" case.

atombender · on Aug 26, 2016

Kubernetes isn't particularly opinionated at all. It runs containers, and doesn't care what those containers are or how they behave. Microservices and clouds not required.

Its core data model, simplified, is that of pods. A pod specifies one or more named containers that should run together as a unit. A pod's config can specify many things, such as dependencies (volumes, secrets, configs), resource limits and ports (including how to perform health checks). You can deploy single-container pods, and this is the norm, but it's entirely feasible to run a whole bunch of containers that conceptually belong together.

To expose a pod's ports to the world or to other pods, you define services. These simply specify which ports should go to which pods, and Kubernetes will assign a persistent, internal IP address to each service. Kubernetes will (typically) configure iptables so that the service is round-robin-balanced at the network level across all containers that it serves; the idea is that the pod should be reachable from any other pod in the cluster. Together with KubeDNS, which resolves service names, you can do things like call http://mylittlepod/ to reach a pod.

To achieve resilience, Kubernetes lets you define replica sets, which are rules that say "this pod should run with N replicas". K8s will use the scheduler to enforce this rule, ensuring that a pod is restarted if it dies and always has N replicas running, and it can automatically ensure that pods are spread out evenly across the cluster. Replica sets can be scaled up and down, automatically or manually.
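
The "this pod should run with N replicas" rule boils down to a reconcile loop: compare the desired count with what is actually running and start or stop pods until they match. Here is a toy Python sketch of that idea. `ToyReplicaSet` and its pod names are made up for illustration; this is not Kubernetes code or its API, and it ignores scheduling across nodes entirely.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyReplicaSet:
    """Toy stand-in for a replica set: a desired count plus the pods currently alive."""
    name: str
    replicas: int
    running: list = field(default_factory=list)
    _ids: count = field(default_factory=count, repr=False)

    def reconcile(self) -> None:
        """Start or stop pods until the running count matches the desired count."""
        while len(self.running) < self.replicas:
            self.running.append(f"{self.name}-{next(self._ids)}")
        while len(self.running) > self.replicas:
            self.running.pop()


rs = ToyReplicaSet(name="web", replicas=3)
rs.reconcile()               # initial rollout: web-0, web-1, web-2
rs.running.remove("web-1")   # simulate a pod dying
rs.reconcile()               # a replacement (web-3) is started
rs.replicas = 2              # manual scale-down
rs.reconcile()
print(rs.running)            # ['web-0', 'web-2']
```

The point is only that the user declares the target state and the system keeps converging toward it, whether the drift comes from a crash or from someone changing N.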

There are other objects, such as deployments (handle rolling upgrades between one version of a pod and another), ingresses (configure load-balancers to expose HTTP paths/hosts on public IPs), secrets (encrypted data that pods can mount as files or envvars), persistent volumes (e.g. AWS EBS volumes that can be mounted into a pod), and so on, but you can get by with just pods and services, at least to start.

Kubernetes is a bit pointless with a single server, but adds convenience even if you have just two or three.

BraveNewCurency · on Aug 27, 2016

> "I just have one server that uses Docker for packaging, and now want to throw in another, for resiliency"

Yes, when you only have a couple of servers, that is not the sweet spot for K8s.

But few people stop at 2 servers. A few months in, someone asks for a staging environment, and/or QA environment. Someone eventually realizes that they need to regularly test their fail-over and backups. Someone hires a contractor, and wants to give them a copy of the setup that won't block anyone else. Someone realizes we can centralize the logs from all these environments... And so it goes.

Even with one server, sometimes you go to do an upgrade, and find your "one server" is actually a tightly coupled bunch of services. (Made-up example: I want to upgrade Varnish, but it requires a newer library that is incompatible with my WordPress version.) That one server could be a server for the Database, one to run the cron jobs, a few for the cache layer, etc. If you break up those into different boxes, you can scale them better -- Instead of one big beefy server, you can have each layer at its own scale (one or more wimpy boxes, dynamically adjusted).

You don't do this to save money directly. But by simplifying things, you make it easier to maintain. That saves labor, plus prevents problems (and makes it easier to hire and train ops.)

When you have just a few servers, it looks manageable. As you grow, it gets a lot harder to manage. K8s helps.

gnur · on Aug 26, 2016

The new swarm mode is great when it works, but it is a monster to debug. It takes care of so many things that it is nearly impossible to pinpoint which component isn't working.

The issues I have encountered with swarm mode:

* Some containers could not use hostnames to connect to other containers.

* Sometimes, in a 3 node swarm, containers on A could be reached from B, but not from C.

* After every reboot these issues could be fixed or start occurring.

* It automatically adds firewall rules for every service you port map to be exposed to the internet, without warning

In the end I switched to nomad, it isn't perfect either but at least it is consistent.

dexterbt1 · on Aug 26, 2016

Do you have the links to the issue tracker for each of these issues? For me/us to subscribe/follow.

nickbauman · on Aug 26, 2016

If you look at the design of Kubernetes you'll find a very strong opinion on how networking is done. Kubernetes does not allow the use of Network Address Translation (NAT) for container-to-container or for container-to-node traffic. The internal container IP address must match the IP address that is used to communicate with it. Swarm plays faster and looser than this. I think Google's age and experience shows that Docker went the wrong way.

InTheArena · on Aug 26, 2016

If by google's age and experience, you mean the requirements of the Google Cloud, then i agree.

Docker is doing exactly what they should be, but in a manner that is destructive. Getting a built in consistent p2p routing mesh under every container is brilliant, and fixes one of the biggest problems with k8s and swarm (it's not really possible for these technologies to interoperate because of incompatibilities with the network model).

The big problem is the stability hit. 1.12 had no business losing the rc label.

thockingoog · on Aug 26, 2016

disclosure: Kubernetes engineer

> If by google's age and experience, you mean the requirements of the Google Cloud

Quite on the contrary. Kubernetes flies in the face of Google's cloud APIs, and has to take advantage of every dirty trick it can. But it does that because the result is better. I can say that without hesitation, having worked on the logical conclusion of port-mapping (Borg).

> Getting a built in consistent p2p routing mesh under every container is brilliant, and fixes one of the biggest problems with k8s

That's hilarious to me, because what Docker calls "routing mesh" is a feature that Kubernetes has had since 1.0. It's different in some subtle ways, but again, for really important reasons.

dward · on Aug 26, 2016

> If by google's age and experience, you mean the requirements of the Google Cloud, then i agree.

Why do you think this is a requirement of Google Cloud?

InTheArena · on Aug 26, 2016

Kubernetes and Google Cloud were both informed by the design and implementation of Borg. Kubernetes is basically for all intents and purposes Google Container Engine. That's fine, but it's highly tuned for how Google sees the world.

nickbauman · on Aug 26, 2016

Huh? Kubernetes can run just fine outside of Google's Cloud. It'll work on any TCP/IP IaaS offering out there. If you mean it demands a clear end-to-end connection model for the important moving parts, then, yeah; because of hard-won experience of what works. They found out that you want to spend your time on bugs in your app, not bugs in your networking infra.

colemickens · on Aug 26, 2016

Swarm Mode is "great". Assuming you've never heard of or used Kubernetes. In which case, Docker Swarm is too little, and a year+ too late.

As for marketing, it does seem a bit funny that a product would announce "deep integration with underlying infrastructure" for a cloud provider when they haven't written a single line of (public) code to actually support that cloud provider.

The fun thing is this arguably critical blog post praises features in "swarm mode" that have long been present in Kubernetes/Mesos/Nomad: [labels/constraints].

There could be a lot written about the fact that Docker ships "Swarm Mode" as stable in 1.12 despite virtually everyone's actual first-hand experiences. I would argue that if "Swarm Mode" were not shipped inside of Docker 1.12 and didn't benefit from riding along with the normal `docker` package, few would be talking about it.

rusher81572 · on Aug 26, 2016

Yeah, Docker has a lot of catching up to do. They bragged a lot lately about how Swarm is faster than K8s but you really can not compare the two when you look at features and stability of k8s.

sjellis · on Aug 26, 2016

It's amazing (and not in a good way) that this stuff is being shipped in a point release.

CSDude · on Aug 26, 2016

I also assumed the new swarm would be easier, but the multi-host network only works with the newly introduced service command. It does not work with the regular docker run command, which is disappointing because you still need a third-party key-value store for that, but not for docker service. It literally took me hours to realize that was missing. I think 1.12 was just rushed to show it at DockerCon; normally major releases were mostly bug-free and worked as intended.

rusher81572 · on Aug 26, 2016

I feel the pain for I was in the same boat.

systemz · on Aug 26, 2016

Docker team, please focus on one thing - packing apps to containers. Leave everything else to other projects, don't try to do everything wrong, just do one thing good. Thank you.

sjellis · on Aug 26, 2016

The problem is that they literally can't do that - Docker Inc. took a huge pile of VC money, and containerization is a commodity feature. To succeed they have to build higher-level products and services, which put them in direct competition with the other vendors in the ecosystem.

dominotw · on Aug 26, 2016

>To succeed they have to build higher-level products and services

I wish they didn't have to cram it all into single product docker though.

illumin8 · on Aug 26, 2016

Great point - it would be nice if they separated out their monetization products (Swarm, Docker Datacenter, etc) from the core Docker engine. There is plenty of money to be made from enterprise customers who will want the security features of Docker Datacenter, or support for Swarm, without bundling your orchestration layer with the container runtime engine.

This type of bundling reminds me of Microsoft bundling IE with Windows. Initially, IE was much worse than Netscape, and this seems like roughly the same thing - monopoly in one market bundling an inferior product to try and achieve a monopoly in another somewhat related market.

hinkley · on Aug 26, 2016

If they didn't at least try to have a complete solution, someone else would package docker as one part of a turnkey solution and they would get stuck with all the maintenance costs and none of the consulting fees.

Complicit in their own destruction.

jacques_chester · on Aug 27, 2016

> someone else would package docker as one part of a turnkey solution

Indeed, Red Hat have done exactly that with OpenShift 3.

Disclosure: I work on Cloud Foundry for Pivotal, ostensibly a competing project/product.

adamc · on Aug 26, 2016

Sounds like a high-risk ploy. Then again, VC money.

sjellis · on Aug 26, 2016

What annoys me about it is that it requires everyone else to lose for Docker Inc. to win. It's rather like if Linus invented Git, then formed a VC-backed company on the back of it, and then tried to figure out how to extract enough money from the ecosystem so that Git, Inc. could be flipped or IPO'ed for a really big pile of cash.

corford · on Aug 26, 2016

Agree strongly. I think they will lose this fight which is why I've stayed out of the docker race and am currently pinning my hopes on k8s + rkt.

boomstik · on Aug 26, 2016

I haven't played enough with Docker Swarm to run into any of these issues, but Rancher (www.rancher.com) does all of this, and more, very well. I am not affiliated with them - just a happy user and contributor of github issues.

drdaeman · on Aug 26, 2016

Went with them last week, as it seemed to be one of the most sane options out there (I've tried a lot). Still, it has its share of issues. A few I've encountered so far:

- Their built-in load balancing (HAProxy-based) is nearly impossible to debug. Literally no logging there.

- No locality awareness. DNS queries always return all addresses they know about (including those that don't even work - https://github.com/rancher/rancher/issues/5792) and I haven't yet found any good way to prioritize containers co-running on the local host over the more distant ones (https://github.com/rancher/rancher/issues/5798 - if someone has been in this situation and has some ideas, I would appreciate any suggestions!).

- Storage management was advertised, but can't find anything besides NFS (which is SPOF) and Amazon EFS (which I don't use). There was GlusterFS support, but it seems it was too broken so they had removed it or something like that. If one wants persistent storage, they'd better pin containers to hosts.

bboreham · on Aug 27, 2016

> I haven't yet found any good way to prioritize containers co-running on the local host

You may find that RFC3484 helps; it prefers the address with the longest prefix in common with your own address so will tend to pick your own address. And you are probably getting this behaviour already.
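
As a rough illustration of the longest-common-prefix preference, here is a Python sketch using only the standard `ipaddress` module. It covers just that one rule (RFC 3484 has other rules that take precedence), works on plain IPv4 addresses for simplicity, and the addresses and function names are made up.

```python
import ipaddress


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def prefer_closest(source: str, candidates: list[str]) -> str:
    """Pick the candidate sharing the longest prefix with the source address."""
    return max(candidates, key=lambda c: common_prefix_len(source, c))


source = "10.42.16.5"                                       # hypothetical local address
candidates = ["10.42.200.9", "10.42.16.77", "10.42.90.3"]   # hypothetical DNS answers
print(prefer_closest(source, candidates))                   # 10.42.16.77
```

This only favours local containers if addresses on the same host actually share a longer prefix with each other than with addresses elsewhere.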

drdaeman · on Aug 27, 2016

Thanks for the pointers! Unfortunately, I don't think this applies. I'm certainly not getting this behavior - I wouldn't have even thought of it if I hadn't observed higher latencies resulting from (sub)request chains jumping from node to node back-and-forth, instead of staying within the node's boundaries.

The problem is, the network space there is flat, not hierarchical - while I haven't looked at the actual implementation code, I believe container addresses are just randomly chosen from a single big 10.42/16 subnet and I'm unaware if there's a way that I can assign hosts, say, a /20 out of that space (yes, this would've solved things nicely).
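
To make the /20-per-host idea concrete, here is a sketch with Python's standard `ipaddress` module. The host names are made up, and nothing here reflects how Rancher actually allocates addresses; it only shows what carving the 10.42/16 space into per-host blocks would look like.

```python
import ipaddress

pool = ipaddress.ip_network("10.42.0.0/16")

# Split the flat /16 into /20 blocks: 16 blocks of 4096 addresses each.
host_blocks = list(pool.subnets(new_prefix=20))

hosts = ["node-a", "node-b", "node-c"]   # hypothetical host names
assignment = dict(zip(hosts, host_blocks))

for host, block in assignment.items():
    print(host, block)

# Any container address can then be mapped back to the host that owns its block.
addr = ipaddress.ip_address("10.42.17.200")
owner = next(h for h, b in assignment.items() if addr in b)
print(owner)   # node-b
```

With a layout like this, containers on the same host share at least a 20-bit prefix, which is what a longest-prefix preference needs in order to favour local addresses.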

bboreham · on Aug 27, 2016

Oh. Right. I happen to work on a different Docker network, which 'chunks' the address space so containers on the same machine are very likely to have contiguous addresses. Hadn't occurred to me theirs doesn't do that.

drdaeman · on Aug 29, 2016

Just curious - what networking/clustering solution are you using?

(I'm asking, so the next time I'll have to make a choice between stacks I would be more aware about the finer details.)

Thanks!

bboreham · on Aug 30, 2016

I work on Weave Net http://github.com/weaveworks/weave.

InTheArena · on Aug 26, 2016

Gluster is a disaster.

drdaeman · on Aug 26, 2016

Haven't used it in non-toy environments, so won't argue with that.

My actual issue is, there's effectively no distributed storage support in Rancher/Cattle at this moment, be it GlusterFS or anything else (for all I know, MooseFS worked quite well for us on one project).

Just pointing it out, because for some reason I got quite a different impression from the website/docs.

Every point of advertisement statements like "Rancher provides a full set of infrastructure services for containers, including networking, storage services, host management, load balancing and more." is to be taken with a huge bag of salt.

(And that's by no means unique to Rancher.)

InTheArena · on Aug 26, 2016

I agree with the general comments on storage. The Docker volume system seems to be the ideal place to do this.

cryptica · on Aug 26, 2016

I agree, Rancher just ties everything together really nicely and it works awesome with Kubernetes. I'm building a hosted Rancher cluster manager for my OSS project http://baasil.io/.

sschueller · on Aug 26, 2016

Rancher is very cool but I wish that RancherOS was easier to lock down. If you run it outside of AWS or Exoscale you don't get a firewall so you have to do that on the hosts.

erikb · on Aug 26, 2016

Quality is something long term, but because the world becomes faster and faster people care less about it. Hype can generate a lot of money in the short term. And that is what is valued the most in the current economy. Being a quality guy myself I also feel that this is painful, but I think it's hard to blame anybody for that. I don't know anybody who goes for the short term success because they want to live in that kind of world. It's just about the only thing that gets rewarded.

wrong_variable · on Aug 26, 2016

# Market-Driven-Development

Docker promises too much and delivers too little. Story of every software project in the last 50 years.

agentgt · on Aug 26, 2016

I wish Docker had some more open source competitors, particularly some non-profit competitors. Yes, I know Docker is open source, but I have this terrible gut feeling about becoming too reliant on their technology. I feel like I'm going to be screwed some day.

I guess I just know some day Docker's investors are going to want their money back.

Yes sure there are other quasi opensource products that are super critical that I use but they all have alternatives (for example Java has plenty of alternatives).

Hopefully I don't get downvoted to oblivion for this comment. I am sure my trust issues are illogical and I would really like to remove the inhibition to use docker but articles like this do not help.

sjellis · on Aug 26, 2016

I don't think that you are being completely illogical. One thing about the Docker stack, though, is that it is supported by every major vendor, and you may well be getting your server installations through a vendor (Red Hat, AWS etc.), so in that case you are insulated from problems. As other commenters have discussed, the vendors test and patch their Docker distributions themselves already, as well as contributing to development.

We also already have a functional replacement with rkt. It can use Docker images, and Kubernetes can use rkt as a run-time in place of Docker, so Docker is not irreplaceable.

I think that the most valuable bits of Docker today are the developer tooling - easy Windows and Mac installers, Docker Compose, the online documentation, and the Hub for grabbing ready-made images. None of which, AFAIK, makes much money for Docker Inc.

graffitici · on Aug 26, 2016

It seems that using Kubernetes is definitely more mature and usable than Swarm. But how would you rate the other Docker projects, like docker-machine and docker-compose. Does Kubernetes also subsume those projects?

These seem to be way more mature than Swarm.

cyphar · on Aug 26, 2016

Kubernetes is definitely the technically correct solution. The only really hard part is getting started, but if you have a cloud service provider that runs a Kubernetes cluster for you then you don't need to worry about that. :P

fideloper · on Aug 26, 2016

I don't know if Kubernetes is more usable - it's a MONSTER to host yourself.

Other Docker projects suffer from various levels of similar issues. Docker-machine is nice, but has a ton of rough edges, especially when spinning up hosts on AWS. It feels like the programmer(s) of machine never used AWS beyond the simplest use case.

iso-8859-1 · on Sept 6, 2016

Did you see https://github.com/kubernetes/minikube ?

ownagefool · on Aug 26, 2016

Kubernetes isn't particularly hard to host.

Bootstrap etcd

start kubelet

Place the API server, proxy, scheduler and controller definitions in the manifests folder.

If you're running infrastructure that deals with problems such as maintaining the health of applications then it's going to generally be a whole lot more complex than that.

daxorid · on Aug 26, 2016

We've experimented with docker in a few places, and the deployment workflow is just painful:

  1. sudo docker build -t quay.io/foo/bar .
  2. sudo docker push quay.io/foo/bar
  3. <login to production>
  4. sudo docker pull quay.io/foo/bar
  5. sudo docker kill foobar
  6. sudo docker rm foobar
  7. sudo docker run -p 80:80 -p 443:443 -e FOO=bar --name foobar --net=host -d quay.io/foo/bar

I can never understand how people talk about docker making deployments somehow easier.
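For reference, the production-side part of that list (steps 4 through 7) can be wrapped in a few lines of shell. This is a minimal sketch, assuming the same placeholder image and container names as above; `deploy_commands` is a hypothetical helper name, and it only prints the commands rather than running them.

```shell
# Sketch only: prints the production-side commands (steps 4-7) instead of running them.
# deploy_commands is a hypothetical helper; image and name are the thread's placeholders.
deploy_commands() {
  image="$1"
  name="$2"
  printf 'docker pull %s\n' "$image"
  # kill and rm fail harmlessly when no old container exists
  printf 'docker kill %s || true\n' "$name"
  printf 'docker rm %s || true\n' "$name"
  printf 'docker run -p 80:80 -p 443:443 -e FOO=bar --name %s --net=host -d %s\n' "$name" "$image"
}

deploy_commands quay.io/foo/bar foobar
```

Piping the output to a shell on the production host (for example over ssh) would run the four steps in one go.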

justinsaccount · on Aug 26, 2016

How is that painful?

I run a service using docker and have your steps 4 through 7 as part of a systemd unit file. Updating the application requires a single systemd restart command.

sandGorgon · on Aug 26, 2016

Could you talk about your deployment scripts? I am trying to deploy a single Flask app which uses Redis. I'm not sure how to set up logging, etc., and whether Redis and Flask will go in the same VM.

justinsaccount · on Aug 26, 2016

Not much in the way of scripts, but the systemd file I use is something like this:

  [Unit]
  Description=App
  After=docker.service
  Requires=docker.service

  [Service]
  TimeoutStartSec=0
  ExecStartPre=/usr/bin/docker pull app
  ExecStartPre=-/usr/bin/docker kill app
  ExecStartPre=-/usr/bin/docker rm app
  ExecStart=/usr/bin/docker run --name app --rm=true -p 80:80 app

  [Install]
  WantedBy=multi-user.target

With something like dokku I could just push the git repo containing the Dockerfile and it would accomplish the same thing

detiber · on Aug 27, 2016

You probably also want to add: PartOf=docker.service

This will ensure that a restart of the docker service will trigger this service to be restarted.
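As a sketch of where that line goes: `PartOf=` belongs in the `[Unit]` section. The snippet below prints a drop-in that could be saved under a path such as `/etc/systemd/system/app.service.d/partof.conf`; the unit name `app.service`, the file name, and the helper name `partof_dropin` are assumptions for illustration.

```shell
# Sketch: emit a systemd drop-in that adds PartOf= to the [Unit] section.
# The argument is the unit whose stop/restart should propagate to this service.
partof_dropin() {
  printf '[Unit]\nPartOf=%s\n' "$1"
}

partof_dropin docker.service
```

After saving the drop-in, `systemctl daemon-reload` is needed before systemd picks it up.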

kinghajj · on Aug 26, 2016

ElasticBeanstalk streamlines things greatly, and when it works everything is pretty nice. When it fails, though... for instance, just a few days ago, one of the containers failed with an OOM error. For some reason, still unclear, the ECS and/or Docker daemons weren't able to start a new container to replace it, leaving the instance broken for hours. Auto-scaling groups will mitigate this, but it's still unnerving.

Still, I'm liking many aspects of our tools. Using Docker with Rocker (https://github.com/grammarly/rocker/) has greatly sped up CI builds by caching results when the source hasn't changed (especially important in multi-language shops; the Python guys don't want to wait on the Java code to build every time). Just upload a tagged image to ECR, generate an "application version" referencing those images, and deploy via the Slack bot ("@bula deploy develop develop-XXX-e83fc3bd").

atombender · on Aug 26, 2016

If you're using Kubernetes, steps 3-7 are replaced with a single "kubectl" line, and you can even eliminate that by baking it into your Quay setup. (Why would you ever do steps 1-2, though? Quay supports GitHub hooks.)

We've started using a self-hosted Drone [1] install (not to be confused with the hosted drone.io service, which is not good) to build containers. Unlike Quay, it doesn't launch build VMs, but rather uses Docker containers, so it's very fast. It also supports the notion of build containers, so you can do things like compile C code or run NPM without ending up with any compilers or build tools in any of your image layers; it completely removes the need for a custom "base image" shared among apps. It also lets us add the Kubernetes deploy as a final step after publishing.

[1] http://readme.drone.io/

benjaminwootton · on Aug 26, 2016

Bring back the 10,000 line bash scripts, Puppet, configuration drift, inconsistent environments. All is forgiven!

dexterbt1 · on Aug 26, 2016

For us: Jenkins does steps 1 and 2. Steps 3-7 are simply "push button" via Ansible.

majewsky · on Aug 26, 2016

Replace 4 to 7 with a single "kubectl rolling-update".

gaius · on Aug 26, 2016

If your Unix user is in the docker group, no need to sudo.
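A quick way to check this is sketched below. `in_docker_group` is a hypothetical helper name; it only inspects a group list and changes nothing on the system.

```shell
# Sketch: check whether a space-separated group list contains "docker".
in_docker_group() {
  case " $1 " in
    *" docker "*) return 0 ;;
    *) return 1 ;;
  esac
}

# id -nG lists the current user's group names
if in_docker_group "$(id -nG)"; then
  echo "docker group: yes, sudo not needed"
else
  echo "docker group: no; an admin could run: sudo usermod -aG docker $(id -un)"
fi
```

A new login session is usually needed before a freshly added group takes effect.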

epberry · on Aug 26, 2016

I share the pain of this post - except my run-in with 1.12 occurred with Docker for Windows. The shared volumes and host networking were totally nondeterministic. I don't think I had really ever experienced software which seemed to fail so randomly.

On the other hand I think Docker can probably be forgiven for my particular frustration. For one, the software was in beta. And two, working with Windows and all the different flavors must be a nightmare.

djs55 · on Aug 26, 2016

(I work at Docker)

If you haven't already and can spare the time, could you file a pair of issues: perhaps one for the shared volume problem and another for the networking on https://github.com/docker/for-win ?

We've been fixing bugs in both areas and the fixes should arrive in the beta channel over the next few updates. Thanks for your patience!

api · on Aug 26, 2016

Am I the only person around here who is skeptical of all this added complexity?

Some of it is clearly very useful, but some of it strikes me as a tower of babel built upon workarounds to problems that have simpler solutions.

What I see here are several different vendors vying to make sure they remain relevant. While I understand this and even the need for it, I also understand that it can drive poor engineering decisions in the long run.

kennysipe · on Aug 26, 2016

Disclaimer: I work for Mesosphere

This is the reason the latest Mesos and DCOS have the universal containerizer. Docker is great for development but currently doesn't make sense for production. The latest DCOS uses Docker images without the Docker process and provides the high-scale production quality needed for a large datacenter.

vacri · on Aug 26, 2016

> please take it slow and make Docker great again!

Perhaps they should create some sort of firewall? I mean, the network packets coming through... they bring malware... they're spam... and some, I assume, are good traffic.

nolite · on Aug 26, 2016

Guys.. This was a Trump joke

m_mueller · on Aug 26, 2016

I agree very much. Trying to figure out how Docker hacks your host's iptables and how to deal with it in a production network is a pain.

andrewguenther · on Aug 26, 2016

We never let Docker set its own iptables rules. It is a pain at first, but it forces you to understand how the rules work, what is going on, and has the added bonus of keeping those rules consistent across Docker releases and in your own version control.
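The comment does not say how this is configured. One way, assuming the stock daemon option is what is meant, is to start the daemon with `--iptables=false` or to set the equivalent key in `daemon.json` (typically `/etc/docker/daemon.json`). The helper name below is hypothetical and the snippet only prints the file contents.

```shell
# Sketch: print a daemon.json that stops the Docker daemon from adding iptables rules.
# This is the config-file equivalent of starting the daemon with --iptables=false.
daemon_json_no_iptables() {
  printf '{\n  "iptables": false\n}\n'
}

daemon_json_no_iptables
```

With this setting off, the NAT and forwarding rules for the Docker bridge have to be supplied by hand, which is what makes keeping them in version control possible.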

m_mueller · on Aug 26, 2016

So, as I understand it, you add and remove the port mapping and other FW entries with your own scripts whenever you spin up / stop a container?

andrewguenther · on Aug 26, 2016

Port mapping isn't handled by iptables, only the wiring of the docker virtual interface. We take care of that piece, but let docker do whatever it wants within that interface.

wstrange · on Aug 26, 2016

Those packets should definitely be deported back to where they came from.

And we should make them pay for the firewall.

MichaelGG · on Aug 26, 2016

The good traffic should already be permitted in the firewall. If you're still having issues, the firewall rules are probably not being enforced. I've seen this happen when an "any any" allow rule is left in from debugging ("just to get things working").

wastedhours · on Aug 26, 2016

But when your entire service is built on the notion of accepting all types of traffic, it's insulting to block it based on nothing more than opinion (I would point out for the avoidance of doubt, OP was making a Trump reference...)

moondev · on Aug 26, 2016

This should be renamed to "the sad state of swarm mode"

rusher81572 · on Aug 26, 2016

I am looking more and more at using the Apcera Platform. They have everything I need in a container management platform for free: https://www.apcera.com/community-edition

jacques_chester · on Aug 27, 2016

Do you work for Apcera?

Your submission page is heavy on the linux-toys.com domain, implying that it's yours. That same site identifies the author as working at Apcera.

There is also a submission in your account that points directly to an Apcera blogpost, with the same author name.

Disclosure: I work for Pivotal, we donate the majority of engineering to Cloud Foundry. Apcera has beef with us.

rusher81572 · on Aug 27, 2016

Yes, I work for Apcera and linux-toys.com is my personal blog where I talk about open source technologies (mainly Docker lately) and software that I develop. Like the article states, I am a Docker fanboy and want them to be great again. The blog website is actually running on a four node Raspberry pi cluster that I referenced in the blog as a link. The reason I joined Apcera is because they are also developing cool container technology. Until recently, Apcera was only available to enterprises and didn’t have a solution for “homeduction” users like me to host and run applications like my blog and a few other services. With the community edition of the software, I can gather a few spare x86 boxes and make the switch. I will still be running the same Docker images but using Apcera’s orchestration software instead of swarm.

jacques_chester · on Aug 27, 2016

Thanks. I'm glad to hear Apcera is branching out and that you're excited to work there.

finid · on Aug 26, 2016

Trying to run Docker on your own to do anything meaningful can be a very painful exercise.

I gave up after trying to spin up a cluster on DigitalOcean using v1.12. Like the author, I couldn't get my containers to see each other, something that worked before v1.12.

radarsat1 · on Aug 26, 2016

I'm personally just getting into Docker and am looking at the swarm feature for deploying containerized compute nodes to our cluster. People here seem to be complaining specifically about Swarm mode; is there anything I should watch out for? What does Kubernetes provide that Docker swarm doesn't?

I've tested Docker Swarm a bit and it seems to work as advertised.

Is it just about node selection not being sophisticated enough? In that case I don't need to worry since all my nodes are the same, but if there are any words of warning I'm all ears. Thanks.

maxamillion · on Aug 26, 2016

I've been using docker since version 0.6 and followed almost every version upgrade since and it's an absolute mess. This blog post is spot on.

My old team had a production environment running in docker containers for about a year (this was pre-swarm, pre-kubernetes) and then transitioned to just using ansible for application deployment in a more "traditional" manner because we spent more time trying to fix broken things with docker than it was worth.

sandGorgon · on Aug 26, 2016

Question: we are looking to deploy a Flask + Redis based application to a single server on AWS using Docker. No load balancing, no multiple servers.

What is the current best practice for doing this (with logging, etc.)? Should I even be considering something like Marathon/k8s, etc.?

Currently I have a fat Docker VM with supervisord and all services running in a single VM with highly fragile logging. I don't think this can last.

Getting started seems very intimidating in the Docker world.

josegonzalez · on Aug 28, 2016

Dokku should be able to handle all of your concerns. We do load-balancing on the server via nginx, use Docker's plumbing (the docker command/api) to build nice porcelain (our own cli tool) for stuff like restart policies etc. It's targeted specifically at single-server solutions, and migrating once you are large enough to another platform is easy as we tend to not build in platform lock-in.

Feel free to jump on our slack or irc: http://dokku.viewdocs.io/dokku/getting-started/where-to-get-...

Disclaimer: I am a maintainer of Dokku

kozikow · on Aug 27, 2016

I recently moved my Django app running on Docker to Google Container Engine. Basically, by following this tutorial you can be up in 15 minutes: https://cloud.google.com/python/django/container-engine . k8s picks up your stdout logs and sends them to Stackdriver, and you don't need to do anything to set it up. I was running nginx+gunicorn inside my Docker image, but the nginx part has been taken care of by k8s.

dang · on Aug 26, 2016

I changed the baity, over-general title "The Sad State of Docker" to what the first paragraph of the article says it's actually about.

Generally we moderate HN stories/threads less when a YC startup or YC itself is at issue. But we do still moderate them some, because the standards of the site still apply.

(We haven't done anything here besides this title edit, though, in case anybody is wondering.)

nullcipher · on Aug 26, 2016

Negative comments here seem to be a gross overreaction. This is primarily a code release. Code is always going to keep moving. I have used swarm and it's nowhere near as unstable as people say it is. Is it production ready? Probably not. But which software is production ready in every big release?

Unnecessary fear mongering by people who have an outside agenda.

cpitman · on Aug 26, 2016

Which release is going to be production ready, and which features are production ready in each release? If Docker Inc documented which features are production ready, I'd have more sympathy for that point of view.

What we've had to do at Red Hat is always stay a couple releases behind (we're shipping 1.10+patches right now) and backport all the fixes from upstream releases to make it stable (ie production ready). Docker keeps shipping new versions, with fixes for old issues but then a whole new set of issues added in.

nullcipher · on Aug 26, 2016

OK downvoters, I get it. Everyone hates docker but uses it anyway :-)

dominotw · on Aug 26, 2016

Is swarm mode the same as docker swarm [1]? Curious why they chose the same name if that's not the case.

Why would one use docker swarm [1] vs. just docker swarm mode?

[1] https://github.com/docker/swarm

eppsilon · on Aug 26, 2016

It's not the same. Strangely the docker-swarm tutorial doesn't mention swarm mode. I wasn't able to follow it successfully with Docker 1.12. That was when I realized docker-swarm != swarm mode, so I tried the swarm mode tutorial instead, and it worked as expected.

Maybe docker-swarm is not supported anymore with 1.12? It was all pretty confusing.

dominotw · on Aug 26, 2016

>Maybe docker-swarm is not supported anymore with 1.12?

I am using docker-swarm with 1.12 so it's definitely supported. Yea I am really confused too :D.

johnhenry · on Aug 26, 2016

Originally, there was the stand alone version of docker swarm, invoked using "docker-swarm". This was more difficult to set up than version 1.12's new swarm mode, invoked using "docker swarm", which does almost the same thing, but more succinctly. Presumably, they chose the same name because they serve the same purpose and, if the latter is set to replace the former, why come up with a new name for basically the same concept?
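To make the "docker swarm" side concrete, here is a minimal sketch of the swarm-mode bootstrap in Docker 1.12. The helper name `swarm_mode_commands`, the IP address, the token placeholder, and the `web`/`nginx` service are all assumptions for illustration, and the function only prints the commands.

```shell
# Sketch: print the swarm-mode bootstrap commands (Docker 1.12+); nothing is executed.
swarm_mode_commands() {
  manager_ip="$1"
  token="$2"
  # run on the manager node
  printf 'docker swarm init --advertise-addr %s\n' "$manager_ip"
  # run on each worker; 2377 is the default swarm management port
  printf 'docker swarm join --token %s %s:2377\n' "$token" "$manager_ip"
  # run on the manager to start a replicated service
  printf 'docker service create --name web --replicas 3 -p 80:80 nginx\n'
}

swarm_mode_commands 192.0.2.10 '<worker-token>'
```

The real worker token is printed by `docker swarm init`, or later by `docker swarm join-token worker` on a manager.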

dominotw · on Aug 26, 2016

>if the latter is set to replace the former

Do you know if there was any guidance from the Docker team on this? Seems like there is active development going on for both docker-swarm and swarm-mode.

justincormack · on Aug 26, 2016

Development is mainly in swarm mode, but yes the older swarm is still supported, it is just an application that uses docker so will always work.

dominotw · on Aug 26, 2016

I guess my main confusion is when would someone choose to use docker-swarm over swarm-mode.

alpb · on Aug 26, 2016

https://github.com/docker/swarm is going away.

https://github.com/docker/swarmkit is what gets integrated into Docker engine as of docker-1.12.0.

dominotw · on Aug 28, 2016

>https://github.com/docker/swarm is going away.

Their readme says

"Docker does not currently have a plan to deprecate Docker Swarm."

Do you know if they officially say otherwise elsewhere?

bullen · on Aug 26, 2016

If you need a distributed PaaS alternative for RPi and others, you can look at https://github.com/tinspin/rupy.

It's simple and stable!

Nano2rad · on Aug 26, 2016

One reason for releasing open source software is testing.

majewsky · on Aug 26, 2016

Yes, but that's why "alpha", "beta" and "RC" labels exist.

SparkyMcUnicorn · on Aug 26, 2016

rekt.. I mean, rkt

initdaemon · on Aug 26, 2016

IMHO, nobody should care / worry about the underlying orchestration. The industry is moving toward the public service model, and all these management problems are solved by the cloud platform, not app developers. Check out hyper.sh; that's what a container service would be.

mixedCase · on Aug 26, 2016

Or I could manage my own network, which isn't that hard and for a lot of (if not most) cases, less expensive.

majewsky · on Aug 26, 2016

Every time I hear "The industry is moving toward X", I can't help but imagine a car racing off a cliff at full speed.

gbrayut · on Aug 26, 2016

So you are saying that a new major software release that is less than a month old still has some bugs? I am shocked. Shocked! Well, not that shocked.

If you can't take the bleeding part of bleeding edge, wait for things to mature before using them. Bitching and whining that they didn't create a perfect product out of the gate only belittles the hard work it took to get a new product out the door.

This is how software releases work in the real world!