
Docker not ready for primetime - erlend_sh
http://blog.goodstuff.im/docker_not_prime_time
======
markbnj
I have run docker in production at past employers, and am getting ready to do
so again at my current employer. However I don't run it as a native install on
bare metal. I prefer to let a cloud provider such as Google deal with the
infrastructure issues, and use a more mature orchestration platform
(kubernetes). The author's complaints are valid, and the Docker team needs to
do a better job on these issues. Personally I am going to be taking a close
look at rkt and other technologies as they come along. Docker blew this
technology open by making it more approachable but there is no reason to think
they are going to own it. It's more like databases than operating systems.

~~~
lugg
I'm trying to get started with rkt right now but its (understandable) lack of
maturity is a bit daunting. I think some usability issues need to be handled /
offloaded to some other tools. And acbuild severely needs caching built in.

Disclaimer: I'm new to containers, so if it sounds like I'm doing something wrong,
let me know; it certainly feels like I'm missing something right now.

~~~
jzelinskie
You'll see maturity in this space after the Open Containers Initiative
standardizes the container image format. Then, developers can focus on UX
improvements rather than worrying about what the build tool should even be
producing.

~~~
jmspring
Reliance on initiatives and standards bodies to provide guidance toward
maturity, at least in recent years, is a fool's errand. They usually end up
rubber-stamping what's been the norm and work back from there.

~~~
cesnja
That's true, OCF is more or less the same as the Docker container format.

------
siliconc0w
We use it in production.

It generally works if:

* you don't use it to store data

* don't use 'ambassador', 'buddy', or 'data' container patterns.

* use tooling available to quickly and easily nuke and rebuild docker hosts on a daily or more frequent basis.

* use tooling available to 'orchestrate' what gets run where - if you're manually running containers you're doing it wrong.

* wrap docker pulls with 'flock' so they don't deadlock (a sketch follows this list)

* don't use swarm - use mesos, kube, or fleet (simpler, smaller clusters)
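
A minimal sketch of the flock wrapper (the lock file path and image name are just
placeholders):

    # serialize pulls on a host so concurrent pulls can't wedge each other
    flock /var/lock/docker-pull.lock docker pull registry.example.com/myapp:latest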

~~~
bphogan
woah, wait.....

It "geneerally works if" you " rebuild docker hosts on a daily or more
frequent basis."

Perhaps I'm misunderstanding, but needing to rebuild my prod env several times
a day seems pretty "not ready for prime time" to me.

That's like when we'd say that Rails ran great in production in 2005, as long
as you had a cron task to bounce fastCGI processes every hour or so.

So, can you elaborate on why rebuilding the containers is good advice?

~~~
StreamBright
Depends; usually you have to be able to re-build your prod infra within
minutes, or at most hours, otherwise you are doing devops wrong. The whole
point of automation is reproducible infrastructure that you can stand up
quickly. With a stateless approach you can just do this. Why would you do that?
Imagine an outage in one of the 3 datacenters you are running your infra in
within the same region. You need to move 1/3 of the capacity to the remaining 2
datacenters. This is not much different from re-building it.

~~~
falcolas
Aaah, the classic "you're doing <it> wrong" argument. I can come up with
dozens of different environments where it is simply not feasible to rebuild an
environment within two hours.

\- Any infrastructure with lots of data. Data just takes time to move; backups
take time to restore.

\- You're on bare metal because running node on VMs isn't fast enough.

\- You're in a secure environment, where the plain old bureaucracy will get in
the way of a full rebuild.

\- Anytime you have to change DNS. That's going to take days to get everything
failed over.

\- Clients (or vendors) whitelist IPs, and you have to work through with them
to fix the IPs.

\- Amazon gives you the dreaded "we don't have capacity to start your
requested instance; give us a few hours to spin up more capacity"

> Imagine an outage in one of the 3 datacenters you are running your infra in
> the same region. You need to move 1/3 of the capacity to the remaining 2
> datacenters.

Oh, this is very different. If your provider loses a datacenter, and your
existing infrastructure can't handle it, you're already SOL - the APIs for
spinning up instances and networking are going to be DDOSed to death by all of
the various users.

Basic HA dictates that you provision enough spare capacity that a DC (AZ) can
go down and you can still serve all of your customers.

~~~
StreamBright
I mostly disagree with your points, with the exception of the last one.

I used to work in the team that runs Amazon.com. All of the systems serving
the site can be re-built within hours, and nothing can serve the site that
cannot be rebuilt within a very tight SLA. However, I understand that not all
companies have this requirement. This only matters when site downtime hurts
the company too much to be allowed.

Responding to your points:

\- Lots of data -> use S3 with de-normalized data, or something similar

\- Running a VM has 3% overhead in 2016; scalability is much more important
than single-node performance

\- High security environments are usually payment processing systems, downtime
there can be a bit more tolerated, delaying transactions is ok

\- Amazon uses DNS for everything, even for datacenter moves. It is usually
done within 5 minutes

\- This is a networking challenge, using something like EIP (where the public
facing IP can be attached to different nodes) makes this a non-issue

\- Amazon has an SLA, they extremely rarely have a full region outage, so you
can juggle capacity around

Losing a dc out of 3 does not require work because you can't handle the load;
it is required to keep the same properties (the same extra capacity, for
example) as before. Spinning up instances should not DDOS anything; it puts a
fairly constant load on the supporting infrastructure.

The last point I agree with.

~~~
falcolas
First, two important assumptions I'm making when I say this (and I feel they
are reasonable assumptions). I'm not just talking about bringing a production
environment back up in the same or adjacent AZ; I'm talking about true DR,
where you're moving regions. I'm also not limiting my discussion to AWS'
infrastructure - not with Google, Rackspace, Cloudflare and others in the
space as well.

> Lots of data -> use S3 with de-normalized data, or something similar

S3's use case does not match up with many different computing models (hadoop
clusters, database tables, state overflowing memory), and moving data within
S3 between regions is painful. Also, not all cloud providers have S3.

> Running a VM has 3% overhead in 2016, scalability is much more important
> than a single node performance

Not when you have a requirement to respond to _all_ requests in under 50ms
(such as with an ad broker).

> High security environments are usually payment processing systems

Or HIPAA, or government.

> delaying transactions is ok

Not really. When I worked for Amazon, they were still valuing one second of
downtime at around $13k in lost sales. I can't imagine this has gone down.

> Amazon uses DNS for everything, even for datacenter moves. It is usually
> done within 5 minutes

Amazon also implements their own DNS servers, with some dynamic lookup logic;
they are an outlier. Fighting against TTL across the world is a real problem
for DR type scenarios.

> EIP (where the public facing IP can be attached to different nodes) makes
> this a non-issue

EIPs are not only AWS-specific, but they cannot traverse regions, and they
rely on AWS' API being up. This has not historically always been the case.

> they extremely rarely have a full region outage, so you can juggle capacity
> around

Not always. Sometimes, you can. But not always. Some good examples from the
past - anytime EBS had issues in us-east-1, the AWS API would be unavailable.
When an AZ in us-east-1 went down, the API was overwhelmed and unresponsive
for hours afterwards.

> Spinning up instances should not DDOS anything, it is with constant load on
> the supporting infrastructure.

See above. There's nothing constant about the load when there is an AWS
outage; everyone is scrambling to use the APIs to get their sites back up.
There's even advice to not depend on ASGs for DR, for this very reason.

AWS is constantly getting better about this, but they are not the only VPS
provider, nor are they themselves immune to outages and downtime which
requires DR plans.

------
dawnerd
I spent close to 12 hours yesterday trying to get a fairly simple node app to
run on my mac. Turned out I had to wipe out docker completely and reinstall.
Keep in mind this is their stable version that's no longer in beta. I've just
run into too many documented bugs for me to consider it stable. I wouldn't
even say it should be out of beta.

The issues here are the real telling story. [https://github.com/docker/for-
mac/issues](https://github.com/docker/for-mac/issues)

I love docker, it's amazing when it works. It's just really not there yet. I
get that their focus is on making money right now, but they need to nail their
core product first. I honestly don't care about whatever cloud platform
they're building if their core app doesn't even work reliably.

~~~
serverholic
I'm a fan of Docker in general but I'm amazed by some of the poor choices
they've made with Docker for Mac.

\- Stable? That's honestly laughable. It is nowhere near stable. As an
example, I was trying to upload images to a third-party image registry but the
upload speed was ridiculously slow. It took me forever to figure out but it
turned out I needed to completely reinstall docker for mac.

\- They had a command-line tool called pinata for managing daemon settings in
docker for mac. They chose to get rid of it. Not only did we lose a way to
declaratively define and set configuration, but the preferences window has
nowhere near all of the daemon settings that are available.

\- The CPU usage is still crazy. I regularly get 100% CPU usage on 3 of my 4
CPU cores while starting up just a few containers. Even after the containers
have started, it will idle at 100% on 1 of 4 cores.

\- It needs to be reinstalled regularly if you are using it on a daily basis.
Otherwise it will get slower and slower over time. See my first complaint.

\- The GUI (kitematic) will randomly disconnect from the daemon forcing me to
restart the GUI repeatedly.

\- They really need some sort of garbage collector with adjustable settings.
With the default settings the app will just keep building and building images
and eventually fill up, crash, slow down, etc. How is that acceptable? What
other apps do that?

Like I said, I like docker in general. I think they are tackling some very
hard problems and definitely experiencing some growing pains from such crazy
growth. However, at some point they need to take a step back, focus on the
core of what they offer, and make it as simple and rock solid as possible. As
another example, they still haven't added a way to compress and/or flatten
docker images. No wonder docker for mac slows down after regular use when it's
building 1GB+ images for simple things.

~~~
dawnerd
Not 100% sure but I think there's a memory leak in hyperkit. Eventually the
memory usage will grow to fill up the allocated space then docker will crash.
It might be something else causing it, but that's just what I've observed.

There's also the Docker.qcow2 file ballooning in size. The only way around it
is to do a "factory reset" or run a couple of commands to clear out old images.

~~~
seeekr
Not sure where the leak is, but I can confirm there is definitely one there.
For me it happens whenever I restart Docker for Mac: 500+ MB of usable RAM
gone. Combined with the fact that we don't reboot our Macs very often and that
Docker for Mac needs frequent restarts because of hogging CPU otherwise,
that's a bit of a problem.

------
jcoffland
One of Docker's biggest problems is that internally they have fomented a
culture of "users are stupid" which is immediately apparent if you interact
with their developers on GitHub.

~~~
shykes
Docker founder here.

It makes me sad that you believe that. We're not perfect but we try very hard
to keep our users happy.

Are there specific issues that you could point out, so that I can get a sense
of what you saw that you didn't like?

Keep in mind that anyone can participate in github issues, not just Docker
employees, and although we have a pretty strict code of conduct, being
dismissive about another participant's use case is not grounds for moderation.

EDIT: sorry if this came across as dismissive, that wasn't the intention. We
regularly get called out for comments made by non-employees, it's a common
problem on the github repo.

~~~
carapace
Look dude, this is hard. Most of the people here are rooting for you, even the
ones complaining. _I 'm on your side._

But this isn't about you, or your feels, or what you think is going on.

All that's happening is people are trying to communicate with you and you're
not listening. You're doing your best, I don't doubt that, but you've got to
step back, regroup, and come at the problem from another angle.

Don't get defensive, don't make excuses. "When you make a mistake, take off
your pants and roll around in it." ;-) Give people the benefit of the doubt
that they mostly know what they're complaining about (even if they don't).

It's a pain-in-the-ass, but it's the only way to deal with this kind of
systemic (mis-?)perception in your community.

~~~
jsmthrowaway
I have said one form of this or another to Solomon regarding Docker the
company and his own personal defensiveness on two or three separate occasions
on this account alone[0]. So while I laud your effort, I don't think much is
going to change on this front. Honestly, that's too bad and I'm not being
snarky or shitty here; I genuinely wish Docker and Solomon would work on this
stuff because it'd go a long way. I'm discouraged to even bring stuff up any
more.

I don't even think it's intentional, really, but I have never once seen a
response by Solomon to criticism of Docker that did not dismiss the content
and messenger in some way. These things are hard to get right and I'm not
perfect, so I don't even really know what advice to offer and I'm far from
qualified. I am very frequently dismissive of criticism as well and have had
to put a lot of work into actively accepting it, so I can at least understand
how hard it is.

I outright told him his initial reaction to Rocket, to use an example,
directly caused me to plan for a future without Docker. Companies are defined
by their executives, and a lot of Docker's behaviors become clearer when you
consider some of the context around Solomon's personal style.

[0]: the most productive example being
[https://news.ycombinator.com/item?id=8789181](https://news.ycombinator.com/item?id=8789181)

~~~
carapace
I feel a little badly about the exchange now, having more of the context
available.

I told my sister last night, jokingly, "I think I made a co-founder of Docker
cry on HN today." (I also explained what Docker and Hacker News are a little.
She likes me so she didn't call me a nerd to my face.)

Honestly, I was just amused at the reply evidencing (or seeming to) the very
attitude the OP was complaining of, it was incredible.

------
kozikow
I am using kubernetes instead of docker swarm to orchestrate docker containers,
and none of the points mentioned in this article apply. My cluster is small - I
have <100 machines at peak, but so far it feels ready for prime time.

There are parts of docker that are relatively stable, have many other
companies involved, and have been around for a while. There are also "got VC
money, gotta monetise" parts that damage the reputation of the stable parts.

~~~
chx
> My cluster is small - I have <100 machines at peak

Let me ask: WTF are people doing that <100 machines is a "small cluster"? I
ran a Top 100 (as measured by Quantcast) website with 23 machines, and that
included kit and caboodle -- Dev, Staging and Production environments. And
quite a few of those were just for HA purposes, not because we needed that
much capacity... Stack Exchange also runs on about two dozen servers. Yes, yes,
Google and Facebook run datacenters, but there's a power-law kind of
distribution here and the iron needs fall very, very fast as you move from,
say, the Top 1 website to the Top 30.

~~~
necubi
The number of machines you need to run a service is not really a linear
function of your traffic. If you have a mostly static website that can be
heavily cached/cdn'd, you can easily scale to thousands of requests a second
with a small server footprint. I expect that's true of many of the top 100
sites as measured by visitors (like Quantcast does).

But if you need to store a lot of data, or need to look up data with very low
latency, or do CPU-intensive work for every request, you will end up with a
lot more servers. (The other thing to consider is that SaaS companies can
easily deal with more traffic than even the largest web sites, because they
tend to aggregate traffic from many websites; Quantcast, for example, where I
used to work, got hundreds of thousands of requests per second to its
measurement endpoint.)

~~~
chx
Note: the site I mentioned did hit the database quite a few times for each
page. It was a nice challenge.

------
exratione
So far I'm feeling pretty good about the decision to skip the first generation
containerization infrastructure.

At the outset it had the look of something that wasn't an advance over
standard issue virtualization, in that it just shuffled the complexity around
a bit. It doesn't do enough to abstract away the ops complexity of setting up
environments.

I'm still of the mind, a few years later, that the time to move on from
whatever virtualization approach you're currently using for infrastructure and
development (cloud instances, virtualbox, etc), is when the second generation
of serverless/aws-lambda-like platforms arrive. The first generation is a nice
adjunct to virtual servers for wrapping small tools, but it is too limited and
clunky in its surrounding ops-essential infrastructure to build real, entire
applications easily.

So the real leap I see ahead is the move from cloud servers to a server-free
abstraction in which your codebase, from your perspective, is deployed to run
as a matter of functions and compute time and you see nothing of what is under
that layer, and need to do no meaningful ops management at all.

~~~
dmourati
This sounds like I wrote it.

------
okket
Two days ago there was a fairly long discussion about a similar argument ("The
sad state of Docker")

[https://news.ycombinator.com/item?id=12364123](https://news.ycombinator.com/item?id=12364123)
(217 comments)

------
urvader
The title should say: Docker Swarm is not ready for primetime. We have used
Docker in production for more than two years and there have been very few
issues overall.

------
morgante
This article should really be about Docker Swarm not being ready for
production. It's much newer technology than Docker and is predictably brittle.

The only points made against Docker proper are rather laughable. You shouldn't
be remotely administering Docker clusters from the CLI (use a proper cluster
tool like Kubernetes), and copying entire credentials files from machine to
machine is extremely unlikely/esoteric.

Docker, with Kubernetes or ECS, is totally suitable for production at this
point. Lots and lots of companies are successfully running production
workloads using it.

~~~
atemerev
> You shouldn't be remotely administering Docker clusters from the CLI (use a
> proper cluster tool like Kubernetes)

Docker Swarm is advertised as a stable, production-ready cluster management
solution. Then, if you actually try to use it, it is very much NOT. Kubernetes
is great, but it feels like adding another layer to the system, and that is not
always a good thing (especially if you are on AWS and have to work with
_their_ infrastructure management too).

------
forktheweb
I would say that my experience with Docker has been fantastic. I run over 10
Ubuntu Trusty instances on EC2 as 8G instances, mounted with NFS4 to EFS. This
makes it super simple to manage data across multiple hosts. From that you can
run as many containers as you like, and either mount them to the EFS folder,
or just spawn them with data-containers, then export backups regularly with
something like duplicity.
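
The EFS side is just a standard NFSv4 mount, roughly like this (the file system
ID and region are placeholders):

    # hypothetical EFS mount on an EC2 host; substitute your own fs ID and region
    sudo mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs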

I use rancher with it, and it's dead simple using rancher/docker compose.

For a quick run-down see: [https://github.com/forktheweb/amazon-docker-
devops](https://github.com/forktheweb/amazon-docker-devops) More advanced run-
down of where I'm going with my setup:
[https://labs.stackfork.com:2003/dockistry-devexp/exp-
stacker...](https://labs.stackfork.com:2003/dockistry-devexp/exp-stackerize-
app)

~~~
forktheweb
Just want to add that I also use it on windows with barely any issues (win
10 x64). I'm not sure how stable it is on Mac OSX, but Kitematic is pretty
sweet.

The only problems I've had with Docker containers are those where processes get
stuck inside the container and the --restart=always flag is set. When this
happens, if you can't force the container to stop, the defunct container will
restart anyway after a reboot and cause you the same issue...

My solution to this has been to just create a clean AMI image with
ubuntu/rancher/docker and then nuke the old host when it gives me problems.
This is made even easier if you use EFS because it's literally already 100%
setup once you launch a replacement instance.

Also, you can do automatic memory-limiting and cpu-limiting your nodes using
rancher-compose and health-checks that re-route traffic with L7 & HAProxy:
[http://docs.rancher.com/rancher/v1.1/zh/cattle/health-
checks...](http://docs.rancher.com/rancher/v1.1/zh/cattle/health-checks/)

The only thing even comparable to that in my mind would be Consul health
checks with auto-discovery:
[https://www.consul.io/docs/guides/index.html](https://www.consul.io/docs/guides/index.html)

------
perturbation
Most of the complaints I've seen recently about using Docker are about the
immaturity of Docker Swarm. Can this be mitigated by using Docker with
Kubernetes / Mesos / Yarn?

If it's truly a problem with the containerization format / stability with the
core product, I'm not sure what a good alternative would be. I see a lot of
praise for rkt but the ecosystem and tooling around it are so much smaller
than that for Docker.

~~~
lobster_johnson
Using it with Kubernetes definitely helps. One reason is that if you have
enough surplus nodes, then Docker misbehaving on one of them shouldn't screw
up anything; Kubernetes is really good at shuffling things around, and you can
"cordon off" problem nodes to prevent them from being used for scheduling.

------
joshka
> Each version of the CLI is incompatible with the last version of the CLI.

Run the previous version of the cli in a container on your local machine.
[https://hub.docker.com/_/docker/](https://hub.docker.com/_/docker/)

    
    
      $ docker run -it --rm  docker:1.9 version

~~~
amazingman
Or, you know, just set `DOCKER_API_VERSION` to the version of the engine
you're interacting with.
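
Something like this (the API version is just an example; use whatever your
engine speaks):

    export DOCKER_API_VERSION=1.21   # e.g. the API version of the older engine
    docker version                   # the newer CLI now talks to that engine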

~~~
mappu
If it has the capability to communicate cross-version, it should just do that,
instead of displaying an error message that /doesn't/ point you to
`DOCKER_API_VERSION`.

~~~
amazingman
Not doing so is a perfectly defensible decision. Not informing the user of the
environment variable, however, is indeed a fairly awful mistake.

------
ldehaan
I have several past clients running docker in production just fine.

At my current job we run nearly all our services in docker.

I've replied to this type of comment on here at least a dozen times; it has
nothing to do with docker, it is a lack of understanding of how it all works.

Understand the history, understand the underlying concepts and this is no more
complex than an advanced chroot.

Now on the tooling side, I personally stay away from any plug-ins and tools
created by the docker team. They do docker best; let other tools manage
docker's externalities.

I've used weave since it came out, and it's perfect for network management and
service discovery.

I prefer to use mesos to manage container deploys.

There is an entirely usable workflow with docker but I like to let the
specialists specialize, and so I just use docker (.10.1 even), because all the
extra stuff is just making it bloated.

I'm testing newer versions on a case by case basis, but nothing new has come
out that makes me want to upgrade yet.

And I'll probably keep using docker as long as it stays separate from all the
cruft being added to the ecosystem.

------
_jezell_
The Docker team is doing more than anyone to move container technology
forward, but orchestration is a much harder problem to solve than wrapping OS
APIs. I wish they would stick to the core, and let others like the Kubernetes
team handle the orchestration pieces. Swarm is hard to take seriously right
now. I'm not sure bundling it into the core was the best way to handle it.

~~~
coding123
I disagree. Kube, Mesos, etc. are all great, I'm sure. I had Kube documentation
open in the background for about a week or two while doing other things,
trying to get a handle on installation, deployment, and how it would fit my
use cases... I kept wanting to dip in, but kept getting pounded by tons of heavy
dependencies I would have had to learn. After Swarm Mode was released I had a
cluster deploying things locally within 2 hours of reading the documentation.
From there I was able to map all of our use cases on top (rolling updates, load
balancing, etc.). I totally get the "hard to take seriously" perspective given
some of the early glitching, but Kube, Mesos, etc. needed this egg on their
faces to realize how easy the installation and getting things up and running
should be. If nothing else, that alone will make those other products better.

------
SilverSurfer972
I think we should stop falling for Docker's marketing push to manage
containers at large scale. They want to get a grip on the corporate market and
catch up with kubernetes at the cost of their containerization quality.
Unfortunately, with their current strategy they are losing credibility as a
containerization tool AND as the container orchestration tool they are trying
to become. Using it solely as a containerization tool, with k8s/ecs for the
heavy lifting, is the relevant way to go as of today IMO.

------
acd
Here are alternatives to Docker

One can use Ubuntu LXD, which provides Linux containers built on top of LXC
with ZFS as a storage backend. LXD can also run Docker containers.
[http://www.ubuntu.com/cloud/lxd](http://www.ubuntu.com/cloud/lxd)

One can also use Linux containers via Kubernetes by Google.
[http://kubernetes.io/](http://kubernetes.io/)

------
bjt
I think the underlying issue here is that no two people agree on what "Docker"
is. Is it the CLI? Is it Docker Machine? Is it Docker Swarm?

The container part of Docker works well. And they've ridden that hype wave to
try to run a lot of other pieces of your infrastructure by writing another app
and calling it Docker Something. Now everybody means a different subset when
they say "Docker".

------
dcosson
> Each version of the CLI is incompatible with the last version of the CLI.

I'm pretty sure that as long as the CLI version is >= the server version, you
can set the DOCKER_API_VERSION env var to match the server and everything
works.

I haven't used this extensively, so maybe there are edge cases or some minimum
supported version for backwards compatibility?

------
kev009
I wonder how much suffering would be alleviated in most mid-level IT
organizations if they just used Joyent/SmartDataCenter, and I say this as a
FreeBSD developer with no affiliation.

~~~
Annatar
A lot.

zones in SmartOS provide full-blown UNIX servers running at the speed of the
bare metal, but in complete isolation (no need for hacks like "runC",
containers, or any other such nonsense).

Packaging one's software into OS packages provides for repeatability: after
the packages are installed in a zone, a ZFS image with a little bit of
metadata can be created, and imported into a local or remote image repository
with imgadm(1M).

[https://wiki.smartos.org/display/DOC/Managing+Images](https://wiki.smartos.org/display/DOC/Managing+Images)
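
The flow is roughly this (the zone UUID, file names and manifest are
placeholders):

    # snapshot the zone's dataset, stream it to a compressed ZFS image,
    # then register it with imgadm alongside a small JSON manifest
    zfs snapshot zones/<zone-uuid>@image
    zfs send zones/<zone-uuid>@image | gzip > myimage.zfs.gz
    imgadm install -m myimage.manifest.json -f myimage.zfs.gz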

That's it. It really is that simple.

------
rjurney
Docker Swarm is definitely not production ready. Try to run any service that
requires communication among nodes and you will agree. It works fine for web
servers, but that is about it.

DC/OS is emerging as the go-to way to deploy docker containers at scale in
complex service combinations. It 'just works' with one simple config per
service.

------
20yrs_no_equity
Network partitions really do happen! They are often short, but if you can't
recover from them, then you shouldn't call yourself a distributed system.

I am shocked at how fragile etcd is in this way. I was hoping docker swarm was
better, but I'm not surprised (alas) to find out that it has the same problem.

I'm about ready to build my own solution, because I know a way to do it that
will be really robust in the face of partitions. (And it doesn't use Raft; you
probably should not be using Raft, and I've seen lots of complaints about
ZooKeeper too. I've done this before in other contexts, so I know how to make
it work; but so have others, so why are people who don't know how to make it
work reinventing the wheel all the time?)

~~~
helloiamaperson
I'd love to hear more about your solution. Are you saying that you've created
an algorithm distinct from paxos/raft/zab that's more robust?

------
callumjones
If you're truly running Docker in production, you're probably not running into
either of the issues raised against Docker here. No one would dare interact
with a production cluster via the basic Docker CLI; instead you should be
interacting with the orchestration technology, like ECS, Mesos or Kubernetes.
We are running ECS, and we only interact with the Docker CLI to query specific
containers or shut down specific containers that ECS has weirdly forgotten
about.

It definitely sounds like Swarm is not ready but I wouldn't say this is
representative of running Docker in production: instead you should be running
one of the many battle tested cluster tools like ECS, Mesos or Kubernetes (or
GCE).

~~~
drchiu
Agreed. Wouldn't use Docker CLI for production purposes.

We use Cloud66 (disclaimer - not associated with them) to help with the
deployment issues if any arise.

Also we don't store DB in containers.

------
coding123
Just read through most of the thread, seems like a very large disconnect
between people that are happy with Docker and those that are not. Personally,
I've been extremely happy with it. We have one product in production using
pre-1.12 swarm (will be upgrading in the next couple months) and most of our
dev -> uat environments are now fully docker. It's been stable. On my personal
projects I used Docker 1.12 and yes, after a few days things kerploded, but
after upgrading to 1.12.1 things have been incredibly stable. For Nodejs apps
I have been able to use Docker service replicas instead of Nodejs clustering,
and been very happy with the results.

------
sandGorgon
it seems that none of the container frameworks are generally ready.

Take for example k8s - I just started exploring it as something we could move
to.
[https://github.com/kubernetes/kubernetes/issues/24677](https://github.com/kubernetes/kubernetes/issues/24677)
\- logging of application logs is an unsolved problem.

And most of the proposals talk about creating yet another logger...rather than
patching journald or whatever exists out there.

For those of you who are running k8s in production - how are you doing
logging? Does everyone roll their own?

~~~
lobster_johnson
Logging is definitely not an unsolved problem with K8s. It's trivial to set up
Fluentd to snarf all container logs into some destination, such as Graylog,
ElasticSearch or a simple centralized file system location.

The Github issue talks a lot about "out of the box" setup via kube-up. If
you're not using kube-up (and I wouldn't recommend using it for production
setups), the problem is rather simple.

The logging story could be better, but it's not "unsolved".

~~~
sandGorgon
Thanks for that clarification. We are trying to get our feet wet by building a
small 2 node setup.

We don't want to stream our logs or set up Fluentd, etc. I just want to make
sure my logs are captured and periodically rotated. Now, newer docker allows
me to use journald as the logger (which means all logs are sent to journald
on the HOST machine)... But I can't seem to figure out how to do this in k8s.
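
(For reference, outside of k8s the plain-docker version of what I mean looks
roughly like this; the container name is a placeholder:)

    docker run -d --log-driver=journald --name web nginx   # container logs go to the host's journald
    journalctl CONTAINER_NAME=web                          # read them back on the host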

Also, as an aside, for production deployment on AWS, what would you suggest? I
was thinking that kube-up is what I should use.

~~~
lobster_johnson
Yes, journald would definitely be superior, conceptually, to just writing to
plain files. I can't help you there, though, as I've never tried setting it
up.

(As an aside, I don't understand why more Unix software doesn't consistently
use syslog() — we already have a good, standard, abstract logging interface
which can be implemented however you want on the host. The syslog protocol is
awful, but the syslog() call should be reliable.)

As for kube-up: It's a nice way to bootstrap, but it's an opaque setup that
doesn't give you full control over your boxes, or of upgrades. I believe it's
idempotent, but I wouldn't trust it to my production environment. Personally,
I set up K8s from scratch on AWS via Salt + Debian/Ubuntu packages from
Kismatic, and I recommend this approach. I have a Github repo that I'm
planning to make semi-generic and public. Email me if you're interested.

------
sergiotapia
We've tried to use Docker a couple of times, but it was always much more
trouble than it was worth. It was always some edge case that caused it to not
work as expected on one developer machine or another.

After about 2 years of giving it another shot on and off, we just gave up. And
it's not like we were doing something crazy, just a typical documented "run
this rails app" type thing. I would definitely not use this in production for
anything based on my experience.

~~~
sjellis
Were you using Linux developer machines? Docker, Inc. provide ready-made
installers for Windows and Mac - the Docker ToolBox makes setting up on those
OSes very easy these days. I haven't tried the new products that replace
ToolBox yet.

I can imagine it being harder on some Linux distributions because your host
system will have to have all of the kernel bits etc. Ubuntu 14.04 and above or
current versions of Fedora should be good - the distributions explicitly
support Docker, and Docker, Inc. provide packages so that you can run either
the distribution packages or the Docker, Inc. ones.

~~~
sergiotapia
We are all on Macs.

------
StreamBright
I have the same experience. I am trying to set up a new node where I
accidentally installed 1.10, and the CLI does not work. Looking on the
internet for how to do X with Docker, the only articles available are for
previous versions. I mean seriously, command line development is not supposed
to be this hard: pick a set of switches and stick to it unless something major
forces you to change. If you shuffle CLI switches around between minor
releases, nobody is going to be happy.

------
mkagenius
Would love to hear thoughts from people who use it in production.

~~~
toomuchtodo
Unless you're using it to compartmentalize CI jobs on Travis or Jenkins, don't
use it in production (do use it in development for standing up local dev
environments quickly). We do, and it's a wasted layer of abstraction.

I predict the Docker fad will end in 2-3 years, fingers crossed.

~~~
eropple
_> do use it in development for standing up local dev environments quickly_

You can do this, for sure, but, tbh, I'd still rather use Vagrant. I'm already
provisioning things like my datastores using Chef (and not deploying those
inside of Docker in production); I might as well repurpose the same code to
build a dev environment.

I do agree regarding using containers (I use rkt rather than Docker for a few
reasons) for CI being a really good jumping-off point, though.

~~~
bdcravens
Vagrant is pretty heavy, but if you're already using Chef, makes sense. If
you're deploying using Docker, then mirroring your environment in dev makes
more sense.

I've used both Vagrant and Docker for dev, and find Docker to be super fast
and light. (though I've definitely run into some headaches with Docker Compose
that I didn't see in using just Docker)

~~~
eropple
I think it's probably not worth worrying about "fast" when "slow" is "ten
minutes once a week, maybe, if you're really destructive" and similarly
worrying about "light" when you have 16GB of RAM and a quad-core processor;
the heaviest dev environment/IDE I can find (IntelliJ and a couple hundred
Chrome tabs) doesn't use eight of it. It comes off as super-premature
optimization to me, and those optimizations that you would otherwise do in
Docker (which are just building base images, whatever, Packer does this
trivially) exist in a virtualized environment too.

Running datastores in Docker is _bonkers_ in the first place, which still
leaves a significant place for an actual CM tool--so unless you are a
particular strain of masochist, when you _are_ replicating your local
environment, you'll probably need it anyway.

~~~
bdcravens
> ten minutes once a week, maybe, if you're really destructive

If you do a lot of active machine development, where you're iterating on that
machine setup, it's an extremely slow process. Perhaps it would make sense to
iterate in Docker, and then convert that process to Vagrant once you finish.

> similarly worrying about "light" when you have 16GB of RAM and a quad-core
> processor

It's a concern when you're launching many machines at once to develop against.
Docker tends to only eat what it needs. Also, while I think everyone should
have a 16GB machine, there's quite a few developers out there running 8 or 4GB
machines.

------
bogomipz
Docker not being ready for primetime and Swarm not being ready for primetime
are two different things no? As for the cli compatibility issues, don't most
people use an orchestration tool like ansible/chef/puppet etc to manage their
fleet? I'm not sure I think the title of the post is accurate.

~~~
Annatar
_don 't most people use an orchestration tool like ansible/chef/puppet etc to
manage their fleet?_

I don't. All my configuration management is done with OS packages. And once
you start making OS packages to do mass-scale configuration, the whole premise
of Docker becomes pointless. If I have OS packages, and I can provision them
using KickStart, or JumpStart, or imgadm(1M) + vmadm(1M), without using or
needing any code, then what exactly do I need Docker for?

------
cerisier
We have had docker in production for about a year for our ENTIRE
INFRASTRUCTURE, handling about 500 million requests per month across 20
services, including JVMs and distributed systems. So far it has saved us so
much time and money that I wouldn't consider going back for a minute...

~~~
bluecmd
How is that relevant? The author didn't complain about the performance - the
complaint was about API stability. Which I agree with. Docker is really
terrible at supporting their own APIs.

------
ktamiola
Fair enough! There is also a performance penalty, and annoyingly complicated
setups in certain cloud environments.

------
zwischenzug
Can anyone using Swarm seriously in production post any account of their
experiences here? Thanks.

~~~
nostrebored
I doubt you'll get any replies. We tried using swarm in production, but could
never get a reliable, working setup. Failures of the worst kind would happen -
networking would work 1 out of 10 times, meaning that a build could've
conceivably gotten through CI.

------
jasoncchild
"Just imagine if someone emailed you a PDF or an Excel file and you didn't
have the exact verion of the PDF reader or Excel that would open that file.
Your head would blow clean off."

Obviously this person has never spent a good deal of time dealing with
AutoCAD...

~~~
saidajigumi
Or more to the point early versions of the MS Office suite, where major
updates _could and did_ create backwards-incompatible saves. Users had to be
careful that files intended for broader distribution were saved in a
sufficiently old/compatible format version.

Which is to say: this is a sign that the involved technologies are still
rapidly maturing.

------
mrmondo
We've been running in production across thousands of containers for well over
a year now and it's been fantastic: not only a lifesaver of a deployment
method, it has also allowed for reliable, repeatable application builds.

------
ilaksh
Creating and maintaining a service cluster is hard. I don't think you should
just take it back to the store if your magic wand has a hiccup.

------
stevesun21
At my previous employer, we used Docker-based Elastic Beanstalk to serve
millions of requests per second across three services.

------
opHASnoName
You can set an environment variable so that newer clients work with older
machines:

export DOCKER_API_VERSION=1.23

You can export and import machines with this handy Node.js tool:
[https://www.npmjs.com/package/machine-
share](https://www.npmjs.com/package/machine-share)

------
hellofunk
I agree in general, and find Docker one of those technologies that does not
(yet) live up to the hype.

------
damm
Docker's far from ready for primetime. I'm sure everyone likes taking down
their site to upgrade Docker to the latest release.

Their mindset is very tool-driven: if there's a problem, let's just write a
new tool for it.

Ease of use and KISS aren't part of their philosophy.

------
brint
For the versions issue, check out dvm:
[https://github.com/getcarina/dvm](https://github.com/getcarina/dvm)

While versions are an issue, it's at least a reasonable way to work around it.

------
ycombinatorMan
Aye, there are a lot of important bugs that are just sitting there.

------
jokoon
I watched an explanation again of what docker really is; it just seems to be
this awesome thing that solves the very hard problem of inter-compatibility
between systems. I always tend to question how and why a developer had to use
docker instead of making choices that would avoid it.

It's not surprising docker can't always work, but it's nice to see that
programmers are winning. I guess future OS designers and developers will try
to encourage more inter-compatibility if possible. That really hits a big
nerve.

------
skrowl
Is LXC any better? Are any of the issues in the OP solved in LXC vs Docker?

------
asitdhal
Do they speak English, or is the caller expected to know French?

~~~
steveklabnik
Did you mean to comment on
[https://news.ycombinator.com/item?id=12376886](https://news.ycombinator.com/item?id=12376886)
?

~~~
asitdhal
yes, thanks

------
jacques_chester
Make no mistake: Docker Inc has a lot of excellent engineers.

But there's also a landrush going on. Everyone has worked out that owning
building blocks isn't where the money is. The money is in the platform.
Businesses don't want to pay people to assemble a snowflake. They want a
turnkey.

CoreOS, Docker and Red Hat are in the mix. So too my employers, Pivotal, who
are (disclosure) the majority donors of engineering for Cloud Foundry. IBM is
also betting on Cloud Foundry with BlueMix, GE with Predix, HPE with Helion
and SAP with HANA Cloud Platform.

You're probably sick of me turning up in threads like these, resembling one of
the beardier desert prophets, constantly muttering "Cloud Foundry, Cloud
Foundry".

It's because we've _already built the platform_. I feel like Mugatu pointing
this out over and over. We're there! No need to wait!

A distributed, in-place upgradeable, 12-factor oriented, containerising, log-
draining, routing platform. The intellectual lovechild of Google and Heroku.
Basically, it's like someone installed all the cool things (Docker,
Kubernetes, some sort of distributed log system, a router, a service injection
framework) for you. Done. Dusted. You can just focus on the apps and services
you're building, because that's usually what end users actually care about.

And we know it works really well. We know of stable 10k app instance scale
production instances that are running _right now_. That's _real app instances_
, by the way: fully loaded, fully staged, fully logging, fully routed, fully
service injected, fully distributed across execution cells. Real, critical-
path business workloads. Front-page-of-the-WSJ-if-it-dies workloads.

Our next stretch goal is to benchmark 250k real app instances. If you need
more than 250,000 copies of your apps and services running, then you probably
have more engineers than we do. Though I guess you could probably stretch to
running _two_ copies of Cloud Foundry, if you really _had_ to.

OK, a big downside: BOSH is the deployment and upgrade system. It's not very
approachable, in the way that an Abrams main battle tank is less approachable
than a Honda Civic (and for a similar reason). We're working on that.

The other downside: it's not sexy front-page-of-HN tech. We didn't use Docker,
it didn't exist. Or Kubernetes, it didn't exist. We didn't use Terraform or
CloudFormation ... they didn't exist.

Docker will get all this stuff right. I mean that sincerely. They've got too
many smart engineers to fail to hammer it down. More to the point, Docker have
an unassailable brand position. Not to be discounted. Microsoft regularly
pulled this kind of thing for decades and made mad, crazy cash money the
whole way along.

~~~
sandGorgon
@jacques_chester - after your last comment, I did check CF out. It is orders
of magnitude harder than k8s for example.

To deploy CF on AWS I need 20 instances; k8s can happen with a master and a
slave. I know that you probably do much, much more... but it would be nice to
be able to gradually scale out the components.

Second... well, BOSH. I see that there are 13K commits to the BOSH repo, so
it's probably set in stone anyway... but it would be nice to have something
more standard (like Chef, Salt or Ansible; Ansible would be the closest thing
to BOSH). These are extremely flexible tools... but far more accessible.

Third - the big one: Docker. I know that you have the Garden repository to
run Docker... but I don't think it's production ready.

Cloud Foundry is an incredible piece of technology. But the problem is not
that it's unsexy... it's inaccessible.

~~~
jacques_chester
Thanks for your frankness.

> _I know that you probably do much much more.... but it will be nice to
> gradually scale out the components._

Agreed. There are intermediate options.

To try a full Cloud Foundry that fits into a laptop, you can install
PCFDev[0].

As an intermediate position, you can install BOSH-lite in the cloud. You lose
a major advantage of Cloud Foundry, though, which is that it's a distributed
system which BOSH can keep alive.

Similarly, the default installation for Cloud Foundry uses 20 VMs. You can, if
you wish, consolidate these to fewer. It's a matter of copying and pasting
some YAML. But not recommended in the interests of robustness.

> _but it would be nice to have something more standard (like Chef, Salt or
> Ansible. Ansible would be the closest thing to BOSH). These are extremely
> flexible tools... but far more accessible._

I have not used Ansible or Salt. I did spend time with cfengine, Puppet and
Chef a few years back.

BOSH's concept of operations is a complete inversion of those tools. cfengine,
Puppet or Chef take a system _as it is_ and transform it, through declared
goals, into the system you want it to be.

This actually means they have to deal with a great deal of incidental
complexity: different distros, different versions of distros, different
packaging tools and so on and so forth.

The BOSH paradigm is: wait, why are you trying to automate the construction of
pets? You want cattle. So it starts with a fixed OS base image and installs
packages in a fixed, non-negotiable way. At the low level, very inflexible.
But it allows BOSH to focus on the business of installing, upgrading,
repairing and reconfiguring distributed systems _as single systems_. The
single distributed system is where BOSH _starts_ , not where it was _extended
to_. It's an important distinction.

> _Docker. I know that you have the Garden repository to run Docker.... but I
> dont think its production ready._

Strictly, Garden can run Docker _images_. It's an API with driver backends.
The original Garden-Linux backend is being replaced by one which uses runC
instead. No point replicating engineering effort if we can share it with
Docker.

 _But the problem is not that it's unsexy... it's inaccessible._

You're right. We're in the classic position of other heavy, robust tech. Hard
to tinker with. I've been jawboning anyone inside Pivotal about this for
_years_. We're making headway -- PCFDev is a start, bosh-bootloader is under
active development and BOSH itself is going to get major love.

But compared to docker swarm or k8s, it's harder to set up an experimental
instance.

[0] [https://docs.pivotal.io/pcf-dev/](https://docs.pivotal.io/pcf-dev/)

~~~
sagichmal
You consistently respond to this style of CF critique by enumerating
alternatives and workarounds to the individual points. But you're missing the
forest for the trees. CF is by its nature too complex! No amount of
justification of the complexity will fix that; no number of afterthought
scripts will make up for it. The future of this space is scaling 1-dev, 1-node
systems up to production, not in asking devs to wrap their minds and laptops
around intrinsically huge platforms like CF or OpenStack.

I'm not saying CF is dead, because I'm sure you've got lots of paying
customers, and lots of interest in larger enterprises. But I think it's pretty
obvious the way the winds are blowing, and I think the smart money is on
Kubernetes.

~~~
jacques_chester
By design, developers are not meant to have to care about how Cloud Foundry
works. It's meant to be a magic carpet. You send your code and pow, it works.

You mention afterthought scripts -- how many blog posts do you read about
people who've rolled their own 1/3rd of a platform on top of Docker + various
other components? It's fun, it's easy to start, and pretty quickly you're
married to a snowflake for which you carry all the engineering load.

It's not that I don't see the forest for the trees -- I do, internally I am
constantly saying that developer mindshare predicts the 5-10 year performance
of a technology and we _suck_ at it.

But, at some level, we're talking about _different forests_.

~~~
sagichmal
> By design, developers are not meant to have to care about how Cloud Foundry
> works.

Yes, that's exactly the problem. At small scale -- which is where the vast
majority of the market is, to be clear -- there is no difference between "the
person who is responsible for the platform" and "the person who deploys code
onto the platform". Platforms need to optimize for making both of these tasks
easy. Docker understood this from day 1 and are backfilling technical
competence. Kubernetes figured this out a bit late but are now rushing to make
the administrative side easier. CF and OS seem stuck in maintaining the separation,
which keeps them out of the running.

> how many blog posts do you read about people who've rolled their own 1/3rd
> of a platform on top of Docker + various other components? It's fun, it's
> easy to start, and pretty quickly you're married to a snowflake for which
> you carry all the engineering load.

And doing so is _still_ _easier_ than bearing the cognitive burden of learning
CF!

~~~
jacques_chester
I agree that being swallowed up from the bottom is our main risk. Which is
why, above, I said I talk about this a lot internally.

> _And doing so is still easier than bearing the cognitive burden of learning
> CF!_

Right, the incremental cost seems small. A bit here, a bit there. And
actually, it's a lot of fun to roll your own. You know it, you can push it in
any direction and so on.

The flipside, and this is where Pivotal and other CF distributors are making
money, is that it turns into a nightmare pretty quickly. A lot of big
companies are perfectly happy to pay someone else to build their platform.

Before I transferred to Cloud R&D, I was in Pivotal Labs. I got to see several
hand-rolled platforms. Some of them were terrible. Some were ingenious. And
_all_ of them were millstones. That experience is pretty much what turned me
into the annoying pest I am now.

We're at an interesting moment. Right now, writing your own platform seems
reasonable -- even attractive -- to lots of people. In maybe 5-10 years, it
won't be a thing that many people do any more. These days, for most projects,
most engineers don't consider first writing an OS, or a database, or a web
server, or a programming language, or an ORM, or a graphics library and on and
on. Those are all well-served. Once upon a time it was a competitive advantage
to roll your own; these days it just doesn't enter anyone's thinking.

There will be between 1 and 3 platforms that everyone has settled on, with a
handful of outliers that people tinker with. My expectation is that Kubernetes
will slowly morph from an orchestrator and be progressively extended into a
full platform. Docker are clearly doing likewise. I think Cloud Foundry will
be one of the three, simply because it's the first and most mature full
platform already available for big companies to deploy.

Doesn't mean we can't have some of our lunch eaten.

------
jbverschoor
Docker is the new mongodb. Let's just use freebsd jails + postgresql.

~~~
abz10
+1 I got sick of Docker and switched to FreeBSD. Far better experience.

~~~
X86BSD
Jails and Tredly.com or SmartOS.

Anything else is busted by design. Enjoy your poop sandwich.

------
MrFurious
Docker: containers for hipsters who don't know how to deploy a simple Linux
virtual server.

------
anotherdpk
> Each version of the CLI is incompatible with the last version of the CLI.

Sure, but I don't think this is a show stopper. You can and should only
carefully upgrade between versions of Docker (and other mission-critical
software). The process is functionally identical to the process you'd use to
perform a zero-downtime migration between versions of the Linux kernel --
bring up a new server with the version of Docker you want to use, start your
services on that new server, stop them on the old server, shut down the old
server, done.

I don't mean to suggest this is trivial, only to suggest that it is no more
complicated than tasks that you/we are already expected to perform.

~~~
coldtea
> _Sure, but I don 't think this is a show stopper. You can and should only
> carefully upgrade between versions of Docker (and other mission-critical
> software)._

Whether I upgrade carefully or not, the Docker CLI should work with all
versions of Docker.

~~~
anotherdpk
I don't get why this is a sticking point. There's no reason to update the
Docker CLI independent of the Docker server, and it would be silly to run two
versions of Docker on the same host, so why would it matter?

