Ask HN: What are the disadvantages of Docker?
67 points by codegeek on April 21, 2018 | 32 comments
Docker is hot these days and it is everywhere. But as someone who likes to tinker with servers using scripting like bash etc., I don't get Docker. OK, I get that it allows you to "containerize" things so you can reuse the same set of stuff anywhere. But what are some of the disadvantages of "containerization", specifically using Docker?

Just trying to convince myself to start using it, but so far my run_install.sh script beats everything. Why the hassle of containerization? What overheads does it add that may not be worth it in some cases?




Lots and lots.

Build. Docker layering does not fit the model of building software and dependencies, which makes its caching really brittle. At this point I discourage having layers.

Networking: this is a mess; even k8s does not solve it completely. There is a huge market of 3rd-party providers of solutions for that.

Stability: Docker is based on a lot of still-unstable APIs and tools. I get a kernel crash per month in prod from Docker, and strange stuff happens. Additionally, Docker breaks its own API regularly without respecting semver.

Disk/FS speed. This is a pain.

GC. Docker fills your disk faster than a Java logger, and that is saying something. A friend filled 100GB just trying k8s for a day...

UX. The Docker CLI got better but is still far from there.

Debuggability. Crashes without saving a core dump, pain in the ass to load debugging tools, etc.

All in all, we are getting rid of it at work. We've spent the past 3 months deleting it from all projects in active development.


I would also mention (VPS) hosting costs. Some applications need many Docker processes (postgres, redis, elastic, nginx, node.js backend) and each one of them has overhead in both CPU and memory (~15 MB). CPU is usually not a problem for small projects, but memory is (5 processes and you're at 75 MB just for Docker; then there is the operating system, and if you need Java, you're screwed).

BTW: Docker is written in a garbage-collected language (Go), which is why it uses so much memory for a compiled language.

BTW2: if anybody knows about a memory-cheap VPS (using Linode currently), I'd love to know.


For certain apps, if you're on OSX, file system performance is horrible when developing, especially on apps with a lot of file churn. I wrote about it here, and how I was able to get most of the performance back: https://medium.com/@bdcravens/fixing-docker-for-mac-and-rail...

Even so, this seems like a silly problem to have to worry about, since I could easily run the app on my machine at full performance with no workarounds.

I think the biggest disadvantage is the herd mentality. Rather than use Docker where it provides a compelling advantage, we're being told to Dockerize all the things and accept any pains that come with it as the price of progress. To me it's as if you had a purely static site yet were told you needed to use React and webpack. Solve for the pain you have, not what others have.

> But as someone who likes to tinker with servers using scripting like bash etc., I don't get Docker.

I will say that there is a point where, if you like tiny utilities and the "unix" way, Docker starts to make sense. Rather than a giant OS with all its varied dependencies, you make tiny containers that do one thing well and are tied together (for instance, I have a tiny container that monitors a networked folder and uploads to S3). You start seeing Docker containers not as VMs, but as isolated but connected processes.


> but as isolated but connected processes.

How are they connected?


You can create virtual networks and have all the related containers running on one of these networks. They can communicate directly with each other in the docker-compose fashion (define a service `foo`, and any container on that network can reach it at `http://foo/some/endpoint`), or have some other message bus (redis, rabbitmq, zeromq, kafka, whatever you want really) running on the Docker network that the other containers use to communicate.
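A minimal sketch of the plain-CLI version of this (the image and endpoint are just placeholders):

    # create a user-defined network; containers on it can resolve
    # each other by name via Docker's embedded DNS
    docker network create app-net

    # a service container named "foo" (nginx as a stand-in)
    docker run -d --name foo --network app-net nginx:alpine

    # any container on the same network reaches the service by name
    docker run --rm --network app-net alpine wget -qO- http://foo/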

Is this the best solution always? Probably not.

An example of using a Docker container as an isolated system process would be Spotify's `docker-gc`.

https://github.com/spotify/docker-gc
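If I remember its README right, running it is roughly a one-liner that hands the container the host's Docker socket so it can clean up that host; check the repo for the exact invocation and the exclude-file options:

    # the mounted socket lets the container remove exited
    # containers and unused images on the host
    docker run --rm \
        -v /var/run/docker.sock:/var/run/docker.sock \
        spotify/docker-gc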


There are several benchmarks floating around about system call overhead for Docker. That overhead is generally low; however, IO overhead, in the form of filesystem and network, can be quite high.

Over at Wallaroo Labs, we've seen some workloads in Docker that are an order of magnitude worse in performance. However, in those cases, it was comparing raw performance on OSX vs running in Docker on OSX. IO overhead, while it exists, has generally been much lower when running Docker on Linux.

Talking about "specifically using Docker" is a little difficult, as Docker mostly provides a UI over existing Linux technologies and thus isn't that much different from other LXC-based technologies.

If you want to talk about "Docker specifics", you should be looking at things like overlay networking and Docker Swarm.

If you want to provide someone an environment where you know your software will work because you've tested that specific environment then containers are a very nice way to accomplish that.

I'm not sure what you consider the "hassle of containerization". I'm not a big fan of Docker. The UX irritates me, as I can never remember how to do anything even if I've done it tons of times before (and in this way, it's very similar to git for me), but the creation of containers is, in my mind, quite easy. Certainly no harder than putting together a `bootstrap.sh` with Vagrant.

Without knowing what your run_install.sh does, it's hard to really say much beyond that.


Docker on native bare-metal Linux is the only shared-kernel implementation of Docker. On other OSes, Docker runs on Linux in a virtualized environment. I'd expect performance to suffer.


I would be very interested to know what network applications show high overhead. Do you have any apps/numbers you could share? Does high overhead mean they are generally slower, or that they consume more CPU, or both? (I ask this as a computer science researcher looking at container overheads.)


The amount of bytes we can push through a network connection (with a single thread), using the same test application, is lower when running in Docker than without. It's particularly noticeable with Docker for OSX, which shouldn't be surprising as it's running a VM.

The overhead is much lower on Linux as one would expect.

I don't have anything handy. I brought up the OSX overhead as I wanted to be comprehensive given the somewhat ambiguous nature of the original question.

Docker Swarm inter-container networking is pretty bad. It consistently collapses on us, where throughput drops by orders of magnitude. I know we weren't alone in this, as we found an issue that had been open for the problem for quite some time. (And thus ended our quick experiment with using Docker Swarm as a demo environment.)


All of the OS X performance difference comes from the busybox VM, not from Docker. And for networking, if you use a different network driver than the default, the performance issues go away - the bridge/proxy stuff is slower, but you can expose raw interfaces.

That said, I'm also not a fan of Docker except in certain cases (i.e. if you need polyglot deployment). The development story is worse: OS X disk performance is bad, and good luck getting a debugger or most tooling to work. It's an extra layer to worry about; now you manage the number of containers and the number of servers instead of just servers.


"if you use a different network driver than default the performance issues go away - the bridge/proxy stuff is slower" <-- issues go away does not match our experience. "better than bridge", yes. "goes away however does not match our experiences.


We found macvlan did, but I’m sure our tests aren’t exhaustive. Also not talking about swarm networking, know nothing about that.
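For reference, a macvlan network is created with something like the following; the subnet, gateway and parent interface are made up here and have to match your actual LAN:

    # containers on this network get an interface on the parent
    # NIC directly, bypassing the docker0 bridge and userland proxy
    docker network create -d macvlan \
        --subnet=192.168.1.0/24 \
        --gateway=192.168.1.1 \
        -o parent=eth0 lan-net

    docker run -d --name web --network lan-net nginx:alpine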


In our experience, Swarm intercontainer networking is best avoided.


There are two pain points which I have experienced while using Docker.

Cons:

    1. Disk usage
    2. Permissions issues from volume mounts
Docker is really bad at garbage collection, and it's not uncommon for Docker to use more than 40GB of disk on my machine at any given time.

In development it's sometimes handy to have volume mounts for frequently changing files. This causes a lot of permissions issues with the containers. There's some support in native Docker to handle this (--user), but it isn't handled well by things like Docker Compose.
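With plain `docker run` the usual workaround looks roughly like this (the image and command are just placeholders):

    # run as the calling user so files written to the bind mount
    # keep your host ownership instead of ending up owned by root
    docker run --rm \
        --user "$(id -u):$(id -g)" \
        -v "$PWD:/app" -w /app \
        node:8 npm install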

Pros:

    1. Easy to develop with
    2. Easy to version
    3. Easy to deploy
    4. Easy to migrate to another server
Gone are the days when you have to copy and paste manual "apt-get install....." commands from READMEs. Just install Docker, clone the monorepo, run "docker-compose up -d", and the entire development environment is up and running. You can roll back your entire system to any point in time, build everything, and push it into production with amazing tooling around everything.

You also never have to worry about developing two networked services that both want to use the same port. If I have Web Server A and Web Server B, I can run them with production-similar configs (80/443) in development, all on my laptop.
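Each container gets its own network namespace, so both can bind 80/443 internally; only the host-side published ports (if you publish any at all) have to differ. A rough sketch with placeholder image names:

    # both containers listen on 80/443 inside their own namespaces;
    # no conflict, because only the published host ports differ
    docker run -d --name web-a -p 8080:80 -p 8443:443 my-org/web-a
    docker run -d --name web-b -p 9080:80 -p 9443:443 my-org/web-b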


I feel your pain on the disk usage, especially when rapidly making changes and building with `docker-compose up --build`. However `docker system prune` is your friend here.

It would be great if it were automated and a little smarter (it basically leaves no survivors), but for all the advantages of Docker, this isn't so bad.
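For reference, the no-survivors version looks something like this; the --volumes flag in particular will happily delete data you still wanted:

    # remove stopped containers, unused networks and dangling
    # images; -a also drops images not used by any container
    docker system prune -a

    # more aggressive: also wipe unused volumes (data-loss risk)
    docker system prune -a --volumes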


There are some community-written garbage collectors out there that are good, but none have been mainlined yet.

I do agree that Docker's pros are worth the cons. Anyone who has ever migrated a large application without good documentation of the installation or configuration process (especially one you wrote yourself and forgot about) knows how valuable good, self-contained packaging is.


I've used docker-gc by Spotify for production nodes that needed cleaning up and it's worked pretty well.


Additional, often unnecessary, complexity and the tendency to use it indiscriminately even if it's not pertinent to solving the problem at hand.

I've seen teams spend an inordinate amount of time just servicing Docker and its surrounding infrastructure when they should have been working on their actual products instead.


In my experience, Docker can be hard to debug.

There are lots of utilities that have been developed over the decades for Unix and Unix-like OSes, but almost all of them assume you’re running on the bare hardware and not inside of a jail or container.

So, if you're going to use Docker for your deployments, you can basically throw out all the tools you might use to try to help you figure out what is going on in the system, unless they have been specifically developed for use with containers or at least adapted to be container-aware. Which rules out almost all of them.

If you don’t have access to the Docker host, then the only debugging facilities you have available to you are the ones you explicitly build into your container — you can’t assume that there will be any kind of debugging facilities or tools available to you from the Docker host, because you don’t have access to the Docker host.

In my experience, that’s orders of magnitude worse.


Yup, that's going to be the big bucket of icy water for our devs if we're going to evaluate production Docker deployments. I don't mind too much; Telegraf and Elasticsearch easily integrate with container setups, so I get the data I use regularly without a problem.

All the Java-based monitoring and profiling we're currently tacking onto the artifact during deployment? That'll be gone, and you get to do all of that for yourself. Or (and sadly, that's the more probable outcome in our place) I'll need to build my own images.


Lack of good IPv6 support. Getting origin IPs only returns the IPv4 of the Docker gateway, and the only way to get around this is by setting the network mode to host.
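i.e. the workaround looks roughly like this, at the cost of giving up network isolation entirely (nginx is just a stand-in):

    # share the host's network stack so the container sees real
    # client IPs (v4 and v6) instead of the bridge gateway address
    docker run -d --network host nginx:alpine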


Docker makes a lot of things easy, but the tradeoff is hidden complexity, with default configuration that may not be the best choice for what the user needs.

Let's look at networking as an example. Docker makes basic networking setup very easy with the default docker0 bridge setup. However, this is really a "solve 90% of cases" default that can really hurt the 10% of cases it is ill-suited for. Developers unfamiliar with Linux networking are unlikely to even realize it's a bottleneck. Concrete examples of where it becomes a bottleneck depend on the use case, but some are (a) unnecessary ARP table overflows when scaling to thousands of containers, and (b) heavy TCP connections between containers (think appContainer<->redisContainer). The reason for the bottleneck seems to be an over-reliance on iptables and ebtables for filtering container-to-container communication.

The default container-to-container communication is so bad that I actually switched to a shared socket (mounted in a named volume) for communication with redis instead of using the default Docker networking. I didn't do any formal benchmarking, but the socket communication was significantly faster than TCP for high-throughput reads from redis (50MB+).
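Rough sketch of that setup (the volume name, paths and app image are made up); redis-server accepts config directives as command-line flags, so no custom config file is needed:

    # named volume shared between the two containers
    docker volume create redis-sock

    # redis listens on a unix socket inside the shared volume
    docker run -d --name redis -v redis-sock:/sockets \
        redis:4 redis-server \
        --unixsocket /sockets/redis.sock --unixsocket-perm 777

    # the app mounts the same volume and connects to
    # unix:///sockets/redis.sock instead of a TCP address
    docker run -d --name app -v redis-sock:/sockets my-org/app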

Since Docker uses netns + veth under the hood, I really wish it were possible to create a netns and launch a Docker container into it with something like --net-ns MYNETNS, like you can launch it into a cgroup with --cgroup-parent. Unfortunately it's not possible without some ugly hacks AFAIK.

Of course, you can mitigate any issues like this, but it requires networking knowledge and awareness of tradeoffs you are committing to by going "off the reservation" in terms of Docker setup.


kubernetes: SO MUCH COMPLEXITY.

If you have a giant app spread across multiple microservices, it definitely could make sense.

For an app I worked on that was basically divided into two classes of instances (front-end Rails, and a back-end API in Python/Pylons), it was an utter waste of time and energy. Just go with EBS plus a nice base AMI and build something useful with all the time you would have wasted on k8s.


For a setup like that, yeah k8s is probably overkill unless you're planning to further decompose the application.

But FWIW, k8s != Docker. Kubernetes is a piece of software for orchestrating containers, and not only Docker containers. Most people that are using k8s are using Docker, but I don't think most people using Docker are using k8s (just a guess).


I have no idea if your point of view is sysadmin, devops, developer (front end/backend?), employee, and/or freelancer. I like docker because I am all of those, despite never actually using docker on the production/deployment side. I'll go over how docker helps me. I use OSX and Debian/Ubuntu.

I currently freelance for a project that uses MongoDB. I use MongoDB for nothing else, nor do I want to. In the past, I would infect my system with MongoDB PPAs that would give me some version of MongoDB (who really knows which version) and would break my apt-get update randomly. I like to tinker with servers. I consider tinkering to imply fun, interesting, or accomplishing something. Fixing MongoDB PPAs breaking my system is the opposite of fun and interesting. It occupies time I should be spending working for money or tinkering for fun. To refer to "unfun tinkering" we'll use the term "yak shaving".

I fixed this by adding a docker-compose yaml file that specifies the same MongoDB version as used in production. I run docker-compose up when I'm working on the project. I can work on the project faster now and keep my system safe from MongoDB.

With my other projects, I have a docker-compose yaml file that specifies which versions of Postgres and Redis I'm using. It's the fastest and easiest way to get the exact versions I want without commingling one project's databases with another's (this is huge for freelancing). It's faster than Vagrant (but I think I heard Vagrant is getting support for Docker as a VM backend).
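The compose file mostly just pins image tags; the plain-CLI equivalent is roughly the following (the tags and published ports are examples, not recommendations):

    # exact versions per project, isolated from the host and from
    # other projects' databases
    docker run -d --name proj1-mongo    -p 27017:27017 mongo:3.4
    docker run -d --name proj2-postgres -p 5432:5432   postgres:9.6
    docker run -d --name proj2-redis    -p 6379:6379   redis:3.2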

Finally, a Dockerfile is basically a bash script with extra caveats. You put your build script there. Or you can run your existing bash script from it. There might be some caching advantage to converting it, I'm not sure.

Knowing nearly nothing about what you actually do or work with, I'd guess that there is a reasonable chance that the cost of learning docker could be higher than the value you would get out of learning it. It was worth it for me though.


Nginx can be updated and restarted without losing any connections. With Nginx under Docker, that is not possible.


Why isn't this possible using Docker? I would assume you could still issue an nginx reload command to the container running nginx, since it's the same nginx software running.


Should be the same, I agree. Not only that, under Docker you could also have them behind a load balancer and just upgrade one at a time with no downtime.


I did not mean reloading the configuration. Nginx supports live upgrade of the executable itself, where one can replace the binary without losing any connections. Docker just does not support that, unless one uses Docker as a mini-OS and runs updates inside the container, defeating the purpose of Docker.


While it may be possible, it's frowned upon. The Docker way is immutable containers, so if you want to edit the nginx config you 'can't' just do it in place; you need to make a new container with the new config and a new process, and kill all the existing containers (and hence processes/connections).

Docker is basically at the point today where MapReduce was a few years ago. While it may solve some problems, it's not a thing that solves everything.


You could mount the nginx.conf files into the container so they are editable. The “correct” way to run a Docker container is to supply the config at runtime, not at image build time.
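e.g. something roughly like this (paths are illustrative); config changes then only need a reload, not a rebuild:

    # supply the config at runtime via a bind mount
    docker run -d --name web \
        -v "$PWD/nginx.conf:/etc/nginx/nginx.conf:ro" \
        -p 80:80 nginx:1.14

    # reload the config in place without replacing the container
    docker exec web nginx -s reload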


It can be a bit of a pain if you're not using a Linux OS on the server. Setting up secure image registries, or registries in general, is also tricky. It can be tricky learning how to debug Dockerised applications because of the extra layer of indirection.





