Why Docker Is Not Yet Succeeding Widely in Production (sirupsen.com)
502 points by PolandKid on July 28, 2015 | 280 comments



While the article goes into the more technical reasons for not using Docker in production, the practical reason "Why Docker Is Not Yet Succeeding Widely in Production" is that if it ain't broke, don't fix it. The advantages of Docker do not necessarily outweigh the opportunity cost of rewriting the startup's entire infrastructure.

Docker will likely be more prevalent in a few years with startups who have built their infrastructure from the ground up.


Docker solves a problem that most people don't have. It's not a PaaS, rather, it's (some of) the building blocks to create your own PaaS. Most folks don't need that. Most folks want to put files on a server and start a process. For those folks, Docker in the raw ends up being a whole lot of confusing & unnecessary scaffolding.


Moreover, Docker / Kubernetes aims to solve the problem of building services that can easily scale to hundreds of machines and hundreds of millions of users, "Google-style".

That's great, if that's what you need. But most people aren't building a service like that. HN, I believe, runs on one machine, with a second for failover purposes. And HN still has many, many more users than typical company-internal services, community services, or at the extreme end personal services.

When you aren't operating at absurd scale, "Google-style" infrastructure doesn't do you any favors. But the industry sure wants to convince us that scalability is the most important property of infrastructure, because then they can sell us complicated tech we don't need and support contracts to help us use it.

(Disclosure: I'm the lead developer of https://sandstorm.io, which is explicitly designed for small-scale.)


"But the industry sure wants to convince us that scalability is the most important property of infrastructure, because then they can sell us complicated tech we don't need and support contracts to help us use it."

And let's not forget: replace any and all efforts at code optimization with "just throw another rack of blades at it".



I just use Google Cloud's HTTP Load Balancer and AutoScale. It automatically spawns/removes VMs based on load.


There was also a time when most people thought they didn't need version control. Back in the 80s and 90s it was a justifiable viewpoint because existing version control systems sucked.

The problem with Docker is not that it doesn't solve (or attempt to solve) widespread problems. At its best, Docker gives you dev/production parity, and dependency isolation which is useful even for solo developers working part-time. The problem is that it's not a well-defined problem that can be solved by thinking really hard and coming up with an elegant model—like, for example, version control—it's messy and the effort to make it work isn't worth it most of the time right now.

That's no reason to write off Docker though. Pushing files to manually configured servers or VPSes is messy and leads to all kinds of long-term pain. You can add Chef / Puppet, but it turns into its own hairy mess. There's no easy solution, but from where I stand, the abstraction that Docker/LXC provide is one that has the most unfulfilled promise in front of it.


> At its best, Docker gives you dev/production parity

I get that when I use the same OS and built-in package manager?

I would virtualize the environment using something like VirtualBox for my dev and EC2/DigitalOcean/etc on prod.

> and dependency isolation

If you're going to scale something, you're going to split everything out on different virtualized servers anyway, so you'll get your isolation that way.

Basically, current mainstream practice is to virtualize on the OS level, whereas Docker is pushing to have things virtualized on the process level.

I personally don't see the advantage ... just more complexity in your stack. I never have to mess with the current virtualization structure, I don't even see it. It looks just like a "server", even though it's not. Isn't that better?


To be fair, I've worked in places where all the devs were on the same OS, same version, and we still had problems.

But I agree, just use VirtualBox. I know IDEA already supports deploying to VMs, and they just look like another machine, so no learning curve. All the benefits with none of the hassle.


Yeah, but then there's still the issue of secrets: you need to have testing PayPal credentials, testing mailing service credentials, etc. There's the issue of deploying changes fast without leaving files in an inconsistent state (you don't want half of some file to run). How about installing the required dependencies?

I don't use Docker, but those are problems I can think of off the top of my head.


Docker doesn't credibly solve the credentials problem and the other problems you outline (which do exist) are as practically solved with something like Packer. And I mean, I'm not a Packer fan--oh look, VirtualBox failed to remove a port mapping for the VM that just shut down, throw away the whole build--but it's built on much, much more battle-tested technology with a much wider base of understanding.

(And, later, if you want to play with Docker, Packer lets you do that too. But you should use the Racker DSL in any case, because life is too short to deal with Packer's weird JSON by hand.)


Thanks for pointing me to Racker (https://github.com/aspring/racker). I'm currently building Packer and Terraform images with chunked together Python scripts that work, but I wouldn't call them a great solution. I'm actually using Packer specifically so I can start with regular EC2, and then move to a more Docker-based infrastructure.


Packer severely frustrates me, with the maddening regularity with which it fails just for funsies. Or the consistent but completely inane ways that it fails, like refusing to proceed based on not finding a builder for an 'only' or 'except' clause (making it nearly impossible to re-use provisioners and post-provisioners across multiple projects). Racker does help--my shared Racker scripts are in a Ruby gem--though I think that it pretty much removes Packer as anything more than a dummy solution into which you dump directives on a per-builder basis. As a tool that you carefully feed the bare minimum of information to do its job in any specific situation, though, it works okay.

Terraform, on the other hand, I think is a huge, huge mess, and I don't think they're going to fix it. I wrote a Ruby DSL for it the last time I tried to use it in anger, only to find that Terraform didn't honor its own promises around the config language it insisted on instead of YAML or a full-featured DSL of its own. My current client uses it, and every point release adds new and exciting bugs and regressions in stuff that should be caught by the most trivial of QA. For AWS, I strongly recommend my friend Sean's Cfer[1] as a better solution; CloudFormation's kind of gross, but Cfer helps.

[1] - https://github.com/seanedwards/cfer


Credentials have to be managed separately from Docker anyway.

> There's the issue of deploying changes fast without leaving files in an inconsistent state (you don't want half of some file to run). How about installing the required dependencies?

rpm / dpkg also install dependencies, are quite fast and well tested. They have the advantage of working in a standard environment which most sysadmins know but the disadvantage that you need to configure your apps to follow something like LSB (e.g. install to standard extension locations rather than overwriting system files, etc.).

The one issue everything has is handling replacement of a running service and that's not something which Docker itself solves – either way you need some higher level orchestration system, request routers, etc. Some of those systems assume Docker but that's not really the value for this issue.


> the disadvantage that you need to configure your apps to follow something like LSB (e.g. install to standard extension locations rather than overwriting system files, etc.).

Common misconception. You only need to do this if you're going to try to push the packages upstream. If they're for your own consumption, you can do what you like. Slap a bunch of files in /opt, and be done with it - let apt manage versions for you and be happy.

As with many things, this is one area where you've just got to know what to ignore. It's simpler than it looks.
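Concretely, an internal /opt package can be as dumb as this (names and paths made up):

    # stage the files under /opt and write a minimal control file
    mkdir -p pkg/opt/myapp pkg/DEBIAN
    cp -r build/* pkg/opt/myapp/
    {
      echo "Package: myapp"
      echo "Version: 1.0.0"
      echo "Architecture: amd64"
      echo "Maintainer: ops <ops@example.com>"
      echo "Description: internal app, everything lives under /opt/myapp"
    } > pkg/DEBIAN/control
    # build it; from here on dpkg/apt handles installs, upgrades and removals
    dpkg-deb --build pkg myapp_1.0.0_amd64.deb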


I think we're actually talking about the same thing – I said “like LSB” simply to denote following some sort of consistent pattern, which will vary depending on how widely things are shared.

/opt is defined in FHS for local system administrator use, so installing your company's packages there is actually the recommended way to avoid conflict with any LSB-compliant distribution as long as you use /opt/<appname> instead of installing directly into the top-level /opt:

http://www.pathname.com/fhs/pub/fhs-2.3.html#OPTADDONAPPLICA...


rpm and fast don't really go together. dpkg is much better. dnf will be interesting.


> Yeah, but then there's still the issue of secrets

How would Docker help with this? Genuinely curious.

I store them in bash scripts outside the repo that populate the relevant data into environment variables and execute the code. The code then references the environment variables.
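For example, a wrapper kept outside the repo might look something like this (names made up):

    #!/bin/bash
    # secrets live only in this file, which never enters version control
    export PAYPAL_CLIENT_ID="sandbox-client-id"
    export PAYPAL_SECRET="sandbox-secret"
    export MAIL_API_KEY="test-mail-key"
    # hand off to the app, which only ever reads the environment variables
    exec python app.py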

> How about installing the required dependencies?

There are two kinds. On the OS level and on the platform level.

On the OS level, you can have a simple bash script. If you need something more complex, there are things like Chef/Puppet/etc.

On the platform level, you have NPM/Composer/PIP/etc which you can trigger with a simple cron script or with a git hook.
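For instance, a bare-bones post-receive hook on the server can handle the platform-level part (paths hypothetical):

    #!/bin/bash
    # post-receive hook: check out the pushed code, then install deps
    APP_DIR=/srv/myapp                      # made-up deploy location
    GIT_WORK_TREE="$APP_DIR" git checkout -f master
    cd "$APP_DIR" && npm install --production
    # (or pip install -r requirements.txt, composer install, etc.)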

> There's the issue of deploying changes fast without leaving files in an inconsistent state

So the argument here is that you're replacing one file in one go vs possibly thousands? That in the latter scenario the user might hit code while it's in the process of being updated?

Ok. With docker, you would shut it down to update. You would have to.

Same goes for the traditional deployment? Shut it down, update, start it back up?

You can, of course, automate all of this with web hooks on Github/Bitbucket, for both docker and the traditional deployment.

The traditional deployment should also be faster, since it's an incremental compressed update being done through git.


Kubernetes secrets are a really great solution to this problem. [1] They are stored at the cluster level and injected into the pod (group of containers deployed together) via a file system mount. This means that each pod only has access to its secrets, which is enforced by the file system namespace. If an entire machine is compromised, only the secrets of pods currently scheduled onto that machine are able to be stolen. That's a high-level overview, but it's worth taking a look at the design doc.

Edit: forgot to mention, the file system mount means that they don't need to be in env vars, which are fairly easy to dump if you have access to the box or if you're shipping containers around in plain text.

1. https://github.com/GoogleCloudPlatform/kubernetes/blob/maste...


I don't know if Docker helps with this, I don't use Docker. But some kind of solution has to exist.

How AWS does updates is it first downloads the new code into a separate folder and then switches the link to point to the new folder instead.

But the AWS approach feels unsatisfactory because it downloads the entire codebase instead of doing an incremental git update. These are all issues that could be fixed, and someone has to do them. I have no idea if Docker helps with any of them, but the opportunity is still there.
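The folder-plus-symlink part itself is easy enough to script by hand, something along these lines (paths made up):

    # put the new release in its own directory
    RELEASE=/srv/myapp/releases/$(date +%Y%m%d%H%M%S)
    git clone --depth 1 https://git.example.com/myapp.git "$RELEASE"
    # swap the "current" symlink over to the new release
    ln -sfn "$RELEASE" /srv/myapp/current
    # then restart/reload whatever serves /srv/myapp/current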


Ansible, systemd and Go are stealing my heart at the moment. Basically pick the tech that doesn't cause the problems to start with.

I still reckon that the main reason VMware ESX is as successful as it is comes down to the lack of isolation and the sheer deployment hell that Windows has been for years. The same can be said for Python or Ruby on a Linux machine, for example. Docker removes some of that pain like ESX does.


That's a bit of my point... If you're building relatively small independent services with Docker you can deploy service A with node 0.10 as its tested environment and service B with iojs 2.4 on the same server, without them conflicting... when you need to update/enhance/upgrade service A you then can update the runtime.
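Concretely, something like this (image and service names made up):

    # service A's image is built FROM node:0.10, service B's FROM iojs:2.4;
    # both run on the same host without their runtimes conflicting
    docker run -d --name service-a -p 3001:3000 myorg/service-a
    docker run -d --name service-b -p 3002:3000 myorg/service-b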

The same can be said for ruby, python and any number of other language environments where you have multiple services that were written at different times with differing base targets. I've seen plenty of instances where updating a host server to a new runtime breaks some service that also runs on a given server.

With docker, you can run them all... granted, you can do the same with virtualization, but that has a lot more overhead. It's about maximum utilization with minimal overhead... For many systems, you only need 2-3 servers for redundancy, but a lot can run on a single server (or very small cluster/set).

I have to agree on ansible, systemd and go... I haven't done much with go, but the single executable is a really nice artifact that's very portable... and ansible is just nice. I haven't had the chance to work with systemd, but it's at least interesting.


> The same can be said for ruby, python and any number of other language environments where you have multiple services that were written at different times with differing base targets. I've seen plenty of instances where updating a host server to a new runtime breaks some service that also runs on a given server.

This is a solved problem in Python and Ruby. In Python, use virtual environments. In Ruby, use RVM. You won't have the issue of one tenant breaking another.
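e.g. (versions and paths purely illustrative):

    # Python: each app gets its own virtualenv
    virtualenv /srv/app-a/venv
    /srv/app-a/venv/bin/pip install -r requirements.txt

    # Ruby: per-app gemset pinned to a specific interpreter
    rvm use ruby-2.1.5@app-b --create
    bundle install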


And with node, you can use nvm... however there are libraries and references at a broader scope than just Python, Ruby or Node... Say you need an updated version of the system's lib-foo.

A runtime environment for a given service/application can vary a lot, and can break under the most unusual of circumstances. An upgrade of a server for one application can break another. Then you're stuck trying to rollback, and then spend days fixing the other service. With docker (or virtualization) you can segregate them from each other.


You're correct, but I can see that there's certainly a place for having a single solution that works across all ecosystems.

Also, RVM in production? Sledgehammer to crack a nut :-)


On a local system, yes, but in production it's painful to work with. With RVM, for isolation you would create gemsets for each app with a specific ruby version. That's OK for 2-3 applications, but anything more than that would be a pain to work with. And then if you plan to put everything behind Passenger, it would just be too messy. Think of automating this? It would be a nightmare to maintain. This is where containerization does help.


Node has this too: nave, npm and n. But using these tools means that you are no longer using the package manager of the system, and this can be a problem sometimes. E.g. you may need to open your firewall to something other than the standard pkg manager.

I see docker as a valid attempt to fix limitations of existing and broken package system (eg: apt) at a price that I am not yet willing to pay.


Virtual environments don't work for the interpreter itself. On top of that, some packages need to build C extensions, and those may depend on different versions of the same system library, which can break.


It partially is, until you need native dependencies.


Which version of RVM are we running again?


If you want to put files on a server and start a process, you are probably looking for something like Apache, not a "PaaS" necessarily.


Docker will likely be more prevalent in a few years with startups who have built their infrastructure from the ground up.

The opposite seems likely ... Docker will fade and become deprecated as building infrastructure from the ground up locally to feed into the cloud becomes cheaper and cheaper still. AWS is not always so cost-effective when you truly dig in and crunch the numbers.

My guess as to why Docker won't succeed widely in production is because it's a software-based solution trying to glue together slippery pieces that just don't want to be glued together. The core issue of security will never be solved by a Docker-like solution; that problem is best solved by integrated hardware.

This very issue is being addressed in ClearLinux: http://sched.co/3YD5


DevOps/infra guy here rolling out Docker startup-wide at the moment. You and minimaxir are both correct.

With regards to docker/lxc/container security, you're right. Some of the biggest players haven't solved the lxc/docker/container security issues yet; it's a really hard problem to solve. Breaking out of container will always be easier than breaking out of deeper levels of virtualization (Xen/KVM).


> Breaking out of container will always be easier than breaking out of deeper levels of virtualization (Xen/KVM).

I agree it's not easy to get right, but it doesn't seem necessary that containers will always be leaky. Solaris/Illumos Zones are an OS-level virtualization approach that's pretty airtight, for example.


I agree. But that's my biggest problem with Docker. Who runs SmartOS and uses Zones? Why?

When you have a local server, that supports KVM and Zones, you choose KVM as the cleaner abstraction. While surrounded by neat tech, Zones are actually a bit of a pain and not all that portable between systems IME. OTOH I can `zfs send/recv` over SSH, drop a short bit of JSON in, and have my KVM instance reliably moved to another SmartOS box 100% of the time, no worries.

So unless you're really worried about that last 5% or whatever of overhead, what's the point of Docker? It's not actually very portable at all it seems (on my Mac I'd have to run it inside VirtualBox). I don't have much experience with it, but my guess is that similar to Zones, you're at the mercy of the host system as far as common dependencies like OpenSSL or gcc go.

It seems like a solution to a problem I'm having trouble even imagining: a less secure, less portable, lightweight "VM" with slightly less overhead. I guess if you're a PaaS and you could increase margins by 5% overnight by switching to Docker that might make sense?

As someone who's set up Solaris 10, OpenBSD, FreeBSD, SmartOS, Debian, Redhat, Ubuntu, KVM, Xen, etc etc etc, I just have a real hard time figuring out Docker's value proposition. It seems like the Solaris world went from Zones to KVM, and some people are attempting to do just the opposite. Which I just can't think of a good excuse for.


I believe Docker's biggest feature is its speed of building. It's a trade-off of portability vs. temporary-ness.

I currently use it for MySQL DB restoration and remote bug-checking by having a handful of xtrabackup instances that I can quickly attach a docker to, hand an IP to a developer, and he can then debug the problem with production data _at that exact point in time._

When they're done, I simply throw that docker away.
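Roughly, that looks like this (image tag, paths and ports all made up): point a throwaway container at a restored backup directory, hand over the port, then delete it.

    # the restored xtrabackup directory becomes the container's datadir
    docker run -d --name debug-2015-07-28 \
        -v /backups/2015-07-28:/var/lib/mysql \
        -p 33061:3306 mysql:5.6
    # ...developer connects on port 33061 and pokes around...
    docker rm -f debug-2015-07-28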

It's a tool that (in my mind) doesn't solve any existing problems better than a lot of tools out there. It instead should be thought of like a better hammer for the same nail. Think of it like... would you rather have a giant set of wrenches, or a single ratchet with a set of sockets? They both accomplish the same thing, but both are better for certain jobs.


> AWS is not always so cost-effective when you truly dig in and crunch the numbers.

If you have a consistent level of traffic (i.e. you don't have inordinately wild upswings/downswings like e.g. Reddit), AWS isn't even remotely cost-effective. I was going to do the math to compare our current physical server infrastructure with AWS, and even if you factor in that physical servers need to be in pairs (for redundancy) and over-provisioned (for traffic spikes), I didn't even get as far as back-of-the-envelope math before it was obvious that AWS was completely infeasible.


There's one clear cut use case where AWS/Azure are incredibly cheap - Disaster Recovery. At my last job, we maintained a small DB instance and nothing else but an empty VPC. Within 15 minutes, we could spin up the entire DR stack including resizing the DB to support Production. There's no equivalent for this when you ONLY run your own hardware - you're stuck with a second site that sits there idle (unless you intend to do Active-Active which has its own share of problems).


Running your own hardware is always going to be cheaper - but you also need to employ folks with hardware management skills. That's fine if you already have those.

Similarly, cloud offerings give you remote reach easily - one company I work at has its production servers almost literally on the direct opposite point of the globe. You can do datacentres with remote hands, sure, but it's another layer of complexity. Hardware also has a mild barrier to entry in the form of cost - for small shops, doling out the five or six figures you need for initial hardware is a pretty sizable chunk.


I have mixed feelings about Docker. I've found three major use cases so far:

(1) Testing.

(2) Build environments -- it's helpful to build distribution Linux binaries in older Linux versions like CentOS 6 so that they'll work on a wider range of production systems.

(3) Installing and running "big ball of mud" applications that want to drag in forty libraries, three different databases, memcached, and require a custom Apache configuration (and only Apache, thank you very much).

#3 is really the killer app.

This has led me to conclude that Docker is a stopgap anesthetic solution to a deeper source of pain: the Rube Goldberg Machine development anti-pattern.

More specifically, Docker is a far better solution than the abomination known as the "omnibus package," namely the gigantic RPM or DEB file that barfs thousands of libraries and other crap all over your system (that may conflict with what you have).

Well written software that minimizes dependencies and sprawl and abides by good development and deployment practices doesn't need Docker the way big lumps of finely woven angel hair spaghetti do.

Docker might still be nice for perfect reproducibility, ability to manage deployments like git repos, and other neat features, but it's less of a requirement. It becomes maybe a nice-to-have, not a must-have.

But... if my software is not a sprawling mess that demands that I mangle and pollute the entire system to install it, why not just coordinate development and deployment with 'git'? Release: git tag. Deploy: git pull X, git checkout tag, restart.
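In other words, the whole pipeline is a handful of commands (service name made up):

    # release
    git tag v1.4.2 && git push origin v1.4.2
    # deploy, on the box
    git fetch origin && git checkout v1.4.2
    sudo service myapp restart   # or however the process is supervised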

Finally, Docker has a bit of systemd disease. It tries to do too much in one package/binary. This made the rounds around HN a while back:

https://github.com/p8952/bocker

It demonstrates that at least some of Docker's core functionality does not require a monster application but can be achieved by using modern filesystems and Linux features more directly.

So honestly I am a bit "meh" about Docker right now. But hey it's the hype. Reading devops stuff these days makes me wonder if "Docker docker docker docker docker docker docker" is a grammatically correct sentence like "Buffalo buffalo buffalo buffalo buffalo buffalo."


>Docker might still be nice for perfect reproducibility

Docker actually doesn't help reproducibility at all, because the underlying reproducibility problems present in the distro and build systems being used are still there. See GNU Guix, Nix, and Debian's Reproducible Builds project for efforts to make builds truly reproducible.

I had a good laugh when I read "the Rube Goldberg Machine development anti-pattern". This describes the situation of "modern" web development perfectly. I'll add that such software typically requires 3 or more different package managers in order to get all of the necessary software. And yes, Omnibus is an abomination and Docker is much better.

I think Docker is papering over issues with another abstraction layer. It's like static linking an entire operating system for each application. Rather than solving the problem with traditional package management, Docker masks the problem by allowing you to make a disk image per application. That's great and all, but now you have an application that can only reasonably be run from within a Linux container managed by Docker. Solving this problem at the systems level, which tools like GNU Guix do, allows even complex, big ball of mud software to run in any environment, whether that is unvirtualized "bare metal", a virtual machine, or a container.


> It's like static linking an entire operating system for each application.

You say it like it's a problem, but that's the most concise description of Docker I've yet read. It rhymes with the way all the fed up oldies using Go like its static linking.


This is pretty much how I view Docker as well. Except it's not really the entire operating system. A VM image is the ultimate static linking.


Nothing wrong with linking together a couple things to build an app... I call it Rube Goldberg (a.k.a. ball of mud, pile of crap, etc.) when it's like dozens of things that all have to be tweaked in exactly a certain way or everything assplodes.

I simply will not run apps like that unless I have no choice. If I see that, plonk it goes into the trash.

... and yes, the whole package management situation is comical. Every language has its own package management system, and the OS, and sometimes people use both at the same time. It's ridiculous.


> It demonstrates that at least some of Docker's core functionality does not require a monster application

```The following packages are needed to run bocker:

btrfs-progs, curl, iproute2, iptables, libcgroup-tools, util-linux >= 2.25.2, coreutils >= 7.5

Because most distributions do not ship a new enough version of util-linux you will probably need to grab the sources from here and compile it yourself.

Additionally your system will need to be configured with the following:

- A btrfs filesystem mounted under /var/bocker
- A network bridge called bridge0 and an IP of 10.0.0.1/24
- IP forwarding enabled in /proc/sys/net/ipv4/ip_forward
- A firewall routing traffic from bridge0 to a physical interface
- A base-image which contains the filesystem to seed your container with```

Is this the "well-written software" pattern that you're talking about? Because to me, this looks like a "big ball of mud" - i.e. dependence on an eclectic combination of libraries, co-programs, and environment configuration - and indeed, if for some perverse reason I felt like I wanted to deploy this in production, it's exactly the kind of thing I'd wind up writing a Dockerfile for. (Which, I notice is functionality "Bocker" doesn't attempt to replicate.)


A few packages are needed, but in their standard configurations. Bocker does not require you to install hundreds of packages and specially tinker with each one the way many web stacks do.


> is that if it ain't broke, don't fix it.

I hate being passive-aggressive so I'll be directly aggressive here: this mentality is a way to say, "I don't want to revisit the operational aspects of my system because I don't like to do that work. Find someone else."

Like any aspect of your system, your ops and deploy components can rot. Pretending otherwise is outright ignoring a consistent lesson offered by those who came before and have failed over and over.

Docker offers to take over as a project many aspects of the system subject to bit-rot and make an explicit and consistent container abstraction for software to compose. While it is still missing features we do need (I agree wholeheartedly that it'd be great to parallelize layer creation, less so about secret exposure since the environment & volume tooling already can do that), it has also replaced whole categories of software and devops tooling with simple and extensible metaphors.


I disagree. It's all about "the evil you know".

And then there's the part where Weave is slow, so you might as well stick to VMs or hardware...

http://www.generictestdomain.net/docker/weave/networking/stu...


On Weave, yeah. Not the best performance, and improving. Flannel is obviously better, and one of the reasons we're excited about rocket. That changes almost nothing for shops hoping to move to containerized architectures now. It's only for existing early adopters.

The idea that docker introduces that much uncertainty is outright fear mongering. There is a huge amount of recalcitrance in the community to do anything meaningful in the space due to a proposed risk aversion. My personal opinion is that we're all pretending we didn't write incredibly delicate and brittle provisioning and monitoring code with very dated tools.

Many people I know, and more than a few I respect, ultimately point to all their provisioning shell scripts as the ultimate reluctance to change things. "It will be really hard to migrate and test these! Generating them is a pain!" Of course, the elephant in the room is we all knew this going into it and we all know we SHOULDN'T have been doing things like generate shell script execution and using git to provision on production boxes and w/e other hacky shit we've done.

Of course, what we have is not any one thing but all too often an amalgam of spare hours and quick fixes and patches laid over some existing provisioning system like salt, ansible (or just a whole shit ton of puppet work).

Counter-intuitively, suddenly everyone has become a devops luddite when it comes to a genuinely novel approach even though container abstractions have already proven themselves at scale. People hem and haw and suggest that somehow it's not ready for production. Meanwhile major players in the space are already using them, even for core services, with excellent results.

Lightweight containerization has been used to solve this for awhile now. Docker as a product and initiative is relatively new, but to suggest it was the first example of a container engine used in production ignores the actual history of lightweight containers.


Show me how I can easily audit all my docker containers for vulnerabilities from spacewalk

Show me a docker-aware Rapid7

There are a lot of tools for security and compliance completely thrown out with the bathwater when you move to containers. You're not going to get enterprises to bite until you can satisfy the auditors.


Cool. I'm glad that I'm in the only top 10 bank in the US using Stackato.


I had a very smooth workflow on Python/Django/AWS. Thought of checking out Docker for the last project and boy did that hurt! "If it ain't broken, don't fix it" is very appropriate here. I would suggest that until you have huge issues with deployment, skip Docker. For me, it added loads of work instead of simplifying the flow.


How does Python/Django deploy itself to AWS?

Last time I checked, Python and Django are agnostic to their operational concerns and deployment.


Even if you're building brand new infrastructure from scratch right now many issues (discussed elsewhere in these comments and the article) are still unsolved.


Yup. At my last gig we built out a Mesos cluster and were deploying Docker containers, but we couldn't answer "how do we practically secure this to the same level as independent virtual machines?" and, finding no good answer, we went back to auto-scaling groups and baked AMIs.


Yes, doing the same thing here as well. Still using docker to streamline deployments though, but one docker container/role per instance, no "orchestration" for containers (baked AMIs, ASGs).


I did that at one place, but I wasn't super satisfied with the process--having to download container images on spin-up was annoyingly slow and I didn't feel like we were getting better dev/prod consistency versus Vagrant and Packer.


We bake the container into the AMI, so no fetch is necessary at spin up (there is no cost to generate AMIs, only storage fees, so cost is not an issue).

Packer is used to build the AMIs with the containers built-in, and Docker is used both in Prod (single container to each AMI) and Dev (Docker Compose to bring up entire dev env locally). Both used a shared docker registry.


Ahh, gotcha. That's a neat approach, though the double hit of Docker builds + AMI builds feels a little weird to me. Thanks for the insight.


I also do what the other poster does, but we take it a step further and make sure that the smaller, more frequently changing layers are added last in the image. On instance startup we can do a docker pull and bring down only a few KB for docker image updates. This way we can update the AMI less often (which takes longer anyway) and we can push updates to the container repo without having to batch AMI builds, for quicker turnaround on deployment.


If you're in AWS, I wouldn't worry about how many bytes docker image updates take. Our registry is using S3 as its backend, and I can pull images under 100MB in a few seconds.


It's not always an option to host and manage a registry; in some cases it's easier for the customers to rely on a registry service like Quay, in which case it can make a difference for some images to think about layering. But you are right, S3 is fast and that's one of the reasons I'm glad Deis moved to support S3 out of the box.


It's quite possible but the most straightforward answer is somewhat ugly - install endpoint security in every container. For example, each container would need to have intrusion detection, iptables, etc. Other options would include having containers route traffic with a virtual LAN setup and you have a container whose function is to replace your usual network security appliances. And the irony is that shared services like that can be put into both control and data planes which is easy with hypervisors and software defined networking combined with storage fabric security. When it comes to security, you honestly should be securing things at every layer anyway, but in a lot of places I see people not bothering with iptables and delegating 80%+ of the security responsibilities to operations while application teams focus upon application security.


I was wondering, what did you think of the production readiness of Mesos independent of Docker?


Seems reasonable, if you want to be running lots of things on the same boxes without isolation (I'm not comfortable with that, but you might be). If you're sharing those resources for stuff like Spark, ElasticSearch, etc. I think it makes sense as a work scheduler, but there are a lot of other options to consider too.


I have a client, pre-funding but with some revenue, who is hell-bent on using their Rackspace credit for their production environment. "But we have a sysadmin who works for Yahoo"

He tried to demo what they have currently and the damn thing timed out during login. I laughed.

The cost of these headaches is easily avoidable. Get off the ground and running first, pay the kind-of-premium Heroku bill, and when you're ready to really scale, make the switch.

There are few cases where managing your own infrastructure (Rackspace, a cluster of AWS nodes, your own metal, etc.) wins out over something like Heroku.


New startup. Did not use docker. Happy with scripts and ansible.


I would be sold on Docker if that were easy. I have e.g. this stack:

- 1 webserver/proxy, let's say nginx

- 1 simple Rest API server, let's say in flask

- 1 database, let's say PostgreSQL

and I want to connect all 3 things, and I want to preserve logs for the whole time and preserve the state of the database (of course). Also, not to forget, make it all bulletproof for the Internet.

And here all sorts of problems arise: What underlying OS? How to connect these containers? How to preserve the state of my database and logs (it's not trivial, as the article proves again)? So overall Docker doesn't make life easier for this simple use-case, it makes life (for the sysadmin) more complicated.


Ultimately I'd say Cloud Foundry solves the problem but requires a lot of "support" VMs to make it work such that it might be overkill for your situation.

For example:

- What underlying OS? CF provides a minimal Ubuntu Linux "stemcell" and then has a standard "rootfs" for Linux containers

- a Python buildpack to assemble the container on top of this OS for your Flask server

- a built-in proxy/LB so you don't need one, if you want a static web server there's a static buildpack for Nginx

- an on demand MariaDB Galera cluster for your database if you want HA; PostgreSQL is there too but non-HA I think

- A standard environment variable based service marketplace & discovery system for connecting the containers to each other or to the database

- high availability (with load balancer awareness) for your containers at the container, VM or rack level

- reliable log aggregation of your containers (which you can divert to a syslog server).

As I said, the only trouble when you want to make this "bulletproof" is that there are a dozen "support VMs" all there to make your app bulletproof and secure, e.g. an OAuth2 server, the load balancer, an etcd cluster, a Consul cluster, the log aggregator, etc. So it's overkill for one app, but good if you have several apps.

For single tenants and experimental apps, there's http://lattice.cf which runs on 3 or 4 VMs and is a subset of the above, but not what I'd call "production ready".


spoken like a pivotal employee... cf has almost zero support for data services, and suffers from nih at almost every layer of the stack from routing to mq, to one of the worst ux for installs ever (aka bosh), cf is a great example of commercial opensource primarily controlled (inspite of foundation) by one entity (pivotal/vmware) that figure out how to switch from monetizing virtualization to single processes. you ever try the ui on the opensource cf?.. oh there isn't one.


And where do you work? What does any of the above have to do with the OP's question?

1. Data services, not true. There's MariaDB, Cassandra, Neo4J, Mongo, Postgres, among others. Yes, they're in VMs, but recoverable/reschedule-able persistent volumes in container clusters are at best experimental features anywhere you look.

2. NIH, compared to what? CF reuses etcd, consul, monit, haproxy, nginx, etc. will use runC and appC as those get hammered out.

3. Lots of people love BOSH.

4. If you don't like all the decisions Full CF makes, this is why Lattice exists, it delegates config/install to Vagrant or Terraform (which have their own problems) so anyone can take the core runtime bits with Docker images and use them in new and interesting ways.

5. What container or cloud platform project isn't based on code contributed by one or two vendors? Realistically? None. The CF foundation at least is an honest attempt to give all the IP to a neutral entity (including the trademark soon), has several successful variants (mainline OSS, Pivotal, Bluemix, Helion, Stackato), and has customers and users joining the foundation, not just vendors.


- 1 webserver/proxy, let's say nginx

- 1 simple Rest API server, let's say in flask

Dokku - https://github.com/progrium/dokku

Can't really beat `git push deploy/uat`

- 1 database, let's say PostgreSQL

I just run PostgreSQL on the host and connect to it from the containers. Sure I could containerise PostgreSQL itself but I don't really see the point.

I then run my own Dokku plugin (dokku-graduate: https://github.com/glassechidna/dokku-graduate) for graduating my apps from UAT to production.
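For anyone curious, the setup side is roughly this (host and app names made up); Dokku creates the app on the first push:

    # locally: add the Dokku host as a git remote once
    git remote add uat dokku@uat.example.com:my-api
    # every deploy after that is just a push
    git push uat master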


This is exactly the problem here: "just run Postgres on the host" means that you have a hybrid setup, where some of your services are dockerized and the rest are not. This is not appealing to some people. There are other services, mostly in the heavy disk-IO space, that are not easy to move to Docker. It might be worth calling these out in the documentation and saving sysadmins some time figuring this out the hard way. If you want to dockerize an application like a REST API or a simple Java app, that works perfectly and the advantages are obviously there, though.


I have zero problem with a hybrid setup. I'm not running Docker just for the sake of running Docker.

I'm running Docker (specifically Dokku) because it drastically simplifies deploying new builds, and graduating those builds between environments.

I know a large part of this article was that Docker complicates rather than simplifies the situation. I guess if you're trying to be a Docker purist (for no reason) then sure. The same is generally true if you try be a purist of any kind.


I think the deployment is the weakest point of the Docker ecosystem.

The reason why I am using Docker is the forced honesty on the environment side: if your app runs on your laptop, that does not mean it will run on the production boxes. If the Docker container runs on your laptop, it gives you higher confidence that it will run on the production infra. No missing JARs, environment variables, misconfigured classpaths, etc.


If the goal is to simplify the deployment process, why not use something like Capistrano or Fabric? You can run `cap deploy <dev|prod>`.


Fabric is a great tool at first, but the problem is that it's always procedural. Want a deploy script? Write it. No other way around that.

Something that's more declarative is definitely superior. Why? Because it will be shorter and easier to debug. I am not a fan of `git push` as a deployment strategy (because git is a version control tool, not a deployment tool), but it does force you to create and use a system that's by definition declarative. This is why I use dokku for my new projects.


Because, as mentioned, I already deploy in one line:

    git push deploy/uat
and I didn't have to write a single deployment script to achieve it.

Plus, by using Dokku I get the benefits of containerised apps.


What are the benefits of containerised apps in your case?


It's useful to "mount" the database to the app process via injected configuration.

Application and database servers are different animals. Not sure why a 'hybrid' approach would be surprising or unappealing.


While I do agree it's not as easy as it should be, connecting containers is actually doable (https://docs.docker.com/userguide/dockerlinks/), while logs are still a big mess because there is no unified strategy.

Databases are also tricky to run in containers, because even those with the best replication strategies can afford losing nodes but at a high cost (like re-balancing nodes, etc), and containers still don't have the stability to provide an acceptable uptime that's worth the risk.

On a side note, since you mentioned nginx and RESTful APIs, I would check out Kong (https://github.com/Mashape/kong) which is built on top of nginx, and provides plugins to alleviate some of these problems (http://getkong.org/plugins/).


Kubernetes solves this by mounting external volumes (say NFS or iSCSI) on the host and then exposing them to one or more docker containers. This seems like a pretty ideal solution for any Docker user.


Are you suggesting that a person with a total of 3 nodes use a system like Kubernetes which requires at a minimum (correct me if I'm wrong) 5 nodes just to function? If you really, really want to use Docker with a typical Nginx-App-DB setup just whip up the necessary shell commands to start/stop/log containers and throw that in Ansible or the like.

edit: I guess you can cram all of the various Kubernetes master/etcd servers on a single node but whoops there goes reliability.



Crap like that actually happens. I have a mathematician friend who does a bit of programming ask me about Docker, as someone had told her to use it. She works as a researcher in academia, so probably only needs to run her script a few times to get results. Why the hell would you recommend Docker?


> probably only needs to run her script a few times to get results.

Agreed that she doesn't need to use Docker. But if she is writing a paper on those results, she might want a way to reproduce her findings years down the road (even after she switched Distros), or to collaborate with others who want to reproduce/build on her research (and may not be running her distro).

It's easy to think "oh, this script just requires python 2.7", but most of the time you actually have many more dependencies than that (libxml, graphviz, latex, eggs, etc.) A Dockerfile requires some work to setup, but it tracks your requirements in an automated way.

So I'm not going to say "all researchers should use Docker". But I will say "Docker could be useful to some researchers". Just like Source Control, it's a tool that solves real problems. Source Control has gotten easy enough to use that it's recommended everywhere. Docker (or some other container standard) will get there eventually.


For research apps docker would be a godsend. Research software is of the "install exactly this version of x, y, z, r, g and h, and then apply these patches" variety....

Docker is really good for dev environments. I've had a relatively painless time dockerizing snapshots of old internal web apps so I can hack on them without installing things into my main desktop environment. It lets me have lots of server things side by side.


How did you come up with that number of nodes?


Sorry I was wrong. I assumed based on Kubernetes' use of etcd it would be 3+ nodes. It turns out Kubernetes master is a single node currently which means they haven't built high availability into the master at all... which is a pretty scary way to run a thing that manages your entire infrastructure. There's already a few topics on the mailing list about etcd losing its data and Kubernetes doesn't know how to recover. Yuck.


The Kubernetes master does support high availability.

https://github.com/GoogleCloudPlatform/kubernetes/blob/relea...


Well, yes, because for one thing, Kubernetes (if done correctly) provides HA/failover that did not previously exist.


Is it really plausible that someone who is not otherwise in the PaaS business is going to find it easier to run 5+ nodes of other services instead of, say, two nodes running their application directly for failover/HA?

This is not to say that Kubernetes is bad but … it's a commitment which isn't appropriate for everyone. If you aren't exercising its abilities heavily, that's probably going to be a distraction from more pressing work unless you're scaling up heavily right now.


Thanks, I will definitely look more into Kubernetes.


At my work we use HAProxy, gunicorn/Flask (web apps and APIs) and a PostgreSQL container for development (we aren't planning to migrate the production databases to Docker anytime soon). We are using CoreOS for the host servers, dumping logs to logstash, and connecting the containers via HAProxy. The big advantages we have seen are that it's easier to ensure consistency between development/qa/production, our deployment process is cleaner (basically docker pull container:latest, docker rm old-container, and docker create container:latest), and it's easier to resolve the few issues we have had (usually just a deploy vs tinkering on a live server). Our attitude has been to only use Docker for the parts of our stack that make sense (web apps and not production databases). It's been 8 months since we migrated and we haven't had any trouble yet.
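In practice the deploy step is little more than (names illustrative):

    docker pull registry.example.com/webapp:latest
    docker stop webapp && docker rm webapp
    docker run -d --name webapp -p 8080:8080 registry.example.com/webapp:latest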


That's all really easy with Docker Compose. Preserving state is just a matter of mounting a volume.

https://docs.docker.com/compose/ https://docs.docker.com/reference/run/#logging-drivers-log-d...


docker-compose comes to save the day when it comes to how to connect containers. Your Dockerfile will specify which underlying OS is used. Preserve state of your database with data volumes.
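A host-mounted volume is enough to keep the database's state outside the container, e.g. (path and tag made up):

    # Postgres data lives on the host, so the container itself is disposable
    docker run -d --name db \
        -v /srv/pgdata:/var/lib/postgresql/data \
        postgres:9.4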


Too bad they bundle a version of OpenSSL with known security vulnerabilities that hasn't been fixed in the month since it's been brought to their attention.


docker-compose is no magic, it only maps a YAML file to docker's command arguments. While I think docker-compose is useful in some cases, I strongly advise to not use it at first so you understand how docker actually works.

Once you understand how docker works, using the YAML file can become useful to lighten your load.


Agreed. I used a bash script based on the glowmachine GitHub repo[1], but switching to docker-compose made everything much easier, as long as you have knowledge of the docker CLI.

[1] https://github.com/glowdigitalmedia/glowmachine-docker/blob/...


It doesn't connect containers across nodes though, so its use in realistic production scenarios is limited.


From what I read that's where Docker Swarm comes in. Could be wrong though.


The article makes a lot of good points, but the stack you're speaking of would be pretty trivial to get going... particularly if you did it on a single-host setup. You could probably have it spun up in an afternoon with Ansible or some such.

Multi-Host is moderately more difficult. A full orchestration and resource scheduling stack that scales with load even more so.

But you have to ask what your needs are if you're being realistic.


I've said it before and I'll say it again: premature infrastructure optimization is the root of all evil.

Do me a favor and if you got a startup, stay clear of all this. Everyone wants to reinvent their own flavor of heroku and make your deployment and build pipeline god-awful complex. Their tool of choice? Docker.

Before you know it you'll be swimming in containers upon containers. Containers will save us, they'll cry! Meanwhile you have 0 rows of data before you've paid them their first month's salary and have spent time on solving problems of scale you'll never have.

Focus on your product, outsource the rest. And leave customized docker setups to mid-stage startups and big corps who already have these problems, or at least the money and people to toil on them. Not everything needs to be a container! And most companies are not and will never be Google!!


Until a few days ago I was a DevOps engineer at a medium-sized tech company. My primary responsibility was dockerizing their applications and infrastructure.

I quit the job.

The scenario played out just as you said: I ended up single-handedly and poorly re-engineering something that already existed (they did have a working Ansible setup) for no visible gain. "Swimming in containers upon containers" is exactly what happened; they kinda worked, but the farther we got, the more kludges piled on top of each other. In four months of work we didn't even hit production - the most we got was a CI/QA service that was actually nothing more than a loose bunch of Python scripts. Between managing dev/test/prod differences, tracing missing logs, removing unused volumes, networking all that stuff together and trying to provide at least a decent level of security, I realized that I'm wasting everyone's time and money. Developers hated it because it filled their workflows with traps and obstacles. Admins hated it because of the lack of tooling. Business hated it because it caused unexplainable delays. The only thing we really accomplished was some compliance with The Twelve-Factor App - something that could've been done in a week. Hardly a victory.

My advice? Forget about Docker unless your primary business is building hosting systems. It will take years before Docker gets mature enough for production, and not without a ton of tooling on top of it and some major architectural changes. Until then, go back to the old UNIX ways of doing things... it worked perfectly since the Epoch and it will continue to work long after the 32-bit time_t rolls over. You'll be fine.


Bother you for a little advice? I'm working at a mid-sized tech company and am evaluating Docker for CI, testing and limited, internal deployment usages.

The services in question are built with a hodge-podge of shell scripts and build tools, so getting them all to compile locally is a challenge, let alone deploying them. My hope was that containerizing the builds would isolate any configuration problems, and that containerizing the deployed services would cut down on outages by permitting trivial rollbacks (say, by snapshotting all the service containers before each deploy and merely restoring them should a deployment fail). Of course, all of the above could be fixed by traditional means (e.g. rewriting the build system with a single, standard tool; streamlining the deployment process, etc.), but it seemed like Docker could solve 80% of the problems while easing the implementation of the proper solutions down the line.

Considering the above, do you still think Docker's a poor fit for business that aren't building hosting systems? Oh, and any nuggets of wisdom you could throw to a newcomer to the industry? :)


I think you will get more bang for the buck by using something like ansible or salt (I have only used ansible and love it and I have heard salt is comparable)

You don't have a standard repeatable way to set up an environment now. You need to do that first before jumping on docker I think. Once you have that, you can start replacing parts of the setup with docker and see if it fits your needs.

The advantage of ansible is that it is idempotent and the changes it makes to the system are the same ones you make manually or via bash scripts. So it is quite easy to debug


Seconded. The most important feat you need to accomplish is being able to spin up new instances/environments on a whim. Once you have that, provisioning, testing, failover, scaling and high availability get much easier. Vagrant is decent in this role; the only caveat is that the majority of developers tend to never reset their environment and let the cruft accumulate. It's more of a people problem than a tech challenge, though: if your company culture is "if it works, don't touch it", then you have way bigger problems than choosing the right virtualization solution.


Aah, brilliant -- and they even have a book :D thanks for the recommendation!


The big pro with Ansible and similar tools is that the scripts are actually very readable, with clear best practices.

If you come to a new place and "there's an error in here somewhere", the difference between layers of images held together with shell scripts and a Ansible/Puppet/Chef script is like night and day.


Contra most of the trend on this thread, I am super pro-Docker (I'm actually surprised so many people are unhappy with it - it seems to be clearly head and shoulders over other systems). I would argue that in your situation you need to use docker as your build system at the least.

Something like the following

    docker run -v `pwd`:/tmp/buildresult your-weird-hodpodge build-command
Among other things, I see docker as an extremely useful mechanism to decouple server maintenance (build server and deployment server) from the tool maintenance. It can dramatically help reproducibility, etc.


The moment you do

    docker run -v `pwd`:/...
you're no longer running pure Docker, but Docker + a shell script. Those shell scripts bloat horribly, are not portable, and are a pain to maintain. This is precisely my problem.


The strong pressure I'm laying on people in my company is that Check Your Stuff In, so there might be a run-docker.sh that's checked in, which then can be reviewed and evaluated. It's not per se ideal, but it's a sight better than a nest of Jenkins scripts outside of source control.


Make your build servers match your production environments. Deploying on SuprCoolOS v4? Build on the same. If it then doesn't build on your buildservers, it's a problem for the developers. If your developer can only get it to build on their UltraCoolOS v9 laptop, it's up to them to fix it - after all, their job is to write stuff that works in production, not on their laptop. They can run a VM or change their OS or whatever. Or ask for a change to production :)

As for rollbacks, the exceptionally bad Docker tagging system just adds headaches to rolling back efficiently. If your production OS has a package management system, consider building packages for that - after all, it will have been battle-tested and known to work on that OS. There will be a learning curve for any packaging system, but using a native one means less faffing around later - remember also that docker is changing a lot with each release.

Also, as mentioned in the article, logging with docker is difficult and hasn't been solved properly yet, and if you like production logs for troubleshooting, Docker requires some attention before you can get those logs. My devs just run the app and watch STDOUT... which isn't easy to log in docker. Then, of course, they complain that they don't have production logs to debug, and subsequently complain when I ask them to modify their logging so I can slurp it :)

Anyway, Docker is not a packaging system for use in-house; if you're only using it to package stuff... you will be ripping it out later on down the line (this is what happened to me). On the other hand, if you open-source your stuff and want to provide 'canned images' for random members of the public to use, then there is a point to using docker, since you don't control what those host machines will be running.

In short, Docker is a complex ecosystem with its own learning curve, and doesn't really save you from learning curves for other things. If you can't articulate the exact problems that using Docker will solve for you in production, I would advise against it.

Edit: If you need a standardised provisioning system, start out with Ansible. It's pretty straightforward. Admittedly I've only used it and Puppet... and Puppet is better aimed at large/complex infrastructure environments.
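
Getting started is basically two commands once Ansible is installed and you have an inventory file listing your hosts (file names here are arbitrary):

    ansible all -i hosts.ini -m ping          # check you can reach everything
    ansible-playbook -i hosts.ini site.yml    # apply your playbook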


You don't need Docker to make a standard build environment. You don't need "chroot on steroids", you just need chroot.
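
A rough sketch of the chroot route on a Debian-ish build box (assumes debootstrap is installed; suite, mirror and path are arbitrary):

    sudo debootstrap jessie /srv/build-root http://httpredir.debian.org/debian
    sudo chroot /srv/build-root /bin/bash -c \
      'apt-get update && apt-get install -y build-essential'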


I certainly could (and actually hadn't thought of doing that; the hype was really getting to me)! Part of the difficulty is in fixing the setup process, though, as over the years this system seems to have accrued a fair amount of -- shall we say -- character. Jumping in and fixing it up to the point of making environment setup a short, portable and error-free process (such as (1) pull a VM image, (2) feed it a commit hash to build, test and deploy) would optimistically take weeks. This drove my desire to short-circuit the entire affair and stick it all into a VM. That's what I was alluding to vis-à-vis proper solutions :)


I've found docker to be "chroot with all your custom fs layouts and custom mount scripts combined" - in other words a much more convenient chroot.


True. Docker with a "ton of tooling on top of it" is going to be a VM with versioning, which as discussed, has its own set of issues.


We run Docker in production, as a PaaS no less, and this reads like a list of reasons to use a new-generation PaaS.


Thanks for the chocolate again! :)


I'm definitely no fan of Docker or the ridiculous containerisation trend, but I think I may disagree with one key thing you said:

> Focus on your product, outsource the rest.

What do you mean by outsource the rest?

Do you mean, "hey we're using AWS <Everything>-as-Service because we don't want to manage a DB cluster or deal with a load balancer?

Or do you mean, rely on existing available tools and stop reinventing the wheel every week?


> What do you mean by outsource the rest?

iamleppert means: Identify your company's core competency and do that in house, but outsource or avoid that which is not your core.

For example, we're making a game. Gameplay, art, and tech is all done in-house and not with remote contractors because it needs to be -- it's the part of the product we love and the part our players will end up loving. Email, forums, chat, HR, applicant tracking systems, and git hosting are outside of our core and best handled by others.


But if you're making a game, and create your own chat service instead, you end up as Slack.


We in fact started with an HTML5 group chat application very much like Slack but ended up making a game.


Define "handled by others".

Installing an Exchange server is arguably letting email be "handled by others" because you are not responsible for how it works, just the setup and monitoring, which could be handled by in-house staff or by a contractor.

My point is that "focus on the core competency" doesn't have to mean "make our company reliant on a dozen other SaaS businesses who may go offline or change their business model/functionality on a whim"


There are reasons other than scale for containerizing your application. Continuous deployments become a bit easier, for example. Reliability, especially during rolling updates, is improved. It's easier to build a zero downtime app that is in continuous deployment with a containerized stack than it is with more traditional instances.

Regarding the outsourcing, that's what we're shooting for at Giant Swarm. We've written a stack that runs containers and manages the metal underneath for you. We run the solution as a shared public cluster at giantswarm.io, but can also do private hosted deployments or managed on-prem deployments. It's a complete solution for running containers that feels like a PaaS, but without all the opinionated crap associated with a PaaS.

We're basically offering to be your little devops team that could - with containers.


I do agree with you, but we're on Heroku and that also has its downsides - the biggest one we've found is that CPU performance on 1x and 2x dynos is very unpredictable - we've had to move to Performance dynos ($500/month each!) to get decent performance and prevent random timeouts.

Services like Cloud66 are interesting (they manage deployments onto your own EC2 or other cloud infrastructure), but the developer experience doesn't quite match Heroku yet.

Heroku really needs some more competition...


Azure App Service (https://azure.microsoft.com/en-us/services/app-service/) is Azure's Heroku competitor. It's quite close in features and ease-of-use. Disclosure: I work for this team.


That looks quite nice in the few minutes I spent looking at it. The lack of Ruby support is an issue though, and we're also all on OSX and prefer CLI interfaces for most stuff.

Also, one of the killer features of Heroku (which few services seem to replicate) is log drains - I can easily add an HTTP or syslog endpoint and have Heroku send the logs over. The other killer feature which isn't often replicated is one-off dynos, where we can spin up a new instance and get a console attached to it in one command - useful for running database migrations or using Ruby as a CLI to access data.
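
For reference, both are one-liners with the Heroku toolbelt (app name and drain endpoint are made up):

    heroku drains:add syslog://logs.example.com:514 -a myapp
    heroku run rails console -a myapp    # one-off dyno with an attached console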

If we were on .NET that would probably be attractive, but it's still not really competing with Heroku.


you summed up the reasons I use Heroku myself, gz.


I think the reason is that the tooling and the companies (CoreOS, Joyent, Weave, etc.) building all of the tools are only focusing on grabbing Fortune 500 customers. Nobody is building Docker tools for the "Blue Collar Apps" of the world. And those companies might be completely justified, because the benefits of Docker versus Amazon AMIs/RPMs/DEBs/etc. aren't big enough to make us go crazy for Docker, switch everything over, and fork over cash to these companies.

If I have less than 50 (maybe even 100) EC2 instances for my applications there is no way in hell I am going to run 3 service discovery instances, a few container scheduler instances and so on and so forth.


Disclaimer: I'm the CTO at Joyent.

For whatever it's worth, we completely agree with the sentiment (and I like your "blue collar apps" term) -- and we deliberately have designed Triton[1] for ease of use by virtualizing the notion of a Docker host. I think that the direction you are pointing to (namely, ease of management for very small deployments) is one that the industry needs to pay close attention to; the history of technology is littered with the corpses of overcomplicated systems that failed because they could not scale down to simpler use cases!

[1] https://www.joyent.com/blog/triton-docker-and-the-best-of-al...


> Fortune 500 customers

Fortune 500 technology customers. Fortune 500 companies who have hundreds of millions and decades of work invested in their infrastructure generally aren't going to jump on whatever the latest infrastructure trend is.


> If I have less than 50 (maybe even 100) EC2 instances for my applications there is no way in hell I am going to run 3 service discovery instances, a few container scheduler instances and so on and so forth.

You could go "old school" and have some (virtual) servers do more than one thing :)


I think Simon nailed it with the points under this heading "Reliance on edgy kernel features"

Officially Docker is only supported on RHEL 7 and up, and most systems I've seen are still on RHEL6.

I think it's just a matter of time before Docker goes into production. Where I'm working we're seriously looking at "Dockerizing" lots of things, but OS support keeps popping up as a blocker.


This, a thousand times. I'm on CentOS 6 and have switched to docker for some services that don't easily run on that OS. However, I found out the hard way that docker isn't supported on CentOS 6, by having some crashes when building containers (due to bugs in devicemapper). Painful.

I really wish RH had found the time to fix RHEL 6 and support docker.

RHEL 7/CentOS 7 is a big step for many. RHEL 6 isn't even near EOL and many people (including myself) wanted to get more mileage out of CentOS 6.


Most systems I see run debian or a debian based distro(such as Ubuntu). By comparison to RHEL, they are all running with "edgy kernel features".


Some of the points mentioned in the article are in my top hitlist (for decidedly smaller production infrastructure than Shopify): Image building, Logging, Secrets, and Filesystems.

But really, the most painful aspect of using Docker in production, at least in environments where you need multiple physical servers (or VMs) is overall orchestration of the containers, and networking between them.

Things are much better today than they were a year (or 6 months!) ago... but these are two parts of Docker configuration that take the longest to get right.

For orchestration: there are currently at least a dozen different ways to manage containers on multiple servers, and a few seem to be gaining more steam, but it feels much like the JS frameworks era, where there's a new orchestration tool every week: flynn, deis, coreos, mesos, serf, fleet, kubernetes, atomic, machine/swarm/compose, openstack, etc. How does one keep up with all these? Not to mention all the other tooling in software like Ansible, Chef, etc.

For networking: if you're running all your containers on one VM (as most developers do), it's not a big deal. But if you need containers on multiple servers, you not only have to deal with the servers' configuration, provisioning, and networking, but also the containers inside, and getting them to play nicely through the servers' networks. It's akin to running multiple VMs on one physical machine, but without using tools like VMWare or VirtualBox to manage the networking aspects.

Networking is challenging, but at least we have a lot of experience with VMs, which are conceptually similar. Orchestration may take more time to nail down and standardize.


On these points, what are you comparing Docker to? Using bridged networking, you have pretty much the exact same situation with Docker as without. And just because you're running processes with Docker doesn't mean you have to keep up with dozens of tools for automatic clustering and whatnot. If you want to just start your processes manually, or with Ansible or Makefiles or whatever, you can do that.


Agreed -- "Logging, Secrets, and Filesystems" would be interesting problems to solve in any architecture. Docker might shine a clearer light on any inconsistencies -- but wouldn't make these any easier or harder.


On the logging front, there are some options now: syslog, GELF and Fluentd.
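
For example, assuming a recent Docker (1.7+ for --log-opt) and a reachable syslog endpoint (the address here is made up):

    docker run -d \
      --log-driver=syslog \
      --log-opt syslog-address=udp://logs.internal:514 \
      myapp:latest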


TL;DR It's too damn complicated if you're not Google/Twitter/Netflix. Most people would be fine just deploying OS packages and keeping their stacks as simple as possible.


I still love Docker and do think it solves a genuine problem. But yes, where to put your logs, how to manage state, how to schedule containers on machines, how to coordinate processes, how to inspect an app when something goes wrong, how to measure performance, how to manage security, how to keep consistency across your docker containers... are all problems you need to solve from the get go with Docker and they are all non-trivial! Ain't nobody got time for that.


I'm not sure how much you know about docker, so to anyone in whom this list scares:

> Where to put logs

Well, I just throw them aside and use `docker logs [container]`

> How to manage state

One container should perform one service. I haven't run into a problem here.

> How to schedule containers

ECS :) But honestly, I subscribe to the approach that containers = services and thus should just always be running.

> How to inspect app

`docker exec -it [ container id ] bash` ("ssh" into container)

`docker logs`

`docker logs -f` (follow logs)

> How to measure performance

Probably same way you measure system performance

> How to manage security

Everything of mine is in a VPN; some services can talk to certain services over certain ports... Personally, I don't really understand all this talk about security. Protect your systems and that should protect your containers. Why is it that isolated processes are causing people to throw up their arms like security is unimaginable in such a world? There are ways...

> Consistency across docker containers

This can be a pain if you need this, yea. They seem to be adding better & better support to allow containers to talk to one another (and ONLY to one another).

> Ain't nobody got time for that.

Hmm, personally I don't have time to go through what Puppet, Chef, and even Ansible require to get your systems coordinated. I see this as far more work than creating a system specification within a file and finding a way to run it on some system.

All comes down to requirements though and where your technical stack currently is at. To any newcomers who are also plowing into the uncertain fields of a dockerized stack, fear not! You are in good company and if I can make it work, you can too.


If this is your advice then you shouldn't give advice.

1) 'docker logs' relies on using the json logdriver which means the log file is stored in /var/lib/docker/..... and grows forever. No rollover. No trimming. FOREVER.

2) What if your container dies? What if your host dies? Do you have any state at all or have you abstracted that out? Are your systems distributed?

3) Always running does not answer finding where to run them

4) That only works if the container is running. What if it died? Also, docker logs is a fool's game

5) bingo, that's right at least

6) ....


1 - Yes it does. This is a system problem more than a docker problem. For any relatively experienced engineer, they should be capable of realizing the logs must be stored somewhere and can plan around it.

2 - If a system dies and it has a state, then what do you do? If a dockerized process dies, and it has a state, then what do you do? This isn't some new problem to Docker. If my database service dies, you know what happens? It starts back up and connects to the persistent volume. Personally speaking, yes all of my services / systems are distributed.

3 - Most people don't need to start their services exactly at this point and then stop at another certain point (which is why I pretty much brushed over it). If they do, there's plenty of tools to do this that can also utilize docker.

4 - What if a system died? Does this mean SSH'ing in isn't a viable option? (yes...)

5 - Yes, you love negativity so clearly this is your favorite

6 - ...? What? Do you have something more to say?

It's cute that you like to poke holes and personally attack people, but really my comment was just how I go about things on a day-to-day basis. This is coming from someone who has 6 major Docker services abstracted out running all the time across 3 environments.. all capable of being updated via a `git push`. I think I have decent, practical advice to offer for other docker-minded practitioners and just decent advice to newcomers.

Your grievances circle around logs not being centralized or easily accessible (1, 2, 4). You also don't outline any solutions yourself.


> For any relatively experienced engineer, they should be capable of realizing the logs must be stored somewhere and can plan around it.

Yes, the unrotated container logs are kept in a root-accessible-only location in a directory named after a long key that changes on every image restart - not conducive to manual log inspection, and definitely not conducive to centralised logging. That's not a 'system problem', it's Docker just being rude. Yes, a relatively experienced engineer can work around that... but why should they need to 'work around' it in the first place?

Ironic, really, that if you put a user in the 'docker' group, they can do anything they want with the docker process, destroying as much data as they like or spinning up containers like nobody's business... but they can't see the container logfiles.


Good point, thank you!


> 1) 'docker logs' relies on using the json logdriver which means the log file is stored in /var/lib/docker/..... and grows forever. No rollover. No trimming. FOREVER.

Even without that issue, I'd prefer my logs to be centralised. So as well as my app should I be running a logging daemon, process monitoring, etc for each docker instance?


What we do at work is that we have our containers be in charge of talking 'out' on a given address and format for logs, and have things configured so that entire sets of machines end up speaking with the same log server (an ELK stack, in our case). The process monitoring is done per host: There are docker-aware tools that look at the host, and can peer into the container, to do this basic tracking.

People are not kidding, though, when they say that everything gets very complicated. All the things that we did by convention and manual configuration in regular VMs that are babysat manually have to be codified and automated.

Docker is going to be a great ecosystem in 3 years, when the entire ecosystem matures. Today, it's the wild west, and you should only venture forth if having a big team of programmers and system administrators dedicated just to work on automation doesn't seem like a big deal.


Similar to hibikir's reply, what we do is attach a volume container to all app containers, and logs are written to that. Then we run an ELK stack to view and parse the logs. For process monitoring we run cAdvisor on each host to view the resource usage of each container. Since your apps are containerized it's easy to monitor them for resource usage, hook them into Nagios, etc. We have built a custom GUI to do all this.
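
For anyone wanting to try it, running cAdvisor per host is roughly the following (adapted from its README; adjust mounts and version to taste):

    docker run -d --name=cadvisor -p 8080:8080 \
      -v /:/rootfs:ro \
      -v /var/run:/var/run:rw \
      -v /sys:/sys:ro \
      -v /var/lib/docker/:/var/lib/docker:ro \
      google/cadvisor:latest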


> If this is your advice then you shouldn't give advice.

Stop with the blaming statements.


Do you have a better way to say "your advice is bad. stop spreading misinformation"?


"There are some problems with your approach, and it could bite you in the tail when you least expect it. Here's what you should also know: ... (rest of GP's points)"

This avoids saying "You're an idiot", which is nearly never constructive or helpful, and instead makes education and cooperation its goal. Most people respond better to that.


Sometimes people need to know they're idiots without the person calling them an idiot having to tip-toe around the issue. Just like how some children are smarter than others and could actually benefit from being labeled as such, put in a separate environment and given advanced problems to work on instead of lumping them with the rest of the herd. Just like fat people should be told they're fat and pay a premium for a seat on an airplane. Hell, if you look fat they should put you on a goddamn cattle scale and make you pay extra.


Without fully knowing what a person is thinking (and I'm talking in whole here), and the context they are thinking it in, it is impossible to always determine if what someone said is "idiotic" or not, much less be capable of determining if they themselves are an idiot. Given this provable fact, that means there exists a possibility someone ends up calling a savant an idiot for saying something that is viewed by them, or a group as a whole, as "idiotic".


And I'd wager to say that by far the more likely possibility is that the person is actually an idiot. Just given that savants are an extremely small subset of the general population and, if the bell curve is anything to go by, at least 50% of all people are below average intelligence (whether inherently stupid or uneducated doesn't matter, from a pragmatic point of view the result is the same)


Geez, this is still going on. You apparently have no problem calling people an idiot and seem to be defending it, so let me show you the same courtesy. People like you are what's wrong with the HN community. You're the type that when I post something sarcastic, I get downvoted and corrected. Why? It's this kind of superiority complex that can't see the forest from the trees to save its life. The forest is burning? Yes, but we passed brown 3 trees which means it shouldn't be. Are you sure it wasn't 2? No, definitely 3. I don't believe you, let's debate it. Burned to death. I would be careful in forests if I were you. (<-- I doubt you recognize what this tone of voice is)

Who the fuck cares? Why are you debating if you should call someone an idiot or not? My personal philosophy is to be nice to others, always because I don't know what they're going through. What's there to be gained by not only calling someone an idiot but defending it on the meta scale? Yes we should always cloak our rebuttals with negativity -- for what other way could there be?! I must call this person an idiot, don't you see?! 50% of people are below average intelligence -- surely I must let them be aware of the fact that I believe they're mundane and forgettable!

I don't care if other people think poorly of me, I'm going to believe in myself... you critics are so annoying. I can't even write a comment trying to help people without jackasses flying in poking holes in what I said AS IF it were gospel! It's a comment! I wrote it in 2 seconds and, sure, maybe I should have put some more time into it but I was just trying to help out anyone who got scared by that list. It's such a different mindset. I didn't set out to be RIGHT, which is what's most holy & sacred around these parts. The best thing that could have happened is some people would have been like, yea but how is X going to solve Y when Z happens? And I woulda been like, good question mate, blah blah and we woulda all been better off.

Instead, a kid comes flying in drunk on keyboard ego and is like "You should stop talking"; think about MY intention vs HIS intention. Think about the INTENTION behind calling someone an idiot and what it does to that person. So stupid.. honestly. There's a bigger picture at play than being right or wrong... You don't do certain things not because it's empirically correct to do it, but because it's the moral thing to do or the mature thing to do or the compassionate thing to do.


There's no need to say that at all. Just address the points and let everyone form their own opinions.


Because opinions are not the same as facts/best practices (nor is everyone's opinion equal).

If you want to argue "this is the right way", be prepared to bring data and defend your statements.

People here might be making critical decisions based on knowledge shared here, and they deserve the most accurate information possible.


I agree, and would enjoy a thoughtful discussion as to why my stack doesn't work and why I'm an idiot. Instead, it's just been countered by 5 bland rebuttals and a debate about whether or not I am an idiot and if I should be referred to as such.

Not sure what you're implying either, but you can tell in my comment, I never came out of the gates saying "This is the best way!"


Fully agree.

That's why you should talk to the points and not make personal statements like "you shouldn't give advice".


In the world of Agile, opinions seem to be treated as fact.


Yeah, I do. Simply say what you find inaccurate about what they said and let everyone else come to their own conclusions. Saying "stop spreading misinformation" is basically speaking for others. Maybe we want to hear the misinformation because it's interesting or maybe you are wrong about it being informational or not. Either way, it's my choice to listen to him/her or not.


Well I used Docker on top of Mesos so I have quite a bit of experience and the above were all problems I faced. They're not impossible problems obviously but they take time and thought to solve. From your responses, I am not sure you fully appreciate the problem to be honest.

You have to remember that your containers are coming and going all the time, which is one of the biggest challenges. It basically means you have to have everything centralised, and that means a lot of additional infrastructure/complexity.


Wouldn't it be fair to say these are all problems faced with any / all deployments? When are logs never going to be a problem or security or state? I appreciate the problems (honestly!), but when it's presented as docker-specific, I get confused. Yes.. all these things need to be managed.. at the end of the day you're changing your stack from running 5 systems to running 5 processes acting like 5 systems. This is going to take some thought, but I genuinely believe there's a greater reward at the end of the tunnel in this realm than there is the old world of puppet master / slave.


The problems with "docker logs" are that, without getting logs out of Docker, you can't see multiple containers' logs interleaved, and without a separate logrotate setup they're not rotated (the files in /var/lib/docker/containers/ grow indefinitely).
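
The usual workaround is a logrotate rule on the host, something like this sketch (assumes the default json-file driver; copytruncate avoids having to restart the daemon):

    sudo tee /etc/logrotate.d/docker-containers <<'EOF'
    /var/lib/docker/containers/*/*.log {
        daily
        rotate 7
        compress
        missingok
        copytruncate
    }
    EOF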


If this is what it takes to get procrastinators onto a real logging stack I'm not sure I see it as a problem.


As far as logs go, you should just be doing network logging with Docker to either a syslog server or something like LogStash or Splunk. If you are big enough to have a genuine need for containers, you also are big enough for a centralized logging system.

At the end of the day, you have to view it as building a reliable system that performs a function. Docker is one tool you can use to do that. Virtual machines are another tool. They don't solve all the problems you describe, nor are they intended to. If you're a tiny startup, you can just go the AWS route, but that leaves you beholden to AWS and their pricing. That's fine early on, but eventually you'll want to go full-stack for one reason or another.


There are a few projects out now that do most of this for you; there's a lot of rapid innovation in higher-level Docker tools, e.g. https://github.com/remind101/empire (built on top of EC2/ECS). You get a 12-factor-compatible PaaS out of it pretty easily.


The funny part is that Docker was supposed to be higher level.


It is? It's one more step up the chain towards the ultimate goal: being able to run M isolated instances of N different apps automatically distributed across Y physical hosts (and being able to deploy app A without caring about any of this). We're almost there.


I have a feeling this is not just a problem with Docker. People tend to choose technologies not because they solve their problem, but because it's hip to be using the newest stuff, even if it's far too big and complicated for their simple usecase.


In this regard I think the remarks in McKinley's "Choose Boring Technology" [1,2] are quite relevant.

[1]: http://mcfunley.com/choose-boring-technology-slides [2]: http://mcfunley.com/choose-boring-technology


Big thanks for these links. This is really true for Docker and DevOps. There are proven concepts and known unknowns, but with Docker the unknown-unknowns part is really scary, especially regarding security in production. Maybe for bigger companies this is no problem, but for small dev teams it is very risky and time consuming.

Just one non trivial example: I can secure Ubuntu against sshd attacks pretty good and easy with `sudo apt-get install fail2ban`. Now try to secure CoreOS against sshd attacks. There are guys out there who tried to run fail2ban in a container (without luck) and so far I've only found one hacky script which tries to do the same oO https://github.com/ianblenke/coreos-vagrant-kitchen-sink/blo...


CoreOS is not docker. You can run docker on ubuntu and install fail2ban on it. I don't see the problem here.


Yes, exactly. I really can't recommend that my company use this right now because of the complexity. The Hello World image is easy, but after that, there's quite a bit involved. And my company is already happy enough with spinning up VMs.


That's what you got from the article? I got the opposite. It works fine until you start to have large, complex images where build time becomes a factor and the fundamental design of Docker starts to get in the way (e.g. how it manages diffing of images, the lack of caching, and the inability to build different parts of the image in parallel). These shouldn't be as big of a deal at smaller scale.

That's not to say you're wrong; containers probably aren't that useful to most small shops. But that summary doesn't make any sense for this article.


A number of issues discussed in the article would be factors regardless of scale: logging, secrets, edgy kernel features, security.

Also, see https://titanous.com/posts/docker-insecurity


Same here. AMIs that pull updates directly from git when they start. Docker as part of the build and deploy process for upgrades is gonna be too costly anyway, because setting up the first configuration takes time, moving the containers takes time, etc.


More accurate TL;DR: Docker and containerized architectures generally would be improved by solving this list of problems.


Is it too complicated because people already have complicated setups? Or just in general?


I'm pretty disillusioned with docker so far. Haven't put in too much time with it, but the little time I have put into it has produced nothing of value. I'm surprised we're still talking about it to be honest, with so much progress in unikernels like OSv, HalVM, Mirage, and Ling.

In the two hours I've spent with OSv, I've gotten much lighter weight VMs that boot my large scala app extremely quickly (a few seconds, max), with less configuration and more predictable performance.


Just my personal opinion but Docker still reflects the developer-centric culture that inspired it and by that I mean security is still getting more mature but isn't quite there yet.

For instance there's still work being done to add native PAM and by extension Kerberos support, and the daemon runs as root, thus requiring extra caution about who may run docker commands.

If you're (for example) in an enterprise where developers may never have root access under any circumstances, you end up with a chicken and egg scenario: if developers don't have the ability to test container creation (because doing so might grant them root access in a container), who does?


This is why it is not yet adopted by DevOps and IT 'in the wild'.

In summary from a person in that scenario:

1. Not known of and too short of time horizon - People still run Windows XP in the real world. Changes where the rubber meets the road (IT and DevOps) take years of hard evidence, infrastructure cost, justifications, etc. to catch on. It does not behove these groups to be an early adopter.

2. Not flexible enough yet - I have a ton of use for this if I could run it more like a VM but faster and easier to deploy. I devop with a product that uses its own kernel... I tried to talk Dev into compiling a kernel with Docker for a use case I have - you can guess where that went.

Docker is great, but I can only use it with my devs in its current state and for myself in specific cases.


"However, for many production users today, the pros do not outweigh the cons. Docker has done fantastically well at making containers appeal to developers for development, testing and CI environments—however, it has yet to disrupt production."

I keep hearing about people putting Docker in dev and test environments and not production. This use case makes no sense to me as you would throw away the entire point of containers and have a wildly inconsistent path to production.


Not necessarily. If you're in a microservice architecture, it can be very attractive to "docker run" all the services you're developing against on your development VM.

Relying on Puppet (as with prod) means development VM setup/change time is measured in hours. My company's Puppet catalog takes 15 minutes to compile, 6 hours to run. Entire days of developer productivity are lost trying to get development VMs working. Docker would make that instantaneous. It's also very hard to manage and synchronize data (i.e. test fixtures) across all those services. With Docker you could have a consistent set of data in the actual images and revert to it at will.
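
Concretely, the dev-VM setup ends up being a handful of one-liners like these (images, tags and ports are just examples):

    docker run -d --name pg -p 5432:5432 postgres:9.4
    docker run -d --name cache -p 6379:6379 redis:2.8
    docker run -d --name search -p 9200:9200 elasticsearch:1.6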


Does anyone have any good links to using Docker for development?

Even a simple `npm install` in a docker container fails on Windows because of the lack of support for symlinks (adding --no-bin-links means npm's run scripts can't be used to their full and useful extent).


We are looking at using Docker to help make our dev environments a little bit more sane. We deploy to Heroku, so we use the Heroku docker image to give us (what is hopefully) as close to a production environment in dev as possible. I have got Docker working in a proof-of-concept project, but the performance on OSX when using volumes (so you can actually dev on it) is pretty terrible.


I work in a place which, in order to solve the dependency nightmares, had some highly paid people do magic tricks manually in order to save the day... every day. Every upgrade was hell. Yes, we have a Director of Upgrades.

Simple tools (rpm + yum + docker) allowed us to replace these people with a simple shell script. Literally.

I agree with the article that Docker is missing some things. Two that I would like to see:

- Auto cleanup

- Clean and easy proxying


Hey, cool, trough of disillusionment!

That means we're like a year away from it being boring and just working, right?


I still haven't been sold on Docker. Why would an otherwise competent company that runs things just fine, ditch it all and adopt Docker? Just because it's the shiny new thing? What do we actually gain here in production?


Because "running things just fine" across production, continuous integration and on dev machines is actually quite a hard thing to do.

But then, if you don't feel like you need it, that's probably because you don't need it.

(If people are downvoting your question, it's probably because you're giving off a bit of a "I don't understand Docker so it must be crap" vibe, which is not helpful.)


OK, now we're getting somewhere. What is difficult about getting things right across production and CI? What are the pain points? What are the exact problems we're being asked to solve here? I don't think dev environments need to be harmonized the same as production. If your tests are good, you should catch most of the "it worked on my laptop" problems.

Sorry if my initial question came across with a weird vibe. I'm generally curious. I have colleagues working at places and they actually are being asked to drop everything and implement Docker. I asked why and what's driving this and got the predictable response of "management/dev/someone wants something new".


Most "it worked on my laptop" problems get caught somewhere between commit and push to production. What docker helps with is catching them "early" rather than "later". Because a failed push emergency push with reason "Doesn't work in production environment" as a reason is a really stressful way to work.

Catching it before pushing your changes is far more preferable.

There are a lot of different methods and processes to fix this. Docker is a new one that simplifies a number of the pieces of the puzzle by constraining the environment in useful ways.

However, if you already have a process worked out and aren't experiencing pain then you probably don't need to switch for the sake of switching.


> I don't think dev environments need to be harmonized the same as production. If your tests are good, you should catch most of the "it worked on my laptop" problems.

That's true until something breaks in production: then you want to replicate the same situation in the dev environment, as closely as possible.


Please take note of the companies that are using docker.

Docker isn't magical, but the process that it lends itself to can be very useful. Those companies aren't using docker to be successful. They are successful because of the processes (and intensity) that docker fits into.

I'm sorry that your colleagues are being asked to drop everything and look at anything (much less Docker). That's not a nice way to work -- and I'm sure it influences their notions of Docker.



containers provide a reasonable level of abstraction/isolation for applications, and have been used in production for some years now. Docker may be shiny, but containers not so much.


But why the need for abstraction and isolation? If I'm a reasonably well run web shop that knows how to run their apps and balance out server load, what does Docker get me?

Also, shout out to the fanboys for downvoting my question, which was just a question asking for thoughts and answers and didn't make any statement whatsoever.


@mateuszf can you give actual examples of each? I can upgrade nginx on my servers without affecting memcache or MySQL or redis or... Why the need for isolation there?


Docker is a tool to empower developers to better manage the applications that we currently leave to people who, and I'm trying to be charitable here, seem to think knowing how to type and be rigid in their opinions is some kind of qualification for managing systems.

It has literally nothing to do with the ability to upgrade independent pieces on the sysadmin's schedule and everything to do with abstracting sysadmins clean out of the process. The entire profession has established itself as a roadblock to progress, so like good engineers do, we're busy coding the problem away.

Basically.


Isolation - so you can run / snapshot / restore multiple applications on one machine without them influencing each other. Abstraction - so that your code doesn't have to worry about different OSes, cloud providers, etc.


I thought the obvious reason was storage. I don't see it mentioned here, but storage is a huge pain point. How do you store your critical data "with Docker" is a labyrinthine set of steps.

Docker's answer to storage so far has been "don't use Docker". That's their answer. Use volumes to map some other storage, but then you have to have some way of mapping storage to containers outside of Docker. Now you're really stuck.

Containers are awesome, but unless your product doesn't do work, you'll need to store data at some point. And that's when the magic stops.
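
To be concrete, the volume workaround looks like this (the official postgres image and host path are just examples), and then keeping track of which host path belongs to which container is entirely your problem:

    docker run -d --name db \
      -v /srv/pgdata:/var/lib/postgresql/data \
      postgres:9.4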


This tutorial shows how to deploy a micro docker container with WordPress. Each microcontainer has its own instance of Nginx and PHP-FPM. An Nginx server as a proxy sits on the front end serving connections to one or more sites hosted in the containers. It uses Alpine Linux and persists all data on the host's file system. The logging is also available on the host system. The benefit of doing it this way is that each site sits in its own container, so if it is compromised, no harm comes to any other site or services running on the host system.

It also does not link containers, instead opting to attach the database to the first IP address of the network Docker sets up, thereby avoiding the need for complicated service discovery. It also includes instructions on how to deploy Redis on the same box and use that with WordPress. Also includes instructions on how to do SSL for each site. It's being used in production.

http://www.dockerwordpress.com/


For a forum like this, it should go without saying that many of these problems are really opportunities for successful businesses.

Containers are only going to grow in uptake; companies like Weave and ClusterHQ have a very bright future if they can solve real pain points like the ones in this article.


Maybe because Docker isn't really needed ?

I mean if your app needs the entire fucking OS to provide isolation from other apps, then you are clearly doing it wrong.


Don't underestimate the amount of people doing things wrong.

Docker could be much more successful in the Windows world; the ability to package very precise versions of databases, libraries, and weird obsolete applications into one image that can be deployed easily would be extremely helpful in many companies. It would be the wrong solution, but an easy work-around for broken upgrade paths.


As a Windows admin, this sounds like a recipe for disaster. My networks are already getting crowded with Appliances and Appliance VMs that are provided to us as black boxes, and we have to depend on a 3rd party for security patches and a durable security implementation.

Having containers able to package weird obsolete (unpatched) applications, specific (out-of-date) versions of libraries, and poorly-written homespun code is a recipe for exploits. The out-of-date version of the library (e.g. Java 7) likely has exploits out in the wild that have been patched in more recent versions. The weird obsolete application (e.g. DTS) likely not only has exploits patched in the active codepath, but has multiple bugs and integration issues. The homespun code likely reimplements something done better in another application or library, and introduces more bugs and vulnerabilities to the network.

Sorry for going off on this, but being able to repackage unsupportable applications would be a nightmare in places I've worked before.


Docker is called App-V when it runs on Windows instead of Linux. App-V virtualizes the registry and provides app portability.

Unfortunately full App-V is only for Windows enterprise customers.


I think it's more that other people developing other apps don't want to have to worry about your app impacting them when colocated on the same server.


eh, this is part of hiring the cheapest people to save the bottom line. it creates problems where there should be none.


The security question (it's possible to break out of containers) isn't solved, and the workaround (use VMs) eliminates many of the advantages of containers and adds a massive burden.


> it's possible to break out of containers

Prove it. I'm not saying it's impossible, but it's certainly not trivial.

Also, take a look at what Joyent are doing with Triton.


I was going to mention Triton/SDC. It does solve the security issues though it does it by running SmartOS. SDC is pretty cool but docker really needs to be secure in its own right.

It is also worth mentioning that since Joyent has implemented their own docker client, not all features are there yet. Last time I tried docker-compose didn't really work right yet. There is a full list of divergences on their github page. It has a lot of potential though.


> since Joyent has implemented their own docker client

Not our own docker client, our own Docker engine, https://github.com/joyent/sdc-docker , which was necessary for the whole DC to be the host. For a taste of the details see https://www.joyent.com/developers/videos/bryan-cantrill-virt... .

Your larger point is correct, we're still working hard every day to increase support, particularly for the newer Docker APIs and extensions. Now, docker-compose 1.2 is working in the production datacenters, with docker-compose 1.3 in the east-3b (beta) dc.

https://www.joyent.com/developers/triton-faq

https://github.com/joyent/sdc-docker/blob/master/docs/api/di...


I know that 'appeal to authority' is a bad argument technique but the Docker authors themselves have mentioned this repeatedly in the past. IIRC one of the holes was sysfs. Has something changed?


It's more that containers haven't been proven secure than that they are inherently insecure, so I disagree with the blanket statement "it's possible to break out of containers".

VMs have the advantage of shielding the kernel with a hypervisor, but they also have the disadvantage of lots of complicated driver code that can allow exploits such as VENOM.


For multitenant situations, sure. Still more isolation than running a bunch of services on the same box.


what is the benefit to any isolation of a process for a single tenant ? and why cant you just run cgroups without the overhead of docker ?

see : bocker


>what is the benefit to any isolation of a process for a single tenant?

Build, test, and ship the same artifact. Whether it's a Vagrant on your Mac, AWS, or metal in your colo datacenter.

>and why cant you just run cgroups without the overhead of docker ?

If you're running cgroups, you've created your own half-baked implementation of Docker in giving yourself a reasonable API to work with. This might make sense if you're Google but otherwise probably not.


cgroups is docker now ? what does that make systemd-nspawn ?


Docker is a simplified interface for controlling cgroups, yes. (Some people are working on/using alternative backends now, but that was the whole point at the beginning - a nice API for cgroups.)


Where's the evidence that Docker isn't succeeding widely in production? In the past week alone I've talked to a dozen or so companies who are all using it in production.


I agree with many of the points expressed, and as someone who has used docker in production I have run into many of these issues myself. At the same time, I value composability and I don't want docker to have a single monolithic approach to everything. Garbage collecting old images, fine, even though its not that hard to deal with the issue. Logging and distribution of secrets don't feel like docker-level concerns to me. There are good solutions for both.


I run a tiny startup, and honestly don't see a benefit to using docker.

Every service I deploy gets its own VM (which is automatically provisioned/locked-down by a bash script), and they automatically update when a new revision is pushed to our production git branch.

It seems that docker is more useful when you have physical hardware? and/or lots of under-utilized infrastructure?


Yes, and even when you run your own hardware, it's still far easier to just KVM up your virts and "bash bootstrap.sh". For your developers, tell them to "vagrant up" with the same "bootstrap.sh". This setup and a Jenkins server for build artifacts solves all my devops needs.


I like Linux containers, but Docker's image layering system and imperative Dockerfiles have got to go. A lot of pain points can be fixed by using declarative, functional package management and not relying on COW file system hacks to sort-of deduplicate files amongst many containers.


I wrote about my experience with deploying Docker & ECS here: https://news.ycombinator.com/item?id=9759639

I'm frustrated though because I keep pinging them about adding branch information to their (dockerhub) webhooks so I can actually deploy environments via branches.. It's crazy vital in my opinion and seems like it should be an easy fix, but 2 months later and still doesn't seem to be scheduled in.

Nevertheless, I'm sure Docker has its technical shortcomings but really, I wouldn't say it's not succeeding.. it's just young. Adoption takes time.


Just up front: I'm one of the folks developing Empire.

That said, what we do is we have our CI system build our docker images, push them to dockerhub (private registry) if the tests all go great, and then we deploy using https://github.com/remind101/deploy. We also tag all our images with the git SHA that they were created from, so we have immutable identifiers for each image, which has been useful.
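
In case it helps, the tagging pattern is roughly this (registry and app names here are placeholders):

    SHA=$(git rev-parse HEAD)
    docker build -t registry.example.com/myapp:"$SHA" .
    docker push registry.example.com/myapp:"$SHA"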

We just recently put direct github deployment support in Empire, so that's been really nice (before we had to use another service that pulled deployments and put them into Empire).

Anyway, not quite the workflow you're talking about, but it's really worked well for us, so maybe it'd help you as well :)


I have seen similar issues with other packages, where their popularity has outstripped the core team's ability to incorporate feedback. Basically the core team can't scale the feature set fast enough to meet demand. On the plus side, over time they get to things; on the minus side, if someone executes better they can sometimes take away the momentum/lead from the original package.


It's not succeeding because it's usually not necessary.


From my experience it is still buggy. For example, this bug:

https://forums.docker.com/t/docker-export-intermediate-size-...

No one seems to know anything about it.

Also, when we upgraded from 1.6.3 to 1.7, devicemapper started having issues.

On top of the bugs, the limited networking support is very, well, limiting.

I would be very hesitant about using it in production at the moment. That said, I can also see the potentials and it seems to be heading in the right direction. It's just not ready at this moment.


Yes indeed. I'll trust Docker in production when I can go 6 months without hitting any weird bugs like that, or having to wipe out `/var/lib/docker` and restart the daemon to get it out of some inexplicable state.


> Configuration management software like Chef and Puppet is widespread, but feel too heavy handed for image building. I bet such systems will be phased out of existence in their current form within the next decade with containers.

Hmm, I don't think so. My reason is that, in addition to the maturation and feature growth of containers, there will also be feature growth in Puppet et al.


Docker is part of the first generation of a good idea - containerization. The problem is that there's too much stuff in the box. Each application doesn't need its very own copy of everything. You get portability at the expense of maintaining your own distro. There will be a second generation of this, hopefully not so bulky.


Just to oppose the pessimism here - we use docker in production and so far it has worked pretty well. We do run into the problems mentioned in article but they aren't insurmountable. We also built a tool around cluster/deploy management just like we had to do with chef.

IMO any tool that does procedural run-time configuration like chef/ansible/puppet will generally be inferior to an image based infra management solution. (unless you're using said tools to build images - which is another ball of wax that will likely end up looking like a reimplemented docker)

The problem with procedural run-time config is that unless you blow away the VM, build from scratch, and run a test suite you don't really have good assurances your infrastructure is in a good state. With images, you have a bit for bit copy of what was built and tested in CI or QA. This is, for us, worth the price of admission.


Docker for reproducible science is an intermediate solution. While a Docker image can be moved and rerun (with some luck), the content of a Docker image is actually not transparent.

Reproducibility implies being able to regenerate the full container including software version control and visibility of the full dependency chain all the way down to BLAS and glibc! You can't do that by using apt, rpm, Perl CPAN, rubygems, Python pip and the like. None of these package managers have been designed for true isolation of packages and full reproducibility. That is why today people go with Docker. The shortcomings of these package managers drive people to Docker.

The technology for regenerating exact Docker containers exists in the form of GNU Guix and/or Nix packages. The fun fact is that when using GNU Guix, Docker itself is no longer required.

Watch GNU Guix.


Am I the only person noticing that these problems only exist because you're using containers, and that maybe by not using a container model you can simplify everything except running 10 different versions of an app at once? Maybe containers provide more headaches than they solve.


The security point here is something that confuses me about the current state of the ecosystem.

The article mentions that "most vendors still run containers in virtual machines", presumably since if someone hacks an app in a container they might be able to break out of the container and access other apps running on that host. But clustering systems like Kubernetes, CoreOS, AWS Container Service, etc. seem to be all the rage these days and they seem fundamentally at odds with this. The cluster might schedule multiple containers on the same host at which point somebody who hacks one can hack all of them.

How do you reconcile this? Do people running these clusters in production typically run tiers of separate clusters based on how sensitive the data they have access to is?


Once you've automated the entire process of bringing up a container cluster with monitoring, metrics, logging, etc then it becomes trivial to make as many as you want. The same is done with separate virtual networks for security concerns.

It becomes as simple as asking what name the cluster should have.

It also makes sense for managing resource concerns to some extent, such as a cluster of cheap instances for low-priority applications that still need HA support, or a cluster of beefy instances in a subnet with fewer hops for edge-tier applications.


Using Docker as a local development system (especially with boot2docker on OSX), creating a bunch of containers with different major OS versions (el6/el7 for example), and being able to develop/test multiple apps at the same time is the only benefit I can justify for Docker.

But that's as far as I will take it, Docker is mainly used (from what I've seen) as a nice way to package something without having to write an actual package (RPM/deb) that will work across multiple platforms (for the most part). If you take the time to learn how to properly package your application, docker is unnecessary in almost every case.
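
For example, a passable deb can often be produced with fpm (names, version, dependency and paths here are made up):

    fpm -s dir -t deb -n myapp -v 1.2.3 \
      --prefix /opt/myapp -C build \
      -d 'openjdk-7-jre-headless' .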


My application compiles to a jar that runs on a server and expects an accompanying config file. I've tried giving Docker a whirl a few times and I never fully understood what need I had that it was solving.


Is your application just a jar? If so, your conclusion is correct.

Throw in a database, a cache server, couple of versioned libraries your jar file needs, and more developers, and suddenly a reproducible image with all this packaged will make a lot of sense.


Our build process already includes the application's dependencies inside the jar, and packages it into an installable deb file that places the app, its config, and the init file in the right locations.

I have a database and a cache server. They don't run on the same server as the application jar... they run on separate machines tuned to their purpose. Why would I want them packaged together? So my team doesn't have to run "apt-get install postgresql" on their dev machines? Or to maintain an exactly consistent dev environment?


In that instance, docker doesn't buy you much over vagrant. Docker stands to win where you've got 15 different apps which all need to come up together so you can QA the combined set of services in a single VM on a tester's desktop.


I think containers are the future, but I think this generation is still a bit early for widespread use. Once they get polished a bit and are made easier to use, then I think companies will begin adoption.


the comments nailed it here too ;d

I'll highlight that the website is off by one... reading the website I have no idea how it works and what technical debt I'm adding to my team's stack by using it. "Build/Run/Ship": I'm doing that already. I have no idea if it's using VMs or something else for containers, no idea if my hardware works on it, and no idea if the distros used for images are 1 year old or -nightly, so whose security issues am I inheriting?


So far my experience with Docker has been quite exciting, but I have yet to find a really good use case for it.

Also, moving around a 700MB+ image when you can deploy some Debian package (or even set up a virtualenv, I do mostly Python) sounds like a waste of resources. Add to that that moving volumes around is still an issue and... well, Docker has a lot of potential, but it doesn't fit very well in any of the projects that I'm involved in.


Docker security issues[0] should really be listed as #1. The overhead/complexity of getting it secure (using SELinux) outweighs its benefits in production.

[0] http://blog.valbonne-consulting.com/2015/04/14/as-a-goat-im-...


I don't think it's a good choice if security is a main concern. I wouldn't put multi-tenant apps on it either.

