At the end of that essay I go into detail about a recent Docker/AWS/ECR/ECS system I worked on, much of which seemed like a whole lot of useless work that slowed the project and offered no benefits.
So I'm wondering, what is the biggest project success that people can point to regarding Docker? In the above post I link to an article where a developer talks about coming in under budget on a project where they used microservices and Clojure. Where are similar articles regarding Docker? I'd like to read an article where an individual software developer, preferably someone who already has a good reputation with the tech scene, writes an essay where the theme is "We thought this was going to take 6 months, but once we switched to Docker we were able to get everything done in 3 months." I find it stunning that there are basically no such articles. Meanwhile, it is very, very easy to link to all of the articles where developers have written about the problems they had when trying to use Docker or Kubernetes.
The flexibility and modularity is a big plus to me.
What’s common among them is that they all run on different technologies. So we have 500 different ways to do front-ends, middleware, back-ends, and databases; literally thousands of combinations.
We have 5 IT-operations staff to run and maintain it. They’re very good, but each is very good at specific things: one guy knows networking, another knows server infrastructure and user management, but only for Microsoft products, because that’s our backbone.
But we’d like to run a JBoss server on Red Hat, and we’d like to run some Django and Flask servers, and we’d like to have a PostgreSQL and a MongoDB for GIS, a MySQL for some of the cheap open source stuff, and an MSSQL for most things. And so on.
Guess what makes that easy: Docker for Windows. Our IT staff can’t support all those technologies, but they can deploy and maintain containers and monitor them. Sure, the contents of the containers are up to the developers, but it’s infinitely easier to operate than having to support the infrastructure to do it without Docker.
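To illustrate the kind of mixed stack described above, a hypothetical Compose file might look something like this (all service names, images, and credentials are made up for illustration):

```yaml
# Hypothetical docker-compose.yml: heterogeneous services,
# one deployment workflow for the ops team.
version: "3.8"
services:
  gis-db:
    image: postgis/postgis:15-3.4    # PostgreSQL with the PostGIS extension
    environment:
      POSTGRES_PASSWORD: example
    volumes:
      - gis-data:/var/lib/postgresql/data
  document-db:
    image: mongo:7
    volumes:
      - mongo-data:/data/db
  legacy-db:
    image: mysql:8
    environment:
      MYSQL_ROOT_PASSWORD: example
  django-app:
    build: ./django-app              # the app's own Dockerfile lives with its code
    depends_on:
      - gis-db
volumes:
  gis-data:
  mongo-data:
```

The point is that ops only needs to understand `docker compose up`, volumes, and monitoring, not the installation story for each individual technology.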
The same is less true in Azure or AWS, where your cloud provider can play a role similar to containers with web apps and serverless, but it’s still nice to keep things in containers in case you need to vacate the cloud for whatever reason.
But, counterpoint - I've seen Docker being a gateway to that sort of mess, too. It tears down a lot of the barriers that might have otherwise made it inconvenient to add more technologies to the stack, and some developers just can't help themselves in that department. One fairly new commercial offering I evaluated, which is distributed as a Docker Compose stack, was using 4 different databases. That's just crazy - now their dev team needs to understand the behavior and idiosyncrasies of 4 different data storage technologies, and their clients will end up needing to deal with blowback from all that excess complexity, too.
You can use Postgres for GIS too, via the PostGIS extension
Traditionally, most content we productized was delivered to end-users in the form of an RPM. This is important because it ensures we deliver content in a reliable and reproducible manner. The RPM works well because it supports multiple architectures, handles dependencies, etc.
Enter containers. To ship these products on our container platform, OpenShift, we built Docker images whose Dockerfiles basically just had some common stuff and then installed the RPMs we already publish, to install the particular application we intend to deliver as a container. Nothing super crazy, no huge wins here; just a bit of extra work, actually.
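As I understand the pattern, those Dockerfiles were roughly of this shape (registry, base image, and package names are placeholders, not the actual internal ones):

```dockerfile
# Hypothetical Dockerfile: layer an already-published RPM onto a common base.
FROM registry.example.com/rhel-base:latest

# Common setup shared across all product images
RUN yum install -y common-tooling && yum clean all

# Install the application from the RPM we already publish
RUN yum install -y my-product && yum clean all

USER 1001
CMD ["/usr/bin/my-product"]
```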
However, I now work on a new project that is specific to Kubernetes and OpenShift, and our platform is oriented around running containers. On this project, we have some really huge Java applications that I need to package up as containers. Traditionally, that meant I also had to build an RPM. Let me tell you, packaging huge Java applications (like HDFS) as RPMs isn't exactly fun, especially when they themselves have tons of dependencies which you also need to package as RPMs. When I looked at this task, I estimated a few months of work. We have tools to import things into our systems, but it still requires both packaging and building a bunch of individual RPMs for each dependency.
After some convincing and arguing, we adopted a new pattern which allows us to bypass building any RPMs in some cases where the application is not going to be delivered outside of the container platform. This meant we could directly build Docker images, which ended up making the actual process a lot easier. We still need to import the sources and things into our systems, but we can avoid the potentially hundreds of RPMs that we would have needed to make before, and build a single Docker image which just downloads the final artifacts we've built internally. This process took a few weeks compared to the months I was originally anticipating.
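A sketch of what "just download the final artifacts" can look like in a Dockerfile (the registry, URL, and artifact names here are invented placeholders, not the actual internal build system):

```dockerfile
# Hypothetical Dockerfile: skip the RPM step and pull the final artifact
# straight from the internal build system.
FROM registry.example.com/java-base:latest

WORKDIR /opt/app

# Fetch and unpack the artifact that was already built and imported internally
RUN curl -fsSL https://builds.example.com/artifacts/hdfs-bundle.tar.gz \
    | tar -xz

CMD ["/opt/app/bin/start.sh"]
```

One image, one download step, instead of hundreds of per-dependency RPMs.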
In the end, I had to build a Docker image anyway, but the value came from being able to almost completely avoid building content in a system that provided me no direct value and, in my scenario, effectively only added more steps. I still use that system for other stuff and love it; it works really well and does provide value. But in this use case it was more about using the existing process because it's what we have, not because it's what we needed, which is why things changed.
I work on a project that recently switched from AMIs/deployed packages to Docker. It took us about 3 weeks to convert our Jenkins pipeline and a dozen Java services (and some odds and ends) to Docker images.
A huge advantage over baked AMIs is that you can run Docker images locally far more easily. We can run a full local environment with a docker-compose file.
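That local environment amounts to something like the following (service names and images are illustrative, not this project's actual setup):

```yaml
# Hypothetical docker-compose.yml for a full local dev environment:
# the same images the Jenkins pipeline builds, runnable on a laptop.
version: "3.8"
services:
  orders-service:                      # placeholder Java service
    image: example/orders-service:local
    ports:
      - "8080:8080"
    depends_on:
      - db
  billing-service:                     # placeholder Java service
    image: example/billing-service:local
    depends_on:
      - db
  db:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: local
```

`docker compose up` then brings up the whole stack locally, which a baked AMI can't match without running a local VM per service.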
Honestly, once it clicked, it was almost too easy to blog about. Is this something people want to read?
Docker is more like standard measurements and lumber, allowing one group to build the rooms, and another group to build the plumbing, roof, etc and shove it all together.
It can take you longer to go the container route, at least until you're all very practiced at this new method. But once you get the hang of it, changing the house becomes cheaper, easier, faster, and more reliable.
From a developer's standpoint, Docker can help speed development by keeping all developers on the same page. If everyone develops and tests using one container, there's less chance for certain kinds of bugs when integrating. You can do the same with, say, Linux packages, but Docker alleviates the need to learn different distros' packaging formats, and solves common dependency and isolation problems.
Less time trying to fix your builds due to incompatibility, almost no need to use configuration management tools. This saves you time and complexity, at the cost of the complexity of Docker itself.
The big problem is that technical decisions like the use of layers, single-process containers, and ephemeral storage, which add a lot of complexity and overhead, have not faced real technical scrutiny and thus are not well understood. They should be, because they can add a lot of management overhead and debt.
Many users of LXC will realize that a lot of the benefits and flexibility actually flow from containers themselves, not from these 'modifications'.
Disclaimer: Founder at Flockport and trying to build a simpler alternative with LXC containers.
It's also great to use Docker to run test databases and other tools locally. For example, we have a script that takes a DDL file, spins up a fresh Postgres in an ephemeral Docker container, then runs SchemaCrawler on the database from yet another container to generate a useful entity-relationship diagram. The tool is portable and repeatable and carries no risk of affecting a production system.
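A sketch of what such a script can look like. This is my reconstruction of the pattern, not the author's actual script: container names, the wait strategy, and the SchemaCrawler flags are illustrative and worth checking against the SchemaCrawler docs.

```sh
#!/usr/bin/env sh
# Sketch: spin up a throwaway Postgres, load a DDL file, and generate
# an ER diagram with SchemaCrawler, all in disposable containers.
set -e

DDL_FILE="$1"

# 1. Start an ephemeral Postgres (container is removed when stopped)
docker run --rm -d --name erd-pg \
  -e POSTGRES_PASSWORD=pw -p 5432:5432 postgres:15
sleep 5  # crude wait; looping on pg_isready is more robust

# 2. Apply the schema
docker exec -i erd-pg psql -U postgres < "$DDL_FILE"

# 3. Generate the entity-relationship diagram from another container
#    (flags per the SchemaCrawler CLI; verify against its documentation)
docker run --rm --network host -v "$PWD:/out" schemacrawler/schemacrawler \
  --server postgres --host localhost --user postgres --password pw \
  --database postgres --info-level standard \
  --command schema --output-format png --output-file /out/erd.png

# 4. Tear down; --rm deletes the container and its data
docker stop erd-pg
```

Nothing touches a shared or production database, and the whole run is repeatable from just the DDL file.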
1. When some native dependency change is needed, you have to make at least 3 changes: the actual code update, the packer script, and your Jenkins worker. Unless you empower all devs to change this, you have a bottleneck. And it only really works as long as you can have both versions of deps installed concurrently.
2. It's hard to really run tests in a production-like environment. Spawning new instances is a long and annoying step; spawning a Docker container is trivial in comparison.
3. It enables devs to run meaningful tests without needing the exact same OS on their desktops as what's running on the servers.
Docker has other issues, but for these use cases it really solves problems.
I'll consider writing this up as said blog post, but here's the gist.
In my view, the advantages of Docker containers are roughly as follows:
- At a high level, Docker allows thinking of our services as opaque units in a way I haven't really seen in practice anywhere else. Our containers have CPU and memory requirements, but other than that, when thinking about infrastructure, nothing else matters.
- Docker allows all knowledge of how to run software and its dependencies to live with the corresponding code. While I understand your reasons, I don't agree with the idea that "regular programmers" should not be trusted with devops code. The advantage here is that we can write our application code and its requirements once, and run it in lots of places. In all my years of trying systems like Puppet, Chef, etc, I've never seen this work quite as well.
- With Kubernetes (or ECS, or Docker Swarm, etc), all our infrastructure instances are identical. They are agnostic to the workload they will eventually run, and likewise the containers don't "know" anything about the machines on which they run, except that they are guaranteed to have the resources that container needs. If we really need to do something special for performance, like run a CPU intensive container on a C5 AWS instance, and another on an R5 instance to optimize for memory, Kubernetes or ECS would let us do that without much trouble. This makes managing our fleets of servers super easy.
- As consultants on a codebase we run with many clients, the level of abstraction with Docker lets us avoid most of the hard work of adapting to a client's infrastructure. We have clients using Kubernetes on AWS and bare metal, using just Docker Compose (no Swarm) on virtualized and bare metal machines, and using ECS. Basically, we tell them: "give us something that can run Docker containers, and integration will be a breeze".
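The instance targeting mentioned in the list above can be expressed declaratively in Kubernetes. A sketch assuming an EKS-style cluster where nodes carry the standard instance-type label (names, replica counts, and resource numbers are illustrative):

```yaml
# Hypothetical Deployment fragment: pin a CPU-heavy workload to
# compute-optimized nodes via the well-known instance-type node label.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-heavy-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cpu-heavy-service
  template:
    metadata:
      labels:
        app: cpu-heavy-service
    spec:
      nodeSelector:
        node.kubernetes.io/instance-type: c5.2xlarge
      containers:
        - name: app
          image: example/cpu-heavy-service:latest
          resources:
            requests:
              cpu: "2"
              memory: 2Gi
            limits:
              cpu: "4"
              memory: 4Gi
```

A memory-bound service would use the same pattern with an R5 instance type; the containers themselves stay agnostic to the machines underneath.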
Conversely, here are some times I _wouldn't_ recommend Docker:
- You really have hard performance requirements beyond "make sure it runs on the right EC2 instance type". If you are in HFT, sure, avoid Docker.
- You have an existing, working, relatively stable infrastructure running on a sane modern-ish setup like Packer/Terraform, and everyone on the team knows how it works. This is especially true if you mostly have a monolithic codebase, with few other services.
- This is more for Kubernetes, but databases with their own clustering/coordination are a bit scary to run on Kubernetes, at least to me. It's getting better, but I feel like the probability of two "auto-healing" clustering mechanisms fighting each other is too high. We currently run Elasticsearch on bare EC2 instances in our most critical setups for that reason. Someday I hope this is no longer required.
Finally, I'll say that Docker itself is nothing special, and I actually don't expect we'll still be using Docker in 10 years' time. The implementation itself has a lot of quirks, the idea of a Docker service running as root on all machines is very scary (but is already going away), and the limitation of Docker requiring root to build images is also concerning (and is also going away). But the idea of containers, or something similar, as the "unit" of managing workloads is really great.
What about networking and storage? From my experience managing storage with Docker (or Kubernetes) applications is a hard problem.
Like I said, our primary service with storage is Elasticsearch, which we still manage with good old Terraform and bash scripts.