What's in a Docker image? (cameronlonsdale.com)
171 points by clonsdale 6 months ago | 33 comments

I've been building some fairly sophisticated systems with and without Docker, and it is always easier to build such systems without Docker. See here:


At the end of that essay I go into detail about a recent Docker/AWS/ECR/ECS system I worked on, much of which seemed like a whole lot of useless work that slowed the project and offered no benefits.

So I'm wondering, what is the biggest project success that people can point to regarding Docker? In the above post I link to an article where a developer talks about coming in under budget on a project where they used microservices and Clojure. Where are similar articles regarding Docker? I'd like to read an article where an individual software developer, preferably someone who already has a good reputation in the tech scene, writes an essay whose theme is "We thought this was going to take 6 months, but once we switched to Docker we were able to get everything done in 3 months." I find it stunning that there are basically no such articles. Meanwhile, it is very, very easy to link to all of the articles where developers have written about the problems they had when trying to use Docker or Kubernetes.

Docker isn’t supposed to make development faster, it’s supposed to give a standard way to package code and dependencies so that a small infra team can support a much larger dev team developing a large number of services with heterogeneous language and library versions.

Exactly this. I’m the DevOps lead at my current company and my life would be hell if I had to manage not only remote machines but also local dev machines and all the package versions, dependencies, etc, that can differ. Docker adds some overhead and some headaches, sure, but once we got our CI/CD pipeline running with Docker life got about ten times easier because we weren’t spending hours in a Slack channel trying to chase down a package that someone forgot to upgrade or compiled with the wrong modules.

I agree too. I kicked off an application server project for a business unit in a large company using Docker. Aside from the core image of my application server, I used docker-compose to orchestrate nginx and Redis instances to support it. That setup remains available to a small team of developers. From the first production release to date, corp DevOps build the core app server image in their pipeline and deploy it with a different load balancer and a compatible key-value store. It worked in that configuration the first time, and there's never been an issue.
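A setup like that can be sketched with a minimal docker-compose file. To be clear, this is a generic sketch, not the commenter's actual config; the service names, versions and ports are all assumptions:

```yaml
version: "3"
services:
  app:
    build: .            # the core application server image
    ports:
      - "8000:8000"
    depends_on:
      - redis
  redis:
    image: redis:5-alpine
  nginx:
    image: nginx:1.15
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
    depends_on:
      - app
```

The point of the pattern is that only the `app` image needs to be built in the corporate pipeline; the nginx and Redis services can be swapped for a different load balancer and key-value store without touching the application image.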

The flexibility and modularity is a big plus to me.

I disagree. The main reason Docker is superior to Packer and other VM-based systems is speed: less time waiting for builds, less time starting your servers (including during development), less time setting up a dev environment, and so on.

I work in the public sector. We operate more than 500 systems at the municipality where I work. Some are major, some are not. Some have support completely handled by the providers, some do not. Some are built in house, some are bought from local software companies and developed specifically for us, and others are standard software.

What’s common among them is that they run on different technology.

So we have 500 different ways to do front-ends, middleware, back-ends and databases. That's literally thousands of combinations.

We have 5 IT-operations staff to run and maintain it. They’re very good, but they are very good at specific things. Like one guy knows networking, another knows server infrastructure and user management, but only for Microsoft products because that’s our backbone.

But we’d like to run a JBoss server on Red Hat, and we’d like to run some Django and Flask servers, and we’d like to have a PostgreSQL and a MongoDB for GIS, a MySQL for some of the cheap open source stuff and an MSSQL for most things. And so on.

Guess what makes that easy: Docker for Windows. Our IT staff can't support all those technologies, but they can deploy, maintain and monitor containers. Sure, the content of the containers is up to the developers, but it's infinitely easier to operate than having to support the infrastructure to do it without Docker.

The same is less true in Azure or AWS, where your cloud provider can act similarly to containers with web apps and serverless, but it's still nice to keep things in containers in case you need to vacate the cloud for whatever reason.

Once you get to that point, absolutely, I can see the value.

But, counterpoint - I've seen Docker being a gateway to that sort of mess, too. It tears down a lot of the barriers that might have otherwise made it inconvenient to add more technologies to the stack, and some developers just can't help themselves in that department. One fairly new commercial offering I evaluated, which is distributed as a Docker-compose stack, was using 4 different databases. That's just crazy - now their dev team needs to understand the behavior and idiosyncrasies of 4 different data storage technologies, and their clients will end up needing to deal with blowback from all that excess complexity, too.

Or go multicloud, or avoid lock-in. Containers are cloud neutral. While admittedly some of the App Service/Engine equivalents require fairly thin configuration, docker images are pretty directly portable.

> we’d like to have a PostgreSQL and a MongoDB for GIS...

You can use Postgres (with the PostGIS extension) for GIS too.

I'm not reputable, but my employer, Red Hat, is, and I have a great example for you.

Traditionally, most content we productized was delivered to end users in the form of an RPM. This is important because it ensures we deliver content in a reliable and reproducible manner. The RPM works well because it supports multiple architectures, handles dependencies, etc.

Enter containers. To ship these products on our container platform, OpenShift, we built Docker images, and their Dockerfiles basically just had some common setup and then installed the RPMs we already publish for the particular application we intend to deliver as a container. Nothing super crazy, no huge wins here, just a bit of extra work actually.
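That "wrap the existing RPM" pattern boils down to a Dockerfile of roughly this shape. The base image and package names below are invented for illustration, not Red Hat's real build files:

```dockerfile
# Hypothetical sketch of the "install our own RPMs" pattern.
# Start from the supported base, then install the already-published RPM.
FROM registry.access.redhat.com/rhel7
RUN yum install -y our-productized-app && \
    yum clean all
EXPOSE 8080
CMD ["our-productized-app", "--serve"]
```

The appeal is that all the real packaging work (dependencies, architectures, reproducibility) stays in the RPM pipeline; the image layer is a thin delivery wrapper. The drawback, as the comment notes, is that you pay for both systems.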

However, I work on a new project that is specific to Kubernetes and Openshift, and our platform is oriented around running containers. On this project, we have some really huge Java applications that I need to package up as a container. However, traditionally that meant I had to also build an RPM. Let me tell you, packaging huge Java applications (like HDFS) as an RPM isn't exactly fun, especially when they themselves have tons of dependencies which you need to also package as an RPM. When I looked at this task, I was estimating a few months of work. We have tools to import things into our systems, but it still requires both packaging, and building a bunch of individual RPMs for each dependency.

After some convincing and arguing, we adopted a new pattern which allows us to bypass building any RPMs in cases where the application is not going to be delivered outside the container platform. This meant we could directly build Docker images, which ended up making the actual process a lot easier. We still need to import the sources and things into our systems, but we can avoid the potentially hundreds of RPMs that we would have needed to make before, and build a single Docker image which just downloads the final artifacts we've built internally. This process took a few weeks compared to the months I was originally anticipating.

In the end, I had to build a Docker image anyway, but the value came from being able to almost completely avoid building content in a system that provided me no direct value and, in my scenario, effectively only added more steps. I still use that system for other stuff and love it; it works really well and does provide value. But in this use case, it was more about using an existing process because it's what we have, not because it's what we needed, which is why things changed.

I don't really understand your complaints at all. You simultaneously talk about how much work it is, but your main anecdotal complaint was that it makes changing things too easy. You don't really argue against Docker so much as talk about liking Packer, which is fine and all, but it's not very compelling.

I work on a project that recently switched from AMIs/deployed packages to Docker. It took us about 3 weeks to convert our Jenkins pipeline and a dozen Java services (and some odds and ends) to Docker images.

A huge advantage over baked AMIs is that you can run Docker images locally with ease, which AMIs can't match. We can run a full local environment with a docker-compose file.

Honestly, once it clicked, it was almost too easy to blog about. Is this something people want to read?

You can build a house out of random pieces of scrap wood, but it probably won't hold up well over time, or during a hurricane, etc.

Docker is more like standard measurements and lumber, allowing one group to build the rooms, and another group to build the plumbing, roof, etc and shove it all together.

It can take you longer to go the latter route, at least until you're all very practiced at this new method. But once you get the hang of it, changing the house becomes cheaper, easier, faster and more reliable.

From a developer's standpoint, Docker can help speed development by keeping all developers on the same page. If everyone develops and tests using one container, there's less chance for certain kinds of bugs when integrating. You can do the same with, say, Linux packages, but Docker alleviates the need to learn different distros' packaging formats, and solves common dependency and isolation problems.

Less time trying to fix your builds due to incompatibility, almost no need to use configuration management tools. This saves you time and complexity, at the cost of the complexity of Docker itself.

Layers are interesting, but their benefits are vastly overstated; you can achieve the same without layers, and without the overhead and complexity they add to image management. Piling layers upon layers to build a container is a really bad idea and complicates basic image management. [1]

The big problem is that technical decisions like the use of layers, single-process containers and ephemeral storage, which add a lot of complexity and overhead, have not received technical scrutiny and are thus not well understood. But they should be, as they can add a lot of management overhead and debt. [2]

Many users who use LXC will realize a lot of benefits and flexibility actually flow from containers and not these 'modifications'.

Disclaimer: Founder at Flockport and trying to build a simpler alternative with LXC containers.

[1] https://www.flockport.com/guides/understanding-layers

[2] https://www.flockport.com/guides/say-yes-to-containers.html

Your blog posts are interesting, thank you for that. I have questions though: Since you are using both Terraform and Packer from HashiCorp, do you happen to use Nomad too? If you do, what is your experience with it? Can you compare Nomad vs. Kubernetes?

I think Docker/containers make perfect sense in that they make portable software you can ship around. The catch is: you have to really understand how the sausage is made to reason about building a proper container. Most people simply don't really know what they are doing: they bash keys until it works. So the challenge of building new container images usually means more work for developers. For someone who is really proficient, this barely affects day-to-day productivity; it just makes life easier when they go to distribute their app (or hand it off to other teams).

Wholeheartedly agree. There are so many powerful development and debugging tools that are easy to configure in a standard IDE outside of a container environment. Docker is an excellent packaging and deployment tool. During more than 3 years of use, I've rarely encountered runtime issues caused by differences between my host and the container environment (they occur periodically when using libs for things like image processing). So long as you have logging and other service monitoring configured so you never have to deal with SSH and attaching to containers, you seldom have to think about Docker and you have a system ready to deploy in a variety of environments.

It's also great to use Docker to run test databases and other tools locally. For example, we have a script that takes a DDL file, spins up a fresh Postgres in an ephemeral Docker container, then runs SchemaCrawler on the database from yet another container to generate a useful entity-relationship diagram. The tool is portable and repeatable and carries no risk of affecting a production system.
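A script like the one described might look roughly as follows. This is a sketch, not the commenter's actual tooling: the container names, versions and SchemaCrawler flags are assumptions, and it needs a running Docker daemon:

```shell
#!/bin/sh
# Spin up an ephemeral Postgres, load a DDL file, and have SchemaCrawler
# (run from its own container) draw an entity-relationship diagram.
set -e
DDL=${1:?usage: erd.sh schema.sql}

# Throwaway Postgres; --rm removes the container (and its data) when stopped
docker run --rm -d --name erd-pg -e POSTGRES_PASSWORD=pg -p 5432:5432 postgres:11
sleep 5  # crude wait for Postgres to accept connections

# Load the schema into the fresh database
docker exec -i erd-pg psql -U postgres < "$DDL"

# Generate the diagram from a second container
docker run --rm --network host -v "$PWD":/out schemacrawler/schemacrawler \
  /opt/schemacrawler/bin/schemacrawler.sh \
  --server=postgresql --host=localhost --user=postgres --password=pg \
  --info-level=standard --command=schema \
  --output-format=png --output-file=/out/erd.png

docker stop erd-pg  # everything vanishes; no production system was touched
```

Because both the database and the tool run in throwaway containers, the script is portable across developer machines and leaves nothing behind.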

I'm happy using docker. What I think you're missing is dependency changes and general ease of running CI. (this may not apply to projects which have long-term-stable base and just add lots of code)

1. When some native dependency change is needed, you have to make at least 3 changes: the actual code update, the packer script, and your Jenkins worker. Unless you empower all devs to change this, you have a bottleneck. And it only really works as long as you can have both versions of deps installed concurrently.

2. It's hard to really run tests in a production-like environment. Spawning new instances is a long and annoying step; spawning a Docker container is trivial in comparison.

3. It enables devs to run meaningful tests without having the exact same OS on their desktop as what's running on the servers.

Docker has other issues, but for these use cases it really solves problems.

Docker can save tons of time by letting you bundle up a copy of your staging environment/backend onto an image that mobile developers can run a copy of on their laptop without having to know all of the custom CFLAGS you used to install your wonky dependencies.

I quite like Docker for compiling Linux apps to run on multiple distributions. See eg. https://github.com/mherrmann/fbs/releases/tag/v0.4.6.
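A common way to do this is to build inside a container of the oldest distribution you want to support, so the binary links against an old enough glibc to run everywhere newer. A sketch, with made-up project details:

```dockerfile
# Build in an older Ubuntu so the resulting binary runs on anything newer.
FROM ubuntu:16.04
RUN apt-get update && \
    apt-get install -y build-essential && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /src
COPY . .
RUN make release   # hypothetical build target producing ./dist/
```

Mounting your source tree and copying the artifacts out (e.g. `docker build` plus `docker cp`, or a bind mount) gives you a repeatable multi-distro release build without maintaining a fleet of build VMs.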

Containers belong in an invisible subsystem that makes higher-level services like FaaS available to application developers.

I've been using Docker with very positive results in consulting contracts for several companies on very different hardware. The same codebase is used for hosted services that we run ourselves on Kubernetes running on AWS.

I'll consider writing this up as said blog post, but here's the gist.

In my view, the advantages of Docker containers are roughly as follows:

- At a high level, Docker allows thinking of our services as opaque units in a way I haven't really seen in practice anywhere else. Our containers have CPU and memory requirements, but other than that, when thinking about infrastructure, nothing else matters.

- Docker allows all knowledge of how to run software and its dependencies to live with the corresponding code. While I understand your reasons, I don't agree with the idea that "regular programmers" should not be trusted with devops code. The advantage here is that we can write our application code and its requirements once, and run it in lots of places. In all my years of trying systems like Puppet, Chef, etc, I've never seen this work quite as well.

- With Kubernetes (or ECS, or Docker Swarm, etc), all our infrastructure instances are identical. They are agnostic to the workload they will eventually run, and likewise the containers don't "know" anything about the machines on which they run, except that they are guaranteed to have the resources that container needs. If we really need to do something special for performance, like run a CPU intensive container on a C5 AWS instance, and another on an R5 instance to optimize for memory, Kubernetes or ECS would let us do that without much trouble. This makes managing our fleets of servers super easy.

- As consultants on a codebase we run with many clients, the level of abstraction with Docker lets us avoid most of the hard work of adapting to a client's infrastructure. We have clients using Kubernetes on AWS and bare metal, using just Docker Compose (no Swarm) on virtualized and bare metal machines, and using ECS. Basically, we tell them: "give us something that can run Docker containers, and integration will be a breeze".
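The instance-type scheduling mentioned above (CPU-heavy containers on C5 nodes, memory-heavy ones on R5) can be expressed in Kubernetes with a standard nodeSelector. The label key and image below are assumptions; you would label your nodes however your cluster tooling does:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-heavy-worker
spec:
  nodeSelector:
    # assumes nodes were labeled beforehand, e.g.:
    #   kubectl label node <node-name> instance-family=c5
    instance-family: c5
  containers:
    - name: worker
      image: example/worker:latest   # hypothetical image
      resources:
        requests:
          cpu: "3"
          memory: 2Gi
```

The container itself stays agnostic about where it runs; the placement policy lives entirely in the scheduler configuration.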

Conversely, here are some times I _wouldn't_ recommend Docker:

- You really have hard performance requirements beyond "make sure it runs on the right EC2 instance type". If you are in HFT, sure, avoid Docker.

- You have an existing, working, relatively stable infrastructure running on a sane modern-ish setup like Packer/Terraform, and everyone on the team knows how it works. This is especially true if you mostly have a monolithic codebase, with few other services.

- This is more of a Kubernetes concern, but databases with their own clustering/coordination are a bit scary to run on Kubernetes, at least to me. It's getting better, but I feel like the probability of two "auto-healing" clustering mechanisms fighting each other is too high. We currently run Elasticsearch on bare EC2 instances in our most critical setups for that reason. Someday I hope this is no longer required.

Finally I'll say that Docker itself is nothing special, and I actually don't expect we're still using Docker in 10 years time. The implementation itself has a lot of quirks, the idea of a Docker service running as root on all machines is very scary (but is already going away), and the limitations of Docker requiring root to build images is also concerning (and is also going away). But the idea of containers or something similar as the "unit" of managing workloads is really great.

> - At a high level, Docker allows thinking of our services as opaque units in a way I haven't really seen in practice anywhere else. Our containers have CPU and memory requirements, but other than that, when thinking about infrastructure, nothing else matters.

What about networking and storage? From my experience managing storage with Docker (or Kubernetes) applications is a hard problem.

Definitely, storage is still hard. I don't have a ton of experience doing anything fancy with storage and K8S yet.

Like I said, our primary service with storage is Elasticsearch, which we still manage with good old Terraform and bash scripts.

I hear this quite a bit: avoid Docker for <insert task here>. Containers are just processes. "Avoid using containers the same way you would avoid spawning a process" may be a better way to put it.

Brilliant article. Kind of reminds me of the articles in Dr. Dobb's Journal in the pre-internet days, where they used to dissect various file formats and explain how a file was organized so you could create your own viewer. That was the best way to learn how the original program worked, too.

Thanks! I really like those articles too. With this topic, I found that many posts online just restated information about containers, almost as if they were trying to rewrite the documentation, which didn't seem like a good use of an article. This experimental process is what I went through to learn the content, and I felt it would be helpful for others to go through it too.

Enjoyed that. Demystified docker images for me. Can you do a follow up on how Docker actually runs the images please?

Yes I'd love to read that as well. It would really be the best resource for understanding docker.

A related big thread from the other day: https://news.ycombinator.com/item?id=18528423

Slightly concerned that it's using a version of Ubuntu that's been end-of-life for 35 months.

The official documentation used that as an example, and because I wanted to steal their image, I did that too. Of course, please don't use Ubuntu 15.04 for containers you care about.

The problem with Docker is the Docker installation itself. On legacy systems, you're out of luck.


Can you clarify? I think this is actually a pretty good explanation of what's literally inside of a Docker image.
