Dockerfile Security Best Practices (cloudberry.engineering)
407 points by gbrindisi 46 days ago | 192 comments

I would add one more general security tip. Always restrict your mapped ports to localhost (e.g. bind them to 127.0.0.1) unless you really want to expose them to the world.

I think it's counterintuitive, but I learned the hard way that 3306:3306 will automatically add a rule to open the firewall on linux and make MySQL publicly accessible.

Yep, this was especially deadly a few years back with Redis, because it bound to 0.0.0.0 (all interfaces) by default and, also by default, Redis allows connections without a password.

So if you had 6379:6379 published, anyone on the internet could connect to your Redis container. Oops.
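To make the distinction concrete, here is a minimal Python sketch that parses a simplified `-p` publish spec (`[host_ip:]host_port:container_port[/proto]`) and reports whether the resulting binding is reachable from other machines. The helper names are hypothetical; `default_ip` stands in for the daemon's default bind address, which is 0.0.0.0 unless changed with dockerd's `--ip` option. IPv6 addresses, port ranges and the single-port form are deliberately not handled.

```python
from __future__ import annotations

import ipaddress
from typing import NamedTuple


class PortBinding(NamedTuple):
    host_ip: str
    host_port: str
    container_port: str
    protocol: str


def parse_publish(spec: str, default_ip: str = "0.0.0.0") -> PortBinding:
    """Parse a simplified publish spec: [host_ip:]host_port:container_port[/proto]."""
    protocol = "tcp"
    if "/" in spec:
        spec, protocol = spec.rsplit("/", 1)
    parts = spec.split(":")
    if len(parts) == 3:
        host_ip, host_port, container_port = parts
    elif len(parts) == 2:
        # No address given: fall back to the daemon-wide default (0.0.0.0 unless changed)
        host_ip = default_ip
        host_port, container_port = parts
    else:
        raise ValueError(f"unsupported publish spec: {spec!r}")
    return PortBinding(host_ip, host_port, container_port, protocol)


def is_reachable_from_other_hosts(binding: PortBinding) -> bool:
    # Only loopback addresses keep the published port local to the machine
    return not ipaddress.ip_address(binding.host_ip).is_loopback


print(is_reachable_from_other_hosts(parse_publish("3306:3306")))            # True
print(is_reachable_from_other_hosts(parse_publish("127.0.0.1:3306:3306")))  # False
```

Passing `default_ip="127.0.0.1"` models an admin who has changed the daemon default, in which case a bare `6379:6379` stays local.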

> Always restrict your mapped ports to localhost

It's also worth pointing out that typically you don't even need to publish your database port. If you omit it, containers on the same network can still communicate with each other.

Very rarely do I end up publishing ports in production. If you want to connect to Postgres, you can still `docker exec` into the container on the box it's running on and interact with your DB using the psql client that's already installed in that container.

A while back I wrote a blog post on the difference between exposing and publishing ports at https://nickjanetakis.com/blog/docker-tip-59-difference-betw....

This is why I like to use both a host-based firewall *and* a network-based firewall. For the VPSes that I have running on the internet, I always use the hosting provider's firewall offering in addition to iptables (or ufw).

This is something AWS gets right, and Digital Ocean and Linode both get wrong (they offer no cloud firewall of any sort, to my knowledge).

It should be trivial for me to lock down the ports of my instance, from the VPS web UI. AWS lets me create a new instance which is entirely closed to incoming connections except for port 22, which is closed except for whitelisting my IP address. This gives me good assurances even if my instance is running a vulnerable SSH server. It's also trivial to block outgoing connections, where that's appropriate.

It also means my instance is spared from constant probing, which keeps the logs clear.

> Digital Ocean and Linode both get wrong (they offer no cloud firewall of any sort, to my knowledge)

DO has had a cloud firewall for a while now (a year or 2?) https://www.digitalocean.com/docs/networking/firewalls/. It's free too.

Looks like Linode has one coming soon https://www.linode.com/products/firewall/.

I mainly use Vultr for low cost VPS's and they also have had a firewall service for quite a while too.

Thanks for pointing this out, I just set up the DO firewall on my account.

That's good news, thanks.

> automatically add a rule to open the firewall on linux

There's no way this is true. This completely defeats the purpose of a firewall.

If it is happening, then it's Docker doing this, not the Linux firewall that comes stock with most distributions (iptables and the like). They would never simply add rules to the chains just because something was listening on that port.

Really, the best security advice for using Docker is to not use Docker. Unfortunately, there aren't very many "hold your hand" alternatives available. Aside from LXD and that family of technologies, which are criminally underused.

Docker uses iptables for port forwarding, and those rules typically end up ahead of the rules inserted by your firewall (firewalld/ufw/manual scripts/whatever).

It's not so much that they explicitly open a firewall rule, as that they take a networking path that isn't really covered by traditional firewalls.

Another way of viewing it is that Docker "is" your firewall for your container workloads, and that adding a port-forward is equivalent to adding a firewall rule. Of course, that doesn't change that public-by-default is a bad default.

This is right, I remember now - docker does mangle your iptables chains. I remember fighting with this a while back.

Terrible practice, in my opinion. Docker shouldn't be touching firewall stuff.

I've resorted to adding my own firewall rules to the 'raw' table, which pretty much preempts all the rules Docker or the distribution inserts.

It's not as powerful as the later tables in the chain (see https://upload.wikimedia.org/wikipedia/commons/3/37/Netfilte...), but it's a lot more robust.

Iptables magic is essential to how a lot of container networking stuff is implemented, though.

This is (imho) a huge flaw in the concept of a "container". I don't think most people comprehend how much crap is going on in the background.

For most container purposes, host networking and the default process namespace is absolutely fine, and reduces a lot of problems with interacting with containerized apps. 95% of the use case of containers is effectively just a chroot wrapper. If you need more features, this should be optional. This would also make rootless federated containerized apps just work. But nobody wants to go back to incremental features if Docker gives them everything at once.

If you think that’s bad, wait til you see what the iptables-save output is like on an istio-proxy sidecar ;)

Kubernetes as well. We ran into instances where iptables contention was so bad during outage recovery that things just stalled. iptables-save looked like a bomb went off.

This has been a major pain point for me. Despite my `firewalld` configuration only allowing specific traffic, all my containers were exposed.

My current policy is to set `"iptables": false` in Docker's `daemon.json` on any public machine. I don't understand why this isn't the default.

> My current policy is to set `"iptables": false` in Docker's `daemon.json` on any public machine. I don't understand why this isn't the default.

If you don't muck with iptables then you need a (slow) userspace proxy to expose your pods. That also means losing things like the source IP address for any incoming connections.

Interesting, I haven't noticed any slowdown, but I am running fairly low traffic services.

I do see that RemoteAddr is from a private IP range. Luckily I'm not using this information anywhere, but good to know.

If this is a surprise to you, scan your servers and see what else has helpful behavior which you didn’t expect. Services like shodan.io will even do this regularly and email you when things change.
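A crude version of that check fits in a few lines of Python: try a TCP connection and see whether it succeeds. The function name is hypothetical, and a single connect test is far less thorough than a real scanner. Run it from a machine outside your network, only against hosts you own, for ports such as 3306, 5432 and 6379.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and unreachable hosts all raise OSError subclasses
        return False
```

For example, `is_port_open("203.0.113.10", 6379)` (a documentation-range address standing in for your server) should return False if Redis is correctly kept off the public interface.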

It’s easy to blame Docker but I’ve seen this failure mode happen many times over the years - even experienced admins make mistakes. As always, defense in depth and continuous monitoring trumps raging about software.

I'm a Docker maintainer and have been for years.

The default is terrible. It got into 1.0 and just got stuck there :(

Changing defaults on such widely used software is, unfortunately, hard.

For whatever reason Docker prepends rules. There are things you can do to add your own filtering (Docker forwards to the "DOCKER-USER" chain where you can put your rules), but it requires people to know what is happening in order to use it securely.

It would be really nice to be in a more secure situation by default... open to suggestions and contributions for Docker 21.

Have you considered at least logging to the terminal on container start whenever a port gets exposed to the internet because of a 3306:3306 mapping (i.e. without an explicit IP to bind to)? Part of the issue seems to be that people haven't read the docs, so they don't really understand what that snippet they copied from that helpful blog, you know, does.

I like this idea. I believe we already have a mechanism for warning on container create.

The nice thing is admins can already define a default value that the engine will use to bind to (when no address is specified on -p). A warning could point users to that setting.

I do worry a bit about noise, though.

Perhaps make it easily quieted in the settings, and some kind of backoff between warnings? I definitely agree that it could be too much, on busy development systems especially!

That's definitely a tricky situation to be in since you'll inevitably get someone complaining that an upgrade broke something they depend on.

I like Godel_unicode's suggestion of logging, and that could probably be done in a stronger manner if there were some point (post-install, or maybe when starting a container) where Docker checked the existing rules and showed a more prominent warning when those rules would prevent a container that is reachable now from being reachable in the future. Given how widely Docker is used, I'd assume that's the kind of thing you'd need to warn about for multiple releases before even switching to a more secure default on new installs.

What about a "docker secure" command which updates the configuration to a more secure default?

This also raises the possibility of different security profiles like dev, prod, etc.

A default Docker install would be documented as being for development, and you run "docker secure" to change that for other environments.

I like the general idea, but it ultimately suffers from the same problem: people have to know about it. There actually is a setting to set the default bind address already.

If you can't change the default because of backwards compatibility and inertia, you can at least provide a well-documented, recommended, easy way of fixing the default.

Is there official documentation that tells users to set the default bind address as a best practice?

I wasn't thinking so much of just changing one setting, but rather having a way to easily reconfigure an installation to set multiple settings to improve security.

In addition, elevating this to the status of a command and documenting it as a best practice helps spread awareness.

I don't disagree with any of your points, but they aren't relevant to anything I've said. I never said Docker is the only one doing this. It's also not a surprise Docker in particular is doing this - Docker has a long history of doing bad things.

It is relevant because you said “there's no way this is true” when it is in fact true, which means that your understanding of how the system works doesn't match the actual behaviour. I mentioned the importance of scanning to catch those situations quickly.

It has to do with the implementation of the networking. Inbound traffic hits the DOCKER chain in the nat table (via PREROUTING) before it reaches UFW's chains in the filter table. IIRC, you can get around this by directing UFW to write its rules to the DOCKER-USER chain.

Firewalld is implemented differently and will exhibit the expected blocking behavior: traffic targeting docker ports will encounter firewalld before it encounters docker.

>Aside from LXD and that family of technologies, which are criminally underused.

Criminally underused indeed. I have no idea why it's not more popular for 'average' users/orgs. I don't know what issues may come up with scaling this up, but in our small org we've been running 20-30 (mostly unprivileged) LXD containers in production for several years now for all sorts of intranet and external-facing services (auth, DB, web, etc). Sure, it requires a bit more thought to set up than Docker, but it's well-documented (for most people's uses at least), secure, stable and lightweight.

>I have no idea why it's not more popular for 'average' users/orgs.

Maybe because many devs use Macs / Windows? WSL may tilt the balance in LXD's favour, but on OSX? Run it in a VM yourself, without the convenience of docker-compose up?

As a Linux user myself, I looked at the competing solutions (podman, LXD, and the one by Canonical whose name I forgot) and thought: "Ain't gonna fly in a mixed environment."

They may be technologically superior, but worse is better once again, I guess. I would prefer them to Docker too.

Isn't docker running on OS X in VM anyway?

Yes it does, but it's very transparent, until you run into a few things that make it obvious, e.g. if you hit disk-image size limits or memory limits.

I work with a few people who believed it was native, and used it for quite a while, until things started going wrong and we logged in to the vm to fix them.

Kinda my point. The reason docker is popular is simply that most of the devs are lazy, not very knowledgeable in the actual underlying technology and very short-sighted.

And docker marketing has very effectively used that to their advantage.

Yes it is, but docker for desktop runs it for you. You could use LXD etc by running Linux yourself in a virtualisation solution of your choice like VirtualBox or VMWare.

Yeah, that's my point.

On Linux you are going to run Docker in a VM anyway if you care even a bit about security, but I know that almost all devs run Docker directly on their laptops, with their user given full access to Docker, i.e. their user effectively becomes root. Without a second thought...

> I have no idea why it's not more popular for 'average' users/orgs.

https://news.ycombinator.com/item?id=24782999 tongue in cheek indeed, but better marketing it is, of course

I'd map them to Unix sockets instead, since you can't restrict a TCP/UDP port to a specific user, whereas a socket file has ordinary file permissions.

I like creating a security choke point, like a firewall in a VM serving as a NAT gateway, or actual cloud security groups and network ACLs.

This way you can make all your servers private and manage the firewall in a single access point to the outside world.

Making your servers public to the net by default and without a separate firewall solution is not so advisable in the first place.

People who know enough to consider architectures like this aren't the ones most likely to accidentally expose databases to the internet. It happens, but most often these mistakes are made by people who just don't have the experience to be wary.

I think software like docker have a responsibility to encourage secure-by-default configurations, but unfortunately "easy" is often the default that wins mindshare.

I agree with you, but since Docker is kind of a given, how can one learn the necessary stuff about networking as to not make these mistakes?

I always see best practices like this, but they don't really help in grokking what's happening and why. I'd like to know more about the networking stuff, but whenever I look something up it's very specific, so you don't really learn why it's bad.

How can a regular user understand how the network stack works, at least well enough to get an instinct for why something would be bad?

It's difficult for me to answer how other people should learn these things, since I personally just... tried to figure things out? It's been so long since I found basic networking mystifying that I'm not sure how to explain it to someone who doesn't have the same intuition. If you have something that's very specific, maybe make a guess on how it could be generalized and then test that guess. Try to build a mental model, and test that model.

I don't like using systems that are complete black boxes, so whenever I use something, I try to gain a reasonable understanding of how it works under the hood. If a system claims to make X easy, I want to know at least what is involved in accomplishing that, even if the implementation details aren't relevant knowledge. I don't often need to dig into the nitty-gritty of how the Linux TCP stack works, but even having a broad idea of how the TCP protocol works is pretty useful, and especially how it relates to other networking protocols.

I guess for practical networking, it helps to first focus on IP addressing and routing; ie. how does a packet sent from your computer actually get through all the switches and routers to the destination computer? The short answer is that every node (including your computer) makes a routing decision on where to send the packet, and then it's sent forward. This happens at each "hop" until it appears at the end (or gets dropped by a firewall).

And from this simple logic and some fancy tools to help you make dynamic routing decisions in response to changes in network topology (router went down? update local route information and send the packet to the other router that's still up), you can build the internet in a fault-tolerant manner.
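The per-hop decision described above is usually a longest-prefix match against the routing table. Here is a minimal Python sketch using the standard `ipaddress` module; the routing table and function name are hypothetical, though 172.17.0.0/16 is the default subnet of Docker's `docker0` bridge, which is why traffic for containers goes to that interface rather than out the default gateway.

```python
from __future__ import annotations

import ipaddress


def next_hop(destination: str, routes: dict[str, str]) -> str | None:
    """Return the next hop of the most specific route containing `destination`."""
    addr = ipaddress.ip_address(destination)
    best_net = None
    best_hop = None
    for prefix, hop in routes.items():
        net = ipaddress.ip_network(prefix)
        # Prefer the matching route with the longest prefix (the most specific one)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, hop
    return best_hop


# Hypothetical routing table for a host running Docker
ROUTES = {
    "0.0.0.0/0": "default gateway 192.168.1.1",
    "192.168.1.0/24": "eth0",
    "172.17.0.0/16": "docker0",
}

print(next_hop("172.17.0.2", ROUTES))  # docker0
print(next_hop("8.8.8.8", ROUTES))     # default gateway 192.168.1.1
```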

I guess you are cutting straight to the chase and overlooking the fundamentals. I took a lot from the AWS Well-Architected Framework and applied it to all my projects.


Take a look at the security pillar with extra care. For the cloud, I would suggest you take a basic practitioner exam, or at least a preparation course on a platform like Whizlabs. There you would get a basic understanding of how networking is laid out in the cloud.

For private, on-premises projects, it really comes down to what you have at hand. In that case maybe the Google SRE book would be good: you take good practices for maintaining a data center and apply the distilled knowledge to whatever makes sense for your infrastructure.


Read this book as in topics, not sequentially, coming back to the fundamentals when you feel lost, otherwise you might end up lost in technicalities that make little sense to your work.

Also take a look at the shared responsibility model, which lays out which responsibilities belong to the customer and which to the cloud provider. When you have a private on-premises project, you have to implement the entire responsibility stack that the cloud would otherwise handle for you.

I am not sure why you were downvoted. I agree with that. I prefer technologies that are restrictive by default and more flexible and potentially harmful configurations hidden behind explicit and well structured options.

Either an exception should be raised or a safe default behaviour should be adopted when the example is encountered. I prefer breaking as soon as possible, because the alternative is harder to debug.

Exactly. A public facing ip for a server, especially a database, is just a bad idea. You need something a bit more hardened to route the traffic. And a publicly accessible database is just asking for trouble.

For UFW users, installing this will make docker compatible with the firewall.


> I learned the hard way that 3306:3306 will automatically add a rule to open the firewall on linux and make MySQL publicly accessible.

Is that true? 3306:3306 would bind the port on all interfaces, but I was under the assumption you'd have to explicitly enable firewall port 3306 for the machine to accept traffic to port 3306 from outside of your machine.

I'll have to test that.

From memory and a quick glance at one of my servers:

When an IP packet related to your container arrives (<host ip>:<host port>):

- docker rewrites it to target <your container's ip address>:<your container's port> (NAT table, chain PREROUTING delegates to DOCKER chain with the relevant DNAT entry)

- since the IP does not match your host, the packet is forwarded (there's a relevant routing entry pointing to docker's virtual interface)

- the first thing you encounter in the FORWARD chain of the FILTER table is a few jumps to the docker-related chains, DOCKER chain in particular accepts all packets destined to <your container's ip address>:<your container's port>

So a few takeaways:

- your standard firewall might not be involved because its chains are plugged in after the docker chains in the FORWARD chain (e.g. ufw under Ubuntu)

- if the above is true and you want your firewall to matter, you have to add stuff to DOCKER-USER chain in the FILTER table

- at that point the host port and IP don't matter, since they've already been mapped in the NAT table's PREROUTING chain at the beginning of processing, so write your firewall rules to address specific containers
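The path above can be modelled as a toy first-match rule evaluator. This Python sketch is a deliberate simplification (one DNAT step, chains as plain lists, no conntrack), and every name, address and port in it is hypothetical. It illustrates the two takeaways: a deny rule in ufw's chains never fires because Docker's ACCEPT comes first, and a DOCKER-USER rule only works if it matches the container's address and port, since the host port has already been rewritten.

```python
from __future__ import annotations

from typing import Callable, Optional

# A rule sees the (already rewritten) destination and returns a verdict,
# or None to fall through to the next rule.
Rule = Callable[[str, int], Optional[str]]

HOST_IP = "203.0.113.10"
DNAT = {13306: ("172.17.0.2", 3306)}  # as if published with -p 13306:3306


def evaluate(port: int, forward_chains: list[list[Rule]], input_chain: list[Rule],
             policy: str = "DROP") -> str:
    # nat PREROUTING -> DOCKER: published host ports are rewritten to the container
    dest_ip, dest_port = DNAT.get(port, (HOST_IP, port))
    # Routing decision: traffic no longer addressed to the host is forwarded
    chains = forward_chains if dest_ip != HOST_IP else [input_chain]
    for chain in chains:
        for rule in chain:
            verdict = rule(dest_ip, dest_port)
            if verdict is not None:
                return verdict  # first matching rule wins
    return policy


def docker_accept(ip: str, port: int) -> str | None:
    # Docker's own rule in the DOCKER chain for the published container port
    return "ACCEPT" if (ip, port) == ("172.17.0.2", 3306) else None


def ufw_deny_13306(ip: str, port: int) -> str | None:
    # What a user might naively add via ufw for the host port
    return "DROP" if port == 13306 else None


def user_block_host_port(ip: str, port: int) -> str | None:
    # DOCKER-USER rule written against the host port: never matches after DNAT
    return "DROP" if port == 13306 else None


def user_block_container(ip: str, port: int) -> str | None:
    # DOCKER-USER rule written against the container address and port
    return "DROP" if (ip, port) == ("172.17.0.2", 3306) else None


# FORWARD order modelled here: DOCKER-USER, then DOCKER, then ufw's chains
print(evaluate(13306, [[], [docker_accept], [ufw_deny_13306]], []))                      # ACCEPT
print(evaluate(13306, [[user_block_host_port], [docker_accept], [ufw_deny_13306]], []))  # ACCEPT
print(evaluate(13306, [[user_block_container], [docker_accept], [ufw_deny_13306]], []))  # DROP
```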

Why would you have a database running on a machine with a public ip?

One of the most common mistakes I see is not using a .dockerignore file or, better said, relying on .gitignore when calling `COPY` on entire directories. Without a .dockerignore file in place, you could be copying over your local .env files and other unwanted things into the final image.

On top of that, you might also want to add `.git/` to your .dockerignore file, as it could significantly reduce the size of your image when calling `COPY`.
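To see why that matters, here is a rough Python model of how ignore patterns prune the build context (everything `COPY . .` could pick up). It uses `fnmatch` as a stand-in for Docker's actual matcher, which follows Go's `filepath.Match` rules plus `**` and `!` exceptions, so treat it as an illustration only; the file list and helper name are hypothetical.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def build_context(paths: list[str], ignore_patterns: list[str]) -> list[str]:
    """Return the paths that survive the ignore patterns (simplified semantics)."""
    patterns = [p.rstrip("/") for p in ignore_patterns]
    kept = []
    for path in paths:
        parts = path.split("/")
        # A file is excluded if it, or any directory above it, matches a pattern
        prefixes = ["/".join(parts[: i + 1]) for i in range(len(parts))]
        if any(fnmatchcase(prefix, pat) for prefix in prefixes for pat in patterns):
            continue
        kept.append(path)
    return kept


files = ["app.py", ".env", ".git/HEAD", ".git/objects/ab/cdef", "src/main.py"]
print(build_context(files, [".env", ".git/"]))  # ['app.py', 'src/main.py']
```

Without any patterns, `.env` and the whole `.git` tree would be part of the context and could land in the image.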

A more subtle issue I've noticed is the fact that `COPY` operations don't honour the user set when calling `USER`. The `COPY` command has its own `--chown` argument, which needs to be set if you'd like to override the default user (which is root or a root-enabled user in most cases).

I wrote up a similar article a while back, though it's focused on general best practices: https://lipanski.com/posts/dockerfile-ruby-best-practices

Better yet, only COPY the files you actually need instead of the whole working directory. That way, there's no need for a `.dockerignore`.

While this is a good idea, having a `.dockerignore` reduces how much Docker has to load into the build context. For projects with large histories, the `.git` directory itself can be rather large. Add to that directories that hold build artifacts, documentation, and you are unnecessarily increasing the time it takes to start the build process.

Wondering why you think this is better. I'm not sure a messy Dockerfile and/or a bunch of extra layers (possibly bloating the image size) is worth it if the concern is just forgetting to update the .dockerignore. The same could be said about .gitignore.

Not the person you replied to, but personally, I like having control over what exactly gets into the final image, and (IME) have found that devs aren't great about remembering to update .dockerignore files. Re: extra layers, if you use multi-stage builds to separate the builder and final app images, you can avoid that.

It's nice to know what files you need to build the image. Sort of like importing libraries at the top of a source code file.

I'm not sure that's always practical. Consider the average Rails or Symfony app - you'd have to include quite a few files (even if you add entire directories at a time).

1. "Don't update system packages" is bad advice. There are base Docker images with out-of-date packages that need security updates. CentOS for example doesn't update their base image for months on end.

2. Given you do want system package updates, you need to deal with Docker caching breaking your security updates. So you need to rebuild your image from scratch, without caching, either every time there is a security update or on a weekly basis. https://pythonspeed.com/articles/docker-cache-insecure-image...

3. Not mentioned: don't use build secrets via ARG.
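One way to act on point 2 is a scheduled CI job that periodically forces a full rebuild. The Python sketch below only assembles the command line; the weekly threshold, the helper name and the image tag are hypothetical, while `--pull` (always try to fetch a newer base image) and `--no-cache` (ignore cached layers) are standard `docker build` flags.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def build_command(tag: str, last_full_rebuild: datetime, now: datetime,
                  max_age: timedelta = timedelta(days=7)) -> list[str]:
    """Assemble a `docker build` invocation, forcing a clean rebuild when stale."""
    cmd = ["docker", "build", "-t", tag]
    if now - last_full_rebuild >= max_age:
        # Re-pull the base image and rerun every step (including package updates)
        cmd += ["--pull", "--no-cache"]
    cmd.append(".")
    return cmd


print(build_command("myapp:ci", datetime(2020, 10, 1), datetime(2020, 10, 15)))
# ['docker', 'build', '-t', 'myapp:ci', '--pull', '--no-cache', '.']
```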

Some approaches to secure build secrets:

1. BuildKit supports them: https://docs.docker.com/develop/develop-images/build_enhance...

2. Via the network, which is a hack, but it works: https://pythonspeed.com/articles/docker-build-secrets/

3. Via multi-stage builds, but this destroys caching.
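On the ARG point: Docker's documentation warns that build-time variable values can be visible to anyone with the image via `docker history`. A crude guard is a CI check that flags secret-looking ARG names. This Python sketch is a hypothetical, name-based heuristic, not a real scanner, and it will miss secrets with innocuous names.

```python
from __future__ import annotations

import re

SECRET_HINTS = ("PASSWORD", "PASSWD", "SECRET", "TOKEN", "API_KEY", "PRIVATE_KEY")
# Matches "ARG NAME" or "ARG NAME=default" at the start of a line
ARG_RE = re.compile(r"^\s*ARG\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE | re.MULTILINE)


def suspicious_build_args(dockerfile: str) -> list[str]:
    """Return ARG names that look like they carry credentials."""
    return [name for name in ARG_RE.findall(dockerfile)
            if any(hint in name.upper() for hint in SECRET_HINTS)]


example = """FROM python:3.8.3-slim-buster
ARG APP_VERSION=1.0
ARG NPM_TOKEN
RUN echo "building $APP_VERSION"
"""
print(suspicious_build_args(example))  # ['NPM_TOKEN']
```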

> "Don't update system packages" is bad advice. There are base Docker images with out-of-date packages that need security updates. CentOS for example doesn't update their base image for months on end.

Generally, if you want to update system packages, rebuild the container and make it your new base image. Updating with every build potentially gives you a non-reproducible build.

If you're using a tag like centos:8 that gets updated periodically, you already do not have reproducible builds. This just ensures you get updated packages as soon as possible.

If you're using a tag like centos:8, I agree. That's why it's not a good practice to use centos:8 as your base image.

It's a tradeoff, yes. You can do more work to rebuild a base image, or you can say "technically, having a slightly newer version of glibc isn't reproducible, but in practice I don't expect it to break anything, so I'll live with the risk".

Overall great post.

> If you rely on latest you might silently inherit updated packages that in the best worst case might impact your application reliability, in the worst worst case might introduce a vulnerability.

On this one I disagree. Your CI should handle reliability (and if it doesn't, you have a bigger problem), you're more likely to patch a vulnerability than to introduce a new one, and it's unlikely that the version will be compromised by the time a PR hits production.

I understand that updating cuts both ways when it comes to security, but I agree with Matt Tait's ultimate conclusion when he spoke on this issue a few years ago: For most medium size and smaller companies constantly updating is safer than delaying. He had real world data and graphs of compromise windows, etc. Short answer was that attackers are more time motivated than defenders.

Pinning the version tag for base images in a Dockerfile is a good idea beyond the scope of security. It helps with onboarding as well. I've been in situations where depending on the `:latest` tag of a base image caused different versions of that image to be used on different machines, resulting in developers having weird issues that no one else was having ("I thought Docker was supposed to solve this!"). Now I only use non-specific tags like `:latest` or `:12-alpine` until I distribute the Docker image, whether by collaborating with others or by pushing it to Docker Hub. It gives me peace of mind to know that others are building on the exact same stack I was.

This is a reasonable choice to make depending on the complexity of your project.

For the projects I've been on, the latest version plus the test suite is enough to catch weirdness creeping in and, as a side benefit, it gets fixed faster than if it were pinned. Sometimes the issue really is caused by the base image and it is easier to get a fix merged if the issue was caused quite recently because the developers responsible see early reports of issues as more endemic than if they're reported days or weeks later.

For many popular images, you can use a major-version tag and fetch the latest patch but avoid breaking changes.

E.g. postgres:12 instead of either postgres:latest or postgres:12.4
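The convention described here (a floating major-version tag that follows the newest patch release) can be modelled in a few lines. This Python sketch assumes plain numeric dotted versions and a hypothetical list of published releases. Registries don't compute this; image maintainers move the tags themselves, so it only describes the usual behaviour of well-maintained official images.

```python
from __future__ import annotations


def resolve_tag(tag: str, published: list[str]) -> str:
    """Return the release a floating tag would typically point at."""
    versions = sorted(tuple(int(x) for x in v.split(".")) for v in published)
    if tag == "latest":
        candidates = versions
    else:
        prefix = tuple(int(x) for x in tag.split("."))
        # "12" matches any 12.x release; "12.3" matches only 12.3
        candidates = [v for v in versions if v[: len(prefix)] == prefix]
    if not candidates:
        raise LookupError(f"no release matches tag {tag!r}")
    return ".".join(str(x) for x in candidates[-1])


releases = ["11.9", "12.3", "12.4", "13.0"]  # hypothetical release list
print(resolve_tag("12", releases))      # 12.4
print(resolve_tag("latest", releases))  # 13.0
```

The `12` tag picks up the patch release but never jumps to 13, which is the tradeoff the comment is after.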

Operating system updates are the responsibility of the base image. Use a base image that is regularly updated. For your own images, ensure that they are also being rebuilt regularly.

There are a lot of semi-official Docker images on Docker Hub that are published once and never updated until the next software release. That is a huge anti-pattern in my view, those images should not be relied on for production.

This makes me wonder, why there has not been a bigger push towards microkernel/minimal OS with audited toolchains that were "done". Minimal features and minimal surface area. A plug and play distribution with security at the forefront which rarely needed updating because only the essential was available.

I would be fine taking a healthy performance hit if I knew that the base OS was secure. (At this point I expect the BSD folks to chime in that they have had this for years)

Isn't that (one of) the design goals of CoreOS, Alpine, Clear Linux, etc.?

Further in that direction, https://github.com/GoogleContainerTools/distroless — stripping out as much of the OS code as possible.

Shameless Plug: I wrote a cli-plugin for docker, docker-lock, that will track the image digests (SHA's) of your images in a separate file that you can check in to git, so you will always know exactly which image you are using (even if you are using the latest tag). It can even rewrite all your Dockerfiles and docker-compose files to use those digests.


Would love for anyone dealing with this issue to check it out!

Would you have a link to the talk? I couldn't find it with a google search.

I don't have time to track down a link to a video at the moment, but it was at Infiltrate, an offensive-minded cybersecurity conference out of Miami.

Is it this one? https://vimeo.com/267445424

I always thought `:latest` was a bit of a special case. Ex: Pushing `2.0.1` followed by `1.1.4` would leave `1.1.4` as the `latest` image which could be an issue by itself. Is that wrong?

I've always tried to pick stable tags [1] if they're available.

1. https://docs.microsoft.com/en-us/archive/blogs/stevelasker/d...

There is absolutely nothing special about `latest`, it’s just a tag like any other. It simply happens to be the default tag when omitted, just like how `origin` is the default remote name when performing a `git clone`.

To expand on this: well-maintained projects do not push to `:latest` often; they treat that tag like a `stable` version and push only public, stable releases to it. Roughly, it should always be max(semver), though in practice it often trails when a new major release happens.

> There is absolutely nothing special about `latest`

> It simply happens to be the default tag when omitted

That makes it special though, right? It's different than any other tag because it can be inadvertently pushed via accidental omission.
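That default is easy to see in code. The Python sketch below (a hypothetical helper, not Docker's own parser) splits a reference into repository and tag, filling in `latest` when the tag is omitted. The one subtlety is that a colon before the last slash belongs to a registry host and port, not a tag. Digests (`name@sha256:...`) and the implicit `docker.io/library/` prefix are left out.

```python
from __future__ import annotations


def split_reference(ref: str) -> tuple[str, str]:
    """Split an image reference into (repository, tag), defaulting the tag to latest."""
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    # Only a colon after the last slash separates a tag; an earlier one is a
    # registry port, as in localhost:5000/app
    if last_colon > last_slash:
        return ref[:last_colon], ref[last_colon + 1:]
    return ref, "latest"


print(split_reference("postgres"))            # ('postgres', 'latest')
print(split_reference("localhost:5000/app"))  # ('localhost:5000/app', 'latest')
```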

A note about not running as root: In certain systems if you break out of the application and have access to a shell, it might already be game over. An attacker probably already has access to secrets, to the database, and all the assets worth protecting.

On the other hand, changing the docker user to non root might introduce some failure scenarios (eg file ownership) which might lead to other problems like availability incidents.

Security should start with threat modelling, and taking a risk-based approach. You can spend hours fearing that someone might break out of a Docker container through a zero day, instead of using that time to fix much more likely and plausible threat scenarios. Pick your battles.

> On the other hand, changing the docker user to non root might introduce some failure scenarios (eg file ownership)

If your application needs root to execute, with very few exceptions, it is already wrong.

Soooo... docker? haha

edit: I joke, I love what containers accomplish, and working with Docker has been a joy (:

That's why I don't use docker anymore. I build images using buildah and unprivileged containers.

If you are writing the app, I agree with you. Unfortunately in some cases the person/team that wrote the app has been gone for a long time. I've even seen a case where the source code was missing and nobody knew where it was, yet the service had to continue running.

If you're in that boat, there isn't much you can do except work with it.

Agreed. But when copying files to docker and building the image, you will have to take care that files are not written with root ownership in any stage of the build, which would make them inaccessible to the application running as non root.

That's the case I had in mind when writing that quote.

> Agreed. But when copying files to docker and building the image, you will have to take care that files are not written with root ownership in any stage of the build, which would make them inaccessible to the application running as non root.

That's not the case, either. And root inside the container != root outside the container. A completely new user:group namespace is created inside the container. This is, in very large part, what Linux namespaces are for.

Further, you can certainly have a root-owned file accessible to non-root users, via chmod bits.
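The chmod point can be made precise with the classic owner/group/other check. This Python sketch is a simplification of the kernel's logic (it ignores ACLs, capabilities beyond root's override, and security modules), and the function name is hypothetical. It shows that a file owned by root with mode 0644 is still readable by an unprivileged user, while 0600 is not.

```python
from __future__ import annotations

import stat


def can_read(mode: int, file_uid: int, file_gid: int, uid: int, gids: set[int]) -> bool:
    """Approximate whether `uid` (member of `gids`) may read a file with these bits."""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)  # owner class: only the owner bits apply
    if file_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


print(can_read(0o644, 0, 0, 1000, {1000}))  # True: root-owned but world-readable
print(can_read(0o600, 0, 0, 1000, {1000}))  # False
```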

There are only a handful of excuses, ever, to run a privileged container. If you're not 100% sure, then it is not one of those excuses.

> A completely new user:group namespace is created inside the container. This is, in very large part, what Linux namespaces are for.

No. root inside is root outside (if you can get outside). The behavior you describe only applies if you enable user namespace remapping, which docker doesn’t by default.
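What remapping changes is the UID the host kernel sees. In this Python sketch (a hypothetical helper), `remap` is a subordinate-ID range of the kind listed in `/etc/subuid`, and `None` models Docker's default of no remapping, which is why container root is host root unless `userns-remap` is configured for the daemon.

```python
from __future__ import annotations


def host_uid(container_uid: int, remap: tuple[int, int] | None = None) -> int:
    """Map a UID inside the container to the UID it has on the host.

    remap is (first_host_uid, range_size), e.g. a subordinate-UID range;
    None models the default where no user namespace remapping is enabled.
    """
    if remap is None:
        return container_uid  # identity mapping: container root is host root
    start, size = remap
    if not 0 <= container_uid < size:
        raise ValueError(f"UID {container_uid} is outside the mapped range")
    return start + container_uid


print(host_uid(0))                   # 0
print(host_uid(0, (100000, 65536)))  # 100000 (hypothetical range)
```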

I'm actually running into this issue right now, in the context of a test that I'm writing. The test needs to copy some data out of an image. The image contains a full filesystem, and some of the files are read-only. The copy is done via docker cp, which requires elevated privileges on the host to copy read-only files (see https://github.com/moby/moby/issues/35987 for more info).

As a result, the test needs to be run using sudo to work correctly. The test environment is containerized, so there is a Dockerfile, but as the article mentioned the user in the container should not be root. Is there any plausible way around this? Updating the image is not under my control, and changing the copy code is also riskier than I would like.

Since this is in the context of CI, the threat is lower than live production, but still....

Something really useful I recently discovered is multi-stage Dockerfiles. You start each stage with FROM and then use COPY --from to copy only what you need from a previous stage. That way, intermediate build steps that might expose secrets, or that just bloat your images, don't end up in the final image.

This was useful for us in two ways. First, we needed a private key to pull some private git repos. Because we do that in an earlier stage, the key isn't included in the final image layers. Second, we have a Python backend and a small React app served from the same image. Splitting their build steps into a backend stage and a frontend stage means changes to frontend code don't break caching for the backend steps, and vice versa. E.g.

  FROM python:3.8.3-slim-buster AS frontend
  # Do frontend build steps
  FROM python:3.8.3-slim-buster AS backend
  # Do backend build steps
  FROM python:3.8.3-slim-buster AS final
  COPY --from=frontend /app /app/frontend
  COPY --from=backend /app /app/backend

This is how C/C++ services are done in production, too. You have a build step that does `apk add --update alpine-sdk git cmake` etc., builds the service, and then you start again fresh with a FROM (new stage) and `COPY --from` over the build artifacts.

Reading this thread, I'm surprised this isn't common knowledge by now, given how important it is for efficient production releases.

It may well be common knowledge now. I discovered it when writing our Dockerfiles earlier this year and before that hadn't seen it. It looks like multi-stage builds were added to the best practices in 2018 [https://github.com/docker/docker.github.io/blob/master/devel...] which is probably why I missed it before

Docker has official support for secrets using BuildKit.

This looks like exactly what I need. Even supports forwarding ssh agent. Thanks!

> Using ENV to store secrets is bad practice because Dockerfiles are usually distributed with the application, so there is no difference from hard coding secrets in code.

This sounds wrong. If secrets are in the environment they are not in the Dockerfile, so they are NOT distributed with the application.

What they're referring to is more specific than environment variables in general: you can use the ENV command in a Dockerfile to bake in a secret at build time, and that's what you generally shouldn't do.

Injecting environment variables at runtime, however (through docker run -e or whatever orchestration system you're using), is good.

Ah yes, that makes sense. I didn't understand they were talking about the ENV command at build time.

It was the heading that sent me down the wrong path. I think it should be clarified further:

> Do not store secrets in environment variables

I'm a bit against having secrets in environment variables inside the container.

Because PID 1 has them in its environment, every process spawned from it inherits all of them.

I prefer mounting them into /run/secrets via tmpfs, which can also have an SELinux policy attached.

This way, someone else cannot read them just by spawning a shell inside the container.
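On the application side, this can look like the minimal sketch below. It assumes one file per secret under `/run/secrets`, which is the Docker/Compose secrets convention. The `read_secret` helper and its optional environment fallback are hypothetical and not part of any library:

```python
import os
from pathlib import Path

SECRETS_DIR = Path("/run/secrets")


def read_secret(name: str, secrets_dir: Path = SECRETS_DIR,
                env_fallback: bool = False) -> str:
    """Return the secret stored in <secrets_dir>/<name>, without a trailing newline.

    The environment is only consulted when env_fallback is True, so by default a
    missing file fails loudly instead of silently reading an environment variable.
    """
    path = secrets_dir / name
    if path.is_file():
        return path.read_text().rstrip("\n")
    if env_fallback and name.upper() in os.environ:
        return os.environ[name.upper()]
    raise FileNotFoundError(f"secret {name!r} not found in {secrets_dir}")
```

Setting `env_fallback=True` keeps local development with `docker run -e` working, but you then have the inheritance problem again.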

I don't think the author is talking about loading secrets from the environment - I think they're specifically talking about hardcoding secrets into the Dockerfile and using the Dockerfile ENV directive to set secrets for the processes running in the container (baking them into the image), instead of passing them at runtime, which sounds just horrifying enough that I'm sure people do it in real codebases.

This confused me as well. I didn't know that there was the ENV command for dockerfiles. It never occurred to me that someone would want to use that, and certainly not to put a secret in a plaintext dockerfile.

Seems such an obvious tip first up that it put me off reading the rest of the article.

About "apt upgrade": in major distributions, that's safe. They won't introduce new major package versions, only minor ones (and security fixes).

It should be totally OK to upgrade the packages.

Also, I don't get why the article advises running "apt update" when you shouldn't run "apt upgrade". What's the point?

It's not entirely clear from the article, but my guess is that you should update the repository + base dependencies in the base image, not in the "application" image or whatever you want to call it. That said, it's not the end of the world if you're not reusing that base image. It feels weird for an article on "Security Best Practices" to advise _against_ the latest versions of your software. That's especially true with apt as the example, since apt is usually used in stable distributions like Ubuntu/Debian, where major version upgrades happen rarely.

But yeah, that last part doesn't make any sense. If you're not running `apt-get upgrade`, it doesn't make sense to run `apt-get update` as nothing is using the newly fetched data anyways...

Apt update just updates the cache of the package lists - /var/lib/apt/lists/ - based on your lists - /etc/apt/sources.list.d/. "upgrade" actually upgrades the code on the system.

`rm -rf /var/lib/apt/lists/*` in the same RUN decreases bloat and I think possibly decreases cache misses as well.

Base images usually ship with that done. `apt-get update` just undoes the good work of removing irrelevant lists.

Huh, TIL. I'll give that a try. Usually, that's the least of my problems. In my line of work, I'm often doing things which would likely horrify most webapp devs, like building a container with multiple conda envs in it. OTOH, most of these containers run on airgapped systems with petabytes of storage, so shipping a bit of bloat barely hurts the end user. But every time I do, I die a bit on the inside.

Some base images do, but some don’t, and they update on their schedule rather than yours. Doing the update on your schedule is trivial, has no meaningful downside, and means it’s done and tested as quickly as you need.

> and I think possibly decreases cache misses as well.

Unlikely, you'd still have misses because of stuff like file metadata mismatches (think modification times).

If you upgrade your packages during a build, two people who build your Dockerfile at different times can inadvertently end up with two different images. So rather than focusing on "we're using base-image:1.0.2", you need to start asking questions like "list all packages and versions and start comparing those".
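One way to do that comparison is shown in the sketch below. It assumes each build records its package list with `dpkg-query -W -f='${Package}\t${Version}\n'`, giving one `name<TAB>version` line per package. The `diff_packages` helper is hypothetical:

```python
def parse_packages(listing: str) -> dict:
    """Parse 'name<TAB>version' lines into {name: version}."""
    packages = {}
    for line in listing.splitlines():
        if "\t" in line:
            name, version = line.split("\t", 1)
            packages[name.strip()] = version.strip()
    return packages


def diff_packages(old: str, new: str) -> dict:
    """Report packages added, removed, or changed between two builds."""
    a, b = parse_packages(old), parse_packages(new)
    return {
        "added": sorted(set(b) - set(a)),
        "removed": sorted(set(a) - set(b)),
        "changed": sorted((n, a[n], b[n]) for n in set(a) & set(b) if a[n] != b[n]),
    }
```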

If there's a security issue you need to rebuild your images anyway and optimally you have a system in place that represents images and their dependencies as some sort of dependency graph structure, so when you upgrade your base image all dependent images get rebuilt automatically.
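The "rebuild everything downstream" part can be sketched as a breadth-first walk over the FROM graph. This sketch assumes each image has a single parent (its final FROM). The image names are hypothetical:

```python
from collections import defaultdict, deque


def rebuild_order(parents: dict, changed: str) -> list:
    """Return every image that transitively builds FROM `changed`, parents first.

    `parents` maps image -> the image named in its FROM line.
    """
    children = defaultdict(list)
    for image, parent in parents.items():
        children[parent].append(image)

    order, queue = [], deque([changed])
    while queue:  # breadth-first, so a parent is always listed before its children
        current = queue.popleft()
        for child in sorted(children[current]):
            order.append(child)
            queue.append(child)
    return order
```

For example, if `python-base` builds FROM `base` and `api` builds FROM `python-base`, a change to `base` yields `python-base` before `api`.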

I also use Red Hat UBI (Universal Base Images). If you pull down any image from Docker Hub (even official images), you'll be surprised at how many vulnerabilities they have.

Second this. The UBI images are meticulously maintained by Red Hat and are freely available and redistributable without a subscription. Some of the largest companies in the world are using these in production and putting dollars behind them, so you can be pretty confident that they will work and be maintained.

If you want to create container images for an application you wrote yourself in a commonly used programming language (Go, Python), consider using Bazel with rules_docker:


rules_docker allows you to create byte-for-byte reproducible container images on your system, without even having a Docker daemon installed. So much cleaner to use than 'docker build' once you get the hang of it!

If you're building images from go binaries you can do that without Bazel using github.com/google/ko :)

Google's distroless containers are an interesting approach for both security and performance as well, albeit with limited language support: https://github.com/GoogleContainerTools/distroless

Relatedly, rust-musl-builder [0] is useful for getting Rust binaries to run on the `static` instead of `cc` base image.

[0] https://github.com/emk/rust-musl-builder

What's the advantage of doing that?

You can drop the resulting binary into a scratch container that has no shell, etc.

Regarding using

    USER somenonrootuser

...how do people deal with the need to write things out on container start? Some of my services require config files, and I need to interpolate the values of environment variables into those config files, write them out, and then start the application. (Also assume the application itself doesn't have options for dropping privileges.) I'd rather not make the filesystem locations in question writable by `somenonrootuser`.

My current strategy is to let the container entrypoint start as root, but then I have a wrapper program installed that drops privileges before it exec()s the actual service. It works, but is there a better / more accepted way of doing this?

If the services can't be modified to load their config directly from env vars, write the config to an off-root scratch volume (e.g. mounted to /tmp/) and have them load from that. The root volume should be mounted read-only either way to prevent modification of your services should something get RCE.
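Here is a minimal sketch that combines the wrapper approach with that suggestion. It renders the config from environment variables into a writable scratch directory, drops to an unprivileged UID/GID, then `exec`s the real service. The paths, the `${VAR}` template syntax (Python's `string.Template`), and UID/GID 10001 are all assumptions. In practice, tools like `gosu` or `su-exec` are often used for the privilege-drop step:

```python
import os
import string
import sys
from pathlib import Path

SERVICE_UID = SERVICE_GID = 10001          # hypothetical unprivileged IDs
TEMPLATE = Path("/etc/myapp/config.tmpl")  # hypothetical, read-only image path
RENDERED = Path("/tmp/myapp/config.conf")  # on a writable scratch volume


def render_config(template_text: str, env: dict) -> str:
    """Substitute ${VAR} placeholders; raises KeyError if a variable is unset.

    A literal dollar sign in the template must be written as $$.
    """
    return string.Template(template_text).substitute(env)


def drop_privileges(uid: int, gid: int) -> None:
    """Give up root: clear supplementary groups, then set GID, then UID."""
    if os.getuid() != 0:
        return  # already unprivileged
    os.setgroups([])
    os.setgid(gid)
    os.setuid(uid)  # must come last: after this the GID can no longer change


def main(argv: list) -> None:
    RENDERED.parent.mkdir(parents=True, exist_ok=True)
    RENDERED.write_text(render_config(TEMPLATE.read_text(), dict(os.environ)))
    os.chmod(RENDERED, 0o600)  # the rendered file may contain secrets
    if os.getuid() == 0:
        os.chown(RENDERED, SERVICE_UID, SERVICE_GID)
    drop_privileges(SERVICE_UID, SERVICE_GID)
    # exec replaces this process, so the service keeps our PID (PID 1 if we
    # are the entrypoint) and receives signals directly.
    os.execvp(argv[0], argv)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])  # e.g. ENTRYPOINT ["python3", "/entrypoint.py", "myservice"]
```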

If you control the services, make them read from user-writable files instead of root ones. (Ideally you could skip the middleman and just read from env vars directly!)

If you don't control the services, maybe in the Dockerfile you could write a dummy config file and chmod it?

[deleted] this idea doesn't work on attempting

Great post. Two points:

..If you don’t inspect the wget script, you might as well pipe it into bash.

.. How to distribute secrets if not by env? (which I agree! Honest question)

Disclaimer: I work for Red Hat as an OpenShift consultant so I'm biased

There are competing pieces of advice for secure distribution of secrets, but my current preference comes down to one of these ways, depending on the organization:

1. OpenShift/Kubernetes Secrets mounted into the Pod at runtime.

2. Hashicorp Vault (has a really well designed API. It's very usable just with curl, which makes using it a joy)

3. Sealed Secrets (less experience here but it's looking positive right now) - https://github.com/bitnami-labs/sealed-secrets
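For option 2, the curl-friendly API maps directly onto a few lines of standard-library code. The sketch below assumes a KV version 2 engine mounted at `secret/`, with the server address and token in the conventional `VAULT_ADDR` and `VAULT_TOKEN` variables. It is roughly equivalent to `curl -H "X-Vault-Token: $VAULT_TOKEN" "$VAULT_ADDR/v1/secret/data/myapp"`:

```python
import json
import os
import urllib.request


def build_kv2_request(addr: str, token: str, path: str,
                      mount: str = "secret") -> urllib.request.Request:
    """Build a GET for Vault's KV v2 read endpoint: /v1/<mount>/data/<path>."""
    url = f"{addr.rstrip('/')}/v1/{mount}/data/{path.lstrip('/')}"
    return urllib.request.Request(url, headers={"X-Vault-Token": token})


def parse_kv2_response(body: bytes) -> dict:
    """KV v2 nests the key/value pairs under data.data (data.metadata holds versions)."""
    return json.loads(body)["data"]["data"]


def read_kv2(path: str) -> dict:
    req = build_kv2_request(os.environ["VAULT_ADDR"], os.environ["VAULT_TOKEN"], path)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return parse_kv2_response(resp.read())
```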

If you're using a different PaaS besides OpenShift, it may also offer options worth considering (although do think about portability. These days apps move platforms every few years on average, though I think that may be changing now that K8s is becoming the standard).

> 1. OpenShift/Kubernetes Secrets mounted into the Pod at runtime.

Do you recommend mounting secrets as environment variables to the kubernetes pods instead of files?

Yes, that is by far my preference. Much more 12 factor app-ish and framework independent. A lot of Java apps will want files though, so sometimes it isn't possible.

Files should be used over environment variables. The file system at least has some form of access control through file permissions.
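When a Kubernetes Secret is mounted as a volume, each key becomes a file under the `mountPath` you choose in the pod spec. The file mode defaults to 0644 and can be tightened with `defaultMode`. The kubelet also creates `..data` and `..<timestamp>` entries for atomic updates. A minimal sketch of loading such a directory (the helper name is hypothetical):

```python
from pathlib import Path


def load_secret_dir(directory: str) -> dict:
    """Read a Kubernetes secret volume: one file per key.

    Skips the '..data' symlink and '..<timestamp>' directories used for atomic
    updates; a key such as '.dockerconfigjson' (single dot) is kept.
    """
    secrets = {}
    for entry in sorted(Path(directory).iterdir()):
        if entry.name.startswith("..") or not entry.is_file():
            continue
        secrets[entry.name] = entry.read_text()
    return secrets
```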

Thank you for the pointers! I’ll have a look!

I think they meant to not ship secrets inside the container using the Dockerfile keyword ENV, because they're retrievable. If you must ship an ENV value in an image to the public (it's quite useful for config values that need a default), then know that it isn't secret anymore.

If you need to provide a secret value to an image and it needs to remain secret (like a database password), you most commonly would set the env values at runtime or volume mount a config file at runtime.

On a different side of this, if you need a secret at image build time (like an SSH key to access a private repo), you can use build arguments with the ARG keyword and they won't persist into the final image. Multi stage dockerfiles are also a great way to keep your final image lean and clean.

Thank you for your pointers and the clarification about ENV! I actually misunderstood.

Secrets are an afterthought in Docker. When I first started using Docker, I was surprised at how _rubbish_ it was.

I've found it's best to use the secrets provider that comes with your cloud provider.

For AWS, using SSM's get_parameter seems the best option. But it means you need to find a custom shim to put in your container that will fetch the secrets and put them where they are needed.
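Such a shim can be quite small. The sketch below assumes boto3's SSM client, whose `get_parameter(Name=..., WithDecryption=True)` returns the value under `["Parameter"]["Value"]`. The client is passed in so the logic can be exercised without AWS. The parameter names and target directory (ideally a tmpfs like `/run/secrets`) are hypothetical:

```python
import os
from pathlib import Path


def fetch_parameters(client, names: dict) -> dict:
    """Fetch SSM parameters. `names` maps local file name -> SSM parameter name.

    `client` is expected to behave like boto3.client("ssm").
    """
    values = {}
    for local_name, param_name in names.items():
        resp = client.get_parameter(Name=param_name, WithDecryption=True)
        values[local_name] = resp["Parameter"]["Value"]
    return values


def write_secret_files(values: dict, directory: Path) -> None:
    """Write each value to its own file, readable only by the current user."""
    directory.mkdir(mode=0o700, parents=True, exist_ok=True)
    for name, value in values.items():
        fd = os.open(directory / name, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        os.fchmod(fd, 0o600)  # also tightens a pre-existing file
        with os.fdopen(fd, "w") as f:
            f.write(value)

# Usage in an entrypoint (hypothetical names):
#   import boto3
#   vals = fetch_parameters(boto3.client("ssm"), {"db_password": "/myapp/prod/db_password"})
#   write_secret_files(vals, Path("/run/secrets"))
#   then os.execvp() the real application
```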

There’s also Secrets Manager which integrates with other services and has hooks for custom secret-fetching and rotation, so your application doesn’t need to.

Keep them away from the container and use one or more of the following:

- A vault (Conjur, HCV, something else)

- A built-in credential service that comes with your cloud

- A sidecar that injects credentials or authenticates connections to the backend directly (Secretless Broker, service meshes, etc)

If you are doing a poor man's solution, mounted tmpfs volumes that contain secrets are not terrible (but they're not really that much safer than env vars).

Keep them away from the container image

Keep them away from both the image and the container! Getting env var values dumped for a process is trivial outside of the process and even easier within the container process space.

It astounds me how many developers don't realize just how many places environment variables end up, even on a properly functioning server.

Common info pages (e.g. phpinfo), core dumps, debug errors, and logs are notorious for containing them. And that isn't even counting the ways a malicious actor can persuade a program to reveal them.
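To see how trivial the "outside the process" case is on Linux, consider `/proc/<pid>/environ`. It holds a process's initial environment as NUL-separated `KEY=VALUE` entries and is usually readable by the same UID or by root. A minimal sketch (function names are hypothetical):

```python
from pathlib import Path


def parse_environ(raw: bytes) -> dict:
    """Split the NUL-separated KEY=VALUE blob found in /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def environ_of(pid: int) -> dict:
    """Environment the process was started with (later setenv calls may not show)."""
    return parse_environ(Path(f"/proc/{pid}/environ").read_bytes())
```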

We use `sops`[1] to do this and it works really well.

There is a Google Cloud KMS keyring (for typical usage) and a GPG key (for emergency/offline usage) set up to handle the encryption/decryption of files that store secrets for each application's deployment. I have some bash scripts that run on CI which are essentially just glorified wrappers to `sops` CLI to generate the appropriate `.env` file for the application, which is put into the container by the `Dockerfile`.

Applications are already configured to read configuration/secrets from a `.env` file (or YAML/JSON, depending on context), so this works pretty easily and avoids depending on secrets being set in the `ENV` at build time.

You can also, of course, pass any decrypted values from `sops` as arguments to your container deployment tool of choice (e.g. `helm upgrade --install foo <chart> --set myapp.db.username=${decrypted_value_from_sops}`) and not bundle any secrets at build time at all.

[1] https://github.com/mozilla/sops
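The application side of that setup can be a very small `.env` reader. This sketch assumes simple `KEY=value` lines with optional quotes, an optional `export ` prefix, and `#` comments. It handles no variable expansion or multi-line values, so it covers only a subset of what libraries like python-dotenv accept:

```python
def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=value lines; blank lines and '#' comments are ignored."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # drop matching surrounding quotes
        values[key.strip()] = value
    return values
```

Returning a dict rather than writing into `os.environ` keeps the values out of any child process's environment.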

I did not know sops, thx for the pointer!

> How to distribute secrets if not by env? (which I agree! Honest question)

You'll want to use BuildKit (`docker buildx`), see https://docs.docker.com/develop/develop-images/build_enhance...

[edit] My bad, that works for secrets needed at build time, not at runtime of course.

I use docker secrets[0] and a script like this[1] to inject them in the ENV hashmap in my app.

[0]: https://www.docker.com/blog/docker-secrets-management/

[1]: https://gitlab.com/-/snippets/2029832

Hadolint, a Dockerfile linter, checks for most of these best practices.
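This is not Hadolint. It is just a toy sketch of how a few of the article's rules can be checked mechanically: unpinned or `latest` base images, secret-looking ENV/ARG names, ADD from a URL, and a final stage running as root. The rule set and the regex for secret-looking names are assumptions and will produce false positives:

```python
import re

# Heuristic only: names that usually indicate a credential.
SECRET_NAME = re.compile(r"PASSWORD|PASSWD|SECRET|TOKEN|API_?KEY|PRIVATE_KEY", re.I)


def logical_lines(text: str) -> list:
    """Join backslash-continued lines; skip blank lines and # comments."""
    lines, current = [], ""
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.endswith("\\"):
            current += stripped[:-1] + " "
            continue
        lines.append(current + stripped)
        current = ""
    if current:
        lines.append(current.strip())
    return lines


def lint(text: str) -> list:
    warnings, stages, user = [], set(), None
    for line in logical_lines(text):
        instr, _, args = line.partition(" ")
        instr = instr.upper()
        if instr == "FROM":
            words = [w for w in args.split() if not w.startswith("--")]
            if not words:
                continue
            image = words[0]
            last = image.rsplit("/", 1)[-1]  # a registry port is not a tag
            pinned = "@sha256:" in image or (":" in last and not last.endswith(":latest"))
            if image != "scratch" and image not in stages and not pinned:
                warnings.append(f"pin a tag or digest instead of latest: {image}")
            if len(words) >= 3 and words[1].upper() == "AS":
                stages.add(words[2])
            user = None  # assume root: each stage starts with its base image's user
        elif instr in ("ENV", "ARG") and SECRET_NAME.search(args):
            warnings.append(f"possible secret baked into the image via {instr}: {args}")
        elif instr == "ADD" and re.search(r"https?://", args):
            warnings.append("ADD from a URL: download it and verify a checksum instead")
        elif instr == "USER":
            user = args.split(":")[0].strip()
    if user in (None, "root", "0"):
        warnings.append("final stage runs as root: add a non-root USER")
    return warnings
```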


It looks really great at first to have a hassle-free alternative with sensible defaults.

Unfortunately, installing it seems non-trivial. I would love a simple binary or a .deb/.rpm package.

Looks like it is packaged as .rpm at least https://pkgs.org/download/hadolint

> Using ENV to store secrets is bad practice because Dockerfiles are usually distributed with the application

Never mind Dockerfiles: env variables are preserved in stopped containers that are still hanging around (even when you use docker-machine, AFAIK), and you can easily `docker inspect` them.

Always using the `--rm` flag to automatically remove containers when they exit could be another best practice.
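To check what a leftover container still exposes, note that `docker inspect` prints a JSON array in which each object carries the environment under `Config.Env` as `KEY=value` strings. This works for stopped containers too, until they are removed. A minimal sketch (helper names are hypothetical):

```python
import json
import subprocess


def env_from_inspect(inspect_json: str) -> dict:
    """Turn `docker inspect` output (a JSON array) into {KEY: value} for the first object."""
    config_env = json.loads(inspect_json)[0]["Config"]["Env"] or []
    return dict(item.split("=", 1) for item in config_env if "=" in item)


def container_env(container: str) -> dict:
    out = subprocess.run(["docker", "inspect", container],
                         capture_output=True, text=True, check=True).stdout
    return env_from_inspect(out)
```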

Is this referring to ENVs inside the Docker image or on the system running it as a whole?

Aren’t they the same thing (sort of)? The ENV in Dockerfile (or .env if you’re doing docker-compose) will be available during the build as well as runtime.

Really solid post.

I've helped a lot of people with Dockerfiles ranging from horrendous security issues to simple bad practice making lives hell, and much of this is solid advice.

A lot of what I tell people boils down to: keep your container lean and clean. Don't do things in the container that you wouldn't do on the host (like curl-ing from the internet into bash as root :-D, or using questionable base images).

My deployment life has been vastly improved by shipping in containers, but I have seen a lot of security regressions because people feel safe to be reckless (like running the app as root) due to the container guard rails. Don't think this way.

Besides the Docker practices - does someone have experience with OPA? This is the first time I've heard of that tool, but an extensible policy tool like that might solve a lot of challenges we have at work.

We use OPA for use cases ranging from Kubernetes admission control to microservice authorization and CI/CD pipeline policies. It's one of those tools you can't imagine living without once you start using it.

Why would you write such an article and not show what the actual _recommended_ practices look like?

Solid post though there are a couple of things I would disagree with:

> Do not upgrade your system packages

Most distros will have smooth upgrades and provide you with patched libs that your app may need and the latest image may not provide. It's slightly more prone to breaks but it creates a less vulnerable runtime app env.

> Do not use ‘latest’ tag for base image

It depends on the org, but sometimes pinning means you will likely end up using an end-of-life image, because keeping the pin current requires proactive work. If you leave it as 'latest', this won't happen, but you will get out-of-band breakages that you have to fix to keep things working. Choose wisely.

A few things I would add too:

- Don't mount Docker socket into any container unless absolutely necessary

- Your biggest security threat will be from your app's dependencies, not the container's setup

- Do not run a full init system unless absolutely necessary as this is just a security disaster waiting to happen. There are valid use cases for it but they're rare.

Can you explain your last point further please?

“Full” init systems tend to need to do things that are hard to secure in a container.

Many must run as root, and the reasons not to do that are discussed in the article this HN thread is discussing.

Systemd is particularly tricky because it needs to be able to control the cgroups of its child processes, which means the container needs to be granted that capability. See https://developers.redhat.com/blog/2019/04/24/how-to-run-sys... for how to run systemd in a container via Podman. It is a follow-up to https://developers.redhat.com/blog/2016/09/13/running-system..., which discusses why the Docker case is even more difficult.

That said, if you just want a process supervisor for a multi process container, there are several more minimal init systems that will work well, for example, supervisord.

Thanks for the response

I think he mixes 2 aspects. There is security and there is reproducibility/traceability/reliability.

For security, using the latest versions of both base images and packages is typically a good thing. Cases where the newest package is more vulnerable than something 1, 2, or 3 years old are not that common.

However, if your process requires reproducibility/traceability (medical and other regulated domains) you cannot just deploy the latest and greatest. You need to pass it through some release process first. That should not be an excuse to run outdated, vulnerable software though. The same holds if you require high availability. Even if you might not need to document what you are using, you want to test whether it causes performance issues (zero performance is the worst one...).

IMHO: always check what is being upgraded before making a decision.

The ubuntu:20.10 image currently wants to upgrade "libssl1.1"!

  docker run --rm -it ubuntu:20.10 bash -c "apt update && apt upgrade"
  The following packages will be upgraded:
   debianutils diffutils findutils gcc-10-base 
   libgcc-s1 libgnutls30 libprocps8 libssl1.1 libstdc++6
   libsystemd0 libudev1 procps sed zlib1g

The ubuntu:20.04 image is better:

  docker run --rm -it ubuntu:20.04 bash -c "apt update && apt upgrade"
  The following packages will be upgraded:
  gcc-10-base libgcc-s1 libstdc++6 zlib1g

One could do worse than using (more) stable distribution for their base images.

20.10 is not an LTS release, so it's more likely to get semi-spurious updates than an LTS version like 20.04 or, say, Debian stable.

For most of my non-alpine-based images, I use debian:buster-slim as base, as it's got a fairly stable base and gets quite routinely updated:

    $ docker run --rm -it debian:buster-slim bash -c "apt update >/dev/null 2>&1 && apt upgrade"
    Reading package lists... Done
    Building dependency tree       
    Reading state information... Done
    Calculating upgrade... Done
    0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

> I use debian:buster-slim as base

This image was updated 39 hours ago, so you would need to check again in a month or two.

  debian  buster-slim    f49666103347   39 hours ago   69.2MB

That's part of my point ;) It's updated very often.

I check for base image updates every day, and it's one of the most oft-updated ones -- hence my preference for using it as "the" base of all others.

Monthly typically (and when important security updates are released): https://github.com/docker-library/official-images/pulls?q=is...

Something I would love to see specific guidance on is TLS certificates. It seems like commonly used web servers (e.g. nginx) pretty much insist on the private key being available, effectively unencrypted, to the web server process. On a traditional server that gets protected by user permissions, but inside Docker I am not sure how to achieve this. Is there a pattern people use to deploy certificates inside Docker containers, or is the pattern "don't do that"?

Don’t deploy your certificates with docker. That’s not to say: Don’t give your web servers running in docker certificates, but that you shouldn’t distribute certificates with your docker images. The Private Key/Certificate pair should be added at runtime. How it gets there is specific to your management framework.

To give a specific example, on Kubernetes you store your Private Key and Certificate as a secret (there’s a special “type” of secret for this, but K8s doesn’t actually treat it any differently). You then mount that certificate as a volume (in /var/run/secrets/tls, or wherever you define it), and it shows up as a normal file, accessible to whichever user your container runs as.


So for clarity then, if somebody shells into the container - at that point it's still sitting there mounted and they can read the file? Or does k8s somehow manage this in a way that only the web server process can see it? Or do we just accept that you have to treat anyone with access to kubectl as authorised to know your private cert ?

> at that point it's still sitting there mounted and they can read the file?

Yes, if someone has exec access into your container, they'll be able to see the secret, unless you do something like making the K8s secret an already encrypted blob, and then in process decrypting it again and reading it. If someone's got exec access to your container though, you've got bigger problems.

> Or do we just accept that you have to treat anyone with access to kubectl as authorised to know your private cert ?

You can set up access so that someone can login with Kubectl but still only be readonly, and you can preclude read ("get") access to secrets to the readonly role, so they won't be able to view the contents of the secret.

There’s a default “view” clusterrole (don’t let the name fool you - you can bind it in a namespace with a RoleBinding instead of ClusterRoleBinding and provide view only for that namespace) that K8s defines, that specifically excludes read permissions on secrets. Use that.

You can add your certificates and keys to k8s secrets or docker secrets, then they will appear as files within the container.

Adding the certificates into the image... don't do that :)

Do not pass sensitive data to docker build via --build-arg. When you access it with "ARG", the value is recorded in the docker history, visible to all. Use "--secret", or use the ARG in an intermediate build stage whose history isn't preserved, then manually copy any necessary files from the intermediate image to your final image.

A perfect example of this would be passing your NPM_TOKEN to install company scope packages.
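To check whether an existing image already leaks something like this, you can scan `docker history --no-trunc --format '{{.CreatedBy}}'`. RUN steps that used a build arg typically show up there as `|1 NPM_TOKEN=... /bin/sh -c ...`. The sketch below is a heuristic; the regex for credential-looking names is an assumption:

```python
import re
import subprocess

# KEY=value pairs whose key looks like a credential (heuristic, not exhaustive).
SECRET_PAIR = re.compile(
    r"\b([A-Z0-9_]*(?:TOKEN|PASSWORD|SECRET|API_?KEY)[A-Z0-9_]*)=(\S+)", re.I)


def find_leaks(history_lines: list) -> list:
    """Return (name, value) pairs found in image history lines."""
    leaks = []
    for line in history_lines:
        leaks.extend(SECRET_PAIR.findall(line))
    return leaks


def image_history(image: str) -> list:
    out = subprocess.run(
        ["docker", "history", "--no-trunc", "--format", "{{.CreatedBy}}", image],
        capture_output=True, text=True, check=True).stdout
    return out.splitlines()
```

In CI you would probably fail the build on any hit and print only the names, not the values.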

Is it just me or does this article highlight bad practices (and ways to detect them), but then not really address the “correct” way to avoid many of the bad practices?

+1... was just thinking this.

Quick question: if you're not meant to use env for secrets, then how are you meant to get secrets into your application?

What's the best way to handle this?

Probably either via a third party service (such as AWS secrets manager), or mounted as files scoped to the user your process is running as (which is not root, right? :) ).

Typically your host will have a service specifically for secrets. For example: https://docs.github.com/en/free-pro-team@latest/rest/referen...

Two kinds of secrets: build-time (I need an API key to access a download needed to set up the container) and run-time (I need an API key to talk to the database). The post argues against using ENV for build-time secrets, but it's still recommended for run-time secrets. Sibling comments address alternatives for getting build-time secrets in.

> Do not upgrade your system packages

Ah, the joys of working somewhere that isn't required to document, answer for, and ultimately remediate every CVE that is present in any package installed on any of your containers within your production application. Sadly, compliance and regulatory oversight don't leave this option open to everyone.

Is this a good argument for building containers as “bare metal” as possible? You don’t have to remedy CVEs (and rebuild your containers) for anything that isn’t actually your application.

That is pretty much the only thing possible in these scenarios. Anything Debian or Ubuntu or pretty much any "normal" distribution is right out: external vulnerability scanners always seem to go by package version, and `packagename-12.5.1-debian-security-fixes.b` is still the vulnerable version 12.5.1 as far as any scanner is concerned. At this point, we `FROM scratch` when possible, and deploy on AL2 when not.

There's good reasoning against the concept of barebones containers, but unfortunately everything from bricks, knives, and well-reasoned arguments all bounce harmlessly off of regulations and external compliance requirements.

A very nice web page, too! Snappy, no javascript apart from analytics, tastefully designed, mobile-friendly.

The number one advice should be to use a linter. The number two advice should be to use an image security scanner. These tools combined will prevent most issues. Integrate them with CI to enforce a common set of best practices across an organization and to prevent security bike shedding.

I have one more that I have seen within corporate networks. If a proxy is needed to build Docker images, a lot of engineers add ENV http_proxy, https_proxy to the Dockerfile. This leaves the proxy on by default in the resulting image. Use ARG instead if you only want the proxies at build time.

This article is interesting but it would be far more helpful to link to a solution for each point.

You can also use talisman to make sure you are not checking in secrets in dockerfiles https://github.com/thoughtworks/talisman

How about using a hardening guide such as CIS as part of the build process?

Great post, love the use of Open Policy Agent and Conftest!

How do you get around sometimes needing root inside the container to build things? For example building a container with buildroot inside.

Using intermediate container to build and copy over resulting binaries should work.

Dockerfile security best practices: treat it as you would treat any Linux server. :)

> Do not store secrets in environment variables

Yes, definitely don't put secrets in the Dockerfile itself. I'm curious if there are reasons not to use a .env file though?

> Only use trusted base images

This is a good sentiment, but docker hub also has plenty of images that are built directly from a github repo. You can inspect the Dockerfile and (as long as you trust Docker) trust that it was built as written. In this case I recommend pinning to a specific container SHA (image_name@sha256:...) in case the source repo gets compromised. For official images, you can pin to a tag IMO.

> Do not use ‘latest’ tag for base image

Regardless of the security concerns, pinning to latest will probably bite you when there's a major version bump (or even maybe a minor one). Imagine you built a container on ubuntu:latest when latest was 20.04 and some new employee gets hit with 22.04. That's a bad surprise.

> Avoid curl bashing

Evergreen fight here, but I agree with the author that you should at least validate checksums on external files before executing or using them.
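Validating a checksum just means comparing the file's SHA-256 against a value you pinned ahead of time, and refusing to use the file on a mismatch. A minimal sketch (helper names are hypothetical):

```python
import hashlib
import urllib.request

CHUNK = 1 << 16  # read in 64 KiB pieces so large files don't load into memory


def sha256_of(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_sha256: str) -> None:
    """Raise if the file does not match the pinned checksum; do not use it otherwise."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}")


def fetch_verified(url: str, dest: str, expected_sha256: str) -> None:
    urllib.request.urlretrieve(url, dest)
    verify(dest, expected_sha256)
```

Inside a Dockerfile, the shell equivalent is usually `echo "<sha256>  file" | sha256sum -c -` in the same RUN step as the download.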

> Do not upgrade your system packages

This is where the throne of lies about Docker idempotency crumbles. You have to apt-get update because Debian/Ubuntu repositories remove old versions when there are security fixes (not 100% sure on this, feel free to correct me). So if the ubuntu:20.04 image is released and there's a security update in openssh-client, running "apt-get install openssh-client" without "apt-get update" will fail. So we all run "apt-get update" and pretend that the containers we build are time-invariant. They're not, and in fact we occasionally get security updates snuck in there. Luckily Debian and Ubuntu do a good job not breaking things and no one complains. But if you build a container on Tuesday it's not guaranteed to be the same as on Wednesday and there's nothing you can do about that with a Debian distro. But it's actually fine in practice so we pretend not to notice. The point about not running "apt-get upgrade" is kind of moot - you're going to effectively be upgrading whatever you "apt-get install", so it's probably worth taking the same trade on the built-in packages.

Importantly, there's no security risk - just a configuration one.

> Do not use ADD if possible

Same as above - you should be checksumming external dependencies or self-hosting them.

> Do not root

> Do not sudo

I haven't thought deeply about these. I'd prefer to trust the container system isolation rather than playing with users and permissions. I don't understand this risk well enough to have a well-formed opinion.

> Yes, definitely don't put secrets in the Dockerfile itself. I'm curious if there are reasons not to use a .env file though?

Because often that .env file will get checked into source control (sometimes intentionally, sometimes accidentally), and then you have secrets in your git history that you need to rotate.

In general the best thing to do is store your secrets in a place/service specifically designed for secrets, and either fetch them directly into the container in your entrypoint, or pull them into a location that you mount as a volume in the container.

> I'd prefer to trust the container system isolation rather than playing with users and permissions.

Don't. Containers don't give you full isolation. The 'root' user inside the container is the same 'root' user as outside, and container escapes may be possible. A good defense-in-depth strategy suggests that you should run things using the least privileges possible, and that doesn't change just because something is running in a container.

>> Do not store secrets in environment variables

> Yes, definitely don't put secrets in the Dockerfile itself. I'm curious if there are reasons not to use a .env file though?

Environment variables are a terrible place to store secrets, regardless of whether you're using docker.

Environment variable values get dumped all over the place. /proc/*/environ, docker inspect, /var/log/..., core dumps, error messages, info pages (phpinfo), etc. Also, unlike file handles and secrets services (hashicorp vault, etc), every child process inherits all of its parents' environment variables, greatly increasing the attack surface.
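The inheritance part is easy to demonstrate, and to limit for the processes you spawn yourself, by passing an explicit scrubbed environment. This is a minimal sketch and the helper names are hypothetical. It does not remove the value from the current process's own `/proc/<pid>/environ`:

```python
import os
import subprocess
import sys
from typing import Optional


def scrubbed_env(secret_keys: set) -> dict:
    """Copy of os.environ without the given keys, for passing to child processes."""
    return {k: v for k, v in os.environ.items() if k not in secret_keys}


def child_sees(key: str, env: Optional[dict] = None) -> bool:
    """Spawn a Python child (env=None means inherit) and report whether `key` is set."""
    code = f"import os, sys; sys.exit(0 if {key!r} in os.environ else 1)"
    return subprocess.run([sys.executable, "-c", code], env=env).returncode == 0
```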

> you should at least validate checksums on external files before executing or using them.

I would do that by getting the checksum off the website, and hardcoding it into my Dockerfile, right? At that point I've pegged to a version, and I might as well just keep the binary local and COPY it in.

Is that right? And am I accidentally reviving the 'evergreen fight'? :)
