> The problem with this approach is that it is not portable. What if I am developing on more than one computer, where my user has a different ID on each?
Make the build script use local $USERID and $GROUPID as args during the build process.
In docker-compose.yml (or, if using docker directly, using --build-arg):
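A sketch of how that might look (the `USERID`/`GROUPID` arg names here are assumptions, not a Docker convention):

```yaml
# docker-compose.yml (fragment)
services:
  app:
    build:
      context: .
      args:
        USERID: ${USERID:-1000}    # falls back to 1000 if unset
        GROUPID: ${GROUPID:-1000}
```

With plain docker the equivalent would be `docker build --build-arg USERID=$(id -u) --build-arg GROUPID=$(id -g) .`.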
But that doesn't solve the problem, just works around it:
1. Images are still pre-baked with a given UID/GID pair, so you can't distribute them as something universal and reusable.
2. This requires workarounds / extra steps on a local workstation, so it doesn't work for everyone unless they follow a given project's unique setup quirks.
Shell/compose duct tape like this doesn't make for a great experience; this really should be solved by upstream projects themselves, as it's an extremely common issue when attempting to use Docker.
1. Nope, they are not pre-baked. They are built at runtime from env vars on each machine.
2. One step, setting up two vars. They can be set by a build script. Lots of things have build scripts way more complicated than this.
The only tedious thing is you have to adapt this for every image type you run.
> The only tedious thing is you have to adapt this for every image type you run.
The tedious thing is that this escalates into complexity whenever you have to deal with K developers using M projects developed by N teams each using a different way to handle this:
Do I need to set USERID for project foo, or UID? Does it default to 1000 or the author's UID? Oh, someone has a problem with our project, did they remember to set COMPANY_USERID in their bashrc? Oh, wait, they're using zsh, how do you do that there? Oh, but they followed this other project's readme and that set COMPANY_USERID but not COMPANY_GROUPID...
Docker is supposed to simplify this by unification and a limited API surface, and applying hacks like this on top kind of kills that whole premise.
> Do I need to set USERID for project foo, or UID? Does it default to 1000 or the author's UID? Oh, someone has a problem with our project, did they remember to set COMPANY_USERID in their bashrc? Oh, wait, they're using zsh, how do you do that there? Oh, but they followed this other project's readme and that set COMPANY_USERID but not COMPANY_GROUPID...
You set it to the output of `id -u` and `id -g`. It's two lines. There are definitely lots of things more complex when dealing with docker than this.
You provide the team with a script containing those two lines and a docker-compose wrapper and you're set.
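Such a wrapper could be as small as this (a sketch; assumes the compose file reads `USERID`/`GROUPID` as build args):

```shell
#!/usr/bin/env sh
# dc.sh -- hypothetical docker-compose wrapper: capture the caller's
# uid/gid so image builds can create a matching in-container user.
export USERID="$(id -u)"
export GROUPID="$(id -g)"
exec docker-compose "$@"
```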
Of course it would have been better not to have to care about these things, but hey, at least you're not installing and configuring 4-5 services to bootstrap an application.
If you have to build it on each machine, I would not consider that easily/universally distributable. One of the key points of Docker is you can build once (in your CI or someone else's) and run it on any machine. I think that was GP's point.
Sure, great, let me just rebuild all my docker images on every single machine they run on thereby completely defeating the point of having images in the first place.
The solution is for docker-compose (or plain docker).
I don't think reproducibility is lost. It's the same app, the same image, the same intended user; you just inject, once, the local user and group ids.
but that _requires_ you to build-at-runtime, which is sometimes not the best way to deploy a docker app. if you have one app that you want to run on many nodes, you'll want to set up a docker registry and have the nodes pull pre-built images.
For anyone that uses immutable infrastructure, where servers’ configuration is never modified once built and subsequent deployments result in replacement with entirely new VMs, building once per machine still happens every time there is a deployment. You don’t ever reboot these machines.
In environments where vulnerability scanning of docker images used is important, running anything in production that isn’t stored in a docker registry kind of breaks things.
This approach also won’t work with container orchestrators like Kubernetes, ECS, Lambda, CloudRun, etc.
Where I can see doing a docker build of a small layer that just sets file perms potentially being useful is for container based dev environments to be run on laptops and workstations.
This has been a major Docker pain point, and not many people know about this trick. I didn't know you could have the variables in the Compose file directly, does that really work?
Our approach so far was to add yet another layer (a script to pass uid/gid to Compose), but if we don't need the script that would be fantastic.
EDIT: Ah, I just saw the bashrc wrinkle you mention. Yeah, that's why we had the script, and it's a damn shame Docker can't do this natively. It has been a major hassle.
> I didn't know you could have the variables in the Compose file directly, does that really work?
Yep, it's because the build args get read in from a .env file by default and then from there Docker Compose sends those build args to Docker when it builds the image.
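For example, with a `.env` file sitting next to the compose file (variable names assumed):

```
# .env -- read automatically by Docker Compose from the project directory
UID=1000
GID=1000
```

Compose interpolates `${UID}`/`${GID}` references under `build.args` and forwards them to `docker build` as build args.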
This was one of the topics from my talk at DockerCon last week (creating a production ready Docker Compose set up). The video and 6,000 word blog post for it will be coming out tomorrow. Both things will be added to the talk's reference links at https://github.com/nickjj/dockercon21-docker-best-practices.
That's interesting, thanks! My shell sets the USER variable (but no USERID or GROUPID), which might be good enough for all our developers, but probably not reliable enough for a general audience.
Honestly in practice everything tends to work fine without any hacks or extra scripts.
I run all of my containers as a non-root user and create the user in the image with its default values of 1000:1000 for the uid:gid. I haven't bothered to expose the uid:gid as build arguments because it's pretty much never an issue in development or production.
With a uid:gid of 1000:1000 built into the image any bind mounted files end up being correctly owned by the Docker host's user under the following conditions:
- Docker Desktop on macOS
- Docker Desktop on Windows using WSL 1
- Docker Desktop on Windows using WSL 2 and native Linux (as long as your dev box's user is set to 1000:1000)
IMO it's really rare that your dev box's user wouldn't be 1000:1000 on native Linux or WSL 2.
In production you also have full control over the uid:gid of your deploy user.
The only time where it kind of stinks is CI, but it's super easy to get around this by simply not using volumes in CI.
> IMO it's really rare that your dev box's user wouldn't be 1000:1000 on native Linux or WSL 2.
Any company-wide (GNU/)Linux deployment that uses LDAP or some other centralized user directory will not have devs with UID/GID 1000:1000. Hope is not a strategy.
> Any company-wide (GNU/)Linux deployment that uses LDAP...
You can go the extra mile and turn the UID:GID into build args like the original parent and you're good to go. No hacks necessary, and since it's all self contained into a .env file there's nothing extra you need to run since you're likely using an .env file already for other vars.
> You can go the extra mile and turn the UID:GID into build args like the original parent and you're good to go.
That doesn't help you if you're attempting to use pre-built/existing Docker images that are not built internally and make the assumption that “1000:1000 is good enough”. You then not only have to hack around Docker limitations, but also around someone else's broken assumption.
> That doesn't help you if you're attempting to use pre-built/existing Docker images that are not built internally
Most pre-built images that I've come across don't require bind mounts to function.
Images like PostgreSQL aren't affected by this because you can use a named volume, and most pre-built applications that are shipped as images tend to store their state in a database and don't require bind mounts to function.
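A sketch of the named-volume approach; Docker initializes the volume from the image's own content and ownership on first use, so host uid/gid never enters the picture:

```yaml
# docker-compose.yml fragment (image tag is an example)
services:
  db:
    image: postgres:13
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```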
Hm, you're right, I guess I've seen a non-1000 user very rarely. However, for a company of tens to hundreds of people where you want them to be able to develop locally, you might very well hit this issue, and if you hardcode 1000 it's going to be hard for them to work around it.
This method works well until it doesn't work at all, and I think I would prefer one that works slightly less well but also has an easier way to override it. Then again, I might try this and see if we ever hit an issue, thanks!
On Linux with Bash it runs with your current user, and on most other platforms it runs as id 1000, which is set up as the default user in the Dockerfile. This is no problem on macOS or Windows because of the way Docker Desktop uses VMs.
ZSH or other shells don't necessarily set $UID, so if you're running Linux, not id 1000 and not running Bash you might need a little .env file with `UID=1001` in it to make it work. And then the user is still nameless in the container. This is kind of rare and I only use it for dev containers where most relevant files (and permissions) are bind-mounted from the host, so it hasn't really been a problem in practice.
Remaps would be cleaner but I find it too much work to explain for normal developers just wanting to use a dev container.
Containers are ideally meant for a single service. The best way I've found is to just pass the `--user` flag to `docker run` and have the service run as whatever user it is that you want. The only challenge is that you need to make sure that the volume mounts are already created on the host with the correct permissions.
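For example (a sketch; the host directory must be created first so it's owned by the invoking user):

```shell
# Pre-create the mount point with the right ownership, then run the
# container as the current host user so new files stay owned by them.
mkdir -p "$HOME/appdata"
docker run --rm \
  --user "$(id -u):$(id -g)" \
  -v "$HOME/appdata:/data" \
  alpine sh -c 'touch /data/created-by-container'
```

One caveat: that uid has no passwd entry inside the container, so tools that look up the user name may complain.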
If you built the container or inspected it before running you should know what the container is doing. Again, containers like Docker aren't really "meant" to run multiple processes. They are meant to run a single process and your app should be able to run as whatever user you run the container with. If you want to run multiple processes or services inside a single container then ultimately you're better off with a different container solution.
1) pass user/group names around and resolve them at the destination to UID/GID;
2) ignore them entirely; assign ownership of all newly created files to the currently authenticated user (if authorized).
There is a new mount syscall in Linux 5.12, see "ID remapping in mounts" [1], that should help with all the permission madness, eventually.
It allows different mounts to expose the same content with different ownership, and in general to map permissions IDs between mounts in any way we like.
systemd-homed will use that to abstract over the uids and gids of portable home directories, for example.
This doesn't even really seem like a problem that Docker introduced. All these problems have been encountered by anyone running an NFS server, or in a dozen other ways you can end up with systems with disparate uid/gid mappings using a shared or removable file system.
Using uid 0 in containers is asking for trouble. Any privileged resources (such as low ports) can be mapped in without messing with capabilities so there should be no need for it.
The port mapping is done by the container engine, not the container. Using low ports is allowed if the engine runs as root.
Moreover I think it’s acceptable to use uid 0 inside a rootless container like podman since it’s by default only mapped to the user running it.
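You can inspect that mapping from inside a rootless container (assumes podman is installed):

```shell
# /proc/self/uid_map shows how container uids map to host uids;
# in rootless podman, container uid 0 maps to the invoking user's uid.
podman run --rm alpine cat /proc/self/uid_map
```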
AWS Fargate won't let you remap ports. Whatever the container exposes, that's the port it's going to listen on. To work around this and other problems, I ended up making fat containers that start as root, and add entrypoints that can either run a process as root (to listen on low ports) or sudo to a user to drop perms before starting a process (to listen on high ports).
There's also weird junk you sometimes need to do in order to capture file handles depending on how a container engine is running the container, which you need to do before you fork or drop privs. But it took me years to finally run into that use case, most people will never need to do this.
Shameless plug: a boilerplate where I had to solve UID permissions, running as non-root user, publishing files to another container, mounting fs as read only, and hot reloading in dev environment.
It's still pretty much a proof of concept and it relies on docker compose but perhaps some of you may find it useful as a starting point: https://github.com/tacone/loki
- `-R`: There is existing content in `~/alpine` you want made available
- `x`: You want your container to be able to create directories
However, you can still run into problems if
- Your container copies data from outside your bind-mount to inside. It sort of worked, except somehow the mask ended up `r--`, so things lost write permission.
- Your container moves data from outside your bind-mount to inside. This fully preserves the permissions.
I ended up creating a `.keep` file in the bind mount and doing a `cp --attributes-only --preserve=mode,ownership,xattr .keep <target>`
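That `cp` trick can be demonstrated on its own with GNU coreutils: `--attributes-only` copies no file data, only the permission bits (and ownership/xattrs) of the `.keep` template onto the target.

```shell
# Reproduce the ".keep" trick: .keep is a zero-byte template whose
# permissions get stamped onto a target without touching its data.
touch .keep target
chmod 664 .keep
echo "payload" > target
chmod 600 target
# Copies no data; only applies .keep's mode/ownership/xattrs to target
cp --attributes-only --preserve=mode,ownership,xattr .keep target
stat -c '%a' target   # now 664, inherited from .keep
cat target            # contents untouched
```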
For the local development scenario, I made an open source utility that uses the setuid bit to change the UID/GID of a particular user and any files that user owns inside of the container at runtime:
I think all those problems disappear when you run containers with proper orchestration tools, such as Kubernetes.
And not only that, I think that the examples given in the article ("Assume that your Apache/PHP container is mounting the host’s /home/alexandros/myapp/ application directory to the container’s /var/www/html directory.") are in fact anti-patterns. If your container depends on a specific file being available at a specific location on the host then you're doing it wrong. The only place where that makes sense is on a developer's local environment. In shared environments you want something like a Kubernetes ConfigMap to contain config files, and dedicated persistent volumes for everything else.
The orchestration tool does not provide any additional functionality to fix this problem, it's up to the container execution environment, and today's container execution environments have no way (that I am aware of) to natively map file permissions outside of the container.
It could be I just haven't dug enough into the kernel internals, maybe there is a transparent permissions remapping thing. But something would absolutely have to map permissions. Otherwise there is no way to use filesystem ownership between execution environments without them using conflicting UID/GIDs, to say nothing of changing the file perms.
It makes sense that mounting a volume requires understanding a user mapping tbh. I think the answer is twofold:
a) Many problems solvable with a volume can be solved with a bind-mount, cache-mount, etc [0].
b) In the event that you actually need to map in a user-file, wrap the docker command in a script that manages the logic. At this point you're writing a system tool that's doing things outside of the context of a container - it's not really docker's fault that it doesn't try to make this trivial.
> First of all, security issues may arise in a production system. If a container is compromised and the container is executed as root (uid = 0), then the intruder has access to any file of the host filesystem that has been loaded into the container filesystem through a mount. The owner UID of files that belong to the host root will be 0 in the container, so they will be accessible to the intruder.
Use supervisord to coordinate the processes inside your Docker container, as easy as that. Bonus point, you don't need to wrangle with properly handling "docker stop"/ctrl+c.
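A minimal setup might look like this (program names and paths are placeholders); supervisord forwards shutdown signals to its children, which is what makes `docker stop` behave:

```ini
; supervisord.conf -- run in the foreground so it works as PID 1
[supervisord]
nodaemon=true

[program:web]
command=/usr/local/bin/myapp    ; placeholder: your service binary
user=app                        ; drop privileges per-program
autorestart=true

[program:worker]
command=/usr/local/bin/worker   ; placeholder: a second process
user=app
autorestart=true
```

And in the Dockerfile: `CMD ["supervisord", "-c", "/etc/supervisord.conf"]`.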
Isn't this a bit of an anti-pattern? There really are very few situations in which you should be mounting things in production. Apache/PHP/etc is definitely not one of those situations.
I would absolutely say it's a production antipattern to run a container with access to some already existing host files belonging to some other user.
However, this is something that's basically unavoidable if you're attempting to use OCI/Docker for dev where you access a developer's source code checkout from a container running a standardized language runtime. And that's what a lot of people use OCI/Docker for...
Sure, that's one of the cases when this might needed in prod (although in the parent post I meant only access to honest-to-god data files, not things like bindmounting /dev).
In practice bindmount smell can also be somewhat alleviated by using things like k8s device plugins to request things at a higher level ('I want GPU access' vs. 'please bindmount /dev/drm... and use the proper modes'). It's still effectively a bindmount, but some extra security precautions can be taken to ensure exclusive access and that no arbitrary mounts from the host are permitted. And things like k8s device plugins can also poke at file modes and other namespace magic at runtime so that the end user never has to worry about things like UID/GID and chardev modes. That IMO prevents the smell associated with random host bindmounts.
They're also very easy to write, so if you ever happen to run k8s and need to give workloads access to some odd/custom host hardware, implementing a proper plugin for it is quite painless and gives much better guarantees than plain bindmounts.
Every additional mount can be considered an extra design failure in terms of security, or simply laziness. They all increase the attack surface. Even though containers are not designed for isolation, every mount and volume is one step closer to breaking what isolation there is. Of course, the total risk depends on where you are mounting from.
Related to this post, a recent runc version included a change that inadvertently made a number of images built on the distroless base image difficult to use: https://danielmangum.com/posts/runc-chdir-to-cwd/
This is something CharlieCloud was built around for HPC and something podman can work around. User namespaces and fuse-overlayfs are the building blocks to fix this
I've always solved this by just having a proxy script that creates a user with the right UID/GID when the container starts, then executes the given command.
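A sketch of that pattern for an Alpine-based image (assumes the container starts as root and that `su-exec` is installed; the `LOCAL_UID`/`LOCAL_GID` names are made up for illustration):

```shell
#!/bin/sh
# entrypoint.sh -- create a user matching the host's uid/gid at startup,
# then drop privileges and exec the requested command as that user.
set -e
uid="${LOCAL_UID:-1000}"
gid="${LOCAL_GID:-1000}"
addgroup -g "$gid" app 2>/dev/null || true
adduser -D -u "$uid" -G app app 2>/dev/null || true
exec su-exec app "$@"
```

Wired up as `ENTRYPOINT ["/entrypoint.sh"]` and invoked with `docker run -e LOCAL_UID=$(id -u) -e LOCAL_GID=$(id -g) ...`.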
Make the build script use local $USERID and $GROUPID as args during the build process.
In docker-compose.yml (or, if using docker directly, using --build-arg):
So you're passing the local uid and gid as variables to the build process (1). In build/Dockerfile:
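A sketch of what that Dockerfile fragment could look like (Debian-style user tools and the `appuser` name are assumptions):

```dockerfile
# build/Dockerfile (fragment)
ARG USERID=1000
ARG GROUPID=1000
# Create an unprivileged user matching the host's uid/gid so that
# bind-mounted files are owned by the same ids on both sides.
RUN groupadd -g "${GROUPID}" appuser && \
    useradd -m -u "${USERID}" -g "${GROUPID}" appuser
USER appuser
```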
(1) $USERID and $GROUPID might not be available as environment variables on your system. To set them, place this in your .bashrc:
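The two lines would be something like:

```shell
# In ~/.bashrc: export the host uid/gid so docker-compose (and any
# other child process) can pass them through as build args.
export USERID="$(id -u)"
export GROUPID="$(id -g)"
```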