Systemd's DynamicUser feature is currently dangerous (utoronto.ca)
100 points by pwg 5 months ago | 45 comments

This is the first time I'm learning about DynamicUser, and I'd appreciate it if someone could check whether the following line of thought is valid:

• We have compute environments that use a central LDAP service for end-user account info, with `sssd` or `nslcd` used on the systems (the only locally defined accounts are system accounts and special admin accounts).

• We do have accounts with UIDs in the range that systemd uses for the DynamicUser feature.

• If systemd starts a DynamicUser service before the LDAP client (`sssd` et al) is up and running, it might allocate a UID that is in use by LDAP.

• The systemd documentation does not specify whether UIDs are chosen sequentially, at random, or via some other method, so one must assume that any apparently free UID in the range may be allocated.

Does that thinking make sense? It seems like we will have a pretty big problem to deal with when we decide to move to a distro version which includes DynamicUser support (for example, Ubuntu 18.04). It also doesn't look like it's possible to set a custom range for DynamicUser UIDs.
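The kind of audit this implies can be sketched with standard NSS tools. Caveat: `getent passwd` only enumerates LDAP users if your NSS backend allows enumeration (sssd disables it by default), so if this prints nothing you'd need to query the directory server directly.

```shell
# List any accounts whose UID falls inside systemd's default
# DynamicUser range (61184-65519). Enumeration support is an
# assumption here; sssd sets enumerate=false by default.
getent passwd | awk -F: '$3 >= 61184 && $3 <= 65519 { print $1, $3 }'
```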

Looks like the range is compiled in.



From the man page:

> Dynamic users/groups are allocated from the UID/GID range 61184…65519. It is recommended to avoid this range for regular system or login users.

It looks like you will have to compile systemd with a different range to behave well on your system.

Edit: Apparently the UID allocation is randomish.


> Looks like the range is compiled in.

Of course it is. UNIX devs...

You should check out suckless projects; they compile in everything. There is no config. Or rather, the #define statements are the config.

I think the only time I legit used that was on an Arduino, where both code and data have severe size limits.

Alright, but they're designed for you to read through config.h (note that these settings aren't inlined) and adjust them for your system before building a binary. They're very explicitly not designed for a "grab your vendor binary and you'll be fine" model. It's a different paradigm.

EDIT: The programs themselves are intended to be written in a programmer-modifiable way (they host a list of common "plugins" in patch format). I think the last thing you can say about systemd is that it encourages you to jump in and modify its behaviour to suit your needs. That's not a Supported Configuration(tm).

>That's not a Supported Configuration(tm).

Depends on what you modify, but to my knowledge it's supported as long as your patches didn't cause the bug or behaviour problem. Plus there are plenty of ways to configure systemd at compile time; you can toggle a lot of switches.

No, Linux devs...

No, systemd devs...

qmail was written in the mid-90's, and all of the UIDs it would run as were compiled in. Meaning that if one of your mail users was UID 650 at compile time, the value of 650 would be compiled into the binary and it would always (and only) run that binary as that UID; if it didn't match up, it would fail.

There were a lot of things about qmail that were outright stupid and awful, but not being able to port binaries between systems without a lot of checking was one of my favourites.

leni536 is correct: the range is specified at compile time.


However, do note that if the service file sets a User= name, and that username already exists, then it will simply use that user, and disable DynamicUser for that service. All of the service files that ship with systemd itself set User=, making this possible for them, but 3rd-party service files might not.
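That escape hatch can be applied from the outside as a drop-in; a minimal sketch, where the unit name "myservice" and the account name "svc-myservice" are placeholders for whatever exists locally:

```ini
# /etc/systemd/system/myservice.service.d/pin-user.conf
# If an account named svc-myservice already exists, systemd will use
# it and skip dynamic UID allocation for this service.
[Service]
User=svc-myservice
```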

One thing you could look into is writing a systemd generator (see the systemd.generator(7) man page) that injects User= into DynamicUser=yes service files, and then ensures that local system users with each of those names exist.
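A rough sketch of such a generator; the unit path, the "dynsvc-" user-name prefix, and the drop-in file name are all illustrative, not a tested setup. Generators receive three output directories as arguments, and drop-ins written there behave like ordinary unit-directory drop-ins.

```shell
#!/bin/sh
# Hypothetical generator: pin a static User= for every
# DynamicUser=yes unit by writing a drop-in.
gendir="$1"

for unit in /usr/lib/systemd/system/*.service; do
    grep -q '^DynamicUser=yes' "$unit" || continue
    name=$(basename "$unit" .service)
    mkdir -p "$gendir/$name.service.d"
    printf '[Service]\nUser=dynsvc-%s\n' "$name" \
        > "$gendir/$name.service.d/pin-user.conf"
    # Ensure a matching static account exists. (Calling useradd from
    # a generator is itself questionable; a one-shot provisioning
    # script may be a safer home for this part.)
    getent passwd "dynsvc-$name" >/dev/null 2>&1 || \
        useradd --system --no-create-home "dynsvc-$name"
done
```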

This would be quite difficult for the average user to debug. A lot of systemd functionality appears to be designed for military environments served by Red Hat, for instance journalctl's binary logs for security auditing. But this is not required by the vast majority of Linux users.

Offloading complexity onto everyone to serve a specific use case is bad design. It's like implementing high-security military procedures in the average office: not needed, and a waste of time and resources.

Shouldn't over-engineered-by-design security features, like a time daemon launched as a dynamic user in a new mount namespace, be left to user choice? Surely those who need that level of security should take the responsibility to enable it, accept the debt, and deal with the complexity, rather than imposing it on everyone else. In this case ntpd is a better solution for average users.

Most distributions voted for an init system. An init has a limited role. Systemd is proving to be anything but.

I agree entirely, and the complexity of systemd has been one of the major concerns.

I do like having a standardized way of managing processes. Systemd does make packaging deb/rpm files way easier, but I don't really like the price: everything is abstracted into systemd. Mounts. udev. Multi-user/logins (ConsoleKit).

I like fstab with UUIDs. I like manually mounting a USB stick when I insert it. I like having the options of using an automounter or not using an automounter.

At home I stick to Gentoo and Void. runit is super simple and I like the concept behind it (although it does lack in some exceptions/logging issues).

I think ideally on my hosted solutions, the best thing going forward is a thin Alpine with Docker and running all services as docker containers.

I really wish the FreeBSD port of Docker was still maintained. I'd switch everything to FreeBSD+Docker if I could.

> I think ideally on my hosted solutions, the best thing going forward is a thin Alpine with Docker and running all services as docker containers.

That sounds like a lot more trouble than it's worth, compared to CoreOS or Ubuntu Core. If everything is running in Docker, why does the "hypervisor's" use of systemd matter? It's not using it for anything.

> Surely those who need that level of security should take the responsibility to enable it

That implies two code paths, one that enables the security and one that doesn't. That is more complicated (and less testable!) than either code-path on its own.

Security costs more than insecurity, but sometimes-security is the worst of all worlds.

Most security features which try to lock things down properly, especially when doing no-access-by-default, cause problems in unforeseen cases.

I don't think it's dangerous to develop better implementations that improve security, even if they go wrong occasionally. It feels more dangerous to me to shoot down the attempts of people trying to raise the security bar.

I prefer Linus's philosophy of not breaking userspace, so they should try to find a solution that doesn't break things. (I assume someone will comment that that's impossible in this case; if so, please add an attempt to prove/motivate it.)

Things are broken in this case because of a bug, not because of design decisions. All of the features "locking things down" are opt-in in the service file (to avoid "breaking userspace"), and the service file for systemd-timesyncd opted-in.

The "bug" being systemd making all sorts of undocumented assumptions about the environment it's running in?

I see that systemd is still junk.

I love Alpine Linux. If we (ZeroTier) had our infrastructure to do over we'd use it instead of CentOS for servers. It dumps systemd and countless other pieces of over-engineered cruft that you don't need. It's a thing of beauty. If you appreciate clean, well designed, fast, and parsimonious systems check it out.

FreeBSD is also worth checking out for the same reason. It lacks a bit on the hardware front but it's clean and fast and does not have systemd cancer.

Over-engineering is the plague of all modern software.

Completely agree, but what about Alpine using musl rather than GNU libc? AFAIK this has caused trouble in the past (such as https://github.com/gliderlabs/docker-alpine/issues/11), and will eventually cause trouble again, since third-party packages don't test against non-glibc Linux (or do they?). I'm wondering if it's prime time for Devuan.

So long as Docker works on Alpine and Docker+musl bugs/issues are addressed and fixed, Alpine could be a great base for running a container infrastructure. Within the containers you could use regular Ubuntu/glibc-based images.

I'd be careful with such assumptions. Docker forwards implementation details of the host system/userspace; it isn't a VM, after all. A Docker image is not forward-compatible with future Docker or host versions. Which raises the question of whether people are using Docker for the wrong reasons if their intention is to obtain future-proof reproducible builds, rather than merely to increase image-per-machine density.

Docker provides a lot more future-proof reproducibility than other deployment strategies, though. It's not 100% guaranteed, but whether a given docker image is forwards-compatible doesn't matter as long as the reproducibility of creating the Docker image doesn't change substantially.

In other words, as long as I can get (or make) an ubuntu:xenial docker image and apply the same (or similar) transformations to it to make the end result, it doesn't matter nearly as much whether specifically this Docker instance works across all versions of Docker forever.
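A hypothetical example of those "transformations"; the package choice is illustrative, and the reproducibility caveat is that anything not pinned (package versions, base-image digest) can still drift:

```dockerfile
# What you keep stable is the recipe, not the resulting image.
FROM ubuntu:xenial
RUN apt-get update \
 && apt-get install -y --no-install-recommends ntp \
 && rm -rf /var/lib/apt/lists/*
```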

Hating on systemd to hate on it is never going to give you a positive outlook. I personally really like the packaging simplicity and the concept of target files. But I have had lots of issues with the implementation and this article shows a pretty good example of just how complex it can get.

I wish there were drop in replacements that just reused the target/service files, but almost all of those have been abandoned.

systemd does solve legit full process management problems, but I think as we move to more docker based deployment strategies, its usefulness in that regard will start to decline.

I don't hate on it to hate it. I hate on it because it's orders of magnitude more complex than it needs to be, has a history of being broken, and somehow managed to end up harder to use and more obtuse than the ancient wad of shell scripts it replaced.

It's like its designers set out to make it as unintuitive and hard to use as possible. I say this as a 20+ year Linux veteran.

> The user is automatically allocated from the UID range 61184–65519, by looking for a so far unused UID.

WAT. That's not OK. The UID (and GID) namespace is not that big (32-bit), but it's big enough to avoid conflicts with existing uses: just use a range within the larger range between (uid_t)(1UL<<31) and (uid_t)(-2).

Solaris 11+ and Illumos do this for dynamically assigning UIDs and GIDs to SIDs that are not mapped by name to Unix users/groups.
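In shell arithmetic, the range the parent suggests works out as follows (the upper bound stops at (uid_t)(-2) because (uid_t)(-1) is reserved as the "no UID" sentinel):

```shell
# Upper half of the 32-bit uid_t space, as suggested above.
# Requires 64-bit shell arithmetic (bash on a 64-bit host).
low=$(( 1 << 31 ))           # 2147483648
high=$(( (1 << 32) - 2 ))    # 4294967294, i.e. (uid_t)(-2)
echo "suggested dynamic range: $low..$high"
```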

Inside of a container (with user namespacing enabled), you won't have the full 32-bit range, and this must all work inside of containers; I'm not sure about other container managers off the top of my head, but systemd-nspawn only gives containers a 16-bit subrange.

The container UID namespaces should be the same size, damnit. (That is how it is in Solaris/Illumos zones...)

Linux user namespaces work as a 1-to-1 mapping of UIDs. Every UID in the container has to map to a UID on the host, so the UID range of the container is necessarily smaller than the UID range of the host (unless of course the map is the identity, but then what's the point of having a separate namespace?).

Maybe that is bad design, but if it is: it's Linux's fault, not systemd's.

Systemd already reduces UIDs to 16 bits, using the upper 16 bits for container IDs.

* https://news.ycombinator.com/item?id=10519578
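Under that 16/16 split, a host-side 32-bit UID decomposes into a container ID (upper 16 bits) and an inner UID (lower 16 bits); the value below is a made-up example:

```shell
# Decompose a hypothetical host UID under the 16/16 scheme.
uid=196608008
echo "container=$(( uid >> 16 )) inner_uid=$(( uid & 0xFFFF ))"
# container=3000 inner_uid=8
```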

That's dumb. There should be a container ID.

Since something's probably going to be said eventually, I'll do it this time.

> ...how timesyncd is supposed to get access through an inaccessible directory. I'll quote the explanation for that:

> > [Access through /var/lib/private] is achieved by invoking the service process in a slightly modified mount name-space: it will see most of the file hierarchy the same way as everything else on the system ([...]), except for /var/lib/private, which is over-mounted with a read-only tmpfs file system instance, with a slightly more liberal access mode permitting the service read access. [...]

Reading this, I didn't quite completely facepalm, but...

This solution - the high-level general architecture/approach; the ideas used - is, IMO, frankly insane.

It means your running system's state can no longer be easily and straightforwardly reasoned about: no longer can you run a few commands and get a high level idea of what's configured (with respect to filesystems) and how everything's set up, see what files are where, and immediately know what a given file's permissions are.

Instead, it seems you're now being expected to consider any arbitrary, given filesystem path you're puzzling over from the perspective of every FS namespace as viewed by each process (to be clear, this means every file * every namespace * every process). No sysadmin/devops type is going to do that; it's not sustainable.

This architecture is bizarre enough that few tools will be built to do adequate introspection, unless developers (glares at one in particular) actually extend and build on this further, additional even more wonderful breakage happens as a result, and the tools must then be created just to keep systems manageable. Hopefully things don't get that bad; but in the meantime said tools don't exist, so people get to reverse-engineer PID 1 (AHEM) the Fun™ way, and keep all the half-square, half-circle pieces they discover along the way.

Looking further afield, I'm more hesitant about the future of Linux as a viable, trustworthy platform to have confidence in. I say that both from the perspective of straightforward, enjoyable maintenance (which Linux is already struggling with) and from the perspective of reasonably consistent, surprise-free mental modelling to aid security best practice.

UNIX was based on the idea of "everything's a file". Not, IMHO, the best or most efficient model; but okay. This blows that model out the window, because suddenly we have architectural interestingness being built on building blocks that exceed the scope of the original file model (look at a file, see the permissions of that file), without pivoting or extending the basic building blocks of the system to incorporate the new models. Linux is still known as a UNIX clone, and the UNIX standard ("everything's a file" being fundamental) hasn't changed anytime recently, so this is... not dishonest, but definitely a potential source of a lot of confusion. And kind of technically dishonest.

Furthermore, there's no defined direction for this new... standard? that seems to be appearing. I can't effectively model this seemingly byzantine architecture; I can't intuit landmarks or similarities from other systems (although I'll admit I've only used Linux, Windows and DOS).

I do understand mount, PID and network namespacing. These concepts are not that difficult to reason about, in isolation. But they can be combined in very very unintuitive ways that make state analysis very difficult, and what I'm trying to express here is that I don't consider the architecture presented to be intuitive, easy to debug, or effective. (I never envisaged namespacing being used like this, of course.) Perhaps it was the simplest solution, in isolation, but it doesn't feel well-designed or thought through (with respect to sane diagnostics and transparent low-level housekeeping).

Part of my freakout is that the tools available to examine namespaces are very target-specific; they don't consider the system as a whole. The question is whether the developers (briefly resumes glaring) nearest the namespace bits would be willing to maintain tools that help introspect at a holistic level. That may be needed soon.

I guess the other part is that it feels Linux is getting really complicated. I think, based on my understanding of psychology, that this may be because I've been using Linux for a few years now (a decade or so), and my usage of it has perhaps become ingrained and rusted in place. Maybe so. But I do also wonder if the bazaar has scaled to the point where nobody can keep track of all the pieces as they move forward.

> I can't effectively model this seemingly byzantine architecture; I can't intuit landmarks or similarities from other systems (although I'll admit I've only used Linux, Windows and DOS).

This sounds kind of like Plan 9, but that used filesystems as a unifying principle to simplify things. This sounds more like complexity stacked on top of complexity like a house of cards...

> UNIX was based on the idea of "everything's a file".

Nominally it was, from 1970 until 1983, when sockets came. Are network interfaces files? Routing tables? I have not counted, but there are numerous system calls that do not take a single file descriptor parameter. And even for those that do take one, the object behind the file descriptor is often not discoverable in the filesystem.

> No sysadmin/devops type is going to do that; it's not sustainable.

Sysadmins need to accept that 1970s Unix skills are useful, but no longer sufficient. When you look at your system, always run "lsns" first.

(That said, I recently had the feeling that "lsns" did not show all namespaces. I didn't hunt it down, though, because it wasn't really urgent/important at that moment.)

Edit: to be clear, it should be "sudo lsns"; otherwise it will not show the system-wide view.
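A sketch of that workflow, combined with nsenter(1) for actually looking through a process's eyes; PID 1234 is a hypothetical target, and root is required to inspect other users' processes:

```shell
# System-wide picture of mount namespaces, then one process's view.
sudo lsns -t mnt                     # all mount namespaces on the host
sudo readlink /proc/1234/ns/mnt      # which namespace this PID lives in
sudo nsenter -t 1234 -m ls /var/lib/private   # its private mounts
```

In practice the PID would come from something like `systemctl show -p MainPID --value <unit>` rather than being typed by hand.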

> Nominally it was, from 1970 until 1983, when sockets came. Are network interfaces files? Routing tables? I have not counted, but there are numerous system calls that do not take a single file descriptor parameter. And even for those that do take one, the object behind the file descriptor is often not discoverable in the filesystem.

Good point. Plan 9 re-encapsulated everything, I think, but no other UNIX has done so.

I kind of didn't really factor this in; I guess I conveniently forgot about sockets in the mental model I was using to reason with in my previous comment.

So, that means the complexity train will just move forward, I guess. This exists now, it's presumably not the end of the world, and yay now I have more things to remember about Linux internals.

My two quibbles that remain are that

- this is not easily discoverable (yay) or (currently) able to be visualized, so people's understanding of this will depend on their mental modelling being good

- no, everything isn't a file, but we just moved closer to "everything's a file isn't really a file though", because this messes around with what's left of that idea, in spite of the fact that yes there's not much left of the concept.

TIL about lsns, although it isn't listing all the groups I have (I made a memory namespace earlier to contain some processes, and it's not showing up in sudo lsns, heh).

> I made a memory namespace earlier to contain some processes, and it's not showing up in sudo lsns,

What kernel do you run? I have never heard of memory namespaces, and http://man7.org/linux/man-pages/man7/namespaces.7.html hasn't either.

Did you possibly mean memory cgroups? Obviously lsns doesn't show them. I don't know of a tool that gives a good overview of cgroups at a glance. I have used custom scripts based on

  find /sys/fs/cgroup/ -name tasks
in the past.

_Whoops_, I completely conflated cgroups and namespaces, thanks :)

Can't you say the same thing about containers? Each container has its own filesystem, PID and network namespace. I don't see people experiencing containers as byzantine.

The issue isn't the mere existence of namespacing or scoping. It's about having explicit and consistent models for how these things work. Container frameworks try very hard to introduce a single boundary that applies to all those resources, and so are straightforward to reason about.

Containers are a nice clean abstraction: a system within a system, where every process running inside it shares the container's filesystem, PID list, and networking. Things can get a little gnarly when the abstraction leaks, but in principle they're easy to understand. This is a lot uglier precisely because it's not total isolation: the processes that use it share most of their filesystem view, networking, etc with the rest of the system, can interact with everything else on the system, but some of their filesystem view is different and the exact difference depends on when the process was started.

Containers are byzantine, but presumably you can at least run your own process inside the container and debug from there? How can you hope to even start to debug if your debugger has a different view of the filesystem from the process you're debugging?

You can use the nsenter(1) command to enter the namespace of another process.

For a Hacker News discussion of this feature when it was first announced, see https://news.ycombinator.com/item?id=15419100 .

https://news.ycombinator.com/item?id=17714360 had the correct title from the original. (-:
