Systemd as tragedy (lwn.net)
288 points by wyldfire 20 days ago | 401 comments



Back in 2016, systemd started killing user processes on logout (rather than send them the SIGHUP signal, as POSIX says should happen). This caused problems for programs like nohup, screen and tmux, which deliberately keep running. Systemd's response was to say that they should incorporate systemd's library, and use systemd's new daemonization API. As far as I know, none of them did.

Two years later, you can find hundreds of support requests across the internet, from frustrated users who are having their sessions killed by systemd.

Bugs are annoying, but that's life. On the other hand, when you're an impacted user who's lost work, and researching the bug leads you to a years-old discussion in which someone is actively denying that the bug exists and refusing to fix it, that's infuriating. I don't think systemd's developers deserve the trust that maintaining a core piece of infrastructure requires; they don't seem to care enough about whether they've broken things.


You know, because we knew this would be controversial, we made sure it was both a compile-time option and a runtime option. Yes, upstream both default to on, but that's just upstream. We made it very easy and supported for downstream distros to switch between opt-out and opt-in of this option for their users. We have encouraged distributions to leave it on, but we were fully aware that for compatibility reasons this is something downstreams likely wanted to turn off, and most compat-minded distros did, as we expected.

Now I am used to taking blame for apparently everything that ever went wrong on Linux, but you might as well blame your downstream distros for this as much as you want to blame us upstream about this, as it's up to them to pick the right compile-time options matching their userbase and requirements in compatibility, and if they didn't do that to your liking, then maybe you should complain to them first.

(And yes, I still consider it a weakness of UNIX that "logout" doesn't really mean "logout", but just "maybe, please, if you'd be so kind, i'd like to exit, but not quite". I mean, that's not how you build a secure system. We fixed that really, fully knowing it would depart from UNIX tradition, but that's why we made it both compile-time and runtime configurable)

(Also, nobody has to "incorporate" systemd's library to avoid the automatic clean-up. In fact, there's no library we provide that could do that. What was requested though is to either run things as child of systemd --user or just register a separate PAM session, neither of which requires any systemd-specific library.)
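For readers wondering what "run things as child of systemd --user" can look like in practice, here is a minimal Python sketch. It assumes the `systemd-run` tool is installed and that a per-user systemd instance is running; the helper names `scope_command` and `launch_detached` are made up for illustration and are not part of any systemd API.

```python
import shutil
import subprocess


def scope_command(argv):
    """Wrap argv so it starts in its own scope under the user's systemd instance."""
    # `systemd-run --user --scope` asks the per-user manager to place the
    # command in a transient scope unit rather than the login session's scope.
    return ["systemd-run", "--user", "--scope", *argv]


def launch_detached(argv):
    """Start argv in a user scope if systemd-run is available, else return None."""
    if shutil.which("systemd-run") is None:
        return None  # Not a systemd machine, or the tool is not on PATH.
    return subprocess.Popen(scope_command(argv))


if __name__ == "__main__":
    # Only prints the command line; nothing is started here.
    print(" ".join(scope_command(["tmux", "new-session", "-d"])))
```

Whether such a scope actually survives logout may still depend on lingering being enabled for the user, a point raised further down the thread.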

Lennart


> Now I am used to taking blame for apparently everything that ever went wrong on Linux, but you might as well blame your downstream distros for this as much as you want to blame us upstream about this, as it's up to them to pick the right compile-time options matching their userbase and requirements in compatibility, and if they didn't do that to your liking, then maybe you should complain to them first.

It's up to you as a systemd developer to pick sane defaults. Claiming that it's okay to introduce opt-out breaking changes upstream and then abdicate responsibility is quite a bit like walking around while waving your hands and arms around and then blaming whoever you hit for walking into you.


Well. What is a distro for then if not for picking the most highlevel of defaults suitable for them?


> Well. What is a distro for then if not for picking the most highlevel of defaults suitable for them?

IOW the distros maintainers made a mistake by picking systemd? Agreed.


You are right. Distros failed us completely by choosing systemd.


Killing software that might be running after a valid login session is a sane default.


And that's what SIGHUP is for. The process will exit by default. If that's not the desired behavior a handler can be registered. Killing things that are explicitly designed to run after logout is a piss poor default.
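A small Python sketch of the mechanism described here: by default SIGHUP terminates the process, but a process that registers a handler keeps running. The function name `survive_hangup` is invented for this example, and the process sends the signal to itself purely to simulate a hangup.

```python
import os
import signal
import time


def survive_hangup():
    """Install a SIGHUP handler, send ourselves SIGHUP, and return what was caught."""
    received = []

    def on_hangup(signum, frame):
        # Record the signal instead of letting the default action end the process.
        received.append(signum)

    previous = signal.signal(signal.SIGHUP, on_hangup)
    try:
        os.kill(os.getpid(), signal.SIGHUP)  # simulate the terminal hanging up
        deadline = time.monotonic() + 1.0
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)  # give the interpreter a chance to run the handler
    finally:
        # Restore whatever disposition was in place before the demo.
        signal.signal(signal.SIGHUP, previous if previous is not None else signal.SIG_DFL)
    return received


if __name__ == "__main__":
    print("still running after SIGHUP; caught:", survive_hangup())
```

Without the `signal.signal` call, the same `os.kill` would end the process, which is the default behaviour the comment refers to.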


We send SIGHUP btw. The kernel's own sending of SIGHUP is bound to the TTY concept btw, which is specific to TTY logins only, not graphical ones.

That said the question is not so much about who sends what, but more about whether a secure system should allow user code to escape lifecycle management or whether logging out means logging out and giving up all resources.


I get what you're saying. However, I'd probably apply the kernel rule of "when maintaining the kernel, do not do something which breaks user programs/applications". Yes, this isn't the kernel, but it's comparable in being a core function that heavily affects userland stuff.


Sometimes the old way of logging out is just insecure, and there is no way to conjure up a new way that is both backward compatible and secure. cgroups work well, especially because they are not opt-in. That means a daemonizing program either has to set itself up as a system service or start a new logind scope (or PAM session, etc.), which amounts to escaping the cgroup and requires user approval to remain secure.


I know right, I run openvpn as user nobody and I keep thinking that nobody user better stay logged in!


If you created a problem, it's your duty to provide a workaround or a solution to the problem. Why not provide systemd specific version of `nohup` for such cases and encourage users to use it instead of old and insecure version?


> more about whether a secure system should allow user code to escape lifecycle management

Please stop trotting that tired old line out. It is simply untrue. Systemd does the exact opposite of providing increased security. If nothing else the greatly increased surface area of systemd makes for a less secure system.

The pwnie articulates a number of other ways in which your code and your behavior are actively reducing the security of Linux.


This. There's a reason the de facto way to keep running post logout was named "nohup". This wasn't some deep dark unknown secret behaviour that was broken.


It was called that because connected pty devices could hang up. Whether hanging up due to intentional logout or actually hanging up the modem was, and is, left as an exercise to the user. Unless we try to disambiguate it via login/pty manager programs, that is.


Because 1) a maintainer can be overloaded, so (s)he will stick to the defaults, and 2) a maintainer needs a logical reason to change a default setting to something else, which is not obvious in most cases. A maintainer is not a QA team.


Look, it's everyone's responsibility, this doesn't just fall on Systemd. While it's clear that Systemd made some difficult changes to how user processes operate, it still performed the due diligence of providing the original behavior as configurations. They should reconfigure their tools. If they're not doing that, then it's not necessarily Systemd's fault that things don't work for sysadmins trying to use their tools.


Wait a minute. Why isn't it the distro's responsibility to choose the most compatible defaults?


Isn't it more efficient if 1 upstream picks the sane defaults rather than N distros? The situation was exactly the same when PulseAudio was introduced in Ubuntu. Audio broke for a huge number of users and according to upstream it was because they had configured it wrongly...

IMO, it is part and parcel of designing great software that you pick as universally agreeable defaults as possible.


It's the responsibility of both to pick sane defaults. When the software developer picks insane defaults they are being antisocial; those distro packagers are people too, and developers who pick insane defaults are causing unnecessary grief for packagers.


If you smell shit while walking down the street, maybe someone dropped a deuce on the sidewalk. If you smell shit everywhere you go, maybe it's you, maybe you shat your pants. When you violate the principle of least astonishment you're creating a huge stink.

That you can configure systemd to behave in a less obnoxious manner is well beside the point. Systemd should be unobtrusive and predictable without any extra action on the part of the distribution folks or end users.

That the suggestion is to simply read the code or documentation is the height of arrogance considering how sloppy and insecure the systemd code is (parse error equals root privileges? come on…).


Your argument assumes that systemd is simply meant to be an in-place compatible drop-in for what it replaces, which I don't think is something anyone would/should expect. If systemd was meant to behave the exact same way as the systems it is replacing then there wouldn't be much point to it. For those cases it sometimes will break things, and will sometimes have settings to follow previous behavior.


There's plenty of room within the POSIX specs to address service management without requiring kernel integration, breaking userland tools, etc. When your init replacement manages to interfere with the kernel you've done something very, very wrong.


Not sure if I missed something here but how has it interfered with the kernel? AFAIK it has broken some userland tools (which is bad in itself in most cases), but actually breaking kernelspace is not something I've heard of.


https://igurublog.wordpress.com/2014/04/03/tso-and-linus-and...

Yet just two days ago, we see Linus Torvalds (the creator of Linux and maintainer of the Linux kernel), launching into a tirade against – yes, you guessed it – systemd developers because of their atrocious response to a bug in systemd that is crashing the kernel and preventing it from being debugged. Linus is so upset with systemd developer Kay Sievers (gee, where I have heard that name before – oh, that’s right, he’s the moron who refused to fix udev problems) that Linus is threatening to refuse any further contributions from this Red Hat developer, not just because of this bug, but because of a pattern of this behavior – a problem for Kay because Red Hat is also foaming at the mouth to have their kernel-based, no doubt bug- and security-flaw-ridden D-Bus implementation included in our kernels. Other developers were so peeved that they suggested simply triggering a kernel panic and halting the system when systemd is so much as detected in use.

The key phrase there is:

a bug in systemd that is crashing the kernel and preventing it from being debugged

Honestly though when you get Linus flaming your behavior you're doing something really wrong.


_Honestly though when you get Linus flaming your behavior you're doing something really wrong._

Haven't been around here long, have you? :-)


Yeah I know Linus likes to go on a good tear. But I'm not talking about flaming your code or design decisions, but flaming your behavior.


Likewise, of course, or you'd know that the tirades were more often than not in response to things that were indeed "really wrong" (at least by his standards).


from 2014. I'm only pointing it out to make it clear that the post wasn't recent. Not questioning anything else about it.


Some distros focus on user convenience some on security. Different defaults are required.

And sometimes security requires breaking compatibility.


There's a bug here, which impacts end users: a variety of programs which are clearly intended to persist in the background (nohup, tmux, etc) are failing to persist. This is a real bug. We care about it. I won't be satisfied until it appears that the bug is on track to be fixed, and a lot of other people won't either.

The options for fixing the bug are:

* nohup, tmux, emacs, etc all take dependencies on systemd and use the new systemd daemonization procedure. This is not a viable path because the maintainers of those utilities have refused (see https://github.com/tmux/tmux/issues/428), and because there are too many of them.

* Each distro separately works around the problem by maintaining forks of nohup, tmux, etc. This is not a viable solution because it's way too many forks; people will be finding broken distro+utility pairs forever.

* Each distro separately works around the problem by putting loginctl enable-linger in /etc/profile and setting KillUserProcesses=no. This would effectively be overruling systemd's decision. Some distros won't know they need to do this, and the github systemd repo becomes a trap.

* Or: systemd backs down and changes the defaults so that the old daemonization APIs work again.

If you have a fifth option, we'd all love to hear it. But the status quo is that there's a user-facing bug, and the bug is still there. Rather than make the case for it not being a bug, you're currently making the case for it being someone else's bug, but the "someone else" doesn't actually have the power to fix it. You are the only one with the power to fix this bug.
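To make the `KillUserProcesses=no` workaround concrete, here is a rough Python sketch that reads the setting out of logind.conf-style text. It is an illustration only: the function name is made up, the built-in default is passed in by the caller because it varies by distro build, and real logind also merges drop-in files, which this sketch ignores.

```python
import configparser


def kill_user_processes(conf_text, compiled_default):
    """Return the effective KillUserProcesses value for one logind.conf text.

    compiled_default stands in for the compile-time default of the build
    in question; it is an input here, not something this sketch can detect.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # Lines starting with '#' are comments, so a commented-out setting
    # falls through to the compiled default.
    return parser.getboolean("Login", "KillUserProcesses", fallback=compiled_default)


if __name__ == "__main__":
    sample = "[Login]\nKillUserProcesses=no\n"
    print(kill_user_processes(sample, compiled_default=True))
```

A distro shipping the opt-out would amount to the `sample` text above; a distro that ships nothing inherits whatever the build default is.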


> If you have a fifth option, we'd all love to hear it.

Replace systemd with something else.


There's literally nothing wrong with OpenRC


Devuan


I don't understand the issue. systemd offers the option to override the default. It's literally a config. If it's such a big deal, why don't the distros just override it? It's a one-time change.


> And yes, I still consider it a weakness of UNIX that "logout" doesn't really mean "logout", but just "maybe, please, if you'd be so kind, i'd like to exit, but not quite". I mean, that's not how you build a secure system.

As an aside, it is the height of arrogance to suggest that systemd is somehow a more secure alternative. Lest this be considered an empty ad hominem attack, let me quote the pwnie you won in 2017[1]:

> Where you are dereferencing null pointers, or writing out of bounds, or not supporting fully qualified domain names, or giving root privileges to any user whose name begins with a number, there's no chance that the CVE number will referenced in either the change log or the commit message. But CVEs aren't really our currency any more, and only the lamest of vendors gets a Pwnie!

1: https://pwnies.com/archive/2017/winners/#lamestvendor


> giving root privileges to any user whose name begins with a number

https://github.com/systemd/systemd/issues/6237

oh my god, what a spectacular issue. And, seriously, Poettering's response is basically "not my job" and "not a bug". And this person develops something that sits at the core of a modern linux system...


> oh my god, what a spectacular issue. And, seriously, Poettering's response is basically "not my job" and "not a bug". And this person develops something that sits at the core of a modern linux system...

All the while Lennart claims that he's making Linux more secure. FFS.

Edit: I forgot about this

https://igurublog.wordpress.com/2014/04/03/tso-and-linus-and...

> He (Theodore Ts’o) goes on to describe how he previously had to neuter policykit’s security (rendering his system very vulnerable) just to get his system working, and how he has found systemd "very difficult sometimes to figure out".

And:

> As for Kay Sievers, maybe he should rename himself to Kay Sewers, because that’s exactly what he smells of. He told to IETF internet area director and previously DHCP working group co-chair “Tod Lemon” to lmgtfy when he asked about a systemd related git repository.

This gem sums it up perfectly though:

> Yet just two days ago, we see Linus Torvalds (the creator of Linux and maintainer of the Linux kernel), launching into a tirade against – yes, you guessed it – systemd developers because of their atrocious response to a bug in systemd that is crashing the kernel and preventing it from being debugged. Linus is so upset with systemd developer Kay Sievers (gee, where I have heard that name before – oh, that’s right, he’s the moron who refused to fix udev problems) that Linus is threatening to refuse any further contributions from this Red Hat developer, not just because of this bug, but because of a pattern of this behavior – a problem for Kay because Red Hat is also foaming at the mouth to have their kernel-based, no doubt bug- and security-flaw-ridden D-Bus implementation included in our kernels. Other developers were so peeved that they suggested simply triggering a kernel panic and halting the system when systemd is so much as detected in use.


Only the root user can put such an invalid unit file into a directory where systemd will read it - what is the security impact exactly?


The security impact is that if you allow a user to choose their own username, and you use a standard POSIX specified way of verifying that the username is valid, and at any point in time you run a service as that user, an attacker can gain root privileges.
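The failure mode described here can be sketched in a few lines of Python. This is not systemd's code: the validity rule below is a stand-in chosen for the example, and the function names are hypothetical. The point is only the difference between dropping an unparseable user name (so the default, root, applies) and refusing to start at all.

```python
import re

# Illustrative rule for this sketch only: names may not start with a digit.
STRICT_NAME = re.compile(r"^[a-z_][a-z0-9_-]*$")


def run_as_fail_open(configured_user):
    """Model the behaviour reported in the thread: a rejected name is
    ignored, so the service falls back to the default user, root."""
    if STRICT_NAME.match(configured_user):
        return configured_user
    return "root"


def run_as_fail_closed(configured_user):
    """Alternative policy: a rejected name is an error and nothing runs."""
    if not STRICT_NAME.match(configured_user):
        raise ValueError(f"refusing to start: invalid user name {configured_user!r}")
    return configured_user


if __name__ == "__main__":
    print(run_as_fail_open("0day"))      # the digit-leading name ends up as root
    print(run_as_fail_closed("www-data"))
```

Under the fail-open policy, a user name that the rest of the system accepted silently turns into a root service; under the fail-closed policy, the same input produces a visible error instead.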


Or if you have a package that generates a service user that starts with a digit. Then you'll be running an arbitrary service as root in which case any vulnerabilities become that much more serious. Or have things regressed so much with systemd that the standard is now verify each and every thing you have the init system do?

The other problem is, of course, the utter lack of understanding Lennart demonstrates by being so dismissive and the increased potential for systemd to be hiding future security vulns.


You know it's open source and that you could actually get involved? If you submit a pull request and it doesn't get merged you can take your concerns to the larger group.

As to the stuff mentioned in the pwnie. Those sound like great contributions that would be appreciated.

You could also take your concerns to the distro development group. If that doesn't work you could also customize your distro with a custom build of systemd.

If you still don't get satisfaction you can stop using it.

If you dislike how they do things you have options. Or, you could just be mean on a forum...


For what it's worth, systemd makes my life easier.

When I switch distro, it's almost always systemd, and not the system du jour, so I know how it works. Creating service files is a Google query away and makes common use cases a breeze, while advanced features that were hard to script yourself in bash are now just a few options to type.

I understand that many people may have problems with systemd for their particular situation, but that's not my experience.

As a dumb user with a few laptops and servers that need an occasional daemon, I'm glad systemd won. I know you get a lot of heat since it came out, so thank you for working on it.


Sure, systemd solves a number of real problems. This is good.

What is not as good: (1) systemd takes over or duplicates functionality not related directly to its primary purpose, and (2) is not solid enough to trust it in a number of cases, while (3) the developers' attitude does not give a lot of hope that the situation will materially improve.

(Of course, I run a distro without systemd.)


> I still consider it a weakness of UNIX that "logout" doesn't really mean "logout"

Ok, but UNIX and its behaviour have evolved over forty years, and users have a certain set of expectations about it.

Also, it should be noted, systems like UNIX are cultural artifacts. The way they are is the result of forty years of back and forth debate and negotiation and eventually compromise.

I can't speak for all of them, but I think that people that are bothered by systemd are upset that all of history has been brushed aside to make place for the preferences of just a few influential developers.

Whether a feature like logout is "logical" or not is beside the point. Operating system design isn't just about logic, it's about serving users.


Yes, indeed, it's not about logic, as those same users cheer Linux instead of sticking with BSD, and then complain about not being UNIX enough.


That was the point of OP's article. That it's hard to change.


Completely agree. The problem is not upstream, but downstream. Distros should have done better job and chosen a better default system manager and not systemd.

You build your software the way you want and like. If others don’t like that it breaks POSIX they should stop using it instead of complaining. Or fork it.


> What was requested though is to either run things as child of systemd --user or just register a separate PAM session

When you run your screen or tmux below `systemd --user`, you still would have to `loginctl enable-linger`, no? I remember having to do that when I set up a PulseAudio server on a headless machine where I don't maintain an active session.
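For anyone checking the lingering question on their own machine, a hedged Python sketch follows. It assumes `loginctl` is on PATH and that `loginctl show-user USER --property=Linger` prints a `Linger=yes` or `Linger=no` line; the helper names are invented for this example.

```python
import shutil
import subprocess


def enable_linger_command(user):
    """Command line that asks logind to keep the user's manager around after logout."""
    return ["loginctl", "enable-linger", user]


def parse_linger(show_user_output):
    """Return True/False from KEY=VALUE text, or None if no Linger line is present."""
    for line in show_user_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "Linger":
            return value.strip() == "yes"
    return None


def linger_enabled(user):
    """Query loginctl for the user's linger state; None if it cannot be determined."""
    if shutil.which("loginctl") is None:
        return None
    result = subprocess.run(
        ["loginctl", "show-user", user, "--property=Linger"],
        capture_output=True, text=True, check=False,
    )
    return parse_linger(result.stdout) if result.returncode == 0 else None


if __name__ == "__main__":
    print(" ".join(enable_linger_command("alice")))  # "alice" is a placeholder user
```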


Lennart, thanks for the information. Mind explaining why you chose to kill user processes on logout as the default?


I think my comment above explained that already.


I think tasuki is asking you to elaborate a bit further on what kind of security issues you have solved by not using the SIGHUP signal. I would personally also like to hear more in-depth details, preferably with some examples of security vulnerabilities that were caused by that POSIX design choice.


Well, this boils down to: in a modern operating system, is it good design that an unprivileged user who logs in once can consume arbitrary runtime resources uncontrolled, unbounded forever, even after logout just because they decided to mask SIGHUP? I think not, I think the system should default to behaviour where unprivileged processes are clearly lifecycle bound, and when the user's sessions end they end comprehensively. I mean, other OSes don't really allow this unprivileged either, for good reasons: the lifecycle of the unpriv user's processes should be controlled by privileged code, and clearly be defined by the act of logging in and logging out in its lifetime.

It's entirely OK if the admin then opts out specific users or even all users from this behaviour, i.e. if a privileged player decides to liberalize unbounded, unlifecycled resource consumption for unprivileged players. But a default where unprivileged code can just stick around uncontrolled and consume as much as it wants forever is just a strange choice security-wise.

i.e. I think the fact that SIGHUP masking is unrestricted, i.e. is not subject to privilege checks is the problem really. Something is unpriv by default that should be priv by default. And that's pretty much what this option in systemd provides you with.
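What "masking SIGHUP" means mechanically can be shown with a short Python sketch, in the spirit of what nohup does: the child's SIGHUP disposition is set to "ignore" before it starts, with no privilege check involved. The function name `survives_hangup` is made up, and the one-second wait is an arbitrary choice for the demo. It shows only the traditional signal behaviour under discussion, not what logind does on top of it.

```python
import signal
import subprocess
import sys


def survives_hangup(ignore_sighup):
    """Start a sleeping child, send it SIGHUP, and report whether it is still alive."""

    def prepare_child():
        # Runs in the child just before exec; an ignored signal stays ignored
        # across exec, which is the trick nohup-style tools rely on.
        disposition = signal.SIG_IGN if ignore_sighup else signal.SIG_DFL
        signal.signal(signal.SIGHUP, disposition)

    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        preexec_fn=prepare_child,
    )
    try:
        child.send_signal(signal.SIGHUP)
        try:
            child.wait(timeout=1.0)
            return False  # the child exited: default action terminated it
        except subprocess.TimeoutExpired:
            return True   # still running: the hangup was ignored
    finally:
        if child.poll() is None:
            child.kill()  # clean up the survivor
        child.wait()


if __name__ == "__main__":
    print("default disposition survives:", survives_hangup(False))
    print("ignored SIGHUP survives:", survives_hangup(True))
```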


> Well, this boils down to: in a modern operating system, is it good design that an unprivileged user who logs in once can consume arbitrary runtime resources uncontrolled, unbounded forever, even after logout just because they decided to mask SIGHUP?

This was well known and accounted for where necessary. You considered everyone else to be wrong about the issue and went ahead and fixed it according to your opinion. Don't be surprised that a considerable portion of "everyone" doesn't agree with you.


> This was well known and accounted for where necessary.

Could you please explain that in a bit more detail?


> is it good design that an unprivileged user who logs in once can consume arbitrary runtime resources uncontrolled, unbounded forever

An unprivileged user can still do this by setting up an intermediary box that keeps a persistent ssh session open. Incidentally, this is exactly what I plan to do if I ever need to ssh into a server with KillUserProcesses=yes.

> other OSes don't really allow this unprivileged either

On Windows, if I remote desktop from a laptop into a desktop, and start a web server, then shut down the laptop, the server stays running. On iOS if I start drafting an email, and reboot my phone, I don't lose my work. On ChromeOS, my tabs will stick around after a system crash. The world is moving toward processes being _more_ persistent, not less.


Windows has a different concept for services and processes. All of your processes are killed when you log out.


If you already have a middle box, then great, but usually malware (eg a nasty Chrome extension) likes to stick around to snoop on user activity. (Preferably on all user activity, forever.)


> If you already have a middle box, then great, but usually malware (eg a nasty Chrome extension) likes to stick around to snoop on user activity. (Preferably on all user activity, forever.)

Well I'm certainly seeing why people get so frustrated with systemd junkies. Killing a "rogue" Chrome extension doesn't provide any meaningful form of security. There's no privilege escalation in play here. Whatever snooping it could do with you logged out could be done when you're logged in. Snooping on all users? Yeah, not going to happen without privilege escalation (which systemd will happily provide). So while systemd introduced this obnoxious behavior that broke all sorts of commonly used utilities no benefit was gained (except perhaps reinventing the wheel).

Meanwhile if you're worried about security don't forget that systemd has introduced a number of denial-of-service vectors (including one that results in a kernel panic) as well as an actual privilege escalation bug (which, in a fit of irony, could've been mitigated significantly by respecting return value tradition of zero = success). Take a look at the privilege escalation bug remedy, the vuln was due entirely to breathtakingly sloppy code. I'm ignoring the whole dereferencing unchecked pointers thing because that's such laughably bad practice I don't even know where to begin. Then take a look at Lennart's response and his unwillingness to mention CVEs anywhere.

The end result is that you have a combination of: breaking changes offering zero benefit, sloppy code resulting in reduced security, and a complete absence of any sort of security culture. Lennart, IBM, and systemd can claim all sorts of things (perhaps there really is a value in moving away from shell scripts) but security? No. There is absolutely ZERO merit to any claim that systemd increases security. The lack of security culture and defensive coding that permeates systemd all but guarantee future vulnerabilities.

Edit:

But wait! There's more!

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-9445

Systemd is also remotely exploitable. Sure, no program is perfect, but most programs strive to decrease the attack surface where systemd strives to increase it.


> Well, this boils down to: in a modern operating system, is it good design that an unprivileged user who logs in once can consume arbitrary runtime resources uncontrolled, unbounded forever, even after logout just because they decided to mask SIGHUP? I think not, I think the system should default to behaviour where unprivileged processes are clearly lifecycle bound

If there were some way to design this so that nohup would give a permission denied error on start and tmux would give one on detach, rather than die on logout when it's too late to display a warning, that would be a lot better. There may not be a feasible way to do this, but it would solve a key part of this problem, which is that people don't find out about this behavior until something has already gone wrong, and don't find out that systemd is responsible for the behavior until after they've gotten frustrated enough to be mad about it.


From a hosting perspective I understand the issue being addressed but I don't see specific problems being solved. For example, if I lock out a compromised account by locking the unix user, they can still be currently logged in with running processes which I then need to manually address and kill, and they can also have cron jobs which restart them. Services like Apache (with mpm_itk) will still change user-id to those locked users. There is no general system-wide method to declare that a user and all its connected aspects should stop being available, and therefore a compromised account must currently be handled rather individually.

What I see most companies in this industry do is use per-user virtual machines to address the issue, which completely bypasses the question about logged in and logged out. It would be interesting if the intention in current development is to give us administrators more options here and allow for cleaner handling of compromised accounts.


But the point is that Linux and Unix aren't modern operating systems. They're ancient, and built upon decades and decades of work by hundreds of thousands of developers. You can't just decide to break norms handed down through the decades.

And I don't think anyone really had a problem with the old default of letting things run. Worse is better, after all. Pursuit of a perfect system will just make things too complicated, too brittle, and too obtuse.

I like Unix because it doesn't try to solve every problem. It's a libertarian operating system, if you will. Sometimes this causes problems, sure, but if the system is simple and liberal, you can always fix them without much effort.


They show that you do not understand what "log out" is for in Unix.


> still consider it a weakness of UNIX that "logout" doesn't really mean "logout" ... I mean, that's not how you build a secure system

so, unix has been running for 20+ years laden with this security flaw? strange that nobody has been screaming out to plug it all this time.

this feels like you have a bee in your bonnet that it is not a very 'pure' logout by some interpretation of what a "logout" should be. imho, "logout" should mean what it has always meant in the past.


I feel like the kernel policy of "don't break userspace" would be a valuable one for y'all to adopt.


So you made the default the worst possible option, because... why exactly? And now that the problem is apparent, you haven't changed the default because...? I don't know what goes through your and the rest of the systemd's team's heads, but good software engineering it is not.


>You know, because we knew this would be controversial we made sure it was both a compile-time option and a runtime option.

This is standard from you. You knock the glass on the floor and blame the maid service for not cleaning up after you.

It's everyone's fault but yours.

>And yes, I still consider it a weakness of UNIX that "logout" doesn't really mean "logout", but just "maybe, please, if you'd be so kind, i'd like to exit, but not quite".

Oh how hyperbolic. Nuances and caveats in terminology are not a weakness.

I don't see why you're splitting hairs over this but can't be bothered to care about your UID numbering bug.

Or the fact that systemd-resolved is responsible for DNS leaking on VPNs.

But yes, tell me more about how functionality that enables terminal multiplexers is a "weakness".

>Now I am used to taking blame for apparently everything that ever went wrong on Linux,

It's because of your smarmy arrogance.

You break POSIX compliance, which has a real world effect in multiple areas and you accept bug reports with the humility of Donald Trump being interviewed by MSNBC.

Then when you retreat into your safe space, you play victim to the situation you created.

You talk of Linux culture toxicity, smearing the likes of Linus Torvalds, while essentially being the metaphorical sibling putting your finger in people's face repeating "I'm not touching you" over and over. Then you act attacked when someone claps back.

You're a cry bully hiding behind a veneer of professionalism acceptable to Red Hat's HR department, which enables you to mark one more bug as "wontfix"; your attitude, your arrogance, your conceit that things which aren't broken in fact are, so you can provide solutions no one asked for and no one benefits from.


To be fair, at least poettering presented an argument and is responsible for software that helps a whole bunch of us get things done.

You're just kind of yelling, and it diminishes any point you may have made.


After a while, anyone who deals with Lennart just starts yelling, because he is impossible to reason with. He's very intelligent, and absolutely convinced that his is the One True Correct Right Way. It doesn't matter that hundreds or thousands of voices oppose him; I don't think it would matter if every single human being on earth opposed him.

What makes it worse is that he's often not completely wrong. Linux did need something like PulseAudio, something like Avahi and something like systemd. But his reach exceeds his grasp (which probably applies to us all, as I've found on my own projects), which leads to the well-known problems of PulseAudio & systemd.

I don't actually want him to quit the Linux world. But I wish he would scale back his ambitions just a tad, and consider that maybe — just maybe — other people have some good points, and valid concerns.

And also Windows/DOS are not terribly good design exemplars.


I get what you are saying, but

> It doesn't matter that hundreds or thousands of voices oppose him; I don't think it would matter if every single human being on earth opposed him.

makes it seem like everyone that uses systemd hates it or sees the same flaws as you or the other people yelling.

I and many others started admin'ing during or slightly before the systemd transition (ubuntu14->16 and rhel6->7) and have found it a much easier path to running services in a sane way than before. It was certainly possible before it, but with systemd I can do it a lot better and easier than I would have been able with previous inits.

For every person saying that systemd made things worse I expect there to be 10 silent sysadmins that appreciate what it did. I have no evidence of that, but that is my experience.


It does a lot more than manage services.

It breaks screen and tmux functionality, leaks DNS when connected to a VPN, and is riddled with "wontfix" security vulnerabilities stemming from a refusal to be POSIX compliant.

Systemd replaced udev for crying out loud.


That might be true and still not contradict what I said. A lot of the systemd critics still seem to not see what it actually did for most people using it. You're free to hate it and some of that is certainly justified, but don't assume that the contrary opinion is based on uneducated or misguided opinions.

Most of what I see/use of systemd I like. Some of it I don't, and some of it is a dumpsterfire. I think I could say the same or worse for any ambitious software project.

As for the security issues I certainly place those in the dumpsterfire category and I'd like for the systemd team to handle them better.


You know what? Systemd generally works for me. Sure there's teeth gnashing at having all my userland tools upended. I've frustration at the unit file specs. But it mostly works.

That, however, does not mean that systemd is anything other than a giant fucking dumpster fire. Looking at how Lennart interacts with other Linux devs, how he reacts to bug and security reports, looking at the lack of code review and the shoddy design decisions that get baked into systemd… it appears as if systemd mostly works through sheer luck. That sort of approach may be acceptable when you're talking GNU vs X emacs, but it's absolutely the wrong approach to such a critical piece of software.

The other thing I'm missing is any improvement. All of this upheaval has been for what? Assuaging Lennart's ego? Not good enough.

> You're free to hate it and some of that is certainly justified, but don't assume that the contrary opinion is based on uneducated or misguided opinions.

When the article being discussed consistently wrongly characterizes and dismisses technical arguments against systemd I think it's fair to say it's a bit more than misguided.

> As for the security issues I certainly place those in the dumpsterfire category and I'd like for the systemd team to handle them better.

Yeah, no. Security as an afterthought is a bad approach in general but it's even worse when you're talking about low level bits like PID 1, the kernel, boot loader, etc. This right here is enough reason to run, screaming far far away from systemd.

You know the best part though? I've had plenty of frustration with upstart (especially with features they've decided to remove over the years). None of this compares to the heavy handed, anti-social bullshit that seems to engulf systemd. Hell, I recently bought a replacement laptop. I even entertained the idea of a Linux machine. Systemd and its effect on Linux on the desktop was one of the top reasons I went with another MacBook Pro.


I agree. I love systemd as compared to the other ways (though I think launchd is pretty nice too).


You've done great work as a whole, as you probably know. Try not to let the lowlifes get to you.


Absolutely. I can understand implementing this feature for some special cases, like containers that should clear all hint of a user away on log off. It should never have been the default, and breaks an entire category of software. In my standard .bashrc file, I have the following snippet to warn me if I am on a system with that stupid setting enabled.

    # Warn at login if logind is configured to kill leftover user processes.
    if command -v loginctl > /dev/null && loginctl > /dev/null 2>&1; then
        # With no user argument, show-user prints the logind manager's properties.
        if loginctl show-user | grep KillUserProcesses | grep -q yes; then
            echo "systemd is set to kill user processes on logoff"
            echo "This will break screen, tmux, emacs --daemon, nohup, etc"
            echo "Tell the sysadmin to set KillUserProcesses=no in /etc/systemd/logind.conf"
        fi
    fi


Thanks, now I know why Emacs daemon keeps delaying my restarts in the system (just discovered that NixOS defaults KillUserProcesses to false).

I'm turning this on (setting it to true): for me it does not make sense for a user service (yeah, I run emacs as a systemd user service) to keep running after I log out of my system.

P.S.: And the fact that for some people this behavior makes sense is why I think Lennart's decision to make this an option makes sense.


I'm glad that it helped resolve your issue, though I still don't think it was an appropriate choice for a default. I tend to do most of my work on a remote server, using tmux and emacs daemon to pick up right where I left off in the case of a dropped connection. That systemd would terminate my process when I explicitly requested it not to be is very abnormal.


You haven't requested that from systemd; you started it in a user scope, and haven't started a service for what you need.

POSIX is nice, but rather lacking in certain aspects, such as security and administration-friendliness. cgroups help with both, but people have to understand them and use them well.


Handling and ignoring SIGHUP is the explicit way to indicate that a program should not be terminated. That systemd invented a new category and then ex post facto declared that everybody else was wrong for not using it is ridiculous. Systemd changing behavior such that I must "Simon says nohup" is completely asinine.


Systemd developers, if you're reading this: this isn't the sort of bug where people grumble for a while and then get over it, because things are still broken, and the workaround being circulated (KillUserProcesses=no) doesn't fully work. (https://github.com/systemd/systemd/issues/8486) As long as people continue to encounter this issue anew--and they still are--people will be angry at the systemd maintainers.


The bug you've linked to was closed[1] by the reporter with "Thanks for the clarification guys. Much appreciated!", after it was pointed out to them that something they were trying to do with "KillUserProcesses=no" was better done in another way.

1. Edit: Not literally closed by the reporter. Lennart Poettering closed it; I meant "closed by the reporter" as in "the issue was resolved to the reporter's satisfaction".


> The bug you've linked to was closed by the reporter

Are we reading the same bug report? The one I'm looking at was closed by the creator of Systemd.


That comes down entirely to how systemd is configured. If you don't like what your chosen distro has picked as the default then complain to them. systemd didn't force anyone's hand on the subject, they just added the feature. It's a pretty natural design choice IMHO. When I want to log out, I don't want to let some hung up daemon keep running just because it wasn't able to process the SIGHUP sent to it.

How else do you propose to make sure that when I log off my ssh-agent is really terminated and not just locked up with my keys still in memory? The POSIX approach is insufficient, there's no way to know if a process received a signal and chose to ignore it and keep running or if it received a signal but it was deadlocked and kept running.


The problem is that you're breaking compatibility by changing the default. It's one thing to add a feature that can solve a problem. It's something else to break existing programs that don't use it.

If you're not going to evaluate each individual program to determine whether the new behavior is appropriate then it should be opt-in rather than opt-out. Then ssh-agent and anything else that knows it should be forcefully killed can opt-in without breaking other innocent programs.


So you think backwards compatibility is so important that we should keep old BROKEN and INSECURE behavior just for the sake of not inconveniencing a few power users with the technical knowledge to override it? Instead, the few who complain loudest should be catered to and regular users left to the wolves…

I think some people sometimes lack any perspective on the topic.


Yes.

I’m not being emotional about it, just irritated.

Systemd has tangibly caused me to lose work with tmux; I appreciate there are root causes for this, but frankly, if some piece of someone’s code does that, for whatever reason that is beyond my control to immediately stop using it...

...it feels justified to be annoyed.

How do you suggest an alternative meaningful response would look?

Create my own distribution?

What tangible and meaningful alternatives do I have other than encouraging people not to use systemd?


> Create my own distribution?

> What tangible and meaningful alternatives do I have other than encouraging people not to use systemd?

Sure, if you think you can actually “test every single program and make everything opt-in.” I think you will however find that making everyone happy and having new features are just simply contradictory by the very definition. At some point you will want new stuff and you’ll have to break something.

The best you could do is adopt BSD’s model and fork tmux and other userland and ship outdated/patched versions. It’s a ton of work, of course.

I am not actually seriously suggesting you create your own distro, after all you can probably just fix the annoying issue with systemd and move on with your life, and Systemd actually makes it easy for you by making it a configuration switch and supporting the non-default workflow.

I am simply suggesting you put yourself in the position of someone that has to make those decisions and really think about it from that perspective. Everything’s always a trade off.


> I am not actually seriously suggesting you create your own distro, after all you can probably just fix the annoying issue with systemd and move on with your life, and Systemd actually makes it easy for you by making it a configuration switch and supporting the non-default workflow.

Given the extraordinary scope of systemd, what happens with the next major issue? Having to perpetually work around poorly designed software is infuriating.

> I am simply suggesting you put yourself in the position of someone that has to make those decisions and really think about it from that perspective. Everything’s always a trade off.

Why should the onus be on the end user? Perhaps the distributions should be making choices that are less antagonistic of their users (e.g. upstart instead of systemd).

You're right about the tradeoffs though, and one of the tradeoffs for buying into systemd is angry users.


> Given the extraordinary scope of systemd, what happens with the next major issue? Having to perpetually work around poorly designed software is infuriating.

Systemd doesn’t break stuff just because they feel like it. Everything is compatible if it can be, for example you can still run /etc/init.d scripts and manage them through systemd on Debian. Lingering processes are also still supported! It’s a configuration switch that most distros decided to turn on by default, because...

> Why should the onus be on the end user? Perhaps the distributions should be making choices that are less antagonistic of their users (e.g. upstart instead of systemd).

... it’s a net benefit to most users. It’s only “antagonistic” to a particular subset of powerusers perfectly capable of working around the issue but somehow more motivated to loudly complain about it on the Internet.

> You're right about the tradeoffs though, and one of the tradeoffs for buying into systemd is angry users.

Fair deal if it helps with even 0.1% desktop market share.


> particular subset of powerusers perfectly capable of working around the issue

What is the actual workaround? Is there a patch that unbreaks nohup by passing cwd and env to systemd-run --user or something?



I see arguing but no consensus on what ought to be done.

My use case: I run a shell pipeline that will probably take all weekend to finish. On a POSIX box I start it with nohup. What do I do on a systemd box? Does nohup need a patch that doesn't exist yet?


There's a couple ways to work around the issue, you can just configure systemd to not kill processes that were in the user scope when the user scope is closed in which case it behaves exactly as it did before. Or if you want to keep systemd cleaning up hung applications but not e.g. some script that you typically ran with nohup you can just use systemd-run instead.

https://www.freedesktop.org/software/systemd/man/systemd-run...

In particular you'd probably want --user so that it runs it under your user instance of systemd and --scope so that it's all run under a scope for that command instead of just a transient service. For most uses of nohup you could literally just make it an alias for systemd-run --user --scope instead.
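A rough sketch of that alias idea, written as a shell function so it can fall back to plain nohup on a box with no reachable user instance of systemd. The name `run_detached` is made up for illustration; the only systemd invocation used is the `systemd-run --user --scope` described above. As far as I understand the systemd-run man page, the user instance itself only outlives your last session if lingering is enabled for your account (`loginctl enable-linger`), so treat this as a starting point and not a guarantee.

```shell
# Hypothetical helper: run a command outside the login session's scope when a
# systemd user instance is reachable, otherwise fall back to plain nohup.
run_detached() {
    if command -v systemd-run >/dev/null 2>&1 &&
       systemctl --user show-environment >/dev/null 2>&1; then
        # --user:  talk to the per-user systemd instance (no root needed)
        # --scope: run the command in its own transient scope unit
        systemd-run --user --scope "$@"
    else
        nohup "$@"
    fi
}

# Usage: start the weekend-long job in the background.
# run_detached ./long_pipeline.sh > pipeline.log 2>&1 &
```

Since `--scope` runs the command directly from your shell, the working directory and environment are inherited the same way they would be with nohup.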


I expect that the formal answer is that you should be running that within the service framework (be it systemd or other). My answer is: if you want POSIX-like behavior don't run it on Linux.


>I think some people sometimes lack any perspective on the topic.

Apparently you think Linus is one of those who "lack perspective"?

http://lkml.iu.edu/hypermail/linux/kernel/1711.2/01701.html

I get that systemd isn't the kernel, but it's close enough. There are many who would agree that breaking existing behavior in the name of security isn't wise. I have also not yet seen anyone point out specific security issues this solved. Unix has worked this way for a long time.


User launches voice chat, logs out, application stays around and listens in on the user/other users. Just one example. Having programs running despite being logged out is unintuitive and wrong. Most users do not know or care about going into a task manager. And if you want Linux to ever have a chance to succeed on desktop, they shouldn’t have to.

As to Linus’ post, if you want to argue that there wasn’t enough notice about this change, then that’s fine, but this isn’t what anyone here is arguing.

Also it’s a configuration switch, any distribution could have decided to revert it or postpone it at their choosing.


SIGHUP isn’t broken & insecure: it works, and it is secure. Processes which don’t want to handle the hangup signal are terminated, and processes which want to ignore it do.
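For anyone who wants to see the two cases side by side, here is a small shell sketch (not taken from any of the projects discussed; the variable names are arbitrary). One `sleep` keeps the default SIGHUP disposition, the other is started with SIGHUP ignored, which is essentially what nohup(1) arranges before exec'ing the command.

```shell
# Case 1: default disposition. SIGHUP terminates the process.
sleep 30 &
pid_default=$!

# Case 2: SIGHUP ignored before exec. An ignored signal stays ignored across
# exec, which is the mechanism nohup(1) relies on.
( trap '' HUP; exec sleep 30 ) &
pid_ignore=$!

sleep 1                      # give the subshell time to install the trap
kill -HUP "$pid_default" "$pid_ignore"

status_default=0
wait "$pid_default" || status_default=$?   # >128 means "killed by a signal"

if kill -0 "$pid_ignore" 2>/dev/null; then
    survivor=yes             # still running after the hangup
else
    survivor=no
fi
echo "default: exit status $status_default, ignoring process alive: $survivor"

kill -TERM "$pid_ignore"     # clean up the survivor
```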


But this just isn't the case. If something stays around after receiving SIGHUP, it was probably because that application intended to do so but it could also just be a hung up application that one way or another is going to stay around until it's killed. Sending a signal doesn't give you any sort of feedback to see if you're waiting for the application to close or if the application shouldn't be closed. Signals alone are insufficient.


Tell me more about this perfect world with no bugs and no nondeterministic behavior.


Well, there are some pretty severe restrictions on the type of code you can put into signal handlers. Only atomic operations are allowed. And, in my experience, almost all applications react appropriately to signals.


>Well, there are some pretty severe restrictions on the type of code you can put into signal handlers.

Err... Maybe I'm missing something but I don't believe that's the case. There's a lot of things that you shouldn't do inside of a signal handler that will exhibit undefined behavior, but it's not like the kernel puts any restrictions on what the application can do inside of a signal handler. If an application wants to make SIGHUP just call whatever existing application exit logic they already have, they can. It's a terrible idea because if the application was signalled in the middle of some library call then it's anyone's guess as to whether or not it's just going to crash but that doesn't mean that you can't do it.

I think you're underestimating the difficulty of gracefully shutting down an application in a signal handler. If it's waiting for the application to finish some operation it's stuck in it'll just do the exact same thing as using nohup and there's no way to know that outside of the application.


If an application is handling SIGHUP then it presumably intends to continue running. If it used systemd-run instead, it could still get into a bad state at any point thereafter and you have the same problem. Even using a watchdog couldn't fix every buggy application, because there are ways for an application to crash or misbehave yet continue to send the watchdog notification. We still haven't solved the halting problem.

Meanwhile if the process isn't handling SIGHUP then there is little chance of undefined behavior in the default handler, which merely terminates the process immediately.


>If an application is handling SIGHUP then it presumably intends to continue running.

That's not correct, for stuff running in the user's scope more often than not a SIGHUP handler is just to gracefully exit the application. I.E. close any open files, finish any writes in process, etc.

But also, you don't know what the SIGHUP handler does to begin with. That's the crux of the problem. Outside of the process the SIGHUP handler is just a black box.
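A footnote on the black box: on Linux you can at least see from outside whether a process ignores or catches SIGHUP, via the SigIgn and SigCgt bitmasks in /proc/<pid>/status. What a handler does once the signal is caught stays invisible, which is the point here. A rough sketch, assuming a procfs mounted at /proc (the function name `hup_disposition` is made up):

```shell
# Report how a process treats SIGHUP, based on /proc/<pid>/status (Linux only).
# SigIgn and SigCgt are hex bitmasks; SIGHUP is signal 1, i.e. the lowest bit,
# so it is set exactly when the last hex digit is odd.
hup_disposition() {
    status_file="/proc/$1/status"
    ign=$(awk '/^SigIgn:/ { print $2 }' "$status_file")
    cgt=$(awk '/^SigCgt:/ { print $2 }' "$status_file")
    case $ign in
        *[13579bdfBDF]) echo ignored; return ;;
    esac
    case $cgt in
        *[13579bdfBDF]) echo caught ;;
        *)              echo default ;;
    esac
}

# Usage: hup_disposition "$(pgrep -n tmux)"
```

"caught" only tells you a handler is installed; it could exit gracefully, keep running on purpose, or hang.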

>If it used systemd-run instead, it could still get into a bad state at any point thereafter and you have the same problem.

No, if it was started with systemd-run there's no SIGHUP sent to it in the first place. Reaping applications that won't close in the user scope isn't about preventing them from breaking in the first place, it's just sweeping up the broken pieces so that it doesn't break the next user scope because it's still holding some exclusive lock on something.

It's like putting the user session into its own container. It doesn't fix anything, it just keeps the breakage contained to the user's scope so that when you log out, it really does shut down that "container".


> That's not correct, for stuff running in the user's scope more often than not a SIGHUP handler is just to gracefully exit the application. I.E. close any open files, finish any writes in process, etc.

That's essentially the same thing, and the application would have to do something similar to protect itself.

Suppose the user would lose data if the application doesn't exit gracefully, but this may take a variable amount of time depending on how much unsaved data there is, current load on the machine, etc. So it handles SIGHUP, continues running to save its state, but hasn't finished before systemd kills it.

To prevent this it would have to use systemd-run to preserve itself long enough to finish saving its state, and we're back to square one again. Or it doesn't do that and the user loses data.
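To make the scenario concrete, here is a sketch of such an application as a shell function (the name `worker` and the state file are invented for illustration). It catches SIGHUP, writes its state, and only then exits; the window between the signal arriving and the `exit` is where a forced kill would lose data.

```shell
# Hypothetical worker: on SIGHUP, flush state to a file and exit cleanly.
worker() {
    state_file=$1
    trap 'echo "saved on hangup" > "$state_file"; exit 0' HUP
    while :; do
        # Sleep in the background and wait on it: the shell runs a trap only
        # between commands, and `wait` is interrupted by a trapped signal.
        sleep 1 &
        wait $!
    done
}

# Usage:
# worker /tmp/worker.state &
# kill -HUP $!    # the worker saves its state, then exits with status 0
```

Here the save is a single `echo`; in a real application it could take arbitrarily long, which is exactly the problem described above.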


When they work, sure. And when they don’t, the user is wondering why their laptop is playing sounds when they’re logged out. Systemd’s solution is the right one from a technical POV. No need to hope applications cooperate when you can just ask the kernel to make sure they do.


What on earth is broken or insecure about not killing processes?


You watch porn, log out, but mpv is somehow stuck and still playing. Broken enough?


This, right here is an example of what those who oppose systemd mean when we say that it's monolithic.

What gives the init system the right or the duty to reach down into a user's processes and determine[0] that they are stuck (versus running appropriately, as e.g. the user indicated with nohup(1))? Why is it the init system's job to handle that?

That's just not its job. If I wanted to run some sort of misbehaved-process killer, I could. Or, y'know, not running misbehaving processes. Ideally, that would include not running misbehaving processes like anything from the systemd project.

0: or, as in systemd's case, blindly assume


KillUserProcesses is enforced not by systemd (PID 1) but by systemd-logind.


> What gives the init system the right or the duty to reach down into a user's processes and determine[0] that they are stuck (versus running appropriately, as e.g. the user indicated with nohup(1))? Why is it the init system's job to handle that?

If this behavior was mandated by some other piece of software named FluffyUnicorn and had nothing to do with Lennart, but was still widely adopted just as systemd is, would you be ok with it?

It’s in systemd because it makes sense to be there. Systemd already groups services into cgroups so it makes sense to also do that for user sessions.

> That's just not its job. If I wanted to run some sort of misbehaved-process killer, I could. Or, y'know, not running misbehaving processes. Ideally, that would include not running misbehaving processes like anything from the systemd project.

So toggle a configuration switch on your system. What you are actually trying to do is to FORCE this bad and confusing behavior as a DEFAULT on regular users that have no need or want for it.


> If this behavior was mandated by some other piece of software named FluffyUnicorn and had nothing to do with Lennart, but was still widely adopted just as systemd is, would you be ok with it?

If this behavior was mandated by some other piece of software, it wouldn't be as widely adopted as systemd is.

That's the true problem with systemd. It tries to do everything and does 80% of it well enough that many people use it, but then is too complex and integrated with itself to easily identify and carve out the problematic bits and replace them with third party alternatives.


> If this behavior was mandated by some other piece of software, it wouldn't be as widely adopted as systemd is.

So your argument is that this is forced on people because of systemd’s political power?

There’s a configuration option to reverse this behavior, it’s not hidden away somewhere, it’s been widely publicized. Any distro could have flipped the switch and easily reverted to preserve backwards compatibility, but none did. This is because this change is a net benefit to the majority of users.

> That's the true problem with systemd. It tries to do everything and does 80% of it well enough that many people use it, but then is too integrated with itself to easily identify and carve out the problematic bits

Again, you don’t need to fork systemd to change this behavior. If that was the case I would understand the criticism. But that is not the case. The alternative workflow is perfectly well supported. All we’re arguing about is the defaults. Systemd developers go out of their way to not break things.

You’re arguing for making up some abstraction layers for plug-n-play components that no one is demanding, and would probably never be used. Modularity has a cost, and not only that, but you also have to know where to draw the line between core and addon.

And if systemd actually did all of that, I’m pretty sure all those habitual complainers would just argue that it’s over-engineered and should have been kept simple. You can’t win with the peanut gallery.


> Any distro could have flipped the switch and easily reverted to preserve backwards compatibility, but none did.

No, many of them did. The problem is that this is not the only such issue, and distribution maintainers don't have unlimited time and resources to re-evaluate every individual default chosen by upstream, so most of the upstream defaults end up in the distributions. The distributions can fix this once you identify the problem, as e.g. Debian has done, but "you can change it" is no argument for a bad default, because changing it is work and in the meantime things are broken.

> Again, you don’t need to fork systemd to change this behavior. If that was the case I would understand the criticism. But that is not the case. The alternative workflow is perfectly well supported. All we’re arguing about is the defaults.

If the defaults weren't important then why are you arguing about them?

> Systemd developers go out of their way to not break things.

Yet tmux and screen are broken on the distributions that use upstream's default.

> You’re arguing for making up some abstraction layers for plug-n-play components that no one is demanding, and would probably never be used. Modularity has a cost, and not only that, but you also have to know where to draw the line between core and addon.

You say that as if it wasn't the way everything works in many other init systems. The init system doesn't typically have a DNS server, you can use dnsmasq or BIND or unbound or djbdns or whatever you like. It doesn't have its own cron, there are many choices and you can choose any of them.

And just drawing any hard lines would help. Even if you had to replace two modular components to replace one thing, or one component that does two things when it should be one, that's certainly a lot more feasible than having to understand and touch thirty integrated pieces to replace one component.


> The problem is that this is not the only such issue, and distribution maintainers don't have unlimited time and resources to re-evaluate every individual default chosen by upstream, so most of the upstream defaults end up in the distributions.

Well they should. Otherwise, what’s the point of them?

> Yet tmux and screen are broken on the distributions that use upstream's default.

Of their own volition. And btw, distributions could patch them to work with systemd. None of this is systemd’s fault. Since when is it upstream’s job to make sure downstream properly integrates their software?

> The init system doesn't typically have a DNS server

There’s no DNS server in systemd core. It just lives under the same umbrella. Do you know FreeBSD has a DNS server in the same repo as the kernel? Does it mean it has a DNS server in the kernel? You know perfectly well that this is just plain false.

> It doesn't have its own cron, there are many choices and you can choose any of them.

Why would you need “many choices” for a simple timer? What are you going to do, invent new type of time?

Anyway, you’re completely ignoring the other perspective on this. Because old style init did so little and so poorly, cron used to be a de facto service manager. Also don’t forget inetd. So you had duplicated, poorly implemented, but nevertheless, redundant functionality in several separate systems. How is systemd’s approach not both less complex and much more sane?

> And just drawing any hard lines would help. Even if you had to replace two modular components to replace one thing, or one component that does two things when it should be one, that's certainly a lot more feasible than having to understand and touch thirty integrated pieces to replace one component.

Why? If you can’t point to where the line is then what’s the point. It’s like saying you want cars to be more modular, so let’s just arbitrarily invent a “motor carriage[1].”

You could replace the engine without the coach, wouldn’t that be swell?

Anyway most of systemd’s components communicate over a common system bus. You could provide alternatives just by speaking the same API.

[1] Sorry, I’m not a native speaker; I mean this: https://en.wikipedia.org/wiki/Coach_(carriage) but with an engine instead of horse


> Well they should. Otherwise, what’s the point of them?

If the distribution is supposed to micromanage everything from upstream then what's the point of upstream?

> Of their own volition. And btw, distributions could patch them to work with systemd. None of this is systemd’s fault. Since when is it upstream’s job to make sure downstream properly integrates their software?

Since when does everything have to integrate with the init system at all?

> There’s no DNS server in systemd core. It just lives under the same umbrella.

It isn't a matter of which repository it's in, it's a matter of how much work it is to swap it out. Can I just run dnsmasq or dnscache and change an IP address somewhere, or do I actually have to change the code because it's expecting something more than a general purpose DNS resolver?

> Why would you need “many choices” for a simple timer? What are you going to do, invent new type of time?

An existing implementation has poor code quality and I can do better, but my new implementation is less feature complete, so some people prefer the one with more features while others prefer the one that has fewer bugs and uses less memory etc. etc.

> Because old style init did so little and so poorly, cron used to be a de facto service manager. Also don’t forget inetd.

Which they still are, because they're still there and there is nothing stopping people from using them in that way as ever.

But runit et al don't require that either, so let's not pretend that there is no third way.

> Why? If you can’t point to where the line is then what’s the point.

Your argument was that it's hard to know where to draw lines. But it's more important that you draw them somewhere than the specific place where you choose to draw them. Otherwise everything mushes together into a single piece of spaghetti that can't be disentangled from itself.

> Anyway most of systemd’s components communicate over a common system bus. You could provide alternatives just by speaking the same API.

Where are the RFCs for these APIs, so that I can write my application against the spec and be assured that it will continue to work against future versions of the software on the other end?


If you don’t like systemd so much then write something better. I mean you’ll find literally anything to dislike about it, I don’t get it. You can still use cron or rsyslog if you like. Or don’t use systemd. This is stupid. I’m done. The default makes sense for 99.99999% of users, literally the only point I was trying to make.


> If you don’t like systemd so much then write something better.

Writing something better doesn't get rid of the dependencies other projects now have on pieces of systemd, which pieces then have dependencies on other pieces until you need the whole thing.

> I mean you’ll find literally anything to dislike about it, I don’t get it.

This thread is about one specific complaint: It has too many interdependencies without well-specified stable interfaces between them, and actively encourages things to take on more of them, as with replacing SIGHUP handling with systemd-run.

> The default makes sense for 99.99999% of users, literally the only point I was trying to make.

This doesn't make any sense. Most applications don't handle SIGHUP and are terminated by the default handler. Applications that do handle it continue to run. If they used systemd-run instead they would also continue to run. Where is the benefit from forcing applications to do something systemd-specific and breaking existing things that don't?


> What you are actually trying to do is to FORCE

It's a rule: if you're advocating systemd, you don't get to accuse anyone else of forcing anything.


What do you disagree with in that sentence? There are defaults, distros have defaults, they’re the subject of this discussion. Anyone arguing for any default is likely dictating the de facto behavior for the majority of nontechnical users, which is the majority of users, period.


If I've nohup'd mpv or put it in a tmux shell, then that is the behavior I want. For instance, if I ssh into a controller for a home entertainment system to kick off a video, then this would be exactly what I want.


Then you can toggle one simple configuration switch, instead of forcing confusing behavior on the other 99% of users that don’t want or need it.

Take a step back and consider if say Windows did it like that, wouldn’t you agree it is broken?


> Then you can toggle one simple configuration switch

Only if I have root permissions (granted, I probably wouldn't be watching porn on a machine I wasn't admin on but that was just an example application).

> instead of forcing confusing behavior on the other 99% of users that don’t want or need it

Who is forcing users to run programs with nohup or tmux shells?

> Take a step back and consider if say Windows did it like that, wouldn’t you agree it is broken?

I'm pretty sure Windows does do it like this; if I were to remote desktop into a Windows box and start playing a video, it should keep playing even if I disconnect, reconnect, and log back in. It does this for normal applications, at least, though videos are a special enough case where it might be accelerating with the remote GPU.


>Only if I have root permissions (granted, I probably wouldn't be watching porn on a machine I wasn't admin on but that was just an example application).

It doesn't take root to do so. In most cases you probably still want to run the transient scope under your own user, so you'd use systemd-run --user to create it with the user-level instance of systemd rather than the main system instance.

>I'm pretty sure Windows does do it like this

No it doesn't, as for your remote desktop example you can have the exact same behavior on Linux with systemd reaping user scopes by just using a VNC server. Windows is different in that when logging off it won't allow you to while an application is still running. It gives you the choice to either stop and go back to whatever application isn't closing (because you have unsaved work or something) or to kill it.


> It doesn't take root to do so, in most cases you probably still want to run the transient scope under your user so you'd use systemd-run --user in order to create it not with the main system instance of systemd but with the user level instance of it.

If a non-root user can do it and leave a program running then doesn't that invalidate all that BS about security?


None of this is about trying to prevent the user from using resources. The user is the one who is logging out in the first place. If the user wants to terminate all of their processes except for one daemon they can do that. The security benefits aren't the primary benefit, security wise all you gain is that after you log out there's no chance that anything with any sensitive information is still hanging around. I mentioned ssh-agent as an example but you could also have stuff like maybe chrome didn't close on SIGHUP and as a result maybe this makes your saved passwords accessible to someone who can dump the RAM later by getting physical access to it. It definitely helps security but it's not really that big of a deal.

Ironically enough when I went to Google to search for an example the result that came up was my comments on HN on the same subject from a year and a half ago.

https://news.ycombinator.com/item?id=14735145

Here's a great example of the kind of real life breakage that reaping the user scope on logout actually fixes.

https://bugs.freedesktop.org/show_bug.cgi?id=94508


> Only if I have root permissions (granted, I probably wouldn't be watching porn on a machine I wasn't admin on but that was just an example application).

If you’re not an admin you probably prefer the systemd default. OTOH if you do need to run tmux between sessions you probably have root as well.

> Who is forcing users to run programs with nohup or tmux shells?

You’re forcing confusing behavior (media playing despite logging out) on unsuspecting users. This is unintuitive to nontechnical users, and just “wrong” to most that know the reasons behind it. I haven’t heard any good technical argument for keeping this behavior, only that it should remain like that because a minority is used to it. Though you’re welcome to change my mind.

> I'm pretty sure Windows does do it like this; if I were to remote desktop into a Windows box and start playing a video, it should keep playing even if I disconnect, reconnect, and log back in.

If you connect and disconnect you are not necessarily logging out, it’s equivalent to locking the session, which does keep music playing on Linux/systemd, and btw even offers MPRIS2-based media control right on the lockscreen, at least for Plasma.

Also it can pause the music if you log in concurrently as a different user. This is because systemd (and PolKit) have very sophisticated seat management built in. For example, it treats you differently depending on whether you log in remotely or have a seat right at the console. It can offer different authentication mechanisms and permissions (e.g. you need root/admin to shut down the machine remotely, but don’t if you’re physically at it). All of this is possible and configurable thanks to the work of Lennart and others.

The question at hand is only whether you make the default the behavior that makes sense to 99% of regular users or to the few loudest.


complain to distros then. (systemd set the secure default, even if that breaks backward comp, as usually upstreams do, when it comes to security.)

or better yet, read the release notes, it likely mentions this breaking change. (if not, that's a bug.)


> systemd set the secure default, even if that breaks backward comp, as usually upstreams do, when it comes to security.

Breaking compatibility is generally avoided to the utmost. Even security-sensitive things like TLS continue to support older, less secure versions to retain compatibility with peers that haven't been upgraded yet, much to the chagrin of everyone when they screw up the version negotiation, but better than the chicken and egg problem where nobody can upgrade until everybody has.

But the other point is that the claimed security improvement doesn't actually seem to be there in this case. They haven't made it so you can't have a program continue to run after the end of the current session, they've only changed what you have to do to make that happen, thereby breaking everything that did it the traditional way.


If only there was a way for the system init program to identify and keep a list of processes it has spawned, you could imagine like a unique numerical Process ID, and then if there was a program that could check the Process Status, and another that could kill the process identified by this... PID with increasing levels of aggressiveness...


PIDs get reused so this doesn't work well.


He's sarcastically alluding to systemd's approach at solving this.


If a process doesn't handle SIGHUP it dies. So all the daemon has to do in that case is nothing.


If a process doesn't set its own SIGHUP handler it dies. If it does in order to gracefully handle shutting down but it's deadlocked then there's no feedback as to whether or not the process actually finished handling the signal.
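
The "no feedback" problem is usually handled by escalating with a timeout. Here is a minimal Python sketch of that idea; `spawn_ignoring` and `stop_with_escalation` are hypothetical names, and a child that ignores the signals stands in for one whose handler has deadlocked. Holding a `Popen` object for a child we started ourselves also sidesteps the PID-reuse concern, since the PID cannot be recycled until we have waited on it.

```python
import signal
import subprocess
import sys

SLEEPER = [sys.executable, "-c", "import time; time.sleep(30)"]


def spawn_ignoring(signals):
    """Start a sleeping child that ignores the given signals (set before exec)."""
    def ignore_them():
        for sig in signals:
            signal.signal(sig, signal.SIG_IGN)

    return subprocess.Popen(SLEEPER, preexec_fn=ignore_them)


def stop_with_escalation(child, grace=1.0):
    """Send SIGHUP, then SIGTERM, then SIGKILL, waiting `grace` seconds each.

    Returns the name of the last signal that had to be sent.
    """
    for sig in (signal.SIGHUP, signal.SIGTERM, signal.SIGKILL):
        child.send_signal(sig)
        try:
            child.wait(timeout=grace)
            return sig.name
        except subprocess.TimeoutExpired:
            continue  # no sign that the child finished: escalate
    raise RuntimeError("child did not exit even after SIGKILL")
```

A well-behaved child stops at the first step; a stuck one is only removed by the final SIGKILL, which cannot be caught or ignored.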


So the answer to your hypothetical deadlock is to break everything else? What kind of complex and graceful shutdown does ssh-agent really need?


>So the answer to your hypothetical deadlock is to break everything else?

It's not a hypothetical situation, everyone on here has seen applications hang and have to be terminated. SIGHUP handlers are no different in this regard.

>What kind of complex and graceful shutdown does ssh-agent really need?

That's a straw man argument, and the whole point of SIGHUP in the first place instead of just some "persistence" bit set per process is because for real world applications it's not as simple as just kill -9 to stop a process. But for ssh-agent in particular it needs to go through and unlink the socket that it binds to on startup. More to the point it also has to go through and close every PKCS11 provider that is registered which means calling functions that aren't even in openssh to begin with so who knows if some PKCS11 provider will hang during that.
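
For a sense of what that cleanup looks like, here is a minimal Python sketch of an agent-like process that unlinks its socket on SIGHUP. `TinyAgent` is a made-up stand-in, not ssh-agent's actual code, and the PKCS11 provider shutdown is left out entirely; that omitted step is where, per the comment above, a hang could occur.

```python
import os
import signal
import socket


class TinyAgent:
    """Hypothetical stand-in for an agent that owns a Unix socket."""

    def __init__(self, path):
        self.path = path
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.bind(path)  # binding creates the socket file on disk
        self.stopping = False
        signal.signal(signal.SIGHUP, self._on_hangup)

    def _on_hangup(self, signum, frame):
        # Release what we own, then tell the main loop to stop.
        self.cleanup()
        self.stopping = True

    def cleanup(self):
        self.sock.close()
        try:
            os.unlink(self.path)  # remove the socket file we created
        except FileNotFoundError:
            pass  # already gone; nothing left to clean up
```

If the process is killed with SIGKILL instead, none of this runs and the socket file is left behind.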


wasn't GP specifically mentioning user processes and not system daemons? e.g. for daemons it's perfectly expected behavior to not shut down on SIGHUP. Apache, and other system daemons would re-read configuration files when receiving SIGHUP (as a way to reduce downtime during config updates).


> How else do you propose to make sure that when I log off my ssh-agent is really terminated and not just locked up with my keys still in memory?

Perhaps with a signal handler?


That was the nice and friendly POSIX way, turns out it's really convenient for malware to stick around that way. Now user session isolation and termination works (cgroups), but it of course breaks backward comp.


There is no NOHUP signal, you're referring to SIGHUP.

See the enable-linger option for loginctl and KillUserProcesses for logind.conf. KillUserProcesses was set to default enabled on 4/9/2016, prior to that it didn't happen, but was configurable if desired. So you were always able to change the config to restore the previous behavior from the moment the default turned it on.

Edit:

Here is the commit where it happened

https://github.com/systemd/systemd/commit/97e5530cf2076a2b4f...


I just checked a few Debian stretch boxes that I set up, and "KillUserProcesses=no" is set on them all. And until a few minutes ago, I didn't even know to check.

So how can it be the default?


If you comment out that line it'll be on by default - Debian fixed it for you with their own default configuration file, because 99% of their users would only be annoyed by it.

This is why we have distro vendors, to build a system that works in the real world with software from developers with opinions that... differ to say the least.
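
If you want to check a box yourself, here is a rough Python sketch that reports whether KillUserProcesses= is explicitly set in the [Login] section of a logind.conf. The function name is made up, it uses Python's generic INI parser rather than systemd's own, and it ignores drop-in directories, so treat the result as a hint rather than the effective value.

```python
import configparser


def kill_user_processes_setting(conf_text):
    """Return True/False if KillUserProcesses= is set in [Login], else None."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(conf_text)
    if not parser.has_option("Login", "KillUserProcesses"):
        # Commented out or absent: whatever default was compiled in applies.
        return None
    return parser.getboolean("Login", "KillUserProcesses")
```

On most systems the file to feed it is /etc/systemd/logind.conf, e.g. `kill_user_processes_setting(open("/etc/systemd/logind.conf").read())`.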


Debian maintainers make many improvements to upstream and only rarely mess up (ssh key generation).


> So you were always able to change the config to restore the previous behavior from the moment the default turned it on.

No, you were not.

The thing that people are missing here is that neither of the systemd-logind behaviours, with KillUserProcesses=yes or KillUserProcesses=no, is the long-standing behaviour of kernel login sessions all of the way back to 7th Edition that nohup, tmux, screen, emacs --daemon, mosh-server, deluged, and more all interoperate with.

The behaviour of kernel login sessions is that end of login session is a HUP signal to the session leader, and that termination of the entire TTY login service (such as at system shutdown) is a TERM signal to everything followed by a KILL signal to everything then remaining.

The systemd-logind session behaviour with KillUserProcesses=no is no signals at all at the end of the login session, and at termination of the TTY login service both HUP and TERM signals together then KILL signals, to everything.

The systemd-logind session behaviour with KillUserProcesses=yes is both HUP and TERM signals together then KILL signals, to everything, both at login session termination and at TTY login service stop.
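
The three behaviours just described can be put side by side as a small Python table. This only restates the claims above in data form; the key names and helper functions are made up for illustration. Each inner tuple is one step, in order, and signals are sent to everything unless noted.

```python
BEHAVIOUR = {
    "kernel login session": {
        "logout": [("HUP",)],  # to the session leader only
        "service stop": [("TERM",), ("KILL",)],
    },
    "logind, KillUserProcesses=no": {
        "logout": [],  # no signals at all
        "service stop": [("HUP", "TERM"), ("KILL",)],
    },
    "logind, KillUserProcesses=yes": {
        "logout": [("HUP", "TERM"), ("KILL",)],
        "service stop": [("HUP", "TERM"), ("KILL",)],
    },
}


def hup_sent_at_logout(mode):
    """True if anything receives a HUP when the login session ends."""
    return any("HUP" in step for step in BEHAVIOUR[mode]["logout"])


def survives_logout_by_ignoring_hup(mode):
    """True if ignoring SIGHUP (nohup-style) is enough to outlive logout."""
    return all(set(step) <= {"HUP"} for step in BEHAVIOUR[mode]["logout"])
```

Only the first row both sends a hangup at logout and lets a process that ignores it keep running, which is the combination nohup, tmux and screen were written against.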

As I pointed out years ago, the fix is to make systemd-logind use KillUnit at hangup and StopUnit at service termination, actually providing the conventional behaviour which it currently does not in any mode and addressing the original problems (with some background GNOME utilities in a login session that were never being sent a HUP signal at logout and would have exited had they been) that motivated this whole mechanism in the first place.

* https://news.ycombinator.com/item?id=12335128

* https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=825394#221

* https://news.ycombinator.com/item?id=11798604


I meant SIGHUP. Edited.


Eleven years earlier when SMF was added to what would eventually be Solaris 10, we had this same problem. Some of us had to drop everything to fix "bugs" in cron, sshd, ... introduced by SMF.

Systemd is basically SMF, done poorly, because NIH.


I agree that on Linux-based systems, SIGHUP is a reasonable mechanism for killing processes when a user closes an ssh session, and that ignoring SIGHUP is a reasonable way to avoid getting terminated.

I disagree that POSIX says that processes should expect a SIGHUP when a user logs out (SIGHUP means the controlling terminal was closed). I am not at all a POSIX expert, so please correct me if I misunderstand, but afaict POSIX explicitly does not specify what happens to the controlling terminal when a user logs out (http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_...):

> POSIX.1 does not specify how controlling terminal access is affected by a user logging out (that is, by a controlling process terminating). 4.2 BSD uses the vhangup() function to prevent any access to the controlling terminal through file descriptors opened prior to logout. System V does not prevent controlling terminal access through file descriptors opened prior to logout (except for the case of the special file, /dev/tty). Some implementations choose to make processes immune from job control after logout (that is, such processes are always treated as if in the foreground); other implementations continue to enforce foreground/background checks after logout. Therefore, a Conforming POSIX.1 Application should not attempt to access the controlling terminal after logout since such access is unreliable. If an implementation chooses to deny access to a controlling terminal after its controlling process exits, POSIX.1 requires a certain type of behavior (see Controlling Terminal ).


Is there a daemonization API as such? I think there was only the "way of doing" shown in man 7 daemon.


The systemd people have their own version of that manual page.

* https://freedesktop.org/software/systemd/man/daemon.html

IBM was explaining what to do back in 1995.

* http://jdebp.eu./FGA/unix-daemon-design-mistakes-to-avoid.ht...


> killing user processes on logout

By "killing", do you mean some other signal than (or in addition to) SIGHUP? Does it send SIGKILL?


That's the whole issue here. It does.


Also two years ago, I explained how one could make this work, by having logind use KillUnit at hangup and StopUnit at shutdown.

* https://news.ycombinator.com/item?id=12335128

* https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=825394#221


> killing user processes on logout (rather than send them the SIGHUP signal, as POSIX says should happen)

TIL what nohup(1) is for.


Sort of. While it's debatable when SIGHUP should be sent as part of controlled system logout/whatever, the signal itself was originally used upon abrupt disconnection (hang up) of the controlling terminal of a program.


> Systemd's response was to say that they should incorporate systemd's library, and use systemd's new daemonization API.

By "use systemd's new daemonization API" you mean, instead of

$ screen

systemd asks you to write

$ systemd-run --scope --user screen

instead. Annoying to have to learn a new thing, but hardly an unbearable burden.
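
For scripts or launchers that start long-lived programs, the same wrapping can be done programmatically. A minimal Python sketch follows; the function name `in_user_scope` and the fallback to the plain command when systemd-run is not on PATH are assumptions of this example, not something systemd prescribes.

```python
import shutil


def in_user_scope(argv):
    """Prefix argv with systemd-run so it runs in its own transient user scope.

    Falls back to the unmodified command when systemd-run is not installed.
    """
    if shutil.which("systemd-run") is None:
        return list(argv)
    return ["systemd-run", "--scope", "--user", *argv]
```

Usage would be along the lines of `subprocess.run(in_user_scope(["screen"]))`.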

> On the other hand, when you're an impacted user who's lost work, and researching the bug leads you to a years-old discussion in which someone is actively denying that the bug exists and refusing to fix it, that's infuriating.

Because it's a bug for some, and intended behavior for others. Look, you make it as if they introduced a bug on purpose to screw with some people. It's clearly not the case, there was a specific tradeoff involved.


> Because it's a bug for some, and intended behavior for others. Look, you make it as if they introduced a bug on purpose to screw with some people. It's clearly not the case, there was a specific tradeoff involved.

They broke userland.

It doesn't matter what tradeoff they made - they went against POSIX behaviour, and as a result, broke numerous utilities, both past and future.

Let's say that again - systemd introduced breaking behaviour on userland, against POSIX, and instead of backing down and allowing for expected and specified behaviour, they said it's everyone else's problem.

That is neither professional, nor responsible.

When you make a mistake, a mistake that breaks the behaviour of POSIX, and POSIX utilities like _cron_, you apologise, and fix the problem.

You don't turn around and say that all the sysutils should incorporate your new idea.


First of all, as mentioned above, we made this compile-time as well as runtime-configurable, so that downstream distros can choose whether they want to make this opt-in or opt-out. Hence blame your distros if you picked it in a way you didn't like.

Moreover, this doesn't affect cron at all. Cron creates its own PAM session for each job it runs which means those jobs are independent from any real login session (i.e. ssh, graphical, tty login), and thus also don't get cleaned up by them.

This affected stuff that is forked off a login session and then stays around as "orphan" if you so will, i.e. with all session resources released, except for these processes that try hard to avoid clean-up (usually by double forking + detaching explicitly from any TTY/ignoring SIGHUP).


As many, many others have stated, ignoring SIGHUP is not a way to "avoid clean-up". It is the explicit and intended method that a program should use to indicate that it should not be cleaned up.


This has more to do with feelings about you and the perception of you as a "bad guy" than it does about the technical discussion.

I tend to agree with the idea that the choice of defaults belongs to the distros. If the distros are deferring to the upstream project on default settings for a critical system component then they need to be more thorough and validate what they are shipping.


Maintaining all these special cases requires a lot of knowledge. If a maintainer is responsible for just the systemd package, that's not a problem, but when the number of packages per maintainer is measured in hundreds, the maintainer will stick to defaults unless users complain loudly enough to justify sacrificing a whole working day on the problem.


> Maintaining all these special cases requires a lot of knowledge.

Distro maintainers need to have a lot of knowledge about their init system. There's no way out of that. It's probably something everyone should know a little about as well.


> Distro maintainers need to have a lot of knowledge about their init system. There's no way out of that. It's probably something everyone should know a little about as well.

Then maybe the init system should be simpler and not attempt to ingratiate itself with UEFI or attempt to replace su, sudo, syslogd, netcat, resolvconf, etc.


> They broke userland.

That alludes to kernel development, which systemd is largely uninvolved with. A userland program chosen by various distributions failed to support conventions from a different userland program. That's all. Were the programs involved fundamental and highly important to many users' experience? Sure. Is busting out "you broke userland" like some magical shibboleth a useful means of conveying your unhappiness that your distribution maintainers chose to replace a widely-depended-upon program with a different program? I think not.

> they went against POSIX behaviour

Which? There's "tradition" and "specified behaviour". Both are important in different situations and in different degrees.

> You don't turn around and say that all the sysutils should incorporate your new idea.

Why not? They're no more privileged by the POSIX specification, or by the user/kernel -space divide than any other program.


POSIX was broken first. It's insecure by default.

Intel, the kernel, even Chrome broke my userland by mitigating Spectre.

It happens.

CRON was and is run as a system service, in its own scope. If you run your own cron instance, but forgot to set it up as a system service, yeah, it gets cleaned up as you exit your shell/session/scope.


> They broke userland.

So? "We don't break userland" is a Linux kernel thing. Systemd is not kernel, it's userland, and userland things break other userland things all the time. They already broke lots of existing stuff when they replaced /etc/init.d/ scripts with systemd definition files, should systemd also have not done that?

> It doesn't matter what tradeoff they made - they went against POSIX behaviour, and as a result, broke numerous utilities, both past and future.

Linux is not POSIX, so I don't see how that's relevant. For what it's worth, I don't even know what part of POSIX it broke. Care to enlighten me?


Right; the Linux kernel has a "we don't break userland" policy, systemd doesn't. That's a selling point for the Linux kernel, and a strike against systemd. Both systemd and the Linux kernel are infrastructure projects which, if they're doing their jobs well, will never cause me problems so I get to ignore them. Systemd has been causing other people problems, and doesn't seem to understand that in the role they're trying to fill, preventing that from happening is their first and most important responsibility.


Like it or not, the Linux kernel is clearly the outlier in terms of backwards compatibility. For example, Postgres changes their data format in most non-bugfix releases. Would you consider that "a strike against" Postgres?


They provide an upgrade process that makes this invisible to the end user, so it's not a fair comparison. If it started deleting tables when I exit a session, that would definitely be a strike against it.


Postgres has session-bound resources, and in most cases no way to disable those from being deleted when exiting a session. For example in postgres you can't persist a prepared statement, but you can of course persist data within a table. Any function running will be killed when you exit (or at least not complete since the transaction is cancelled).

IMO when a user has logged out and has not had the permissions/foresight to setup a task in the system to run without a session it should be killed.

I get that this has not been the default behavior in linux/UNIX, but to me it seems like the sensible one.

And that's before we ever argue about the possibility to turn it off.


Systemd offers both a compile-time and a runtime option to turn this off, so it is a fair comparison.


I think you're completely missing the point.

If you ruin everyone else's day, and change behaviour everyone else is expecting, then it's probably your own fault.

Approaching it as if everyone should simply change and do what you want, is the height of arrogance. You are generating work for others. And in this particular case, not only are you generating work for others, you are eradicating a category of software.

When a distribution adopts systemd, they let everyone know how things are changing, and slowly transition things over, releasing when stable.

We know systemd replaces init.d. It was difficult, but distributions using systemd got over that hurdle, but it did take time.

However, this is not the same.

Yes, systemd is userland, however it is also PID 1. It is a layer between most userland and the kernel, and so needs to reflect the responsibility of its position.

Ignoring how NOHUP is supposed to be interpreted, is a _bad idea_, and yes, a violation of POSIX, specifically signals (SIGHUP and nohup), and how they are supposed to be handled.

More so, it greatly heightens the difficulty of many utilities that are expected to work.

Why should cron (all implementations of cron), suddenly need to rely on another userland library to maintain its function?

You just broke most Linux automation. Across an entire industry.

Why should screen (all implementations of screen), suddenly need to rely on a userland library much bigger than most implementations, to continue its base function?

You just broke an entire category of background systems - including systems communicating with embedded hardware. You might have caused a factory-floor fault. Which could cause injury, or worse.

A breaking change of this level can cause industry-wide ramifications that are not just limited to the digital. Unexpected behaviour is exceptional, and should take time and considerable thought before occurring.

Systemd has responsibility that no other userland system has. It's PID 1.

If they're going to require a massive change in process behaviour, then they are going to require consultation, awareness within the industry, and transition time. They should be working with distributions, aware of the man-hours they're generating, before they put something in place.


This discussion is very much apropos of what the article is talking about:

> The whole systemd battle, Rice said, comes down to a lot of disruptive change; that is where the tragedy comes in. Nerds have a complicated relationship to change; it's awesome when we are the ones creating the change, but it's untrustworthy when it comes from outside. Systemd represents that sort of externally imposed change that people find threatening. That is true even when the change isn't coming from developers like Poettering, who has shown little sympathy toward the people who have to deal with this change that has been imposed on them.

The posix violation is by design. If you think that posix dictates the wrong thing, then you will do something different and this is what Poettering has done. The fact that systemd has more or less been embraced by linux is an endorsement of his design philosophy, even if distributions reject specific features.


I am not upset that there was divergence from POSIX.

Design choices are fine - I can understand why systemd takes a different approach.

What I don't like, and completely disagree with, is systemd not working with the community they directly affect to reduce disruption.

Like it or not, the product is an industry standard, and so will be held to industry expectations.

Rather than turning around and requiring everyone to change, they could have said, "Sorry, we're making changes, here are some preliminary patches that could help."

Or a timeline for a breaking change, wherein they can negotiate with others.

I don't have significant issues with systemd's software, though some reservations about quality. My main concern, and it has been since the beginning, is that systemd acts without thought or conscience to the effects that they might cause.

They lack the ability to be a team player, despite creating an environment where people depend on them.

systemd's adoption rate is an absolute credit to it. They have some very good design thoughts, and those working on it have done some excellent work.

However, it would be better if they communicated with the people they affect, rather than letting the community be an accidental QA team when things go wrong.

They do get this right sometimes, but that seems to be the exception, rather than the rule.

They approached the init.d situation calmly, and slowly. They worked with Debian, and Fedora and others to make sure it would work without interruption or loss of quality.

They approached the sigkill situation like they were a kid who just learned how to light a fire and wanted to burn the library down.


You make plenty of assumptions there, in particular that there was no communication about the session killing thing. Turns out, however, there was. We informed downstreams about our intention and the reasons in detail, and we documented this for everybody else in NEWS. We also made sure there was an easy compile-time option to pick the default for this option, and then left the rest for the downstreams to decide: whether to default to on or off for this, taking in the information they got from us and from the rest of the community. If you think they made the wrong decision, then complain to them really. But seriously, you really just assume we wouldn't talk to anyone, without actually having any idea what communication is really taking place.


> We informed downstreams about our intention and the reasons in detail, and we documented this for everybody else in NEWS.

From The Hitchhiker’s Guide to the Galaxy, regarding the plans to destroy the Earth:

‘But the plans were on display …’

‘On display? I eventually had to go down to the cellar to find them.’

‘That’s the display department.’

‘With a flashlight.’

‘Ah, well, the lights had probably gone.’

‘So had the stairs.’

‘But look, you found the notice, didn’t you?’

‘Yes,’ said Arthur, ‘yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying “Beware of the Leopard.”’

Back in the real world: you built & shipped a system whose defaults were and are broken, and now you blame others for not enabling the DONT_BE_WRONG setting. You might as well blame end users for not becoming fully-versed with your code before their first login.

It’s not the users’ fault. It’s not the distros’ fault. It’s yours, and your project’s, for shipping code which breaks the user experience.

I appreciate your vision. It’s a good one. You’re a smart guy. But have some humility! Have a sense of your own limitations, and those of the distros and users who will use your code. You’re a human being; the distros are made up of human beings; your end users are … human beings. Think of them.


This is kind of a ridiculous reply. Is the only solution then to admit that Linux is "done"? Because it sounds like there's no room for change, even when change is communicated and multiple options to avoid it are provided.


> What I don't like, and completely disagree with, is systemd not working with the community they directly affect to reduce disruption.

> Rather than turning around and requiring everyone to change, they could have said, "Sorry, we're making changes, here are some preliminary patches that could help."

> Or a timeline for a breaking change, wherein they can negotiate with others.

But they did exactly that.

They contacted the tmux maintainers and asked if some modifications would be possible to accommodate the new option (see poettering's comment here: run things as a child of systemd --user or just register a separate PAM session). If I remember correctly, it would not even have been the first special case in tmux; there already is one for OSX.

The discussion was actually progressing nicely until the anti-systemd crowd flooded it. I remember seeing posts in a lot of places urging people to comment on the bug report with specious arguments. The whole thing was kind of upsetting.


They did that 6 days after releasing the version that broke tmux, that's hardly preparing for or negotiating.


POSIX isn't a law. You don't "violate" POSIX. It's a standard for compatibility. You can choose to not be compatible with a standard when you think it makes sense. That's something that lots of projects do. You are using standards compliance as a moral cudgel.

Your argument is way too impassioned to be just technical. You just basically accused Lennart of hurting people with no evidence whatsoever.

This sort of stuff really doesn't help.


When there is a standard and someone doesn't follow it, it is said that the standard has been violated.

It follows that when someone implements functionality that doesn't follow POSIX, POSIX has been violated.

There's nothing wrong with the statement.


He accused Lennart of hurting people with no proof. Is that reasonable?


Please point out where in my comment I make any reference to reasonability.


Apologies for that part, then. I just don't see standards compliance like other people do. Personally, I don't see standards as things that imply some kind of morality. They are tools to accomplish a goal. Sometimes other goals may supersede their usefulness.


That is fair enough. I have not argued against your point of view. My comment was more on the linguistic side of things.

You criticised the parent's language saying that "you don't violate a standard" because it "isn't a law". I was just pointing out that you do indeed violate a standard because it's a standard, and saying that does not add any kind of moral or passion value - it's just using the language the way it's intended.


Aren't we just a few weeks after Rich Hickey's "you have no right to make demands of open source software" rant?

Systemd has responsibility that no other userland system has. It's PID 1.

No, you have the responsibility to check what the software you are installing does, and if you don't approve, change it or reject it. Or, don't check, and deal with it.

Systemd developers do not owe you working POSIX, working cron, industry wide working Linux automation, screen, separate userland for everything. They don't owe you anything. If you don't like their thing, don't use their thing.


Although I very much like the "don't break userland" approach, I agree with you, especially in light of the fact that 1. you can start your background process the systemd way (shown elsewhere in this thread), 2. you can configure the desired behavior, and 3. your distro has probably already configured it for you (Debian).

So it comes down to "something changed which is absolutely extremely important for me but I would rather discuss it for hours than take the few seconds to configure it". Especially since the new behavior is intended behavior and also has upsides for a lot of use cases.

So don't be ungrateful. Be happy that some people are really putting a lot of work behind the software you use daily FOR FREE and just configure the darn thing the way you like.

And last but not least, most people here (me included) are not in the position to complain so much about free software, unless they show some commitment to open source themselves.
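
For readers wondering what "the systemd way" of starting a background process looks like, here is a minimal sketch. It only builds the command line for `systemd-run --scope --user`, which asks the per-user systemd instance to place the command in its own transient scope. The helper name `wrap_in_user_scope` and the tmux session name are hypothetical, chosen for illustration. Depending on configuration, lingering may also need to be enabled (`loginctl enable-linger`), or `KillUserProcesses=no` set in logind.conf, for the process to survive logout.

```python
import shlex


def wrap_in_user_scope(command):
    """Return an argv that runs `command` in its own transient scope unit
    under the per-user systemd instance (hypothetical helper)."""
    if not command:
        raise ValueError("command must not be empty")
    return ["systemd-run", "--scope", "--user", *command]


# Build (but do not execute) the wrapped command for a tmux session.
argv = wrap_in_user_scope(["tmux", "new-session", "-s", "work"])
print(shlex.join(argv))
```

The sketch deliberately stops at printing the command; passing `argv` to `subprocess.run` would actually launch it.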


>If you don’t like their thing, don’t use their thing

Oh how I wish that was a course of action I could reasonably take in this instance...


> Annoying to have to learn a new thing, but hardly the unbearable burden.

The problem is now your scripts won't work on systems that don't use systemd. Shell scripts work on FreeBSD, but now you can't use them because they require systemd-specific code.

I am not necessarily anti-systemd in most respects (I like declarative definitions of services and less shell script hell), but the fact that they keep trying to get people (including container runtime developers like myself) to use _their_ API rather than the preexisting ones is fairly "anti-social".


Aleksa,

I am not trying to get you to use our APIs. You are talking about the cgroups APIs again, if I am not mistaken? As I tried to explain again and again: if you want container runtimes to manage their own cgroups then just set Delegate=yes in the unit file of your manager, get your own cgroup subtree, and you can do below it whatever you want, you do not have to call into systemd ever. Not a single API call, no C call, no D-Bus call, nothing. You get your own kingdom if you set Delegate=yes, and systemd won't interfere with that. This is extensively documented.

I wished you'd actually listen to what I keep repeating to you. We tried to be really nice to container managers, knowing that they dislike systemd APIs, so we put a lot of work in making the delegation boundary clean, so that they can be entirely systemd agnostic beyond setting the Delegate=yes boolean in their unit file, but alas, we just keep hearing the same nonsense.

The LXC/LXD people btw did get this right: they manage their own cgroup subtree now, and systemd doesn't interfere, and they don't link to or do dbus calls into systemd either.
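
As an illustration of the `Delegate=yes` setting described above, the following sketch renders a small service unit file with Python's `configparser`. Only `Delegate=yes` comes from the comment; the description, the `ExecStart` path and the helper name `render_unit` are assumptions made up for the example, not anything a real container manager ships.

```python
import configparser
import io


def render_unit(description, exec_start, delegate=True):
    """Render a minimal systemd service unit as text (illustrative only)."""
    unit = configparser.ConfigParser(interpolation=None)
    unit.optionxform = str  # keep the CamelCase key names systemd expects
    unit["Unit"] = {"Description": description}
    unit["Service"] = {
        "ExecStart": exec_start,
        # Ask systemd to hand the service its own cgroup subtree.
        "Delegate": "yes" if delegate else "no",
    }
    buf = io.StringIO()
    unit.write(buf, space_around_delimiters=False)
    return buf.getvalue()


# Hypothetical manager binary; the path is a placeholder.
print(render_unit("Example container manager",
                  "/usr/local/bin/example-container-manager"))
```

The point of the sketch is that the only systemd-specific piece is one key in the `[Service]` section; everything below the delegated subtree is then up to the manager.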


> then just set Delegate=yes in the unit file of your manager

In runc we don't have a dedicated manager or long-running daemon. Yes, Docker and cri-o use Delegate=yes (so I am quite aware of this option) but that really doesn't help people who are using runc in their own user sessions or wrote their own wrapper and aren't aware of Delegate=yes.

I get that we are quite odd, and don't fit into a system-service model. After all of the back-and-forth with both you and Tejun (especially when it comes to "rootless" delegation -- which systemd only offers if you get a privileged user to delegate for you), I'm not sure that there's much I can do on this topic. I get that what I care about is not something you care about, but I would hope you accept that I'm not just being obstinate for the sake of it.

> Not a single API call, no C call, no D-Bus call, nothing.

Right, unless you need to set this up for someone else. And we have code that does this too -- I don't really recommend people use it, but it is necessary (and I'm pretty sure some folks at Red Hat use it based on how many bug reports they submit related to it).

Since systemd is managing the entire cgroupv2 tree (and the fact we can get around that for cgroupv1 appears to be seen as a design flaw by both you and Tejun), obviously we have to talk to systemd to do this type of thing. I just wish this wasn't the way it was done (and if cgroupv2 had a named cgroup concept -- which is what systemd needs for tracking services -- I would think that this wouldn't be such a pain-point).

I guess I'm just annoyed that we can't use "better rlimits" with "rootless" container runtimes because of all of this.

> I wished you'd actually listen to what I keep repeating to you.

I am listening, and I am aware of Delegate=yes and all of that history. But as I outlined above, I don't necessarily agree with it entirely. And unlike a lot of people around here, I don't think any of these pain-points are coming up because of malice or something stupid like that -- I just think we disagree on our priorities.

> We tried to be really nice to container managers, knowing that they dislike systemd APIs, so we put a lot of work in making the delegation boundary clean

Don't get me wrong -- I do appreciate that we have Delegate now (there was a period of several years where "systemd decided to reorganise the cgroup tree, un-containing my containers" happened on several occasions -- and Delegate solved those issues).

And from what I've heard from the LXC folks, you were quite reasonable about getting systemd to work inside LXC. Which is good to hear.

> The LXC/LXD people btw did get this right: they manage their own cgroup subtree now, and systemd doesn't interfere, and they don't link to or do dbus calls into systemd either.

We do basically the same thing. We just don't support cgroupv2.


They changed a decades-old behavior many people rely on, and it must have been obvious from the start that people would lose work because of it.


It's a bug because it violates the expectations of an uninformed user. You aren't given a warning about it, it's not documented in big bold letters anywhere, and it's also not POSIX compliant.


> Annoying to have to learn a new thing, but hardly the unbearable burden.

Rather, a breaking change to everyone's scripts and processes for zero benefit.


Our scripts and tools work similarly on the four Unix systems we have in-house. Are you saying that it's OK that they don't work on Linux? Please do not forget that Linux is a POSIX system, basically a re-implementation of Unix, and until systemd it's been a fully compliant -nix system. Where I work we have transparently been able to deploy our products on all -nix, including Linux, since the nineties.

EDIT: My reply was supposed to be to xyzzys's post below, not the one I apparently replied to.. sorry about that.


There's a benefit, you're just not seeing it. Again, do you think that the systemd developers decided to implement it just to screw with people? As I said, there's a specific trade-off involved here.

I agree that it might not be the most desirable default, but if that's the case, then the guilt also falls on the distribution maintainers, who either ignored the big bold letters in the changelog, or didn't bother to test everyone's standard workflows before pushing to stable.


> Again, do you think that the systemd developers decided to implement it just to screw with people?

Based on Lennart's behavior, yes I do.


Instead of pretending the benefit is so obvious it doesn't require you to discuss it perhaps you could explain it.


Not the parent nor Systemd developers, but apparently they think it's the only way to make sure the user's session is cleaned up.

But frankly, 100% of people would be fine with it if the default was left at no instead of changing it to yes. It's all about giving users a choice when a new feature is introduced, something Systemd developers understand only partially.


> There's a benefit, you're just not seeing it.

Not to appeal to self-authority, but I have been maintaining production Linux systems in large-scale environments since the late 90s. If there were a benefit that outweighed the unnecessary breaking changes, I would see it, even if I didn't appreciate it. There isn't.

You should stop and think before you assume that other people are incompetent, both because it would make you a better interlocutor, and as a bonus it wouldn't violate HN's principle of charity.


The benefit is, of course, clean up of orphan defunct processes. One might argue if this is outweighing the drawback of the change (it might not, but that’s what some distro maintainers chose to enable), but you shouldn’t suggest that they just broke you for no purpose, instead, you should stop and think before you assume that other people are incompetent, both because it would make you a better interlocutor, and as a bonus it wouldn't violate HN's principle of charity.


Your copy/paste doesn't apply to my comment, since I didn't assume you were incompetent, just that you'd made an overaggressive claim you didn't care to back up.

Of course, a defense of systemd's comically broken reaping behavior removes all necessity for assumption in this case. sysvinit at least consistently reaps on SIGCHLD -- systemd randomly reorders into the sd-event API and then does something random based on the order of receipt.


> Your copy/paste doesn't apply to my comment, since I didn't assume you were incompetent, just that you'd made an overaggressive claim you didn't care to back up.

Sorry, I assumed you're competent enough to figure it out, or at least look at the original sources where authors of the change explicitly explain the reason why they do it. Of course, since you assumed that they are incompetent, you didn't bother to do so, instead, completely uncharitably assumed that there's zero benefit for that.


I'm sorry to bring bad news, but there's indeed a benefit, you just don't see it.


Surely it can be articulated, then.


It was, many times, you can just google and educate yourself.


> This argument, he said, seems to be predicated on the notion that systemd is a single, monolithic binary.

Can we please stop misrepresenting the complaints against systemd? The only time I ever hear this "monolithic binary" argument is from systemd advocates. The actual complaint is about tightly coupling important features together. Not only does this make it difficult (often impossible) to replace individual components, when tight coupling happens at the (internal) protocol level, any replacement component necessarily has to implement a bunch of (sometimes unwanted) systemd baggage.

Busybox implements all of its features in a single monolithic binary, but it isn't a monolithic design that tightly couples those specific components together. Replacing one of busybox's components is often as simple as removing busybox's symlink and installing the replacement. This isn't even a "Unix philosophy" issue. Even inexperienced designers shouldn't have a hard time understanding why systemd is a monolithic design but busybox isn't.
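
To make the busybox point concrete, here is a toy sketch of a multi-call program that picks its behavior from the name it was invoked under (the symlink name in `argv[0]`). This is an illustration of the general technique, not BusyBox's actual code; the applet names and functions are invented for the example.

```python
import os


def applet_echo(args):
    """Toy 'echo' applet: join the arguments with spaces."""
    return " ".join(args)


def applet_true(args):
    """Toy 'true' applet: do nothing."""
    return ""


# Each applet is independent; removing one entry (or one symlink)
# does not affect the others.
APPLETS = {"echo": applet_echo, "true": applet_true}


def dispatch(argv):
    """Select the applet from the basename of argv[0]."""
    name = os.path.basename(argv[0])
    applet = APPLETS.get(name)
    if applet is None:
        raise SystemExit(f"{name}: applet not found")
    return applet(argv[1:])


print(dispatch(["/bin/echo", "hello", "world"]))
```

Because the applets share a binary but not a protocol, replacing `echo` with another implementation only means pointing `/bin/echo` somewhere else.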


https://suckless.org/sucks/systemd/ has items like "pid 1 does DNS". It's an incorrect complaint that exists in the wild, though it certainly isn't the basis of all accusations that it violates the Unix philosophy.



What baggage are you specifically referring to?


It runs its own logging system with non-standard interfaces and formats. It runs its own DNS resolver with non-standard behaviour. It maintains compatibility only with a narrow range of udev versions, which in turn maintain compatibility only with a narrow range of kernel versions. And all the d-bus interfaces between these pieces may change at any point without notice. So you can't replace any piece of it, because even if you provide your own component that implements one of the systemd d-bus interfaces, you've got no forward compatibility.


If there was a serious effort to replace/port parts of it, the needed internal APIs can be stabilized ( https://www.freedesktop.org/wiki/Software/systemd/InterfaceP... ).


> "It's software" so of course it's buggy, he said. The notion that systemd has to be perfect, unlike any other system, raises the bar too high.

systemd is a PID 1 program, which means the bar has to be higher. When trouble begins, you need tools to fix it, and if PID 1 has crashed, you are out of luck. If the system cannot boot into a shell, you need to fix it from the initrd shell, or boot another system to fix this one. It sucks.

The Linux kernel pursues very high standards of reliability, because a kernel panic is even worse than a PID 1 crash. The init system should follow the same standards as Linux.


Have you ever had pid 1 (systemd or any other init) crash? For the last ~three years I've been paid to maintain high reliability algorithmic trading systems that ran systemd and a whole lot of other stuff, and systemd has never crashed on me. Lots of other stuff, including the kernel itself, has crashed.

The bar is higher for pid 1 - if I were designing systemd I would have made a tiny pid 1 that just did message-passing to a more complex secondary process that could be restarted, or something, just to be safe - but I think systemd has empirically cleared the bar.


I've had shutdown (for reboot) hang a few times after a systemd update, forcing me to cut the power. It's made me a bit paranoid, so I block the systemd package from having updates automatically installed, and every 6 months or so carefully manage update and reboot of each and every server ...

EDITs:

there's the classic case of the linux "debug" parameter: https://bugs.freedesktop.org/show_bug.cgi?id=76935

and the even more classic case of firmware loading events: https://lkml.org/lkml/2012/10/3/484

and while "all software has bugs" systemd really has the most annoying bugs (by virtue of trying to do everything core to the system) and always insists that they are features and we are backwards whiny geeks for complaining.


Yep.

3AM, deep slumber, called out to look at a stricken server. Its problems included that systemd was frozen. Reluctantly I came to the conclusion that a restart was the only route forward. Cept, that is when you discover that the commands that have served you well for 2 decades don't work, as they are all wrappers for systemd, which has keeled over.

To this day, the `shutdown` man page, which I was checking, makes no mention of how to resolve this, though in fairness the other commands (poweroff, halt, init) do. I discovered this after stumbling across https://github.com/systemd/systemd/issues/3282

If you find yourself stuck in the middle of the night, reading through docs to try and figure out how to recover a machine with a crashed systemd, then `systemctl reboot -ff` or equivalent is what you are now looking for, the `-ff` being the key to "JUST £&*(ing RESTART THE MACHINE!!!".
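
The escalation described above can be sketched as a small "reboot ladder": try the polite command first, and only fall back to `--force --force` (the long form of `-ff`) if the earlier attempts hang or fail. The function name `try_reboot`, the 30-second timeout and the injectable `run` parameter are assumptions made for the sketch; nothing here is executed unless `try_reboot()` is called.

```python
import subprocess

# From gentlest to most drastic. The last entry reboots immediately,
# without terminating processes or unmounting file systems, so it can
# lose data; it is the last resort.
REBOOT_LADDER = [
    ["systemctl", "reboot"],
    ["systemctl", "reboot", "--force"],
    ["systemctl", "reboot", "--force", "--force"],
]


def try_reboot(run=subprocess.run, timeout=30):
    """Walk the ladder until a command exits successfully.

    Returns the argv that succeeded, or None if every step failed.
    """
    for argv in REBOOT_LADDER:
        try:
            result = run(argv, timeout=timeout)
        except subprocess.TimeoutExpired:
            continue  # the command hung; escalate to the next step
        if result.returncode == 0:
            return argv
    return None
```

Passing a fake `run` makes it possible to rehearse the logic without rebooting anything, which is probably the only way anyone wants to test it.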

Experiences like that don't win you friends.


The worst thing about this is when stuff goes down, it does so at the least convenient time. Back in 2003 I was on a customer site who had a RH server and there was no internet connection available (as it was routed through the box) and my phone was a Treo 180G which had precisely fuck all useful internet on it. The company still exists and is in the middle of nowhere on the end of a shonky ADSL line and no mobile phone reception so the story hasn't improved.

If this happened to me today with systemd I'd be up shit creek without a paddle.


Did raising elephants not work? (SysRq + R E I S U B)


systemd disables the magic sysrq keys by default.
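
Whether SysRq is fully disabled or only restricted depends on the `kernel.sysrq` value the distribution ships, so it is worth checking on your own machines. The sketch below decodes that value using the bitmask meanings from the kernel's SysRq documentation; the function names are hypothetical, and the example value 16 is only an assumption for demonstration.

```python
import os

# Bit meanings for kernel.sysrq values greater than 1.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq(value):
    """Return the list of SysRq function groups enabled by `value`."""
    if value == 0:
        return ["all SysRq functions disabled"]
    if value == 1:
        return ["all SysRq functions enabled"]
    return [desc for bit, desc in SYSRQ_BITS.items() if value & bit]


def current_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the running kernel's setting, or None if it is unavailable."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return int(handle.read().strip())


print(decode_sysrq(16))  # example value only; check your own system
```

A full R E I S U B sequence needs the keyboard, signalling, sync, remount and reboot groups, so a mask that only allows sync would leave most of it inert.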


I’ve had Systemd completely stop responding before on numerous occasions on CentOS 7. As in can’t reboot or hangs rebooting or all commands hang.

Only recourse has been to reboot the instance from AWS dashboard.

I can’t get to the bottom of it because the tools don’t work when it’s down and there’s nothing there when it comes back up. I am not enjoying boiling to death in this pot of shit.

And then there’s the situation where it just won’t boot. I just fire up a new instance then because it’s easier than debugging it.


> Have you ever had pid 1 (systemd or any other init) crash?

No, I have not. But I have seen systemd gracefully fail to boot the system to a login prompt, with a good-looking colorful error message. It reminded me of "Keyboard is not found. Press F1 to run setup."


Oh, to be clear I'm real mad about how systemd fails boot if (say) one of your filesystems is unavailable and makes you log in with a root password to fix it.

But OP was asserting that systemd crashes under normal operation because its pid 1 is too fragile, which is very different. At scale I already expect that there's a chance a machine won't come back if I reboot it - it's annoying if I can't ssh in, but, well, I already lost a disk I care about and it won't return to service and I need to fix it anyway. (And it's an easy fix, just add "nofail" to fstab.) At scale I don't expect init to crash under normal operation.
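
The "nofail" fix mentioned above is a one-word change to the options field (the fourth column) of an fstab entry. As a rough sketch, assuming whitespace-separated fields and a hypothetical helper name `add_nofail`, it could be automated like this. Note that the helper normalizes the spacing of any line it modifies; the UUID and mount point in the example are placeholders.

```python
def add_nofail(fstab_line):
    """Return the fstab entry with 'nofail' added to its mount options.

    Comments, blank lines and malformed entries are returned unchanged.
    """
    stripped = fstab_line.strip()
    if not stripped or stripped.startswith("#"):
        return fstab_line
    fields = stripped.split()
    if len(fields) < 4:
        return fstab_line  # no options column to edit
    options = fields[3].split(",")
    if "nofail" not in options:
        options.append("nofail")
    fields[3] = ",".join(options)
    return " ".join(fields)


# Placeholder entry; not taken from a real system.
print(add_nofail("UUID=0000-EXAMPLE /data ext4 defaults 0 2"))
```

With `nofail` set, a missing filesystem is reported but no longer treated as a boot failure, which is the behavior the comment is asking for on non-critical disks.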


Yep. CentOS 6's upstart can be felled by generating a bunch of inotify events in /etc.

http://rachelbythebay.com/w/2014/11/24/touch/


I've never experienced an outright crash, but I've been bitten by [0] on some of my servers.

[0] https://github.com/systemd/systemd/issues/719


There was that time it was bricking computers by erasing UEFI variables, but I'll allocate equal blame between systemd and UEFI


Personally, I lay the blame for this issue squarely at the feet of various UEFI implementations which fail to boot when the system's EFI variables are, for whatever reason, wiped clean. The UEFI spec explicitly states that clearing all of the variables on a system must not result in an unbootable system.


You shouldn't, because the maintainer of the kernel subsystem concerned told us all that systemd wasn't to blame for it.

* https://news.ycombinator.com/item?id=15973577

* https://news.ycombinator.com/item?id=11152880


Actually, that was nothing to do with systemd. That was definitely a UEFI implementation issue. And systemd didn't delete anything, the user did - they ran:

  rm -rf --no-preserve-root /

https://lwn.net/Articles/674940/


The bug was in the kernel (it should not have allowed userspace to write arbitrary UEFI variables), but AFAIR it was exposed by systemd because it eagerly mounted the UEFI variable filesystem provided by the kernel into /sys/something/efivars.


Indeed, but again that was a firmware issue. systemd didn't delete the variables. And systemd was setting EFI variables, so consequently it needed it to be mounted as read/write.

The configuration files should have set that to read only after boot.

The kernel patch where this was fixed can be found here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...


Using systemd automount and NFS you can easily get pid1 unresponsive, hung in uninterruptible sleep forever.


Very recently I had this issue[0] as the result of a systemd upgrade, requiring the use of a recovery disk to downgrade to the previous version as the keyboard input had failed to be initialized.

[0] https://github.com/systemd/systemd/issues/11314


If this bug hits you in staging, no big problem, just don't promote that particular update to production. If this bug hits you in production, your lack of a staging environment is the bigger concern IMO.

(I have been hit by the same issue on my private notebook, but I have procedures in place to cleanly recover from failed upgrades on all systems, so it was not a big deal.)


"Yeah the software completely broke, but that's fine because you should be able to deal with that" does not make me feel better about the software in question.


No. Not mine. Not systemd. Not others. And I touched upon how rare this was in practice in my experience some years ago on Hacker News.

* https://news.ycombinator.com/item?id=8384251

But it does happen to other people.

* https://unix.stackexchange.com/questions/440229/

And there was one crash that made the headlines.

* https://news.ycombinator.com/item?id=12600413


The current debian testing version crashes with a NULL pointer segv in the kernel module. You need to downgrade to the previous version.


In what kernel module? There is no "the kernel module" in a systemd context.


There is. udev loads kernel modules. See eg. http://www.linuxfromscratch.org/lfs/view/development/chapter...


Waving in the direction of udev does not clarify what kernel module is supposedly the kernel module, which is what you were asked.


'rurban is one of our resident trolls - see also https://news.ycombinator.com/item?id=13364173


> Have you ever had pid 1 (systemd or any other init) crash? For the last ~three years I've been paid to maintain high reliability algorithmic trading systems that ran systemd and a whole lot of other stuff, and systemd has never crashed on me.

You see, that's the argument I hear a lot from Systemd advocates. The problem with anecdotal evidence is obvious. When you hear people opposing Systemd, practically all of them have some real-life issues with it, often related to functionality that would otherwise be non-essential (i.e. doesn't really need to be handled by PID 1). Of course if you don't have a particular problem, you don't feel it's important. That's precisely the attitude people resent.


> When you hear people opposing Systemd, practically all of them have some real-life issues with it

Yes, but a lot of people have real-life issues with it on their desktop of the form "It's too complicated." I'm asking specifically about real-life issues on production servers at scale. There will of course be tools that are poorly suited for a personal machine (even a personal server) but well suited for a team that wants to run a bunch of reliable servers.

For instance I would never be happy running RHEL on my desktop, but that doesn't mean RHEL is useless.


I can't quote any statistics but have the impression that a large part of non-Systemd crowd are old-time admins who maintain a large number of servers, myself included. When you break something on a desktop machine, that's easily fixable. When you need to deal with a large heterogeneous environment, you prefer to have things handled a bit more gracefully. Linus is a good example of a person who got this right.


This article is a bit of a joke. "It's software so of course it's buggy" isn't a great argument when you're replacing something that didn't suffer the same issues.

I just count my blessings that runit is widely packaged in every major distro, because it can just happily sit on top of sysvinit, systemd, upstart, pretty much any init system, and does things in a very simple shell script style. I really wasn't a fan of the weird ini-like format for systemd or the several different tools I'm expected to learn just to read my (now binary) log files competently.

If you're sick of switching init systems constantly or don't want to have to write separate scripts for your linux box and your freebsd box even, I highly recommend checking runit out.

I'm sure I'll give it a serious shot eventually... in about 3 years once they work all the Poettering kinks out, just like PulseAudio. They're doing some cool things with cgroups and stuff, so I hope it gets there eventually.


I second using runit. We use runit to be able to use the same service definitions inside docker, on a VM or bare metal.

If you've ever tried to use systemd inside docker to bring up a couple of services, you would know the hoops you have to jump through to get it working.

(I understand that Docker wasn't invented to run multiple services in one container, but sometimes it can't be avoided, and it simplifies app deployment vastly, e.g. using CI to test that your service actually starts up as per its definition: just run up a quick docker image with runit and a service definition file.)


I've only seen supervisord as the root process in multi-purpose containers. Is there any significant gain to using systemd instead?


If you use systemd, you can use standard packages from your distro to run up services inside a container. That's basically the only reason I considered it.


>I'm sure I'll give it a serious shot eventually... in about 3 years once they work all the Poettering kinks out, just like PulseAudio.

Good luck. If you need anything more than "I play a three minute song" on Linux audio you need both some type of real time kernel and jack.


I hate how he talks about knee-jerk reactions to change, because I don't think that's what's going on here. I remember I first saw systemd on a release of opensuse. I didn't think anything of it at first, except that I didn't really like the command line interface (systemctl) and I found the flags and options cumbersome. I often see new software in new releases (including the old HAL/DBUS layer) and didn't have the same reactions (although HAL had a lot of issues and was later removed or merged into dbus).

I've seen the BSD talk on this and I agree, having a system layer is helpful. It'd be nice if it was pluggable: NetworkManager (or others that have some standard messages you can send/get via dbus), consolekit OR logind, etc.

systemd does make it nice that I only have to write startup/shutdown scripts once for each distro, but I'm not happy with the layout of target files, the way mounts are handled, some of the weird race conditions I've found between systemd mount targets and fstab, etc.

systemd is modular, but the modules are still all part of the whole and are not easily replaceable. The same can be said of Docker when it went through a modular refactor, but there are alternative implementations of the entire docker engine. Every attempt to create alternative implementations of systemd has eventually gone unmaintained because systemd keeps getting more and more complex and engulfing more systems.

If it wasn't for distros like Void, Gentoo, Alpine, Slackware, et al., we'd no longer have a choice at all. There would be some things that simply couldn't be deployed on embedded systems because all of the dbus shims just wouldn't exist.

It's not that people are opposed to change, it's that there are legit concerns about some of the ways systemd works and is implemented, and the way it's been ham-fisted as a political move in a lot of ways.

Honestly, I don't think it will matter in a few years. I think the way things are going, eventually all services will be hosted via docker containers and it will be much easier to make Linux distros that have a tiny init layer that just launches a docker daemon and services. RancherOS already does this, with the init process being a container, which can be used to start up shell environment containers and other service containers.


>It's not that people are opposed to change, it's that there are legit concerns about some of the ways systemd works and is implemented, and the way it's been ham-fisted as a political move in a lot of ways.

I personally think the industry needs a lot more resistance to change when it comes to interfaces and other things humans have to understand.

I mean, I'm not talking about systemd in particular; I'm talking in general about how interfaces change over time and people don't seem to take into account the cognitive costs of that change. Sure, ss is better than netstat and ip is better than ifconfig... but how much of that 'better' could you have done in a way that didn't toss away the historical knowledge so many people have of those tools?

And really, sysadmin tools are the least of it; I mean, they are operated by professionals, so if you want to pay for retraining (or pay the costs associated with there being fewer of us)

People change customer facing interfaces to no benefit all the time, forcing people who are trying to do other things to put effort into re-learning their interface.

I mean, my point is that interface changes are expensive, and should not be undertaken without a really good argument that they bring more benefit than the cost of retraining.


First we need sane (secure, semantic, programmatic) interfaces that slowly become standards.

netstat? /proc files? ss? parsing text? wtf?

I mean, sure why not, but at least don't call them interfaces. They are userland apps people like to script, because they are too lazy to use libnetlink (or libwhatever that uses the right kernel interface, if it exists at all).

That said, the recent Gmail UI change made me reconsider Thunderbird again. And Android looks different every year. Sometimes it's better, sometimes it's worse. iptables, nftables. http1, http2 (and now 3 over UDP). Change is the only constant.


You need to figure out what is wrong with a machine before you write a program to fix it; this is why it's important to be able to log in to something broken and nose around.

Text processing is not harder than figuring out what library to use this month.

These things change a lot... but they don't have to, and running things on computers would be easier/cheaper if they didn't.


> systemd is modular, but the modules are still all part of the whole

The idea that you are searching for is coupling. Modular systems should aim to have low coupling and high cohesion.


Not everything runs well in a container. In fact with the lack of network understanding in most container implementations I would say there are many issues.

Additionally, for the concepts of stateless container design and state kept in containers, there are opposing implementations.


Docker is currently being killed by the kubernetes community btw. It's a slow death in some regards (1+ years) but quick in regards to "it will replace anything in 10 years".


You know, there are so many "big" wrong decisions in systemd's design and assumptions. In particular, there seems to be an unhealthy focus on graphical desktops rather than headless multi-user servers, of which this default of SIGKILLing all the things when an SSH connection drops is but one example. And then there's the monolithic design (DNS in the init?).

But that said, does anyone know where on earth they came up with the command line ux? Like the names of the commands, and the parameters? I mean, they are like an April fool's joke...
