Two years later, you can find hundreds of support requests across the internet, from frustrated users who are having their sessions killed by systemd.
Bugs are annoying, but that's life. On the other hand, when you're an impacted user who's lost work, and researching the bug leads you to a years-old discussion in which someone is actively denying that the bug exists and refusing to fix it, that's infuriating. I don't think systemd's developers deserve the trust that maintaining a core piece of infrastructure requires; they don't seem to care enough about whether they've broken things.
Now I am used to taking blame for apparently everything that ever went wrong on Linux, but you might as well blame your downstream distros for this as much as you want to blame us upstream about this, as it's up to them to pick the right compile-time options matching their userbase and requirements in compatibility, and if they didn't do that to your liking, then maybe you should complain to them first.
(And yes, I still consider it a weakness of UNIX that "logout" doesn't really mean "logout", but just "maybe, please, if you'd be so kind, i'd like to exit, but not quite". I mean, that's not how you build a secure system. We fixed that really, fully knowing it would depart from UNIX tradition, but that's why we made it both compile-time and runtime configurable)
(Also, nobody has to "incorporate" systemd's library to avoid the automatic clean-up. In fact, there's no library we provide that could do that. What was requested though is to either run things as child of systemd --user or just register a separate PAM session, neither of which requires any systemd-specific library.)
It's up to you as a systemd developer to pick sane defaults. Claiming that it's okay to introduce opt-out breaking changes upstream and then abdicate responsibility is quite a bit like walking around while waving your hands and arms around and then blaming whoever you hit for walking into you.
IOW the distros maintainers made a mistake by picking systemd? Agreed.
That said the question is not so much about who sends what, but more about whether a secure system should allow user code to escape lifecycle management or whether logging out means logging out and giving up all resources.
Please stop trotting that tired old line out. It is simply untrue. Systemd does the exact opposite of providing increased security. If nothing else the greatly increased surface area of systemd makes for a less secure system.
The pwnie articulates a number of other ways in which your code and your behavior are actively reducing the security of Linux.
IMO, it is part and parcel of designing great software that you pick as universally agreeable defaults as possible.
That you can configure systemd to behave in a less obnoxious manner is well beside the point. Systemd should be unobtrusive and predictable without any extra action on the part of the distribution folks or end users.
That the suggestion is to simply read the code or documentation is the height of arrogance considering how sloppy and insecure the systemd code is (parse error equals root privileges? come on…).
Yet just two days ago, we see Linus Torvalds (the creator of Linux and maintainer of the Linux kernel), launching into a tirade against – yes, you guessed it – systemd developers because of their atrocious response to a bug in systemd that is crashing the kernel and preventing it from being debugged. Linus is so upset with systemd developer Kay Sievers (gee, where I have heard that name before – oh, that’s right, he’s the moron who refused to fix udev problems) that Linus is threatening to refuse any further contributions from this Red Hat developer, not just because of this bug, but because of a pattern of this behavior – a problem for Kay because Red Hat is also foaming at the mouth to have their kernel-based, no doubt bug- and security-flaw-ridden D-Bus implementation included in our kernels. Other developers were so peeved that they suggested simply triggering a kernel panic and halting the system when systemd is so much as detected in use.
The key phrase there is:
a bug in systemd that is crashing the kernel and preventing it from being debugged
Honestly though when you get Linus flaming your behavior you're doing something really wrong.
Haven't been around here long, have you? :-)
And sometimes security requires breaking compatibility.
The options for fixing the bug are:
* nohup, tmux, emacs, etc all take dependencies on systemd and use the new systemd daemonization procedure. This is not a viable path because the maintainers of those utilities have refused (see https://github.com/tmux/tmux/issues/428), and because there are too many of them.
* Each distro separately works around the problem by maintaining forks of nohup, tmux, etc. This is not a viable solution because it's way too many forks; people will be finding broken distro+utility pairs forever.
* Each distro separately works around the problem by putting loginctl enable-linger in /etc/profile and setting KillUserProcesses=no. This would effectively be overruling systemd's decision. Some distros won't know they need to do this, and the GitHub systemd repo becomes a trap.
* Or: systemd backs down and changes the defaults so that the old daemonization APIs work again.
If you have a fifth option, we'd all love to hear it. But the status quo is that there's a user-facing bug, and the bug is still there. Rather than make the case for it not being a bug, you're currently making the case for it being someone else's bug, but the "someone else" doesn't actually have the power to fix it. You are the only one with the power to fix this bug.
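For reference, the third option above (a distro or admin overriding the default) could look roughly like a logind drop-in. This is only a sketch under assumptions: the helper name `write_logind_dropin` and the file name `50-keep-user-processes.conf` are made up for illustration, and the demo writes into a scratch directory instead of the real `/etc/systemd/logind.conf.d`.

```shell
#!/bin/sh
# Hypothetical helper: writes a drop-in that turns KillUserProcesses off.
# $1 is the drop-in directory; on a real system that would be
# /etc/systemd/logind.conf.d (root required), here it is a scratch directory.
write_logind_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    # KillUserProcesses belongs to the [Login] section of logind.conf
    printf '[Login]\nKillUserProcesses=no\n' \
        > "$dropin_dir/50-keep-user-processes.conf"
}

scratch=$(mktemp -d)
write_logind_dropin "$scratch/logind.conf.d"
cat "$scratch/logind.conf.d/50-keep-user-processes.conf"
```

On a real machine logind would only pick this up once it re-reads its configuration, for example after a reboot.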
Replace systemd with something else.
As an aside, it is the height of arrogance to suggest that systemd is somehow a more secure alternative. Lest this be considered an empty ad hominem attack, let me quote the pwnie you won in 2017:
> Where you are dereferencing null pointers, or writing out
> of bounds, or not supporting fully qualified domain names,
> or giving root privileges to any user whose name begins with
> a number, there's no chance that the CVE number will
> referenced in either the change log or the commit message.
> But CVEs aren't really our currency any more, and only the
> lamest of vendors gets a Pwnie!
oh my god, what a spectacular issue. And, seriously, Poettering's response is basically "not my job" and "not a bug". And this person develops something that sits at the core of a modern linux system...
All the while Lennart claims that he's making Linux more secure. FFS.
Edit: I forgot about this
> He (Theodore Ts’o) goes on to describe how he previously had to neuter policykit’s security (rendering his system very vulnerable) just to get his system working, and how he has found systemd "very difficult sometimes to figure out".
> As for Kay Sievers, maybe he should rename himself to Kay Sewers, because that’s exactly what he smells of. He told to IETF internet area director and previously DHCP working group co-chair “Tod Lemon” to lmgtfy when he asked about a systemd related git repository.
This gem sums it up perfectly though:
> Yet just two days ago, we see Linus Torvalds (the creator of Linux and maintainer of the Linux kernel), launching into a tirade against – yes, you guessed it – systemd developers because of their atrocious response to a bug in systemd that is crashing the kernel and preventing it from being debugged. Linus is so upset with systemd developer Kay Sievers (gee, where I have heard that name before – oh, that’s right, he’s the moron who refused to fix udev problems) that Linus is threatening to refuse any further contributions from this Red Hat developer, not just because of this bug, but because of a pattern of this behavior – a problem for Kay because Red Hat is also foaming at the mouth to have their kernel-based, no doubt bug- and security-flaw-ridden D-Bus implementation included in our kernels. Other developers were so peeved that they suggested simply triggering a kernel panic and halting the system when systemd is so much as detected in use.
The other problem is, of course, the utter lack of understanding Lennart demonstrates by being so dismissive and the increased potential for systemd to be hiding future security vulns.
As to the stuff mentioned in the pwnie. Those sound like great contributions that would be appreciated.
You could also take your concerns to the distro development group. If that doesn't work you could also customize your distro with a custom build of systemd.
If you still don't get satisfaction you can stop using it.
If you dislike how they do things you have options. Or, you could just be mean on a forum...
When I switch distros, it's almost always systemd, and not the system du jour, so I know how it works. Creating service files is a Google query away, and it makes common use cases a breeze, while advanced features that were hard to bash-script yourself are now just a few options to type.
I understand that many people may have problems with systemd for their particular situation, but that's not my experience.
As a dumb user with a few laptops and servers that needs an occasional daemon, I'm glad systemd won. I know you get a lot of heat since it came out, so thank you for working on it.
What is not as good: (1) systemd takes over or duplicates functionality not related directly to its primary purpose, and (2) is not solid enough to trust it in a number of cases, while (3) the developers' attitude does not give a lot of hope that the situation will materially improve.
(Of course, I run a distro without systemd.)
Ok, but UNIX and its behaviour have evolved over forty years, and users have a certain set of expectations about it.
Also, it should be noted, systems like UNIX are cultural artifacts. The way they are is the result of forty years of back and forth debate and negotiation and eventually compromise.
I can't speak for all of them, but I think that people that are bothered by systemd are upset that all of history has been brushed aside to make place for the preferences of just a few influential developers.
Whether a feature like logout is "logical" or not is beside the point. Operating system design isn't just about logic, it's about serving users.
You build your software the way you want and like. If others don’t like that it breaks POSIX they should stop using it instead of complaining. Or fork it.
When you run your screen or tmux below `systemd --user`, you still would have to `loginctl enable-linger`, no? I remember having to do that when I set up a PulseAudio server on a headless machine where I don't maintain an active session.
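A quick way to see whether lingering is already on for your account is to ask logind for the `Linger` property. This is a rough sketch, not an official recipe: the helper name `linger_enabled` is made up, and the script reports "could not query" when `loginctl` is missing or logind does not know the user.

```shell
#!/bin/sh
# Hypothetical helper: interprets one line of `loginctl show-user` output.
linger_enabled() {
    [ "$1" = "Linger=yes" ]
}

user=$(id -un)
# --property limits the output to the one field we care about.
# The call can fail (no loginctl, no logind, user unknown to logind),
# in which case the state is reported as unknown.
if state=$(loginctl show-user "$user" --property=Linger 2>/dev/null) && [ -n "$state" ]; then
    if linger_enabled "$state"; then
        echo "lingering is enabled for $user"
    else
        echo "lingering is off; try: loginctl enable-linger $user"
    fi
else
    echo "could not query logind for $user"
fi
```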
It's entirely OK if the admin then opts out specific users or even all users from this behaviour, i.e. if a privileged player decides to liberalize unbounded, unlifecycled resource consumption for unprivileged players. But a default where unprivileged code can just stick around uncontrolled and consume as much as it wants forever is just a strange choice security wise.
i.e. I think the fact that SIGHUP masking is unrestricted, i.e. is not subject to privilege checks is the problem really. Something is unpriv by default that should be priv by default. And that's pretty much what this option in systemd provides you with.
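To make the "unrestricted SIGHUP masking" point concrete, here is a small sketch showing that any unprivileged process can ignore SIGHUP and that only SIGKILL is guaranteed to end it. It illustrates the POSIX behaviour under discussion, not anything systemd does; the variable names are arbitrary.

```shell
#!/bin/sh
# The child ignores SIGHUP (this is what nohup arranges for the command
# it runs), then execs sleep; ignored signals stay ignored across exec.
sh -c 'trap "" HUP; exec sleep 30' &
child=$!

sleep 1                    # give the child time to install the trap
kill -HUP "$child"         # the "please exit" request
sleep 1

# kill -0 sends no signal; it only checks that the process still exists.
if kill -0 "$child" 2>/dev/null; then
    survived=yes
else
    survived=no
fi

kill -KILL "$child" 2>/dev/null   # SIGKILL cannot be ignored or caught
wait "$child" 2>/dev/null
echo "survived SIGHUP: $survived"
```

No privilege check happens anywhere in that sequence, which is the behaviour the comment above objects to.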
This was well known and accounted for where necessary. You considered everyone else to be wrong about the issue and went ahead and fixed it according to your opinion. Don't be surprised that a considerable portion of "everyone" doesn't agree with you.
Could you please explain that in a bit more detail?
An unprivileged user can still do this by setting up an intermediary box that keeps a persistent ssh session open. Incidentally, this is exactly what I plan to do if I ever need to ssh into a server with KillUserProcesses=yes.
> other OSes don't really allow this unprivileged either
On Windows, if I remote desktop from a laptop into a desktop, and start a web server, then shut down the laptop, the server stays running. On iOS if I start drafting an email, and reboot my phone, I don't lose my work. On ChromeOS, my tabs will stick around after a system crash. The world is moving toward processes being _more_ persistent, not less.
Well I'm certainly seeing why people get so frustrated with systemd junkies. Killing a "rogue" Chrome extension doesn't provide any meaningful form of security. There's no privilege escalation in play here. Whatever snooping it could do with you logged out could be done when you're logged in. Snooping on all users? Yeah, not going to happen without privilege escalation (which systemd will happily provide). So while systemd introduced this obnoxious behavior that broke all sorts of commonly used utilities no benefit was gained (except perhaps reinventing the wheel).
Meanwhile if you're worried about security don't forget that systemd has introduced a number of denial-of-service vectors (including one that results in a kernel panic) as well as an actual privilege escalation bug (which, in a fit of irony, could've been mitigated significantly by respecting return value tradition of zero = success). Take a look at the privilege escalation bug remedy, the vuln was due entirely to breathtakingly sloppy code. I'm ignoring the whole dereferencing unchecked pointers thing because that's such laughably bad practice I don't even know where to begin. Then take a look at Lennart's response and his unwillingness to mention CVEs anywhere.
The end result is that you have a combination of: breaking changes offering zero benefit, sloppy code resulting in reduced security, and a complete absence of any sort of security culture. Lennart, IBM, and systemd can claim all sorts of things (perhaps there really is a value in moving away from shell scripts) but security? No. There is absolutely ZERO merit to any claim that systemd increases security. The lack of security culture and defensive coding that permeates systemd all but guarantee future vulnerabilities.
But wait! There's more!
Systemd is also remotely exploitable. Sure, no program is perfect, but most programs strive to decrease the attack surface where systemd strives to increase it.
If there were some way to design this so that nohup would give a permission denied error on start and tmux would give one on detach, rather than die on logout when it's too late to display a warning, that would be a lot better. There may not be a feasible way to do this, but it would solve a key part of this problem, which is that people don't find out about this behavior until something has already gone wrong, and don't find out that systemd is responsible for the behavior until after they've gotten frustrated enough to be mad about it.
What I see most companies in this industry do is use per-user virtual machines to address the issue, which completely bypasses the question about logged in and logged out. It would be interesting if the intention in current development is to give us administrators more options here and allow for cleaner handling of compromised accounts.
And I don't think anyone really had a problem with the old default of letting things run. Worse is better, after all. Pursuit of a perfect system will just make things too complicated, too brittle, and too obtuse.
I like Unix because it doesn't try to solve every problem. It's a libertarian operating system, if you will. Sometimes this causes problems, sure, but if the system is simple and liberal, you can always fix them without much effort.
so, unix has been running for 20+ years laden with this security flaw? strange that nobody has been screaming out to plug it all this time.
this feels like you have a bee in your bonnet that it is not a very 'pure' logout by some interpretation of what a "logout" should be. imho, "logout" should mean what it has always meant in the past.
This is standard from you. You knock the glass on the floor and blame the maid service for not cleaning up after you.
It's everyone's faults but yours.
>And yes, I still consider it a weakness of UNIX that "logout" doesn't really mean "logout", but just "maybe, please, if you'd be so kind, i'd like to exit, but not quite".
Oh how hyperbolic. Nuances and caveats in terminology are not a weakness.
I don't see why you're splitting hairs over this but can't be bothered to care about your UID numbering bug.
Or the fact that systemd-resolved is responsible for DNS leaking on VPNs.
But yes, tell me more about how a functionality that enables terminal multiplexers is a "weakness"
>Now I am used to taking blame for apparently everything that ever went wrong on Linux,
It's because of your smarmy arrogance.
You break POSIX compliance, which has a real world effect in multiple areas and you accept bug reports with the humility of Donald Trump being interviewed by MSNBC.
Then when you retreat into your safe space, you play victim to the situation you created.
You talk of Linux culture toxicity, smearing the likes of Linus Torvalds, while essentially being the metaphorical sibling putting your finger in people's face repeating "I'm not touching you" over and over. Then you act attacked when someone claps back.
You're a cry bully hiding behind a veneer of professionalism acceptable to Red Hat's HR department, which enables you to mark one more bug as "wontfix"; your attitude, your arrogance, your conceit that things which aren't broken in fact are, so you can provide solutions no one asked for and no one benefits from.
You're just kind of yelling, and it diminishes any point you may have made.
What makes it worse is that he's often not completely wrong. Linux did need something like PulseAudio, something like Avahi and something like systemd. But his reach exceeds his grasp (which probably applies to us all, as I've found on my own projects), which leads to the well-known problems of PulseAudio & systemd.
I don't actually want him to quit the Linux world. But I wish he would scale back his ambitions just a tad, and consider that maybe — just maybe — other people have some good points, and valid concerns.
And also Windows/DOS are not terribly good design exemplars.
> It doesn't matter than hundreds or thousands of voices oppose him; I don't think it would matter if every single human being on earth opposed him.
makes it seem like everyone that uses systemd hates it or sees the same flaws as you or the other people yelling.
I and many others started admin'ing during or slightly before the systemd transition (ubuntu14->16 and rhel6->7) and have found it a much easier path to running services in a sane way than before. It was certainly possible before it, but with systemd I can do it a lot better and easier than I would have been able with previous inits.
For every person saying that systemd made things worse I expect there to be 10 silent sysadmins that appreciate what it did. I have no evidence of that, but that is my experience.
It breaks screen and tmux functionality, leaks DNS when connected to a VPN, and is riddled with "wontfix" security vulnerabilities stemming from a refusal to be POSIX compliant.
Systemd replaced udev for crying out loud.
Most of what I see/use of systemd I like. Some of it I don't, and some of it is a dumpsterfire. I think I could say the same or worse for any ambitious software project.
As for the security issues I certainly place those in the dumpsterfire category and I'd like for the systemd team to handle them better.
That, however, does not mean that systemd is anything other than a giant fucking dumpster fire. Looking at how Lennart interacts with other Linux devs, how he reacts to bug and security reports, looking at the lack of code review and the shoddy design decisions that get baked into systemd… it appears as if systemd mostly works through sheer luck. That sort of approach may be acceptable when you're talking GNU vs X emacs, but it's absolutely the wrong approach to such a critical piece of software.
The other thing I'm missing is any improvement. All of this upheaval has been for what? Assuaging Lennart's ego? Not good enough.
> You're free to hate it and some of that is certainly justified, but don't assume that the contrary opinion is based on uneducated or misguided opinions.
When the article being discussed consistently wrongly characterizes and dismisses technical arguments against systemd I think it's fair to say it's a bit more than misguided.
> As for the security issues I certainly place those in the dumpsterfire category and I'd like for the systemd team to handle them better.
Yeah, no. Security as an afterthought is a bad approach in general but it's even worse when you're talking about low level bits like PID 1, the kernel, boot loader, etc. This right here is enough reason to run, screaming far far away from systemd.
You know the best part though? I've had plenty of frustration with upstart (especially with features they've decided to remove over the years). None of this compares to the heavy handed, anti-social bullshit that seems to engulf systemd. Hell, I recently bought a replacement laptop. I even entertained the idea of a Linux machine. Systemd and its effect on Linux on the desktop was one of the top reasons I went with another MacBook Pro.
if command -v loginctl > /dev/null 2>&1 && loginctl > /dev/null 2>&1; then
    # With no user argument, show-user prints the logind manager's properties
    if loginctl show-user | grep -q '^KillUserProcesses=yes'; then
        echo "systemd is set to kill user processes on logoff"
        echo "This will break screen, tmux, emacs --daemon, nohup, etc"
        echo "Tell the sysadmin to set KillUserProcesses=no in /etc/systemd/logind.conf"
    fi
fi
I'm turning this on: for me it does not make sense for a user service (yeah, I run emacs as a user's systemd service) to keep running after I log out of my system.
P.S.: And the fact that for some people this behavior makes sense is why I think Lennart's decision to put this in as an option makes sense.
POSIX is nice, but rather lacking in certain aspects, such as security and administration-friendliness. cgroups help with both, but people have to understand them and use them well.
1. Edit: Not literally closed by the reporter. Lennart Poettering closed it, "closed by the reporter" as in "the issue was resolved to the reporter's satisfaction".
Are we reading the same bug report? The one I'm looking at was closed by the creator of Systemd.
How else do you propose to make sure that when I log off my ssh-agent is really terminated and not just locked up with my keys still in memory? The POSIX approach is insufficient, there's no way to know if a process received a signal and chose to ignore it and keep running or if it received a signal but it was deadlocked and kept running.
If you're not going to evaluate each individual program to determine whether the new behavior is appropriate then it should be opt-in rather than opt-out. Then ssh-agent and anything else that knows it should be forcefully killed can opt-in without breaking other innocent programs.
I think some people sometimes lack any perspective on the topic.
I’m not being emotional about it, just irritated.
Systemd has tangibly caused me to lose work with tmux; I appreciate there are root causes for this, but frankly, if some piece of someone’s code does that, for whatever reason that is beyond my control to immediately stop using it...
...it feels justified to be annoyed.
How do you suggest an alternative meaningful response would look?
Create my own distribution?
What tangible and meaningful alternatives do I have other than encouraging people not to use systemd?
> What tangible and meaningful alternatives do I have other than encouraging people not to use systemd?
Sure, if you think you can actually “test every single program and make everything opt-in.” I think you will however find that making everyone happy and having new features are just simply contradictory by the very definition. At some point you will want new stuff and you’ll have to break something.
The best you could do is adopt BSD’s model and fork tmux and other userland and ship outdated/patched versions. It’s a ton of work, of course.
I am not actually seriously suggesting you create your own distro, after all you can probably just fix the annoying issue with systemd and move on with your life, and systemd actually makes it easy for you by making it a configuration switch and supporting the non-default workflow.
I am simply suggesting you put yourself in the position of someone that has to make those decisions and really think about it from that perspective. Everything’s always a trade off.
Given the extraordinary scope of systemd, what happens with the next major issue? Having to perpetually work around poorly designed software is infuriating.
> I am simply suggesting you put yourself in the position of someone that has to make those decisions and really think about it from that perspective. Everything’s always a trade off.
Why should the onus be on the end user? Perhaps the distributions should be making choices that are less antagonistic of their users (e.g. upstart instead of systemd).
You're right about the tradeoffs though, and one of the tradeoffs for buying into systemd is angry users.
Systemd doesn’t break stuff just because they feel like it. Everything is compatible if it can be, for example you can still run /etc/init.d scripts and manage them through systemd on Debian. Lingering processes are also still supported! It’s a configuration switch that most distros decided to turn on by default, because...
> Why should the onus be on the end user? Perhaps the distributions should be making choices that are less antagonistic of their users (e.g. upstart instead of systemd).
... it’s a net benefit to most users. It’s only “antagonistic” to a particular subset of power users perfectly capable of working around the issue but somehow more motivated to loudly complain about it on the Internet.
> You're right about the tradeoffs though, and one of the tradeoffs for buying into systemd is angry users.
Fair deal if it helps with even 0.1% desktop market share.
What is the actual workaround? Is there a patch that unbreaks nohup by passing cwd and env to systemd-run --user or something?
My use case: I run a shell pipeline that will probably take all weekend to finish. On a POSIX box I start it with nohup. What do I do on a systemd box? Does nohup need a patch that doesn't exist yet?
In particular you'd probably want --user so that it runs it under your user instance of systemd and --scope so that it's all run under a scope for that command instead of just a transient service. For most uses of nohup you could literally just make it an alias for systemd-run --user --scope instead.
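A rough sketch of that alias idea, written so it falls back to plain nohup on boxes without systemd. The helper name `long_run_cmd` is made up, it only prints the command line instead of running it, and keeping `nohup` inside the scope is my own assumption (so the job also ignores the terminal's SIGHUP), not something the comment above prescribes.

```shell
#!/bin/sh
# Hypothetical helper: prints the command line to use for a job that
# should outlive the login session.
long_run_cmd() {
    if command -v systemd-run >/dev/null 2>&1; then
        # --user  : talk to the per-user systemd instance
        # --scope : put the job in its own transient scope unit
        printf 'systemd-run --user --scope nohup %s\n' "$*"
    else
        # No systemd-run on this box: the traditional way still applies.
        printf 'nohup %s\n' "$*"
    fi
}

long_run_cmd ./pipeline.sh
```

You would run the printed command with a trailing `&` as usual. Whether the job then survives logout may still depend on lingering being enabled for the user, as raised elsewhere in this thread.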
Apparently you think Linus is one of those who "lack perspective"?
I get that systemd isn't the kernel, but it's close enough. There are many who would agree that breaking existing behavior in the name of security isn't wise. I have also not yet seen anyone point out specific security issues this solved. Unix has worked this way for a long time.
As to Linus’ post, if you want to argue that there wasn’t enough notice about this change, then that’s fine, but this isn’t what anyone here is arguing.
Also it’s a configuration switch, any distribution could have decided to revert it or postpone it at their choosing.
Err... Maybe I'm missing something but I don't believe that's the case. There's a lot of things that you shouldn't do inside of a signal handler that will exhibit undefined behavior, but it's not like the kernel puts any restrictions on what the application can do inside of a signal handler. If an application wants to make SIGHUP just call whatever existing application exit logic they already have, they can. It's a terrible idea because if the application was signalled in the middle of some library call then it's anyone's guess as to whether or not it's just going to crash but that doesn't mean that you can't do it.
I think you're underestimating the difficulty of gracefully shutting down an application in a signal handler. If it's waiting for the application to finish some operation it's stuck in it'll just do the exact same thing as using nohup and there's no way to know that outside of the application.
Meanwhile if the process isn't handling SIGHUP then there is little chance of undefined behavior in the default handler, which merely terminates the process immediately.
That's not correct, for stuff running in the user's scope more often than not a SIGHUP handler is just to gracefully exit the application. I.E. close any open files, finish any writes in process, etc.
But also, you don't know what the SIGHUP handler does to begin with. That's the crux of the problem. Outside of the process the SIGHUP handler is just a black box.
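For anyone who hasn't written one, the "graceful exit" kind of SIGHUP handler described above looks roughly like this in shell. It is a toy sketch: the `worker` function and the "state saved" marker are invented for illustration, and a real application would do its own cleanup in place of the `echo`.

```shell
#!/bin/sh
# Toy worker that treats SIGHUP as "save state and exit".
worker() {
    state_file=$1
    trap 'echo "state saved" > "$state_file"; exit 0' HUP
    # The shell runs the trap once the current sleep returns,
    # so the handler fires within about a second of the signal.
    while :; do sleep 1; done
}

state=$(mktemp)
worker "$state" &
pid=$!

sleep 1            # let the worker install its handler
kill -HUP "$pid"
wait "$pid"        # returns once the handler has run and the worker exited
cat "$state"
```

From the outside, nothing distinguishes this process from one that ignores the signal until it actually exits, which is the black-box problem.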
>If it used systemd-run instead, it could still get into a bad state at any point thereafter and you have the same problem.
No, if it was started with systemd-run there's no SIGHUP sent to it in the first place. Reaping applications that won't close in the user scope isn't about preventing them from breaking in the first place, it's just sweeping up the broken pieces so that it doesn't break the next user scope because it's still holding some exclusive lock on something.
It's like putting the user session into its own container. It doesn't fix anything, it just keeps the breakage contained to the user's scope so that when you log out, it really does shut down that "container".
That's essentially the same thing, and the application would have to do something similar to protect itself.
Suppose the user would lose data if the application doesn't exit gracefully, but this may take a variable amount of time depending on how much unsaved data there is, current load on the machine, etc. So it handles SIGHUP, continues running to save its state, but hasn't finished before systemd kills it.
To prevent this it would have to use systemd-run to preserve itself long enough to finish saving its state, and we're back to square one again. Or it doesn't do that and the user loses data.
What gives the init system the right or the duty to reach down into a user's processes and determine that they are stuck (versus running appropriately, as e.g. the user indicated with nohup(1))? Why is it the init system's job to handle that?
That's just not its job. If I wanted to run some sort of misbehaved-process killer, I could. Or, y'know, not running misbehaving processes. Ideally, that would include not running misbehaving processes like anything from the systemd project.
0: or, as in systemd's case, blindly assume
If this behavior was mandated by some other piece of software named FluffyUnicorn and had nothing to do with Lennart, but was still widely adopted just as systemd is, would you be ok with it?
It’s in systemd because it makes sense to be there. Systemd already groups services into cgroups so it makes sense to also do that for user sessions.
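If you want to see which group a given process landed in, the cgroup path tells you. A small sketch, assuming the usual systemd naming where login sessions appear as `session-N.scope` and things started under the user manager sit below `user@UID.service`; the helper name `classify_cgroup` is made up.

```shell
#!/bin/sh
# Hypothetical helper: guess from a cgroup path whether a process sits in
# a logind session scope or under the per-user systemd instance.
classify_cgroup() {
    case "$1" in
        */session-*.scope*) echo "session scope" ;;
        */user@*.service*)  echo "user manager" ;;
        *)                  echo "other" ;;
    esac
}

# On Linux, /proc/self/cgroup lists the cgroup(s) of the current process.
if [ -r /proc/self/cgroup ]; then
    while IFS= read -r line; do
        printf '%s -> %s\n' "$line" "$(classify_cgroup "$line")"
    done < /proc/self/cgroup
fi
```

Processes in the session scope are the ones KillUserProcesses=yes targets at logout.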
> That's just not its job. If I wanted to run some sort of misbehaved-process killer, I could. Or, y'know, not running misbehaving processes. Ideally, that would include not running misbehaving processes like anything from the systemd project.
So toggle a configuration switch on your system. What you are actually trying to do is to FORCE this bad and confusing behavior as a DEFAULT on regular users that have no need or want for it.
If this behavior was mandated by some other piece of software, it wouldn't be as widely adopted as systemd is.
That's the true problem with systemd. It tries to do everything and does 80% of it well enough that many people use it, but then is too complex and integrated with itself to easily identify and carve out the problematic bits and replace them with third party alternatives.
So your argument is that this is forced on people because of systemd’s political power?
There’s a configuration option to reverse this behavior, it’s not hidden away somewhere, it’s been widely publicized.
Any distro could have flipped the switch and easily reverted to preserve backwards compatibility, but none did. This is because this change is a net benefit to the majority of users.
> That's the true problem with systemd. It tries to do everything and does 80% of it well enough that many people use it, but then is too integrated with itself to easily identify and carve out the problematic bits
Again, you don’t need to fork systemd to change this behavior. If that was the case I would understand the criticism. But that is not the case. The alternative workflow is perfectly well supported. All we’re arguing about is the defaults. Systemd developers go out of their way to not break things.
You’re arguing for making up some abstraction layers for plug-n-play components that no one is demanding, and would probably never be used. Modularity has a cost, and not only that, but you also have to know where to draw the line between core and addon.
And if systemd actually did all of that, I’m pretty sure all those habitual complainers would just argue that it’s over-engineered and should have been kept simple. You can’t win with the peanut gallery.
No, many of them did. The problem is that this is not the only such issue, and distribution maintainers don't have unlimited time and resources to re-evaluate every individual default chosen by upstream, so most of the upstream defaults end up in the distributions. The distributions can fix this once you identify the problem, as e.g. Debian has done, but "you can change it" is no argument for a bad default, because changing it is work, and in the meantime things are broken.
> Again, you don’t need to fork systemd to change this behavior. If that was the case I would understand the criticism. But that is not the case. The alternative workflow is perfectly well supported. All we’re arguing about is the defaults.
If the defaults weren't important then why are you arguing about them?
> Systemd developers go out of their way to not break things.
Yet tmux and screen are broken on the distributions that use upstream's default.
> You’re arguing for making up some abstraction layers for plug-n-play components that no one is demanding, and would probably never be used. Modularity has a cost, and not only that, but you also have to know where to draw the line between core and addon.
You say that as if it wasn't the way everything works in many other init systems. The init system doesn't typically have a DNS server, you can use dnsmasq or BIND or unbound or djbdns or whatever you like. It doesn't have its own cron, there are many choices and you can choose any of them.
And just drawing any hard lines would help. Even if you had to replace two modular components to replace one thing, or one component that does two things when it should be one, that's certainly a lot more feasible than having to understand and touch thirty integrated pieces to replace one component.
Well they should. Otherwise, what’s the point of them?
> Yet tmux and screen are broken on the distributions that use upstream's default.
Of their own volition. And btw, distributions could patch them to work with systemd. None of this is systemd’s fault. Since when is it upstream’s job to make sure downstream properly integrates their software?
> The init system doesn't typically have a DNS server
There’s no DNS server in systemd core. It just lives under the same umbrella. Do you know FreeBSD has a DNS server in the same repo as the kernel? Does it mean it has a DNS server in the kernel? You know perfectly well that this is just plain false.
> It doesn't have its own cron, there are many choices and you can choose any of them.
Why would you need “many choices” for a simple timer? What are you going to do, invent new type of time?
Anyway, you’re completely ignoring the other perspective on this. Because old style init did so little and so poorly, cron used to be a de facto service manager. Also don’t forget inetd. So you had duplicated, poorly implemented, but nevertheless, redundant functionality in several separate systems. How is systemd’s approach not both less complex and much more sane?
> And just drawing any hard lines would help. Even if you had to replace two modular components to replace one thing, or one component that does two things when it should be one, that's certainly a lot more feasible than having to understand and touch thirty integrated pieces to replace one component.
Why? If you can’t point to where the line is then what’s the point. It’s like saying you want cars to be more modular, so let’s just arbitrarily invent a “motor carriage.”
You could replace the engine without the coach, wouldn’t that be swell?
Anyway most of systemd’s components communicate over a common system bus. You could provide alternatives just by speaking the same API.
Sorry, I’m not a native speaker; I mean this: https://en.wikipedia.org/wiki/Coach_(carriage) but with an engine instead of a horse.
If the distribution is supposed to micromanage everything from upstream then what's the point of upstream?
> Of their own volition. And btw, distributions could patch them to work with systemd. None of this is systemd’s fault. Since when is it upstream’s job to make sure downstream properly integrates their software?
Since when does everything have to integrate with the init system at all?
> There’s no DNS server in systemd core. It just lives under the same umbrella.
It isn't a matter of which repository it's in, it's a matter of how much work it is to swap it out. Can I just run dnsmasq or dnscache and change an IP address somewhere, or do I actually have to change the code because it's expecting something more than a general purpose DNS resolver?
> Why would you need “many choices” for a simple timer? What are you going to do, invent new type of time?
An existing implementation has poor code quality and I can do better, but my new implementation is less feature complete, so some people prefer the one with more features while others prefer the one that has fewer bugs and uses less memory etc. etc.
> Because old style init did so little and so poorly, cron used to be a de facto service manager. Also don’t forget inetd.
Which they still are, because they're still there and there is nothing stopping people from using them in that way as ever.
But runit et al don't require that either, so let's not pretend that there is no third way.
> Why? If you can’t point to where the line is then what’s the point.
Your argument was that it's hard to know where to draw lines. But it's more important that you draw them somewhere than the specific place where you choose to draw them. Otherwise everything mushes together into a single piece of spaghetti that can't be disentangled from itself.
> Anyway most of systemd’s components communicate over a common system bus. You could provide alternatives just by speaking the same API.
Where are the RFCs for these APIs, so that I can write my application against the spec and be assured that it will continue to work against future versions of the software on the other end?
Writing something better doesn't get rid of the dependencies other projects now have on pieces of systemd, which pieces then have dependencies on other pieces until you need the whole thing.
> I mean you’ll find literally anything to dislike about it, I don’t get it.
This thread is about one specific complaint: It has too many interdependencies without well-specified stable interfaces between them, and actively encourages things to take on more of them, as with replacing SIGHUP handling with systemd-run.
> The default makes sense for 99.99999% of users, literally the only point I was trying to make.
This doesn't make any sense. Most applications don't handle SIGHUP and are terminated by the default handler. Applications that do handle it continue to run. If they used systemd-run instead they would also continue to run. Where is the benefit from forcing applications to do something systemd-specific and breaking existing things that don't?
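A minimal sketch of the two SIGHUP outcomes described above, assuming a POSIX system with a `sleep` binary on the PATH. The helper name `run_and_hangup` is made up for illustration. Setting the disposition to "ignore" before exec is roughly what nohup(1) does for its child.

```python
import signal
import subprocess


def run_and_hangup(ignore_sighup: bool):
    """Start `sleep`, send it SIGHUP, and report what happened.

    Returns the child's exit status, or None if it was still running
    one second after the signal was delivered.
    """
    disposition = signal.SIG_IGN if ignore_sighup else signal.SIG_DFL
    child = subprocess.Popen(
        ["sleep", "30"],
        # Runs in the child between fork and exec; an ignored signal
        # stays ignored across exec, a default one stays default.
        preexec_fn=lambda: signal.signal(signal.SIGHUP, disposition),
    )
    child.send_signal(signal.SIGHUP)
    try:
        return child.wait(timeout=1)
    except subprocess.TimeoutExpired:
        child.kill()  # clean up the survivor
        child.wait()
        return None
```

With the default disposition the child is terminated by the signal; with SIGHUP ignored it keeps running until the helper kills it. Neither case involves anything systemd-specific, which is the point being made above.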
It's a rule: if you're advocating systemd, you don't get to accuse anyone else of forcing anything.
Take a step back and consider if say Windows did it like that, wouldn’t you agree it is broken?
Only if I have root permissions (granted, I probably wouldn't be watching porn on a machine I wasn't admin on but that was just an example application).
> instead of forcing confusing behavior on the other 99% of users that don’t want or need it
Who is forcing users to run programs with nohup or tmux shells?
> Take a step back and consider if say Windows did it like that, wouldn’t you agree it is broken?
I'm pretty sure Windows does do it like this; if I were to remote desktop into a Windows box and start playing a video, it should keep playing even if I disconnect, reconnect, and log back in. It does this for normal applications, at least, though videos are a special enough case where it might be accelerating with the remote GPU.
It doesn't take root to do so. In most cases you probably still want to run the transient scope under your own user, so you'd use systemd-run --user to create it with the user-level instance of systemd rather than the main system instance.
>I'm pretty sure Windows does do it like this
No, it doesn't. As for your remote desktop example, you can have the exact same behavior on Linux with systemd reaping user scopes by just using a VNC server. Windows is different in that it won't let you log off while an application is still running: it gives you the choice to either stop and go back to whatever application isn't closing (because you have unsaved work or something) or to kill it.
If a non-root user can do it and leave a program running then doesn't that invalidate all that BS about security?
Ironically enough when I went to Google to search for an example the result that came up was my comments on HN on the same subject from a year and a half ago.
Here's a great example of the kind of real life breakage that reaping the user scope on logout actually fixes.
If you’re not an admin you probably prefer the systemd default. OTOH if you do need to run tmux between sessions you probably have root as well.
> Who is forcing users to run programs with nohup or tmux shells?
You’re forcing confusing behavior (media playing despite logging out) on unsuspecting users. This is unintuitive to nontechnical users, and just “wrong” to most that know the reasons behind it. I haven’t heard any good technical argument for keeping this behavior, only that it should remain like that because a minority is used to it. Though you’re welcome to change my mind.
> I'm pretty sure Windows does do it like this; if I were to remote desktop into a Windows box and start playing a video, it should keep playing even if I disconnect, reconnect, and log back in.
If you connect and disconnect you are not necessarily logging out, it’s equivalent to locking the session, which does keep music playing on Linux/systemd, and btw even offers MPRIS2-based media control right on the lockscreen, at least for Plasma.
Also it can pause the music if you log in concurrently as a different user. This is because systemd (and PolKit) have a very sophisticated seat management built in. For example it treats you differently if you log in remotely or have a seat right at the console. It can offer different authentication mechanisms and permissions (e.g. you need root/admin to shutdown the machine remotely, but don’t if you’re physically at it). All of this is possible and configurable thanks to the work of Lennart and others.
The question at hand is only whether you make the default the behavior that makes sense to 99% of regular users or to the few loudest.
Or better yet, read the release notes; they likely mention this breaking change. (If not, that's a bug.)
Breaking compatibility is generally avoided to the utmost. Even security-sensitive things like TLS continue to support older, less secure versions to retain compatibility with peers that haven't been upgraded yet, much to the chagrin of everyone when they screw up the version negotiation, but better than the chicken and egg problem where nobody can upgrade until everybody has.
But the other point is that the claimed security improvement doesn't actually seem to be there in this case. They haven't made it so you can't have a program continue to run after the end of the current session, they've only changed what you have to do to make that happen, thereby breaking everything that did it the traditional way.
It's not a hypothetical situation, everyone on here has seen applications hang and have to be terminated. SIGHUP handlers are no different in this regard.
>What kind of complex and graceful shutdown does ssh-agent really need?
That's a straw man argument, and the whole point of SIGHUP in the first place instead of just some "persistence" bit set per process is because for real world applications it's not as simple as just kill -9 to stop a process. But for ssh-agent in particular it needs to go through and unlink the socket that it binds to on startup. More to the point it also has to go through and close every PKCS11 provider that is registered which means calling functions that aren't even in openssh to begin with so who knows if some PKCS11 provider will hang during that.
Perhaps with a signal handler?
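For illustration, here is a hedged sketch of the kind of cleanup being discussed: a process that owns a Unix socket and unlinks it when it receives SIGHUP. `ToyAgent` is a hypothetical stand-in and is not how ssh-agent is actually implemented; it only shows why "just kill it" and "let it handle the hangup" are not equivalent.

```python
import os
import signal
import socket
import tempfile
import time


class ToyAgent:
    """Hypothetical daemon that binds a Unix socket on startup."""

    def __init__(self, sock_path: str) -> None:
        self.sock_path = sock_path
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.bind(sock_path)  # creates the socket file on disk
        self.cleaned_up = False
        signal.signal(signal.SIGHUP, self._on_hangup)

    def _on_hangup(self, signum, frame) -> None:
        # A real daemon would exit after cleaning up; a flag is set
        # here instead so the effect can be inspected.
        self.sock.close()
        os.unlink(self.sock_path)
        self.cleaned_up = True


def demo() -> bool:
    """Return True if SIGHUP led to the socket file being removed."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "agent.sock")
        agent = ToyAgent(path)
        existed_before = os.path.exists(path)
        signal.raise_signal(signal.SIGHUP)  # hang up on ourselves
        deadline = time.monotonic() + 1.0
        while not agent.cleaned_up and time.monotonic() < deadline:
            time.sleep(0.01)  # give the handler a chance to run
        removed = not os.path.exists(path)
    signal.signal(signal.SIGHUP, signal.SIG_DFL)  # restore the default
    return existed_before and agent.cleaned_up and removed
```

If the process is sent SIGKILL instead, the handler never runs and the socket file is left behind. Conversely, any handler of this kind can block, which is the hang risk raised above.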
See the enable-linger option for loginctl and KillUserProcesses for logind.conf. KillUserProcesses was set to default enabled on 4/9/2016, prior to that it didn't happen, but was configurable if desired. So you were always able to change the config to restore the previous behavior from the moment the default turned it on.
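As a rough sketch of what "change the config" means here, the snippet below reads a `KillUserProcesses` setting out of logind.conf-style text. It assumes the setting lives in the `[Login]` section and takes a yes/no value. The function name and the `compiled_default` parameter are made up for illustration, and real systemd also merges drop-in files, which this ignores.

```python
import configparser

# Example contents; the setting name comes from the comment above.
SAMPLE_LOGIND_CONF = """\
[Login]
# Leave session processes alone at logout (the pre-change behaviour).
KillUserProcesses=no
"""


def kill_user_processes(conf_text: str, compiled_default: bool) -> bool:
    """Return the configured value, or the build-time default if unset."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(conf_text)
    return parser.getboolean(
        "Login", "KillUserProcesses", fallback=compiled_default
    )
```

With the sample text the function returns False whatever the build-time default is; with an empty `[Login]` section it falls back to that default, which is why the upstream and distro choice of default matters so much in this argument.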
Here is the commit where it happened
So how can it be the default?
This is why we have distro vendors, to build a system that works in the real world with software from developers with opinions that... differ to say the least.
No, you were not.
The thing that people are missing here is that neither of the systemd-logind behaviours, with KillUserProcesses=yes or KillUserProcesses=no, is the long-standing behaviour of kernel login sessions all of the way back to 7th Edition that nohup, tmux, screen, emacs --daemon, mosh-server, deluged, and more all interoperate with.
The behaviour of kernel login sessions is that end of login session is a HUP signal to the session leader, and that termination of the entire TTY login service (such as at system shutdown) is a TERM signal to everything followed by a KILL signal to everything then remaining.
The systemd-logind session behaviour with KillUserProcesses=no is no signals at all at the end of the login session, and at termination of the TTY login service both HUP and TERM signals together then KILL signals, to everything.
The systemd-logind session behaviour with KillUserProcesses=yes is both HUP and TERM signals together then KILL signals, to everything, both at login session termination and at TTY login service stop.
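The three behaviours above are easier to compare side by side. The table below simply transcribes the descriptions in this comment into a Python dictionary; it has not been checked against the systemd source, and the names are made up for illustration.

```python
LOGOUT = "end of login session"
SERVICE_STOP = "TTY login service stop"

# For each mode and event: (signals sent, in order; who receives them).
BEHAVIOUR = {
    "kernel login session": {
        LOGOUT: (("HUP",), "session leader"),
        SERVICE_STOP: (("TERM", "KILL"), "everything"),
    },
    "KillUserProcesses=no": {
        LOGOUT: ((), "nothing"),
        SERVICE_STOP: (("HUP+TERM", "KILL"), "everything"),
    },
    "KillUserProcesses=yes": {
        LOGOUT: (("HUP+TERM", "KILL"), "everything"),
        SERVICE_STOP: (("HUP+TERM", "KILL"), "everything"),
    },
}


def matches_convention(mode: str) -> bool:
    """True if a mode behaves like the conventional kernel login session."""
    return BEHAVIOUR[mode] == BEHAVIOUR["kernel login session"]
```

On this reading, neither logind mode matches the conventional behaviour at either event, which is the claim being made.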
As I pointed out years ago, the fix is to make systemd-logind use KillUnit at hangup and StopUnit at service termination, actually providing the conventional behaviour which it currently does not in any mode and addressing the original problems (with some background GNOME utilities in a login session that were never being sent a HUP signal at logout and would have exited had they been) that motivated this whole mechanism in the first place.
Systemd is basically SMF, done poorly, because NIH.
I disagree that POSIX says that processes should expect a SIGHUP when a user logs out (SIGHUP means the controlling terminal was closed). I am not at all a POSIX expert, so please correct me if I misunderstand, but afaict POSIX explicitly does not specify what happens to the controlling terminal when a user logs out (http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_...):
> POSIX.1 does not specify how controlling terminal access is affected by a user logging out (that is, by a controlling process terminating). 4.2 BSD uses the vhangup() function to prevent any access to the controlling terminal through file descriptors opened prior to logout. System V does not prevent controlling terminal access through file descriptors opened prior to logout (except for the case of the special file, /dev/tty). Some implementations choose to make processes immune from job control after logout (that is, such processes are always treated as if in the foreground); other implementations continue to enforce foreground/background checks after logout. Therefore, a Conforming POSIX.1 Application should not attempt to access the controlling terminal after logout since such access is unreliable. If an implementation chooses to deny access to a controlling terminal after its controlling process exits, POSIX.1 requires a certain type of behavior (see Controlling Terminal ).
IBM was explaining what to do back in 1995.
killing user processes on logout
TIL what nohup(1) is for.
By "use systemd's new daemonization API" you mean, instead of
systemd asks you to write
$ systemd-run --scope --user screen
instead. Annoying to have to learn a new thing, but hardly an unbearable burden.
> On the other hand, when you're an impacted user who's lost work, and researching the bug leads you to a years-old discussion in which someone is actively denying that the bug exists and refusing to fix it, that's infuriating.
Because it's a bug for some, and intended behavior for others. Look, you make it as if they introduced a bug on purpose to screw with some people. It's clearly not the case, there was a specific tradeoff involved.
They broke userland.
It doesn't matter what tradeoff they made - they went against POSIX behaviour, and as a result, broke numerous utilities, both past and future.
Let's say that again - systemd introduced breaking behaviour on userland, against POSIX, and instead of backing down and allowing for expected and specified behaviour, they said it's everyone else's problem.
That is neither professional, nor responsible.
When you make a mistake, a mistake that breaks the behaviour of POSIX, and POSIX utilities like _cron_, you apologise, and fix the problem.
You don't turn around and say that all the sysutils should incorporate your new idea.
Moreover, this doesn't affect cron at all. Cron creates its own PAM session for each job it runs which means those jobs are independent from any real login session (i.e. ssh, graphical, tty login), and thus also don't get cleaned up by them.
This affected stuff that is forked off a login session and then stays around as "orphan" if you so will, i.e. with all session resources released, except for these processes that try hard to avoid clean-up (usually by double forking + detaching explicitly from any TTY/ignoring SIGHUP).
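For readers who have not seen it, the "double forking + detaching" pattern mentioned above looks roughly like this. It is a generic sketch of the traditional Unix technique, not code from tmux, screen, or systemd; `spawn_detached` and `demo` are illustrative names.

```python
import os
import signal


def spawn_detached(report_fd: int) -> None:
    """Double-fork; the grandchild writes '<pid> <sid>' to report_fd."""
    pid = os.fork()
    if pid > 0:
        os.waitpid(pid, 0)  # reap the short-lived first child
        return
    try:
        # First child: start a new session with no controlling terminal.
        os.setsid()
        signal.signal(signal.SIGHUP, signal.SIG_IGN)
        if os.fork() > 0:
            os._exit(0)  # first child exits; grandchild is re-parented
        # Grandchild: not a session leader, so it cannot reacquire a
        # controlling terminal. A real daemon would do its work here.
        os.write(report_fd, f"{os.getpid()} {os.getsid(0)}".encode())
    finally:
        os._exit(0)


def demo():
    """Return (pid, session id) of the detached grandchild."""
    read_fd, write_fd = os.pipe()
    spawn_detached(write_fd)
    os.close(write_fd)
    data = os.read(read_fd, 64)  # blocks until the grandchild reports
    os.close(read_fd)
    pid, sid = data.decode().split()
    return int(pid), int(sid)
```

The grandchild ends up in a different session from the shell that started it, so the kernel's hangup handling no longer reaches it. Tracking by cgroup rather than by session is what lets logind still find such processes.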
I tend to agree with the idea that the choice of defaults belongs to the distros. If the distros are deferring to the upstream project on default settings for a critical system component then they need to be more thorough and validate what they are shipping.
Distro maintainers need to have a lot of knowledge about their init system. There's no way out of that. It's probably something everyone should know a little about as well.
Then maybe the init system should be simpler and not attempt to ingratiate itself with UEFI or attempt to replace su, sudo, syslogd, netcat, resolvconf, etc.
That alludes to kernel development, which systemd is largely uninvolved with. A userland program chosen by various distributions failed to support conventions from a different userland program. That's all. Were the programs involved fundamental and highly important to many users' experience? Sure. Is busting out "you broke userland" like some magical shibboleth a useful way to convey your unhappiness that your distribution maintainers chose to replace a widely-depended-upon program with a different program? I think not.
> they went against POSIX behaviour
Which? There's "tradition" and "specified behaviour". Both are important in different situations and in different degrees.
> You don't turn around and say that all the sysutils should incorporate your new idea.
Why not? They're no more privileged by the POSIX specification, or by the user/kernel -space divide than any other program.
Intel, the kernel, even Chrome broke my userland by mitigating Spectre.
CRON was and is run as a system service, in its own scope. If you run your own cron instance, but forgot to set it up as a system service, yeah, it gets cleaned up as you exit your shell/session/scope.
So? "We don't break userland" is a Linux kernel thing. Systemd is not kernel, it's userland, and userland things break other userland things all the time. They already broke lots of existing stuff when they replaced /etc/init.d/ scripts with systemd definition files, should systemd also have not done that?
> It doesn't matter what tradeoff they made - they went against POSIX behaviour, and as a result, broke numerous utilities, both past and future.
Linux is not POSIX, so I don't see how that's relevant. For what it's worth, I don't even know what part of POSIX it broke. Care to enlighten me?
IMO when a user has logged out and has not had the permissions/foresight to set up a task in the system to run without a session, it should be killed.
I get that this has not been the default behavior in linux/UNIX, but to me it seems like the sensible one.
And that's before we ever argue about the possibility to turn it off.
If you ruin everyone else's day, and change behaviour everyone else is expecting, then it's probably your own fault.
Approaching it as if everyone should simply change and do what you want, is the height of arrogance. You are generating work for others. And in this particular case, not only are you generating work for others, you are eradicating a category of software.
When a distribution adopts systemd, they let everyone know how things are changing, and slowly transition things over, releasing when stable.
We know systemd replaces init.d. It was difficult, but distributions using systemd got over that hurdle, but it did take time.
However, this is not the same.
Yes, systemd is userland, however it is also PID 1. It is a layer between most userland and the kernel, and so needs to reflect the responsibility of its position.
Ignoring how NOHUP is supposed to be interpreted is a _bad idea_, and yes, a violation of POSIX, specifically signals (SIGHUP and nohup), and how they are supposed to be handled.
Moreover, it greatly heightens the difficulty for many utilities that are expected to work.
Why should cron (all implementations of cron), suddenly need to rely on another userland library to maintain its function?
You just broke most Linux automation. Across an entire industry.
Why should screen (all implementations of screen), suddenly need to rely on a userland library much bigger than most implementations, to continue its base function?
You just broke an entire category of background systems - including systems communicating with embedded hardware. You might have caused a factory-floor fault. Which could cause injury, or worse.
A breaking change of this level can cause industry-wide ramifications that are not just limited to the digital. Unexpected behaviour is exceptional, and should take time and considerable thought before occurring.
Systemd has responsibility that no other userland system has. It's PID 1.
If they're going to require a massive change in process behaviour, then they are going to require consultation, awareness within the industry, and transition time. They should be working with distributions, aware of the man-hours they're generating, before they put something in place.
> The whole systemd battle, Rice said, comes down to a lot of disruptive change; that is where the tragedy comes in. Nerds have a complicated relationship to change; it's awesome when we are the ones creating the change, but it's untrustworthy when it comes from outside. Systemd represents that sort of externally imposed change that people find threatening. That is true even when the change isn't coming from developers like Poettering, who has shown little sympathy toward the people who have to deal with this change that has been imposed on them.
The POSIX violation is by design. If you think that POSIX dictates the wrong thing, then you will do something different, and this is what Poettering has done. The fact that systemd has more or less been embraced by Linux is an endorsement of his design philosophy, even if distributions reject specific features.
Design choices are fine - I can understand why systemd takes a different approach.
What I don't like, and completely disagree with, is systemd not working with the community they directly affect to reduce disruption.
Like it or not, the product is an industry standard, and so will be held to industry expectations.
Rather than turning around and requiring everyone to change, they could have said, "Sorry, we're making changes, here are some preliminary patches that could help."
Or a timeline for a breaking change, wherein they can negotiate with others.
I don't have significant issues with systemd's software, though some reservations about quality. My main concern, and it has been since the beginning, is that systemd acts without thought or conscience to the effects that they might cause.
They lack the ability to be a team player, despite creating an environment where people depend on them.
systemd's adoption rate is an absolute credit to it. They have some very good design thoughts, and those working on it have done some excellent work.
However, it would be better if they communicated with the people they affect, rather than letting the community be an accidental QA team when things go wrong.
They do get this right sometimes, but that seems to be the exception, rather than the rule.
They approached the init.d situation calmly, and slowly. They worked with Debian, and Fedora and others to make sure it would work without interruption or loss of quality.
They approached the sigkill situation like they were a kid who just learned how to light a fire and wanted to burn the library down.
From The Hitchhiker’s Guide to the Galaxy, regarding the plans to destroy the Earth:
‘But the plans were on display …’
‘On display? I eventually had to go down to the cellar to find them.’
‘That’s the display department.’
‘With a flashlight.’
‘Ah, well, the lights had probably gone.’
‘So had the stairs.’
‘But look, you found the notice, didn’t you?’
‘Yes,’ said Arthur, ‘yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying “Beware of the Leopard.”’
Back in the real world: you built & shipped a system whose defaults were and are broken, and now you blame others for not enabling the DONT_BE_WRONG setting. You might as well blame end users for not becoming fully-versed with your code before their first login.
It’s not the users’ fault. It’s not the distros’ fault. It’s yours, and your project’s, for shipping code which breaks the user experience.
I appreciate your vision. It’s a good one. You’re a smart guy. But have some humility! Have a sense of your own limitations, and those of the distros and users who will use your code. You’re a human being; the distros are made up of human beings; your end users are … human beings. Think of them.
> Rather than turning around and requiring everyone to change, they could have said, "Sorry, we're making changes, here are some preliminary patches that could help."
> Or a timeline for a breaking change, wherein they can negotiate with others.
But they did exactly that.
They contacted the tmux maintainers and asked if some modifications would be possible to accommodate the new option (see Poettering's comment here: run things as child of systemd --user or just register a separate PAM session). If I remember correctly, it would not even have been the first special case in tmux; there already is one for OS X.
The discussion was actually progressing nicely until the anti-systemd crowd flooded it. I remember seeing posts in a lot of places urging people to comment on the bug report with specious arguments. The whole thing was kind of upsetting.
Your argument is way too impassioned to be just technical. You just basically accused Lennart of hurting people with no evidence whatsoever.
This sort of stuff really doesn't help.
It follows that when someone implements functionality that doesn't follow POSIX, POSIX has been violated.
There's nothing wrong with the statement.
You criticised the parent's language saying that "you don't violate a standard" because it "isn't a law". I was just pointing out that you do indeed violate a standard because it's a standard, and saying that does not add any kind of moral or passion value - it's just using the language the way it's intended.
No, you have the responsibility to check what the software you are installing does, and if you don't approve, change it or reject it. Or, don't check, and deal with it.
Systemd developers do not owe you working POSIX, working cron, industry wide working Linux automation, screen, separate userland for everything. They don't owe you anything. If you don't like their thing, don't use their thing.
So it comes down to "something changed which is absolutely extremely important for me but I would rather discuss it for hours than take the few seconds to configure it". Especially since the new behavior is intended behavior and also has upsides for a lot of use cases.
So don't be ungrateful. Be happy that some people are really putting a lot of work behind the software you use daily FOR FREE and just configure the darn thing the way you like.
And last but not least, most people here (me included) are not in the position to complain so much about free software, unless they show some commitment to open source themselves.
Oh how I wish that was a course of action I could reasonably take in this instance...
The problem is now your scripts won't work on systems that don't use systemd. Shell scripts work on FreeBSD, but now you can't use them because they require systemd-specific code.
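One way to soften that portability problem is to add the systemd-specific prefix only where it is available. The sketch below builds the `systemd-run --scope --user` command line quoted earlier in the thread and otherwise falls back to the plain command. The function name and the `have_systemd_run` parameter are illustrative, and finding the binary on the PATH is only a heuristic: it does not prove that a systemd user instance is actually running.

```python
import shutil


def detached_argv(argv, have_systemd_run=None):
    """Return argv, wrapped in its own user scope where possible.

    have_systemd_run: True/False to force the choice, or None to
    detect the binary on the PATH (a heuristic only).
    """
    if have_systemd_run is None:
        have_systemd_run = shutil.which("systemd-run") is not None
    if have_systemd_run:
        return ["systemd-run", "--scope", "--user", *argv]
    return list(argv)
```

A script could then hand `detached_argv(["screen"])` to `subprocess.run` and behave the same on FreeBSD as on a systemd-based distribution. It still means every such script has to know about systemd, which is the objection being raised.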
I am not necessarily anti-systemd in most respects (I like declarative definitions of services and less shell script hell), but the fact that they keep trying to get people (including container runtime developers like myself) to use _their_ API rather than the preexisting ones is fairly "anti-social".
I am not trying to get you to use our APIs. You are talking about the cgroups APIs again, if I am not mistaken? As I tried to explain again and again: if you want container runtimes to manage their own cgroups then just set Delegate=yes in the unit file of your manager, get your own cgroup subtree, and you can do below it whatever you want, you do not have to call into systemd ever. Not a single API call, no C call, no D-Bus call, nothing. You get your own kingdom if you set Delegate=yes, and systemd won't interfere with that. This is extensively documented.
I wished you'd actually listen to what I keep repeating to you. We tried to be really nice to container managers, knowing that they dislike systemd APIs, so we put a lot of work into making the delegation boundary clean, so that they can be entirely systemd agnostic beyond setting the Delegate=yes boolean in their unit file, but alas, we just keep hearing the same nonsense.
The LXC/LXD people btw did get this right: they manage their own cgroup subtree now, and systemd doesn't interfere, and they don't link to or do dbus calls into systemd either.
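To make "get your own cgroup subtree" a bit more concrete: a manager started with Delegate=yes can find out where its subtree lives without calling into systemd by reading `/proc/self/cgroup`. The sketch below assumes the unified (cgroup v2) hierarchy, where that file contains a line of the form `0::<path>`. The function name and the sample unit name are made up.

```python
def unified_cgroup_path(proc_cgroup_text: str):
    """Return the cgroup v2 path from /proc/<pid>/cgroup text, or None."""
    for line in proc_cgroup_text.splitlines():
        hierarchy_id, controllers, path = line.split(":", 2)
        # The unified hierarchy is listed with ID 0 and no controllers.
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


# Hypothetical content for a manager running as mymanager.service.
SAMPLE = "0::/system.slice/mymanager.service\n"
```

On a typical system the manager would then create its child groups as directories below `/sys/fs/cgroup` plus that path; the mount point is an assumption about the host's layout. On a v1-only system the function returns None, which matches the point made further down that not every runtime supported cgroupv2 at the time.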
In runc we don't have a dedicated manager or long-running daemon. Yes, Docker and cri-o use Delegate=yes (so I am quite aware of this option) but that really doesn't help people who are using runc in their own user sessions or wrote their own wrapper and aren't aware of Delegate=yes.
I get that we are quite odd, and don't fit into a system-service model. After all of the back-and-forth with both you and Tejun (especially when it comes to "rootless" delegation -- which systemd only offers if you get a privileged user to delegate for you), I'm not sure that there's much I can do on this topic. I get that what I care about is not something you care about, but I would hope you accept that I'm not just being obstinate for the sake of it.
> Not a single API call, no C call, no D-Bus call, nothing.
Right, unless you need to set this up for someone else. And we have code that does this too -- I don't really recommend people use it, but it is necessary (and I'm pretty sure some folks at Red Hat use it based on how many bug reports they submit related to it).
Since systemd is managing the entire cgroupv2 tree (and the fact we can get around that for cgroupv1 appears to be seen as a design flaw by both you and Tejun), obviously we have to talk to systemd to do this type of thing. I just wish this wasn't the way it was done (and if cgroupv2 had a named cgroup concept -- which is what systemd needs for tracking services -- I would think that this wouldn't be such a pain-point).
I guess I'm just annoyed that we can't use "better rlimits" with "rootless" container runtimes because of all of this.
> I wished you'd actually listen to what I keep repeating to you.
I am listening, and I am aware of Delegate=yes and all of that history. But as I outlined above, I don't necessarily agree with it entirely. And unlike a lot of people around here, I don't think any of these pain-points are coming up because of malice or something stupid like that -- I just think we disagree on our priorities.
> We tried to be really nice to container managers, knowing that they dislike systemd APIs, so we put a lot of work in making the delegation boundary clean
Don't get me wrong -- I do appreciate that we have Delegate now (there was a period of several years where "systemd decided to reorganise the cgroup tree, un-containing my containers" happened on several occasions -- and Delegate solved those issues).
And from what I've heard from the LXC folks, you were quite reasonable about getting systemd to work inside LXC. Which is good to hear.
> The LXC/LXD people btw did get this right: they manage their own cgroup subtree now, and systemd doesn't interfere, and they don't link to or do dbus calls into systemd either.
We do basically the same thing. We just don't support cgroupv2.
Rather, a breaking change to everyone's scripts and processes for zero benefit.
EDIT: My reply was supposed to be to xyzzys's post below, not the one I apparently replied to.. sorry about that.
I agree that it might not be the most desirable default, but if that's the case, then the guilt also falls on the distribution maintainers, who either ignored the big bold letters in the changelog, or didn't bother to test everyone's standard workflows before pushing to stable.
Based on Lennart's behavior, yes I do.
But frankly, 100% of people would be fine with it if the default was left at no instead of changing it to yes. It's all about giving users a choice when a new feature is introduced, something Systemd developers understand only partially.
Not to appeal to self-authority, but I have been maintaining production Linux systems in large-scale environments since the late 90s. If there were a benefit that outweighed the unnecessary breaking changes, I would see it, even if I didn't appreciate it. There isn't.
You should stop and think before you assume that other people are incompetent, both because it would make you a better interlocutor, and as a bonus it wouldn't violate HN's principle of charity.
Of course, a defense of systemd's comically broken reaping behavior removes all necessity for assumption in this case. sysvinit at least consistently reaps on SIGCHLD -- systemd randomly reorders into the sd-event API and then does something random based on the order of receipt.
Sorry, I assumed you're competent enough to figure it out, or at least look at the original sources where authors of the change explicitly explain the reason why they do it. Of course, since you assumed that they are incompetent, you didn't bother to do so, instead, completely uncharitably assumed that there's zero benefit for that.
Can we please stop misrepresenting the complaints against systemd? The only time I ever hear this "monolithic binary" argument is from systemd advocates. The actual complaint is about tightly coupling important features together. Not only does this make it difficult (often impossible) to replace individual components, when tight coupling happens at the (internal) protocol level, any replacement component necessarily has to implement a bunch of (sometimes unwanted) systemd baggage.
Busybox implements all of its features in a single monolithic binary, but it isn't a monolithic design that tightly couples those specific components together. Replacing one of busybox's components is often as simple as removing busybox's symlink and installing the replacement. This isn't even a "Unix philosophy" issue. Even inexperienced designers shouldn't have a hard time understanding why systemd is a monolithic design but busybox isn't.
systemd is a PID 1 program, which means it has to set the bar higher. When troubles begin, you need tools to fix them, and if PID 1 has crashed, you are out of luck. If the system cannot boot into a shell, you need to fix it from the initrd shell, or boot another system to fix this one. It sucks.
The Linux kernel chases very high standards of reliability, because a kernel panic is even worse than a PID 1 crash. An init system should follow the same standards as Linux.
The bar is higher for pid 1 - if I were designing systemd I would have made a tiny pid 1 that just did message-passing to a more complex secondary process that could be restarted, or something, just to be safe - but I think systemd has empirically cleared the bar.
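A rough Python sketch of the "tiny supervisor, restartable worker" split, just to show the shape of the idea. This is not how systemd or any real init is structured, it leaves out the message-passing part entirely, and `supervise` and `max_restarts` are hypothetical names. A real PID 1 would also have to reap orphaned processes and handle signals.

```python
import subprocess
import sys

def supervise(cmd, max_restarts=3):
    """Run cmd and restart it whenever it exits with a non-zero status.

    Returns how many times the worker was started. The supervisor itself
    does almost nothing, so there is very little in it that can crash.
    """
    starts = 0
    while True:
        starts += 1
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return starts  # clean exit: nothing to restart
        if starts > max_restarts:
            return starts  # give up rather than restart forever

# Hypothetical usage: the complex logic lives in worker.py, not here.
#   supervise([sys.executable, "worker.py"])
```

The point of the split is that all the complicated logic lives in the worker, which can die and come back without taking the machine down with it.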
there's the classic case of the linux "debug" parameter: https://bugs.freedesktop.org/show_bug.cgi?id=76935
and the even more classic case of firmware loading events:
and while "all software has bugs" systemd really has the most annoying bugs (by virtue of trying to do everything core to the system) and always insists that they are features and we are backwards whiny geeks for complaining.
3AM, deep slumber, called out to look at a stricken server. Its problems included that systemd was frozen. Reluctantly I came to the conclusion that a restart was the only route forward. Cept, that is when you discover that the commands that have served you well for 2 decades don't work, as they are all wrappers for systemd, which has keeled over.
To this day, the `shutdown` man page, which I was checking, makes no mention of how to resolve this, tho in fairness the other commands (poweroff, halt, init) do. I discovered this after stumbling across https://github.com/systemd/systemd/issues/3282
If you find yourself stuck in the middle of the night, reading through docs to try and figure out how to recover a machine with a crashed systemd, then `systemctl reboot -ff` or equivalent is what you are now looking for, the `-ff` being the key to "JUST £&*(ing RESTART THE MACHINE!!!".
Experiences like that don't win you friends.
If this happened to me today with systemd I'd be up shit creek without a paddle.
Only recourse has been to reboot the instance from AWS dashboard.
I can’t get to the bottom of it because the tools don’t work when it’s down and there’s nothing there when it comes back up. I am not enjoying boiling to death in this pot of shit.
And then there’s the situation where it just won’t boot. I just fire up a new instance then because it’s easier than debugging it.
No, I have not. But I have seen systemd gracefully fail to boot a system to login, with a good-looking colorful error message. It reminded me of "Keyboard is not found. Press F1 to run setup."
But OP was asserting that systemd crashes under normal operation because its pid 1 is too fragile, which is very different. At scale I already expect that there's a chance a machine won't come back if I reboot it - it's annoying if I can't ssh in, but, well, I already lost a disk I care about and it won't return to service and I need to fix it anyway. (And it's an easy fix, just add "nofail" to fstab.) At scale I don't expect init to crash under normal operation.
rm -rf --no-preserve-root /
The configuration files should have set that to read only after boot.
The kernel patch where this was fixed can be found here:
(I have been hit by the same issue on my private notebook, but I have procedures in place to cleanly recover from failed upgrades on all systems, so it was not a big deal.)
But it does happen to other people.
And there was one crash that made the headlines.
You see, that's the argument I hear a lot from Systemd advocates. The problem with anecdotal evidence is obvious. When you hear people opposing Systemd, practically all of them have some real-life issues with it, often related to functionality that would otherwise be non-essential (i.e. doesn't really need to be handled by PID 1). Of course if you don't have a particular problem, you don't feel it's important. That's precisely the attitude people resent.
Yes, but a lot of people have real-life issues with it on their desktop of the form "It's too complicated." I'm asking specifically about real-life issues on production servers at scale. There will of course be tools that are poorly suited for a personal machine (even a personal server) but well suited for a team that wants to run a bunch of reliable servers.
For instance I would never be happy running RHEL on my desktop, but that doesn't mean RHEL is useless.
I just count my blessings that runit is widely packaged in every major distro because it can just happily sit on top of sysvinit, systemd, upstart, pretty much any init system and does things in a very simple shell script style, I really wasn't a fan of the weird ini-like format for systemd or several different tools I'm expected to learn just to read my (now binary) log files competently.
If you're sick of switching init systems constantly or don't want to have to write separate scripts for your linux box and your freebsd box even, I highly recommend checking runit out.
I'm sure I'll give it a serious shot eventually... in about 3 years once they work all the Poettering kinks out, just like PulseAudio. They're doing some cool things with cgroups and stuff, so I hope it gets there eventually.
If you've ever tried to use systemd inside docker to bring up a couple of services, you would know the hoops you have to jump through to get it working.
(I understand that docker wasn't invented to run multiple services in the one container, but sometimes it can't be avoided and it simplifies app deployment vastly, e.g. using CI to test that your service actually starts up as per its definition: just run up a quick docker image with runit and a service definition file)
Good luck. If you need anything more than "I play a three minute song" on Linux audio you need both some type of real time kernel and jack.
I've seen the BSD talk on this and I agree, having a system layer is helpful. It'd be nice if it was pluggable: NetworkManager (or others that have some standard messages you can send/get via dbus), consolekit OR logind, etc.
systemd does make it nice that I only have to write startup/shutdown scripts once for each distro, but I'm not happy with the layout of target files, the way mounts are handled, some of the weird race conditions I've found between systemd mount targets and fstab, etc.
systemd is modular, but the modules are still all part of the whole and are not easily replaceable. The same could be said of Docker after its modular refactor, but there are alternative implementations of the entire docker engine. Every attempt to create an alternative implementation of systemd has eventually gone unmaintained because systemd keeps getting more and more complex and engulfing more systems.
If it wasn't for distros like Void, Gentoo, Alpine, Slackware, et al., we'd no longer have a choice at all. There would be some things that simply couldn't be deployed on embedded systems because all of the dbus shims just wouldn't exist.
It's not that people are opposed to change, it's that there are legit concerns about some of the ways systemd works and is implemented, and the way it's been ham-fisted as a political move in a lot of ways.
Honestly, I don't think it will matter in a few years. I think the way things are going, eventually all services will be hosted via docker containers and it will be much easier to make Linux distros that have a tiny init layer that just launches a docker daemon and services. RancherOS already does this, with the init process being a container, which can be used to start up shell environment containers and other service containers.
I personally think the industry needs a lot more resistance to change when it comes to interfaces and other things humans have to understand.
I mean, I'm not talking about systemd in particular; I'm talking in general about how interfaces change over time and people don't seem to take into account the cognitive costs of that change. Sure, ss is better than netstat and ip is better than ifconfig... but how much of that 'better' could you have done in a way that didn't toss away the historical knowledge so many people have of those tools?
And really, sysadmin tools are the least of it; I mean, they are operated by professionals, so if you want to pay for retraining (or pay the costs associated with there being fewer of us)
People change customer facing interfaces to no benefit all the time, forcing people who are trying to do other things to put effort into re-learning their interface.
I mean, my point is that interface changes are expensive, and should not be undertaken without a really good argument that they bring more benefit than the cost of retraining.
netstat? /proc files? ss? parsing text? wtf?
I mean, sure why not, but at least don't call them interfaces. They are userland apps people like to script, because they are too lazy to use libnetlink (or libwhatever that uses the right kernel interface, if it exists at all).
That said, the recent gmail ui change made me reconsider Thunderbird again. And android looks different every year. sometimes it's better, sometimes it's worse. iptables, nftables. http1, http2 (and now 3 over UDP). change is the only constant.
Text processing is not harder than figuring out what library to use this month.
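To make the "parsing text" side of this concrete, here is a small Python sketch that reads one line in the /proc/net/tcp format. Assumptions, stated loudly: IPv4 only, and a little-endian host (the kernel prints the address as a host-order hex value, so the bytes look reversed on x86). `parse_tcp_line` is a made-up helper name and the sample line in the comment is invented for illustration.

```python
import socket

def parse_tcp_line(line):
    """Parse one data line of /proc/net/tcp into (local_ip, local_port, state)."""
    fields = line.split()
    addr_hex, port_hex = fields[1].split(":")
    # Assumption: little-endian host, so the four address bytes are reversed.
    ip = socket.inet_ntoa(bytes.fromhex(addr_hex)[::-1])
    port = int(port_hex, 16)
    state = int(fields[3], 16)  # 0x0A is TCP_LISTEN
    return ip, port, state

# Invented sample line: a listener on 127.0.0.1 port 53.
#   parse_tcp_line("0: 0100007F:0035 00000000:0000 0A 00000000:00000000 "
#                  "00:00000000 00000000 0 0 12345")
```

Whether ten lines of string handling like this are preferable to a netlink library is the disagreement in this subthread; the sketch only shows what the text route involves.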
These things change a lot... but they don't have to, and running things on computers would be easier/cheaper if they didn't.
The idea that you are searching for is coupling. Modular systems should aim to have low coupling and high cohesion.
Additionally, stateless container design and keeping state in containers are opposing implementations of this concept.
But that said, does anyone know where on earth they came up with the command line UX? Like the names of the commands, and the parameters? I mean, they are like an April fool's joke...