
Escape from System D, episode VI: freedom in sight - _emacsomancer_
https://davmac.wordpress.com/2019/12/18/escape-from-system-d-episode-vi-freedom-in-sight/
======
xiaomai
As someone who has been using Linux full time on
servers/desktops/laptops/embedded devices and any other computing device I've
had since 1998:

Systemd is such an awesome improvement over everything that came before it. It
benefits me on a daily basis. The power and ease of configuring services is so
far beyond sysvinit (and let's be honest, most of us weren't able to write our
own init scripts back then and were resorting to stuff like rc.local or
inittab anyway, which were inadequate solutions).

Timers are generally better than cron. I love that systemd does all the stuff
I used to have to do on my own, like balancing when jobs run and preventing DST
issues.

I like journalctl. I know it bothers some people that the logs are stored in
some kind of binary format, but the practical upshot of journalctl is that
logs are easier to filter in a variety of ways.
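To illustrate the filtering point: journalctl can print each entry as one JSON object per line (journalctl -o json), with fields such as _SYSTEMD_UNIT, PRIORITY and MESSAGE. journalctl can already filter natively with -u (unit) and -p (priority). The minimal Python sketch below only shows that entries are structured records, so filtering is a field comparison and not a regex over free text. The helper name filter_entries is hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional


def filter_entries(
    lines: Iterable[str],
    unit: Optional[str] = None,
    max_priority: Optional[int] = None,
) -> Iterator[dict]:
    """Yield journal entries (one JSON object per line) that match the filters.

    max_priority follows syslog numbering (0 = emerg ... 7 = debug), so
    max_priority=4 keeps warnings and anything more severe.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if unit is not None and entry.get("_SYSTEMD_UNIT") != unit:
            continue
        if max_priority is not None:
            priority = entry.get("PRIORITY")  # emitted as a string, e.g. "6"
            if priority is None or int(priority) > max_priority:
                continue
        yield entry
```

On a live system one could feed it subprocess.run(["journalctl", "-b", "-o", "json"], capture_output=True, text=True).stdout.splitlines().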

Anyway, I think making an init system would be a fun hobby too, but the world
is a better place because of systemd and I am sick of the bellyaching about
it.

~~~
correct_horse
I agree that systemd gets way more hate than it deserves, but it has continual
problems on Arch Linux. Whenever you try to shut down without logging out
first, systemd says "a stop job is running for user manager...", then waits 90
seconds. The issue keeps getting fixed, then resurfacing. People with this
same problem are all over the internet.

~~~
zaarn
In my experience, that happens when a user process is ignoring the SIGTERM
signal (e.g., a word processor or browser wanting user confirmation before
shutting down).

Generally, I personally don't mind it and I set the timeout to 30 seconds,
which helps.

There is probably no good solution unless systemd starts sending SIGKILL
instead of SIGTERM, killing processes without asking, but that would break
everything else.
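For reference, systemd already escalates. On stop, it sends the unit's KillSignal= (SIGTERM by default) and waits up to TimeoutStopSec=. That defaults to 90 seconds, which matches the countdown described above. Then it sends SIGKILL. Below is a minimal Python sketch of that pattern for a single child process. systemd itself applies it to every process in the unit's cgroup, and stop_process is a hypothetical name.

```python
import signal
import subprocess


def stop_process(proc: subprocess.Popen, timeout: float = 90.0) -> int:
    """Ask a child to exit with SIGTERM; force it with SIGKILL after `timeout` seconds.

    Returns the exit status; a negative value -N means the child died from signal N.
    """
    if proc.poll() is not None:
        return proc.returncode  # already gone
    proc.send_signal(signal.SIGTERM)  # polite request; the child may catch or ignore it
    try:
        # Returns as soon as the child exits, so a cooperative process
        # is never held for the full timeout.
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
        return proc.wait()
```

Consider a child that sets SIGTERM to signal.SIG_IGN. For it, stop_process(child, timeout=0.5) returns -9 after half a second. The timeout is the same trade-off debated in this thread: it buys time for a slow but legitimate save, at the cost of waiting on a process that will never cooperate.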

~~~
alxlaz
Just to play devil's advocate: _prior_ to systemd, nobody had this problem. In
20 years of Unices I can't remember a single time when I couldn't shut down my
machine because a user process was ignoring SIGTERM (or really, because a user
process was doing _something_). I'm not sure what "everything else" there is
to break...

I actually enjoy the extra flexibility that I get out of being able to tell
systemd how to terminate processes. On the systems I'm _building_. But it's
something that could certainly be improved. It works right for my use case
(embedded systems) but I hate it when it does it on my laptop or my home
machine.

The whole interface around it is silly. It tells you that a job is running,
but it doesn't tell you what that job is, what that process is, and unless you
enable the debug shell, there's no way to check it or kill that process
either. All you get to do is stare at the thing counting down from 90, and in
some cases when it reaches 0, it starts counting down again instead of
rebooting.

~~~
zaarn
Well, before it wasn't a problem, largely because the init system just
murdered everything that didn't drop dead when told to (i.e., your desktop apps
will be SIGTERM'd along with the X Server). Systemd certainly takes a safer
approach.

At least in my experience, systemd does tell you what is hanging. The
issue with the "timer reset" is that a hanging service can also block
some other service from shutting down (either because the hanging service
depends on it or because the blocked service needs to be shut down after the
hanging service). Old init systems didn't care about any of that: if the
system was going down, init just killed things in runlevel order, so stuff
like "the database was corrupted because init forcibly killed the NFS mount
too early" was certainly possible.
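The ordering point can be made concrete. systemd's documentation states that for units ordered with After=/Before=, shutdown runs in the inverse of the start-up order. So a database that starts after its NFS-backed mount is stopped before that mount is unmounted. The Python sketch below derives such a stop order with graphlib (Python 3.9+). The unit names are hypothetical, and real systemd stops independent units in parallel, not in one sequence.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def stop_order(after: Dict[str, Set[str]]) -> List[str]:
    """Return a shutdown sequence for units with After=-style dependencies.

    after[u] is the set of units that u is ordered after at start-up.
    Start-up is a topological order of that graph; shutdown is its reverse,
    so every unit stops before the units it was started after.
    """
    start_up = list(TopologicalSorter(after).static_order())
    return start_up[::-1]


# Hypothetical units: the database lives on an NFS mount, which needs the network.
units = {
    "network-online.target": set(),
    "srv-nfs.mount": {"network-online.target"},
    "database.service": {"srv-nfs.mount", "network-online.target"},
}
print(stop_order(units))  # ['database.service', 'srv-nfs.mount', 'network-online.target']
```

This also shows why one hung unit delays others. Anything ordered after it in the stop sequence has to wait for it first.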

~~~
alxlaz
Edit: first and foremost -- sorry, I went on a trip down memory lane
because you mentioned NFS and I forgot to mention the most important point.

That message, and this approach, is 100% inconsequential, and it doesn't make
anything safer. IIRC, last time I checked, virtually none of the distributions
enable the systemd debug shell. If you see that message, you can't stop the
application manually. Ctrl-Alt-Del doesn't work, either, and if systemd can't
recover from that state -- which, depending on what's actually happening and
on the configuration, it sometimes can't -- all you get to do is press the
reset button anyway.

If an application is hung for a _legitimate_ reason, like, there's unsaved
work, you can't go back and save it anyway. All you can do is wait for 30
seconds for the system to reboot and lose the data anyway (just 30 seconds
later).

That aside, I've debugged my fair share of corruption, back when journaling
filesystems were new and NFS was way worse than it is today. I don't want to
go back to those days, either, and I'm glad to have a safety switch for
sensitive setups, or for misbehaving/legacy applications.

But no one has a problem with _that_. People have a problem when they don't
have anything mounted over NFS (that, by the way, is also something that
occasionally caused systemd to hang at one point) and _still_ can't reboot
their machine.

It's not like we had to wait until 2010 to figure out that init systems need
to handle these things carefully. Nobody compares this behaviour with the
golden age of Ultrix and says oh, right, thank God for this, it's a minor
inconvenience but at least my files are safe. Everybody compares this with
what's available on every other platform that's popular today, not in 1996 --
namely a "Force reboot" button that reboots your machine and virtually never
results in data loss.

~~~
mlyle
> it doesn't make anything safer

Why does a program given SIGTERM not quit? Is it because 1) it's still busy
desperately trying to save something large and unusual or over a network, 2)
it's antisocial and ignores SIGTERM, or 3) it's hung for some permanent
reason?

It's hard to tell from the outside.

Most of the time it's the second or third, but the behavior occasionally saves
data in the first case.
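On Linux, part of that question can be answered from the outside. /proc/<pid>/status lists the process State and the SigIgn/SigCgt bitmasks of ignored and caught signals. That is enough to tell case 2 (SIGTERM ignored) apart from a process with a handler installed. It cannot tell a slow save from a permanent hang. A minimal Python sketch follows; sigterm_disposition is a hypothetical helper name.

```python
import os
import signal
from typing import Dict


def sigterm_disposition(pid: int) -> Dict[str, object]:
    """Report how process `pid` treats SIGTERM, using /proc/<pid>/status (Linux only)."""
    fields = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    # Each mask is hexadecimal; bit (N - 1) corresponds to signal number N.
    sigterm_bit = 1 << (signal.SIGTERM - 1)
    return {
        "state": fields["State"],  # e.g. "S (sleeping)" or "D (disk sleep)"
        "ignores_sigterm": (int(fields["SigIgn"], 16) & sigterm_bit) != 0,
        "catches_sigterm": (int(fields["SigCgt"], 16) & sigterm_bit) != 0,
    }


if __name__ == "__main__":
    print(sigterm_disposition(os.getpid()))
```

Some processes neither ignore nor catch SIGTERM. Normally that signal simply kills them, so if one lingers anyway, it may be stuck in uninterruptible I/O (state D).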

~~~
alxlaz
I mean... yes, in theory that would be the case but that's really not what's
happening in this particular case.

(Edit: First _est_ of all, actually, more often than not it's actually 4) the
program is waiting for a user's confirmation, or requires some user action to
exit, but if you see that message, you can't do it anymore.)

First of all, not being able to distinguish between these three cases is
precisely what makes virtually everyone who's seen this a couple of times hit
the reset button as soon as they see it. That doesn't save any data.

Second, if a program is really busy desperately trying to save something large
and unusual over a network _and you see that message_, there's a good chance
it won't be able to save anything anyway. The network is usually down by that
time, for example.

There's also no way for a program to signal that it's done. If, indeed, a
program needed one more second to write everything, it's cool, but you still
get to wait 29 more seconds.

And finally, processes ignoring SIGTERM is really just one of the reasons why
this happens.

I think all of us in the FOSS community should be more charitable when
discussing these things. systemd is a big step forward, but it's not like we had
to wait until 2010 to figure out that init systems have to tread carefully
during shutdown to avoid data corruption.

Also, people who read HN usually understand that killing processes can cause
data corruption under some circumstances. Just sayin'.

~~~
zaarn
> First of all, not being able to distinguish between these three cases is
precisely what makes virtually everyone who's seen this a couple of times hit
the reset button as soon as they see it. That doesn't save any data.

It saves data if you don't mash the reset button at first sight.

> Second, if a program is really busy desperately trying to save something
large and unusual over a network and you see that message, there's a good
chance it won't be able to save anything anyway. The network is usually down
by that time, for example.

Systemd won't bring down the network until all services that depend on the
network have been stopped. The desktop environment doesn't depend on the
network unless the home directory is an NFS mount.

> There's also no way for a program to signal that it's done. If, indeed, a
program needed one more second to write everything, it's cool, but you still
get to wait 29 more seconds.

A program running as a service can override the timeout in its unit file
(TimeoutStopSec=).

> And finally, processes ignoring SIGTERM is really just one of the reasons why
this happens.

Correct, processes that ignore SIGTERM are the real problem; processes that
take a long time are a secondary culprit. But systemd can't do much if the
developers of a piece of software refuse to read the documentation on signals
when writing handlers for them.
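Here is a minimal Python sketch of a well-behaved handler. The class and function names are hypothetical, and save_state stands in for whatever the application must persist. The handler only records the request. The normal code path then saves and exits promptly, instead of ignoring the signal or blocking on a confirmation dialog that nobody can answer during shutdown.

```python
import signal
import time
from typing import Callable


class ShutdownFlag:
    """Install a SIGTERM handler that only records that shutdown was requested."""

    def __init__(self) -> None:
        self.requested = False
        self._previous = signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        # Keep the handler tiny: no I/O, no prompts. Cleanup happens in the main loop.
        self.requested = True

    def restore(self) -> None:
        previous = self._previous if self._previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGTERM, previous)


def run_until_terminated(
    flag: ShutdownFlag,
    do_work: Callable[[], None],
    save_state: Callable[[], None],
    poll: float = 0.1,
) -> int:
    """Work until SIGTERM arrives, then save and return an exit status of 0."""
    while not flag.requested:
        do_work()
        time.sleep(poll)
    # Persist unsaved work (e.g. to a recovery file) instead of asking the user.
    save_state()
    return 0
```

A real program would call sys.exit(run_until_terminated(...)) from its entry point. That way the service manager sees the exit without sitting out the stop timeout.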

------
Nextgrid
The lack of a "monoculture" (whether systemd or something else, which the
author says is a good thing) is the reason Linux software distribution is
still a shit-show: every single distro has its own, incompatible way of
doing things, so you can't just provide a single binary and get on with your
day like you can with Windows or Mac.

Systemd might have its flaws, but I'm so grateful that a service/unit file will
work on pretty much any distro and that my knowledge of it will serve me on
pretty much any modern Linux system, as opposed to years ago, when every
distro had its own unique way of managing services.

~~~
turbinerneiter
This is especially true since any init system which sets out to "escape from
systemd" will eventually end up doing everything that systemd does, at which
point it will be as complex as systemd.

HOWEVER - the Linux world has lived with these problems for ages, and I also
deem it a strength. You can switch around components, mix and match, and it
will mostly work. You can run KDE apps on GNOME no problemo. I think this is
something we should consider a positive. If we can unify around a common
service file language, we can swap out service managers easily.

~~~
jcelerier
> You can run KDE apps on GNOME no problemo.

Well, you can also run KDE apps on Windows and macOS, so it would be quite the
shit show if they did not run on GNOME :-)...

~~~
turbinerneiter
That's ... a good point.

------
miloshadzic

      Dinit has been booting my own system for a long while, and other than a
      few hiccups on odd occasions it’s been quite reliable.
    
      Ok, compared to Systemd it lacks some features. It doesn’t know anything
      about Cgroups, the boot manager, filesystem mounts, dynamic users or
      binary logging. For day-to-day use on my personal desktop system, none
      of this matters, but then, I’m running a desktop based on Fluxbox and
      not much else; if I was trying to run Gnome, I’d rather expect that some
      things might not work quite as intended (on the other hand, maybe I
      could [use] Elogind and it would all work fine… I’ve not tried, yet).
    
      On the plus side, compared to Systemd’s binary at 1.5mb, Dinit weighs in
      at only 123kb. It’s much smaller, but fundamentally almost as powerful,
      in my own opinion, as the former.
    

I applaud the OP for writing a new init system, and in light of that, the few
paragraphs above serve as a good counterpoint to everyone claiming that systemd
does too much, tries to do everything, etc. For the past several years it
really has been insufferable to be in the vicinity of any discussion related
to systemd/init systems.

------
jakeogh
For anyone who is looking for an out but wants to keep using Linux: Gentoo.
You can even use it with systemd. It's all about choice.

Follow the handbook, it's worth it. Once that gets too tedious, check out
[https://github.com/jakeogh/sendgentoo](https://github.com/jakeogh/sendgentoo)

~~~
enriquto
There are many distributions without systemd. Void Linux and Slackware have
rolling releases and are very clean, for example.

~~~
jakeogh
That's the point. Gentoo isn't one of those. You can use it with systemd; it's
supported.

Slackware (1993!) has been around longer than Gentoo (2000), but USE flags,
slots, _choice_ and control over the compilation stack are pretty nice.
Hopefully NixOS's package manager and Portage merge sometime in the future. I
haven't tried Void.

We didn't need to have the init system debate; it was never a possibility that
Gentoo would be monoinit.

------
LessDmesg
I'm a Gentoo user and your problems with "systemd" look quaint to me. I mean
I've heard about such a thing as "systemd" that turns Linux into Lindows, but
I've no idea why anyone would use it when an awesome Linux (a real one, not
Lindows) is within reach...

~~~
n0rbwah
You know Lindows used to be the name of a Linux distribution whose main focus
was trying to look like Windows and offering some compatibility with Windows
through Wine?

It was still a Linux distribution and back then, no one had ever thought of
systemd.

Saying that using systemd turns Linux into Lindows just shows how ignorant you
are of Linux history, of that "systemd" thing you've just heard of, and of
what makes Linux and Windows different.

Being a Gentoo user doesn't make one a Linux guru. In fact, starting your
comment by stating you're a Gentoo user as if it will give more weight to what
follows is just plain arrogant. It shows that you're definitely _not_ a Linux
guru; what distribution you use or have used doesn't matter at all for that.

I feel for Gentoo users; people like you make them look bad.

~~~
mongol
Lindows, Corel Linux, yes those were the days...

