Hacker News

> replacing a cumbersome collection of shell scripts that previously were used for init/service management.

systemd-as-init was fine. systemd replacing udevd, syslog†, mount, ntpd, DNS, dhcpd, and probably a bunch more stuff I've forgotten or don't know about, is something else. If I wanted a tightly-coupled opaque binary blob of a system I'd run Windows.

† Of course its log file is not ACID, so when it poops its pants and corrupts the file there's no way to recover it besides moving it out of the way. Though the log entries on why this happened are also corrupted, so there's no root-cause analysis.‡ They could have just used SQLite as the logging format and it would probably have been awesome with regard to tooling and doing queries (instead of grep-foo). But no.
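For what it's worth, the SQLite idea is easy to sketch. This is purely hypothetical (the schema and helper names below are made up, not anything journald or any distribution actually ships), but it shows what SQL-instead-of-grep would buy you:

```python
import sqlite3

def open_journal(path=":memory:"):
    # One table of log entries; real journal fields are much richer than this.
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS journal (
                      ts REAL, unit TEXT, priority INTEGER, message TEXT)""")
    return db

def log(db, ts, unit, priority, message):
    # Each entry is its own transaction, so a crash can't half-write a row.
    with db:
        db.execute("INSERT INTO journal VALUES (?, ?, ?, ?)",
                   (ts, unit, priority, message))

def by_unit(db, unit):
    # SQL instead of grep: filter, sort, and aggregate however you like.
    return db.execute("SELECT message FROM journal WHERE unit = ? ORDER BY ts",
                      (unit,)).fetchall()
```

Standard tooling (the sqlite3 CLI, any SQL browser) would then work on the log out of the box.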

‡ And journald also cannot send logs off-host in any useful format, so I end up having to run rsyslogd anyway to get things into my SIEM through something useful like RFC 3164 or 5424.
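For reference, the RFC 5424 framing mentioned above is simple enough to sketch in a few lines. The field values here are invented examples, not output from any real forwarder:

```python
def rfc5424_line(facility, severity, timestamp, host, app, msg):
    # PRI encodes facility and severity in a single number.
    pri = facility * 8 + severity
    # VERSION is 1; PROCID, MSGID, and STRUCTURED-DATA are omitted as "-".
    return f"<{pri}>1 {timestamp} {host} {app} - - - {msg}"

# facility 4 (auth) and severity 6 (info) give PRI 38:
line = rfc5424_line(4, 6, "2021-03-30T12:00:00Z", "web1", "sshd",
                    "Accepted publickey")
```

Any SIEM that speaks syslog can ingest lines like that, which is exactly why rsyslogd ends up in the picture.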

But yeah: unit files are (non-sarcastically) pretty good.




In addition to the gratuitous tight coupling, the quality of implementation of most of these system service replacements is pretty bad! NTP and DNS clients are clearly worse (more limited and more error-prone and buggy than the separate packages they replace). The syslog replacement is only not a major issue for me because my distribution (K)Ubuntu has it configured to produce plaintext log files in addition to the binary files.

I also have a problem with socket activation. There is a reason why (x)inetd fell out of favor - its whole purpose was to provide socket activation. It's a hack that can produce new error states and it can actually slow down startup compared to starting services proactively.
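For context, the socket-activation handoff itself is small: the service manager passes already-listening fds starting at fd 3 and describes them via environment variables. A minimal sketch of the receiving side, assuming the documented LISTEN_PID/LISTEN_FDS convention from sd_listen_fds(3):

```python
import os

SD_LISTEN_FDS_START = 3  # first passed fd, per the sd_listen_fds(3) convention

def activated_fds(environ=None, pid=None):
    # systemd passes the fd count in LISTEN_FDS, guarded by LISTEN_PID so
    # an inherited environment can't confuse a forked child process.
    if environ is None:
        environ = os.environ
    if pid is None:
        pid = os.getpid()
    if environ.get("LISTEN_PID") != str(pid):
        return []
    count = int(environ.get("LISTEN_FDS", "0"))
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))
```

The new error states mentioned above come from everything around this handoff: the manager holds the socket open and queues connections whether or not the service can actually start.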

That said, the OOM killer daemon seems to be a good thing.


systemd-resolved and dnsmasq are the only two clients that can properly route DNS queries over the correct interface to the respective authoritative server, based on the hostname being resolved.

It would be hard to argue that systemd-resolved is worse, more limited, more error-prone, or buggier than dnsmasq.

Socket activation is intended for services that are not normally needed but occasionally might be. It doesn't apply much to servers serving well-defined services, but for desktops it is a godsend (together with D-Bus activation). Just because you have no use for it in your use cases doesn't mean it is not useful for other people.


> It would be hard to argue that systemd-resolved is worse, more limited, more error-prone, or buggier than dnsmasq.

I'll take up that argument. On Ubuntu 14.04 (and I think also 16.04), systemd-resolved used to crash on my desktop machine after about a week of uptime. I never figured out the reason; I got in the habit of manually pointing `/etc/resolv.conf` at a real server whenever it happened.

I don't remember ever having that problem with dnsmasq, but I've seen so many DNS proxy things over the years that I can't keep them all straight. I think it's probably fair to compare systemd-resolved and nscd; both were there by default doing something with DNS, and both crashed often enough to annoy me.

It's not a new problem, either. I remember that whatever came with RedHat or Mandrake or whatever I was running circa 2001 also seemed to stop working after a while. I ended up writing my own DNS proxy in C++ and learned a lot. The protocol is different enough from almost any other common protocol that I'm not surprised when software gets it wrong (I sure did!). The takeaway for me is that designing a protocol is as much a human problem as it is a technical one.


Yup, not too long ago we set up OpenVPN at work with split tunneling, which required DNS to work correctly. Windows worked out of the box (except for a small pool of users who had their DNS set manually, which was a checkbox fix). macOS was more irritating because OpenVPN didn't natively support the DNS API, and it turned out the de facto fix was the third-party Tunnelblick software.

Linux... Absolute nightmare. Every system had a slightly different DNS stack, and trying to get queries to route over the VPN interface was an acrobatic exercise in docs navigation and trial and error. Ultimately, newer desktops with systemd-resolved were the easiest.

systemd-resolved supporting DNSSEC is also a big win!


Well. This http://0pointer.de/blog/projects/socket-activation.html sells socket activation as the best thing since sliced bread, including for boot time. It has its uses for some rarely used services and for some startup order problems, yes.

About the DNS service, I remember a bug where the fallback behavior was not what people wanted (it was something that seemed to be a good idea to systemd developers but didn't work well in completely reasonable real-world configurations), and the developers rejected the idea that systemd's behavior was wrong. Can't find it right now. I have disabled it on my system as well due to a problem I don't remember. Maybe it didn't use my router's DNS server and went straight to the internet. Maybe I just wanted it to leave my resolv.conf alone.


> About the DNS service, I remember a bug where the fallback behavior was not what people wanted (it was something that seemed to be a good idea to systemd developers but didn't work well in completely reasonable real-world configurations), and the developers rejected the idea that systemd's behavior was wrong.

It was a fallback DNS used when the user has configured nothing.

Not only that, it was a default fallback that can be changed by both the distribution package and the user (including disabling it completely).

It was a tempest in a teapot, stirred up by people who need to find some drama.


I think I remember what it was now. Local configuration (DHCP?) contained two DNS servers. systemd-resolved switched to the global fallback when it tried one of the local servers and it failed, instead of falling back to the other local server first. It's a completely reasonable configuration to have two local servers for redundancy as well as load balancing, and systemd-resolved broke the redundancy aspect. It was not about a "wrong" global fallback, but about using it when a local alternative still works. It breaks local network name resolution.
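A toy model of the ordering being argued for here (the "servers" are just stand-in functions, not a real resolver API):

```python
def resolve(name, local_servers, global_fallback):
    # Exhaust every locally configured server before ever touching the
    # global fallback; each "server" is a function that answers or raises.
    for server in local_servers:
        try:
            return server(name)
        except OSError:
            continue  # this local server is down; try the next one
    return global_fallback(name)
```

The complaint above is precisely that the fallback was consulted before the second local server, which breaks resolution of local-only names.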


If it behaved like that, yes, it is a bug that needs to be fixed.

I don't remember that (not saying it didn't happen; after a quick search I didn't find any filed issue, though #18769 could in theory have similar symptoms), but I remember several threads, including an LWN article, about the existence of the global fallback.


It can both be true that there are a bunch of annoying things about systemd and their responses to genuine bugs sometimes leave something to be desired, and that there's a lot of hate for the sake of hate directed at it.

This, naturally, just makes everything harder to have a sensible conversation about.


What you may be referring to is systemd treating resolv.conf entries as identical DNS servers, where they used to be processed sequentially. Previous discussion: https://news.ycombinator.com/item?id=15228940

It's a breaking change for some users. I remember experiencing that bug.


>... ntpd, DNS, dhcpd

These are separate services, with separate binaries and separate packages in most distributions, and their use is optional; they have their uses. Most distributions won't use systemd-networkd, for example, because it is (intentionally) quite limited and they use NetworkManager anyway; RHEL 8 doesn't even ship a systemd-networkd package.


> These are separate services, with separate binaries, with separate packages

Then why are they in the same repo as the udevd code? And why did udevd have to be pulled into the same repo as the init code when it was doing just fine outside of it?


Probably because it was not doing just fine outside of it.

If you think they took the wrong approach and yours is the right one, show them by doing.


> If you think they took the wrong approach and yours is the right one, show them by doing.

If Roger Ebert said a movie was awful, did people expect him to write, produce, direct, and/or act in one?

If Doug DeMuro said a car handles badly, are people expecting him to get a mechanical engineering degree and build a better one?



Yes, I know. :) And that would probably be the level of quality of any code I tried to write. :)


So what weight should throw0101a's word have? Why?

Since throw0101a has exactly zero public record of understanding the topic at hand, and everything he provided is an opinion, why should that opinion be taken into account? What exactly is his point, except demonstrating that throw0101a doesn't like the existing approach?

"Show them better" is one way to establish that you know what you are talking about. There are other ways to achieve a similar effect, but the peanut gallery isn't one of them.


> So what weight should throw0101a's word have? Why?

About the same as any other rando's on the Internet. Take it or leave it. Up or down vote me.

As I type this: I have 23 imaginary Internet points on the post that kicked-off this sub-thread, and 0 points on a post bringing up Ebert and DeMuro.

Whatever.


Do you do the discussion for imaginary internet points?

I'm neither downvoting nor upvoting you; I consider doing that to people I'm discussing with to be bad form (yes, that's an opinion too).

However, it helps the discussion if there are arguments for the positions of those discussing; the point is not for either side to "win" (whatever that means and however you measure that "win"), but to find the best outcome after considering all valid arguments.

However, that does not work when there aren't any arguments. Vaguely liking or not liking something, and having no idea about the


> Do you do the discussion for imaginary internet points?

That's what Bitcoin is, and people assign value to those useless bits, don't they?

What's the point of life if you can't fret over magnetic ones and zeros?


The value of any currency is that you can get material stuff in exchange for it. Historically the biggie used to be being able to pay taxes; today, getting extra heaps of atoms delivered to your doorstep does the job too.

Quite difficult to do with imaginary internet points tho ;)


> Quite difficult to do with imaginary internet points tho ;)

There's a market for established Reddit accounts:

* https://blog.usejournal.com/what-i-learned-selling-my-reddit...


Most of these services have a dependency on the service manager and can't run without it.

For other services, such as journald and udevd, the dependency also works in reverse and the service manager can't run without the services.

In many cases, it's not entirely clear why they depend on the service manager itself. logind in particular was moved into systemd "in anticipation of the single-writer cgroup architecture", an architecture that never came to be because it was quite clearly a very bad idea, and then elogind was forked to separate it again with no loss of functionality.

There is no reason for logind to exist; elogind should be the only thing that exists, as it does the same thing without depending on systemd. elogind can also be used in conjunction with systemd with no loss of functionality.

Even stranger things happen, such as D-Bus performing activation via systemd by use of a private, nonstandard a.p.i., when there are standardized protocols specified by the Linux Standard Base that systemd also supports. That they chose a specific unstable, undocumented a.p.i. rather than a standardized one is certainly a political rather than technical decision, creating a dependency for its own sake.

And that is indeed what many RedHat projects have done over the years; they have created dependencies on each other of little to no technical merit, as a form of product tying to encourage adoption of more RedHat software. This is certainly not limited to systemd.


What part of networkd do you find limited compared to other tools?


NM can manage VPNs and modems (LTE, etc) for example.

Don't get me wrong, systemd-networkd is fine for server or static usage; NM is better for desktop/laptop use.


I found WireGuard tunnels easier to set up in systemd-networkd than in NM. systemd-networkd can handle VPNs for at least some use cases.


I definitely understand that criticism w.r.t. all of the systemd-verse replacement tools.

I'm not an expert. I'm just a software dev and free software nut who wants a free software OS.

But (serious question), why does it matter that systemd has an ntp daemon or that it subsumed udev?

You say, disparagingly, that it is tightly-coupled and opaque (binary).

Why is loosely coupled better in these cases specifically?

Weren't `mount`, `ntpd`, `udev`, etc. all compiled C code already? How does systemd make those services more opaque?

Everyone seems to hate journald. I'll go ahead and join in the chorus and say that it sounds really bad. But do we have to throw the baby out with the bath water? What else about systemd is actually worse than what we had before?


> But (serious question), why does it matter that systemd has an ntp daemon or that it subsumed udev?

What does a time system have to do with system start infrastructure? If you want to write a time daemon, write a time daemon: the chrony folks did, and it's pretty good. Lots of folks have switched away from the old-school NTPd to it.

But it stood on its own merits, was able to mature over time, and when people thought it good, they could willingly start using it (or not).

> Everyone seems to hate journald. I'll go ahead and join in the chorus and say that it sounds really bad. But do we have to throw the baby out with the bath water?

And that is exactly my point about things being tightly-coupled. If I (rightly or wrongly) perceive journald to be garbage, how do I not-use it?


> What does a time system have to do with system start infrastructure?

Accurate time is vital for many cryptographic operations. For example, in the embedded space you may want to check that your current disk image (ideally checked by dm-verity) is actually up to date, and to verify the metadata signatures on that disk image using Uptane/TUF/SUIT or similar protocols, you need verifiable time. Or you might need to contact a provisioning server at boot time, and need to be able to verify that server's TLS cert.
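As a toy illustration of that dependency (not any real verification API): a validity-window check is only as good as the clock feeding it.

```python
def within_validity(now, not_before, not_after):
    # A signature or certificate is only meaningful inside its validity
    # window; if "now" comes from an unsynced or bogus clock, this check
    # (and therefore the whole verification) is worthless.
    return not_before <= now <= not_after
```

Every TLS and update-metadata check contains something shaped like this, which is why "time-sync.target reached" has to actually mean "time is synced".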

(Now, systemd used to have a bug[1] where time-sync.target would be reached before time was actually synced, and Poettering's response to the bug was the all-too-typical "yeah, don't worry about it, it's intended behaviour that the time-sync target doesn't mean that time is synced"[2]. But that did get fixed a couple years ago [3].)

[1] https://github.com/systemd/systemd/issues/5097

[2] https://github.com/systemd/systemd/issues/5097#issuecomment-...

[3] https://github.com/systemd/systemd/pull/8494


That's a pretty uncharitable take on what Poettering said. IMO he's right that in many cases you don't want to hold up booting on whether or not time has been synced. I don't want my desktop, or most servers, to refuse to boot if the network or NTP server is down.

He also suggests what they can add for people who want this behaviour.

Oh woe is me that systemd does not support my use case of an embedded system needing to verify disk image with accurate time before booting further, as the _default_.


> > But (serious question), why does it matter that systemd has an ntp daemon or that it subsumed udev?

> What does a time system have to do with system start infrastructure? If you want to write a time daemon, write a time daemon: the chrony folks did, and it's pretty good. Lots of folks have switched away from the old-school NTPd to it.

Nothing, but you're making the common mistake of thinking that systemd is an init system. It isn't. It's a suite of components that can more or less be enabled independently, and that no one is forcing anyone to use. One of these components is an init. Another is a time-sync daemon, etc.

Chrony's great, but if you don't need most of its pretty advanced features, why bother? A simpler, smaller, 'good enough' client might be more suitable.

Isn't software choice a good thing?


> Isn't software choice a good thing?

Yes, but tightly coupling can reduce the ability to make a choice.


I'm not sure how that's true in this case.

I personally need the features of chrony, so despite running debian stable / systemd everywhere, I just turn systemd-timesyncd off.

Tight coupling would be "you can't boot your system unless you use this time sync daemon implementation".


There are definitely degrees of freedom, and I wonder if it would make sense to extend the Free Software Definition[0] backwards further by adding:

* The freedom to not run any program you do not wish to run, for any reason (freedom -1).

Presumably Stallman would agree that government-mandated spyware which happened to be GPL licensed would not really be granting users much freedom. The tight coupling of "you can't boot your system unless you use this time sync daemon implementation" might also count as a violation of this hypothetical freedom.

Applying this to systemd is more complicated though. I'm sure some people would say "Well it's released under a Free Software licence so you can remove or replace any part of the code you don't want.", but that seems almost as unhelpful as saying "Microsoft Windows is Free Software because you can always create a clean-room reimplementation of it".

[0] https://en.wikipedia.org/wiki/The_Free_Software_Definition


> > But (serious question), why does it matter that systemd has an ntp daemon or that it subsumed udev?

> What does a time system have to do with system start infrastructure? If you want to write a time daemon, write a time daemon: the chrony folks did, and it's pretty good. Lots of folks have switched away from the old-school NTPd to it.

> But it stood on its own merits, was able to mature over time, and when people thought it good, they could willingly start using it (or not).

But my question was why does this matter? As far as I know, you don't have to use systemd's ntp thing. You might disagree with the project vision of systemd wanting to include an ntp daemon, but it sounds like it hasn't caused you or the chrony people any real harm. There was a good chance that, if systemd didn't have an ntp daemon, you'd have had to replace your distro's default with chrony anyway. So just do that here too, no?

> And that is exactly my point about things being tightly-coupled. If I (rightly or wrongly) perceive journald to be garbage, how do I not-use it?

You can't, AFAIK. And that sucks. I know you can disable its on-disk logs and enable some other syslog service, though. In what way(s) is that insufficient?

Are there any other example of tight coupling that are bothersome in systemd?


> But (serious question), why does it matter that systemd has an ntp daemon or that it subsumed udev?

> You say, disparagingly, that it is tightly-coupled and opaque (binary).

> Why is loosely coupled better in these cases specifically?

It creates extra work for others.

udev is far older than systemd; maintainership of udev was eventually inherited by one of systemd's lead maintainers, who decided to move udev into the systemd tree, at first promising that udev, which has applications reaching far outside systemd, could still be built and used independently.

That promise weakened after about two years, and it became increasingly difficult to build and use udev without systemd; some embedded systems that use udev can't even fit systemd into memory, so using it is not an option.

So, udev was in response forked to eudev by other developers, who now have to spend time and money maintaining this fork and porting udev changes to it.

Loose coupling is better because it allows one to be used without the other if there be such a need; tight coupling is essentially a form of product tying.

Why is it better that one can use an iPhone with any computer, rather than only computers designed by Apple?

> Everyone seems to hate journald. I'll go ahead and join in the chorus and say that it sounds really bad. But do we have to throw the baby out with the bath water? What else about systemd is actually worse than what we had before?

Many call for logind to replace acpid; I strongly object to this, and the difference in functionality between the two highlights well the difference in philosophy between traditional Unix design and Lennart's brand.

acpid responds to ACPI events by calling an executable file at a specific path; for instance, `/etc/acpi/actions/powerbtn.sh` is called with arguments that describe the nature of the event when the power button is pressed. logind, on the other hand, simply allows a limited list of (I believe) seven options in a configuration file for what to do when the power button is pressed, not an arbitrary file to be executed.

The possibilities of acpid are obviously limitless, and I use them as such. My machine is configured such that when I close my notebook's lid, the machine does not suspend, but rather disables the screen and goes into power-save mode while otherwise continuing to operate. I can override this behavior by simply creating the file `/run/lid.disable`.
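A rough sketch of the decision that lid handler makes (the flag-file path and action names are the commenter's own convention as described above, not anything acpid itself defines):

```python
import os

def on_lid_close(flag_file="/run/lid.disable", flag_present=None):
    # The override is the mere existence of a file, so any shell, cron job,
    # or program can flip the behavior with `touch` and `rm`.
    if flag_present is None:
        flag_present = os.path.exists(flag_file)
    if flag_present:
        return "ignore"      # override in place: leave the machine alone
    return "screen-off"      # default: blank the screen, keep running
```

Because acpid just executes a file, the handler can be arbitrarily elaborate; logind's fixed option list cannot express this.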

I cannot configure my machine to be this flexible with logind, which highlights the difference in culture: traditional Unix design was always about removing restrictions, while this is about providing features, and when the feature one seeks not be provided, then one is out of luck.


> † Of course its log file is not ACID, so when it poops its pants and corrupts the file there's no way to recover it besides moving it out of the way. Though the log entries on why this happened are also corrupted, so there's no root-cause analysis.‡ They could have just used SQLite as the logging format and it would probably have been awesome with regard to tooling and doing queries (instead of grep-foo). But no.

When current journals are detected as dirty by journald at startup, they are renamed and no longer written to. This is a very robust and conservative approach: it burns some space in the interest of simplicity and of preserving the log data without risk of messing things up.

These renamed journals are still accessed when reading, so it's not like those dirty files no longer participate in journalctl operations; their contents are not lost.

Any logs lost in a crash are just what was in-flight IPC, or sitting dirty in the kernel's buffer cache and hadn't been synced to backing store yet. journald also performs an explicit sync whenever an urgent message arrives, to try to ensure it's made durable ASAP. Of course there's still a filesystem, and potentially myriad layers below that, which could lose data in a crash.

ACID would be overkill for this fairly simple single-writer, multiple-reader, mostly-append situation.

Edit:

Your statement "when it poops its pants and corrupts the file" implies journald has a tendency to actively corrupt its own files.

The principal mechanism journald uses to detect potentially corrupt files is simply identifying a file as being ONLINE when opening for writing, typically at startup. This is almost always caused by an unclean shutdown of the host, though it could be due to a crash of journald itself.

The file isn't even necessarily corrupt. It's simply assumed inconsistent and treated as such WRT writability. I'm not aware of any journald bugs in recent history where journald actively corrupted the contents of journal files. Even with crashy bugs where journald littered dirty journals that then got treated as corrupt, I'm not aware of the files actually being corrupted internally by journald bugs.

Source: I've worked quite a bit on journald over the years


> But yeah: unit files are (non-sarcastically) pretty good.

I personally prefer service managers that have clean shell scripts, such as runit or OpenRC, where a service script looks more like this:

    #!/usr/bin/openrc-run

    command="/path/to/executable"
    command_args="command line arguments"

    depend() {
        need some-service        # services to depend on
        after some-other-service # services which need to start earlier
    }
The nice thing is that it's a shell script, so each of these lines can be defined conditionally.


See also the BSDs, which have similarly simple scripts


I'd be interested in how your logs actually got corrupted (was journald to blame, or were there power/filesystem issues that were just hard to recover from?).

I actually like journald quite a bit. Before, you'd have log files scattered all over the place with various cronjobs to try to keep them from filling up the disk (usually with the help of logrotate). Each application was free to put its logs wherever it decided.

Journald keeps them structured with first-class support for metadata like timestamp and unit, so it's easy to get a service's logs in a two-hour window on a given day, and equally easy to look at them in a stream with other services' logs (it aggregates them for you). From an application perspective, you just dump to stdout/err.

>And journald also cannot send logs off-host in any useful format

This seems directly at odds with your "tightly-coupled" opening statement (do you want it to do more, or do you think it does too much?). It's trivial to do this with fluentd or other log-aggregation software.


To be fair, there isn't one Uber-bin, but that doesn't change the fact that all these systems are now in lockstep and under the control of one group.

Tell me, a naive one, why a throwaway for this?


> To be fair, there isn't one Uber-bin

Damning with faint praise? :) While an uber-bin would be a bigger problem, there is a reason why I italicized "tightly-coupled". Often many people simply respond with "but there are multiple binaries",† which kind of misses the point.

> Tell me, a naive one, why a throwaway for this?

It was created 2017-01-01.

† Edit: Case in point, see sibling comment: "These are separate services, with separate binaries, with separate packages […]"

* https://news.ycombinator.com/item?id=26646250


Why do you assume a throwaway?


> under the control of one group

Exactly.



