Systemd needs official documentation on best practices (utoronto.ca)
63 points by zdw 5 days ago | 58 comments

I tend to approach such tasks by just copying and editing a unit that shipped with the OS (e.g. sshd). The example the post gives as being problematic is using some timer feature instead of cron, and I suppose my question is really: who is doing that anyway?

I know you can rig up systemd to act like inetd or cron, but perhaps you would be better off with an actual service (how hard is quartz?) than baking your jobs into systemd units.

That said, I only use Linux in prod for enterprise stuff that depends on it; everything else is on BSD, so you can probably guess my opinion on systemd ;)

I'm one of the (rare?) people who actually likes systemd. On Arch Linux, it is uniformly applied to the whole system and is actually a nice experience IMO. This contrasts with other distros that use a hodgepodge of sysv-init, upstart, and others. Even still, I think that systemd has discoverability and verbosity issues that should be addressed.

Not rare. The majority of Linux distros have switched to systemd because the majority of people who work with it like it better than what they were using before. It's just that no one ever made it to the front page of HN writing a blog post titled "This technology is generally decent."

I dunno about that, the majority of ops guys I work with (or have worked with over the last few years) dislike it... we’re mostly angry old greybeards though.

Greybeard here - I like systemd well enough. It's not perfect, but it's a lot better than rando SYSV init scripts. Journald and journalctl are loads better than /var/log.

Have you used OpenBSD for anything lately? Systemd is better than the horrible mess of upstart/sysv etc., although the one Linux laptop I run is using Void (and thus, runit) and you can type init 6 and it doesn’t randomly hang for 2 minutes while shutting down if you’re not using GNOME/KDE... imagine! The future is here ;)

It's been a number of years, but doesn't OpenBSD use just plain /etc/rc and /etc/rc.conf? I like that better than init.d just because it enforces some standards around the scripting at least.

I admin a number of Ubuntu servers running systemd, and they all shut down/reboot pretty much instantaneously.

I have to give a second endorsement to runit. I really wish more distros would use it.

Why are they loads better btw? I think in composable commands, being able to cat my logs without first setting them up to pipe to a syslog is something I very much want...

Being able to view logs based on the name of the service instead of the name of the logfile. `journalctl -u <unit>` vs `less /var/log/service/something.log`

Being able to specify date ranges instead of grepping around multiple files. `journalctl -u <unit> -S <start> -U <end>` instead of horrid combinations of grep and gzcat.

"Okay," you say, "but that's all for logs on the host. Shouldn't you be using Splunk or an ELK stack?"

Journalctl can export logs in a JSON format that can be natively consumed by Splunk for free metadata markup. I wrote some container sidecars to do exactly that, and it worked great.
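To illustrate what that JSON export looks like to a consumer, here is a minimal Python sketch. It assumes input in the shape that `journalctl -o json` emits (one JSON object per line) and uses the standard journal fields `MESSAGE`, `PRIORITY`, `_SYSTEMD_UNIT` and `__REALTIME_TIMESTAMP`. The sample entries and the helper names `parse_journal_lines` and `errors_only` are made up for illustration; this is not the sidecar described above.

```python
import json
from datetime import datetime, timezone

# Two made-up entries in the newline-delimited shape of `journalctl -o json`.
SAMPLE = (
    '{"__REALTIME_TIMESTAMP":"1556712000000000","_SYSTEMD_UNIT":"sshd.service",'
    '"PRIORITY":"6","MESSAGE":"Server listening on 0.0.0.0 port 22."}\n'
    '{"__REALTIME_TIMESTAMP":"1556712060000000","_SYSTEMD_UNIT":"sshd.service",'
    '"PRIORITY":"3","MESSAGE":"error: could not load host key"}\n'
)


def parse_journal_lines(text):
    """Turn newline-delimited journal JSON into simplified dicts."""
    events = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # The journal stores the wallclock time as microseconds since the epoch.
        micros = int(entry["__REALTIME_TIMESTAMP"])
        events.append({
            "time": datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc),
            "unit": entry.get("_SYSTEMD_UNIT"),
            # Syslog-style priority: 0 is emergency, 7 is debug.
            "priority": int(entry.get("PRIORITY", 6)),
            # Note: non-UTF-8 messages are exported as arrays of byte values;
            # this sketch does not handle that case.
            "message": entry.get("MESSAGE"),
        })
    return events


def errors_only(events, max_priority=3):
    """Keep entries at priority 'err' (3) or more severe."""
    return [e for e in events if e["priority"] <= max_priority]


if __name__ == "__main__":
    for event in errors_only(parse_journal_lines(SAMPLE)):
        print(event["time"].isoformat(), event["unit"], event["message"])
```

In practice the text would come from something like `journalctl -u <unit> -o json` piped into the script rather than from a hard-coded sample.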

The only thing I don't like about journalctl is that it doesn't line wrap by default.

I'm sure it's a nice init program, and god knows sysvinit is garbage, but I'm much less clear why systemd needs to be my fucking DNS resolver, too.

It doesn't. Systemd-resolved is a separate application, completely optional, and only Ubuntu uses it by default.

Not that it is bad, but you don't have to use it.

It doesn't need one, but systemd-resolved works quite well and integrates nicely with other services. Systemd-resolved isn't mandatory, and you can still just use /etc/resolv.conf pointed to an external DNS server.

I spent the time to learn it and yeah, it's pretty nice when it's working.

The problem is of course when you need to do something that isn't supported by SystemD, in which case you're in for a world of hurt as the system constantly fights you.

For example, I had a requirement to make sure some machines randomized their MAC addresses when connecting to untrusted WiFi in Ubuntu 14. Turns out this functionality was broken in NetworkManager and trying to go around its back was a huge huge headache because the system was relentless in checking the MAC of the interface and setting it back to the physical one.

There are other times where it bites you in the ass too, like writing a raw image to an SD card. In the old days you would just dd the image onto the card, but you can't do that anymore because shortly after you write the filesystem header to the card SystemD will notice the new filesystem and attempt to mount it, killing off your dd process.

I always try to work with SystemD now because it's so much harder to fight it.

> ... because shortly after you write the filesystem header to the card SystemD will notice the new filesystem and attempt to mount it, killing off your dd process.

Could you elaborate on this? I develop an embedded OS which regularly flashes itself while running systemd (we support two methods: 1. write partition table to disk, write fs header to partitions, explicitly mount fs and write file content to fs; and 2. dd pre-built image) and I have never seen this happen. We have also flashed (via dd) our images onto cards from within Ubuntu 16.04 and 18.04 and never seen this.

Are you talking about systemd-networkd or NetworkManager? Those are separate projects, and I agree that NetworkManager is garbage. Still, you don't have to take systemd wholesale. You can still use ifupdown if that is your jam.

In your specific case, if you were using systemd-networkd you could specify MACAddressPolicy=random in your link file to achieve what you wanted.
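For reference, a minimal sketch of such a link file. The file name and the `OriginalName=wlan0` match are assumptions for illustration; `MACAddressPolicy=random` is the setting named above. Keep in mind that .link files are applied by udev when the device appears, so this yields a fresh random address each time the device shows up (typically at boot), not one per network.

```ini
# /etc/systemd/network/10-random-mac.link  (hypothetical file name)
[Match]
# Assumed interface name; adjust to the actual wireless device.
OriginalName=wlan0

[Link]
# Generate a random MAC address each time the device appears.
MACAddressPolicy=random
```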

To be fair, you can run systemd without NetworkManager or an automounter. For network access, you can run dhcpcd with systemd (on Arch anyway). I've never wanted an automounter on any system.

Likewise, systemd doesn't prevent you from running ALSA instead of PulseAudio. The flexibility is there. Better documentation/best practices would be helpful.

Ubuntu 14 doesn't use systemd. It was the last LTS prior to the systemd switch.

Ack, I misspoke. Ubuntu 16 was the one with the busted MAC randomization. It worked in 14, but was broken in 16 due to the SystemD deployment.


What was the solution for the SD card imaging?

Download a bloated electron app to do the imaging, like you're on Windows.


Eurgh. What have we done? :(

Count me too. I love systemd and use it extensively to manage programs that can crash or randomly fail.

It can monitor many things including ram usage, and manages dependencies. Also it is everywhere by default - one less moving part I have to manage.

Same for me, in fact Arch was the impetus for me to actually dig into all the things systemd could do (which I guess is the point of Arch). That said, I have had the exact issues linked in the article, staring in puzzlement that there's no simple way to ship errors from systemd via email. I actually really love journalctl but there's a big market for more tooling around it (it also annoys me that I can't remap log levels of different units, for example).

I thought most people like systemd but those who do not are very loud.

The reason for using timers is that you can make them dependent on sockets, files, file systems, etc.

Love it or hate it, the whole idea of systemd is to centralize all task-ish management, whether one-offs, long-running, periodically-running, static resources, logs, etc.

Here are some best practices:

- Small is beautiful.

- Make each program do one thing well.

- Build a prototype as soon as possible.

- Choose portability over efficiency.

- Store data in flat text files.

- Use software leverage to your advantage.

- Use shell scripts to increase leverage and portability.

- Avoid captive user interfaces.

- Make every program a filter.

I know this is only tangentially related, but when you're bringing up UNIX philosophy...

> - Store data in flat text files.

Storing encryption keys or the like in flat text files is surprisingly hard, at least in C. I don't think it's even possible to use sscanf(3) without unexpected explosions, and otherwise you get to hand-roll your hex parsing code yourself (Base 64? Get yourself a library).

Then don't do it the hard way and store something like a password in some general text file that needs to be parsed, just have the password in its own file. This is also good practice when that's otherwise a sane idea because you can apply separate permissions to that password file.

Working with that sort of file in C is trivial, you either open() and read() from that file, or mmap() it.

> Storing encryption keys or the like in flat text files is surprisingly hard, at least in C.

Storing keys (or other binary data) is _exceptionally_ easy with C. What's hard is _multiplexing_ it with other data in the same file, but you probably don't need to:



The GP is probably more focused on textual logs.

But yes, C is very weak on converting data from one encoding into another. You always end up having to write your own encoder/decoder.

You should be able to convert between binary and base64 in memory more easily than during IO, and for encryption keys the buffers all have known size limits. As a rule sscanf will add security vulnerabilities to your code, so you are better off avoiding it if you do not trust your inputs.

> - Use software leverage to your advantage.

OpenSSL's libcrypto has Base64 routines in it.

Storing data in text files is the biggest con pointed at proc. Store data in binary, so that it's easier to read by everything.

The lack of clear, thorough, reliable documentation is one of my biggest gripes with systemd.

The man pages that ship with systemd seem very thorough to me. The problem is rather that they're very thorough in the first place. They're a very hard, lengthy read and you kind of need to understand all of it before things begin to make sense.

That's exactly the point the article is making. In addition to the comprehensive documentation, systemd users need best practices for a good range of common-to-uncommon use cases.

Agreed, what’s lacking is documentation on how the many, many options can interact with each other.

Today I wanted to get `systemctl list-unit-files` to give me the full path to each file. Couldn’t figure it out.

I agree, and I am one of the people who has some major gripes with systemd through experience. I admit it's a reality and people should learn it, but the problem is that not many people have, and sometimes it makes things that used to be much easier not necessarily more difficult (though sometimes), but rather more complicated (there is a difference). For example, I always found writing a sysvinit service pretty straightforward, but systemd units can get unwieldy, especially when doing anything complex. I will admit some of the features are nice if you can figure them out, such as declaring when in the init process to start, versus having to do that manually in sysvinit scripts.

It also often fails to admit when it is failing, for example, nobody I know uses journald as the main or only logging system, and are still piping to local (r)syslog(ng) and then to elk/splunk, etc, because journald is simply not up to par.

In general, and I'm just being honest here, besides these mostly work-aroundable issues, it is the seeming disingenuousness of those who automatically rail against anybody who criticises systemd that makes me not like it. Call me a contrarian, but I hate being told I must like X and if I don't I'm dumb and behind the times, and my nature is to push back against that.

In that vein, I support the efforts of Devuan and similar projects if only to support a counter-balance.

> [..] elk/splunk, etc, because journald is simply not up to par.

Could you expand on what's not up to par in journald vs. rsyslog? Watching textfiles for changes vs. receiving data on a non blocking socket is definitely _not_ more efficient (nor "better" in my opinion).

Mostly related to the tooling around those things, especially when dealing with groups of machines sending logs to a central system. It's worth remembering you are still using journald when using one of the syslogs, it's just forwarding all the journal messages to the syslog socket.

As the tooling catches up this gap will close. For example the json output of journald can be really useful, that said, most of the tools still require some change of the output from the binary format which they can't read to a text format whether that be json or syslog... and the binary format in general being a general pain point if (read: when) things break (such as corruption of the whole journal past corruption point vs one section of text in syslog).

It's worth remembering that journald was designed for local system use. It's just one more of the reasons I had some hesitation with systemd but got shouted down by everyone on the systemd bandwagon.

What's the general consensus on the relevance of systemd when you've migrated to a fully containerized architecture? I know that I run almost everything in Docker nowadays and barely touch systemd. I use Fargate on AWS too and there is no systemd to touch. I use a simple Python-based init process inside most of my containers.

If you are not using systemd, it is not relevant.

To be more specific, is it worth learning nowadays?

Depends on your job, I would say. If you're a SWE and you're pushing your own app to prod via containers then you can pretty safely ignore it.

If you're in ops and you're mostly managing that fleet but you also have to look after some non-docker stuff (say, an old artifactory or some qa tool or other) then probably the time to learn how it works is before it breaks.

My 0.02$ -- If you can avoid it, do so.

I've used systemd to great effect managing Docker containers. Systemd can order the startup/shutdown of containers with other dependencies, the most common one being network file systems that are used as volumes. A very simple and clean unit file can ensure that the container is only started after the file system is mounted, and that it is killed before the file system is unmounted.
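A minimal sketch of that kind of unit, assuming a hypothetical image `example/app`, container name `app`, and a network file system mounted at `/mnt/nfs/data` via /etc/fstab or a mount unit. `RequiresMountsFor=` pulls in the mount and orders this unit after it, and since systemd stops units in the reverse of their start order, the container is stopped before the file system is unmounted.

```ini
# /etc/systemd/system/app-container.service  (hypothetical)
[Unit]
Description=Example app container on an NFS-backed volume
Requires=docker.service
After=docker.service
# Require the mount that contains this path and order after it.
RequiresMountsFor=/mnt/nfs/data

[Service]
# Remove any stale container from a previous run; the leading "-" ignores failure.
ExecStartPre=-/usr/bin/docker rm -f app
# Run in the foreground (no -d) so systemd tracks the container process.
ExecStart=/usr/bin/docker run --rm --name app -v /mnt/nfs/data:/data example/app
ExecStop=/usr/bin/docker stop app
Restart=on-failure

[Install]
WantedBy=multi-user.target
```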

I love systemd. I use it extensively in production. The constraints handling (ex: service depends on database which depends on network) is great. You can have a systemd timer restart another timer that was suspended when the system load was too high, etc.

Systemd is the best way I know to handle programs that can fail randomly, with complex dependencies. Just tweak the restart limits on the service. You can manage the whole set of dependencies. It can even be instructed to reboot the server.
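As a rough illustration of those knobs, here is a sketch of a unit with dependencies, restart limits, and a reboot action. The unit name, the binary path, and the choice of `postgresql.service` as the database are assumptions. On systemd versions before 230 the start-limit settings were spelled differently and lived in `[Service]`.

```ini
# /etc/systemd/system/flaky.service  (hypothetical)
[Unit]
Description=Service that may crash at random
# Dependency chain: network first, then the database, then this service.
Wants=network-online.target
Requires=postgresql.service
After=network-online.target postgresql.service
# Allow at most 5 starts within 10 minutes...
StartLimitIntervalSec=600
StartLimitBurst=5
# ...and reboot the machine if that limit is hit.
StartLimitAction=reboot

[Service]
ExecStart=/usr/local/bin/flaky-daemon
# Restart on unclean exits, waiting 10 seconds between attempts.
Restart=on-failure
RestartSec=10
```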

However, documentation could be better. For example, I have not yet been able to use the new feature allowing services to join existing namespaces not created by systemd (so not JoinsNamespaceOf=, but NetworkNamespacePath= ; cf otherwise https://cloudnull.io/2019/04/running-services-in-network-nam...)

But systemd is still a bit new. Documentation and best practices will improve.

EDIT: actual technical content gets downvoted as usual, while opinion pieces about systemd being the antichrist fly.

But if they codify best practices, then they won't be able to invent "best practices" on the spot to cover their asses whenever somebody discovers some flaw in their software.

Prime example: the "best practice" of not letting usernames start with numbers is invented on the spot by a SystemD dev to cover his ass: https://github.com/systemd/systemd/issues/6237#issuecomment-...

of course systemd is so bloated and monolithic that it's hard to find and identify bugs and flaws in the first place, so it's win-win for them.

Isn’t it shadow-utils that came up with that rule? And rhel/centos patching that out? https://gist.github.com/bloerwald/a482791395114fa82636e2ab20...

It was never a rule, only a quirk a few other tools had. The kernel has always permitted these usernames; it is plainly systemd's responsibility to not shit the bed when it encounters such a username.

The example is because SystemD rejects POSIX as a standard. Whenever Poettering decides something is better, POSIX is tossed out the window.

It's deeper than that. Rejection of POSIX is the reason SystemD is Linux only; Poettering rejects POSIX compatibility and gives the finger to anything other than Linux. But in this case SystemD failed to even adhere to the [non]spec of "what Linux supports."

The best practice would be to replace it with something simpler and more elegant.

The article complains about not getting email when a timer-activated service fails.

Like all software, if you didn't test it to work, why should you expect it to work? Cron can fail silently as well if you don't set MAILTO. Not all servers start out with working outbound mail, which is something you have to set up for both cron and systemd as well.

I agree the OnFailure= recipe could be documented better, although I found an example easily enough when I looked up how to have systemd timers send email on failure. I use a script which mails the output of `systemctl status` on the service that fails, so I get logs of what happens.
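A minimal sketch of that recipe, with three unit files shown in one listing. All unit names and script paths here are hypothetical, and the mail helper script is assumed to exist and to send the output of `systemctl status` for the unit it is given. It also assumes outbound mail already works on the host.

```ini
# --- backup.timer (hypothetical) ---
[Unit]
Description=Run backup.service daily

[Timer]
OnCalendar=daily
# Catch up after boot if a run was missed while the machine was off.
Persistent=true

[Install]
WantedBy=timers.target

# --- backup.service (hypothetical) ---
[Unit]
Description=Nightly backup
# On failure, start the templated mail unit with this unit's name as instance.
OnFailure=status-mail@%n.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh

# --- status-mail@.service (hypothetical) ---
[Unit]
Description=Mail the status of %i

[Service]
Type=oneshot
# Assumed helper script that mails the output of `systemctl status %i`.
ExecStart=/usr/local/bin/mail-unit-status.sh %i
```

Because the mail unit is a template, the same `OnFailure=` line can be reused in any other service that should report failures.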

I've a number of questions about systemd on StackOverflow sites and there are definitely some FAQs about best practices related to getting started with setting up services.

> Cron can fail silently as well if you don't set MAILTO.

From the cron man page:

> Any output produced by a command is sent to the user specified in the MAILTO environment variable as set in the crontab(5) file or, if no MAILTO variable is set (or if this is an at(1) or batch(1) job), to the job's owner.

With a non working mail configuration, it goes nowhere.

I think the author meant that.

If they did mean that then they were deliberately missing the point of the original complaint. That isn't the sort of thing we are allowed to assume here...

So you prefer not assuming good faith? OK.

Without mail, journalctl and even better, systemctl list-timers will tell you about failed crons.

Can you do something as good with at and cron?
