I wish systemd logged information about the source of “transactions” (utoronto.ca)
127 points by zdw 12 days ago | 77 comments

A peer of mine was exiting their SSH sessions with 'exit'. One time apparently they already typed 'systemctl', probably in an attempt to check the status of a service, changed their mind and then later wanted to close the session using 'exit', actually executing 'systemctl exit'. This translated into a shutdown of the machine in question.

After being able to piece together what happened with the machine's logs and the bash history I recommended to simply exit all programs/sessions with Ctrl+D. It works almost everywhere and would have prevented this exact issue.

I can do one better for exit. The Solaris kernel debugger KDB can be used at runtime for inspecting some stats and also to change some global configurable variables.

For whatever reason if you type a variable/symbol it assigns it the value 0. If you type exit nothing happens immediately but as soon as the next process exits usually a few seconds to 10s of seconds later the entire system kernel panics on a null pointer de-reference.

Kernel panicked our production ZFS filer twice before I cottoned on. Newer releases special-cased “exit” not to do that.

That sounds like the most amazing UX I have ever encountered :)

For those familiar with Solaris, is there any reason they did it this way?

How can you possibly set the default behaviour of assigning any symbol a value of 0 by default?

Solaris is built on some idiosyncrasies (Speaking as someone who, as a Linux person, has to sysadmin Solaris systems).

For example, on our Solaris machines, after install, "reboot" does not do a clean reboot, it's a hard reset. If you want a clean reboot you need "init 6". Same story for "shutdown" and "init 5".

"killall" also kills all processes. Not the one you specified. Or more specifically it SIGKILLs all processes that have open files (ssh session and server go byebye). If you type "reboot" or "shutdown", this is in fact the binary that gets called to do that.

Sadly, Solaris is also one of the few systems that support the NFSv4 ACLs (Linux supports NFSv4 but not the ACL Extension, TrueNAS has a patch for that).

Doesn't Ganesha support NFSv4 ACLs?

On Linux only for non-Filesystem backends (Ceph, Gluster, etc.)

The only fork of Ganesha that supports it on a proper filesystem is the TrueNAS fork and that only on their ZFS Fork that brings NFSv4 ACLs to Linux.

So really, no. You can't use NFSv4 ACLs with Ganesha on Linux outside of Forks or using scale-out data stores.

That's of course due to the principle of maximal astonishment, a time honoured software design law.

Hey, it’s better than what C does for its variables ;)

because adb(1) did it that way.

Remember how on sparc machines if you turned off your laptop with the console cable connected it powers the servers down too?

That was a fun lesson…

Years ago I had the habit of shutting down my laptop with "sudo shutdown -h now" at the end of the workday. Until one day I did that accidentally in a live SSH session.

Since then I always shut down my machine using the GUI and I have Tmux configured with different colors for SSH sessions.

One time I was using my laptop to remote desktop into a computer I was hundreds of km away from physically, I discovered that if I hit the power key on the laptop (yes, the laptop had a key on the keyboard for power, not a separate power button) while having the remote session window focused, it sent the signal over the wire and put the remote host to sleep instead. Whoops.

After that I looked up how to enable wake-on-lan and open up a port to be able to do that remotely.
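For anyone curious what the wake-up signal looks like on the wire, here is a minimal Python sketch of a Wake-on-LAN sender. The magic packet is six 0xFF bytes followed by the target MAC address repeated sixteen times. The function names, the broadcast address, and UDP port 9 are assumptions for illustration; waking a machine from outside the LAN also depends on router and firmware settings that this does not cover.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the Wake-on-LAN magic packet for a MAC like 'aa:bb:cc:dd:ee:ff'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Six 0xFF bytes, then the MAC repeated sixteen times (102 bytes total).
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (address and port are assumed defaults)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))
```

Calling `send_magic_packet("00:11:22:33:44:55")` with the sleeping machine's MAC address would send one packet to the local broadcast address.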

I don't have the best experience with Wake-on-LAN, it stopped working after a BIOS update or something. So now my desktop is set to turn on when power comes back and it's connected to a WiFi switchable power socket. I bought one with the usual app/cloud rubbish and flashed Tasmota on it.

I have some smart plugs I was planning to hack the network protocol of. I looked up Tasmota but I couldn't figure out how I could see if it would work with the random crap I bought on AliExpress and if so how to do it. Any advice?

Well. I went the other way, I bought known compatible devices. In my case the unofficial name is "OBI socket 2", from the (German) OBI home improvement store. It's about 10€ a piece.

That said, if your device is based on some kind of Espressif ESP32 module, you might be able to find the right four pins on the circuit board and find or cobble together a configuration to talk to the I/O ports. Hardware required is a (usually USB) RS232 interface at 3.3 volts, some medium-thin cables, screwdriver, soldering iron, probably multimeter to check things. The firmware flashing and WiFi setup are fairly independent of the I/O port configuration, so you can flash something that can bring up the WiFi connection and web interface and experiment from there.

There's a package called molly-guard which can help with that.

It's a must on my systems. After I once rebooted the wrong server accidentally, I now force myself to go through the pain of confirming the hostname of the machine.

Also, it's possible to circumvent it when you have scripts that need to reboot the machine without interaction by issuing `reboot </dev/null` in the script.
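The hostname check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not molly-guard's actual implementation: the function names are made up, and the rule that a non-interactive stdin skips the prompt is taken from the comment above rather than from molly-guard's source.

```python
import socket
import sys


def hostname_matches(typed: str, actual: str) -> bool:
    """Compare what the operator typed against the machine's short hostname."""
    return typed.strip().lower() == actual.split(".")[0].lower()


def confirm_reboot(stdin=sys.stdin) -> bool:
    """Return True if the reboot should proceed."""
    # Non-interactive callers (e.g. `reboot </dev/null` in a script) are let through.
    if not stdin.isatty():
        return True
    print("Type the hostname of the machine to reboot: ", end="", flush=True)
    typed = stdin.readline()
    return hostname_matches(typed, socket.gethostname())
```

A wrapper script would call `confirm_reboot()` and only hand off to the real reboot command when it returns True.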

I've done the opposite, back in the day. I used server A's keyboard & monitor to SSH to admin server B. When I was done with the system upgrade on server B, I rebooted it with control-alt-del. Except I rebooted server A.

Happened to me years ago. Fortunately that server had a power saving bug that most times rebooted it instead of powering it down.

Almost entirely unrelated, but lately I worked with old photos that had missing, incomplete or wrong exif data. While trying to assess and automate fixing the collection via scripts, I used the command line utility "exif" a lot. You can't imagine how many times I typed "exit somefile.jpg" and flinched when the terminal window just closed. Guess I should just have created an alias but that's like resigning to your own stupidity. ;-)

Also, spam Ctrl+C if you're gonna issue a new command after being AFK for a while.

I recommend not using root privileges unnecessarily. `systemctl status` and most other querying commands don't need root, while the dangerous things like `systemctl exit` do.

Sure, but some systems do poweroff without being root. Has happened to me, don't remember the details. Maybe a polkit thing because users are supposed to shut down their own laptop?

Yeah true, if the user is logged on via a physical tty or local X session (i.e. the policykit subject.local attribute == true) then in some distros they will get permission to shutdown or reboot.

They won’t have the permission if connected remotely though.

I can't say what they had in mind when typing `systemctl`. Even they couldn't. Because of the delay in the shutdown to cleanly stop the services, they had already forgotten that they just exited an SSH session and thus the connection between the machine being dead and typing `exit` was not obvious.

Maybe it was `systemctl status`. Maybe it was intended to be a `reload` (which would require elevated privileges).

GP's point was more that you shouldn't be able to casually do "systemctl exit" in a standard shell session. All privileged operations should require a sudo. One might be tempted to just do a full "sudo shell" to perform all systemctl operations, but GP's point is that many of the "observation" actions don't require sudo in the first place!

In the end, the user being able to accidentally run "systemctl exit" may be indicative of a policy issue (ie don't allow root logins).

All sudo actions on my machines are logged to a remote syslog server (and locally to /var/log/auth.log, which is rotated, compressed, and kept far longer than other logs). That certainly used to be standard. You can't log on as root, you have to log on as your own user and elevate to root (even if that's all you do with sudo), so there's a trail there.
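As a rough illustration of the trail this leaves, here is a Python sketch that pulls the invoking user, target user, and command out of a sudo line in /var/log/auth.log. The regular expression assumes the common `user : TTY=... ; PWD=... ; USER=... ; COMMAND=...` layout, which can vary with sudo and syslog configuration, and the sample line in the usage note is made up.

```python
import re

# Matches the usual sudo syslog entry; the layout is an assumption and
# may differ depending on sudo and syslog configuration.
SUDO_LINE = re.compile(
    r"sudo(?:\[\d+\])?:\s+(?P<user>\S+) : .*?USER=(?P<target>\S+) ; COMMAND=(?P<command>.+)$"
)


def parse_sudo_line(line: str):
    """Return (invoking_user, target_user, command), or None if not a sudo entry."""
    match = SUDO_LINE.search(line)
    if match is None:
        return None
    return match.group("user"), match.group("target"), match.group("command")
```

Fed a hypothetical line such as `Dec  1 10:15:02 web01 sudo:    alice : TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/systemctl reboot`, it yields the user `alice`, the target `root`, and the full command.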

This article suggests there are ways for programs to issue unaudited commands with elevated privileges to systemd.

"Systemd has a D-Bus interface that people can use, there's hardware events that may trigger a reboot, there are various programs that may decide to ask systemd to reboot the system, and under some circumstances systemd itself can decide that a particular, harmless looking process failure or 'systemctl' transaction actually will trigger a reboot through some weird chain of dependencies and systemd unit settings"

Complaining about systemd is as old as systemd, and borders on a religious war, a proxy towards "modern linux" and "old school unix" methods.

But may not be. Maybe the policy is fine and changing it based on one machine shutdown is not worth the costs.

I did not even know that systemctl exit exists. The man page says it's equivalent to poweroff (for the system manager not running in a container, i.e. the most common case).

Having 2 alternative commands for the same functionality is not a good design decision IMHO. But not the most central design decision for systemd.

there are already multiple commands for turning the computer off without systemd anyway

On average, systems are poo. The median modes around.

I prefer openRC, and I'll wave a flag or whatever but to each their own.

WinNT4 was goat.

... eyeing you suspiciously and backing away ...

I'll just get off your lawn.

Ctrl+D is great, but will still run a command if it's been typed out so I always Ctrl+C and then Ctrl+D.

It does not for me. If the current command line is not completely empty it will not do anything (both with bash and fish). So if the terminal does not close after pressing Ctrl+D I will know that something is wrong and check more carefully.

What's wrong with just closing the putty window?

Maybe they're not using a GUI ssh client, and wanted to return to their local shell.

alt-f4 is slightly more effort (requires hand contortions) compared to writing out a word and typing enter. Also, control-D is easier.

ctrl+D/exit doesn't close the window in all cases. eg if you're sshed to a remote it returns you to local.

Similarly I discovered yesterday that the systemd service definition for auditd includes the `RefuseManualStop` option for this exact reason. When stopping (and thus also when restarting) the service via systemd, auditd is unable to log who shut it down, so it just disallows being stopped. (https://linux-audit.redhat.narkive.com/3weoVaZE/rational-beh...)

The workaround is to use the service command instead. Manually I usually do that anyway, muscle memory etc. But Ansible's service module will default to systemctl if it finds systemd. So there I had to add a "use: service".

It seems odd to me to go to the effort to tweet & blog about 'wishing' for something like this, but not to open a GitHub issue requesting it: https://github.com/systemd/systemd/issues

(At least, I can't find it searching there, and OP doesn't link one, so I assume there isn't one.)

I'm probably going to get a lot of hate for saying this; because anything anti-systemd tends to attract weird people who claim that you want bash scripts back but: Making issues on systemd's github page is often met with stoicism and reluctance on the part of the systemd developers.

There are countless examples of the systemd maintainers refusing to fix bugs or acknowledge that bugs exist.

Making a public statement could be more effective.

Maybe, personally though (if that were my experience and I was OP) I'd try an issue first, and then if it was shot down I could tweet & blog 'unfortunately, doesn't look like it's going to happen' or whatever, referencing my attempt.

It just seems like they went to about the same amount of effort, but on something with a much lower (IMO) probability of being actioned. (Not least because if it were it'd probably go through roughly the same process anyway, such as through the issue now opened by a sibling comment to yours. Maybe PR without issue, but which OP could (skills permitting) also have done.)

Good point, created one https://github.com/systemd/systemd/issues/21497 (I'm not the author though)

Aside from the maintenance and uptime issues, OP is actually raising a very real security concern; often an attacker will reboot the machine to restart all of the logging processes or to load in a LKM. Knowing why, and when (and which process) just forced a reboot is a very real requirement.

Systemd also seemingly randomly attacks my processes, and it's almost impossible to actually figure out why. (At least the kernel OOM killer actually logs "Out of memory, killed this process."[0])

0. https://linuxwheel.com/oom-killer-explained/

While generally the systemd documentation is pretty good (Start with reading the original blog series as a primer, later search for "systemd directives" using your favourite search engine) I have always found the transactions concept underdocumented. Does anyone have a good link?

Edit: Are transactions and jobs the same thing? Both are mentioned in the documentation here and there, without having an own page or chapter AFAIK.

“Documentation” is “Pretty Good”?

“Start with” the “blog”?

Clearly, the two do not connect well … for systemd.

> ... as a primer

It's worked very well for me. Not seeing your disconnect here. Blog for concepts, man pages for specifics

I wish more developers wrote enough into the man page so you don't need to google things. More of bash's man page and less of i3's.

Don't forget all the stuff in GNU's info pages.

I'm of the opinion that "info" is for long form guides and "man" is for short references.

(replace "info" with some better reader than the default one though)

For instance, I wanted to know why my custom systemd.conf unit file is causing my custom daemon to restart whenever the Ethernet cord gets unplugged or its netdev goes offline.

> Clearly, the two do not connect well … for systemd

It's linked from the man page. https://man7.org/linux/man-pages/man1/systemd.1.html

Unfortunately we will still need to do the reading ourselves. Not trying to be snarky here, it happens to me all the time that I have not read something and complain it's hard to understand.

I think what OP means is that having good documentation should not require you to read a blog of any sorts, even if it is linked in the man page. Either documentation is good OR you read the blog, you cannot have both.

I disagree, for instance in a man page I don't want to read the rationale for something (which the blog provides), just the raw "if you do this, then that will happen".

That would have its place in a GNU info book instead.

Plenty of man pages have a dedicated RATIONALE section. You don't have to read it, but the stuff is documented.

It's open source, you get what you pay for. Actually, you get much more, but you need to fill any gaps with your own efforts. Accessing the spread out documentation is one of them.

While they are not comparable entities I'd say systemd documentation is in better shape than Linux kernel documentation. (Not to negate the efforts of those who do work with kernel documentation, but to stress the huge areas of no or pretty obsolete documentation).

Of course you can buy Linux (incl. user space) from the commercial players. I have only worked in one small project in my career that did, but I did not notice that better documentation was worth paying for.

Paying for something doesn't guarantee good documentation. Example: I read many people complaining about Apple [1]

My take is that pay vs free software and good vs bad documentation are orthogonal and you can be in any of the four quadrants.

[1] https://www.reddit.com/r/swift/comments/ljl6bq/we_were_so_fr...

> My take is that pay vs free software and good vs bad documentation are orthogonal and you can be in any of the four quadrants.

It's definitely not independent. Few people enjoy writing documentation and even then, it takes a lot of time. Plus, writing good documentation is something you need to train for. Very few OS people want to spend their free time writing code and then spend just as much again for - usually boring - documentation and support tasks.

Companies have exactly the same problem, but they have the option to throw money at the problem. Sure, there are some OS projects with good documentation (usually sponsored by a company) and a lot of proprietary stuff without, but proprietary software usually has more financial backing and that's directly related to good documentation.

I generally agree with you but let me enumerate the FOSS projects that I use and that have good documentation.

Not explicitly company backed: Ruby, Python.

Backed by multiple companies: PostgreSQL, JavaScript.

Backed by one company: Ruby on Rails, Elixir and Phoenix, Nginx.

Don't know: Django, Apache Httpd.

Of course I might be wrong about the categories.

> paying for something doesn't guarantee good documentation.

Isn't that what I said when referring to commercial Linux distros? Of course it would be easy to continue enumerating.

But morally it entitles you much more to complain if you pay and it's poor quality than if you just get it for free with no promises.

The idea that good documentation requires payment is defeated by the huge amount of open source projects with excellent documentation.

I did not say that every open source project produces insufficient documentation. But some central ones that are hard to avoid do. There complaining doesn't help, you just need to invest your efforts. Ideally you could contribute better documentation, but at least you have to make the effort to learn it for yourself.

I'm not familiar with the term transactions, but it doesn't sound like it's the same as jobs.

Jobs are externally visible. You can see them easily (e.g. systemctl list-jobs), and systemd provides an interface for them over D-Bus[1]. There's no similar interface for anything called a transaction.
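To make the "externally visible" part concrete, here is a small Python sketch that shells out to `systemctl list-jobs --no-legend` and turns each row into a dict. The four-column layout (job id, unit, type, state) is an assumption about the output format, and the function names are made up for illustration.

```python
import subprocess


def parse_list_jobs(output: str):
    """Parse `systemctl list-jobs --no-legend` style output into a list of dicts."""
    jobs = []
    for line in output.splitlines():
        fields = line.split()
        # Skip anything that is not a "<id> <unit> <type> <state>" row.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        job_id, unit, job_type, state = fields
        jobs.append({"id": int(job_id), "unit": unit, "type": job_type, "state": state})
    return jobs


def current_jobs():
    """Ask the running systemd instance for its queued jobs."""
    result = subprocess.run(
        ["systemctl", "list-jobs", "--no-legend"],
        capture_output=True, text=True, check=True,
    )
    return parse_list_jobs(result.stdout)
```

On an idle system `current_jobs()` would normally return an empty list, since jobs only exist while units are being started or stopped.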

From the documentation that does mention transactions, it sounds like transactions are internal to systemd. When systemd starts a unit, it works out the dependency graph and spins up a job for each unit that needs to be started before the originally requested unit can start. That would all be considered a single transaction, but it might spin out into dozens of separate jobs that get queued up.

As an example, when systemd starts on boot, all it really wants to do is successfully reach some target (e.g. multi-user.target). systemd starts from there and works backwards, building a dependency graph with every single unit that needs to start up as a part of the boot sequence to reach that target. You could probably consider that a single transaction, but the full dependency graph would probably pull in hundreds of jobs.
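The "work backwards from the target" idea can be modelled with a toy Python sketch. This is not systemd's algorithm: it collapses requirement and ordering dependencies (which systemd tracks separately) into a single made-up `requires` mapping, and `plan_transaction` is a hypothetical name. It only shows how one request can fan out into an ordered list of start jobs.

```python
def plan_transaction(target, requires):
    """Return units in an order where every unit comes after its dependencies."""
    order = []
    state = {}  # unit -> "visiting" or "done"

    def visit(unit):
        if state.get(unit) == "done":
            return
        if state.get(unit) == "visiting":
            raise ValueError(f"dependency cycle involving {unit}")
        state[unit] = "visiting"
        for dep in requires.get(unit, ()):
            visit(dep)
        state[unit] = "done"
        order.append(unit)  # one "start" job per unit in this toy model

    visit(target)
    return order


# Hypothetical dependency data, for illustration only.
requires = {
    "multi-user.target": ["basic.target", "sshd.service"],
    "sshd.service": ["network.target", "basic.target"],
    "basic.target": ["sysinit.target"],
}
jobs = plan_transaction("multi-user.target", requires)
```

Here a single request for `multi-user.target` expands into five jobs, with `sysinit.target` first and `multi-user.target` last.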

I don't work on systemd or anything so this isn't canon.

[1] https://www.freedesktop.org/wiki/Software/systemd/dbus/

Is there a good resource for people who are used to non-systemd systems (something like a gotcha list)? I keep on running into weird situations where I end up finding out systemd is somehow responsible for my woes. Last time that happened was when I changed /etc/fstab but somehow old mounts kept on being remounted, I wasted an hour before I found out I had to reload some systemd service.

Related incident: "Systemd killing processes each minute at second 27" (German):


Ran into a similar problem, trying to work out why systemd was stopping my service.

systemd just has so many reasons to kill processes, and invents new ones with new releases. Timeouts for things it thinks it should be short-lived, resource limits, service isolation and sandboxing settings, etc. They are mostly documented, but you need to know where to look. While I came to like some aspects of systemd, debugging why things die for apparently no reason after systemd was upgraded has eaten many of my workdays.

New one as of this morning is to kill STOP processes it doesn't like. I have no idea why, or how to prevent it. All I did was update my 'sid' debian and here goes

> All I did was update my 'sid' debian and here goes

As it turns out it's called "unstable" for a reason

Hate to tell you, but I've been using sid for well over 20 years, and it is usually more 'stable' than most distros out there.

systemd is why I love OpenBSD.

The acronym POLA translates to peace where systemd creates havoc and burns many hours globally having people investigate what the heck is going on in this almost binary blob of spaghetti.

Does OpenBSD log what source caused the machine to reboot?

A very slimmed-down fork would possibly not be that bad

Don't forget Artix, Void, MX Linux, FreeBSD, Alpine, Gentoo/Funtoo..
