After piecing together what happened from the machine's logs and the bash history, I recommended simply exiting all programs/sessions with Ctrl+D. It works almost everywhere and would have prevented this exact issue.
For whatever reason, if you type a variable/symbol, it assigns it the value 0. If you type `exit`, nothing happens immediately, but as soon as the next process exits, usually a few seconds to tens of seconds later, the entire system kernel panics on a null pointer dereference.
It kernel-panicked our production ZFS filer twice before I cottoned on. Newer releases special-cased "exit" so it doesn't do that.
For those familiar with Solaris, is there any reason they did it this way?
How can assigning any symbol the value 0 possibly be the default behaviour?
For example, on our Solaris machines, after install, "reboot" does not do a clean reboot; it's a hard reset. If you want a clean reboot, you need "init 6". Same story for "shutdown" and "init 5".
"killall" also kills all processes, not the one you specified. Or more specifically, it SIGKILLs all processes that have open files (SSH session and server go bye-bye). If you type "reboot" or "shutdown", this is in fact the binary that gets called to do that.
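The mapping above can be captured in a tiny dry-run wrapper. This is only a sketch: the `solaris_safe` name is made up, and it prints the command that would run instead of executing anything, so it's harmless to try.

```shell
# Hypothetical dry-run helper: maps an intent to the Solaris command that
# runs the rc shutdown scripts (init), instead of reboot/halt, which hard-reset.
# It echoes the command rather than executing it, so it is safe to run.
solaris_safe() {
  case "$1" in
    reboot)   echo "init 6" ;;  # run level 6: clean reboot
    poweroff) echo "init 5" ;;  # run level 5: clean power-off
    *)        echo "unknown intent: $1" >&2; return 1 ;;
  esac
}
```

Swapping the `echo` for an actual exec would make it do the work; as written it is just a mnemonic for which command is safe.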
Sadly, Solaris is also one of the few systems that support NFSv4 ACLs (Linux supports NFSv4, but not the ACL extension; TrueNAS has a patch for that).
The only fork of Ganesha that supports it on a proper filesystem is the TrueNAS fork, and only on their ZFS fork that brings NFSv4 ACLs to Linux.
So really, no: you can't use NFSv4 ACLs with Ganesha on Linux outside of forks or scale-out data stores.
That was a fun lesson…
Since then I always shut down my machine using the GUI and I have Tmux configured with different colors for SSH sessions.
After that, I looked up how to enable Wake-on-LAN and opened a port so I could do that remotely.
That said, if your device is based on some kind of Espressif ESP32 module, you might be able to find the right four pins on the circuit board and find or cobble together a configuration to talk to the I/O ports. The hardware required is a (usually USB) RS232 interface at 3.3 volts, some medium-thin cables, a screwdriver, a soldering iron, and probably a multimeter to check things. The firmware flashing and WiFi setup are fairly independent of the I/O port configuration, so you can flash something that brings up the WiFi connection and web interface and experiment from there.
Also, for scripts that need to reboot the machine without interaction, you can circumvent it by issuing `reboot </dev/null` in the script.
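A minimal sketch of that workaround: redirecting stdin from `/dev/null` gives the command EOF instead of a terminal, so an interactive confirmation prompt cannot block the script. The `REBOOT_CMD` variable and the `echo` stand-in are assumptions of this sketch (so it can be run harmlessly), not part of any real tool.

```shell
# REBOOT_CMD is a stand-in for this sketch; in real use it would be
# /sbin/reboot. The </dev/null redirection means the command sees EOF on
# stdin, so it cannot sit waiting for a typed confirmation.
REBOOT_CMD=${REBOOT_CMD:-"echo would-reboot"}
unattended_reboot() {
  $REBOOT_CMD </dev/null
}
```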
They won’t have the permission if connected remotely though.
Maybe it was `systemctl status`. Maybe it was intended to be a `reload` (which would require elevated privileges).
In the end, the user being able to accidentally run "systemctl exit" may be indicative of a policy issue (i.e. don't allow root logins).
This article suggests there are ways for programs to issue unaudited commands with elevated privileges to systemd.
"Systemd has a D-Bus interface that people can use, there's hardware events that may trigger a reboot, there are various programs that may decide to ask systemd to reboot the system, and under some circumstances systemd itself can decide that a particular, harmless looking process failure or 'systemctl' transaction actually will trigger a reboot through some weird chain of dependencies and systemd unit settings"
Complaining about systemd is as old as systemd and borders on a religious war, a proxy fight between "modern Linux" and "old-school Unix" methods.
Having two alternative commands for the same functionality is not a good design decision, IMHO. But it's not the most central design decision of systemd either.
I prefer OpenRC, and I'll wave a flag or whatever, but to each their own.
WinNT4 was the GOAT.
The workaround is to use the service command instead. Manually, I usually do that anyway (muscle memory, etc.). But Ansible's service module will default to systemctl if it finds systemd, so there I had to add `use: service`.
(At least, I can't find it searching there, and OP doesn't link one, so I assume there isn't one.)
There are countless examples of the systemd maintainers refusing to fix bugs or acknowledge that bugs exist.
Making a public statement could be more effective.
It just seems like they went to about the same amount of effort, but on something with a much lower (IMO) probability of being actioned. (Not least because if it were, it'd probably go through roughly the same process anyway, such as through the issue now opened by a sibling comment to yours. Maybe a PR without an issue, but that's something OP could (skills permitting) also have done.)
Systemd also seemingly randomly attacks my processes, and it's almost impossible to actually figure out why. (At least the kernel OOM killer actually logs "Out of memory, killed this process.")
Edit: Are transactions and jobs the same thing? Both are mentioned in the documentation here and there, without having their own page or chapter, AFAIK.
“Start with” the “blog”?
Clearly, the two do not connect well … for systemd.
It's worked very well for me. Not seeing your disconnect here. Blog for concepts, man pages for specifics.
(replace "info" with some better reader than the default one though)
It's linked from the man page. https://man7.org/linux/man-pages/man1/systemd.1.html
Unfortunately, the reading we will still need to do ourselves. Not trying to be snarky here; it happens to me all the time that I haven't read something and then complain it's hard to understand.
That would have its place in a GNU info book instead.
While they are not comparable entities, I'd say systemd documentation is in better shape than Linux kernel documentation. (Not to negate the efforts of those who work on kernel documentation, but to stress the huge areas with no, or pretty obsolete, documentation.)
Of course you can buy Linux (incl. user space) from the commercial players. I have only worked on one small project in my career that did, and I did not notice documentation good enough to be worth paying for.
My take is that pay vs free software and good vs bad documentation are orthogonal and you can be in any of the four quadrants.
It's definitely not independent. Few people enjoy writing documentation, and even for those who do, it takes a lot of time. Plus, writing good documentation is something you need to train for. Very few open-source people want to spend their free time writing code and then spend just as much time again on (usually boring) documentation and support tasks.
Companies have exactly the same problem, but they have the option to throw money at it. Sure, there are some open-source projects with good documentation (usually sponsored by a company) and a lot of proprietary stuff without, but proprietary software usually has more financial backing, and that's directly related to good documentation.
Not explicitly company-backed: Ruby, Python.
Backed by one company: Ruby on Rails, Elixir and Phoenix, Nginx.
Don't know: Django, Apache Httpd.
Of course I might be wrong about the categories.
Isn't that what I said when referring to commercial Linux distros? Of course it would be easy to continue enumerating.
But morally, it entitles you to complain much more if you pay and it's poor quality than if you just get it for free with no promises.
Jobs are externally visible. You can see them easily (e.g. systemctl list-jobs), and systemd provides an interface for them over D-Bus. There's no similar interface for anything called a transaction.
From the documentation that does mention transactions, it sounds like transactions are internal to systemd. When systemd starts a unit, it works out the dependency graph and spins up a job for each unit that needs to be started before the originally requested unit can start. That would all be considered a single transaction, but it might spin out into dozens of separate jobs that get queued up.
As an example, when systemd starts on boot, all it really wants to do is successfully reach some target (e.g. multi-user.target). systemd starts from there and works backwards, building a dependency graph with every single unit that needs to start up as a part of the boot sequence to reach that target. You could probably consider that a single transaction, but the full dependency graph would probably pull in hundreds of jobs.
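If that mental model is right, the inputs to such a transaction can be observed with read-only commands. A hedged sketch: the `show_transaction_inputs` name is made up, both `systemctl` calls only query state, and the whole thing no-ops on machines not booted with systemd.

```shell
# /run/systemd/system existing is the standard "booted with systemd" check
# (it's what sd_booted() looks at), so this sketch is safe to run anywhere.
show_transaction_inputs() {
  target="${1:-multi-user.target}"
  if [ -d /run/systemd/system ]; then
    systemctl list-jobs                     # jobs currently queued or running
    systemctl list-dependencies "$target"   # the graph a transaction is computed from
  else
    echo "not running under systemd; skipping"
  fi
}
```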
I don't work on systemd or anything so this isn't canon.
As it turns out, it's called "unstable" for a reason.