It lacks everything a good alerting system has (acknowledgements, fine-grained notifications...).
Even those addresses in the stacktraces could be sensitive in other situations since they contain ASLR offsets.
What medium might be better than Slack?
(to be clear, I'm asking, not saying Slack is good for this)
Our solution is a custom notification broker that decides whom to alert and then waits for an acknowledgement. It uses different backends including our company chat.
Not complicated at all, just 100 lines of Python code that contain the business logic.
Anything that relies on a single medium is unsuitable for anything but unimportant alerts. What if Slack goes down for 2 hours? Unlikely, but definitely possible.
This ensures that every alert is explicitly acknowledged by someone, and that unimportant alerts are quickly forgotten without wondering whether someone handled them or not.
We have different applications sending alerts, not just Nagios (because Nagios sucks at processing events as opposed to states), and it would quickly become unmanageable without some sort of middleware.
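To make the idea concrete, here is a minimal sketch of what such a broker could look like. It is not our actual code: the class and method names, the routing table, and the "escalate to the next backend" rule are all assumptions made for illustration. Backends are plain callables, so company chat, SMS, or email would just be different functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Alert:
    source: str            # e.g. "nagios" or an application name
    message: str
    important: bool = True


# A backend is any callable that delivers an alert to one recipient.
Backend = Callable[[str, Alert], None]


class Broker:
    """Hypothetical broker: route alerts, then track acknowledgements."""

    def __init__(self, backends: List[Backend],
                 routes: Dict[str, List[str]],
                 fallback: List[str]) -> None:
        self.backends = backends              # ordered: first is tried first
        self.routes = routes                  # alert source -> recipients
        self.fallback = fallback              # recipients for unknown sources
        self.pending: Dict[int, Alert] = {}   # alerts not yet acknowledged
        self.attempts: Dict[int, int] = {}    # backends tried per alert
        self.acked_by: Dict[int, str] = {}
        self._next_id = 0

    def recipients_for(self, alert: Alert) -> List[str]:
        return self.routes.get(alert.source, self.fallback)

    def _send(self, alert: Alert, backend: Backend) -> None:
        for recipient in self.recipients_for(alert):
            backend(recipient, alert)

    def dispatch(self, alert: Alert) -> Optional[int]:
        """Send via the first backend; only important alerts are tracked."""
        self._send(alert, self.backends[0])
        if not alert.important:
            return None                       # fire and forget
        alert_id = self._next_id
        self._next_id += 1
        self.pending[alert_id] = alert
        self.attempts[alert_id] = 1
        return alert_id

    def acknowledge(self, alert_id: int, who: str) -> bool:
        """Mark a pending alert as handled; False if it was not pending."""
        if self.pending.pop(alert_id, None) is None:
            return False
        self.acked_by[alert_id] = who
        return True

    def escalate(self, alert_id: int) -> bool:
        """Resend an unacknowledged alert via the next backend, if any."""
        alert = self.pending.get(alert_id)
        tried = self.attempts.get(alert_id, 0)
        if alert is None or tried >= len(self.backends):
            return False
        self._send(alert, self.backends[tried])
        self.attempts[alert_id] = tried + 1
        return True
```

In a real setup something would call escalate() on a timer for every alert still in pending, which is what keeps a single medium going down from swallowing an important alert.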
Edit: or maybe something like a blog post to describe the structural details.
(we used this in addition to, not in lieu of, the PagerDuty dashboard itself, as well as mobile notification via PagerDuty, and Nagios' web UI.)
> fine-grained notifications
what do you mean by this, out of curiosity?
A common approach is having an "alerts" channel that grabs everyone's attention, even though most alerts are only relevant for a small subset of people. A company-wide #general channel is bad enough, and an #alerts channel only makes it worse.
You can totally do this with Slack by having something like PagerDuty in-between. We use a custom-written alerting broker that makes it easier to correctly handle some of the more complicated cases but it's pretty much the same idea.
Does anyone run it also as a separate user to manage certain applications? So that you could have certain people log on and operate some things and others operate other things?
I used to use runit/daemontools/inittab. My favourite thing about systemd is that it is increasingly available, and while it has its faults, it has instances (macros you can use to kick off a fleet of services easily) and pretty good isolation features.
It also has a "systemctl-over-ssh" feature which is quite nice, and which allows you to use an .ssh/authorized_keys file instead of sudo to allow access to certain administrative tasks.
0: https://fedoramagazine.org/systemd-template-unit-files/
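For anyone who hasn't used instances: a template unit is named like name@.service, and each instance is name@<instance>.service. A small sketch of expanding a template into a fleet and building one systemctl call for it (the worker@.service name and both helper functions are made up for illustration, and nothing here actually invokes systemctl):

```python
from typing import Iterable, List


def instance_units(template: str, instances: Iterable[str]) -> List[str]:
    """Expand 'name@.service' into 'name@<instance>.service' per instance."""
    prefix, sep, suffix = template.partition("@")
    # A template unit has an empty instance part: '@' directly before '.'.
    if not sep or not suffix.startswith("."):
        raise ValueError(f"not a template unit name: {template!r}")
    return [f"{prefix}@{inst}{suffix}" for inst in instances]


def start_command(template: str, instances: Iterable[str]) -> List[str]:
    """Build (but do not run) a systemctl argv that starts every instance."""
    return ["systemctl", "start", *instance_units(template, instances)]
```

Passing the resulting list to subprocess.run would start the whole fleet in one go, assuming the template unit is installed.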
The biggest systemd fault is one of tooling, and that just comes from a project ambitious enough to try and own its own ecosystem.
With runit/daemontools, you "debug" a service by typing:
/path/to/service/run
With systemd, you run a unit file by copying it into a directory and run some (magic) commands. You need training/internet search to learn those commands. If your unit file doesn't work, you need more training/internet search, but systemd is still so new that your best bet may be to read the systemd source code, or insert a hacky "sleep 30" at the top of your start script and try to race and strace it in another window. Stuff like that.
Want to upgrade your systemd unit? You can't run it alongside an existing version of itself unless you give it a new name, which changes how journalctl can pick up the results. Versioning in the unit name feels wrong, and nobody does this yet, so live upgrades where the unit changes are still broken.
Eventually the tooling will get better, but then we'll have a way to read files, and a way to read systemd files; we'll have a way to run programs, and we'll have a way to run systemd programs; we'll have a way to "test" units, and a side-by-side mode, and so on.
Another way to do this is pam_ssh_agent_auth. Been using it to authenticate to sudo for years on systems that only maintain keys and no passwords.
loginctl enable-linger <username>
However, it doesn't support being the CMD in a Dockerfile, which is why it's not very common in software deployment scenarios in the post-container world.
For older deployments, it may not be worth switching to systemd because the base OS may not be compatible.
So it's kind of a catch-22.
If you are on bare metal, systemd is much preferable to runit/supervisord.
Er, that's only half-true. systemd isn't great for running as PID 1 inside a Dockerfile, but that's because Docker already monitors PID 1[0], and systemd can be used to monitor your container itself.
In other words, think of containers as individual applications that you want to monitor, and systemd can be used either to monitor them or even to run the containers directly. (Yes, systemd can even run Docker containers directly, without Docker![1])
[0] you are using exec mode, right?
[1] https://chimeracoder.github.io/docker-without-docker/#1
I'm referring to pid1 inside the docker container. systemd does not run inside the container as pid1 very easily.
Take a look at this - https://github.com/docker/docker/pull/13525
I think your presentation was about replicating docker functionality using systemd-nspawn... which is pretty cool, but it's not the same as what I'm talking about.
I'm referring more generally to production decisions with docker. Also read this https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zomb...
We are - I'm saying that you don't actually want to run systemd as PID 1 inside a Docker container; the Docker model is built around the container being an application unit, not a system unit.
But if you want to have isolated system(d) units, you can use systemd to get that behavior inside containers. In that case, you'll want to use systemd to run your containers instead of Docker, because systemd's tooling is container-aware (ie, you can have integration between units that run on your host and units that run inside a machine - 'machine' being the systemd term for 'container', in this case).
I know what you are saying - that an atomic unit of work is the program itself. But we run stuff under supervisord even if it is a single program. It helps us to make quick debugging changes to scripts, etc. and "restart" them without restarting the container.
In theory it seems the same - in practice it is not. This is the reason for the existence of tons of different init tools for docker.
BTW, I had trouble understanding what you meant because you are constantly moving from docker-as-an-application-unit concept (which is reasonably true) to systemd-nspawn-is-better-than-docker (which is something I am not generally opinionated about).
Docker, "matured", in past tense? Proof of time travel right there!
ExecStartPre=-/usr/bin/foo
Stack traces are fine and all, but without locals it's often hard to track down the issue.
If you want to only see failures, you can use an OnFailure directive.
In the process, I also discovered sendxmpp, which provides mail(1) for XMPP. It does not support encryption and is written in Perl, so I am building my own.
systemd can also run processes that ignore SIGHUP. But systemd does a lot of things that nohup doesn't do. Please don't attempt to use nohup as a daemon management system for anything but the noddiest of tasks.
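To show how little nohup really gives you: the core of it is marking SIGHUP as ignored before running the command. Here is a rough Python equivalent of that one step (the function name is mine, and this is only a sketch; it does no restarts, no logging, no dependency ordering, none of what a service manager is for):

```python
import signal


def ignore_sighup():
    """Make the current process ignore SIGHUP; return the previous handler.

    Must be called from the main thread, as required by signal.signal().
    """
    return signal.signal(signal.SIGHUP, signal.SIG_IGN)
```

Everything else you would want from a daemon manager still has to come from somewhere else.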
If you're going to use anything, you're probably best choosing from this list:
https://en.wikipedia.org/wiki/Operating_system_service_manag...
shrug
Its setup is way, way too brittle for me, but I guess the QR codes are kind of neat.
[1]: http://www.gnu.org.ua/software/pies/
They also provide examples for a lot of languages, I guess I'm going to try their service.
Good job, Scaledrone!