> I should point out that this is not theoretical. I went through all of the above because some real machines hit this for some reason. I don't have access to them, so I had to work backwards from just the message logged by init. Then I worked forwards with a successful reproduction case to get to this point. I have no idea what the original machines are doing to make this fire, but it's probably something bizarre like spamming /etc with whatever kinds of behavior will generate those inotify events libnih asked to see.
I would argue that it was almost certainly a pathological use case, as I suggested above. That is, something was already broken, and it triggered this crash; had it been left to its own devices, it probably would have triggered some other bad behavior eventually (disk full, OOM killer, etc.; there are many possibilities). I can't know that for sure, of course, since she doesn't have the details on what caused these inotify events on such a massive and rapid scale, but I'm having a hard time imagining why /etc/init would be receiving thousands of events, short of something already being badly broken.
I do testing on such devices myself, and on the one hand I follow the mantra: "The art of testing is to make border cases possible and not to assume that they will not happen."
On the other hand, I also have to deal with safety-related work, where the rule is: "The safety of the system must be maintained under any circumstances, including while the system has failures." The point is to maintain human safety: a crane, for example, should _always_ work within its limits, even when failed sensors provide misreadings.
The same applies here: even when a certain service is going wild, system integrity and function must be maintained. Ignoring this fact on the assumption that the cause is something else is, to me, just a general disregard for quality work.
That said, part of safety-related development is covering any theoretically possible behavior, because not doing so leads to systematic failures that decrease overall system safety. A known gap of this kind will prevent certification by the relevant authorities, such as the FDA for medical equipment, Lloyd's for ships, or TÜV for off-road vehicles.
In the end, knowing that such bugs are simply dismissed with such blatant arguments fuels the image of bad software quality.
Perhaps the repro seems pathological. But fix this issue, and you may well have fixed a whole bunch of other issues that are not so pathological. Certainly, just touching files should never force the system to reboot!