Hacker News new | past | comments | ask | show | jobs | submit login

> It's mentioned that a patch wasn't accepted, but it was with good reason - it doesn't fix the problem (by traversing the directory).

The patch can not fix the problem: upstart can't scan the directory (or ignore the overflow, or handle it however it would) because libnih blows up without yielding to upstart. It does not call a NihWatch handler and does not return an error code.

So the first step is to have libnih at least not blow up (which the patch did), ideally notify the user (which the patch didn't). The purpose of the patch was not to fix the issue, it was to fix libnih's broken handling of inotify.

If I was using this system I would say that it isn't up to libnih to fix the problem; inform init and let init do a complete config file(s) reload. This is very similar to the way the kernel handles interrupts; when there aren't many it handles hardware in default interrupt mode (e.g. one interrupt per packet), but when the rate of interrupts gets too high it moves to non blocking synchronous polling (e.g. keep iterating through packets until a buffer fills/reschedule or the input queue empties).

> If I was using this system I would say that it isn't up to libnih to fix the problem

It isn't up to libnih to handle the overflow, but the problem right now is in libnih itself[0] and only libnih can fix that.

[0] and is twofold: libnih blows up on inotify overflow instead of not blowing up on a relatively normal (if rare) situation, and consequently (and obviously) libnih doesn't provide any way for its caller to handle the overflow.

That's true. What upstart could have done, though, is to watch for config changes in a separate process, so that it does not need to have an in-process dependency on something like this.

That would give upstart a way of handling the overflow and other (unknown) potential problems: rescan the config if the config watcher dies (after respawning the watcher).

This seems like a reasonable solution. And one that sandboxes failures in that code path to prevent PID 1 from crashing. It's a shame this is the only comment (that I've seen) even mentioning this.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact