
Correct or inotify: pick one - emme
http://wingolog.org/archives/2018/05/21/correct-or-inotify-pick-one
======
bcantrill
When implementing inotify for SmartOS to support LX-branded zones, I could
not help editorializing on some of its design defects -- most of which spilled
into the man page.[1]

In the end, I came to the same conclusion as the author: while inotify is much
better than dnotify (and relatively better than misconceived interfaces like
epoll(2)), its flaws remain serious and it would be frustrating to write a
correct application based upon it.

[1] [https://smartos.org/man/5/inotify](https://smartos.org/man/5/inotify)

------
JdeBP
The BSD kqueue has similar problems, in a different event domain. The
semantics of the interface and the mechanics of the data structures mean that
an EVFILT_PROC event with NOTE_FORK will be overwritten by an EVFILT_PROC
event with NOTE_EXIT for the same process, meaning that it is not possible to
always correctly track the children of processes that fork and then rapidly
exit, or of ones that rapidly fork multiple times.

Fixing this would entail letting go of one of the architectural concepts of
the system, either that a filter+ident pair uniquely identifies an event in a
queue or that a process only ever has one event queue. (Letting go of the
latter also entails a further mechanism for polling multiple event queues.)
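
As a rough illustration of the overwrite described above, here is a toy Python model of a queue that holds one pending event per filter+ident pair. It is not the real kqueue implementation: the class `KeyedEventQueue` is hypothetical, and the strings only stand in for the real `EVFILT_PROC`, `NOTE_FORK`, and `NOTE_EXIT` constants.

```python
# Toy model only: this is not how any real kernel implements kqueue.
# The strings below are placeholders for the real constants.
EVFILT_PROC = "EVFILT_PROC"
NOTE_FORK = "NOTE_FORK"
NOTE_EXIT = "NOTE_EXIT"


class KeyedEventQueue:
    """Hypothetical queue holding at most one event per (filter, ident)."""

    def __init__(self):
        self._pending = {}

    def post(self, filt, ident, note, data=None):
        # A later event for the same key replaces the earlier one.
        self._pending[(filt, ident)] = (note, data)

    def drain(self):
        events = [(f, i, n, d) for (f, i), (n, d) in self._pending.items()]
        self._pending.clear()
        return events


q = KeyedEventQueue()
q.post(EVFILT_PROC, 100, NOTE_FORK, data=101)  # pid 100 forks child 101
q.post(EVFILT_PROC, 100, NOTE_EXIT)            # pid 100 exits before we read
events = q.drain()
print(events)  # only the exit survives; child 101 was never reported
```

In this model the watcher never learns that pid 101 exists, which is the tracking gap the comment describes.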

~~~
wahern
I think the issue, ultimately, is buffering. Kernel interfaces that can't
preallocate space at request time are problematic. If you can allocate space
sufficient to store the event(s) at the same time that you request
notification, error handling is easy and obvious from a design perspective.
Unbounded, asynchronous buffering in kernel space is simply a non-starter
unless you don't care about consistency and reliability. Linux developers
generally follow (coincidentally?) the GNU directive[1] of lazy, dynamic
allocation, and it's why Linux interfaces can be so problematic and why
everybody throws up their hands when one raises concerns about OOM handling.

The reason EVFILT_VNODE works where inotify doesn't is precisely because
EVFILT_VNODE requires an open file descriptor, which indirectly behaves as a
preallocated buffer for storing pending events. But just like with signals,
like events are coalesced and ordering can be lost.
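
To make the trade-off concrete, here is a toy Python sketch contrasting one preallocated slot per watch (where like events are OR-ed together) with a shared fixed-capacity queue that drops events once full. Both classes and the flag values are hypothetical and only model the behaviour described above; they are not the real `NOTE_*` constants or any kernel's data structures.

```python
from collections import deque

# Hypothetical flag values for illustration; not the real NOTE_* constants.
WRITE, DELETE, RENAME = 1, 2, 4


class SlotPerWatch:
    """One slot reserved per watch at registration time; events coalesce."""

    def __init__(self):
        self._slots = {}

    def add_watch(self, fd):
        self._slots[fd] = 0  # storage exists before any event arrives

    def post(self, fd, flag):
        self._slots[fd] |= flag  # cannot fail, but count and order are lost

    def drain(self):
        ready = {fd: flags for fd, flags in self._slots.items() if flags}
        for fd in ready:
            self._slots[fd] = 0
        return ready


class BoundedQueue:
    """Shared queue with a fixed capacity; extra events are dropped."""

    def __init__(self, capacity):
        self._events = deque()
        self._capacity = capacity
        self.overflowed = False

    def post(self, fd, flag):
        if len(self._events) >= self._capacity:
            self.overflowed = True  # the reader only learns "something was lost"
            return
        self._events.append((fd, flag))

    def drain(self):
        events = list(self._events)
        self._events.clear()
        return events


burst = [WRITE, WRITE, RENAME, WRITE]
slots = SlotPerWatch()
slots.add_watch(3)
shared = BoundedQueue(capacity=2)
for flag in burst:
    slots.post(3, flag)
    shared.post(3, flag)

slot_result = slots.drain()
queue_result = shared.drain()
print(slot_result)                     # {3: 5}, i.e. WRITE | RENAME
print(queue_result, shared.overflowed)  # first two events, then overflow
```

The slot model always reports that fd 3 saw a write and a rename, but not how many or in which order; the queue model keeps order for what fits and loses the rest.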

I think the most straightforward solution to fork notifications is to _block_
the forking process if the event can't be enqueued. That might be the only
proper solution, period, if you want guaranteed delivery _with_ the new PID.
That creates the potential for deadlocking, but that already exists with other
facilities like file locking, pipes, etc.
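
A minimal user-space sketch of that blocking idea, using Python threads and a queue with room for exactly one pending event. The function `forking_process` and the PIDs are made up for illustration; a real kernel facility would look nothing like this, but the backpressure is the same: the producer cannot proceed until the watcher has made room.

```python
import queue
import threading

# Space for exactly one pending notification, reserved up front.
notifications = queue.Queue(maxsize=1)


def forking_process(child_pids):
    """Hypothetical stand-in for a process that forks several children."""
    for pid in child_pids:
        # put() blocks while the queue is full, so the "fork" cannot
        # complete until the watcher has consumed the previous event.
        notifications.put(("fork", pid))
    notifications.put(("done", None))


producer = threading.Thread(target=forking_process, args=([101, 102, 103],))
producer.start()

seen = []
while True:
    kind, pid = notifications.get()
    if kind == "done":
        break
    seen.append(pid)
producer.join()

print(seen)  # [101, 102, 103]: every child PID is delivered, in order
```

If the watcher stops reading, the producer stalls forever, which is the deadlock risk mentioned above.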

[1] [https://www.gnu.org/prep/standards/standards.html#index-arbitrary-limits-on-data](https://www.gnu.org/prep/standards/standards.html#index-arbitrary-limits-on-data)

------
mkj
John Siracusa has a good description of OSX's equivalent.
[https://arstechnica.com/gadgets/2007/10/mac-os-x-10-5/7/](https://arstechnica.com/gadgets/2007/10/mac-os-x-10-5/7/)

I assume it hasn't changed too much in the past decade?

~~~
fulafel
tl;dr

> The FSEvents framework relies on a single, constantly running daemon process
> called fseventsd that reads from /dev/fsevents and writes the events to log
> files on disk (stored in a .fseventsd directory at the root of the volume
> the events are for). That's it.

> The FSEvents framework, in turn, can only tell its clients, "Something has
> changed in directory /foo/bar/baz."

> Clients of FSEvents are expected to then scan the directory that has changed
> in order to determine what, exactly, happened (assuming they're interested
> in that level of detail).

~~~
eridius
FSEvents is largely just a support infrastructure for Time Machine. Time
Machine doesn't need to know about every single change, it just wants to be
able to avoid having to scan the entire disk every time it goes to make a
backup. This can be generalized to other backup utilities as well, or really
any utility that periodically needs to check the state of the disk against
some snapshot of that state. Under these conditions, "something changed in
directory /foo/bar/baz" is sufficient because the directory itself can then be
scanned.
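
The "compare against a snapshot" step could look roughly like the following Python sketch. It does not use FSEvents at all: it assumes some notification has already said "something changed in this directory", and the helpers `snapshot` and `diff` are hypothetical names for the rescan that follows.

```python
import os
import tempfile


def snapshot(directory):
    """Map each entry name to (size, mtime_ns) for one directory level."""
    snap = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            snap[entry.name] = (st.st_size, st.st_mtime_ns)
    return snap


def diff(old, new):
    """Return (added, removed, changed) names between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(n for n in old.keys() & new.keys() if old[n] != new[n])
    return added, removed, changed


def demo():
    with tempfile.TemporaryDirectory() as d:
        for name in ("a.txt", "b.txt"):
            with open(os.path.join(d, name), "w") as f:
                f.write("one")
        before = snapshot(d)

        # Pretend a coarse "directory d changed" notification arrives
        # after the following edits.
        with open(os.path.join(d, "a.txt"), "a") as f:
            f.write("two")                      # size changes
        os.remove(os.path.join(d, "b.txt"))
        with open(os.path.join(d, "c.txt"), "w") as f:
            f.write("new")

        return diff(before, snapshot(d))


result = demo()
print(result)  # (['c.txt'], ['b.txt'], ['a.txt'])
```

The cost of the rescan is proportional to the size of the one directory that was flagged, rather than to the whole disk.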

~~~
tinus_hn
Just because you don’t know other examples doesn’t mean it only exists for
Time Machine. Many other things are interested in changes to the file system,
for instance programs that sync a folder to a cloud service, Finder, which
wants to show a live view of a folder, or alternative backup applications.

If there were no use for this outside of Apple’s own apps, they wouldn’t have
given it a public API.

~~~
eridius
Many things that are interested in the filesystem would use kqueue instead of
FSEvents, as kqueue can give fine-grained event notices as opposed to
FSEvents's coarse-grained "something changed in this folder".

