

Linux fork detection using thread specific keyrings - daurnimator
http://daurnimator.com/post/120415954844/linux-fork-detection-using-thread-specific

======
geofft
This is pretty amazing. I can't decide if the right response is "<3" or
"sigh", but it's one of those.

Is this the best way you know of? I'm reminded of how the best way to answer
"is this process single-threaded" on Linux is apparently to stat
/proc/self/task, which I suggested satirically to a friend but turned out to
be the closest thing to a right answer.
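
For reference, a minimal sketch of that check in C. It lists `/proc/self/task` (one entry per thread) instead of calling stat on it, because the exact link-count semantics of that directory are an assumption I would rather not rely on here. The function name `count_threads` is hypothetical.

```c
#include <dirent.h>
#include <stddef.h>

/* Count the threads of the calling process by listing /proc/self/task.
 * Returns the number of threads, or -1 if the directory cannot be opened
 * (for example when /proc is not mounted). Linux-specific. */
int count_threads(void) {
    DIR *dir = opendir("/proc/self/task");
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Skip "." and ".."; every other entry is a thread ID. */
        if (entry->d_name[0] != '.')
            count++;
    }
    closedir(dir);
    return count;
}
```

A caller would treat `count_threads() == 1` as "single-threaded right now"; the answer can go stale as soon as another thread is created.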

~~~
geocar
You can tell (without a syscall!) if between two points there has been a fork
or thread created by maintaining two counters: One in a MAP_SHARED and one in
MAP_PRIVATE.

Every check, increment both counters, then compare them. If they disagree,
then one of those pages has been copied (the MAP_PRIVATE) and one has not
(MAP_SHARED).

~~~
geofft
I don't think that algorithm works, but I'm probably misunderstanding:

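        #include <stdbool.h>
        #include <stdio.h>
        #include <sys/mman.h>
        #include <sys/wait.h>
        #include <unistd.h>

        char *shared, *private;

        /* Increment both counters; report whether they now disagree. */
        bool check_fork(void) {
            return ++*shared != ++*private;
        }

        int main(void) {
            /* Assumes anonymous one-page mappings. */
            shared = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
            private = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            *shared = 0;
            *private = 0;
            check_fork();
            if (fork() == 0) {
                printf("%d\n", check_fork());
            } else {
                wait(NULL);
            }
        }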

The only way it could work is if you were guaranteed to also call check_fork
synchronously in the parent. But if you had that degree of control over forks,
then you just know when your forks are and you don't need these tricks.

The use case here is in an external library (e.g., OpenSSL) where it might be
called post-fork by a semi-naive user of the library, or where the library
might explicitly want to be usable post-fork (e.g., Apache prefork serving)
and need to adjust some state. If you expect the caller to reliably inform the
library of forks, then you don't need any trickery.

Another use case that I'm interested in is signal handlers, which could be
delivered immediately post-fork. If your handler does something like write to
a file descriptor (which is very common), those file descriptors mean
something different in the parent and the child, so if the post-fork code in
the child was in the middle of switching out file descriptors before exec, you
might accidentally write to the wrong thing. So you want to be sure that
you're still in the same process before writing to a particular file
descriptor. UNIX doesn't offer a way to reset signal handlers on fork,
unfortunately.
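
One simple way to sketch the "am I still in the same process" check is to record the PID when the handler is installed and compare it against `getpid()` inside the handler. This is not the keyring technique from the post, just an illustration of the guard; `owner_pid`, `log_fd`, `in_owner_process`, `on_signal` and `install_handler` are hypothetical names, and a child created by `fork()` would simply skip the write.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

pid_t owner_pid;   /* PID of the process that installed the handler */
int log_fd = -1;   /* descriptor the handler writes to */

/* getpid() is async-signal-safe, so this may be called from a handler. */
bool in_owner_process(void) {
    return getpid() == owner_pid;
}

/* Write a short message, but only in the process that owns log_fd. */
void on_signal(int signo) {
    (void)signo;
    if (!in_owner_process())
        return;
    static const char msg[] = "signal\n";
    ssize_t n = write(log_fd, msg, sizeof msg - 1);
    (void)n;  /* nothing useful can be done about a failed write here */
}

/* Record the owning PID and descriptor, then install the handler. */
int install_handler(int signo, int fd) {
    struct sigaction sa = {0};
    owner_pid = getpid();
    log_fd = fd;
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

The guard only helps if nothing re-runs `install_handler` in the child before the signal arrives, which is an assumption of this sketch.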

~~~
agwa
The algorithm works, though it doesn't distinguish between the parent and
child - the first process to call check_fork() after the fork will return
false, and the second process to call it will return true. (Note that upon
returning true, it should remap and reset the state.) This is sufficient for
the use case of reseeding a PRNG, though not for your signal handler/file
descriptor use case.

The problem with the algorithm (which I'm discussing on another subthread) is
that the increment of the shared counter needs to be atomic, and the integers
need to be 64 bits to avoid wraparound, and I'm not sure that all
architectures support atomic 64-bit increments.

Edit: I just realized that check_fork() would spuriously return true in the
first process after the second process calls it, which isn't so nice. Maybe
this can be prevented by comparing before you increment, but then I think
there might be a race condition? It would be helpful if geocar could provide
some pseudocode so we're not left guessing how the algorithm works.

~~~
geofft
Ah, gotcha, it works for a different use case from mine (but a valid one). And
yes, I'd like to see code using well-defined atomic operations and orderings /
barriers.
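
A minimal sketch of the two-counter check with C11 atomics, under stated assumptions: this is not geocar's code, the names `shared_ctr`, `private_ctr`, `fork_check_init` and `fork_check` are hypothetical, the mappings are anonymous, and the default sequentially consistent ordering is used. On targets without native 64-bit atomics the compiler may fall back to a library implementation.

```c
#define _DEFAULT_SOURCE
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

_Atomic uint64_t *shared_ctr;  /* lives in a MAP_SHARED page */
uint64_t *private_ctr;         /* lives in a MAP_PRIVATE page */

/* Map fresh pages for both counters and zero them. Returns 0 or -1. */
int fork_check_init(void) {
    void *s = mmap(NULL, sizeof(uint64_t), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (s == MAP_FAILED)
        return -1;
    void *p = mmap(NULL, sizeof(uint64_t), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        munmap(s, sizeof(uint64_t));
        return -1;
    }
    shared_ctr = s;
    private_ctr = p;
    atomic_init(shared_ctr, 0);
    *private_ctr = 0;
    return 0;
}

/* Increment both counters and compare. A mismatch means some other
 * process has incremented the shared counter, so a fork has happened.
 * On mismatch, drop the old pages and start over with fresh ones. */
bool fork_check(void) {
    uint64_t s = atomic_fetch_add(shared_ctr, 1) + 1;
    uint64_t p = ++*private_ctr;
    if (s == p)
        return false;
    munmap((void *)shared_ctr, sizeof(uint64_t));
    munmap(private_ctr, sizeof(uint64_t));
    if (fork_check_init() != 0)
        abort();  /* the sketch has no way to continue without counters */
    return true;
}
```

As agwa describes, this does not say which side of the fork you are on: the first process to call `fork_check()` after a fork sees matching counters, and the second sees the mismatch. The first process will also see a mismatch on its next call, once the other has bumped the shared counter.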

------
bjackman
Sorry to go meta but I love how concise this post is. "I did a clever thing,
it solves problem X. Here's the code."

------
xroche
Stupid question: shouldn't it be KEY_SPEC_PROCESS_KEYRING rather than
KEY_SPEC_THREAD_KEYRING ?

~~~
daurnimator
> Stupid question: shouldn't it be KEY_SPEC_PROCESS_KEYRING rather than
> KEY_SPEC_THREAD_KEYRING ?

Not a stupid question.

It doesn't really matter; the distinction between 'thread' and 'process' on
Linux is very thin: `CLONE_THREAD` and `CLONE_VM` are independent.

I chose to use `KEY_SPEC_THREAD_KEYRING` to err on the safe side. In the back
of my mind I also considered that thread-local would be 'more' local than
process-local, and hence might be faster inside the kernel (but this is
unverified).

------
geocar
Another option is to store the seed in a MAP_SHARED segment.

~~~
daurnimator
> Another option is to store the seed in a MAP_SHARED segment.

I assume you mean MAP_PRIVATE?

But that doesn't work. MAP_PRIVATE pages are copied to a fork.

~~~
geocar
No, I meant MAP_SHARED. That way the seed is shared across all forks and will
only cause problems if the PRNG is being used as a cipher.

However there's another trick. If you have two counters: One in MAP_PRIVATE
and one in MAP_SHARED you can detect the unsafe scenario by incrementing each
counter and verifying lockstep.

If the counters mismatch, you've been forked: unmap and map new pages, and do
whatever post-fork "reinitialisation" that is required.

~~~
agwa
That doesn't work because there's a race condition - if both processes
increment the MAP_SHARED counter at the same time, they could end up seeing
the same value and not think that a fork has occurred.

~~~
geocar
Use an atomic increment (e.g. lock xadd).

~~~
agwa
That's architecture-specific, which isn't nice.

Also, the counters would need to be 64 bits so that wraparound isn't a
concern. Do all architectures even have an atomic 64-bit increment?

~~~
geocar
What, and linux isn't architecture-specific?

The counters don't need to be 64-bits: Programs don't need a random number in
one process and the 4 billionth only.

~~~
agwa
> What, and linux isn't architecture-specific?

I have no clue what point you're trying to make.

> The counters don't need to be 64-bits: Programs don't need a random number
> in one process and the 4 billionth only.

If this is going to be used for a CSPRNG, you have to take into account
adversarial conditions. A 32-bit counter leaves insufficient safety margin:
it's conceivable an attacker could induce an application to generate so many
random numbers that the counter wraps around, causing another process to
generate a duplicate random number. It may seem unlikely, but time after time
security researchers have demonstrated that attacks previously considered
"unlikely" are possible.

~~~
geocar
No, a 64-bit _atomic_ add isn't required. Smaller atomic adds can be used just
fine with a double-wide number.

