
The C standard library function atexit is broken - mulle_nat
https://www.mulle-kybernetik.com/weblog/2019/atexit_is_broken.html
======
gumby
There were two motivations for shared libraries: one no longer applies and the
other arguably was never reliable. So after 35 years of dynamic linking, I went
static-only a couple of years ago (except for system calls, of course).

The first reason was disk space: object files could simply be smaller.
Nowadays, for all intents and purposes, disk space is free and unlimited.

The second was the ability to ship fixes to system facilities. These were
fragile, leading to even more complex methods such as versioning, so though
this was a good idea in principle, in practice it was very painful. I believe
the Windows folks had their own name for it, "DLL Hell". The npm people still
suffer from this from time to time, sometimes notoriously.

As for the fixes: software isn't updated on magtape any more, and most systems
have robust upgrade systems. In addition there are all manner of quasi-
hermetic isolation systems (VMs, docker images, venvs and the like) so why not
just ship a static binary?

Plus, with a static binary, if you really care about the order of
loading/unloading etc. (which ideally you shouldn't), it's trivial to manage
with a custom linker script.

My only exception to this is the kernel: it can be upgraded, and it mostly
promises not to vary system call semantics too much, so I'm OK with not
linking the kernel into my binary :-).

~~~
jcranmer
> The first reason was disk space: object files could simply be smaller.
> Nowadays, for all intents and purposes, disk space is free and unlimited.

It's not about disk space, it's about runtime RAM usage. The code for a .so
is going to be slightly larger than the equivalent .a: at the very least,
you'll need to burn extra space for the PLT and GOT, and PIC relocations tend
to take more space than static-linking relocations.

The advantage of a .so you're talking about is that you can load a library
shared by multiple processes at a fixed virtual address that's the same for
all processes, and thus you only pay for the library once in RAM. With static
libraries, the embedded addresses are going to differ slightly from one
executable to the next, so even magic deduplication isn't going to keep you
from having the library in multiple places in physical memory.

~~~
CamperBob2
RAM is free, too, at least as far as library code size is concerned.

~~~
mikeash
A typical system has dozens or hundreds of processes running even when nothing
is going on. If they all had their own copy of every system library in RAM, it
wouldn’t look so free. Especially on mobile devices.

~~~
zingermc
What if the OS could dedupe pages?

~~~
adrianmonk
You'd need to arrange for the linking process to result in identical (and
identically-aligned!) code pages. If you're going to all the trouble to ensure
that, why not just write it all to a file and call it a shared library?

~~~
bayareanative
Exactly. It's the emperor's new clothes: naively abandoning conventions that
had already solved these problems.

------
avar
The glibc repository has multiple test cases for atexit, it seems a lot of the
current behavior derives from a fix for this bug in 2005:
[https://sourceware.org/bugzilla/show_bug.cgi?id=1158](https://sourceware.org/bugzilla/show_bug.cgi?id=1158)

It seems odd for the author to have invested so much time in this, but not to
have found or patched those glibc test cases to demonstrate the behavior he
expects in the various hairy cases glibc itself is worried about, which form
the basis for its current behavior.

------
kps
The submission title is misleading. The C standard library function atexit()
isn't broken. At worst, some C library implementations are broken (or more
charitably, not C standard compliant) _if_ you also use certain other
facilities that are _not_ C standard library functions.
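For reference, this is roughly what ISO C itself promises, with no dynamic loading involved: a minimal sketch in which handlers registered with atexit() run in reverse order of registration when the program calls exit() or returns from main. The handler and function names are illustrative only.

```c
#include <stdio.h>
#include <stdlib.h>

/* atexit() handlers take no arguments and return nothing. */
static void first_registered(void)  { puts("first registered, runs last"); }
static void second_registered(void) { puts("second registered, runs first"); }

/* Registers both handlers. Returns 0 on success, nonzero if atexit() failed.
 * ISO C guarantees that at least 32 registrations are supported. */
int register_cleanup(void)
{
    if (atexit(first_registered) != 0)
        return 1;
    if (atexit(second_registered) != 0)
        return 1;
    return 0;
}
```

Calling register_cleanup() from main and then returning prints the "second registered" line before the "first registered" line. Everything the thread argues about below (what happens when a handler lives in a dlclose()d library) is outside what the C standard specifies.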

------
eridius
I question what the author was expecting to happen. If bar_exit comes from a
dylib, and that dylib is subsequently unmapped, you can't very well call any
functions from the unmapped library, so there are only 4 possible outcomes I
can see:

1\. The program crashes during termination as atexit() tries to invoke the
previously-registered-but-now-invalid function.

2\. The function is silently skipped as it's no longer loaded.

3\. The function is invoked upon dlclose().

4\. dlclose() does not actually unmap the library.

Of these options, only the 3rd actually seems reasonable. The first two are
obviously bad, and the 4th seems like it rather defeats the purpose of calling
dlclose() if you can't actually unload the library.
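Option 3 can be modeled with a registry that tags each handler with the module that registered it. This is a hypothetical toy (the names toy_atexit, toy_unload and toy_exit are made up), loosely mirroring the dso_handle argument of the Itanium C++ ABI's __cxa_atexit(), which, as I understand it, is how glibc ties an atexit() registration to the shared object that made it.

```c
#include <stddef.h>

typedef void (*exit_fn)(void);

/* Hypothetical toy model: each handler remembers which module registered it. */
struct exit_entry {
    exit_fn fn;
    int module;   /* hypothetical module id; 0 = main executable */
    int live;     /* cleared once the handler has run */
};

#define MAX_ENTRIES 32
static struct exit_entry entries[MAX_ENTRIES];
static size_t n_entries;

int toy_atexit(exit_fn fn, int module)
{
    if (n_entries == MAX_ENTRIES)
        return -1;
    entries[n_entries++] = (struct exit_entry){ fn, module, 1 };
    return 0;
}

/* "dlclose": run only this module's handlers, newest first, and mark them
 * dead so that process exit never calls into the unmapped code. */
void toy_unload(int module)
{
    for (size_t i = n_entries; i-- > 0; ) {
        if (entries[i].live && entries[i].module == module) {
            entries[i].live = 0;
            entries[i].fn();
        }
    }
}

/* Process exit: run every remaining handler, newest first. */
void toy_exit(void)
{
    for (size_t i = n_entries; i-- > 0; ) {
        if (entries[i].live) {
            entries[i].live = 0;
            entries[i].fn();
        }
    }
}

/* Demo handlers that record the order in which they ran. */
static char trace[8];
static size_t trace_len;
static void lib_handler(void)  { trace[trace_len++] = 'L'; }
static void main_handler(void) { trace[trace_len++] = 'M'; }
```

Registering main_handler for module 0 and lib_handler for module 1, then calling toy_unload(1), runs only lib_handler; a later toy_exit() runs main_handler. The handler fires early instead of never, which is exactly the trade-off being debated.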

~~~
grahamlee
4 is not uncommon. For example, as described by Apple, there are so many
things that can stop dlclose() from unloading a library that they considered
making it a no-op. Page 175 of
[https://devstreaming-cdn.apple.com/videos/wwdc/2017/413fmx92...](https://devstreaming-cdn.apple.com/videos/wwdc/2017/413fmx92zo14voet8/413/413_app_startup_time_past_present_and_future.pdf?dl=1)

~~~
eridius
Only reference-counting actually stops dlclose() from unmapping the library,
and that just means you haven't balanced out all the dlopens() (well, you
can't unmap anything linked from the main executable, or anything in the dyld
shared cache either, but those aren't particularly interesting cases). It's
generally a really bad idea to unload a bundle containing Obj-C classes, but
it's still doable if you can guarantee all instances of classes defined in
that bundle have been deallocated (I'm not sure what happens to category
methods, I hope the runtime properly removes those, but it's possible those
would also cause a problem if invoked later). I assume the issue with Swift
classes is identical to the issue with Obj-C classes.

In any case, image unloading is traditionally done with C libraries, not with
Obj-C/Swift, and we're talking about a C function (atexit).

As for making it a no-op, they said only on platforms other than macOS, which
is fine because the other platforms (iOS, watchOS, tvOS) are sufficiently
constrained that there really is no reason to ever dlclose() anyway (there's
barely even any reason to dlopen() besides trying to poke at Apple SPIs; I
think sqlite will dlopen to load extensions, but that's about the only valid
reason that comes to mind).

------
benmmurphy
I don't see how the behaviour he wants can work. If you dlclose a library and
cause it to be unmapped, then the function pointers into that library are
invalid and can't be called at exit. Either dlclose will have broken behaviour
by keeping libraries mapped even though all their dlopen references have been
dropped, or atexit will be broken.

~~~
kazinator
POSIX doesn't require _dlclose_ to forcefully unmap a library. If _atexit_
has been called with an address pointing into a shared library, that can
simply be treated as a reference to the library which prevents it from being
unmapped. How could it be unmapped? It's registered to be called when the
process terminates.
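The pinning idea can be sketched as a hypothetical reference-counting toy (struct toy_module and the toy_* functions are made up, not any real loader's API): an atexit() registration pointing into the module counts as a reference, so dlclose() only unmaps once both counts reach zero.

```c
#include <stdbool.h>

/* Hypothetical toy model of a loaded library: dlopen() calls and pending
 * exit handlers both count as references that keep it mapped. */
struct toy_module {
    int dlopen_refs;
    int atexit_refs;
    bool mapped;
};

void toy_dlopen(struct toy_module *m)
{
    m->dlopen_refs++;
    m->mapped = true;
}

/* Registering an exit handler that points into the module pins it. */
void toy_register_exit_handler(struct toy_module *m)
{
    m->atexit_refs++;
}

/* Drops one dlopen reference, but unmaps only when nothing, including a
 * pending exit handler, still refers to the module. */
void toy_dlclose(struct toy_module *m)
{
    if (m->dlopen_refs > 0)
        m->dlopen_refs--;
    if (m->dlopen_refs == 0 && m->atexit_refs == 0)
        m->mapped = false;
}
```

With this policy a balanced dlopen()/dlclose() pair still leaves the module mapped if it registered an exit handler, which is outcome 4 from earlier in the thread, made deliberate.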

------
gok
I believe this was changed everywhere because glibc maintainers decided that
it made more sense to run at library unload time, and then everyone else had
to change their implementation to match "what Linux does."

That being said, none of these behaviors really make any sense. If the
registered function isn't mapped into the process address space any more, what
is supposed to happen?

~~~
AnimalMuppet
Seems to me that what's "supposed to happen" is "make it so the situation
can't happen". That is, either make it so that you cannot unload the library
if some function is registered with atexit, or else make it so that unloading
the library unregisters the atexit function.

But of course, that can't happen, because you can have N function calls
between the registered atexit function and the call to the library, and
deciding if any calls to the library are anywhere in the graph _and will be
called_ is probably equivalent to the halting problem.

I agree that none of the possibilities are what we actually want.

~~~
gok
Bonus: what happens when the function you register is JIT'd, so there's no
unregistration code path when it's unloaded?

~~~
loeg
JIT is well outside the realm of the C standard.

~~~
jcelerier
Just like shared libraries.

~~~
saagarjha
So is atexit: it's part of POSIX, just like the rest of the dl* APIs.

~~~
loeg
POSIX extends standard C. atexit() is part of standard C89.

------
mrpippy
I seem to remember a WWDC session from a few years ago (the one where dyld 3
was first introduced) where they said that dlclose() is now essentially a no-
op on macOS, since it’s rarely done, the benefits are few, and the possible
problems are many.

Edit: a no-op dlclose() was something under consideration for dyld 3 on
everything but macOS.
[https://devstreaming-cdn.apple.com/videos/wwdc/2017/413fmx92...](https://devstreaming-cdn.apple.com/videos/wwdc/2017/413fmx92zo14voet8/413/413_app_startup_time_past_present_and_future.pdf?dl=1)

~~~
saagarjha
dlclose on iOS still does garbage collection of images if possible, FWIW.

------
barrkel
There are two distinct concepts that get conflated for statically linked
executables: cleanup on process death vs. cleanup on module death. Is the
function meant to clean up the process, or to clean up the module?

I think the argument for module cleanup is stronger, irrespective of the
difficulty of registering code that's supposed to last longer than the calling
address space. The module had nothing before it was loaded, it should leave
nothing behind when it is unloaded. It's symmetrical.

The difficulty of providing an executable callback that outlives its module
just seals the deal.
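Module-lifetime hooks of this kind already exist as a GCC/Clang extension (not ISO C): functions marked `__attribute__((constructor))` and `__attribute__((destructor))` run when their containing module is loaded and unloaded. A minimal sketch follows; the function names are illustrative. In the main executable the hooks run around main(); in a shared library they run at load time and when dlclose() actually unloads it.

```c
#include <stdio.h>

/* GCC/Clang extension: tied to the lifetime of the module that contains
 * this code, not to the process as such. */
static int module_ready;

__attribute__((constructor))
static void module_init(void)
{
    module_ready = 1;   /* acquire module-owned resources here */
}

__attribute__((destructor))
static void module_fini(void)
{
    module_ready = 0;   /* release them here, before the code is unmapped */
    fputs("module cleanup ran\n", stderr);
}

int module_is_ready(void) { return module_ready; }
```

Because the destructor lives in the same module as the code it cleans up, it can never outlive that code, which is the symmetry argued for above.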

~~~
zwetan
It should definitely clean up the module.

You could make the same argument with threads.

Main thread vs. child threads: you would expect atexit handlers installed in a
child thread to run before the handlers installed by the main thread.

~~~
barrkel
If it were thread local, sure, but they'd need access to a module cleanup
function too, as there's no necessary tie between thread lifetime and module
lifetime.
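For the thread-local side, POSIX does offer a per-thread cleanup hook: the destructor passed to pthread_key_create() runs when a thread exits while holding a non-NULL value for that key. A minimal sketch, assuming a POSIX system (compile with -pthread on older toolchains); run_thread_with_cleanup and the other names are illustrative. Note that, as barrkel says, this hook is tied to the thread and says nothing about the lifetime of the module the destructor lives in.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_key_t cleanup_key;
static int destructor_runs;   /* only read after pthread_join() */

/* Called in the exiting thread for each non-NULL value bound to the key. */
static void thread_cleanup(void *value)
{
    (void)value;
    destructor_runs++;
}

static void *worker(void *arg)
{
    /* Any non-NULL value arms the destructor for this thread. */
    pthread_setspecific(cleanup_key, arg);
    return NULL;   /* returning acts like pthread_exit(), running destructors */
}

/* Starts one thread, waits for it, and returns how many times the
 * per-thread destructor ran, or -1 on error. */
int run_thread_with_cleanup(void)
{
    static int token = 1;
    pthread_t t;

    if (pthread_key_create(&cleanup_key, thread_cleanup) != 0)
        return -1;
    if (pthread_create(&t, NULL, worker, &token) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    pthread_key_delete(cleanup_key);
    return destructor_runs;
}
```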

------
smcameron
This reminds me of a strange bug I encountered once on ia64. We had a custom
library that we loaded with dlopen(), and then called an initialization
function within this library. Unbeknownst to me at the time, this function
created a thread which did something, and then almost immediately went to
sleep for a few seconds. Also unbeknownst to me, there was a function that
you were supposed to call before dlclose() to kill this thread. If you
dlclosed the library without previously having called this function to stop
the thread, when the thread woke up, all its code would of course be gone, and
the program segfaulted, leaving a core file. Interestingly, if you ran gdb on
the core file, gdb segfaulted! Presumably because the thread didn't have any
corresponding code in the core file either. Took a while to figure that one
out. Curiously, on x86, all this seemingly worked fine (maybe dlclose
implicitly killed the thread on x86? ... never figured out why it didn't crash
on x86 but did on ia64.)

------
_rtld_global_ro
It's actually quite hard to do exit cleanups, even by putting functions into
`.fini_array`. The reason is that some signals (notably `SIGKILL`) cannot be
caught or blocked: when a thread/process (`task_struct`) gets `SIGKILL`-ed,
there's no way for the application to run any functions (including those in
`.fini_array`). Things get more complicated when a thread calls `exit_group`
(which is also the recommended way to exit): all other threads in the same
thread group receive `SIGKILL`, so none of them can run any code afterwards.
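The catchable/uncatchable split can be seen with a minimal POSIX sketch: sigaction() accepts a handler for SIGTERM, but POSIX requires it to fail (with EINVAL) for SIGKILL, so no user code at all runs on a `kill -9`. The function names here are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t cleanup_requested;

/* Keep the handler async-signal-safe: just set a flag and let normal code
 * (which may then call exit() and hence the atexit handlers) do the work. */
static void on_signal(int sig)
{
    (void)sig;
    cleanup_requested = 1;
}

/* Returns 0 if a handler was installed for sig, -1 otherwise.
 * Fails for SIGKILL and SIGSTOP, which cannot be caught. */
int install_cleanup_handler(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}

int cleanup_was_requested(void) { return cleanup_requested; }
```

After install_cleanup_handler(SIGTERM), raise(SIGTERM) sets the flag instead of killing the process; install_cleanup_handler(SIGKILL) returns -1.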

~~~
ryanpetrich
There's also the matter of someone forcibly pulling a machine's power cord. In
some sense exit cleanups are fundamentally unreliable.

~~~
verall
A significant amount of code can be run after a power cord pull is detected
but before the big caps discharge on most boxes. But yeah, no userspace
program is going to get those cycles...

