The C standard library function atexit is broken (mulle-kybernetik.com)
99 points by mulle_nat 27 days ago | 99 comments

There were two motivations for shared libraries; one no longer applies and the other was arguably never reliable. So after 35 years of dynamic linking, a couple of years ago I went static only (except for system calls, of course).

The first reason was disk space: object files could simply be smaller. Nowadays, for all intents and purposes, disk space is free and unlimited.

The second was the ability to ship fixes to system facilities. These were fragile, leading to even more complex methods such as versioning, so though this was a good idea in principle, in practice it was very painful. I believe the Windows folks had their own name for it, "DLL Hell". The npm people still suffer from this from time to time, sometimes notoriously.

As for the fixes: software isn't updated on magtape any more, and most systems have robust upgrade systems. In addition there are all manner of quasi-hermetic isolation systems (VMs, docker images, venvs and the like) so why not just ship a static binary?

Plus, with a static binary, if you really care about the order of loading/unloading etc. (which you ideally shouldn't), it's trivial to manage with a custom linker script.

My only exception to this is the kernel: it can be upgraded, and mostly promises not to vary system call semantics too much, so it's OK to me not to link the kernel into my binary :-).

> The first reason was disk space: object files could simply be smaller. Nowadays, for all intents and purposes, disk space is free and unlimited.

It's not for disk space, it's runtime RAM usage. The code for a .so is going to be slightly larger than the equivalent .a. At the very least, you'll need to burn extra space for the PLT and GOT, and PIC relocations tend to take more space than static-linking relocations.

The advantage of a .so you're talking about is that you can load a library shared by multiple processes at a fixed virtual address that's the same for all processes, and thus you only burn one copy of the library in RAM. With static linking, the embedded addresses are going to differ slightly from binary to binary, so even magic deduplication isn't going to keep the library from occupying multiple places in physical memory.

To your first point: iOS takes this idea to its logical conclusion and it seems to be an important aspect of iOS’s performance. https://iphonedevwiki.net/index.php/Dyld_shared_cache

icache, TLB entries, and even L1 are much more limited than main memory, especially on embedded devices. By linking everything into a single object, iOS can save on wasted padding and use superpages to map the shared library into each process in a TLB-efficient way.

macOS as well.

Modern shared libs load at different addresses in different processes. For that to work, they have to be compiled with relocatable code generation, and mapped read-only.

Ancient Linux distributions once had something called "a.out shared libs". These were mapped to fixed addresses. The distribution had to work out non-conflicting addresses for all available libraries. The build procedure for a.out shared libs involved intercepting the assembly output of GCC and doing some text processing on it.

A modest proposal: let's use only static libraries (plugins aside). Let's help KSM efficiently merge the R/X, and the R/O memory pages of the static libraries of running processes. To maintain the current performance, let's give KSM a hint: the hint would be a string of standardized, canonical library pathname, including name and version number.

During the (static) linking of libraries, an extra metadata item would be stored in the ELF file: the canonical pathnames of all the static libraries used, along with base addresses of R/X segments (code) and R/O segments (r/o data). Upon forking a process, the information would be provided to kernel, and passed over to the KSM, to use as basis of same-page lookup and possible merging of the pages.

This effectively inverts the current mechanism. Currently when libraries are dynamically loaded, the dynloader uses actual file pathname of the shlib as the key for operation opposite to "merging" - i.e., re-uses the already loaded R/X and R/O pages, by adding proper memory mapping.

The proposed change is three-fold:

  1) extend the linker to add the metadata to ELF,

  2) extend in-kernel ELF interpreter to extract the info upon exec() and friends,

  3a) extend the KSM with a limited mode, where it would look up & merge only the hinted memory regions, in linear fashion, right after an exec() & friends.
An alternative to 3a), to avoid fussing about with KSM:

  3b) modify the VM subsystem to extend current swap/SHM so it provides an unique address range for each static lib canonical pathname. Requesting pages from this address range would map from the shared pages, if any process already loaded one such.
To handle adversarial processes on single machine, a further extension where crypto signatures are checked is possible.

RAM is free, too, at least as far as library code size is concerned.

A typical system has dozens or hundreds of processes running even when nothing is going on. If they all had their own copy of every system library in RAM, it wouldn’t look so free. Especially on mobile devices.

What if the OS could dedupe pages?

You'd need to arrange for the linking process to result in identical (and identically-aligned!) code pages. If you're going to all the trouble to ensure that, why not just write it all to a file and call it a shared library?

Exactly. Emperor's new clothes: naively abandoning conventions that already solved these problems.

On the fly memory dedupe is incredibly expensive. Especially compared to just not duplicating in the first place.

Also, it's quite likely that static linked libraries wouldn't be located at the exact same page offsets every time they're linked in to a binary, making page-level dedupe not useful.

What if the OS could map a shared object that resides in one section of memory into the virtual address space of multiple processes?

Do any OSes actually do this, though? If not, this is academic until one actually does.

Linux does have something called "KSM" that de-dupes memory pages, but it appears that it's only so a hypervisor can de-dupe guest OS pages? Odd that it's not more general than that. Though I guess there is a cost: the KSM code needs to scan through memory to find duplicate pages, which I can't imagine is fast.

Would not work with any kind of LTO.

Meh, not so much. The processes will only page in the code they actually use.

Shared libraries are a goofy hack dating back to a time when people mistakenly thought they needed them, and they need to go away.

Shared object libraries are incredibly useful because they save memory and CPU resources. Cloning the process space during a fork(2) call, initializing structures, then context switching and whatnot is incredibly expensive.

Shared libraries can be expensive as well; dynamic symbol lookup can be a lot more expensive than people assume.

Correct. Shared libraries trade away CPU for memory. Even with some benefits in terms of cache warmth, the extra indirection and loader/VM complexity make shared libraries a net loss in terms of CPU.

>Cloning the process space during a fork(2) call, initializing structures, then context switching and whatnot is incredibly expensive.

Err, we're running Electron apps today, with a full-blown embedded browser engine, a VM, and a hellishly complex DOM for rendering. Others run apps inside containers, with their own basic OS and standard libs -- compared to that, the above is negligible.

> we're running Electron apps today

Some of us try not to on the belief that they don't perform acceptably.

You can usually avoid this problem by using the web version of Slack, Discord, Rocketchat, etc. A single browser running everything is obviously more efficient than running ten.

Nope. A strawman doesn't justify anything related to the topic. fork() duplicates everything, which may be overkill, which is why pthreads and vfork+exec also exist.

That's not why pthreads exist, and vfork barely does. Pthreads exist to facilitate parallel operation (hence the name) on shared data, not as an efficiency measure. As for vfork, it has been deprecated since every UNIX implementation started marking pages as copy-on-write instead of actually copying. I was doing some of that work myself in 1990-91, so it's hardly new.

The real problems with static linking don't have to do with fork, except in the sense of increasing reliance on VM overcommit (which IMO is a bad idea already). It's more to do with masking reuse of the same library across unrelated processes, as an efficiency issue but even more importantly in terms of tracking dependencies and updating software. No matter how good your configuration management (or similar mechanism) is, rebuilding large-N statically linked executables is less efficient and more error-prone than using the same information to update small-N shared libraries.

You are working on one of these electron-style apps, right?

(Shrug) RAM is there to be used.

I don't think we need to get into my optimization street cred, but suffice it to say I've never used Electron and couldn't really tell you exactly what it is.

Resources aren't infinite. It's arrogant and wasteful to be lazy and resource greedy.

Electron is an embedded web browser meets cross-platform application platform. It's bloated, slow, and awful; it makes Java look snappy, lightweight, and brilliant.

I don't get it. You're using more RAM for literally no benefit. The end result is exactly the same. There isn't even a productivity benefit in the vast majority of cases because a huge chunk of software is packaged by the OS to use shared libraries by default.

Avoiding DLL hell is certainly a benefit. It seems likely that few participants in this thread besides myself have ever actually had to support end users.

You would be surprised to learn that for very large binaries (think monolithic front-end codebases at large internet companies), code size is actually enough to put pressure on instruction caches and lead to paging!

It used to be free, until people thought it was free to waste.

RAM is free as long as you don’t have to pay the price. But it does have a price.

> a couple of years ago I went static only

You mean for your code, deployed on your systems. Yeah, that works fine.

Now ship a binary to a client system with your static TLS implementation, JPEG decoder, LZ decompressor, whatever. And then go hire someone to watch the CVE feeds every day to know when you have to get them to install an upgrade.

Or, if you don't want it to be your fault when they get pwned, you could just link against the system libraries and tell them to stay updated.

"Static only" makes sense for web apps developed and deployed within the same organization, for embedded solutions where the whole update/upgrade process is known to be under the control of the developer, and... basically nowhere else.

This advice is bad, sorry.

Not the parent, but I think dynamic linking with system libraries is the only time it makes good sense. Otherwise you're just asking for trouble.

The GP argues that it should be abolished everywhere, even in the cases where it makes good sense.

Of course, you need to make sure that your system actually provides up-to-date implementations of these libraries.

Still a much better bet than that you will.

Do you play any Steam games, or most other modern games? Yeah, those statically compile almost everything.

This example hurts the case, it doesn't help. Windows deployment, especially of C/C++ apps, is indeed a problem. And for decades it was absolutely routine that random Windows binaries turned out to be subject to attacks against libraries like zlib or libjpeg which had long since been patched in Linux distributions but lived on statically in exploitable blobs.

Things have in fact gotten much better over the past years, though. Largely because developers (even of games) have moved away from this architecture and onto managed runtimes like Java and .NET where the runtime and system provide cleaner separation of dependencies and don't require the "static all the time" nonsense that was the norm since the beginning of PCs.

Shared libraries by themselves are not the problem here, so I’m not sure why you bring them up.

The problem described in the article appears when you dynamically load/unload shared libraries. This only comes up in the context of plugin systems, or sometimes hot code reloading, and in those cases there is no alternative; there is nothing you can do with static binaries to emulate this behaviour.

I want to live in this world where disk space is free and unlimited, but it isn't and has never been.

A primary goal that you don't really cover is conserving not just disk space, but memory. The disk space saved is a proxy for memory savings, which are (even more) valuable. Shared libraries linked in to multiple independent binaries can consume little to no additional memory due to the magic of virtual memory. The same does not hold true for static-linked libraries in multiple binaries.

Honest question: have you ever run into a situation where reducing binary sizes dramatically improved either performance or disk-space utilization?

The reason I ask is, even as "bloated" as software is these days, I could fit ~100,000 fat 10 megabyte Go binaries on my SSD, without bothering to reach for compression. In practice, binaries on disk always pale in comparison to the amount of disk space I spend on media files, caches, backups and imaging, etc.

Perhaps your concerns about virtual memory are true, but looking at top on a machine running Firefox with many tabs open, Discord (an Electron-based application), 3 GNOME Terminal windows, HexChat, Wine, and Telegram... I still don't crack 100 tasks. Still.*

How about a server environment? Well, saving memory is certainly valuable on a server, but I would fail to imagine a scenario where you want so many different distinct binaries running that it would make a major difference in utilization. The shared memory savings that come from dynamically linked libraries are effectively removed when using Docker because each Docker container is going to have its own isolated system libraries and many of them probably won't be the same anyways, and yet I've never had any Kubernetes system where the majority of memory usage came from binary sizes.

I suppose if every single system binary was statically linked, you would maybe be able to notice some kind of difference. But honestly, probably not, unless you were really searching. Usually disk usage of binaries is a much less important concern than CPU and RAM usage, especially across large fleets.

*Though, that was only checking under my local user; it does turn out that there are a fair bit more tasks, but a good amount of them are actually kernel tasks. There's 400 counting kernel tasks, but only 191 not counting kernel tasks. Still not a very large number, in my opinion.

That world is here. Though people run out of space same as ever, if you took away movies and videos, they wouldn't know what to do with all that space.

To be fair, DLL Hell has a lot more issues than just versioning or registering the unload handler.

First, there’s load order which can mean that you’re not actually loading the DLL you think you are.

The second is WOW64 (Windows on Windows). Rather than using fat binaries like Linux and macOS, Microsoft decided on a "brilliant" strategy where the file system lies to you in the "right" ways, so that old 32-bit folders are moved to new paths and the old 32-bit-named folders hold 64-bit binaries. This means that unless you use the "I really mean it" escape mechanisms (e.g. the "sysnative" directory), 32-bit binaries on 64-bit Windows are actually loading DLLs from different paths than the code says.

I hack on the Android operating system. Without shared libraries, the system would fall over and die pretty much instantly, and Zygote helps only a little. Static linking the world is a luxury one can only afford in cornucopian backend environments where one can just throw money at resource limitations. On mobile, we have to work with the hardware we have. Economy is still important --- that's why I love it. Constraints breed creativity.

* Speaking in my own capacity

As a side note, Zygote is gone as of Android Q.

You missed the third one, which is used a lot in desktop software and application servers: plugins and dynamic configuration in production.

In the name of stability, one may say to run external processes with shared memory instead, but now imagine the resource usage of something like IntelliJ or Eclipse using only processes for their plugins.

Wouldn't this be analogous to web browsers running each tab in its own process? I don't run IntelliJ or Eclipse - do people generally have more plugins than they'd have browser tabs?

It kind of is, but do you happen to have around 300 tabs or more open?

And on an 8GB machine, which most businesses still use nowadays.

In this kind of desktop software, each feature can be its own plugin: menu entries, UI widgets, integration with external tools, database drivers, ….

Now take that to the extreme where every class could be a possible plugin, and even 300 would be a low bar.

IDEs are only one example; there are plenty of use cases in desktop software, with music and graphics editing as two other common cases.

Plugins have interactions.

Browsers are VM managers. Their tabs are the tabs of a VM manager, like virt-manager for KVM or VMware vSphere. Browsers boot a JS VM fast over HTTP and then communicate over that same HTTP.

This is crazy! If every app in the iOS or Android app store "just shipped a static binary", all apps would be enormous, and the OS vendors would be unable to evolve the system whatsoever.

Already now, the main reason users uninstall apps is to play disk space tetris.

They already do, except for the slowly shrinking set of platform libraries.

> the slowly shrinking set of platform libraries

Not on iOS?

That is ok if you're just dealing with the upper layer of code, i.e., the final application. Then statically linking can very well be the best solution. But if you're building an operating system or distribution and do not want to recompile the whole of it (and let users download and reinstall the whole of it) each time you fix a bug in a core library, then static linking becomes pretty quickly a big pain.

More or less the same happens with C++ header-only libraries, which are basically uncompiled static libraries. They are very nice and possibly allow strong optimization, but are rather painful to handle in a distribution.

> Plus with a static binary, if you really care about order of loading/unloading etc (which you ideally shouldn't) its trivial to manage with a custom linker script.

Why? It seems rather sensible to me to deinitialize resources in the opposite order from which you initialized them. This way the lifetimes of different resources are nested within each other instead of merely overlapping. When using RAII in C++ (which is basically just atexit at a finer grain) it often makes sense to have this requirement.

How would i.e. OpenGL IHV drivers work as static libraries? No driver updates without a recompile? No driver switches (between vendors) without a recompile?

Re-link, but yeah.

Shared libraries also reduce RAM usage; multiple executables that use the same shared library can load it into memory just once.

If you always write exactly one monolith, you are probably fine. But when you don't buy into the hype of "every service/app/executable in its own language", you probably want to share some code between all your executables: say, the DB driver, the model, or even just a video codec.

Without dynamic linking you duplicate all that stuff and, even worse, cannot be certain that all your executables use the same version.

The glibc repository has multiple test cases for atexit, it seems a lot of the current behavior derives from a fix for this bug in 2005: https://sourceware.org/bugzilla/show_bug.cgi?id=1158

It seems odd for the author to have invested so much time in this, but not to have found or patched those glibc test cases to demonstrate the behavior he expects in the various hairy cases glibc itself is worrying about, and which form the basis for its current behavior.

The submission title is misleading. The C standard library function atexit() isn't broken. At worst, some C library implementations are broken (or more charitably, not C standard compliant) if you also use certain other facilities that are not C standard library functions.

I question what the author was expecting to happen. If bar_exit comes from a dylib, and that dylib is subsequently unmapped, you can't very well call any functions from the unmapped library, so there are only four possible outcomes I can see:

1. The program crashes during termination as atexit() tries to invoke the previously-registered-but-now-invalid function.

2. The function is silently skipped as it's no longer loaded.

3. The function is invoked upon dlclose().

4. dlclose() does not actually unmap the library.

Of these options, only the 3rd actually seems reasonable. The first two are obviously bad, and the 4th seems like it rather defeats the purpose of calling dlclose() if you can't actually unload the library.

4 is not uncommon. For example, as described by Apple, there are so many things that can stop dlclose() from unlinking a library that they considered making it a no-op. Page 175 of https://devstreaming-cdn.apple.com/videos/wwdc/2017/413fmx92...

Only reference-counting actually stops dlclose() from unmapping the library, and that just means you haven't balanced out all the dlopens() (well, you can't unmap anything linked from the main executable, or anything in the dyld shared cache either, but those aren't particularly interesting cases). It's generally a really bad idea to unload a bundle containing Obj-C classes, but it's still doable if you can guarantee all instances of classes defined in that bundle have been deallocated (I'm not sure what happens to category methods, I hope the runtime properly removes those, but it's possible those would also cause a problem if invoked later). I assume the issue with Swift classes is identical to the issue with Obj-C classes.

In any case, image unloading is traditionally done with C libraries, not with Obj-C/Swift, and we're talking about a C function (atexit).

As for making it a no-op, they said only on platforms other than macOS, which is fine because the other platforms (iOS, watchOS, tvOS) are sufficiently constrained that there really is no reason to ever dlclose() anyway (there's barely even any reason to dlopen() besides trying to poke at Apple SPIs; I think sqlite will dlopen to load extensions, but that's about the only valid reason that comes to mind).

On Apple's platforms system libraries can't be unloaded anyways, as they're mapped into every process as part of the shared cache.

I don't see how the behaviour he wants can work. If you dlclose a library and cause it to be unmapped, then the function pointers into that library are invalid and can't be called at exit. Either dlclose will have broken behaviour, keeping libraries mapped even though all their dlopen references have been dropped, or atexit will be broken.

POSIX doesn't require that dlclose must unmap a library forcefully. Basically it can be that if atexit has been called with an address pointing into a shared library, that's considered a reference to the library which prevents it from being unmapped. How can it be unmapped? It's registered to be called when the process terminates.

Right, he should be using a destructor instead, although AFAIK that requires (in C) both ELF and a non-standard compiler extension albeit those are available everywhere it matters on Linux and *BSD.

How would that be any different? Surely a destructor function would also be unmapped by dlclose.

For a start it works reliably. It's a mechanism that we use all the time on multiple ELF platforms including Linux and several BSDs. It's also the right thing to do because it is called by the dynamic loader at the correct time, ie. after the library is no longer accessible from other threads via dlsym, but before the library has been unloaded (and the code that would run is unmapped). I wouldn't even have thought to use atexit for running code on library unload.

I believe you have the article backwards. I think the author wants atexit() from DLLs to prevent dlclose() from actually unloading the code, so that his library's atexit handler can run at exit? But libc on all these platforms actually invokes atexit handlers from shared libraries at dlclose(), or not at all.

It's kind of confusing to me why the author wants this or believes it should be the default behavior.

I believe this was changed everywhere because glibc maintainers decided that it made more sense to run at library unload time, and then everyone else had to change their implementation to match "what Linux does."

That being said, none of these behaviors really make any sense. If the registered function isn't mapped into the process address space any more, what is supposed to happen?

If the registered function isn't mapped, then you get undefined behavior.

GCC has local functions which build trampolines on the stack. If I register a trampoline with atexit() what should happen if the process exits after that stack frame is gone? Gee, let's be idiots and go on a crusade against this Important Problem: of course, GCC must be patched to generate code to look for atexit-registered trampolines every time a stack frame is popped and run them right there and then.

The spec for atexit is very clear that it's about process termination. The spec for dlclose, on the other hand, doesn't say that it must forcefully unmap everything.

The basic description is "The dlclose() function shall inform the system that the symbol table handle specified by handle is no longer needed by the application." The "symbol table is not needed" doesn't mean "the chunks of memory referenced by function pointers, and any data they need is no longer needed". Those are not "symbol table" material. Also: "Although a dlclose() operation is not required to remove any functions or data objects from the address space, neither is an implementation prohibited from doing so."

> Although a dlclose() operation is not required to remove any functions or data objects from the address space, neither is an implementation prohibited from doing so.

As far as I can tell, “neither is an implementation prohibited from doing so” does not have an exception for atexit. In other words, a POSIX compliant system is free to implement atexit in a straightforward way without any magic to check which library the function pointers come from. On such a system, if atexit is called with a function from a dlopen’d shared library, and that library is subsequently dlclose’d, the C library will cheerfully call the now-invalid function pointer at program exit, resulting in undefined behavior. As such, using atexit from libraries intended to be dlclose’d is fundamentally non-portable. Even if some system supported the “atexit keeps a library alive” behavior you’re asking for, any program relying on it would have to be very careful to ensure it only gets run on that system: unlike other non-portable features that can be tested for at compile-time or runtime, or at least fail cleanly if they’re not supported, this one would give you no warning it’s unsupported other than a segfault, if you’re lucky. And the system had better clearly document that it is guaranteeing that behavior forever, as opposed to it being an implementation detail that could change at any time. IMO, it’s a much better idea to just avoid that pattern altogether.

> using atexit from libraries intended to be dlclose’d is fundamentally non-portable

Problem is, whether or not a library can be dlclose'd is not usually up to that library. Some things are explicitly intended as runtime-loadable plugins; others are dependencies of those plugins, and so on. Additionally, the person packaging a library for runtime linking may not be the library author.

The upshot of this is basically an ass-backwards obligation (not an unfamiliar situation when working in this area): atexit is only valid for use if you know exactly how code containing it will be linked, which is impossible for a lot of code. The failure mode is worse than other similar situations (e.g. 'a library I linked clobbered my signal handler' is easier to debug and mitigate).

All of that is a bit beside the point, though. We do know (and test for, and are careful about) the implementation details of atexit, and code around them, with all the benefits and drawbacks that entails. That's what this thread is about. "POSIX technically allows this function to blow up the universe when you call it" is a good thing to be mindful of, though.

> The upshot of this is basically an ass-backwards obligation (not an unfamiliar situation when working in this area): atexit is only valid for use if you know exactly how code containing it will be linked, which is impossible for a lot of code.

True. The logical conclusion is that in most cases you just shouldn't use atexit from libraries, which... seems fine to me. In my opinion, a well-behaved library should endeavor to be cleanly unloadable without leaking any memory. (That includes using __attribute__((destructor)) if necessary; it's non-portable but without the blow-up-the-universe potential. Or just link in a C++ source file if you want portability.) That way, you can repeatedly load and unload different versions of the same library into the same process, without accumulating wasted memory over time. At least, that's useful in some cases for runtime-loadable plugins, and as you noted, other libraries may end up being loaded and unloaded as dependencies of those plugins. In fact, I'd argue that this rule of etiquette is least applicable when your library is a runtime-loadable plugin itself, but for a host that is known to never unload plugins.

Admittedly, unloading libraries is fairly rare in practice, so I can forgive a library for not bothering with global destructors. But I don't think it's good practice to go out of your way to make a library not unloadable, at least without a good reason.

Indeed, I'm curious what exactly the author's use case is. They say they "need to have a dependable form of post-process destruction for my tests"... what tests? Why unload libraries in a test program?

atexit is really just for small programs; it should really be called only from the same source file where main is located. The code smell increases with increasing distance from main. If it's in an adjacent source file which is an inalienable part of that program, it's still okay.

Turns out that immediately destructing and unmapping the library is the behavior that is most useful to applications, however.

Which is fine; it's just in conflict with the application-hostile behavior of registering shared library functions with atexit.

Seems to me that what's "supposed to happen" is "make it so the situation can't happen". That is, either make it so that you cannot unload the library if some function is registered with atexit, or else make it so that unloading the library unregisters the atexit function.

But of course, that can't happen, because you can have N function calls between the registered atexit function and the call to the library, and deciding if any calls to the library are anywhere in the graph and will be called is probably equivalent to the halting problem.

I agree that none of the possibilities are what we actually want.

> because you can have N function calls between the registered atexit function and the call to the library,

I don't think that's a problem. A simple sweep through the atexit list to check whether any of the registered pointers point into this library would be enough to block the unload.

If some atexit-registered function not in that unloaded library has a secretly stashed pointer to a function in that library, and calls it during atexit, that's a programmer problem.

Bonus: what happens when the function you register is JIT'd, so there's no unregistration code path when it's unloaded?

JIT is well outside the realm of the C standard.

Just like shared libraries.

So is atexit: it's part of POSIX, just like the rest of the dl* APIs.

POSIX extends standard C. atexit() is part of standard C89.

Yes, that's true.

I seem to remember a WWDC session from a few years ago (the one where dyld 3 was first introduced) where they said that dlclose() is now essentially a no-op on macOS, since it’s rarely done, the benefits are few, and the possible problems are many.

Edit: no-op dlclose() was something under consideration for dyld 3 on everything but macOS. https://devstreaming-cdn.apple.com/videos/wwdc/2017/413fmx92...

dlclose on iOS still does garbage collection of images if possible, FWIW.

There are two distinct concepts that are conflated for statically linked executables.

On process death vs on module death. Is the function to clean up the process, or is it to clean up the module?

I think the argument for module cleanup is stronger, irrespective of the difficulty of registering code that's supposed to last longer than the calling address space. The module had nothing before it was loaded; it should leave nothing behind when it is unloaded. It's symmetrical.

The difficulty of providing an executable callback that outlives its module just seals the deal.

it should definitely clean up the module

you could make the same argument with threads

main process vs. child threads: you would expect atexit functions installed in a child thread to execute before those installed by the main process

If it were thread local, sure, but they'd need access to a module cleanup function too, as there's no necessary tie between thread lifetime and module lifetime.

This reminds me of a strange bug I encountered once on ia64. We had a custom library that we loaded with dlopen(), and then called an initialization function within this library. Unbeknownst to me at the time, this function created a thread which did something and then almost immediately went to sleep for a few seconds. Also unbeknownst to me, there was a function that you were supposed to call before dlclose() to kill this thread. If you dlclosed the library without first having called this function to stop the thread, then when the thread woke up, all its code would of course be gone, and the program segfaulted, leaving a core file. Interestingly, if you ran gdb on the core file, gdb segfaulted too! Presumably because the thread didn't have any corresponding code in the core file either. Took a while to figure that one out. Curiously, on x86 all this seemingly worked fine (maybe dlclose implicitly killed the thread on x86? ... I never figured out why it didn't crash on x86 but did on ia64).

It's actually quite hard to do exit cleanups, even by putting functions into `.fini_array`. The reason is because some signals (SIGTERM, SIGKILL) cannot be masked: when a thread/process (`task_struct`) gets `SIGKILL`-ed, there's no way for the application to run any functions (including those in `.fini_array`). Things get more complicated when a thread calls `exit_group`, which is also what's recommended, because all the other threads in the same thread group receive `SIGKILL`, so none of them can run any code afterwards.

There's also the matter of someone forcibly pulling a machine's power cord. In some sense exit cleanups are fundamentally unreliable.

A significant amount of code can be run after a power-cord pull is detected but before the big caps discharge on most boxes. But yeah, no userspace program is going to get those cycles...

> The reason is because some signals (SIGTERM, SIGKILL) cannot be masked

SIGTERM can be masked. SIGKILL and SIGSTOP are the only ones that can't be.

Thanks for the correction

