Doesn't almost every C library function potentially call getenv()? printf()ing a number requires checking the locale which can be configured via the environment, after all.
I'd say you shouldn't be calling setenv() at all once you've spawned threads.
As the link contains a specific line number, I believe gp is suggesting we read the comment blocks. I'm not interested enough to dive into the details atm, but at least the explanations seem to be on point.
Grandparent didn't just link to "code"; they linked to a five-paragraph comment explaining what is going on. Did you read those comments before you posted your snarky reply?
I think the gist of it is that unlike the glibc implementation, the Illumos implementation of getenv() is lock-free and thread-safe.
I think a lot could be fixed with functions that don't access global state, but get that state as an additional parameter. E.g. I would really like `snprintf_l()`/`fprintf_l()` to be supported by glibc. It is supported by FreeBSD, macOS (Darwin), and even Windows (with a `_` prefix for some reason)! Not by GNU libc.
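For those who haven't seen the _l interfaces, a minimal sketch of what they buy you, assuming a FreeBSD/macOS-style xlocale.h (the function and signature are as those systems document them; glibc has newlocale() but no snprintf_l()):

    /* Format with an explicit locale object instead of the process-global
     * locale, so nothing here depends on LC_NUMERIC from the environment. */
    #include <locale.h>
    #include <stdio.h>
    #include <xlocale.h>    /* snprintf_l() on macOS/FreeBSD */

    int format_price(char *buf, size_t len, double price) {
        locale_t loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
        if (loc == (locale_t)0)
            return -1;
        int n = snprintf_l(buf, len, loc, "%.2f", price);
        freelocale(loc);
        return n;
    }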
I think that it is generally not reasonable to convert existing large projects to be multithreaded, in the same way that it is generally not reasonable to fully rewrite existing large projects from scratch.
The alternative that I have seen succeed is to achieve parallelism by forking separate processes wherever you would have spawned threads, and then communicating through shared memory regions.
It's a lot like having Rust-style unsafe blocks, in that you know that if you are having a thread-safety issue it will definitely be in one of the code sections where you are touching the shared memory region, and not anywhere else.
Obviously there's a higher startup cost for forking, but this makes it possible to gain parallelism without breaking all the thread-unsafe code that is certainly in an existing large project.
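A minimal sketch of that pattern, with C11 atomics standing in for whatever synchronization the shared region actually needs (illustrative; error handling omitted):

    #include <stdatomic.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    struct shared { _Atomic long counter; };

    int main(void) {
        /* The only memory both processes can touch: one explicit region. */
        struct shared *sh = mmap(NULL, sizeof *sh, PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        atomic_init(&sh->counter, 0);

        if (fork() == 0) {                 /* a "worker thread", as a process */
            for (int i = 0; i < 1000; i++)
                atomic_fetch_add(&sh->counter, 1);
            _exit(0);
        }
        for (int i = 0; i < 1000; i++)
            atomic_fetch_add(&sh->counter, 1);

        wait(NULL);                        /* "join" the worker */
        return atomic_load(&sh->counter) == 2000 ? 0 : 1;
    }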
ZeroMQ is a big abstraction over sockets. The higher the level of abstractions, the more these things are usually necessary.
If you're using a library like that, it comes with the territory, and you have to weigh for yourself the cost-benefit of doing non-blocking sockets and async I/O yourself versus trusting a library.
Much more fun to write these things yourself sometimes, but I find e.g. libevent a nice somewhere-in-between abstraction level I can be happy with.
Sure, but libraries that spawn threads are a pain. It is convenient for the library writer, but it always ends up being a significant inconvenience for the user.
I build most of my Go binaries with cgo disabled for this, and many other reasons.
In case you don't know, cross-building with GOOS/GOARCH will imply CGO_ENABLED=0 unless you also specify CC_FOR_${GOOS}_${GOARCH}; I cross-build most of my code for (and test it on) amd64, arm64, linux, openbsd, and darwin.
Go will sometimes link to the local libc for network-related functionality if you don't disable cgo.
The post is about getaddrinfo() specifically. It just struck me as odd to call that one out when there are far more common C library calls that use getenv().
There's a strong tendency to think of network calls as entirely universal and not tied in any way to locale settings in the environment.
Time, date, physical spellings, ... many things are locale dependent, but socket stuff?
It comes as a Surprise!!, and not the good kind, to many a network programmer with just a few years under their belt to discover that threaded networking can segfault because of this.
Once you know, you know and don't forget (until next time), but I suspect this was the motivation behind the blog post: the principle of most potential surprise.
I think I must be missing something here, but I’ll ask anyway:
Why don’t the OS libraries have some sort of lock around setenv/getenv, so that only one thread can be inside them at a time? I can’t see how it could deadlock. And surely no-one is so dependent on the performance of these calls that the time to lock/unlock would be problematic?
getenv returns a pointer which could be invalidated after releasing the lock, unless the lock also guards uses of that pointer and all application code takes that lock, which it most certainly does not. Likewise, this scheme does not solve direct use of environ by application code.
NetBSD has getenv_r, which copies into a buffer, but few applications use getenv_r, and certainly not all of them. And it doesn't resolve environ.
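For reference, usage looks roughly like this (per NetBSD's getenv_r(3): 0 on success, -1 with errno set to ENOENT or ERANGE):

    #include <errno.h>
    #include <stdlib.h>

    /* Returns 1 if NAME was copied into BUF, 0 if unset, -1 if BUF is too small. */
    int env_copy(const char *name, char *buf, size_t len) {
        if (getenv_r(name, buf, len) == 0)   /* NetBSD extension */
            return 1;
        return errno == ENOENT ? 0 : -1;     /* ERANGE: value didn't fit */
    }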
Solaris never frees env strings or environ arrays, only creating new copies and atomically swapping them in. It uses a special allocator for those objects which doubles the backing buffer each time it deep-copies the environ array, then argues this strategy is technically asymptotically memory-bounded.
EDIT: Glancing at the code, I think glibc is similar to Solaris in that it never frees env strings, but it has a heuristic to conditionally free environ arrays, which means directly using environ isn't thread-safe.
Having a lock would still be better than the current situation. Especially if the lock was exposed so that programs that did mess with environ directly, or the pointer returned by getenv could hold the lock while doing so.
I think the missing piece here is how POSIX specifies the environment: `getenv(3)` and `setenv(3)` are accessors for `environ`[1], which is just a pointer to some memory (over which walking is also specified[2]). That level of granularity makes it pretty hard to add any locking here.
Independent environments would be best, like what was done for locales with uselocale. But it is a breaking change: it would have to go through POSIX and will take forever anyway. Also, as environ is a public symbol, it has ABI implications.
Because there is no way to tell when a thread is done with the buffer, there is no moment when you can be sure you can manipulate it. The options are to create a new copy and leak the old one, or to accept the chance of a segfault.
But the getenv/setenv calls themselves could still be under a lock, which I think is what most people would use. Walking over the memory directly could remain lockless, and such a program could see inconsistent values, as is the current behaviour.
I think in libc there's a lot of stuff where an interface is kind of broken from a thread perspective, and it could be implemented better without changing the interface, but people generally do not.
I can't think of any examples offhand, but I often think about it with thread-local storage. E.g. lots of interfaces have an _r() equivalent where you provide the buffer, but many people still call the unsafe one, which is broken when there are threads... In my mind, the best way to do this would be to use static thread-local storage in the non-_r() one and have it call the _r() one (see the sketch below). Sure, that has overhead and isn't a perfect solution, but it's better than "bad". But a lot of these old functions don't necessarily get love.
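A sketch of that shape, using glibc's gethostbyname_r(3) under an illustrative wrapper name (the real non-_r function keeps one process-wide static buffer instead):

    #include <netdb.h>

    /* Legacy-style interface, but backed by per-thread storage: each thread
     * gets its own result buffer, so concurrent callers don't stomp on each
     * other the way they do with one shared static buffer. */
    struct hostent *my_gethostbyname(const char *name) {
        static _Thread_local struct hostent he;
        static _Thread_local char buf[2048];  /* a real wrapper would retry on ERANGE */
        struct hostent *result;
        int err;

        if (gethostbyname_r(name, &he, buf, sizeof buf, &result, &err) != 0)
            return NULL;
        return result;   /* NULL if the lookup failed */
    }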
In a sane world, creating a thread would set a global that causes non-thread-safe library functions to segfault. Or maybe calling them from two different threads would cause a segfault. But just make it really obvious you're doing bad stuff.
I think that sounds completely insane. The unsafety is an emergent property of what's done in the function, and entirely dependent on usage. If you were doing very disciplined use of an unsafe call, it's harmless.
Perhaps this would be a good feature of an assert, or something that breaks in a debugger if it's attached. But I don't think that is reasonable for production.
The pointer isn't guaranteed to point into `environ` directly. `getenv()` could copy the value to a thread-local, (dynamically-allocated?) buffer while holding the lock.
Edit: In hindsight, a dynamic buffer would require returning ENOMEM errors (which might lead to some unexpected failures), while a static buffer would limit the value length. I think you might be right about the API being broken.
You miss the point. If you have full control over when and how getenv is called, there's no issue to begin with. The problem is that you don't, as OP demonstrates. It's perfectly natural to call getaddrinfo in a loop.
We need a new API which is not broken, like in NetBSD, and a multi-year migration of all core libraries to it. A pity it wasn't started years ago, though; it could've been 95% done by now.
I was suggesting that the buffer be invalidated by each subsequent call – like some other libc functions' internal buffers – although, as I noted in the edit, this would need `getenv()` to be able to indicate errors (specifically ENOMEM). It cannot do this as currently specified, because NULL is used to indicate an absent variable.
You could also require callers to free the returned memory when they're done, but that would be another change of API.
The solution to all problems like this was decided years ago: _r
You provide the storage and free it
The problem is these non-direct uses. They each need to switch to *_r and manage the buffer, or offer _r versions themselves and sort of pass through the problem
Of course, *_r is a better option, but the existing API is used so pervasively that it needs to be made thread-safe to actually avoid thread-unsafe code in, e.g., libraries.
A number of libc functions return a pointer to an internal thread-local buffer, which is invalidated on subsequent calls. If the function copies the environment variable's value to such a buffer while holding the mutex controlling access to the global state, then the returned value is guaranteed to remain unaffected by other threads.
There are, however, other problems (discussed elsewhere in this thread) that complicate such an API in the context of getenv().
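For concreteness, a sketch of the buffer-copy scheme described above (hypothetical `getenv_copy`, not a real libc interface; a fixed-size per-thread buffer dodges the ENOMEM question at the cost of truncating huge values):

    #include <pthread.h>
    #include <string.h>

    extern char **environ;
    static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

    const char *getenv_copy(const char *name) {
        static _Thread_local char buf[4096];   /* invalidated by the next call */
        const char *result = NULL;
        size_t n = strlen(name);

        pthread_mutex_lock(&env_lock);         /* setenv() must take this too */
        for (char **e = environ; e && *e; e++) {
            if (strncmp(*e, name, n) == 0 && (*e)[n] == '=') {
                strncpy(buf, *e + n + 1, sizeof buf - 1);
                buf[sizeof buf - 1] = '\0';
                result = buf;
                break;
            }
        }
        pthread_mutex_unlock(&env_lock);
        return result;                         /* NULL if the variable is unset */
    }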
That makes a lot of sense. You’d need to snapshot environ when the lock was taken (when another thread could be accessing it!), which I imagine would be complicated. Although surely possible.
On at least some other operating systems, getenv(3C) and setenv(3C) are indeed thread-safe; e.g., on illumos: https://illumos.org/man/3C/getenv
We inherited our implementation from OpenSolaris, as did Oracle when they created their Solaris 11 fork. I expect at least some of the BSDs have probably fixed this as well.
Just because it is documented as thread safe doesn't mean it actually is. They might have just not understood the problem (see e.g. the various indirect links to "please mandate a thread-local `getenv`").
`setenv` is a nasty beast, especially since the raw `environ` variable is also exposed (and is in fact the only way to enumerate the environment).
For the curious: They make getenv() thread-safe by intentionally leaking the old environment, which they argue is acceptable because the memory leak is bounded to 3x the space actually needed.
The getenv/setenv/putenv/environ API looks terrible on closer inspection -- it does not appear possible for an implementation to be safe, leak-free, and efficient.
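A minimal sketch of the leak-and-swap strategy described above (append case only, purely illustrative; the real illumos code also handles replacement and reuses its doubling buffers):

    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static _Atomic(char **) env_snapshot;   /* readers walk whatever they load */

    static void set_var(const char *name, const char *value) {
        char **old = atomic_load(&env_snapshot);
        size_t n = 0;
        while (old && old[n]) n++;

        char **next = malloc((n + 2) * sizeof *next);
        if (old)
            memcpy(next, old, n * sizeof *next);
        char *entry = malloc(strlen(name) + strlen(value) + 2);
        sprintf(entry, "%s=%s", name, value);
        next[n] = entry;
        next[n + 1] = NULL;

        atomic_store(&env_snapshot, next);  /* publish the new array */
        /* 'old' is intentionally leaked: a concurrent reader may still be
         * walking it, and there is no way to know when it has finished. */
    }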
Just because they use locks doesn't mean it's automatically actually thread-safe.
The comment about "this is safe for people walking `environ`" is definitely a lie, for example, though the bug might be hidden on some common architectures with popular compilers in their default configuration.
I agree. I'm shocked this isn't there in 2023. Or even better, an rwlock allowing concurrent reads while serializing writes. Or some lock-free algorithm for writes.
I'm pretty sure the Windows version of the environment calls has locking.
Historically you can access the environment via a global variable too, which would side-step locking schemes. But probably hardly anybody does that anymore.
SetEnvironmentVariable is thread safe on Windows, but their POSIX wrappers aren't. Windows has a better API design in this instance, and the guarantees by POSIX make it impossible to make a compliant implementation that can be used with threads.
> and the guarantees by POSIX make it impossible to make a compliant implementation that can be used with threads.
Can you be more specific than this? I kind of doubt it.
For example, it seems to me you could write a setenv() that uses a lock-free algorithm, or that writes in a strategic way that won't fault if getenv() or a read of environ(7) runs concurrently, and then say all bets are off for thread safety if you write via environ(7). That's safer than the status quo, and I don't foresee it breaking POSIX.
Reading the spec again, I suppose it's possible to keep copies of environment variables around in memory without violating the spec, basically creating a copy on every getenv call for anything modified since the last setenv/putenv. I thought the spec also said that writes to the returned pointer would update the environment variables, but no such guarantee is given; that's just an implementation detail (and modifying the returned string is disallowed by the API spec, but good luck enforcing that).
The XSI spec does state that the result of getenv() may be overwritten by putenv() but that's not a strict requirement either.
You still risk breaking programs and libraries that expect getenv to always return a pointer into *environ (I believe Go has an issue like that on musl).
On the other hand, the POSIX standard explicitly states that getenv() does not need to be reentrant (and therefore doesn't need to be thread safe) so any program relying on thread-safe getenv is already violating the API contract.
The rationale also seems to assume you can't make this stuff thread safe because of the standard implementation:
> The getenv() function is inherently not reentrant because it returns a value pointing to static data.
Modifying the buffer returned by getenv() seems like a terrible way to write back a value, because you could only replace it with a string with equal or shorter length. One of the problems setenv() solves is allocation.
It's important to note the difference between reentrant and thread safe. The most obvious implementation of getenv(), which would just loop through environ(7) and do a bunch of strncmp, can safely be re-entered, in that you could interrupt it and call it again and it would produce no ill effect. It just can't be overlapped with writes.
There are lots of programs that access `*environ` directly, so while this might be good, it wouldn't solve all classes of the problem. There are also uses out there which are performance sensitive (and often just as unsafe, if not more so, such as holding pointers into the structure over long periods).
Threaded programs should probably seriously consider retiring libc, but we don't currently have a common-ground replacement.
Name-related activities are one of the worst areas, contributing significantly to the glibc linkage and ABI challenges, but also lacking sufficient standards for alternatives, or even enough consensus to build them quickly.
The sarcastic answer to that would be something along the lines of “users should be aware of what they’re doing, and you should be more careful about calling those concurrently anyway/good-luck-have-fun”.
I’ve been searching around, and you can find a bunch of discussions about this online. Your ’sarcastic’ argument is basically the one I’ve seen in most places.
They could easily be made thread safe, but, paraphrasing, most arguments seem to come down to something like:
“setenv and getenv are POSIX functions, and not defined to lock. Just like many POSIX functions, they’ve _never_ been thread safe, and it’s an error to assume they are. Should we really start papering over client errors in use of a supposedly portable API, even though it’s working as specified? And if we make that choice pragmatically for this instance, should we be trying to do it for _all_ of POSIX? That’s impossible for some things, and would add complexity even where it’s not. For all these reasons, it’s better if these just stay dangerous like they’ve always been.”
This is fine for a 1980s monolithic program but if you use any library that reads environment variables (like, ahem, libc!) you have to treat the whole library as non-thread-safe? Or keep track of the "color" of each library function?
This kind of historical baggage is one of the main reasons I now completely avoid C/C++ programming and won't touch it ever again. It's Rust or C# only for me from here on...
The problem is that this affects higher-level languages too, because they often build on libc. And on some OSes they don't have a choice, because the system call interface is unstable and/or undocumented.
For example, in Rust, multiple time libraries were found to be unsound if `std::env::set_var` was ever called from a multi-threaded program. See:
Even on OSes that happen to use the Linux kernel, like Android, those who insist on using the NDK and pretending it is GNU/Linux, beyond the officially supported use cases, end up bumping their heads against the wall.
I'll go one further and say that maybe it's time we had an OS/kernel/API base that isn't just better suited, but is explicitly designed for the massively multithreaded, ludicrously fast, massively concurrent hardware we have in spades these days.
Alas, I am not an OS dev; I have neither the skills nor the understanding to know how to build that, or what it would involve, but I do think it's clear that what we have at the moment isn't as well suited as it could be. io_uring and direct I/O seem better suited, though.
Even if the API were perfect and used locks and returned memory to be managed by the caller, it would still be hard to use safely in a multithreading environment as long as the env is a process-global property.
If I were King I would ban environment variables from the OS entirely. Global mutable state is the root of all evil! Globals are evil evil evil and the modern reliance on bullshit environment variables is a plague upon reliability.
Well, environment variables are not "global" globals. They are just my globals, or my post-it notes for some variables. They are not even per-user; they are per user session.
10 processes can have completely different sets of values for the same environment variables, because each is in its own environment, and apparently that's useful.
There are foot guns, and there are unintentional consequences of implementation and design details. This is why we patch, improve and rewrite our software over time. To iron out these kinks.
Fire also has a tendency to cause collateral damage. So use both fire and environment variables responsibly, and the world will be a better place.
I definitely think a lot of filesystem access is a code smell and probably not the right thing. That one causes me a lot of pain. But that’s largely because I work in games and you really need to use the Unity/Unreal/whatever asset management system instead of direct file system access.
I’ve got a small build system and the first thing it does is nuke PATH to empty. It’s glorious. No more grabbing random shit from a big blob with untracked dependencies that varies wildly by system!
I could easily live my entire life without environment variables. They’re just a fundamentally bad idea. Every program that foolishly uses environment variables can be replaced by a better program that takes a config file or arglist.
Honestly sometimes I think the answer is yes. Imagine how happy we could be, and how many fewer problems we would have. Add printers to that list and you're describing a paradise.
The value of `setenv(3)` has always been pretty murky to me -- the only time I've ever really needed it is when performing a fork-exec, and even then it's been the wrong tool for the job (the dedicated exec*e variants are the right way).
Would there be any significant downsides (besides breakage) to mapping `environ(7)` as read-only? That seems like the kind of thing that a Linux distribution (or more realistically OpenBSD) could do as a way to kill off a persistent family of bugs.
The best part is that you can't use setenv after fork and before execve as it is not async signal safe. As you mention, the envp-taking variant of execve is the only sane option.
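For illustration, the envp route looks like this (fine between fork() and execve(), since nothing here allocates or takes locks; paths and variables are made up):

    #include <unistd.h>

    static void run_with_tz(const char *prog) {
        char *const argv[] = { (char *)prog, NULL };
        char *const envp[] = {              /* the child's entire environment */
            "PATH=/usr/bin:/bin",
            "TZ=UTC",
            NULL,
        };
        execve(prog, argv, envp);           /* only async-signal-safe calls here */
        _exit(127);                         /* reached only if execve failed */
    }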
This is perhaps just my ignorance, but when do you find yourself needing to set the timezone like that? Not in 10+ years of C programming have I ever had to do that.
I might do that if I am writing a quick-and-dirty program that works with times in multiple time zones and I can’t be bothered to find a library with a better API.
I think the problem is in setenv(3), not getenv(3). Reading shared global state is okay as long as it is not mutable. Anyone who relies on modifying environment variables should use execve(3), not setenv.
Exactly. Setting environment variables while a program is running is a terrible idea. Thread safe or not.
A lot of code, for good reasons, assumes envvars are constants set before the program started, and caches computations based on them, reads config files, and so on.
The fact that they are essentially global variables should be enough to deter usage of them.
Many POSIX (and C standard library, for that matter) functions were not designed with multithreaded programs in mind and don't work well in them.
I really think it would be worth creating a new standard API that is built with threading in mind, where functions like mktime, getaddrinfo, localtime, etc. take arguments instead of reading from the environment, that avoid global state as much as possible, and are thread safe if there is global state.
Almost everybody calls getenv(): malloc for tunings, checks, tracing, and such; half of the string library for localization specifics; all of the locale and time and timezone functions; many math functions need fegetenv().
GAI just needs to go. It needs to be moved to its own daemon with a simple RPC.
libc is kinda schizophrenic in this regard. It has mostly obviously low-level functions like string manipulation and memory management, and then unexpectedly a DNS client implementation and support for arbitrary runtime plugins (NSS modules).
This is essentially what systemd-resolved with the nss-resolve NSS module is right? It’s possible to use /etc/nsswitch.conf to entirely disable the built-in DNS resolution in glibc if you want.
...on Windows, single-threaded programs don't really exist; any DLL can, and most of them do, spawn worker threads as an implementation detail. Some of them do it the moment their initializer runs, so if you link your program against anything other than kernel32 and its friends (the basic Windows system libraries don't spawn worker threads on being loaded), then by the time a thread finally starts executing your executable's entry point there is no guarantee that it is the only thread in your process. In fact, finding a non-toy, real-world Windows application that has only one thread is almost impossible (for example, IIRC all .NET runtimes have a worker-thread pool from the get-go, so that rules out any .NET executable).
Which is why on Windows there are almost no system APIs that can be safely used only in single-threaded applications (well, almost: there were some weird technical decisions around single-threaded apartments for COM...).
Maybe in several more decades the Linux community will also accept the fact that multi-threaded applications are an entirely normal and inevitable thing, not an aberration of nature that we all best pretend doesn't exist until we're absolutely forced to deal with its reality.
It's easy to think about some complex interactive software where the need to call setenv appears only after you have worker threads doing some other thing. Without a warning, you won't know it's a bad thing to do, and the manpage only says that it and unsetenv are not thread safe, as if this was remotely enough information.
What nobody tells you is that the environment reaches into so much that you need it to compress data or open an IPv6 connection. It's not obvious at all that you can't do those things while editing a variable.
There’s always a lot of weird emergent behavior in bootstrapping an app, and on an app of any serious size, I can’t entirely control if someone decides to spool up a thread pool on startup so that everything is hot before listen() happens.
I may think I have control, I may believe that a handful of us are entitled to have that say, but all it takes is someone adding a cross dependency that forces an existing piece of code to jump from 20th position in the load order to 6th and all hell can break loose. Or just as often, set a ticking time bomb that nobody notices until there’s a scaling or peak traffic event or someone adds one more small mistake to the code and foomp! up it goes.
It's neither in the headline nor in the article. The question was about setenv, not getenv.
It is best to avoid calling setenv in a threaded program. Some programs do it to make space for rewriting argv with large strings (freeing space from *environ which tends to be right after the tail of argv). Some programs or libraries use *environ directly to stage variables for exec before forking. Some want to pass variable changes to forks. There are alternatives possible, but in the context of something like go calling libc setenv, it’s to make interop easier- sadly it may make other interop harder, such as this case.
Which is why I have gotten rid of getaddrinfo() calls in my server code, and rather resolve DNS directly reading the DNS server setting from the system.
Other issues I faced:
- Not epoll() friendly. It always forks a process while resolving a domain name.
- Valgrind complains about uninitialized memory accesses when the function is called, and I can't get rid of them.
> Which is why I have gotten rid of getaddrinfo() calls in my server code, and rather resolve DNS directly reading the DNS server setting from the system.
This works as long as you don't need support for mDNS or LDAP host resolution, which depends on libnss/nsswitch on glibc-based systems. Which is fine, but it should be a well-documented limitation of this approach.
(this is also what the Go runtime does by default, but they automatically fall back to the glibc resolver in any more complex case: https://pkg.go.dev/net#hdr-Name_Resolution)
If it's your application on your server, you can be pretty confident. Only later it may surprise someone that this application doesn't react to system-wide config. Or that may never happen.
Those redirections happen further down the stack. You are OK reading settings from /etc/resolv.conf, which frequently points to a localhost daemon that redirects your queries to whatever DNS setting you have for your connection.
But parsing /etc/resolv.conf and using it is all that you need in your code.
You don't need all the options. Search Google: parsing resolv.conf is an old technique, and the file was written assuming individual apps would be parsing it. You will find instructions on how to do it in, say, 4 lines, explicitly for this file, not for any random conf file from the system.
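Roughly those 4 lines, sketched out (bare-bones: no 'search'/'options' handling, no IPv6 scope IDs):

    #include <stdio.h>

    /* Invokes cb() once per "nameserver" line in /etc/resolv.conf. */
    static void each_nameserver(void (*cb)(const char *addr)) {
        FILE *f = fopen("/etc/resolv.conf", "r");
        char line[256], addr[128];
        if (!f) return;
        while (fgets(line, sizeof line, f))
            if (sscanf(line, "nameserver %127s", addr) == 1)
                cb(addr);
        fclose(f);
    }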
Then your application won't be portable. Which is fine if you have no plans of distributing it. But otherwise I can guarantee it will break on some machine.
You mean linked libraries you don't have the source of? Yes, in that case set the environment vars as early as possible (if possible before starting any thread).
Keep in mind: Don't use multiple threads unless you really, really need to, and have thought long and hard about concurrency issues.
In a way, I think the fact that many library functions are not thread-safe should be viewed as an encouragement to not use threads, or use them only for the bare minimum necessary.
I say this from a few decades of experience fighting race conditions and the like, during which I several times rewrote an existing multithreaded process as a single-threaded one and greatly improved performance while reducing memory usage. The architecture astronauts may have moved on to stuff like microservices now, but in the 90s/2000s threads were overused just as much.
That's basically impossible in many modern programming environments - even if you never spawn a thread, something else in your executable probably has. By the time your iOS or macOS app has finished launching, it has multiple threads. The Windows loader uses threads to load DLLs.
There are many ways to do multiple threads wrong. It seems the "right way" is to wake up a sleeping but already-created thread, and take elements out of a work queue in a thread-safe way. Your main thread can even be processing elements during the 10000 clock cycles it takes to wake up a thread.
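A minimal sketch of that shape (no full-queue or shutdown handling; purely illustrative):

    #include <pthread.h>

    static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  q_cond = PTHREAD_COND_INITIALIZER;
    static void (*queue[64])(void);
    static unsigned head, tail;

    void push(void (*job)(void)) {
        pthread_mutex_lock(&q_lock);
        queue[tail++ % 64] = job;
        pthread_cond_signal(&q_cond);   /* wake an already-created worker */
        pthread_mutex_unlock(&q_lock);
    }

    void *worker(void *arg) {           /* started once, sleeps between jobs */
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&q_lock);
            while (head == tail)
                pthread_cond_wait(&q_cond, &q_lock);
            void (*job)(void) = queue[head++ % 64];
            pthread_mutex_unlock(&q_lock);
            job();                      /* run the work item outside the lock */
        }
    }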
There is no reason that `getenv()` should ever lock -- it should always be lock-free. `putenv()`/`setenv()`/`unsetenv()` can lock, of course, since there's no point allowing more than one writer at a time.
But it unconditionally issues a memory barrier, which is most of the cost of an uncontended futex-style lock already.
And I'd be interested in your ideas for removing the lock, as I can't see any paths that don't change semantics (e.g. unconditionally doing the init work at process start time, when you know there aren't multiple threads yet).
> a memory barrier, which is most of the cost of an un-contended futex-style lock already
Yes, but it's not a lock.
> unconditionally doing the init work at process start time when you know there's not multiple threads
That's the most obvious fix, yes. I was thinking (but I've not checked yet) that when `my_environ` is not set up yet then `getenv()` can use `_environ` directly.
It’s easy to sterilize your code in this regard (although as this article points out you need to know to do it).
Many Unix systems support a three-argument `main` function (compare `exec()`) where the third argument is `char *envp[]`. You can call `setenv` to manipulate the environment.
But easiest is to just null out the POSIX extern `char **environ` (save a copy if you want to consult it yourself later). See `man 7 environ`.
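A sketch of the null-it-out approach (glibc's getenv() tolerates a NULL environ; whether every libc does is worth checking):

    #include <stddef.h>

    extern char **environ;
    static char **saved_env;

    void sterilize_environ(void) {   /* call before any threads exist */
        saved_env = environ;         /* keep a copy for later consultation */
        environ = NULL;              /* getenv() now finds nothing */
    }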
It's not clear from the man pages, but `setenv()` mutates `environ`.
In particular, setting a previously-undefined variable causes `environ` to be reallocated. Whereas `setenv()` of an extant variable changes just that value in the current `environ` pointer array.
I wonder if this only applies to the dns resolver, or also other NSS modules, like the systemd resolver. And don't forget about nscd: if nscd is running, then all of the nsswitch stuff will be done out of process.
That might mean a viable workaround is enabling nscd, oddly enough.
And frankly, maybe libpthread should just overlay thread-safe getenv/setenv like I believe it does for a couple of other libc symbols.
After reading some comments, to avoid the problems:
- immediately copy the contents of the buffer that the pointer returned by getenv() points to
- don't use getenv after threads have been started
A library could be written which makes an immutable copy of the whole environment before main() starts. This library then hands out pointers into the environment copy. Or, to be even safer, it makes another copy of the environment variable for each caller. This trades some efficiency for safety.
In effect, ignore the mutable accessors like setenv from libc.
Or did I miss something? I am not an expert in these things.
And of course it won't solve the problem of two other libraries fighting with each other...
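For what it's worth, a sketch of such a library (GCC/Clang constructor attribute; names are illustrative):

    #include <stdlib.h>
    #include <string.h>

    extern char **environ;
    static char **env_snapshot;            /* immutable after this runs */

    __attribute__((constructor))
    static void take_snapshot(void) {      /* runs before main() */
        size_t n = 0;
        while (environ[n]) n++;
        env_snapshot = malloc((n + 1) * sizeof *env_snapshot);
        for (size_t i = 0; i < n; i++)
            env_snapshot[i] = strdup(environ[i]);
        env_snapshot[n] = NULL;
    }

    /* Lookups are served only from the frozen copy. */
    const char *env_get(const char *name) {
        size_t n = strlen(name);
        for (char **e = env_snapshot; e && *e; e++)
            if (strncmp(*e, name, n) == 0 && (*e)[n] == '=')
                return *e + n + 1;
        return NULL;
    }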
Everything would be cool if it weren't for there being situations in which you need to set an environment variable in the same process in order to get a function to do something. E.g. the TZ variable, in order to coax a behavior out of the time functions.
If environment variables are for child processes only, there is no need to use setenv because you can pass a specified environment array through exec.
getaddrinfo is terrible. Did you know it also opens and then connects to a socket? Any process that uses getaddrinfo needs a blanket exception in my firewall in order to work properly, because otherwise it will fail to connect to some randomly-generated port that it just made up.
It opens a socket to the same process that called getaddrinfo. That is, it's just communicating with itself, using a brand-new randomly-generated port for each call. This should be completely unnecessary.
So if I have a program that does getaddrinfo(3) and nothing more, this program sets up a socket, listen(2)s on it, creates another socket, and connect(2)s it to the first one?
I didn't know glibc didn't do the same thing. `getaddrinfo()` on Windows seems to do this because randomly a program will try to connect to `::1:59962` or something, and if I don't allow it in my firewall, it will start whining that some getaddrinfo thread failed to start. This has happened across all sorts of different programs. It's infuriating.
I thought it was just a general libc thing. Isn't there a spec on this somewhere?
For one thing, it could delegate to a local service. Granted, the communication with this service would probably still be over a socket interface, but at least as a purely-local connection you would hopefully get better worst-case performance characteristics.
This is basically what dnsmasq does when you use it as a local DNS cache.
This! And the attributes page explains these even better:
const Functions marked with const as an MT-Safety issue non-
atomically modify internal objects that are better
regarded as constant, because a substantial portion of the
GNU C Library accesses them without synchronization.
Unlike race, which causes both readers and writers of
internal objects to be regarded as MT-Unsafe, this mark is
applied to writers only. Writers remain MT-Unsafe to
call, but the then-mandatory constness of objects they
modify enables readers to be regarded as MT-Safe (as long
as no other reasons for them to be unsafe remain), since
the lack of synchronization is not a problem when the
objects are effectively constant.
The identifier that follows the const mark will appear by
itself as a safety note in readers. Programs that wish to
work around this safety issue, so as to call writers, may
use a non-recursive read-write lock associated with the
identifier, and guard all calls to functions marked with
const followed by the identifier with a write lock, and
all calls to functions marked with the identifier by
itself with a read lock.
and
env Functions marked with env as an MT-Safety issue access the
environment with getenv(3) or similar, without any guards
to ensure safety in the presence of concurrent
modifications.
We do not mark these functions as MT-Unsafe, however,
because functions that modify the environment are all
marked with const:env and regarded as unsafe. Being
unsafe, the latter are not to be called when multiple
threads are running or asynchronous signals are enabled,
and so the environment can be considered effectively
constant in these contexts, which makes the former safe.
POSIX does document it, it just requires carefully picking through pages and carefully thinking about the wording, unlike the simplicity of GLIBC documentation. For example, the best information is on the page for `exec`.
Stuff like this is why I'm supportive of newer languages like Go and Zig that sidestep libc entirely (when not using cgo as in TFA of course). libc is a great achievement and has served us well but, boy, it sure is a product of its time.
`errno` is another relic that needs to die yesterday.
I think it's the same in Windows, right? Can't use the syscalls underneath the hood, everything through the standard libraries. Maybe I'm wrong (I know very little about Windows other than how to use it to play games, and WSL)
The standard libraries on Windows don't involve libc.
The Windows APIs look rather different, and in general are much more friendly to multi-threading. POSIX on the other hand tends to assume that the program is in control of everything happening inside of it, which is an incorrect assumption due to libraries.
In this particular case, the Windows APIs have neither getaddrinfo() nor getenv(); and the closest equivalent GetEnvironmentVariableW is perfectly thread-safe.
Microsoft additionally has a C runtime (msvcrt) providing functions like getenv(), but this is much less fundamental than it is on other systems. Every program is supposed to ship its own copy of the C runtime; it's not officially part of Windows! And it's perfectly possible for multiple different copies of the C runtime to be loaded into the same Windows process. And since *environ is a variable defined by the C runtime, there's a different copy for each C runtime...
Almost correct, except that since Windows 10 there is now a C runtime shipped as standard; ironically, it is actually written in C++, taking advantage of its safety features over plain C and exposing the C API via extern "C".
On Windows it's somewhat possible to avoid most of it by linking to ntdll, which only provides symbols for raw syscall wrappers. But a lot of that is unstable and may change from one Windows release to the next.
Doing raw syscalls without ntdll is also possible, but Windows syscall numbers change on essentially every release, so you'd end up with something that only works on your Windows version.
We've been building everything with CGO_ENABLED=0 for years now, with no nasty side effects. It gets to be a pain using the default, when something as innocuous as a point version of a Docker image breaks compatibility because of a glibc version change[1].
[1] golang official image 1.20.4 to 1.20.5 went from Debian 11 to 12 base. Always use the -(debian version) tags.
Split DNS is broken on macOS when doing that, and for users with VPN that does split DNS it is not just an annoyance it leads to software not actually functioning.
Re-implementing system capabilities is fine and all as long as you support common use cases properly, which Golang does not.
And on the flip side, there have been a number of instances where the behavior differs and the Golang documentation describes a function only as it behaves with the Golang-native implementation, rather than the system implementation that ends up being the default, without calling any of this out.
Yeah, there's so much misery in the C ecosystem that it's better to eschew it altogether. Even merely packaging anything that depends on C ends up being a hugely painful undertaking since every C library has its own bespoke build system and its own implicit set of dependencies (and implicit versions of those dependencies, and expectations about where on the system those dependencies live).
I mostly like C as a language, but between the security concerns and the tooling concerns (and the community's zealous devotion to ignoring these very real problems) I'm really excited for its increasing marginalization. Unfortunately, it's not being marginalized in favor of "a better C", but rather every ecosystem is rewriting the same stuff from scratch which seems like a bit of a bummer (but still better than depending on C).
    if (somecall() == -1) {
        printf("somecall() failed\n");
        if (errno == ...) { ... }
    }
sure, the issue is that `somecall(...)` might have altered `errno` through 'acts-of-omission-or-commission' :o)
fwiw, posix has updated its definition to pretty much say that 'the value of errno in one thread is not affected by assignments to it by another'. this has been the case for at least a decade and a half (iirc), which in internet years would positively be the Pleistocene era :o)
so, i am not sure i really appreciate the 'shared-global-mutable-state' argument above. thanks !
The problem in that snippet is that `printf` could have altered the `errno` set by `somecall`, and that's only thanks to it being shared-global-mutable-state. You not realizing that was possible makes for a great example of why shared-mutable-global-state is hard to reason about.
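That is, the fix is to capture errno before anything else can clobber it:

    #include <errno.h>
    #include <stdio.h>

    void check(int rc) {
        if (rc == -1) {
            int saved = errno;          /* printf() may overwrite errno */
            printf("somecall() failed\n");
            if (saved == EINTR) { /* ... */ }
        }
    }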
This thread isn't talking about how to fix the errno problem generally. It's talking about the existence of a problem in the first place. Fixing it would be a whole different can of worms, and indeed, sisyphean sounds about right.
Notice how this entire thread was started by someone asking why errno was problematic. This is just about understanding.
Returning a "Result" struct doubles the size. That's one less register to use.
Exception handling is even more invasive.
They are great for high(er)-level languages, but less perfect at the lower levels where performance is critical.
EDIT: The Linux kernel uses negative return values for errors. It's good and efficient when it works. But it is not always an option when you need the full register width.
You are saving one register at the cost of having a thread local variable that is visible to signal handlers, so none of its uses can be optimized away. Which results in things like gcc having to decorate every math instruction with code to set errno on the off chance that someone somewhere might read it (no one ever does).
Some newer POSIX APIs, such as pthreads, do return the error this way. But many legacy APIs, such as dup or read, use the positive integer space, thus the negation pattern you often see in syscalls. Notably, POSIX guarantees <errno.h> values to be positive integers.
There are a lot of alternatives, and it's not clear why the ones you've suggested are inappropriate. You've listed some perceived costs, but I don't see why those costs are greater than the ones paid by the status quo.
Linux even shows you a path, yet you reject it for reasons that don't seem compelling to me.
The performance cost of having otherwise pure functions clobber global mutable memory defeating many optimization passes, is way higher than clobbering another register for the result.
The most obvious way it is wrong is that it is archaic. There is simply no reason to ever pass return values in hidden state. Just use return values damnit.
Err, no thank you. LD_PRELOAD and similar mechanisms are great for injecting code into apps legitimately, i.e. to patch long-unsupported systems or to tame current ones.
For example, I have a vision issue, and without a ReShade filter I would be unable to play a great deal of games.
Now that is also an attack vector, that's for sure, but you cannot go axe features willy-nilly just because you don't see value in them.
LD_PRELOAD wouldn't be needed if the OS were built around containers / jails instead of weakly isolated processes and process groups.
The Unix kernels (Linux, BSD, and Solaris alike) already had much of what's needed, say, 30 years ago, but nobody saw it as such a burning necessity (likely except Solaris, which eventually developed Zones).
On a "normal" desktop system, you don't need containers or jails. Your programs must communicate with each other (copy paste, print screen, etc.).
But today every god damn UI program needs an internet connection to phone home and execute remote code. This is the actual problem which must be fixed.
This is not a unix beartrap; this is a bug in Go, if that's where it was found. If your code is multithreaded, it's up to you to make it thread-safe. You can't declare that you're creating the most thread-safe, memory-safe, newbie-safe system and then go home. You have to write the code.
Your program gets its own copy of the environment when the program was launched. Nobody is changing it on you, any contention for that resource is you contending with yourself.
You don't expect an operating system to change things out from under you. Unix doesn't. If there is contention for this resource, it's all you (or whoever wrote the library you are using)
The environment is name-value pairs, as strings. That's it. That's what makes it accessible and useful. You can swallow it up, the whole thing, into whatever data structure you prefer in your language in a few lines of code, and a millisecond (if that much) of runtime. Just learn how things work and you won't feel helpless.
There is no clear/obvious mention of the fact that setenv could interfere with it. It is a glibc footgun/beartrap that its re-entrancy doesn't actually mean that calling it with non-shared memory in the arguments ensures no data races.
there could very well be a bug. if there's a bug, fix it.
But from what I read, it's documented as not threadsafe and this problem is happening in a threaded environment. That could still be called a bug. You find bugs, you fix them.
That's better than hyperventilating about how there is some massive problem with an operating system that you are using because all the people who came before you who knew more than you decided it was the best thing to use. The other OSes sucked more. This OS is simple enough that you can actually learn how it works, it's all laid out in front of you.
Should it do some additional things? sure. Help write the code.
A beartrap/footgun means "a bit too easy to harm yourself for comfort", not "there is a bug". I am relatively old and have been programming to this interface for a few decades, so I'm not hyperventilating about it.
My point (which made reference to the actual OS documentation) was that it takes being bit by such issues before you learn what this "Attributes" section means:
For the 1st decade of my career, I just searched for "re-entrant", "reentrant", or "MT" and went on my merry way, without realizing the importance of the other possible values of attributes table: in this case "env".
When you read the doc and realize you missed something, don't say "footgun"; say "aha! I misunderstood." Then, if you want, set about making things better.
Instead of jumping to "It shouldn't work this way" consider "Hmmm, why does it work this way? Is it possible that Thompson, Kernighan and Ritchie knew more about how to make a coherent straightforward system that would last 50 years in 1970 than I know how to now?"
Or perhaps you might consider that we've learned something in the last 50 years.
It sounds to me like you're hyperventilating about other people pointing out footguns. Maybe, just maybe, people in the past were capable of making mistakes. Let's not put them on a pedestal.
In fairness, they also gave us the joys of `strcpy(dest_ptr, src_ptr)` and `scanf("%s", str_ptr)`, which with the benefit of hindsight and many buffer overflows later were a terrible idea.
Yes, I reference that table in a deeper comment. I didn't mean that it wasn't accurately documented, but rather that it's not immediately apparent. I know lots of people who were once quite experienced but still not enough to know what the attributes section is and its importance and its full meaning (myself included).
Indeed, surprising and easy to overlook was all I was trying to convey.
┌───────────────────────────┬───────────────┬────────────────────┐
│Interface │ Attribute │ Value │
├───────────────────────────┼───────────────┼────────────────────┤
│getaddrinfo() │ Thread safety │ MT-Safe env locale │
├───────────────────────────┼───────────────┼────────────────────┤
│freeaddrinfo(), │ Thread safety │ MT-Safe │
│gai_strerror() │ │ │
└───────────────────────────┴───────────────┴────────────────────┘
MT-Safe
MT-Safe or Thread-Safe functions are safe to call in the
presence of other threads. MT, in MT-Safe, stands for
Multi Thread.
Being MT-Safe does not imply a function is atomic, nor
that it uses any of the memory synchronization mechanisms
POSIX exposes to users. It is even possible that calling
MT-Safe functions in sequence does not yield an MT-Safe
combination. For example, having a thread call two MT-
Safe functions one right after the other does not
guarantee behavior equivalent to atomic execution of a
combination of both functions, since concurrent calls in
other threads may interfere in a destructive way.
Whole-program optimizations that could inline functions
across library interfaces may expose unsafe reordering,
and so performing inlining across the GNU C Library
interface is not recommended. The documented MT-Safety
status is not guaranteed under whole-program optimization.
However, functions defined in user-visible headers are
designed to be safe for inlining.
Other safety remarks
Additional keywords may be attached to functions, indicating
features that do not make a function unsafe to call, but that may
need to be taken into account in certain classes of programs:
locale Functions annotated with locale as an MT-Safety issue read
from the locale object without any form of
synchronization. Functions annotated with locale called
concurrently with locale changes may behave in ways that
do not correspond to any of the locales active during
their execution, but an unpredictable mix thereof.
We do not mark these functions as MT-Unsafe, however,
because functions that modify the locale object are marked
with const:locale and regarded as unsafe. Being unsafe,
the latter are not to be called when multiple threads are
running or asynchronous signals are enabled, and so the
locale can be considered effectively constant in these
contexts, which makes the former safe.
env Functions marked with env as an MT-Safety issue access the
environment with getenv(3) or similar, without any guards
to ensure safety in the presence of concurrent
modifications.
We do not mark these functions as MT-Unsafe, however,
because functions that modify the environment are all
marked with const:env and regarded as unsafe. Being
unsafe, the latter are not to be called when multiple
threads are running or asynchronous signals are enabled,
and so the environment can be considered effectively
constant in these contexts, which makes the former safe.
Yes, I talk about that table in a deeper comment. Note to other readers, the table itself is in getaddrinfo(3) but the explanation of the elements is in attributes(7).
This is the viewpoint of someone who works alone or on small teams, where all the code in the process is their own. In a complex program there is plenty that can do surprising things that you didn’t expect.
no, this is a person who understands what is there. People who imagine something else is there, are disappointed when it's not.
People who try to invent the new new, and don't complete the task, blame unix, when unix just does what unix said it would do. It was the people who made bigger claims but did not deliver who should step up and say "I didn't turn the unix environment into what I thought I did."
but people trying to invent the new new are trying to invent it on unix because unix delivers what it promises. It doesn't deliver what you promise. But have some humility, and accept your failure and don't try to pin it on unix. Go implement on Windows.
Unix does allow you to deliver what you promise, that's why you'll still be complaining about unix 10 years from now, but it's up to you to deliver what you promise.
people who say "footgun" are people who try to shed responsibility; "it wasn't my fault, waaaaaah". I'm not saying we should not make computers easier to program, I'm saying that when we fail to: "it's a poor worker who blames his tools."
Why is it so important to you to blame this on unix, when you could blame it on a mistaken implementation of a library that could be fixed?
The words "blame" and "guilt" do not even appear in the article, and "fault" only as part of "segfaulting". Rachel merely points out an easy-to-make mistake before others make them, to save them time. She isn't talking about who is to blame. No need to get defensive.
It's not a syscall, it's a C standard library function that may read files, talk to dbus, read env vars, etc. The problem is people quite understandably expect its re-entrancy to mean more than it really means.
a "syscall" is not going to call back into your code to get the environment. If a "syscall" is using your environment, it's part of your code, probably better described as a library call.
I always thought getenv() was a code smell, and this confirms it. I guess a better option would be to add a parameter to getaddrinfo() and deal with it further up the stack?