Doesn't almost every C library function potentially call getenv()? printf()ing a number requires checking the locale which can be configured via the environment, after all.
I'd say you shouldn't be calling setenv() at all once you've spawned threads.
As the link contains a specific line number, I believe gp is suggesting we read the comment blocks. I'm not interested enough to dive into the details atm, but at least the explanations seem to be on point.
Grandparent didn't just link to "code"; they linked to a five-paragraph comment explaining what is going on. Did you read those comments before you posted your snarky reply?
I think the gist of it is that unlike the glibc implementation, the Illumos implementation of getenv() is lock-free and thread-safe.
I think a lot could be fixed with functions that don't access global state, but get that state as an additional parameter. E.g. I would really like `snprintf_l()`/`fprintf_l()` to be supported by glibc. It is supported by FreeBSD, macOS (Darwin), and even Windows (with a `_` prefix for some reason)! Not by GNU libc.
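For those who haven't seen the _l interfaces, a minimal sketch of what they buy you, assuming a FreeBSD/macOS-style xlocale.h (the function and signature are as those systems document them; glibc has newlocale() but no snprintf_l()):

    /* Format with an explicit locale object instead of the process-global
     * locale, so nothing here depends on LC_NUMERIC from the environment. */
    #include <locale.h>
    #include <stdio.h>
    #include <xlocale.h>    /* snprintf_l() on macOS/FreeBSD */

    int format_price(char *buf, size_t len, double price) {
        locale_t loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
        if (loc == (locale_t)0)
            return -1;
        int n = snprintf_l(buf, len, loc, "%.2f", price);
        freelocale(loc);
        return n;
    }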
I think that it is generally not reasonable to convert existing large projects to be multithreaded, in the same way that it is generally not reasonable to fully rewrite existing large projects from scratch.
The alternative that I have seen succeed is to achieve parallelism by forking separate processes wherever you would have spawned threads, and then communicating through shared memory regions.
It's a lot like having Rust-style unsafe blocks, in that you know that if you are having a thread-safety issue it will definitely be in one of the code sections where you are touching the shared memory region, and not anywhere else.
Obviously there's a higher startup cost for forking, but this makes it possible to gain parallelism without breaking all the thread-unsafe code that is certainly in an existing large project.
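A minimal sketch of that pattern, with C11 atomics standing in for whatever synchronization the shared region actually needs (illustrative; error handling omitted):

    #include <stdatomic.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    struct shared { _Atomic long counter; };

    int main(void) {
        /* The only memory both processes can touch: one explicit region. */
        struct shared *sh = mmap(NULL, sizeof *sh, PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        atomic_init(&sh->counter, 0);

        if (fork() == 0) {                 /* a "worker thread", as a process */
            for (int i = 0; i < 1000; i++)
                atomic_fetch_add(&sh->counter, 1);
            _exit(0);
        }
        for (int i = 0; i < 1000; i++)
            atomic_fetch_add(&sh->counter, 1);

        wait(NULL);                        /* "join" the worker */
        return atomic_load(&sh->counter) == 2000 ? 0 : 1;
    }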
ZeroMQ is a big abstraction over sockets. The higher the level of abstractions, the more these things are usually necessary.
If you're using a library like that, it comes with the territory, and you have to weigh for yourself the cost-benefit of doing non-blocking sockets and async I/O yourself versus trusting a library.
Much more fun to write these things yourself sometimes, but I find e.g. libevent a nice somewhere-in-between abstraction level I can be happy with.
Sure, but libraries that spawn threads are a pain. It is convenient for the library writer, but it always ends up being a significant inconvenience for the user.
I build most of my Go binaries with cgo disabled for this, and many other reasons.
In case you don't know, cross-building with GOOS/GOARCH will imply CGO_ENABLED=0 unless you also specify CC_FOR_${GOOS}_${GOARCH}; I cross-build most of my code for (and test it on) amd64, arm64, linux, openbsd, and darwin.
Go will sometimes link to the local libc for network-related functionality if you don't disable cgo.
The post is about getaddrinfo() specifically. It just struck me as odd to call that one out when there are far more common C library calls that use getenv().
There's a strong tendency to think of network calls as entirely universal and not tied in any way to locale settings in the environment.
Time, date, physical spellings, ... many things are locale dependent, but socket stuff?
It comes as a Surprise!!, and not the good kind, to many a network programmer with just a few years under their belt to discover that threaded networking can segfault because of this.
Once you know, you know and don't forget (until next time), but I suspect this was the motivation behind the blog post: the principle of most potential surprise.
I think I must be missing something here, but I’ll ask anyway:
Why don’t the OS libraries have some sort of lock around setenv/getenv, so that only one thread can be inside them at a time? I can’t see how it could deadlock. And surely no-one is so dependent on the performance of these calls that the time to lock/unlock would be problematic?
getenv returns a pointer which could be invalidated after releasing the lock, unless the lock also guards uses of that pointer and all application code takes that lock, which it most certainly does not. Likewise, this scheme does not solve direct use of environ by application code.
NetBSD has getenv_r, which copies into a buffer, but few applications use getenv_r, and certainly not all of them. And it doesn't resolve environ.
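For reference, usage looks roughly like this (per NetBSD's getenv_r(3): 0 on success, -1 with errno set to ENOENT or ERANGE):

    #include <errno.h>
    #include <stdlib.h>

    /* Returns 1 if NAME was copied into BUF, 0 if unset, -1 if BUF is too small. */
    int env_copy(const char *name, char *buf, size_t len) {
        if (getenv_r(name, buf, len) == 0)   /* NetBSD extension */
            return 1;
        return errno == ENOENT ? 0 : -1;     /* ERANGE: value didn't fit */
    }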
Solaris never frees env strings or environ arrays, only creating new copies and atomically swapping them in. It uses a special allocator for those objects which doubles the backing buffer each time it deep-copies the environ array, then argues this strategy is technically asymptotically memory-bounded.
EDIT: Glancing at the code, I think glibc is similar to Solaris in that it never frees env strings, but it has a heuristic to conditionally free environ arrays, which means directly using environ isn't thread-safe.
Having a lock would still be better than the current situation. Especially if the lock was exposed so that programs that did mess with environ directly, or the pointer returned by getenv could hold the lock while doing so.
I think the missing piece here is how POSIX specifies the environment: `getenv(3)` and `setenv(3)` are accessors for `environ`[1], which is just a pointer to some memory (over which walking is also specified[2]). That level of granularity makes it pretty hard to add any locking here.
Independent environments would be best, like what was done for locales with uselocale. But it is a breaking change: it would have to go through POSIX and will take forever anyway. Also, as environ is a public symbol, it has ABI implications.
Because there is no way to tell when a thread is done with the buffer, there is no moment when you can be sure you can manipulate it. The options are to create a new copy and leak the old one, or to accept the chance of a segfault.
But the getenv/setenv calls themselves could still be under a lock, which I think is what most people would use. Walking over the memory directly could remain lockless, and such a program could see inconsistent values, as is the current behaviour.
I think in libc there's a lot of stuff where an interface is kind of broken from a thread perspective, and it could be implemented better without changing the interface, but people generally do not.
I can't think of any examples offhand, but I often think about it with thread-local storage. E.g. lots of interfaces have an _r() equivalent where you provide the buffer, but many people still call the unsafe one, which is broken when there are threads... In my mind, the best way to do this would be to use static thread-local storage in the non-_r() one and have it call the _r() one (see the sketch below). Sure, that has overhead and isn't a perfect solution, but it's better than "bad". But a lot of these old functions don't necessarily get love.
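A sketch of that shape, using glibc's gethostbyname_r(3) under an illustrative wrapper name (the real non-_r function keeps one process-wide static buffer instead):

    #include <netdb.h>

    /* Legacy-style interface, but backed by per-thread storage: each thread
     * gets its own result buffer, so concurrent callers don't stomp on each
     * other the way they do with one shared static buffer. */
    struct hostent *my_gethostbyname(const char *name) {
        static _Thread_local struct hostent he;
        static _Thread_local char buf[2048];  /* a real wrapper would retry on ERANGE */
        struct hostent *result;
        int err;

        if (gethostbyname_r(name, &he, buf, sizeof buf, &result, &err) != 0)
            return NULL;
        return result;   /* NULL if the lookup failed */
    }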
In a sane world, creating a thread would set a global that causes non-thread-safe library functions to segfault. Or maybe calling them from two different threads would cause a segfault. But just make it really obvious you're doing bad stuff.
I think that sounds completely insane. The unsafety is an emergent property of what's done in the function, and entirely dependent on usage. If you were doing very disciplined use of an unsafe call, it's harmless.
Perhaps this would be a good feature of an assert, or something that breaks in a debugger if it's attached. But I don't think that is reasonable for production.
The pointer isn't guaranteed to point into `environ` directly. `getenv()` could copy the value to a thread-local, (dynamically-allocated?) buffer while holding the lock.
Edit: In hindsight, a dynamic buffer would require returning ENOMEM errors (which might lead to some unexpected failures), while a static buffer would limit the value length. I think you might be right about the API being broken.
You miss the point. If you have full control over when and how getenv is called, there's no issue to begin with. The problem is that you don't, as OP demonstrates. It's perfectly natural to call getaddrinfo in a loop.
We need a new API which is not broken, like in NetBSD, and a multi-year migration of all core libraries to it. A pity it wasn't started years ago, though; it could've been 95% done by now.
I was suggesting that the buffer be invalidated by each subsequent call – like some other libc functions' internal buffers – although, as I noted in the edit, this would need `getenv()` to be able to indicate errors (specifically ENOMEM). It cannot do this as currently specified, because NULL is used to indicate an absent variable.
You could also require callers to free the returned memory when they're done, but that would be another change of API.
The solution to all problems like this was decided years ago: _r
You provide the storage and free it
The problem is these non-direct uses. They each need to switch to *_r and manage the buffer, or offer _r versions themselves and sort of pass through the problem
Of course, *_r is a better option, but the existing API is used so pervasively that it needs to be made thread-safe to actually avoid thread-unsafe code in, e.g., libraries.
A number of libc functions return a pointer to an internal thread-local buffer, which is invalidated on subsequent calls. If the function copies the environment variable's value to such a buffer while holding the mutex controlling access to the global state, then the returned value is guaranteed to remain unaffected by other threads.
There are, however, other problems (discussed elsewhere in this thread) that complicate such an API in the context of getenv().
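For concreteness, a sketch of the buffer-copy scheme described above (hypothetical `getenv_copy`, not a real libc interface; a fixed-size per-thread buffer dodges the ENOMEM question at the cost of truncating huge values):

    #include <pthread.h>
    #include <string.h>

    extern char **environ;
    static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

    const char *getenv_copy(const char *name) {
        static _Thread_local char buf[4096];   /* invalidated by the next call */
        const char *result = NULL;
        size_t n = strlen(name);

        pthread_mutex_lock(&env_lock);         /* setenv() must take this too */
        for (char **e = environ; e && *e; e++) {
            if (strncmp(*e, name, n) == 0 && (*e)[n] == '=') {
                strncpy(buf, *e + n + 1, sizeof buf - 1);
                buf[sizeof buf - 1] = '\0';
                result = buf;
                break;
            }
        }
        pthread_mutex_unlock(&env_lock);
        return result;                         /* NULL if the variable is unset */
    }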
That makes a lot of sense. You’d need to snapshot environ when the lock was taken (when another thread could be accessing it!), which I imagine would be complicated. Although surely possible.
On at least some other operating systems, getenv(3C) and setenv(3C) are indeed thread-safe; e.g., on illumos: https://illumos.org/man/3C/getenv
We inherited our implementation from OpenSolaris, as did Oracle when they created their Solaris 11 fork. I expect at least some of the BSDs have probably fixed this as well.
Just because it is documented as thread safe doesn't mean it actually is. They might have just not understood the problem (see e.g. the various indirect links to "please mandate a thread-local `getenv`").
`setenv` is a nasty beast, especially since the raw `environ` variable is also exposed (and is in fact the only way to enumerate the environment).
For the curious: They make getenv() thread-safe by intentionally leaking the old environment, which they argue is acceptable because the memory leak is bounded to 3x the space actually needed.
The getenv/setenv/putenv/environ API looks terrible on closer inspection -- it does not appear possible for an implementation to be safe, leak-free, and efficient.
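A minimal sketch of the leak-and-swap strategy described above (append case only, purely illustrative; the real illumos code also handles replacement and reuses its doubling buffers):

    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static _Atomic(char **) env_snapshot;   /* readers walk whatever they load */

    static void set_var(const char *name, const char *value) {
        char **old = atomic_load(&env_snapshot);
        size_t n = 0;
        while (old && old[n]) n++;

        char **next = malloc((n + 2) * sizeof *next);
        if (old)
            memcpy(next, old, n * sizeof *next);
        char *entry = malloc(strlen(name) + strlen(value) + 2);
        sprintf(entry, "%s=%s", name, value);
        next[n] = entry;
        next[n + 1] = NULL;

        atomic_store(&env_snapshot, next);  /* publish the new array */
        /* 'old' is intentionally leaked: a concurrent reader may still be
         * walking it, and there is no way to know when it has finished. */
    }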
Just because they use locks doesn't mean it's automatically actually thread-safe.
The comment about "this is safe for people walking `environ`" is definitely a lie, for example, though the bug might be hidden on some common architectures with popular compilers in their default configuration.
I agree. I'm shocked this isn't there in 2023. Or even better, an rwlock allowing concurrent reads while serializing writes. Or some lock-free algorithm for writes.
I'm pretty sure the Windows version of the environment calls has locking.
Historically you can access the environment via a global variable too, which would side-step locking schemes. But probably hardly anybody does that anymore.
SetEnvironmentVariable is thread safe on Windows, but their POSIX wrappers aren't. Windows has a better API design in this instance, and the guarantees by POSIX make it impossible to make a compliant implementation that can be used with threads.
> and the guarantees by POSIX make it impossible to make a compliant implementation that can be used with threads.
Can you be more specific than this? I kind of doubt it.
For example, it seems to me you could write a setenv() that uses a lock-free algorithm, or that writes in a strategic way that won't fault if getenv() or a read of environ(7) runs concurrently, and then say all bets are off for thread safety if you write via environ(7). That's safer than the status quo, and I don't foresee it breaking POSIX.
Reading the spec again, I suppose it's possible to keep copies of environment variables around in memory without violating the spec, basically creating a copy on every getenv call for anything modified since the last setenv/putenv. I thought the spec also said that writes to the returned pointer would update the environment variables, but no such guarantee is given; that's just an implementation detail (and modifying the returned string is disallowed by the API spec, but good luck enforcing that).
The XSI spec does state that the result of getenv() may be overwritten by putenv() but that's not a strict requirement either.
You still risk breaking programs and libraries that expect getenv to always return a pointer into *environ (I believe Go has an issue like that on musl).
On the other hand, the POSIX standard explicitly states that getenv() does not need to be reentrant (and therefore doesn't need to be thread safe) so any program relying on thread-safe getenv is already violating the API contract.
The rationale also seems to assume you can't make this stuff thread safe because of the standard implementation:
> The getenv() function is inherently not reentrant because it returns a value pointing to static data.
Modifying the buffer returned by getenv() seems like a terrible way to write back a value, because you could only replace it with a string with equal or shorter length. One of the problems setenv() solves is allocation.
It's important to note the difference between reentrant and thread safe. The most obvious implementation of getenv(), which would just loop through environ(7) and do a bunch of strncmp, can safely be re-entered, in that you could interrupt it and call it again and it would produce no ill effect. It just can't be overlapped with writes.
There are lots of programs that access `*environ` directly, so while this might be good, it wouldn't solve all classes of the problem. There are also uses out there which are performance sensitive (and often just as unsafe, if not more so, such as holding pointers into the structure over long periods).
Threaded programs should probably seriously consider retiring libc, but we don't currently have a common-ground replacement.
Name-related activities are one of the worst areas, contributing significantly to the glibc linkage and ABI challenges, but also lacking sufficient standards for alternatives, or even enough consensus to build them quickly.
The sarcastic answer to that would be something along the lines of “users should be aware of what they’re doing, and you should be more careful about calling those concurrently anyway/good-luck-have-fun”.
I’ve been searching around, and you can find a bunch of discussions about this online. Your ’sarcastic’ argument is basically the one I’ve seen in most places.
They could easily be made thread safe, but, paraphrasing, most arguments seem to come down to something like:
“setenv and getenv are POSIX functions, and not defined to lock. Just like many POSIX functions, they’ve _never_ been thread safe, and it’s an error to assume they are. Should we really start papering over client errors in use of a supposedly portable API, even though it’s working as specified? And if we make that choice pragmatically for this instance, should we be trying to do it for _all_ of POSIX? That’s impossible for some things, and would add complexity even where it’s not. For all these reasons, it’s better if these just stay dangerous like they’ve always been.”
This is fine for a 1980s monolithic program but if you use any library that reads environment variables (like, ahem, libc!) you have to treat the whole library as non-thread-safe? Or keep track of the "color" of each library function?
This kind of historical baggage is one of the main reasons I now completely avoid C/C++ programming and won't touch it ever again. It's Rust or C# only for me from here on...
The problem is that this affects higher-level languages too, because they often build on libc. And on some OSes they don't have a choice, because the system call interface is unstable and/or undocumented.
For example, in Rust, multiple time libraries were found to be unsound if `std::env::set_var` was ever called from a multi-threaded program. See:
Even on OSes that happen to use the Linux kernel, like Android, those who insist on using the NDK and pretending it is GNU/Linux, beyond the officially supported use cases, end up bumping their heads against the wall.
I'll go one further and say that maybe it's time we had an OS/kernel/API base that isn't just better suited, but is explicitly designed for the massively multithreaded, ludicrously fast, massively concurrent hardware we have in spades these days.
Alas, I am not an OS dev; I have neither the skills nor the understanding to know how to build that, or what it would involve, but I do think it's clear that what we have at the moment isn't as well suited as it could be. io_uring and direct I/O seem better suited, though.
Even if the API were perfect and used locks and returned memory to be managed by the caller, it would still be hard to use safely in a multithreading environment as long as the env is a process-global property.
If I were King I would ban environment variables from the OS entirely. Global mutable state is the root of all evil! Globals are evil evil evil and the modern reliance on bullshit environment variables is a plague upon reliability.
Well, environment variables are not "global" globals. They are just my globals, or my post-it notes for some variables. They are not even per-user; they are per user session.
10 processes can have completely different sets of values for the same environment variables, because each is in its own environment, and apparently that's useful.
There are foot guns, and there are unintentional consequences of implementation and design details. This is why we patch, improve and rewrite our software over time. To iron out these kinks.
Fire also has a tendency to cause collateral damage. So use both fire and environment variables responsibly, and the world will be a better place.
I definitely think a lot of filesystem access is a code smell and probably not the right thing. That one causes me a lot of pain. But that’s largely because I work in games and you really need to use the Unity/Unreal/whatever asset management system instead of direct file system access.
I’ve got a small build system and the first thing it does is nuke PATH to empty. It’s glorious. No more grabbing random shit from a big blob with untracked dependencies that varies wildly by system!
I could easily live my entire life without environment variables. They’re just a fundamentally bad idea. Every program that foolishly uses environment variables can be replaced by a better program that takes a config file or arglist.
Honestly sometimes I think the answer is yes. Imagine how happy we could be, and how many fewer problems we would have. Add printers to that list and you're describing a paradise.
The value of `setenv(3)` has always been pretty murky to me -- the only time I've ever really needed it is when performing a fork-exec, and even then it's been the wrong tool for the job (the dedicated exec*e variants are the right way).
Would there be any significant downsides (besides breakage) to mapping `environ(7)` as read-only? That seems like the kind of thing that a Linux distribution (or more realistically OpenBSD) could do as a way to kill off a persistent family of bugs.
The best part is that you can't use setenv after fork and before execve as it is not async signal safe. As you mention, the envp-taking variant of execve is the only sane option.
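For illustration, the envp route looks like this (fine between fork() and execve(), since nothing here allocates or takes locks; paths and variables are made up):

    #include <unistd.h>

    static void run_with_tz(const char *prog) {
        char *const argv[] = { (char *)prog, NULL };
        char *const envp[] = {              /* the child's entire environment */
            "PATH=/usr/bin:/bin",
            "TZ=UTC",
            NULL,
        };
        execve(prog, argv, envp);           /* only async-signal-safe calls here */
        _exit(127);                         /* reached only if execve failed */
    }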
This is perhaps just my ignorance, but when do you find yourself needing to set the timezone like that? Not in 10+ years of C programming have I ever had to do that.
I might do that if I am writing a quick-and-dirty program that works with times in multiple time zones and I can’t be bothered to find a library with a better API.
I think the problem is in setenv(3), not getenv(3). Reading shared global state is okay as long as it is not mutable. Anyone who relies on modifying environment variables should use execve(3), not setenv.
Exactly. Setting environment variables while a program is running is a terrible idea. Thread safe or not.
A lot of code, for good reasons, assumes envvars are constants set before the program started, and caches computations based on them, reads config files, and so on.
The fact that they are essentially global variables should be enough to deter usage of them.
Many POSIX (and C standard library, for that matter) functions were not designed with multithreaded programs in mind and don't work well in them.
I really think it would be worth creating a new standard API that is built with threading in mind, where functions like mktime, getaddrinfo, localtime, etc. take arguments instead of reading from the environment, that avoid global state as much as possible, and are thread safe if there is global state.
Almost everybody calls getenv(): malloc for tunings, checks, tracing, and such; half of the string library for localization specifics; all of the locale and time and timezone functions; many math functions need fegetenv().
GAI just needs to go. It needs to be moved to its own daemon with a simple RPC.
libc is kinda schizophrenic in this regard. It has mostly obviously low-level functions like string manipulation and memory management, and then unexpectedly a DNS client implementation and support for arbitrary runtime plugins (NSS modules).
This is essentially what systemd-resolved with the nss-resolve NSS module is right? It’s possible to use /etc/nsswitch.conf to entirely disable the built-in DNS resolution in glibc if you want.
...on Windows, single-threaded programs don't really exist; any DLL can, and most of them do, spawn worker threads as an implementation detail. Some of them do it the moment their initializer runs, so if you link your program against anything other than kernel32 and its friends (the basic Windows system libraries don't spawn worker threads on being loaded), then by the time a thread finally starts executing your executable's entry point there is no guarantee that it is the only thread in your process. In fact, finding a non-toy, real-world Windows application that has only one thread is almost impossible (for example, IIRC all .NET runtimes have a worker-thread pool from the get-go, so that rules out any .NET executable).
Which is why on Windows there are almost no system APIs that can be safely used only in single-threaded applications (well, almost: there were some weird technical decisions around single-threaded apartments for COM...).
Maybe in several more decades the Linux community will also accept the fact that multi-threaded applications are an entirely normal and inevitable thing, not an aberration of nature that we all best pretend doesn't exist until we're absolutely forced to deal with its reality.
It's easy to think about some complex interactive software where the need to call setenv appears only after you have worker threads doing some other thing. Without a warning, you won't know it's a bad thing to do, and the manpage only says that it and unsetenv are not thread safe, as if this was remotely enough information.
What nobody tells you is that the environment reaches into so much that you need it to compress data or open an IPv6 connection. It's not obvious at all that you can't do those things while editing a variable.
There’s always a lot of weird emergent behavior in bootstrapping an app, and on an app of any serious size, I can’t entirely control if someone decides to spool up a thread pool on startup so that everything is hot before listen() happens.
I may think I have control, I may believe that a handful of us are entitled to have that say, but all it takes is someone adding a cross dependency that forces an existing piece of code to jump from 20th position in the load order to 6th and all hell can break loose. Or just as often, set a ticking time bomb that nobody notices until there’s a scaling or peak traffic event or someone adds one more small mistake to the code and foomp! up it goes.
It's neither in the headline nor in the article. The question was about setenv, not getenv.
It is best to avoid calling setenv in a threaded program. Some programs do it to make space for rewriting argv with large strings (freeing space from *environ which tends to be right after the tail of argv). Some programs or libraries use *environ directly to stage variables for exec before forking. Some want to pass variable changes to forks. There are alternatives possible, but in the context of something like go calling libc setenv, it’s to make interop easier- sadly it may make other interop harder, such as this case.
Which is why I have gotten rid of getaddrinfo() calls in my server code, and rather resolve DNS directly reading the DNS server setting from the system.
Other issues I faced:
- Not epoll() friendly. It always forks a process while resolving a domain name.
- Valgrind complains about uninitialized memory accesses when the function is called, and I can't get rid of them.
> Which is why I have gotten rid of getaddrinfo() calls in my server code, and rather resolve DNS directly reading the DNS server setting from the system.
This works as long as you don't need support for mDNS or LDAP host resolution, which depends on libnss/nsswitch on glibc-based systems. Which is fine, but it should be a well-documented limitation of this approach.
(this is also what the Go runtime does by default, but they automatically fall back to the glibc resolver in any more complex case: https://pkg.go.dev/net#hdr-Name_Resolution)
If it's your application on your server, you can be pretty confident. Only later it may surprise someone that this application doesn't react to system-wide config. Or that may never happen.
Those redirections happen further down the stack. You are OK reading settings from /etc/resolv.conf, which frequently points to a localhost daemon that redirects your queries to whatever DNS setting you have for your connection.
But parsing /etc/resolv.conf and using it is all that you need in your code.
You don't need all the options. Search Google: parsing resolv.conf is an old technique, and the file was written assuming individual apps would be parsing it. You will find instructions on how to do it in, say, 4 lines, explicitly for this file, not for any random conf file from the system.
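Roughly those 4 lines, sketched out (bare-bones: no 'search'/'options' handling, no IPv6 scope IDs):

    #include <stdio.h>

    /* Invokes cb() once per "nameserver" line in /etc/resolv.conf. */
    static void each_nameserver(void (*cb)(const char *addr)) {
        FILE *f = fopen("/etc/resolv.conf", "r");
        char line[256], addr[128];
        if (!f) return;
        while (fgets(line, sizeof line, f))
            if (sscanf(line, "nameserver %127s", addr) == 1)
                cb(addr);
        fclose(f);
    }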
Then your application won't be portable. Which is fine if you have no plans of distributing it. But otherwise I can guarantee it will break on some machine.
You mean linked libraries you don't have the source of? Yes, in that case set the environment vars as early as possible (if possible before starting any thread).
Keep in mind: Don't use multiple threads unless you really, really need to, and have thought long and hard about concurrency issues.
In a way, I think the fact that many library functions are not thread-safe should be viewed as an encouragement to not use threads, or use them only for the bare minimum necessary.
I say this from a few decades of experience fighting race conditions and the like, during which I several times rewrote an existing multithreaded process as a single-threaded one and greatly improved performance while reducing memory usage. The architecture astronauts may have moved on to stuff like microservices now, but in the 90s/2000s threads were overused just as much.
That's basically impossible in many modern programming environments - even if you never spawn a thread, something else in your executable probably has. By the time your iOS or macOS app has finished launching, it has multiple threads. The Windows loader uses threads to load DLLs.
There are many ways to do multiple threads wrong. It seems the "right way" is to wake up a sleeping but already-created thread, and take elements out of a work queue in a thread-safe way. Your main thread can even be processing elements during the 10000 clock cycles it takes to wake up a thread.
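A minimal sketch of that shape (no full-queue or shutdown handling; purely illustrative):

    #include <pthread.h>

    static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  q_cond = PTHREAD_COND_INITIALIZER;
    static void (*queue[64])(void);
    static unsigned head, tail;

    void push(void (*job)(void)) {
        pthread_mutex_lock(&q_lock);
        queue[tail++ % 64] = job;
        pthread_cond_signal(&q_cond);   /* wake an already-created worker */
        pthread_mutex_unlock(&q_lock);
    }

    void *worker(void *arg) {           /* started once, sleeps between jobs */
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&q_lock);
            while (head == tail)
                pthread_cond_wait(&q_cond, &q_lock);
            void (*job)(void) = queue[head++ % 64];
            pthread_mutex_unlock(&q_lock);
            job();                      /* run the work item outside the lock */
        }
    }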
There is no reason that `getenv()` should ever lock -- it should always be lock-free. `putenv()`/`setenv()`/`unsetenv()` can lock, of course, since there's no point allowing more than one writer at a time.
But it unconditionally issues a memory barrier, which is most of the cost of an uncontended futex-style lock already.
And I'd be interested in your ideas for removing the lock, as I can't see any paths that don't change semantics (e.g. unconditionally doing the init work at process start time, when you know there aren't multiple threads yet).
> a memory barrier, which is most of the cost of an un-contended futex-style lock already
Yes, but it's not a lock.
> unconditionally doing the init work at process start time when you know there's not multiple threads
That's the most obvious fix, yes. I was thinking (but I've not checked yet) that when `my_environ` is not set up yet then `getenv()` can use `_environ` directly.
It’s easy to sterilize your code in this regard (although as this article points out you need to know to do it).
Many Unix systems support a three-argument `main` function (compare `exec()`) where the third argument is `char *envp[]`. You can call `setenv` to manipulate the environment.
But easiest is to just null out the POSIX extern `char **environ` (save a copy if you want to consult it yourself later). See `man 7 environ`.
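A sketch of the null-it-out approach (glibc's getenv() tolerates a NULL environ; whether every libc does is worth checking):

    #include <stddef.h>

    extern char **environ;
    static char **saved_env;

    void sterilize_environ(void) {   /* call before any threads exist */
        saved_env = environ;         /* keep a copy for later consultation */
        environ = NULL;              /* getenv() now finds nothing */
    }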
It's not clear from the man pages, but `setenv()` mutates `environ`.
In particular, setting a previously-undefined variable causes `environ` to be reallocated. Whereas `setenv()` of an extant variable changes just that value in the current `environ` pointer array.
I wonder if this only applies to the dns resolver, or also other NSS modules, like the systemd resolver. And don't forget about nscd: if nscd is running, then all of the nsswitch stuff will be done out of process.
That might mean a viable workaround is enabling nscd, oddly enough.
And frankly, maybe libpthread should just overlay thread-safe getenv/setenv like I believe it does for a couple of other libc symbols.
After reading some comments, to avoid the problems:
- immediately copy the contents of the buffer that the pointer returned by getenv() points to
- don't use getenv after threads have been started
A library could be written which makes an immutable copy of the whole environment before main() starts. This library then hands out pointers into the environment copy. Or, to be even safer, it makes another copy of the environment variable for each caller. This trades some efficiency for safety.
In effect, ignore the mutable accessors like setenv from libc.
Or did I miss something? I am not an expert in these things.
And of course it won't solve the problem of two other libraries fighting with each other...
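For what it's worth, a sketch of such a library (GCC/Clang constructor attribute; names are illustrative):

    #include <stdlib.h>
    #include <string.h>

    extern char **environ;
    static char **env_snapshot;            /* immutable after this runs */

    __attribute__((constructor))
    static void take_snapshot(void) {      /* runs before main() */
        size_t n = 0;
        while (environ[n]) n++;
        env_snapshot = malloc((n + 1) * sizeof *env_snapshot);
        for (size_t i = 0; i < n; i++)
            env_snapshot[i] = strdup(environ[i]);
        env_snapshot[n] = NULL;
    }

    /* Lookups are served only from the frozen copy. */
    const char *env_get(const char *name) {
        size_t n = strlen(name);
        for (char **e = env_snapshot; e && *e; e++)
            if (strncmp(*e, name, n) == 0 && (*e)[n] == '=')
                return *e + n + 1;
        return NULL;
    }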
Everything would be cool if it weren't for there being situations in which you need to set an environment variable in the same process in order to get a function to do something. E.g. the TZ variable, in order to coax a behavior out of the time functions.
If environment variables are for child processes only, there is no need to use setenv because you can pass a specified environment array through exec.
getaddrinfo is terrible. Did you know it also opens and then connects to a socket? Any process that uses getaddrinfo needs a blanket exception in my firewall in order to work properly, because otherwise it will fail to connect to some randomly-generated port that it just made up.
It opens a socket to the same process that called getaddrinfo. That is, it's just communicating with itself, using a brand-new randomly-generated port for each call. This should be completely unnecessary.
So if I have a program that does getaddrinfo(3) and nothing more, this program sets up a socket, listen(2)s on it, creates another socket, and connect(2)s it to the first one?
I didn't know glibc didn't do the same thing. `getaddrinfo()` on Windows seems to do this because randomly a program will try to connect to `::1:59962` or something, and if I don't allow it in my firewall, it will start whining that some getaddrinfo thread failed to start. This has happened across all sorts of different programs. It's infuriating.
I thought it was just a general libc thing. Isn't there a spec on this somewhere?
For one thing, it could delegate to a local service. Granted, the communication with this service would probably still be over a socket interface, but at least as a purely-local connection you would hopefully get better worst-case performance characteristics.
This is basically what dnsmasq does when you use it as a local DNS cache.
This! And the attributes page explains these even better:
const Functions marked with const as an MT-Safety issue non-
atomically modify internal objects that are better
regarded as constant, because a substantial portion of the
GNU C Library accesses them without synchronization.
Unlike race, which causes both readers and writers of
internal objects to be regarded as MT-Unsafe, this mark is
applied to writers only. Writers remain MT-Unsafe to
call, but the then-mandatory constness of objects they
modify enables readers to be regarded as MT-Safe (as long
as no other reasons for them to be unsafe remain), since
the lack of synchronization is not a problem when the
objects are effectively constant.
The identifier that follows the const mark will appear by
itself as a safety note in readers. Programs that wish to
work around this safety issue, so as to call writers, may
use a non-recursive read-write lock associated with the
identifier, and guard all calls to functions marked with
const followed by the identifier with a write lock, and
all calls to functions marked with the identifier by
itself with a read lock.
and
env Functions marked with env as an MT-Safety issue access the
environment with getenv(3) or similar, without any guards
to ensure safety in the presence of concurrent
modifications.
We do not mark these functions as MT-Unsafe, however,
because functions that modify the environment are all
marked with const:env and regarded as unsafe. Being
unsafe, the latter are not to be called when multiple
threads are running or asynchronous signals are enabled,
and so the environment can be considered effectively
constant in these contexts, which makes the former safe.
POSIX does document it, it just requires carefully picking through pages and carefully thinking about the wording, unlike the simplicity of GLIBC documentation. For example, the best information is on the page for `exec`.
Stuff like this is why I'm supportive of newer languages like Go and Zig that sidestep libc entirely (when not using cgo as in TFA of course). libc is a great achievement and has served us well but, boy, it sure is a product of its time.
`errno` is another relic that needs to die yesterday.
I think it's the same in Windows, right? Can't use the syscalls underneath the hood, everything through the standard libraries. Maybe I'm wrong (I know very little about Windows other than how to use it to play games, and WSL)
The standard libraries on Windows don't involve libc.
The Windows APIs look rather different, and in general are much more friendly to multi-threading. POSIX on the other hand tends to assume that the program is in control of everything happening inside of it, which is an incorrect assumption due to libraries.
In this particular case, the Windows APIs have neither getaddrinfo() nor getenv(); and the closest equivalent GetEnvironmentVariableW is perfectly thread-safe.
Microsoft additionally has a C runtime (msvcrt) providing functions like getenv(), but this is much less fundamental than it is on other systems. Every program is supposed to ship its own copy of the C runtime; it's not officially part of Windows! And it's perfectly possible for multiple different copies of the C runtime to be loaded into the same Windows process. And since *environ is a variable defined by the C runtime, there's a different copy for each C runtime...
Almost correct, except that since Windows 10 there is now a C runtime shipped as standard; ironically, it is actually written in C++, taking advantage of its safety features over plain C and exposing the C API via extern "C".
On Windows it's somewhat possible to avoid most of it by linking to ntdll, which only provides symbols for raw syscall wrappers. But a lot of that is unstable and may change from one Windows release to the next.
Doing raw syscalls without ntdll is also possible, but Windows syscall numbers change on essentially every release, so you'd end up with something that only works on your Windows version.
We've been building everything with CGO_ENABLED=0 for years now, with no nasty side effects. It gets to be a pain using the default, when something as innocuous as a point version of a Docker image breaks compatibility because of a glibc version change[1].
[1] golang official image 1.20.4 to 1.20.5 went from Debian 11 to 12 base. Always use the -(debian version) tags.
Split DNS is broken on macOS when doing that, and for users with VPN that does split DNS it is not just an annoyance it leads to software not actually functioning.
Re-implementing system capabilities is fine and all as long as you support common use cases properly, which Golang does not.
And on the flip side, there have been a number of instances where the behavior differs and the Golang documentation describes a function only as it behaves with the Golang-native implementation, rather than the system implementation that ends up being the default, without calling any of this out.
Yeah, there's so much misery in the C ecosystem that it's better to eschew it altogether. Even merely packaging anything that depends on C ends up being a hugely painful undertaking since every C library has its own bespoke build system and its own implicit set of dependencies (and implicit versions of those dependencies, and expectations about where on the system those dependencies live).
I mostly like C as a language, but between the security concerns and the tooling concerns (and the community's zealous devotion to ignoring these very real problems) I'm really excited for its increasing marginalization. Unfortunately, it's not being marginalized in favor of "a better C", but rather every ecosystem is rewriting the same stuff from scratch which seems like a bit of a bummer (but still better than depending on C).
    if (somecall() == -1) {
        printf("somecall() failed\n");
        if (errno == ...) { ... }
    }
sure, the issue is that `somecall(...)` might have altered `errno` through 'acts-of-omission-or-commission' :o)
fwiw, posix has updated its definition to pretty much say that 'the value of errno in one thread is not affected by assignments to it by another'. this has been the case for at least a decade and a half (iirc), which in internet years would positively be the Pleistocene era :o)
so, i am not sure i really appreciate the 'shared-global-mutable-state' argument above. thanks !
The problem in that snippet is that `printf` could have altered the `errno` set by `somecall`, and that's only thanks to it being shared-global-mutable-state. You not realizing that was possible makes for a great example of why shared-mutable-global-state is hard to reason about.
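That is, the fix is to capture errno before anything else can clobber it:

    #include <errno.h>
    #include <stdio.h>

    void check(int rc) {
        if (rc == -1) {
            int saved = errno;          /* printf() may overwrite errno */
            printf("somecall() failed\n");
            if (saved == EINTR) { /* ... */ }
        }
    }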
This thread isn't talking about how to fix the errno problem generally. It's talking about the existence of a problem in the first place. Fixing it would be a whole different can of worms, and indeed, sisyphean sounds about right.
Notice how this entire thread was started by someone asking why errno was problematic. This is just about understanding.
Returning a "Result" struct doubles the size. That's one less register to use.
Exception handling is even more invasive.
They are great for high(er)-level languages, but less perfect at the lower levels where performance is critical.
EDIT: The Linux kernel uses negative return values for errors. It's good and efficient when it works. But it is not always an option when you need the full register width.
You are saving one register at the cost of having a thread local variable that is visible to signal handlers, so none of its uses can be optimized away. Which results in things like gcc having to decorate every math instruction with code to set errno on the off chance that someone somewhere might read it (no one ever does).
Some newer POSIX APIs, such as pthreads, do return the error this way. But many legacy APIs, such as dup or read, use the positive integer space, thus the negation pattern you often see in syscalls. Notably, POSIX guarantees <errno.h> values to be positive integers.
There are a lot of alternatives, and it's not clear why the ones you've suggested are inappropriate. You've listed some perceived costs, but I don't see why those costs are greater than the ones paid by the status quo.
Linux even shows you a path, yet you reject it for reasons that don't seem compelling to me.
The performance cost of having otherwise pure functions clobber global mutable memory defeating many optimization passes, is way higher than clobbering another register for the result.
The most obvious way it is wrong is that it is archaic. There is simply no reason to ever pass return values in hidden state. Just use return values damnit.
Err, no thank you. LD_PRELOAD and similar mechanisms are great for injecting code into apps legitimately, i.e. to patch long-unsupported systems or to tame current ones.
For example, I have a vision issue, and without a ReShade filter I would be unable to play a great deal of games.
Now that is also an attack vector, that's for sure, but you cannot go axe features willy-nilly just because you don't see value in them.
LD_PRELOAD wouldn't be needed if the OS were built around containers / jails instead of weakly isolated processes and process groups.
The Unix kernels (Linux, BSD, and Solaris alike) already had much of what's needed, say, 30 years ago, but nobody saw it as such a burning necessity (likely except Solaris, which eventually developed Zones).
On a "normal" desktop system, you don't need containers or jails. Your programs must communicate with each other (copy paste, print screen, etc.).
But today every god damn UI program needs an internet connection to phone home and execute remote code. This is the actual problem which must be fixed.
This is not a unix beartrap; this is a bug in Go, if that's where it was found. If your code is multithreaded, it's up to you to make it thread-safe. You can't declare that you're creating the most thread-safe, memory-safe, newbie-safe system and then go home. You have to write the code.
Your program gets its own copy of the environment when the program was launched. Nobody is changing it on you, any contention for that resource is you contending with yourself.
You don't expect an operating system to change things out from under you. Unix doesn't. If there is contention for this resource, it's all you (or whoever wrote the library you are using)
The environment is name-value pairs, as strings. That's it. That's what makes it accessible and useful. You can swallow it up, the whole thing, into whatever data structure you prefer in your language in a few lines of code, and a millisecond (if that much) of runtime. Just learn how things work and you won't feel helpless.
There is no clear/obvious mention of the fact that setenv could interfere with it. It is a glibc footgun/beartrap that its re-entrancy doesn't actually mean that calling it with non-shared memory in the arguments ensures no data races.
there could very well be a bug. if there's a bug, fix it.
But from what I read, it's documented as not threadsafe and this problem is happening in a threaded environment. That could still be called a bug. You find bugs, you fix them.
That's better than hyperventilating about how there is some massive problem with an operating system that you are using because all the people who came before you who knew more than you decided it was the best thing to use. The other OSes sucked more. This OS is simple enough that you can actually learn how it works, it's all laid out in front of you.
Should it do some additional things? sure. Help write the code.
A beartrap/footgun means "a bit too easy to harm yourself for comfort", not "there is a bug". I am relatively old and have been programming to this interface for a few decades, so I'm not hyperventilating about it.
My point (which made reference to the actual OS documentation) was that it takes being bit by such issues before you learn what this "Attributes" section means:
For the 1st decade of my career, I just searched for "re-entrant", "reentrant", or "MT" and went on my merry way, without realizing the importance of the other possible values of attributes table: in this case "env".
When you read the doc and realize you missed something, don't say "footgun"; say "aha! I misunderstood." Then, if you want, set about making things better.
Instead of jumping to "It shouldn't work this way" consider "Hmmm, why does it work this way? Is it possible that Thompson, Kernighan and Ritchie knew more about how to make a coherent straightforward system that would last 50 years in 1970 than I know how to now?"
Or perhaps you might consider that we've learned something in the last 50 years.
It sounds to me like you're hyperventilating about other people pointing out footguns. Maybe, just maybe, people in the past were capable of making mistakes. Let's not put them on a pedestal.
In fairness, they also gave us the joys of `strcpy(dest_ptr, src_ptr)` and `scanf("%s", str_ptr)`, which with the benefit of hindsight and many buffer overflows later were a terrible idea.
Yes, I reference that table in a deeper comment. I didn't mean that it wasn't accurately documented, but rather that it's not immediately apparent. I know lots of people who were once quite experienced but still not enough to know what the attributes section is and its importance and its full meaning (myself included).
Indeed, surprising and easy to overlook was all I was trying to convey.
┌───────────────────────────┬───────────────┬────────────────────┐
│Interface │ Attribute │ Value │
├───────────────────────────┼───────────────┼────────────────────┤
│getaddrinfo() │ Thread safety │ MT-Safe env locale │
├───────────────────────────┼───────────────┼────────────────────┤
│freeaddrinfo(), │ Thread safety │ MT-Safe │
│gai_strerror() │ │ │
└───────────────────────────┴───────────────┴────────────────────┘
MT-Safe
MT-Safe or Thread-Safe functions are safe to call in the
presence of other threads. MT, in MT-Safe, stands for
Multi Thread.
Being MT-Safe does not imply a function is atomic, nor
that it uses any of the memory synchronization mechanisms
POSIX exposes to users. It is even possible that calling
MT-Safe functions in sequence does not yield an MT-Safe
combination. For example, having a thread call two MT-
Safe functions one right after the other does not
guarantee behavior equivalent to atomic execution of a
combination of both functions, since concurrent calls in
other threads may interfere in a destructive way.
Whole-program optimizations that could inline functions
across library interfaces may expose unsafe reordering,
and so performing inlining across the GNU C Library
interface is not recommended. The documented MT-Safety
status is not guaranteed under whole-program optimization.
However, functions defined in user-visible headers are
designed to be safe for inlining.
Other safety remarks
Additional keywords may be attached to functions, indicating
features that do not make a function unsafe to call, but that may
need to be taken into account in certain classes of programs:
locale Functions annotated with locale as an MT-Safety issue read
from the locale object without any form of
synchronization. Functions annotated with locale called
concurrently with locale changes may behave in ways that
do not correspond to any of the locales active during
their execution, but an unpredictable mix thereof.
We do not mark these functions as MT-Unsafe, however,
because functions that modify the locale object are marked
with const:locale and regarded as unsafe. Being unsafe,
the latter are not to be called when multiple threads are
running or asynchronous signals are enabled, and so the
locale can be considered effectively constant in these
contexts, which makes the former safe.
env Functions marked with env as an MT-Safety issue access the
environment with getenv(3) or similar, without any guards
to ensure safety in the presence of concurrent
modifications.
We do not mark these functions as MT-Unsafe, however,
because functions that modify the environment are all
marked with const:env and regarded as unsafe. Being
unsafe, the latter are not to be called when multiple
threads are running or asynchronous signals are enabled,
and so the environment can be considered effectively
constant in these contexts, which makes the former safe.
Yes, I talk about that table in a deeper comment. Note to other readers, the table itself is in getaddrinfo(3) but the explanation of the elements is in attributes(7).
This is the viewpoint of someone who works alone or on small teams, where all the code in the process is their own. In a complex program there is plenty that can do surprising things that you didn’t expect.
no, this is a person who understands what is there. People who imagine something else is there, are disappointed when it's not.
People who try to invent the new new, and don't complete the task, blame unix, when unix just does what unix said it would do. It was the people who made bigger claims but did not deliver who should step up and say "I didn't turn the unix environment into what I thought I did."
but people trying to invent the new new are trying to invent it on unix because unix delivers what it promises. It doesn't deliver what you promise. But have some humility, and accept your failure and don't try to pin it on unix. Go implement on Windows.
Unix does allow you to deliver what you promise, that's why you'll still be complaining about unix 10 years from now, but it's up to you to deliver what you promise.
people who say "footgun" are people who try to shed responsibility; "it wasn't my fault, waaaaaah". I'm not saying we should not make computers easier to program, I'm saying that when we fail to: "it's a poor worker who blames his tools."
Why is it so important to you to blame this on unix, when you could blame it on a mistaken implementation of a library that could be fixed?
The words "blame" and "guilt" do not even appear in the article, and "fault" only as part of "segfaulting". Rachel merely points out an easy-to-make mistake before others make them, to save them time. She isn't talking about who is to blame. No need to get defensive.
It's not a syscall, it's a C standard library function that may read files, talk to dbus, read env vars, etc. The problem is people quite understandably expect its re-entrancy to mean more than it really means.
a "syscall" is not going to call back into your code to get the environment. If a "syscall" is using your environment, it's part of your code, probably better described as a library call.
I always thought getenv() was a code smell, and this confirms it. I guess a better option would be to add a parameter to getaddrinfo() and deal with it further up the stack?