Getaddrinfo() on glibc calls getenv(), oh boy (rachelbythebay.com)
282 points by defrost on Oct 17, 2023 | hide | past | favorite | 261 comments



Doesn't almost every C library function potentially call getenv()? printf()ing a number requires checking the locale which can be configured via the environment, after all.

I'd say you shouldn't be calling setenv() at all once you've spawned threads.


The environment variables don't affect locale immediately; only when you call setlocale().


> I'd say you shouldn't be calling setenv() at all once you've spawned threads.

Or, you know, get a better C library. https://src.illumos.org/source/xref/illumos-gate/usr/src/lib...


Can you explain what point you are trying to get across? I just see a page of code.


That appears to be the OpenSolaris C library.

I imagine that it is meant to convey that the quality is higher in this library than in GNU libc.


As the link contains a specific line number, I believe GP is suggesting we read the comment blocks. I'm not interested enough to dive into the details atm, but the explanations at least seem to be on point.


Just read the code! /s


Grandparent didn't just link to "code"; they linked to a five-paragraph comment explaining what is going on. Did you read those comments before you posted your snarky reply?

I think the gist of it is that unlike the glibc implementation, the Illumos implementation of getenv() is lock-free and thread-safe.


you are a joy


Correct. libc is largely thread unsafe.


Thread unsafe is too kind a qualifier, libc is largely thread-broken.


Doesn't have to be. Even `strtok()`, `getpwnam()`, etc. can be thread-safe (just not reentrant).


I have to admit the conclusion I have reached is that threads are the problem not libc.

We really should have embraced a better primitive than "shared memory execution environment"


I think a lot could be fixed with functions that don't access global state, but instead take that state as an additional parameter. E.g. I would really like `snprintf_l()`/`fprintf_l()` to be supported by glibc. It is supported by FreeBSD, macOS (Darwin), and even Windows (with a `_` prefix for some reason), but not by GNU libc.


I think that it is generally not reasonable to convert existing large projects to be multithreaded, in the same way that it is generally not reasonable to fully rewrite existing large projects from scratch.

The alternative that I have seen be successful, is to achieve parallelism by forking separate processes wherever you would have spawned threads, and then communicate through shared memory regions.

It's a lot like having Rust-style unsafe blocks, in that you know that if you are having a thread-safety issue it will definitely be in one of the code sections where you are touching the shared memory region, and not anywhere else.

Obviously there's a higher startup cost for forking, but this makes it possible to gain parallelism without breaking all the thread-unsafe code that is certainly in an existing large project.


> I'd say you shouldn't be calling setenv() at all once you've spawned threads.

That's why I hate ZeroMQ. They spawn threads and do magics behind your back.


ZeroMQ is a big abstraction over sockets. The higher the level of abstractions, the more these things are usually necessary.

If you're using a library like that it comes with the territory, and you have to decide for yourself the cost-benefit of using non blocking sockets and async io yourself, or trust a library.

Much more fun to write these things yourself sometimes, but I find e.g. libevent a nice somewhere-in-between abstraction level I can be happy with.


Sure, but libraries that spawn threads are a pain. It is convenient for the library writer, but it always ends up being a significant inconvenience for the user.


But the question is: who knows what to do and what not to do with libc?

"But Go compiled it just fine..."


I build most of my Go binaries with cgo disabled for this, and many other reasons.

In case you don't know, cross-building with GOOS/GOARCH will imply CGO_ENABLED=0 unless you also specify CC_FOR_${GOOS}_${GOARCH}; I cross-build most of my code for (and test it on) amd64, arm64, linux, openbsd, and darwin.

Go will sometimes link to the local libc for network-related functionality if you don't disable cgo.


That's the entire point of the post and the earlier posts it links to.


The post is about getaddrinfo() specifically. It just struck me as odd to call that one out when there are far more common C library calls that use getenv().


There's a strong tendency to think of network calls as entirely universal and not tied in any way to locale settings in the environment.

Times, dates, spellings, ... many things are locale dependent, but socket stuff?

It comes as a Surprise!!, and not the good kind, to many a network programmer with just a few years under their belt to discover that threaded networking can segfault because of this.

Once you know, you know and don't forget (until next time), but I suspect this was the motivation behind the blog post: the principle of most potential surprise.


They did a post on mktime()

But, yes, in general for libc, if the manpage didn't say it's thread-safe, it is unsafe.


I think I must be missing something here, but I’ll ask anyway:

Why don’t the OS libraries have some sort of lock around setenv/getenv, so that only one thread can be inside them at a time? I can’t see how it could deadlock. And surely no-one is so dependent on the performance of these calls that the time to lock/unlock would be problematic?


getenv returns a pointer which could be invalidated after releasing the lock, unless the lock also guards uses of that pointer and all application code uses that lock, which they most certainly do not. Likewise, this scheme does not solve direct use of environ by application code.

NetBSD has getenv_r, which copies into a buffer, but few applications use getenv_r, and certainly not all of them. And it doesn't resolve environ.

Solaris never frees env strings or environ arrays, only creating new copies and atomically swapping them. It uses a special allocator for those objects which doubles the backing buffer each time it deep-copies the environ array, then argues this strategy is technically asymptotically memory-bounded.

EDIT: Glancing at the code, I think glibc is similar to Solaris in that it never frees env strings, but it has a heuristic to conditionally free environ arrays, which means directly using environ isn't thread-safe.


`getenv()` never returns pointers that can be invalidated. If you `putenv()` something, you commit to never freeing it or overwriting it.


Good point about `putenv()`; however there is also `setenv()`, which does make a copy, so you are wrong about `getenv()` in general.

POSIX explicitly states "The string [returned by getenv] may be overwritten by a subsequent call to getenv(), setenv(), unsetenv(), or putenv()" (https://pubs.opengroup.org/onlinepubs/9699919799.2008edition...).


"May", but really, it's best for `setenv()` to leak.


Having a lock would still be better than the current situation. Especially if the lock was exposed so that programs that did mess with environ directly, or the pointer returned by getenv could hold the lock while doing so.


Deadlock disaster that one.


I think the missing piece here is how POSIX specifies the environment: `getenv(3)` and `setenv(3)` are accessors for `environ`[1], which is just a pointer to some memory (walking over which is also specified[2]). That level of granularity makes it pretty hard to add any locking here.

[1]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1...

[2]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/s...


I wonder if you could get around this by giving each thread its own environment context, and synchronizing them, asynchronously.


Just independent environments would be the best, like it was done for locale with uselocale. But it is a breaking change, would have to go through posix and will take forever anyway. Also, as environ is a public symbol, it has ABI implications.


Because there is no way to tell when a thread is done with the buffer, there is no moment when you can be sure you can manipulate it. The options are create a new copy and leak the old one or accept the chance of a segfault.


But getenv/setenv syscall could still be under lock, which I think most of the people would use. Walking over memory could be without lock and the program could see inconsistent values as is the current behaviour.


It should never use locks. Solaris/Illumos' getenv() is lock-less. Every C library should copy that pattern.


It's a library call, not a system call. You're right though, they could implement locking.


I think in libc, there's a lot of stuff where an interface is kind of broken from a thread perspective, and it could be implemented better without changing the interface, but often people generally do not.

I can't think of any examples offhand, but I often think about it with regard to thread-local storage. E.g. lots of interfaces have an _r() equivalent where you provide the buffer, but many people still call the unsafe one, which is broken when there are threads. In my mind, the best way to do this would be to use static thread-local storage in the non-_r() one and have it call the _r() one. Sure, that has overhead and isn't a perfect solution, but it's better than "bad". A lot of these old functions just don't get much love.


In a sane world, creating a thread would set a global that causes non-thread-safe library functions to segfault. Or maybe calling them from two different threads would cause a segfault. But just make it really obvious that you're doing bad stuff.


I think that sounds completely insane. The unsafety is an emergent property of what's done in the function, and entirely dependent on usage. If you were doing very disciplined use of an unsafe call, it's harmless.

Perhaps this would be a good feature of an assert, or something that breaks in a debugger if it's attached. But I don't think that is reasonable for production.


That won’t help. The API is broken: it returns a pointer.


The pointer isn't guaranteed to point into `environ` directly. `getenv()` could copy the value to a thread-local, (dynamically-allocated?) buffer while holding the lock.

Edit: In hindsight, a dynamic buffer would require returning ENOMEM errors (which might lead to some unexpected failures), while a static buffer would limit the value length. I think you might be right about the API being broken.


Libc and much of posix predate threads. There was no way to fix everything without changing the APIs, which they did in many cases but not all.


Call getenv() in a loop and you run out of memory.


Then don't do that. Of all the footguns in POSIX/C programming, having to remember to free this is really not as bad as you seem to imply.


You miss the point. If you have full control over when and how getenv is called, there's no issue to begin with. The problem is that you don't, as OP demonstrates. It's perfectly natural to call getaddrinfo in a loop.

We need a new API which is not broken, like NetBSD's, and a multi-year migration of all core libraries to it. A pity it wasn't started years ago, though; it could've been 95% done by now.


And how would you free it? The current posix API doesn't have any way to reliably free the result returned by `getenv`.


I was suggesting that the buffer be invalidated by each subsequent call – like some other libc functions' internal buffers – although, as I noted in the edit, this would need `getenv()` to be able to indicate errors (specifically ENOMEM). It cannot do that as currently specified, because NULL is used to indicate an absent variable.

You could also require callers free the returned memory when they're done, but that would be another change of API.


> I was suggesting that the buffer be invalidated by each subsequent call

This would break very simple single-threaded programs that e.g. print two env vars in one printf call.


The solution to all problems like this was decided years ago: _r

You provide the storage and free it

The problem is these non-direct uses. They each need to switch to *_r and manage the buffer, or offer _r versions themselves and sort of pass the problem through.


Of course, *_r is a better option, but the existing API is used so pervasively that it needs to be made thread-safe to actually avoid thread-unsafe code in, e.g, libraries.


I don’t see how you can make something thread-safe that:

- returns a pointer

- has the library own the allocation

- has global, mutable state


A number of libc functions return a pointer to an internal thread-local buffer, which is invalidated on subsequent calls. If the function copies the environment variable's value to such a buffer while holding the mutex controlling access to the global state, then the returned value is guaranteed to remain unaffected by other threads.

There are, however, other problems (discussed elsewhere in this thread) that complicate such an API in the context of getenv().


Requiring you to hold a mutex to safely call the function is an API change


I was not suggesting the caller hold a mutex, but rather getenv() itself, which would be transparent to the caller.


Don’t you need C++11?


That makes a lot of sense. You’d need to snapshot environ when the lock was taken (when another thread could be accessing it!), which I imagine would be complicated. Although surely possible.


On at least some other operating systems, getenv(3C) and setenv(3C) are indeed thread-safe; e.g., on illumos: https://illumos.org/man/3C/getenv

We inherited our implementation from OpenSolaris, as did Oracle when they created their Solaris 11 fork. I expect at least some of the BSDs have probably fixed this as well.


Just because it is documented as thread safe doesn't mean it actually is. They might have just not understood the problem (see e.g. the various indirect links to "please mandate a thread-local `getenv`").

`setenv` is a nasty beast, especially since the raw `environ` variable is also exposed (and is in fact the only way to enumerate the environment).


In my experience, because it is documented that’s the behavior I would expect.

That being said, I went to look[0] and it turns out it wasn’t a lie. [0]https://github.com/illumos/illumos-gate/blob/master/usr/src/...


For the curious: They make getenv() thread-safe by intentionally leaking the old environment, which they argue is acceptable because the memory leak is bounded to 3x the space actually needed.

The getenv/setenv/putenv/environ API looks terrible on closer inspection -- it does not appear possible for an implementation to be safe, leak-free, and efficient.


Just because they use locks doesn't mean it's automatically actually thread-safe.

The comment about "this is safe for people walking `environ`" is definitely a lie, for example, though the bug might be hidden on some common architectures with popular compilers in their default configuration.


I agree. I'm shocked this isn't there in 2023. Or even better, a rwlock allowing concurrent reads while serializing writes. Or some lock free algorithm for writes.

I'm pretty sure the Windows version of the environment calls has locking.

Historically you can access the environment via a global variable too, which would side-step locking schemes. But probably hardly anybody does that anymore.


SetEnvironmentVariable is thread safe on Windows, but their POSIX wrappers aren't. Windows has a better API design in this instance, and the guarantees by POSIX make it impossible to make a compliant implementation that can be used with threads.


> and the guarantees by POSIX make it impossible to make a compliant implementation that can be used with threads.

Can you be more specific than this? I kind of doubt it.

For example, seems to me you could write a setenv() that uses a lock-free algorithm or writes in a strategic way that won't result in a fault if getenv() or reading environ(7) runs concurrently, then say all bets are off for thread safety if you write via environ(7). That's safer than the status quo and I don't foresee it breaking POSIX.


Reading the spec again, I suppose it's possible to keep copies of environment variables around in memory without violating the spec, basically creating a copy for every getenv call that was modified since the last setenv/putenv. I thought the spec also specified that writes to the returned pointer would update the environment variables, but no such guarantee is given, that's just an implementation detail (and is disallowed by the API spec but good luck enforcing that).

The XSI spec does state that the result of getenv() may be overwritten by putenv() but that's not a strict requirement either.

You still risk programs and libraries expecting getenv to always return a pointer to *environ failing (I believe Go has an issue like that on MUSL).

On the other hand, the POSIX standard explicitly states that getenv() does not need to be reentrant (and therefore doesn't need to be thread safe) so any program relying on thread-safe getenv is already violating the API contract.

The rationale also seems to assume you can't make this stuff thread safe because of the standard implementation:

> The getenv() function is inherently not reentrant because it returns a value pointing to static data.


Modifying the buffer returned by getenv() seems like a terrible way to write back a value, because you could only replace it with a string with equal or shorter length. One of the problems setenv() solves is allocation.

It's important to note the difference between reentrant and thread safe. The most obvious implementation of getenv(), which would just loop through environ(7) and do a bunch of strncmp, can safely be re-entered, in that you could interrupt it and call it again and it would produce no ill effect. It just can't be overlapped with writes.



That would be nice. What is the likelihood this will actually make it into libc implementations and/or become part of the posix standard?


There are lots of programs that access `*environ` directly, so while this might be good, it wouldn't solve all classes of the problem. There are also uses out there which are performance-sensitive (and often just as unsafe, if not more so, such as holding pointers into the structure over long periods).

Threaded programs should probably seriously consider retiring libc, but we don't currently have a common-ground replacement.

Name-related activities are one of the worst areas, contributing significantly to glibc's linkage and ABI challenges, but also lacking sufficient standards for alternatives, or even the consensus to build them quickly.


Yes, on Linux you can even write `int main(int argc, char* argv[], char* envp[])`.


`envp` is unsafe after `setenv` even in single-threaded programs though. So you really should use `environ`.


The sarcastic answer to that would be something along the lines of “users should be aware of what they’re doing, and you should be more careful about calling those concurrently anyway/good-luck-have-fun”.


I’ve been searching around, and you can find a bunch of discussions about this online. Your ’sarcastic’ argument is basically the one I’ve seen in most places.

They could easily be made thread safe, but, paraphrasing, most arguments seem to come down to something like:

“setenv and getenv are POSIX functions, and not defined to lock. Just like many POSIX functions, they’ve _never_ been thread safe, and it’s an error to assume they are. Should we really start papering over client errors in use of a supposedly portable API, even though it’s working as specified? And if we make that choice pragmatically for this instance, should we be trying to do it for _all_ of POSIX? That’s impossible for some things, and would add complexity even where it’s not. For all these reasons, it’s better if these just stay dangerous like they’ve always been.”


This is fine for a 1980s monolithic program but if you use any library that reads environment variables (like, ahem, libc!) you have to treat the whole library as non-thread-safe? Or keep track of the "color" of each library function?


This kind of historical baggage is one of the main reasons I now completely avoid C/C++ programming and won't touch it ever again. It's Rust or C# only for me from here on...


The problem is that this affects higher-level languages too, because they often build on libc. And on some OSes they don't have a choice, because the system call interface is unstable and/or undocumented.

For example, in Rust, multiple time libraries were found to be unsound if `std::env::set_var` was ever called from a multi-threaded program. See:

https://github.com/time-rs/time/issues/293 and https://github.com/chronotope/chrono/issues/499

https://github.com/rust-lang/rust/issues/27970

https://github.com/rust-lang/rust/issues/90308


Yes it's worth remembering that the POSIX base came before threads became commonly available (or even had a standardised API)


Maybe it is time to replace the POSIX base with something that is better suited to a multi-threaded environment.


Which is already happening on non-UNIX OSes.

Even on OSes that happen to use the Linux kernel, like Android, those who insist on using the NDK and pretending it is GNU/Linux, beyond the officially supported use cases, end up banging their heads against the wall.


I’ll go one further and say that maybe it’s time we had an OS/kernel/API base that isn’t just better suited, but is explicitly designed for the massively multithreaded, ludicrously fast, massively concurrent hardware we have in spades these days.

Alas, I am not an OS dev; I have neither the skills nor the understanding to know how to build that, or what it would involve, but I do think it’s clear that what we have at the moment isn’t as well suited as it could be. io_uring / direct I/O seem to be better suited, though.


For libc developers: they could do better.

For the rest of us, I guess this is the best option. We can write wrappers, but, yuck!


That's why getenv_s et al were added.


Even if the api was perfect and used locks and returned memory to be managed by the caller, it would still be hard to use safely in a multithreading environment as long as the env is a process global property.


On Solaris/Illumos `putenv()` and `getenv()` are lock-less and really fast.

Basically, if you `putenv()`, you commit to never freeing that memory.



Ay, yes. Though it could be made lock-less.


If I were King I would ban environment variables from the OS entirely. Global mutable state is the root of all evil! Globals are evil evil evil and the modern reliance on bullshit environment variables is a plague upon reliability.

Kill it all with fire.


Well, environment variables are not "global" globals. They are just my globals, or my post-it notes for some variables. They are not even per-user; they are per user session.

10 processes can have completely different values for the same environment variables, because each is in its own environment, and apparently that's useful.

There are footguns, and there are unintentional consequences of implementation and design details. This is why we patch, improve, and rewrite our software over time: to iron out these kinks.

Fire also has a tendency to cause collateral damage. So use both fire and environment variables responsibly, and the world will be a better place.


They are dynamically scoped variables. Very powerful, but only a slight step above globals.


Should we also get rid of filesystems? Databases? All form of RPC?


I definitely think a lot of filesystem access is a code smell and probably not the right thing. That one causes me a lot of pain. But that’s largely because I work in games and you really need to use the Unity/Unreal/whatever asset management system instead of direct file system access.

I’ve got a small build system and the first thing it does is nuke PATH to empty. It’s glorious. No more grabbing random shit from a big blob with untracked dependencies that varies wildly by system!

I could easily live my entire life without environment variables. They’re just a fundamentally bad idea. Every program that foolishly uses environment variables can be replaced by a better program that takes a config file or arglist.


Honestly sometimes I think the answer is yes. Imagine how happy we could be, and how many fewer problems we would have. Add printers to that list and you're describing a paradise.


The value of `setenv(3)` has always been pretty murky to me -- the only time I've ever really needed it is when performing a fork-exec, and even then it's been the wrong tool for the job (the dedicated exec*e variants are the right way).

Would there be any significant downsides (besides breakage) to mapping `environ(7)` as read-only? That seems like the kind of thing that a Linux distribution (or more realistically OpenBSD) could do as a way to kill off a persistent family of bugs.


The best part is that you can't use setenv after fork and before execve as it is not async signal safe. As you mention, the envp-taking variant of execve is the only sane option.


If they did it, it would break userspace for a probably surprising number of processes.


Why not find those? Especially in a dedicated arena whose stated goals are things just like that like openbsd.


Setting the timezone with setenv("TZ", ...) is quite a common pattern.

There are alternatives, but rewriting every program is not always an option.

This kind of rewriting sounds exactly like something OpenBSD would do...


This is perhaps just my ignorance, but when do you find yourself needing to set the timezone like that? Not in 10+ years of C programming have I ever had to do that.


I assume doing something on behalf of a remote user and still convinced locales can do a thing other than hurt you.


Most of the time, you just set it at program start and never change it. This is quite common for programs that expect UTC.

Searching on GitHub, other examples are:

  1) `jq` parsing datetimes
  2) systemd's timedatectl printing status
  3) Unit tests
  4) Some RDP client updating its own timezone to the server's timezone


I might do that if I am writing a quick-and-dirty program that works with times in multiple time zones and I can’t be bothered to find a library with a better API.


I think the problem is in setenv(3), not getenv(3). Reading shared global state is okay as long as it is not mutable. If someone relies on modifying environment variables, one should use execve(3), not setenv.


Exactly. Setting environment variables while a program is running is a terrible idea. Thread safe or not.

A lot of code, for good reasons, assume envvars are constants set before the program started and caches computations based on them, read config files and so on.

The fact that they are essentially global variables should be enough to deter usage of them.


> The fact that they are essentially global variables should be enough to deter usage of them

Env var behaviour is much closer to dynamic variables, rather than global variables (which I argue at http://www.chriswarbo.net/blog/2021-04-08-env_vars.html )

Either way, I agree that mutating them is usually a bad idea; though I find them very good for constant (or dynamically-bound) config.



For those who, like me, are a bit rusty on the man memorization:

execve(3): https://linux.die.net/man/3/execve


This is something that has bit Rust as well: https://github.com/rust-lang/rust/issues/90308


Many posix (and c standard library for that matter) functions were not designed with multithreaded programs in mind and don't work well in multithreaded programs.

I really think it would be worth creating a new standard API that is built with threading in mind, where functions like mktime, getaddrinfo, localtime, etc. take arguments instead of reading from the environment, that avoid global state as much as possible, and are thread safe if there is global state.



Doesn't musl have the same issue? https://github.com/JuliaLang/julia/issues/34726#issuecomment...

I also wonder about OSX's libc. Newer versions seem to have some sort of locking https://github.com/apple-open-source-mirror/Libc/blob/master... but they free pointers so it's not safe from a potential use-after-free client side.

but older versions (from 10.9) don't even have any locking: https://github.com/apple-oss-distributions/Libc/blob/Libc-99...


Solaris/Illumos also has a very nice C library with lots more thread-safety than others, including `getenv()` being thread-safe.


Almost everybody calls getenv(): malloc for tunings, checks, tracing and such; half of the string library for localization specifics; all of the locale, time, and timezone functions; many math functions need fegetenv().

Also, most set errno, ha!


errno is thread safe (and thread local).


GAI just needs to go. It needs to be moved into its own daemon with a simple RPC.

libc is kinda schizophrenic in this regard. It mostly has obviously low-level functions like string manipulation and memory management, and then, unexpectedly, a DNS client implementation and support for arbitrary runtime plugins (for PAM).


This is essentially what systemd-resolved with the nss-resolve NSS module is right? It’s possible to use /etc/nsswitch.conf to entirely disable the built-in DNS resolution in glibc if you want.


IME, familiarity with /etc/nsswitch.conf and the rest of the nss stuff is very highly correlated with people who have seen some shit.


Yep, it's close. But you still depend on the unholy mess that is nsswitch.conf

It's still a bit worrying, with manual JSON parsing: https://github.com/systemd/systemd/blob/79f487038444646f5bce... But at least it's just ~600 lines of fairly straightforward code.

You also can get most of it with nscd.

(Sigh, I wish BUS1 guys pushed their project to completion)


This, along with the general inability to handle per-interface DNS resolution, is the primary reason systemd-resolved exists.


Why would you ever call setenv after spawning threads though?

Or are there other sneaky calls which will do that behind you back?


...on Windows, single-threaded programs don't really exist: any DLL can, and most of them do, spawn worker threads as an implementation detail. Some do it the moment their initializer runs, so if you link your program against anything other than kernel32 and its friends (the basic Windows system libraries don't spawn worker threads on being loaded), then by the time a thread starts executing your executable's entry point there is no guarantee that it is the only thread in your process. In fact, finding a non-toy, real-world Windows application that has only one thread is almost impossible (for example, IIRC all .NET runtimes have a worker-thread pool from the get-go, which rules out any .NET executable).

Which is why on Windows there are almost no system APIs (well, almost: there were some weird technical decisions around single-threaded apartments for COM...) that can only be safely used in single-threaded applications.

Maybe in several more decades Linux community will also accept the fact that multi-threaded applications are an entirely normal and inevitable thing, not an aberration of nature that we all best pretend don't exist until we're absolutely forced to deal with their reality.


Well, that's the gotcha isn't it?

It's easy to imagine some complex interactive software where the need to call setenv appears only after you have worker threads doing something else. Without a warning, you won't know it's a bad thing to do, and the manpage only says that it and unsetenv are not thread-safe, as if that were remotely enough information.

What nobody tells you is that the environment reaches so far that you need it to compress data or open an IPv6 connection. It's not obvious at all that you can't do those things while editing a variable.


There’s always a lot of weird emergent behavior in bootstrapping an app, and on an app of any serious size, I can’t entirely control if someone decides to spool up a thread pool on startup so that everything is hot before listen() happens.

I may think I have control, I may believe that a handful of us are entitled to have that say, but all it takes is someone adding a cross dependency that forces an existing piece of code to jump from 20th position in the load order to 6th and all hell can break loose. Or just as often, set a ticking time bomb that nobody notices until there’s a scaling or peak traffic event or someone adds one more small mistake to the code and foomp! up it goes.


That’s literally explained in the article. It’s worth reading more than the headline.

Ed: actually, that’s even spelled out in the headline.


It’s neither in the headline nor in the article. The question was about setenv, not getenv.

It is best to avoid calling setenv in a threaded program. Some programs do it to make space for rewriting argv with large strings (freeing space from *environ which tends to be right after the tail of argv). Some programs or libraries use *environ directly to stage variables for exec before forking. Some want to pass variable changes to forks. There are alternatives possible, but in the context of something like go calling libc setenv, it’s to make interop easier- sadly it may make other interop harder, such as this case.


? setenv not getenv. You'd rarely use setenv, and even then you'd do it at startup.


Right. That's been my experience so far, hence my question.


It's not. OP was asking about setenv, not getenv...


On Linux/musl, the man page also specifies that LOCALDOMAIN and RES_OPTIONS are inspected.

The man page for OpenBSD indicates the same thing: https://man.openbsd.org/resolv.conf#ENVIRONMENT

Apparently it's not a gnu-specific behaviour.


Which is why I have gotten rid of getaddrinfo() calls in my server code, and instead resolve DNS directly, reading the DNS server settings from the system.

Other issues I faced :

- Not epoll() friendly. Always forks a process while resolving a domain name.

- Valgrind complains of uninitialized memory touches when the function is called and I can't get rid of it.


> Which is why I have gotten rid of getaddrinfo() calls in my server code, and rather resolve DNS directly reading the DNS server setting from the system.

This works as long as you don't need support for mDNS or LDAP host resolution, which depends on libnss/nsswitch on glibc-based systems. Which is fine, but should be a well-documented limit of this approach.

(this is also what the Go runtime does by default, but they automatically fall back to the glibc resolver in any more complex case: https://pkg.go.dev/net#hdr-Name_Resolution)


how confident are you that the DNS settings you are honoring are the ones the user intended?

For example, in a split-VPN situation?


If it's your application on your server, you can be pretty confident. Only later, it may surprise someone why doesn't this application react to system-wide config. Or it may never happen.


Those redirections happen further in the stack. You are ok reading settings from /etc/resolve.conf, which frequently points to a localhost daemon that redirects your calls to whatever DNS setting you have in your connection.

But parsing /etc/resolve.conf and using it is all that you need in your code.


Your complete lack of mention of nsswitch.conf makes me believe that you're not implementing anything correctly in this instance.


* /etc/resolv.conf


And your code correctly parses all the options that can be in that file?


You don't need all the options. Search google. Parsing resolve.conf is an old technique and the file was written assuming individual apps will be parsing it. You will find instructions on how to do it in say 4 lines. Explicitly for this file. Not any random conf file from the system.


Then your application won't be portable. Which is fine if you have no plans of distributing it. But otherwise I can guarantee it will break on some machine.


You shouldn't call setenv in the first place. Instead, use a new environment when you call execle/execvpe. This avoids problems with multiple threads.


Some APIs require the use of setenv. For example, you need to setenv("TZ", ...) if you want to use multiple timezones.


You mean linked libraries you don't have the source of? Yes, in that case set the environment vars as early as possible (if possible before starting any thread).


Keep in mind: Don't use multiple threads unless you really, really need to, and have thought long and hard about concurrency issues.

In a way, I think the fact that many library functions are not thread-safe should be viewed as an encouragement to not use threads, or use them only for the bare minimum necessary.

I say this from a few decades of experience fighting with race conditions and the like, and whereupon several times I rewrote an existing multithreaded process into a single-threaded one and greatly improved performance and reduced memory usage. The architecture astronauts may have moved on to stuff like microservices now, but in the 90s/2000s threads were overused just as much.


That's basically impossible in many modern programming environments - even if you never spawn a thread, something else in your executable probably has. By the time your iOS or macOS app has finished launching, it has multiple threads. The Windows loader uses threads to load DLLs.


There are many ways to do multiple threads wrong. Seems that the "right way" is to wake up a sleeping but already-created thread, and take elements out of a work queue in a threadsafe way. Your main thread can even be processing elements during the 10000 clock cycles it takes to wake up a thread.


For one, if you are using getaddrinfo you are often forced to use threads as gai is a slow blocking call.


There is no reason that `getenv()` should ever lock -- it should always be lock-free. `putenv()`/`setenv()`/`unsetenv()` can lock, of course, since there's no point allowing more than one writer at a time.

Don't believe me? Look at this work of art (the entire file!): https://src.illumos.org/source/xref/illumos-gate/usr/src/lib...


What are you trying to say using that example? The getenv() function shown calls initenv(), which can (clearly) take a lock?


One time. Also, I think that could be removed with some care.


But it unconditionally calls a memory barrier, which is most of the cost of an un-contended futex-style lock already.

And I'd be interested in your ideas on removing the lock - as I can't see any paths that don't change semantics (e.g. unconditionally doing the init work at process start time when you know there's not multiple threads, for example)


> a memory barrier, which is most of the cost of an un-contended futex-style lock already

Yes, but it's not a lock.

> unconditionally doing the init work at process start time when you know there's not multiple threads

That's the most obvious fix, yes. I was thinking (but I've not checked yet) that when `my_environ` is not set up yet then `getenv()` can use `_environ` directly.


It’s easy to sterilize your code in this regard (although as this article points out you need to know to do it).

POSIX implements a three-argument `main` function (just look at `exec()`) where the third argument is `char* envp[]`. You can call `setenv` to manipulate it.

But easiest is to just null out the POSIX extern `char* environ` (save a copy if you want to consult it yourself later). Just `man 7 environ`


It's not clear from the man pages, but `setenv()` mutates `environ`.

In particular, setting a previously-undefined variable causes `environ` to be reallocated. Whereas `setenv()` of an extant variable changes just that value in the current `environ` pointer array.


>But easiest is to just null out the POSIX

And then wonder why some shared library dependency few layers deep blew up.


I wonder if this only applies to the dns resolver, or also other NSS modules, like the systemd resolver. And don't forget about nscd: if nscd is running, then all of the nsswitch stuff will be done out of process.

That might mean a viable workaround is enabling nscd, oddly enough.

And frankly, maybe libpthread should just overlay thread-safe getenv/setenv like I believe it does for a couple of other libc symbols.


We've once spent 3 months debugging crash caused by this.


After reading some comments, to avoid the problems

- immediately copy the string that the pointer returned by getenv() points to

- don't use getenv after threads have been started

A library could be written which makes an immutable copy of the whole environment before starting main(). This library then hands out pointers to the environment copy. Or to be even more secure make another copy of the environment variable. This trades some efficiency for security.

In effect, ignore the mutable accessors like setenv from libc.

Or did I miss something? I am not an expert in these things.

And of course it won't solve the problem of two other libraries fighting with each other...


Everything would be cool if it weren't for situations in which you need to set an environment variable in the same process in order to get a function to do something. E.g. the TZ variable, in order to coax a behavior out of the time functions.

If environment variables are for child processes only, there is no need to use setenv because you can pass a specified environment array through exec.


Why threads are a bad idea (for most purposes) https://blog.acolyer.org/2014/12/09/why-threads-are-a-bad-id...


getaddrinfo is terrible. Did you know it also opens and then connects to a socket? Any process that uses getaddrinfo needs a blanket exception in my firewall in order to work properly, because otherwise it will fail to connect to some randomly-generated port that it just made up.


How do you propose that DNS resolution proceed without opening a socket?


It opens a socket to the same process that called getaddrinfo. That is, it's just communicating with itself, using a brand-new randomly-generated port for each call. This should be completely unnecessary.


So, I have a program that does getaddrinfo(3) and nothing more, and this program sets up a socket, listen(2)s on it, creates another socket, and connects with it to the first one?

This looks insane and not what strace(1) tells me

Could you give me more details ?


I didn't know glibc didn't do the same thing. `getaddrinfo()` on Windows seems to do this because randomly a program will try to connect to `::1:59962` or something, and if I don't allow it in my firewall, it will start whining that some getaddrinfo thread failed to start. This has happened across all sorts of different programs. It's infuriating.

I thought it was just a general libc thing. Isn't there a spec on this somewhere?


For one thing, it could delegate to a local service. Granted, the communication to this service is probably still over a socket interface, but at least as a purely-local connection you would hopefully have some better worst-case performance characteristics.

This is basically what dnsmasq does when you use it as a local DNS cache.


Completely agree. Just use systemd-resolved which is the recommended way of doing DNS anyway.


And how do you communicate with systemd-resolved?



dbus is just a protocol that needs something else to actually transfer the data.

Normally, that's a socket in /run/dbus/system_bus_socket


And guess what. A UNIX socket does not use DNS or the damned getaddrinfo() function that's the ire of the article.


What should I use instead that doesn't open a socket?


They probably could have given a more apt name


Either Linux's fault or Glibc's fault.

POSIX mandates getaddrinfo to be thread safe.


Yes, but `setenv` is documented as not thread-safe.

Once you do unsafe things, the nasal demons can spread to safe code elsewhere.

Glibc's documentation is more explicit about the propagation of the nasal demons:

       ┌─────────────────────┬───────────────┬─────────────────────┐
       │Interface            │ Attribute     │ Value               │
       ├─────────────────────┼───────────────┼─────────────────────┤
       │setenv(), unsetenv() │ Thread safety │ MT-Unsafe const:env │
       └─────────────────────┴───────────────┴─────────────────────┘
       ┌────────────────┬───────────────┬────────────────────┐
       │Interface       │ Attribute     │ Value              │
       ├────────────────┼───────────────┼────────────────────┤
       │getaddrinfo()   │ Thread safety │ MT-Safe env locale │
       ├────────────────┼───────────────┼────────────────────┤
       │freeaddrinfo(), │ Thread safety │ MT-Safe            │
       │gai_strerror()  │               │                    │
       └────────────────┴───────────────┴────────────────────┘


This! And the attributes page explains these even better:

       const  Functions marked with const as an MT-Safety issue non-
              atomically modify internal objects that are better
              regarded as constant, because a substantial portion of the
              GNU C Library accesses them without synchronization.
              Unlike race, which causes both readers and writers of
              internal objects to be regarded as MT-Unsafe, this mark is
              applied to writers only.  Writers remain MT-Unsafe to
              call, but the then-mandatory constness of objects they
              modify enables readers to be regarded as MT-Safe (as long
              as no other reasons for them to be unsafe remain), since
              the lack of synchronization is not a problem when the
              objects are effectively constant.

              The identifier that follows the const mark will appear by
              itself as a safety note in readers.  Programs that wish to
              work around this safety issue, so as to call writers, may
              use a non-recursive read-write lock associated with the
              identifier, and guard all calls to functions marked with
              const followed by the identifier with a write lock, and
              all calls to functions marked with the identifier by
              itself with a read lock.
and

       env    Functions marked with env as an MT-Safety issue access the
              environment with getenv(3) or similar, without any guards
              to ensure safety in the presence of concurrent
              modifications.

              We do not mark these functions as MT-Unsafe, however,
              because functions that modify the environment are all
              marked with const:env and regarded as unsafe.  Being
              unsafe, the latter are not to be called when multiple
              threads are running or asynchronous signals are enabled,
              and so the environment can be considered effectively
              constant in these contexts, which makes the former safe.


It's great that Glibc documents its POSIX violating behavior very well, but it doesn't change the fact that it violates POSIX.


POSIX does document it, it just requires carefully picking through pages and carefully thinking about the wording, unlike the simplicity of GLIBC documentation. For example, the best information is on the page for `exec`.


This table is unreadable on a narrow screen.


The general rule of unsafe API usage is: "once you've done it, nothing is expected to be safe, even things that were stated to be".

Errors can spread through your program in funny ways, which also break the program in funny ways (think stack overflow or double free).

Unless the manual/documentation explicitly states what errors it can cause, there is no way to know without actually triggering it.


glibc, I’m too lazy to look up whether it removes the getenv lookup with POSIX build flags.


getenv() could be made thread-safe by leaking the memory returned.



Yes, that’s why nobody likes musl’s DNS.



Stuff like this is why I'm supportive of newer languages like Go and Zig that sidestep libc entirely (when not using cgo as in TFA of course). libc is a great achievement and has served us well but, boy, it sure is a product of its time.

`errno` is another relic that needs to die yesterday.


Depending on the operating system, you can't skip libc even in Go. I think it's required on openbsd and illumos/solaris for example.

https://utcc.utoronto.ca/~cks/space/blog/programming/Go116Op...


golang used to break all the time on macOS, because it was using the syscall ABI, which isn't stable, instead of libSystem, which is.


It has been fixed recently-ish.


it was fixed by using-the-system-libs


A long time ago, but AFAIK the fix was “use the system's libc”.


Could you link to the fix or their docs on it? I.e. what do they do today?


The comments on this GitHub issue include links to the changes in the golang code review system: https://github.com/golang/go/issues/17490


I think it's the same in Windows, right? Can't use the syscalls underneath the hood, everything through the standard libraries. Maybe I'm wrong (I know very little about Windows other than how to use it to play games, and WSL)


The standard libraries on Windows don't involve libc. The Windows APIs look rather different, and in general are much more friendly to multi-threading. POSIX on the other hand tends to assume that the program is in control of everything happening inside of it, which is an incorrect assumption due to libraries.

In this particular case, the Windows APIs have neither getaddrinfo() nor getenv(); and the closest equivalent GetEnvironmentVariableW is perfectly thread-safe. Microsoft additionally has a C runtime (msvcrt) providing functions like getenv(), but this is much less fundamental than it is on other system. Every program is supposed to ship its own copy of the C runtime, it's not officially part of Windows! And it's perfectly possible for multiple different copies of the C runtime to be loaded into the same Windows process. And since *environ is a variable defined by the C runtime, there's a different copy for each C runtime...


Almost correct, except that since Windows 10 there is now a C runtime shipped as standard, ironically it is actually written in C++ taking advantage of its safety features over plain C, and exposing the C API via extern "C".

https://learn.microsoft.com/en-us/cpp/windows/universal-crt-...


On windows it’s somewhat possible to avoid most of it by linking to ntdll, which only provides symbols for raw syscall wrappers. But a lot of it is unstable and may change from a windows release to the next.

Doing raw syscalls without ntdll is also possible, but windows syscall numbers change on essentially every release, so you’d end up with something that only works on your windows version.


We've been building everything with CGO_ENABLED=0 for years now, with no nasty side effects. It gets to be a pain using the default, when something as innocuous as a point version of a Docker image breaks compatibility because of a glibc version change[1].

[1] golang official image 1.20.4 to 1.20.5 went from Debian 11 to 12 base. Always use the -(debian version) tags.


Split DNS is broken on macOS when doing that, and for users with VPN that does split DNS it is not just an annoyance it leads to software not actually functioning.

Re-implementing system capabilities is fine and all as long as you support common use cases properly, which Golang does not.


And on the flip side, there have been a number of instances where, in cases where the behavior differs, the Golang documentation describes a function only as it behaves with the Golang-native implementation, rather than the system implementation which ends up being the default - without calling any of this out


Yeah, there's so much misery in the C ecosystem that it's better to eschew it altogether. Even merely packaging anything that depends on C ends up being a hugely painful undertaking since every C library has its own bespoke build system and its own implicit set of dependencies (and implicit versions of those dependencies, and expectations about where on the system those dependencies live).

I mostly like C as a language, but between the security concerns and the tooling concerns (and the community's zealous devotion to ignoring these very real problems) I'm really excited for its increasing marginalization. Unfortunately, it's not being marginalized in favor of "a better C", but rather every ecosystem is rewriting the same stuff from scratch which seems like a bit of a bummer (but still better than depending on C).


> since every C library has its own bespoke build system and its own implicit set of dependencies (and

You should see other libraries. At least glibc does not require meson, cmake and ninja.


yes, it only requires autotools. a build system so friendly that it spawned cmake and meson to replace it.


> `errno` is another relic that needs to die yesterday.

ok, i will bite: what is the problem with it ?


Shared global (well, thread local) mutable state.

As for why shared global mutable state is (generally) bad, see: https://softwareengineering.stackexchange.com/questions/1481...

`man 3 errno` on my Linux system even has a note calling out a common failure pattern. Can you spot the problem?

           if (somecall() == -1) {
               printf("somecall() failed\n");
               if (errno == ...) { ... }
           }


regarding your snippet:

           if (somecall() == -1) {
               printf("somecall() failed\n");
               if (errno == ...) { ... }
           }
sure, the issue is that `somecall(...)` might have altered `errno` through 'acts-of-omission-or-commission' :o)

fwiw, posix has updated its definition to pretty much say that 'value of errno in one thread is not affected by assignments to it by another'. this has been the case since at least a decade-and-a-half (iirc), which in internet years would positively be in the pleistocenic era :o)

so, i am not sure i really appreciate 'the shared-global-mutable-state' argument above. thanks !


The problem in that snippet is that `printf` could have altered the `errno` set by `somecall`, and that's only thanks to it being shared-global-mutable-state. You not realizing that was possible makes for a great example of why shared-mutable-global-state is hard to reason about.


crap, i definitely meant to write `printf(...)` there !

my typical usage for such scenarios i.e. when i know that callee might alter errno etc. is to

        int save_errno = errno;

        do_foo(...);

        if (save_errno == ...) {
          ...
        }
wrapping libc into something that (maybe) does better seems like such a sisyphean task to me.


This thread isn't talking about how to fix the errno problem generally. It's talking about the existence of a problem in the first place. Fixing it would be a whole different can of worms, and indeed, sisyphean sounds about right.

Notice how this entire thread was started by someone asking why errno was problematic. This is just about understanding.


> Notice how this entire thread was started by someone asking why errno was problematic. This is just about understanding.

yes, you are absolutely right. this is just about understanding and how easy it is to miss what is hidden just one-level away.


I specifically mentioned thread local. The problems of shared global mutable state aren't limited to multi threaded environments.


There are 2 errno mistakes in that snippet since there is no way to know who set errno.


What's the alternative?

Returning a "Result" struct doubles the size. That's one less register to use.

Exception handling is even more invasive.

They are great for higher-level languages, but less perfect at a lower level where performance is critical.

EDIT: The Linux kernel uses negative return values for errors. It's good and efficient when it works, but it's not always an option when you need the full register width.


You are saving one register at the cost of having a thread local variable that is visible to signal handlers, so none of its uses can be optimized away. Which results in things like gcc having to decorate every math instruction with code to set errno on the off chance that someone somewhere might read it (no one ever does).


> Returning a"Result" struct doubles the size. This is one less register to use.

One less register right before and after returns doesn't sound like a big problem, especially with 16+ registers.


Most of the POSIX functions that use errno for error signaling simply return 0 for success and -1 for error. They could have returned errno directly.


Last I checked, the Linux kernel ABI does return -errno for errors. Then libc mangles that all up.


Some newer POSIX APIs, such as pthreads, do return the error this way. But many legacy APIs, such as dup or read, use the positive integer space, thus the negation pattern you often see in syscalls. Notably, POSIX guarantees <errno.h> values to be positive integers.


`dup` and `read` return -1 on error and set the `errno` variable. If they were redesigned, they should just return `-errno` on failure.


What about the rest?


Put the original return value in a pointer argument, and return errno (or -errno).


There are a lot of alternatives, and it's not clear why the ones you've suggested are inappropriate. You've listed some perceived costs, but I don't see why those costs are greater than the ones paid by the status quo.

Linux even shows you a path, yet you reject it for reasons that don't seem compelling to me.


The performance cost of having otherwise pure functions clobber global mutable memory defeating many optimization passes, is way higher than clobbering another register for the result.


It's global state for a local condition.

Linus Torvalds on errno: https://yarchive.net/comp/linux/errno.html


> ... Linus Torvalds on errno: https://yarchive.net/comp/linux/errno.html

yes, he argues against its usage in the KERNEL.


But his argument stands even outside the kernel. Errno is awful, similar to locale APIs.


The most obvious way it is wrong is that it is archaic. There is simply no reason to ever pass return values in hidden state. Just use return values damnit.


Most of the still working privilege escalation exploits from 2006 are only still there because it is intended behaviour of glibc.

Environment variables like LD_PRELOAD should never ever be available in production.

I totally understand why the musl developers kinda freaked out and started their own standard library.


Err no thank you. LD_PRELOAD and similar mechanisms are great for injecting code into apps legitimately, i.e. to patch long-unsupported systems or to tame current ones.

For example I have vision issue and without reshade filter I would be unable to play a great deal of games.

Now that is also an attack vector, that's for sure, but you cannot go ax features willy nilly just because you don't see value in them.



LD_PRELOAD won't be needed if the OS were built around containers / jails, instead of the weakly isolated processes and process groups.

The Unix kernels (Linux, BSD, and Solaris alike) already had much of what's needed, say, 30 years ago, but nobody saw it as such a burning necessity (likely except Solaris, which eventually developed Zones).


On a "normal" desktop system, you don't need containers or jails. Your programs must communicate with each other (copy paste, print screen, etc.).

But today every god damn UI program needs an internet connection to phone home and execute remote code. This is the actual problem which must be fixed.


Are you confusing LD_PRELOAD with LD_LIBRARY_PATH? I'm not sure how jails and containers help with the former.


At least it could be additionally guarded by a system setting or something like that.


Yes, for example by setting an environment variable.


> Err no thank you.

> Err no

Not sure if you are trolling


this is not a unix beartrap, this is a bug in Go if that's where it was found. If your code is multithreaded, it's up to you to make it threadsafe. You can't declare you're creating the most threadsafe memorysafe newbie safe system, and then go home. You have to write the code.

Your program gets its own copy of the environment when it is launched. Nobody is changing it on you; any contention for that resource is you contending with yourself.

You don't expect an operating system to change things out from under you. Unix doesn't. If there is contention for this resource, it's all you (or whoever wrote the library you are using)

The environment is name-value pairs, as strings. That's it. That's what makes it accessible and useful. You can swallow it up, the whole thing, into whatever data structure you prefer in your language in a few lines of code, and a millisecond (if that much) of runtime. Just learn how things work and you won't feel helpless.


Most people would expect getaddrinfo to be re-entrant, and in the Linux man 3 page it is: https://www.man7.org/linux/man-pages/man3/getaddrinfo.3.html

There is no clear/obvious mention of the fact that setenv could interfere with it. It is a glibc footgun/beartrap that this re-entrancy doesn't actually guarantee freedom from data races even when you call it with non-shared memory in the arguments.


there could very well be a bug. if there's a bug, fix it.

But from what I read, it's documented as not threadsafe and this problem is happening in a threaded environment. That still could be called a bug. You find bugs, you fix them;

That's better than hyperventilating about how there is some massive problem with an operating system that you are using because all the people who came before you who knew more than you decided it was the best thing to use. The other OSes sucked more. This OS is simple enough that you can actually learn how it works, it's all laid out in front of you.

Should it do some additional things? sure. Help write the code.


A beartrap/footgun means "a bit too easy to harm yourself for comfort", not "there is a bug". I am relatively old and have been programming to this interface for a few decades, so I'm not hyperventilating about it.

My point (which made reference to the actual OS documentation) was that it takes being bit by such issues before you learn what this "Attributes" section means:

> │Interface │ Attribute │ Value │

> │getaddrinfo() │ Thread safety │ MT-Safe env locale │

For the 1st decade of my career, I just searched for "re-entrant", "reentrant", or "MT" and went on my merry way, without realizing the importance of the other possible values of attributes table: in this case "env".

It's worth highlighting!


when you read the doc and realized you missed something, don't say "footgun", say "aha! I misunderstood" Then if you want, set about to make things better.

Instead of jumping to "It shouldn't work this way" consider "Hmmm, why does it work this way? Is it possible that Thompson, Kernighan and Ritchie knew more about how to make a coherent straightforward system that would last 50 years in 1970 than I know how to now?"


Or perhaps you might consider that we've learned something in the last 50 years.

It sounds to me like you're hyperventilating about other people pointing out footguns. Maybe, just maybe, people in the past were capable of making mistakes. Let's not put them on a pedestal.


In fairness, they also gave us the joys of `strcpy(dest_ptr, src_ptr)` and `scanf("%s", str_ptr)`, which with the benefit of hindsight and many buffer overflows later were a terrible idea.


In the attributes table, thread safety is clearly marked as "MT-Safe env locale", with a link just below explaining the meaning of the attributes.

As with many things, the behaviour is surprising and can be overlooked. But it is not undocumented.


Yes, I reference that table in a deeper comment. I didn't mean that it wasn't accurately documented, but rather that it's not immediately apparent. I know lots of people who were once quite experienced but still not enough to know what the attributes section is and its importance and its full meaning (myself included).

Indeed, surprising and easy to overlook was all I was trying to convey.


       ┌───────────────────────────┬───────────────┬────────────────────┐
       │Interface                  │ Attribute     │ Value              │
       ├───────────────────────────┼───────────────┼────────────────────┤
       │getaddrinfo()              │ Thread safety │ MT-Safe env locale │
       ├───────────────────────────┼───────────────┼────────────────────┤
       │freeaddrinfo(),            │ Thread safety │ MT-Safe            │
       │gai_strerror()             │               │                    │
       └───────────────────────────┴───────────────┴────────────────────┘


       MT-Safe
              MT-Safe or Thread-Safe functions are safe to call in the
              presence of other threads.  MT, in MT-Safe, stands for
              Multi Thread.

              Being MT-Safe does not imply a function is atomic, nor
              that it uses any of the memory synchronization mechanisms
              POSIX exposes to users.  It is even possible that calling
              MT-Safe functions in sequence does not yield an MT-Safe
              combination.  For example, having a thread call two MT-
              Safe functions one right after the other does not
              guarantee behavior equivalent to atomic execution of a
              combination of both functions, since concurrent calls in
              other threads may interfere in a destructive way.

              Whole-program optimizations that could inline functions
              across library interfaces may expose unsafe reordering,
              and so performing inlining across the GNU C Library
              interface is not recommended.  The documented MT-Safety
              status is not guaranteed under whole-program optimization.
              However, functions defined in user-visible headers are
              designed to be safe for inlining.


   Other safety remarks
       Additional keywords may be attached to functions, indicating
       features that do not make a function unsafe to call, but that may
       need to be taken into account in certain classes of programs:

       locale Functions annotated with locale as an MT-Safety issue read
              from the locale object without any form of
              synchronization.  Functions annotated with locale called
              concurrently with locale changes may behave in ways that
              do not correspond to any of the locales active during
              their execution, but an unpredictable mix thereof.

              We do not mark these functions as MT-Unsafe, however,
              because functions that modify the locale object are marked
              with const:locale and regarded as unsafe.  Being unsafe,
              the latter are not to be called when multiple threads are
              running or asynchronous signals are enabled, and so the
              locale can be considered effectively constant in these
              contexts, which makes the former safe.

       env    Functions marked with env as an MT-Safety issue access the
              environment with getenv(3) or similar, without any guards
              to ensure safety in the presence of concurrent
              modifications.

              We do not mark these functions as MT-Unsafe, however,
              because functions that modify the environment are all
              marked with const:env and regarded as unsafe.  Being
              unsafe, the latter are not to be called when multiple
              threads are running or asynchronous signals are enabled,
              and so the environment can be considered effectively
              constant in these contexts, which makes the former safe.
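To make the env caveat concrete: glibc's getaddrinfo() may read variables such as RES_OPTIONS or LOCALDOMAIN via getenv() (see resolv.conf(5)), so the safe pattern is to finish every setenv() before the first thread is spawned and treat the environment as read-only afterwards. A minimal sketch of that pattern:

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static void *resolver(void *arg) {
    struct addrinfo *res = NULL;
    /* getaddrinfo() is "MT-Safe env": it may call getenv() internally,
     * so no other thread may call setenv() while this runs. */
    if (getaddrinfo((const char *)arg, "80", NULL, &res) == 0)
        freeaddrinfo(res);
    return NULL;
}

int main(void) {
    /* Safe: modify the environment while still single-threaded... */
    setenv("LOCALDOMAIN", "example.com", 1);

    /* ...then treat it as effectively constant once threads exist. */
    pthread_t t;
    pthread_create(&t, NULL, resolver, "localhost");
    pthread_join(t, NULL);
    puts("done");
    return 0;
}
```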


Yes, I talk about that table in a deeper comment. Note to other readers, the table itself is in getaddrinfo(3) but the explanation of the elements is in attributes(7).


This is the viewpoint of someone who works alone or on small teams, where all the code in the process is their own. In a complex program there is plenty that can do surprising things that you didn’t expect.


No, this is a person who understands what is there. People who imagine something else is there are disappointed when it's not.

People who try to invent the new new, and don't complete the task, blame unix, when unix just does what unix said it would do. It was the people who made bigger claims but did not deliver who should step up and say "I didn't turn the unix environment into what I thought I did."

But people trying to invent the new new are trying to invent it on unix, because unix delivers what it promises. It doesn't deliver what you promise. Have some humility, accept your failure, and don't try to pin it on unix. Go implement on Windows.

Unix does allow you to deliver what you promise; that's why you'll still be complaining about unix 10 years from now. But it's up to you to deliver what you promise.

People who say "footgun" are trying to shed responsibility: "it wasn't my fault, waaaaaah". I'm not saying we should not make computers easier to program; I'm saying that when we fail to, it's a poor worker who blames his tools.

Why is it so important to you to blame this on unix, when you could blame it on a mistaken implementation of a library that could be fixed?


The words "blame" and "guilt" do not even appear in the article, and "fault" only as part of "segfaulting". Rachel merely points out an easy-to-make mistake before others make them, to save them time. She isn't talking about who is to blame. No need to get defensive.


I'm confused, aren't we talking about the getaddrinfo() syscall?

It definitely seems like that syscall is a (multithreaded) footgun that Go just happened to hit.


It's not a syscall, it's a C standard library function that may read files, talk to dbus, read env vars, etc. The problem is people quite understandably expect its re-entrancy to mean more than it really means.


Oh, that's my error. Shows you how much code I write that calls into libc, i.e. almost none.


If you write anything other than Go or Zig on Linux, you're probably calling libc without realising.


Obviously. But I meant directly writing the code that calls into libc myself.


A "syscall" is not going to call back into your code to read the environment. If a "syscall" is using your environment, it's part of your code, and better described as a library call.


Few people make raw syscalls.


I always thought getenv() was a code smell, and this confirms it. I guess a better option would be to add a parameter to getaddrinfo(), and deal with it further up the stack?


getaddrinfo is far too ossified to be changed


You know the source code and manual pages are open source. You can look for these things.


She does in fact know these things, which is why she’s writing about it. For the people who may not have thought to do that yet.



