The major takeaway from this is that Rust will be making environment setters unsafe in the next edition. With luck, this will filter down into crates that trigger these crashes (https://github.com/alexcrichton/openssl-probe/issues/30 filed upstream in the meantime).
But that won't actually fix the underlying problem, namely that getenv and setenv (or unsetenv, probably) cannot safely be called from different threads.
It seems like the only reliable way to fix this is to change these functions so that they exclusively acquire a mutex.
I have a different perspective: the underlying problem is calling setenv(). As far as I'm concerned, the environment is a read-only input parameter set on process creation like argv. It's not a mechanism for exchanging information within a process, as used here with SSL_CERT_FILE.
And remember that the exec* family of calls has a version with an envp argument, which is what should be used if a child process is to be started with a different environment — build a completely new structure, don't touch the existing one. Same for posix_spawn.
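For example, a minimal sketch with execve (the paths and variables here are purely illustrative):

#include <unistd.h>

int main(void) {
    // Build a complete environment for the child; the parent's environ is never touched.
    char *child_env[] = { "PATH=/usr/bin:/bin", "SSL_CERT_FILE=/etc/ssl/cert.pem", NULL };
    char *child_argv[] = { "env", NULL };
    execve("/usr/bin/env", child_argv, child_env);
    return 1; // only reached if execve failed
}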
And, lastly, compatibility with ancient systems strikes again: the environment is also directly accessible through this global:

extern char **environ;
I think there's a narrow window, at least in some programming languages, when environment variables can be set at the start of a process. But since they're global shared state, they need to be write-once (or not at all) and read-many. No libraries should set them. No frameworks should set them. Only application authors should, and it should be dead obvious to the entire team what the last responsible moment is to write an environment variable.
I am fairly certain that somewhere inside the polyhedron defined by those constraints is a large subset that could be statically analyzed and proven sound. But I'm less certain whether Rust could express it cleanly.
Your process can be started in a paused state by a debugger, have new libraries and threads injected into it, and then resumed before a single instruction of your own binary has been executed... and debuggers are far from the only thing that will inject code into your processes. If you're willing to handwave that, pre-main constructors, etc. away, you can write something like this easily enough:
use std::sync::atomic::{AtomicBool, Ordering};

static TOKEN_TAKEN: AtomicBool = AtomicBool::new(false);

// Exists only while the environment may still be written (no threads yet).
struct BeforeEnvFreeze(());
// Proof that the environment is frozen; only now may threads be spawned.
struct AfterEnvFreeze(());

impl BeforeEnvFreeze {
    pub fn new() -> Self {
        // Singleton check: hand out at most one token per process.
        assert!(!TOKEN_TAKEN.swap(true, Ordering::SeqCst));
        Self(())
    }
    pub fn freeze(self) -> AfterEnvFreeze { AfterEnvFreeze(()) }
    pub fn set_env(&self, key: &str, value: &str) {
        // Sound under the handwave above: no other threads exist yet.
        // (The unsafe block is what the 2024 edition requires anyway.)
        unsafe { std::env::set_var(key, value) }
    }
}

impl AfterEnvFreeze {
    pub fn spawn_thread<F: FnOnce() + Send + 'static>(&self, f: F) -> std::thread::JoinHandle<()> {
        std::thread::spawn(f)
    }
}

fn main() {
    let a = BeforeEnvFreeze::new();
    a.set_env("SSL_CERT_FILE", "/etc/ssl/cert.pem");
    a.set_env("RUST_BACKTRACE", "1");
    //a.spawn_thread(...); // not available
    let b = a.freeze(); // consumes `a`
    b.spawn_thread(|| {}).join().unwrap();
    //a.set_env(...); // not available
}
Exercises left to the reader:
• Banning access to the relevant bits of Rust's stdlib, libc, etc. as a means of escaping this "safe" abstraction
• Conning your lead developer into accepting your handwave
• Setting up the appropriate VCS alerts so you have a chance to NAK "helpful" "utility" pull requests that undermine your "protections"
And of course, this all remains a hackaround for POSIX design flaws - your engineering time might be better spent ensuring or enforcing your libc is "fixed" via intentional memory leaks per e.g. https://github.com/bminor/glibc/commit/7a61e7f557a97ab597d6f... , which may ≈fix more than your Rust programs.
I agree that libraries certainly should not. But why would writing ever be the right choice, even for applications? Doesn't it make far more sense to use env to create some better-typed global configuration object, filling any gaps with defaults, and then use that?
I'd go further and say env should always be read-only and libraries should never even read env vars.
> I think there's a narrow window, at least in some programming languages, when environment variables can be set at the start of a process.
I mean, based on this issue I would say the only safe time is "at the start of the program, before any new threads may have been created".
But again, as others have said, there's no good reason I'm aware of to set environment variables in your own process, and when you spawn a new process you can give it its own environment with any changes you want.
When using C++ I wanted programs to have a function that was called before main() to set up things that got sealed afterwards: parsing command-line arguments, reading the environment variables, loading runtime libraries, and maybe looking at the local directory. But I'm not sure it would be a useful and meaningful distinction unless you restructure far too many things.
I remember that on the Fuchsia kernel, programs needed to drop capabilities at some point, but the shift needed might be a hard sell given that things already "work fine".
Everyone thinks they can be the first to do something, and that there is surely nothing that will happen before them. Unfortunately, everyone save for one is mistaken. Sometimes that chosen one is not even consistent.
This is one of the problems with Singletons. Especially if they end up interacting or being composed.
In Java you’d have the static initializers run before the main method starts. And in some languages that spreads to the imports which is usually where you get into these chicken and egg problems.
One of the solutions here is to make the entry point small, and make 100% of bootstrapping explicit.
Which is to say: move everything into the main method.
I’ve seen that work. On the last project it got a little big, and I went in to straighten out some bits and reduce it. But at the end anyone could read for themselves the initialization sequence, without needing any esoteric knowledge.
I know I can fool around with crt0, but I'm not sure how much you can really use that if you plan to use libraries that may depend on global `static` things that get created as they are linked in before `main` starts.
Maybe it's possible, but if I need to review every library (and hope they don't break my assumptions later), I think I've lost at building this separation in a practical way.
You needn't go "hacky" for this; constructors for global/static variables are called before main(). But then, the underlying linker support is usually "trivially exposed" (using the constructor attribute in gcc/clang, say).
This (obviously?) isn't "110%" perfect as the order of the constructor calls for several such objects may not be well-defined, and were they to create threads (who am I to suggest being reasonable ...) you end up with chicken-egg situations again.
JavaScript only just got top-level await. So what I saw happen is that files that do their own background tasks start those either in their constructor or, in the case of static functions, lazily.
There was one place and only one place where we violated that, and it was in code I worked on. It was a low level module used everywhere else for bootstrapping, and so we collectively decided to do something sneaky in order to avoid making the entire code base async.
And while I find that most of the time people can handle making one special case for a rule, it was a complicated system and even “we” screwed it up occasionally for a good long while.
The problem was we needed to make a Consul call at startup and the library didn't have a synchronous way to make that call. So all bootstrapping code had to call a function and await it before loading other things that used that module. By the end we had about a dozen entry points (services, dev and diagnostic tools). And I always got blamed, because nobody seemed to remember we decided this together.
I hate singletons. And I ended up with one of only two in the whole project, and that hatred still wasn’t enough to prevent hitting the classical problems with singletons.
That does happen. Still, there is a reason many avoid it. Probably every significant project has places where they do that. But if it isn't in main it is always a little "magic", and that means it's hard to understand how the program works (or worse, it randomly doesn't work because something is used before it is initialized).
> When using C++ I wanted programs to have a function that was called before main() to set up things that got sealed afterwards: parsing command-line arguments, reading the environment variables, loading runtime libraries, and maybe looking at the local directory. But I'm not sure it would be a useful and meaningful distinction unless you restructure far too many things
If you're only reading environment variables you have no problem, though. It's only if you try to change them that it causes issues.
For setting, "only set environment variables in the Bash script that starts your program" might be a good rule.
The "cross platform" way of setting the environment is to set it "from outside" of the program - meaning, through the executor, whether that's the shell or the container runtime or even the kernel commandline if you insist to rewrite init in rust/go/zig/...
It can be as-easy-as spawning your process via "env -i VAR1=... ... myprogram ..." - and given this also clears the dangers of env-insertion exploits, it's good practice.
(the argument that the horses have long bolted with respect to "just do the right thing ok?!" here holds some water. I'm of the generation though where people on the internet could still tell each other they were wrong, and I assert that here; you're wrong if you believe a non-threadsafe unix interface is a bug. No matter what kind of restrictions around its use that means. You're still wrong if you assume the existence of such restrictions is a bug)
Some of the docker containers I made ended up having a bash shell as the entry point, and I moved most of the environment variable init out of the code and into the script. But in the dev sandbox some of that code runs without the script, so it was still a headache.
>Note that Java, and the JVM, doesn't allow changing environment variables. It was the right choice, even if painful at times.
Not sure why it would be considered painful. IMO, as for using setenv to modify your own variables: setenv is thread-unsafe by definition, so unless you're running a single-threaded application it would never make sense to call it.
Java does support running child processes with a designated env space (ProcessBuilder.environment is a modifiable map, copied from the current process), so the inability to modify its own doesn't matter.
Personally I have never needed to change env variables. I consider them the same as the command line parameters.
> Java doesn't even allow to change the working directory also due to potential multi-threading problems.
Linux and macOS both support per-thread working directory, although sadly through incompatible APIs.
Also, AFAIK, the Linux API can't restore the link between the process CWD and thread CWD once broken – you can change your thread's CWD back to the process CWD, but that thread won't pick up any future changes to the process CWD. By contrast, macOS has an API call to restore that link.
That would be so much wasted engineering effort. The actual solution is simple: read what you need from env, and pass it as parameters to the functions that need it. The values of what you have read can be changed... and if you really, really want, start a child process with a modified env.
If you really wish, you can change the bootstrap path and allow changing env() for whatever reason you want (likely via copy-on-write). If you don't wish to do that, feel free to spawn a child process with whatever env you desire, then redirect/join std in/out/err (0/1/2).
Those are trivial things in around 100 lines of code and have been available since System.getenv() came back (it had been deprecated and non-functional prior to Java 1.5, in 2004).
You can’t convince me that there is EVER a reason to call setenv() after program init as part of a regular program, outside needing to hack around something specific.
Environmental variables are not a replacement for your config. It’s not a place to store your variables.
Even if the env var API is fully concurrent, it is not convention to write code that expects an env var to change. There isn’t even a mechanism for it. You’d have to write something to poll for changes and that should feel wrong.
> You can’t convince me that there is EVER a reason to call setenv() after program init as part of a regular program, outside needing to hack around something specific.
The most common use I see for this is people setting an env in the current process before forking off a separate process; presumably because they don't realize that you can pass a new environment to new processes.
I wonder what bugs you'd find if you injected a library that overrides setenv() with a crash or error message into various programs. Might be a way to track down these kinds of random irreproducible bugs.
Given how old most UNIX APIs are, and that when I do `man fork` I get pointed to execve(), which provides the feature, I guess not knowing is a typical case of google-copy-paste programming.
As a really old school UNIX guy I'd agree with this. Programmatic manipulation of the environment is an 'attractive nuisance' in that I feel anything you might be trying to achieve by using the environment as a string scratch pad of things that are different for different threads, can be coded in a much safer way.
I'd be happy to have you copy the immutable read-only environment vector of strings into your space and then treat that as the source of such things.
I think it would be interesting to build all the packages with a stdlib that dumps core on any call to setenv() or unsetenv(). That would give one an idea of the scope of the problem.
Environment variables are a gigantic, decades-old hack that nobody should be using... but instead everyone has rejected file-based configuration management and everyone is abusing environment variables to inject config into "immutable" docker containers...
> The setenv() function need not be reentrant. A function that is not required to be reentrant is not required to be thread-safe.
With the increased use of PIE, thunks added both for security and for ARM's branch-range limits, plus the differences between glibc and musl (and busybox), you have a huge mess.
I would encourage you to play around with Ghidra, just to see what return-oriented programming and ARM's limits do.
Compilers have been good at hiding those changes from us, but the non-reentrant nature will cause you issues even without threads.
Hint, these thunks can get inserted in the MI lowering stage or in the linker.
But setenv() is owned by POSIX, with only getenv() being deferred to the C standard.
Perhaps someone could submit a proposal on how to make it reentrant to the Open Group. But it wasn't really intended for maintaining mutable state so it may be a hard sell.
That applies mostly to databases using the filesystem.
For configuration files, the write-fsync-move strategy works fine. Generally you don't need fsync, since most people don't use the file system settings that allow data writes to be reordered with the metadata rename.
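For illustration, a minimal sketch of write-then-rename (file names are placeholders, error handling abbreviated):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Write the new config to a temp file on the same filesystem, flush it, then
// atomically rename it over the old one. Readers see old or new, never a torn file.
int save_config(const char *data, size_t len) {
    int fd = open("config.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    if (write(fd, data, len) != (ssize_t)len) { close(fd); return -1; }
    fsync(fd); // optional in many setups, as noted above
    close(fd);
    return rename("config.tmp", "config.conf"); // the atomic step
}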
We use env vars on cloud machines to hold various metadata about the machines. They can be queried by any program and are extremely useful. It's too useful to be considered a hack. People just misuse them.
It's funny how any hack, no matter how big, somehow becomes a commonplace everyday "solution" once it's needed to work around some quirk of whatever technology is fashionable at the time.
That (managing the env "from the outside") is and always has been the "supposed" way of using it.
Modifying _your own_ environment _at runtime_ is not. The corresponding functions - setenv/getenv - and state - envp/environ - have in the UNIX standards "always" (for as long as threads have existed, really) been marked non-MT. "Way back when", people were happy to accept that stated restrictions on use don't make bugs. Today, a general sense of overentitlement makes (some) people say "but since whatever-trickery can remove this restriction... you're wrong and I'm entitled to my bugfix". I agree the damage is done, though.
Even then, you could maintain a separate copy of the environment that you control and freely mutate. Basically, during startup, you create a copy of the env you received. Any setenv primitive you expose to users will modify this copy (that you can sync properly yourself). When you want to launch a process, you explicitly provide the internal copy of the env to that process, you don't rely on libc providing its own copy.
Of course, this means you won't see any changes to env vars from libraries you may use that call setenv(), but you also shouldn't need, or want, that in a shell.
I still think having a proper synchronous thread safe setenv()/getenv() in libc is the better choice.
If you're writing a shell, you can spend the 15 minutes to write a custom mutable data structure for your envvars; no need to significantly worsen the entire ecosystem to reduce the size of shells by a couple dozen lines (or, rather, move those lines into libc..)
I’ve written a lot of subprocess runners, and environment variables passed to a sub-process are just data at that point; you store them in your own variable like you would store someone’s name or someone’s age.
The underlying problem isn't just setenv, because the string returned by getenv can be invalidated by another call to getenv. ISO C says:
"The getenv function returns a pointer to a string associated with the matched list member. The
string pointed to shall not be modified by the program, but can be overwritten by a subsequent call
to the getenv function."
In a single threaded virtual machine, you can immediately duplicate the string returned by getenv and stop using it, right there.
Under threads, getenv is not required to be safe.
I think that with some care, it may be; an environment implementation could guarantee that a non-mutating operation like getenv doesn't invalidate any previously returned strings.
I think POSIX does that. It allows getenv to reallocate the environ array, but not the strings themselves:
"Applications can change the entire environment in a single operation by assigning the environ variable to point to an array of character pointers to the new environment strings. After assigning a new value to environ, applications should not rely on the new environment strings remaining part of the environment, as a call to getenv(), secure_getenv(), [XSI] [Option Start] putenv(), [Option End] setenv(), unsetenv(), or any function that is dependent on an environment variable may, on noticing that environ has changed, copy the environment strings to a new array and assign environ to point to it."
environ is documented together with the exec family of functions; that's where this is found.
So whereas there are things not to like about environ, it can be the basis for thread safety of getenv in an application that doesn't mutate the environment.
Mutating argv is fine for how it is usually done. That is, to permute the arguments in a getopt() call so that all nonoptions are at the end.
It is fine because it is usually done during the initialization phase, before starting any other thread. setenv() can be used here too, though I prefer to avoid doing that in any case. I also prefer not to touch argv, but since that's how GNU getopt() works, I just go with it.
Once the program is running and has started its threads, I consider setenv() is a big no no. The Rust documentation agrees with me: "In multi-threaded programs on other operating systems, the only safe option is to not use set_var or remove_var at all.". Note: here, "other operating systems" means "not Windows".
It may work for top, but not for ps, among others. The only reliable way is clobbering argv. That's just the way it is. In my opinion, glibc should finally provide setproctitle(), so programs like postgresql or chrome (https://source.chromium.org/chromium/chromium/src/+/main:bas...) don't have to resort to argv hacks.
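The clobbering itself is roughly this (a sketch only; it assumes the argv strings sit in one contiguous block, which Linux provides but nothing guarantees, and that argc >= 1):

#include <string.h>

// Overwrite the process's argv area in place so ps(1) shows a new title.
void set_title_hack(int argc, char **argv, const char *title) {
    char *start = argv[0];
    char *end = argv[argc - 1] + strlen(argv[argc - 1]) + 1;
    size_t avail = (size_t)(end - start);
    memset(start, 0, avail);
    strncpy(start, title, avail - 1); // leaves at least one terminating NUL
}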
Yes, and if there were "setargv()" or "getargv()" functions, they'd have the same issues ;) … but argv is a function parameter to main()¹, and only that.
¹ or technically whatever your ELF entry point is, _start in crt0 or your poison of choice.
> but argv is a function parameter to main()¹, and only that.
> ¹ or technically whatever your ELF entry point is, _start in crt0 or your poison of choice.
Once you include the footnote, at least on linux/macos (not sure about Windows), you could take the same perspective with regards to envp and the auxiliary array. It's libc that decided to store a pointer to these before calling your `main`, not the abi. At the time of the ELF entry point these are all effectively stack local variables.
I mean, yes, we're in "violent agreement" there. It's nice that libc squirrels away a copy and gives you a `getenv()` function with a string lookup, but… setenv… that was just a horrible idea. It's not really wrong to view it as a tool that allows you to muck around with main()'s local variables. Which to me sounds like one should take a shower after using it ;D
(Ed.: the man page should say "you are required to take a shower after writing code that uses setenv(), both to get off the dirt, but also to give you time to think about what you are doing" :D)
Thing is, the (history of the) UNIX APIs - call'em "libc" if you like - is littered with the undead corpses of horrible ideas. Who thought that global file write offsets were great? Append-only writes? Global working directories? The ability to write the password db via putpwent()? Modifying your own envp or argv? Why have a horribly-scaling hack like fcntl-based file locking even in the standard?
"Today", were one to start from scratch, the userspace API of even unix-ish operating systems would be done much differently. After all, systems designers and implementors are intelligent people and learn, and there's 50y+ of history to learn from. But the warts are there, and sometimes, there to "program around" them.
You can see what that would look like, done by the UNIX authors themselves, by looking into Inferno and the Limbo standard library.
It is kind of ironic how so many stick with UNIX and C ideas as religious ideals from OS and systems programming ultimate design, while the authors moved on creating Plan 9 and Inferno, Alef and Limbo.
Append-only writes are actually amazing: having several processes write into the same file and have their writes interleaved instead of destroying each other is almost impossible to re-create in user space.
And I still don't understand why processes "modifying their own envp or argv" are met with such revulsion in this comment thread except from the "I dislike that on ideological grounds" reason. Now, the ability to modify envp and/or argv of other processes while those are running, yes, that's a horrible idea. But modifying your own internal process state?
Oh, and fcntl file locks are horrible for historical reasons: basically, when POSIX (or its predecessor?) was trying to decide on a portable interface, the representative of one of the vendors cobbled together this API and its implementation in a week or two, and then showed up to the meeting with it. To his surprise, instead of arguing, everyone else basically said "eh, looks fine", and that was it; we now have the broken "why on earth does close()/fork()/exec() interact with locks like that" behaviour.
I had to smirk at the sarcasm (intended or no).
I merely included "processes modifying their env" amongst all these historical warts. I consider doing so as inevitably necessary as append writes, the advantages of which you aptly described. That's my opinion, underpinned by the history of those interfaces. I hope we can agree that the breakage is by-and-large in an (old, historical) interface that allows braindead usage, not in either the implementor or the user ?
On Linux, a privileged process can change the memory address which the kernel (/proc filesystem) reads argv/etc from... prctl(PR_SET_MM) with the PR_SET_MM_ARG_START/PR_SET_MM_ARG_END arguments. Likewise, with PR_SET_MM_ENV_START/PR_SET_MM_ENV_END.
This shouldn't cause the kind of race conditions we are talking about here, since it isn't changing a single arg, it is changing the whole argv all at once. However, the fact that PR_SET_MM_ARG_START/PR_SET_MM_ARG_END are two separate prctl syscalls potentially introduces a different race condition. If Linux would only provide a prctl to set both at once, that would fix that. The reason it was done this way, is the API was originally designed for checkpoint-restore, in which case the process will be effectively suspended while these calls are made.
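A sketch of the pair of calls (Linux-specific; requires CAP_SYS_RESOURCE, the replacement buffer must stay alive for the process lifetime, and the kernel validates each range, so the order of the two calls can matter):

#include <sys/prctl.h>

// Point the kernel's view of the argv region at a buffer we control.
static char new_args[] = "mydaemon\0--quiet"; // NUL-separated argument strings

int retitle(void) {
    if (prctl(PR_SET_MM, PR_SET_MM_ARG_START, (unsigned long)new_args, 0, 0) < 0)
        return -1;
    return prctl(PR_SET_MM, PR_SET_MM_ARG_END,
                 (unsigned long)(new_args + sizeof(new_args)), 0, 0);
}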
No amount of locking can make the getenv API thread-safe, because it returns a pointer which gets invalidated by setenv, but lacks a way to release ownership over it and unblock setenv safely (or to free a returned copy).
So setenv's existence makes getenv inherently unsafe unless you can ensure the entire application is at a safe point to use them.
Getenv() could keep several copies of the value around: one internal copy protected by a mutex, that it never returns, and one copy per thread that it stores in thread local storage. When you call getenv(), it locks the mutex, checks if the current thread's value exists, populates it from the internal copy if not, and returns it. It will also install a new setenv-specific signal handler on this thread and store info about this thread having a copy.
Setenv() will then take the same mutex as getenv(), check if the internal copy is different from the new value; if it is, it will modify the internal copy, modify the local thread's copy if that has one, and then signal each other thread in the process that has a copy in TLS. The setenv signal handler will modify the local copy that thread holds.
It's gonna be slow for a large multi-threaded program, but since setenv() used to corrupt memory for such programs, they probably don't care. And for single-threaded programs, or even for programs that don't access getenv()/setenv() on multiple threads, there should be no extra overhead other than the mutex and the bookkeeping.
The only issues that would remain are programs which send the pointer they get from getenv() to other threads without ensuring locked access, and programs which rely on modifying the pointer from getenv() directly as a way to set an env var and expect this to be visible across threads. Those are just hopelessly broken and can't use the same API - but they aren't more broken than they are today.
Of course, in addition to this complex work to make the old API (mostly) thread safe, it should also offer a new API that simply returns a copy every time, doesn't promise to show modifications to your copy when setenv() gets called (you need to call getenv() again), and puts the onus on you to free that copy explicitly.
> it should also offer a new API that simply returns a copy every time
Returning a copy isn't great (memory allocation!), the API should probably be something like:
int getenv(const char *varName, char *buf, size_t bufSize, size_t *varSize);
Where the caller manages the buffer and getenv writes into it (so it can e.g. be stack or statically allocated), the third argument is the size of the caller-managed buffer, then the last variable is an "out parameter" that returns the "true" length of the environment variable. Then afterwards, you can check if `*varSize > bufSize`, and if so, you need to make your buffer larger. The return value is an error code.
Doing it like this, you can easily implement the "return a malloced copy" if you want to, but it also gives you the option to avoid allocation entirely. This is important for e.g. embedded or real-time applications, or anything that just likes to avoid `malloc()/free()`.
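A sketch of how this could sit over a single global lock (getenv_buf and env_lock are made-up names here, and this only helps if every environment mutator takes the same lock):

#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

// Copy a variable's value into a caller-owned buffer and report the true size.
int getenv_buf(const char *name, char *buf, size_t bufSize, size_t *varSize) {
    pthread_mutex_lock(&env_lock);
    const char *val = getenv(name); // internal lookup; the pointer never escapes the lock
    if (!val) { pthread_mutex_unlock(&env_lock); return ENOENT; }
    *varSize = strlen(val) + 1;
    int rc = 0;
    if (*varSize <= bufSize)
        memcpy(buf, val, *varSize);
    else
        rc = ERANGE; // caller retries with a buffer of at least *varSize bytes
    pthread_mutex_unlock(&env_lock);
    return rc;
}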
If you only consider `getenv`/`setenv` there are indeed many solutions, but it's not that simple. You also need to consider `putenv` (not that nasty, you just need to treat it like initial environment, which means you can't use a single range check) and accessing the `environ` variable directly (nasty).
Your particular solution doesn't work because people expect `getenv` to be async-signal-safe, which means you shouldn't be allocating memory.
Hmm ... doing an incref-like operation during `getenv` for a previously `setenv`ed variable that hasn't yet been accessed in this thread would be fine ... clear those refs during calls we know indicate knowledge refreshes ...
"mutating" there involves the need to (re)allocate memory. To do so in a signal handler is hard ... because memory allocators are, while threadsafe, not async-signal-safe. You can't make a hard problem easy by asserting dependence on another (unsolved) hard problem.
Btw, you can _also_ substitute libc's setenv/getenv/putenv with your own (locking) implementations, courtesy preload and all the funky features of ELF symbol resolution. Actually easy. But impossible if you link against static code using it (go ... away). Hmm. easy ? impossible ? damn this grey world. Gimme some color.
Someone above mentioned getenv_r(). I had to Google it; it is not implemented by GNU glibc as far as I know, but I do see it on NetBSD: https://man.netbsd.org/getenv_r.3
There has to be some sort of nuance regarding why this seemingly simple fix hasn't been made yet. Changing from crashing to blocking doesn't seem like a big breaking change.
Because it doesn't actually solve anything: you're still replacing whatever getenv returned from under the nose of the program code; whether that happens in another thread or in a signal handler in the same thread doesn't make any difference.
And that's before you even get to the `extern char *environ` global.
According to ISO C, getenv returns a pointer to storage that can be overwritten by another call to getenv! Only POSIX slightly fixes it: the string comes from the environ array, and operations on environ by the library preserve the strings themselves (when not replacing or deleting them), just not the array. A program that calls nothing but getenv is okay on POSIX, not necessarily on ISO C.
C could provide functions to lock/unlock a mutex and require that any attempt to access the environment has to be done holding the mutex. This would still leave the correctness in the hands of the user, but at least it would provide a standard API to secure the environment in a multi threaded application that library and application developers could adopt.
That is basically "what it means" if an interface is non-MT: you can call this no-problem if you know you're single-threaded, and if you're not, find your own way to serialize (meaning: have your own locking primitive you acquire/release wherever you make calls to these functions).
One could "dream of" a func that tells libc "acquire/drop this mutex of mine around get/set/putenv calls" but that'd simply move the problem - because the nifty "frameworks" would do that (independently of each other, we're sovereign and entitled frameworks around here) and race each other's state nonetheless.
Obviously not, but _threading_ primitives are not the subject of this post at all. Declared-as non-threadsafe interfaces are. And of course (as is happening here) one can argue whether all "system runtimes" shall be threadsafe. Right now, though, they are not, and the agreed/sanctioned standards don't require them to be. Again (also as happening here), opinions may differ on whether changes-to-make-threadsafe would be bugfixes, enhancements, or (require) new interfaces.
I have expressed my views on this. Happy to agree to disagree, though.
Is that a problem? I feel like calling getenv and setenv from different threads is a design antipattern anyway. Any environment setting and loading should happen in the one and only main thread right after process init.
The latter is always true even when you don't use chdir(2) and/or always use absolute file paths since, you know, there are other processes that can re-arrange the file system in whatsoever way they like. The file system is one example of the unavoidable global mutable shared state (another example is the network) which one simply has to deal with.
If files end up in a different directory because the user rearranged the filesystem under your nose, that's on the user. Most applications would deal with that by telling the user not to do that.
If your sensitive logs end up in the webserver root because one thread used chdir to temporarily change the working directory it's on the application writer.
Or to put it another way, the filesystem as a whole being shared mutable state does not make the current working directory being shared mutable state between threads any less of an issue.
chdir is thread-safe, but interacting with the current directory in any context other than parsing command-line arguments is still nearly always a mistake. Everything past a program's entry point should be working exclusively in absolute paths.
chdir is only thread safe to the extent that corruption won't occur.
If one thread is using relative paths, and another is doing a chdir-based traversal (as using the nftw function, for instance), that first thread's accesses are messed up.
This is why POSIX now has the various *-at functions; they provide stable relative access.
Welcome to the C standard library, the application of mutable global state to literally everything in it has to be the most consistent and predictable feature of the language standard.
I used to think this was bad too. But when C was designed an entire single threaded program was considered the unit of encapsulation for functionality. Now it’s mostly libraries.
The former allows you to design a coherent system. A lot of design questions which are annoying ("how do I access config data consistently", etc.) become very clear.
It also makes C more productive. If global vars and static locals are unbanned, features like closures become less important.
The mutex would have to be held by the caller until it no longer needs the string returned from the environment, or makes a copy:
stdenvlock(); // imaginary function added to ISO C or POSIX
char *home = getenv("HOME");
char *home_copy = strdup(home);
stdenvunlock(); // only here can we unlock
// home pointer is now indeterminate
Other solutions:
1. Put the above sequence into a function, and don't expose the mutex. Thread-safe code must use:
getenvbuf("HOME", mybuf, sizeof mybuf); // returns some value that helps to resize the buffer
All functions that retain pointers out of the classic getenv remain unsafe.
A mutex can be provided to those applications that want to manipulate the environ array directly, or use getenv and setenv, or any combinations of these.
The main problem is all the code out there using getenv.
It's the same problem with global vars, but at a machine scope. The real solution here would be for the OS to have a better interface to read and write env vars, more like a file where you have to get rw permission (whether that's implemented as a mutex or what).
This is neither an OS nor a machine scope problem. The environment is provided by the OS at startup. What the process does with it from there on is its own concern.
> The environment is provided by the OS at startup.
That's part of the design of the OS. How the OS implements this is primitive, and so it leaves it up to every language to handle. The blog mentions the issue is with getenv, setenv, and realloc, all system calls. To me, that sounds like bad OS design is causing issues downstream with languages, leaving it up to individual programmers to deal with the fallout.
None of these 3 functions is a system call. open(), mmap(), sbrk(), poll(), etc. are system calls. What you're referring to is C library API, which as Go has shown (both to its benefit and its detriment) is optional on almost all operating systems (a major exception being OpenBSD.)
If you really want to lose some sanity I would recommend reading the man page for getauxval(), and then look up how that works on the machine level when the process is started. Especially on some of the older architectures. (No liability accepted for any grey hair induced by this.)
Neither getenv, setenv nor realloc are system calls; they are all functions from the C standard library, some parts of which, for historical reasons, are required to be almost impossible to use safely/reliably.
Imagine you get a signal during getenv itself with the mutex held. Then your signal handler calls getenv. (On the other hand -- getenv is not marked async-signal-safe, so this use is already illegal.)
If it's not a "recursive mutex" (where you can call lock within the same thread on the same mutex more than once consecutively and it handled that), it's possible to lock on itself again (say in code which is recursive)...
The problem (with get/set/putenv as they are) isn't the non-use of a mutex. It's the "meaning" of the pointer returned by getenv(). It returns a char*. Never mind the persistence of that value - you can work around that by deliberately leaking memory - but it's writeable. Whether it's a good idea to write through it ... well. But simply locking "inside" these funcs doesn't solve all of your issues.
Is that the underlying problem, or is the underlying problem that libraries are using thread-unsafe setenv in threaded contexts when they could just do something else?
But it would force Rust programs to add their own synchronization mechanism around them. As long as no two threads can call getenv/setenv at the same time then it’s fine.
The Rust stdlib is already using synchronization on the versions of these functions that are exposed from the Rust stdlib. That's why those functions were allowed to be marked as safe in the first place.
The problem is that people are calling C code from Rust (which already requires an unsafe annotation), and then that C code is doing silly thread-unsafe shenanigans for regrettable historical reasons.
It's beyond Rust's power to fix without cooperation from the underlying C code, which happens to be provided by the OS, which is just being compliant with Posix. Rust can only do so much when the platform itself is hell-bent on sabotaging you.
Ah, that’s a detail that I either forgot or did not know. Thank you.
It certainly would be nice if the C library had fewer built-in footguns. And if we could write programs in other languages without ever depending on it (which wouldn't be much use when you're relying on a C library anyway, but it still would be nice).
In the React world, the only times I've seen dangerouslySetInnerHTML consistently used are for outputting string-literal CSS content (and this one is increasingly rare as build tools need less handholding), string-literal JSON content (for JSON+LD), and string-literal premade scripts (i.e. pixel tags from the marketing content). That's not to say there's no danger surface there, but it's not broadly used as a tool outside of code that's either really bad or really exhaustively hand-tuned.
I've only really seen dangerouslySetInnerHTML used while transitioning from certain kinds of server-side rendering to React. There are still lots of really old internal tools in ancient HTML out there.
React doesn't have a tag and attribute sanitizer built in, so having non-js-programmers edit JSX isn't especially safe anyway, as an img or a href could exfiltrate data. If it did, it could just block an innerHTML attribute. A js programmer can get around it by setting up a ref and then using the reference to set innerHTML without the word dangerously appearing.
> A js programmer can get around it by setting up a ref and then using the reference to set innerHTML without the word dangerously appearing.
If DOM nodes during the next render differ from what react-dom expects (i.e. the DOM nodes from the previous render), then react-dom may throw a DOMException. Mutating innerHTML via a ref may violate React's invariants, and the library correctly throws an error when programmers, browser extensions, etc. mutate the DOM such that a node's parent unexpectedly changes.
There are workarounds[1] to mutate DOM nodes managed by React and avoid DOMExceptions, but I haven't worked on a codebase where anything like this was necessary.
In the Rust std, `set_var` and `remove_var` will correctly require using an `unsafe {}` block in the next edition (2024). The documentation does now mention the safety issue but obviously it was a mistake to make these functions safe originally (albeit a mistake even higher level languages have made).
There is a patch for glibc which makes `getenv` safe in more cases where the environment is modified but C still allows direct access to the environ so it can't be completely safe in the face of modification https://github.com/bminor/glibc/commit/7a61e7f557a97ab597d6f...
> keep[s] older versions around and adopt[s] an exponential resizing policy. This results in an amortized constant space leak per active environment variable, but there already is such a leak for the variable itself (and that is even length-dependent, and includes no-longer used values).
There have got to be pathological uses out there where this will cause unbounded memory growth in well-formed (according to the API) programs, no?
Interesting to see this _introduce_ a ‘bug’ (unbounded memory growth) for these programs that follow the API in order to ‘fix’ programs that don’t (by using the API in multiple threads). Pragmatism over dogma I guess. Leaves me feeling a bit sketched out though.
FWIW you can make a singly linked list with infinite number of nodes too. Memory leaks happen in well formed programs just fine, glibc is just one of many examples.
Because the std implementation cannot force synchronisation on the libc, so any call into a C library which uses getenv will break... which is exactly what happened in TFA: `openssl-probe` called env::set_var on the Rust side, and the Python interpreter called getenv(3) directly.
But the standard implementation could copy the environment at startup, and only uses its copy.
And the library's use of setenv is clearly a bug as setenv is documented to be not threadsafe in the C standard library. So that would take care of that problem.
If you clone the environment at startup, then you get a situation where code in the same binary can see different values depending if it uses libc or Rust's std. It's also no longer the same environment as in the process metadata.
Using a copy by default may have worked if it was designed as such before Rust 1.0, but Rust took the decision to expose the real environment and changing this now would be more disruptive than marking mutations as unsafe.
In short, no - because environment variables are userland state only, you can't interact with them using system calls, the kernel doesn't keep a "canonical" copy of them on behalf of the process.
So the "environment" is part of libc, and "libc's way" of interacting with it at runtime "is the way".
From the syscall interface point of view ... you pass the initial env of a process when you exec(), and the kernel copies that to (userland) memory of the new process. The fact that "default initialisation" can copy from the environment of the exec()'ing parent, or the fact that the kernel can "read" a process' env (see /proc/<PID>/environ), doesn't change this; the kernel needn't "accommodate" all the possible and impossible ways a user application may want to interact with that state; if you mess too much with it, you get garbage. Sooo ... the portability wart is setenv(), because as far as the system is concerned, your "initial" env is passed to you when exec() is called, and any modification thereafter is your concern, your problem, but foremost, your choice. And choices come with taking responsibility for the ones you make.
In general, no, because of FFI. In special circumstances, yes, but this isn't really important because the libc implementation is trivial (on all platforms that matter, envp is a char** to strings formatted as KEY=VALUE, set_env(key, value) is equivalent to allocating a new KEY=VALUE string and finding the index of a key if it exists or appending to the array).
Under the hood the pointer is initialized by the loader, in a special place in the process's memory. Most of the time, the loader finds the initial environment variable list right past argv (try reading past the end of the NULL terminator; you'll find the initial environment variables).
It would be possible for a language to hack it such that on load they initialize their own env var set without using libc and be able to safely set/get those env vars without going through libc, and to inherit them when spawning child processes by reading the special location instead of the standard location initialized by your platforms' loader/updated by libc. But how useful is a language with FFI that's fundamentally broken since callees can't set environment variables? (probably very useful, since software that relies on this is questionably designed in the first place)
If you wanted to make a bullet proof solution, you would specify the location of an envp mutex in the loaders' format and make it libc's (or any language runtime) problem to acquire that mutex.
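For what it's worth, the "read past argv" trick mentioned above looks like this (a sketch relying on the common Linux/macOS initial-stack convention that envp directly follows argv's NULL terminator):

#include <stdio.h>

int main(int argc, char **argv) {
    // argv[argc] is NULL; the environment strings conventionally start right after it.
    char **envp = argv + argc + 1;
    for (char **e = envp; *e; e++)
        puts(*e);
    return 0;
}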
Or any multithreaded program that uses a C or C++ library that calls setenv somewhere internally, and failed to document that it does so and is thus unsuitable for use by multithreaded programs.
No library documents that, so you can't use libraries on POSIX systems when writing multithreaded code. Or you do and hope for the best. So everyone just hopes for the best.
Caveats: POSIX.1 does not require setenv() or unsetenv() to be reentrant.
...
Interface: setenv(), unsetenv()
Attribute: Thread safety
Value: MT-Unsafe const:env
Libraries that are thread-safe DO provide that documentation. One assumes that libraries that don't provide that documentation are not thread-safe.
GnuTLS docs: The GnuTLS library is thread safe by design, meaning that objects of the library such as TLS sessions, can be safely divided across threads as long as a single thread accesses a single object.
It can only synchronize if everything is using Rust's functions. But that's not a given. People can use C libraries (especially libc) which won't be aware of Rust's locks. Or they could even use a high-level runtime with its own locking, but then its locks will be distinct from Rust's locks.
The only way to coordinate locking would be to do so in libc itself.
libc does do locking, but it's insufficient. The semantics of getenv/setenv/putenv just aren't safe for multi-threaded mutation, period, because the addresses are exposed. It's not really even a C language issue; were you to design a thread-safe env API, for C or Rust, it would look much different, likely relying on string copying even on reads rather than passing strings by reference (reference counted immutable strings would work, too, but is probably too heavy handed), and definitely not exposing the environ array.
The closest libc can get to MT safety is to never deallocate an environment string or an environ array. Solaris does this--if you continually add new variables with setenv it just leaks environ array memory, or if you continually overwrite a key it just leaks the old value. (IIRC, glibc is halfway there.) But even then it still requires the application to abstain from doing crazy stuff, like modifying the strings you get back from getenv. NetBSD tried adding safer interfaces, like getenv_r, but it's ultimately insufficient to meaningfully address the problem.
The right answer for safe, portable programs is to not mutate the environment once you go multi-threaded, or even better just treat process environment as immutable once you enter your main loop or otherwise finish with initial process setup. glibc could (and maybe should) fully adopt the Solaris solution (currently, IIRC, glibc leaks env strings but not environ arrays), but if applications are using the environment variable table as a global, shared, mutable key-value store, then leaking memory probably isn't what they want, either. Either way, the best solution is to stop treating it as mutable.
Yep. GetEnvironmentStrings and FreeEnvironmentStrings are probably even more noteworthy as they seem to substitute for an exposed environ array, though they push more effort to the application.
It can't ensure synchronization because any code using libc could bypass the sync wrapper. In particular, Rust lets you link C libs which wouldn't use the Rust stdlib.
Because it can still race with C code using the standard library. getenv calls are common in C libraries; the call to getenv in this post was inside of strerror.
you've gotten a lot of answers which say the same thing, but which I don't think answer your question:
synchronization methods impose various complexity and performance penalties, and single threaded applications which don't need that would pay those penalties and get no benefit.
Unix was designed around a lightweight ethos that allowed simple combining of functions by the user on the command line. See "worse is better", but tl;dr that way of doing things proved better, and that's why you find yourself confronting what it doesn't do.
But it is possible to safely use it in a single threaded program.
There's no way to use it safely in a multi threaded application that may use setenv (unless you add your own synchronisation, and ensure everything uses it, even third party libraries).
Actually I don't believe that's the case. The getenv function as described by ISO C cannot be safely used in a program that only uses getenv, if that program uses ISO C threads and more than one thread calls getenv without synchronizing with the others.
I don't think POSIX fixes this: it doesn't specify that the environ array is protected against concurrent access.
If two threads call getenv right around the same time, one of them could invalidate the environ array just as the other one has started to traverse it.
If you want to be safe, copy the environment to a different data structure on program startup. Then have all your threads refer to that data structure.
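A minimal sketch of that snapshot, taken first thing in main() before any threads exist (the lookup is kept deliberately simple):

#include <stdlib.h>
#include <string.h>

extern char **environ;

static char **env_snapshot; // immutable after startup; safe to read from any thread

// Call exactly once, before spawning any threads.
void snapshot_environment(void) {
    size_t n = 0;
    while (environ[n]) n++;
    env_snapshot = malloc((n + 1) * sizeof *env_snapshot);
    for (size_t i = 0; i < n; i++)
        env_snapshot[i] = strdup(environ[i]);
    env_snapshot[n] = NULL;
}

// Thread-safe lookup against the frozen copy.
const char *env_get(const char *name) {
    size_t len = strlen(name);
    for (char **e = env_snapshot; *e; e++)
        if (strncmp(*e, name, len) == 0 && (*e)[len] == '=')
            return *e + len + 1;
    return NULL;
}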
Hmm, I'm apparently correct for C++11, where calling getenv only is thread safe, but that's not guaranteed by earlier standards (or, as far as I can tell, by C or POSIX).
Well it was better in the short term but is worse in the long term. In particular, the error handling situation is generally atrocious, which is fine for interactive/sysadmin use but much worse for serious production use.
Even if C stdlib maintainers are resistant to making setenv multi-thread safe, at a minimum there should be a new alternative thread-safe API defined, whether within POSIX or as a de facto standard that forces POSIX to adopt it over time. If the effort spent explaining why nothing could be done had instead been spent fixing this problem, a new thread-safe API could have replaced the old setenv, which could then have been deprecated and removed from many software projects.
I'm also not convinced by Musl's maintainer that it can't be fixed within Musl considering glibc is making changes to make this a non-issue.
The biggest problem is not the absence of a thread safe API, it's the existence of this:
extern char **environ;
As long as environ is publicly accessible, there's no guarantee that setenv and getenv will be used at all, since they're not necessary.
If you're willing to get rid of environ, it's pretty trivial to make setenv and getenv thread safe. If not, then it's impossible, although one could still argue that making setenv and getenv thread safe is at least an improvement, even if it's not a complete solution (aka don't let the perfect be the enemy of the good).
> aka don't let the perfect be the enemy of the good
Exactly my point. Over time *environ would disappear, at least from the major software projects that everyone uses (assuming it's even in use in them in the first place).
That still doesn't mean getenv would be safe. Unless you know nothing uses **environ (e.g. by breaking the ABI, which no-one will do because it'll break everything), you can't rely on getenv being safe.
There should be locking getters/setters for the environ, and all users should switch to them.
Yes, it will take a long time, and some users will complain it doesn't work on their PDP-11, but the problem will never be solved if there's no migration path to a safe solution.
Yeah, I don't think I've ever seen a single use of it. However, I just checked on grep.app, and at least a few big projects use it - git, nginx, PostgreSQL, neovim, etc. - which suggests that setenv/getenv is not sufficient.
Guess that would also require some locking for all the exec() functions that don't take the environment as a parameter or that search PATH for the executable.
I'll take existence proofs [1] over personal insults but YMMV. You also may want to be careful assuming the expertise of people on this forum. Some people here are quite technical.
It's like a rite of passage to be hit by an environment-related bug on Linux, which is mysteriously less of a problem on other Unixes. Which is sorta funny given how pragmatic Linus and the kernel are about fixing POSIX bugs by making them not happen, while glibc is still lagging here decades after people tried to at least make the problem better. Sure, there is all the crap around TZ/etc, but simply providing getenv_r(), synchronizing it with setenv(), and warning during compile/link on getenv() would have killed much of the problem. Never mind actually doing a COW-style system where the env pointer(s) are read-only. Instead the problem is pushed to the individual application, which is a huge mistake, because application writers are rarely aware of what their dependencies are doing. Which is the situation I found myself in many, many years ago. The closed-source library vendor, at the time, told us to stop using that toy unix clone (linux).
> environment-related bug on Linux, which is mysteriously less of a problem on other Unixes.
How do you figure? The problem isn't the implementation, it's the API. setenv(), unsetenv(), putenv(), and especially environ, are inherently unsafe in a multithreaded program. Even getenv_r() can't really save you, since another thread may be calling setenv() while the (old) value of an env var is being copied into the provided buffer. Sure, getenv_r() fixes the case where you get something back from getenv() and then another thread calls setenv() and makes that memory invalid, but there's no way to protect the other calls from breaking the API.
There are ways to mitigate some of the issues, like having libc hold a mutex when inside getenv()/setenv()/putenv()/unsetenv(), but there's still no way for libc to guarantee that something returned by getenv() remains valid long enough for the calling code to use it (which, right, can be fixed by getenv_r(), which could also be protected by that mutex). But there's no good way to make direct access to environ safe. I suppose you could make environ a thread-local, but then different threads' views of the environment could become out of sync, permanently (and you could get different results between calling getenv_r() and examining environ directly).
Back-compat here is just really hard to do. Even adding a mutex to protect those functions could change the semantics enough to break existing programs. (Arguably they're already broken in that case, but still...)
Considering this is a libc issue, not a Linux specific one, I wonder how thread safe other libc implementations like musl and Bionic are. How do the BSDs stack up? Humorously, illumos also ships with glibc...
I think you would have to change the API to return a copy of the string as the getenv result, which the caller is responsible for freeing, or the env implementation would have to ensure that values returned from getenv are stable and never change, which is effectively a memory leak.
> Even getenv_r() can't really save you, since another thread may be calling setenv() while the (old) value of an env var is being copied into the provided buffer.
Won't that depend on the libc implementation? For example, maybe setenv writes to another buffer, then swaps pointers atomically; wouldn't that work?
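Something like the following, say (a sketch for a single variable with C11 atomics; note the old buffer can never be freed safely, which is exactly the leak the glibc patch above accepts):

#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

static _Atomic(char *) value; // heap-allocated string for one variable, NULL if unset

const char *get_value(void) {
    // Readers always see a complete old or new string, never a torn one.
    return atomic_load_explicit(&value, memory_order_acquire);
}

void set_value(const char *v) {
    char *fresh = strdup(v);
    char *old = atomic_exchange_explicit(&value, fresh, memory_order_acq_rel);
    (void)old; // cannot be freed safely: a reader may still hold it, so it leaks
}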
Most of the rest of the problem here seems to be the development environment. They're testing on a remote machine in an Amazon data center and using Docker. This rig fails to report that a process has crashed. Then they don't have enough debug symbol info inside their container to get a backtrace. If they'd gotten a clean backtrace reported on the first failure, this would have been obvious.
Yup, it's mostly just the story and tools we used to get ourselves out of a mess that was made harder by some decisions made earlier -- the tests were running in a container with stripped symbols (we're going to ship symbols after this, no reason to over-optimize), our custom test runner failed to report process death (an oversight).
There's no reason setenv should have been called here. The `openssl-probe` library could simply return the paths to the system cert files and callers could plug those directly into the OpenSSL config.
Oversights all around and hopefully this continues to improve.
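A hedged sketch of that "return the paths" approach, using the openssl-probe and openssl crates (probe() and set_ca_file() are real APIs in those crates, but this wiring is illustrative, not how the article's stack was actually configured):

use openssl::ssl::{SslConnector, SslMethod};

// Probe for the system cert locations and hand them straight to the
// OpenSSL context, never touching the process environment. Error
// handling simplified; real code would also handle `cert_dir`.
fn tls_connector() -> Result<SslConnector, Box<dyn std::error::Error>> {
    let probed = openssl_probe::probe();
    let mut builder = SslConnector::builder(SslMethod::tls())?;
    if let Some(cert_file) = probed.cert_file {
        // The moral equivalent of SSL_CERT_FILE, scoped to this context.
        builder.set_ca_file(&cert_file)?;
    }
    Ok(builder.build())
}

fn main() {
    let _connector = tls_connector().expect("failed to build TLS connector");
}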
> Yup, it's mostly just the story and tools we used to get ourselves out of a mess that was made harder by some decisions made earlier -- the tests were running in a container with stripped symbols (we're going to ship symbols after this, no reason to over-optimize)
It's worth noting here that you can also build your binaries and keep debug symbols separately.
You don't need to ship them with the binary (although it will make many scenarios a bit simpler if you do, since you'll always have the right ones available).
It really does not look like a good idea to setenv(). The very notion is quite terrifying. Messing with a bunch of globals that other code knows about as well? Nuh-uh.
The thing is, the OP people weren't doing that at all, it was some irresponsible library maintainers. If your code does that, you have to include something like the "surgeon general's warning" everywhere: "CAREFUL: USING THIS LIBRARY MAY CAUSE TERMINAL CRASHES".
History: V7 research UNIX had "getenv()", but not "setenv()".[1]
BSD Unix 4.x had "getenv()" and "setenv()".[2] Google's "AI Overview" says "The setenv() and unsetenv() functions were included in Version 7 of AT&T UNIX.", but that does not seem to be correct.
This misfeature seems to be what was once called a "Berkeleyism", a Berkeley mod to UNIX.
I think you're confusing setting the environment before running a process, with setting the environment _within_ the process. If you're running a shell session, or even a compiled process which is just a "runner" for some other process - then certainly, we all do "export SOME_SETTING=value" and run things. But if you're writing a C library, which could well be used in a multi-threaded environment - you don't need to "adjust" anything, and should not invoke setenv. If your library is not pleased with the settings of another library, then it should start returning errors, or even exit() if you're a violent kind of a guy - but not setenv().
This reminded me of that whole "12-factor app" movement, which several of my former coworkers had really bought into. One of the "factors" is that apps should be configured by environment variables.
I always thought this was kinda foolish: your configuration method is a flat-namespace basket of stringly-typed values. The perils of getenv()/setenv()/environ are also, I think, a great argument against using env vars for configuration.
Sure, there aren't always great, well-supported options out there. I prefer using a configuration file (you can have templated config and a system that fills in different values for e.g. dev/stage/prod), and I'll usually use YAML, despite its faults and gotchas. There are probably better configuration file formats, but IMO YAML is still significantly better than using env vars.
I often find that there's a lot of intense animosity towards Windows and Microsoft, but a lot of their API design is vindicated by time. Environment variables can be typed and templated in NT, not to mention there's a namespaced config database (the registry, even if it's really verbose and strange). Plus MSVC provides thread-safe versions of nearly every stdlib function. I often hear new C/C++ developers lament the lack of POSIX compatibility in MSVC, but without a lot of consideration for what that actually means; they just want cross-compatibility with C programs written in the 1990s.
I have similar reservations about env vars. I dislike how they can be read from anywhere; it interrupts the ability to reason about a function's behavior from its signature, and makes plenty of functions impure that could otherwise have been pure.
If there were a language feature that let me mark apps such that during any process env vars are not writable and are readable only once (together, in a batch, not once per var), I'd use it everywhere.
getenv() is perfectly fine; it's setenv() that is the problem. Which, in theory, this wouldn't be using, since the env would be set up prior to starting that mystical app.
But yes, a flat namespace, with string values, shared as a free-for-all with who knows what libraries and modules you're loading… that's not a good idea even if setenv() didn't have safety issues.
There probably should be an addendum to the "12-factor app" movement that says that the environment should be treated as read only for the duration of the process. Most of the issues people talk about here seem to relate to people trying to abuse the environment as some kind of key value store for mutable global state (which sounds like a bad idea). Why would you even want to do that?!
Being on the JVM, which actually treats the environment as immutable and which probably inspired a lot of the 12-factor app movement (with companies like SoundCloud being big Scala and Java users and pushing this), I've never experienced any issues with the environment changing on me or causing any threading issues. The environment is effectively immutable, and there's nothing in my processes that sneakily circumvents that (via some native calls into libc). So, a complete non-issue on the JVM.
Even if somebody manages to modify the environment, the immutable copy stays the same. That copy gets created on JVM startup and is immutable. Anything using normal Java APIs to interact with the environment will never see the modification. I'm sure people might have tried to work around that, but it's not a widespread practice. Because, again, why would you even want to do that?
The problem with configuration files is that their parsing is process specific. That's why Linux/Unix is such a mess. Every single tool seems to have its own conventions and mechanisms for configuration. There are no standards for this.
Other than, of course, the Docker ecosystem. You can do whatever you want inside the container, but effectively your only interface to the outside world is either messily mounting some volume and doing whatever convoluted configuration your app requires, or just using environment variables. Most modern software is Docker-ready/friendly in the sense that you can fully control its behavior via the environment. It's perfectly adequate for most things that people run via Docker these days. Which of course is pretty much anything.
And of course with Docker compose or kubernetes (which I'm not necessarily a fan of) you get yaml files defining lists of environment variables that define how your process starts. So you more or less get what you are asking for. I'm not a big YAML fan but it works well enough. Too much potential for syntax issues really ruining your day IMHO. But it's not like the alternatives are free of issues.
This is unrelated, really. If you read your environment variables into config and never touch them again, then you're totally safe.
I personally use the 12-factor app style, but once the data enters the app I validate the env variables and then store them. It's totally fine after that.
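For what it's worth, the pattern is tiny. A minimal sketch, with made-up names (AppConfig, PORT; nothing here is from the article):

use std::env;

// The environment is read exactly once, validated, and never consulted
// again; the rest of the program only ever sees `AppConfig`.
struct AppConfig {
    port: u16,
    cert_file: Option<String>,
}

impl AppConfig {
    fn from_env() -> Result<Self, String> {
        let port = env::var("PORT")
            .unwrap_or_else(|_| "8080".to_string())
            .parse::<u16>()
            .map_err(|e| format!("PORT is not a valid port number: {e}"))?;
        let cert_file = env::var("SSL_CERT_FILE").ok();
        Ok(AppConfig { port, cert_file })
    }
}

fn main() {
    // Read the environment once, at startup, before any threads exist.
    let config = AppConfig::from_env().expect("invalid configuration");
    println!("port={} cert_file={:?}", config.port, config.cert_file);
}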
Great article about digging into a non-obvious bug. This one had it all! Intermittent bug, architecture-specific, hidden in a dependency, rust, the python GIL, gettext. Fantastic stuff.
These kinds of detailed troubleshooting reports are the closest thing you can get to having to do it yourself. Thanks to the authors. It's easy to say "don't use X duh" until a dependency relies on it, and how were you supposed to know?
Yes. It's shocking just how much cloud SaaS has distorted people's understanding of things. You need all kinds of layers of cloud complexity and deployment to do the most trivial stuff. We have 100% reversed the PC revolution and returned to the era of clunky, expensive mainframe computing.
The reason is that cloud is where all the money is because cloud is DRM. Put software there and you can charge a subscription and nobody can evade it and you have perfect lock in forever. People usually can’t even get their data out. You can also do all kinds of realtime analytics conveniently to optimize your product.
Computing architecture is downstream of the business model. Mainframe died originally because there was no Internet and PCs were cheaper, but vendors also lost a lot of their lock in power. Now they have a way to bring a model that is much more profitable back. No more pesky freedom for users, who to be fair if given such freedom will often just refuse to pay, making quality software a non-viable business.
There is a lot to like about the cloud model as a user. I can access my data wherever I am, from whatever device I have, and I won't lose it to a disk crash.
There are faults to the cloud, but it solves real problems users have.
There are other ways that could be achieved, like cloud storage constantly mirroring local but encrypted with local keys or keys controlled by the user.
This is the iCloud model and it works. Imagine a more open version with competing storage providers.
This, however, would hand control back to the user, which would be bad for the software industry with its addiction to lock in and recurring revenue.
For CPU power, a Raspberry Pi today is faster than servers that ran whole medium to large businesses 20 years ago. Much of what people do with SaaS involves backend processes that could run on a 1990s era PC.
There are exceptions, like large AI models and huge databases like web search, though in the case of AI models I can run pretty decent ones locally already, albeit on an admittedly expensive laptop. If models don't grow as fast as, or faster than, computers improve, mainstream PCs or even phones will catch up eventually.
I've actually wondered if that might be a major factor that swings the pendulum back... if you can run an AI that has memorized the entire Internet locally, that makes all kinds of things possible in local compute.
Installing apps could be easy, even automatic on demand. That's kind of what the web does. Imagine the web with better caching of program objects, maybe a runtime built around WASM, and an iCloud-type data model, and you can visualize personal computing for today. The kludgy idea of installers that vomit files all over the system is already legacy.
But it would still break SaaS lock-in, so this isn't where the money goes. Our software paradigms wrap themselves around whatever works as a business model.
Modern 3D games are local, so I'm not sure what the point is there. My point about 90s machines was that most business SaaS is not compute- or data-heavy unless it's for a huge corporation.
As far as local data: my laptop has terabytes, my phone over a hundred gigabytes. I have fiber at home and have seen speeds approaching a gigabit on 5G.
It’s not that often that people sit down at entirely unfamiliar machines they’ve never used, log in, and try to do data intensive work. In that case I suppose an iCloud model of compute would be downloading a lot.
This is random trash only on ARM. I doubt they could get the crash to happen locally; most likely their developer machines were all x86, where it never crashed.
They should have handled crashes better, a problem they seem to recognize, but that's not the issue here, so it's not covered.
How would you debug locally when you probably don't have a device that runs the arch that is causing an issue? It's much faster to just debug in the actual environment where the failure happens anyways.
The problem is that applications sometimes need to set environment variables which will be read by libraries in the same process. This is safe to do during startup, but at no later times.
Ideally all libraries which use environment variables should have APIs allowing you to override the env variables without calling setenv(), but that isn't always the case.
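When they don't, the one defensible window is the very top of main(), before anything could have spawned a thread. A sketch (the variable name and path are just examples):

fn main() {
    // The one window where mutating the environment is defensible: no
    // other threads exist yet, so nothing can race with this write.
    // (std::env::set_var is unsafe in edition 2024 for exactly this reason.)
    unsafe { std::env::set_var("SSL_CERT_FILE", "/etc/ssl/cert.pem") };

    // Only now start thread pools, async runtimes, etc.
    run();
}

fn run() {
    // The rest of the program treats the environment as read-only.
    println!("{:?}", std::env::var("SSL_CERT_FILE"));
}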
> The problem is that applications sometimes need to set environment variables which will be read by libraries in the same process. This is safe to do during startup, but at no later times.
No, the problem is that libraries try to do this at all. Libraries should just have those APIs you mention, and not touch env vars, period. If you, the library user, really want to use env vars for those settings, you can getenv() them yourself and pass them to the library's APIs.
Obviously we can't change history; there are libraries that do this anyway. But we should encourage library authors to (in the future) pretend that env vars don't exist.
The place where it makes sense for a library to read environment variables is where the program is not written to use that specific library. For example, I can link a program whose author has never heard of TCMalloc against TCMalloc rather than the system malloc, and then configure TCMalloc via environment variables. This does not require modifying a single line of code, while manually forwarding configuration onto the allocator would. Another common example is configuring sanitizers. Not having to do anything other than pass another command-line switch to the compiler is one of the things that makes them really painless to use.
I do think you'd be hard-pressed to find a situation where a program calling setenv() to configure a library actually makes sense. It's a pretty strong sign that someone made a bad decision. People will, however, make mistakes in API design.
If env vars don't exist, that makes it much harder (and more likely impossible) for users to modify library/application behavior at run time.
I agree with you that it would be much better if, when libA needs to set behavior Foo in libB, it called libB::setBehavior(Foo) rather than setenv("LibBehavior", "Foo").
But let's not throw the baby out with the bathwater.
I’d argue that libraries shouldn’t read environment variables at all. They’re passed on the initial program stack and look just like stack vars, so the issue here is essentially the same as taking the address of a stack variable and misusing it.
Just like a library wouldn’t try to use argv directly, it shouldn’t use envp either (even if done via getenv/setenv)
There are certainly levels of the abstraction pyramid where mutable global state is unavoidable; however, it shouldn't be too difficult to get to a point where we have enough abstraction so that we don't need to worry about mutable global state for what we do.
And even if those abstractions can't be 100% effective, we'd go a long way to achieving the desirable results of getting rid of it, if we just develop the mindset of avoiding it if at all possible, excepting for very rare instances where it's needed as a last resort.
Go ahead and write lots of mutable global statics. But when your program crashes randomly, and you need my help to debug, and the cause is, once again, a mutable global, then you have to perform a walk of shame.
The problem is not Linux, not mutable global state or resources, and not libc.
The problem is not getting time at work to do things properly, like spotting this in GDB before the issue hit, because your boss gave you time to tirelessly debug and reverse your code and anything it touches...
There is too much money in half-baked code. Sad but true.
It definitely is the current libc. That one's proven by systems which do not have the same problem. Then the next layer problem is trying to pretend we can get everyone to pay attention and avoid bugs in code instead of forcing interfaces and implementations where those bugs are not possible.
Just because someone makes a window doesn't mean you gotta jump out of it. There are good and bad uses for things, and the bad ones should be avoided, lest one hurt oneself.
https://en.wikipedia.org/wiki/Death_of_Garry_Hoy People will assume more safety than there actually is. You don't have to jump, but someone will try. We can accept that fact or watch people fail over and over on the same issue. It's better to help everyone avoid the problem in the first place.
For some reason lots of programmers will behave like the comment section on an accident video. "I would notice that earlier", "I'd avoid that", "I can react faster".
Doesn't make it less true. All sorts of 'dangerous' things are used by people daily all around the world. There are issues, sure, but that doesn't mean it's necessarily a bad thing: for example, cars, stairs, kitchen knives...
These things are perhaps more commonly known to be bad, and the dangers are perhaps more obvious.
There will always be people who use things in the wrong way, too; that doesn't make the thing itself bad, it's a matter of how it's used.
There are buildings in my country with nets around them because people keep jumping off them (suicidal). The buildings are safe. The nets are not a solution, they just shift the problem and don't tackle the root cause.
There are many car crashes with fatalities. Sure, car manufacturers try to make cars safer, but there are no hordes of people hating on cars and calling for them to be abolished in favor of safer technology, because people rely on them heavily.
Same for libc. People try to improve its safety, and try to advise about and write about its dangers. Just because bugs exist and unsafe conditions can occur doesn't mean something should be dropped altogether... a lot of the world relies heavily on libc, safe and unsafe uses of it alike.
What's more is that libc and linux etc. are open-source. If someone knows a sound solution to these issues which does not break the entire world, they are free to submit pull requests....
Simply stating something is 'rubbish' and needs to be put down is an unproductive and shortsighted sentiment.
One of the reasons X is being phased out in favor of Wayland is that X is far more global than it needs to be, and this is one of the reasons it has security risks that can't be completely removed without API-breaking effects.
If both Rust and C have independent standard libraries loaded into the same process, each would have an independent set of environment variables. So setting a variable from Rust wouldn't make it visible to the C code, which would break the article's usecase of configuring OpenSSL.
The only real solution is to have the operating system provide a thread-safe way of managing environment variables. Windows does so; but in Linux that's the job of libc, which refuses to provide thread-safety.
If there was a libc implemented in rust (like https://github.com/redox-os/relibc), you could use that for the C code in the process, and you'd be sharing the relevant state.
The crash in the article happened when Python called C's getenv. Rust could very well throw away libc, but then it would also be throwing away its great C interop story. Rust can't force Python to use its own stdlib instead of libc.
Linux is an unusual platform in that it allows you to call into it via assembly. Most other platforms require you to go through libc to do so. It's not really in Rust's hands.
This is not unusual at all. Windows allowed it for years before Linux came along. It was also true of some other *nix systems - IIRC, Ultrix (DEC) allowed this, and so did Dynix (Sequent).
*BSD allows it too, or did as of 2022.
What is unusual about Linux is that it guarantees a syscall ABI, meaning that if you follow it, you can make a system call "portably" across "any" version of Linux.
Sure, I’m speaking about platforms that are relevant today, not historical ones. Windows, MacOS, {Free,Open,Net}BSD, Solaris, illumos, none of these do.
It's quite easy to find out the actual situation on this since Go decided to do it their way. Last I checked, OpenBSD is the only OS where they go through libc, but I haven't really kept up.
Yep, in 2022 it finally started using libc on *BSD too.
But ... there's a difference between being able to do direct syscalls via asm, and them being portable across kernel versions, which is what this subthread was about.
Granted, most people want version portability, but still on a technical level, it's not the same thing.
No, my comment was about what APIs a platform considers to be their stable, external API. That you can technically call them anyway (except for ones like OpenBSD that actively check and prevent you) doesn't mean you're not doing something unsupported.
> and environment variables require an operating system
Is that true? It's just a process global string -> string map, that can be pre-loaded with values before the process starts, with a copy of the current state being passed to any sub-process. This could be trivially implemented with batch processing/supervisory programs.
Sure, there's a broader concept here, which doesn't require any operating system. But any alternate string->string map you define won't answer to C code calling getenv, won't be passed to child processes created with fork, won't be visible through /proc/$PID/environ, etc.
> They did, it's called core. But it assumes no operating system at all, and environment variables require an operating system.
I think there's some confusion here. The C standard library is an abstraction layer that exists to implement standard behavior on hardware. It's entirely unrelated to the existence of an OS. Things like "/proc/$PID/environ" have nothing to do with C.
There are many standard libraries, for embedded, that implement these things, like getenv, on bare metal [1].
Standard C libraries exist to implement functionality. It does not define how to implement the functionality. That's the whole point of C: it's an abstraction that has very little requirements.
The implementation of environment variables doesn't require an OS. If they made this "core", they could trivially implement the concept.
I think newlib requires a discussion of its own, and more generally, the concept of a "full" libc outside of a formal operating system.
To put it bluntly, newlib is an antisocial libc. It provides bare compileability of programs by implementing C and POSIX facilities atop a small set of system calls. However, in practice, it requires basically nothing to actually work. If you look at what it requires [1], you can see that virtually all of the system calls are allowed to do nothing but return an error. The only function that is actually shown to do something is sbrk which is a simple bump allocator, and even then it's only strongly recommended to work so that malloc also works since a lot of ordinary C programs use malloc. This says to me "get code to compile at all costs" with no concern for a wider environment (since there may be no "wider environment" in the first place).
More charitably, we can view newlib as a set of compatibility shims bridging hosted and freestanding C. This has a place, of course; there are C libraries that assume a hosted implementation but don't really need (all of) a hosted implementation.
This doesn't really apply to no_std Rust, and creating a set of "environment variables" that interoperates with nothing, just because you can, is kind of pointless when there's no O/S and no FFI involved. I explained more about why (IMO) core::env/alloc::env shouldn't exist in the other comment.
All that having been said, newlib does seem to sit in a position somewhere between core+alloc and full std in terms of Rust (std also includes networking). Maybe there is a need for FFI/C compatibility without networking? I can't say for sure, but I haven't needed it.
I don't think I'm confused, but let's recapitulate the thread history as I understand it:
Context: The setenv function is not thread-safe even in Rust
Question: Why doesn't Rust implement a standard library without C?
Answer: It does, but core lacks std::env, because env vars are part of an O/S
Question: Is an O/S really necessary for env vars?
Answer: Not conceptually, but without an O/S, env vars don't work as expected
I also like the sibling comment that pointed out env vars are social as much as technical. The key element is interoperability. And we haven't even discussed Windows, which has different functions and conventions for environment variables.
Now, let me address what you just said. First of all, on embedded, a freestanding C implementation is not even required to provide getenv at all. Second, while getenv is in standard C and required for hosted implementations, setenv is not. And the whole thread is really about setenv. Once we pull in setenv, we're talking not just about standard C but about POSIX, which is a specification for operating systems. I assume for the sake of fruitful discussion, we both accept that a variable put into the environment with setenv should be retrievable thereafter with getenv. This moreover should apply even if it's Rust that calls setenv and C that calls getenv and vice-versa.
So, however Rust implements environment variables should be consistent with how C implements environment variables, and since C provides the foundation for system calls and FFI for most other major languages, adhering to this convention allows interoperability across very many languages. This convention is defined by libc (the implementation of the C standard and POSIX interfaces) and thus interoperability is based on libc compatibility. So either Rust implements its own libc, which C programs would have to be (re-)compiled to use, or else it uses an existing implementation of libc, inheriting all of its quirks. Indeed, Rust targets specify the libc (or equivalent) they're using, such as -gnu, -musl, -darwin, -mingw, -msvc, etc. Linking with libraries built for a different libc on an otherwise identical platform (-gnu vs. -musl on Linux, -mingw vs -msvc on Windows) generally doesn't work and even when it appears to work leads to strange issues later. So you can't just write your own getenv and expect it to work with some other implementation of setenv.
To connect back with my other comments, there is no core::env because core assumes no libc at all. The no_std flavor of Rust (where core is available but not std) is basically equivalent to freestanding C, and like freestanding C there is no interoperability guarantee, not even with freestanding C on the same hardware (indeed, the whole concept of "freestanding" is that there are no conventions to adhere to in the first place). So, std::env::set_var has the exact same problems as C setenv because it's the same thing under the hood. This cannot be addressed without fixing libc itself. Moreover, when libc is not involved, then there is no env to support to begin with.
Finally, to round out addressing what you said, core::env could exist, but probably shouldn't, for two reasons. First, it would be misleading. As I've already laid out, it would not interoperate with anything since there's nothing there to interoperate with. It would just be a global string->string map exclusive to that program, which the programmer could just as well create on his own. Second, because presumably you want it to be something other than empty, it would require some kind of global allocator, which core also assumes doesn't exist. So it would have to be something like alloc::env instead, and once you've pulled in alloc, you can just use one of the collection types (though, notably, HashMap isn't in alloc yet [1]).
Well, it's used by the OS when exec-ing a new process, but at least the Linux syscall for that takes the environment as an explicit parameter. So it could be managed in whatever way by the runtime until execve() is called.
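Rust's std::process::Command already works that way: env changes are collected off to the side and only materialized into the child's envp at spawn time, so the parent never calls setenv. For example:

use std::process::Command;

// Give the child a modified environment without mutating our own.
fn main() -> std::io::Result<()> {
    let output = Command::new("printenv")
        .env("SSL_CERT_FILE", "/etc/ssl/certs/ca-certificates.crt")
        .arg("SSL_CERT_FILE")
        .output()?;
    println!("child saw: {}", String::from_utf8_lossy(&output.stdout));
    Ok(())
}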
It would be a tremendous amount of work, and would take years. Meanwhile, the problems are avoidable. It's not exactly the "rust way" to just remember and avoid problems, but everything in language design is compromises.
> Why use Eyra? It fixes Rust's set_var unsoundness issue. The environment-variable implementation leaks memory internally (it is optional, but enabled by default), so setenv etc. are thread-safe.
I think glibc made the same trade-off. It makes sense for most types of programs, but there's certainly a lot of classes of programs that wouldn't take it.
This reminded me of the time I was not able to get setproctitle to work in a certain code base. Eventually I narrowed the issue down to this line:
import numpy
setproctitle() worked before the numpy import but not after, because it couldn't find the memory address of **environ.
I'm hazy on the details, but it led me to a somethingenv call (possibly getenv or setenv) in numpy's initialization, and it turned out that function changed the address of **environ, which was why setproctitle couldn't find it.
What is the rationale for libc not making setenv/getenv thread safe? It does seem rather odd given how environment variables are explicitly defined as shared between threads in the same process!
It doesn't seem it would take much to do it efficiently, even retaining the poor getenv() pointer-returning API (which could point to a thread-local buffer). The coordination between getenv and setenv could be very lightweight: a spinlock vs. a mutex.
The spec says it's not supposed to be thread safe.
There's also no real backwards-compatible way of fixing setenv(). getenv() returns a pointer that can be read at any time, and then there's the global environ pointer that can also be used to read env variables.
IMO the entire API should be deprecated in favor of a thread-safe one, but until someone comes up with a standard setenv() alternative that's implemented by the libc runtimes, we'll be stuck with the shitty POSIX API, and every year we will read blog posts about get/setenv() crashing processes.
I think the argument was that the standard states that setenv is not thread safe, although from what I see it says that it does not have to be thread safe:
The setenv() function need not be thread-safe. A function that is not required to be thread-safe is not required to be reentrant.
Sure, but given that Linux defines the environment as state that's shared between threads, not having a thread-safe way of accessing it is hard to defend...
Is "the standard says it doesn't NEED to be thread safe" the argument that the Linux libc maintainers are using for not enhancing it to be thread safe, or is it based on some technical or backwards compatibility issues in doing so ?
The only thread-safe way to implement getenv/setenv as they currently exist is to leak the previous state when setenv allocates, such that existing pointers stay valid. The existing API simply lacks a mechanism to synchronize correctly.
Leaking would be good enough for many use cases, but it would break long-running users of setenv (mainly those with libraries abusing env vars, as in TFA), and doesn't even solve how they interact with putenv and environ. This whole API is just cursed.
Libc could of course get better APIs, like GetEnvironmentVariable on Windows, but that won't fix all existing code.
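For illustration, a self-contained Rust model of that leak-on-replace idea (a toy map rather than a real environ splice; all names made up):

use std::collections::HashMap;
use std::sync::Mutex;

// get() hands out 'static references, and set() leaks each value instead
// of freeing it, so every reference ever returned stays valid forever.
// The memory cost grows with every overwrite, as noted above.
static ENV: Mutex<Option<HashMap<&'static str, &'static str>>> = Mutex::new(None);

fn env_set(name: &str, value: &str) {
    let name: &'static str = Box::leak(name.to_owned().into_boxed_str());
    let value: &'static str = Box::leak(value.to_owned().into_boxed_str());
    ENV.lock().unwrap().get_or_insert_with(HashMap::new).insert(name, value);
    // The displaced old value (if any) is simply never freed.
}

fn env_get(name: &str) -> Option<&'static str> {
    ENV.lock().unwrap().as_ref()?.get(name).copied()
}

fn main() {
    env_set("FOO", "bar");
    let old = env_get("FOO").unwrap();
    env_set("FOO", "baz"); // "bar" is leaked, not freed
    assert_eq!(old, "bar"); // the old reference is still valid
}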
Only if we're willing to take for granted that a call to getenv invalidates the previous one. POSIX allows it, but I'm concerned about runtimes scheduling user tasks on the same thread.
If current platforms are safely making a copy of getenv before allowing their scheduler to interrupt, then yes I'd be ok with your solution.
If you wanted to avoid "only the latest getenv pointer per thread is valid", then the thread-local data structure could be a var-name -> buffer map rather than a single reused buffer.
Worst case memory usage (all threads get all vars) is that you end up having a separate copy of the environment per thread, but it seems this is the best that can be done given the awful API.
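A sketch of that per-thread map in Rust via the libc crate; it assumes (as discussed above) that libc internally serializes getenv/setenv, so the copy itself can't race with a concurrent free:

use std::cell::RefCell;
use std::collections::HashMap;
use std::ffi::{CStr, CString};

use libc::c_char;

// Per-thread name -> buffer map: the pointer handed out for a given name
// stays valid on this thread until this same thread reads that name again.
thread_local! {
    static BUFFERS: RefCell<HashMap<String, CString>> = RefCell::new(HashMap::new());
}

fn getenv_tl(name: &str) -> Option<*const c_char> {
    let cname = CString::new(name).ok()?;
    // Assumption: libc serializes this against setenv internally.
    let raw = unsafe { libc::getenv(cname.as_ptr()) };
    if raw.is_null() {
        return None;
    }
    let copy = unsafe { CStr::from_ptr(raw) }.to_owned();
    BUFFERS.with(|b| {
        let mut map = b.borrow_mut();
        map.insert(name.to_string(), copy);
        // A CString's heap buffer doesn't move when the map rehashes, so
        // this pointer stays valid until the entry itself is replaced.
        map.get(name).map(|c| c.as_ptr())
    })
}

fn main() {
    if let Some(p) = getenv_tl("HOME") {
        println!("{}", unsafe { CStr::from_ptr(p) }.to_string_lossy());
    }
}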
Yet another person is burned by calling setenv() in a multi-threaded context. There really needs to be a big warning banner on the manpage for setenv() that warns about this because it seems like a far more common problem than you would expect.
It's time to move beyond this attitude and make things safe by default. For example, Solaris has a safer version of setenv().
"It is ridiculous that this has been a known problem for so long. It has wasted thousands of hours of people's time, either debugging the problems, or debating what to do about it. We know how to fix the problem." https://www.evanjones.ca/setenv-is-not-thread-safe.html
One of the major differences between X Window and the Win32 GUI APIs is that the Windows one builds in thread safety, and it cannot be removed. This means that you pay the price of mutexes and the like (what the Windows world likes to call "critical sections"), even if you have a single-threaded GUI. X Window, on the other hand, decided to do nothing about threads at all, leaving it up to the application.
30 years after these decisions were made, most sensible people do single threaded GUIs anyway (that is, all calls to the windowing API come from a single thread, and all redraws occur synchronously with respect to that thread; this does not block the use of threads functioning as workers on behalf of the GUI, but they are not allowed to make windowing API calls themselves).
Consequently, the overhead present in the win32 API is basically just dead-weight, there to make sure that "things are safe by default".
There's a design lesson here for everyone, though precisely what it is will likely still be argued about.
"If you detached a thread in your application using a non-Cocoa API, such as the POSIX or Multiprocessing Services APIs, this method could still return NO."
Also, I've never heard of this behavior despite years developing for macOS (admittedly tangentially). I don't see how that could work given that threads can come and go during the life of the application.
Interesting. Definitely a 3rd approach that threads the needle between what win32 and X Window chose. Thanks for the link.
[ EDIT: not quite sure how to think about this ... if I create NSThreads to act as worker threads that do not make cocoa calls, I still have to deal with new overhead in any cocoa call stacks. That's not ideal, but again, it's a "middle-way" approach, and like every other approach has its own pros and cons. ]
Yet 30 years later people are calling setenv()/getenv() from different threads even though "it is known" that it crashes. For whatever reason the lesson from GUIs doesn't apply here.
Judging from a lot of the comments in this thread, the idea that there could even be parts of the *POSIX API* that are not thread-safe seems like an idea that hasn't even occured to a lot of (younger?) programmers ...
Uncontended mutexes are very cheap, but not free: lock cmpxchg has much higher latency (and coherency-traffic) costs than a simple mov (or xchg). Java had lock elision, effectively trying to solve the hardware problem in software, back in the mid-00s. There are optimizations to be made if running on a single core (no need for the lock), e.g. Docker with taskset.
You could wrap setenv in a mutex, but that's not good enough. It can still be called from different processes, which means you'd need to do a more expensive and complex syncing system to make it safe.
That balloons out to other env-related methods needing to honor the synchronization primitive in order for there to be a semblance of safety.
However, you still end up in a scenario where you can call
setenv
getenv
and that would be incorrect, because between the set and the get, even with mutexes properly in place and coordinated among different applications, you have a race condition where your set can be overwritten by another application's set before your get can run. Now, instead of actually making these functions safe, you've buried the fact that external processes (or your own threads) can mess with env state.
The solution is to stop using env as some sort of global variable and instead treat it as a constant when the application starts. Using setenv should be mostly discouraged because of these issues.
How does an external process mess with env state? As far as I know, you pass the environment when doing the execvpe() and then you cannot touch it from outside of the process anymore.
You're correct. Parent comment is inaccurate. The problem is that a different library in the same process can use getenv without locking (or without locking the same lock as your code)
Of course you can. Mutexes are system objects, so it's not a huge problem to sync across processes, if you really have to (is it really expected that one process can set env vars inside another process?).
Making global state, especially state that has no reason to be modified or even read very often like the env, thread safe is a trivial issue, well studied and understood. Could an intern do it? Probably not. Could literally any maintainer of a standard C library? Easily.
This is much more of a culture problem preventing such obvious flaws from being recognized as such.
Side note: your set-then-get example is a theoretical problem in search of a use case. Why would you ever want to concurrently set an env var and expect to be guaranteed to read that same value? And even if this is a real thing that applications really use, exposing a new function to sync anything on the env mutex is, again, trivial. So, if you really needed that, you could do something like the sketch below.
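Sketched as a toy Rust model (lockenv() is the hypothetical new function; the map stands in for the real environment):

use std::collections::BTreeMap;
use std::sync::{Mutex, MutexGuard};

// One process-wide mutex guards the environment; setenv/getenv would take
// it internally, and lockenv() exposes it for multi-step transactions.
static ENV: Mutex<BTreeMap<String, String>> = Mutex::new(BTreeMap::new());

fn lockenv() -> MutexGuard<'static, BTreeMap<String, String>> {
    ENV.lock().unwrap()
}

fn main() {
    let mut env = lockenv();
    // Set two vars and read one back; no other thread can interleave
    // between these operations while the guard is held.
    env.insert("A".into(), "1".into());
    env.insert("B".into(), "2".into());
    assert_eq!(env.get("A").map(String::as_str), Some("1"));
    // Dropping the guard at end of scope is the unlockenv().
}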
That doesn't solve anything. You could be using a library (perhaps a closed-source one) that doesn't use these hypothetical lockenv()/unlockenv() functions.
This needs to be fixed inside libc, but there's no way to do so completely without breaking backward-compatibility.
Yes, I was talking about fixes inside libc. The poster above was claiming it can't be done inside libc. And the lockenv/unlockenv functions I was mentioning were meant to exist beside the internal locking inside setenv/getenv. They would only be used if you needed transactional access (a combination of setting/getting multiple env vars atomically).
Using copy-on-write would be easier (and more performant), along with getenv_r(). POSIX requires that the data not be copied, which makes the entire mutex/lock or CoW approach pointless. Of course, there will be the mandatory mention of "extern char **environ;"[0]. Those are raw C strings, just as you find them.
What could work is per-thread env changes, but that's not likely to happen.
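If the pointer-returning constraint were lifted (say, with a getenv_r-style API), the CoW idea would look roughly like this Rust model (illustrative names throughout):

use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Readers clone an Arc to a frozen snapshot; writers build a new map and
// swap it in. A snapshot a reader holds can never change under it.
static ENV: RwLock<Option<Arc<HashMap<String, String>>>> = RwLock::new(None);

fn env_snapshot() -> Arc<HashMap<String, String>> {
    ENV.read()
        .unwrap()
        .clone()
        .unwrap_or_else(|| Arc::new(HashMap::new()))
}

fn env_set(name: &str, value: &str) {
    let mut slot = ENV.write().unwrap();
    let mut next: HashMap<String, String> =
        slot.as_deref().cloned().unwrap_or_default();
    next.insert(name.to_string(), value.to_string());
    // The old snapshot is freed only when its last reader drops it.
    *slot = Some(Arc::new(next));
}

fn main() {
    env_set("FOO", "bar");
    let snap = env_snapshot(); // frozen view
    env_set("FOO", "baz"); // does not affect `snap`
    assert_eq!(snap.get("FOO").map(String::as_str), Some("bar"));
}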
That is a technical solution. What is your solution to the much more serious social problem of adding this check to every codebase in existence? What points of leverage do you have?
The point was about adding a mutex inside libc in getenv and setenv. That way, every codebase in existence automatically gets this safety. The poster I was replying to claimed that this wouldn't help, because it would still not offer thread safety when doing multiple operations.
I pointed out that, in addition to libc setenv/getenv using a mutex internally, they could also expose new functions to allow transactional access for anyone that really needs it - though I suspect that is a vanishingly small minority.
I am not sure making things safe by default is a good idea. This always comes with a cost. That's also the reason why basic data types (arrays, dictionaries, etc.) are generally not thread-safe: because it's usually not needed, or is handled at a much higher level.
It's a different story for languages/environments that are supposed to be safe by default and where you have language features that ensure safety (actors, optionals, etc.), but not for something like libc, which has a standard it has to conform to and, like, 100 years of history.
The problem with `setenv` is that people expect one process to have one set of environment variables, which is shared across multiple languages running in that process. This implies every language must let its environment variables be managed by a central language-independent library -- and on POSIX systems, that's libc.
So if libc refuses to provide thread-safety, that impacts not just C, but all possible languages (except for those that cannot call into C-libraries; as those don't need to bother synchronizing the environment with libc).
A conformant implementation can make a non-reentrant function actually safe under the hood for people that call into it erroneously. Unfortunately, there is no way to do this for getenv/setenv, because of the API they expose (specifically, when environ is accessed directly).
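For reference, this is the escape hatch in question, rendered in Rust; anything in the process can do this, and no locking inside getenv/setenv can intercept it:

use std::ffi::CStr;
use std::os::raw::c_char;

extern "C" {
    // The global environ pointer, declared by hand (C: extern char **environ).
    #[allow(non_upper_case_globals)]
    static environ: *const *const c_char;
}

fn main() {
    unsafe {
        let mut p = environ;
        // Walk the NULL-terminated array. If another thread's setenv
        // reallocates the array mid-walk, this is a use-after-free.
        while !p.is_null() && !(*p).is_null() {
            println!("{}", CStr::from_ptr(*p).to_string_lossy());
            p = p.add(1);
        }
    }
}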
In some cases this is true. In the case of setting and getting env vars, it is not. There is no conceivable reason for a process to spend any significant portion of its runtime calling setenv() or getenv(). Even if those calls were a thousand times slower than today, it would still be a non-issue.
By definition, a "reentrant function" is a function that may be invoked even when it has not returned yet from a previous invocation.
So a non-reentrant function is a function that may not be invoked again between a previous invocation and returning from that invocation.
When a function may be invoked from different threads, then it is certain that sometimes it will be invoked by a thread before returning from a previous invocation from a different thread.
Therefore any function that may be invoked from different threads must be reentrant. Otherwise the behavior of the program is unpredictable. Reentrant functions may be required even in single-thread programs, when they may be invoked recursively, or they may be invoked by signal handlers.
An implementation of "malloc" may be reentrant or it may be non-reentrant.
Old "malloc" implementations were usually non-reentrant because they used global variables for managing the heap. Such "malloc" functions could not be used in multi-threaded programs.
Modern "malloc" implementations are reentrant, either by using only thread-local storage or by using shared global variables to which some method for concurrent access is implemented, e.g. with mutual exclusion.
No, this is confused. Reentrancy ("reentrant-safe", or the somewhat related POSIX definition of async-signal-safe) and thread safety are not the same thing.
A reentrant function is thread-safe, but a thread-safe function may or may not be reentrant.
For instance, if a function uses mutual exclusion (say, pthread_mutex_lock() and friends) to ensure thread-safety, it won't be reentrant, because if the function is invoked via a signal handler it may deadlock. Which is why many common libc functions like malloc and stdio are not required to be async-signal-safe in POSIX, whereas they are required to be thread-safe.
Therefore I do not think that anyone has bothered to implement a signal-safe malloc, as this is likely to be complicated.
Allocating memory in a signal handler makes no sense in a well designed program, so not being allowed to use malloc and related functions is not a problem.
1. If a process crashes and dumps, be sure to look at the system log of the cause (e.g. SIGSEGV, OOM, invalid instruction, etc.)
2. Be certain you’re looking at the right core dumps — I believe UID 1000 just means the POSIX user ID (which is unrelated to a PID), though I don’t use containers.
3. Stay focused on the right level of abstraction — memory model details are great to know, but irrelevant here.
4. Variables do not correlate 1:1 with registers, except in C calling conventions. The assumption about x20 and a local variable is incorrect, unfortunately.
5. getenv() and setenv() do not work as implied in the post. When a process starts via execve(), the OS/libc constructs a new snapshot of the environment, which cannot be modified by an ancestor process. It’s a snapshot in time, unless updated by the process itself. When a process fork()s, the child gets a new copy of the parent’s environment — updates do not propagate.
getenv() is thread safe and reentrant. You don’t use an environment to pass shared data — setenv() is generally used when constructing the environment for a child process before a fork(). See man environment.
6. FWIW, ‘char** env’ is a null-terminated array of pointers, so dumping memory from *env (or env[0]) is only valid until you hit the first NULL. The size of the array is not stored in the array.
I hope this helps! And apologies if this is redundant — I read so many comments; mostly variations of “the problem with getenv is x”, but gave up before reading all of the (currently) 168 comments.
I'm kind of confused by this response. It doesn't seem to match the actual article? For example, they consulted the code to find what x20 had in it, rather than blindly guessing. Doing that is perfectly fine and even desirable when analyzing crashes. There is no forking mentioned. People call setenv all the time when trying to modify their own environment (hence the crashes!). Nobody said anything about the size of env.
x20 is a general purpose register; optimizing compilers can use it for any number of variables, immediate values or intermediate computations at different points within that same function — or none at all (the variable ep could be optimized away).
Re: fork(), I just meant to be thorough in explaining the environment is copied, not shared by processes. Setenv() only affects the process from which it’s called.
The array size bit in the article:
The value 0x220 looks suspiciously close to the size of the old environment in 64-bit words (0x220 / 8 = 68), and this value was written over the terminating NULL of the environment block…
No, it does not. I don't think you understand what you are talking about, because none of these actually address the points I brought up. They use the same words, but semantically they are talking about something completely different.
I provided a copy/paste from the site about the envp array size you asked about.
I clarified why I mentioned fork().
I tried to explain the difference between registers and variables.
I’m not trying to show off or bring anyone down… I just like to help people. I’m old (my first Linux kernel commit was in 2004). And I could be wrong — please LMK if I made a factual error (I’d appreciate it, honestly).
I am going to do this once, but not again. Please pay attention to it. You are not just wrong, but failing to demonstrate an understanding of the actual topic being discussed. I can't say whether you actually have it or not, but your responses do not demonstrate this. I have dealt with plenty of people on this site who say things that are factually incorrect, many of whom have argued with me when I do so. You are not doing that; rather you are not even understanding what I am saying.
The article specifically mentions that the authors consulted the disassembly to see what was in x20. I know it is a general purpose register. They know it is a general purpose register. This knowledge is completely irrelevant: they read the code, they matched it against the actual source, they can confirm that at the time of crash x20 contains what they said it contains. The compiler optimizations have already run. They can't change anything anymore. That you mentioned this shows that you do not follow the actual order of events here.
envp, similarly, is in the process of being operated on in the crashing code. The authors grabbed its size from some random context at the time of the crash. The fact that it is not actually stored in the array itself is completely irrelevant to the fact that its numeric value was present in the crash dump. Obviously, some code that operated on it had computed the value and stashed it, which is a completely natural and expected thing for this code to do.
Finally, nobody cares about setenv across processes. The article didn't talk about this. It's completely irrelevant to mention this, and in fact there is another comment further down (which you may not have read, I'm ok with that) that also has the same confusion and it belies a poor grasp of what the actual problem is.
You can see that I am forced to do significantly more work than you to respond to what specifically is the problem here. It looks like you are pattern matching on specific words and then regurgitating your knowledge on it, whether it is relevant or not. When it's not, it's essentially just spam; when it is you fail to actually take into account the content that is actually being discussed. When I'm talking about how I almost got run over by a driver on their phone you are not welcome to step in and start talking about how a lot of hit-and-runs involve drunk drivers. I wasn't talking about a hit-and-run, and I just told you the person was on their phone. Somehow you completely missed that and kept talking about what you wanted to mention, like if you gave the gist of the conversation to someone else and asked them for their response on it and then pasted that here without checking to see if it was relevant or not. Don't do that.
My policy about interacting with a person using a bot is actually the exact same as it is when interacting with someone who writes their own comments. This is actually very convenient because it completely eliminates any arguments about whether or not they are using an LLM or whether I have some sort of "bias" against them. My core argument is this: I treat the content coming out of it as being said by you. In this case the comments were of substandard quality. If the user was writing them by themselves, then the hope is that they will read my message and realize why and improve themselves in the future. If it was done by consulting something else, the idea is that they should reconsider the quality of its output. Either way, they're the one who comes out of it looking poorly.
For 1 & 2, the issue wasn't that the author was looking at the wrong logs/coredumps. It's that coredumps from inside containers typically don't match the symbols available outside the container - you either have to run gdb inside a matching container, or rebuild the contents of the container in the host environment (as they did here).
3. There's nothing wrong with the level of abstraction here. If you have a crash that occurs on ARM but not on amd64, the differences in how those architectures operate is a very reasonable initial assumption.
4. The value in x20 is the same value in the local variable in question. Even though there may not be a general one-to-one mapping between variables and registers, at this particular instant in time that variable does correspond to this register.
5 is irrelevant, as the article isn't discussing forking. It's discussing the (somewhat questionable) practice of a program using getenv/setenv as mutable state.
For 6, the article doesn't say that env stores its own array length. It says that setenv called something like free() on the old env array, and free() overwrote env with the length of the memory allocation (which is a quite reasonable way for malloc to do book keeping).
Clickbait title? glibc is very clear about what is and what is not thread-safe. I looked at the article: they fell victim to the classic getenv()/setenv() trap. This has been blogged about many times. If you look at the man page for setenv(3), the ATTRIBUTES section marks setenv() and unsetenv() as MT-Unsafe const:env.
Rust literally bakes data-race safety into the language. While it does not resolve general race conditions, thread-safety issues which cause memory unsafety (which a UAF or dangling pointer would be) are very much within its remit.
It is weird that I got this right before Rust did.
Because I use structured concurrency, I can make it so every thread has its own environment stack. To add to a new environment, I duplicate it, add the new variable, and push the new environment on the stack.
Then I can use code blocks to delimit where that stack should be popped. [1]
This is all perfectly safe, no `unsafe` required, and can even extend to other things like the current working directory. [2]
IMO, Rust got this wrong 10 years ago when Leakpocalypse broke. [3]
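A minimal Rust rendering of the per-thread stack design described above (not the commenter's actual code; names are made up):

use std::cell::RefCell;
use std::collections::HashMap;

// Each thread owns a stack of environment snapshots: entering a block
// duplicates the top, applies the change, and pushes; leaving pops it.
thread_local! {
    static ENV_STACK: RefCell<Vec<HashMap<String, String>>> =
        RefCell::new(vec![HashMap::new()]);
}

fn with_env_var<R>(key: &str, value: &str, body: impl FnOnce() -> R) -> R {
    ENV_STACK.with(|s| {
        let mut top = s.borrow().last().unwrap().clone(); // duplicate
        top.insert(key.to_string(), value.to_string());
        s.borrow_mut().push(top);
    });
    let result = body(); // the block sees the new environment
    // A real version would pop via a drop guard so a panic can't skip it.
    ENV_STACK.with(|s| { s.borrow_mut().pop(); });
    result
}

fn get_env(key: &str) -> Option<String> {
    ENV_STACK.with(|s| s.borrow().last().unwrap().get(key).cloned())
}

fn main() {
    assert_eq!(get_env("FOO"), None);
    with_env_var("FOO", "bar", || {
        assert_eq!(get_env("FOO"), Some("bar".to_string()));
    });
    assert_eq!(get_env("FOO"), None);
}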