
Don't setenv in multi-threaded code on glibc - r4um
http://rachelbythebay.com/w/2017/01/30/env/
======
roblabla
My first thought when reading this is "what about rust" ? Rust uses the system
libc to support most of it's standard library, so could it have the same
problem ?

A quick look at the std::env::set_env docs [0] tells us that Rust is aware the
underlying implementation are inherently thread-unsafe, and looking at its
implementation [1] tells us that rust uses a global lock for all access to the
environment.

So at least, it will avoid segfaulting your programs.

Good job rust :D

EDIT: However, I guess if you fork() (which is not something you can do with
the standard rust library AFAIK, so requires unsafe), you may get the first
problem of having a deadlock.

[0]: [https://doc.rust-lang.org/std/env/fn.set_var.html](https://doc.rust-
lang.org/std/env/fn.set_var.html)

[1]: [https://github.com/rust-
lang/rust/blob/51d29343c04a27570a8ff...](https://github.com/rust-
lang/rust/blob/51d29343c04a27570a8ff8282611007d6e6408de/src/libstd/sys/unix/os.rs#L415)

~~~
gens
Why does everything related to C have to have comments on rust ?

~~~
Sharlin
Because the point of Rust is to be a better C.

~~~
Keluri
So? That doesn't mean every submitted article mentioning C or C++ has to
immediately be filled with comments on how Rust is so much better, safer, etc
- So is Java and C#, and you don't see those communities espousing their
language the same way!

~~~
theamk
Because Java and C# are not system programming languages and are not intended
to replace C?

Would I want my init system written in Java or C#? Hell no. Would I want it
written in Rust? Yes please maybe it will crash less and have fewer CVEs.

I have a belief that every time a C developer switches to Rust, users benefit
(unless developer goes crazy from Rust's cognitive load, but this is a
separate issue :) )

~~~
gens
If you ask me rust is also not for programming systems. (the error handling
specifically)

Side note: Who even started the whole "x is a system programming language" ?
Who said you can't write "system software"[0] in just about any language that
can (directly or indirectly) do "lower level" things. For example the MS
Singularity OS has a virtual machine and all the drivers and everything is
high level code.

>Would I want my init system written in Java or C#? Hell no. Would I want it
written in Rust? Yes please maybe it will crash less and have fewer CVEs.

Contrary to popular belief init systems are _simple_ and can be written in
_any_ language. Not to mention that when people talk about inits they don't
think about the actual initialization part (mounting, setting network, etc)
but about the starting of programs part. Those two things are similar from a
far but are really not so much.

>I have a belief that every time a C developer switches to Rust, users benefit
(unless developer goes crazy from Rust's cognitive load, but this is a
separate issue :) )

I have a belief that people should make up their own mind and do whatever they
want. This topic is about C and only about C. All other talk is "noise".

Note that i actually like programming in C as i like optimized programs, so i
may be biased. (although not nearly as biased as the rust crowd)

[0]
[https://en.wikipedia.org/wiki/System_programming_language](https://en.wikipedia.org/wiki/System_programming_language)

~~~
jjnoakes
> This topic is about C and only about C. All other talk is "noise".

Why do you get to decide? There are a lot of people in this thread talking
about similar issues in other languages (Python, Rust, etc) and other
functions (like fopen() and exit()). I think those discussions are very on-
topic.

Why isn't this topic about "issues between threads and setenv in any language
that uses glibc"?

Why isn't this topic about "issues between threads and other functions in any
language that uses glibc"?

If you don't like a particular conversation in this topic, you can collapse it
and move on.

~~~
gens
From one (or more) of the comments here you can see that the reasoning for
rust in topics about C is "everybody should write rust instead of C". At least
for some people.

One other comment got it right, it is annoying to have multiple rust comments
in _every_ thread about C.

But you are right, best way to deal with these kinds of... i don't want to be
rude, is ignoring.

PS Thank you for addressing the rest of the comment, not just the part you
made personal.

------
quotemstr
Windows has sane environment-retrieval functions:
[https://msdn.microsoft.com/en-
us/library/windows/desktop/ms6...](https://msdn.microsoft.com/en-
us/library/windows/desktop/ms683188\(v=vs.85\).aspx)

Unlike getenv, GetEnvironmentVariable copies a variable's value into a _caller
provided buffer_ , addressing the use-after-free problem inherent in getenv's
interface.

Wouldn't it be nice if glibc got something like a getenv2 with a similar
interface? Why should people have to trip over weird corner cases in POSIX
over and over and over again? Why are we afraid to add new APIs that make some
damn sense?

~~~
Sanddancer
This is a pure glibcism. FreeBSD libc does similar with getenv, in that the
returned environment is a snapshot of that point in time.

~~~
simias
IMO getenv's API itself is to blame since it won't let you provide a buffer.
As such if the libc wants to provide a local buffer it has to figure out a way
to allocate it for the caller.

As far as I can tell glibc's implementation is probably what the people who
designed getenv in the first place had in mind, just return a pointer to a
global buffer. At best FreeBSD managed to create a workaround for modern
environments.

There are unfortunately many such oddities in the standard libc APIs, remnants
from an other time. Let us remember that "gets" was standardized as part of
the C standard at some point, a function that's by design literally impossible
to use safely.

~~~
cesarb
> Let us remember that "gets" was standardized as part of the C standard at
> some point, a function that's by design literally impossible to use safely.

It _is_ possible in theory to use gets safely, as long as your standard input
is trusted. For instance, if a process forks and connects the standard input
in the child to a pipe from the parent, which always writes a fixed amount of
data to the pipe, you can use gets() without risk of overflow.

(This is a really contrived scenario, but I don't know of any simpler one
where using gets() is safe.)

~~~
MaulingMonkey
> It _is_ possible in theory to use gets safely, as long as your standard
> input is trusted.

Only if we define "trusted" to include "known to be bug free" in e.g. it's
truncation or bounding of output over the pipe to the child. I argue that, in
theory, this is impossible to know, and thus that it this level of trust is
impossible, and thus that this is _not_ an example of a potential safe use of
gets.

Even a mathematical proof of safety, after all, could contain errors - or
could prove the wrong thing - or could apply to the code as written and not
the code as messed with by your optimizer - or another thread - or an injected
dll - or ...

> For instance, if a process forks and connects the standard input in the
> child to a pipe from the parent, which always writes a fixed amount of data
> to the pipe, you can use gets() without risk of overflow.

This is also insufficient - one must also prevent nonstandard invocations of
the child process. Even if your normal parent process gives the child input
that is 100% safe, that's no guarantee that an attacker won't launch your
child process in an unusual manner. If the child process is suid, for example,
this would be a potential avenue for privilege escalation.

~~~
cesarb
> Only if we define "trusted" to include "known to be bug free" in e.g. it's
> truncation or bounding of output over the pipe to the child. I argue that,
> in theory, this is impossible to know, and thus that it this level of trust
> is impossible, and thus that this is not an example of a potential safe use
> of gets.

"In the security engineering subspecialty of computer science, a trusted
system is a system that is relied upon to a specified extent to enforce a
specified security policy. As such, a trusted system is one whose failure may
break a specified security policy." \--
[https://en.wikipedia.org/wiki/Trusted_system](https://en.wikipedia.org/wiki/Trusted_system)

That's the definition of "trusted" I'm using.

> This is also insufficient - one must also prevent nonstandard invocations of
> the child process.

I'm thinking of pure fork(), not fork+exec. That is, the child process is the
same executable image, so the only way to invoke the child process in a
nonstandard way would be through a debugger. And that is why the child process
can trust the parent process in my example: they're the same process until the
fork().

(As I said, it's a really contrived scenario. In more realistic scenarios,
gets() is unsafe.)

~~~
comex
Well, on Linux, there are various hardening mechanisms that lock down ptrace,
as well as things like /proc/PID/mem/, but I don't think the same restrictions
apply to grabbing fds from other processes via /proc/PID/fd/. So in theory a
process running as the same user (but which can't just debug your process due
to hardening) could steal your pipe and exploit your program... probably not a
terribly realistic threat, but a contrived objection to a contrived scenario
:)

------
ChrisFoster
I've had fun debugging this very problem in the past, as the root cause of a
very puzzling intermittent failure.

The symptoms were intermittent segfaults in some high level scientific python
code running on a large cluster in AWS. Given we had no custom C library
extensions, this should be impossible, but there they were; the jobs would
fail randomly for one in every several thousand runs, if memory serves.

We eventually tracked this to a call to getenv() in the OpenBLAS thread pool
per-thread initialization code. This seems innocuous enough by itself, but
couple it with some other unrelated python library which happened to call
setenv() and you have a nice race condition accessing the internal glibc data
structures. Ultimately this ended up with a segfault due to dereferencing some
already freed memory.

Ah, good times. Here's the root cause analysis if anyone wants the gory
details :-)
[https://github.com/xianyi/OpenBLAS/issues/716#issuecomment-1...](https://github.com/xianyi/OpenBLAS/issues/716#issuecomment-164339663)

------
noselasd
Another fun one is "don't exit() in multi-threaded code on glibc".

It goes like this:

1\. Thread A calls FILE f = fopen(); if (f == NULL) {error}

2\. Thread B calls exit(). exit() flushes and closes all open FILE (it does so
it an internal atexit() handler)

3\. Thread A determined that the file opened just fine, f != NULL. But Thread
B has called fclose() on it, so any use in thread A is a use-after-free if
Thread B is a tad slow in actually exiting the process

~~~
quotemstr
If this bug actually still happens, it's an implementation problem in glibc,
not a fundamental problem with the fopen and fclose interface. Regular locks
can guard against this scenario just fine.

Do you have a glibc bug number?

~~~
mzs
Not really, here's a different example where locking would not help you with
file streams. You expect all streams to be flushed on exit() but I can't think
of a unix-like that handles at_exit(_exit); correctly then, correctly in the
sense that the streams no longer get flushed. In fact I have used this feature
to prevent that flushing on exit() ;)

~~~
quotemstr
Your example doesn't result in a deadlock or undefined behavior though. It
just results in some at-exit code not being run.

Really, programs shouldn't expect _anything_ to happen on exit. After all,
programs can die at any time for any reason anyway. Personally, I'd do away
with both atexit and static destructors. Your program needs to be able to
handle the power cord being yanked, so it needs to be able to handle sudden
death no matter what.

------
Sharlin
I guess this is one of those "obvious in retrospect" things, but modifying
global state in a multithreaded program is definitely something that should
immediately ring alarm bells unless the API is specified to be thread-safe.

------
nickcw
I wrote the equivalent code in Go to see what happened

[https://play.golang.org/p/E3glBkbAo3](https://play.golang.org/p/E3glBkbAo3)

(You'll need to run it locally for the full effect - things on play are
limited to a single thread.)

It didn't crash.

I wasn't suprised though as Go doesn't use the C library and its authors are
very careful about multithreaded code.

~~~
bonzini
Indeed go protects environment variable access with a lock.

[https://golang.org/src/syscall/env_unix.go](https://golang.org/src/syscall/env_unix.go)

~~~
mzs
more importantly go never calls getenv()

~~~
bonzini
Not "more importantly". It would have exactly the same behavior if it called
getenv (and copied the result) under a mutex.

And likewise it would be subject to use-after-free races if it didn't call
getenv but also didn't use a mutex to protect its accesses.

So the important bit is the mutex.

~~~
mzs
[https://blog.golang.org/c-go-cgo](https://blog.golang.org/c-go-cgo)

Say you call foo() in foo.c and foo() calls getenv() and setenv() with your
mutex in the safeenv.go approach you described, all those calls from foo.c
don't have the mutex around them that your foo.go uses. Worse yet it might not
be your foo.c but part of libfoo in your OS.

~~~
bonzini
But Go's Setenv calls setenv_c, so a rogue C getenv would still be able to
provoke races in Go code. Yes, a rogue C setenv would not provoke races in Go
code calling getenv, but C interoperability is busted anyway---that's why I
said that the mutex is the important part. It would work just as well
(excluding cgo) if Go getenv called C getenv.

They don't use getenv because it does a slow linear search, while Go can use a
more efficient string->string map.

------
amelius
How is it even defined to setenv something in multi-threaded code?

Also, if you want to pass an environment variable to a child process from a
specific thread, how do you do that without interfering with other threads
that might do the same (alter the same variable to pass it to a different
child process)? Would that require an ugly mutex around invoking child
processes?

~~~
to3m
Same way anything would be: the set is atomic, and if you the programmer want
something specific then you'll have to arrange that. It doesn't have to be
magic - it just has not to crash. That's a much lower bar.

As for how you do this sort of thing in general, fork/exec from a
multithreaded POSIX program is a minefield and you're best off not doing it.
But, if you insist, it looks like posix_spawn might help, or you can do your
setenv in the child before the exec. setenv isn't async signal-safe, so
strictly speaking you're not allowed to do this.

If this turns out actually to be a problem, introduce a second wrapper EXE
that you spawn, supplying environment stuff and arguments for subchild EXE.
Fork, exec the wrapper, have it do its thing, then have it exec the EXE you
actually want (no fork).

I haven't done that for setting the environment, but I have for setting
CLOEXEC on file descriptors when doing fork/exec as part of a (large, gnarly,
zillions of FDs, tons of threads, closed source) program.

~~~
agwa
> If this turns out actually to be a problem, introduce a second wrapper EXE
> that you spawn, supplying environment stuff and arguments for subchild EXE.
> Fork, exec the wrapper, have it do its thing, then have it exec the EXE you
> actually want (no fork).

It doesn't have to be that complicated. After the fork, in the child, you can
construct your own local environment array, and pass it to execve or one of
the other exec functions that takes an environ argument. This is perfectly
safe behavior in a multi-threaded program.

~~~
to3m
Yes - good point. I think the wrapper approach can still make sense for
CLOEXEC sometimes, but you should absolutely always ignore what I said about
using it for the environment ;)

Never having needed to do exactly this myself in practice, I'd quite forgotten
about the ...e functions.

------
jakeogh
Does musl have the same prob?

~~~
bonzini
Yes. You can get a use-after-free on __environ.

------
qwertyuiop924
You know, I'm starting to think that multithreading may not be the best idea
in an environment explicitly not designed for it, and we should be using other
approaches, it the latency requirements of the application make that possible.

If only we had some sort of shared-nothing concurrency, like a syscall that
_forks_ your process and uses copy-on-write to make it efficient. But that's
just crazy talk.

------
tropo
We should just deprecate setenv(). There is no need for it because we have
execve().

I suppose there is system() on non-POSIX, but then you don't have a portable
shell anyway nor do you have portable environment variables. You're off in
100% non-portable territory at that point, so no point having it in the
standards.

------
jwilk
If setenv() is not thread-safe, then why is it using locks?

~~~
kelnos
setenv() is indeed thread-safe. getenv() is not. That is, you can end up in
the following situation:

1\. thread1 calls getenv() and gets a pointer to a section of environ A

2\. thread2 calls setenv() & acquires lock & allocs environ B & frees environ
A & releases lock

3\. thread1 tries to look at the pointer in environ A, and crashes because
environ A has been freed

Now, if you never call getenv(), and just call setenv() from multiple threads,
that's safe, because setenv() itself is thread-safe. It's the combination that
kills you.

------
py_throw
What about Python? Is it also affected?

~~~
RabbitmqGuy
From the way I read the python source code[1] implementation of the
Environment class, I think python is also affected. 1\.
[https://github.com/python/cpython/blob/master/Lib/os.py#L664](https://github.com/python/cpython/blob/master/Lib/os.py#L664)

~~~
poizan42
Uhh that just looks the key up in a python dict in data. That itself is thread
safe. But if you read the code more carefully you would see that the _Environ
class is constructed from _createenviron[1], where it expects a dict-like
global called environ to already exist, which it's just a wrapper over. The
pre-existing environ is then overwritten by the _Environ object[2].

The original os.environ is created somewhere else entirely. This file just
sets it to an empty dict if it isn't already set[3].

os.environ is actually set in native code, here [4]. It is created by the
createenviron function[5]. Now what you should note is that this is actually
created as a python dict, and the native environment is copied into it. So a
python program usually never touches the native environment after
initialization. This is fine as long as only python functions are used to
start child processes as these converts the environment into a native
environment block and gives it as the envp pointer to the exec/spawn
functions.

[1]:
[https://github.com/python/cpython/blob/a65137dc41fb52bdb1ca2...](https://github.com/python/cpython/blob/a65137dc41fb52bdb1ca294122cced3f63373e9b/Lib/os.py#L723)

[2]:
[https://github.com/python/cpython/blob/a65137dc41fb52bdb1ca2...](https://github.com/python/cpython/blob/a65137dc41fb52bdb1ca294122cced3f63373e9b/Lib/os.py#L754)

[3]:
[https://github.com/python/cpython/blob/a65137dc41fb52bdb1ca2...](https://github.com/python/cpython/blob/a65137dc41fb52bdb1ca294122cced3f63373e9b/Lib/os.py#L516)

[4]:
[https://github.com/python/cpython/blob/b9e40ed1bcce127893e40...](https://github.com/python/cpython/blob/b9e40ed1bcce127893e40dd355087cda7187ac27/Modules/posixmodule.c#L12953)

[5]:
[https://github.com/python/cpython/blob/b9e40ed1bcce127893e40...](https://github.com/python/cpython/blob/b9e40ed1bcce127893e40dd355087cda7187ac27/Modules/posixmodule.c#L1223)

~~~
JdeBP
The C runtime libraries on various platforms do much the same. The environment
array is not provided as such by the operating system, so the C runtime
libraries set up their internal instances of the environment array, and
sometimes of the entire environment. As you say, everything is easy as long as
one always sticks to the C library functions for accessing/using the
environment, and as long as one does not mix multiple C runtime libraries with
one another within a single process (or indeed multiple runtime libraries from
multiple languages).

------
dendisuhubdy
grok-machine:glibc dendisuhubdy$ gcc -g -o mtenv metenv.c -lpthread && ./mtenv
start

no segfault detected

------
ambrop7
(ignore this comment)

~~~
jjnoakes
The issue has nothing to do with fork or exec; if you have two threads in a
process, and one calls setenv() while another calls getenv(), you can hit this
issue.

~~~
ambrop7
Ah, right, seems like I have mistaken the first paragraph as a summary of the
post.

------
aangjie
Funny.. it seems gcc 5.4.0 shipped with ubuntu 16.04 deletes the source file
if we try to compile(fails with a linker error) it.

