
C library system-call wrappers, or the lack thereof - Tomte
https://lwn.net/SubscriberLink/771441/96f587a2dec5ba1a/
======
joshumax
I ran into this problem a little while back trying to get my Linux
implementation of the BSD unveil() system call merged into mainline. Some of
the responses to the RFC told me that it shouldn't be added because glibc
likely won't add a syscall wrapper for it. However, a response from glibc
states that they won't consider adding it until it's been successfully merged
into mainline, creating a sort of catch-22 situation.

~~~
kjeetgill
Thank you for your efforts! Can you just cc them both on an email and get them
to agree together?

~~~
joshumax
Hopefully! I plan on doing that for my next RFC!

------
int_19h
This is one area where I feel that the BSD approach (where the same team maintains
both the kernel and the libc, and they're shipped in sync as part of the same
release) makes a lot more sense.

In fact, come to think of it, Linux is the only OS where syscalls are the
official public userspace API, is it not? On all other platforms, they're an
implementation detail behind the system libraries.

~~~
saagarjha
macOS lists them all in /usr/include/sys/syscall.h, so I guess you can
consider that public API?

~~~
akvadrako
That file doesn't even exist on my system, but in any case public means
official, not in some header.

~~~
saagarjha
If you have the command line tools installed, it should also be under
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/syscall.h.
macOS, by default, does not ship with these headers at all, and as of macOS
Mojave they no longer provide them under /usr/include unless you install a
certain package.

With that being said, I disagree with your characterization of “some
header”–anything that’s in Apple’s headers is public API, whether it has a
fancy page on developer.apple.com or not. Apple has a very clear definition of
what they consider to be “private”, and anything in /usr/include isn’t it.

~~~
akvadrako
Actually the path is:

    
    
      /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/sys/syscall.h 

and you can clearly see that the file is wrapped with:

    
    
      #ifdef __APPLE_API_PRIVATE
      ...
      #endif /* __APPLE_API_PRIVATE */

~~~
saagarjha
Again, if you have the Command Line Tools installed, it'll be under
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/. Xcode also has a copy,
but I think fewer people have that installed.

------
jws
I’m sure it would give people discomfort, but I wonder how it would work if
the kernel presented a pseudo file system with an “include” and a “src”
directory to provide C interfaces to the unassimilated syscalls. Just enough
syntax to keep people from defining their own types and having to use the
syscall() interface.

Maybe make it a module so space constrained systems can leave it out.

The kernel patch process could keep everything nicely in sync and native build
processes would easily find the right source code. Cross compiling would
require you to find and copy them, though.

~~~
kayamon
It actually already has this (kinda) -- the VDSO shared library. Although they
only tend to implement a couple of syscalls in there.

[http://man7.org/linux/man-pages/man7/vdso.7.html](http://man7.org/linux/man-pages/man7/vdso.7.html)

~~~
klodolph
VDSO does not have syscalls in it, that's actually the entire reason it
exists… so you can communicate between the kernel and userspace without
syscalls.
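
A minimal sketch of how a process can see that the vDSO is an ordinary ELF image mapped into its address space, assuming Linux and a libc that provides getauxval() (glibc 2.16 or later, or musl). The function name vdso_status is made up for this illustration.

```c
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>

/* Returns 1 if a vDSO image is mapped and starts with the ELF magic,
 * 0 if the kernel did not advertise one, -1 if the mapping looks wrong.
 * vdso_status is a hypothetical helper name, not a libc function. */
int vdso_status(void)
{
    /* AT_SYSINFO_EHDR holds the address of the vDSO's ELF header. */
    unsigned long base = getauxval(AT_SYSINFO_EHDR);
    if (base == 0)
        return 0;
    return memcmp((const void *)base, ELFMAG, SELFMAG) == 0 ? 1 : -1;
}
```

Calls such as clock_gettime() are typically resolved by libc to functions inside that image, which is how they can avoid entering the kernel.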

------
userbinator
I'm curious what led to Linux (and it seems the other Unices) adopting the
"double indirection" strategy of having a separate wrapper/stub function for
system calls vs. the approach common in the MS-DOS world where the compiler
would directly embed e.g. INT 21H instructions and generate the code to put
parameters into registers itself. It's a small inefficiency, but still seems a
bit wasteful nonetheless.

~~~
tacostakohashi
There's nothing stopping the wrapper function from being inlined (as it
commonly is for some compiler + libc combinations), with all the usual
tradeoffs for code size and ease of debugging.

~~~
megous
It doesn't happen unless you compile the C library with LTO. I have yet to succeed
in doing that for my musl C mipsel target.
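
For comparison, here is a rough sketch of what "embedding the instruction directly" looks like with GCC/Clang inline assembly. It assumes Linux on x86-64 and falls back to the libc syscall() function elsewhere. raw_getpid is a name invented for this example.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Issue getpid with the syscall instruction emitted inline (x86-64 only);
 * other architectures fall back to the libc syscall() function.
 * raw_getpid is a hypothetical name chosen for this sketch. */
static inline long raw_getpid(void)
{
#if defined(__x86_64__)
    long ret;
    /* rax carries the syscall number in and the result out;
     * the syscall instruction itself clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_getpid)
                      : "rcx", "r11", "memory");
    return ret;
#else
    return syscall(SYS_getpid);
#endif
}
```

Because the function is static inline, the compiler is free to place the instruction at the call site, with no stub in between. The cost is that the register convention has to be written out per architecture.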

------
xenadu02
The comments on the article have some funny ideas about versioning.

Apple platforms manage to support the idea of a “deployment target” and binary
compatibility; the new symbols are weak-linked. Broken old behavior is
preserved with linked on-or-after checks.

Not sure what makes it so difficult for glibc.
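
For readers unfamiliar with weak linking, a rough ELF analogue can be sketched with the GCC/Clang weak attribute. This is not Apple's mechanism, only the same general idea, and hypothetical_new_call and call_with_fallback are made-up names.

```c
#include <stddef.h>

/* Hypothetical function that only a newer library version would export.
 * Declared weak: if nothing defines it at link or load time, its
 * address is NULL instead of causing a link error (GCC/Clang on ELF). */
extern int hypothetical_new_call(int) __attribute__((weak));

/* Use the new entry point when it is present, else an older code path. */
int call_with_fallback(int x)
{
    if (hypothetical_new_call != NULL)
        return hypothetical_new_call(x);
    return -x;  /* stand-in for the older behavior */
}
```

A binary built this way still loads on a system whose library lacks the symbol, and it simply takes the fallback branch there.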

~~~
matthewbauer
You need quite a bit of coordination between your toolchain, your kernel, and
the standard C library for that to work. Linux/GCC/Glibc has never had that.

------
jancsika
Two noob questions:

What are the technical reasons that glibc cannot adhere to the Linux dogma,
"Don't break userspace?"

Since glibc does not adhere to that dogma, why the decades-long reluctance to
add certain syscall wrappers? If they screw up and make a bad interface just
modify it and bump the version number.

I just waded through the lwn cross-purpose-writing-festival comments and did
not see them answered.

~~~
cesarb
> What are the technical reasons that glibc cannot adhere to the Linux dogma,
> "Don't break userspace?"

I don't know if they have something like that officially, but in practice,
they do follow it. Programs linked to an older version of glibc continue
working with a newer version of glibc, in a large part thanks to symbol
versioning, which allows them to keep the old versions of an interface
available to old binaries, while new binaries get the new functionality.

> If they screw up and make a bad interface just modify it and bump the
> version number.

Bumping the glibc version number would mean recompiling everything (a program
can't have two versions of glibc at the same time, so all libraries a program
links to would have to be recompiled); we had that in the libc5 to libc6
transition last century. And since they won't bump the version number, it
means they will have to keep the bad interface forever, even if it's just
visible to binaries compiled against an older glibc.

For a recent example in which they actually went ahead and removed a bad
interface:
[https://lwn.net/Articles/673724/](https://lwn.net/Articles/673724/) and
[https://sourceware.org/bugzilla/show_bug.cgi?id=19473](https://sourceware.org/bugzilla/show_bug.cgi?id=19473)
-- and according to the latter, they did it in a way which still kept existing
binaries working.

~~~
jancsika
> I don't know if they have something like that officially, but in practice,
> they do follow it.

If that's true then I don't understand ldarby's comment on the article:

> The common problem that I suspect Cyberax is actually moaning about is if
> software uses other calls like memcpy() which on centos 7 gets a version of
> GLIBC_2.14:

> readelf -a foo | grep memcpy
> 000000601020 000300000007 R_X86_64_JUMP_SLO 0000000000000000 memcpy@GLIBC_2.14 + 0
> 3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND memcpy@GLIBC_2.14 (3)
> 55: 0000000000000000 0 FUNC GLOBAL DEFAULT UND memcpy@@GLIBC_2.14

> and this doesn't work on centos 6:

> ldd ./foo
> ./foo: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by ./foo)

Just to be clear, my original question is why glibc technically cannot follow
the _same exact_ development model of the Linux kernel for retaining backward
compatibility.

~~~
cesarb
It's backwards compatible: software compiled on centos 6 will be using the
older memcpy@GLIBC_2.2.5 symbol, which still exists on newer glibc together
with the current memcpy@GLIBC_2.14 symbol, so it will work. What doesn't work
is compiling with a newer glibc and expecting it to work on an older system.

The example above is actually a great example of bending over backwards to
keep compatibility with broken userspace. Some programs incorrectly called
memcpy with overlapping inputs, and an optimized version of memcpy started
breaking these programs. Instead of just letting them break, the older symbol
was kept with a slower implementation which accepts overlapping inputs, while
new programs get the faster implementation at the memcpy@GLIBC_2.14 symbol.
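
To make the overlapping-inputs bug concrete, here is a small sketch of the kind of in-place shift that is only well-defined with memmove. shift_right_one is a name invented for the example.

```c
#include <string.h>

/* Shift the first len bytes of buf right by one position, in place.
 * Source and destination overlap, so memmove is required here;
 * memcpy's behavior is undefined for overlapping regions.
 * shift_right_one is a hypothetical helper for illustration. */
void shift_right_one(char *buf, size_t len)
{
    if (len > 1)
        memmove(buf + 1, buf, len - 1);
}
```

Code that used memcpy for this kind of copy happened to work with the older implementation and broke with the optimized one, which is why the old behavior was kept behind the older symbol version.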

------
molticrystal
Arbitrage issues wherever glibc and the syscall differ seem to be where the
bugs and security issues lie, or at least a good place to look.

Whenever a person has to roll their own handler, it will almost always
undergo less testing and auditing. The article points to gettid, which for at
least 10 years has required your own method to use, and the comment section for
the article points out that the getpid call had caching that was bugged for a
long time.

Having no glibc implementation of a syscall reduces both its usage and the
number of people knowledgeable about that function, so it would be a perfect
place to look for bugs and security issues. In the nearly reverse case, a poor
implementation of a glibc handler might be doing something that would allow an
attacker to take advantage. The same applies where the glibc and syscall
functionality differ: an aspect that belongs only to the syscall might be
undertested.

------
pmoriarty
How did Plan 9 and Inferno handle this?

~~~
sebcat
Similar to the BSDs: by providing wrappers for syscalls in their own libc.

Plan 9 has very few syscalls compared to Linux and the BSDs.

------
Annatar
This is exactly why BSD and illumos based operating systems ship libc, the
kernel and userland (/usr) as one coherent whole. Perhaps now, reading the LWN
article, people who are comfortable with GNU/Linux will start to realize it's
high time to outgrow it and move on to one of the BSD or illumos based
systems. The longer you wait, the harder the transition will be and besides,
it's good to go out of one's comfort zone.

~~~
majewsky
Sure, I'll just write up a proposal for my employer to move multiple tens of
thousands of Linux servers, VMs and containers over to BSD. I have a good
feeling about this. /s

~~~
Annatar
Here is something to consider: how did your employer get to tens of thousands
of Linux servers from whatever they were running on before?

And: do you really want to spend the rest of your professional career
wrangling with a shoddy product, or do you want to actually do professional,
cutting edge IT?

I can't write for you, but I did not graduate computer science at the top of
my class so that I could spend the next several decades working with / on the
shittiest, amateur knock-off copy of UNIX when I could run the real thing for
free & cheap. That's not what I studied at a university and got a degree for.
How about you, what's it gonna be, shitty Linux for the next 20-30 years or
the real computer science with SmartOS or FreeBSD?

------
bogomipz
The post states:

>"In such cases, user-space developers must fall back on syscall() to access
that functionality, an approach that is both non-portable and error-prone."

I understand about portability but can someone elaborate on why using
syscall() is inherently error-prone?

------
en4bz
Lack of `gettid` and `futex` has always annoyed me.

~~~
monocasa
There's this sense that they're not for the consumption of mere mortals.

That being said, you can still at least always call syscall(2).
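
For gettid, the do-it-yourself version is short. This is a minimal sketch assuming Linux and a libc that exposes syscall() and SYS_gettid. The name my_gettid is chosen here so it does not clash with the gettid() wrapper that newer glibc releases provide.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thread ID of the caller, fetched through the generic syscall() function.
 * my_gettid is a hypothetical name, used to avoid colliding with a libc
 * gettid() where one exists. */
static pid_t my_gettid(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```

In the main thread of a process the value equals getpid(); in other threads it differs.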

~~~
glandium
One problem is that not all system calls take the same kinds of arguments on
all platforms. Example: SYS_mmap takes a pointer to a struct containing all
the arguments on s390. Even better, on Alpha, glibc's syscall cannot call
system calls with 6 or more arguments (although maybe that was fixed in the
past 8 years?).

~~~
murderfs
> Even better, on Alpha, glibc's syscall cannot call system calls with 6 or
> more arguments

That's just generic kernel ABI: syscalls can have at most 6 arguments:
[https://elixir.bootlin.com/linux/latest/source/include/asm-g...](https://elixir.bootlin.com/linux/latest/source/include/asm-generic/syscall.h#L104)

~~~
glandium
Skip the "or more" part, then, but that doesn't make it less true: it wasn't
possible to make a 6-argument system call with syscall on alpha 8 years ago. I
don't know whether that's been fixed or not.

~~~
pm215
The syscall(2) manpage documents the Alpha syscall ABI as passing arguments in
a0,a1,a2,a3,a4,a5 which would suggest so. (MIPS o32 is the only listed one
which is a bit oddball: you can only pass 4 args in registers and then use
stack for 5 and 6.)

