There's no reason for an operating system to have a stable system call ABI. A st...

jeremyjh · on Feb 2, 2021

For Linux it makes sense because there is no project running the entire operating system. The kernel's system call interface is stable because there is no other "Linux" system interface. For an OS that maintains a libc they can make a different choice.

quotemstr · on Feb 2, 2021

So we should endure technical mediocrity forever because we couldn't get our act together socially?

There is a way out. I've previously proposed on LKML that the Linux kernel team provide an official userspace system call library that sits below libc and that all libc implementations would share. We'd forbid new system calls being called except through this library. Optionally, we'd enforce this constraint on all system calls.

This is how Fuchsia works, by the way: all Fuchsia system calls must go through one giant vDSO.

jeremyjh · on Feb 2, 2021

I agree this is a better architecture but this creates more work for the kernel team but doesn't free them from ABI stability constraints anywhere in the near-term. It would take 10-15 years to pay off I think.

wahern · on Feb 2, 2021

> Go's libc avoidance is just hubris.

Maybe my impression was wrong, but it was my understanding that Go originally preferred direct syscalls because of stack management headaches. You can't know how much stack space a libc syscall wrapper requires, which even for seemingly simple syscalls can be quite complex--e.g. glibc has to emulate POSIX thread semantics. OTOH, treating such libc wrappers like regular C FFI functions would obliterate the design and implementation assumptions around goroutine stack management. Considering that Linux was originally the first (and, let's be honest, only real target), it made perfect sense to rely on Linux syscall ABI promises.

Fast-forward a few years: 1) Go has a more mature binary format and dynamic linking capabilities, shrinking the gap between Go's internal ABI and the native libc ABI. 2) Goroutine stacks switched from split-stacks to movable stacks, and the minimum stack size became larger. 3) Demand and motivation for supporting libc wrappers (i.e. for Windows) grows. Result: Go surmounts one of its original simplifying design compromises. Though, I would assume that libc wrapper support still incurs ongoing maintenance costs on each platform; namely, managing the minimum stack requirement for each particular call, which could change overtime, while it's important not to be too pessimistic so that a syscall doesn't force an unnecessary stack resize.

jart · on Feb 2, 2021

Even if you use the system libc, if you pass the -static flag to gcc then you end up with a binary that depends on the syscall abi. If the kernel interface breaks, then all your programs need to be recompiled from scratch in order for them to work again. Are you opposed to static linking?

Blikkentrekker · on Feb 2, 2021

What, do you believe, are the advantages of static linking against a libc?

a1369209993 · on Feb 2, 2021

The same as static linking any other library: it stops the library semantics from being changed out from under you by a 'update'.

Blikkentrekker · on Feb 2, 2021

The difference is that other libraries are actual libraries that factor out common patterns, whereas the libc, despite it's name, is more so an interface, especially it's system call wrappers.

It sits so close to the kernel that the concerns of changing semantics apply as easily to the kernel as they do to the system call wrappers.

quotemstr · on Feb 2, 2021

Yes, I am opposed to fully static linking. What's the point of static linking? Windows has no static linking (all system calls go through ntdll) and it has a compatibility story better than any Unix. Static linking to libc is unnecessary for long term ABI support.

Dynamically link against libc and statically link the rest for all I care, but there's no reason not to talk to libc.

Also: the vDSO is also a form of dynamic linking. Are you opposed to the vDSO?

jart · on Feb 3, 2021

I distribute binaries. My binaries work on six different operating systems. In order to do that I had to roll my own C library. I'm happy I did that since it's so much better than being required to use six different ones.

I'm not opposed to vDSO but I disagree with how Linux maps it into memory by default. Linux should not be putting anything into the address space that the executable does not specify. MMUs are designed to give each process its own address space. Operating systems that violate that assumption are leaky abstractions imposing requirements where they shouldn't.

The main thing dynamic shared objects accomplish is giving upstream dependencies leverage over your software. They have a long history of being mandated by legal requirements such as LGPL and Microsoft EULAs. It's nice to have the freedom to not link the things.

quotemstr · on Feb 3, 2021

> My binaries work on six different operating systems. In order to do that I had to roll my own C library.

Other people have made software for decades without writing program-specific libc instances. Tell me you at least started with something decent like musl instead of literally writing your own libc from printf on up.

> Linux should not be putting anything into the address space that the executable does not specify

Execution has to start somewhere, and kernels have often reserved parts of the system address space for themselves.

> The main thing dynamic shared objects accomplish is giving upstream dependencies leverage over your software.

Loose binding in interfaces allows systems on both sides of the interface to evolve. If you want 100% complete control over your system for some reason instead of writing programs that play well with others, just ship your thing as a VM image and be done with it.

jart · on Feb 4, 2021

I used lots of code from Musl, OpenBSD, and FreeBSD. I used Marco Paland's printf. I used Doug Lea's malloc. I used LLVM compiler_rt. I used David Gay's floating point conversion utilities. The list goes on. Then I stitched it all together so it goes faster and runs on all operating systems rather than just Linux. See https://justine.lol/cosmopolitan/index.html and https://github.com/jart/cosmopolitan

Trapping (SYSCALL/INT) is a loose binding. The kernel can evolve all it wants internally. It can introduce new ABIs. Processes are also a loose binding. I think communicating with other tools via pipes and sockets is a fantastic model of cooperation. Same goes for vendoring with static linking. Does that mean I'm going to voluntarily load Linux distro DSOs into my address space? Never again. Programs that do that, won't have a future outside Docker containers.

Also, my executables are VM images. They can boot on metal too. Read https://justine.lol/ape.html and https://github.com/jart/cosmopolitan/issues/20#issuecomment-... Except unlike a Docker distro container, my exes are more on the order of 16kb in size. That's how fat an executable needs to be, in order to run on six different operating systems and boot from bios too.

quotemstr · on Feb 4, 2021

> Same goes for vendoring with static linking. Does that mean I'm going to voluntarily load Linux distro DSOs into my address space? Programs that do that, won't have a future outside Docker containers.

Strong claim. Wrong, but strong claim.

The completely-statically-linked model you're proposing might be acceptable on servers, but on mobile and embedded devices like Android, it's a showstopper: without zygote pre-initialization and DSO page-sharing, Android apps would each be at least 3MB heavier than they are today and take about 1000ms longer to start --- and a typical Android device has a lot of these processes running.

More broadly, yes, in most contexts, I see a general trend away from elaborate code-sharing schemes and towards "island universe" programs that vendor everything. But these universes need to interact with their host system using a stable ABI somehow, I believe that SYSCALL is fundamentally the wrong layer for this interaction, as it's not flexible enough. For example, the Linux gettimeofday() optimization couldn't have been done without the ability to give Linux programs userspace code to run pre-kernel via the vDSO. How do you propose the kernel do things like vDSO gettimeofday optimizations?

jart · on Feb 5, 2021

If you think I'm wrong then why don't you tell me what requirements you've faced as a software developer distributing binaries? 99% of developers have never needed to deal with the long tail of corner cases.

Doesn't everything on Android start off with the JVM as a dependency? In that case the freedom to not use DSOs is something that Google has already taken away from you. That's not a platform I'd choose to develop for unless I was being paid to do it.

On x86 RDTSC returns invariant timestamps so you technically don't need shared memory to get nanosecond precision timestamps. XNU does the same thing and they don't call it a DSO. Because that's just shared memory. I have nothing against shared memory.