
Eleven syscalls that rock the world - beefhash
https://www.cloudatomiclab.com/prosyscall/
======
meredydd
Woo, this list has "my" syscall in!

I originally wrote execveat() because I was experimenting with implementing
FreeBSD's Capsicum[1] API on Linux, and I needed a "true" fexec().
(Previously, glibc fexecve(fd,...) was implemented as
execve("/proc/self/fd/N",...), which isn't so helpful in a sandbox where you
aren't allowed to navigate the full filesystem.) Turns out, the Chrome team
really wanted this too, and David Drysdale took on the work of polishing it up
and getting it into the tree, so big props to him.
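For context, here is a minimal Linux sketch of what a "true" fexecve() looks like with execveat() (Linux 3.19+). It passes AT_EMPTY_PATH and an empty pathname so the kernel executes the descriptor itself, with no /proc lookup. It calls the raw syscall so it doesn't depend on the libc shipping an execveat() wrapper. The helper name exec_fd is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>        /* AT_EMPTY_PATH */
#include <sys/syscall.h>  /* SYS_execveat */
#include <unistd.h>       /* syscall() */

/* exec_fd (hypothetical helper): execute the program that the open
 * descriptor fd refers to, without resolving any path.  With
 * AT_EMPTY_PATH and "" as the pathname, execveat() acts on fd itself.
 * Like execve(), it only returns on failure (-1 with errno set). */
static int exec_fd(int fd, char *const argv[], char *const envp[])
{
    return (int)syscall(SYS_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
}
```

Typical use would be `open()` on the binary, then `exec_fd(fd, argv, environ)`. A sandboxed process only needs to be handed the descriptor, not the right to walk the filesystem to find it.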

[1] Capsicum is a sandbox based on per-file-descriptor permissions. E.g., you
could have a web server that can only append to files in its log directory or
read files from its Web root, but nothing else. Man pages:
[https://www.freebsd.org/cgi/man.cgi?capsicum(4)](https://www.freebsd.org/cgi/man.cgi?capsicum\(4\))

It's mentioned in the article, but I personally think Capsicum's syscall
interface should have made this list too. File descriptors as capabilities is
just so elegant!

------
JdeBP
wstat is of course how the MS-DOS, OS/2, and Win32 world has been doing this
stuff all along. DosQueryFileInfo() is symmetrical with DosSetFileInfo(). The
Windows NT Native API also has symmetric system calls.

Similarly, clonefile is akin (but not identical) to the old DosCopy() system
call on OS/2, which likewise made file copying available as a system API call.
DosCopy() had the semantics of sending the copy operation over to the file
server for actual enactment on (LAN Manager/LAN Server/Netware) remote
volumes.
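clonefile itself is macOS-specific. As a rough Linux analogue of "file copying as a system call", here is a minimal sketch using copy_file_range(2) (kernel 4.5+, glibc 2.27+). Depending on the filesystem, the kernel may be able to share extents or hand the copy to an NFS server rather than moving bytes through user space, but none of that is guaranteed. The helper name copy_file and the fallback policy are assumptions for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* copy_file (hypothetical helper): copy src to dst, asking the kernel to
 * do the work.  Falls back to a plain read/write loop if copy_file_range()
 * is unavailable or unsupported for this pair of files.
 * Returns 0 on success, -1 on error.  Short writes are treated as errors
 * to keep the sketch small. */
static int copy_file(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY | O_CLOEXEC);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }
    int rc = 0, use_syscall = 1;
    for (;;) {
        ssize_t n;
        if (use_syscall) {
            /* NULL offsets: use and advance each descriptor's position. */
            n = copy_file_range(in, NULL, out, NULL, 1 << 20, 0);
            if (n < 0 && (errno == ENOSYS || errno == EXDEV ||
                          errno == EINVAL || errno == EOPNOTSUPP)) {
                use_syscall = 0;  /* continue from the current offsets */
                continue;
            }
        } else {
            char buf[65536];
            n = read(in, buf, sizeof buf);
            if (n > 0 && write(out, buf, (size_t)n) != n)
                n = -1;
        }
        if (n == 0)
            break;  /* end of file */
        if (n < 0) {
            rc = -1;
            break;
        }
    }
    close(in);
    close(out);
    return rc;
}
```

Unlike clonefile, the result is an ordinary independent copy: nothing about copy-on-write sharing is promised to the caller.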

And of course Win32 programmers will recognize the process descriptor returned
by pdfork, which one can give to kqueue, as the same kind of process handle
that one gives to WaitForMultipleObjects(), although they will be unhappy about
the default semantics of closing it: unless PD_DAEMON is set, closing the last
descriptor kills the child. The process descriptor also lacks the ability to
poll/select/kevent for stop/continue events in the child process, unlike
waitpid/wait6.

The point about implementations of fexecve() and execveat() is a good one. For
example:

If one passes a close-on-exec file descriptor to fexecve() on Linux or
FreeBSD, and the executable is actually a script with a #! line, the script
interpreter is passed a /dev/fd/N argument naming a descriptor that the exec
has, of course, already closed. But if the executable is a binary, the call
works fine. Of
course, if one reads the first few bytes of the file to find out whether there
is a #! line and thus whether the executable is a script or a binary, one
needs to open it as O_RDONLY rather than as O_EXEC, thus preventing the
execution of files where one does not have read access but does have execute
access. One thus ends up opening the executable twice, first with O_RDONLY to
check whether it is a script or a binary, and then with O_EXEC to allow a non-
readable binary to be executed with fexecve().
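A minimal sketch of that double-open workaround, under stated assumptions: the helper names are hypothetical, and since Linux has no O_EXEC the sketch falls back to O_PATH there, which Linux's fexecve(3) accepts. Scripts are opened without O_CLOEXEC so the interpreter can still read /dev/fd/N. There is an unavoidable race between the two opens, since the file can be replaced in between.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* First open: O_RDONLY, only to peek at the first two bytes.
 * Returns 1 for "#!", 0 for anything else, -1 on error.  This needs read
 * permission, which is exactly the limitation described above. */
static int starts_with_shebang(const char *path)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    char magic[2];
    ssize_t n = pread(fd, magic, sizeof magic, 0);
    close(fd);
    if (n < 0)
        return -1;
    return n == 2 && memcmp(magic, "#!", 2) == 0;
}

/* Second open: the descriptor actually handed to fexecve().  O_EXEC where
 * the platform defines it (e.g. FreeBSD), otherwise O_PATH (assumption:
 * Linux).  Binaries get O_CLOEXEC; scripts must not, or the interpreter
 * is given a /dev/fd/N that the exec has already closed. */
static int open_for_exec(const char *path, int is_script)
{
#ifdef O_EXEC
    int flags = O_EXEC;
#else
    int flags = O_PATH;
#endif
    if (!is_script)
        flags |= O_CLOEXEC;
    return open(path, flags);
}
```

The price of leaving a script's descriptor open across exec is that the interpreter, and anything it runs, inherits it.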

kqueue() is good but not ideal. For one example: it has problems if one wants
to combine NOTE_TRACK, NOTE_FORK, and NOTE_EXIT, because it returns a single
event for both a forked child and an exited child, and the two try to store
different things in the event's "data" field.

------
monocasa
[https://ldpreload.com/blog/signalfd-is-useless](https://ldpreload.com/blog/signalfd-is-useless)

signalfd leaves a little to be desired.
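For readers who haven't used it, here is a minimal sketch of the usual Linux signalfd pattern (helper names hypothetical, single-threaded program assumed). The signal must be blocked first, or its default action or handler runs instead of it being queued to the descriptor. That requirement has side effects: the blocked mask is inherited across fork() and preserved across execve(), and two instances of a standard signal that arrive before a read() are merged into one.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* make_signalfd (hypothetical helper): block signo, then return a
 * descriptor from which it can be read.  In a multithreaded program,
 * pthread_sigmask() in every thread would be needed instead. */
static int make_signalfd(int signo)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, signo);
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
        return -1;
    return signalfd(-1, &mask, SFD_CLOEXEC);
}

/* read_signal (hypothetical helper): block until one signal is pending,
 * consume it, and return its number, or -1 on error. */
static int read_signal(int sfd)
{
    struct signalfd_siginfo info;
    if (read(sfd, &info, sizeof info) != (ssize_t)sizeof info)
        return -1;
    return (int)info.ssi_signo;
}
```

The descriptor can then sit in a poll()/epoll set next to sockets, which is the attraction; the inherited mask is why child processes need it explicitly reset before exec.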

~~~
JdeBP
Certainly it does when the Windows NT Subsystem for Linux does not actually
implement it.

* [https://github.com/Microsoft/WSL/issues/129](https://github.com/Microsoft/WSL/issues/129)

------
nine_k
Eleven syscalls, each with a description of why it is cool. They come from
many different Unices, plus one from Plan 9.

