
Just keep in mind that adding syscalls increases the attack surface of the kernel, and complexity needs justification in general.

Linux's syscall list is out of control. The many BSDs have managed to keep it more reasonable.

There is nothing specific about system calls that increases the attack surface. Individual sysctls, files in procfs, sysfs, ioctls, etc. are all separate attack vectors.

The last time I checked, FreeBSD's list of system calls was larger than Linux's. This is because Linux tends to expose more functionality through sysfs, procfs and ioctl, while FreeBSD uses dedicated system calls.

https://filippo.io/linux-syscall-table/ <- max = 313

https://github.com/freebsd/freebsd/blob/master/sys/kern/sysc... <- max = 576, 356 marked 'STD'

That Linux table is quite out of date: judging by the absence of sched_setattr, it predates Linux 3.14 (2014). [1] is a much more modern table (and is updated more frequently), but it's probably simpler to just look at the actual syscall table[2]. On x86_64, there are currently 401 syscalls (syscall numbers 387 through 423 are reserved, and 436 is used by a syscall that has not been merged yet).

[1]: https://fedora.juszkiewicz.com.pl/syscalls.html
[2]: https://elixir.bootlin.com/linux/latest/source/arch/x86/entr...
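A "max = N" figure and an actual syscall count are different metrics, since the numbering has gaps (reserved ranges) and ABI-specific entries (x32). A minimal sketch that counts both, assuming the whitespace-separated `<number> <abi> <name> <entry point>` format of the kernel's `syscall_64.tbl`; the sample lines below are illustrative, not a copy of the real file, and `summarize_table` is a hypothetical helper.

```python
from collections import Counter

# Illustrative lines in the syscall_64.tbl format; '#' starts a comment.
SAMPLE_TBL = """\
# 64-bit system call numbers and entry vectors
0\tcommon\tread\t\t\tsys_read
1\tcommon\twrite\t\t\tsys_write
424\tcommon\tpidfd_send_signal\tsys_pidfd_send_signal
512\tx32\trt_sigaction\t\tcompat_sys_rt_sigaction
"""

def summarize_table(text):
    """Count entries per ABI and compare with the naive 'max number' metric."""
    numbers = []
    by_abi = Counter()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        nr, abi = int(fields[0]), fields[1]
        numbers.append(nr)
        by_abi[abi] += 1
    top = max(numbers)
    return {
        "entries": len(numbers),
        "max_number": top,
        "by_abi": dict(by_abi),
        # Numbers in 0..max that have no entry at all (reserved or unused).
        "gaps": (top + 1) - len(set(numbers)),
    }
```

On this sample the largest number is 512 while there are only 4 entries; pointed at a real checkout, the same comparison shows how much of a "max = N" figure comes from gaps and x32-only numbers rather than from distinct syscalls.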

Thanks for the correction! It confirms that Linux’s system call list is by no means ‘out of control’.

If 400+ syscalls doesn't scream out of control to you, I doubt I can convince you.

Again, there is nothing special about a "system call"; it's one of MANY entry points into the kernel. Counting them in isolation means nothing. And, again, Linux has historically been very careful to resist arbitrary subsystem-specific bloat in syscall variety. Almost all of its new kernel-exposed functionality uses other mechanisms (e.g. sysfs, new filesystems like cgroup, etc.) which are more auditable and amenable to userspace-managed authorization via stuff like filesystem permissions, chroot and containers.

And of course, as with everything else, virtually all this new functionality is modular. Don't want the system call (or whatever)? Don't put it in your kernel.

Basically: you're wrong here. Cite the specific functionality you think is being shipped in an insecure way.

There's an argument that files can simply be hidden from specific processes using the same tooling that protects other files, but as far as I can tell, none of the kernels make this particularly easy.

On the other hand, the 'everything is a file' paradigm does sometimes cause issues where you can make kernels crash by doing unexpected things with them.

- macOS could once easily be panicked by calling something like fpathconf() on a message queue.

- If you want to have fun, try calling revoke(2) on character devices that are not TTYs. I remember fixing a bug in FreeBSD once, where you could make the system panic by calling that function on /dev/bpf.

> If you want to have fun, try calling revoke(2) on character devices that are not TTYs. I remember fixing a bug in FreeBSD once, where you could make the system panic by calling that function on /dev/bpf.

IIRC there are a lot of problems with revoke(2) on anything that's not a TTY device, so on OpenBSD revoke(2) returns ENOTTY in those cases.

This was discovered early on during pledge(2) development.

On the other hand, having a single abstraction / entry point makes it easier to implement generic sanity checks. If you add a check for that kind of problem at the right layer, it will also cover other and future interfaces. By contrast, if you use ad-hoc system calls, any mitigation or fix will typically only cover that one specific call.
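A toy model of the "right layer" argument, in plain Python rather than kernel code: the classes, `supported_ops` and `dispatch` are hypothetical. Every operation goes through one dispatcher, so a single check also covers object types added later; returning ENOTTY is borrowed from the OpenBSD revoke(2) behaviour mentioned earlier in the thread.

```python
import errno

class KernelObject:
    """Hypothetical object exposed through a file-like interface."""
    kind = "generic"
    supported_ops = frozenset({"read", "write"})

class Tty(KernelObject):
    kind = "tty"
    supported_ops = frozenset({"read", "write", "revoke"})

class BpfDevice(KernelObject):
    kind = "bpf"
    supported_ops = frozenset({"read", "ioctl"})

class MessageQueue(KernelObject):
    """Added 'later': inherits the generic check with no extra code."""
    kind = "mqueue"

def dispatch(obj, op):
    """Single entry point: one generic check guards every object type."""
    if op not in obj.supported_ops:
        # ENOTTY mirrors OpenBSD's answer to revoke(2) on non-TTY devices.
        raise OSError(errno.ENOTTY, f"{op} not supported on {obj.kind}")
    return f"{op} ok on {obj.kind}"
```

Here `dispatch(Tty(), "revoke")` returns `"revoke ok on tty"`, while the same call on a `BpfDevice` or `MessageQueue` raises `OSError` with `errno.ENOTTY`. An ad-hoc syscall would have to repeat that check itself.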

Unfortunately, generic sanity checks are often not enough. You immediately run into problems where very file-specific concepts (owner, RWX permissions) aren't sufficient to handle certain kinds of objects represented as files (such as procfs files, where a caller's privileges with regard to a process aren't accurately described by Unix DAC permissions).

And then you get into some of the really hairy issues: any user can trick a privileged program into reading from or writing to a file they can open, simply by spawning a setuid program with its stdio set to that file. If the kernel checks privileges at read/write time, the operation is performed with the setuid program's credentials. Thus, any administrative interface is simply unsafe to expose through the standard open/read/write interfaces, which means that you have to come up with some alternative interface anyway.
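A toy model of that confused-deputy scenario, in plain Python rather than kernel code; `AdminKnob`, `setuid_program` and the two `write_*` policies are hypothetical. It assumes the attacker can influence at least part of what the setuid program prints (e.g. an error message echoing its arguments). The attacker opens the file and hands it over as stdout; the only difference between the two designs is whose credentials the write path checks.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cred:
    uid: int  # 0 is root in this toy model

@dataclass(frozen=True)
class OpenFile:
    opener: Cred  # credentials captured when open() succeeded

class AdminKnob:
    """Hypothetical world-openable administrative file (think procfs/sysfs)."""

    def __init__(self):
        self.value = "default"

    def open(self, cred):
        return OpenFile(opener=cred)  # no privilege check at open time

    def write_checks_caller(self, f, caller, data):
        # Unsafe: trusts whoever happens to be calling write() right now.
        if caller.uid != 0:
            raise PermissionError("caller is not root")
        self.value = data

    def write_checks_opener(self, f, caller, data):
        # Safer: trusts only the credentials recorded at open() time.
        if f.opener.uid != 0:
            raise PermissionError("opener is not root")
        self.value = data

def setuid_program(write, stdout, data):
    """Hypothetical setuid-root program printing attacker-influenced text."""
    write(stdout, Cred(uid=0), data)

def run_scenario(design):
    knob = AdminKnob()
    stdout = knob.open(Cred(uid=1000))  # the attacker opens the file itself
    try:
        setuid_program(getattr(knob, design), stdout, "attacker-controlled")
    except PermissionError:
        return "EPERM"
    return knob.value
```

`run_scenario("write_checks_caller")` returns `"attacker-controlled"`, while `run_scenario("write_checks_opener")` returns `"EPERM"`. Checking the opener's credentials is roughly what Linux does for some such files via `file->f_cred`, but it has to be designed in per interface; plain open/read/write semantics don't provide it.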

Wasn't this Plan 9's whole deal? Sure, it's not a major kernel, but it has been done.

Have you also counted the syscalls masked as ioctls or eBPF scripts?

Hey, at least Linux has now added eBPF and io_uring, both of which are ways of feeding the kernel an increasingly wide range of instructions without needing a syscall for each one!

Yup, these examples are just toys, but they have some issues. One is mentioned in the post itself; see if you can find more :)

Yes, but Linux also has a number of security mechanisms to limit access to syscalls (e.g. seccomp).
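For a sense of how seccomp shrinks the syscall surface for one process, here is a minimal sketch in Python using `ctypes`, assuming Linux on x86_64 or aarch64 (the audit-arch values and getppid(2) numbers in `ARCH` only cover those two). A forked child installs a classic-BPF filter through prctl(2) that makes getppid(2) fail with EPERM, while the parent is unaffected. Real programs would normally use libseccomp rather than hand-assembled BPF, and would kill the process on an unexpected architecture instead of allowing it.

```python
import ctypes
import errno
import os
import platform

# Constants from <linux/prctl.h>, <linux/seccomp.h>, <linux/filter.h>, <linux/audit.h>.
PR_SET_SECCOMP = 22
PR_SET_NO_NEW_PRIVS = 38
SECCOMP_MODE_FILTER = 2
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_ERRNO = 0x00050000
BPF_LD, BPF_W, BPF_ABS = 0x00, 0x00, 0x20
BPF_JMP, BPF_JEQ, BPF_K, BPF_RET = 0x05, 0x10, 0x00, 0x06

# machine -> (AUDIT_ARCH_* value, getppid syscall number); assumption: only these two.
ARCH = {"x86_64": (0xC000003E, 110), "aarch64": (0xC00000B7, 173)}

class SockFilter(ctypes.Structure):
    _fields_ = [("code", ctypes.c_ushort), ("jt", ctypes.c_ubyte),
                ("jf", ctypes.c_ubyte), ("k", ctypes.c_uint32)]

class SockFprog(ctypes.Structure):
    _fields_ = [("len", ctypes.c_ushort), ("filter", ctypes.POINTER(SockFilter))]

def deny_getppid_filter(audit_arch, nr):
    """BPF program: getppid fails with EPERM, everything else is allowed."""
    return [
        SockFilter(BPF_LD | BPF_W | BPF_ABS, 0, 0, 4),            # A = seccomp_data.arch
        SockFilter(BPF_JMP | BPF_JEQ | BPF_K, 0, 3, audit_arch),  # other arch: allow (toy)
        SockFilter(BPF_LD | BPF_W | BPF_ABS, 0, 0, 0),            # A = seccomp_data.nr
        SockFilter(BPF_JMP | BPF_JEQ | BPF_K, 0, 1, nr),          # not getppid: allow
        SockFilter(BPF_RET | BPF_K, 0, 0, SECCOMP_RET_ERRNO | errno.EPERM),
        SockFilter(BPF_RET | BPF_K, 0, 0, SECCOMP_RET_ALLOW),
    ]

def _child(libc, audit_arch, nr):
    insns = deny_getppid_filter(audit_arch, nr)
    arr = (SockFilter * len(insns))(*insns)
    prog = SockFprog(len(insns), ctypes.cast(arr, ctypes.POINTER(SockFilter)))
    # Installing a filter without CAP_SYS_ADMIN requires no_new_privs first.
    if libc.prctl(PR_SET_NO_NEW_PRIVS, ctypes.c_ulong(1), ctypes.c_ulong(0),
                  ctypes.c_ulong(0), ctypes.c_ulong(0)) != 0:
        return 2
    if libc.prctl(PR_SET_SECCOMP, ctypes.c_ulong(SECCOMP_MODE_FILTER),
                  ctypes.byref(prog)) != 0:
        return 3
    ret = libc.syscall(ctypes.c_long(nr))  # raw getppid(2)
    return 0 if ret == -1 and ctypes.get_errno() == errno.EPERM else 1

def getppid_blocked_in_child():
    """Install the filter in a forked child only; True if getppid got EPERM there."""
    audit_arch, nr = ARCH[platform.machine()]
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    pid = os.fork()
    if pid == 0:
        code = 1
        try:
            code = _child(libc, audit_arch, nr)
        finally:
            os._exit(code)  # never return into the caller's code in the child
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

The filter is inherited across fork/exec and cannot be removed, which is why the sketch confines it to a throwaway child.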
