
Fun fact, some weird syscalls don't even appear under /sys/kernel/debug/tracing because they lack ftrace metadata. It was pretty fun (read: a nightmare) to deal with some of those in my tool. You can grep -R -F "NOT found in ftrace metadata" in the logs in my db (https://github.com/mebeim/linux-syscalls/tree/master/db) to see which ones.

The most interesting one, which doesn't even appear in my logs because I had to hardcode it due to its esoteric definition, is fast_endian_switch for PPC64 (https://elixir.bootlin.com/linux/v6.10/source/arch/powerpc/k...).




Good to know, I suspected that this might be the case, but never got to confirm it. I guess one could set up a test comparing the syscalls listed in syscall_64.tbl (or the syscall table read from kernel memory) with the syscalls listed under /sys/kernel/debug/tracing/events/syscalls.
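A rough sketch of that comparison, in Python. It assumes the usual .tbl row layout (number, abi, name, entry point) and that tracepoint directories are named sys_enter_<entry point without the sys_ prefix>. The function names are made up for illustration, and rows without an entry point are simply skipped.

```python
def parse_tbl(text):
    """Return (number, name, entry_point) for each .tbl row that has an entry point."""
    rows = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 4:  # number, abi, name, but no entry point listed
            continue
        rows.append((int(fields[0]), fields[2], fields[3]))
    return rows


def missing_from_ftrace(tbl_text, event_names):
    """List .tbl rows whose entry point has no matching sys_enter_* event."""
    prefix = "sys_enter_"
    traced = {n[len(prefix):] for n in event_names if n.startswith(prefix)}
    missing = []
    for number, name, entry in parse_tbl(tbl_text):
        # Assumption: the event name is derived from the entry point, not the name column.
        short = entry[len("sys_"):] if entry.startswith("sys_") else entry
        if short not in traced:
            missing.append((number, name, entry))
    return missing
```

On a real system the inputs would be something like the text of arch/x86/entry/syscalls/syscall_64.tbl and os.listdir("/sys/kernel/debug/tracing/events/syscalls") (the latter usually needs root).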


Nope, not even that, because believe it or not, sometimes not even the .tbl files have all of them :'). In fact, the only arch where IMHO syscalls make sense and are organized in a sane way is arm64, which doesn't even have a .tbl file. And sometimes not even the table in kernel memory is enough! Some are special handlers in syscall entry code (like the one I mentioned above). It's just a mess, hence why I sort of gave up at some point and for some "esoteric" syscall I just hardcode them.


> It's just a mess, hence why I sort of gave up at some point and for some "esoteric" syscall I just hardcode them.

Presuming you don't want to keep doing this forever, but would rather do insane amounts of up-front work if it would enable you to never have to touch this again:

1. Have you considered writing some code that takes a configured + built kernel source tree; finds the intermediate build artifacts pertaining to the code unit that contains the syscall handler; and parses those? And then taking the resulting IR data-structure / AST / whatever, and doing some symbolic interpretation of it — to enable you to essentially do an xpath-like expression match on "does something specific with a concrete syscall number that isn't already in the known set for the arch"? AFAICT you could generate your own syscall table from that, and it would be exhaustive.

2. Have you considered dropping a little bit of driver-program code into the kernel source tree that just "does syscall handling according to the passed-in parameters" (i.e. where the artifact built from compiling this file would be an EFI-app pseudo-unikernel that naively pretends all kernel services were already initialized (they weren't), would do one syscall operation, calling directly into the syscall handler, and then would immediately halt afterward), and then feeding the resulting "executable" to https://github.com/google/AFL ?


Yeah the few "esoteric" syscalls that I hardcode in my tool are historical ones. That's pretty much the only reason why I bothered hardcoding them. I don't assume any new syscall will ever be implemented like that nowadays. Such insane implementations would be rejected unless there is a very specific compelling reason.

> finds the intermediate build artifacts pertaining to the code unit that contains the syscall handler

Hmm I think this is unneeded, vmlinux already has all the code. Also, things move around too much across kernel versions and archs, so you can't easily pinpoint which object files to choose. Additionally, you would need an entire built kernel source tree, which is a lot more than simply a built vmlinux plus an optional non-built kernel source dir (which is what I use right now). Just as an example: currently I have some 600 kernel images with debug info that I keep for reference, which take up around 76 gigabytes on my disk. Having 600 built kernel trees would require a lot more space, on the order of terabytes.

> taking the resulting IR data-structure / AST / whatever, and doing some symbolic interpretation of it

I have been thinking about this a lot. I do a simplified version of this for x86 >= v6.9 because the syscall table was removed and turned into a giant switch case, which I symbolically emulate to extract syscall numbers, but that's pretty simple and definitely not an exhaustive analysis (some other stuff could be happening before reaching the handler). The problem is that this kind of solution is very hard to implement and I think it would be way too slow in the general case. There aren't even decent symbolic execution engines to do this for some archs. You are right when you say "insane amounts of up-front work" - that is definitely too much for me for a hobby project like this :').

The first main problem, however, is that all of this starts from the assumption that you already have built a kernel with all the syscalls available. This is not the case unless you meticulously configure it accordingly, which is not so simple and requires constant manual (sigh) updates to the build configuration each kernel release. There isn't a way to e.g. pretend that "all kernel services were already initialized" as you say in point #2. If a kernel is built w/o a certain syscall, the code will simply not be there. Kernel configuration is also a problem for your point #1. The only real solution I see would be submitting a kernel patch to add a target in the root Makefile that enables all syscalls with their related configs, and hoping kernel devs like it (doubt it).
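To make the configuration problem concrete, here is a minimal Python sketch that reads a .config and reports which syscalls would be compiled out. The SYSCALL_GATES mapping is a small hand-picked sample, not an exhaustive or authoritative list, and the helper names are hypothetical. Keeping such a mapping complete across releases is exactly the manual work described above.

```python
# Hand-picked sample of Kconfig options that gate syscalls (illustrative, incomplete).
SYSCALL_GATES = {
    "CONFIG_BPF_SYSCALL": ["bpf"],
    "CONFIG_IO_URING": ["io_uring_setup", "io_uring_enter", "io_uring_register"],
    "CONFIG_USERFAULTFD": ["userfaultfd"],
    "CONFIG_KEXEC": ["kexec_load"],
    "CONFIG_FANOTIFY": ["fanotify_init", "fanotify_mark"],
}


def enabled_options(config_text):
    """Collect options set to y or m; '# CONFIG_X is not set' lines are ignored."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(key)
    return enabled


def compiled_out_syscalls(config_text, gates=SYSCALL_GATES):
    """Return the sorted syscall names whose gating option is not enabled."""
    enabled = enabled_options(config_text)
    return sorted(
        name
        for option, names in gates.items()
        if option not in enabled
        for name in names
    )
```

For example, a config containing "# CONFIG_IO_URING is not set" would report the three io_uring_* entries as missing from the resulting kernel.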


> Hmm I think this is unneeded, vmlinux already has all the code.

Yeah, I was just thinking about it as a way to reduce the scope of the "preload" step of symbolic interpretation, for the case where you want to work with semi-structured IR (GIMPLE) rather than machine code.

My assumption was that by the time you're down to machine code, you'll still be able to recover the key column of the table — the syscall numbers themselves — but the rest of the data you want to show in the table won't exist any more, having existed only as things like identifier names. So you'd want to back up at least one or two steps.

> This is not the case unless you meticulously configure it accordingly, which is not so simple and requires constant manual (sigh) updates to the build configuration each kernel release.

I was less assuming the possibility of one kernel that has all syscalls, and more assuming that you could build O(N) "probe kernels", one per uarch.

I think the concept of there being "optional syscalls" that only appear if you configure in added capabilities beyond the uarch, didn't even occur to me.

How does that even work, libc-wise? I had assumed that the userland-kernel-ABI expectation was such that the set of syscalls possible to call for a given uarch is static, but with some just being stubbed to always return an error if the given capability isn't in the kernel. But I guess, if the "return an error like a stub" logic is the same as the "this syscall isn't implemented logic", then there needn't be any concrete code in the kernel that calls out those syscall numbers as existing...

If so, maybe consider that a bug? Submit a patch to have an arch's stubbed optional syscalls return a different error than for syscalls that don't exist for that arch, thus forcing such syscalls to be somehow documented in the kernel even when stubbed?

> There isn't a way to e.g. pretend that "all kernel services were already initialized" as you say in point #2.

To be clear, I wasn't talking about compile-time code inclusion; I was talking about runtime, when using the strategy I outlined to compile a subset of the Linux kernel as a "library kernel" / exokernel. The kernel does a lot of stuff on boot — brings up hardware, starts daemons, etc — and you'd want to skip including any of that, if you wanted to throw the code into a fuzzer, because that'd all distract the fuzzer from your goal of fuzzing the syscall handler. So you'd want the executable you built to just call the syscall handler as if it was running in the context of a bootstrapped-and-running kernel — statically declaring all the same static globals, but just never calling the code to initialize any of it. So you'd likely get a program that always crashes with a null dereference — but that doesn't matter, since your goal is to discover through fuzzing the conjunction of value constraints that overdetermines the control-flow to reach one null dereference vs another.


> for the case where you want to work with semi-structured IR (GIMPLE) rather than machine code

Most of the code for syscall handlers is carefully hand-crafted assembly, so probably not GIMPLE. Maybe something like Valgrind's VEX IR. I see what you mean though.

> How does that even work, libc-wise?

It works as you say, the "return an error like a stub" logic is the same as the "this syscall isn't implemented logic". AFAIK libc will provide the wrappers regardless (if there are wrappers, not all syscalls have them) and the kernel will just return -ENOSYS, like it would do for any invalid syscall number.
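A small Python sketch of what this looks like from userland, assuming a Linux system whose libc exports syscall(). The function names and the number 100000 are illustrative assumptions (100000 is presumed to be unassigned). The point is that the caller only ever sees an errno, so a compiled-out syscall and a nonexistent number look the same.

```python
import ctypes
import errno


def probe_syscall(number):
    """Invoke raw syscall `number` with no arguments; return errno, or 0 on success.

    Only meant for numbers believed to be invalid: calling a real syscall
    with no arguments can have side effects.
    """
    libc = ctypes.CDLL(None, use_errno=True)  # assumes libc is already loaded in-process
    libc.syscall.restype = ctypes.c_long
    ctypes.set_errno(0)
    ret = libc.syscall(ctypes.c_long(number))
    return ctypes.get_errno() if ret == -1 else 0


def looks_unavailable(err):
    """ENOSYS is all the caller gets for both 'compiled out' and 'never existed'."""
    return err == errno.ENOSYS


# Example (not run here): looks_unavailable(probe_syscall(100000))
```

Sandboxes with seccomp filters may report a different errno, so treat the result as a hint rather than proof.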

> If so, maybe consider that a bug? Submit a patch to have an arch's stubbed optional syscalls return a different error than for syscalls that don't exist for that arch

I am 99.9% sure that'd be impossible. The "stubbed optional syscalls" return -ENOSYS (as if they did not exist) by design. Although annoying, it's not really a bug, it's the way it's intended to work. I doubt a patch with such an API-breaking change would be accepted, as a lot of existing code relies on this behavior. I don't think there even is an appropriate errno number to return in such a case. It's unfortunate, but it is what it is.

> To be clear, I wasn't talking about compile-time code inclusion; I was talking about runtime

Yeah, it was clear that you meant runtime but less clear what exactly you meant by "all kernel services were already initialized". Now I see what you mean. Yes, what you describe definitely seems doable from a theoretical point of view for some architectures, but I struggle to picture such a solution given its complexity. It would still require manual recognition of interesting source code files and syscall handler code, plus a significant amount of scripting/patching/compiling to get it to work. Not to mention emulation, since this would need to be done for different archs. That's why, even though it'd be nice in theory, in practice it seems like a borderline unapproachable problem to me, from multiple sides.

I appreciate all the input anyway, this is definitely an interesting topic.



