Running an eBPF program may require lifting the kernel lockdown

zokier · on Sept 27, 2019

One of the issues that came to mind is how limited and overloaded the Linux error codes are, and EPERM might be one of the worst offenders. While obviously not expecting Linux to change that, due to compatibility concerns, it'd be wonderful if we had a unique identifier for every code path (or nearly so) that causes an error to be returned. Of course it'd be useful to have some grouping, so that applications that do not care do not need to know every possible error code, but for investigations it'd be useful to know exactly why something failed. For EPERM especially (but also others) there might be some security concerns about leaking more information about why the request was denied, so in some cases some discretion is needed. But I'd be surprised if that is frequently a real concern.

I do wonder how much stuff would break if you patched Linux to return error numbers with the high bits set to autogenerated values. Obviously libc (and Go, if used) would need patching, but how many places call syscalls directly and check error codes carefully? Probably a surprisingly long tail, but it could still be a fun experiment.
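To make the idea concrete, here is a rough sketch of what "high bits set to autogenerated values" could look like. Everything in it is hypothetical: the 12-bit split (chosen because the kernel's `MAX_ERRNO` is 4095), the `site_id` concept, and all function names are assumptions for illustration, not an existing kernel or libc interface.

```python
import errno

# Assumption: the low 12 bits carry the classic errno (MAX_ERRNO is 4095),
# and everything above them is a hypothetical per-code-path identifier.
ERRNO_BITS = 12
ERRNO_MASK = (1 << ERRNO_BITS) - 1


def pack_error(code, site_id):
    """Combine a classic errno with a hypothetical code-path identifier."""
    if not 0 < code <= ERRNO_MASK:
        raise ValueError("errno does not fit in the low bits")
    if site_id < 0:
        raise ValueError("site_id must be non-negative")
    return (site_id << ERRNO_BITS) | code


def unpack_error(value):
    """Split a packed value back into (errno, site_id)."""
    return value & ERRNO_MASK, value >> ERRNO_BITS


def naive_is_eperm(value):
    """What unpatched callers do today: compare the whole number."""
    return value == errno.EPERM


def masked_is_eperm(value):
    """What a patched caller would have to do: mask first, then compare."""
    return (value & ERRNO_MASK) == errno.EPERM
```

The breakage would come from every caller that looks like `naive_is_eperm`: once a non-zero site ID is packed in, the plain comparison no longer matches, while the masked one still does.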

quotemstr · on Sept 28, 2019

If error codes were instead some kind of exception packaged with ancillary data, you could easily stick a kernel stack in the event payload and get the localization you're discussing that way. The ability to add context is what makes error codes lose so badly to exceptions.

cyphar · on Sept 29, 2019

Generating a kernel stacktrace for every syscall error return seems like it would be needlessly wasteful on the return path (not to mention that it would change between kernel releases, and wouldn't be useful to most developers).

A richer error system could be as simple as giving some more information about why a syscall returned -EINVAL (because checking all possible flag bits to see which one is not supported is really not a fun exercise).
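As a minimal sketch of that idea, assuming a made-up syscall with a made-up set of supported flags (`SUPPORTED_FLAGS` and `check_flags` are hypothetical names, not a real kernel API), the extra information could be as small as the offending bits:

```python
import errno

# Hypothetical: the flag bits this imaginary syscall understands.
SUPPORTED_FLAGS = 0b0111


def check_flags(flags):
    """Return (status, detail); detail names the unsupported bits, if any."""
    unknown = flags & ~SUPPORTED_FLAGS
    if unknown:
        return -errno.EINVAL, f"unsupported flag bits: {unknown:#x}"
    return 0, ""
```

With only the bare `-EINVAL`, a caller has to retry with each bit cleared in turn to discover the same thing.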

cyphar · on Sept 29, 2019

> One of the issues that came to mind is how limited and overloaded the Linux error codes are, and EPERM might be one of the worst offenders.

Funnily enough, EPERM is awful for another reason -- the vast majority of EPERM returns for kernel APIs should really be EACCES. The problem is that EPERM reads as though it means "Permission denied" when in reality it means "Operation not permitted".
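If you want to check the two strings yourself, a small sketch using Python's standard `errno` and `os.strerror` is enough (`describe` is just a helper name made up for this example; the exact wording comes from the platform's C library):

```python
import errno
import os


def describe(code):
    """Return (symbolic name, message) for an errno value."""
    return errno.errorcode[code], os.strerror(code)


if __name__ == "__main__":
    for code in (errno.EPERM, errno.EACCES):
        name, message = describe(code)
        print(f"{name}: {message}")
```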

scintill76 · on Sept 28, 2019

Cool idea. I would put the extended error information in a separate variable, like errno.

zokier · on Sept 28, 2019

I happen to hate errno with passion, so that option is out :)

Besides, especially on 64-bit systems, I don't think we are going to run out of negative numbers any time soon. Of course there are a few outliers that need the whole space (mmap), but those should be fairly rare?
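For context on the mmap outlier: as I understand the current convention (the kernel's `IS_ERR_VALUE` check with `MAX_ERRNO` at 4095), only the top 4095 values of the return register count as errors, and everything else is a valid result such as an address. A rough sketch of that check, treating the raw return value as an unsigned 64-bit number (`is_error` and `to_errno` are names made up here):

```python
MAX_ERRNO = 4095      # size of the error window at the top of the range
WORD = 1 << 64        # assumption: 64-bit return register


def is_error(raw):
    """True if an unsigned 64-bit return value falls in the error window."""
    return raw >= WORD - MAX_ERRNO


def to_errno(raw):
    """Recover the positive errno from a raw value inside the error window."""
    if not is_error(raw):
        raise ValueError("not an error return")
    return WORD - raw
```

Widening that window to make room for more error bits is what would eat into the range available to calls like mmap that return pointers.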

theamk · on Sept 28, 2019

I was super surprised by the ability to lift kernel lockdown programmatically, using the sysrq_trigger file. I think it completely defeats the entire point of the lockdown - it is like a safe with a spare key duct-taped to the side: annoying, but useless against any adversary.

The original kernel patch had a facility to disable programmatic lockdown lifting, but this apparently did not make it into the kernel he is using. Hopefully this was intentional, to make this less annoying for users during the testing period.

londons_explore · on Oct 1, 2019

The sysrq subsystem supports permissions - the distro maker could have prevented it if they had wanted to.

btown · on Sept 28, 2019

This is not the first time that security patches have caused eBPF to behave oddly: https://blog.cloudflare.com/ebpf-cant-count/ is an amazing anecdote about how side-channel mitigations + a bug in the BPF verifier caused arithmetic bugs to appear.

I hope that by the time things begin to hit mainline, Cloudflare's engineers will chime in with ideas that allow (e)BPF to continue to run, as they seem to use it widely internally.