
The Seccomp Notifier – New Frontiers in Unprivileged Container Development - zdw
https://people.kernel.org/brauner/the-seccomp-notifier-new-frontiers-in-unprivileged-container-development
======
abaines
The article doesn't mention SECCOMP_RET_TRAP which was an existing way to
inspect syscall pointer arguments during interception (when combined with a
SIGSYS signal handler).

I'm curious how the two approaches compare - does USER_NOTIF give a greater
range of possibilities, or is it mostly just a different interface?

In a small application that forks once and uses seccomp on the child process,
would there be much benefit in moving from RET_TRAP to USER_NOTIF?
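
For anyone who hasn't seen the RET_TRAP pattern, here is a rough C sketch of
what I mean: a filter that returns SECCOMP_RET_TRAP for one syscall, plus a
SIGSYS handler that records which syscall was hit. The names on_sigsys,
build_trap_filter, install_trap and last_trapped_nr are made up for
illustration, and the filter skips the seccomp_data.arch check that a real one
would need.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Last syscall number seen by the SIGSYS handler (illustrative global). */
static volatile sig_atomic_t last_trapped_nr = -1;

/* SIGSYS handler: the kernel reports the trapped syscall in si_syscall. */
static void on_sigsys(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext; /* the saved registers, and so the arguments, are here */
    last_trapped_nr = info->si_syscall;
}

/* Fill `out` with a 4-instruction filter: trap syscall `nr`, allow the rest.
 * Assumption: a single-architecture process, so no seccomp_data.arch check. */
void build_trap_filter(unsigned int nr, struct sock_filter out[4])
{
    const struct sock_filter prog[4] = {
        /* Load the syscall number from struct seccomp_data. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* If it equals `nr`, fall through to TRAP, else skip to ALLOW. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };

    assert(out != NULL);
    memcpy(out, prog, sizeof(prog));
}

/* Install the handler and the filter for the calling thread; 0 on success. */
int install_trap(unsigned int nr)
{
    struct sock_filter insns[4];
    struct sock_fprog fprog = { .len = 4, .filter = insns };
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigsys;
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGSYS, &sa, NULL) != 0)
        return -1;

    build_trap_filter(nr, insns);
    /* Needed so an unprivileged task may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog);
}
```

In the fork-once case, the child would call install_trap() with the syscall
number it cares about before running the sandboxed code; everything, including
the handler, then lives inside the filtered process.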

~~~
sargun
Ptrace comes with some pretty big restrictions. For one, no process can be
ptraced by two processes at the same time.

User notify aims to be a safer way to do this that doesn’t require the
overhead and complexity of ptrace.

~~~
brauner
In addition, the trap isn't safely usable with shared libraries because of
signals. For example, once glibc adopts rseq it will block all signals
during thread creation, making it impossible to use RET_TRAP. That's an issue
that Firefox/Chromium have already run into and is one of the reasons why they
are interested in switching to the seccomp notifier. The trap also doesn't
allow syscalls to be continued nicely and - as Sargun pointed out - will require
ptrace() to be used to inspect syscall arguments and so on. The notifier also
has built-in protection against pid recycling, is more secure and is way more
efficient.
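
To make the comparison concrete, here is a minimal C sketch of the notifier
side, under a few assumptions: the function names (build_notify_filter,
install_notify_filter, fill_deny_response, handle_one) are invented for this
example, the filter again omits the arch check, and the fixed-size structs are
used directly where production code would first query their sizes with
SECCOMP_GET_NOTIF_SIZES. The supervisor here simply fails the syscall with an
errno; it does not show argument inspection.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Fill `out` with a filter that notifies on syscall `nr`, allows the rest. */
void build_notify_filter(unsigned int nr, struct sock_filter out[4])
{
    const struct sock_filter prog[4] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };

    assert(out != NULL);
    memcpy(out, prog, sizeof(prog));
}

/* Runs in the task to be supervised. Returns the listener fd, or -1.
 * The fd must then be handed to the supervisor (e.g. over a unix socket). */
int install_notify_filter(unsigned int nr)
{
    struct sock_filter insns[4];
    struct sock_fprog fprog = { .len = 4, .filter = insns };

    build_notify_filter(nr, insns);
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return (int)syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER,
                        SECCOMP_FILTER_FLAG_NEW_LISTENER, &fprog);
}

/* Build a reply that makes the intercepted syscall fail with `err`. */
void fill_deny_response(const struct seccomp_notif *req,
                        struct seccomp_notif_resp *resp, int err)
{
    memset(resp, 0, sizeof(*resp));
    resp->id = req->id;   /* the reply must echo the notification id */
    resp->error = -err;   /* the kernel expects a negative errno here */
}

/* Runs in the supervisor: handle one notification; 0 on success. */
int handle_one(int listener, int err)
{
    struct seccomp_notif req;
    struct seccomp_notif_resp resp;

    memset(&req, 0, sizeof(req)); /* the receive buffer must be zeroed */
    if (ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req) != 0)
        return -1;

    /* Pid-recycling guard: confirm the notification is still live. A
     * supervisor that reads the target's memory through /proc/<pid>/mem
     * would repeat this check after opening that file. */
    if (ioctl(listener, SECCOMP_IOCTL_NOTIF_ID_VALID, &req.id) != 0)
        return -1;

    fill_deny_response(&req, &resp, err);
    return ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp);
}
```

Note that no signal is involved anywhere: the supervised task just blocks in
the syscall until the supervisor answers on the listener fd.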

This is somewhat unrelated to the seccomp notifier but since ptrace() came up
I want to say a few words about it. (And this is more a criticism of the
interface not the implementation. I love Oleg who maintains it and is one of
the few people who understand all its intricacies!)

As a rule of thumb: you can do almost anything with ptrace(). Which is why
people not really putting an effort into kernel patch reviews often come up
with the argument "Why do you need a separate API for that? You can already do
that with ptrace()." To which the correct answer in my book almost always is:
"Because it would be a horrible hack." Effectively, introducing a dedicated
API for something that you can in some shape or form do with ptrace() is the
equivalent of moving it from a debugging hack to a (hopefully well-designed)
feature.

Hell, the history of CRIU is essentially the history of building APIs out of
ptrace() hacks (I'm being facetious, of course).

Imho, with ptrace() you're always in non-cooperative mode to some extent, i.e.
you force the behavior on the task. The whole kernel code for ptrace() attach
is literally "I'm your parent now." whereas features such as the notifier are
almost always cooperative since the task itself is doing the work.
Specifically for the notifier the nice thing is that all the work is happening
in the task itself. This is especially relevant when you e.g. install file
descriptors into the task which is a future patchset that is about to be
merged.

