
VDSO, 32-bit time, and seccomp - Tomte
https://lwn.net/SubscriberLink/795128/6b8ce4fe54123148/
======
temac
This class of problem with seccomp has been known for ages. It makes seccomp
nearly unusable IMO (or reserved for fully contained binaries, maybe in some
JIT scenario, but probably not much else), and arguably even in contradiction
with the kernel's no-regression rule: by design, seccomp cannot uphold that
rule in the general case, at least when you consider the bigger picture of a
classic userspace design and ecosystem, as in a mainstream standard distro.
In more detail: under this "standard" model, applications rarely make
syscalls themselves. They use intermediate libraries (e.g. glibc) that provide
a (quasi-)POSIX interface and reserve some of the syscalls, or even provide
syscall-like functions that actually use other syscalls, or simply start using
new syscalls in new versions to provide higher levels of abstraction. But
they do not provide another abstraction for obtaining a seccomp-like service
suitable for this model. So seccomp is basically unusable in this context,
which is one of the most important.
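
To make the failure mode concrete, here is a minimal Python sketch. The
allow-list and both syscall traces are hypothetical, chosen only to illustrate
a library switching from one syscall to another between versions; nothing here
installs a real filter.

```python
# Hypothetical allow-list written against the "old" library behaviour.
ALLOWED = frozenset({"read", "write", "open", "close", "exit_group"})

# Hypothetical syscall traces for the same open/read/close sequence in the
# application, as emitted by two versions of an intermediate library.
TRACE_OLD_LIBRARY = ["open", "read", "close"]
TRACE_NEW_LIBRARY = ["openat", "read", "close"]


def first_violation(trace, allowed):
    """Return the first syscall name not in `allowed`, or None if all pass."""
    for name in trace:
        if name not in allowed:
            return name
    return None


if __name__ == "__main__":
    print("old library:", first_violation(TRACE_OLD_LIBRARY, ALLOWED))
    print("new library:", first_violation(TRACE_NEW_LIBRARY, ALLOWED))
```

The application code is identical in both cases; only the library underneath
changed, and the filter author had no abstraction to express "whatever the
library needs in order to open a file".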

Now the kernel itself made the mistake, proving that the whole idea was not
practical; it breaks too easily, and it breaks even when used in most
restricted ways or when the whole (non-kernel provided) userspace has been
designed for it (which was not a practical condition at all, to begin with).

IMO seccomp should be phased out entirely and eventually replaced by something
else. Trying to "fix" it will lead nowhere: it is broken by design, and always
has been.

~~~
nwmcsween
The other issues: a parent cannot seccomp itself without being tightly coupled
to the child (filters persist across exec), and differing syscall numbers
across architectures, socketcall, and so on make it almost mandatory to use
some tool to build the filter.
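
A rough illustration of why a filter builder is needed: the same syscall name
maps to different numbers per architecture, and on i386 socket operations have
traditionally been multiplexed through socketcall. The numbers below are a
small hand-copied subset of the x86_64 and i386 tables; `resolve` and the
table names are made up for this sketch, and real code would rely on a library
such as libseccomp rather than a table like this.

```python
# Subset of syscall numbers per architecture (hand-copied, illustration only).
SYSCALL_NUMBERS = {
    "x86_64": {"read": 0, "write": 1, "close": 3, "socket": 41, "connect": 42},
    "i386": {"read": 3, "write": 4, "close": 6, "socketcall": 102},
}

# Operations that i386 has traditionally routed through socketcall(2).
SOCKETCALL_MULTIPLEXED = {"socket", "connect"}


def resolve(arch, names):
    """Translate syscall names into the numbers a filter for `arch` must allow."""
    table = SYSCALL_NUMBERS[arch]
    numbers = set()
    for name in names:
        if name in table:
            numbers.add(table[name])
        elif name in SOCKETCALL_MULTIPLEXED and "socketcall" in table:
            # Allowing socketcall by number alone admits every multiplexed
            # socket operation, not just the one that was asked for.
            numbers.add(table["socketcall"])
        else:
            raise KeyError(f"{name} is not in the {arch} table of this sketch")
    return numbers


if __name__ == "__main__":
    for arch in SYSCALL_NUMBERS:
        print(arch, sorted(resolve(arch, ["write", "connect"])))
```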

But there isn't really anything that Linux can replace it with. pledge works
because OpenBSD controls both libc and the kernel.

~~~
viraptor
> Parent cannot seccomp itself without being highly coupled to the child

You can sometimes work around this, though. Unless you actually need fork with
a copy of the current memory (i.e. as long as you only care about spawning new
processes), you can fork a "spawner" process early that is only a thin proxy
for pipe->exec commands. You apply seccomp once the spawner is ready and
you're all good.
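
A minimal sketch of that spawner idea in Python, assuming a JSON-lines protocol
over two pipes (my choice for the example, not anything standard).
`start_spawner` and `spawn` are hypothetical names, and the seccomp filter
itself is not installed here; the comment in the usage note marks where the
parent would do it.

```python
import json
import os
import subprocess
import sys


def start_spawner():
    """Fork a helper early; it runs commands it receives over a pipe.

    Returns (pid, requests, replies) for the parent side.
    """
    req_r, req_w = os.pipe()
    rep_r, rep_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the spawner. It was forked before any filter existed, so
        # the programs it executes do not inherit the parent's later filter.
        status = 1
        try:
            os.close(req_w)
            os.close(rep_r)
            with os.fdopen(req_r, "r") as requests, os.fdopen(rep_w, "w") as replies:
                for line in requests:  # one JSON-encoded argv per line
                    returncode = subprocess.call(json.loads(line))
                    replies.write(f"{returncode}\n")
                    replies.flush()
            status = 0
        except Exception as exc:
            print(f"spawner failed: {exc}", file=sys.stderr)
        finally:
            os._exit(status)
    # Parent: keep only its own ends of the two pipes.
    os.close(req_r)
    os.close(rep_w)
    return pid, os.fdopen(req_w, "w"), os.fdopen(rep_r, "r")


def spawn(requests, replies, argv):
    """Ask the spawner to run `argv` and return the command's exit status."""
    requests.write(json.dumps(argv) + "\n")
    requests.flush()
    return int(replies.readline())
```

The parent would call `start_spawner()` first, then install its seccomp filter
(allowing only read/write on the pipe descriptors, among whatever else it
needs), and from then on call `spawn(requests, replies, [...])` instead of
fork/exec. Closing `requests` makes the spawner exit.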

------
pdw
seccomp is such a sad story. In theory seccomp can do a lot more than
OpenBSD's pledge. In practice, the OpenBSD devs added pledge support with
comparatively little effort to 100s of programs, while seccomp is a constant
headache for the few programs which use it.

~~~
the8472
AFAIK it's not even possible to implement a pledge-like wrapper around seccomp
because seccomp is inherited while pledge leaves the responsibility of
securing itself to each process.

~~~
X-Istence
> Allows a process to call execve(2). Coupled with the proc promise, this
> allows a process to fork and execute another program. If execpromises has
> been previously set the new program begins with those promises, unless
> setuid/setgid bits are set in which case execution is blocked with EACCES.
> Otherwise the new program starts running without pledge active, and
> hopefully makes a new pledge soon.

[https://man.openbsd.org/pledge](https://man.openbsd.org/pledge)

Pledge can be inherited by child processes too.

------
stefan_
You could make a quiz based on "what syscalls does this familiar libc API end
up calling". SELinux and seccomp are dead ends.

~~~
debatem1
Yet somehow we run SELinux successfully on every Android device. I've never
understood the antipathy towards SELinux (which, by the way, doesn't operate
at syscall granularity).

~~~
d2mw
One reason to dislike SELinux is the arbitrary assumptions baked in by default
that barely anyone knows how to change or cares to change, which makes
carefully designed software look broken when the framework itself is broken.
One example I've faced is UNIX pipes, which for all practical purposes were
obsoleted by UNIX domain sockets in the mid-80s as part of the original intent
of the BSD socket API, but to SELinux they are profoundly different things.
You can pass a pipe across a user->root boundary but not a socket.

End users only see your code broken by SELinux, and assume you haven't done
your job. Meanwhile, SELinux is codifying rules about UNIX that never existed
and that amount to emotional heuristics about the risk of Internet domain
sockets being inherited around the system by the wrong process. The effect
isn't to prevent Internet domain sockets from being inherited by privileged
processes, but to break all sockets. That's garbage design, hidden behind
marketing suggesting that, because the NSA contributed some code, the problem
couldn't possibly be SELinux.

~~~
debatem1
But unix domain sockets are not pipes, not least in the important way that you
can reliably determine the identity of the other end. That can in some cases
be an infoleak, and therefore needs independent access control.

~~~
d2mw
If SCM_CREDENTIALS passing were a real problem in _any_ scenario, SELinux
should target that instead, not an entire subsystem on which half the system
is built.
~~~
debatem1
The point is you need the ability to express both "these are the same thing"
and "these are different". You can do both with SELinux. What's your
alternative?

------
tyingq
MySQL also still has Y2038 issues, even on 64-bit machines.
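
For reference, the arithmetic behind the 2038 limit, as a small Python sketch.
It only models a signed 32-bit seconds counter; it says nothing about how
MySQL stores its values, and the helper names are invented here.

```python
from datetime import datetime, timedelta, timezone

INT32_MAX = 2**31 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_time32(seconds):
    """Interpret `seconds` as an offset from the Unix epoch, in UTC."""
    return EPOCH + timedelta(seconds=seconds)


def wrap_int32(value):
    """Reduce an integer to the signed 32-bit range, as a wrapping counter would."""
    return (value + 2**31) % 2**32 - 2**31


if __name__ == "__main__":
    print("last representable second:", from_time32(INT32_MAX).isoformat())
    wrapped = wrap_int32(INT32_MAX + 1)
    print("one second later wraps to:", wrapped)
    print("which reads back as:      ", from_time32(wrapped).isoformat())
```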

