
Thanks, that's exactly what I meant. Why was it removed? I did notice pledge(2) first appeared in 5.9 [1]

How do other Unices such as Linux deal with this issue (IIRC systrace was ported to other Unices)? Is Pledge ported to other BSDs?

[1] http://man.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man2/...




1) Like with Linux seccomp, systrace couldn't safely work with path strings because of TOCTTOU races. That made it much less appealing. Seccomp never permitted filtering path strings precisely because of the published systrace exploits.

2) To address #1, there's no avoiding intrusive changes deep in the kernel (e.g. into the dentry system, as opposed to a thin layer over syscalls). You need to copy path strings into kernel buffers to avoid the race. Once you accept that it's not possible to have a clean interface boundary in the kernel if you want to keep userland simple, offloading certain core concerns to the kernel makes more sense.

3) systrace, like SELinux, required external configuration policies for each program. But they were never kept up-to-date, and there were few workable policies. For the most part the only people who understand a program--especially a complex program--well enough to know what privileges are needed are the developers themselves. An external privilege declaration policy is theoretically powerful but in practice it merely creates burdensome indirection and friction for the people best positioned to create and maintain the policy: developers. Moreover, if the policy is optional both users and developers will opt out, particularly during the critical development phase. If a feature doesn't fit neatly into the edit+compile+debug cycle then it's arguably broken by design, particularly for security-oriented features.

4) When doing privilege separation more often than not you _want_ privileges for a certain period of time before dropping them. The classic example is the privilege to open a port < 1024. Policy files that, e.g., let you open specific ports are actually too permissive because you don't want a process to be able to open a port for its entire lifetime. Solutions like inetd/systemd where you simply pass in privileged resources only work for the simplest cases, such as the aforementioned privileged port problem; they don't solve the problem of, e.g., privilege-separating access to the credential database that a program like sshd needs to handle (this is worse on PAM-based systems like Linux as BSDAuth uses groups and external helpers to privilege separate by design). Better but more intrusive interfaces like Capsicum that revolve around runtime privilege sharing have too high of an entry cost to spur widespread adoption. Some of the more complex OpenBSD daemons are obvious fits for Capsicum, but you can't ignore path dependency--they'd have never been ripe for easy Capsicum integration if these simpler mechanisms weren't available.
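To make the race in 1) and 2) concrete, here is a minimal user-space sketch in C. It is not systrace or kernel code: `policy_allows`, `racy_open`, `safe_open` and `swap_to_shadow` are hypothetical names, and the "second thread" is simulated by a callback that rewrites the caller's buffer between the check and the use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical policy: exactly one path may be opened. */
bool policy_allows(const char *path) {
    return strcmp(path, "/etc/resolv.conf") == 0;
}

/* Stand-in for another thread that rewrites the caller's path buffer. */
typedef void (*racer_fn)(char *user_buf);

/* Racy: the check reads user_buf, and the "use" reads user_buf again
 * after the racer has had a chance to change it. */
const char *racy_open(char *user_buf, racer_fn racer, char *opened, size_t n) {
    assert(n > 0);
    if (!policy_allows(user_buf))
        return NULL;
    racer(user_buf);                      /* window between check and use */
    snprintf(opened, n, "%s", user_buf);  /* uses whatever is there now */
    return opened;
}

/* Safe: copy the path once into a private buffer, then check and use
 * only that copy. Later changes to user_buf no longer matter. */
const char *safe_open(char *user_buf, racer_fn racer, char *opened, size_t n) {
    assert(n > 0);
    snprintf(opened, n, "%s", user_buf);  /* single copy-in */
    if (!policy_allows(opened))
        return NULL;
    racer(user_buf);                      /* too late to affect the result */
    return opened;
}

/* Simulated attacker: swaps in a path the policy would have rejected.
 * Assumes user_buf has room for the replacement string. */
void swap_to_shadow(char *user_buf) {
    strcpy(user_buf, "/etc/shadow");
}
```

The private copy in `safe_open` is the user-space analogue of copying path strings into kernel buffers, which is why a thin layer over the syscall boundary can't fix this on its own.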

pledge() was designed to address all of these problems. The "dns" privilege, for example, implicitly permits read access to certain filesystem paths. The intrusive kernel changes are built into the kernel as needed, maximizing overall simplicity by splitting the burden between kernel logic and userland logic.
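As a rough illustration of the "hold a privilege for a while, then drop it" pattern from 4), here is a small C sketch. `narrow_promises` and `startup` are hypothetical helper names, not anything from the OpenBSD tree; the only real API used is `pledge(promises, execpromises)` with the "stdio" and "rpath" promises. On systems without pledge() the helper is a no-op, so the sketch still compiles elsewhere.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if promises were narrowed, 0 if pledge() is unavailable
 * on this platform, -1 on error. */
int narrow_promises(const char *promises) {
#ifdef __OpenBSD__
    return pledge(promises, NULL) == -1 ? -1 : 1;
#else
    (void)promises;  /* assumption: silently skip on non-OpenBSD systems */
    return 0;
#endif
}

/* Hypothetical startup: read access is needed only long enough to open
 * the config file, then it is given up for the rest of the process. */
int startup(const char *config_path, FILE **cfg_out) {
    assert(cfg_out != NULL);

    /* Phase 1: stdio plus read-only filesystem access. */
    if (narrow_promises("stdio rpath") == -1)
        return -1;

    *cfg_out = fopen(config_path, "r");
    if (*cfg_out == NULL)
        return -1;

    /* Phase 2: drop rpath. The already-open FILE stays usable, but
     * opening further paths is no longer permitted. */
    if (narrow_promises("stdio") == -1)
        return -1;

    return 0;
}
```

The policy lives in the program, right next to the code that needs the privilege, which is the point made in 3): it goes through the same edit+compile+debug cycle as everything else.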

pledge() handled very simple path handling well (e.g. permit /dev/null, /etc/resolv.conf, etc) but a system for specifying generic filesystem namespace privileges was elusive. Elusive because OpenBSD wants, again, to maximize simplicity and utility while minimizing overall complexity. Apparently unveil() represents the semantic and implementation compromises they've been searching for through trial+error.
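For the filesystem side, a minimal sketch of what an unveil() call sequence can look like. `expose_only` is a hypothetical wrapper; the real calls are `unveil(path, permissions)` and the final `unveil(NULL, NULL)`, which disallows further unveil() calls. As above, it does nothing on systems that lack unveil().

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Restrict the process's view of the filesystem to a single path.
 * Returns 1 on success, 0 if unveil() is unavailable, -1 on error. */
int expose_only(const char *path, const char *perms) {
    assert(path != NULL && perms != NULL);
#ifdef __OpenBSD__
    if (unveil(path, perms) == -1)   /* e.g. perms = "r" for read-only */
        return -1;
    if (unveil(NULL, NULL) == -1)    /* lock: no further unveil() calls */
        return -1;
    return 1;
#else
    (void)path;
    (void)perms;                     /* assumption: no-op elsewhere */
    return 0;
#endif
}
```

A caller might use `expose_only("/etc", "r")` early in main(), before touching any untrusted input.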

Compared to SELinux or seccomp, pledge has been a wild success. Both SELinux and seccomp are much more complex, while being simultaneously more burdensome on developers. In some respects they're more powerful, but often more powerful in the least useful ways. In other respects pledge is more powerful featurewise, and exactly in the ways you'd want. Very few projects or systems utilize SELinux or seccomp. That's particularly true of seccomp, as Red Hat put a ton of effort into SELinux for the better, yet still meager, usage it's seen. Much more common is to see frameworks like systemd utilize the privilege systems on behalf of programs. But having systemd manage these things limits the extent to which daemons can make use of fine-grained privilege separation. And doing so makes all the complexity of SELinux and seccomp almost for naught.

By contrast, the OpenBSD developers have "pledge'd" vast swaths of the OpenBSD software ecosystem. Not just daemons but most command-line utilities. You could argue that it was easier for them, but that doesn't explain why Red Hat with considerably more people and money, and decades longer lead time, had such ridiculously meager results with SELinux and seccomp.

Another thing to keep in mind is that pledge() and unveil() aren't the only privilege separation changes OpenBSD has made. Another one is that, along the lines of getentropy(), they made syslog() a syscall so that you don't even need to worry about filesystem privileges for the common cases. Everything is a file is a powerful construct, but for various reasons it's _intrinsically_ handicapped in Unix because of core design elements. Sometimes it's better to simply rethink the best place to put a feature in order to maximize cost+benefit. Leveraging filesystem namespace semantics makes much more sense in Unix than in Windows, but less sense in Unix than in Plan 9. You have to appreciate the nuances and limitations.

pledge() in particular is a very pragmatic, very experience-driven compromise. I honestly don't know if it'd work well for Linux. Linux is a vastly different ecosystem, both in the kernel and in userland. But people would do well to appreciate and understand why pledge() works as well as it does.


From what I understand it was no longer maintained, and it was also hard to keep up-to-date lists of allowed or forbidden system calls for the binaries that had to be run under systrace. [1] [2]

Linux provides seccomp-bpf for system call restrictions. [3]

[1] https://marc.info/?l=openbsd-misc&m=146170224108205&w=2

[2] http://www.openbsd.org/papers/hackfest2015-pledge/mgp00009.h...

[3] https://lwn.net/Articles/656307/


> How do other Unices such as Linux deal with this issue (IIRC systrace was ported to other Unices)? Is Pledge ported to other BSDs?

I don't believe pledge has been ported to other BSD operating systems. FreeBSD, for example, uses its own framework, Capsicum [1], instead of pledge.

[1]: https://www.freebsd.org/cgi/man.cgi?capsicum(4)


Systrace was removed because TOCTOU races make it unsafe for multi-threaded programs.


The race only existed for path strings. systrace was no worse than seccomp in this respect (which doesn't even permit filtering on paths precisely because of the systrace exploit), yet still much easier to use.

systrace was removed because it went largely unused. Theoretically powerful, in practice it made the wrong compromises. seccomp recapitulated the same compromises, and it's not surprising seccomp uptake has been similarly weak.


TOCTOU = time-of-check to time-of-use



