Pledge and Unveil in OpenBSD [pdf] (openbsd.org)
125 points by gshrikant on June 10, 2018 | 34 comments


Given the Chrome example starting on page 6, here's my guess as to how pledge and unveil will contain Chrome to e.g. protect SSH keys. First, 3 of the 5 Chrome processes are already pledged to disallow filesystem reads. The two remaining ones (RenderProcess and UtilityProcess) can be unveiled to allow directories like

  * ~/.config/chromium
  * ~/.cache/chromium
  * ~/Downloads
  * /tmp
  * and anything important I don't know of
Additionally, if unveil works like pledge and can be further restricted after e.g. reading files into memory, unveils can then be undone. Anyone know if the following would work to first allow access to /tmp and then revoke that access?

  unveil("/tmp", "rw");
  /* do some work */
  unveil("/tmp", "");

Indeed! The full unveil semantics aren't known yet, may be worth proposing! But for the specific case of /tmp, there is already a tmppath promise.

I wouldn't expect your /tmp example to work, because if unveil is anything like pledge, additional calls can only add restrictions, not take them away.

Shouldn't it be veil()?

The idea is that everything is "veiled" and you "unveil" the stuff you need access to.

Yes, but the weird thing is that before you call unveil, everything is already unveiled, which is not in sync with the dictionary definition of unveiling. I hope a better name is found.

It's unveiling in the context of pledge, as you explicitly declare the things to be unveiled.

Your comment made me try to look up the exact semantics of unveil(2). It was a bit hard to find (I could only find something on its predecessor pledgepath), but apparently, unveil doesn't take effect immediately, but only at an invocation of pledge(2) (which usually follows immediately after it). That was not at all clear from TFA.

Honest question, why is this downvoted?

The PDF has no introduction section, seems to be aimed at people who already know what it's talking about. Can anyone shed some light on what is the idea here? I honestly don't understand what's going on, apart from that it seems to be some security-related feature (or actually two of them?)

pledge(2) on OpenBSD is used to drop the privileges of a process. Processes are meant to call pledge(2) to drop their own privileges. The way pledge(2) works is that the system calls of that process get limited. If a process calls a system call outside the allowed range after calling pledge(2), it gets killed. Starting with OpenBSD 6.3, it is also possible to configure pledge to make the kernel return ENOSYS instead of killing the process when violating the pledge.

For example:

    /* Only the system calls required
     * for the standard I/O library and
     * for accessing /dev/tty are allowed
     * by the kernel from this point on.
     */
    pledge("stdio tty", NULL);
The second argument is the execpromises, i.e., the pledges enforced for child processes. This does not need to be specified if you pledge in a way that does not include any way of spawning a new process.

What's new in the slides linked is unveil(2). This seems to be used to limit the exact paths a process can access and with what access flags (rwxc).
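A rough sketch of what that could look like, with the caveat that the slides don't pin down the full semantics. It assumes the two-argument form, a path plus a permission string drawn from rwxc, and a final call with two NULLs to forbid further changes, as in the unveil(2) interface that later shipped. expose_path and lock_view are hypothetical wrapper names, and outside OpenBSD they are no-ops so the file still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical wrapper: make `path` visible with the given permissions
 * ("r", "w", "x", "c" or a combination). Calls unveil(2) on OpenBSD and
 * does nothing elsewhere. Returns 0 on success, -1 on error. */
static int expose_path(const char *path, const char *perms)
{
    assert(path != NULL && perms != NULL);
#ifdef __OpenBSD__
    return unveil(path, perms);
#else
    return 0; /* assumption: no equivalent call available here */
#endif
}

/* Hypothetical wrapper: forbid any further unveil calls. */
static int lock_view(void)
{
#ifdef __OpenBSD__
    return unveil(NULL, NULL);
#else
    return 0;
#endif
}
```

A browser-like process might call expose_path("/tmp", "rwc") and expose_path for its download directory, then lock_view(), before touching any untrusted input.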

Wasn't there some kind of equiv in OpenBSD long ago, by Niels Provos? Or was it the Stephanie patch? [1] At the bottom they mention Qmail which was immediately my first thought as an example as well, since it breaks up MTA tasks to different daemons.

[1] http://packetfactory.openwall.net/projects/stephanie/index.h...

I think you're referring to systrace. It was removed in OpenBSD 6.0.


Thanks, that's exactly what I meant. Why was it removed? I did notice pledge(2) first appeared in 5.9 [1]

How do other Unices such as Linux deal with this issue (IIRC systrace was ported to other Unices)? Is Pledge ported to other BSDs?

[1] http://man.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man2/...

1) Like with Linux seccomp, systrace couldn't safely work with path strings because of TOCTTOU races. That made it much less appealing. Seccomp never permitted filtering path strings precisely because of the published systrace exploits.

2) To address #1, there's no avoiding intrusive changes deep in the kernel (e.g. into the dentry system, as opposed to a thin layer over syscalls). You need to copy path strings into kernel buffers to avoid the race. Once you accept that it's not possible to have a clean interface boundary in the kernel if you want to keep userland simple, offloading certain core concerns to the kernel makes more sense.

3) systrace, like SELinux, required external configuration policies for each program. But they were never kept up-to-date, and there were few workable policies. For the most part the only people who understand a program--especially a complex program--well enough to know what privileges are needed are the developers themselves. An external privilege declaration policy is theoretically powerful but in practice it merely creates burdensome indirection and friction for the people best positioned to create and maintain the policy: developers. Moreover, if the policy is optional both users and developers will opt out, particularly during the critical development phase. If a feature doesn't fit neatly into the edit+compile+debug cycle then it's arguably broken by design, particularly for security-oriented features.

4) When doing privilege separation more often than not you _want_ privileges for a certain period of time before dropping them. The classic example is the privilege to open a port < 1024. Policy files that, e.g., let you open specific ports are actually too permissive because you don't want a process to be able to open a port for its entire lifetime. Solutions like inetd/systemd where you simply pass in privileged resources only work for the simplest cases, such as the aforementioned privileged port problem; they don't solve the problem of, e.g., privilege-separating access to the credential database that a program like sshd needs to handle (this is worse on PAM-based systems like Linux as BSDAuth uses groups and external helpers to privilege separate by design). Better but more intrusive interfaces like Capsicum that revolve around runtime privilege sharing have too high of an entry cost to spur widespread adoption. Some of the more complex OpenBSD daemons are obvious fits for Capsicum, but you can't ignore path dependency--they'd have never been ripe for easy Capsicum integration if these simpler mechanisms weren't available.

pledge() was designed to address all of these problems. The "dns" privilege, for example, implicitly permits read access to certain filesystem paths. The intrusive kernel changes are built into the kernel as needed, maximizing overall simplicity by splitting the burden between kernel logic and userland logic.

pledge() handled very simple path handling well (e.g. permit /dev/null, /etc/resolv.conf, etc) but a system for specifying generic filesystem namespace privileges was elusive. Elusive because OpenBSD wants, again, to maximize simplicity and utility while minimizing overall complexity. Apparently unveil() represents the semantic and implementation compromises they've been searching for through trial+error.

Compared to SELinux or seccomp, pledge has been a wild success. Both SELinux and seccomp are much more complex, while being simultaneously more burdensome on developers. In some respects they're more powerful, but often more powerful in the least useful ways. In other respects pledge is more powerful featurewise, and exactly in the ways you'd want. Very few projects or systems utilize SELinux or seccomp. Particularly seccomp, as Red Hat put a ton of effort into SELinux for what better, yet still meager, usage it's seen. Much more common is to see frameworks like systemd utilize the privilege systems on behalf of programs. But having systemd manage these things limits the extent to which daemons can make use of fine-grained privilege separation. And doing so makes all the complexity of SELinux and seccomp almost for naught.

By contrast, the OpenBSD developers have "pledge'd" vast swaths of the OpenBSD software ecosystem. Not just daemons but most command-line utilities. You could argue that it was easier for them, but that doesn't explain why Red Hat with considerably more people and money, and decades longer lead time, had such ridiculously meager results with SELinux and seccomp.

Another thing to keep in mind is that pledge() and unveil() aren't the only privilege separation changes OpenBSD has made. Another one is that, along the lines of getentropy(), they made syslog() a syscall so that you don't even need to worry about filesystem privileges for the common cases. Everything is a file is a powerful construct, but for various reasons it's _intrinsically_ handicapped in Unix because of core design elements. Sometimes it's better to simply rethink the best place to put a feature in order to maximize cost+benefit. Leveraging filesystem namespace semantics makes much more sense in Unix than in Windows, but less sense in Unix than in Plan 9. You have to appreciate the nuances and limitations.

pledge() in particular is a very pragmatic, very experience-driven compromise. I honestly don't know if it'd work well for Linux. Linux is a vastly different ecosystem, both in the kernel and in userland. But people would do well to appreciate and understand why pledge() works as well as it does.

From what I understand it was not maintained anymore, and it was also hard to keep up to date lists of allowed or forbidden system calls for the binaries that had to be run under systrace. [1] [2]

Linux provides seccomp-bpf for system call restrictions. [3]

[1] https://marc.info/?l=openbsd-misc&m=146170224108205&w=2

[2] http://www.openbsd.org/papers/hackfest2015-pledge/mgp00009.h...

[3] https://lwn.net/Articles/656307/

> How do other Unices such as Linux deal with this issue (IIRC systrace was ported to other Unices)? Is Pledge ported to other BSDs?

I don't believe pledge has been ported to other BSD operating systems. FreeBSD, for example, uses its own framework, Capsicum[1], instead of pledge.

[1]: https://www.freebsd.org/cgi/man.cgi?capsicum(4)

Systrace was removed because it's unsafe for multi-threaded programs because of TOCTOU.

The race only existed for path strings. systrace was no worse than seccomp in this respect (which doesn't even permit filtering on paths precisely because of the systrace exploit), yet still much easier to use.

systrace was removed because it went largely unused. Theoretically powerful, in practice it made the wrong compromises. seccomp recapitulated the same compromises, and it's not surprising seccomp uptake has been similarly weak.

TOCTOU = time-of-check to time-of-use
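For anyone who hasn't run into it, here is a small userland simulation of the path-string race, not the actual systrace code. The "second thread" is modeled as a callback that runs between the check and the use so the outcome is deterministic. policy_allows, swap_path, racy_check and snapshot_check are all made-up names, and the /tmp/ policy is just an assumption for the demo.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PATH_BUF 256

/* Hypothetical policy: only paths under /tmp/ may be opened. */
static int policy_allows(const char *path)
{
    return strncmp(path, "/tmp/", 5) == 0;
}

/* Stand-in for another thread rewriting the caller's buffer. */
typedef void (*interloper_fn)(char *shared, size_t size);

static void swap_path(char *shared, size_t size)
{
    snprintf(shared, size, "%s", "/etc/master.passwd");
}

/* Racy: checks the shared buffer, then reads it again at use time.
 * Stores the path that would actually be opened in `used`. */
static int racy_check(char *shared, size_t size, interloper_fn between,
                      char *used, size_t usedsz)
{
    assert(shared != NULL && used != NULL);
    if (!policy_allows(shared))
        return 0;
    if (between != NULL)
        between(shared, size);      /* buffer changes after the check */
    snprintf(used, usedsz, "%s", shared);
    return 1;
}

/* Safe: copies the path once, then checks and uses only the copy. */
static int snapshot_check(char *shared, size_t size, interloper_fn between,
                          char *used, size_t usedsz)
{
    char copy[PATH_BUF];

    assert(shared != NULL && used != NULL);
    snprintf(copy, sizeof copy, "%s", shared);
    if (!policy_allows(copy))
        return 0;
    if (between != NULL)
        between(shared, size);      /* too late, the copy is private */
    snprintf(used, usedsz, "%s", copy);
    return 1;
}
```

Both functions approve the request, but only the racy one ends up using the swapped path. The snapshot version is the userland analogue of copying the string into a kernel buffer before checking it.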

pledge is seccomp

Not at all, for example you can't implement the ratcheting down semantics of pledge() using seccomp. Say starting with a broader promise set "stdio rpath recvfd", and then dropping to "stdio" after full init.
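To make the ratcheting concrete, a minimal sketch using the two promise sets mentioned above. restrict_to and init_then_narrow are hypothetical names, the config file is only an example of init-time work that needs "rpath", and on anything other than OpenBSD the wrapper is a no-op so the code still builds.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical wrapper: pledge(2) on OpenBSD, no-op elsewhere. */
static int restrict_to(const char *promises)
{
#ifdef __OpenBSD__
    return pledge(promises, NULL);
#else
    (void)promises;
    return 0;
#endif
}

/* Read a file during init under the wider promise set, then drop to
 * "stdio" for the rest of the process lifetime. Returns the number of
 * bytes read into buf, or -1 on error. */
static int init_then_narrow(const char *config_path, char *buf, size_t bufsz)
{
    FILE *fp;
    size_t n;

    assert(config_path != NULL && buf != NULL && bufsz > 0);

    /* Phase 1: init still needs to open files for reading. */
    if (restrict_to("stdio rpath recvfd") == -1)
        return -1;

    fp = fopen(config_path, "r");   /* allowed by "rpath" */
    if (fp == NULL)
        return -1;
    n = fread(buf, 1, bufsz - 1, fp);
    buf[n] = '\0';
    fclose(fp);

    /* Phase 2: narrow the promises; opening files is no longer allowed. */
    if (restrict_to("stdio") == -1)
        return -1;

    return (int)n;
}
```

On OpenBSD, any fopen() after the second call would violate the pledge, while reads and writes on already-open descriptors keep working.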

pledge() can also be found in over 85% of OpenBSD's base system.

On linux you can use firejail if it's necessary (or a container if it's needed).

This doesn't address what they just said - dropping privileges incrementally. Firejail is just a whole process filter applied at process start.

Brings me to the next question: is there a Linux equivalent?

There's Capsicum for Linux[1]. It's a port of Capsicum[2] from FreeBSD to Linux. Capsicum was a joint project between the FreeBSD Foundation, Cambridge and Google to create a hybrid capabilities framework. Capsicum allows developers to do the same privilege dropping that pledge does. However, Capsicum is more fine-grained than pledge, so it's less easy to use. Capsicum for Linux is also out of tree currently.

[1]: http://www.capsicum-linux.org/ [2]: https://www.freebsd.org/cgi/man.cgi?capsicum(4)

Link #1 is neat for Linux users, but Google seems to have stopped updating Capsicum after v4.11 (which was released April 2017).

Firejail is the closest I know.

Not really; seccomp(2) is for _specific_ system calls, pledge(2) is for more broad functionalities.

I didn’t say they are bug-for-bug compatible and 100% interchangeable. Seccomp was not mentioned anywhere in the thread. People who are interested can look up the specific details.

These are the slides from Bob Beck's (beck@) talk at BSDCan 2018 (Jun 8-9th), apparently missing its first page. [0]


Video should eventually show up on YouTube.

[0] https://twitter.com/bob_beck/status/1005162340956794880 ;-)

A somewhat related talk from BSDCan was Florian Obser's slaacd(8) - "A privilege separated and sandboxed IPv6 Stateless Address AutoConfiguration Daemon"



Nice. Back in earlier versions of pledge(2), there was another argument that took paths to allow fs access on, as unveil(2) is doing, but it was never supported/implemented. (see http://man.openbsd.org/OpenBSD-6.0/pledge.2 for the old syntax)

Does anybody here know when the videos will be up?
