OpenBSD's unveil() (lwn.net)
115 points by messe 5 months ago | 51 comments

> There is no reason why your PDF viewer should be able to read your SSH keys, for example.

Well GNOME's evince can read remote files over at least http/https and ftp, e.g. evince ftp://remote/foo.pdf. I guess it's just a short step to adding ssh/sftp support. The remote support is implemented using glib's https://developer.gnome.org/gio//2.36/GFile.html#g-file-new-... which has a generic plugin mechanism for URL schemes, although I didn't look into whether ssh is already supported.


I'd argue this is a misfeature. Different users have different needs, but I really like knowing that especially my basic "view a file downloaded from the Internet" applications have a limited attack surface, even if that means they have fewer features. Actually, I'd prefer to have that confidence for all my applications, possibly except necessarily overcomplicated behemoths like web browsers or Emacs.

Wouldn't it be better to expose these other schemes via FUSE mounts anyway? I suppose the main issue is one of UX: how do you make a system "Open file/path" dialog that under the hood is capable of setting up and properly lifecycling something like an sshfs mount? Almost seems like a thing you'd want support for at the systemd service level.

OTOH, doing it as a system dialog that runs safely in its own process feels a lot better to me than building the functionality into every app in a piecemeal way.

Even then, an ssh subprocess can talk to ssh-agent instead of directly reading keys.

I have some interest in this since I wrote the initial version of ssh integration in qemu, and I have looked at how sshfs is implemented too. For the ssh integration in qemu we used libssh2 (initially, about to be replaced by libssh which is confusingly a different but better library). sshfs on the other hand runs the plain ssh binary in a subprocess and performs its own sftp protocol on top of the ssh connection (with sftp being done inside the sshfs binary).

From a security point of view sshfs is certainly a better approach. However for qemu we were concerned mainly about performance, and some testing that I did at the time indicated that the library approach (libssh2/libssh) would perform better which was why we went that way.

I guess the lesson here is that retrofitting security to existing applications is difficult, or perhaps the lesson is that Linux applications should be split into cooperating processes using lightweight messaging, like QNX or L4.

Even ignoring ssh, the same problem applies to evince + https currently (reading client certificates), and that is certainly not implemented using a subprocess, nor would it be very easy to change that considering how difficult it is to correctly run a subprocess from a multithreaded library using Linux / POSIX APIs.

Things like sshfs exist to solve this transparently.

You could even create a wrapper that mounts the remote endpoint at a temporary location, runs your application with that path, then unmounts once your application exits.
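A rough sketch of such a wrapper, assuming a Linux box where `sshfs` and `fusermount` are installed (other systems would unmount differently). The function name `run_with_sshfs` and the injectable `run` parameter are made up for illustration:

```python
import os
import subprocess
import tempfile


def run_with_sshfs(remote, app_argv, run=subprocess.run):
    """Mount `remote` (e.g. "host:/dir") via sshfs, run the app on it, unmount.

    `run` defaults to subprocess.run; it is a parameter only so the
    sequence of commands can be inspected without a real mount.
    """
    mountpoint = tempfile.mkdtemp(prefix="sshfs-")
    run(["sshfs", remote, mountpoint], check=True)
    try:
        # The application only ever sees a local path.
        return run(app_argv + [mountpoint], check=False)
    finally:
        # Unmount even if the application fails, then remove the empty dir.
        run(["fusermount", "-u", mountpoint], check=True)
        os.rmdir(mountpoint)
```

Something like `run_with_sshfs("host:/docs", ["evince"])` would then open the viewer on the mounted directory; the viewer itself never talks ssh.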

This doesn't actually help assuming that unveil(2) persists across a fork (and if it doesn't then that's a back door, and if fork is banned using pledge(2) then you can't use sshfs anyway).

It’s a great concept. But I would NAK the API for Linux:

> Calling unveil() for the first time will "drop a veil" across the entire filesystem, rendering the whole thing invisible to the process, with one exception...

Sorry, but the semantics of calling it once should be similar to the semantics of calling it twice. The first call veils and the second call unveils.

Also, on Linux at least, efficiently implementing “you can only see this subtree” is quite tricky due to races against moving the directories in question around. I wonder how OpenBSD gets around those races.

It feels less inconsistent if you put it like this: every call unveils, but the first one also veils.

Or: every call does the following two things. (1) It ensures that a veil is present. (2) It makes a hole in the veil at the location provided.
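To make that two-step reading concrete, here is a toy model in Python. It is not how the kernel does it; the class name `Veil`, the "most specific unveiled ancestor wins" lookup, and the way the lock call leaves the veil state untouched are all assumptions of the sketch:

```python
import posixpath


class Veil:
    """Toy model of the unveil() semantics discussed above (not kernel logic)."""

    def __init__(self):
        self.active = False   # no veil until the first unveil(path, perms)
        self.holes = {}       # normalized path -> set of permission letters
        self.locked = False

    def unveil(self, path=None, perms=None):
        if self.locked:
            raise PermissionError("unveil is locked")
        if path is None and perms is None:
            # Models unveil(NULL, NULL): no further calls are accepted.
            # Assumption: the model leaves the veil state as it is.
            self.locked = True
            return
        self.active = True                                   # (1) ensure a veil is present
        self.holes[posixpath.normpath(path)] = set(perms)    # (2) make a hole

    def allows(self, path, perm):
        if not self.active:
            return True       # unveil was never called: nothing is hidden
        path = posixpath.normpath(path)
        while True:
            if path in self.holes:
                # Assumption: the most specific unveiled ancestor decides.
                return perm in self.holes[path]
            parent = posixpath.dirname(path)
            if parent == path:
                return False  # reached the top without finding a hole
            path = parent
```

With this model, `Veil().allows(...)` is always true until the first `unveil(path, perms)`, after which only the unveiled subtrees are visible.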

Yeah, an easier formulation would be:

noveil(...) to mark paths we plan to use (fails on veiled paths), and a subsequent veil() that irreversibly removes access to everything that was not protected by a noveil(...) call between this and the last veil() call.

If a veil() call never comes, well, then we did not restrict access at all and only spammed some kernel structure.

If all you care about is protection in case someone gets codeexec in your process, then this would be simpler and give additional features (we can incrementally drop access, instead of only once).

I see an advantage of unveil(...) in this: suppose my process cannot irrevocably drop access, because e.g. it anticipates needing to write to some paths it doesn't know yet. Even then, unveil can help mitigate things like directory traversals. I can't lock (unveil(0,0)), so codeexec is still game over; if my code for late unveiling has bugs, it's game over; but if the late unveiling code has no bugs and the actual file access code is buggy, then this construction will protect me.

Think of it as backward compatible progressive enhancement. You're opting in to veiled FS by making an initial call to unveil a path. The system drops the veil, but only if you indicate that you want to operate in the veiled environment.

Subsequent versions could be veiled by default, requiring calls to `unveil` for everything, but obviously that would break every existing program.

From context I understand the meaning, but what does "NAK" mean here? I'm guessing it's an acronym?

Negative acknowledgement, i.e. I would advise against it and, if I were the maintainer, would reject the change.

ACK and NAK are used quite a lot in communication protocols, and the usage drifted from there.

I like this. It's simple, progressive, delivers a real benefit instantly, cheap and easy to understand.

Entirely the opposite of SELinux.

I'm working on a problem where I have a single executable that is run as a dynamic number of long-running processes where I'd like to restrict each process to a separate directory.

For example, process 1 could only r/w files in "/some/path/1" and process 2 could access "/some/path/2". But the number of processes is dynamic. I might have two or I might have 100. So I need to be able to specify new directories at runtime.

I've been looking at the linux hardening mechanisms but I couldn't find a way to do this with, for example, selinux. With selinux it seems you are mostly able to create system level policies that applied to whole programs. I haven't found a way to provide something like a parameter to change the policy for a given process.

But it looks like this would be very easy to accomplish with unveil(2). This is very interesting work!

On Linux it's a bit more DIY. Look at `unshare()` and cgroups to limit resources.

For example your process scheduler could fork, create a new mount namespace with unshare(), bind-mount the temporary workdir to a known location, drop privileges and then exec the target program.

You might be interested in bwrap.[0]

[0]: https://github.com/projectatomic/bubblewrap
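For the "N processes, one directory each" case, a launcher could build one bwrap command line per worker. A minimal sketch, assuming the worker only needs `/usr` read-only plus its own directory; the helper name `bwrap_argv` and the `/work` mount point are invented here, and most distributions will need extra binds or symlinks (e.g. for `/lib`) before a real binary starts:

```python
def bwrap_argv(workdir, program_argv):
    """Build a bwrap command line that exposes only `workdir` read-write."""
    return [
        "bwrap",
        "--unshare-all",             # fresh namespaces (mount, pid, net, ...)
        "--die-with-parent",         # kill the sandbox if the launcher dies
        "--ro-bind", "/usr", "/usr", # read-only system files (adjust per distro)
        "--proc", "/proc",
        "--dev", "/dev",
        "--bind", workdir, "/work",  # the only writable path the worker sees
        "--chdir", "/work",
    ] + list(program_argv)
```

The launcher can then pass each list to `subprocess.Popen`, picking `workdir` at runtime, e.g. `bwrap_argv("/some/path/1", ["/usr/bin/worker"])` for process 1.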

Isn't this essentially the same problem as securing container (i.e. Docker) access to the host file system? Also similar if you just want to give every user their own process. Are Docker's tools sufficient for this, for example?

If you aren't executing third party code, don't worry about it? And if you are, provide an API that third parties must use, which validates paths before reading or writing files? Disallow '..', './', '~/' etc.?
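A sketch of what such a validating API might look like in userspace. Rather than blacklisting strings like '..', it resolves the path and checks that the result is still under the allowed directory. The name `resolve_inside` is hypothetical, and note that this is only a check in the same process: it does nothing against code execution, and it can race with someone swapping in a symlink afterwards.

```python
import os


def resolve_inside(base, user_path):
    """Return the real path of `user_path` under `base`, or raise PermissionError."""
    base = os.path.realpath(base)
    # realpath collapses "..", "." and symlinks; an absolute user_path
    # replaces base in the join and is rejected by the check below.
    candidate = os.path.realpath(os.path.join(base, user_path))
    if os.path.commonpath([base, candidate]) != base:
        raise PermissionError(f"{user_path!r} escapes {base!r}")
    return candidate
```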

Are you not describing plain old chroot?

Out of curiosity, what would be the point of restricting child processes to specific directories?

If I trust a binary to read a directory, what's the point of constraining specific processes further?

The only use case I could think of is where the binary itself runs untrusted code from a third party. (e.g. plugins).

The point of pledge(2) (and unveil which builds on and complements pledge(2)) is to protect against the consequences of human failing: if the software has an accessible exploit, limiting its ability to see the system means the attacker has a much harder time pivoting into something useful.

That's why pledge(2) and unveil(2) are useful despite literally being part of the software it restricts.

unveil also protects you against untrusted code you wrote yourself (and yes, you shouldn’t trust code you wrote yourself. You may know it doesn’t do anything wrong on purpose, but it might be buggy and do something wrong by accident)

Also, if you didn’t write the OS, your untrusted code will call third party code, if it is worth running it at all (it has to communicate results back to the caller. That requires some OS calls, even if only to trigger some semaphore)

If you ever read untrusted data, you should consider the possibility that you’ll be exploited and end up running untrusted code even though you never intended to.

I don't know about selinux, but systemd can easily do this:


This has the same problem: you have to define a systemd unit/service/whatever beforehand. The program cannot change these parameters itself, then signal that it wishes to be locked down and prevented from changing them again.

That's the real beauty of how the OpenBSD approach works, it lets you incrementally ratchet up the restrictions as various stages of the startup/run loop of a program progress, so you don't have to specify everything a program might ever try to do ahead of time, and have that huge pool of permissions persist throughout the entire life of the program.

Between templated unit files and systemd-run things can be a bit more dynamic, but yeah, not at runtime.

As I understand it, seccomp can do what unveil and pledge can do, just with a much more complicated API. I wonder if anyone is working on a pledge/unveil compatible wrapper API for seccomp...

Not at all, for example you can't implement the ratchet down semantics of pledge() using seccomp. For instance having initial broader promise set "stdio rpath recvfd", and then irrevocably dropping to a runtime promise of "stdio" after init. This is a very paradigmatic use of pledge(2), which a large portion of the 85% of OpenBSD's base system pledged will demonstrate.
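The ratchet described here can be modelled in a few lines. This is only a toy illustration of the semantics (promises can shrink, never grow), not of how pledge(2) is implemented; the class name `Pledge` is made up:

```python
class Pledge:
    """Toy model of pledge()'s ratchet: the promise set can only shrink."""

    def __init__(self):
        self.promises = None   # None: not pledged yet, everything allowed

    def pledge(self, promises):
        requested = set(promises.split())
        if self.promises is not None and not requested <= self.promises:
            # Mirrors the real call refusing an attempt to regain promises.
            raise PermissionError("cannot regain dropped promises")
        self.promises = requested

    def allowed(self, promise):
        return self.promises is None or promise in self.promises
```

So a program would call `pledge("stdio rpath recvfd")` during init and `pledge("stdio")` before entering its main loop; a later `pledge("stdio rpath")` fails.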

unveil(2) would be very challenging, if not impossible. Both would require proper Linux integration and kernel implementation.

People have certainly attempted to cobble something together using different pieces on Linux, but the semantics will never be truly perfect unless a native implementation is attempted.

Yup, sadly unlike pledge(2) there doesn't seem to be an execveil so you'd have to fork(); unveil(path, perms); unveil(null, null); pledge(...); exec();

Tiny nit. An unveil pledge promise was added so pledging without it is the same as unveil(null, null).

Good point, and makes a lot of sense given the normal pattern of things.

If your program copes with all the implications it entails, you could chroot() to the relevant directory.

An audience member recorded a recent talk about OpenBSD unveil(2), given last month.


The simplicity, and the fact that it forces conscious choices at source/compile time, are great, but I wonder if it comes at the cost of flexibility. For anything sufficiently complex, you'll need some configuration after deployment, which selinux handles through policies and labels. In the case of unveil, are the file paths hardcoded? As a simple example, you build chromium with write access to ~/Downloads. What happens if you want to write to ~/New_Downloads?

The paths that you unveil(2) can be hardcoded, set from a command line flag, or parsed in from a config file. What can't be done, however, is arbitrary file access at runtime, as there would be no point at which you could lock unveil or drop the equivalent pledge promise.
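As an illustration of the config-file route, a program could read its paths at startup and then issue one unveil call per entry before locking. The "path perms" line format below is invented for this sketch (it is not claimed to be what any real package uses), as is the name `parse_unveil_config`; the permission letters r, w, x, c are the ones unveil(2) accepts:

```python
def parse_unveil_config(text):
    """Parse lines of the form '<path> <perms>' into (path, perms) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        parts = line.rsplit(None, 1)           # perms is the last field
        if len(parts) != 2:
            raise ValueError(f"expected '<path> <perms>': {line!r}")
        path, perms = parts
        if not set(perms) <= set("rwxc"):
            raise ValueError(f"bad permissions {perms!r} for {path!r}")
        rules.append((path, perms))
    return rules
```

Changing `~/Downloads` to `~/New_Downloads` then means editing the file and restarting, not recompiling.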

For the chrome case, you can currently only access ~/Downloads. This involves changing our habits as /users/. If you need to access files from outside this directory, you need to open a terminal and copy/move files there yourself.

That said, for now, the chromium package does actually have an external unveil policy in /etc/chromium/unveil.* for each process type (main, gpu, renderer, etc). But that won't necessarily always be there.

> For the chrome case, you can currently only access ~/Downloads. This involves changing our habits as /users/. If you need to access files from outside this directory, you need to open a terminal and copy/move files there yourself.

That is where Apple's approach is probably better: have an entirely separate process bridging the software and specific tasks. The bridge process should be easier to audit (and can have its own restrictions and sandboxing, e.g. the only things it needs are to communicate over a socket and see the filesystem), and the browser never gets to see the filesystem.

OpenBSD was already using pledge(2) to protect the different chrome process types, many of them can't access the filesystem directly. Chrome actually has a privsep design, IPC, fd passing over Unix sockets etc.

For example, the GPU and plugin processes can only act on file descriptors received this way:


But this alone doesn't protect this "bridge" process itself from accessing the filesystem, this includes your SSH keys, which is where unveil() comes in.
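For anyone unfamiliar with fd passing over Unix sockets, here is roughly what the pattern looks like, sketched in Python (3.9+ on a Unix system) rather than Chrome's actual C++ IPC. The helper names `send_file` and `receive_file` are made up; the privileged side opens the file and the restricted side only ever gets the descriptor:

```python
import os
import socket


def send_file(sock, path):
    """Privileged side: open `path` and pass the descriptor over `sock`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        socket.send_fds(sock, [b"fd"], [fd])
    finally:
        os.close(fd)   # the receiver gets its own copy of the descriptor


def receive_file(sock):
    """Restricted side: receive one descriptor and wrap it as a file object."""
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    return os.fdopen(fds[0], "rb")
```

The receiving process can read the file without being able to open any path itself, which is why it can run with no filesystem promises at all.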

> But this alone doesn't protect this "bridge" process itself from accessing the filesystem

Of course not, but the point is that the bridge process only comes in contact with the user and a software invoking it, and its sole purpose is to pick a file on the filesystem.

Including your ssh pubkey if that's what you need to select/upload.

Perhaps that's the case, and that's all it's doing, but at least for chromium, it's the main browser process itself currently handling the Open/Save File dialog box.

The chrome case is relatively simple (UI-wise), but I can see this becoming a major source of friction for a more general server-like setup. If you are setting up a web host with nginx, php-fpm etc., you would likely want a pretty custom permission list (which you do by relabelling in selinux with the various http* type things). In this scheme, it looks like it would mean a recompile, right?

The path doesn’t need to be hardcoded into the binary, the server could instead unveil those paths you specified as data directories.

> this involves changing our habits as users.

In other words: We have to do what the software requires of us, not the other way round.

I think the point is that security is everyone's responsibility, and everyone will need to chip in.

> There is no reason why your PDF viewer should be able to read your SSH keys, for example.

Is there any way to make use of unveil() at the command line to accomplish the above? To clarify, I suppose unveil() is not intended to be used that way, but will there be some sort of wrapper that allows us to restrict a third-party already-compiled program to certain directories only?

Slide 12: https://www.openbsd.org/papers/BeckPledgeUnveilBSDCan2018.pd... - it is one of the future goals.

I really hope a similar util lands on GNU/Linux one day...

The entire point of unveil() and pledge() is tight integration with the program itself, the programmers know best what their program requires, and unveils/pledges are typically invoked AFTER all the startup junk executes (such as reading in libraries, config file dances, etc).

To restrict it before the program even starts would require a huge number of what amount to spurious unveils once the program starts its main operating loop, and really misses the point of the abstraction it's trying to create.

This is exactly where approaches like selinux fail: they rely on static distro/user defined policy that can only ever be set before the program starts, and thus must encapsulate everything the program might ever try to do. With unveil and pledge, the program itself indicates what each branch needs to do, and can restrict itself much more tightly.

I don't think they fail. Usually you profile the program first, and once you set up the rules, a deviation might get blocked. If you want to define what the program is definitely not authorized to do, this is the way to go.

Using unveil() is a completely different approach that should yield fewer false positives and involve less hassle. There is just one caveat: you need to make sure the binary hasn't been tampered with. If someone replaced/modified it, SELinux will block unauthorized behavior, whereas counting on unveil() still working in this case might let you down.

”If someone replaced/modified it […] counting on unveil() still working […] might let you down.”

Might is an understatement. If somebody changed what you run, there’s no guarantee at all that it even still calls unveil.

However, I would think chances that your system is set up such that third parties can tamper with your binaries are about equal to those that the system setup allows them to ‘tweak’ your SELinux configurations.

Would unveil and exec on the target binary work as intended?

Is it just me or are they reinventing the plan9 vfs and its mount, bind and rfork calls?

It would make more sense for process spawns to specify veiling behavior, such that unveil() only unveils things.
