Pledge() and unveil() in SerenityOS (2020)

dang · on July 15, 2022

Discussed at the time:

Pledge() and Unveil() in SerenityOS - https://news.ycombinator.com/item?id=22116914 - Jan 2020 (28 comments)

Related from yesterday:

Show HN: Porting OpenBSD Pledge() to Linux - https://news.ycombinator.com/item?id=32096801 - July 2022 (114 comments)

unicornporn · on July 15, 2022

He's live on YouTube now if you want to ask a question :)

https://www.youtube.com/watch?v=h3IasVVL_k0

jamal-kumar · on July 15, 2022

This guy's story is really impressive. It sounds like he managed to quit his job to develop this full time on livestream as part of a drug addiction recovery strategy. [1] I hope the best for him going forward with that. It looks like a pretty interesting introduction to operating system development, a huge topic with tons of stuff to cover.

[1] https://awesomekling.github.io/I-quit-my-job-to-focus-on-Ser...

IshKebab · on July 15, 2022

Why is the interface string-based? Seems very hacky and unusual. Also, special bonus evil points for whoever decided that "rpath" should have nothing to do with RPATH.

Seriously please can all of you other programmers stop pointlessly abbreviating things that are already quite short?! "read_path" is perfectly fine.

kotborealis · on July 15, 2022

`pledge` is from OpenBSD, as the article says, and OpenBSD also implements it with a string-based interface (https://man.openbsd.org/pledge.2).

I think the main point is to make it as easy as possible to use.

IshKebab · on July 15, 2022

Yeah I know, I was really asking why OpenBSD did it that way. I don't think string-based arguments are easier to use. If it was a struct with a load of `bool` fields you'd get code completion, compile time type checking, built in documentation, discoverability, etc. Much easier!

cyphar · on July 15, 2022

You can't trivially extend structs in a kernel ABI (to be fair this is worse in Linux as there is more than one libc and many programs do raw syscalls bypassing the libc anyway -- though it can still be done[1]) but string APIs are simpler to use and upgrade. They can also be far more ergonomic in some cases.

The core issue is that userspace programs and libraries can be compiled using structs with the old size (in theory the libc can abstract this using symbol versioning, but then the same issue lies within the libc), causing out-of-bounds memory accesses when the kernel tries to access the struct fields. There are also forwards-compatibility issues, but bad memory accesses are marginally worse.

[1]: https://lwn.net/Articles/830666/

NavinF · on July 15, 2022

From the LWN article you linked:

>This mechanism works by marshaling parameters to a system call into a single C structure; a pointer to that structure and the size of the structure are passed as the parameters to the system call. That size parameter acts as a sort of version number.

Oh hey, that’s exactly how every WIN32 call works. I see an article from 2003 talking about why Windows is strict about the struct size parameter: https://devblogs.microsoft.com/oldnewthing/20031212-00/?p=41...

> You can't trivially extend structs in a kernel ABI

IMO the article you linked is evidence that this is trivial. Especially for syscalls that will only be called a few times in a process’s lifetime. I dunno why there’s so much bike shedding about this on the mailing list.

Edit: Ahh I didn’t realize you were the Aleksa mentioned in the article. I wish you good luck.

cyphar · on July 15, 2022

That Windows article gives a better argument than I ever could as to why putting the size in the argument list is better than in the struct. Some pre-openat2 Linux syscalls are designed in the same way. (Having no forwards nor backwards compatibility but still having struct size versioning really is an interesting design choice...)

But yes, it is relatively trivial -- my point was more that you can't just use a struct in the way the first comment suggested, you need to come up with some scheme (even if it seems trivial in retrospect).

As for the bike-shedding, that's LKML for you (though in fairness it is a bit of a tall order to try to come up with some enforceable API design rules in Linux -- syscalls with half-baked designs being added is less rare than one would hope, so clearly there's not an overarching design principle being applied already, though thankfully it's becoming pretty rare to see a completely borked syscall that clearly has no users being merged).

loeg · on July 15, 2022

The kernel can easily version APIs using struct size, as long as new members are only appended. The libc function would pass the size of the struct to the syscall, or you could have pledge() itself be a macro that computes sizeof in the caller.

cyphar · on July 15, 2022

That is the exact solution described in the link I included in my comment (I am the "Aleksa" in that article). It is not entirely trivial (certain edge cases need to be handled) but it is entirely doable. But string arguments also work if you don't have complicated data parsing requirements.

I (obviously) prefer the extensible struct solution but there are downsides (and other solutions weren't an option for Linux anyway).
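To make the size-as-version idea concrete, here is a minimal sketch in plain C of what the kernel side could do with a caller-supplied struct size. The struct `pledge_args` and the function `copy_extensible_struct` are hypothetical names invented for illustration, not an existing pledge interface; the rule itself (zero-fill what an older caller did not send, reject non-zero bytes the kernel does not understand) is the one the linked LWN article describes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical argument struct, version 2. Version 1 had only `promises`. */
struct pledge_args {
    unsigned long long promises;      /* present since v1 */
    unsigned long long exec_promises; /* appended in v2 */
};

/*
 * Model of the kernel side: copy a caller struct of `usize` bytes into the
 * kernel's `ksize`-byte view. Returns 0 or a negative errno value.
 */
int copy_extensible_struct(void *dst, size_t ksize,
                           const void *src, size_t usize)
{
    assert(dst != NULL && src != NULL);
    size_t common = usize < ksize ? usize : ksize;

    /* Newer caller: any bytes the kernel does not know about must be zero. */
    const unsigned char *bytes = (const unsigned char *)src;
    for (size_t i = ksize; i < usize; i++) {
        if (bytes[i] != 0)
            return -E2BIG;
    }

    /* Older caller: fields it did not supply default to zero. */
    memset(dst, 0, ksize);
    memcpy(dst, src, common);
    return 0;
}
```

The design only works if new members are appended and zero means "feature not requested", which is the edge-case handling mentioned above.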

Toaster-King · on July 15, 2022

There's a great blog post[1] by one of the OpenBSD developers about why they did so. tl;dr using bitmasks necessitates namespaced enums/defines that take up horizontal space, strings are easier and don't need to go through the C pre-processor.

[1] https://flak.tedunangst.com/post/string-interfaces
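For a sense of what a string interface costs on the receiving side, here is a rough sketch of turning a promise string into a bitmask. The table is a small illustrative subset and the bit values are made up; `parse_promises` is a hypothetical helper, not the OpenBSD or SerenityOS code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset of promise names; the bit values are made up. */
static const struct { const char *name; unsigned bit; } promise_table[] = {
    { "stdio", 1u << 0 },
    { "rpath", 1u << 1 },
    { "wpath", 1u << 2 },
    { "inet",  1u << 3 },
    { "exec",  1u << 4 },
};

/* Parse "stdio rpath" into a mask. Returns 0, or -EINVAL on an unknown name. */
int parse_promises(const char *s, unsigned *mask_out)
{
    assert(s != NULL && mask_out != NULL);
    unsigned mask = 0;
    size_t n = sizeof promise_table / sizeof promise_table[0];

    while (*s != '\0') {
        size_t len = strcspn(s, " ");   /* length of the next word */
        if (len > 0) {
            size_t i;
            for (i = 0; i < n; i++) {
                const char *name = promise_table[i].name;
                if (strlen(name) == len && memcmp(name, s, len) == 0) {
                    mask |= promise_table[i].bit;
                    break;
                }
            }
            if (i == n)
                return -EINVAL;         /* typo or unsupported promise */
        }
        s += len;
        if (*s == ' ')
            s++;                        /* skip the separator */
    }
    *mask_out = mask;
    return 0;
}
```

A misspelled promise such as "rpth" is only caught at run time here, which is the trade-off the replies below argue about.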

IshKebab · on July 15, 2022

Nice find. That article is highly unconvincing though and mostly argues against straw men.

> Although using strings subverts C’s already weak type checking, that’s probably not a major concern. One can screw up bit masks by using || in place of |. Or, as above, one can incorrectly pack the magic array. It’s usually much easier to visually audit a string than the C code used to plaster a dozen option together.

It's pretty easy to design an interface that is way way less error-prone than strings (especially ones full of single-letter differences!) and the visual auditing argument falls apart as soon as you have to `snprintf()` some string together from parts.

This code is way more readable, way less error prone, more discoverable, faster and more easily extendable than strings:

    auto config = make_pledge_config();
    config.read_path = true;
    config.stdio = true;
    pledge(&config);

You'd think security focused people would care about static type checking.

jeshin · on July 15, 2022

if you really care about static type checking, you probably wouldn't be using C

IshKebab · on July 16, 2022

You probably would if you care about static typing and are working on a kernel syscall interface.

jeshin · on July 17, 2022

and yet they chose to use strings for their api, eh?

IshKebab · on July 17, 2022

Well exactly. That's why it's so weird.

hag · on July 15, 2022

Nice! I would probably have used bitmasks in this situation, but as usual there is a reason behind the choice and I get the reasoning.

agileAlligator · on July 15, 2022

great find!

bowsamic · on July 15, 2022

Probably because the ABI would change when a pledge is added

sedatk · on July 17, 2022

Not necessarily. You can pass a bitmask of an arbitrary length using char*.
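A minimal sketch of that idea, assuming the byte length is passed alongside the pointer (`mask_has_bit` is a hypothetical helper name):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Test bit `n` in a byte-array mask of `len` bytes. Bits past the end read as
 * 0, so a short mask from an old caller simply lacks the newer permissions.
 */
int mask_has_bit(const unsigned char *mask, size_t len, size_t n)
{
    assert(mask != NULL);
    size_t byte = n / 8;
    if (byte >= len)
        return 0;
    return (mask[byte] >> (n % 8)) & 1;
}
```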

tomjakubowski · on July 15, 2022

Would printf be easier or harder to use if the format were specified with a struct? (instead of a string)

diath · on July 15, 2022

https://github.com/SerenityOS/serenity/issues/11140

sdwvit · on July 15, 2022

Please ELI5, is it like application permissions? I am not a cpp/c coder.

masklinn · on July 15, 2022

Kinda, except not. pledge/unveil is about privilege dropping: in searching for ways to better secure the system, the OpenBSD developers came to the conclusion that the average program (especially things like network daemons) tends to have a complicated setup phase where it reads config files, opens sockets, queries the system state, etc… then a much simpler “steady” phase which needs much less access.

With application permissions or external constraints, this is not really helpful, because the application needs to do its setup.

However if the application can pledge not to do the setup things between the setup and the steady state, and it gets corrupted or owned during the steady state (e.g. because it’s a network daemon and there’s a bug), it becomes a lot harder to exploit since there should be very little the would-be exploiter can do or explore before the OS kills the program.

So this is not really about protecting the system against the application, it’s about the application participating in the system’s protection by dynamically reducing its own permissions while running.
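In code, the hand-over from setup to steady state is a single call. The sketch below uses the OpenBSD signature `pledge(promises, execpromises)` from the man page linked upthread; the wrapper `enter_steady_state` is a hypothetical name, and the `#ifdef` is only there so the example compiles on systems without pledge(), where it does nothing.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Call once setup (config files, sockets, ...) is finished. On OpenBSD the
 * process keeps only the listed promises from here on; elsewhere this is a
 * no-op. Returns 0 on success, -1 on failure.
 */
int enter_steady_state(const char *promises)
{
    assert(promises != NULL);
#ifdef __OpenBSD__
    if (pledge(promises, NULL) == -1) {
        perror("pledge");
        return -1;
    }
#else
    (void)promises;   /* no pledge() on this platform; nothing to drop */
#endif
    return 0;
}
```

A daemon would call something like `enter_steady_state("stdio")` right before entering its main loop.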

teawrecks · on July 15, 2022

How does this work for child processes? What if a service regularly starts new processes to accomplish various tasks over its lifetime. Would each process also declare promises that have to be a subset of the parent? If the parent is compromised, could it then cause its children to ask for more permissions than it needs?

freeone3000 · on July 15, 2022

Exec can indeed cause a pledge escalation; the caller can make the child inherit its limits or start with predetermined ones, but in practice this isn't used since the child would have to do similar setup tasks. As such, "exec" is a commonly pledged-out permission!

simongr3dal · on July 15, 2022

AFAIK, with pledge() a process can tell the kernel “I’m only going to use X, Y, Z features” (e.g. read, write from file system)

After the process has told this to the kernel the process can then only do these things for its life time. You can pledge() again later, but you can only restrict your pledge never expand it.

This is a nice feature because it limits the number of processes that can potentially be security liabilities even if they have bugs.

unveil() is a similar feature but for file system paths.

It’s a feature of SerenityOS (inspired/borrowed from OpenBSD), and not a feature of C/C++.

Try reading the article, it’s pretty easy to follow :)

beebmam · on July 15, 2022

It also offers opportunities for run-time optimizations that a kernel can make around context switching. Seems extremely useful.

calvinmorrison · on July 15, 2022

If you fork, can you change your pledge?

tel · on July 15, 2022

There’s a second, explicit list of pledged capabilities that applies if/when you exec.

elbigbad · on July 15, 2022

No, a forked process only inherits the broadest permissions of the parent and can only downscope.

notaplumber1 · on July 15, 2022

It's not the broadest permissions from the parent, but the promises at the time of the fork. For example, you can set up the parent in such a way that it forks off an unprivileged (or privileged) child early, and that child ends up with a different set of promises from the parent.

waynesonfire · on July 15, 2022

kinda lose the setup / steady state benefit for child processes.

notaplumber1 · on July 15, 2022

Not at all.

loeg · on July 15, 2022

If you can exec, pledges disappear (by default, and also in common practice).

KerrAvon · on July 15, 2022

This statement conflicts with other statements here -- is this actually true? It sounds like a security hole.

ben_bai · on July 15, 2022

If you have exec permission (pledge "exec") you can exec another program and it starts with a clean slate. It's about dropping privileges, so it's assumed you know what you're doing, and in the best-case scenario the executed binary will pledge itself.

Pledge is not some external security feature but something that every program itself manages.

legalcorrection · on July 15, 2022

Why not just pledge not to exec?

loeg · on July 15, 2022

Fork and exec are different operations.

marcodiego · on July 15, 2022

The syscalls pledge and unveil were created on OpenBSD to easily limit what a process can do. The pledge syscall specifies what syscalls that process can access while the unveil syscall specifies what directories it can access. After calling each in a certain way it is no longer possible to call any syscall not allowed by pledge or access any directory not allowed by unveil.

The idea is that every process should call each early when it starts running and, after that, if the process is ever compromised, it will not be able to do much harm since the files and syscalls it can interact with are limited.

For example, a browser should never access /etc/passwd or call the exec syscall. So, a browser, when run, should as early as possible call pledge and unveil to prevent itself from accessing /etc/passwd or calling exec if it ever becomes compromised.

catskul2 · on July 15, 2022

> is it like application permissions?

FWICT, only sort of. It's like giving up permissions that you might already have (presumably to reduce potential security problems). And it's a bit more specific to the kernel.

Pledge: "I will at most use these kernel facilities" (don't let me do otherwise)

Unveil: "I will at most access these fs paths" (hide all other paths)

teawrecks · on July 15, 2022

Most programs have a pretty good idea of what they’ll be doing in their lifetime. They’ll open some files, read some inputs, generate some outputs. Maybe they’ll connect to a server over the Internet to download something. Maybe they’ll write something to disk.

pledge() allows programs to declare up front what they’ll be doing. Functionality is divided into a reasonably small number of “promises” that can be combined. Each promise is basically a subset of the kernel’s syscalls.

Once you’ve pledged a set of promises, you can’t add more promises, only remove ones you’ve already made.

If a program then attempts to do something that it said it wouldn’t be doing, the kernel immediately terminates the program.

freemint · on July 15, 2022

In Unix-like userspace, applications have access to a lot of stuff by default (such as every part of the filesystem the executing user can access, and a lot of system calls). For security purposes, tighter restrictions would be better. Pledge() and Unveil() allow application authors to opt in at run time to restricting what they can do in the future.

perryizgr8 · on July 24, 2022

The way Andreas Kling builds up code in his livestream makes it seem so easy to do OS development. After watching one of his videos I feel I could easily build up any system level program or feature, but I know from prior experience that it is not so easy for me.

ape4 · on July 15, 2022

Are all files veiled (inaccessible) until an unveil() call?

notaplumber1 · on July 15, 2022

No, simply because using unveil is not mandatory. It's opt-in. The filesystem becomes "veiled" to the application only after the first call to unveil(). Subsequent calls "unveil" further files/directories, until the final call locks the list and prevents any future unveils.
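A toy model of that sequence, in C. All names here are invented for illustration, permissions ("r", "w", ...) are left out, a NULL path stands in for the final locking call, and treating "locked before anything was unveiled" as "no veil" is an assumption of the model:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_UNVEILED 8

struct veil_model {
    const char *paths[MAX_UNVEILED]; /* unveiled file or directory paths */
    size_t      count;
    bool        active;  /* false until the first path is unveiled */
    bool        locked;  /* true after the final, locking call */
};

/* path == NULL models the final call that locks the list. */
int model_unveil(struct veil_model *v, const char *path)
{
    assert(v != NULL);
    if (v->locked)
        return -EPERM;
    if (path == NULL) {
        v->locked = true;
        return 0;
    }
    if (v->count == MAX_UNVEILED)
        return -ENOMEM;   /* limit of this model only */
    v->paths[v->count++] = path;
    v->active = true;     /* from now on everything else is hidden */
    return 0;
}

/* A path is visible if no veil is active or it lies under an unveiled path. */
bool model_can_see(const struct veil_model *v, const char *path)
{
    assert(v != NULL && path != NULL);
    if (!v->active)
        return true;
    for (size_t i = 0; i < v->count; i++) {
        size_t n = strlen(v->paths[i]);
        if (n == 0 || strncmp(path, v->paths[i], n) != 0)
            continue;
        /* Match whole path components, so /a/b does not unveil /a/bc. */
        if (path[n] == '\0' || path[n] == '/' || v->paths[i][n - 1] == '/')
            return true;
    }
    return false;
}
```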

j_m_b · on July 15, 2022

This is awesome, now if someone asks me what a promise is, I can say "it's an argument to pledge about which system resources I am declaring access to" and watch their faces glaze over. "No I'm talking about the async promise..."

easton · on July 15, 2022

(Although the post says “Written on January 22, 2020”, it clearly wasn’t. I’m guessing there’s just an old template somewhere.)

jeroenhd · on July 15, 2022

SerenityOS has had pledge and unveil for a while now, to the point where many system tools implement pledge by default. The video covering a reimplementation of justine.lol's pledge.com was uploaded yesterday (https://youtu.be/T6YkQF6ohoA) leveraging these APIs to replicate the original tool's behaviour (though it needs some pledges just to spawn the child process which I think are a bit clunky, but surely can be worked around).

aeyes · on July 15, 2022

What leads you to believe this? pledge() has existed for years in SerenityOS.

https://github.com/SerenityOS/serenity/commits/master/Kernel... and it used to be in a different file previously.

compressedgas · on July 15, 2022

Well, the videos linked to were uploaded on the 11th and 20th of January 2020. I think the article's date of the 22nd is correct.

easton · on July 15, 2022

I’m dumb and thought that the video being posted yesterday meant this was more recent.

olliej · on July 15, 2022

Meh, it happens :D