https://man.openbsd.org/pledge.2 is also pretty damn cool. It effectively lets you make your program use a least authority model of execution, and then you can grant it the capabilities it actually needs to run. I would love to see something similar on Linux (there is capsicum-linux, but it seems like it's abandoned).
So like seccomp? Or do you mean a simpler interface like pledge? I haven't really used the seccomp syscall or eBPF directly but I have used libseccomp (https://github.com/seccomp/libseccomp) successfully.
I'm not saying you're wrong, but at least comparing pledge's default behaviour with regard to syscalls against a seccomp configuration where the kernel kills your whole process when it hits a syscall you haven't whitelisted, the two sound the same, so it would be good to know what the differences are.
I think the difference is just that seccomp/eBPF is lower level and more powerful in terms of what it can do. So pledge gives you a simpler interface to manage things, whereas with eBPF you have much more flexibility (since you're writing in a subset of C) but with that flexibility also comes more complexity. I.e., eBPF is its own scripting language for the kernel, pledge is just a way of managing fine-grained capabilities.
Totally uncool and a dead end. Nobody else will do it that way. "The promises argument is specified as a string, with space separated keywords". Caps as strings to be tokenized at runtime are slow, insecure, and not validated at compile time. Never trust a parser in core. This needs to be a bitmask, of course. Don't let Ruby programmers add OS APIs.
Pretty sure they've deliberately kept it out on principle: better to encrypt swap, ensure permissions on core files are strict, etc, rather than hope all applications correctly set MAP_NOCORE on every bit of sensitive data. Sounds like it just became too inconvenient to avoid, though.
”better to encrypt swap, ensure permissions on core files are strict”
I don’t think that’s entirely true. A risk this mitigates is that system administrators read a user’s most sensitive data when they help them debug a crash.
Encrypting the disk won’t protect against that.
Making the core dump owned by the user running the executable won’t, either, because a) the user will have to make the file readable by the system administrators to get help fixing the crash problem, and b) one can assume the system administrators have root privileges, anyways.
On the other hand, a) programs can’t know what a user’s most sensitive data is (for example, they might be writing in their diary), and b) why would users trust a program to protect their most sensitive data in that way, given that it doesn’t even manage to “not crash”?
“Deliberate” makes it sound like we wanted secrets to appear in the core files. I don't think that's the case, but otherwise yes, there's been a certain amount of care put into making core files secure.
I feel like this kind of culture may be dying in most places. But it used to be a common thing that you could pass a core file around to someone knowledgeable who knew what to do with it. In a workplace or university, maybe someone on staff who is technical. In a piece of software, either open source or commercial, maybe its developer. Obviously there is trust implicit in that, because core files contain your data.
On Windows, their equivalent of core files gets sent to Microsoft and they run analysis on them to get most common stack traces for crashing bugs. If you're opted in.
Why doesn't the system automatically pass core files to someone who knows how to deal with them? I work on an embedded system, and our system for getting core files back from users has meant a lot of crashes go from "no idea what happened, closed" to "found the bug and fixed it".
Of course there are security implications. Developers are not allowed to see core files from real customers unless the customer agrees. However our test department and our beta tests agree to this. I don't think anyone has ever looked into a core file to see sensitive data not related to the crash but the possibility exists.
There are a few utilities in the FreeBSD base system which use MAP_NOCORE -- I think mostly as a "this isn't useful, so don't waste time dumping it" flag: mkimg, sort, grep. The libsodium we have imported into sys/contrib also mentions it, but I don't know if those parts of libsodium are ever used by the FreeBSD userland.
Grepping the ports distfiles on my laptop, I see MAP_NOCORE mentioned in Cython, rust, qemu, firefox, and thunderbird; again, no idea how it's being used in those.
In the interest of compatibility, can I suggest
#define MAP_NOCORE MAP_CONCEAL
for now, and in the future if MAP_CONCEAL adds new functionality define it as MAP_NOCORE|MAP_CONCEAL_EXTRA? Since you have the capability to exclude regions from core dumps, you might as well expose it to programs which are aware of MAP_NOCORE.
PS. Linux has MADV_DONTDUMP and FreeBSD has MADV_NOCORE; I'd suggest handling those as well if you don't already do so.
So the name conceal was chosen to allow some flexibility, like prohibiting ptrace. The idea is to keep secrets from escaping into other programs. Other programs generally can't read swap, so that's not a concern.
It dates back to a hack at Yahoo. We originally excluded all mmap data from core dumps, but switched to using MAP_NOCORE extensively when it came in from upstream FreeBSD.
For us it was a performance issue as we used large mmap files in our distributed shared memory system.
The PROT_WRITE tweak is interesting. Being able to enforce a bit of Write XOR Execute behavior in Write OR Execute arenas is nifty. This change prompted me to read up on W^X and exactly what it entails, because my naive understanding was that the new no-syscall-from-writable-page behavior would be almost identical in effect to the strict W^X behavior.
If you have data you don't want written to a core dump, then MAP_CONCEAL will literally only help you if that memory is in an mmap'd region. If it's in regular old virtual memory, you're fucked.
So if you're going to add a flag to something to let users conceal a region of memory from a core dump, add it to madvise, adding it to mmap is just adding arbitrary restrictions on the programmer.
All memory is mmaped, especially on OpenBSD, which deprecated basically-mmap-with-a-mustache brk. There's no such thing as "regular old virtual memory".
Anyone know why this link works in Chrome but not in Firefox? I get Peer could not decode an SSL handshake message. Error code: SSL_ERROR_DECODE_ERROR_ALERT
Why don’t they just default to not writing core dumps, and forcing the sysadmin to explicitly enable them when debugging an app?
This setup seems like the worst of both worlds. Core dumps get written but now probably lack the information critical to debugging, making them worthless.