kobzol · 2025-02-24T07:15:18 1740381318

Yeah, by user space I just meant without root, sorry. HQ runs on supercomputers where the environment is heavily locked down; even Docker doesn't work. I think that PID namespaces aren't really possible, but I haven't tried it yet.

Subreaper doesn't help, because if the worker dies, its children aren't killed. Even though they are children of the worker, they will just be reparented to init.

zamalek · 2025-02-24T18:01:17 1740420077

FWIW you can unshare PID and user at the same time: https://github.com/porkg/porkg/blob/rs/crates/porkg-linux/sr...

If you don't care about being able to use different uids and gids then simply become root in the new namespace: https://github.com/porkg/porkg/blob/rs/crates/porkg-linux/sr... . Root inside the namespace will then be equivalent to the original uid+gid outside.

I am using clone, which has the very important caveat: more than one thread running is UB. That's why I use a zygote (a process forked from the root very early on - i.e. before starting the tokio runtime). You can probably avoid all of that by using exec+unshare.

But, given you're running on old kernels and constrained environments, this may not be possible at all. Maybe make it configurable?

LegionMammal978 · 2025-02-24T20:41:15 1740429675

Ubuntu [0] and some other distros have been trending towards disabling unprivileged user namespaces, unless you have specific AppArmor capabilities or other such mechanisms. So it's not something you can count on being available, unfortunately. (At least, not without jumping through many hoops to satisfy every distro's maintainers.) I've also had some ideas that have been stymied by a lack of unprivileged user namespaces.

[0] https://ubuntu.com/blog/ubuntu-23-10-restricted-unprivileged...

fc417fc802 · 2025-02-24T09:25:12 1740389112

> I think that PID namespaces aren't really possible

Depends on the cluster. If they're using nix or guix then they presumably enabled user namespaces but a few years ago guix had an article about (generally shitty) workarounds for people running in environments where those were disabled.

Edit: Maybe you should have two code paths. A fast namespaced one and the slower old one as a fallback.
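
For illustration, the two-code-path idea could be sketched like this in Rust. This is only a sketch under assumptions: it shells out to the util-linux `unshare` binary (the exec+unshare route mentioned upthread) rather than calling `clone`, and the names `SpawnStrategy`, `probe_strategy` and `build_command` are made up for this example, not anything from HQ. The probe simply tries to run `true` in fresh user + PID namespaces and falls back if that fails for any reason (binary missing, unprivileged user namespaces disabled, old kernel).

```rust
use std::process::{Command, Stdio};

/// Which of the two code paths to use when spawning a task (hypothetical name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SpawnStrategy {
    /// Wrap the task in new user + PID namespaces via `unshare`.
    Namespaced,
    /// Spawn the task directly, as before.
    Fallback,
}

/// Flags passed to util-linux `unshare`. `--fork` is needed together with
/// `--pid` so that the task runs as a child inside the new PID namespace.
const UNSHARE_FLAGS: &[&str] = &["--user", "--map-root-user", "--pid", "--fork", "--"];

/// Probes by actually trying: runs `true` inside fresh namespaces.
/// Any failure (no `unshare` binary, namespaces disabled) selects the fallback.
fn probe_strategy() -> SpawnStrategy {
    let ok = Command::new("unshare")
        .args(UNSHARE_FLAGS)
        .arg("true")
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false);
    if ok {
        SpawnStrategy::Namespaced
    } else {
        SpawnStrategy::Fallback
    }
}

/// Builds the command for a task according to the chosen strategy.
fn build_command(strategy: SpawnStrategy, program: &str, args: &[&str]) -> Command {
    match strategy {
        SpawnStrategy::Namespaced => {
            let mut cmd = Command::new("unshare");
            cmd.args(UNSHARE_FLAGS);
            cmd.arg(program).args(args);
            cmd
        }
        SpawnStrategy::Fallback => {
            let mut cmd = Command::new(program);
            cmd.args(args);
            cmd
        }
    }
}
```

The probe would run once at worker startup, e.g. `let strategy = probe_strategy();`, and every task would then go through `build_command(strategy, "my-task", &[])`. Probing by trying avoids having to guess from distro-specific sysctls or AppArmor settings.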

kobzol · 2025-02-24T07:13:29 1740381209

Could be done, yeah, but 20s isn't that much, and I'd like to avoid adding more test-only magic environment variables to configure this (our end-to-end tests are in Python and they use HQ as a binary).

kobzol · 2025-02-24T07:12:22 1740381142

It is sadly not propagated to grandchildren.

I tried the subreaper approach, but it doesn't help. The children are reparented to the worker, but when the worker dies, they are then just reparented to init, like normally.

TheDong · 2025-02-24T09:35:49 1740389749

You also need to specifically have the subreaper process call the "wait" syscall, and wait for all children, otherwise of course they'll end up reparented to init.

If you want to write a process manager, one of the process manager's responsibilities is waiting on its children.

ComputerGuru · 2025-02-24T14:19:28 1740406768

Just a nitpick: They don’t get reparented to init regardless of whether you call wait or not, so long as the parent process exists. They’ll be in a zombie state waiting to be reaped via a parent call to wait. Only if the parent dies/exits without reaping will they be reparented to init.
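
A minimal Linux-only sketch of the subreaper-plus-wait pattern discussed in this subthread, using only the standard library. The libc functions are declared by hand (std already links libc), and `become_subreaper` and `reap_all_children` are hypothetical helper names. It assumes a toolchain that accepts `unsafe extern` blocks (Rust 1.82 or newer). Note that this only covers the case where the manager stays alive: as pointed out above, if the subreaper itself dies, its children are still reparented further up.

```rust
use std::io;
use std::os::raw::{c_int, c_ulong};

// Hand-written declarations so the sketch needs no external crate.
unsafe extern "C" {
    fn prctl(option: c_int, ...) -> c_int;
    fn waitpid(pid: c_int, status: *mut c_int, options: c_int) -> c_int;
}

/// Value of PR_SET_CHILD_SUBREAPER from <linux/prctl.h>.
const PR_SET_CHILD_SUBREAPER: c_int = 36;

/// Marks the calling process as a child subreaper: orphaned descendants
/// are reparented to this process instead of init.
fn become_subreaper() -> io::Result<()> {
    // prctl is variadic; the kernel reads unsigned longs, so cast explicitly.
    let rc = unsafe {
        prctl(
            PR_SET_CHILD_SUBREAPER,
            1 as c_ulong,
            0 as c_ulong,
            0 as c_ulong,
            0 as c_ulong,
        )
    };
    if rc == -1 {
        Err(io::Error::last_os_error())
    } else {
        Ok(())
    }
}

/// Blocks until every child (direct or adopted) has exited and been reaped.
/// Returns how many processes were reaped.
fn reap_all_children() -> usize {
    let mut reaped = 0;
    loop {
        let mut status: c_int = 0;
        // pid = -1 means "any child of this process".
        let pid = unsafe { waitpid(-1, &mut status, 0) };
        if pid > 0 {
            reaped += 1;
            continue;
        }
        // Retry if interrupted by a signal; any other error
        // (typically ECHILD, no children left) ends the loop.
        if io::Error::last_os_error().kind() == io::ErrorKind::Interrupted {
            continue;
        }
        break;
    }
    reaped
}
```

A process manager would call `become_subreaper()` once at startup and then reap in its main loop. Until a child is reaped it stays around as a zombie, which is the distinction the nitpick above is making.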

kobzol · 2025-02-24T07:10:51 1740381051

I do use setsid when spawning the children (I omitted it from the post, but I set it in the same pre_exec call where I configure DEATHSIG) but they don't receive any signal, IIRC. Or if they do, it does not seem to be propagated to their children.
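
Roughly what that setup looks like, as a sketch and not the actual HQ code: a `pre_exec` hook that calls `setsid` and then sets the parent-death signal with `prctl(PR_SET_PDEATHSIG, ...)`. The helper name `spawn_with_deathsig` and the choice of SIGKILL are assumptions for this example, and the libc functions are declared by hand so no extra crate is needed (this assumes Rust 1.82 or newer for `unsafe extern`).

```rust
use std::io;
use std::os::raw::{c_int, c_ulong};
use std::os::unix::process::CommandExt;
use std::process::{Child, Command};

// Hand-written declarations; std already links libc on Linux.
unsafe extern "C" {
    fn prctl(option: c_int, ...) -> c_int;
    fn setsid() -> c_int;
}

/// Value of PR_SET_PDEATHSIG from <linux/prctl.h>.
const PR_SET_PDEATHSIG: c_int = 1;
/// SIGKILL, passed as unsigned long because prctl is variadic.
const SIGKILL: c_ulong = 9;

/// Spawns `cmd` in a new session and asks the kernel to send SIGKILL
/// to the child when the thread that spawned it dies.
fn spawn_with_deathsig(mut cmd: Command) -> io::Result<Child> {
    unsafe {
        // The hook runs in the forked child, right before exec.
        cmd.pre_exec(|| {
            if setsid() == -1 {
                return Err(io::Error::last_os_error());
            }
            let rc = prctl(
                PR_SET_PDEATHSIG,
                SIGKILL,
                0 as c_ulong,
                0 as c_ulong,
                0 as c_ulong,
            );
            if rc == -1 {
                return Err(io::Error::last_os_error());
            }
            Ok(())
        });
    }
    cmd.spawn()
}
```

Usage would be along the lines of `spawn_with_deathsig(Command::new("my-task"))`. The parent-death setting applies only to the process that made the `prctl` call and is cleared in anything that process forks, which matches the observation that grandchildren are not covered.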

kobzol · 2025-02-24T07:08:49 1740380929

The stdlib already mostly does all of that :)

Check out https://kobzol.github.io/rust/2024/01/28/process-spawning-pe....

kobzol · 2025-02-24T07:07:57 1740380877

That's a very good point! But yeah, we use the single threaded runtime, so this shouldn't be a concern.

kobzol · 2024-03-20T04:51:05 1710910265

The Rust build config defaults are pretty fine for the general case. It's just that not everyone has the general case :) In Rust the normal distribution will be quite flattened.

kapilsinha · 2024-03-20T08:02:05 1710921725

Ha, yes, that is a diplomatic way to put it. To the commenter's point though, I too question some of the defaults. mold does seem to be objectively better than the default linker. Stripping debuginfo does seem like it would have been a better default, which is why it was made the default so recently! Pipelined compilation also falls into this latter category, so perhaps there is (understandably) just a delay until adoption in stable Rust.

I know you mean it as a figure of speech, but I would consider the complexity (and build time) distribution for Rust to be heavy-tailed and skewed right, more so than a flattened normal.

kobzol · 2024-03-16T10:42:37 1710585757

Yeah, I actually generated these small charts out of a flamegraph, because it contains too much information and isn't easily split into three distinct parts. And once you condense the information into just 3 blocks, then using a flamegraph doesn't really add any further value, IMO.

hobs · 2024-03-17T00:26:17 1710635177

/shrug that's fair, with only like four things it's not that much more readable.

kobzol · 2024-03-16T10:41:42 1710585702

There have been some recent improvements to this, but yeah, it can be still quite large. There is a WIP development of a garbage collector in Cargo that could help with this.

kobzol · on Jan 24, 2024

There is some work underway to enable removing the backtrace generation/parsing from Rust binaries. It's hardcoded for now though.