Hacker News
NsJail: A light-weight process isolation tool for Linux (nsjail.dev)
63 points by yamrzou 7 days ago | 28 comments





I forked this project long ago and built an online judge that uses its BPF integration to filter out unwanted syscalls. The fork implements time/memory usage reporting to satisfy the judge's needs, and working on it has improved my knowledge of modern Linux kernels.

There were some rough edges back then, but it has been my go-to tool for running user-provided code in isolation.

https://github.com/NeoHOJ/nsjail
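For readers wondering what "time/mem usage reporting" can look like, here is a minimal sketch in Python. It is not how the fork above or nsjail does it; it only uses plain rlimits plus `os.wait4` on Linux, with no namespaces or seccomp involved. The function name `run_with_usage` and the default limits are made up for illustration.

```python
import os
import resource
import sys
import time


def _set_soft_limit(kind, value):
    # Only lower the soft limit and never ask for more than the hard limit allows.
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, hard))


def run_with_usage(argv, cpu_seconds=2, mem_bytes=1 << 30):
    """Run argv under CPU/address-space limits and report its resource usage."""
    start = time.monotonic()
    pid = os.fork()
    if pid == 0:
        # Child: apply the limits, then replace ourselves with the target program.
        try:
            _set_soft_limit(resource.RLIMIT_CPU, cpu_seconds)
            _set_soft_limit(resource.RLIMIT_AS, mem_bytes)
            os.execvp(argv[0], argv)
        finally:
            # Only reached if setrlimit or exec failed.
            os._exit(127)
    # Parent: wait4 returns the rusage of exactly this child.
    _, status, usage = os.wait4(pid, 0)
    wall = time.monotonic() - start
    return {
        # Negative values mean "killed by that signal" (Python 3.9+).
        "exit_code": os.waitstatus_to_exitcode(status),
        "wall_s": wall,
        "cpu_s": usage.ru_utime + usage.ru_stime,
        "max_rss_kb": usage.ru_maxrss,  # kilobytes on Linux
    }


if __name__ == "__main__":
    print(run_with_usage([sys.executable, "-c", "print('hello')"]))
```

A real judge would want cgroup-based accounting on top of this, since rlimits and rusage only cover a single process tree loosely.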


it'd be interesting to see a comparison of these -- the building blocks are (mostly) the same, but the interfaces differ in interesting ways:

- nsjail

- firejail

- bubblewrap

- runc

etc.


As a bubblewrap user, beware that https://github.com/containers/bubblewrap/pull/586 is still missing. The usual ^C doesn't work with your sandboxed stuff, which is very annoying.

A cursory look at NSjail tells me its filesystem stuff is less granular than bwrap's bind mounting.

Firejail can't handle : in some paths (at all, no escaping provided) which made me dump it.


> Firejail can't handle : in some paths (at all, no escaping provided) which made me dump it.

This doesn't match my experience. For example, the following works just fine in a profile file:

  blacklist /sys/devices/pci0000:00/*

Can you give an example of what you had problems with?


Another interesting and modern alternative is Syd written in Rust.

https://gitlab.exherbo.org/sydbox/sydbox


One is not like the others: firejail is aimed more at desktop-type applications you interact with, whereas the others can handle those but are better suited to arbitrary workloads.

A parent comment mentions eBPF syscall interception; many end up combining gVisor, nsjail, and seccomp.


Me too, for me the ease of use is rather important. NSJail is very easy to use, I am not sure which ones I tried when looking for these tools but some of them were an absolute pain to get going.

Edit: funnily, ChatGPT o3-mini tells me nsjail is the second hardest to use (first = systemd) of these...


And jailer from Firecracker, and systemd itself, which has some similar capabilities.


pledge is the OpenBSD version of Landlock, a pretty different category from the other namespace-based solutions listed.

It's still a reasonable comparison though. The seccomp-bpf part of nsjail is achieving the same thing; one way to look at it is that Landlock/pledge are just a better implementation of the same approximate feature.

I don't really find it reasonable. Landlock-type functionality is a tiny subset of what namespace-based sandboxing offers. It's like comparing a scanner to authenticate ID cards against a fortified house.

Namespaces are very useful for building virtual environments, but I think it's important to keep in mind that they are not designed for sandboxing and don't provide security guarantees (e.g. mount point propagation), fine-grained access rights, or security events (e.g. logs)... which might be OK depending on the use case. Also, namespaces increase the attack surface of the kernel (e.g. vulnerabilities that can be reached through user namespaces). That being said, even if Landlock can control the most important filesystem access rights, not all of them are supported yet. New kernel releases bring new Landlock features (e.g. IPC, network control). It takes some time to build a new and safe access control system, but we'll get there!

Oh yeah I was just talking specifically about the seccomp-bpf bit. It's not comparable to nsjail as a whole.
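To make the seccomp-bpf bit concrete, here is a rough Python sketch of what such a filter looks like as classic BPF instructions. It only builds and simulates the program in user space; it does not install anything in the kernel, and it is not nsjail's policy format. The constants are from the Linux UAPI headers for x86_64; the denied syscalls (ptrace = 101, mount = 165) and the helper names are picked purely for illustration.

```python
import struct

# Classic BPF opcodes and seccomp return values (Linux UAPI, x86_64).
BPF_LD_W_ABS = 0x20  # BPF_LD | BPF_W | BPF_ABS
BPF_JEQ_K = 0x15     # BPF_JMP | BPF_JEQ | BPF_K
BPF_RET_K = 0x06     # BPF_RET | BPF_K
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_KILL_PROCESS = 0x80000000
AUDIT_ARCH_X86_64 = 0xC000003E
EPERM = 1


def build_denylist(denied_nrs):
    """Build a filter as (code, jt, jf, k) tuples: denied syscalls fail with EPERM."""
    n = len(denied_nrs)
    if n > 255:
        raise ValueError("jump offsets are 8-bit; too many syscalls for this sketch")
    prog = [
        (BPF_LD_W_ABS, 0, 0, 4),                  # A = seccomp_data.arch
        (BPF_JEQ_K, 1, 0, AUDIT_ARCH_X86_64),     # expected arch: skip the kill
        (BPF_RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS),
        (BPF_LD_W_ABS, 0, 0, 0),                  # A = seccomp_data.nr
    ]
    for i, nr in enumerate(denied_nrs):
        # On a match, jump past the remaining compares and the ALLOW return.
        prog.append((BPF_JEQ_K, n - i, 0, nr))
    prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW))
    prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_ERRNO | EPERM))
    return prog


def pack(prog):
    """Serialize to the 8-byte struct sock_filter layout (u16 code, u8 jt, u8 jf, u32 k)."""
    return b"".join(struct.pack("=HBBI", *insn) for insn in prog)


def simulate(prog, nr, arch=AUDIT_ARCH_X86_64):
    """Tiny user-space interpreter for the three opcodes used above."""
    data = struct.pack("=iI", nr, arch)  # first 8 bytes of struct seccomp_data
    acc, pc = 0, 0
    while True:
        code, jt, jf, k = prog[pc]
        pc += 1
        if code == BPF_LD_W_ABS:
            acc = struct.unpack_from("=I", data, k)[0]
        elif code == BPF_JEQ_K:
            pc += jt if acc == k else jf
        elif code == BPF_RET_K:
            return k
        else:
            raise ValueError(f"unsupported opcode {code:#x}")
```

With `prog = build_denylist([101, 165])`, `simulate(prog, 101)` yields the ERRNO action while `simulate(prog, 1)` (write) yields ALLOW. The point of the comparison in this subthread is that this filter only sees syscall numbers and raw arguments, not paths, which is the gap Landlock-style access control targets.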

A few decades back we had the ability to cryogenically freeze processes, save them to storage, move the bins to another system, and defrost them to be run again. This was a great feature that I had hoped would make its way into mainstream kernels, but it seems to have disappeared off the face of the earth.

I wonder if the expansion of process isolation tooling will ever lead us back to this situation again, anyone know? It seems to me that strict isolation would be a vital rudimentary requirement for cryofreezing processes...


You might be looking for CRIU (https://criu.org/) - it works perfectly on the current kernel.

IIUC this even has logic to reconstitute TCP connections - https://criu.org/TCP_connection

Well at the VM level live migration and vmotion have been around for a while. I've watched a VM get migrated while ping is running without missing a single packet.

CRIU is used in lots of places for Linux processes, but in my experience it is far more low-level and finicky, and it tends to do things that require root permissions. It's used in production, but I would be shocked if, for example, someone made it so k8s could just live migrate any pod with CRIU.

Just think of the possible ways apps might break if you changed their hostname or pid out from under them. And that's not even including stuff like connections to localhost or shared memory.


A bit of a tangent, but this reminds me of the Deep Freeze Windows app: https://www.faronics.com/products/deep-freeze

I wonder if a similar tool exists for Linux.


It would be easy enough using an overlay file system, but I'm not aware of anybody having nicely packaged it up.
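As a rough idea of the overlay approach: keep the real tree read-only as the lower layer and send all writes to a throwaway upper layer, then discard the upper layer to "thaw". The Python sketch below only builds the `mount` command line; actually running it needs root, and all paths in the usage note are hypothetical. The helper name `overlay_mount_argv` is made up.

```python
def overlay_mount_argv(lower, upper, work, target):
    """Build the argv for an overlayfs mount; writes land in `upper`, `lower` stays untouched."""
    for path in (lower, upper, work):
        # ',' separates mount options and ':' separates multiple lower dirs,
        # so this sketch simply refuses paths containing either.
        if "," in path or ":" in path:
            raise ValueError(f"unsupported character in path: {path!r}")
    opts = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", opts, target]
```

For example, `overlay_mount_argv("/home/kiosk", "/run/freeze/upper", "/run/freeze/work", "/home/kiosk")` gives a command that, if the upper and work directories live on a tmpfs, would lose every change on reboot.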

Guest user account??

Yeah there is some capability for this, for example https://criu.org

Is there an equivalent for macOS?

Isn't macOS already isolating each app?

Isn't that for graphical apps (.app) only? How do I sandbox ffmpeg I installed via MacPorts, for example?

Oh you're right, I remember something about sandbox-exec but I'm not sure if it's similar to jails.

So this is like jails for BSD?


