BPF port-based firewall for systemd services (kailueke.gitlab.io)
103 points by Klasiaster 51 days ago | 43 comments

Maybe someone can enlighten me because I'm failing to see the relevance of SystemD here.

So the idea is, instead of having a central firewall managing all the host rules, each service defines its own firewall policy? How do I override a policy?

I may be missing something, but somehow I'm not sure it's the right place to do this.

I'll end up joining the camp that says SystemD does too much and breaks a lot of POSIX semantics, making Linux systems hard to debug.

Lately it's been getting more and more in my way. Things I've had problems with recently: DNS, cgroups, and namespaces. Every time, I've lost a considerable amount of time because of poorly documented and mostly unexpected SystemD behavior. Color me annoyed.

Edit: Hm, well, this wasn't supposed to be a rant, but it ended up as one.

Systemd is acting as the spiritual successor to inetd here (which is, as it happens, the core thing that systemd is designed to do: wire up services to each other or the internet through activation ports/sockets, and manage access to those ports/sockets.)

I'm sorry to say, but you might not have been paying attention. Nowadays SystemD actually does much more than that.

So you are complaining it breaks POSIX. That does not bother me that much. What does bother me is that their changes are weakly documented, and some of them make systems very hard to debug.

Yes, nowadays, it does. But if your complaint is that systemd does too many things, then this is potentially the least-effective example to use, as

1. if you stripped systemd back down to one function, it would be this one (management of services through activation sockets.) Activation sockets are, in fact, the core thing that the systemd/launchd/upstart paradigm offers over traditional init systems, and were the core reason that Linux distros switched to these init systems (because activation sockets allow for concurrent service startup—leading to faster initlevel-change times, impacting boot, shutdown, sleep/wake, dock/undock, etc.)

2. Activation sockets (and a few other core features, like targets and timers) are the totality of what “systemd” by itself means. The other components are a part of the systemd project, but are not what you get if you just download and compile the repo called “systemd.” You get an activation-socket-based init(8).

3. Even before systemd, Linux (and many other POSIX systems) had much of this same functionality already implemented in the form of inetd/xinetd. And since inetd only makes sense to run on certain kinds of POSIX systems (specifically: ones that have motd(5), uname(1), nsswitch(5), services(5), and a bunch of other above-POSIX.2 utilities, and whose init systems work like sysvinit/initrcd), you could group these concerns together and call them all “the base system.” Back then, every distro had its own “base system”, but that didn’t mean that the components of them were any less numerous or that those components didn’t need to evolve in precise lockstep. The components that were shared upstreams between multiple distros (like inetd) simply didn’t evolve—because there was no multilateral place for those evolution discussions to take place—and thus gradually code-rotted and were reduced to obsolescence, rather than keeping up with the times as something you’d actually want to plug new services into.

Systemd (the project, not the core “init system” component of it) is just a multilateral implementation of a base system, with “the systemd project” being the multilateral forum where distro vendors can propose and discuss the changes to base-system components like inetd that were previously fixed+rotting due to lack of ability to coordinate. The results are no more and no less than what you’d expect to happen when a working group, composed of a bunch of vendors who make their money off of the needs of enterprise customers, get together to evolve the Linux base system.

I’m not saying systemd (the project) is good or bad; I’m saying that there’s no alternative that sits in the same ecological landscape of players in the Linux space, that wouldn’t result in the same agendas being expressed through it. Systemd (the project) is inevitable; it’s a Nash equilibrium. (The only other one being the one we were in before, where we had upstream components like sysvinit and inetd and they never changed for ~30 years despite the demands of the distros and their customers.)

On point 1, I thought the Unix philosophy was "do one thing and do it well"? So we agree it's not Unixy, on purpose.

Point 2 is moot. I get one SystemD package from my distribution anyway. The "SystemD project" renders my system hard to debug, to the point of having the fchmodat syscall work while the chmod syscall fails, with no f*ing clue why. And nobody can help on IRC because nobody, even the most experienced, understands what is going on. It just is not acceptable!

Point 3 is a recurring argument. Stating that SystemD simply coalesces base-system binaries is misleading. It goes way beyond that. Does setting a BPF filter in a SystemD unit result in a log line stating so? I bet not. How the hell do I know, then? Seriously, for real, how do you debug these kinds of systems?!

I'm very sorry to say that over the last 3-4 years, the four hardest-to-debug issues I've had all have their source in "SystemD, the project".

I have no doubt the SystemD project has good intentions. I also have no doubt they do not care one bit about the cognitive load and workload they _impose_ on others.

The road to hell is also paved with good intentions.

Systemd creates cgroups, and you can attach BPF filters with the new `IPIngressFilterPath` option. It's also possible with an `ExecStartPre=bpftool …` hack, but that's not so nice because it's racy.

The value is that the service configuration knows exactly what network behavior a service has. The global iptables state is not context-aware unless you tag things by PID. And anyway, it's a cleaner approach to bundle the firewall with the service instead of manipulating global iptables state.

You can override the BPF firewall by adding a drop-in service file which either appends an additional filter with `IPIngressFilterPath=filter` or deletes all previously configured filters with `IPIngressFilterPath=`.
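As a sketch, such a drop-in could look like this (the unit name and pin path are made up for illustration; the filter must already be pinned under /sys/fs/bpf):

```ini
# /etc/systemd/system/myservice.service.d/50-firewall.conf (hypothetical)
[Service]
# Append an additional pinned BPF filter to the service's cgroup:
IPIngressFilterPath=/sys/fs/bpf/myservice-ingress

# ...or instead assign the empty value first, which drops all
# previously configured filters:
# IPIngressFilterPath=
```

An empty assignment resets the list, following systemd's usual convention for list-valued settings.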

Really, what is the goal, and when does the feature creep end? At Kubernetes certification?

As much as I loved the unit conf + socket activation concept at first and supported SystemD, it's become an unruly teenager. It has severely hindered my productivity on several occasions, but I'm stuck with it on Debian. One more glitch and I'll probably start hating its guts badly...

I agree (and I'd say the low point was when I upgraded a laptop to buster and silently ended up with systemd-resolved ignoring my local LAN DNS and trying to forward everything upstream, so I couldn't resolve local hostnames).

However, you're not stuck with systemd on Debian. Install sysvinit-core and remove systemd-sysv, and install libpam-elogind and elogind instead of libpam-systemd if you need it. You'll very likely still end up with some of the libraries installed - it's not the end of the world and you can install Devuan if you really don't want that, but systemd won't be running as PID 1 and you don't get rubbish like journald or resolved.

On my personal machine I might end up doing just that. But that cannot always be changed in a work environment.

Edit: I've also been bitten by the DNS meddling.

Shipping the required network settings alongside the service makes sense, i.e., an SMTP mailer declares it needs access to tcp/25.

How does one get a global view, though? What am I allowing on this host, and more importantly, how do I fix it when things go wrong? (They always do at some point!)

It looks sensible, until you have to debug the beast.

If you're not a SystemD developer and/or don't follow extremely closely what they are doing, you end up with an unmanageable system before you know it. Just upgrade your distro for security fixes and bam, lots of things stop working and you don't know why. It's getting to the point where it's ridiculous.

https://skarnet.org/software/s6/ also supports socket activation; Obarun, Gentoo, and Void support it so far. It would be interesting to see how it compares to systemd's use cases with wider adoption.

This isn't a systemd feature. It's a Linux kernel feature (BPF) which is simply being called from a systemd exec line. This could also be implemented with sysvinit.

This is the same kind of reply I got when systemd implemented IP accounting in unit files. Lennart said systemd gets a lot of flak, but in the end it's only implementing kernel features. I proceeded to ask him: why is it necessary to have this? Do you feel obligated to make every single feature available?

Parent comment raises a very valid question. How do you manage the firewall policy with this system? Netfilter configuration is already a mess on Linux (iptables? nftables? the iptables-persistent package? the netfilter-persistent package? some custom shell script that calls individual iptables rules? rules dynamically inserted by scripts?). Each of these tools/methods has its flaws, but it becomes completely unmanageable if two are used at the same time.

When this doesn't work as expected, how should a sysadmin handle the situation? Where do you even start debugging? Is he/she expected to inspect every single unit file in search of the one that is amiss? Can one get a list of all currently loaded rules? (Preferably with counters for matched packets, and ideally with the possibility to log a packet matched by a rule.)

systemd making this 'easy' to use may be a bad idea. In my eyes, it's just giving users more rope to hang themselves with. The unit file format is often touted as an asset because it's much simpler than the shell goo you would find on most distributions (Debian and derivatives, for example, provided a skeleton file for people to write their own init.d services; just the boilerplate was almost 100 lines of sh. Contrast with OpenBSD, where most scripts to configure service startup are only a couple of lines).

Having a key=value format was touted as a plus, as it made things easier. Turns out that's not exactly true, because some settings will have the expected effect only if you enable a different setting at the same time. In my mind, this translates to an if/else, which makes me think the systemd.unit(5) format is not INI-style configuration but a small programming language masquerading as configuration.

Anyhow, this turned out longer than I expected. The fact that it would be possible to do this with sysvinit does not mean it would be a good idea.

This is the classic configuration problem: a program is too complex for users, so we'll use configuration files. Then the configuration becomes more and more complex, so either your configuration file language becomes an ugly programming language (XML and XSLT, for example) or the complexity is pushed into the parser of the configuration files and you get lots of 'magic' combinations of parameters.

And no, I don't have the answer. Maybe using Lisp like Guix does?

I know what BPF is. You are missing the point. I have enough work already; I don't want to have to learn whatever new little twist SystemD, which goes in the opposite direction of everyone else, has made in its latest release. Meanwhile, the 'old ways' are HEAVILY documented on thousands of sites, while new feature X has barely half a page of documentation.

I've spent a whole week trying to figure out why I had a filesystem permission issue on the chmod syscall but not on fchmodat. The culprit was SystemD trying to be clever with namespacing, which could have been fine if the reported error had been semantically correct. It gets in the way and provokes untrackable issues.

And yes, namespaces are a kernel tool. But it's SystemD that sets them up...

Edit: typo and clarification

The systemd feature is that in v243 you can now specify a BPF program for all sockets in the cgroup via `IPEgressFilterPath=/sys/fs/bpf/yourfilter` (same for `IPIngressFilterPath`).

This approach is very similar to what https://github.com/cilium/cilium is doing for containers, right? I wonder if it would be easy to reuse the battle-tested BPF programs that Cilium provides and load them into systemd units.

There is more crazy shit we can do, like setting up entire service meshes with load balancers for your systemd units. Very neat.

Yes, it uses the same technology.

Snabb, I think, also has XDP/BPF support, and you can do similar things using Lua.

Many people are surprised to learn that Linux (and I guess some of the BSDs?) have a virtual machine that runs in the kernel and executes its own specific bytecode. LLVM has a BPF backend, as mentioned in this article.

Pretty sure BPF originated in BSD land :)

Prior to BPF there was the "Enet Packet Filter", then the Ultrix Packet Filter, then something under SunOS before it became BPF. BPF was created in 1990 (at Berkeley), which was very much BSD territory.

> and I guess some of the BSDs?

BPF, BSD — notice the first letter is the same ;-)

Also, many systems (some BSDs, SmartOS, macOS, Windows 10, …) have an in-kernel VM for running dtrace bytecode.

Doesn't the BPF program have to be attached with root privileges? If so, the idea of having per-service filters is not enforceable anyway, right? Since my filter can potentially affect any other process running on the host.

What's the use case for this, and how does this complement iptables? Why do I need this?

BPF is very interesting. I remember that its programs are very small and have no loops, but I don't understand its use case for firewalls yet.

This is a good talk about the hows and whys of BPF as a firewall: https://youtu.be/_Iq1xxNZOAo

My understanding is that it allows you to write a program to decide what to do with a packet, and to run that program in kernel space for efficiency. The limitations of BPF are there to protect the kernel from non-kernel code running in its memory space and ring 0.

There is already support in systemd for IP address filtering, so that a service can or cannot communicate with certain IPs. Internally this uses BPF as well. The blogpost was about extending this to port filtering through a custom BPF program.

sooo.. tcpwrappers?

Compiling dynamically generated C programs on demand to provide packet filtering? It's clever but not in a good way. This is a really dangerous approach to solving this problem.

First, there's no need to do this "on demand." You can ship a .o around and load it with bpftool. My team does a version of this at Facebook (using libbpf directly) and it works really well. The big use case for on-demand compilation is in tracing, where you want to capture structs, but BTF and compile-once-run-everywhere will mitigate this need [0].
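That workflow is roughly the following (file names and the pin path are examples; assumes clang's BPF backend and a recent bpftool, and needs root):

```shell
# Build the filter once, ship the .o, then load and pin it on each host
clang -O2 -target bpf -c ingress_filter.c -o ingress_filter.o
bpftool prog load ingress_filter.o /sys/fs/bpf/myservice-ingress
# systemd (or any other loader) can then reference the pinned program
```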

Second, I don't actually agree. JIT compilation is a generally accepted approach -- who cares what the IR is? And the bpf runtime is ULTRA constrained, so you can't slip a `system("rm -rf /")` in there (by a long shot).

[0] https://www.kernel.org/doc/html/latest/bpf/btf.html

I'm thoroughly skeptical that general acceptance means that JIT compilation doesn't increase the attack surface of the stack.

You'd need an interpreter at least, to avoid some kind of hideously complex system.

So it's a question of interpreter vs. JIT, and a JIT makes it more feasible to use an ultra-simple language with aggressive verification, without losing too much speed.

Every feature has an attack surface, but you have to compare against the alternative, not the lack of feature.

(This assumes that you can't force it to use a kernel interpreter despite the JIT existing. Otherwise perhaps that is the part that should be disabled.)

Tangential: Does your team publish papers or write blog posts on the kind of engineering or research it does? I'd like to read it, or at least add it to my queue. Thanks.

There was a recent talk on tupperware that gets into some of what Alex's team does: https://engineering.fb.com/data-center-engineering/tupperwar...

That was interesting. Thanks.

What are the dangers?

AFAIK, BPF has an in-kernel verifier for the safety of BPF programs, which is conservative and will reject some safe programs, let alone the really dangerous ones.

The verifier has had bugs in the past. Here's six of them from 2017: https://www.openwall.com/lists/oss-security/2017/12/23/2 one of which was discovered to have an incomplete fix in 2018: https://crbug.com/project-zero/1686

https://medium.com/@tyanir/understanding-bpf-check-alu-op-vu... has a longer description of one of them - namely, you can convince the verifier that a certain part of the program is dead code and therefore doesn't need to be verified when it does, and thereby get arbitrary unverified eBPF into the kernel.

The talk hasn't been posted to YouTube yet, but the slides alone show some of the risks of eBPF. The speaker's previous talk at 35C3 went into how the verifier is arbitrary and not really designed to stop attackers anyway.


Hi, I chose this way because it's a proof of concept. Normally this filtering expression would be stored in a BPF map, but I didn't invest more time and wanted to keep it simple as an example of writing BPF programs for systemd services.

How is it any different from letting a script run as root? This seems very similar to Varnish Cache, which is amazing.

There's a limited instruction set, no loops, and a built-in verifier, so that no anomalous code can be executed on the in-kernel virtual machine. Moreover, this feature is in 4.x kernels, and I think no exploit has been discovered in the wild.

> Moreover, this feature is in 4.x kernels, and I think no exploit has been discovered in the wild.

There won't be an exploit "for BPF." It's kind of a different layer and its own system. "Exploit in BPF" is about the same level as "exploit in C"; there's just no such general thing.
