We used BPF at Arbor Networks as a medium for DDoS attack signatures (other tools in our stack generated them automatically from statistical models), and I spent a pretty big chunk of my first year there working on ways to optimize evaluation of lots of BPF filters on NetFlow records in parallel.
The BPF virtual machine itself is trivial; any competent systems programmer given its specification should have no trouble implementing it.
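To make "trivial" concrete, here is a minimal sketch of the interpreter loop for a handful of classic-BPF opcodes; it's illustrative only (a real implementation validates the program up front and covers the whole instruction set):

    /* Minimal sketch of a classic-BPF interpreter for a few opcodes.
     * Illustrative only: real implementations (the kernel, libpcap's
     * bpf_filter) validate programs and support the full instruction set. */
    #include <linux/filter.h>   /* struct sock_filter, BPF_* opcode macros */
    #include <stdint.h>

    static uint32_t run_bpf(const struct sock_filter *prog, int len,
                            const uint8_t *pkt, uint32_t pkt_len)
    {
        uint32_t A = 0;                      /* accumulator */

        for (int pc = 0; pc < len; pc++) {
            const struct sock_filter *i = &prog[pc];

            switch (i->code) {
            case BPF_LD | BPF_W | BPF_ABS:   /* A = 32-bit word at offset k */
                if (i->k + 4 > pkt_len) return 0;
                A = (uint32_t)pkt[i->k] << 24 | pkt[i->k + 1] << 16 |
                    pkt[i->k + 2] << 8 | pkt[i->k + 3];
                break;
            case BPF_LD | BPF_H | BPF_ABS:   /* A = 16-bit halfword at offset k */
                if (i->k + 2 > pkt_len) return 0;
                A = (uint32_t)pkt[i->k] << 8 | pkt[i->k + 1];
                break;
            case BPF_JMP | BPF_JEQ | BPF_K:  /* branch on A == k */
                pc += (A == i->k) ? i->jt : i->jf;
                break;
            case BPF_RET | BPF_K:            /* accept k bytes (0 = drop) */
                return i->k;
            case BPF_RET | BPF_A:
                return A;
            default:
                return 0;                    /* opcode not handled in this sketch */
            }
        }
        return 0;
    }

(The real machine also has an index register X, scratch memory, and indirect loads; none of that changes the basic shape.)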
What's most interesting about BPF is libpcap. There is a surprising amount of compiler theory baked into pcap, and it bears by far the larger share of the weight in making Unix packet capture applications workable and efficient (this may have changed with JIT'd BPF). I actually found this frustrating, as simple changes to the BPF VM (for instance, "vectorized" comparisons to support multiple filters) would break the libpcap optimizer.
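If you want to see what the optimizer produces without reading optimize.c, libpcap will show you: pcap_compile() runs the filter through the optimizer when the optimize flag is set, and bpf_dump() prints the resulting program. A small sketch (assumes the libpcap headers are installed; link with -lpcap):

    /* Compile a tcpdump-style filter expression and dump the BPF program. */
    #include <pcap/pcap.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const char *expr = argc > 1 ? argv[1] : "tcp port 80";
        struct bpf_program prog;

        /* No live device needed; a "dead" handle is enough for the compiler. */
        pcap_t *p = pcap_open_dead(DLT_EN10MB, 65535);
        if (p == NULL)
            return 1;

        /* 4th arg: optimize flag; 5th: netmask (unknown here). */
        if (pcap_compile(p, &prog, expr, 1, PCAP_NETMASK_UNKNOWN) < 0) {
            fprintf(stderr, "pcap_compile: %s\n", pcap_geterr(p));
            return 1;
        }

        bpf_dump(&prog, 1);       /* human-readable, one instruction per line */
        pcap_freecode(&prog);
        pcap_close(p);
        return 0;
    }

Compiling the same expression with the optimize flag set to 0 and then to 1 makes the optimizer's contribution obvious.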
I confess to being a bit baffled as to why things like seccomp-bpf would choose to use such a simple bytecode scheme; network packet capture uses it because of tradition and compatibility, and because the one competently-designed high-level filter language compiler (libpcap's) supports only BPF.
I don't know if I'd call what libpcap does "compiler theory"; it appears to be a darker art that draws on extra knowledge about a lot of different things. It's remarkably sophisticated, though. I implemented a BPF JIT engine for a popular IPS and spent a lot of time trying to work out a good way to replace libpcap before we just went with it. Optimized libpcap output will routinely make you scratch your head and wonder how it can work.
I would read with great interest a blog post about your experiences. I have no practical need, but I thought this tour of BPF was fascinating, and it left me hungry for more.
In addition to his blog post, which will probably be great, my reading list would be:
* First, the McCanne BPF+ paper.
* Then, the pcap source code, particularly pcap/optimize.c, perhaps starting with the peephole optimization stuff.
* Finally, one of the BPF JIT projects (there's one on SourceForge). Amusingly: the Linux BPF JIT was at one point used as an aid to memory corruption, by JIT-spraying kernel memory. So also consider reading any of the JIT-spray papers.
Hey, Steve here. The BPF+ work was done mainly by Andy Begel... really good stuff, but it never got incorporated into libpcap, and I never otherwise published how the original optimizer worked. I did, however, give a retrospective talk on it a couple of years back at Sharkfest...
What would you want to do with seccomp-bpf that can't be expressed with BPF? Or did I misunderstand the comment?
You want to allow or deny calls based on syscall args, which seems to fit the BPF model. Perhaps string matching is where it breaks down, since syscall args have strings and network packets don't? I suppose a stripped down regex like Lua's might be useful for matching arguments that are file system paths, and that is firmly outside BPF as far as I understand.
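For what it's worth, classic BPF over struct seccomp_data can branch on the raw argument words just fine; what it can't do is follow the pointer behind a string argument, which is exactly why path matching is out of reach. A minimal sketch (Linux-specific, assumes a little-endian machine so the low 32 bits of each 64-bit argument sit at the start of the field, and omits the architecture check a production filter needs) that kills the process on write() to any fd other than stdout or stderr:

    #include <linux/filter.h>
    #include <linux/seccomp.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>
    #include <stddef.h>

    static int install_filter(void)
    {
        struct sock_filter filter[] = {
            /* A = syscall number */
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
            /* not write()? jump to ALLOW */
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 0, 3),
            /* A = low 32 bits of args[0], i.e. the fd (little-endian assumption) */
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 1, 1, 0),  /* fd == 1 -> ALLOW */
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 2, 0, 1),  /* fd == 2 -> ALLOW, else KILL */
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        };
        struct sock_fprog prog = {
            .len = sizeof(filter) / sizeof(filter[0]),
            .filter = filter,
        };

        /* Required so an unprivileged process may install the filter. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0)
            return -1;
        return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
    }

The args a filter sees are just the register values at syscall entry; anything behind a pointer (pathnames, buffers) is invisible to it.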
I would guess that they wanted to reuse a VM already in the kernel, and that had a fast JIT implementation, but I'm just speculating.
Well, I think it makes sense given the history. When seccomp-bpf and the other stuff were introduced, there was already a BPF VM in the kernel designed to efficiently run untrusted code (although seccomp-bpf can't use the JIT today); adding another VM is a maintainability and security hazard, and simplicity is good for security, so why not adapt it?
And now that it exists, it makes sense to evolve the BPF design to be more efficient while keeping the result backwards compatible and unified between subsystems.
MPF is interesting because it was designed to make packet filters fast enough to serve as the primary way to dispatch packets to user-space endpoints: BPF just has to run all the filters on all the packets, but MPF will merge filter prologues and then do a switch on some common value (like the port).
DPF takes this a step further by using a more declarative bytecode. It can merge multiple chunks of packet filters, so with layered protocols switching on protocol IDs, ports, etc. you end up with a decision tree (or maybe a DAG?). They also wrote a JIT with some interesting optimizations, like picking the best dispatch method for each switch based on the number of branches, so they actually get better dispatch performance than hand-written packet demultiplexers.
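A rough way to picture the difference (this is not MPF/DPF's actual representation, and lookup_endpoint_by_port() is a hypothetical stand-in for however the dispatch table gets built): instead of running N independent filters on every packet, the merged form hoists the shared prologue and dispatches once on the discriminating value:

    #include <stdint.h>
    #include <stddef.h>

    struct endpoint;                                           /* hypothetical */
    struct endpoint *lookup_endpoint_by_port(uint16_t dport);  /* hypothetical */

    /* Naive model: run every filter on every packet, O(N) per packet. */
    struct endpoint *demux_naive(const uint8_t *pkt, uint32_t len,
                                 int (*filters[])(const uint8_t *, uint32_t),
                                 struct endpoint *owners[], int n)
    {
        for (int i = 0; i < n; i++)
            if (filters[i](pkt, len))
                return owners[i];
        return NULL;
    }

    /* Merged model: one shared prologue (is this IPv4/UDP?), then a single
     * dispatch on the destination port - the "switch" MPF/DPF effectively build. */
    struct endpoint *demux_merged(const uint8_t *pkt, uint32_t len)
    {
        if (len < 14 + 20 + 8 ||
            pkt[12] != 0x08 || pkt[13] != 0x00 ||   /* EtherType: IPv4 */
            pkt[23] != 17)                          /* IP protocol: UDP */
            return NULL;
        uint32_t ihl = (pkt[14] & 0x0f) * 4;        /* variable-length IP header */
        if (len < 14 + ihl + 8)
            return NULL;
        uint16_t dport = (uint16_t)(pkt[14 + ihl + 2] << 8 | pkt[14 + ihl + 3]);
        return lookup_endpoint_by_port(dport);
    }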
Eric Dumazet, one of the many kernel hackers, wrote a JIT compiler that compiles BPF programs directly to x86 code _in the kernel_. Much to learn from his code: http://lwn.net/Articles/437884/
As tptacek notes, the VM is very simple -- it's straightforward to write a BPF-VM-to-C translator whose output you can then compile with gcc. But it will still be slower than edumazet's compiled code.
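The translator is basically a big switch that prints one C statement per instruction, labeling every instruction so jumps have somewhere to go. A sketch covering a few opcodes (load32() is a hypothetical bounds-checked helper the emitted code would need), just to show the shape:

    #include <linux/filter.h>
    #include <stdio.h>

    /* Emit a C body for a classic-BPF program; only a few opcodes shown. */
    static void emit_c(const struct sock_filter *prog, int len)
    {
        printf("uint32_t A = 0;\n");
        for (int pc = 0; pc < len; pc++) {
            const struct sock_filter *i = &prog[pc];
            printf("L%d:\n", pc);          /* label every insn: simple, if verbose */
            switch (i->code) {
            case BPF_LD | BPF_W | BPF_ABS:
                printf("  A = load32(pkt, pkt_len, %u);\n", i->k);
                break;
            case BPF_JMP | BPF_JEQ | BPF_K:
                printf("  if (A == %uu) goto L%d; else goto L%d;\n",
                       i->k, pc + 1 + i->jt, pc + 1 + i->jf);
                break;
            case BPF_RET | BPF_K:
                printf("  return %uu;\n", i->k);
                break;
            default:
                printf("  /* opcode 0x%x not handled in this sketch */\n", i->code);
            }
        }
    }

gcc will happily fold the labels and gotos back into reasonable branches, but as noted above, a purpose-built JIT that keeps the accumulator and index register pinned in machine registers still comes out ahead.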
Given the restricted nature of BPF, what kind of input is expected to be used with this? Surely you can't just run any code through it. There wasn't any additional info in that posting, unfortunately.
Yes and no. It's important to keep in mind that BPF uses a register machine. If you look at the paper[0] describing BPF, the authors explicitly considered a decision tree model (CSPF). It turns out that trees perform redundant computation, can't deal with variable-length headers, and are not memory-efficient.
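The variable-length-header point is visible right in the instruction set: classic BPF has an indexed addressing mode plus a dedicated "load 4*(IHL) into X" instruction (BPF_MSH) for exactly this case, which a tree of fixed-offset tests can't express. The usual idiom, written as a sock_filter array (offsets assume an Ethernet frame, and the EtherType and fragment checks a real filter would include are omitted):

    #include <linux/filter.h>

    /* Match TCP packets with destination port 80, reaching past a
     * variable-length IPv4 header. Sketch only. */
    static struct sock_filter tcp_dport_80[] = {
        /* A = IP protocol byte; bail out unless TCP (6) */
        BPF_STMT(BPF_LD  | BPF_B   | BPF_ABS, 23),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 6, 0, 4),
        /* X = 4 * (pkt[14] & 0x0f): the IP header length, in bytes */
        BPF_STMT(BPF_LDX | BPF_B   | BPF_MSH, 14),
        /* A = halfword at X + 16 = 14 + ihl + 2, the TCP destination port */
        BPF_STMT(BPF_LD  | BPF_H   | BPF_IND, 16),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 80, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, 0xffff),   /* match: accept packet */
        BPF_STMT(BPF_RET | BPF_K, 0),        /* no match: drop */
    };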
Which allows bpf-jit to be part of a larger privilege-escalation scheme. (To use it you first need to gain control of an application; then you can maybe mount this attack, and if it succeeds you own the box.)
I haven't kept up with this at all, so I don't know if it has been fixed (or even is fixable) but it is interesting.
❝Pseudo-state machine in kernel-space: the userspace utility nftables interprets the rule-set provided by the user (using a new syntax), it compiles it into the pseudo-state machine bytecode and then it transfers it to the kernel via the nftables Netlink's API. Roughly, the idea behind nftables is similar to the Berkeley Packet Filters (BPF).❞