
BPF: The Forgotten Bytecode - jgrahamc
http://blog.cloudflare.com/bpf-the-forgotten-bytecode
======
tptacek
We used BPF at Arbor Networks as a medium for signatures for DDoS attacks
(other tools in our stack automatically generated them from statistical
models), and I spent a pretty big chunk of my first year there working on ways
to optimize evaluation of lots of BPF filters on NetFlow records in parallel.

The BPF virtual machine itself is trivial; any competent systems programmer
given its specification should have no trouble implementing it.

What's most interesting about BPF is libpcap. There is a surprising amount of
compiler theory baked into pcap, and it bears by far the larger share of the
weight in making Unix packet capture applications workable and efficient (this
may have changed with JIT'd BPF). I actually found this frustrating, as simple
changes to the BPF VM (for instance, "vectorized" comparisons to support
multiple filters) would break the libpcap optimizer.

I confess to being a bit baffled as to why things like seccomp-bpf would
choose to use such a simple bytecode scheme; network packet captures uses it
because of tradition and compatibility, and because a competently-designed
high level filter language compiler supports only BPF.

~~~
TheCondor
I don't know if I'd call what libcap does "compiler theory" it appears to be a
darker art that has extra knowledge about a lot of different things. It's
remarkably sophisticated though. I implemented a BFF jit engine for a popular
ips and spent a lot of time trying to work out a good way to replace libcap
before we just went with it. Optimized libcap output will routinely make you
scratch your head and wonder how it can work.

~~~
wpietri
I would read with great interest a blog post about your experiences. I have no
practical need, but I thought this tour of BPF was fascinating, and it left me
hungry for more.

~~~
tptacek
In addition to his blog post, which will probably be great, my reading list
would be:

* First, the McCanne BPF+ paper.

* Then, the pcap source code, particularly pcap/optimize.c, perhaps starting with the peephole optimization stuff.

* Finally, one of the BPF JIT projects (there's one on SourceForge). Amusingly: the Linux BPF JIT was at one point used as an aid to memory corruption, by jit-spraying kernel memory. So also consider reading any of the jit-spray papers.

~~~
mccanne
hey Steve here, the BPF+ work was done mainly by Andy Begel... really good
stuff but it never got incorporated into libpcap, and I never otherwise
published how the original optimizer worked. I did, however, give a
retrospective talk on it a couple years back at sharkfest...

[http://sharkfest.wireshark.org/sharkfest.11/](http://sharkfest.wireshark.org/sharkfest.11/)

Click on "Keynote Video" for a really boring talk :-)

------
Rusky
There are a couple later innovations on the BPF design that are really
interesting:

Mach Packet Filter:
[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.4...](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.44)

MPF is interesting because it was designed to make packet filters fast as the
primary way to dispatch packets to user-space endpoints- BPF just has to run
all the filters on all the packets, but MPF will merge filter prologues and
then do a switch on some common value (like the port).

Dynamic Packet Filter:
[http://pdos.csail.mit.edu/~engler/dpf.html](http://pdos.csail.mit.edu/~engler/dpf.html)

DPF takes this a step further by using a more declarative bytecode. It can
merge multiple chunks of packet filters so that with layered protocols
switching on IDs, ports, etc. you end up with a decision tree (or DAG maybe?).
They also wrote a JIT with some interesting optimizations like picking the
best dispatch method for each switch based on the number of branches, so they
actually get better dispatch performance than hand-written packet
demultiplexers.

------
xtacy
Eric Dumazet, one of the many kernel hackers, wrote a JIT compiler that
compiles BPFs directly to x86 code _in the kernel_. Much to learn from his
code: [http://lwn.net/Articles/437884/](http://lwn.net/Articles/437884/)

As tptacek notes, the VM is very simple -- it's straightforward to write a BPF
VM to C-code translator that you can then compile by gcc. But it will still be
slower than edumazet's compiled code.

~~~
haberman
In my article about DynASM I mention the BPF JIT and suggest that it could use
DynASM (it would make the JIT more readable at basically no cost):
[http://blog.reverberate.org/2012/12/hello-jit-world-joy-
of-s...](http://blog.reverberate.org/2012/12/hello-jit-world-joy-of-simple-
jits.html)

------
majke
Here's a recent LWN article on BPF developments:

[http://lwn.net/SubscriberLink/599755/1ca601d92ed23e7f/](http://lwn.net/SubscriberLink/599755/1ca601d92ed23e7f/)

(via:
[https://twitter.com/justincormack/status/469388089354633216](https://twitter.com/justincormack/status/469388089354633216)
)

------
asb
If only there was an LLVM backend to generate BPF. Oh wait, there is! As
readers of LLVM Weekly ([http://llvmweekly.org](http://llvmweekly.org)) will
know, an implementation was posted to LKML a while back:
[https://lkml.org/lkml/2014/2/5/743](https://lkml.org/lkml/2014/2/5/743)

~~~
MrBuddyCasino
Given the restricted nature of BPF, what kind of input is expected to be used
with this? Surely you can't just run any code through it. There wasn't any
additional info in that posting, unfortunately.

~~~
asb
This backend is for an "extended" BPF.
[https://lkml.org/lkml/2014/2/5/728](https://lkml.org/lkml/2014/2/5/728)

------
userbinator
_All the jumps are only forward, which guarantees that there aren 't any loops
in the BPF program._

In other words, BPF programs can be thought of as encoding a DAG that
represents a decision tree.

~~~
bhickey
Yes and no. It's important to keep in mind that BPF uses a register machine.
If you look at the paper[0] describing BPF, the authors explicitly considered
a decision tree model (CSPF). Turns out that trees perform redundant
computation, can't deal with variable length headers, and are not memory
efficient.

[0] [http://www.tcpdump.org/papers/bpf-
usenix93.pdf](http://www.tcpdump.org/papers/bpf-usenix93.pdf)

~~~
userbinator
Right, that's why I said it's a DAG -- a decision DAG -- instead of just a
tree structure.

------
sophacles
As with everything, there are tradeoffs - For instance this from a few years
ago:

[http://mainisusuallyafunction.blogspot.com/2012/11/attacking...](http://mainisusuallyafunction.blogspot.com/2012/11/attacking-
hardened-linux-systems-with.html)

Which allows bpf-jit to be part of a larger privilege escalation scheme. (To
properly use this attack you need to gain control of an applicaion first, then
you can maybe use this attack, and if successful you own the box).

I haven't kept up with this at all, so I don't know if it has been fixed (or
even is fixable) but it is interesting.

------
nine_k
\--

> Finally you can apply the rule like so:
    
    
        iptables -A INPUT \
            -p udp --dport 53 \
            -m bpf --bytecode "14,0 0 0 20,177 0 0 0,12 0 0 0,7 0 0 0,64 0 0 0,21 0 7 124090465,64 0 0 4,21 0 5 1836084325,64 0 0 8,21 0 3 56848237,80 0 0 12,21 0 1 0,6 0 0 1,6 0 0 0," \
            -j DROP

\--

Writing raw opcodes to execute in the kernel context on each packet received?
Yes, please! :)

~~~
cnvogel
Well... using the new “nftables” packet filter, this will be the "normal" way
of packet filtering. So, yes, please! ☺

[http://netfilter.org/projects/nftables/](http://netfilter.org/projects/nftables/)

❝Pseudo-state machine in kernel-space: the userspace utility nftables
interprets the rule-set provided by the user (using a new syntax), it compiles
it into the pseudo-state machine bytecode and then it transfers it to the
kernel via the nftables Netlink's API. Roughly, the idea behind nftables is
similar to the Berkeley Packet Filters (BPF).❞

~~~
nine_k
Well, I like bytecode machines.

But I'd expect a compiler or and assembler of some sort that allows to write
human-readable code.

------
pjf
tracedump is a single application packet sniffer that generates BPF code on
the fly:
[http://mutrics.iitis.pl/tracedump](http://mutrics.iitis.pl/tracedump)

~~~
dekhn
Wow, didn't know about that. Very useful!

------
wslh
Not forgotten, indeed we can say that the idea of BPF and proving that is
secure is related to what Google did with NaCl. More food for thought
[http://www.csl.sri.com/users/neumann/2012resolve-
cheri.pdf](http://www.csl.sri.com/users/neumann/2012resolve-cheri.pdf)

