
BPF at Facebook and beyond - Tomte
https://lwn.net/SubscriberLink/801871/c81eb8656543805f/
======
bdd
Excuse me while I shill for my employer, but we're indeed big fans of BPF at
Facebook.

Our L4 load balancer is implemented entirely in C++ that emits BPF byte code,
and it relies on XDP for "blazing fast" (comms-approved, totally scientific
replacement for Gbps and pps figures...) packet forwarding. It's open source
and was discussed here on HN before:
[https://news.ycombinator.com/item?id=17199921](https://news.ycombinator.com/item?id=17199921).
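
For a taste of what the XDP side looks like, here is a minimal sketch in
restricted C of the pattern such a balancer builds on (illustrative only, not
Katran's actual code; the backend selection is stubbed out):

    /* Parse the packet, pick a backend, and forward or punt to the
     * normal stack. Compiled with clang -target bpf, loaded via libbpf. */
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>
    #include <bpf/bpf_endian.h>
    #include <bpf/bpf_helpers.h>
    
    SEC("xdp")
    int balance(struct xdp_md *ctx)
    {
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;
        struct iphdr *ip;
    
        /* Bounds checks are mandatory; the verifier rejects the
         * program without them. */
        if ((void *)(eth + 1) > data_end)
            return XDP_PASS;
        if (eth->h_proto != bpf_htons(ETH_P_IP))
            return XDP_PASS;
    
        ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end)
            return XDP_PASS;
    
        /* A real balancer would hash the flow to a backend here and
         * encapsulate/rewrite the packet before transmitting, all
         * without it ever touching the kernel's network stack. */
        return XDP_TX;
    }
    
    char _license[] SEC("license") = "GPL";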

We discussed how we use eBPF for traffic shaping in our internal networks at
the Linux Plumbers Conference:
[http://vger.kernel.org/lpc-bpf2018.html#session-9](http://vger.kernel.org/lpc-bpf2018.html#session-9)

We presented how we enforce network traffic encryption and catch and terminate
cleartext communication (again, you guessed it, with BPF) at Networking@Scale:
[https://atscaleconference.com/events/networking-scale-3/](https://atscaleconference.com/events/networking-scale-3/)
(video coming soon, I think.)

Firewalls with BPF? Sure, we have 'em:
[http://vger.kernel.org/lpc_net2018_talks/ebpf-firewall-LPC.pdf](http://vger.kernel.org/lpc_net2018_talks/ebpf-firewall-LPC.pdf)

In addition to all these nice applications, we heavily rely on fleet-wide
tooling constructed with eBPF to monitor:

- performance (why is it slow? why does it allocate this much?)
- correctness (collect evidence that it's doing its job, like counters and
logs; "this should never happen", so catch it if it does!)

...in our systems.
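
To make that concrete, here is the rough shape of such a probe as a
hypothetical libbpf-style sketch (the map name and kprobe target are
illustrative; this is not our actual tooling): a per-PID allocation counter
that a user-space agent can poll.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>
    
    /* Per-PID count of kernel allocations, read from user space. */
    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 10240);
        __type(key, u32);   /* PID */
        __type(value, u64); /* allocations observed */
    } alloc_counts SEC(".maps");
    
    SEC("kprobe/__kmalloc")
    int BPF_KPROBE(count_kmalloc)
    {
        u32 pid = bpf_get_current_pid_tgid() >> 32;
        u64 one = 1, *cnt;
    
        cnt = bpf_map_lookup_elem(&alloc_counts, &pid);
        if (cnt)
            __sync_fetch_and_add(cnt, 1);
        else
            bpf_map_update_elem(&alloc_counts, &pid, &one, BPF_NOEXIST);
        return 0;
    }
    
    char LICENSE[] SEC("license") = "GPL";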

~~~
davemarchevsky
> - performance (why is it slow? why does it allocate this much?)

One of the pieces of fleet-wide tooling that heavily uses eBPF is PyPerf,
which we talked about publicly at Systems@Scale in September ("Service
Efficiency at Instagram Scale" -
[https://atscaleconference.com/events/systems-scale-2/](https://atscaleconference.com/events/systems-scale-2/)
- video also coming soon, I think).

~~~
stephen999
Is there any public code for these perf tools?

~~~
bdd
[https://github.com/iovisor/bcc/tree/master/examples/cpp/pyperf](https://github.com/iovisor/bcc/tree/master/examples/cpp/pyperf)

~~~
lathiat
Whoa, I hadn't seen that... sounds like py-spy but implemented in BPF. That's
crazy and cool: [https://github.com/benfred/py-spy](https://github.com/benfred/py-spy)

------
saagarjha
> Facebook, he began, has an upstream-first philosophy, taken to an extreme;
> the company tries not to carry any out-of-tree patches at all. All work done
> at Facebook is meant to go upstream as soon as it practically can. The
> company also runs recent kernels, upgrading whenever possible.

I was chatting with a Facebook engineer about their use of BPF this summer and
heard the same thing, which was surprising to me. There seem to be a number of
companies that take advantage of Linux being licensed under the GPL and keep
their own forks/patches of the kernel that they use internally (anecdotally,
I’ve heard Google does this), and they stay on some old version, which
apparently Facebook doesn’t do.

~~~
habitue
I can't wait until this is the norm. Facebook (and presumably Google, if they
wanted to) can do this with an army of engineers and some excellent
continuous-deployment systems. I'm hoping these kinds of systems become
commoditized over time so that regular companies can stay close to the latest
kernel version at all times.

~~~
jumpingmice
Google can't necessarily upstream everything because of social problems in the
kernel process. For example, their datacenter TCP improvements have never been
accepted by the gatekeeper of the net subsystem, which was a significant
motivation to develop QUIC.

~~~
bifrost
Reinventing TCP over UDP is sort of silly; I hope they have a better reason
than "they don't want to upstream our changes" lol.

~~~
jumpingmice
Isn't it a pretty good reason? gRPC is terrible in a datacenter context
without Google's internal TCP fixes, which Linux won't adopt (and which have
been advocated for in numerous conference papers since at least 2009). If they
are steadfast cavemen, what other workaround exists?

~~~
derefr
What parts of gRPC are fixed by using it over QUIC vs. TCP (presuming intra-DC
traffic and equally long-lived flows)?

~~~
jumpingmice
Latency caused by packet loss. TCP needs microsecond timestamps and the
ability to tune RTOmin down to 1ms before it is suitable for use in a
datacenter. With the mainline kernel TCP stack you are looking at a penalty of
at least 20ms whenever a packet is dropped.

------
aey
BPF is awesome. We built a full Rust toolchain that targets it:
[https://github.com/solana-labs/rust](https://github.com/solana-labs/rust)

~~~
roblabla
That looks cool! Any chance it gets upstreamed into Rust proper? And if so, is
there a place to keep track of that progress?

~~~
aey
We would love to; maybe in a few months, once we stabilize the changes.

One challenge is that upstreaming the BPF target is much harder. We had to add
support for relocations, spilling, multiple return values, and a few other
things that might not be needed by the C BPF folks.

~~~
saagarjha
> We had to add support for relocations, spilling, multiple return values, and
> a few other things that might not be needed by the C BPF folks.

I'm curious why C BPF programs wouldn't need this.

~~~
aey
There is no Linux use case asking for it. We happen to be using BPF outside of
the kernel.

~~~
saagarjha
Ah, that explains it. To be clearer, I probably should have asked why you
needed those things ;)

------
farisjarrah
Bounded loops and concurrency management sound pretty awesome. Can't wait
till Cloudflare's next write-up on BPF with these new features.

------
DSingularity
Is there a difference between BPF and eBPF?

~~~
alexgartrell
Short answer: no, same thing.

The original BPF is a much simpler bytecode. eBPF extended it and made it
essentially x86_64 assembly (semantically). Then we all decided to call eBPF
"BPF", for reasons.

~~~
sanxiyn
Isn't it the case that you still need original BPF for seccomp?

~~~
fdee
The old classic BPF is still used by seccomp, but in the kernel it is
transparently converted into eBPF. In the kernel we dropped the notion of eBPF
and just call everything BPF (the old classic BPF is pretty much a thing of
the past and is not being extended or developed any further).
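
For the curious, that surviving classic-BPF surface looks like this: a minimal
seccomp filter in C built from the cBPF macros in <linux/filter.h> (a sketch;
blocking ptrace(2) is just for illustration):

    /* Kill the process if it calls ptrace(2), allow everything else.
     * The kernel converts this cBPF program to eBPF internally. */
    #include <linux/filter.h>
    #include <linux/seccomp.h>
    #include <stddef.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>
    
    int main(void)
    {
        struct sock_filter filter[] = {
            /* Load the syscall number from struct seccomp_data. */
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                     offsetof(struct seccomp_data, nr)),
            /* ptrace? fall through to KILL; otherwise skip to ALLOW. */
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_ptrace, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = sizeof(filter) / sizeof(filter[0]),
            .filter = filter,
        };
    
        /* Required so an unprivileged process may install a filter. */
        prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
    
        /* From here on, any ptrace() call kills this process. */
        return 0;
    }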

------
azifali
Is there an easy-to-use Golang library for eBPF? I know that Cilium /
Cloudflare are attempting to build one, but I gather that the project is not
yet ready.

~~~
lmb
Can you define "ready"? The functionality of the library is solid; it's based
on code we're using in production. We've not committed to a stable API,
however.
(I'm one of the maintainers of said library.)

------
vernie
Cool that the term isn't defined once in the post.

~~~
corbet
It's hard to decide what we need to define and what we don't. In this case, I
kind of opted for assuming that the readers know what BPF is, given that a
fairly high percentage of our articles seem to be about BPF these days. Still,
I'll try to include a link next time, sorry.

Meanwhile, the LWN kernel index
([https://lwn.net/Kernel/Index/#Berkeley_Packet_Filter](https://lwn.net/Kernel/Index/#Berkeley_Packet_Filter))
will lead you to more information about BPF than you ever wanted.

~~~
teddyh
This is what the HTML <abbr> tag is for.
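
For example:

    <abbr title="Berkeley Packet Filter">BPF</abbr>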

