
Cilium 1.0: Bringing the BPF Revolution to Kubernetes Networking and Security - eatonphil
https://cilium.io/blog/2018/04/24/cilium-10/
======
atonse
I can’t be the only person thinking “What the hell is BPF?”. I know that
probably means I’m not the audience, but it wouldn’t hurt them to just state
that in the very first line.

Updated with article about BPF:
[https://lwn.net/Articles/747551/](https://lwn.net/Articles/747551/)

Really good talk by the Cilium folks that explains these concepts:
[https://m.youtube.com/watch?v=ilKlmTDdFgk](https://m.youtube.com/watch?v=ilKlmTDdFgk)

~~~
jitl
My first thought is that it’s “Berkeley Packet Filter”; see here for some
context on what else it’s used for: [http://blog.memsql.com/bpf-linux-performance/](http://blog.memsql.com/bpf-linux-performance/)

~~~
eatonphil
I believe it started out as an extension of BPF. The real term they mean to
use (and most people mean to use, I think) is eBPF. Calling it eBPF makes a
whole lot more sense to me because it's evolved a lot since it was just doing
packet filtering. Also, it's Linux-only so again differentiating eBPF from BPF
(which is not Linux-only) makes sense to me. But I didn't write the post.
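
For anyone else wondering what "just doing packet filtering" looked like, here's a rough sketch (mine, not from the post) of classic BPF on Linux: a four-instruction filter attached to a raw socket with SO_ATTACH_FILTER so the kernel only hands the process IPv4 frames. eBPF generalises this same bytecode idea into a much richer in-kernel VM.

    /* Minimal classic-BPF sketch: attach a tiny packet filter to a raw
     * socket so the kernel delivers only IPv4 frames. Needs root. */
    #include <stdio.h>
    #include <sys/socket.h>
    #include <linux/if_ether.h>
    #include <linux/filter.h>
    #include <arpa/inet.h>

    int main(void)
    {
        /* Load the ethertype (2 bytes at offset 12 of the Ethernet
         * header) and compare it against ETH_P_IP: accept or drop. */
        struct sock_filter code[] = {
            { BPF_LD  | BPF_H   | BPF_ABS, 0, 0, 12 },
            { BPF_JMP | BPF_JEQ | BPF_K,   0, 1, ETH_P_IP },
            { BPF_RET | BPF_K,             0, 0, 0xFFFF }, /* accept */
            { BPF_RET | BPF_K,             0, 0, 0 },      /* drop   */
        };
        struct sock_fprog prog = { .len = 4, .filter = code };

        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
        if (fd < 0) { perror("socket"); return 1; }

        if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER,
                       &prog, sizeof(prog)) < 0) {
            perror("setsockopt(SO_ATTACH_FILTER)");
            return 1;
        }
        /* recv() on fd now only sees IPv4 packets. */
        return 0;
    }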

------
hardwaresofton
It's really exciting to see technology being moved into the mainline kernel
(see the lwn/mailing list posts) and being so quickly useful to many entities
doing serious work (tm) with it.

KubeCon Copenhagen just wrapped up, and I'm still working through the talks,
but here's a video on eBPF applied to tracing:

[https://www.youtube.com/watch?v=ug3lYZdN0Bk&index=5&list=PLj...](https://www.youtube.com/watch?v=ug3lYZdN0Bk&index=5&list=PLj6h78yzYM2N8GdbjmhVU65KYm_68qBmo)

RIP to people who were working on nftables.

~~~
catern
The OP states that BPF is replacing nftables as if it were a foregone
conclusion, but it's not at all certain - it's still far from happening.

~~~
hardwaresofton
It was actually the lwn post (which is basically a summary of the mailing
list) that made me think that nftables was doomed.

[https://lwn.net/Articles/747551/](https://lwn.net/Articles/747551/)

It seems that BPF is taking the nftables API, which seemed to be the core
value-add (a reimagined iptables API), and actually delivering on the
performance benefits as well.

I hold people who work on the kernel in pretty high regard, and I expect them
to be pragmatic about it (the whole "strong opinions loosely held" thing). If
BPF doesn't introduce too many possible security vulnerabilities (that's about
the only issue with it I can see), it might represent the best of both worlds
-- a new API + improved performance.

------
fulafel
All this "service mesh" layer technology seems very complex. Does anyone have
a link to write-up that would cover the motivations?

It seems all this could just be done with traditional networking tech, like
microservice endpoints just having real IP addresses and using normal
application-level auth/load-balancing methods when conversing with internal
services.

~~~
zxcmx
Agreed re: complexity. The motivation behind the heavier service meshes (not
necessarily Cilium per se) is that it's a bit like AOP (aspect-oriented
programming) applied at the service level.

Different orgs work at different scales and in different styles; some orgs are
producing monoliths, others are producing "fat services" or "microservices"
(without going too much into what that might mean).

Some orgs have template repos or base libraries (big difference!) that they
use to produce services. Others just have standards and you can do it however
you like, but please conform to the standard (have /healthz, use statsd or
export for Prometheus, etc.).

Also, how do all the things auth to each other? Do you TLS all the things or
do you have api keys and secrets n-way between all the things? Does stuff
trust each other based on IP? Etc.

There are lots of "-ilities", particularly various kinds of monitoring,
metrics, circuit breakers, access control, and so on, that you can either bake
into each service independently or implement via shared code of some kind.

Notice that the above generally implies some degree of language homogenisation
(usually a sane thing to have when you take into account other -ilities like
artifact repos, dependency analysis, coding style guides, static analysis
tooling, etc. - adopting a new language is not "easy" at scale), or else you
are rewriting all these things a lot.

Anyway one option in this whole rainbow of possibilities is that you pull some
of this out of the service itself and push it into a network layer wrapper
somehow.

And that is how you end up with a service mesh...

Broadly speaking, my estimation is that if your company doesn't have multiple
buildings with lots of people who have never met each other, you probably
don't need a service mesh. And maybe not even then.

~~~
w4tson
Great explanation: AOP at a macro level. You made an interesting point about a
homogenous language across services. It’s a pressing problem in my team. I’m
not alone in thinking that shared binary code for the "-ilities" produces more
problems in the long term than it solves.

At this point the solutions seem to be to introduce a service mesh or copy
pasta.

------
tango12
How do some of the features/goals of Cilium compare to Istio's network policy?
[https://istio.io/blog/2017/0.1-using-network-policy.html](https://istio.io/blog/2017/0.1-using-network-policy.html)

Edit: Just came across this
[https://cilium.io/blog/istio/](https://cilium.io/blog/istio/) :)

~~~
throwbacktictac
Cilium lives below userspace, which makes it perform better than Istio. This
article has more information about the differences from the Cilium developers'
point of view.

[https://cilium.io/blog/istio/](https://cilium.io/blog/istio/)

~~~
hueving
>Cilium lives below userspace, which makes it perform better than Istio.

There are a lot of fast userspace networking projects that bypass the kernel
precisely to be faster. Which approach is better is up for debate, but the
kernel is definitely not faster in all cases.

~~~
woah
How can you bypass the kernel?

~~~
aseipp
One major difference between the kernel and any other piece of standard
software is access control: the kernel has privileged access to most hardware
peripherals. The software for the driver is nothing particularly special; it
is simply run in a context in which it has control of the hardware.

More concretely, most hardware devices are relatively easy to interface with:
for example, you may simply set up a region of DMA memory, poke the hardware
device with the address of this memory, and write into it, then read results
back out. A NIC is a good example of such a model. This can be done with any
block of memory, except normally the kernel is the only thing that can talk to
the NIC (to tell it where to write to/read from).

So the main thing you need to do is pass control of the hardware to a
userspace process. For the NIC/DMA example, the easiest way is to just
allocate some memory, make sure it's non-swappable, and then get its physical
address. You then just need a small driver to connect userspace with the
hardware -- it must give you a way to tell the hardware where to read/write.
Maybe it exposes a sysfs-based file with normal unix permissions (a common
method). Writing an address into this file is equivalent to telling the
hardware to "read here, and write there". Now you can write to the memory you
allocated (in userspace) to control the NIC.

At this point, the kernel is more-or-less out of the loop completely. Of
course, this is the easy part, since now you must write the rest of the
hardware driver. :)
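
To make that concrete, here's a rough sketch of what the userspace side can look like on Linux. The sysfs path (/sys/class/mynic/rx_ring_addr) is made up - it stands in for whatever small driver you wrote - but pinning a buffer with mlock() and translating its virtual address to a physical one via /proc/self/pagemap are real mechanisms (root required):

    /* Sketch of the userspace side of a kernel-bypass setup. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 4096;

        /* 1. Allocate a buffer and pin it so it can't be swapped out. */
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }
        mlock(buf, len);
        memset(buf, 0, len);              /* fault the page in */

        /* 2. Translate its virtual address to a physical address by
         *    reading this page's entry in /proc/self/pagemap. */
        uint64_t vaddr = (uint64_t)buf, entry = 0;
        int pm = open("/proc/self/pagemap", O_RDONLY);
        if (pm < 0) { perror("open pagemap"); return 1; }
        pread(pm, &entry, sizeof(entry), (vaddr / 4096) * sizeof(entry));
        uint64_t paddr = (entry & ((1ULL << 55) - 1)) * 4096;

        /* 3. Hand the physical address to the (hypothetical) driver,
         *    which tells the NIC "DMA received packets here". */
        FILE *f = fopen("/sys/class/mynic/rx_ring_addr", "w");
        if (f) { fprintf(f, "%#llx\n", (unsigned long long)paddr); fclose(f); }

        /* From here on the process polls buf directly; the kernel is no
         * longer in the data path. */
        return 0;
    }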

------
yanslookup
Has anyone found any articles comparing and contrasting Cilium with other
popular k8s networking implementations, e.g. Flannel, Calico?

