
Comparison of Networking Solutions for Kubernetes - gliush
http://machinezone.github.io/research/networking-solutions-for-kubernetes/
======
kelseyhightower
I've been a big fan of ipvlan since it was added to the kernel[1], mainly for
the reasons you've outlined and the overall reduction in complexity compared
to bridges and overlay networks. It should also be noted that ipvlan offers a
bit of an improvement over macvlan, because ipvlan also works in places where
L2 is not an option (i.e. cloud providers).
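
For example, here's a minimal sketch of attaching a network namespace (a
stand-in for a container) to an ipvlan L3 interface; the parent interface
eth0 and the addresses are placeholders, not from the post:

    # create an ipvlan slave of eth0 in L3 mode
    ip link add link eth0 ipvl0 type ipvlan mode l3
    # move it into a namespace, as a container runtime would
    ip netns add ns1
    ip link set ipvl0 netns ns1
    # assign an address and send all traffic out the slave
    ip netns exec ns1 ip addr add 10.0.0.10/24 dev ipvl0
    ip netns exec ns1 ip link set ipvl0 up
    ip netns exec ns1 ip route add default dev ipvl0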

In the post you mentioned one of the drawbacks to adopting ipvlan was the lack
of tooling to manage it. To address this issue, and a few others[2],
Kubernetes has recently adopted the Container Network Interface (CNI)[3]
standard as the solution for managing network plugins in Kubernetes. CNI ships
with a few plugins including ipvlan and, maybe more importantly, the ability
to allocate and manage IP addresses[4]. Over time CNI should give Kubernetes
the flexibility to work well with multiple container runtimes including Docker
and rkt, while allowing third party solutions to fill in the gaps around
networking.
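
As a sketch of what that looks like in practice, a CNI network config for the
ipvlan plugin with host-local IP allocation might be dropped into the plugin
config directory like this (the file path, master interface, and subnet are
assumptions for illustration):

    cat > /etc/cni/net.d/10-ipvlan.conf <<'EOF'
    {
        "name": "mynet",
        "type": "ipvlan",
        "master": "eth0",
        "ipam": {
            "type": "host-local",
            "subnet": "10.1.2.0/24"
        }
    }
    EOF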

[1] [https://www.kernel.org/doc/Documentation/networking/ipvlan.txt](https://www.kernel.org/doc/Documentation/networking/ipvlan.txt)

[2] [http://blog.kubernetes.io/2016/01/why-Kubernetes-doesnt-use-libnetwork.html](http://blog.kubernetes.io/2016/01/why-Kubernetes-doesnt-use-libnetwork.html)

[3] [https://github.com/appc/cni/blob/master/SPEC.md](https://github.com/appc/cni/blob/master/SPEC.md)

[4] [https://github.com/appc/cni/blob/master/SPEC.md#ip-allocation](https://github.com/appc/cni/blob/master/SPEC.md#ip-allocation)

~~~
lobster_johnson
ipvlan sounds great, but I'm not a networking guru, and I'm missing a lot of
pieces. Do you know of any documentation that explains it at both a high and
low level, and shows the steps needed to cluster a bunch of nodes together
(say, on a cloud provider where you only have public IPs) in a single virtual
LAN?

------
chris_marino
Great post!

The results of these benchmarks do not surprise me at all. To me, they all
fall into the category of 'less (overhead) is more (performance)', with VXLAN
encapsulation being the obvious example of the greatest overhead.

I think it's also worth mentioning that k8s networking is being enhanced in
v1.2 to support isolation and multi-tenancy through ThirdParty resources
(back-end network solutions). The alternatives included in the benchmarks
aren't going to be able to support these kinds of network policies as-is.

And, unfortunately, things get more complicated when you want to provide
more than simple reachability (which is all that k8s asks for today). The
tradeoff is being able to control the packets with the lowest overhead
possible. VXLANs will give you isolation, but at the cost of encapsulation.
Stacking bridges and tunnels and distributing VNIDs/routes not only introduces
more latency, but becomes another multi-host coordination problem (matching
tunnel IDs, etc.).
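
To make the coordination problem concrete, here's a rough sketch of wiring a
unicast VXLAN tunnel by hand; every host has to agree on the VNI, the UDP
port, and each peer's address (all values here are made up):

    # host A (host B mirrors this with the addresses swapped)
    ip link add vxlan42 type vxlan id 42 dstport 4789 \
        local 192.0.2.1 dev eth0
    # point unknown-destination traffic at the peer VTEP
    bridge fdb append 00:00:00:00:00:00 dev vxlan42 dst 192.0.2.2
    ip link set vxlan42 up

And every packet carries roughly 50 extra bytes of headers (outer Ethernet +
IP + UDP + VXLAN), which is where the encapsulation cost in the benchmarks
comes from.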

We're working on a new way to build cloud native networks that avoids the
encap, but still lets you control all the packets.

You can learn more at [http://romana.io](http://romana.io) if you're
interested.

------
tobad357
We have been using the Calico CNI plugin with Kubernetes with quite some
success. It's a solution that assigns IPs and then sets up BGP routing between
your Docker/Kubernetes nodes. It makes it very easy to trace what's happening.
Not as easy as a flat LAN, but still pretty easy.

Run netstat -rn and you can see where the traffic is going to or coming from.
The added benefit is that you can then BGP-peer with other clusters and get
routing across them.
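
The routing table ends up looking something like this (a hypothetical
example; the pod subnets and node IPs are made up):

    $ netstat -rn
    Destination     Gateway        Genmask          Flags  Iface
    10.233.1.0      172.16.0.12    255.255.255.0    UG     eth0
    10.233.2.0      172.16.0.13    255.255.255.0    UG     eth0

Each remote pod subnet shows up as a plain route via the node that hosts it,
which is why tracing is straightforward.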

[http://www.projectcalico.org](http://www.projectcalico.org)

~~~
alvelcom
Calico looks good. As far as we understand, Calico works the same way
flannel's host-gw backend does, i.e. it creates a route for every subnet in
the cluster. Of course, Calico is a more advanced technology than
flannel/host-gw, but the reason we haven't tested it is the following: they
differ in the control path but are the same in the data path. Our original
intention was to test the underlying Linux-kernel mechanisms and understand
how much we lose in terms of latency and throughput.

For small configurations flannel/host-gw is OK, I guess, but if one has
several Kubernetes clusters, flannel/host-gw becomes harder to maintain.
That's where Calico should be useful.

~~~
tobad357
Actually, my understanding is a bit different. Flannel (with its VXLAN
backend) acts as an overlay network (packet inside of packet), while Calico
is a pure Layer 3 routing solution. That means some performance gains but
less protocol support vs. VXLAN. I think it would be quite interesting to test.

------
moigagoo
Hi!

I'm one of the authors of this research. Feel free to ask any questions, I'll
be happy to tell you more.

~~~
justinsb
I'm not sure if you're focusing on bare metal, but Kubernetes can assign IPs
itself, and does so out of the box on AWS, GCE, GKE, Azure & OpenStack (and
probably others as well). On AWS it uses the equivalent of flannel's aws-vpc
by default (and one of my personal goals is to make it easy to install one of
the networking options that goes to bigger scale, so I greatly appreciate your
guide here!).

So you don't need a third-party networking solution, but Kubernetes is open,
so a number have been nicely integrated. flannel is the most popular choice,
but there are many others (I hear about Calico a lot).

~~~
alvelcom
Yes, you're right. We'll edit the article's introduction.

Our original intention in doing this research was to answer the question "how
much do we lose in terms of latency and throughput?" So, as you've said,
flannel's aws-vpc is equivalent to the out-of-the-box k8s/AWS solution, and
flannel's host-gw is equivalent to Calico. Since they use the same "backend",
in the benchmarks we focused on flannel, because it's easier to switch flannel
from one configuration to another (basically just put a new config in via
etcdctl and restart the daemons).
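
For instance, switching flannel's backend from vxlan to host-gw is roughly
this (a sketch; the network CIDR is made up, and /coreos.com/network/config
is flannel's default etcd key):

    etcdctl set /coreos.com/network/config \
        '{"Network": "10.1.0.0/16", "Backend": {"Type": "host-gw"}}'
    # then restart flanneld on every node
    systemctl restart flanneld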

However, thanks for pointing it out. In the article we should state our
motivation more clearly and add references to other solutions, such as Calico.

------
jpgvm
I understand you are only testing for performance, but it's worth mentioning
that VXLAN does encapsulation and allows for a wider set of use cases than
the solutions that require cluster nodes to share an L2 segment.

Other than that, great benchmarking! ipvlan is pretty neat; it's the natural
evolution of macvlan, and hopefully it catches on quickly.

~~~
virtuallynathan
Also, VXLAN offload is available in almost all modern NICs.

~~~
alvelcom
Does this mean that the NIC computes checksums inside the VXLAN payload? As
far as I understand, the major additional cost of VXLAN comes from an
additional pass through the whole network stack, which includes iptables and
routing.

~~~
virtuallynathan
I think it both adds and strips the VXLAN header and then does the checksums.
Probably depends on the vendor.
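
For what it's worth, you can check what a given NIC advertises with ethtool
(eth0 here is a placeholder):

    # the UDP-tunnel segmentation/checksum features cover VXLAN
    ethtool -k eth0 | grep tnl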

------
doublerebel
I'd like to see Joyent's "Fabric" VLANs in this list. They have been very easy
to administer and seem quite performant.

~~~
alvelcom
Yeah, Joyent's VLANs look very interesting, but as far as I understand I can't
deploy them on Amazon EC2, since I'd need special hardware for that. Am I
right?

------
n00b101
What other potential container overheads are there, besides network
containerization?

~~~
gliush
Here's a good IBM research paper about it:
[http://domino.research.ibm.com/library/cyberdig.nsf/papers/0...](http://domino.research.ibm.com/library/cyberdig.nsf/papers/0929052195DD819C85257D2300681E7B/$File/rc25482.pdf)

