
LXC container networking deep dive - tobbyb
http://www.flockport.com/flockport-labs-extending-layer-2-across-container-hosts/
======
contingencies
TL;DR: the author quickly surveys options for layer 2 sharing between
containers on physically disparate hosts.

That said, note that this is a very bad idea in most cases because you
increase network complexity, expose additional code (both bad for security)
and add startup latency. The only normal justification would be an
application that clearly requires layer 2 connectivity and that you are
prepared to test, maintain and debug across flaky, somehow-encapsulated WAN
infrastructure.

As the maintainer of _lxc-gentoo_ and an early LXC adopter (5 years), I
applaud the author for bothering to look at things instead of jumping on one
of the PaaS bandwagons. However, one should be careful of designing spherical
cows.

Personally I recommend the following for container networking:

(1) Use _veth_ device pairs. These are host-to-container and provide you with
per-container isolated networking and accounting very cheaply.
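For the curious, this is roughly what LXC does under the hood per container, expressed with iproute2. A dry-run sketch: the commands are printed rather than executed (running them for real needs root), and all interface/namespace names are illustrative:

```shell
#!/bin/sh
# Sketch: create a veth pair and move one end into a network
# namespace, roughly what LXC does for each container's veth setup.
# Dry run: commands are printed, not executed; names illustrative.
CMDS=""
run() { CMDS="${CMDS}$*
"; echo "$@"; }

run ip link add veth-host0 type veth peer name veth-guest0
run ip netns add guest0                    # stands in for the container's netns
run ip link set veth-guest0 netns guest0
run ip link set veth-host0 up
run ip netns exec guest0 ip link set veth-guest0 up
```

Drop the echo wrapper in run() to apply the commands for real.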

(2) Preconfigure your interfaces with static IPs from the host side by
generating an appropriate _lxc.container.conf_ and a network up script to
configure the host-side interface. This removes interface-configuration
latency due to layer 2 issues such as spanning tree convergence, and layer 3
issues such as DHCP. Note that you may have to set _resolv.conf_ using a
similar automated process.
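A minimal generator for such a config fragment might look like this. The keys are LXC 1.x-style _lxc.network.*_ settings; the guest name, addresses and script path are illustrative, not anyone's actual layout:

```shell
#!/bin/sh
# Sketch: emit a static-IP veth config fragment for one container,
# so the guest waits on neither DHCP nor STP at startup.
# NAME, IP, GW and the script path are illustrative.
NAME=guest0
IP=10.0.3.10/24
GW=10.0.3.1

CONF=$(cat <<EOF
lxc.network.type = veth
lxc.network.veth.pair = veth-${NAME}
lxc.network.ipv4 = ${IP}
lxc.network.ipv4.gateway = ${GW}
lxc.network.flags = up
lxc.network.script.up = /etc/lxc/${NAME}-net-up.sh
EOF
)
echo "$CONF"
```

The referenced up script would configure the host-side _veth-guest0_ interface, and a _resolv.conf_ could be templated for the guest in the same pass.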

(3) Provide outbound (internet) or inter-container connectivity to the various
container _veth_ interfaces on the host via explicit rules on a per-guest
_iptables_ chain that can be conveniently and precisely destroyed with the
termination of each guest system. Treat in-kernel bridges as a hack with STP
hassles, like they are.
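The per-guest chain idea sketched out, again as a dry run (commands printed, not executed; chain and interface names illustrative):

```shell
#!/bin/sh
# Sketch: a per-guest iptables chain that can be flushed and
# deleted cleanly when the guest stops. Dry run: commands are
# printed, not executed (applying them needs root).
NAME=guest0
CHAIN="GUEST-${NAME}"
CMDS=""
ipt() { CMDS="${CMDS}iptables $*
"; echo iptables "$@"; }

# guest start: create the chain, jump to it from FORWARD, add rules
ipt -N "$CHAIN"
ipt -A FORWARD -i "veth-${NAME}" -j "$CHAIN"
ipt -A "$CHAIN" -o eth0 -j ACCEPT            # outbound internet
ipt -A "$CHAIN" -o "veth-other" -j ACCEPT    # inter-container peer (hypothetical)
ipt -A "$CHAIN" -j DROP

# guest stop: remove the jump, then flush and delete the chain
ipt -D FORWARD -i "veth-${NAME}" -j "$CHAIN"
ipt -F "$CHAIN"
ipt -X "$CHAIN"
```

Since every rule for the guest lives in its own chain, teardown is just the last three commands: nothing to grep out of shared chains.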

~~~
shykes
veth pairs are a reasonable default for simple single-host connectivity (it's
the default for Docker, for example). But for high-throughput production
deployments where latency matters, you may need to look at other options. This
is what we most commonly see when Docker is deployed in production (note that
the lesson applies to any orchestration system based on linux containers, not
just Docker):

\- Sometimes people dedicate a whole physical interface for the sole use of a
container, by moving it into the namespace. The downside is that the interface
is completely unavailable to other namespaces, including the host.

\- macvlan is pretty popular; it allows multiple containers to share access
to the same physical interface, without conflicts and without the performance
overhead of a veth+bridge combination.

\- you can also connect your container to an overlay network using a variety
of tunneling systems. vxlan is the _flavor du jour_ but there is a flurry of
competing standards and implementations in that area.
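For reference, both options come down to a couple of iproute2 commands. A dry-run sketch (commands printed, not executed; eth0, the VXLAN ID and the multicast group are illustrative):

```shell
#!/bin/sh
# Sketch of the macvlan and vxlan options. Dry run: commands are
# printed, not executed (applying them needs root); names and
# addresses are illustrative.
CMDS=""
run() { CMDS="${CMDS}$*
"; echo "$@"; }

# macvlan: give a container its own MAC address on the physical NIC
run ip link add mv0 link eth0 type macvlan mode bridge
run ip link set mv0 netns guest0   # guest0 = the container's netns

# vxlan: an overlay segment tunnelled over UDP between hosts
run ip link add vx0 type vxlan id 42 dev eth0 dstport 4789 \
    group 239.1.1.1
run ip link set vx0 up
```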

By far the coolest part about container networking on Linux is that _it's
just regular linux networking_. The entire stack, and all the tooling that
comes with it, is available to containers. You just need to glue it together.
And of course these options are not mutually exclusive: a namespace can
contain as many network interfaces as you want.

Disclaimer: I am not a networking expert, but a lot of networking experts want
their favorite toys natively integrated into Docker, so they take the time to
explain to me how they work.

~~~
contingencies
_It is unsurprising that it is easy to beat a general system by specializing
it._ \- Schwarzkopf et al, 'The seven deadly sins of cloud computing research'
(2014)

 _A general-purpose product is harder to design well than a special-purpose
one._ \- Frederick P. Brooks, 'The Design of Design: Essays from a Computer
Scientist' (2010)

 _10th Fundamental Truth of Networking: One size never fits all._ \- RFC1925
(1996)

[https://github.com/globalcitizen/taoup](https://github.com/globalcitizen/taoup)

------
justincormack
There is an excellent post about why you don't want layer 2 tunnels here:
[http://markburgess.org/blog_broadcast.html](http://markburgess.org/blog_broadcast.html)

~~~
wmf
Did he end up suggesting... NAT?

~~~
justincormack
No. IPv6, if you don't have enough IPv4 addresses.

------
exabrial
Ok this is a great article... but what is this Flockport stuff? I'm kinda
shying away from Docker since their move away from LXC... This looks really
neat!

~~~
tobbyb
Hi, author here, thanks, glad you liked it! Flockport provides ready-to-deploy
containers of popular apps based on LXC. We like the simplicity of LXC
containers and find them easier to use.

The comments are spot on; most networking gurus appear to be strictly against
extending layer 2, as we note in the post, but we wanted to put the options
out there. You definitely need to address the latency and MTU issues that
will crop up across remote hosts, but with standards like VXLAN built on
extending layer 2, perhaps we will soon start seeing interesting workarounds.

