
Rudder: An etcd backed overlay network for containers - pquerna
https://coreos.com/blog/introducing-rudder/
======
wmf
I'm glad to see this since an easy overlay for Docker is badly needed. But
ugh, userspace encapsulation. This would be a lot better if it used OVS +
VXLAN.

~~~
philips
The plan is to add more backends; we started with userspace encapsulation
because it works everywhere and is easy to set up and control.

Initially we wanted to use an existing in-kernel encapsulation format, like
simple IP-in-IP encapsulation. However, IP-in-IP doesn't work on AWS. Then we
looked at VXLAN, but it relies on multicast, which doesn't work on most cloud
networks either. Most recently we started looking at the VXLAN DOVE extensions
and are getting a prototype together for this.

tl;dr: the initial goal is to show that something generic is needed and can
work; next we will get something that is performant and/or has encryption.
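
For a flavor of what a userspace backend does, here's a minimal, self-contained
Go sketch of the idea: look up which host owns the destination subnet, then
re-wrap the packet in a UDP datagram addressed to that host. The route table,
port number, and addresses are all made up for illustration, and real code
would read inner packets from a TUN device rather than a UDP socket; this is
not Rudder's actual implementation.

```go
package main

import (
	"log"
	"net"
)

// Illustrative only: maps each container subnet to the public address
// of the host that owns it. In Rudder this mapping lives in etcd.
var routes = map[string]string{
	"10.1.5.0/24": "172.31.4.20:8285",
	"10.1.9.0/24": "172.31.7.91:8285",
}

// hostFor returns the encapsulation endpoint for a destination IP.
func hostFor(dst net.IP) (string, bool) {
	for cidr, host := range routes {
		_, subnet, err := net.ParseCIDR(cidr)
		if err == nil && subnet.Contains(dst) {
			return host, true
		}
	}
	return "", false
}

func main() {
	conn, err := net.ListenPacket("udp", ":8285")
	if err != nil {
		log.Fatal(err)
	}
	buf := make([]byte, 65535)
	for {
		n, _, err := conn.ReadFrom(buf)
		if err != nil {
			log.Fatal(err)
		}
		pkt := buf[:n]
		if len(pkt) < 20 {
			continue // too short to be an IPv4 packet
		}
		// Bytes 16-19 of an IPv4 header are the destination address.
		dst := net.IP(pkt[16:20])
		if host, ok := hostFor(dst); ok {
			if addr, err := net.ResolveUDPAddr("udp", host); err == nil {
				conn.WriteTo(pkt, addr) // forward to the owning host
			}
		}
	}
}
```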

~~~
jpgvm
The kernel VXLAN implementation actually supports manual endpoint
configuration via NETLINK (or newer versions of the iproute2 package).
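
As a sketch of what that manual configuration looks like from Go: below is
roughly the equivalent of `ip link add vxlan42 type vxlan id 42 nolearning`
followed by `bridge fdb append <mac> dev vxlan42 dst <remote>`, using the
vishvananda/netlink package. The VNI, MAC, and addresses are made up, and this
assumes a kernel recent enough to support unicast VXLAN forwarding entries.

```go
package main

import (
	"log"
	"net"
	"syscall"

	"github.com/vishvananda/netlink"
)

func main() {
	// Create a VXLAN device with learning disabled, so forwarding
	// entries are programmed explicitly rather than learned via multicast.
	vx := &netlink.Vxlan{
		LinkAttrs: netlink.LinkAttrs{Name: "vxlan42"},
		VxlanId:   42,   // VNI, arbitrary for this sketch
		Port:      4789, // IANA-assigned VXLAN port
		Learning:  false,
	}
	if err := netlink.LinkAdd(vx); err != nil {
		log.Fatal(err)
	}
	link, err := netlink.LinkByName("vxlan42")
	if err != nil {
		log.Fatal(err)
	}

	// Point a (made-up) remote container MAC at the public IP of the
	// host that owns it: the netlink equivalent of
	// `bridge fdb append <mac> dev vxlan42 dst 172.31.7.91`.
	mac, _ := net.ParseMAC("0e:2a:00:00:00:01")
	err = netlink.NeighAppend(&netlink.Neigh{
		LinkIndex:    link.Attrs().Index,
		Family:       syscall.AF_BRIDGE,
		State:        netlink.NUD_PERMANENT,
		Flags:        netlink.NTF_SELF,
		IP:           net.ParseIP("172.31.7.91"),
		HardwareAddr: mac,
	})
	if err != nil {
		log.Fatal(err)
	}
}
```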

------
derefr
Would this allow you to mesh together containers in separate datacenters? Or
mesh together, say, the containers on your home PC with containers in the
cloud? I'm guessing not.

What I'm really excited about are the possibilities of Docker containers with
publicly routable IPv6 addresses. It would move the world away from "one host:
many services on arbitrary ports" and back to the "one host: one service,
possibly speaking a few protocols, with ports used for OSI-layer-5/6 protocol
discovery" model of the 1970s (and eliminate the madness of SRV records,
besides).

Imagine if, say, bitcoind (which normally speaks "JSON-RPC" to clients -- a
specific layer-6 encoding over HTTP) sat on "bitcoind.host:80" instead of
"host:8332". Suddenly, it'd be immediately clear to protocol clients (e.g. web
browsers) which hosts they could or couldn't speak to, based on the port
alone! The whole redundancy between scheme and port in URLs could go away:
they'd be synonymous. And so on.

~~~
shykes
I totally agree that containers in general, and Docker in particular, could
play a big role in moving the status quo towards IPv6 and a more sane approach
to service-oriented networking. I would love to turn on IPv6 by default on
every Docker runtime everywhere. The question is, how do we deal with 1)
existing host systems, 2) existing networks, and 3) existing applications
which may not be IPv6-ready? We are already upgrading the guts of Docker for
more powerful networking and clustering in general, so if you give me a solid
answer we can get this out the door pretty quickly :)

------
Oculus
Only recently did I realize what a powerhouse the team at CoreOS is. They're
building some really cool shit. I can spend hours on their blog just
right-clicking and searching on Google. Definitely a good way to learn a ton
about distributed computing and that whole subject area.

------
MartinMond
This is interesting; it's pretty similar to
[http://tinc-vpn.org](http://tinc-vpn.org), which is a mesh VPN.

~~~
eyakubovich
Correct, tinc is another example of a mesh overlay network. However, tinc
requires configuration files to be created on each host and then distributed
to the others. If your machines are part of an etcd cluster, you can use
Rudder to create a mesh without needing to create and distribute
configuration files.
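
To make the contrast concrete: the only shared piece Rudder needs is a
one-time network config written into etcd, which each host then reads to lease
its own subnet. A minimal Go sketch against etcd's v2 keys HTTP API; the key
path, JSON shape, and endpoint address here are illustrative assumptions, so
check the Rudder docs rather than treating this as canonical.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/url"
	"strings"
)

func main() {
	// One-time setup: write the overlay's address range into etcd.
	// Every host running Rudder then leases its own subnet out of it.
	key := "http://127.0.0.1:4001/v2/keys/coreos.com/network/config"
	body := url.Values{"value": {`{"Network": "10.0.0.0/16"}`}}.Encode()

	req, err := http.NewRequest("PUT", key, strings.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("etcd responded:", resp.Status)
}
```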

------
contingencies
Sorry, what problem does this solve?

 _Things are not as easy on other cloud providers where a host cannot get an
entire subnet to itself. Rudder aims to solve this problem by creating an
overlay mesh network that provisions a subnet to each server._ ... is unclear.

What host for virtualized infrastructure needs an entire, fake, non-internet-
routable subnet that it cannot provision itself?

I believe there's a broken one-size-fits-all network architecture assumption
or provisioning methodology at the root of all this.

(Edit, as a reply to the child comment since I'm rate-limited: sounds like I
was right, and it's Docker's fault. How is this not better solved with the
standard approach of applying network namespaces and/or unique interfaces to
containers?)

~~~
wmf
It solves port conflicts caused by running multiple copies of the same service
on the same host. Kubernetes likes to have a few sidecar containers hanging
off each service instance (e.g. memcached might have an sshd sidecar that
wants to be on port 22, and nginx might want its own sshd sidecar, also on
port 22). If your host only has one IP address, then Docker has to do dynamic
port mapping, and your service discovery system has to track port numbers and
such.
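
A toy illustration of the difference: with one IP per service instance, both
sidecars can bind the same well-known port and nothing has to be remapped. The
addresses below are made up and would need to exist on the host.

```go
package main

import (
	"log"
	"net"
)

func main() {
	// With one IP per instance, each sidecar binds the same well-known
	// port on its own address; no remapping, no port tracking in the
	// service discovery system. The 10.x addresses are illustrative.
	for _, addr := range []string{"10.1.5.2:22", "10.1.6.2:22"} {
		ln, err := net.Listen("tcp", addr)
		if err != nil {
			log.Fatalf("bind %s: %v", addr, err)
		}
		log.Printf("sshd stand-in listening on %s", ln.Addr())
	}
	select {} // block forever; listeners stay open
}
```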

~~~
jbeda
sshd is kind of a poor example.

Kubernetes has an idea of a pod: a group of containers that share a netns and
have an IP.

Reasons you might want a pod:

* A thick client or client-side proxy that follows the ambassador pattern for
service discovery and access.

* A data-loader and data-server pair. The loader would grab data from some
persistent source and write it to disk or a shared memory segment. The
data-server would then use that data and serve it up. You could run the
data-loader at a lower QoS so it doesn't stall the data-server.

* Some sort of server and a log saver. The log saver could periodically batch
up and compress structured log data and upload it to a persistent store (such
as BigQuery in GCP). You want to build/configure/restart/upgrade the log saver
separately from the server. You'd also run the log saver at a lower QoS.

Inside of Google we have all sorts of examples where we have sets of
containers/tasks/processes that are co-scheduled onto a machine and work
together.

------
vquemener
FYI, there's already an open-source project going by the name of Rudder:
[http://en.wikipedia.org/wiki/Rudder_(software)](http://en.wikipedia.org/wiki/Rudder_\(software\))

------
mrmondo
"... it has almost no affect on the bandwidth." \- looking at those numbers
it's not the case at all, those numbers are really low to start with (as AWS
isn't exactly the fastest) but obviously this would be much more noticeable at
the higher end of the scale when we're talking about 100-200MB/s transfer
rates, not to mention nearly doubling the latency!

------
kapilvt
also works great with lxc, i pushed a juju charm which automates the config
for lxc
[http://bazaar.launchpad.net/~hazmat/charms/trusty/rudder/tru...](http://bazaar.launchpad.net/~hazmat/charms/trusty/rudder/trunk/view/head:/readme.txt)

