
Weave is kinda slow - lclarkmichalek
http://www.generictestdomain.net/docker/weave/networking/stupidity/2015/04/05/weave-is-kinda-slow/
======
mdekkers
"Thankfully, the fact that these problems were solved decades ago has not
stopped people from coming up with their own solutions, and we now all get to
witness the resulting disasters."

Spot on

~~~
Confusion
So what are the decades old solutions to these problems?

~~~
otterley
Thankfully, they are all discussed in the linked article.

~~~
Confusion
I read the article: if they are, that is quite unclear to me.

The only possible answers discussed are 'some sort of IP encapsulation', which
is vague, and GRE, which is just a single solution. He doesn't seem to
disapprove of VXLAN, so something was probably missing from 'IP encapsulation
and GRE'. Was 'all problems solved decades ago' merely hyperbole, or is there
actually something to it?

~~~
otterley
Is GRE inadequate? A single solution that solves most cases, is codified in an
RFC, and has mature, reliable, performant implementations sounds like a winner
to me.

~~~
Confusion
I don't know if GRE is inadequate: if I knew that, I wouldn't need to ask
these questions. The author doesn't disapprove of VXLAN, so there must be
something insufficient in GRE.

Look, I'm just trying to understand the playing field here, for when the
moment comes that I need that knowledge. I don't currently have a need for
funky networking between Docker containers, but I do have Docker containers
and can imagine a future need for funky networking. The article slams new
technologies, but doesn't clearly explain the alternatives, which is what I'm
interested in, so I'm asking follow-up questions. There is nothing rhetorical
here.

~~~
theptip
L3 routing protocols like BGP are one solution to the network connectivity
problem which have been around for decades. BGP powers the internet, so we
know it can scale to millions of endpoints.
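
To make the L3 approach concrete, here is a toy longest-prefix-match lookup of
the kind a BGP-populated routing table performs. Pure illustration: the
prefixes and hostnames are made up, and in a real deployment a daemon such as
BIRD or Quagga would learn these routes from peer advertisements.

```python
import ipaddress

# Toy routing table: prefix -> next hop. BGP's job is distributing these
# entries between hosts; forwarding then just picks the best match.
routes = {
    ipaddress.ip_network("10.1.0.0/16"): "host-a",
    ipaddress.ip_network("10.1.2.0/24"): "host-b",  # a more specific subnet
}

def next_hop(dst):
    """Forwarding decision: the most specific matching prefix wins."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    return routes[max(matches, key=lambda n: n.prefixlen)]
```

With this table, `next_hop("10.1.2.5")` resolves to `host-b` (the /24 beats
the /16), while the rest of `10.1.0.0/16` goes to `host-a`.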

------
bascule
Weave has other issues... like they homebrewed their own ECDHE-PSK-based
transport encryption protocol on top of NaCl. Homebrewing your own crypto,
especially transport encryption, which has to solve problems like key
exchange, replay attacks, etc., is generally the wrong answer.

Also, even if they were using a standard transport encryption like SSL/TLS or
IPSEC, PSKs are generally frowned upon for anything other than point-to-point
connections.

They describe the PSK as a "password", so what they really want is a PAKE
algorithm. However, they do not use a password hashing function, so weak
"passwords" are susceptible to brute-force attacks.

Anyway, all these things are why you should just stick to standard protocols
like SSL/TLS or IPSEC.

~~~
lclarkmichalek
Yeah, I feel a little guilty after writing this article, as the speed of the
implementation is simply a detail. However, I feel no such guilt in condemning
Weave's security. This is a conversation I had with @weave a while ago about
their encryption

[https://twitter.com/lclarkmichalek/status/544882194456776705](https://twitter.com/lclarkmichalek/status/544882194456776705)

~~~
msutherl
This "our project is open source, feel free to submit a patch" dismissal is so
passive aggressive. If you mean "fuck you," then just say "fuck you."

That said, you shouldn't be saying "fuck you" in the first place: it's rude,
it contributes to bad vibes in the OSS community, and it hurts you more than
anybody. Try instead something like: "I'm having trouble understanding your
argument, do you mind explaining in more depth in an email?" Even if you're
dealing with a troll, this is still the best strategy.

~~~
juliangregorian
I find your attitude the ruder of the two. Users of paid products have the
right to complain about stuff like that; it's literally what they paid for.
Users of open source projects have no such right: if you know what to do, why
not make yourself useful instead of bitching out someone who's volunteered
their free time to make your life easier? I have very little patience with
armchair pundits myself. If you submit a pull request, we can happily have a
conversation, but everybody's a critic and some of us are trying to get things
done.

~~~
msutherl
"why not make yourself useful instead of bitching out someone who's
volunteered their free time to make your life easier"

Because as soon as you've found issues with more than, say, 3 things, you no
longer have enough of your own free time to volunteer to solve the problem in
a better way, let alone whatever you were already working on. Do you honestly
believe that criticism has no value?

~~~
weavenetwork
Complaining on twitter is not the same as finding an issue! Criticism has
value, but not all commentary deserves equal weight or time before it is
reasonable to request reciprocal effort.

------
weavenetwork
hello, weave here. a few very quick comments!

weave has lots of very happy users who find that weave is plenty fast enough
for their purposes, see eg [http://blog.weave.works/2015/02/24/get-your-kicks-
on-cloud66...](http://blog.weave.works/2015/02/24/get-your-kicks-on-cloud66/)

the strong points of weave network, as it is right now, are ease of use (not
to be sniffed at) and enormous flexibility. it is really quite easy to create
an application involving containers that runs anywhere and does not commit
you to specific architectural choices...

typically though, one weave network might be used by one app, or just a few.
but you might run a _lot_ of weave networks

weave works very nicely with kubernetes - later I shall dig out a few links
for this

in our own tests, throughput varies by payload size; we tend to think
weave-as-is is best compared with using, for example, amazon cloud networking
directly

for users with higher perf needs, we have a fast data path in the works, that
uses the same data path that ovs implementations use... the hard problem to
solve here is making that stuff incredibly easy and robust w.r.t deployment
choices -- see above :-)

~~~
hueving
>weave has lots of very happy users who find that weave is plenty fast enough
for their purposes

That just means that your users aren't using it for large production workloads
or they are just wasting excessive resources to make up for it.

~~~
omginternets
Couldn't the same be said for [insert dynamic language here]?

Fast enough is fast enough, as they say...

~~~
weavenetwork
Correct! Once upon a time people said Amazon cloud was too slow. Then they
said it wasn't suitable for large workloads. Then they said it did not make
money ... etc etc. I'm not saying we are like Amazon, I'm just saying that
making new stuff excellent _in all dimensions at once_ is hard ;-)

------
justinsb
User-space overlays are slow, Weave more so than flannel. The really
interesting data-point for me is that flannel with VXLAN has negligible
overhead.
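
Part of why the overhead is negligible: the encapsulation itself is tiny. A
VXLAN header (RFC 7348) is 8 bytes on top of the outer Ethernet/IP/UDP headers
(roughly 50 bytes in total), and the kernel does the wrapping on the hot path.
A sketch of the header layout, purely for illustration:

```python
import struct

def vxlan_header(vni: int) -> bytes:
    # RFC 7348 layout: 8 flag bits (0x08 = "VNI present"), 24 reserved bits,
    # a 24-bit VXLAN Network Identifier, then 8 more reserved bits.
    return struct.pack("!II", 0x08 << 24, (vni & 0xFFFFFF) << 8)

hdr = vxlan_header(42)  # 8 bytes, prepended to the inner Ethernet frame
```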

~~~
q3k
Actually, userspace packet switching can be fast (10GbE linespeed-fast) thanks
to approaches like DPDK [1], where a userspace process has zero-copy, direct
access to NIC ring buffers.

1 - [http://dpdk.org/](http://dpdk.org/)

~~~
lclarkmichalek
The problem with DPDK is (as I understand it) that it doesn't handle
multiplexing the connection between multiple cores/processes/containers (i.e.
lack of polling makes scheduling hard/impossible). The nice thing about using
the Linux kernel as your data plane is that you can still have all your bog
standard routing, in addition to this fun VXLAN/GRE/etc stuff. That said, I
haven't ever implemented anything using DPDK, so I may be talking out my arse.

You are correct however that userspace networking is awesome, and I look
forward to it becoming more and more prevalent in applications that can
benefit from it.

~~~
swansonc
It depends on how you use DPDK. If you use it from the container directly to
the NIC, you certainly do lose all of the kernel capabilities. However, we
believe (but have not tested) that you can use a DPDK virtual interface in the
container/VM (memnic or virtio) that connects to the DPDK driver in the
kernel, so the path from the container/VM is zero-copy. The kernel then does
its processing, and then another DPDK path could (potentially) be used to
zero-copy the traffic to the NIC (really uncertain about that last stage).
Basically, you are just using DPDK to save on the copy cost.

This is all academic until tested, btw. As of yet, we (on Calico) haven't had
anyone stand up and say that they need more performance than what the native
data-path we use today is capable of delivering.

~~~
lclarkmichalek
Now that sounds interesting. I'd love to read about that if you ever do move
the idea from paper to production :)

~~~
swansonc
We'll let you know.

------
otterley
I'm not sure I even understand the problem that weave and Docker bridging/NAT
solves for real world cases. IP allocation for containers isn't a problem for
most networks, is it? Certainly AWS can give you up to 8 IPs per instance, and
every datacenter I've ever worked in can give you even more, if you ask. All
you have to do is spin up additional virtual NICs with virtual MACs and use
DHCP to assign IP addresses to them.

Or is there something fundamental that I don't understand? Please edify me.

~~~
wmf
Docker was designed to be very easy to get started on your laptop with one IP
address and it looks like some people are getting stuck in that model.

I agree that if you are running on AWS VPC or some other overlay you should
just use VPC for container networking. You shouldn't overlay your overlay. But
there isn't any tooling that I know of to do that.

~~~
otterley
Everyone I know who runs Docker runs it in a virtual machine manager that has
a built-in DHCP server and provides multiple virtual interfaces to the virtual
machine. Certainly both VirtualBox and VMware do.

Even if one runs a Docker bridge in his development VM, that doesn't mean one
must do so in production as well.

Are we in this mess because production engineers don't understand networking?

~~~
bmurphy1976
We're in this mess because cloud providers don't give you all the
functionality you need to solve things properly.

It's only a matter of time before they do, until then...

------
d136o
what I appreciate about weave is that it solved the cross-host container
networking problem easily (it's very very easy to use) and _now_, i.e. no
waiting for promises of future solutions or fooling around with more
complicated setups.

Here's where I got burned: I set up an Elasticsearch cluster using containers
and weave and life was great, but it then grew to need another node. Upon
setting up the new host with docker and weave, it turned out the new node
couldn't talk to the old nodes because they were using different versions of
the weave protocol. That was disappointing and a pain; I stopped experimenting
with weave at that point.

~~~
bboreham
Hi, I work on Weave; it may well have been one of my commits that broke the
protocol compatibility for you. We've changed things over time to improve
performance and resilience.

It should be fairly straightforward to deploy the same version on every host,
but maybe that wasn't explained well enough, or didn't work for you. We'd
welcome more feedback.

Lastly, I appreciate the positive comments. "Very very easy to use" is exactly
what we aimed for.

~~~
garthk
Thanks for using your own account.

------
SEJeff
flannel appears to be what kubernetes is using as well, and I know it is what
Red Hat is using for their OpenShift platform on top of k8s. It seems like the
obvious path forward.

~~~
leg100
Flannel relies on etcd. I had stability issues with v1 of etcd which meant
flannel couldn't route packets. Since then, the idea of using an immature SDN
atop an immature distributed key value store fills me with dread.

~~~
nine_k
What do you use instead?

~~~
leg100
I gave up on SDNs and fell back to doing what anyone does without an SDN:
published ports to the host interface and advertised the endpoint
<host_ip>:<container_port> to etcd. Note this wasn't with kubernetes but with
a similar system. Still reliant on etcd, which I wasn't happy with, but one
less cog to go wrong.
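
For illustration, the advertisement step can be as simple as a PUT to etcd's
v2 HTTP API. The key layout, service name, and addresses below are made up,
not from my actual setup:

```python
import urllib.parse
import urllib.request

def endpoint(host_ip: str, container_port: int) -> str:
    # The value consumers read back: <host_ip>:<container_port>
    return f"{host_ip}:{container_port}"

def advertise(etcd_base: str, service: str, host_ip: str, port: int):
    # PUT /v2/keys/... sets the key; a TTL could be added so that dead
    # endpoints expire. Actually running this requires a reachable etcd.
    url = f"{etcd_base}/v2/keys/services/{service}"
    data = urllib.parse.urlencode({"value": endpoint(host_ip, port)}).encode()
    req = urllib.request.Request(url, data=data, method="PUT")
    return urllib.request.urlopen(req)

# e.g. advertise("http://127.0.0.1:2379", "search", "10.0.0.5", 9200)
```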

------
justinholmes
Would have been interesting to compare
[http://www.projectcalico.org/](http://www.projectcalico.org/).

~~~
grkvlt
Agree. I may try some benchmarks, as I have recently added support for Calico
(as well as Weave, which was my original choice due to its simplicity) as
another swappable SDN provider for Clocker [1].

[1] [http://clocker.io/](http://clocker.io/)

------
api
Networking is a disaster. Kernel-level networking is nice, but that requires
access to the kernel, which you can't provide in a container. Doing so means
containers no longer contain.

IPV6 was supposed to solve a lot of this by having an address space so huge
you could easily give every vm host a few billion IPs. But nobody uses it.
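
The numbers back that up: a single routine /64 delegation already contains
2^64 addresses. A quick check with Python's ipaddress module (the prefix used
is the IPv6 documentation range, chosen just for illustration):

```python
import ipaddress

# One standard /64 IPv6 subnet -- the size routinely delegated to a single
# host or LAN -- holds 2**64 addresses, about 18 quintillion.
subnet = ipaddress.ip_network("2001:db8::/64")
addresses_per_host = subnet.num_addresses
```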

~~~
thinkingkong
Networking isn't a disaster. It's just that the current container ecosystem on
Linux hasn't yet resulted in any real domain-specific improvements. For
"default" systems, networking is pretty damned reliable and performant.

A lot of the problems with regards to containment have already been solved in
different systems. I believe Solaris or OpenSolaris had Crossbow [1]. Any
system that aims to provide connectivity needs to do the least amount of
encapsulation possible and probably be a kernel module.

1.
[http://en.m.wikipedia.org/wiki/OpenSolaris_Network_Virtualiz...](http://en.m.wikipedia.org/wiki/OpenSolaris_Network_Virtualization_and_Resource_Control)

~~~
adamc
"the least amount of encapsulation possible" doesn't sound like much of a
solution if what you are looking for is networking with strong encapsulation.

~~~
hueving
>strong encapsulation

What does that term even mean? Are you talking about encryption? If not, there
is no 'strength' to encapsulation. Something is either encapsulated
efficiently or it's not.

~~~
adamc
That's silly. There are definitely degrees of encapsulation.

------
tobbyb
Container networking is not special or different from VM networking. There are
tons of proven and widely used technologies available in the open source
ecosystem.

We have been building out a series of networking tutorials at flockport, from
the basics (static, public, and private IPs, NAT, bridging, etc.) to
multi-host container networking with GRE, L2TP, VXLAN, and IPSEC, focused on
LXC, but these will work with VMs and containers in general.

They don't need any special tools, just IP tools and the kernel and deliver
performance and security. A lot of the Docker centric networking projects use
these under the hood but they are easy enough to use on their own.

[http://www.flockport.com/news](http://www.flockport.com/news)

------
tehbeard
Out of interest, how does rancher's network offering compare to weave/flannel?

------
aristotle
There are ways to make networking easy and performant at the same time without
resorting to these user space hacks. Wait for much better products to show up
(including one from us).

------
hobarrera
Looks like flannel only supports IPv4, while weave works on IPv4 or IPv6.
That's a huge difference IMHO, since only one of them will work in any
scenario.

~~~
ploxiln
If you're using private address space, you don't need IPv6.

------
ceequof
In the terminal snippets, latency is specified in "us", which I'm guessing is
µs, microseconds.

In the table, those latency numbers are specified in "ms". Also microseconds?
Couldn't possibly be milliseconds, for two VMs that should only be a couple
dozen metres away from each other, right?

~~~
lclarkmichalek
Wow, huge mea culpa on that one. I would assume that qperf is using us to mean
microseconds, and judging by the source line

    
    
        char *tab[] = { "ns", "us", "ms", "sec" };
    

it would certainly seem that way. I'll update the post, thanks.

Now I'll just be waiting for someone to complain about inconsistent precision.
I am not a good scientist...

------
erni1234
Yes

------
cbsmith
In other news, we believe that there isn't much oxygen on the moon...

