
How to Achieve 20Gb and 30Gb Bandwidth through Network Bonding - ajpelley
http://45drives.blogspot.com/2015/07/how-to-achieve-20gb-and-30gb-bandwidth.html
======
suprjami
This is an absolutely terrible idea.

Round-robin delivers packets out of each slave in sequence, but nothing in a
switch guarantees that packets will be delivered in the order they were
received.
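
A toy sketch of that failure mode (Python, made-up per-link delays, nothing
like the kernel's actual code): alternate segments over two slaves whose
paths differ slightly in latency, and the arrival order no longer matches the
send order.

    # Toy model only: round-robin (mode 0) spraying over two links whose
    # one-way delays differ. Hypothetical numbers, not measured data.
    links_ms = [1.00, 1.30]                  # per-link one-way delay
    arrivals = []
    for seq in range(8):
        slave = seq % len(links_ms)          # balance-rr: alternate slaves
        send_time = seq * 0.10               # back-to-back segments
        arrivals.append((send_time + links_ms[slave], seq, slave))

    for t, seq, slave in sorted(arrivals):
        print(f"t={t:.2f}ms  seq={seq}  via slave {slave}")
    # Prints seq order 0, 2, 1, 4, 3, ... once the slower link lags behind.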

These TCP sessions will suffer heavy out-of-order delivery, which increases
TCP CPU and memory usage. It will get worse as more traffic is added to the
switch.

This also makes your sessions very hard to troubleshoot, because they'll be
full of duplicate ACKs and out-of-order packets, so it's hard to spot a
genuine fault.

Also, round-robin is a transmit-only speed increase; the traffic coming back
through an unmanaged switch will only come in on one slave.

This is a messy use of bonding to "brute force" a speed increase and I
wouldn't recommend anyone set it up this way.

I don't know a way to reliably load balance a single TCP session for faster
bandwidth. Get faster NICs.

~~~
GauntletWizard
The only-slightly-more-complex host/port hashing scheme solves your problems,
but... TCP OOO is almost never an issue. Out-of-order packets happen all the
time, and if you're not moving huge quantities of data with hugely lagging
packets, it won't matter. If you swap the positions of every pair of packets,
BADCFE style, your machine will hold B, process A, and make AB available to
the upper layer - almost zero real cost. There's a ringbuffer involved in
nearly every TCP connection that's several packets long, and B can be
delivered to its proper place in the queue before A arrives.
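
A rough sketch of that hold-and-deliver behaviour (Python, illustrative only,
not the kernel's TCP code): an out-of-order segment is parked until the gap
in front of it fills, then the contiguous run is handed up at once.

    # Illustrative reassembly only, not the kernel's implementation.
    def reassemble(segments):
        """segments: (byte_offset, data) pairs, possibly out of order."""
        pending, expected, delivered = {}, 0, []
        for offset, data in segments:
            pending[offset] = data
            while expected in pending:       # drain any contiguous run
                chunk = pending.pop(expected)
                delivered.append(chunk)
                expected += len(chunk)
        return b"".join(delivered)

    # "BADCFE"-style swaps: B is held, then A arrives and AB go up together.
    swapped = [(1, b"B"), (0, b"A"), (3, b"D"), (2, b"C"), (5, b"F"), (4, b"E")]
    assert reassemble(swapped) == b"ABCDEF"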

Sure, it'll look bad in Wireshark, but you can simply disable its
out-of-order reporting, and the stream is perfectly interpretable anyway.

~~~
suprjami
I'm afraid that doesn't solve the problem I cite either.

Using Mode 2 or Mode 4 with xmit_hash_policy=layer3+4 (so you're load
balancing on the IP/port tuple) still only balances one TCP stream out of one
slave.
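
To make that concrete, here is a hedged sketch of a layer3+4-style transmit
hash (Python; the exact formula has varied across kernel versions, so treat
it as illustrative): every segment of a given TCP session hashes to the same
value, so the whole flow rides one slave.

    # Illustrative layer3+4-style hash; the real kernel formula has changed
    # over time, but any such hash is deterministic per flow.
    import ipaddress

    def slave_for_flow(src_ip, dst_ip, src_port, dst_port, n_slaves):
        s = int(ipaddress.ip_address(src_ip))
        d = int(ipaddress.ip_address(dst_ip))
        h = (src_port ^ dst_port) ^ ((s ^ d) & 0xffff)
        return h % n_slaves

    # One storage session = one 5-tuple = always the same slave, so a
    # single stream tops out at the speed of a single link.
    print(slave_for_flow("10.0.0.1", "10.0.0.2", 44321, 445, 2))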

Out-of-order packets might be common over the internet, but they shouldn't
happen on a good high-speed LAN connection. As we approach tens of gigabits,
too much out-of-order processing will kill wire-speed bandwidth.

Considering people suggest disabling TCP SACK because its CPU overhead is
(allegedly) too high even on good working connections, the cost of
out-of-order processing would be much higher.

All bonding modes except Mode 4 (LACP) and Mode 6 (ALB) are transmit-only. The
receive throughput on a Mode 0/1/2/3/5 bond is still the max of one slave.

------
virtuallynathan
Is it not just cheaper/easier to buy a 40GbE NIC for $350-400 at that point? A
40GbE switch is < $250/port.

EDIT: I see they are using RJ45-based 10GbE - this can end up being more
expensive and more power-hungry than SFP+-based gear (fiber/twinax).

~~~
pixl97
>A 40GbE switch is < $250/port.

Huh? Show me a 40GbE switch for that much per port. The cheapest I see is
around $500 per port, plus you need the cabling, which can run $200-$300.
Though I agree, I'd rather pay more for the more reliable speed.

~~~
jing
DAC is an order of magnitude cheaper than optical and might be fine for their
use, based on their comments about Cat6 vs. Cat6a and "short runs in their
lab".

Also, if they are using two or three NICs per box, you'd need to factor in the
cost of two or three X540s vs. a single 40GbE connection, and you'd need to
double the cost per port of the switch vs. the 40GbE option.

~~~
kev009
Also, Intel NICs aren't a very good deal, and the driver quality isn't what
it once was in common *nixes. The stateless 2x10G and 2x40G cards from Chelsio
are cheaper and all-around better IMHO.

------
nickpsecurity
Didn't know they had a kernel-supported way of doing this. Thanks for the
info!

