
Obese Pipes - duck
http://www.tbray.org/ongoing/When/201x/2011/01/09/Fat-Pipes
======
smutticus
Disclaimer: I've been designing and troubleshooting networks at ISPs for about
12 years. Below are my observations of the problem given what I know to be
true from experience. I do not fully understand the problem as described but I
will do my best to drop some knowledge.

I would be very surprised if any ISP is still using routers and switches that
buffer anything in software. Even modern store-and-forward switches only use a
very small hardware buffer for packets. All major vendors use layer-3 hardware
switching on their carrier platforms. The ASICs and TCAM are designed to run
at a high enough clock rate that buffering in software isn't necessary.

That said, wireless routers almost always use software buffers, and so do most
home CPE devices like DSL routers or consumer gear from D-Link, Netgear, or
Linksys. This article is strictly talking about buffering at the end-point,
i.e. host buffering.

There are a few quick and dirty solutions that involve TCP windowing.
<http://en.wikipedia.org/wiki/TCP_tuning>

But bear in mind that most protocols that need very low latency run over UDP,
so this might not help.

It's really, really hard to do any kind of latency testing without a
hardware-based packet sender and analyzer. Without one you never really know
whether the problem is your OS or the network device you're testing.

If the problem really is that your CPE device has too-big buffers, then build
your own. Get a *NIX box and make the network buffer in the kernel really
small. If you no longer have bad latency, then you've found the problem. If
you still have bad latency, then look for something else to replace.
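On Linux, shrinking those kernel buffers is a matter of a few sysctls. A minimal sketch, assuming a Linux box acting as the router and a WAN-facing interface named eth0 (both are my assumptions; the exact byte values are illustrative, not recommendations):

```shell
# Hypothetical tuning: shrink the TCP send/receive buffer ranges
# (min, default, max in bytes) so the kernel can't queue much data
# ahead of a slow link.
sysctl -w net.ipv4.tcp_wmem="4096 16384 65536"
sysctl -w net.ipv4.tcp_rmem="4096 16384 65536"
# Cap the per-device transmit queue as well (the default is often 1000
# packets, which is enormous for a slow uplink).
ip link set dev eth0 txqueuelen 16
```

If latency improves after this, the host-side buffers were at least part of the problem, exactly as the test above describes.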

~~~
wmf
I don't see how it matters whether the buffering is implemented in software or
hardware; it's still there and it's too big.

 _If the problem really is that your CPE device has too-big buffers, then
build your own._

I think Gettys wants to solve the problem for everyone, not just ubergeeks.

~~~
adestefan
What he's saying is that you can make custom hardware that's fast enough so
you don't need any buffering.

~~~
smutticus
That's right. But this hasn't trickled down to the consumer market yet.

Maybe there is a market for a prosumer home router or cable modem? It would
cost more than your typical Best Buy Netgear or whatever, but it would
implement L3 forwarding in hardware and have advanced congestion control and
QoS.

My guess is the market research has been done and consumer router
manufacturers have determined that not enough consumers are willing to pay for
things they don't understand.

------
tumult
One effect some older gamers may have noticed over time – lag has gotten
worse. I used to be able to ping 5ms to servers in the Bay Area on my old
Pacific Bell DSL connection back around 1999/2000. Now I'm lucky to get 50
from Comcast.

It also means that most online action games have come to be built around
prediction and latency compensation management. Games feel mushier, and
they're easier to cheat (by intentionally lagging your client, etc.) because
lots of games no longer use the centralized server model. Good prediction
requires a dedicated server to run its own 'master' logic on, so that clients
can't cheat. If you saw the paper from Valve recently here on HN, that's how
it works.

The reason to not run dedicated servers is that it's cheaper. This is what
most Xbox Live and PSN games do – just dumbly pipe data between clients. So it
becomes a trust-based peer-to-peer system, except you can't actually trust the
other peers. And predictably, the games end up feeling like crap.

Lots of games also have latency in their own local engine. Multiple rendering
passes with buffering, physics thread synchronization, and then TVs that
buffer the video, interpolate frames, and apply sharpening. The cumulative
effect is like a car that's been built up with cost-cutting engineering,
loaded down with driving aids, and designed to look good for 11 months. Numb,
mushy, and ugly this time next year.

~~~
burgerbrain
I wonder how far away from your servers you are. Even 50ms from your computer,
down the wire, through who knows how many network appliances, and then on to
its target is pretty damned good if you think about it, and at 5ms it's just
nuts. _Light_ will only go about 1,500 km that fast.
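That 1,500 km figure checks out with a back-of-the-envelope calculation (this uses the vacuum speed of light; signals in fiber or copper travel at roughly two-thirds of this, so the real bound is tighter):

```python
# How far can light travel during a given one-way latency?
C_KM_PER_S = 299_792  # speed of light in vacuum, km/s

def light_distance_km(latency_ms):
    """Distance light covers in `latency_ms` milliseconds."""
    return C_KM_PER_S * latency_ms / 1000

print(round(light_distance_km(5)))   # ~1499 km
print(round(light_distance_km(50)))  # ~14990 km
```

So a 5ms ping bounds the server to well under 1,500 km away even before counting router hops and queueing.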

~~~
tumult
I'm in the Bay Area. The servers are in the Bay Area. I don't see why that's
unreasonable. People don't know what they're missing now.

~~~
techsupporter
I think burgerbrain is asking how far you are, network-wise, from the target
servers. Comcast, along with most of the other big residential ISPs, is
notorious for having bad internal routing. It's possible your previous PacBell
DSL connection had better routing to the servers in question, but Comcast
doesn't. What does a traceroute look like?

My fiber connection at home pings an average of 12ms to my colocated server in
the same major city. However, a friend's cable modem connection through Time
Warner, just two suburbs over, pings at an average of 43ms. My traceroute? 10
hops. His? 21 hops, and goes by way of Houston.

~~~
burgerbrain
Exactly. Without careful investigation I think it's wrong to assume that
larger buffers are what's causing the difference in ping times between two
different ISPs at two different points in time.

------
jrockway
Guys, this is not traffic shaping by his ISP. I think it's just a poor
calculation job by scp; I see this mis-estimation all the time when I copy a
movie from my desktop in my bedroom to my TV in my living room over 20 feet of
Cat 6 cable. scp starts at something like 200MB/s and then eventually
converges to the real speed of 30MB/s. There is no traffic shaping, but there
is one write at the beginning that fills the kernel buffer without blocking.
(Why does this get "bigger and bigger every year"? Because people are setting
it bigger. By default, the TCP write buffer is something like 4k. I have mine
set to ~512k.)
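That non-blocking initial write is easy to reproduce. A small sketch (my construction, not scp's code: a Unix-domain socket pair stands in for the real connection, and the receiving end never reads, so everything "sent" lands purely in kernel buffers):

```python
import socket

# How much data can we "send" before the kernel pushes back?
a, b = socket.socketpair()
a.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4096)
a.setblocking(False)

swallowed = 0
try:
    while True:
        # Each send() returns immediately until the kernel buffers fill.
        swallowed += a.send(b"x" * 4096)
except BlockingIOError:
    pass

# Everything counted here was accepted instantly, before the far side
# read a single byte -- exactly the burst a progress meter mistakes
# for line rate.
print(swallowed, "bytes buffered before the first byte was read")
```

The bigger the kernel buffer, the bigger that phantom initial burst, which is why the effect "gets bigger every year" as defaults grow.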

If you want to accurately measure your bandwidth, try iperf. Between my server
on Slicehost and my desktop at home, I get about 6Mbps. And I pay for 6Mbps
DSL. So there is not some conspiracy here, people are just measuring wrong.
scp is not a benchmark.

~~~
soult
This is not scp's fault, it really is buffer bloat. It is noticeable in DSL
connections, but it is worst with 3G connections.

When I run some (down/up)loads via my UMTS dongle, I easily fill the buffer in
my modem and the buffers at my ISP. I can still ping remote hosts, but with a
latency of up to 60 seconds. The problem is, TCP doesn't understand what's
happening, because the packets get acknowledged after all. Only eventually do
my TCP connections throttle themselves, reaching a somewhat stable balance
where the buffer still introduces about 10 to 20 seconds of lag.

The solution for me was to shape my own traffic on the edge of my "network"
and to never let the buffer be filled in the first place. This leads to a much
better ping (< 1 second) and a higher bandwidth utilization (when looking at
the big picture).

~~~
jrockway
Don't you think the slow part is the link between your modem and the ISP? My
guess is that the buffer you are filling is your TCP write buffer. If I make
mine tiny, then scp reports what my bandwidth actually is.

I blame scp because it chooses a moving-average algorithm instead of an
instantaneous rate-of-change measurement.

I don't have this problem with my UMTS dongle; ping times are about half a
second, but TCP works fine at this latency. You need a big buffer to keep the
data flowing, though.

------
gvb
"There’s another overly-fat-pipe symptom that’s been increasingly in my face.
I routinely copy big files here and there around the Internet, most commonly
from my laptop to tbray.org." ...and then Tim describes a symptom of a fast
initial burst followed by a much lower average transfer rate.

Note he is going external (over the internet). The problem is quite likely
traffic shaping by his ISP. I would contend it isn't really a "fat pipe" issue
so much as his ISP ...um... stretching the truth with respect to how much
bandwidth they _really_ can give to him. ISPs are in a "bandwidth battle"
(kind of like the CPU clock rate wars) - they advertise the maximum bandwidth
that they can theoretically provide, but if all subscribers tried to use that
much bandwidth at once, the ISP does not have the back-end infrastructure to
handle it. As a result, the ISP does traffic shaping: transfers a short burst
(the initial burst in the case of a big file transfer) at maximum speed and
then throttles the connection so that the _average_ bandwidth doesn't
overwhelm their back-end capabilities.
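The burst-then-throttle behavior described above is classically implemented with a token bucket: the subscriber accumulates a burst allowance, spends it at full speed, and is then held to the sustained rate. A toy model (the function, names, and numbers are mine for illustration; ISPs don't publish their shaper parameters):

```python
def token_bucket(arrivals, rate, burst):
    """Toy token-bucket shaper. Each tick, `rate` tokens are added
    (capped at `burst`); one unit of traffic consumes one token.
    Returns how much was actually forwarded per tick."""
    tokens = burst
    sent = []
    for offered in arrivals:
        tokens = min(tokens + rate, burst)
        forwarded = min(offered, tokens)
        tokens -= forwarded
        sent.append(forwarded)
    return sent

# A sender offering 10 units/tick against a 2-unit/tick sustained rate
# with a 6-unit burst allowance: fast start, then sustained throttling.
print(token_bucket([10, 10, 10, 10], rate=2, burst=6))  # [6, 2, 2, 2]
```

The first tick rides the stored burst; every tick after that is pinned to the sustained rate, which is exactly the "fast initial burst, lower average" shape of a big file transfer.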

~~~
jedsmith
I observe this with "PowerBoost", and that was the first thing I thought of
when I read what he was describing.

~~~
detst
I love the marketing. "PowerBoost, the few seconds before we throttle your
connection."

------
gvb
"Speed" is related to _both_ bandwidth _and_ latency. High bandwidth with high
latency will "feel slow." Unfortunately, everybody talks about bandwidth and
ignores latency.

The "fat pipe syndrome" is a _latency_ issue, not a "speed" (aka bandwidth)
issue. Most home routers (and, I suspect, many commercial routers) only
advertise _bandwidth_ and not their latency.

ISPs also advertise their best case (burst) bandwidth and are typically silent
on their latency and average bandwidth.

~~~
regularfry
It's both. A long-running, high-bandwidth stream will cause short-lived, low-
bandwidth streams to be squeezed out, because TCP's congestion control isn't
working properly. The first stream will see _relatively_ good speed but shonky
latency, while the later streams will see bad speed as well, when they should
share the bandwidth equally.

------
trout
This is probably more a function of the way TCP operates than of the potential
upstream bandwidth. Basically, envision TCP's window increasing exponentially
each time a round of transmissions succeeds, and then being cut in half when a
loss happens. After that it grows linearly (congestion avoidance), which is
where you see accurate data rates. While TCP is scaling exponentially and
halving on loss, it's pretty difficult to guess the bandwidth. This probably
takes under 10 seconds to stabilize, so the only way to display accurate data
would be to wait. The choice is really whether you would like to see nothing,
or see the raw data of what's happening in TCP.
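That growth pattern (slow start's doubling, a multiplicative decrease on loss, then linear congestion avoidance) can be sketched as a toy model. The function, the loss pattern, and the numbers below are illustrative only, not any real stack's implementation:

```python
def aimd(events, cwnd=1, ssthresh=64):
    """Toy model of TCP's congestion window in segments. Slow start
    doubles cwnd each round until ssthresh; a loss halves it; above
    ssthresh, growth is linear (congestion avoidance)."""
    history = [cwnd]
    for loss in events:
        if loss:
            ssthresh = max(cwnd // 2, 1)
            cwnd = ssthresh          # multiplicative decrease
        elif cwnd < ssthresh:
            cwnd *= 2                # slow start
        else:
            cwnd += 1                # congestion avoidance
        history.append(cwnd)
    return history

# Three clean rounds, one loss, one clean round:
print(aimd([0, 0, 0, 1, 0]))  # [1, 2, 4, 8, 4, 5]
```

Any bandwidth estimate taken during the doubling-and-halving phase is wildly off; only the linear tail reflects the sustainable rate, which is why early progress-meter readings are garbage.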

~~~
regularfry
As I understand it, it's caused by failures not happening when they should, so
that TCP can't do its flow control properly.

------
akira2501
Or, run egress traffic shaping. Put a hard limit on the upload rate from your
PC to your device, put this at about 98% of your actual "line rate." Problem
solved?
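On Linux, that kind of egress cap is typically done with tc. A minimal sketch, assuming the WAN-facing interface is eth0 and a measured uplink of about 1 Mbit/s, so the cap lands near the 98% figure (interface name and rates are my assumptions):

```shell
# Hypothetical egress shaping: cap eth0 slightly below the measured
# uplink so the modem's buffer never fills. HTB keeps the queue on
# the Linux box, where it stays short and under our control.
tc qdisc add dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 980kbit ceil 980kbit
```

With the bottleneck moved onto the router, packets are dropped (or briefly queued) locally instead of piling up in the modem's oversized buffer.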

~~~
jbri
This could work ... if you only have one machine on your local network and
your outbound link can sustain that data rate 24/7.

It's not really a practical solution for 99% of cases.

~~~
akira2501
You do the egress shaping at the router, not on each individual machine. And
you shape the entire interface, not individual traffic flows.

As far as jitter in the outgoing rate, that's what the buffer is there for.
What shaping helps you avoid is: a) filling the buffer and those side-effects
and b) prematurely filling the buffer at a high rate while the rest runs at a
low rate.

~~~
jbri
The problem isn't the buffer being full, it's the buffer being _large and
full_. Buffering less and just dropping packets when your smaller buffer
fills (or even before then) lets the congestion control stuff do its work.

~~~
akira2501
I've been doing this in my home networks for years, and this is exactly what
egress shaping does for me. The local router drops packets, the internal
clients back off.

------
signa11
Buffer bloat is a problem. But, as he quite accurately puts it, there are no
really good solutions out there. For those of us in the middle of the network,
there is clearly no good answer. We need to provide RTT * BW * <some
fraction>. We can argue about the fraction, but the RTT is a function of the
flows involved anyway. As a flow can have an RTT anywhere from 0ms to 300+ms,
we can't provide the _right_ answer.
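The RTT × BW sizing rule is the bandwidth-delay product, and a quick worked example (numbers are mine) shows why no fixed buffer can be right for every flow:

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to keep the
    pipe full for a flow with this rate and round-trip time."""
    return bandwidth_bps * rtt_s / 8

# A 100 Mbit/s link at 100 ms RTT wants ~1.25 MB of buffering;
# the same link at 1 ms wants only ~12.5 KB. A buffer sized for the
# first adds up to 100 ms of queueing delay for the second.
print(bdp_bytes(100e6, 0.100))  # 1250000.0
print(bdp_bytes(100e6, 0.001))  # 12500.0
```

Since a middlebox can't know whether the flows passing through it have 1 ms or 300 ms RTTs, any single buffer size it picks is wrong for most of them.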

The hosts can do better. As he points out, Linux is optimizing for long-haul
performance, which is not entirely unreasonable. Linux _could_ do much better,
but it would have to have buffering adaptive to the particular transport
connection. Not there yet.

So in the end, he ends up throwing stones all over the place. Gee, thanks.

------
queensnake
Cringely thinks 'bufferbloat' will become a major issue in 2011:
http://www.cringely.com/2011/01/2011-prediction-4-bufferbloat-may-be-terrible-but-your-cable-isp-wont-fix-it

