
How a little bit of TCP knowledge is essential - dar8919
http://jvns.ca/blog/2015/11/21/why-you-should-understand-a-little-about-tcp/
======
Animats
That still irks me. The real problem is not tinygram prevention. It's ACK
delays, and that stupid fixed timer. They both went into TCP around the same
time, but independently. I did tinygram prevention (the Nagle algorithm) and
Berkeley did delayed ACKs, both in the early 1980s. The combination of the two
is awful. Unfortunately, by the time I found out about delayed ACKs, I had
changed jobs, was out of networking, and was doing a product for Autodesk on
non-networked PCs.

Delayed ACKs are a win only in certain circumstances - mostly character echo
for Telnet. (When Berkeley installed delayed ACKs, they were doing a lot of
Telnet from terminal concentrators in student terminal rooms to host VAX
machines doing the work. For that particular situation, it made sense.) The
delayed ACK timer is scaled to expected human response time. A delayed ACK is
a bet that the other end will reply to what you just sent almost immediately.
Except for some RPC protocols, this is unlikely. So the ACK delay mechanism
loses the bet, over and over, delaying the ACK, waiting for a packet on which
the ACK can be piggybacked, not getting it, and then sending the ACK, delayed.
There's nothing in TCP to automatically turn this off. However, Linux (and I
think Windows) now have a TCP_QUICKACK socket option. Turn that on unless you
have a very unusual application.
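
For what it's worth, on Linux this is a one-line setsockopt. A minimal sketch in Python (assuming a Linux kernel that exposes TCP_QUICKACK; the option name and availability are platform-specific):

```python
import socket

# Sketch (Linux-specific): turn on TCP_QUICKACK as suggested above.
# Note: the kernel can quietly drop back out of quickack mode, so
# latency-sensitive code often re-enables it after each recv().
def enable_quickack(sock: socket.socket) -> None:
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_quickack(s)
s.close()
```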

Turning on TCP_NODELAY has similar effects, but can make throughput worse for
small writes. If you write a loop which sends just a few bytes (worst case,
one byte) to a socket with "write()", and the Nagle algorithm is disabled with
TCP_NODELAY, each write becomes one IP packet. In the worst case this
increases traffic by a factor of 40, since each one-byte payload carries 40
bytes of IP and TCP headers. Tinygram prevention
won't let you send a second packet if you have one in flight, unless you have
enough data to fill the maximum sized packet. It accumulates bytes for one
round trip time, then sends everything in the queue. That's almost always what
you want. If you have TCP_NODELAY set, you need to be much more aware of
buffering and flushing issues.
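
To illustrate the buffering point, here is a sketch (the loopback demo and names are illustrative, not from the article) that coalesces small pieces into a single send() instead of one write per field:

```python
import socket

# Sketch: with TCP_NODELAY set, every send() can become its own packet,
# so coalesce small pieces in userspace and hand the kernel one buffer.
def send_request(sock, header: bytes, body: bytes) -> None:
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Bad:    sock.sendall(header); sock.sendall(body)  -> two tinygrams
    # Better: one buffer, one send
    sock.sendall(header + body)

# Demo over a loopback connection:
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
client = socket.create_connection(listener.getsockname())
server, _ = listener.accept()

msg_header, msg_body = b"GET / HTTP/1.1\r\n\r\n", b"payload"
send_request(client, msg_header, msg_body)

data = b""
while len(data) < len(msg_header) + len(msg_body):
    chunk = server.recv(4096)
    if not chunk:
        break
    data += chunk
client.close(); server.close(); listener.close()
```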

None of this matters for bulk one-way transfers, which is most HTTP today.
(I've never looked at the impact of this on the SSL handshake, where it might
matter.)

Short version: set TCP_QUICKACK. If you find a case where that makes things
worse, let me know.

John Nagle

~~~
jvns
One thing that confuses me is -- are ACK delays part of the default TCP
implementation on Linux? I originally assumed this was some kind of edge case
/ unusual behavior.

~~~
andreyf
So it would appear, according to the man pages:
[http://linux.die.net/man/7/tcp](http://linux.die.net/man/7/tcp)

 _In quickack mode, acks are sent immediately, rather than delayed if needed
in accordance to normal TCP operation._

So "normal TCP operation" is to delay ACKs "if needed". Not sure if "needed"
is the right word to use, but whatever.

Looks like RHEL has a system-wide fix:
[https://access.redhat.com/documentation/en-
US/Red_Hat_Enterp...](https://access.redhat.com/documentation/en-
US/Red_Hat_Enterprise_MRG/1.3/html/Realtime_Tuning_Guide/sect-
Realtime_Tuning_Guide-General_System_Tuning-
Reducing_the_TCP_delayed_ack_timeout.html)

~~~
sourcesmith
There is also "ip route change ROUTE quickack 1"

------
jfb
I really enjoy reading Julia's blog. Not only does she have a real, infectious
enthusiasm for learning, and not only is the blog well written, but I also
often learn a lot. Kudos.

~~~
bufordsharkley
Yeah, I was going to post something to the same effect. Her posts (and videos)
really drive home not just the how, but the WHY people should care, and her
writing is lively and clear. I really love the ambition she has to learn all
these things and share it with the world, and to make it inclusive for folks
of all experience levels.

------
barrkel
This is a general problem of leaky abstractions. If you're a top-down thinker,
you're going to have a bad time some day and have a hard time figuring it out.

OTOH bottom up thinkers take much longer to become productive in an
environment with novel abstractions.

Swings and roundabouts. Top down is probably better in a startup context -
it's more conducive to broad and shallow generalists. Bottom up is great when
you have a breakdown of abstraction through the stack, or when you need a new
solution that's never been done quite the same way before.

~~~
dunkelheit
Careless piling of layers atop layers is the main reason why everything is
slow even though computers are crazy fast. Every moderately complex piece of software
is so inefficient that it is better not to think about it or else you become
paralyzed in horror ;)

Usually something is done to mitigate these inefficiencies only when they
become egregious. And that is when even basic knowledge of the inner workings
of underlying layers really pays off (see also: mechanical sympathy).

~~~
kuschku
I am currently writing a client for a synchronized application system, and you
only really notice how it's layers upon layers when you write custom functions
to serialize/deserialize primitive data types to a raw socket; then, one layer
up, you can already abstract and write objects first to a HashMap, and then
use the HashMap serializer to send the actual object. And then you go yet
another layer higher and use reflection to automatically sync method calls.

It’s really crazy to think about it.

------
p00b
John Rauser of Pinterest recently gave a wonderful talk about TCP and the
lower bound of Internet latency that has a lot in common with what's discussed
in the article here. Worth a watch, I think, if you enjoyed the blog post.

[https://www.youtube.com/watch?v=C8orjQLacTo](https://www.youtube.com/watch?v=C8orjQLacTo)

------
PeterWhittaker
Summary: If you learn even a little, you realize that each packet might be
separately acknowledged before the next one is sent. In particular, note this
quote: _Net::HTTP doesn’t set TCP_NODELAY on the TCP socket it opens, so it
waits for acknowledgement of the first packet before sending the second._

By setting TCP_NODELAY, they removed a series of 40ms delays, vastly improving
performance of their web app.

------
colanderman
You don't need to entirely disable Nagle; just flip TCP_NODELAY on and then
off immediately after sending a packet for which you will block waiting for a
reply. This
way you still get the benefit Nagle brings of coalescing small writes, without
the downside.

(Alternatively, turn Nagle off entirely and buffer writes manually or using
MSG_MORE or TCP_CORK.)
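
A rough sketch of the cork approach (Linux-specific; TCP_CORK is not portable, and the function name here is illustrative):

```python
import socket

# Sketch (Linux-specific): disable Nagle, hold small writes with TCP_CORK,
# then uncork so the whole batch goes out in full-sized segments.
def send_batched(sock, parts) -> None:
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)   # hold packets
    for part in parts:
        sock.sendall(part)                                    # small writes
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)   # flush now
```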

------
dantiberian
I came across this this week working on the RethinkDB driver for Clojure
([https://github.com/apa512/clj-
rethinkdb/pull/114](https://github.com/apa512/clj-rethinkdb/pull/114)). As
soon as I saw "40ms" in this story I thought "Nagle's algorithm".

One thing I haven't fully understood is that this only seems to be a problem
on Linux; Mac OS X didn't exhibit this behaviour.

~~~
tirumaraiselvan
That might be because Mac OS X implements modified Nagle's algorithm as
mentioned here at the bottom:
[http://www.stuartcheshire.org/papers/NagleDelayedAck/](http://www.stuartcheshire.org/papers/NagleDelayedAck/)

~~~
masklinn
So OS X implements a slightly modified Nagle's algorithm to remove the bad
interaction between Nagle and delayed ACKs?

------
bboreham
Why wouldn't an http client library turn off Nagle's algorithm by default?

------
neduma
Can wireshark/riverbed (application perf tests) profiling help to solve these
kind of problems?

~~~
spydum
Wireshark can show you the delay, but it won't tell you why it's there. You
might assume it's some quirk of your application. Most people don't consider
the kernel, network libraries, and drivers; those are all black magic.

------
rjurney
In high school I carried TCP/IP Illustrated around with me like a bible. I
cherished that book. Knowledge of networks would eventually be incredibly
useful throughout my career.

------
mwfj
This can be generalised. It is also one of my favorite ways of doing developer
interviews. Do they have a working/in-depth knowledge of what keeps the inter
webs running? So many people have never ventured out of their main competence
bubble, and that bubble can be quite small (but focused, I suppose).

For all I know, they believe everything is kept together with the help of
magic. I guess I don't trust people who don't have a natural urge to
understand at least the most basic things of our foundations.

~~~
hueving
I used to think this way before I realized it's just an arbitrary hoop you
make people jump through. To you, people understanding TCP might be what you
claim is a basic foundation. However, it's just about as arbitrary as asking
people to explain 802.11 RTS/CTS or clos switch fabrics, which are both
equally as important to delivering day-to-day network traffic. Additionally,
they both can come up as things you need to understand when trying to optimize
jitter/latency in sensitive local network traffic applications.

Don't judge people based on which components of networks they happened to take
an interest in and dive into.

~~~
sokoloff
On the other extreme, I've seen people ask why that IP address is divided by
16 (10.0.0.0/16).

~~~
jacquesm
And then you have to explain to them that it's divided by 2^16...
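
To be pedantic, the /16 isn't a division at all: it's a prefix length, meaning the first 16 bits name the network, which leaves 32 - 16 = 16 host bits, i.e. 2^16 = 65536 addresses. A quick check with Python's standard ipaddress module:

```python
import ipaddress

# The "/16" in 10.0.0.0/16 is a prefix length: 16 network bits,
# leaving 32 - 16 = 16 host bits, i.e. 2**16 = 65536 addresses.
net = ipaddress.ip_network("10.0.0.0/16")
print(net.netmask)        # 255.255.0.0
print(net.num_addresses)  # 65536
```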

------
Ono-Sendai
This is my proposed solution to this kind of problem: Sockets should have a
flushHint() API call:
[http://www.forwardscattering.org/post/3](http://www.forwardscattering.org/post/3)

~~~
Animats
Look up the history of the PUSH bit in TCP.

~~~
Ono-Sendai
OK, and is there a portable way to set this PUSH bit in the sockets API? The
semantics also seem a little different, since the PUSH bit appears to do
something on the receiver side as well.

