

How TCP backlog works in Linux - signa11
http://veithen.github.io/2014/01/01/how-tcp-backlog-works-in-linux.html

======
Animats
The article kind of misses the point. The reason for having a separate queue
for connections in a SYN-RECEIVED state is to provide a defense against SYN
flooding attacks.[1] An incoming SYN has a source IP address, but that may be
faked. In a SYN flooding attack, large numbers of SYN packets with fake source
addresses are sent. The connection will never reach ESTABLISHED, because the
server's SYN-ACK reply goes to the fake source address, which didn't send the
SYN and won't complete the handshake.

Early TCP implementations allocated all the resources for a connection,
including the big buffers, when a SYN came in. SYN flooding attacks could tie
up all of a server's connection resources until the connection attempt timed
out after a minute or two. So now, TCP implementations keep a separate
pool of connection data for connections in SYN-RECEIVED state. There's no data
at that stage, so buffers are not yet needed, and a minimum amount of state
has to be kept until the 3-way handshake completes. Once the handshake
completes, full connection resources are allocated and the connection goes to
ESTABLISHED state.

This has nothing to do with behavior of established connections, or connection
dropping.
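The split is visible from user space. A minimal Python sketch on Linux (the backlog value 8 and the loopback address are arbitrary choices for illustration): the backlog passed to listen() bounds only the accept queue of fully established connections, while half-open SYN-RECEIVED entries live in the separate kernel pool sized by net.ipv4.tcp_max_syn_backlog.

```python
import socket

# The backlog argument to listen() bounds only the accept queue of fully
# established connections; half-open SYN-RECEIVED entries are tracked in a
# separate, minimal kernel pool, which is why a SYN flood cannot pin full
# connection buffers.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 0))   # ephemeral port on loopback
srv.listen(8)                # Linux clamps this to net.core.somaxconn

# A loopback client completes the 3-way handshake immediately; the
# connection then waits, already ESTABLISHED, in the accept queue.
cli = socket.create_connection(srv.getsockname())
conn, _ = srv.accept()
cli.sendall(b"hello")
```

Only once accept() returns does the application hold the full connection; everything before that is the kernel's bookkeeping.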

[1]
[https://en.wikipedia.org/wiki/SYN_flood](https://en.wikipedia.org/wiki/SYN_flood)

~~~
nly
> There's no data at that stage

There can be. The initial SYN can contain a payload.

~~~
Animats
True, although few implementations send data with SYN, because the BSD socket
interface, which everybody uses, forces a full handshake before sending data.
Some firewalls treat data with SYN as an attack.
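Data in the SYN is what TCP Fast Open (RFC 7413) standardizes, and on Linux it is one way around the connect-then-send pattern of the socket API. A hedged sketch (the payload, queue length, and loopback address are illustrative; with no cached TFO cookie for the peer, or with net.ipv4.tcp_fastopen disabled, the kernel silently falls back to an ordinary handshake and sends the data afterwards):

```python
import socket

# TCP Fast Open sketch (Linux-only constants). The server opts in per
# socket; the client replaces connect()+send() with a single sendto()
# carrying MSG_FASTOPEN, asking the kernel to put the payload in the SYN.
# Without a cached TFO cookie the kernel falls back to a normal 3-way
# handshake, so this also works where TFO is not negotiated.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.IPPROTO_TCP, socket.TCP_FASTOPEN, 16)  # TFO queue length
srv.bind(("127.0.0.1", 0))
srv.listen(8)

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.sendto(b"GET /", socket.MSG_FASTOPEN, srv.getsockname())
```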

------
ised
"The solution suggested by Stevens... The problem with this is..."

I see no problem with it. But perhaps I am missing something.

"... an application is expected to tune the backlog..."

Two simple applications I use every day, called tcpserver/sslserver and
tcpclient, meet this expectation.

See the "-b" and "-c" switches.

Has the author looked at Stevens' own example?

[http://www.icir.org/christian/sock.html](http://www.icir.org/christian/sock.html)

------
pests
There was a recent video or article posted here discussing the poor
interaction between Nagle's algorithm, delayed ACK, and TCP slow start, and
how it results in increased latency, especially for the first few packets.

From a first read it sounds like the decisions made in both BSD and Linux
could also be adding to the latency problem for the initial packets.

Have OSes checked how their TCP backlog implementation affects the various
congestion control algorithms being used?

~~~
Animats
Different problem.

The bad interaction between the Nagle Algorithm and Delayed ACK still irks me.
I designed one, somebody at Berkeley designed the other, and by the time I
found out about it, I was out of network architecture and doing something else
for a different company.

That 200ms fixed timer in delayed ACK was a bad idea. The whole delayed ACK
thing was a hack to reduce overhead for character-by-character Telnet, which
mattered to Berkeley back then because Berkeley used a lot of dumb terminal
servers. The fixed time delay is based on human response time and the time
UNIX needed to process a character echo. If a typed character has to be
echoed, the ACK can be piggybacked on the reply packet with the echo. This is
one of the few cases in which delayed ACK is a win. It might also be a win
with some quick request-reply APIs.

Really, delayed ACKs should be off by default, and should only turn on when
the connection has been showing a consistent pattern of "packet received, ACK
sent, application quickly transmitted reply so ACK could have been combined
with reply."
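Linux does expose per-socket overrides, though not the adaptive default suggested above. A short sketch (Linux-only option names):

```python
import socket

# Per-socket knobs on Linux: TCP_NODELAY disables the Nagle algorithm,
# and TCP_QUICKACK temporarily disables delayed ACK. TCP_QUICKACK is not
# sticky -- the kernel can drop back into delayed-ACK mode on its own, so
# latency-sensitive code re-arms it after each receive.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)
```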

John Nagle

~~~
dboreham
I recall emailing you about this in 1996 when I worked on LDAP at Netscape and
being rather surprised to receive a reply! Good times...

------
bbrazil
I ran into the overflow behaviour with our source repository provider, as
they'd get hammered at the top of every minute by all the continuous
integration servers and silently drop connections. The specific version of SSH
we were running didn't send the client banner until it received the server
banner, so the connection just hung for 2 hours on the client.

After much debugging and reading of kernel source this was all figured out,
and the provider adjusted things on their end so this wouldn't happen.

Moral of the story: You probably should set tcp_abort_on_overflow to 1.
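Concretely, that knob is a one-line sysctl; a sketch assuming a sysctl.d-style drop-in file (the path is illustrative):

```
# /etc/sysctl.d/99-tcp.conf (illustrative path)
# On accept-queue overflow, send a RST instead of silently dropping the
# final ACK of the handshake, so clients see a reset instead of hanging.
net.ipv4.tcp_abort_on_overflow = 1
```

The default (0) drops the ACK and relies on the client retransmitting, which is what produced the silent hang described above.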

