
SYN cookies ate my dog – breaking TCP on Linux (2018) - smitop
https://kognitio.com/blog/syn-cookies-ate-my-dog-breaking-tcp-on-linux/
======
majke
This is excellent. I never thought about it that way. Indeed, with syn cookies
the receiving party has no idea what the original sequence number was, so it
will treat _any_ delivered packet as the first.

Couple of caveats:

\- you can force syn cookies to be always on with the tcp_syncookies=2 sysctl

\- syn cookies are generally bad because they prevent negotiating window
scaling. Window scaling is important unless you are doing low-bandwidth stuff
like telnet :)

\- you can somewhat negotiate window scaling when tcp timestamps are enabled.
But enabling tcp timestamps in the general case brings little benefit and
wastes 12 bytes of each packet for basically no gain.

\- for a bonus point, consider what happens when both syn cookies and
TCP_DEFER_ACCEPT are enabled.

More about SYN packet handling in Linux: [https://blog.cloudflare.com/syn-packet-handling-in-the-wild/](https://blog.cloudflare.com/syn-packet-handling-in-the-wild/)
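The mechanics behind "any delivered packet gets treated as the first" are easy
to model in a few lines. This is a toy sketch (invented hash function, MSS
table, and field layout, not the kernel's actual SipHash-based code, which
also folds in a timestamp counter): because the client's sequence number is
added linearly and the MSS index is recovered from the low bits of the
difference, a packet whose sequence number has advanced by a few bytes still
passes the check, it just decodes a different MSS.

```python
import hashlib

# Illustrative per-index MSS values; the real kernel table differs.
MSS_TABLE = [536, 1300, 1440, 1460]
MASK = 0xFFFFFFFF

def h(conn):
    """Stand-in keyed hash over the connection 4-tuple."""
    d = hashlib.sha256(b"secret" + repr(conn).encode()).digest()
    return int.from_bytes(d[:4], "big")

def make_cookie(conn, client_seq, mss_index):
    # MSS index lands in the low bits; client_seq is added, not authenticated.
    return (h(conn) + client_seq + mss_index) & MASK

def check_cookie(conn, client_seq, cookie):
    # Stateless check: whatever seq arrives is assumed to be the handshake seq.
    mss_index = (cookie - client_seq - h(conn)) & MASK
    return MSS_TABLE[mss_index] if mss_index < len(MSS_TABLE) else None

conn = ("10.0.0.1", "10.0.0.2", 12345, 80)
isn = 1000
cookie = make_cookie(conn, isn + 1, mss_index=3)  # SYN-ACK advertised MSS 1460

print(check_cookie(conn, isn + 1, cookie))      # the genuine handshake ACK
# A delayed packet three bytes further into the stream *also* validates,
# just with a different, smaller MSS decoded -- the collision in the article:
print(check_cookie(conn, isn + 1 + 3, cookie))
```

Shifting the sequence number by more than the table size finally makes the
decoded index fall out of range and the packet is dropped, which is why only
small shifts collide.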

~~~
emmericp
> But enabling tcp timestamps in the general case brings little benefit and
> wastes 12 bytes of each packet for basically no gain.

I disagree; TCP timestamps are awesome. Linux enables them by default.

Quick search gives me some measurements from 2012 [1] that indicate that TCP
timestamps are enabled on 83% of the top 100k web hosts.

You can afford to waste 12 bytes; the bottleneck isn't these 12 bytes but how
well you get congestion control to work. And congestion control relies on
getting an accurate estimate of the round-trip time.

[1]
[https://link.springer.com/chapter/10.1007/978-3-642-36516-4_...](https://link.springer.com/chapter/10.1007/978-3-642-36516-4_14)
(paywall)

Edit: typo

Edit: Also, just because 83% of web hosts have it enabled does not imply that
it is a good idea to do so in general. They could all just be running the
Linux defaults, and those could simply be wrong.

~~~
jsnell
You can measure RTT continuously, accurately, and even in the presence of
packet loss just with selective acks and a little bit of extra bookkeeping in
the sender.
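That sender-side bookkeeping can be sketched roughly like this (a simplified
model with invented names, not a real TCP stack): remember when each segment
was sent, take an RTT sample when a SACK or cumulative ACK covers a segment
that was transmitted exactly once (Karn's rule, which is what keeps samples
valid under packet loss), and fold it into a smoothed estimate.

```python
import time

class RttEstimator:
    """Toy sender-side RTT sampling from acked segments."""

    def __init__(self):
        self.inflight = {}   # seq -> (send_time, times_transmitted)
        self.srtt = None     # smoothed RTT (EWMA weights as in RFC 6298)

    def on_send(self, seq):
        _, n = self.inflight.get(seq, (None, 0))
        self.inflight[seq] = (time.monotonic(), n + 1)

    def on_ack(self, seq):
        """Called when seq is cumulatively or selectively acked."""
        entry = self.inflight.pop(seq, None)
        if entry is None:
            return
        sent_at, transmits = entry
        if transmits != 1:
            return  # retransmitted segment: the sample would be ambiguous
        sample = time.monotonic() - sent_at
        self.srtt = (sample if self.srtt is None
                     else 0.875 * self.srtt + 0.125 * sample)

est = RttEstimator()
est.on_send(1)
time.sleep(0.02)
est.on_ack(1)                    # clean sample seeds srtt (~20 ms here)
est.on_send(2); est.on_send(2)   # segment 2 was retransmitted...
est.on_ack(2)                    # ...so this ack contributes no sample
```

Real stacks keep this state in the retransmit queue they already maintain,
which is why the extra bookkeeping is cheap.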

It's hard to overstate how expensive TCP timestamps are. The thing is that
they bloat every single packet, including control packets. 2% of the world's
bandwidth is being wasted on this.
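For scale, the per-packet cost is easy to put numbers on (back-of-the-envelope
only; real traffic mixes are messier): the timestamp option is 10 bytes plus 2
bytes of padding, which is under 1% of a full-size segment but a large
fraction of a bare ACK.

```python
TS_OPTION = 12  # 10-byte timestamp option + 2 bytes of NOP padding

def ts_overhead(payload, ip_tcp_headers=40):
    """Fraction of on-wire bytes spent on the timestamp option."""
    return TS_OPTION / (ip_tcp_headers + TS_OPTION + payload)

print(f"{ts_overhead(1448):.1%}")  # full segment on a 1500-byte MTU -> 0.8%
print(f"{ts_overhead(0):.1%}")     # bare ACK -> 23.1%
```

An ACK-heavy traffic mix is how the aggregate ends up in the low single
digits of total bandwidth.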

The only reason for anyone to implement TCP timestamps today is that iOS
clients have horrible receive window scaling if timestamps are disabled.
(Well, that was the only reason a few years ago when I was still in the game
of keeping up with the quirks of different TCP stacks.)

I wrote more on the subject at the time:
[https://www.snellman.net/blog/archive/2017-07-20-s3-mystery/](https://www.snellman.net/blog/archive/2017-07-20-s3-mystery/)

~~~
toast0
I would guess that Apple could be convinced to fix this, if someone has the
right contact. The iOS TCP stack shows a lot of care, generally: they do path
MTU probing well, and they've deployed MP-TCP (requires apps to enable it
though), among other things I can't remember. Fixing performance if timestamps
are disabled seems like something they'd do.

------
toast0
FreeBSD had a more fun SYN cookie bug in 2015 (fixed here [1], and I think
this is the diff where it was introduced [2]; determining which releases it
touched is an exercise for the reader): the initial sequence number of the
sender had been left out of the cookie.

I guess you would have a similar lack of dogs, but also, if a connection was
opened and closed quickly, a retransmitted packet from the client would
satisfy the SYN cookie calculation, and the server would re-open the
connection, but at its original sequence number.

The details are a bit hazy, but the client would get an ACK with a SEQ behind
where it had ACKed, and would send an ACK probe with its latest values. The
server would see an ACK ahead of where it had sent, and send an ACK probe. If
the hosts had low enough round trip times, the number of ACK probes sent could
be tremendous. For those unfamiliar with FreeBSD, the localhost interface on
FreeBSD runs full TCP, and under high load can drop packets and retransmit. We
ran into this on localhost first, but then later across the internet with
external clients.

[1]
[https://github.com/freebsd/freebsd/commit/56ba0a68edde7b3832...](https://github.com/freebsd/freebsd/commit/56ba0a68edde7b38323f81f9d56998b10668dd9e)

[2]
[https://github.com/freebsd/freebsd/commit/fc2be30b217171175b...](https://github.com/freebsd/freebsd/commit/fc2be30b217171175bae209edb4f697cc0cb542d)

------
vesinisa
This write-up is from over two years ago (Feb 2018). I was curious if the
issue has since been fixed. Turns out the functions for generating and
checking the SYN cookie values have not changed since, so I guess it's safe to
assume the bug has not been fixed.

------
ay
This particular problem is solvable by making (client_sequence_number+1) part
of the HMAC (thus checking it as well), and by storing the MSS in the MSB
part of the generated server sequence number rather than in the LSB.

Then every packet other than the first data packet will be discarded as
invalid, and eventually the client-side retransmits will take care that
everything works properly.
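A sketch of that proposal (toy 24-bit MAC and an invented layout; the
real-world caveats mentioned below, like sequence numbers no longer
increasing slowly, are ignored here): with (client_ISN+1) inside the MAC and
the MSS index in the top bits, a sequence-shifted packet fails verification
outright instead of being mis-decoded with a different MSS.

```python
import hashlib

MSS_TABLE = [536, 1300, 1440, 1460]

def mac24(*vals):
    """Toy keyed 24-bit MAC (stand-in for a real HMAC)."""
    d = hashlib.sha256(b"secret" + repr(vals).encode()).digest()
    return int.from_bytes(d[:3], "big")

def make_cookie(conn, client_isn, mss_index):
    # MSS index in the MSBs; (client_isn + 1) is *authenticated*, not added.
    return (mss_index << 24) | mac24(conn, client_isn + 1, mss_index)

def check_cookie(conn, seq, cookie):
    # 'seq' is the sequence number of the arriving ACK of the SYN-ACK.
    mss_index = cookie >> 24
    if mac24(conn, seq, mss_index) != (cookie & 0xFFFFFF):
        return None   # MAC mismatch: discard the packet
    return MSS_TABLE[mss_index]

conn = ("10.0.0.1", "10.0.0.2", 12345, 80)
isn = 1000
cookie = make_cookie(conn, isn, mss_index=3)

print(check_cookie(conn, isn + 1, cookie))  # genuine ACK of the SYN-ACK
print(check_cookie(conn, isn + 4, cookie))  # shifted seq: rejected outright
```

The contrast with the Linux scheme is that the sequence number can no longer
cancel against the MSS bits, because it only enters the check through the MAC.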

The problem that is impossible to solve, however, is the lost third ACK
(acknowledging the SYN-ACK) from the client, if the client doesn’t send any
data to the server upon connect. That’s sufficiently rare in today’s
protocols, though.

Another problem that the above approach creates afresh is that it assumes
that retransmitted client SYNs will have the same ISN, which doesn’t hold in
practice with e.g. some load balancers (which also try hard not to keep
state). And that behavior is kind of a gray zone in the TCP spec, IIRC...

Edit: (I wonder if the last paragraph above is the real reason or I missed
something else)

Edit2: oh, thanks to Majromax’s mention of DJB’s write-up: the above also has
the problem of not complying with “sequence numbers increasing slowly”, and
the write-up indeed brings up a real-world scenario where that approach was
an issue - the rcp/rlogin protocols, which reused a very narrow range of
source ports, so 5-tuple reuse was common.

~~~
toast0
> Another problem that the above approach creates afresh is that it assumes
> that retransmitted client SYNs will have the same ISN, which doesn’t hold
> in practice with e.g. some load balancers (which also try hard not to keep
> state). And that behavior is kind of a gray zone in the TCP spec, IIRC...

I don't think this is a big problem with SYN cookies. If you get a SYN with
initial sequence number X, you send an appropriate SYN+ACK, and if you get a
retransmitted SYN (because the other end didn't get your SYN+ACK), you send a
new SYN+ACK appropriate for that one. If you then get an ACK for either, you
would form a full connection, which should work fine.

I would have to review the RFCs; they might say that if you had room in your
syncache to hold the data, you should send a RST in response to the second
SYN or the first SYN, because the states are conflicting; but since you don't
have the information, you don't have the information.

Anyway, unless the client end is _really_ messed up, it shouldn't both send
the ACK for the first SYN+ACK (which implies it received your SYN+ACK) and
retransmit the SYN (which implies it didn't). I acknowledge that there are
plenty of really messed up TCP stacks on the internet though :)

~~~
a1369209993
> Anyway, unless the client end is _really_ messed up, it shouldn't send both
> the ACK on the first SYN

This is a race condition; hypothetical sequence of events:

    
      send SYN-0, wait for reply or timeout
      timer interrupt fires
      timeout to resend SYN(-1) is ready, start running that
      packet interrupt fires (interrupts resend)
      got SYN+ACK-0, construct and send ACK-0
      iret
      finish constructing SYN-1 and send it
      iret again
    

This is clearly a bug, but it could easily work >99.99% of the time
(especially if the timeout is high enough that normal RTTs never hit it, which
is probably how the person setting the timeout would try to set it).

~~~
toast0
Yeah, I guess someone could write that, and if it worked mostly it would get
shipped. Like you said in another post, ugh.

Overall, I'd rank that at about a 3 out of 10 on the scale of TCP bugs in the
wild.

------
edwintorok
I noticed some issues a while ago with TCP SYN cookies breaking DLM cluster
setup on 64 hosts. (63 hosts all trying to join the same cluster generate
enough TCP traffic that the kernel thinks it is a flood and starts sending
SYN cookies, but then the DLM join on some hosts doesn't actually complete.)
This can be worked around by increasing the listen backlog (one of the
mitigations listed in the article):

    
    
      --- a/fs/dlm/lowcomms.c
      +++ b/fs/dlm/lowcomms.c
      @@ -1209,7 +1209,7 @@ static struct socket *tcp_create_listen_sock(struct connection *con,
                      log_print("Set keepalive failed: %d", result);
              }
      
      -       result = sock->ops->listen(sock, 5);
      +       result = sock->ops->listen(sock, 128);
    

This is mostly a theoretical issue, because upstream RHEL used to support only
up to 16 hosts in a cluster.
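The same knob is visible from userspace, for anyone reproducing this outside
the kernel (Python purely for illustration; note that the kernel additionally
clamps the backlog to net.core.somaxconn):

```python
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
# The backlog argument bounds the queue of connections waiting to be
# accepted; a tiny value (like DLM's old 5) lets a burst of 63 simultaneous
# connects overflow it, at which point the kernel falls back to SYN cookies.
srv.listen(128)
addr = srv.getsockname()

cli = socket.create_connection(addr)   # sanity check: handshake completes
peer, _ = srv.accept()
cli.close(); peer.close(); srv.close()
```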

------
p4bl0
Awesome write-up, thanks for sharing.

Networking is fascinating. I first introduced myself to the subject by
reading Michal Zalewski's _Silence on the Wire_ [1]. I really recommend it to
anyone who wants to do the same.

[1] [https://nostarch.com/silence.htm](https://nostarch.com/silence.htm)

------
Majromax
It's interesting to look at this from the perspective of HMAC codes: the host
uses (several bits of) the sequence number as an authentication code over
other aspects of the connection. The client IP and ports are elsewhere in the
packet, so the sequence number also needs to carry data for the timestamp and
maximum segment size.

From that security-based perspective, this bug seems to belong to a common
category of data escaping the hash -- here, the (sequence number + MSS
category) sum has hash collisions.

------
barbegal
In my uneducated mind, it seems like the connection should only be accepted if
the initial handshake ACK packet is completely empty.

------
rubatuga
Well, what's the mitigation for Linux? Did they end up disabling the MSS
negotiation?

------
TwoNineFive
Do yourself a favor and disable all javascript and XHR on this website if you
visit.

------
otherjason
(2018)

------
londons_explore
This bug really just falls in with "You didn't encrypt and authenticate your
data, so anything could have happened".

Sure, this time it's a software design bug in the endpoints, but next time it
might be a cosmic ray, or an evil middleman, or a buggy proxy. If data isn't
encrypted and authenticated, then you shouldn't care what form it arrives in.

~~~
cesarb
Nothing in this post says that the data wasn't authenticated; in fact, the
symptom they saw (the client is kicked off because the server doesn't
understand the message) is exactly what would happen if the data _was_
authenticated.

~~~
Majromax
The data's authenticated, but the authentication is accidentally broken.
SYN(seq_num) and SYN(seq_num+3) can both generate the same cookie using a
different (but still valid) maximum segment size.

To lump this into an existing category of bugs, syn cookies are a kind of
HMAC, only the implementation is custom and nonstandard. It isn't a surprise
that a bespoke HMAC leaks, but to the credit of the kernel developers, the
initial 1996 syn cookie specification pre-dates the common understanding of
such bugs. (But to its demerit, it looks like the DJB spec
([http://cr.yp.to/syncookies.html](http://cr.yp.to/syncookies.html)) would
not have had this issue, since the MSS was encoded in the _top_ bits of the
cookie and not the _bottom_ bits.)

