iptables -t raw -I PREROUTING -i eth0 -p tcp -m tcp --tcp-flags FIN,SYN,RST,ACK SYN -m tcpmss ! --mss 640:65535 -j DROP
iptables -L -n -v -t raw | grep mss
84719 3392K DROP tcp -- eth0 * 0.0.0.0/0 0.0.0.0/0 tcp flags:0x17/0x02 tcpmss match !640:65535
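The first column of that output is the rule's packet counter. Below is a minimal sketch for pulling that number out in a script. It assumes you list with `-x`, which prints exact counts; without it, large counters are abbreviated, as in `3392K`. It also assumes the tcpmss rule above is the only raw-table rule mentioning `tcpmss`. `mss_drop_packets` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: sum the packet counters (column 1) of every rule whose
# line mentions the tcpmss match, reading `iptables -L -n -v -x` output on stdin.
# Header and "Chain ..." lines do not contain "tcpmss", so they are skipped.
mss_drop_packets() {
    awk '/tcpmss/ { sum += $1 } END { print sum + 0 }'
}

# Usage (needs root):
#   iptables -t raw -L PREROUTING -n -v -x | mss_drop_packets
```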
For everything else at work, we are behind a layer 7 load balancer that is not vulnerable.
You may also find it useful to block fragmented packets. I've done this for years and never had an issue:
iptables -t raw -I PREROUTING -i eth0 -f -j DROP
To see the counters increase, here is a quick and silly cron job that shows you the MSS drops from the last minute; I will disable it after a couple of days:
 - https://tinyvpn.org/up/mss/
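If you would rather roll your own per-minute view, the core of it is turning a cumulative counter into a delta between runs. The sketch below is not the linked job. It assumes the current counter arrives on stdin, for example from `iptables ... -n -v -x` piped through awk. It keeps the previous value in a state file whose path you choose. `drop_delta` is a hypothetical name.

```shell
#!/bin/sh
# drop_delta <state-file>: read the current counter on stdin, print how much it
# grew since the previous call, and remember the new value for next time.
drop_delta() {
    state=$1
    read -r now
    prev=$(cat "$state" 2>/dev/null || echo 0)
    prev=${prev:-0}                     # empty state file counts as 0
    echo "$now" > "$state"
    if [ "$now" -ge "$prev" ]; then
        echo $(( now - prev ))
    else
        echo "$now"                     # counter was reset (e.g. rules reloaded)
    fi
}

# Usage from cron, once a minute (paths are illustrative):
#   iptables -t raw -L PREROUTING -n -v -x \
#     | awk '/tcpmss/ { s += $1 } END { print s + 0 }' \
#     | drop_delta /var/tmp/mss.count
```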
iptables -I INPUT -p tcp --tcp-flags SYN SYN -m tcpmss --mss 1:500 -j DROP
Am I understanding this correctly, and if so, do you have a thought about why they suggest this?
net.ipv4.route.min_adv_mss = 256
net.ipv4.tcp_base_mss = 512
net.ipv6.route.min_adv_mss = 1220
The link here gives an example of ... Is yours just inverting/negating the DROP rule?
>iptables -A INPUT -p tcp -m tcpmss --mss 1:500 -j DROP
So for the raw table it would be the first rule below, and for the filter table the second:
iptables -t raw -I PREROUTING -i eth0 -p tcp -m tcp --tcp-flags FIN,SYN,RST,ACK SYN -m tcpmss --mss 1:500 -j DROP
iptables -t filter -I INPUT -i eth0 -p tcp -m tcp --tcp-flags FIN,SYN,RST,ACK SYN -m tcpmss --mss 1:500 -j DROP
The reason I use the opposite method is that we negate (NOT) the normal range we want, so anything outside it gets dropped. Programs can also set super high values or not set MSS at all (which is not the same as 0).
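To make the difference concrete, here is a small model of how the two rules in this thread classify a SYN. The first is the negated `! --mss 640:65535` rule and the second is the positive `--mss 1:500` rule. It assumes, as described above, that the negated match also catches SYNs carrying no MSS option at all. The positive match does not catch them. The function names are hypothetical, and an empty argument stands for "no MSS option".

```shell
#!/bin/sh
# Model of `-m tcpmss ! --mss 640:65535 -j DROP`.
# An empty argument means the SYN carried no MSS option.
verdict_negated() {
    mss=$1
    if [ -z "$mss" ]; then
        echo DROP                       # no MSS option: assumed to match when negated
    elif [ "$mss" -ge 640 ] && [ "$mss" -le 65535 ]; then
        echo ACCEPT                     # inside the allowed range
    else
        echo DROP                       # 0-639: outside the allowed range
    fi
}

# Model of `-m tcpmss --mss 1:500 -j DROP`.
verdict_positive() {
    mss=$1
    if [ -n "$mss" ] && [ "$mss" -ge 1 ] && [ "$mss" -le 500 ]; then
        echo DROP
    else
        echo ACCEPT                     # includes 0, 501+ and "no MSS option"
    fi
}

for mss in "" 0 216 550 1460; do
    printf 'mss=%-5s negated=%-6s positive=%s\n' "${mss:-none}" \
        "$(verdict_negated "$mss")" "$(verdict_positive "$mss")"
done
```

The two rules agree on 216 (drop) and 1460 (accept). They disagree on a missing option, on 0, and on the 501-639 gap.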
I explicitly set the interface so that we don't match interfaces such as lo, tun, tap, vhost, veth, etc., because you never know what weird behavior some program depends on. In my example, eth0 is directly on the internet. On your systems, that might be bond0.
This goes in the mangle table. DO NOT use this example unless you know for sure what you are doing.
iptables -t mangle -I POSTROUTING -o eth0 -p tcp -m tcp --tcp-flags SYN,RST,ACK SYN -m tcpmss --mss 1:100 -j TCPMSS --set-mss 1360
This may not even help, as the packet has already been generated and we are too late in the process. I just figure someone might ask. There are probably use cases where this may help (proxies, edge firewalls, hypervisors, Docker hosts, maybe).
Or just log and drop the connections, or send yourself (your app) a tcp-reset.
MSS is a TCP parameter, however, and operates at layer 4. It won’t matter whether the protocol underneath is IPv4 or IPv6 in this case.
From https://en.m.wikipedia.org/wiki/Maximum_segment_size :
"To avoid fragmentation in the IP layer, a host must specify the maximum segment size as equal to the largest IP datagram that the host can handle minus the IP and TCP header sizes. Therefore, IPv4 hosts are required to be able to handle an MSS of 536 octets (= 576 - 20 - 20) and IPv6 hosts are required to be able to handle an MSS of 1220 octets (= 1280 - 40 - 20)."
So it seems like one needs to clamp tcpv4 to at least 536, and tcpv6 to at least 1220. (tcpv6 is common shorthand for TCP over IPv6, similar to udpv6 and icmpv6)
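The arithmetic in that quote generalizes to any MTU. Subtract the IP header (20 bytes for IPv4 without options, 40 for the fixed IPv6 header) and the 20-byte TCP header. Below is a minimal sketch that ignores IP options, IPv6 extension headers, and TCP options. `mss_for_mtu` is a hypothetical helper.

```shell
#!/bin/sh
# mss_for_mtu <4|6> <mtu>: largest MSS that fits in one IP datagram of <mtu>
# bytes, assuming no IP options / extension headers and no TCP options.
mss_for_mtu() {
    case $1 in
        4) ip_hdr=20 ;;
        6) ip_hdr=40 ;;
        *) echo "unknown IP version: $1" >&2; return 1 ;;
    esac
    echo $(( $2 - ip_hdr - 20 ))        # 20 = TCP header without options
}

mss_for_mtu 4 576     # prints 536  (576-byte datagram every IPv4 host must accept)
mss_for_mtu 6 1280    # prints 1220 (IPv6 minimum link MTU)
mss_for_mtu 4 1500    # prints 1460 (typical Ethernet)
```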
iptables -t mangle -I PREROUTING -p tcp --tcp-flags SYN SYN -m tcpmss --mss 1:500 -j TCPOPTSTRIP --strip-options mss
I experimented with a host that sends out MSS 216, and the communication still worked with the rule above, but not when dropping the traffic.
Based on the information available, I assume that stripping the MSS option should be enough.
 - https://wiki.nftables.org/wiki-nftables/index.php/Mangle_TCP...
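Stripping can work where dropping fails because a SYN with no MSS option makes the receiver fall back to a default MSS rather than the tiny advertised one. The sketch below models that effect for the `--mss 1:500` TCPOPTSTRIP rule. It assumes the defaults RFC 9293 prescribes when no MSS option is received: 536 for IPv4 and 1220 for IPv6. It ignores path MTU and any local MSS limits. `effective_send_mss` is a hypothetical name.

```shell
#!/bin/sh
# effective_send_mss <4|6> <advertised-mss>
# Models: -m tcpmss --mss 1:500 -j TCPOPTSTRIP --strip-options mss
# A stripped SYN has no MSS option, so the receiver is assumed to use the
# RFC 9293 default send MSS instead.
effective_send_mss() {
    ver=$1 mss=$2
    if [ "$mss" -ge 1 ] && [ "$mss" -le 500 ]; then
        if [ "$ver" = 6 ]; then
            echo 1220                   # option stripped, IPv6 default
        else
            echo 536                    # option stripped, IPv4 default
        fi
    else
        echo "$mss"                     # option passed through unchanged
    fi
}

effective_send_mss 4 216    # prints 536  (like the test host above)
effective_send_mss 4 1460   # prints 1460 (normal SYN, untouched)
```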
FYI if your instances are behind an Application Load Balancer or Classic Load Balancer then they are protected, but NOT if they are behind a Network Load Balancer.
A patched kernel is available for Amazon Linux 1 and 2, so you won't have to disable SACK. You can run "sudo yum update kernel" to get it, but of course you have to reboot. Updated AMIs are also available.
Amazon Linux 1: https://alas.aws.amazon.com/ALAS-2019-1222.html
Amazon Linux 2: https://alas.aws.amazon.com/AL2/ALAS-2019-1222.html
For Amazon Linux 2 the fixed kernel is kernel-4.14.123-111.109.amzn2. Looking at my instances, it looks like I have been on that version since Friday.
As each direction of a TCP connection has its own MSS, it would make sense that an attacker's server could exploit this.
I remember the original ping of death back in the 90s https://web.archive.org/web/19981206105844/http://www.sophis...
I do wonder though, can anyone guess what kind of impact one might see with TCP SACK disabled? We don't have large amounts of traffic and serve mostly websites. Maybe mobile phone traffic might be a bit worse off if the connection is bad?
On my personal AWS instance, less than half a percent of the traffic over the last few days hit the firewall rule that logs the error.
Most of that traffic seemed to come from China; it was possibly port probing / port scans or really old hardware accessing the server.
I would say that the iptables rule is a 'better' solution than dropping sack as you may find you use significantly more CPU/bandwidth when dealing with retransmits when not using selective acknowledgements.
I have a personal Digital Ocean (not my employer) instance that is frequently being probed for stuff (primarily Russian and Chinese IPs). Same old, same old.
I've been running with the rule for around a week, just logging & dropping small-MSS packets out of curiosity, but have hardly seen anything worth writing home about. I was somewhat surprised. I'm curious to see how long it takes for that rule to go nuts (my Shellshock rule still triggers from time to time; that one had a definite curve of activity).
More and more are moving away from $0.25 microcontrollers and up to $5 SoCs running Linux, so the problem is going away gradually...
YOUR_RULE -m limit --limit 2/sec -j LOG --log-prefix="MALFORMED_MSS: " --log-ip-options --log-tcp-options --log-level 7
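Once those entries are in the kernel log, a short pipeline can tally them per source address. The sketch assumes the usual iptables LOG format, where each line contains `SRC=<address>`, together with the `MALFORMED_MSS: ` prefix from the rule above. You can feed it from `dmesg` or `journalctl -k`. `count_mss_sources` is a hypothetical helper.

```shell
#!/bin/sh
# Count logged low-MSS SYNs per source address, most frequent first.
# Reads kernel log lines on stdin, e.g.:  journalctl -k | count_mss_sources
count_mss_sources() {
    grep 'MALFORMED_MSS: ' |            # only lines from our LOG rule
        grep -o 'SRC=[^ ]*' |           # extract the source field
        cut -d= -f2 |                   # keep just the address
        sort | uniq -c | sort -rn       # count and rank
}
```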
The bigger impact will be for users far away, with increased risk of packet loss and higher latency.
It's easy to drop packets with very low MSS, and unless you've got specific needs (someone mentioned IoT), there's no reason not to drop packets with an MSS below 536 or so. I believe Windows' smallest MTU (MSS + IP and TCP headers) is 576 bytes, for example.
Multiple vulnerabilities in the SACK functionality in (1) tcp_input.c and (2) tcp_usrreq.c OpenBSD 3.5 and 3.6 allow remote attackers to cause a denial of service (memory exhaustion or system crash).
An alternate option would be to put the device behind another firewall or load balancer or proxy that you know is not vulnerable.
Pending a patch, simply disable SACK:
~$ sudo sysctl -w net.ipv4.tcp_sack=0
and/or disable TCP segmentation offloading (once per interface; eth0 here):
~$ sudo ethtool -K eth0 tso off
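A small check like the following can confirm the sysctl actually took effect. To make the setting survive a reboot, one option is a `net.ipv4.tcp_sack = 0` line in a file under `/etc/sysctl.d/`. The check reads the same `/proc/sys/net/ipv4/tcp_sack` file. The optional path argument exists only so it can be tried against a test file, and `sack_status` is a hypothetical name.

```shell
#!/bin/sh
# sack_status [file]: report whether TCP SACK is enabled.
# Defaults to the live kernel setting; 0 means disabled, anything else enabled.
sack_status() {
    f=${1:-/proc/sys/net/ipv4/tcp_sack}
    if [ ! -r "$f" ]; then
        echo "cannot read $f" >&2
        return 1
    fi
    if [ "$(cat "$f")" = 0 ]; then
        echo disabled
    else
        echo enabled
    fi
}

# Usage:  sack_status
```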
TCP segmentation and checksum offloading still aren't super standard on consumer-grade NICs or virtual machines. I'd assume less than half of the internet's Linux hosts are actually at risk.
I thought VMware shipped that at least a decade ago. Is there some specific sub-feature you had in mind? Similarly, at least Apple's consumer hardware had checksum offloading back in the early 2000s, and segmentation support shipped in 10.6 (2009), so it seems like it should be relatively mainstream, since they tended to use commodity NIC hardware.
As to the specific subset: TCP Segmentation Offload, as was mentioned in the article.
Yes, I know. I was asking for clarification on the off chance that you were describing something which didn’t ship a decade ago. I first used TSO on servers in the early 2000s and by 2010 even the consumer-grade hardware I was seeing had it.
"When Segmentation offload is on and SACK mechanism is also enabled, due to packet loss and selective retransmission of some packets, SKB could end up holding multiple packets, counted by ‘tcp_gso_segs’."
Segmentation offload in Linux depends on checksum offloading, as described here:
$ ethtool -k eth0 | grep tcp-seg
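That grep typically returns a line such as `tcp-segmentation-offload: on`. The value sometimes carries a ` [fixed]` suffix when the driver does not allow changing it. The sketch below reduces this to a plain on/off for use in scripts. It assumes that output format, and `tso_state` is a hypothetical helper.

```shell
#!/bin/sh
# tso_state: read `ethtool -k <iface>` output on stdin and print "on" or "off"
# for tcp-segmentation-offload, dropping any bracketed suffix such as "[fixed]".
# Indented sub-feature lines (e.g. tx-tcp-segmentation) are ignored.
tso_state() {
    awk -F': ' '/^tcp-segmentation-offload:/ { split($2, v, " "); print v[1] }'
}

# Usage:  ethtool -k eth0 | tso_state
```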
Also on the virtualization side, VMware VMXNET adapters also support offloading for guests.
It's a little bit more involved than a ping of death, but still, relatively easy to exploit.
5.0 is EOL as of 5.0.21.
My favorite of that era was simply the working-as-designed simplicity of sneaking the Hayes modem hangup sequence into various protocols: actual Hayes modems used +++ with a time-delay to send commands such as ATH0 (hangup) but everyone else skipped that time-delay in an attempt to avoid the patent so you could disconnect any modem-connected system if you could figure out how to get it to echo "+++ATH0". Some IP stacks (e.g. Windows 95) would simply send the received ICMP payload as the response so a simple `ping -p …` would do it but people found ways to cause similar problems with sendmail, FTP, etc.
Pop into some random channel, send "/ctcp #channel ping +++ATH0", and wait patiently... a moment or two later you would be rewarded with a flood of "signoff" messages as the users' TCP sessions to the IRC server timed out (by responding to the CTCP, they had, in effect, told their modems to hang up).
The goal, of course, was to get the highest "body count" possible from a single CTCP message.
Smurf attacks, the "ping of death", AOHell, the latest sendmail and wu-ftpd holes of the week, open proxies... the Internet was a very entertaining place for a bored teenager from the midwest back then.
Thanks for the flashback!
This might be fun...
Edit: Don't know if segmentation offloading is on by default in Android, but on my default Arch kernel it is, so I wouldn't know why not.
[Unit]
Description=Disable TCP SACK
[Service]
ExecStart=/sbin/iptables -A INPUT -p tcp -m tcpmss --mss 1:500 -j DROP
IMO, TSO is an Intel NIC function; does this affect others, such as Cavium CPUs?
It's a Linux-specific implementation defect, not an intrinsic problem with the TCP SACK wire protocol or spec.
[ "$(uname -s)" = Linux ] && echo "Vulnerable"
So there's plenty of Linux being used out there that's probably not affected.
Good luck out there, folks.
Red Hat / CentOS
Oracle Linux
https://linux.oracle.com/errata/ELSA-2019-4686.html (RHCK kernel)
https://linux.oracle.com/errata/ELSA-2019-4685.html (UEK5 kernel)
https://linux.oracle.com/errata/ELSA-2019-4684.html (UEK4 kernel)
Amazon Linux
https://alas.aws.amazon.com/ALAS-2019-1222.html (Amazon Linux 1)
https://alas.aws.amazon.com/AL2/ALAS-2019-1222.html (Amazon Linux 2)
SUSE / SLES
(please reply with additional vendor links if you have them)
Disclaimer: I work for SUSE
[edit: deleted link, OP has updated]
If you put each link on a line and then leave a newline between each one you should get a nice list though.