
When TCP sockets refuse to die - ingve
https://idea.popcount.org/2019-09-20-when-tcp-sockets-refuse-to-die/
======
majke
Then there is a discussion about forcefully killing sockets :)

* close(): socket will be lingering in background as usual

* shutdown(SHUT_RD): no network side effect, discards read buffer

* shutdown(SHUT_WR): equivalent to sending a FIN

* close() on a socket with SO_LINGER set: if the linger timeout is non-zero, blocks until the write buffer is flushed; if the timeout is zero, immediately sends RST

* the close()-after-TCP_REPAIR trick ([https://lwn.net/Articles/495304/](https://lwn.net/Articles/495304/)): immediately discards a socket with no network side effects.

* "ss --kill" command: forcefully close a socket from outside process, done with netlink SOCK_DESTROY command.

~~~
nly
> * shutdown(SHUT_RD): no network side effect, discards read buffer

My understanding is that if the read buffer is not empty, or if you later
receive any further data from the other end, this will result in an RST.

Wrt linger behaviour: "if timeout non-zero blocks until write buffer
flushed" is only true of blocking sockets. For non-blocking sockets things get
complicated and vary across platforms.

~~~
majke
I made an attempt to check:

* shutdown(SHUT_RD): seems not to have _any_ side effects. You can totally still recv() on that socket. Kerrisk writes in §61.6.6: "However if the peer application subsequently writes data on its socket, then it is still possible to read that data on the local socket". Basically, SHUT_RD makes recv() return 0. That's all it does.

* SO_LINGER on O_NONBLOCK: shutdown() doesn't block. close() still blocks.
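
The SHUT_RD behaviour described above can be reproduced with a small Linux-specific sketch (hypothetical example, not from the thread):

```python
import socket
import time

# Linux-specific sketch: after shutdown(SHUT_RD), recv() returns 0
# (b"" in Python) instead of blocking when the buffer is empty, yet
# data the peer writes afterwards can still be read.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.create_connection(srv.getsockname())
conn, _ = srv.accept()

cli.shutdown(socket.SHUT_RD)
empty = cli.recv(1024)        # empty buffer: returns b"" immediately

conn.sendall(b"still readable")
time.sleep(0.2)               # give loopback a moment to deliver
late = cli.recv(1024)         # on Linux, the late data is still readable

cli.close(); conn.close(); srv.close()
print(empty, late)
```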

~~~
wruza
This highlights a few more details on SHUT_RD:
[https://books.google.com/books?id=ptSC4LpwGA0C&pg=PA173&lpg=...](https://books.google.com/books?id=ptSC4LpwGA0C&pg=PA173&lpg=PA173&dq=posix+shutdown+shut_rd&source=bl&ots=Kt6GNjalMm&sig=ACfU3U08rZm-krUqsrwl6Vlnk4XfHbDPPw&hl=ru&sa=X&ved=2ahUKEwjdysC5zubkAhVFpYsKHTmACB8Q6AEwCHoECAkQAQ#v=onepage&q=posix%20shutdown%20shut_rd&f=false)

I believe this is not discussed in POSIX at all, so SHUT_RD is basically
vaguely defined across platforms, and I wouldn't even rely on recv()
returning zero in particular.

Edit: changed books domain to .com

~~~
dunkelheit
Ok so I decided to check it empirically. Behavior is indeed platform-
dependent.

Linux: after shutdown(SHUT_RD) all blocked recv() calls unblock and return 0.
But the other side can still send data and the recv() call will still read it!
It is just that after shutdown when there is nothing to read a recv() call
immediately returns 0 instead of blocking.

macOS (and BSD, I presume?): The read buffer is discarded and all subsequent
recv() calls return 0. If the other side sends data it is discarded.

Unfortunately I have no Windows machine around to try out.

Now, maybe someone can clarify, given such wildly different behavior what is
the intended use case for shutdown(SHUT_RD)?

~~~
wruza
It is likely a remnant of non-PF_INET families under SOCK_STREAM. There is no
SHUT_RD in TCP by design.

Sockets are known to be not very standard landscape historically. Best bet
here is to just stick with that “Disables further receive operations” posix
definition and follow it to the letter by not recv’ing anything anymore.

------
imglorp
Side conversation.

I am constantly amazed at how a tiny piece of code, the Linux (or BSD) TCP
stack, can be a source of mysteries and adventures for decades, even for
kernel experts and industry leaders like CF. The thing has around 11 states
and 20 transitions, in around 4000 LOC.

Compare this with some of the multi-million LOC, distributed monsters we all
know and love.

[https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/tcp.c?h=v5.3)

~~~
dunkelheit
This is an example of "the settings problem". When a system has even a
moderate number of behaviors controlled by their respective settings the
number of possible interactions rises exponentially. Combine that with
different possible user behaviors and you get a combinatorial explosion of
possibilities with some weird results that are non-obvious even to experts.

~~~
imglorp
That's a good point, plus it's not just (states x transitions), it's also a
whole bunch of hidden state: the other guy's state as well as all the packets
in flight.

------
Dylan16807
If you are going to force a minimum drain rate, please make sure you use a
large enough monitoring period. With the patch in "The curious case of slow
downloads", once 60 seconds have passed it starts checking download speed as
often as every second, which is really aggressive. If you have a slow
connection that's not super-stable, you're still going to get kicked, even if
you're well over the minimum drain rate on average. An average of some kind
over 15-20 seconds would be a lot more appropriate here.
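
A hypothetical sketch of the windowed check suggested above (all class names, rates, and thresholds here are made up for illustration):

```python
from collections import deque

# Instead of kicking a client the moment one 1-second sample dips below
# the minimum drain rate, average samples over a sliding window (here
# 15 s) and kick only if the window average is too low.
class DrainRateMonitor:
    def __init__(self, min_rate_bps, window_secs=15):
        self.min_rate_bps = min_rate_bps
        self.window = deque(maxlen=window_secs)  # one byte count per second

    def record_second(self, bytes_drained):
        self.window.append(bytes_drained)

    def should_kick(self):
        # Only judge once we have a full window of samples
        if len(self.window) < self.window.maxlen:
            return False
        avg = sum(self.window) / len(self.window)
        return avg < self.min_rate_bps

mon = DrainRateMonitor(min_rate_bps=1000)
# A bursty but healthy client: silent most seconds, occasional bursts
for i in range(15):
    mon.record_second(16000 if i % 8 == 0 else 0)
print(mon.should_kick())  # average ~2133 B/s >= 1000 B/s -> False
```

A per-second check would have kicked this client during any of its silent seconds, even though its 15-second average is well above the minimum.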

~~~
vinay_ys
Very true. Just today, I had to use an old 3G phone tethered to my laptop for
a data connection and found my ping times to be on the order of 5-10 seconds,
and at sporadic intervals in between, data was getting sent/received in bursts
at much lower latencies. It wasn't fun trying to get work done on such a
connection.

------
vinay_ys
What's really weird is that TCP is an endpoint protocol and should have been
relatively easy to upgrade/replace/change (relative to, say, the IP protocol).

But why haven't we moved to something better?

Say, why doesn't Apple use a better-suited protocol between Apple devices and
Apple servers? Why doesn't Google use a better protocol between Google devices
and Google servers (oh wait, they do – QUIC, which is something-other-than-TCP
over IP)?

More people should be doing this, yes? Why not?

It is as if engineers aren't taking advantage of the layered architecture of
the network.

As people build 4G-5G networks that are IP-based, shouldn't we insist they
build them purely IP-based, without peeking into the layers above and making
assumptions, thereby enabling more experimentation with flow-control and
reliable-transmission protocols?

~~~
ronsor
1. You don't want to reimplement a quarter of the network stack in userspace.

2. Network infrastructure may drop "unusual" packets.

~~~
Dylan16807
> 1. You don't want to reimplement a quarter of the network stack in
> userspace.

That post doesn't say anything about user space. Ideally the new protocol
would be in the kernel, triggered with just a flag or even automatically.

------
gothroach
Thought this article was interesting, as I've been working with a piece of
hardware lately that doesn't close SMTP connections after sending mail. It
took me a while to figure that out: it would always send email properly the
first time, but then wouldn't be able to again until after a reboot. Turns out
it doesn't close the TCP sockets created while sending mail unless you jump
through some hoops. Such is the world of embedded industrial devices,
unfortunately.

------
tuukkah
TLDR: TCP_USER_TIMEOUT is an important setting, but it's somewhat tricky, not
properly documented, and there are kernel bugs related to it.

------
ausjke
This is indeed very problematic. I worked on an ONVIF test suite 3 years ago;
in some failing test cases, the TCP socket could never die in time, which
failed the certification as a whole since all the following unit test cases
could not continue. All those immediately-kill-tcp-socket or socket-port-reuse
tricks could not help, at least not reliably.

------
Psype
System Of A Down intensifies

