Hacker News new | past | comments | ask | show | jobs | submit login

Interesting. I conjecture this is because Python doesn’t gc the socket object immediately. Python is supposed to close file-descriptor like objects when they are gc’d, so I'd really like to better understand what's going on. Thank you for all the work so far. I'll try to figure out why my theory isn't empirically proving true.

Edit: I've just done a simple test with Python 3.9.5 and it definitely closes a socket when the object is GC'd. In one terminal, I did this:

   nc -l 9999
In a second terminal:

    Python 3.9.5 (default, May  3 2021, 19:12:05)
    [Clang 12.0.5 (clang-1205.0.22.9)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import socket
    >>> s = socket.socket()
    >>> s.connect(("", 9999))
    >>> s = None
As soon as `s = None` is executed, `nc` exits. Here's what the full sequence looks like in tcpdump. First the three-way handshake (SYN, ACK, SYN) when the socket is connected:

    13:09:24.448234 IP > Flags [S], seq 2594347262, win 65535, options [mss 16344,nop,wscale 6,nop,nop,TS val 62632264 ecr 0,sackOK,eol], length 0
    13:09:24.448385 IP > Flags [S.], seq 682643237, ack 2594347263, win 65535, options [mss 16344,nop,wscale 6,nop,nop,TS val 3095724505 ecr 62632264,sackOK,eol], length 0
   13:09:24.448410 IP > Flags [.], ack 1, win 6379, options [nop,nop,TS val 62632264 ecr 3095724505], length 0
   13:09:24.448422 IP > Flags [.], ack 1, win 6379, options [nop,nop,TS val 3095724505 ecr 62632264], length 0
Then the active close when `s = None` executes:

    13:09:28.234099 IP > Flags [F.], seq 1, ack 1, win 6379, options [nop,nop,TS val 62636050 ecr 3095724505], length 0
    13:09:28.234178 IP > Flags [.], ack 2, win 6379, options [nop,nop,TS val 3095728291 ecr 62636050], length 0
    13:09:28.234203 IP > Flags [F.], seq 1, ack 2, win 6379, options [nop,nop,TS val 3095728291 ecr 62636050], length 0
   13:09:28.234240 IP > Flags [.], ack 2, win 6379, options [nop,nop,TS val 62636050 ecr 3095728291], length 0
We see the FINs from both sides along with the final ACKs.

So, I don't know what the heck is causing this apparent leak. Just setting `self.sock = None` should be sufficient, modulo the vagaries of when Python GC occurs on the object.

You just need 1 other reference to the socket in code to prevent it from being garbage collected without socket.close().

For example, this won't automatically close the socket:

    >>> s = socket.socket()
    >>> other_sock = s
    >>> s.connect(("", 9999))
    >>> s = None

Ah, of course. Possibly this then:

    def _setup_transport(self):
        # Setup to _write() directly to the socket, and
        # do our own buffered reads.
        self._write = self.sock.sendall
        self._read_buffer = EMPTY_BUFFER
        self._quick_recv = self.sock.recv
(And similarly in SSLTransport.)

Then, the OSError exception raised inside Transport.close also prevents Connection.collect from destroying the Transport object.

Again, I'm not disagreeing with the original PR/fix. My point is only that when the socket object is destroyed, the socket is closed.

I'll bet a try/except around this (Connection.collect) would also fix it:

    def collect(self):
        if self._transport:

Edit: we're discussing this in two places. My apologies. For interested readers:


I'm satisfied I understand what's going on now, and that it was a bug in celery/py-aqmp retaining the socket object due to over-broad exception handling. Python closes sockets as expected when they are released.

Yeah it should probably be setting “self._quick_recv = None” and “self._write = None” when it shuts down the transport. This probably would have fixed the leak too.

I wasn’t seeing any leaky behavior after the fix though, so I think calling “socket.close()” allows GC to still clean up those other references.

Right, at that point the entire Transport object is destroyed by collect.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact