
Adventures in debugging: etcd, HTTP pipelining, and file descriptor leaks - rkday
http://www.projectclearwater.org/adventures-in-debugging-etcd-http-pipelining-and-file-descriptor-leaks/
======
philips
This is some interesting behavior from the blog post:

    Eventually, Python exits, and closes TCP connection A, by
    sending a FIN packet – but etcd doesn’t send a FIN of its own,
    or send a zero-byte chunk to finish the 200 OK, as it does in
    the success case. This causes the socket to leak on etcd’s
    side – we think this is a bug in Go’s HTTP stack.
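
For context, the "zero-byte chunk" is how a chunked HTTP/1.1 response signals end-of-body. Roughly, on the wire (the JSON payload is made up, not etcd's exact output):

    # What "a zero-byte chunk to finish the 200 OK" means on the
    # wire: with Transfer-Encoding: chunked, the server marks the
    # end of the body with a zero-length chunk. The JSON payload
    # below is made up, not etcd's exact output.
    response_on_the_wire = (
        b"HTTP/1.1 200 OK\r\n"
        b"Transfer-Encoding: chunked\r\n"
        b"\r\n"
        b"26\r\n"        # chunk size in hex (0x26 = 38 bytes)
        b'{"action":"set","node":{"key":"/foo"}}\r\n'
        b"0\r\n"         # the zero-byte chunk...
        b"\r\n"          # ...ends the response body
    )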

This really should work, so we need to investigate that one more deeply. I am
not opposed to adding the timeout option the author would like, and we would
happily consider a patch for this.

 _Update_ : here is a proof-of-concept patch to address this feature request:
[https://github.com/coreos/etcd/pull/3816](https://github.com/coreos/etcd/pull/3816)
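
To illustrate the general shape of such a timeout (sketched in Python rather than etcd's Go purely for brevity; the queue, deadline, and 408 status are all made up, not what the PR implements): the server gives up after a deadline and completes the response, instead of holding the socket open forever.

    # Sketch only: a long-poll handler that completes the response
    # after a deadline instead of blocking indefinitely. The queue,
    # the timeout value and the 408 status are illustrative; they
    # are not etcd's behavior or what the PR above implements.
    import queue
    from http.server import BaseHTTPRequestHandler, HTTPServer

    events: "queue.Queue[bytes]" = queue.Queue()
    WATCH_TIMEOUT = 30.0  # hypothetical server-side watch timeout

    class WatchHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            try:
                body = events.get(timeout=WATCH_TIMEOUT)  # wait for an event
                status = 200
            except queue.Empty:
                body, status = b"", 408  # deadline hit: finish the exchange
            self.send_response(status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
            # Returning lets the server close out the request cleanly,
            # so an abandoned client can't leak the socket indefinitely.

    if __name__ == "__main__":
        HTTPServer(("127.0.0.1", 8080), WatchHandler).serve_forever()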

Longer term, etcd will use gRPC over HTTP/2 in the new etcd v3 API[1] to address
use cases like this. In particular, gRPC will allow us to have multiple streams
while keeping a single TLS connection open. A major expense of the current
watch API is having to tear down connections, as explained in this post. We
will still support HTTP/REST, likely through the use of grpc-gateway[2].
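
To make the multiplexing point concrete, here is a rough sketch using Python's grpc package; the method path and raw-bytes messages are placeholders, not the real v3 API in [1]:

    # Sketch only: every call below is its own HTTP/2 stream riding
    # the single TCP connection from grpc.insecure_channel(); adding
    # or dropping a watch never costs a connection setup/teardown.
    # The method path and byte-blob messages are placeholders.
    import grpc

    channel = grpc.insecure_channel("127.0.0.1:2379")  # one connection

    watch = channel.unary_stream(
        "/placeholder.Watch/Watch",
        request_serializer=lambda b: b,
        response_deserializer=lambda b: b,
    )

    # Two concurrent watches, zero extra TCP/TLS connections:
    streams = [watch(key.encode()) for key in ("/nodes", "/services")]
    # for event in streams[0]:  # events arrive as the server sends them
    #     print(event)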

The file descriptor rlimit issue that was a side effect of all this is
why databases like etcd have to protect themselves. For example, a few months
ago etcd learned to reserve file descriptors for itself in case clients
hog up resources[3]. Longer term we would like to let proxies
handle watches directly, to further protect the core quorum of etcd.
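
The idea in [3], sketched here in Python just for brevity (etcd itself is Go, and these names and numbers are made up): keep some headroom under the rlimit for the server's own files, and stop accepting clients past the budget.

    # Sketch of the self-protection idea: reserve headroom below the
    # process's file-descriptor rlimit, and refuse new client
    # connections once the budget is spent, so internal operations
    # (logs, snapshots, peer traffic) can still open files.
    import resource

    RESERVED_INTERNAL_FDS = 150  # hypothetical headroom for the server itself

    soft_limit, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    client_budget = max(0, soft_limit - RESERVED_INTERNAL_FDS)

    open_clients = 0  # would be tracked as connections come and go

    def may_accept() -> bool:
        """Accept a new client only while under the client budget."""
        return open_clients < client_budget

    print(f"rlimit {soft_limit}, serving at most {client_budget} clients")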

[1] [https://github.com/coreos/etcd/blob/master/Documentation/rfc/v3api.md](https://github.com/coreos/etcd/blob/master/Documentation/rfc/v3api.md)
[2] [https://github.com/gengo/grpc-gateway](https://github.com/gengo/grpc-gateway)
[3] [https://github.com/coreos/etcd/pull/3219](https://github.com/coreos/etcd/pull/3219)

~~~
mkulke
We have a very nasty issue in Kubernetes with its userspace proxy leaking
handles when a misbehaving workload doesn't close connections properly (e.g.
Java InputStreams). Could this be related?

~~~
philips
Maybe. The Kubernetes 1.0 proxy is a pure TCP proxy, so we would need more
details. Could you file an issue?

Kubernetes 1.1 will have an iptables-based proxy too:
[https://github.com/kubernetes/kubernetes/issues/3760#issuecomment-143048131](https://github.com/kubernetes/kubernetes/issues/3760#issuecomment-143048131)

~~~
mkulke
There is an open issue in which we came to more or less the same conclusion as
the article (not a bug, but a feature of the TCP/IP protocol).

I am a bit puzzled why other people aren't constantly bitten by this, though.

------
Lukasa
Good spot, Rob!

In the past we (urllib3) have been pretty optimistic about assuming
connections are still useful when we hit exceptions, but increasingly that
optimism seems to be hurting us.

For that reason, I've taken your fix, added tests for it, then proposed it to
the main urllib3 repository[0]. Right now the commits are in my name, but I'd
be totally happy for you to make them yourself and open a pull request
containing my commits. Either way, I've also added you to our contributors
list.

Thanks for the write up!

[0]:
[https://github.com/shazow/urllib3/pull/734](https://github.com/shazow/urllib3/pull/734)
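
For anyone curious, the pessimistic pattern boils down to something like this (a toy pool, not urllib3's actual internals or the exact diff in the PR):

    # Toy illustration: when a request raises, close the connection
    # instead of returning it to the pool, so a possibly half-open
    # socket is never reused.
    import socket

    class ToyPool:
        def __init__(self, host: str, port: int):
            self.host, self.port = host, port
            self._idle: list[socket.socket] = []

        def _new_conn(self) -> socket.socket:
            return socket.create_connection((self.host, self.port))

        def request(self, payload: bytes) -> bytes:
            conn = self._idle.pop() if self._idle else self._new_conn()
            try:
                conn.sendall(payload)
                response = conn.recv(65536)
            except Exception:
                # Connection state is now unknown: drop it, don't pool it.
                conn.close()
                raise
            self._idle.append(conn)  # only healthy connections go back
            return response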

------
potatosareok
AFAIK the "can't identify protocol" state is normally caused by, as is the case
here, a half-open TCP connection. The bug seems to be triggered by using a
pooled connection manager on the Python side; I'm not familiar enough with Go
or etcd to look into that end. What the author could really use is an event
loop on the Python side, to avoid blocking forever on the long-polling read
and still respond to signals (or however the kill is handled). Probably some
way to wrap requests(stream=True) with its yield into whatever people use for
asyncio in Python now, plus another listener in the event loop waiting for the
kill signal, something like the sketch below?
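
(Untested sketch; the URL matches the post's etcd v2 watch endpoint, and everything else, including the SIGTERM handling and the hard exit, is made up for illustration.)

    # Rough sketch: block on the long poll in a worker thread and
    # race it against a SIGTERM-driven event on the loop.
    import asyncio
    import os
    import signal

    import requests

    WATCH_URL = "http://127.0.0.1:2379/v2/keys/foo?wait=true"

    def blocking_watch() -> str:
        # stream=True keeps the long-polling connection open until
        # etcd answers; this thread just blocks here in the meantime.
        return requests.get(WATCH_URL, stream=True).text

    async def main() -> None:
        loop = asyncio.get_running_loop()
        stop = asyncio.Event()
        loop.add_signal_handler(signal.SIGTERM, stop.set)
        watch = loop.run_in_executor(None, blocking_watch)
        stopped = asyncio.ensure_future(stop.wait())
        done, _ = await asyncio.wait(
            {watch, stopped}, return_when=asyncio.FIRST_COMPLETED)
        if watch in done:
            print(watch.result())
        else:
            # The worker thread is still stuck on the socket; exiting
            # hard closes it, which is exactly the FIN-with-no-reply
            # scenario from the article.
            os._exit(0)

    asyncio.run(main())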

Curious what on the Go side doesn't actually close the TCP connection. I'd
assume the channel (assuming it's using channels per watcher) would catch some
exception when the socket is closed on the Python side; maybe the socket is
just never closed after the exception? Time to try to learn some Go and see in
the etcd code base...

~~~
barrkel
_I'd assume the channel (assuming it's using channels per watcher) would catch
some exception when the socket is closed on the Python side; maybe the socket
is just never closed after the exception?_

Neither end of a connection knows that the other end closed it until it tries
to do something with the socket, per the BSD socket API. The etcd watch
API, from reading the article, doesn't reply until an event occurs, so if no
event occurs in time (before a whole bunch more requests come in), etcd never
does anything with the socket, and thus never finds out the other end
closed it.
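
You can watch this happen with a few lines of Python (a throwaway local socket pair, nothing etcd-specific):

    # Minimal demonstration: the peer sends its FIN, but the other
    # side only notices when it finally performs an operation on
    # the socket (recv() returning b"" means the peer closed).
    import socket

    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    client.close()  # client sends FIN; server_side gets no notification

    # Nothing has told the server yet. Only when it touches the
    # socket does the close become visible:
    data = server_side.recv(1024)
    print(repr(data))  # b'' -- end-of-stream, i.e. the peer closed

    server_side.close()
    listener.close()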

~~~
potatosareok
Huh, always something new to learn for a noob like me :) Could tweaking any of
the TCP timeout settings at the OS level help with this at all?

