I was apparently working on this concurrently with dotCloud (though much more specifically, not for general use). I'm really glad they released it. We've seen great performance characteristics and very easy development with ZeroMQ + Python + gevent. I chose to use the gevent_zeromq package rather than write our own, but it's very similar to what's here.

I'm really looking forward to using this next time.
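
For context, the pattern is roughly this (an illustrative sketch only; the address, port, and message contents are made up, not our actual code):

  import gevent
  from gevent_zeromq import zmq  # green wrapper around pyzmq

  context = zmq.Context()

  def server():
      sock = context.socket(zmq.REP)
      sock.bind("tcp://127.0.0.1:5555")
      while True:
          sock.send(sock.recv())  # echo; recv/send yield to the gevent hub

  def client(i):
      sock = context.socket(zmq.REQ)
      sock.connect("tcp://127.0.0.1:5555")
      sock.send("ping %d" % i)
      print sock.recv()

  gevent.spawn(server)
  gevent.joinall([gevent.spawn(client, i) for i in range(10)])

The point is that blocking recv()/send() calls only block the current greenlet, so a single process can serve many clients concurrently.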




I've had quite a few issues with the gevent_zeromq package not scaling. Especially once you get past 50 concurrent requests, I was seeing something go haywire in gevent_zeromq: it would hang in ZeroMQ's send() function, blocking everything else. This was with about 500 clients connected to a single service, all making requests as required.


There is a bug when using the edge-triggered fd from a zmq socket. I am not sure if it's fixed upstream yet or not. See here for an ugly workaround: https://github.com/dotcloud/zerorpc-python/blob/master/zeror...
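
For anyone hitting this: the fd you get back from socket.getsockopt(zmq.FD) is edge-triggered, so after it fires you have to drain every queued message by re-checking zmq.EVENTS; if you stop after one recv(), the edge may never fire again and the greenlet blocks forever. A rough sketch of the idea (simplified, not the actual zerorpc workaround):

  import zmq

  def drain(socket):
      # The edge-triggered fd won't re-fire for messages that are already
      # queued, so keep reading while ZMQ_EVENTS still reports readability.
      msgs = []
      while socket.getsockopt(zmq.EVENTS) & zmq.POLLIN:
          msgs.append(socket.recv(zmq.NOBLOCK))
      return msgs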


Upstream as in gevent_zeromq, or in ZeroMQ itself? I haven't hit this issue in our C++ implementation, which uses libev for event handling with ZeroMQ...

Also, this looks to be a fix in recv(); I'm having issues with send() hanging randomly and blocking the entire process. I ended up wrapping it in a timeout block, so if send() blocked, control would eventually get back to me...

  sent = "WAITING"
  with gevent.Timeout(0.5, False):
      sent = self.socket.send_multipart(tosend)
  
  if sent is "WAITING":
      print "__incoming_consumer: Timeout fired"
      # We are going to try again
  
      with gevent.Timeout(2, False):
          sent = self.socket.send_multipart(tosend)
  
      if sent is "WAITING":
          print "__incoming_consumer: Timeout 2 fired"
          continue
  
  gevent.hub.sleep(0) # Yield to other gevent's, we can be fast and never let up ...
This fixed it for a little while, but even then it would hang every so often, forcing us to restart our frontend processes (which accept incoming connections for processing). So we decided it was worth the time and effort to rewrite it in C++ with libev as our event-handling mechanism. So far we have put it under more load but have not had any lockups or failures.


Interesting. I'll look into this this afternoon. I'm seeing it consistently handle 2500+ req/sec on my setup, but that's with fewer than 50 concurrent requests (about 10 concurrent requests via EC2 micro instances; it should be easy to change my testing scripts to 200, I think).


I'm interested in this - do you have any insights into why this might be happening?


I have no idea and didn't get the time to do full debugging or dig into it. We had some more requirements and decided it was in our best interest to rewrite it in C++. So far we've gotten at least 4x the performance a single Python frontend would get, which has let us remove some load balancers on the frontend and will save us money in the long run.



