

ZeroRPC - m0th87
https://github.com/dotcloud/zerorpc-python

======
Loic
> If you want to connect to multiple remote servers for high availability
> purposes, you insert something like HAProxy in the middle.

On our PaaS[1] we run ZMQ everywhere, and you do not need HAProxy in the
middle to get high availability: you do it directly with the right ZMQ
devices, depending on your requirements. HAProxy is another piece of
infrastructure to maintain, whereas you can get HA with the majordomo pattern
using several brokers, retrying the requests, etc. Check the ZMQ Guide[2];
nearly everything is nicely explained there. So this comment rings a
"warning" bell for me: the system looks really interesting, but are the ZMQ
primitives well enough understood?

[1]: <http://notes.ceondo.com/mongrel2-zmq-paas/>
[2]: <http://zguide.zeromq.org/page:all>

Update: Missing some parts of the comment, stupid me.
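
For readers unfamiliar with the idea, "several brokers and retrying the
requests" can be sketched without any ZMQ code at all. This is only a minimal
illustration of client-side failover, not the majordomo protocol itself (which
also involves heartbeats and worker registration). The names
`request_with_failover`, `send_request` and `flaky_transport`, as well as the
broker addresses, are hypothetical and stand in for a real socket layer.

```python
import itertools


class AllBrokersFailed(Exception):
    """Raised when every attempt against the brokers has failed."""


def request_with_failover(brokers, send_request, payload, max_attempts=3):
    """Try brokers in round-robin order until one of them answers.

    `send_request(broker, payload)` is a hypothetical transport hook: it
    returns the reply, or raises TimeoutError if the broker stays silent.
    """
    if not brokers:
        raise ValueError("at least one broker is required")
    errors = []
    rotation = itertools.cycle(brokers)
    for _ in range(max_attempts):
        broker = next(rotation)
        try:
            return send_request(broker, payload)
        except TimeoutError as exc:
            # This broker is down or slow; remember why and move on.
            errors.append((broker, exc))
    raise AllBrokersFailed(errors)


def flaky_transport(broker, payload):
    # Stand-in for a real socket call: the first broker never replies.
    if broker == "tcp://broker-a:5555":
        raise TimeoutError("no reply from " + broker)
    return "reply to %s via %s" % (payload, broker)


reply = request_with_failover(
    ["tcp://broker-a:5555", "tcp://broker-b:5555"], flaky_transport, "ping"
)
```

Here the first attempt times out and the second broker serves the request. A
proxy such as HAProxy moves this retry logic out of the client, at the cost of
running one more component.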

~~~
shykes
Hi Loic, thanks for the note and it's cool to follow your work from afar. It
looks like we get excited about the same stuff :) Disclaimer: I co-founded
dotCloud and have the distinctive honor of being the least knowledgeable about
zeromq in the entire team.

Your comment might have been true 18 months ago - when we first started using
zerorpc in production at dotCloud. Since then, we have deployed and scaled
hundreds of thousands of applications and I shudder just to imagine how many
billions of zeromq messages we have emitted and processed. Believe me, we have
been through the zeromq guide many times and have experimented with - and
abused - many patterns (including majordomo which, as you fail to mention, is
not supported out of the box by zeromq and requires a fair amount of custom
code of its own).

I'm sure zerorpc has many flaws and I know the team looks forward to many
constructive debates and - hopefully - patches. But lack of understanding of
the zeromq fundamentals, or lack of real-world usage, are 2 things you
definitely don't need to worry about :)

~~~
Loic
Hi Solomon, thank you for your nice comments. Bombela very nicely explained
the reasons for using HAProxy; with his explanations everything falls into
place (and I must say, I will test-drive HAProxy with ZMQ).

------
KenCochrane
Here is a video about it from this year's PyCon.

<http://youtu.be/9G6-GksU7Ko>

------
izak30
I was apparently working on the same thing at the same time as dotCloud
(though much more specifically, not for general use). I'm really glad they
released it. We've seen great performance characteristics and very easy
development with zeromq+python+gevent. I chose to use the gevent_zeromq
package rather than write our own, but it's very similar to what's here.

I'm really looking forward to using this next time.

~~~
calloc
I've had quite a few issues with the gevent_zeromq package not scaling.
Once I got past about 50 concurrent requests, something would go haywire in
gevent_zeromq and it would hang in the ZeroMQ send() function, blocking
everything else. This was with just about 500 clients connected to a single
service, all making requests as required.

~~~
bombela
There is a bug when using the edge-triggered fd from a zmq socket. I am not
sure if it's fixed upstream yet or not. See here for an ugly workaround:
<https://github.com/dotcloud/zerorpc-python/blob/master/zerorpc/gevent_zmq.py#L108>

~~~
calloc
Upstream as in gevent_zeromq or in ZeroMQ itself? I haven't hit this issue
yet in our C++ implementation, which uses libev to handle events from ZeroMQ...

Also, this looks to be a fix in recv(); I am having issues with send() hanging
randomly and blocking the entire process. I ended up wrapping it in a
gevent.Timeout block, so that if send blocked, control would eventually get
back to me...

    
    
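    import gevent

    WAITING = object()  # sentinel: send_multipart() has not returned yet

    # Fragment from inside the consumer loop (hence the `continue`).
    sent = WAITING
    with gevent.Timeout(0.5, False):  # False: leave the block silently on timeout
        sent = self.socket.send_multipart(tosend)

    if sent is WAITING:
        print("__incoming_consumer: Timeout fired")
        # We are going to try again, with a longer timeout
        with gevent.Timeout(2, False):
            sent = self.socket.send_multipart(tosend)

        if sent is WAITING:
            print("__incoming_consumer: Timeout 2 fired")
            continue

    gevent.sleep(0)  # Yield to other greenlets, we can be fast and never let up ...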
    

This fixed it for a little while, but every so often it would still hang,
forcing us to restart our frontend processes (which accept incoming
connections for processing). So we decided it was worth the time and effort
to rewrite it in C++ with libev as our event handling mechanism. So far we
have put it under more load and have not had any lockups or failures.

------
nivertech
You just reinvented (sort of) Erlang's erl_call [1] in Python:

Starts an Erlang node and calls erlang:time/0.

    
    
        erl_call -s -a 'erlang time' -n madonna
        {18,27,34}
    

[1] <http://www.erlang.org/doc/man/erl_call.html>

~~~
tbatterii
except it's in python. :)

~~~
rdtsc
I think it is a rather typical pattern. You see something in another
language/platform, so you copy it to your current one, then keep doing it.
However, after a while you just have to ask yourself: why am I not using this
other technology instead of spending time re-implementing it?

So after copying, say, supervision trees, RPC mechanisms, distributed system
management, and an actor-based approach, one can ask "wait, am I not just
using Erlang then?"

~~~
lloeki
> why am I not using this other technology instead of spending time
> re-implementing it

Because this other technology/platform may lack something that the current
platform has. Or you have constraints that tie you to the first platform.

Maybe this other platform could take a hint or two from the "copying"
platform too, so that things come full circle and it does not stay up in
some ivory tower.

------
alexmic
We've done something similar here at EDITD but not as complete:

(1) The original: <https://github.com/geoffwatts/zmqrpc>
(2) A rewrite I am working on: <https://github.com/alexmic/zmqrpc>

------
encoderer
A lightweight, python-only Thrift alternative. I like it.

Thrift is great, but it's not uncommon in some of our simpler services for
Thrift to be the CPU bottleneck, especially when we're using Cassandra as the
data store. You've got our front-end code talking to a service using Thrift,
and then the service talking to Cassandra using Thrift, and each Thrift call
has a serialize/deserialize step on each end.

Nice work dotcloud. Thanks for the free stuff!

------
ChuckMcM
Oh this looks very very cool. As a person who runs a bunch of machines I can
see several uses for it, not the least of which is monitoring diagnostics.

------
makmanalp
This is awesome! This saves craploads of trouble in terms of actually parsing
messages and interpreting them as functions. Instead I can have an implicitly
rigid and safe server / client hop. This makes it way easier to set up a set
of daemons talking to each other in the backend of a web app.
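
To make "parsing messages and interpreting them as functions" concrete, here
is a minimal sketch of the kind of hand-rolled dispatch an RPC layer replaces.
The JSON message shape, the `HANDLERS` table and `handle_message` are
assumptions made up for illustration; this is not zerorpc's wire format or
API.

```python
import json


def add(a, b):
    return a + b


# Hypothetical registry mapping method names on the wire to functions.
HANDLERS = {"add": add}


def handle_message(raw):
    """Decode one request, call the named function, encode the reply."""
    try:
        request = json.loads(raw)
        handler = HANDLERS[request["method"]]
        result = handler(*request.get("args", []))
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, unknown method, or wrong arguments.
        return json.dumps({"error": "%s: %s" % (type(exc).__name__, exc)})
    return json.dumps({"result": result})


print(handle_message('{"method": "add", "args": [2, 3]}'))
```

Every daemon ends up carrying some version of this boilerplate, plus the
matching encoding code on the client side.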

------
calloc
Where I work we are doing something similar, though more by hand: we are
using ZeroMQ with protobuf.

------
espeed
Is there a ZeroRPC-Java or Jython interface so you can call JVM methods from
Python?

~~~
KenCochrane
Not yet, but feel free to write it and submit it, I'm sure a lot of people
will find it very useful.

------
DevX101
Can someone provide examples of where this would be useful?

~~~
izak30
Non-HTTP APIs for internal use. Maybe you want a single-point ID generator,
maybe you have some sweet internal authentication method, or a session server.
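
As a rough sketch of the single-point ID generator case: the `IdGenerator`
class, the `serve` helper and port 4242 are illustrative assumptions, while
`zerorpc.Server`, `bind` and `run` are the calls shown in the zerorpc-python
README.

```python
import itertools


class IdGenerator(object):
    """Hands out increasing integer IDs from a single process."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)

    def next_id(self):
        return next(self._counter)


def serve(endpoint="tcp://0.0.0.0:4242"):
    # Needs the zerorpc package; imported here so the class above can be
    # used and tested without it. The endpoint is an arbitrary choice.
    import zerorpc

    server = zerorpc.Server(IdGenerator())
    server.bind(endpoint)
    server.run()  # blocks, exposing next_id() to remote callers
```

After calling `serve()` in one process, a client in another process would
create `zerorpc.Client()`, call `connect("tcp://127.0.0.1:4242")` on it, and
then call `next_id()` as if it were a local method.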

~~~
rdtsc
Obvious extension idea then -- tap this into a sockjs server and extend it all
the way to the client.

~~~
hogu
<https://github.com/hhuuggoo/ZmqWebBridge> is my project which does something
similar

<https://github.com/progrium/nullmq> is another, which is more fully featured,
but I haven't had a chance to look at it yet.

