A few remarks:
1) It's worth trying Redis 2.6 against this. It may perform better or worse, I'm not sure, but more likely better.
2) Believe it or not, Redis Pub/Sub has so far never been tuned for speed, profiled, or optimized. As far as I can tell, nobody has asked for more performance: at the order of magnitude we see with both Redis and ZMQ, it is pretty hard to hit the wall. However, there are demanding applications, so it is probably worth doing.
3) Maybe ZMQ only uses one core as well; otherwise, for an absolutely fair comparison, N Redis nodes should be used simultaneously. Pub/Sub is the kind of application where sharding is sometimes really, really easy: just shard by channel. In general, with Redis you have three options to go distributed with Pub/Sub.
Option A) Have N nodes and shard by channel.
Option B) Use replication, as PUBLISH messages are also propagated to the slaves.
Option C) Use Redis Cluster, although it is currently in alpha. It already propagates messages across the whole cluster, so it is very easy to implement a reliable HA Pub/Sub system with it. The propagation is not smart yet: every message is propagated to every node. However, in Redis the cost of Pub/Sub is proportional to the number of receivers, so this is usually not a big issue, but we'll improve this aspect in the future anyway.
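Option A is simple enough to sketch. Below is a minimal, hypothetical Python sketch (the node addresses and the function names node_for_channel and group_by_node are made up for illustration, and no Redis client is involved): publishers and subscribers agree on which node owns a channel by hashing the channel name, so each node only handles its share of the traffic.

```python
import zlib

# Hypothetical addresses of N independent Redis instances (assumption for illustration).
NODES = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]


def node_for_channel(channel, nodes=NODES):
    """Return the node that owns `channel`.

    Publishers and subscribers both call this, so a PUBLISH to a channel
    always lands on the same node its subscribers are connected to.
    """
    # crc32 gives the same result in every process and on every machine,
    # unlike Python's built-in hash() for str, which is randomized per process.
    return nodes[zlib.crc32(channel.encode("utf-8")) % len(nodes)]


def group_by_node(channels, nodes=NODES):
    """Group a subscriber's channels by owning node (one connection per node)."""
    groups = {}
    for ch in channels:
        groups.setdefault(node_for_channel(ch, nodes), []).append(ch)
    return groups


if __name__ == "__main__":
    for ch in ["news", "sports", "weather", "chat:42"]:
        print(ch, "->", node_for_channel(ch))
```

Two caveats of this naive scheme: a pattern subscription (PSUBSCRIBE) can match channels owned by any node, so a pattern subscriber has to subscribe on every node; and with a plain modulo, adding or removing a node remaps most channels, which consistent hashing would reduce.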
Also, both kqueue and select were slightly buggy on OS X: http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#OS_X_AN...
Not sure if that's still the case.
It would be interesting to see how these benchmark on a Linux or FreeBSD machine.
- For up to 4 clients, (buffered) redis is better than 0MQ in Python but worse in Go.
- For more than 4 clients it's exactly the opposite: redis is worse in Python but better in Go.
I'd be interested to hear an explanation for this, even if it turns out that a graph line was mislabeled :)
There still seems to be some work to do on improving the Golang bindings: https://github.com/alecthomas/gozmq#caveats
I would be curious to see what would happen if gevent were added to the Python code.
Also, as njharman mentioned, you're still running a separate broker with zmq, so the number of components doesn't change.