I guess my point is, don't skim the article and think it's not for you because you don't know Clojure. Here's a list of languages it supports ( http://zguide.zeromq.org/page:all#Ask-and-Ye-Shall-Receive ):
C++ | C# | Clojure | CL | Delphi | Erlang | F# | Felix | Go | Haskell | Haxe | Java | Lua | Node.js | Objective-C | Perl | PHP | Python | Q | Racket | Ruby | Scala | Tcl | Ada | Basic | ooc
I attempted to use ATS features to make sure resources are cleaned up and msg data usage is statically checked for correct sizes, etc.
I've recently started migrating my ZeroMQ/JNI code to pure JeroMQ, and I already have a few simple components in production.
If you are fine with having the whole message in memory before parsing (or sending) it (HTTP is not that bad when it comes to transferring huge documents over the network), writing a raw MessagePack document to a regular TCP stream (or tucking it inside a UDP datagram) will do the trick just fine. The MessagePack library does support parsing streams -- see e.g. the Python example on its homepage (http://msgpack.org).
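As a sketch of the streaming case (assuming the `msgpack` Python package), an incremental unpacker can be fed arbitrary chunks as they arrive off the wire:

```python
import msgpack

# Serialize a couple of documents into one byte stream,
# as you might write them to a TCP socket.
stream = (msgpack.packb({"id": 1, "body": "hello"})
          + msgpack.packb({"id": 2, "body": "world"}))

# The streaming Unpacker accepts partial chunks (e.g. from sock.recv())
# and yields each complete document as soon as enough bytes have arrived.
unpacker = msgpack.Unpacker()
docs = []
for i in range(0, len(stream), 5):          # simulate 5-byte network reads
    unpacker.feed(stream[i:i + 5])
    for doc in unpacker:
        docs.append(doc)

print(docs)
```

The chunk size of 5 is arbitrary; the point is that document boundaries need not line up with read boundaries.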
Disclosure: I'm just a happy MessagePack (and sometimes ZeroMQ) user. I work on Spyne (http://spyne.io) so I just have experience with some of the most popular protocols out there.
The kinds of solutions, changes, and re-thinking it has allowed us to do with regard to our architecture have been tremendous. For example, if your system is latency-sensitive, you can move a lot of your more intensive logging to dedicated nodes; all they have to do is listen on the specific multicast port. This reduces latency in the critical path because you don't have to spend time logging inputs and outputs, which ultimately ends up as some form of disk write or other I/O. Obviously you can't get rid of all the logging/disk writes your system must perform to maintain durability and the other availability guarantees it requires, but for most things I've noticed that the requirements aren't always that stringent.
What's even cooler about multicast is that there is nothing preventing you from re-implementing your existing point-to-point, sequential, guaranteed message delivery protocols on top of it. A reasonable question to ask would be: why would any sane person want to do this?
Because most of the time you know which sorts of requests will result in calls to a manageable, finite set of services. With point-to-point protocols you've essentially painted yourself into the proverbial corner, because now somewhere in your architecture you're forced to write fan-ins and fan-outs at the application layer that accumulate the results of queries to the various services you use. These are difficult to implement, a frequent source of headaches, and hard to reason about in the event of a failure. Why not use a transport protocol that gives you all of that for free?
The real difficulty here is that in order to properly utilize one-to-many communication internally you really have to have it on your mind since it will require you to structure your software in a very specific way. It's certainly not a silver bullet, and won't cure all that ails you. But now that we've been using it for so long I'm constantly coming up with solutions or ideas that are only possible because we utilize multicast as a transport that would be otherwise too costly or tedious to implement using TCP/IP.
Using "fire and forget" multicast for logging and perception sounds interesting. And modelling systems in terms of perception sounds like it's generally a good idea. For truly "perceptive" systems you shouldn't coordinate the observers, since the death of a single observer should be irrelevant. If all your RPCs are idempotent (such as using "PUT" with a client-generated UUID instead of "POST" with no id) this should work well. You could also use vector clocks for eventual consistency. But I suppose one should also make sure to not re-implement dynamo. Or?
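To illustrate the idempotent-PUT point: a toy in-memory "server" (all names here are made up for illustration) where PUT with a client-generated UUID is safe to retry, while a POST-style auto-assigned id duplicates the record on retry:

```python
import uuid

store = {}

def put(resource_id, body):
    """PUT /resources/<resource_id> -- idempotent: same id, same end state."""
    store[resource_id] = body

def post(body):
    """POST /resources -- server assigns a fresh id, so a blind retry duplicates."""
    new_id = str(uuid.uuid4())
    store[new_id] = body
    return new_id

# A client that generates its own id can safely retry after a timeout:
rid = str(uuid.uuid4())
put(rid, {"event": "login"})
put(rid, {"event": "login"})   # retry after a lost ack -- still one record
print(len(store))              # exactly one entry
```

With POST the retry would have produced two entries, which is exactly why an uncoordinated, lossy observer network wants the PUT style.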
ZeroMQ can do multicast using PGM -- http://api.zeromq.org/2-1:zmq-pgm
See "Building a Modern Web Stack"...
AirBnB Tech Talk: https://www.youtube.com/watch?v=ZxfEcqJ4MOM
Haskell programs are generally thread-safe by construction. It's very easy to overlook thread-safety in foreign libraries, but this isn't just a problem in Haskell; multi-threaded code is the default in a number of realms.
I now try to avoid ZMQ where I can. Modern libraries should be thread-safe.
It's still young, but it grew out of some frustrations with using ZeroMQ in large-scale deployments.
You might argue that in this case two sockets, a PUSH and a PULL, might have been better, but the nature of the communication was very much between two processes and I didn't want to weaken that.
Coordination can be expensive, and certainly actors or some sort of channel-based model (which is my preference) have their place, but when you have light-weight communication and coordination primitives, it's not a given.
That would force me into an asynchronous style, but in my case, I was able to convey intent much better using threads.
If ZeroMQ transported anything besides binary blobs, it would be marshaling objects or enforcing some kind of encoding on your data, which would dictate much more about your program. Are you upset that you had to write some Haskell.Vector to void* to Haskell.Vector code?
Are threads in Haskell not asynchronous by nature? Do you get synchronization for free somehow?
I know that in Erlang, you do pass data by exchanging blobs, but the difference is that Erlang has per Erlang-process heaps. Haskell has one heap, so there is no reason to do this.
It comes down to this: the way you communicate between threads in the same process is different from the way you communicate between processes.
> Are threads in Haskell not asynchronous by nature? Do you get synchronization for free somehow?
I'm not talking about what's happening underneath. I'm talking about the style the code is written in. I suppose I could use a coroutine-style monad over ZMQ's asynchronous API and create pseudo-threads, but they wouldn't get pre-empted, so you'd have to be careful to ensure they didn't starve.
And 0mq is for inter-process communication. If you are running a server with many threads in one process, you usually don't need it.
Since you have Haskell and actors but are worried about the thread-safety of some other libraries, you could consider a single-threaded, multi-process deployment.
By setting the thread pool size to 1, actors in a single process can still handle tens of thousands of concurrent connections, with no risk of thread-safety problems.
- Explicit contracts through the IDL. I despise implicit contracts in service oriented code.
- Code generation for many languages.
- Server and client interfaces that don't require as much ceremony as HTTP.
- Fast binary protocol
- Sometimes the generated code isn't exactly what you want (can feel a bit Java centric)
- Binary protocol isn't human readable, can be harder to debug.
- Stream-oriented calls aren't very feasible without some sort of home-grown chunking solution. This is why tools like Cassandra tell you that the whole request must fit in memory.
Thrift is definitely worth a look if you're shopping for an RPC tool: http://diwakergupta.github.io/thrift-missing-guide/
(my previous company gave up on thrift when we discovered it couldn't do recursive data structures (i.e. trees), and made our own interface -> service layer on top of protocol buffers instead).
In a nutshell, if the server goes away while a client is blocked in recv, the client will never know and will wait forever.
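The ZeroMQ guide's "Lazy Pirate" pattern works around this by polling with a timeout instead of blocking in recv. A minimal pyzmq sketch (the endpoint is made up, and deliberately has no server listening on it):

```python
import zmq

ctx = zmq.Context.instance()
req = ctx.socket(zmq.REQ)
req.connect("tcp://127.0.0.1:5999")   # nothing is listening here

req.send(b"ping")

# Instead of blocking forever in recv(), poll with a timeout and give up.
if req.poll(timeout=500) & zmq.POLLIN:
    reply = req.recv()
else:
    reply = None                      # server is gone
    req.setsockopt(zmq.LINGER, 0)     # drop the queued request
    req.close()                       # a real client would reconnect and retry

print("no reply" if reply is None else reply)
```

Because REQ sockets are strictly send/recv lockstep, the close-and-reconnect step is required before retrying; simply calling send again on the same socket would raise an error.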
Also note that most of the time PUB/SUB and PUSH/PULL are not a good idea either. The same results can usually be achieved by returning a stream on top of ROUTER/DEALER (this is what zerorpc does). The performance gains of custom topologies are great in theory, but in a typical modern web or mobile stack they are not worth the extra effort and lack of flexibility. The single best change we made to dotCloud's architecture was to move away from custom topologies and stick to DEALER/ROUTER.
I presume there must be some upside, but I never saw it.
More details here: http://www.zeromq.org/whitepapers:brokerless
CurveZMQ was recently announced by Pieter Hintjens (one of the 0mq contributors) on his blog:
My only source for this is the Wikipedia article, though; do you have any additional info about this?
I just remember seeing anecdotal comments about it not working well, but never really dug into whether that was sound.
Can't say I'd recommend it, but it can be done, depending on the needs of your project.
It's still pre-alpha but has a ZeroMQ compat layer already.
(Hopefully I managed to not make this seem like flaming, I'm genuinely curious.)
In other words, it can be considered a premature optimization.
We do, however, use ZeroMQ for other stuff. But I'll always reach for HTTP for an API unless there's a need to do otherwise. It's super simple to get running, interface with, and debug. Every backend engineer knows how to use curl.
Do you have any recommendations for decent HTTP clients?
We're still shifting services over HTTP off a mix of Apache and IIS boxes. We can do 80 million service requests a day without hitting 5% of our capacity with commodity hardware.
These service requests involve extremely complicated calculation and scoring algorithms as well, so it's not some half-arsed CRUD API either.
We just do it in C# and C++ rather than Python/Ruby etc as that's where we gain the performance advantage.
A ZeroMQ proxy using a ROUTER/DEALER pair with a bunch of REP sockets in the background.
The clients use a simple REQ socket.
All in plain old C using ZeroMQ and jansson for JSON while conforming to the json-rpc 2.0 spec.
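For comparison, the same topology can be sketched in a few lines of pyzmq (endpoints and the single worker are made up for illustration). In a real broker you would just call `zmq.proxy(frontend, backend)`; here the two forwarding steps are done by hand, single-threaded, so the envelope flow is visible:

```python
import json
import zmq

ctx = zmq.Context.instance()

# Broker sockets: clients connect to the ROUTER, workers to the DEALER.
frontend = ctx.socket(zmq.ROUTER)
frontend.bind("inproc://frontend")
backend = ctx.socket(zmq.DEALER)
backend.bind("inproc://backend")

worker = ctx.socket(zmq.REP)    # one of the REP sockets in the background
worker.connect("inproc://backend")

client = ctx.socket(zmq.REQ)    # the client's simple REQ socket
client.connect("inproc://frontend")

client.send_string(json.dumps(
    {"jsonrpc": "2.0", "method": "echo", "params": [1, 2], "id": 1}))

backend.send_multipart(frontend.recv_multipart())   # broker: client -> worker

request = json.loads(worker.recv())                 # worker handles the call
worker.send_string(json.dumps(
    {"jsonrpc": "2.0", "result": request["params"], "id": request["id"]}))

frontend.send_multipart(backend.recv_multipart())   # broker: worker -> client
reply = json.loads(client.recv())
print(reply["result"])
```

The ROUTER prepends the client's identity frame on receive and strips it on send, which is how the reply finds its way back; the REP worker never sees any of that.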
What are the advantages of ZeroMQ when you're still on one computer?
The guide is a fascinating read. Even if you never write anything using ZeroMQ, it's a very useful intro to designing concurrent message passing systems.
Anyway, the proposed solution of a full-mesh FE-to-BE heartbeat network over UDP, switching to TCP in case of idleness, is not going to scale. That just guarantees that you are going to run out of file descriptors in case of a packet loss event, as everyone upgrades their heartbeat protocol to TCP.
Erlang's messaging protocol, Protocol Buffers, MessagePack, and other carefully designed serialization solutions all try to reduce parsing and communication overhead.
ZeroMQ seems like a natural choice for the core of so-called messaging middleware because, well, that's what it was designed for.
And, of course, avoiding the JVM greatly reduces resource waste.