
It looks like there's still no way for applications to detect when ZeroMQ encounters common networking problems. If my application can't differentiate between "no response received" and "network error", I'll end up re-implementing timeouts and error detection logic in the application protocol. In most situations that's a waste of time and adds extra weight to the system. No thanks.



In most cases 0MQ will recover silently (and usefully) from common networking problems. When a peer crashes, for example, and then comes back, its partners don't see the problem. Messages get queued, and then delivered. This works for the main socket types and transports (but not for PAIR sockets or the inproc: transport).

This lets us do things like start clients and THEN start a server... the clients automatically connect when the server comes along. The server can go away, be replaced by another on the same endpoint, and clients will gracefully and invisibly start talking to the new server.

In some cases this is precisely what we want; in other cases it's not. If we need to detect dead peers, we add heartbeating as a message flow on top of 0MQ. Most larger 0MQ applications currently do this. Eventually 0MQ sockets may offer heartbeating themselves; it seems a natural evolution (though "seems" does not mean it necessarily is).
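To make the heartbeating idea concrete, here is a minimal sketch of the bookkeeping side only, in Python. It is transport-agnostic and not part of 0MQ: the class name `PeerLiveness`, the `interval` and `max_missed` parameters, and the injected `clock` are all assumptions for illustration. The application would call `on_message()` whenever anything arrives from the peer and check `is_alive()` in its own loop.

```python
import time


class PeerLiveness:
    """Tracks whether a peer has been heard from recently (hypothetical helper)."""

    def __init__(self, interval=1.0, max_missed=3, clock=time.monotonic):
        self.interval = interval      # expected seconds between heartbeats
        self.max_missed = max_missed  # silent intervals tolerated before giving up
        self._clock = clock           # injectable so the logic can be tested
        self._last_seen = clock()

    def on_message(self):
        # Any traffic from the peer, heartbeat or data, counts as a sign of life.
        self._last_seen = self._clock()

    def is_alive(self):
        # The peer is considered dead once it has been silent for
        # interval * max_missed seconds or longer.
        return self._clock() - self._last_seen < self.interval * self.max_missed
```

Counting any incoming message as a heartbeat keeps the extra traffic low on busy connections; explicit heartbeat messages are then only needed when the link is otherwise idle.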

Additionally, some patterns (like synchronous request-reply) simply don't handle network issues properly. If your service dies while processing a request, a client doing a blocking read will hang forever. There are workarounds, such as using non-blocking reads.
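One way to avoid the hanging client is to poll with a timeout before reading, which is a close cousin of the non-blocking read. The sketch below assumes the pyzmq binding; the function names `recv_with_timeout` and `request_once` and the default timeout are made up for illustration. pyzmq is imported inside `request_once` so that the small helper can be exercised without it.

```python
def recv_with_timeout(sock, timeout_ms):
    """Return the next message, or None if nothing arrives within timeout_ms.

    `sock` is expected to behave like a pyzmq socket: poll(timeout) returns a
    non-zero event mask when a message is readable and 0 on timeout.
    """
    if sock.poll(timeout_ms):
        return sock.recv()
    return None


def request_once(endpoint, payload, timeout_ms=2500):
    """Send one request on a fresh REQ socket and wait a bounded time for the reply."""
    import zmq  # assumption: pyzmq is installed

    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.REQ)
    sock.setsockopt(zmq.LINGER, 0)  # do not block on close with an undelivered request
    sock.connect(endpoint)
    try:
        sock.send(payload)
        return recv_with_timeout(sock, timeout_ms)
    finally:
        # A REQ socket that never received its reply cannot send again,
        # so it is closed here and a retry would open a new one.
        sock.close()
```

A `None` result only says that no reply arrived in time. It does not tell the caller whether the request was lost, the service died, or the reply is merely late, so the caller still has to decide whether retrying is safe.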

It would be a mistake to try to solve all challenges at once in any project. 0MQ is taking it step by step, starting with the core questions of how to design massively-scalable patterns correctly. Just doing that is worth gold, as you can see from people actually using 0MQ, who consistently tell us, "this makes our life orders of magnitude easier, thank you".

And as we solve the core problems properly, we'll continue to address other aspects, either in 0MQ core or in layers on top of that, and this process will continue ad infinitum.


That's the nature of networking. You can never differentiate between "network error" and "no response received". TCP is no better. You'll have to accept that or stick with a single box.



