> We couldn’t seem to find any way to subscribe to events and publish events without a single point of failure. We looked at everything we could find, and they all seemed to have this bottleneck.
“The Spread toolkit provides a high performance messaging service that is resilient to faults across local and wide area networks. Spread functions as a unified message bus for distributed applications, and provides highly tuned application-level multicast, group communication, and point to point support. Spread services range from reliable messaging to fully ordered messages with delivery guarantees.”
We use this for LAN event communication among standalone servers acting together, and WAN event communication among POPs acting together, for billions of events per month. If any server in a group dies, the rest elect a new master, so there is no SPoF.
What was it about your use case that made this feel like a single point of failure or bottleneck?
This sounds like something I would have loved to come across. It didn't come up in any Google searches or in any conversations with other software people I talked to about this problem, so I didn't evaluate it. Quite simply, I had no idea it even existed.
You sound very happy with Spread -- no complaints? We had nothing but trouble with it ourselves.
This was Spread 3.x somewhere around 2008, so it may have improved since then, but at that time it just seemed very flaky and slow. This was in a cluster of about 8 nodes, with 100-200 messages/second published by any given node at peak, so it should have been more than adequate.
The worst part was that it was silently losing messages, and there seemed to be no trivial way to recover, get statistics/queue information, or generally determine what was going wrong.
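One generic way to at least detect silent loss, independent of Spread's own API, is to have each publisher stamp its messages with a per-sender sequence number and let receivers track gaps. A minimal sketch follows; the names `GapDetector` and `observe` are hypothetical, and it assumes messages from a single sender normally arrive in order:

```python
from collections import defaultdict


class GapDetector:
    """Tracks per-sender sequence numbers and records missing ones.

    Hypothetical helper for illustration; not part of Spread.
    """

    def __init__(self):
        self.next_expected = defaultdict(int)  # sender -> next seq expected
        self.missing = defaultdict(set)        # sender -> seqs not yet seen

    def observe(self, sender, seq):
        """Record one received message; return the newly detected gaps."""
        expected = self.next_expected[sender]
        if seq < expected:
            # Late or duplicate arrival: clear it if it was recorded as a gap.
            self.missing[sender].discard(seq)
            return []
        gaps = list(range(expected, seq))
        self.missing[sender].update(gaps)
        self.next_expected[sender] = seq + 1
        return gaps


detector = GapDetector()
for seq in (0, 1, 2, 5, 3):
    detector.observe("node-a", seq)
```

This does not recover anything, but the `missing` sets give per-sender loss statistics that can be logged or alerted on.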
http://www.spread.org/