I tried SocketCluster for a highly scalable chat platform and it couldn't handle very big conversations. SocketCluster is good, but the bottleneck here was Node.js.
I finally used Nginx PushStream to handle the subscribe side, while the publish side (including authentication) was handled by Node.js.
It has been used to host chat rooms with more than 70,000 concurrent clients, scaled across 7 m3.large EC2 instances and one relatively small Redis server.
So -- m3.large EC2 instances have a few problems. First, you're only running a 2-core machine (1 worker, and presumably 1 broker to distribute through Redis). That is not nearly enough for a real-time chat server where everyone is spamming it all the time. Another issue is the network performance of an m3.large instance -- it's rated "moderate" (which is another word for not very good).
With 2 c4.8xlarge instances behind ELB, we consistently see between 20K and 50K active connections on live meetings, and the servers sit at < 2% usage. Latency between event cycles is < 0.015ms.
I see a lot of people having problems with this, but the issue is usually the resources they give to it, or the actual handling of socket events. If the socket server is just a relay, there should be no reason a single m4.4xlarge instance can't handle 600K+ connections.
> It has been used to host chat rooms with more than 70,000 concurrent clients, scaled across 7 m3.large EC2 instances and one relatively small Redis server.
Knowing nothing about anything, 70k clients on 7 servers does not sound immediately very impressive. That's roughly 10k clients per server, and this is 15 years after the "C10k problem" was coined. Could someone comment if I'm missing something major here?
From personal experience -- switching from socket.io to socketcluster.io for our conferencing platform was a life saver. It's so much more stable & performant it's crazy. We also use it to run our WebRTC video chat platform, which works great as well. Can't say enough good things about this.
Should say that we horizontally scale this thing pretty heavily using the sc-redis module. Elasticache + ELB + 4 EC2 Instances = support for 5000+ person conferences :D
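For anyone curious what that topology roughly looks like in code, here's a sketch of a SocketCluster master config for this kind of setup. The option names follow the classic SocketCluster boot options; the sc-redis wiring is left as comments because the exact hook differs between versions, and the ElastiCache hostname is a made-up placeholder.

```javascript
// Rough shape of a SocketCluster master config for the setup described
// above (N instances behind ELB, Redis via ElastiCache). The sc-redis
// integration lines are sketched in comments -- an assumption, check
// the sc-redis README for your SC version.
const os = require('os');

const scOptions = {
  port: 8000,
  workers: Math.max(1, os.cpus().length - 1), // leave a core for the broker
  brokers: 1,
  rebootWorkerOnCrash: true
  // brokerEngine: require('sc-redis'),  // hypothetical wiring
  // brokerOptions: { host: 'my-cache.use1.cache.amazonaws.com', port: 6379 }
};

// Booting would look something like:
// const { SocketCluster } = require('socketcluster');
// new SocketCluster(scOptions);

module.exports = scOptions;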
www.talkfusion.com -- specifically, the live meetings & video chat portions use SocketCluster as their backbone. We generally use c4.8xlarge instances to back the socket infrastructure.
Project 1 is submitted and trending, someone thinks "Oh, I have/have seen something similar, let's submit it while people are thinking about the topic?"
This seems very impressive, but I wonder why we would want to switch to it from socket.io in our use case.
We have built an open source platform that takes care of all the usual stuff you need when building realtime social apps: client-side caching down to the pub/sub, message ordering, security, and pushing updates via socket.io back to the client while updating the caches. It is designed to work in a distributed way, so things are partitioned based on the stream you subscribe to. If you aren't online, you can subscribe to get offline notifications delivered to your device or other endpoints (like custom nodes that would act on notifications, IFTTT-style).
So, given that we have the infrastructure -- we use PHP for request handling, MySQL for persistence, and Node for a background service to do socket push and notifications -- what does this offer over socket.io? We implement our own rules for subscribing to streams.
Your system does sound similar to SC -- it seems to be a pretty standard realtime architecture. Those who started building their realtime systems with Socket.io (as you did) often end up with something similar to what you describe, except it takes a lot of work to get there...
A lot of people who use SC decided to make the switch because they started implementing their own pub/sub stack (as you did) and then decided that it would be easier to just use something open source instead of writing their own.
There is a lot more to a realtime stack than just the bidirectional transport.
If you already have a fully working system and you don't need any new features, then you don't necessarily need to migrate to the shiny new tool ;p
I'm curious because I have a relatively new socket.io based application (we went into production in the last month). How difficult is the migration path? (If it's something I can do in an afternoon, I'd be willing to try it out.)
Interesting that these are coming around now. Some services (i.e. Pubnub) have been doing this (for money) for a long time, and what I noticed was that, at the end of the day, the use cases were actually rather limited.
Apart from stock quotes, bitcoin quotes and multiplayer games - polling just isn't that bad. And it's actually a very good place to start when developing as it's simple.
Even if you don't need sub-second updates polling is still very wasteful. Your typical HTTP header is 700-800 bytes, so every 100,000 requests you're sending 75 megabytes of data you don't need to.
Websockets require 2 bytes of framing overhead, so those same 100,000 messages send only 200 kilobytes of additional data. Also, it's worth noting that with Websockets data is only sent when there is new data to send, saving you even more bandwidth.
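The back-of-the-envelope numbers above can be checked in a few lines. This assumes a 750-byte header (the midpoint of the 700-800 byte range) and the 2-byte minimum WebSocket frame header:

```javascript
// Overhead comparison for the figures above: per-request HTTP headers
// for polling vs. per-message frame headers for WebSockets.
function pollingOverheadBytes(requests, headerBytes = 750) {
  return requests * headerBytes;
}

function websocketOverheadBytes(messages, frameHeaderBytes = 2) {
  return messages * frameHeaderBytes;
}

console.log(pollingOverheadBytes(100000));   // 75,000,000 bytes ≈ 75 MB
console.log(websocketOverheadBytes(100000)); // 200,000 bytes = 200 KB
```

And polling pays that 75 MB even when there are no new updates at all, which is where the "wasteful" part really bites.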
Plus, with services like Pubnub & Pusher, websockets are really simple to get started with now as well. I would argue they're actually easier to use than polling.
Polling can be a fine place to start and is better than nothing. The speed of realtime updates does matter though. Even for seemingly boring use-cases like synchronizing CRM screens or something, it's still best if updates happen within a few seconds. And instant (sub-second) updates add an extra level of polish. Similar to non-flickering graphics and smooth scrolling, fast updates are impressive and can increase the appeal of an app.
If anyone opened the console and noticed the socket timing out earlier -- that was because I upgraded to a bigger 4-core Amazon instance and in the process, I accidentally installed a newer version of the SC server (which caused a protocol mismatch with the client version). The load per CPU was around 3% on each core at its peak of 250 concurrent users.
There were occasional spikes to 6% CPU use per core when people started spamming :p
I wonder if socket.io can be mixed with this, using socket.io on the client-server side (for its polling fallback options) while using SocketCluster for the server-to-server scaling side. Also, how is this different from nats.io?
There's always a pedant. :) You're probably right, but I think that ship has sailed. Projects and companies have been using "realtime" to mean "push" or "update a UI without a refresh button" for years.
Our realtime features seem to overlap based on a quick look at their website, except we get distribution for free from the Erlang VM, so you can deploy Phoenix on a cluster and you don't need an intermediate Redis instance or similar for pubsub/IPC across nodes.
The platform is open source and comes with a lot of higher-level features for chat usage: https://github.com/geekuillaume/chatup