Show HN: SocketCluster.io – flexible open-source real-time server with pub/sub (socketcluster.io)
217 points by cryptica on Feb 10, 2016 | 39 comments



I tried SocketCluster for a highly scalable chat platform and it couldn't handle very big conversations. SocketCluster is good, but the bottleneck here was NodeJS. I finally used Nginx PushStream to do the sub, while the pub was handled by NodeJS (with the authentication). It has been used to host chat rooms with more than 70,000 concurrent clients, scaled on 7 m3.large EC2 instances and one relatively small Redis server.

The platform is open source and comes with a lot of higher-level features for chat usage: https://github.com/geekuillaume/chatup


So -- m3.large EC2 instances have a few problems. First, you're only running a 2-core machine (1 worker, and assuming 1 broker to distribute through Redis). That is not nearly enough to handle a real-time chat server where everyone is spamming it all the time. Another issue is the network performance of an m3.large instance -- it's "moderate" (which is another word for not very good).

With 2 c4.8xlarge instances behind ELB, we consistently see between 20K and 50K active connections on live meetings, and the servers sit at < 2% usage. Latency between event cycles is < 0.015ms.

I see a lot of people having problems with this, but the issue is usually the resources they give to it, or the actual handling of socket events. If the socket server is just a relay, there should be no reason a single m4.4xlarge instance can't handle 600K+ connections.

www.jayway.com/2015/04/13/600k-concurrent-websocket-connections-on-aws-using-node-js/


How does nodejs stand up against Java netty NIO for the web socket use case?


Would you happen to have any metrics of how far you could push SC and how far you're getting with NGINX PushStream for sub?


> It has been used to host chat room with more than 70,000 concurrent clients, scaled on 7 m3.large EC2 instances and one relatively small Redis server.

Knowing nothing about anything, 70k clients on 7 servers does not sound immediately very impressive. That's roughly 10k clients per server, and this is 15 years after "C10k problem" was coined. Could someone comment if I'm missing something major here?
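For reference, the per-server figure follows directly from the two numbers quoted above. A quick sketch of the arithmetic (the variable names are illustrative only):

```typescript
// Figures taken from the quoted comment.
const concurrentClients = 70_000;
const servers = 7;

// Average number of concurrent clients each server would hold,
// assuming connections are spread evenly (an assumption; the
// original comment does not say how they were balanced).
const clientsPerServer = concurrentClients / servers;

console.log(clientsPerServer); // 10000
```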


From personal experience -- switching from socket.io to socketcluster.io for our conferencing platform was a life saver. It's so much more stable & performant, it's crazy. We also use it to run our WebRTC video chat platform, which works great as well. Can't say enough good things about this.


Should say that we horizontally scale this thing pretty heavily using the sc-redis module. Elasticache + ELB + 4 EC2 Instances = support for 5000+ person conferences :D


This sounds interesting. Do you have a web-site? Open source?


www.talkfusion.com -- specifically, the live meetings & video chat portions use SocketCluster as their backend. We generally use c4.8xlarge instances to back the socket infrastructure.


If you check their comment history (which is very brief) you'll see they comment on a job posting for Talkfusion.


What are your monthly costs?


How long has it been around? Why is it better than socket.io 2?


How is this different than this? https://news.ycombinator.com/item?id=11071916


I was wondering the same. What are the odds of two very similar projects with nearly identical headlines appearing on HN at nearly the same time?


Project 1 is submitted and trending, someone thinks "Oh, I have/have seen something similar, let's submit it while people are thinking about the topic?"


Three: https://news.ycombinator.com/item?id=11073966. But detaro no doubt has it right.


This seems very impressive, but I wonder why we would want to switch to it from socket.io in our use case.

We have built an open source platform that takes care of all the usual stuff you need when building realtime social apps. From the client side caching down to the pubsub, message ordering, security, and pushing updates via socket.io back to the client and updating the caches. It is designed to work in a distributed way so things are partitioned based on the stream you subscribe to. If you aren't online you can subscribe to get offline notifications delivered to your device or other endpoints (like custom nodes that would act on notifications like IFTTT).

So, given that we have the infrastructure - we use PHP for request handling, MySQL for persistence, Node for background service to do socket push and notifications... what does this offer over socket.io? We implement our own rules for subscribing to streams.


I'm the main author of SC.

Your system does sound similar to SC - It seems to be a pretty standard realtime architecture - Those who started building their realtime systems with Socket.io (as you did) often end up with something similar to what you describe except it takes a lot of work to get there...

A lot of people who use SC decided to make the switch because they started implementing their own pub/sub stack (as you did) and then decided that it would be easier to just use something open source instead of writing their own.

There is a lot more to a realtime stack than just the bidirectional transport.
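To give a rough idea of the smallest piece of such a stack, here is a minimal in-memory channel pub/sub sketch. It is purely illustrative: `ChannelBroker`, `subscribe`, and `publish` are hypothetical names, not SocketCluster's API, and a real stack would add the transport, authentication, and cross-process distribution on top.

```typescript
type Handler = (data: unknown) => void;

// Hypothetical in-memory broker: maps channel names to subscriber callbacks.
class ChannelBroker {
  private channels = new Map<string, Set<Handler>>();

  // Register a handler on a channel; returns a function that removes it.
  subscribe(channel: string, handler: Handler): () => void {
    const handlers = this.channels.get(channel) ?? new Set<Handler>();
    this.channels.set(channel, handlers);
    handlers.add(handler);
    return () => {
      handlers.delete(handler);
    };
  }

  // Deliver data to every current subscriber; returns how many received it.
  publish(channel: string, data: unknown): number {
    const handlers = this.channels.get(channel);
    if (!handlers) return 0;
    let delivered = 0;
    handlers.forEach((handler) => {
      handler(data);
      delivered += 1;
    });
    return delivered;
  }
}

const broker = new ChannelBroker();
const received: unknown[] = [];

const unsubscribe = broker.subscribe("room:1", (msg) => {
  received.push(msg);
});

const firstDelivery = broker.publish("room:1", "hello");    // one subscriber
unsubscribe();
const secondDelivery = broker.publish("room:1", "ignored"); // no subscribers left

console.log(received, firstDelivery, secondDelivery);
```

Everything beyond this (reconnects, access control on channels, fan-out across workers and machines) is the part that takes the real work.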

If you already have a fully working system and you don't need any new features, then you don't necessarily need to migrate to the shiny new tool ;p


I'm curious because I have a relatively new socket.io based application (we went into production in the last month). How difficult is the migration path? (If it's something I can do in an afternoon, I'd be willing to try it out.)


Does SC require the need for a DB?


No but there is an optional CRUD module for RethinkDB https://github.com/socketcluster/sc-crud-rethink


Thanks for the answer! I was suspecting as much for our use case but wanted to clarify.


Interesting that these are coming around now. Some services (e.g. Pubnub) have been doing this (for money) for a long time, and what I noticed was that, at the end of the day, the use cases were actually rather limited.

Apart from stock quotes, bitcoin quotes and multiplayer games - polling just isn't that bad. And it's actually a very good place to start when developing as it's simple.


Even if you don't need sub-second updates polling is still very wasteful. Your typical HTTP header is 700-800 bytes, so every 100,000 requests you're sending 75 megabytes of data you don't need to.

Websockets require 2 bytes, so those same 100,000 requests send only 200 kilobytes of additional data. Also it's worth noting that with Websockets data is only sent when there is new data to send. Saving you even more bandwidth.
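The arithmetic behind those two figures, as a quick sketch. The 750-byte value is just the midpoint of the 700-800 byte estimate above, and 2 bytes is the minimum WebSocket frame header (small server-to-client frames; client-to-server frames also carry a 4-byte masking key, so treat the result as a lower bound):

```typescript
// Per-message overhead in bytes, using the figures from the comment above.
const httpHeaderBytes = 750; // assumed midpoint of the 700-800 byte estimate
const wsFrameBytes = 2;      // minimum WebSocket frame header size
const requests = 100_000;

// Total overhead, converted to decimal megabytes and kilobytes.
const pollingOverheadMB = (httpHeaderBytes * requests) / 1_000_000;
const wsOverheadKB = (wsFrameBytes * requests) / 1_000;

console.log(pollingOverheadMB); // 75
console.log(wsOverheadKB);      // 200
```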

Plus, with services like Pubnub & Pusher, websockets are really simple to get started with now as well. I would argue they're actually easier to use than polling.


Not to mention things like logging and metrics overhead if you're logging every request at multiple levels (application, NGINX, HAProxy).


Polling can be a fine place to start and is better than nothing. The speed of realtime updates does matter though. Even for seemingly boring use-cases like synchronizing CRM screens or something, it's still best if updates happen within a few seconds. And instant (sub-second) updates add an extra level of polish. Similar to non-flickering graphics and smooth scrolling, fast updates are impressive and can increase the appeal of an app.


Polling works for current use cases, but products of the future aren't built around current use cases ;-)


Programming is akin to woodworking. Lots of tools and all of it is for wood. Use the right tools to craft the right solutions.


If anyone opened the console and noticed the socket timing out earlier - That was because I upgraded to a bigger 4-CPU core Amazon instance and in the process, I accidentally installed a newer version of the SC server (which caused a protocol mismatch with the client version). The load per CPU was around 3% on each core at its peak of 250 concurrent users.

There were occasional spikes to 6% CPU use per core when people started spamming :p


I wonder if socket.io can be mixed with this for use with client-server side (for its polling fallback options) while using SocketCluster for server-server scaling side. Also, how is this different from nats.io?


Am I the only one who thinks we need a better name for this kind of thing than "realtime server"?


There's always a pedant. :) You're probably right, but I think that ship has sailed. Projects and companies have been using "realtime" to mean "push" or "update a UI without a refresh button" for years.


I admit naming things is hard. I'm very often pedantic! But I don't think it's pedantic to say, "Hey, guys, this name we're using is kind of shitty."


Well yeah but we should know better. Pubsub != realtime.


Do Phoenix's (Elixir) channels compare well with this?


Our realtime features seem to overlap, based on a quick look at their website, except we get distribution for free from the Erlang VM, so you can deploy Phoenix on a cluster and you don't need an intermediate Redis instance or similar for pubsub/IPC across nodes.


What's with the sudden need for a "flexible open-source realtime server with pubsub"? Referring to https://news.ycombinator.com/item?id=11071916


Does this support Elasticsearch?


Not a direct answer to your question but you might also check out https://appbase.io/ which is Elasticsearch compatible.



