I worked for a while on a well-known product that used (and perhaps still uses) WebSockets for its core feature. I very much agree with the bulk of the arguments made in this blog post.
In particular, I found this:
- Our well-known cloud hosting provider's networks would occasionally (a few times a year) disconnect all long-lived TCP sockets in an availability zone in unison. That is, an incident that had no SLA promise would cause a large swath of our customers to reconnect all at once.
- On a smaller scale, but more frequently: office networks of large customers would do the same thing.
- Some customers had network equipment that capped the length of time that a TCP connection could remain open, interfering with the preferred mode of operation.
- And of course, unless you never plan to upgrade your server software, you must at some point restart your servers (and again, your cloud hosting provider likely has no SLA on the uptime of an individual machine).
- As is pointed out in the article, a TCP connection can cease to transmit data even though it has not closed. So attention must be paid to this.
If you use WebSockets, you must make reconnects be completely free in the common case and you must employ people who are willing to become deeply knowledgeable in how TCP works.
WebSockets can be a tremendously powerful tool to help in making a great product, but in general they will almost always add more complexity and toil, with lower reliability.
I built several large enterprise products over WebSockets. I didn't find it that bad.
Office networks that either blocked or killed WebSockets were annoying. For some customers they were a non-starter in the early 2010s, but by 2016 or so this seemed to be resolved.
Avoiding thundering herd on reconnect is a well-explored problem and wasn't too bad.
We would see mass TCP issues from time to time as well, but they were pretty much no-ops as they would just trigger a timeout and reconnect the next time the user performed an operation. We would send an ACK back instantly (prior to execution) for any client-requested operation, so if we didn't see the ACK within a fairly tight window, the client could proactively reap the WebSocket and try again - customers didn't have to wait long to learn that a connection which looked alive (still unclosed) was actually dead.
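Roughly, that reap-and-retry pattern looks like this (a minimal TypeScript sketch, not the actual product code; the message shape and timeout value are assumptions):

```typescript
// Client sends an operation, expects a server ACK within a tight window,
// and reaps the socket if the ACK never arrives so a retry can happen.
type Pending = { settle: () => void; timer: ReturnType<typeof setTimeout> };

const ACK_TIMEOUT_MS = 2000; // assumed "fairly tight window"
const pending = new Map<string, Pending>();

function sendOperation(ws: WebSocket, op: object): Promise<void> {
  const id = crypto.randomUUID();
  return new Promise<void>((resolve, reject) => {
    const timer = setTimeout(() => {
      pending.delete(id);
      ws.close();                  // reap the websocket...
      reject(new Error("no ack")); // ...and let the caller reconnect + retry
    }, ACK_TIMEOUT_MS);
    pending.set(id, { settle: () => resolve(), timer });
    ws.send(JSON.stringify({ id, ...op }));
  });
}

// In the onmessage handler: an "ack" frame clears the pending entry.
function onAck(id: string): void {
  const p = pending.get(id);
  if (p) {
    clearTimeout(p.timer);
    pending.delete(id);
    p.settle();
  }
}
```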
> If you use WebSockets, you must make reconnects be completely free in the common case
I agree with this, or at least "close to completely free." But in a normal web application you also need to make latency and failed requests "close to completely free" as well or your application will also die along with the network. This is the point I make in my sibling comment - I think distributed state management is a hard problem, but WebSockets are just a layer on top of that, not a solution or cause of the problem.
> you must employ people who are willing to become deeply knowledgeable in how TCP works.
I think this is true insofar as you probably want a TCP expert somewhere in your organization to start with, but we never found this particularly complicated. Understanding that the connection isn't trustworthy (that is, when it says it's open, that doesn't mean it works) is the only important fundamental for most engineers to be able to work with WebSockets.
As rakoo said, exponential backoff mitigates the thundering herd. I was going to say add some jitter to the time before reconnecting, then I realized rakoo already said "after a random short time", which is exactly what jitter is.
(edited for coffee kicking in)
Congestion avoidance algorithms such as TCP Reno and TCP Vegas basically code clients to back off if they detect a situation where they may be a member of a thundering herd.
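For the client side, exponential backoff with full jitter is only a few lines (a sketch; the base and cap values are arbitrary assumptions):

```typescript
// Reconnect with exponential backoff plus full jitter, so a herd of clients
// that all lost their sockets at once doesn't reconnect in lockstep.
const BASE_MS = 500;
const CAP_MS = 60_000;

function backoffDelay(attempt: number): number {
  const exp = Math.min(CAP_MS, BASE_MS * 2 ** attempt);
  return Math.random() * exp; // "full jitter": pick uniformly in [0, exp)
}

function openSocket(url: string): Promise<WebSocket> {
  return new Promise((resolve, reject) => {
    const ws = new WebSocket(url);
    ws.onopen = () => resolve(ws);
    ws.onerror = () => reject(new Error("connect failed"));
  });
}

async function connectWithBackoff(url: string): Promise<WebSocket> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await openSocket(url);
    } catch {
      await new Promise((r) => setTimeout(r, backoffDelay(attempt)));
    }
  }
}
```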
No, we always ran on TLS. There were a few classes of these:
* Filtering MITM application firewall solutions which installed a new trusted root CA on employee machines and looked at the raw traffic. These would usually be configured to wholesale kill the connection when they saw an UPGRADE because the filtering solutions couldn't understand the traffic format and they were considered a security risk.
* Old-school HTTP proxy based systems which would blow up when CONNECT was kept alive for very long.
* Firewalls which killed long-lived TCP connections just at the TCP level. The worst here were where there was a mismatch somewhere and we never got a FIN. But again, because we had a rapid expectation for an acknowledgement, we could detect and reap these pretty quickly.
We also tried running WebSockets on a different port for a while, which was not a good idea as many organizations only allowed 443.
> But again, because we had a rapid expectation for an acknowledgement, we could detect and reap these pretty quickly.
I found the best way to handle this was with an application level heartbeat. That bypassed dealing with any weirdness of the client firewalls, TCP spoofing, etc.
Something like pinging every 30 seconds and saying goodbye to the socket if we miss two replies seems to work reasonably well.
It also prevents most idle-based TCP disconnects from happening.
And even if some network is so dumb that it decides to kill connections in under 30 seconds, it's a non-issue, as that network won't even be usable by normal means. (How do you download any big file if connections always drop instantly?)
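Something like this, as a minimal sketch (30-second interval and two-miss threshold as described above; the ping/pong frame names are assumptions):

```typescript
// Application-level heartbeat: ping every 30s, close the socket after two
// missed replies so a dead-but-unclosed connection gets detected quickly.
const PING_INTERVAL_MS = 30_000;
const MAX_MISSED = 2;

function startHeartbeat(ws: WebSocket): () => void {
  let missed = 0;
  const timer = setInterval(() => {
    if (missed >= MAX_MISSED) {
      clearInterval(timer);
      ws.close(); // say goodbye to the socket; reconnect logic takes over
      return;
    }
    missed++;
    ws.send(JSON.stringify({ type: "ping" }));
  }, PING_INTERVAL_MS);

  ws.addEventListener("message", (ev) => {
    try {
      if (JSON.parse(ev.data as string).type === "pong") missed = 0;
    } catch {
      /* non-JSON frame: not a pong, ignore */
    }
  });

  return () => clearInterval(timer); // call on clean shutdown
}
```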
> disconnect all long-lived TCP sockets in an availability zone in unison
I don't know what this means, but it sounds ridiculous. This would cause havoc with any sort of persistent tunnel or stateful connection, such as most database clients. Do you perhaps mean this just happens at ingress? That is much more believable and not as big of a deal.
> office networks of large customers would do the same thing.
Sounds like a personal problem. In all seriousness, your clients should handle any sort of network disconnect gracefully. It's foolish to assume TCP connections are durable, or to assume that you won't be hit by a thundering herd.
Maybe I'm old fashioned but TCP hasn't changed much over the years, none of these problems are novel to me, it's well-trodden ground and there are many simple techniques to building durable clients.
Also, all of the things you mention affect plain old HTTP as well, especially HTTP2. There shouldn't be a significant difference in how you treat them, other than the fact that you cannot assume they're all short-lived connections.
Most applications written using HTTP, in my experience, do not have deep dependencies on the longevity of the HTTP2 connection. In my experience, TCP connections for HTTP2 are typically terminated at your load balancer or similar. So reconnections here happen completely unseen by either the client application in the field or the servers where the business logic is.
For us -- and I think this is common -- the persistent WebSocket connection allowed a set of assumptions around the shared state of the client and server that would have to be re-negotiated when reconnecting. The fact that this renegotiation was non-trivial was a major driver in selecting WebSockets in the first place. With HTTP, regardless of HTTP2 or QUIC, your application protocol very much is set up to re-negotiate things on a per-request basis. And so the issues I list don't tend to affect HTTP-based applications.
> the persistent WebSocket connection allowed a set of assumptions around the shared state of the client and server that would have to be re-negotiated when reconnecting. The fact that this renegotiation was non-trivial was a major driver in selecting WebSockets in the first place. With HTTP, regardless of HTTP2 or QUIC, your application protocol very much is set up to re-negotiate things on a per-request basis. And so the issues I list don't tend to affect HTTP-based applications.
I think this describes a poor choice in technology. There's no silver bullet here, and it sounds like you made a lot of questionable tradeoffs. Assuming that "session" state persists beyond the lifetime of either the client or the server is generally problematic. It's always easier for one party to be stateless, but you can become stateful for the duration of the transaction.
Shared state is best used as communications optimization, and maybe sometimes useful for security reasons.
> Assuming that "session" state persists beyond the lifetime of either the client or the server is generally problematic.
I don't think you're interpreting the problem right? The state is tied to the connection, not outliving client or server. But it outlives single requests, and would be uncomfortably expensive to re-establish per request.
What I'm saying is that it's unrealistic to expect to hold a persistent TCP connection for an extended period of time across networking environments you do not control.
Making things not uncomfortably expensive is a good idea.
Relying on websockets to solve this for you is a mistake. It's convenient, but not robust. How would you solve it without websockets, using traditional HTTP? The same solution should be used with websockets, which then unlocks tremendous opportunities for optimization.
> How would you solve it without websockets using traditional HTTP?
You'd probably do the uncomfortably expensive setup, then give the client a token and store the settings in a database. And then do your best to cache it and have fast paths to reestablish from the cache on the same server or on different servers.
Not only could this add a lot of complication, now you've actually introduced the problem of state outliving your endpoints! You do unlock new ways to optimize, but you pay a high cost to get there. There's a very good chance this rearchitecture is a bad idea.
Sure. Look I'm not advocating for any particular solution here, just trying to point out the hopefully obvious fact that websockets are not a silver bullet. You've basically described why websockets unlock optimizations, which was my point.
Nothing in the GP's post is novel to websockets. Session based resource management is difficult, doubly so for long lived sessions. Relying on websockets to magically make that easy is foolish.
> Not only could this add a lot of complication, now you've actually introduced the problem of state outliving your endpoints!
I only want to point out that this is true with websockets as well, so I find this argument unconvincing. For websockets, what do you do when re-establishing a connection? You start anew or find the existing session. What if the client suddenly disappears without actively closing the connection? You have some sort of TTL before abandoning the session.
>Sounds like a personal problem. In all seriousness, your clients should handle any sort of network disconnect gracefully
That can be complex. Corporate MITM filtering boxes, "intrusion detection" appliances, firewalls, etc, can just decide to drop NAT entries, drop packets, break MTU path discovery, etc. Yes, there are things you can do. But then customers restart/reload when things don't happen instantly, etc. I don't know that there's a simple playbook.
None of this is particular to websockets, and in addition:
> you must employ people who are willing to become deeply knowledgeable in how TCP works
You already needed that for your HTTP based application; it's a fundamental of networked computing. Developers skipping out on mechanical sympathy are often duds, in my experience.
> employ people who are willing to become deeply knowledgeable in how TCP works
I used Microsoft's SignalR library. It knows TCP pretty well and handles most of the common pitfalls nearly automatically.
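For reference, the JavaScript client (@microsoft/signalr) makes the reconnect handling one builder call; the hub URL and method names below are hypothetical:

```typescript
import * as signalR from "@microsoft/signalr";

// Build a connection that reconnects automatically with built-in backoff.
const connection = new signalR.HubConnectionBuilder()
  .withUrl("/hubs/notifications") // hypothetical hub endpoint
  .withAutomaticReconnect()       // default retry delays; a custom policy can be passed
  .build();

// Hypothetical server-to-client method.
connection.on("statusChanged", (payload) => console.log("server pushed:", payload));

// Re-sync anything the client may have missed while disconnected.
connection.onreconnected(() => console.log("reconnected, refresh state here"));

connection.start().catch(console.error);
```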
> customers to reconnect all at once.
That is definitely a problem. So we had to code it from the get go with the assumption that either the network will go down or the server will be bounced for an upgrade.
Actually most of the issues I encountered had to do with various iPad versions going to sleep and then handling WebSockets in different ways once it woke up.
> - Our well-known cloud hosting provider's networks would occasionally (a few times a year) disconnect all long-lived TCP sockets in an availability zone in unison. That is, an incident that had no SLA promise would cause a large swath of our customers to reconnect all at once.
I’m kind of surprised that it was that infrequent. I would expect software upgrades to cause long-lived sockets to reset…
> - Some customers had network equipment that capped the length of time that a TCP connection could remain open, interfering with the preferred mode of operation
I think this conflates a very specific paradigm for using WebSockets (state synchronization with a stateful "server") with WebSockets as a technology.
At the end of the day, WebSockets are just, well, sockets, with a goofy HTTP "upgrade" handshake and some framing on top of them. You could implement the exact same request/response model as in an HTTP based service over a WebSocket if you wanted to.
Stateful services are a tradeoff whether you use a WebSocket or not.
Reading through here, I think what you're trying to build is a synchronized stateful distributed system where state management becomes more transparent to the engineer, not only across backend services but also between the browser and the service side - this is well-trodden ground and a huge problem, but an interesting one to take on nonetheless. "WebSockets" are a red herring and just an implementation detail.
The article advocates that the client pulls data via polling (as I understand it), ignoring over a decade of precedent for using websockets to push data to a client or broadcast it to many clients. It's up to the clients to consume that data and react to it if they even want to. There's nothing particularly bad about pub/sub there.
It also talks about a 'command pattern' at that point, which sounds like they're complaining about RPC really.
You're right. The key thing is the spectrum of freedom induced by the library/framework. For example, if you use something like PHP then you live in a prison of the request lifecycle. Sometimes the prison is socially enforced by not having shared state between requests within a process.
There is nothing special about websockets, but they do confer a freedom and responsibility.
It isn't uncommon for people to get confused about layering. In fact, it is particularly hard when you have a lot of people who have used something for a long time without actually understanding the design of it.
I'm not a fan of how websockets are implemented in browsers. It's a bit too "magical" for my taste. But what people seem to be complaining about is a mismatch between how they think networking works and how it actually works.
This really isn't any different from the kinds of problems I'd teach beginners to get around 25 years ago when dealing with ordinary TCP sockets. What has changed is that programmers generally tend to know a lot less about the underlying technology these days (because there is so much extra complexity to worry about).
I've only professionally used WebSockets with Spring Boot and React and I must say: they're perfectly fine?
That is, if you use WS to simply asynchronously communicate events and do out-of-band message passing, they operate quickly, easily and efficiently. I wouldn't use them to send binary blobs back and forth or rely on them to keep a perfect state match, but for notifications and push events they're a delight to work with. Plus, their long-term connectivity gives them an edge above plain HTTP because you can actually store a little state in sockets rather than deriving state from session cookies and the like.
Yes, WebSockets don't fit well within the traditional "one request, one response, one operation" workflow that the web was built upon, but that model is arguably one of the worst problems you encounter when you use HTTP for web applications (not websites, though; for websites, HTTP works perfectly!). Most backend frameworks have layers upon layers of processing and security mitigations exactly because HTTP has no inherent state.
WebSockets aren't some magical protocol that will make all of your problems go away but if used efficiently, they can be a huge benefit to many web applications. I've never used (or even heard of) Adama, so I can totally believe that websockets are a terrible match for whatever use case this language has, but that doesn't mean they deserve to get such a bad rep. You just have to be aware of your limitations when you use them, the same way you need to be aware of the limitations of HTTP.
Absolutely right, until you have multiple instances of backends you need to deal with and synchronize, which is a pain in the behind. The OPs main issue seems to be with that problem, which is the tough component of using something stateful like websockets anyways. The impedance mismatch of ‘something happened in the database’ to ‘send event over websocket’ is painful in a multi backend instance environment.
Case in point: dossier locking in the product we both worked on. Hi Jeroen! Always nice to find an old colleague on here :)
You're totally right, of course; for shared states between different backend instances you'll need a different solution, like database locking or complicated inter-backend API calls, or even a separate (set of) microservice(s) to deal purely with websockets while other backend operate on the database. That way you can apply scaling without data consistency issues, if you want to go for a really (unnecessarily) complicated solution.
It all depends on your problem space. If you want to make a little icon go green on a forum because someone commented on your post, I think websockets are perfect, much better than the long-lived HTTP polls of yore.
I think the OP is using websockets to synchronize game state across different clients, which can be quite tricky even without having to deal with scaling or asynchronous connections. You can use websockets for that, but manual HTTP syncs/websocket reconnects after a period of radio silence would not go amiss. Hell, if it's real-time games this is about, you might even want a custom protocol on top of WebRTC to get complete control over data and state with much better performance.
A stream isn't "one request, one response" but you can still write your websocket endpoints to be stateless. Once you realize that, it's not so hard to work with.
Jsynchronous is a library for keeping JavaScript variables synchronized between Node.js servers and clients.
Websockets work great for message passing but they struggle with data structures more complicated than what JSON can represent. Jsynchronous syncs any JavaScript object or array with arbitrarily deep nesting and full support for circular data structures.
If a computer goes to sleep, or disconnects, websocket connections (and their underlying TCP connections) get reset so you lose any data sent while a computer is unavailable. This is catastrophic for state-management if it's left unhandled. Jsynchronous will re-send any data clients are missing and reconstructs the shared state.
There's also a history mode that lets you rewind to past states.
Right now jsynchronous passes messages using a custom JSON encoding, which keeps simple changes as small as an HTTP header, but with byte-level encoding I think this could be halved.
Some compression on top would probably do wonders for huge volumes of changes, though more browsers are supporting compression on the websocket level.
I'm not familiar with the name differential update. Instead of passing potentially large states back and forth, Jsynchronous numbers each change to your synchronized data and shares these changes with all connected clients. This is called Event Sourcing, and it enables jsynchronous to rewind to previous states by replaying the changes from the start up to any intermediate state.
WebSockets are great when used in addition to polling. This way, you can design a system that doesn't result in missed events. Example: have a /events?fromTS=123 endpoint.
At FastComments - we do both. We use WS, and then poll the event log when required (like on reconnect, etc).
Products that can get away with just polling should. In a lot of scenarios you can just offload a lot of the work to companies like OneSignal or UrbanAirship, too.
If you're going to use WS and host the server yourself, make sure you have plans for being able to shard or scale it horizontally to handle herds.
It was hard for us to not use websockets, since like 70% of our customers pick us for being a "live" solution for live events etc.
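A minimal sketch of that catch-up flow, reusing the /events?fromTS shape mentioned above (field names are assumptions):

```typescript
// On (re)connect, catch up from the event log, then let the websocket
// deliver anything newer. The same endpoint serves plain polling clients.
interface Ev { ts: number; data: unknown }

let lastSeenTs = 0;

function handleEvent(ev: Ev): void {
  if (ev.ts <= lastSeenTs) return; // drop duplicates from the poll/WS overlap
  lastSeenTs = ev.ts;
  // ...apply ev.data to local state...
}

async function catchUp(): Promise<void> {
  const res = await fetch(`/events?fromTS=${lastSeenTs}`);
  for (const ev of (await res.json()) as Ev[]) handleEvent(ev);
}

function attach(ws: WebSocket): void {
  ws.onopen = () => void catchUp(); // fill the gap first
  ws.onmessage = (m) => handleEvent(JSON.parse(m.data as string));
}
```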
Depending on what you are trying to achieve, I also recommend SSE over websockets, especially if all you want is to signal clients when state changes on the server.
SSE is a simple protocol that you can easily implement yourself both in the server and the client if the client lacks support.
SSE also fits naturally into an existing request-response infrastructure if you use SSE only for notifications and keep everything else the same, i.e. use the same endpoints as before for fetching new data on an SSE notification event; it can thus be turned off just as easily if it becomes problematic, e.g. under too high server load.
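A minimal sketch of that notify-then-fetch shape (a bare Node.js endpoint plus an EventSource client; the paths are hypothetical):

```typescript
// Server: an SSE endpoint that only ever says "something changed".
import http from "node:http";

const clients = new Set<http.ServerResponse>();

http.createServer((req, res) => {
  if (req.url === "/notifications") {
    res.writeHead(200, {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    });
    clients.add(res);
    req.on("close", () => clients.delete(res));
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(8080);

// Call this whenever server-side state changes; clients re-fetch as usual.
export function notifyAll(): void {
  for (const res of clients) res.write("data: changed\n\n");
}

// Client (browser): on each notification, hit the same old endpoints.
//   const es = new EventSource("/notifications");
//   es.onmessage = () => fetch("/api/data").then((r) => r.json()).then(render);
```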
Yeah I agree. But even though SSE is super easy to grok and to implement (literally just standardized long polling), lots of existing infra builds on the assumptions that connections are short lived, so many of the WS issues apply to SSE as well.
IMHO, this unfortunate assumption is not really defensible in $current_year - especially from the multi billion dollar Cloud industry. I'd much more prefer first class support for long-lived connections on an infrastructure level, as opposed to a "proprietary database-level". I don't buy the argument that it's infeasible to solve the thundering herd issues.
I remember when I first heard about websockets, I was wondering what exactly it was useful for that SSE didn't already do. Almost all of the demos at the time were easier (IMO) to do with SSE. The two standards also both came from WHATWG at about the same time.
[edit]
I looked it up and SSE was a much earlier standard, but implementation of WS and SSE were relatively contemporary with the exception of Opera (had SSE in 2006) and IE (Never got SSE support).
I didn't know that SSE came first, thanks for adding that context.
It does feel like websockets tried to cram several novel features whereas SSE was simply giving proper clothing to the existing art of long polling.
In particular, WS is binary encoded, has support for multiplexing/message splitting, several optional http headers, which in hindsight appears to have simply complicated the spec at little-to-nil value.
SSE have a lot of restrictions that make them unattractive, like a global limit of 6 per browser session. This can cause confusing behavior for power users...
Hm yeah now that you mention it I recall that as well. Isn't that just an arbitrary crippling though? I can't imagine a good reason for why SSE would be hogging more resources than websockets.
Is it really worth the extra effort for WS over long polling at that point though? Especially if you're re-using the TCP connection it seems like the overhead would be minimal and the latency only slightly increased.
Sorry for the misunderstanding, but I don't mean WS over long polling. I mean WS in addition to polling, not long polling. Use websockets, but also expose an API to get the same events by specifying a timestamp. This way the websocket server implementation can be much simpler, and the client just has to call the API to "catch up" on missed events on reconnect.
You can also use this API for integrations, and your clients/consumers will thank you. For example, our third party integrations use the event log to sync back to their own data stores. They probably call this every hour, or once a day. You wouldn't want to use websockets with PHP apps like WordPress.
I can totally relate to that. When designing the RxDB GraphQL replication [1] protocol, it made things so much easier when the main data runs via normal request-response http. Only the long-polling is switched out for WebSockets so that the client can know when data on the server has changed. This makes it really easy to implement the server side components when having a non-streaming database.
You're right which is where I ended the conversation.
Ideally, you should be able to poll because it is resilient. The challenge however is when you separate the initial poll/pull from the update stream because now you have to maintain two code paths. What I'm proposing is that the poll and update stream use the same data format using patching.
Why a separate poll instead of adding the initial offset to the websocket request url or handshake? Just to be compatibly with websocket hostile networks?
I’m confused. Just because you use request response doesn’t mean you don’t have state in your page. And sure, your connection can drop, but your requests can also fail. Protocol versioning is required, but that is true for any protocol ever, also request response. And what’s this about wanting to outsource all state and everything requires a database as soon as it has state? Which one does mspaint.exe use? And if using a load balancer or proxy somehow leaks through then something really messed up is going on. In pubsub, just like REST, one doesn’t care who sent the data. I’m confused.
Sorry about that. Yes, those are required but you have more freedom to go wrong in more severe ways.
Mspaint uses the file system at the command of the user. Something that I didn't mention (which I will in a revision) is that the server is being run by another person with their own deployment schedule. So, we generally like to excise state ASAP to a durable medium since process state is volatile.
All of these issues could be solved well by a decent library. Or even just basic well thought out abstractions.
I've written a few before from "scratch" myself to decent success. I don't think there's anything inherent with sockets that make it impossible to use well. I mean, we've been using sockets since the beginning of the internet.
You can write the same article for literally any technology that's complicated to use without abstractions.
That's the core thing: the discovery of well-thought-out abstractions for a specific domain. I glossed over a lot of the specific details, but I itemized the pitfalls that I have seen and how I'm thinking about the abstractions I'm using.
I am the author of Centrifugo server (https://github.com/centrifugal/centrifugo) - where the main protocol is WebSocket. I agree with many points in the post – and if there is a chance to build something without replacing stateless HTTP with a persistent WebSocket (or EventSource, HTTP-streaming, raw TCP etc) – then it's definitely better to go without persistent connections.
But there are many tasks where WebSockets simply shine – by providing a better UX, more interactive content, instant information/feedback. This is worth keeping - even if the underlying stack is complicated enough. Not every system needs to scale to many machines (e.g. multiplayer games with a limited number of players), corporate apps don't really struggle with massive reconnect scenarios (since the number of concurrent users is pretty small), and so on. So WebSockets are definitely fine for certain scenarios IMO.
I described some problems with WebSockets Centrifugo solves in this blog post - https://centrifugal.dev/blog/2020/11/12/scaling-websocket. I don't want to say there are no problems, I want to say that WebSockets are fine in general and we can do some things to deal with things mentioned in the OP's post.
> Not every system needs to scale to many machines (e.g. multiplayer games with a limited number of players)
the author writes a websocket board game server. Most, if not all, of these complaints read like the author isn't partitioning the connections by game.
The question is at what level do you partition the connection. I could setup a server and then vend an IP to the clients. The problem with that strategy is how do you do recovery? Particularly in an environment where you treat machines like cattle.
If you don't vend an IP, then you need to build a load balancer of sorts to sit between the client and the game instance server. Alas, how do you find that game instance server? Via a direct mapping, a sticky header, or consistent routing. As long as you care about that server's state, it is the same as vending an IP to the client, except you can now absorb DOS attacks and offload a bit of compute (like auth) to the load balancer fleet.
The hard problem is how much you care about that server's lifetime. Well, we shouldn't care much, because individual servers are cattle, and we can solve some of the cattle problems by having a graceful shutdown that migrates state and traffic away. This will help for operator-induced events, which can be handled with kindness; machine failures, kernel upgrades, and other such things that affect the host may have other opinions.
• Make push updates optional. If no push connection can be established, fall back to polling. You can start by implementing polling only and add push later.
• Use websockets only for server-to-client communication. Messages from the client are sent via regular HTTP requests.
• Keep no meaningful state on the server. That includes TCP connection state. You should be able to kill all your ec2 instances and re-spawn them without interrupting service.
• Use request/response for all logic on the server. All your code should be able to run in an AWS lambda.
• Use a channel/subscription paradigm so clients can connect to the streams they're interested in
• Instead of rolling your own websocket server, use a hosted service like pusher.com or ably.com. They do all the heavy lifting for you (like keeping thousands of TCP connections open) and provide a request/response style interface for your server to send messages to connected clients
We use websockets and solve a lot of the state management problem called out here by keeping very little state on the server itself. The primary thing on the server is a monotonically increasing integer we use to stamp messages; this gives us total order broadcast, which we then build upon: https://en.m.wikipedia.org/wiki/Atomic_broadcast
Here are some code pointers if you want to take a look:
The DeltaManager in the container-loader package is where we manage the websocket. It also hits storage to give the rest of the system a continuous, ordered stream of events:
The main server logic is in the Alfred and Deli lambdas. Alfred sits on the socket and dumps messages into Kafka. Deli sits on the Kafka queue, stamps messages, then puts them on another queue for Alfred to broadcast:
https://github.com/microsoft/FluidFramework/tree/main/server...
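The core of that stamping idea is tiny (a conceptual sketch only, not the Fluid Framework code):

```typescript
// A single sequencer stamps every inbound message with a monotonically
// increasing number; every client applies messages strictly in stamp order,
// so all replicas see the same total order.
interface Stamped<T> { seq: number; payload: T }

let nextSeq = 0;

function stamp<T>(payload: T): Stamped<T> {
  return { seq: nextSeq++, payload };
}

// Client side: buffer out-of-order arrivals and apply strictly in sequence.
function makeApplier<T>(apply: (p: T) => void) {
  let expected = 0;
  const buffer = new Map<number, T>();
  return (msg: Stamped<T>) => {
    buffer.set(msg.seq, msg.payload);
    while (buffer.has(expected)) {
      apply(buffer.get(expected)!);
      buffer.delete(expected);
      expected++;
    }
  };
}
```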
come to phoenix/elixir land. Channels are amazing and the new views system allows you to seamlessly sync state to a frontend using websockets with almost no javascript
I'm thinking about how I position Adama for both Jamstack and as a reactive data-store (which could feed phoenix/elixir land). I intend to change my marketing away from the "programming language" aspect and more towards "reactive data store".
My side-project relies heavily on WS, I'm currently using Node.js and it's alright but learning Elixir is my goal during these holidays. Any resource to share to get started? I don't know much about Elixir except it's perfect for such use-cases.
I was a nodejs programmer for almost 8 years before I hopped to elixir. A lot of it was motivated by elixir's realtime system and concurrency.
One of the issues with nodejs is that you're stuck with one process which is single threaded, i.e. you're stuck to one core. Yes, there are systems like clustering which rely on a master process spawning slave processes and communicating over a bridge, but in my experience it's pretty janky.
You're making a great move by trying out elixir. It solves a lot of the issues I ran into working with nodejs. Immutability is standard and if you compare two maps, it automatically does a deep evaluation of all the values in each tree.
The killer feature however is LiveView. It's what Meteor WISHES it could be: realtime server push of HTML to the DOM and the ability to have the frontend trigger events on the backend in a process that's isolated to that specific user. It's a game changer.
Anyways, if you're looking for resources to learn, pragprog has a bunch of great books on elixir. That's how I got started.
I got pretty far in the beginning just by reading the docs on Phoenix channels [0]. I was learning Phoenix and ReactJS at the same time and got a pretty simple Redux thunk that interacted with Phoenix Channels in a few days. I'm not sure if that was the optimal way to do it, but it was really cool interacting with the application from IEx (elixir shell).
You might find an interesting (albeit more complex) entry point by interacting with Phoenix Channels from your Node.js app using the Channels Node.js client [1] and within the frontend itself.
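For orientation, joining a channel from the JavaScript client looks roughly like this (the phoenix npm package; topic and event names are hypothetical):

```typescript
import { Socket } from "phoenix";

// Connect, join a topic, and push/receive events over the channel.
const socket = new Socket("/socket", { params: { token: "user-token" } });
socket.connect();

const channel = socket.channel("room:lobby", {});
channel
  .join()
  .receive("ok", () => console.log("joined"))
  .receive("error", (reason) => console.error("join failed", reason));

channel.on("new_msg", (payload) => console.log("got", payload));
channel.push("new_msg", { body: "hello" });
```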
I wrote Real-Time Phoenix and it goes into pretty much everything that I hit when shipping decently large real-time Channel application into production. Like, the basics of "I don't know what a Channel is" into the nuance of how it's a PITA to deploy WS-based applications due to load balancers and long-lived connections.
The new LiveView book is great if you're interested in going fully into Elixir (server/client basically). I use LiveView for my product and it's great.
Being already familiar with FP I was able to jump right into elixir after running through learn you some erlang. Erlang and elixir are very similar, and you get the bonus of learning a bit about OTP, BEAM, and the underlying philosophy. There's a ton of resources after that, and the docs are easy to use and understand.
It's more than just websockets. It's a protocol built on websockets that also includes keepalive and long-polling fallback.
You could in theory replicate the protocol in another stack, but it's more than that. Elixir is uniquely suited to websocket applications. Since a websocket is persistent, you need a process on your end that handles it. Elixir is excellent at creating lightweight threads for managing these connections. Out of the box, you can easily support a few thousand connections on a single server.
I know because we did it at my startup. Channels power our entire realtime sync system and it's yet to be a bottleneck. It more or less works out of the box without issue. It's almost boringly reliable.
Minor nitpick: Channels are transport-agnostic, you could always do them over longpoll, it's not hard to implement a raw TCP socket channel driver, and hell, you could probably figure out how to do it in streaming HTTP 1.1. I might try to do channels over WebRTC.
Elixir inherits those properties from the BEAM VM it shares with Erlang - both languages are effectively great for this WS/Channels use case - but Channels seem unhelpful unless you're already using Phoenix. I can't see that they can be used in, say, Erlang.
Channels are really just a protocol, but that protocol is implemented in Elixir (Phoenix) and so isn't available elsewhere.
I think that the important question is "why does this protocol exist?" Most likely, you'll end up solving similar problems as to why Channels exist in the first place. So from a protocol perspective, it's nice that some problems are solved for you.
May not fit your use-case, but you can create an umbrella project with both an Erlang app and an Elixir/Phoenix app, whereby the latter's capable of calling functions in the former.
We used WebSockets to build a web-based front end for Ardour, a native cross-platform DAW, and didn't encounter any of these issues. Part of that is because the protocol was already defined (Open Sound Control aka OSC) and used over non-web-sockets already. But as others have noted, most of the problems cited in TFA come from the design goals, not the use of websockets.
The answer is simple: abstraction, or the lack thereof. Building real-time web applications is difficult. But the difficulty is accidental complexity that could be abstracted away. However, most of the tech stacks are not there yet - mainly because the request-response model is good enough for most sites, so the industry won't push it forward wholeheartedly. Maybe the situation will start to change after WASM takes off, or maybe not.
The best bet is to work on platforms that already have great WebSocket support. PHP might not be a great choice. Node.js is okayish but not great. Blazor sounds interesting but I'm not sure about the performance. Elixir/Phoenix is probably the best bet for now. It has nice, abstracted APIs, it's performant, and it can form clusters either by itself or with Redis.
It's very hard to scale WebSockets because of their stateful nature, so please look for platforms that have already solved this problem.
You can turn websockets into a flawless request/response with async/await included in like 20 lines of JavaScript. I do it all the time.
Generate an ID, make a request, store the promise resolve/reject in a map (js object). Your onmessage handler looks up the promise based on the ID and resolves the promise.
Add a few more tiny features like messaging and broadcasting streams and you've got both request/response and push messaging over a single websocket.
It's pretty neat in my opinion, saves having to mix HTTP and websockets for a lot of things.
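The whole pattern really is about that small (a sketch under the same assumptions: correlate replies by id and resolve the stored promise in onmessage; no timeouts or backpressure included):

```typescript
// Request/response over a WebSocket, plus a hook for push messages.
type Resolver = { resolve: (v: unknown) => void; reject: (e: Error) => void };
const inflight = new Map<string, Resolver>();

export function request(ws: WebSocket, method: string, params?: unknown): Promise<unknown> {
  const id = crypto.randomUUID();
  return new Promise((resolve, reject) => {
    inflight.set(id, { resolve, reject });
    ws.send(JSON.stringify({ id, method, params }));
  });
}

export function handleMessage(ev: MessageEvent): void {
  const msg = JSON.parse(ev.data as string);
  const pending = msg.id && inflight.get(msg.id);
  if (pending) {
    inflight.delete(msg.id);
    msg.error ? pending.reject(new Error(msg.error)) : pending.resolve(msg.result);
  } else {
    // No matching id: treat it as a server push / broadcast message.
  }
}
```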
It is neat. I've gone down this path[0]. But you find yourself essentially re-implementing HTTP, and losing things like backpressure. Sometimes it's what you need, but I try to be cautious before jumping to WS these days.
I tried to make something with websockets a few weeks ago. My backend was a simple Python3 web server and I just didn’t have the chops to make it work alongside the asyncio websockets stuff.
The alternate solution which worked exceptionally well was to make an XHR which the server would hold open and only respond to once an event occurred. The moment the XHR response arrives, another one is set up to wait for the next event.
I made a proof of concept that sent my keystrokes from the server to the browser. It was so snappy it was like typing locally.
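The browser side of that held-open XHR approach is basically one loop (a sketch; the endpoint is hypothetical):

```typescript
// Long polling: the server holds the request open until an event occurs,
// then the client immediately issues the next request.
async function pollForever(handle: (ev: unknown) => void): Promise<void> {
  while (true) {
    try {
      const res = await fetch("/wait-for-event"); // server responds only on an event
      if (res.ok) handle(await res.json());
    } catch {
      await new Promise((r) => setTimeout(r, 1000)); // brief pause before retrying
    }
  }
}
```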
I kludged together subscription support for our GraphQL stack, and it's super scary because the product people see it working and are like, "this is awesome!" but sometimes it doesn't totally work, and I don't think they realize how hard it would be to, "just make it work all the time".
I want a "real" non-kludge solution, but it's hard to convince someone they need to give you money to throw away something that "works", from their perspective.
A few years ago I was more inclined to use WebSockets. They're undeniably cool. But as implemented in browsers (thanks to the asynchronous nature of JavaScript) they offer no mechanism for backpressure, and it's pretty trivial to freeze both Chrome and Firefox sending in a loop if you have a fast upload connection.
I designed a small protocol[0] to solve this (and a few other handy features) which we use at work[1]. A more robust option to solve similar problems is RSocket[3].
More recently I've been working on a reverse proxy[2], and realized how much of a special case WebSockets is to implement. Maybe I'm just lazy and don't want to implement WS in boringproxy, but these days I advocate using plain HTTP whenever you can get away with it. Server Sent Events on HTTP/1.1 is hamstrung by the browser connection limit, but HTTP/2 solves this, and HTTP/3 solves HTTP/2's head of line blocking problems.
Also, as mentioned in the article, I try to prefer polling. This was discussed recently on HN[4].
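On the backpressure point above: browsers do expose bufferedAmount, so you can approximate send-side backpressure by hand (a crude sketch; the threshold is arbitrary):

```typescript
// Crude send-side backpressure: don't enqueue more data while the socket's
// internal buffer is already holding too many unsent bytes.
const HIGH_WATER_MARK = 1 << 20; // 1 MiB, arbitrary

async function sendWithBackpressure(ws: WebSocket, chunk: ArrayBuffer): Promise<void> {
  while (ws.bufferedAmount > HIGH_WATER_MARK) {
    await new Promise((r) => setTimeout(r, 50)); // poll until the buffer drains
  }
  ws.send(chunk);
}
```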
I've never seen websockets accomplish anything that push with long polling didn't do more effectively and efficiently. I think they are a technology that missed its time; the CPU / power savings are generally non-existent and the minimal bandwidth savings are frequently negated by the need to add in redundancy and checks.
If SSE supported binary data I might agree with you. I think WebSockets have their place, but are overused. I definitely agree long polling should be implemented first and then only add WS if you've measured that you need it. You're probably going to need to implement polling anyway for clients to catch up after disconnect.
I've been looking at it lately and the SignalR tech it uses. Very nice. My cursory Google research indicates a server using it can handle about 3000 users, which is not many, but for my purposes is fine. Blazor makes it completely optional, which I proved to myself by doing client side Blazor (dot net wasm) only and using an Apache server instead of IIS with ASP.NET.
We're migrating to server-side Blazor because of the good experience yet all of the things mentioned in this article are a concern for using this technology. A few things, especially around deployment and maintenance, are significantly less good when your clients are always connected. I'm currently trying to figure out mitigations.
I also implemented MQTT for industrial machines to publish data to a broker. It was trivial to create a web UI that subscribed to that broker via MQTT over Websockets. But I noticed that colleagues had this impression that MQTT cannot be used on the web, so they wanted to build a conversion from MQTT to SignalR.. a quick search would've cleared it up, but they were so sure for some reason. After I showed them the demo they just went with MQTT as well.
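For what it's worth, subscribing from a browser is only a few lines with MQTT.js (the broker URL and topic here are hypothetical):

```typescript
import mqtt from "mqtt";

// MQTT over WebSockets straight from the browser: no conversion layer needed.
const client = mqtt.connect("wss://broker.example.com:8084/mqtt");

client.on("connect", () => {
  client.subscribe("machines/+/telemetry"); // hypothetical topic
});

client.on("message", (topic, payload) => {
  console.log(topic, JSON.parse(payload.toString()));
});
```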
meh. we shipped an interactive ETL tool to a bunch of government customers and it relies heavily on the channel abstraction and pubsub provided by phoenix and elixir. maybe it's because message passing is the programming model all the way down the stack and all the way across the cluster, but long lived stateful connections are just not hard to deal with.
elixir has (plenty of) its own drawbacks, but wrangling a pile of topics over a websocket is not one of them. this can be a solved problem if you make some tradeoffs.
this system isn't huge, but has been running for years for plenty of customers irl.
Make sure you handle disconnects/reconnects, leverage query params to pass state (ideally limited), sync/scale via Redis pubsub, and assume that the websocket will fail, so always have some sort of fallback/redundancy. Ideally think of websockets like sprinkles on ice cream.
I read up to 5-6 “problems” and still don’t understand the exact setting they are talking about. All websocket services I have used alleviated tcp issues by some sort of a heartbeat or planned close/reconnect (just like long-polling, which is basically the same thing from tcp perspective). State sync is not a problem if you don’t lose any state on server restarts, which you shouldn’t lose anyway. Moreover, you don’t have to have a conversation over a socket, request can be made over a regular xhr, only realtime events ({type:event, msg:{id, status}}) go back via socket, iff the client is interested in these. If a socket fails, they may reload the page and get this info via xhr again (/api/get-status?id=). The queue is usually natural, the messages are usually realtime.
> the state on that machine must survive the failure modes of the proxy talking to it. That is, the state must be found. … However, this creates a debt such that a catastrophic socket loss creates a tremendous reconnect pressure.
It is terrifying indeed, but boils down to “don’t try to resend the past”, no FUD required. The state of a socket is the same as of an http request: they connect or auth with some id/jwt and then a socket can send events back again. If you can’t hold that reconnect+auth pressure, you likely can’t hold long-poll reconnect+auth pressure, and less likely but still probably you can’t serve clients who refresh their pages aggressively. Put an adaptive rate limiter before your routes already.
This is a talk with strange assumptions and without technical details; what is their point beyond selling some Adama? Have we reached the point where people are afraid of streaming in favor of repeated polling?
There is this particular app I have to use for work which relies heavily on web sockets. It works great so long as your latency is good. But as soon as your latency is over some threshold the sheer volume of websocket requests compounds and it becomes unusable. Normal sites just load slower, they don't fail entirely like this.
I guess it goes without saying but devs really need to test their apps under unfavorable conditions.
You don’t have to design a new protocol. There is an open protocol which works nice for this use case. It can handle dropping connections, has a nice routing mechanism, it scales pretty well, has different QoS levels, pub/sub, request/response. That protocol is... MQTT v5!
Personally I’m surprised it is not used more often by web devs.
MQTT v5 is fine enough until you have to care about reliability at scale. What I have seen happen is that it turns into a slightly better TCP where you have to build your own primitives.
For example, when you SUB, you get a SUBACK if enabled. However, there is nothing like a SUBFIN (subscribe over/finished/closed) to indicate that the subscription is over or failed. In many ways, pub/sub overcommits. Now, this all depends on what SUBACK means: is it durable? Tied to a connection?
An interesting foil is something like rsocket. RSocket has a lot of nice things, but it requires an implementation to be damn near perfect. Paradoxically, rsocket lacks the notion of a SUBACK but it has something like a SUBFIN.
A key challenge with a stream is knowing if it is working. With request-response, you can always time out. With a stream, how does one tell the difference between a broken stream and an inactive stream?
We use this at my place of employment and it’s generally OK except for dealing with reconnects. Also pub/sub request/response was not nearly as smooth as we had hoped and eventually we reverted back to HTTP
We wanted a connection-oriented streaming protocol API for web browsers, like TCP, but we didn't want to just let web sites make random connections to TCP ports, so instead we took a connectionless protocol and bolted on a connection-oriented streaming protocol that can only use 2 port numbers.
The biggest challenge when migrating from HTTP to any kind of stream is the loss of callback on response, because there is no request/response. There is only push. That means your messaging and message handling becomes far more complex to compensate.
The article mentions that your socket could drop. This isn’t critical so long as you have HTTP as a redundancy.
For any kind of application making use of micro services I strongly recommend migrating to a socket first transmission scheme because it is an insane performance improvement. Also it means the added complexity to handle message management makes your application more durable in the face of any type of transmission instability. The durability means that your application is less error prone AND it is ready for migration to other protocols in the future with almost no refactoring.
A team at work made the mistake of using WebSockets for a new application instead of old fashioned long polling. That resulted in 3+ months of hell debugging a buggy OSS library and fixing incompatibility issues for different mobile devices. And it broke again when a new OS update was released for iPhone. I did warn them not to use fairly new tech when old battle proven tech would solve the problem equally well. But the shiny new tech was too tempting for them and they paid the price for it. Which yet again confirms my #1 rule of thumb: always use tech that is old and boring unless there is no old tech that can do the job. If you want excitement, perhaps instead do kick boxing. That worked for me.
I definitely think it is. However “old” and “boring” are not precise terms. Perhaps a more accurate way to define the rule would be: “Prefer the oldest, battle proven, known to the team tech that can do the job.” For example if long polling and WebSockets are both capable of doing what you need to get done then pick long polling. Because it is older, more used, proven, used by many web applications out there, compatible with everything on the planet, and known to the team.
having written a websocket game server that serves hundreds of thousands of simultaneous connections, I find the tone of this article really frustrating, because the author is peddling a lot of FUD about websockets while at the same time building a SaaS websocket server product. Websockets involve different challenges than traditional http servers, but I've always found them to be fun and stimulating.
If you’re developing a game you’re probably not dealing with enterprise networks. Enterprise networks have their own rules, only loosely related to the specs.
I wouldn't call it FUD, but lived experience manifesting as requirements and knowing what to avoid. I've been the guy that has to debug reliability issues in streaming services because people just wing it.
I agree that it is fun and stimulating, but you also have to pick your battles and know which problems are worth having.
I made a (bad) habit of taking all the HN threads about people being in love with Elixir, and being a Grinch ;)
Mostly it's stuff that we did wrong, did not understand, a poor choice of lib, and a strong preference for statically typed languages with a visual debugger that just works.
Nothing that should prevent you from using or trying it - it's Christmas time, peace on flame wars.
Why use a websocket on CP systems where consistency is important? Request/Response seems better?
Even though they CAN be used for CP I always thought Web Sockets were targeted more for AP systems where Availability/Speed was what's more important and if the connection broke and state was lost it wasn't a big deal.
A websocket doesn’t have much import on CP vs AP. Request/Response is still done over unreliable connections that can’t guarantee exactly once request or response delivery without further application-level mechanisms (retries with idempotency keys, etc.). They’re also used mostly with web clients, and it’s pretty rare that we consider systems where web client partition is allowed to reduce availability for the system as a whole, even if it is CP.
Ah that makes sense. Maybe I need to read the article again but it was talking about lost state in the web client due to the web socket which means they're trying to treat the web client as a stateful partition?
So I guess my instinct was why even have state in the browser.
To reduce the problem of updating the software you can bring up a second sender and make all new connections point to that one, but don't close the old ones.
If your user sessions are shorter than the time between deployments, then you won't close any connections.
When should you use Web Sockets? I've been interested in the tech but haven't found a use case that needs it. I can see a scenario of live, server pushed updates being a use case but am completely ignorant of the issues the article speaks to.