I'm really glad to see an article like this. I've worked in the space for a while (Fluid Framework) and there's a growing number of libraries addressing realtime collab. One of the key things that many folks miss is that building a collaborative app with real time coauthoring is tricky. Setting up a websocket and hoping for the best won't work.
The libraries are also not functionally equivalent. Some use OT, some use CRDTs, some persist state, some are basically websocket wrappers, fairly different perf guarantees in both memory & latency etc. The very different capabilities make it complicated to evaluate all the tools at once.
Obviously I'm partial to the Fluid Framework, but not many realtime coauthoring libraries have made it as easy to get started as Replicache. Kudos to them!
A few solutions with notes...
- Fluid Framework - My old work... service announced at Microsoft Build '21 and will be available on Azure
- Yjs - CRDTs. Great integration with many open source projects (no service)
- Automerge - CRDTs. Started by Martin Kleppmann, used by many at Ink & Switch (no service)
- Replicache - Seen here, founder has done a great job with previous dev tools (service integration)
- Codox.io - Written by Chengzheng Sun, who is super impressive and wrote one of my fav CRDT/OT papers
- Chronofold - CRDTs. Oriented towards versioned text. I'm mostly unfamiliar
- Convergence.io - Looks good, but I haven't dug in
- Liveblocks.io - Seems to focus on live interactions without storing state
- derbyjs - Somewhat defunct. Cool, early effort.
- ShareJS/ShareDB - Somewhat defunct, but the code and thinking is very readable/understandable and there are good OSS integrations
- Firebase - Not the typical model people think of for RTC, but frequently used nonetheless
I should add... I talk to many folks in the space. People are very welcoming and excited to help each other. Really fun space right now.
Built a Google Docs like rich text collaborator for a client on Express/Psql and React. Worked like a charm. The hardest part was dealing with ports on AWS to be honest.
I'll have to look at some of these, I've reviewed some of these but not all. You are missing some I'm familiar with.
PouchDB+CouchDB work well out of the box with minimal fuss, as open pieces you can just plug into this role. PouchDB handles state persistence and replication on the client; CouchDB is the reliable cloud service you can replicate to.
Meteor, at least their pre-Apollo stack, had realtime collab type features with their minimongo client and oplog tailing.
Fluid Framework looks pretty cool! I somehow missed the Build announcement about this.
Maybe it's just me, but it has a SignalR + Orleans sort of vibe to it when I think about the types of problems it solves. I will definitely be digging into this a bit more.
We implemented all that manually, more or less in swift (and sqlite), then react+redux, and on the back end - postgres and python+flask. Works flawlessly so far. We do have the same setup more or less, with listeners triggering UI updates and push messages signalling the clients to fetch data from the server. Then, on the server, we have two dbs -> one where we store each update or create message, in a postgres-based queue, and another one, in a normalised format which we use for login (it's way faster than replaying all messages from the queue).
There are complexities when you move beyond one or two tables, though - like maintaining relations, ensuring things get done in the correct order, that they get merged (we merge all attributes of each item - e.g. one client can change the color, and if another changes the text content of the item, these will get merged), etc.
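To make the attribute-level merge concrete, here is a minimal sketch of replaying an update queue into a normalised view. It is not our actual code: the `Update` shape, the `seq` ordering field and the `materialize` helper are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Update:
    """One queued change message (hypothetical shape)."""
    item_id: str
    fields: Dict[str, Any]  # only the attributes this client changed
    seq: int                # assumed server-assigned position in the queue


def materialize(queue: List[Update]) -> Dict[str, Dict[str, Any]]:
    """Replay updates in server order, merging attribute by attribute."""
    items: Dict[str, Dict[str, Any]] = {}
    for update in sorted(queue, key=lambda u: u.seq):
        # Later writes win per attribute, not per item, so edits to
        # different attributes of the same item are both kept.
        items.setdefault(update.item_id, {}).update(update.fields)
    return items


queue = [
    Update("note-1", {"text": "buy milk", "color": "yellow"}, seq=1),
    Update("note-1", {"color": "blue"}, seq=2),          # client A recolours
    Update("note-1", {"text": "buy oat milk"}, seq=3),   # client B edits text
]
state = materialize(queue)
```

Both changes survive because the merge is per attribute. Two clients changing the same attribute would still fall back to last write wins in this sketch.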
We gave up on the websocket part and implemented basic polling, because they were not supported by App Engine at the time (things might have moved on since then, which is a couple of years ago). Yet, for a note/todo/habit tracking app, it simply doesn't need to be real-time from our experience.
Have a play at https://www.mindpad.io/app/. You can see how it works if you open up the web app in two incognito tabs, or on an iPhone and the web.
This stack reminds me of Meteor, which came out nearly a decade ago(!). https://meteor.com
It never really took off in the mainstream - I think because it was before many developers really trusted JS on the server, and a "full stack" framework is quite a big commitment for a team to shift to. Also most CRUD apps don't need real time collab.
I remember being amazed when changes were instantly propagated between my phone and laptop browsers with almost zero lag. This was the demo that sold it for me https://www.youtube.com/watch?v=MGbmW9bwJh4
I built my first big software project in Meteor! It was great, really a shame that it didn't take off. As you said, I think it tries to do too much. Hell, at some point they even introduced their own package manager. It might be good for solo developers, but as soon as you have a bit more bandwidth I think you give up too much control.
Author here. Thanks for mentioning Meteor, which also impressed me a lot when it first came out. I think it didn't take off because it tries to do too much (frontend + backend + db). And one smart move by Replicache is that it tries to integrate nicely with the rest of your stack.
I haven't yet done this, but based on some research it seems to me that the core of any collaborative app today (one that wants to avoid Firebase and other hosted platforms, which Replicache seems to be) is most easily served by picking some CRDT library.
There are a couple of open-source CRDT libraries that provide both clients and servers (yjs [0] and automerge [1] are two big ones for JavaScript I'm aware of).
My basic assumption is that as long as you put all your relevant data into one of these data structures and have the CRDT library hook into a server for storing the data, you're basically done.
This may be a simplistic view of the problem though. For example I've heard people mention that CRDTs can be space inefficient so you may want/have to do periodic compaction.
Interesting! I know there was a large performance refactor that was merged in May [0]. This post you link was written in June of this year. Unclear if the performance fix is related to the reported issues and unsure if it still exists or not.
At the very least, the automerge maintainers seem to be very actively tackling performance problems.
Really interesting...you can build a similar (websocket/db backed) app with LiveView out of the box, no? Any idea how well that'd hold up against this solution?
The big difference is that with CRDTs you can make edits offline and they will get merged with other changes when you come back online. Websocket/db really only works when you are always online.
That being said you can totally implement collab without CRDTs and if you don't particularly need offline it should be easier.
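As a rough illustration of the offline case, here is a minimal sketch of a state-based last-writer-wins map, one of the simplest CRDTs. It is not how Yjs or Automerge work internally; the `put` and `merge` helpers and the integer logical timestamps are assumptions for the example.

```python
from typing import Dict, Tuple

# key -> (logical_timestamp, replica_id, value)
Entry = Tuple[int, str, str]
LwwMap = Dict[str, Entry]


def put(state: LwwMap, key: str, value: str, ts: int, replica: str) -> None:
    """Record a local write, tagged with a timestamp and the writer's id."""
    state[key] = (ts, replica, value)


def merge(a: LwwMap, b: LwwMap) -> LwwMap:
    """Combine two replicas; the result does not depend on argument order."""
    merged = dict(a)
    for key, entry in b.items():
        # Tuples compare by timestamp first, then replica id as a tie-breaker.
        if key not in merged or entry > merged[key]:
            merged[key] = entry
    return merged


laptop: LwwMap = {}
phone: LwwMap = {}
put(laptop, "title", "Draft", ts=1, replica="laptop")
phone = merge(phone, laptop)          # in sync, then the phone goes offline

put(phone, "title", "Final", ts=2, replica="phone")    # offline edit
put(laptop, "body", "Hello", ts=2, replica="laptop")   # concurrent edit

laptop_view = merge(laptop, phone)    # phone reconnects
phone_view = merge(phone, laptop)
```

Both replicas end up with the same map no matter who merges first, which is the property a plain websocket/db setup does not give you for free.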
Replicache's creator Aaron has a pretty good Twitter thread explaining the difference among Replicache, WebSocket and (classic) CRDTs. I will summarize briefly here:
- WebSocket (and Phoenix Channel) is just a communication method. To maintain consistency and resolve conflict, you need something like Replicache.
- CRDTs are more suitable for p2p scenario while Replicache works better for client-server apps.
- Phoenix's Presence is built with CRDT but it's just a single feature, not a general CRDT toolkit.
I remember listening to an episode of the Exponent podcast, in which Ben Thompson said something like (paraphrasing from memory):
> People who love "native apps" can complain about Electron all they want—but there's simply no replacement for the real-time collaboration offered by web-based apps like Figma!
As someone who's not exactly thrilled with Electron and its memory usage—is there a reason the two go together? Is there a reason we can't build collaborative apps in Cocoa and GTK? I think these systems are awesome, I just think they'd be even better if they weren't also running full web browsers!
It could totally be done natively. The obstacle is how much of the stack you have to write and maintain. There are js libraries that do most of this heavy lifting for you, and CRDTs are pretty new to most devs.
It's just much, much easier and more cost effective to build a single code base and hit many, many target platforms with it.
Computing history has also shown that publishing efficient lean software doesn't help in the market. At least not over time to market, getting the key features right, and your ongoing costs.
This is reason enough. You already have to build the UI twice because there is no GUI framework that actually looks good on all OSs. You see this all the time: apps made on Linux that technically work on macOS either work terribly or look super ugly there.
You also have to remember windows, ios and android. When you build something targeting web browsers you only have to worry about screen sizes rather than OSs.
Figma’s performance is excellent due in large part to the fact they compile a lot of native code to Wasm. Electron or not it’s still fast.
To answer your question, collaborative apps ideally need to target the widest possible audience. Barring a massive budget, the best way to accomplish this is to also have a singular compile/build target. In most cases, that’s the web platform.
Figma's performance is impressive for an Electron app, but it does choke on very large files, which Sketch would have handled without a care. It's not great.
If Sketch had had Figma's collaboration features, we wouldn't have switched. But during the pandemic it was necessary.
That's what Replicache[0] solves, it provides for Causal+ Consistency across the entire system.
"This means that transactions are guaranteed to be applied atomically, in the same order, across all clients. Further, all clients will see an order of transactions that is compatible with causal history. Basically: all clients will end up seeing the same thing, and you're not going to have any weirdly reordered or dropped messages."
This sounds a lot like Operational Transform but without the transform part - it assumes that locally applied mutations can be undone and rebased without user interaction. But I feel like the Google Wave team would have a lot of objections to the idea that this can just be ignored. If your state is just a group of key value stores where last write wins and everyone can agree on who's last, that's fine, but text/token streams require a notion of transformation that I'm worried Replicache simply glosses over.
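To show what I mean by transformation, here is a minimal sketch for concurrent inserts into a string. It only handles inserts, the `Insert`, `apply` and `transform` names are made up for this example, and it is not taken from Wave, ShareDB or Replicache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: str  # tie-breaker when two sites insert at the same position


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has been applied."""
    shifts = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    if shifts:
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


doc = "helo world"
a = Insert(3, "l", site="a")    # fixes the typo in "helo"
b = Insert(10, "!", site="b")   # appends "!" at the end

via_a = apply(apply(doc, a), transform(b, a))
via_b = apply(apply(doc, b), transform(a, b))
naive = apply(apply(doc, a), b)  # replaying b untransformed after a
```

Both transformed paths give "hello world!", while the naive replay gives "hello worl!d" because b's position was computed against a document that no longer exists.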
Indeed, there can never be one universal solution to this, because the problem is one of specification rather than (only) implementation.
For example, suppose we have an edit/delete conflict, where two clients concurrently interact with the same entity in your data model. In a simple case, we can decide to “resurrect” the affected entity and apply the edit, which is the option that never results in significant data loss and so might be a reasonable behaviour if no user interaction is involved.
Now, what if there were other consequences of deleting that entity? Maybe the client that deleted the entity then created a new entity that would violate some uniqueness constraint if both existed simultaneously. Or maybe it wasn’t the originally deleted entity that would violate that constraint, but some related one that was also deleted implicitly because of a cascade. How should we reconcile these changes, if simply allowing either one to take precedence means discarding data from the other?
At least if all clients are communicating in close to real time, it’s unlikely that any one of them will diverge far from the others before they get resynchronised, so the scope for awkward conflicts is limited. But in general, we might also need to support offline working for extended periods, when multiple clients might come back with longer sequences of potentially conflicting operations, and there’s no general way to resolve that without the intervention of users who can make intelligent decisions about intent, or at least a set of automated rules that makes sense in the context of that specific application. And in the latter case, we’d still probably want to prove that our chosen rules were internally consistent and covered all possible situations, which might not be easy.
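For the simple edit/delete case only, a "resurrect" rule could be sketched like this. The `Entity` shape, the tombstone flag and the `seen_version` parameter are assumptions for illustration, and the sketch deliberately ignores the harder cases above (uniqueness constraints, cascades, long offline sequences).

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Entity:
    fields: Dict[str, str] = field(default_factory=dict)
    version: int = 0        # bumped on every edit
    deleted: bool = False   # tombstone; data is kept so it can be restored


def edit(entity: Entity, changes: Dict[str, str]) -> None:
    entity.fields.update(changes)
    entity.version += 1
    entity.deleted = False  # an edit resurrects a concurrently deleted entity


def delete(entity: Entity, seen_version: int) -> None:
    # Ignore the delete if the deleting client had not seen the latest edit.
    if entity.version <= seen_version:
        entity.deleted = True


def run(order: str) -> Entity:
    """Apply one concurrent edit and one concurrent delete in either order."""
    task = Entity({"title": "Write report"})
    if order == "edit-first":
        edit(task, {"title": "Write Q3 report"})
        delete(task, seen_version=0)
    else:
        delete(task, seen_version=0)
        edit(task, {"title": "Write Q3 report"})
    return task


edit_first = run("edit-first")
delete_first = run("delete-first")
```

Either arrival order leaves the entity alive with the edit applied, so no data is lost. Whether that is the right outcome is still a question for the application's specification.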
> How should we reconcile these changes, if simply allowing either one to take precedence means discarding data from the other?
Exactly. This is why Replicache expresses change as high-level operations, like createPost or deletePerson that are application-defined.
Replicache doesn't try to automatically merge the effects of concurrent mutations, it just replays the mutations in the same order on each client. It's up to the implementation of the mutation to decide what the correct result is, and that answer can and often does change when the mutation is replayed on top of different states.
Because Replicache mutations are atomic, applications can also enforce invariants such as uniqueness or even more complex app-level invariants.
Imagine, for example, a calendaring application. An application built with Replicache can enforce the invariant that a room is only booked by one event in one time slice even under concurrent edits, just using normal programmatic validation. It's hard to do this kind of thing with CRDTs or other approaches to automatic merging because the data model knows nothing about the application's constraints.
It's a pretty simple-minded system, actually, but our experience is that it is a nice way to think about these problems and provides good results for many types of data, in particular structured data.
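The calendar example can be sketched roughly as follows. This is a toy model of "replay named mutations in one agreed order", not the Replicache API: `book_room`, `replay` and the dict-based state are hypothetical names chosen for the illustration.

```python
from typing import Callable, Dict, List, Tuple

# (room, slot) -> event name
Bookings = Dict[Tuple[str, str], str]
Mutation = Callable[[Bookings], None]


def book_room(room: str, slot: str, event: str) -> Mutation:
    """Build an application-defined mutation (hypothetical helper)."""
    def mutate(state: Bookings) -> None:
        # Plain programmatic validation, re-checked on every replay
        # against whatever state the mutation lands on.
        if (room, slot) in state:
            return  # room already taken; this booking becomes a no-op
        state[(room, slot)] = event
    return mutate


def replay(mutations: List[Mutation]) -> Bookings:
    """Run mutations from an empty state in the given order."""
    state: Bookings = {}
    for mutate in mutations:
        mutate(state)
    return state


alice = book_room("room-1", "09:00", "Design review")
bob = book_room("room-1", "09:00", "Standup")

# Bob's client applied his booking optimistically and saw it succeed.
bob_optimistic = replay([bob])
# The server ordered Alice first; every client replays that same order.
canonical = replay([alice, bob])
```

Bob's optimistic result is replaced once the canonical order arrives, and the one-booking-per-slot invariant holds on every client because the check runs again during replay.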
I’m not sure if you are understanding that when Replicache rebases operations locally it actually re-executes code which can have arbitrary effects. This design yields a lot of flexibility to preserve intent: the function can look at current state of world and decide to do something different.
Now, it is true that OT is considered the gold standard for certain kinds of collaborative editing, in particular unstructured text. But CRDTs are quickly catching up and I believe that any CRDT should by definition be implementable on top of Replicache.
It's also quite a lot easier to implement a Replicache backend than an OT backend.
I don’t know enough to comment on replicache, but you can also do OT on top of an operation based CRDT. For diamond types we’re making it support both - so if you want to, applications can do OT (which is simple, small, and fast) to talk to a server (or local proxy process), and then that process can do p2p server to server replication using CRDTs.
The result is we need way less complexity in the browser, or in applications. And still get all the advantages crdts bring - namely, no need for a central server acting as the source of truth.
I think for many customers the authoritative server is an advantage. It's useful in SaaS apps for the server to be able to override the clients, for all kinds of reasons -- antiabuse, authorization, extra validation rules, or just fixing bugs.
Yes, I completely agree. And I think we want both:
- A fast and well written CRDT that works in p2p networks should also work great for server-to-server replication in a data center (or across data centers).
- OT algorithms designed to work with centralized servers are simple, efficient, easy to code up and easy to work with. And they provide a really nice API for local applications to do IPC. CRDT libraries can expose OT endpoints just fine.
I'm still not 100% sure about what the best approach is in the P2P case. Embedding (/ linking) a CRDT library into every application would also work fine, but it's complicated to get everything working across languages. And harder to update. The other option is running a single system / application wide CRDT-like service which manages credentials, that applications talk to like LSP / D-Bus. In that case, applications can just talk OT (which is much simpler).
How one wants to see them could depend; that's why I recommend using an RDBMS. One can "play back" transactions using different orders and filters. If teams get confused or accidentally "step on each others toes", then one may need to review different scenarios to see what was intended by two or more parties.
You could build this with couchdb multi master regional servers and pouchdb on the client and have full consistency with the replication both to clients and servers as well as conflict resolution (in case of collision) done for you.
This route seems like a lot of extra work for pretty similar functionality.
So far I've managed to keep the state in my side-project in sync with Websockets and Redux, Replicache sounds like the kind of solution I'd love to use, but boy the pricing makes it impossible to even consider.
I don't have any plans to use Replicache, but I went and looked at the pricing and I was kind of struck by your comment. Looking at it, it seems pretty fair to me? Especially under 10k MAC's. It seems like a flat rate / month is pretty nice too. Plus, it's free for all non-commercial use.
Am I wildly off base here? Is it just that middle tier jump to over 10k that is a no go?
Again, I don't have a horse in this race or even my own startup, just trying to understand if my own judgement is way off.
I would quickly be in the $500/mo tier, and that would be a significant cost to handle since I don't really make that kind of profit yet. But I have to agree anything beyond 10K is very reasonable given the features.
I just kind of wish they had a more affordable bracket between 500 and 10K, but they probably have reasons not to.
Very nice writeup! However, the example did not fully work for me. I could perform CRUD on a single tab, but opening the list in multiple tabs did not replicate the list or actions. Seeing this in the console:
[Error] Could not connect to the server.
[Error] Fetch API cannot load https://damp-fire-554.fly.dev/replicache-pull?list_id=kx1I-gXPWwOxU9teRUJ_c due to access control checks.
[Error] Failed to load resource: Could not connect to the server. (replicache-pull, line 0)
It's a nice summary of how to use these technologies, but considering it states avoiding vendor lock-in is a goal, I was surprised to see it using fly.io and a managed cockroachDB.
It didn't actually use CockroachDB, they ended up using Postgres + Read Replicas.
I work on Fly.io, but there's very little vendor lock in here. We can't afford to lock people in, we're too small. We need to make their existing stuff work with zero friction.
Great article! Is there something similar to Replicache that is targeted towards simple multiplayer games? I'm building a multiplayer version of a clicker game like Universal Paperclips[0] and dealing with problems similar to the ones Replicache tries to solve.
> Dealing with a global database brings in much complexity that is not essential to the subject matter of this article, which will wait for another piece.
Excellent write-up. It would be great to know why CockroachDB failed your needs.
Definitely interested in understanding end user benefit of the distributed database given one of purposes of library is to hide write latency and there needs to be coordination for every write.
A 225K gzipped .wasm file download for a client-side state management and persistence layer is not great. It is competitive with some similar solutions, but still a lot for any web app's performance budget.
Ah, when I brotli compress it locally, it's 188 (which is where I remembered 100 from) but I guess it uses different settings than the auto-brotli in Vercel.
web apps are sorely lacking a core storage technology
whoever gets there first may not make a lot of money but they'll be more influential than react (because the schema design will penetrate native dev as well)