Back in 2009, we wrote our real-time collaboration component using Operational Transformation (OT). But when CRDTs started to gain the spotlight, these are the reasons why we decided they didn't warrant a rewrite of our stack:
#1 Memory issues with tombstones. Marking characters as deleted, rather than removing them, carries the cost of maintaining those tombstones throughout the session.
#2 CRDTs were perfect for plain strings and arbitrary key/value pairs. But when it comes to schematic JSON that carries semantic value, CRDTs were an added overhead. For example, try making a collaborative HTML editor with support for table operations. It is very likely you will end up with a table structure that is invalid to render. In OT, analysing, modifying, or cancelling ops was much easier.
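A toy sketch (illustrative only, not any real CRDT API) of the kind of invalid merge described above: two individually valid concurrent table ops whose blind combination leaves an unrenderable structure.

```python
# Two users edit the same table concurrently. Each op is valid on its own,
# but a merge that doesn't inspect the ops produces a broken table.
table = {"rows": 2, "merged": []}

def merge_cells(t):
    # user A merges the two cells of row 1
    t["merged"].append({"row": 1, "span": 2})

def delete_row(t):
    # user B deletes row 1 concurrently
    t["rows"] -= 1

merge_cells(table)
delete_row(table)
# The merged-cell record now points at a row that no longer exists,
# so the table cannot be rendered without a custom repair step.
print(table)
```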
#3 Since the software is primarily a cloud document editor, a central server is necessary anyway. So why not use the server for efficient version management and operation sequencing as well? Why prefer CRDTs, whose bulk of complexity comes from eliminating the need for a central system?
Such practical reasons have kept us from venturing into CRDTs. As of now, common editing platforms like Google Docs, Zoho Writer, CKEditor, ProseMirror, Quill, and CodeMirror all use OT instead of CRDTs for collaborative editing.
Or at least that's the lesson I took from all the xi blogs I've read.
Patches are always a little bit garbage, and can cause extra problems with 3-way merges because of it.
At the same time, changes at the language level would require changes at the editor/VC level too.
As an example: a MIX-type syntax could separate classes and class methods into different files, and combine them back again. This way, moving a class method around in a class won't be seen as removing a method and creating a new one: the file gives the specific method an identity, and a way of differentiating between code movement and code alteration.
- Is there a way to materialize all of the past up to some point, throwing out history in the process, so as to reduce the overhead of CRDT?
- Some CRDT algorithms (really, representations of operations) are better at modeling a user's intent -- rather than "insert 'H' at 101, insert 'e' at 102, ..." and so on, it's "insert 'Hello' between ...". As I'm not familiar with OT, I wonder whether this begins to approach OT... In any case, some of the better CRDT algorithms for text editing tend to require more overhead, which is what prompted the preceding thought.
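As a rough illustration of the "insert 'Hello' between ..." idea, here is a hypothetical run-length encoded insert op, loosely in the spirit of RGA-style CRDTs; the class and field names are invented for the sketch.

```python
from dataclasses import dataclass

@dataclass
class InsertOp:
    origin_left: tuple  # ID of the character we insert after
    client: int         # author of the op
    clock: int          # per-client counter; (client, clock) IDs the first char
    text: str           # the whole run is one op, not one op per character

    def char_id(self, offset):
        # every character in the run is still individually addressable,
        # without storing per-character metadata
        return (self.client, self.clock + offset)

# "insert 'Hello' after the document start" as a single op
op = InsertOp(origin_left=("root", 0), client=1, clock=0, text="Hello")
print(op.char_id(4))  # ID of the final 'o': (1, 4)
```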
- I'm much more interested in CRDTs for distributed relational databases, where it's much easier to reason about intent. "Relational" really means "has a well-defined, enforced schema", while "text" means "anything goes" -- it's necessarily easier to reason about the former than the latter! Even so, a set of related updates to a relational database may be the complex result of a simpler abstraction (a simpler operation that the user intended), and eventual consistency semantics may ruin that result. One way to deal with that is to centralize certain kinds of operations, but those had better be less essential to business continuity.
- I'm with you on point #1. Obviously it could be a source of good research, but it probably also depends on policy. How many undos do you want to preserve? Even a one-headed VSCode instance only saves so many undos. You could probably hybridize CRDTs with a consensus algorithm like Raft to perform compactions.
And thanks for the link!
With respect to #3, one reason to prefer CRDTs over OT (even if a central server is required for other stuff) is that making the collaboration infra P2P can save a lot of resources if most of your collab is happening between a few individuals at a time. Not very necessary at a small scale, but if you have 10,000 small teams collab'ing, the cost savings you get from off-loading RTC to the end users can definitely add up.
The downside is the complexity. Now there are more things that can go wrong: the OT algorithm can fail (either due to bugs or because of corrupted data), and now also the distributed consensus algorithm can fail as well. When you have this level of complexity, going to CRDT is pretty appealing as an alternative.
That's not true; you don't need to keep them throughout the session. You can drop them as soon as you synchronize with the other clients, or with the server, which can then synchronize with the clients. That's what the conflict-free nature of CRDTs allows you to do. The only fundamental requirement is to keep track of changes while you are offline or out of sync, until they are sent to the others, and even then you are supposed to merge those changes into fewer changes, keeping disk and memory use down to a minimum.
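One conservative version of this idea, sketched with assumed names: track which peers have acknowledged a delete, and drop the tombstone only once every known peer has done so, since no future op can then reference the dead character.

```python
class TombstoneGC:
    def __init__(self, peers):
        self.peers = set(peers)
        self.pending = {}  # tombstone id -> set of peers that have acked it

    def record_delete(self, tid):
        self.pending[tid] = set()

    def ack(self, tid, peer):
        self.pending[tid].add(peer)
        if self.pending[tid] >= self.peers:
            # every peer has seen the delete: safe to drop the tombstone
            del self.pending[tid]
            return True
        return False

gc_state = TombstoneGC(peers={"a", "b"})
gc_state.record_delete("x1")
gc_state.ack("x1", "a")           # still needed: "b" hasn't seen the delete
print(gc_state.ack("x1", "b"))    # all peers acked, tombstone dropped
```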
> CRDTs were perfect for plain strings and arbitrary key/value pairs. But when it comes to schematic JSON that has semantic value CRDT was an added overhead.
Also not true. They were never perfect for plain strings (they are even a poor semantic fit there), but they are perfect for sets, counters, and numbers, where basic commutative math is enough, for a few other things, and for anything that combines those primitives into more complex ones. That is especially true for schematic structures and tables.
You should really try implementing CRDTs, especially OP-based, they change the way you think about keeping shared state in distributed systems.
> Since the software is primarily a cloud document editor, a central server is necessary even otherwise. So why not use the server for efficient version management and operation sequencing as well? Why choose CRDTs whose bulk of complexity comes from eliminating the need of a central system?
There is zero complexity in CRDTs that comes from eliminating the need for a central system; that's a completely different problem, orthogonal to CRDTs. Central systems are actually where CRDTs are used the most.
Say 100 nodes edit a doc. To drop a tombstone, without a central server we need 99 acks. Even with a server, the server keeps an ever-growing list of ops unless it uses some garbage collection techniques. And as stated by OP those techniques have their own problems. Remember, OP relies on V8's hidden classes for deriving an optimal representation without eliminating tombstones. Servers though need not necessarily hold CRDTs in JS. It might be a different VM altogether.
> anything that combines those primitives into more complex ones, especially true for schematic structures, tables.
I'm not implying CRDTs are only for simple structures. Automerge has support for tables too. I'm talking about structures with semantic meaning. In database tables the ops are always inserting/deleting columns/rows and updating cells. In HTML tables, that extends to cell merging, cell splitting, inner tables, and all sorts of valid operations. Merging these operations without inspecting the op and writing a custom modifier will certainly end up in an invalid table structure.
> Central systems is where CRDTs are used the most actually.
I'm assuming the central system referred here is the database resilience systems using CRDTs to merge. Database syncing without master/slave is exactly the kind of problem best solved with CRDTs. The very point is to eliminate the need for a central point of failure.
And another note: Yjs completely supports the OT rich-text type (i.e. the delta format). This is how the Quill editor was made collaborative.
So Yjs is at least as powerful as OT. If you want, you can still base your application around OT deltas, but Yjs is IMO much easier to use.
Sounds interesting. Will give Yjs a try!
Here's how it could be done (very roughly):
When a client makes a modification, it creates an op representing that modification and includes a list of server identifiers to send the op to. It then queues the op. Once the op is sent and ACKed by at least one of the servers, the op is dropped from the queue.
Servers receiving the op synchronize it with each other, ACKing each other and removing ACKed servers from the list. They also synchronize with the clients, sending the op to each client and asking clients to ACK to all servers. Obviously there are multiple ways to do it, and it can be almost as efficient as theoretically possible, but of course you can't avoid some metadata for each client associated with a particular document, even though it doesn't have to be kept in server memory; it could be stored on disk.
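The queue-until-ACKed part of this scheme could look roughly like the following; class and field names are invented for the sketch.

```python
from collections import deque

class Client:
    def __init__(self, servers):
        self.servers = servers
        self.queue = deque()  # ops not yet ACKed by any server

    def modify(self, op):
        # the op records which servers it must eventually reach
        op["targets"] = set(self.servers)
        self.queue.append(op)

    def on_ack(self, op_id, server):
        # one ACK is enough for the client to forget the op locally;
        # the servers take over propagating it to the remaining targets
        for op in list(self.queue):
            if op["id"] == op_id:
                self.queue.remove(op)

c = Client(servers=["s1", "s2"])
c.modify({"id": 1, "kind": "insert", "pos": 0, "text": "a"})
c.on_ack(1, "s1")
print(len(c.queue))  # 0: op dropped after the first server ACK
```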
> I'm talking about structures with semantic meaning.
Semantic meaning generally doesn't map to any data structure directly, CRDTs are not an exception here either.
Unfortunately, it's a lot more complicated than this.
First, you can't wait until just one peer has ACKed -- you need to wait until all of them have. Any peer that hasn't ACKed could still send an op based on the one you want to delete, and then you wouldn't have enough information to merge the new op.
Second, you can't just wait until all peers have ACKed an op, you need to wait until no new ops depend on that op. An op could depend on an op even though it has been ACKed by everyone.
Third, a peer could go offline for a while, and then come back online. But if you are waiting for all peers to ACK an op, you might never get a situation where all peers are online at the same time.
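A tiny illustration of the failure mode behind these points: if a tombstone is dropped too early, a late op from an offline peer may anchor on an ID the document no longer knows, and there is no longer enough information to merge it.

```python
# Live character IDs in the document (illustrative representation)
doc_ids = {("a", 0), ("a", 1)}

# Character ("a", 1) is deleted AND its tombstone is dropped prematurely
doc_ids.discard(("a", 1))

# A peer that was offline sends an op anchored on the dropped character
late_op = {"insert": "x", "after": ("a", 1)}

# Nowhere to anchor the insert: the merge cannot be completed
print(late_op["after"] in doc_ids)
```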
Lately there has been other work pursuing those compressed representations: Chronofold, I think Martin Kleppmann's automerge has some memory-efficiency work, and there are others.
But at least in the context of xi, this is not where things went wrong. As I've written about (and the author was kind enough to link), it's because CRDT merges aren't a good fit for the problems a code editor is trying to solve, particularly when the "collaborators" are automated processes such as language servers.
In human collaborative editing, it's important to preserve text that's being entered, even in the face of conflict. But when a peer is an automated service, it's much better to drop the edit on the floor and recompute. I'm simplifying here, as it depends on the service - some are history-insensitive, some are sensitive to a small window of history (in the case of automatically inserting indentation, etc), and (speculative) other services may be sensitive to more history.
In addition, the CRDT constrains the data model considerably. In other words, it's unfortunately not a clean abstraction where you can easily add higher-level layers on top of the CRDT; you always have to design those with the CRDT in mind (i.e., everything still has to be a monotonic semilattice).
So, as with everything else, it's a tradeoff, and it's a question of weighing the pros and cons. I'm glad to see work being done to improve CRDT, but even with a very efficient representation and solid algorithms, the problems with CRDT would be enough for me not to use them in a code editor.
Your GitHub comment about "Why CRDT didn't work out as well for collaborative editing in xi-editor" was sent to me several times as an argument not to use CRDTs. I agree with you that CRDTs might be too complex for the way xi-editor used them (although I loved the idea, and appreciate that you tried it).
But the title of the Hackernews post tells a very different story. So many people misunderstood WHY CRDTs didn't work out well for the xi-editor.
At the very least, my article shows that CRDTs are well suited for shared editing. You brought up some valid points why CRDTs are not well suited for having different editor components communicate with each other (language server, indentation, syntax highlighting, ..).
Still, I think there is a lot of merit in using CRDTs as a data model for a code editor (or any kind of editor). Not for editor components concurrently modifying the editor model, but just as a collaboration-aware model.
• Marijn considered a CRDT as a data model for CodeMirror 6 because positions in collaborative software can be better expressed. A position in the Yjs model is simply a reference to the unique id of the character.
• Even without collaboration, Yjs serves as a highly efficient selective Undo/Redo manager. Each item on the Undo/Redo stack consumes just a couple of bytes. Furthermore, most existing implementations don't support selective Undo/Redo. This comes for free when using Yjs as a data model.
• Some components can work in the background and annotate the CRDT model (not manipulate it). For example, a code analyzer that runs on a remote computer could annotate a function and notify the user about potential problems. The position of the annotation will still be valid if users modify the model concurrently in a distributed environment.
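The point about ID-based positions in the first bullet can be sketched like this (an illustrative data layout, not the actual Yjs structures): an index-based cursor shifts under a concurrent insert, while an ID-based one keeps resolving to the same character.

```python
# Document as a list of (id, char) pairs, where id = (client, clock)
chars = [(("a", 0), "H"), (("a", 1), "i")]

cursor_id = ("a", 1)              # cursor anchored to 'i' by its ID
chars.insert(0, (("b", 0), "!"))  # concurrent insert at the front

# The ID still resolves to 'i'; a plain index 1 would now point at 'H'
index = next(i for i, (cid, _) in enumerate(chars) if cid == cursor_id)
print(index)  # 2
```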
Regarding position and undo, these are problems that can also be solved just fine using OT techniques, and are (imho) simpler when there is a central authority that can order all revisions into a globally consistent sequence.
I think it's inevitable that people are going to misunderstand arguments, given the complexity of the underlying space and the paucity of good learning resources. So thanks for your writeup, it adds to the discussion.
After playing with a simple implementation of Martin Kleppmann's newest automerge work, I realize I was wrong to doubt CRDTs. I'm seeing about 6M ops/second in a prototype text engine in Rust. I have much more to say on this - probably a blog post's worth. But I think CRDTs are the future for many workloads, and I have a strong sense of despair seeing how much time I've wasted investing in approaches which won't feature strongly in the future.
Automerge is really good.
Your comment makes me sad on several levels.
Firstly, ShareJS and its OT types were the first open framework for building collaborative applications on the web. Your work inspired me to work on shared editing - I just wanted it to work over WebRTC.
The idea of distributed applications on the web just popped up a couple of years ago. WebRTC didn't even exist when you started ShareJS. Even Websockets were still a bit experimental. OT was the right technology at the time.
You can compare Automerge's current performance branch with Yjs' current implementation in . To be fair, their implementation is not finished. If I understood correctly, they want to load the compressed format directly into memory. I played with this idea a few years ago and represented the Yjs model in an ArrayBuffer. This approach will improve load time, but the performance will be intolerable in other aspects (e.g. when applying document updates, transforming the document to a String/JSON, or when computing diffs). The performance of Yjs is the result of more than five years of research. I still have a couple of ideas to improve performance significantly. That said, my article clearly shows that it is definitely good enough now.
Then, working backward from language specific modes, I could imagine a way to generalize this to free form text, by negotiating the split points with the authority (or not at all, in the case of free text being edited by one client).
Though I guess for now just making these completely synchronous across the whole buffer, as you have, is fine in most cases, and already a significant improvement vs some editors (looking at you, beloved Emacs).
† i.e. for js, you could imagine a tree of statements/blocks containing expressions (maybe incl. operator precedence); for Clojure you could imagine a plain syntax tree, for JSON you could imagine it being the literal JSON data (plus whitespace metadata).
I think you are not really talking about CRDTs here. CRDTs are a pretty good fit for distributed systems that share state, which is what collaborative editing is underneath. It's just that fully automatic semantic conflict resolution is incompatible with human collaborative editing, which is pretty obvious, but for some reason plenty of research goes into trying to do just that.
I find the complexity arguments against OTs to be far, far overblown and mostly academic, and that OT is, for me, much easier to understand.
The first of the two big critiques against OT is transform explosion: with N operation types you could need N^2 transform functions. Except that not every operation pair interacts. I usually see just a couple of categories of operations (text, key/value, tree, and list), and they all only need transforms within a category.
The second critique is around sequencing in a distributed system, but I've also never seen a production system that doesn't have a central coordinating service. With a star topology, OT sequencing becomes simple. You don't even need a Lamport clock; you can get away with a simple counter. Buffering at the clients simplifies things even more.
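A minimal sketch of that star-topology setup, with invented names: the server orders ops with a plain revision counter and transforms each incoming op against anything committed after the revision the client had seen. Only the classic insert-vs-insert transform is shown, with the server-committed op winning position ties.

```python
def transform_insert(op, against):
    # shift our position if the other insert landed at or before it
    pos = op["pos"]
    if against["pos"] <= pos:
        pos += len(against["text"])
    return {"pos": pos, "text": op["text"]}

server_log = []  # ops in the order the server committed them

def server_receive(op, client_rev):
    # transform against everything the client hadn't seen, then commit
    for committed in server_log[client_rev:]:
        op = transform_insert(op, committed)
    server_log.append(op)
    return op

server_receive({"pos": 0, "text": "AB"}, client_rev=0)
# A concurrent insert from a client that was still at revision 0:
op = server_receive({"pos": 0, "text": "x"}, client_rev=0)
print(op["pos"])  # 2: shifted past the already-committed "AB"
```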
There's a great series of posts on collaborative editing by Marijn Haverbeke, the author of CodeMirror and ProseMirror, that are designed with OT in mind: https://marijnhaverbeke.nl/blog/collaborative-editing-cm.htm...
Real Differences between OT and CRDT under a General Transformation Framework for Consistency Maintenance in Co-Editors
Proceedings of the ACM on Human-Computer Interaction 2020
Chengzheng Sun, David Sun, Agustina Ng, Weiwei Cai, Bryden Cho
It’s an evolutionary series, here’s the rest I believe: https://arxiv.org/search/cs?query=Sun%2C+Chengzheng&searchty...
>Most CRDTs assign a unique ID to every character that was ever created in the document. In order to ensure that documents can always converge, the CRDT model preserves this metadata even when characters are deleted.
The entirety of the C code in my checkout of the Linux kernel is made out of 567464661 characters (counted with wc on all .c and .h files). Assuming a naive algorithm that assigns a 128-bit UUID to every single one of these characters, you have a little over 8GB of RAM for the IDs. Of course you also have to consider the deleted characters (and storing the characters themselves, duh). That's assuming that you want to keep track of the entire source code forever, however; surely in practice you can be massively more efficient by "freezing" the changes past a certain reasonable delay.
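Checking the back-of-the-envelope figure above:

```python
# 128-bit UUID = 16 bytes per character
chars = 567_464_661
bytes_for_ids = chars * 16
print(bytes_for_ids / 2**30)  # ~8.46 GiB, matching "a little over 8GB"
```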
So yeah, that does seem pretty memory-intensive, but in the age of Electron-based code editors and docker containers that doesn't seem all that absurd to me, tbh. My Signal desktop client currently uses up almost 300MB of RAM to display plain text; I find that a lot less reasonable, quite frankly.
If you loaded the Yjs document containing the editing history of the Linux kernel into memory, you would use about 13.8 GB of memory. Of course, you wouldn't write the complete Linux kernel in a single file. As of 2011, the project consisted of ~37,000 files. If you represent each file as a separate Yjs document, you would use, on average, just a few kilobytes to load a single file.
The editing history of the Linux kernel is a very interesting benchmark resource. Maybe I will add it to crdt-benchmarks.
Linux is definitely not the largest git repo ever created. The big corporate monorepos are definitely larger; I know MS has moved Windows to git, and claims it to be the largest ever created (~300GB as of 2017, per ). Google and Facebook both eschew git, though.
Finding data on the largest open repos is more difficult. The largest classes of projects are those that develop in monorepos that implement critical operating system functionality, browser engines, and compiler implementations. The shortlist I'd make comes out to these projects (in no particular order):
I haven't finished downloading all of these repos (my disk is begging me to stop right now), but it looks like Linux is larger than gecko-dev by a very thin margin (so a putative gecko-dev that included comm-central with its CVS history as well would easily outstrip Linux), and Chromium seems to be an order of magnitude over both.
 To be clear here, I'm mostly thinking in terms of primarily textual repositories. Repositories with large binary assets are clearly not relevant for your means.
 https://devblogs.microsoft.com/bharry/the-largest-git-repo-o..., although https://news.ycombinator.com/item?id=14411724 claims that the 300 GB measures the size of the checked-out directory on disk, not the putative size of a full .git folder.
 I'm including both kernel roles as well as key userspace roles. Qt and Gnome would both be on my list of putative largest repos were they monorepos, but they appear to use many small repos instead.
Also, would establishing a hierarchy of OT groups (or some division of responsibility over the dataset) help with scaling?
It just seems like it's easier to make OT work in practice and that there might be practical techniques and compromises to overcome its limitations.
Being able to sync better through the day means reducing multiple checks on families, and adding new contacts that could be checked by others nearby. Stuff like that.
Thanks for sharing this app. I haven't seen any data showing these user-run apps are useful. Even if they got enough users to run them, the noise is too high to actually do anything with the results.
Although I’m optimistic that someone will figure it out and publish some papers on it.
For anybody wondering.