Are CRDTs suitable for shared editing? (kevinjahns.de)



I’m part of the team that makes Zoho Writer (Google Docs alternative) - https://writer.zoho.com

Back in 2009, we wrote our real-time collaboration component using Operational Transformation (OT). But when CRDTs started to gain the spotlight, these were the reasons a rewrite of our stack didn't seem warranted:

#1 Memory issues with tombstones. Marking characters as deleted, rather than removing them, carries the cost of maintaining those tombstones throughout the session.

#2 CRDTs were perfect for plain strings and arbitrary key/value pairs. But when it came to schematic JSON with semantic value, CRDTs were an added overhead. For example, try making a collaborative HTML editor with support for table operations. It is very likely you will end up with a table structure that is invalid to render. In OT, analysing, modifying, or cancelling ops was much easier.

#3 Since the software is primarily a cloud document editor, a central server is necessary anyway. So why not use the server for efficient version management and operation sequencing as well? Why prefer CRDTs, whose bulk of complexity comes from eliminating the need for a central system?

Such practical reasons have kept us from venturing into CRDTs. As of now, common editing platforms like Google Docs, Zoho Writer, CKEditor, ProseMirror, Quill, and CodeMirror all work with OT instead of CRDTs for collaborative editing.


Also, if the monoids used in a CRDT implementation do not align with the mental model users have for what they're doing (hint: they do not), then the result will be artifacts the users don't like. Single user text editing is amenable to using ropes and monoids internally because none of that leaks into the UI/UX, but multi-user text editing is not amenable to representation as monoidal interactions because in the multi-user case the monoidal interactions will become evident and confusing.

Or at least that's the lesson I took from all the xi blogs I've read.
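
To make the algebra concrete, here is a minimal TypeScript sketch of the monoid view of edits (all names illustrative, not from any particular editor); the point is that the algebra stays hidden with one user but leaks out with several:

    // An associative combine operation with an identity element.
    interface Monoid<T> {
      empty: T;
      combine(a: T, b: T): T;
    }

    // Single-user editing composes cleanly: applying edit A then edit B
    // is just combine(A, B), and none of this leaks into the UI.
    const editLog: Monoid<string[]> = {
      empty: [],
      combine: (a, b) => [...a, ...b], // sequential composition of edits
    };

    // With multiple users, combine(A, B) and combine(B, A) must converge
    // to the same document, and whichever order the merge picks is
    // exactly the artifact users notice and dislike.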


This is a problem we need to solve with code editors and version control as well.

Patches are always a little bit garbage, and can cause extra problems with 3 way merges because of it.


I feel this is probably a language-level problem, and solving it only at the editor/VC level will be complicated.

At the same time, changes to the lang level would require changes at editor/VC level too.

As an example: a MIX-type syntax could separate classes and class methods into different files, and combine them back again. This way, moving a class method around in a class won't be seen as removing a method and creating a new one: the file gives the specific method an identity, and a means of differentiating between code movement and code alteration.


I think it's important to note that "they do not yet". CRDTs are an active area of research, relatively young, and it is not unreasonable to expect that new datastructures which DO align to users' mental models will appear in the not-too-distant future.


CRDT is very well understood. The basics of the theory are utterly trivial. The only question is whether we can design new monoids that line up better with humans' understanding of their own intent -- of this I am skeptical, because our mental models of what we're doing are not at all monoidal. Still, because CRDTs are so elegant and so simple, I would be ecstatic if we do indeed find/design such monoids.


What do you think of Martin Kleppmann's research, e.g.: https://martin.kleppmann.com/2020/07/06/crdt-hard-parts-hydr...


I'll watch it and let you know. Sounds very interesting!


I'm part of the way through. Here are some thoughts.

- Is there a way to materialize all of the past up to some point, throwing out history in the process, so as to reduce the overhead of CRDT?

- Some CRDT algorithms (really, representations of operations) are better at modeling a user's intent: rather than "insert 'H' at 101, insert 'e' at 102, ..." and so on, it's "insert 'Hello' between ...". As I'm not familiar with OT, I wonder whether this begins to approach OT... In any case, the better CRDT algorithms for text editing tend to require more overhead, hence the preceding thought. (See the sketch after this list.)

- I'm much more interested in CRDTs for distributed relational databases, where it's much easier to reason about intent. "Relational" really means "has a well defined, enforced schema", while "text" means "anything goes" -- it's necessarily easier to reason about the former than the latter! Even so, a set of related updates to a relational database may be the complex result of a simpler abstraction (a simpler operation that the user intended), and eventual consistency semantics may ruin that result. One way to deal with that is to centralize certain kinds of operations, but those had better be less essential to business continuity.
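
On that second point, a rough TypeScript sketch (not any particular library's format) of the difference between per-character ops and run-length "span" ops:

    // Naive: one op per character, each with its own permanent id.
    type CharOp = { id: string; after: string; char: string };

    // Span-based: one op per contiguous insertion; ids for individual
    // characters are derived as (opId, offset) when needed.
    type SpanOp = { id: string; after: string; text: string };

    // "insert 'Hello' after character x3" is a single SpanOp...
    const span: SpanOp = { id: "op7", after: "x3", text: "Hello" };

    // ...instead of five CharOps, which is where the memory savings of
    // compressed representations (and the closer match to intent) come from.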


thanks for those thoughts!

- I'm with you on point #1. It could obviously be a source of good research, but it probably also depends on policy. How many undos do you want to preserve? Even a one-headed VSCode instance only saves so many undos. You could probably hybridize CRDTs with a consensus algorithm like Raft to perform compactions.


Right, if it's about undo history, screw that, at some point you don't care anymore -- you might build a sort of VCS into the text editor so you can keep track of changes in a less granular way. If it's about knowing when you have all the history before some point in time, then that's doable.

And thanks for the link!


I think the main impetus for this post is to assert that #1 and #2 aren't much of a problem for Yjs, as they are for the arguably more well-known CRDT lib for JS, Automerge.

With respect to #3, one reason to prefer CRDTs over OT (even if a central server is required for other stuff) is that making the collaboration infra P2P can save a lot of resources if most of your collab is happening between a few individuals at a time. Not very necessary at a small scale, but if you have 10,000 small teams collab'ing, the cost savings you get from off-loading RTC to the end users can definitely add up.


OT doesn't require a central server, just a tie-breaker. You could do leader election over WebRTC using Raft, offload OT to the client side as well, have that node push updates to the server for durable saving every x seconds, and shift RTC off your infra.


Right, the idea that OT requires a "central server" is a bit of an oversimplification. It requires an authority that can serialize concurrent revisions into a globally consistent sequence. By far the easiest way to achieve that is an actual central server, but distributed algorithms have also been published for a long time, for example SOCT2 which uses a state vector.
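
For intuition, here is a rough TypeScript sketch of a state vector (names illustrative), the structure SOCT2-style algorithms use to decide whether two revisions are ordered or concurrent:

    // siteId -> number of ops seen from that site
    type StateVector = Map<string, number>;

    // x dominates y if x has seen at least as much as y from every site.
    function dominates(x: StateVector, y: StateVector): boolean {
      return [...y.entries()].every(([site, n]) => (x.get(site) ?? 0) >= n);
    }

    // Two revisions are concurrent when neither dominates the other;
    // those are the ones that must be transformed against each other.
    function isConcurrent(a: StateVector, b: StateVector): boolean {
      return !dominates(a, b) && !dominates(b, a);
    }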

The downside is the complexity. Now there are more things that can go wrong: the OT algorithm can fail (either due to bugs or because of corrupted data), and now also the distributed consensus algorithm can fail as well. When you have this level of complexity, going to CRDT is pretty appealing as an alternative.


> Memory issues with tombstones. Marking as deletion has a cost of maintaining them throughout the session.

That's not true: you don't need to keep them throughout the session. You can drop them as soon as you synchronize with other clients or with the server (which can then synchronize with the clients); that's what the conflict-free nature of CRDTs allows you to do. The only fundamental requirement is to keep track of changes while you are offline or out of sync, until they are sent to others, and even then you are supposed to merge those changes into fewer changes, keeping disk and memory use to a minimum.

> CRDTs were perfect for plain strings and arbitrary key/value pairs. But when it comes to schematic JSON that has semantic value CRDT was an added overhead.

Also not true. They were never perfect for plain strings (they're not even semantically compatible with them), but they are perfect for sets, counters, and numbers, where basic commutative math is enough, plus a few other things, and for anything that combines those primitives into more complex ones. That is especially true for schematic structures like tables.
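
For example, a counter is the textbook case where commutative math is enough; a minimal state-based PN-Counter sketch in TypeScript:

    // Per-replica increment and decrement totals.
    type PNCounter = {
      incs: Map<string, number>;
      decs: Map<string, number>;
    };

    function value(c: PNCounter): number {
      const sum = (m: Map<string, number>) =>
        [...m.values()].reduce((a, b) => a + b, 0);
      return sum(c.incs) - sum(c.decs);
    }

    // Element-wise max is associative, commutative, and idempotent,
    // so merges in any order converge to the same value.
    function merge(a: PNCounter, b: PNCounter): PNCounter {
      const mergeMap = (x: Map<string, number>, y: Map<string, number>) => {
        const out = new Map(x);
        for (const [k, v] of y) out.set(k, Math.max(out.get(k) ?? 0, v));
        return out;
      };
      return { incs: mergeMap(a.incs, b.incs), decs: mergeMap(a.decs, b.decs) };
    }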

You should really try implementing CRDTs, especially OP-based, they change the way you think about keeping shared state in distributed systems.

> Since the software is primarily a cloud document editor, a central server is necessary anyway. So why not use the server for efficient version management and operation sequencing as well? Why prefer CRDTs, whose bulk of complexity comes from eliminating the need for a central system?

There is zero complexity in CRDTs that comes from eliminating the need for a central system; that's a completely different problem, orthogonal to CRDTs. Central systems are actually where CRDTs are used the most.


> You can drop them as soon as you either synchronize with other clients or with the server

Say 100 nodes edit a doc. To drop a tombstone without a central server, we need 99 acks. Even with a server, the server keeps an ever-growing list of ops unless it uses some garbage-collection technique. And as stated by the OP, those techniques have their own problems. Remember, the OP relies on V8's hidden classes for deriving an optimal representation without eliminating tombstones. Servers, though, need not necessarily hold CRDTs in JS; it might be a different VM altogether.

> anything that combines those primitives into more complex ones, especially true for schematic structures, tables.

I'm not implying CRDTs are only for simple structures. Automerge has support for tables too. I'm talking about structures with semantic meaning. In database tables, the ops are always inserting/deleting columns/rows and updating cells. In HTML tables, it extends to cell merging, cell splitting, inner tables, and all sorts of valid operations. Merging these operations without inspecting the op and writing a custom modifier will certainly end up producing an invalid table structure.

> Central systems is where CRDTs are used the most actually.

I'm assuming the central system referred to here is database-resilience systems using CRDTs to merge. Database syncing without master/slave is exactly the kind of problem best solved with CRDTs. The very point is to eliminate the need for a central point of failure.


OT comes with very different problems when it comes to representing tables. Structured content (tables, tables containing tables, tables containing quotes containing tables, ..) is much better represented using CRDTs.

On another note, Yjs fully supports the OT rich-text type (i.e. the delta format [1]). This is how the Quill editor was made collaborative.

So Yjs is at least as powerful as OT. If you want, you can still base your application around OT deltas, but Yjs is IMO much easier to use.

[1]: https://quilljs.com/docs/delta/


> So Yjs is at least as powerful as OT.

Sounds interesting. Will give Yjs a try!


Great! :)


> Say 100 nodes edit a doc. To drop a tombstone without a central server, we need 99 acks. Even with a server, the server keeps an ever-growing list of ops unless it uses some garbage-collection technique. And as stated by the OP, those techniques have their own problems.

Here's how it could be done (very roughly):

When a client makes a modification, it creates an op representing that modification and includes a list of server identifiers to send the op to. Then it queues the op. Once the op is sent and ACKed by at least one of the servers, it is dropped from the queue.

Servers receiving the op synchronize it with each other, ACKing each other and removing ACKed servers from the list. They also synchronize with the clients, sending the op to each client and asking clients to ACK to all servers. Obviously there are multiple ways to do it, and it can be almost as efficient as theoretically possible, but of course you can't avoid some metadata for each client associated with a particular document, even though it doesn't have to be kept in server memory; it could be stored on disk.
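
A rough TypeScript sketch of that client-side queue (all names illustrative; the reply below explains why this alone isn't sufficient in general):

    type QueuedOp = {
      op: unknown;                 // the CRDT operation itself
      pendingServers: Set<string>; // servers the op is addressed to
    };

    const queue: QueuedOp[] = [];

    function enqueue(op: unknown, servers: string[]): void {
      queue.push({ op, pendingServers: new Set(servers) });
    }

    // Per the scheme above, one server ACK is enough for the client to
    // drop the op; the servers take over fan-out and ACK bookkeeping.
    function onAck(opIndex: number): void {
      queue.splice(opIndex, 1);
    }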

> I'm talking about structures with semantic meaning.

Semantic meaning generally doesn't map to any data structure directly; CRDTs are not an exception here either.


> Then queues the op. Once op is sent and ACKed by at least one of the servers the op is dropped from the queue.

Unfortunately, it's a lot more complicated than this.

First, you can't wait until just one peer has ACKed -- you need to wait until all of them have. Any peer that hasn't ACKed could still send an op based on the one you want to delete, and then you wouldn't have enough information to merge the new op.

Second, you can't just wait until all peers have ACKed an op; you need to wait until no new ops depend on that op. A new op could depend on it even after it has been ACKed by everyone.

Third, a peer could go offline for a while, and then come back online. But if you are waiting for all peers to ACK an op, you might never get a situation where all peers are online at the same time.


What do you think of Chronofold [1]?

[1] https://arxiv.org/abs/2002.09511


In xi, there are two issues. One is the cost of the CRDT representation, the other is whether it accurately represents what you're trying to do. Classical implementations of CRDT use a heavyweight node per character in the text, each with its own unique and permanent id. In my research in xi, it became clear to me that more compressed representations are possible, and the xi CRDT is one such: it is an "op" object per edit, but the string itself is just a string.
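
A rough sketch (in TypeScript, not xi's actual types) of that compressed representation:

    // The document is just a string; history is one lightweight op per edit.
    type Edit =
      | { kind: "insert"; at: number; text: string; rev: number }
      | { kind: "delete"; from: number; to: number; rev: number };

    type Doc = {
      text: string;    // no per-character nodes, no per-character ids
      history: Edit[]; // one "op" object per edit
    };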

Lately there has been other work pursuing those compressed representations: Chronofold, some memory-efficiency work in Martin Kleppmann's Automerge (I believe), and others.

But at least in the context of xi, this is not where things went wrong. As I've written about (and the author was kind enough to link), it's because CRDT merges aren't a good fit for the problems a code editor is trying to solve, particularly when the "collaborators" are automated processes such as language servers.

In human collaborative editing, it's important to preserve text that's being entered, even in the face of conflict. But when a peer is an automated service, it's much better to drop the edit on the floor and recompute. I'm simplifying here, as it depends on the service - some are history-insensitive, some are sensitive to a small window of history (in the case of automatically inserting indentation, etc), and (speculative) other services may be sensitive to more history.

In addition, the CRDT constrains the data model considerably. In other words, it's unfortunately not a clean abstraction where you can easily add higher-level layers on top of the CRDT; you always have to design those with the CRDT in mind (i.e., everything still has to be a monotonic semi-lattice).

So, as with everything else, it's a tradeoff, and it's a question of weighing the pros and cons. I'm glad to see work being done to improve CRDT, but even with a very efficient representation and solid algorithms, the problems with CRDT would be enough for me not to use them in a code editor.


Hi Raph,

Your GitHub comment about "Why CRDT didn't work out as well for collaborative editing in xi-editor" [1] was sent to me several times as an argument not to use CRDTs. I agree with you that CRDTs might be too complex for the way the xi-editor used them (although I loved the idea, and I appreciate that you tried it).

But the title of the Hacker News post tells a very different story. Many people misunderstood WHY CRDTs didn't work out well for the xi-editor.

At the very least, my article shows that CRDTs are well suited for shared editing. You brought up some valid points about why CRDTs are not well suited for having different editor components communicate with each other (language server, indentation, syntax highlighting, ...).

Still, I think there is a lot of merit in using CRDTs as a data model for a code editor (or any kind of editor). Not for editor components concurrently modifying the editor model, but just as a collaboration-aware model.

• Marijn considered a CRDT as a data model for CodeMirror 6 because positions can be better expressed in collaborative software [2]. A position in the Yjs model is simply a reference to the unique id of a character (see the sketch after this list).

• Even without collaboration, Yjs serves as a highly efficient selective undo/redo manager. Each item on the undo/redo stack consumes just a couple of bytes. Furthermore, most existing implementations don't support selective undo/redo. This comes for free when using Yjs as a data model.

• Some components can work in the background and annotate the CRDT model (not manipulate it). For example, a code analyzer that runs on a remote computer could annotate a function and notify the user about potential problems. The position of the annotation will still be valid if users modify the model concurrently in a distributed environment.
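
As a sketch of the position point in the first bullet (based on my reading of the Yjs docs; exact API details may differ):

    import * as Y from 'yjs'

    const doc = new Y.Doc()
    const ytext = doc.getText('codemirror')
    ytext.insert(0, 'hello world')

    // A relative position references the unique id of a character,
    // not a numeric index, so it survives concurrent edits.
    const relPos = Y.createRelativePositionFromTypeIndex(ytext, 5)

    // Another client prepends text; a plain index 5 would now be stale.
    ytext.insert(0, '>> ')

    // Resolving the relative position still lands right after "hello".
    const absPos = Y.createAbsolutePositionFromRelativePosition(relPos, doc)
    console.log(absPos?.index) // 8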

[1]: https://news.ycombinator.com/item?id=19886883

[2]: https://marijnhaverbeke.nl/blog/collaborative-editing-cm.htm...


I'm not sure what to add. I definitely agree that CRDT is viable for collaborative editing, I'm just saying people need to be aware of the issues.

Regarding position and undo, these are problems that can also be solved just fine using OT techniques, and are (imho) simpler when there is a central authority that can order all revisions into a globally consistent sequence.

I think it's inevitable that people are going to misunderstand arguments, given the complexity of the underlying space and the paucity of good learning resources. So thanks for your writeup, it adds to the discussion.


Just to add my 2c, having written ShareJS and ShareDB and worked on and off on OT systems for the past 10 years or so:

After playing with a simple implementation of Martin Kleppmann's newest Automerge work, I realize I was wrong to doubt CRDTs. I'm seeing about 6M ops/second in a prototype text engine in Rust. I have much more to say on this - probably a blog post's worth. But I think CRDTs are the future for many workloads, and I have a strong sense of despair seeing how much time I've wasted investing in approaches which won't feature strongly in the future.

Automerge is really good.


> I have a strong sense of despair seeing how much time I’ve wasted investing in approaches which won’t feature strongly in the future.

Your comment makes me sad on several levels..

Firstly, ShareJS and its OT types were the first open framework for building collaborative applications on the web. Your work inspired me to work on shared editing - I just wanted it to work over WebRTC.

Until around 2015, JavaScript engines' garbage-collection approaches wouldn't allow efficient CRDT implementations (which need to handle millions of objects).

The idea of distributed applications on the web just popped up a couple of years ago. WebRTC didn't even exist when you started ShareJS. Even Websockets were still a bit experimental. OT was the right technology at the time.

Lastly, I'd love for you to try out Yjs. The current JavaScript implementation handles 2.5 million operations per second. A Rust/C implementation would surely handle more than that, as it is not limited by automatic garbage collection.

You can compare Automerge's current performance branch with Yjs' current implementation in [1]. To be fair, their implementation is not finished. If I understood correctly, they want to load the compressed format directly into memory. I played with this idea a few years ago and represented the Yjs model in an ArrayBuffer. This approach improves load time, but performance will be intolerable in other respects (e.g. when applying document updates, transforming the document to a String/JSON, or computing diffs). The performance of Yjs is the result of more than five years of research. I still have a couple of ideas to improve performance significantly, although my article clearly shows that it is definitely good enough now.

[1]: https://github.com/dmonad/crdt-benchmarks/pull/4


I think we can agree on that. CRDT vs OT is a complex topic with lots of pros and cons for either side.


I think it would be interesting to approach this from the language mode side, insisting on structured buffers (syntax tree with rope leaves†). It is pretty rare that two clients (mode and user window, or two user windows) would lock a single large leaf, so you could just choose an authority/leader for each subtree, and have the leader explicitly and synchronously sequence the peon clients' edits.

Then, working backward from language specific modes, I could imagine a way to generalize this to free form text, by negotiating the split points with the authority (or not at all, in the case of free text being edited by one client).

Though I guess for now just making these completely synchronous across the whole buffer, as you have, is fine in most cases, and already a significant improvement vs some editors (looking at you, beloved Emacs).

† e.g. for JS, you could imagine a tree of statements/blocks containing expressions (maybe incl. operator precedence); for Clojure, a plain syntax tree; for JSON, the literal JSON data (plus whitespace metadata).
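
A rough TypeScript sketch of that structured-buffer idea (all names illustrative):

    // Interior nodes are syntax; leaves are (rope-backed) text. Each
    // subtree carries the authority that sequences edits for that region.
    type Buffer =
      | { kind: "node"; syntax: string; children: Buffer[]; leaderId: string }
      | { kind: "leaf"; text: string /* a rope in practice */; leaderId: string };

    // Two clients editing disjoint subtrees never contend: each sends its
    // edits to the leader of its own subtree, which sequences them
    // synchronously for that region only.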


> it's because CRDT merges aren't a good fit for the problems a code editor is trying to solve

I think you are not talking about CRDTs here. CRDTs are a pretty good fit for distributed systems that share state, which is what collaborative editing is underneath. It's just that completely automatic, semantically correct conflict resolution is incompatible with human collaborative editing, which is pretty obvious, but for some reason plenty of research goes into trying to do just that.


I was trying to use a CRDT to share state between the GUI editor and the language server process. My attempt didn't work well, for reasons I've written about. I have no doubt it can be made to work, and the research towards that might be interesting.


I've implemented operational transform based systems several times, and every time I check out CRDTs I end up staying with OT.

I find the complexity arguments against OTs to be far, far overblown and mostly academic, and that OT is, for me, much easier to understand.

There are two big critiques against OT. The first is transform explosion: with N operation types you could have N^2 transforms. Except that not every operation pair interacts. I usually see just a couple of categories of operations (text, key/value, tree, and list), and they all only need transforms within a category.

The second critique is around sequencing in a distributed system, but I've also never seen a production system that doesn't have a central coordinating service. With a star topology, OT sequencing becomes simple. You don't even need a Lamport clock; you can get away with a simple counter. Buffering at the clients simplifies things even more.
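
A rough TypeScript sketch of that star-topology sequencing (names illustrative, transform elided):

    type ClientOp = { baseRev: number; op: unknown };

    let rev = 0;
    const log: ClientOp[] = [];

    // The server is the single authority: it transforms an incoming op
    // against everything sequenced since the revision it was based on,
    // then stamps it with the next counter value. The transform step is
    // elided, since it depends on the operation category:
    //   for (const prior of log.slice(incoming.baseRev))
    //     incoming = transform(incoming, prior)
    function sequence(incoming: ClientOp): number {
      log.push(incoming);
      return ++rev; // a plain counter is the only "clock" needed
    }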

There's a great series of posts on collaborative editing by Marijn Haverbeke, the author of CodeMirror and ProseMirror, that are designed with OT in mind: https://marijnhaverbeke.nl/blog/collaborative-editing-cm.htm...


Following the golden rule, I always post a link to a series of papers comparing the theoretical properties of CRDTs and OT – here's the latest one:

Real Differences between OT and CRDT under a General Transformation Framework for Consistency Maintenance in Co-Editors

Proceedings of the ACM on Human-Computer Interaction 2020

Chengzheng Sun, David Sun, Agustina Ng, Weiwei Cai, Bryden Cho

It’s an evolutionary series, here’s the rest I believe: https://arxiv.org/search/cs?query=Sun%2C+Chengzheng&searchty...


Here is the best article I've read on CRDTs. It is the thing that took me from theory to "ok, I think I can implement it if I ever need it, and I believe it would be a solid solution": http://archagon.net/blog/2018/03/24/data-laced-with-history/


Nice overview of Yjs. I went down a bit of a CRDT rabbit hole a few months ago before ultimately deciding against using it in a project. Very interesting and fun stuff though, I keep a close eye on these projects. I was surprised not to see automerge mentioned but it does show up in the benchmarks. They're (automerge team) working on a major rewrite of some internal data structures that will eventually improve performance quite a lot, it also has a Rust and a Swift port.


I don't have an opinion on the core of the issue, but this paragraph left me perplexed:

>Most CRDTs assign a unique ID to every character that was ever created in the document. In order to ensure that documents can always converge, the CRDT model preserves this metadata even when characters are deleted.

The entirety of the C code in my checkout of the Linux kernel is made of 567,464,661 characters (counted with wc on all .c and .h files). Assuming a naive algorithm that assigns a 128-bit UUID to every single one of these characters, that's a little over 8 GiB of RAM for the IDs alone. Of course you also have to consider the deleted characters (and storing the characters themselves, duh).
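
For reference, the arithmetic behind that estimate:

    567,464,661 chars x 16 bytes/UUID = 9,079,434,576 bytes ≈ 8.46 GiB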

That's assuming that you want to keep track of the entire source code forever however, surely in practice you can be massively more efficient by "freezing" the changes past a certain reasonable delay.

So yeah, that does seem pretty memory-intensive, but in the age of Electron-based code editors and docker containers that doesn't seem all that absurd to me tbh. My Signal desktop client currently uses up almost 300MB of RAM to display plain text, I find that a lot less reasonable quite frankly.


Based on the [B4] benchmark, we can predict the size of a Yjs document representing the complete editing history of the Linux kernel (probably the largest Git repo ever created): 864 MB. The size of the Git repository is currently 1.1 GB. So Yjs has the better encoding.

If you loaded the Yjs document containing the editing history of the Linux kernel into memory, you would use about 13.8 GB of RAM. Of course, you wouldn't write the complete Linux kernel in a single file. As of 2011, the project consisted of ~37,000 files. If you represent each file as a separate Yjs document, you would use, on average, just a few kilobytes to load a single file.

The editing history of the Linux kernel is a very interesting benchmark resource. Maybe I will add it to crdt-benchmarks.


> the complete editing history of the Linux kernel (probably the largest Git repo ever created)

Linux is definitely not the largest git repo ever created [0]. The big corporate monorepos are definitely larger; I know MS has moved Windows to git, and it claims that to be the largest repo ever created (~300 GB as of 2017, per [1]). Google and Facebook both eschew git, though.

Finding data on the largest open repos is more difficult. The largest classes of projects are those that develop in monorepos that implement critical operating system [2] functionality, browser engines, and compiler implementations. The shortlist I'd make comes out to these projects (in no particular order):

* gcc

* LLVM

* Mozilla

* Chromium

* Linux

* OpenJDK

I haven't finished downloading all of these repos (my disk is begging me to stop right now), but it looks like Linux is larger than gecko-dev by a very thin margin (so a putative gecko-dev that included comm-central with its CVS history would easily outstrip Linux), and Chromium seems to be an order of magnitude over both.

[0] To be clear here, I'm mostly thinking in terms of primarily textual repositories. Repositories with large binary assets are clearly not relevant for your means.

[1] https://devblogs.microsoft.com/bharry/the-largest-git-repo-o..., although https://news.ycombinator.com/item?id=14411724 claims that the 300 GB measures the size of the checked-out directory on disk, not the putative size of a full .git folder.

[2] I'm including both kernel roles as well as key userspace roles. Qt and Gnome would both be on my list of putative largest repos were they monorepos, but they appear to use many small repos instead.


Has anyone tried a pseudo-distributed OT with a floating server (maybe leader elections)?

Also, would establishing a hierarchy of OT groups (or some division of responsibility over the dataset) help with scaling?

It just seems like it's easier to make OT work in practice and that there might be practical techniques and compromises to overcome its limitations.


It seems like CRDTs would be useful for contact tracing, in that distributed contact tracers collect data on cases and potential cases. They operate independently for hours through the day, with occasional network access, or at least a sync at the end of the day.


Since there is no concurrency in contact tracing (only you will manipulate your own data), a CRDT might be an unnecessary overhead. The German Corona-Warn-App is a decentralized approach for contact tracing. It's awesome, you should check it out: https://github.com/corona-warn-app


There is concurrency in contact tracing. In developing nations it's like a map-reduce problem. You send out the contact tracers in the morning; they gather results and sync up at the end of the day.

Being able to sync better through the day means fewer duplicate checks on families, and adds in new contacts that could be checked by others nearby. Stuff like that.

Thanks for sharing this app. I haven't seen any data showing these user-run apps are useful. Even if they got enough users to run them, the noise is too high to actually do anything with the results.

Although I’m optimistic that someone will figure it out and publish some papers on it.


>Conflict-free Replicated Data Type (CRDT)

For anybody wondering.




