Hacker News new | past | comments | ask | show | jobs | submit login
Cola: A text CRDT for real-time collaborative editing (nomad.foo)
300 points by homarp on Sept 3, 2023 | hide | past | favorite | 42 comments



I would guess a G-tree is still a B-tree with additional parent pointers, the fact that it is stored in an array is a matter of representation but does not fundamentally change the structure. This still stores pointers, they are just in units of the node size instead of bytes and relative to the first array element instead of the beginning of the address space. A complete binary tree stored in a array without any explicit references, i.e. an implicit representation with the children of the node at index x stored at indices 2x + 1 and 2x + 2, is still referred to as a binary tree, just with an implicit representation.


It’s useful to talk about representation though. Especially in a language that nudges you away from absolute references towards relative, self managed ones.

A quite clever representation of a tree that I read about is to store nodes in DFS order in a flat array. Given that it’s read only and the readers want to traverse it depth first anyway, this can be quite efficient. S-expressions and HTML come to mind.


My point was not that it might not be important, just that you probably do not want to assign a new name to a data structure just because you use a different representation. If the implementation would have been in a different language, then adding parent pointers into the B-tree would have been less of a concern and the representation with an array would maybe never have come up. It would be weird to say that this CRDT is based on a B-Tree with parent pointers, unless you implement it in Rust, then it is based on a G-tree.


This doesn’t appear to support rich text formatting ranges like bold, italic, etc - unless I’m missing something in the API. AFAIK Peritext is still the state of the art in rich text CRDT algorithms https://www.inkandswitch.com/peritext/

I’d love to see this build the rich text stuff from the Peritext algorithm.


As a somewhat interested observer of the CRDT space, I always wondered if it would be possible to create a more general purpose "structured data" CRDT. I.e that the user define a model/schema to describe semantically valid states. If you just merge e.g JSON you might end up in a state that's syntactically valid, but doesn't make any sense.

Similar how the Pretext algorithm know that bold/italic/underlining operations are additive while highlighting colouring is not, I'm thinking the user could control the schema to declare that "state: notStarted" and "completionDate: 2023-09-04" is incompatible


You can get a surprisingly long way by just saying each field has atomic and independent changes. That's what Figma does. For structured data it seems to map well to what the user expects. If Bob drags something five pixels right and Jane drags it ten pixels then they find it reasonable if the state is five or ten; fifteen or 7.5 would be surprising.

You can also implement what you want at read time. "notStarted = !state.completionDate && !state.started".

Edit: Figma does this with their own server and client that intelligently handle invalid states like the parent pointers forming a loop. So it's not quite fair to say it's "just" that simple.


This is fine for everything except text


CRDT synthesis feels like a promising direction, e.g. see "Katara: Synthesizing CRDTs with Verified Lifting"/ https://arxiv.org/abs/2205.12425 / previous discussion https://news.ycombinator.com/item?id=32977887


> I always wondered if it would be possible to create a more general purpose "structured data" CRDT.

Technically yes; one approach would be to use a list CRDT containing representations of operations on the data structure. Each replica would play the operations forward and reject any that broke the schema to “project” the operations onto a local representation of the shared data structure.

So in your example, the operation to set state: notStarted would be rejected if a completion date were present.

Besides the fact that the computational cost of the data structure will only ever grow, another consideration is that even though you are locally only appending to the list of operations, operations from peers may arrive that are not at the end of the global order.

This approach would only guarantee that the data is consistent and meets the schema — it doesn’t guarantee that merges are semantically what a user would expect.


I've hoped for so long that I've sort of given up: inevitably each implementation requires very specific judgement calls about what edits mean and even after all the computer science, is missing fundamental things. (c.f. first footnote about deletes aren't un-doable). Every time I look into it I end up at "might as well write real syncing code, it'll be done by the time I have this compiling and a UI around it"


Question: the post says Cola only concerns itself with making conflict-free edits to unknown datatypes (just streams of characters & their locations).

Couldn’t formatting be represented in the text itself the way html does it? I honestly don’t know much about other rich text representations.

Edit: the Peritext doc linked above talks out exactly these unique RTF challenges. Very interesting read.


Any comparisons to Automerge or Y.js/Yrs regarding performance or features?


The “third part” of the post starts with “I’ve benchmarked cola against 3 other CRDTs implemented in Rust: diamond-types, automerge and yrs.” This cola library appears to perform favorably in operation speed.

I’d be curious to know about memory usage, too.


Seems like cola doesn't support pruning so I'd be worried about the performance when working with files with lots of edits.



Or Cache Oblivious Lookahead Arrays which considered to be an alternative to MergeTree (LSM), but not used anymore: https://github.com/giannitedesco/cola


also its written in rust, not to be confused with https://rust.facepunch.com/


I’m fighting the urge to expand on this joke but this isn’t reddit.


Or coda.io, a Notion clone style app.


Nice work! However, it doesn't seem like a fair benchmark. It doesn't calculate and store the ops, nor does it store the actual text. To build a text CRDT with delta updates support upon it, users must store OpID => Text in an extra structure, which isn't cheap.


> The solution is surprisingly simple: we can store every node in a dynamic array, (a Vec in Rust) and use indices into this vector to refer to other nodes. This solves the ownership problem because the vector now owns all the nodes. And by simply storing the index of the parent in every node we can navigate the tree in both directions without having to use any unsafe code.

> This all hinges on the assumption that once we’ve inserted a node into the vector its index never changes. Inserting new nodes doesn’t cause any issues: we just append them to the end. Removing a node however would be doubly bad: not only is it a linear operation (because we’re now using a vector instead of a tree), but it would also invalidate all the indices of the nodes that come after it.

If you use my crate slotmap (https://docs.rs/slotmap/latest/slotmap/) you can support deletions without worrying about indices shifting or 'indices' (slotmap calls them keys) ever pointing at different values, as the keys in slotmap also come with a version number.


Would it work in a CRDT context, where ordering is hard and so versions could only be local?


There are two kinds of versions here. The CRDT version and Slotmap's internal version. I am not proposing you merge the two concepts, rather that instead of a Vec and simple indexes you use a Slotmap and keys, which then does allow deletion at the data structure level (figuring out the deletion in a CRDT fashion is something else). The rest of the design would stay the same.

I'm not even saying that it's a particularly good fit here, especially if you wish to support undos. It's more of a future tip for when you want to use a similar pattern in a context where you really do want deletions.


I might give this a try for a project that Etherpad and Word could not handle.


Since it assumes all connected clients will receive all edits, why not just include a hash of the expected existing state of the text data before the offset and insertion/deletion/replacement instructions? That way, edits are more or less guaranteed to only be applied to appropriate states, and further changes can be accumulated in a dictionary with the hash key being the hash of the state of the data it’s expected to be applied to

Granted, that’s a lot of hash operations being redundantly computed from the same data, but it is trivial to understand and implement


Because asynchronous edits would never be applied with that system.

If I make local edits to a document and you make remote edits to a document, the hash for your "expected state" will never match my document's state in the future because I have already made local changes.

Yes, CRDTs require all clients receive all edits in order to guarantee convergence, but the updates not needing to be applied in a certain order (which is what you are proposing) is an important property for real-world distributed use-cases.


That’s only the case for CmRDTs, right? AFAIK CvRDTs (which send the whole state with every request) don’t impose any constraints on message order.


Ah, a bit of an "eats shoots and leaves" situations here.

I mean: Yes, CRDTs require all clients receive all edits in order to guarantee convergence.

They indeed do not need to be "in order", they merely need to receive all edits "to guarantee convergence". My insertion of "in order" was definitely confusing there.


Oops lol. I guess that applies to CvRDTs as well, since receiving a given state also means receiving all edits that caused it?


Does anyone know if there is an easy way to enable collaborative editing of a form in the browser? Say two people open the same page/form. Now they should be able to see which form field the other person is currently editing and for text form fields a text CRDT should be used. I tried to implement something like this with Yjs but unfortunately it was quite hard and didn't work well.


Why didn’t Yjs work? It seems like it would be perfect for that.


As someone new to it, I found it hard because the docs were incomplete. The "usual" use case of collaborative text editing with a pre-made text editor was well documented. For example here you can see a lot of "todo" in the docs:

https://docs.yjs.dev/tutorials/creating-a-custom-provider


Is this something I can use eg with tiptap/prosemirror or some other text editor to add CRDT-based collaborative editing?


I don’t think it’s made to support WASM. You’d also need a compatibility shim to get it to act like a Y Doc.


I believe prosemirror and tiptap (built on top of prosemirror) has builtin support for Y.js


looks very promising. can't wait to use it for glicol


Certificate seems broken?


written in Rust :)

https://github.com/nomad/cola - MIT licensed


Just a nitpick / editorial suggestion: The acronym "CRDT" is not extremely common, and never defined in the document. Such things trigger my obsessive personality streak hard and maybe others' too. Thankfully, Wikipedia gave me the answer, assuming this is about conflict-free replicated data types.


> and never defined in the document.

From the document's first part:

> A Conflict-free Replicated Data Type, or CRDT, is an umbrella term for any data structure that can be replicated and modified concurrently across multiple sites, and is guaranteed to converge without the need for a central authority to coordinate the changes.


Sorry about that. I saw the title, didn't know what CRDT meant, and clicked through to the document specifically to find out, not particularly wanting to read the article. I was then scan-reading and missed it. I must say in my defense, though, that the convention is to define an acronym when it's first used, excluding the title, and the definition is only at the fourth use excluding the title and first section heading.


> I must say in my defense, though, that the convention is to define an acronym when it's first used… the definition is only at the fourth use…

To be fair, the definition is not in the bulleted abstract, it is in the first use of the term in the first section, which someone not familiar with CRDT might first scan, helpfully headlined “Intro to CRDTs”:

# First part: Intro to text CRDTs

A Conflict-free Replicated Data Type, or CRDT, is an umbrella term for any data structure that can be replicated and modified concurrently across multiple sites, and is guaranteed to converge without the need for a central authority to coordinate the changes.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: