For this kind of thing (collaborative editing in general), it is probably possible to train a neural network to guess the best blend of the users' conflicting edits. Something like that would likely work very well, greatly reducing the importance of CRDTs in all the systems where conflicts are due to humans working on the same thing and where the outcome of the conflict, whatever it is, can't do any damage.
Huh - the idea of training a model for conflict resolution is really interesting. For code, for instance, given the massive dataset that GitHub has of how merge conflicts were resolved in real codebases... I could see the possibility of a "non-interactive-by-default" type of developer tooling where your feature branch is getting live-updated in the background from the develop/main branch as you code, and where large refactors are no more difficult than working in two distinct parts of the codebase. Definitely food for thought!
CRDTs give you a tool to resolve conflicts at the data layer, but not necessarily at the semantic layer.
In the example, we have a canvas. Turn up the latency. User A draws an outline of a heart on their canvas, and User B draws a smiley face on theirs.
After the "network" catches up, the CRDT does it's thing, and our states have been synchronised. But we're left with an overlapping heart/smiley combo that neither collaborator really intended.
A really smart system might say "User A started drawing their heart a moment before User B started drawing a smiley. Even though most of the drawing happened concurrently, we will give User A's drawing the priority" (perhaps also making a backup of B's work so that it isn't lost).
Resolving conflicts in this way requires an understanding of semantics and intent - something an NN could perhaps provide a heuristic for.
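To make the rule concrete, here's a toy sketch of "earliest start wins" (the types and names are hypothetical, not from any real canvas app):

```typescript
// Toy sketch of the "earliest start wins" rule described above.
interface Drawing {
  userId: string;
  startedAt: number;  // timestamp of the first stroke, in ms
  strokes: unknown[]; // whatever the canvas records per stroke
}

// Pick a winner; the other drawing is backed up rather than discarded.
function resolveOverlap(a: Drawing, b: Drawing): { keep: Drawing; backup: Drawing } {
  return a.startedAt <= b.startedAt ? { keep: a, backup: b } : { keep: b, backup: a };
}
```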
It's hard for me to ignore that generative AI can already do images and text, so the immediate usefulness of resolving conflicting edits this way is going over my head.
Both users would have to share an intent that has more detail than what a particular conflict and the current canvases have to offer. From there, though, you could synthesize a compromise: say, choosing a color that isn't either user's first choice but is close enough to both, and also happens to be a good choice given the rest of the image. Or it decides that a compromise isn't a good idea and that choosing one user's edit is best.
If you didn't want to share that intent, though, you'd be giving up your intent to the machine's (however it was trained), so you may as well generate two separate things and use some other tool to mix them.
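(For the compromise case, even something as simple as a midpoint in color space gets at the idea; this sketch is purely illustrative, not any real system's behavior:)

```typescript
// The midpoint in RGB space: a color "close enough" to both users' choices.
type RGB = [number, number, number];

function compromiseColor(a: RGB, b: RGB): RGB {
  return [0, 1, 2].map((i) => Math.round((a[i] + b[i]) / 2)) as RGB;
}

compromiseColor([255, 0, 0], [0, 0, 255]); // [128, 0, 128]: a purple between red and blue
```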
What if user A and user B were drawing things that were supposed to coexist? Perhaps one is drawing the heart outline, the other is filling it in, etc.
If every potentially-conflicting action was followed up with a "which version do you want to keep?", it would get rather tedious.
And I don't imagine an "AI" would answer that question 100% accurately either, but it would make it a less frequent issue.
A text example: one user pluralizes the word "document", another user does the same (with some amount of latency so that they both feel the need to do it). A CRDT will happily converge on "documentss". This is of course not really what you want. An LLM on top of this would trivially correct to "documents".
This actually isn't a great example because "documentss" isn't a word, but think of a case with "desert" and "desserts" or something like that.
Point is, CRDTs guarantee convergence of the text, but they don't guarantee semantic intent is maintained.
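You can see the convergence concretely with a minimal sketch using the yjs package (assuming it's installed; the relative order of the two concurrent "s" inserts depends on client ids, but both survive the merge):

```typescript
import * as Y from "yjs";

// Replica A starts with "document"; replica B syncs from it.
const docA = new Y.Doc();
docA.getText("t").insert(0, "document");

const docB = new Y.Doc();
Y.applyUpdate(docB, Y.encodeStateAsUpdate(docA));

// Concurrently (before any further sync), each user appends an "s".
docA.getText("t").insert(8, "s");
docB.getText("t").insert(8, "s");

// Exchange updates in both directions; the CRDT keeps both inserts.
Y.applyUpdate(docA, Y.encodeStateAsUpdate(docB));
Y.applyUpdate(docB, Y.encodeStateAsUpdate(docA));

console.log(docA.getText("t").toString()); // "documentss" on both replicas
```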
I'm not sure you need an LLM to handle this example.
We have two other pieces of information available to add some heuristics with: the position of the edit relative to the start of the word, and word fit within a dictionary.
If both added an 's' to 'document', the position information and identical change value tell us they are the same update. No duplication of 's' necessary.
If instead one added an 's' while the other added 'a', the conflict could be resolved by choosing the one that fits closest to a dictionary word.
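A rough sketch of those two rules (the names and the toy dictionary are my own, not from any real system):

```typescript
interface Edit {
  offset: number; // position of the insert relative to the start of the word
  text: string;   // inserted characters
}

// Toy word list; a real system would use a full dictionary.
const DICTIONARY = new Set(["document", "documents"]);

function applyEdit(word: string, e: Edit): string {
  return word.slice(0, e.offset) + e.text + word.slice(e.offset);
}

// Merge two concurrent edits to the same word.
function mergeEdits(word: string, a: Edit, b: Edit): string {
  // Same position and same value: treat them as one update, apply once.
  if (a.offset === b.offset && a.text === b.text) return applyEdit(word, a);

  // Otherwise keep whichever edit produces a dictionary word.
  const fromA = applyEdit(word, a);
  const fromB = applyEdit(word, b);
  if (DICTIONARY.has(fromA)) return fromA;
  if (DICTIONARY.has(fromB)) return fromB;
  return fromA; // no clear winner; fall back to an arbitrary but deterministic pick
}

console.log(mergeEdits("document", { offset: 8, text: "s" }, { offset: 8, text: "s" })); // "documents"
console.log(mergeEdits("document", { offset: 8, text: "s" }, { offset: 8, text: "a" })); // "documents"
```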
For many cherry-picked cases, you can likely come up with some kind of solution that makes CRDTs less semantically unaware. But with a good LLM, you can cover almost every situation. It is possible to experiment with this right now by giving ChatGPT the two edits and asking it to merge them.
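Something like this sketch with the openai npm client would do it (the model choice and prompt are my assumptions; requires OPENAI_API_KEY in the environment):

```typescript
import OpenAI from "openai";

const client = new OpenAI();

async function mergeWithLLM(base: string, editA: string, editB: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{
      role: "user",
      content:
        `Two people concurrently edited the text "${base}". ` +
        `One produced "${editA}", the other "${editB}". ` +
        `Reply with only the merged text that preserves both intents.`,
    }],
  });
  return response.choices[0].message.content ?? base;
}

// e.g. merging two independent pluralizations should print "documents"
mergeWithLLM("document", "documents", "documents").then(console.log);
```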
IMHO the benefits of CRDTs would still hold in this case; CRDTs just allow you to model the conflict and its resolution. But yes, a self-driving app would probably be able to do a lot of cleanup easily. In my experience conflicts are rare enough that automated merges can be simple, though in some domains of course that does not hold.
Preserving author intent is definitely one of the major outstanding problems in collaborative editing. NNs have certainly shown the most promise in performing human-like judgement calls, but this would be particularly thorny: you'd need them to resolve toward exactly the same judgement calls on data that may differ for periods of time (in order to keep the data consistency that CRDTs provide).
That's an interesting thought — there are definitely a lot of corner cases where the peers technically converge, but the resulting state is obviously not what a human would want. But with the neural network approach, how would you ensure the peers even arrive at the same state?
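One conceivable answer, sketched under assumptions (all names here are hypothetical): let the CRDT converge first, then have a single deterministically chosen peer run the model and broadcast its fix as an ordinary edit, so every replica receives the same resolution through the normal CRDT machinery rather than running the nondeterministic model locally.

```typescript
// Deterministic leader choice: the lexicographically smallest peer id resolves.
function isResolver(peerIds: string[], selfId: string): boolean {
  return [...peerIds].sort()[0] === selfId;
}

// Only the resolver invokes the model; its output is applied as a regular
// edit, which the CRDT then propagates to everyone else.
function maybeResolve(peerIds: string[], selfId: string, runModel: () => void): void {
  if (isResolver(peerIds, selfId)) runModel();
}
```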
What fun, jake! I am really enjoying your series here. When we built Pixelpusher (https://inkandswitch.com/pixelpusher) that was the beginning of a long and ongoing journey into the user experience implications of CRDTs.
For instance:
* how and when should we merge changes?
* what if we change our minds later?
* how do we know if our data is “in sync” with another user?
* what does it mean to depend on another change?
And so on. Pixelpusher is awfully naive by the standards of today but I have very fond memories of working on it with Jeff Peterson, Jim Pick, and Orion Henry.