
Why CRDT didn't work out as well for collaborative editing in xi-editor - UkiahSmith
https://github.com/xi-editor/xi-editor/issues/1187#issuecomment-491473599
======
tomsmeding
Relevant discussion from a couple of days ago:
[https://news.ycombinator.com/item?id=19845776](https://news.ycombinator.com/item?id=19845776)

------
chubot
I don't have much experience in this area, but I'd be interested in an
overview of how different pieces of software handle the concurrent /
multiplayer editing problem, like:

- Etherpad

- Google docs

- Apache / Google Wave (open sourced:
[http://incubator.apache.org/projects/wave.html](http://incubator.apache.org/projects/wave.html))

- repl.it [https://repl.it/site/blog/multi](https://repl.it/site/blog/multi)

- figma [https://www.figma.com/blog/multiplayer-editing-in-figma/](https://www.figma.com/blog/multiplayer-editing-in-figma/)
(image editing rather than text editing)

So is the gist of it that OT relies on central servers and they all use OT
rather than CRDT? That was not entirely clear to me.

Looking at the xi analysis, it sounds like "IME" is specific to a desktop
application using X (?), so it doesn't apply to any of this web software.

The rest of them seem like they do apply?

Is the problem that you have to pay a "CRDT tax" for every piece of state in
the application? I thought the same was true of OT. Don't you have to express
every piece of state within those constraints too?

repl.it doesn't seem to have a problem with syntax highlighting or brace
matching (try it, it's pretty slick). So is it just that they paid that tax
with a lot of code, or is there something else fundamentally different about xi
vs. repl.it? Or maybe xi is going for much lower latency than web-based
editors?

A recent thread about OT vs. CRDT that might be interesting:
[https://news.ycombinator.com/item?id=18191867](https://news.ycombinator.com/item?id=18191867)

~~~
DannyBee
TL;DR: CRDT is completely irrelevant to any of the highlighting/etc. stuff.

Most highlighters are lexers. Advanced highlighters/folders are parsers.

The lexing/parsing that is required for highlighting is easy to make
incremental for all sane programming languages.

For LL(*) grammars, adding incrementality is completely trivial (I sent
patches to ANTLR4 to do this).

For LR(k) grammars, it's more annoying but possible (tree-sitter does this).

For lexing, doing it optimally is annoying, doing it near optimally is very
easy.

Optimal incremental lexing requires tracking, on a per-token basis, how far
ahead in the character stream the recognizer looked (easy), and computing the
affected sets (annoying).

Most real programming languages have a lookahead of 1 or 2. The near-optimal
approach requires tracking only the max lookahead used, and assuming all
tokens need that max lookahead. In a world where the min token length is 1,
that means you only need to re-lex an additional (max lookahead) tokens before
the changed range. In a world where the min token length is 0, it's all
zero-length tokens + (max lookahead) tokens before the changed range. This
does not require any explicit per-token computation.

Again for basically all programming languages, this ends up re-lexing 1 or 2
more tokens total than strictly necessary.

Tree-sitter does context-aware on-demand lexing. I have patches on the way for
ANTLR to do the same.

The only thing CRDT helps with in this equation _at all_ is knowing what
changed and producing the sequence of tree-edits for the lexer/parser.

The lexer only cares about knowing what character ranges changed, which does
not require CRDT. The typical model for this kind of edit is what vscode's
document text changes provide (i.e., for a given text edit: old start, old
end, new start, new end).

The parser only cares about what token ranges changed, which does not require
CRDT.
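
A minimal sketch of the near-optimal scheme above (TypeScript; the edit shape
and names are illustrative assumptions, not vscode's or tree-sitter's actual
APIs):

    // Simplified description of a text edit: the old range that was replaced
    // and where the replacement ends. (Illustrative shape, not a real vscode type.)
    interface TextEdit {
      oldStart: number; // offset where the replaced range began
      oldEnd: number;   // offset where the replaced range ended
      newEnd: number;   // offset where the inserted text ends, in the new text
    }

    // Near-optimal re-lexing: rather than tracking per-token lookahead, assume
    // every token may have used the grammar's maximum lookahead (typically 1 or 2).
    // Re-lexing then only needs to start (maxLookahead) tokens before the token
    // containing the edit.
    function relexStartIndex(
      tokenStarts: number[], // start offset of each existing token, ascending
      edit: TextEdit,
      maxLookahead: number,
    ): number {
      // Find the token the edit falls into.
      let i = 0;
      while (i + 1 < tokenStarts.length && tokenStarts[i + 1] <= edit.oldStart) i++;
      // Back up by maxLookahead tokens, since each of those could have looked
      // ahead into the changed range when it was originally recognized.
      return Math.max(0, i - maxLookahead);
    }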

~~~
josephg
Oh wow, cool!

I made a simple proof-of-concept realtime PEG parser a couple of years ago,
which ingests text OT/CRDT operations ("insert at position X", etc.) and
updates the compiler output by invalidating all overlapping ranges and
recalculating from the root. My implementation is way slower than I expected
it to be - though I'm sure you could use a lot of tricks from well-optimized
parsers to speed it up. I agree - CRDT / OT / whatever is sort of orthogonal,
although it's really nice being able to feed the parser with the same
operation format and have it update its output.

[https://home.seph.codes/public/miniohm/](https://home.seph.codes/public/miniohm/)
/ [https://github.com/josephg/miniohm](https://github.com/josephg/miniohm) if
you're curious.
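
A rough sketch of that invalidate-and-reuse step (TypeScript; the memo-table
shape and names are hypothetical, not miniohm's actual code):

    // Hypothetical memo entry for a packrat/PEG parser: a rule applied at a
    // start offset, whose result covered [start, end) of the old text.
    interface MemoEntry {
      rule: string;
      start: number;
      end: number;
      result: unknown;
    }

    // After replacing old text [editStart, editEnd) with text of length newLen:
    // drop every memo entry whose span overlaps the edited range, and shift the
    // offsets of entries lying entirely after it. Surviving entries can be
    // reused when re-parsing from the root. (A real parser would also need to
    // account for lookahead that peeked past an entry's end.)
    function applyEditToMemo(
      memo: MemoEntry[],
      editStart: number,
      editEnd: number,
      newLen: number,
    ): MemoEntry[] {
      const delta = newLen - (editEnd - editStart);
      return memo
        .filter((e) => e.end <= editStart || e.start >= editEnd)
        .map((e) =>
          e.start >= editEnd
            ? { ...e, start: e.start + delta, end: e.end + delta }
            : e,
        );
    }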

I'd love to see this sort of thing applied at a larger scale in a compiler.
For example, I could imagine editing a C program with a compiler running live.
Instead of batch compiling artifacts to disk like it's 1970, as I type
each character the compiler recompiles just the parts of my program that could
be affected by my change. The compiler passes code changes to the linker
(maybe with a function level granularity). And the linker could then (live)
manage writes to an executable, allocating new byte ranges for changed
functions and updating references on the fly.

Even on a very large C++ project like Chrome, there's no reason why incremental
updates should take more than 1-2ms. If all you're doing is changing a single
function in the binary file, why do our linkers rewrite the whole thing?

~~~
wpietri
> Instead of batch compiling artifacts to disk like it's 1970 [...]

I am delighted to see people discussing this. Batch orientation made sense in
a very resource-constrained era. But at this point we have more RAM and CPU
than we know what to do with. It seems so obvious to me that the correct
solution is to prioritize developer experience and keep everything hot.

My single biggest barrier to developing faster is the latency between making a
change and seeing what that does to the test results. I would like that to be
well under a second. Given the cost of programmer time, I could happily throw
money at RAM and CPU. But standard, batch-oriented tools won't take advantage
of it 99% of the time.

I think there's a revolution in developer experience out there, and I really
want to see it happen.

~~~
afiori
The perfect place to start would be support in LLVM backends; maybe they are
already working on this.

~~~
josephg
I don’t know that that’s true. I doubt you could do this sort of redesign
simply in an LLVM backend, because a “hot” compiler would work differently at
so many levels of abstraction. I think an easier place to start would be a
self-contained compiler with a little less scope than LLVM. There we could
figure out the patterns, find a clean set of internal APIs and make some sweet
demos. And then, with that, approach the same task in LLVM.

------
lewisl9029
> Indeed, the literature of CRDT does specify a mathematically correct answer.
> But this does not always line up with what humans would find the most
> faithful rendering of intent.

This is a very salient point that anyone thinking of using CRDTs to "solve"
synchronization in a user-facing application needs to take into
consideration. Yes, CRDTs will guarantee that clients converge to an
identical, mathematically "consistent" state eventually, but there's no
guarantee that that mathematically "consistent" state will make any sense to
the application business logic that needs to consume it, or to the human that
needs to reason about the rendered result. That is a completely different can
of worms that we'll still have to tackle to build a usable application.

Here's a great example to illustrate this from Martin Kleppmann's talk on this
topic:
[https://youtu.be/yCcWpzY8dIA?t=2634](https://youtu.be/yCcWpzY8dIA?t=2634)

The rest of the talk is also highly recommended for anyone interested in an
approachable primer on CRDTs.

The trade-offs to CRDTs mentioned by the author in the context of text-editors
make sense, but I would be curious to hear from the Xray team on what their
current thinking on the topic is, given that they have collaborative editing
as an explicit core objective (which might shift the value prop in favor of
using CRDTs relatively speaking since in Xi it seems to be only an
aspirational goal), and that their approach to implementation was similar but
not quite identical to Xi's:

> Our use of a CRDT is similar to the Xi editor, but the approach we're
> exploring is somewhat different. Our current understanding is that in Xi,
> the buffer is stored in a rope data structure, then a secondary layer is
> used to incorporate edits. In Xray, the fundamental storage structure of all
> text is itself a CRDT. It's similar to Xi's rope in that it uses a copy-on-
> write B-tree to index all inserted fragments, but it does not require any
> secondary system for incorporating edits.

[https://github.com/atom/xray#text-is-stored-in-a-copy-on-write-crdt](https://github.com/atom/xray#text-is-stored-in-a-copy-on-write-crdt)

~~~
the_duke
> but I would be curious to hear from the Xray team on what their current
> thinking on the topic is

Xray is dead:

[https://www.reddit.com/r/rust/comments/bdf3lx/we_need_to_save_xray/](https://www.reddit.com/r/rust/comments/bdf3lx/we_need_to_save_xray/)

------
laughinghan
@dang, can we edit the title? It's inaccurate. This post is about how the CRDT
didn't work for _asynchronous_ editing by automated tools like syntax
highlighting, automatic bracket balancing, etc. The post explicitly contrasts
those use cases with collaborative editing as a use case that the author
_didn't_ implement, but where they think "the CRDT is not unreasonable".

------
coldtea
Maybe first build a capable editor, with plugins, etc (xi-editor is not that
yet) and worry about "collaborative editing" later?

And even for that, I think simply "taking turns" (where users share an editor
session, can chat with each other, and can switch on sequentially who gets to
actively edit) is enough for 99% of cases, and is not more difficult than mere
single-person editing (since there are no conflicts).

~~~
marcosdumay
Starting by redoing everything that the mature alternatives do is advice for
creating neither successful nor useful things.

By all means, focus on creating a kick-ass collaborative editor, and add just
the editing capabilities needed to make it good at collaborative editing.

~~~
coldtea
> _Starting by redoing everything that the mature alternatives do is advice
> for creating neither successful nor useful things._

It's the best advice in order to see any uptake.

There are plenty of programs that do some unique things very well, but fail on
doing "everything that the mature alternatives do", so they fail to ever get
mainstream traction themselves.

People want a complete solution that ALSO does X unique thing, if they are to
drop their existing editors. Not something that they'll have to use alongside
them for that special case.

(Joel on Software has written some nice posts about this idea, and why
"minimal" competitors, who don't do "everything that the mature alternatives
do" frequently fail, though I can't find the link right now)

------
lucb1e
At the risk of asking a stupid question: is there a reason other than offline
support why we bother with conflict resolution algorithms?

Every time concurrent editors come up, one of the main points of discussion is
the pros and cons of different possible conflict resolution algorithms. People
seem to be spending a _lot_ of time on debating and implementing that. The way
I see it, whichever packet reaches the server first gets applied first. Send
something like "line 9 column 19: insert <Enter>", and when another client
whose cursor is on line 15 receives that, it moves the cursor down to line 16
and scrolls the view down one line. Because you can see each other's cursors
and selections, it shouldn't be hard to avoid typing in the same place. Unless
you have round-trip times of multiple seconds (satellite uplinks, maybe?), and
unless you edit continuously with more than, say, one person per ten
sentences, you should hardly ever need it; and if it happens, the person
editing will notice within two seconds and just wait a second for the other to
finish. It's not as if you can reliably apply edits anyway: as the article
already describes, changing a line from ABC to EFG concurrently with someone
modifying B to D, does not really have a good outcome. In a more realistic
example, it would be changing "its" to "it's" concurrently with changing the
word to "that". There is no good solution (the server wouldn't know which
person to ignore: the apostrophe inserter or the replacer), so someone will
have to resolve it manually anyway, so why bother with complex resolution
algorithms? Heck, I'd be fine if my editor would do exclusive locks for the
line I'm on before I can start typing.
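
A toy sketch of the kind of cursor adjustment described above (TypeScript;
purely illustrative, not any particular editor's behaviour):

    interface Pos { line: number; col: number; }

    // A remote insertion at `at` pushes any local cursor that sits at or after
    // that point; inserting newlines also moves later lines down.
    function adjustCursorForRemoteInsert(cursor: Pos, at: Pos, text: string): Pos {
      // A cursor strictly before the insertion point is unaffected.
      if (cursor.line < at.line || (cursor.line === at.line && cursor.col < at.col)) {
        return cursor;
      }
      const insertedLines = text.split("\n");
      const lineDelta = insertedLines.length - 1;
      if (cursor.line === at.line) {
        // Same line: the cursor moves right (and down, if newlines were inserted).
        const lastLen = insertedLines[insertedLines.length - 1].length;
        return {
          line: cursor.line + lineDelta,
          col: lineDelta === 0 ? cursor.col + text.length : lastLen + (cursor.col - at.col),
        };
      }
      // Cursor on a later line: only the line number shifts.
      return { line: cursor.line + lineDelta, col: cursor.col };
    }

So a remote "line 9, column 19: insert <Enter>" would move a cursor on line 15
down to line 16, exactly as described.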

For slow things like the customer report, internal documents, code, etc., I
use something like git. Collaborative editing is (to me) for realtime things
like jotting down notes about what I'm working on and looking at what others
are working on right now, where even a proper revision control system is too
cumbersome (git pull, vim notes.txt, small edit, :wq, git commit, git push,
repeat) because someone might be working on the same thing. In such a case,
where I need to work together on a file in real time, I'm not working offline,
so this conflict resolution is by definition unnecessary. Is that different
from the majority of people that use collaborative editing?

~~~
braythwayt
Well (strokes grey beard), before we talk about offline support, we should
consider that there are two kinds of "online editing."

The first kind of "online editing" is where you make a request to a server,
and nothing happens until the server acknowledges it and sends you an
approval. That's synchronous.

The second type of "online editing" is where you have an independent process
in your browser or client, and it communicates asynchronously with a server,
while simultaneously allowing you to edit a local model of the data.

In the first case, we really need an editor to send every keypress to the
server and wait for a response.

In the second case, we really are talking about the browser or local app or
whatever being offline; it just synchronizes very frequently with the server.

I think that the moment you want to support typing faster than the round trip
to the server, you are already talking about offline editing.

And in most cases, yes, you do want to support typing faster than the round
trip to the server, because what works in an urban area with high-speed
internet becomes comically unusable in low-bandwidth, high-latency
environments.

---

So... I suggest that we almost always want to design for "offline editing";
it's just that we don't always imagine someone writing an essay on a weekend
camping trip, then synchronizing when they get back to the office on Monday.

~~~
cryptonector
Mosh has this idea that you can keep typing and sending asynchronously even
though you need the ACKs to find out what really happened, then when you get
them you just redraw accordingly. Humans won't type too fast for too long, so
eventually there will be time to catch up and let the user see what actually
happened. The key is to distinguish client-side speculative outcomes from
actual outcomes. Imagine that the text you're typing is made reverse-video, or
a different color (subject to color blindness constraints) to indicate
speculative (as-yet-unacknowledged) text.

That is, I think humans can be part of the async system and understand that
what they're typing isn't committed yet.

Sometimes when I type I don't even look at the screen or the keyboard for a
bit -- entire sentences even. Less so with code, naturally. When I do this I
do have to eventually look at what I actually entered, because I might not
have noticed some typo, say. I just did that for this entire paragraph. I want
to believe that I'd handle speculation in the UI just fine.

~~~
braythwayt
You are describing classic branch-and-merge. “Here’s my branch that I created
offline.”

“Sorry, there are merge conflicts, please resolve them before resubmitting,”
or, “I resolved them without consulting you.”

~~~
cryptonector
Yes, but I'm saying that that can work very well for a visual, interactive
application.

------
nicodemus26
I think CRDTs would make much more sense in a projectional editor than a text
one. When the changes are mutations to the abstract syntax tree, it's better
defined how a merge would end. Also, the merge results don't have the
opportunity to be invalid syntax.

~~~
msvan
It is really crazy that we jump through these massive hoops to simulate what
would be trivial to do in an AST editor. I recently read through some of the
literature on projectional editors, and while they have historically had some
usability issues, I really hope that this will change in the future.

------
catpolice
"For syntax highlighting, any form of OT or CRDT is overkill; the highlighting
is a stateless function of the document, so if there's a conflict, you can
just toss the highlighting state and start again."

I first became interested in CRDTs in a case where this wasn't really true. I
was writing an IDE for a custom in-house DSL - think of the application as a
special language for interacting with a gigantic and very strange database.
Basically, the problem was that the use case really stretched the bounds of
what is normally done with syntax highlighting. Some requirements:

- It had syntax and semantic highlighting, where the visual feedback
associated with a term would depend on the results of queries to the remote
database

- It had to be able to handle documents of several megabytes (and many
thousands of terms) fairly smoothly, with as little noticeable lag or flicker
as possible

- It couldn't swamp the database with unnecessary requests

- The document itself had implicit procedural state (e.g. if you wrote a
command that, if evaluated, would alter the state of a term on the database,
appearances of that term later in the document needed to be highlighted as if
those changes had already been applied)

So I definitely couldn't throw out metadata and start over with every change.
I ended up with a kind of algebraic editing model that allowed me to put
bounds on what needed to be updated with every edit and calculate a minimal
set of state changes to flow forward. It was extraordinarily complicated. I
never got around to learning enough about CRDTs to determine if they'd be
simpler than the solution I came up with, but they do seem to target some
similar issues.

~~~
raphlinus
Yeah, definitely. I would consider this a form of semantic analysis, of the
kind provided by Language Server implementations, rather than the kind of
syntax highlighting provided by syntect. Also note that the syntax
highlighting done in xi-editor is both incremental and async[11]. This
actually worked out well, and I would preserve it going forward. What I wrote
above actually overstates the case: I think the ideal solution is an extremely
simple form of OT so you can reuse as much as possible of the already-computed
highlighting state, but you certainly throw away the highlighting result for
the region being edited. Preserving "before" is trivial, and preserving
"after" should be a simple matter of fixing up line counts.

[11]: [https://xi-editor.io/docs/rope_science_11.html](https://xi-editor.io/docs/rope_science_11.html)
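
A toy sketch of the "fixing up line counts" idea (TypeScript; the per-line
cache shape is an assumption for illustration, not how xi-editor or syntect
actually store highlight state):

    type LineHighlight = unknown; // whatever the highlighter produces per line

    // Keep cached state before the edited region as-is, drop the edited lines
    // (they get re-highlighted), and shift everything after by the change in
    // line count.
    function fixUpHighlightCache(
      cache: Map<number, LineHighlight>, // line number -> cached highlight state
      firstEditedLine: number,
      lastEditedLine: number, // inclusive, in the old document
      newLineCount: number,   // lines the edited region occupies after the edit
    ): Map<number, LineHighlight> {
      const delta = newLineCount - (lastEditedLine - firstEditedLine + 1);
      const next = new Map<number, LineHighlight>();
      for (const [line, hl] of cache) {
        if (line < firstEditedLine) next.set(line, hl);             // "before"
        else if (line > lastEditedLine) next.set(line + delta, hl); // "after"
      }
      return next;
    }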

------
hansjorg
I'm assuming CRDT refers to conflict-free replicated data type:
[https://en.m.wikipedia.org/wiki/Conflict-free_replicated_data_type](https://en.m.wikipedia.org/wiki/Conflict-free_replicated_data_type)

OT, operational transformation:
[https://en.m.wikipedia.org/wiki/Operational_transformation](https://en.m.wikipedia.org/wiki/Operational_transformation)

~~~
spooneybarger
That's a correct assumption.

------
colemickens
Thank you for writing this up Raph. I've been following CRDT usage in Xray/Xi
and am curious to see where collaborative editing goes. I appreciate you
thinking about it upfront.

------
lacampbell
Having a bit of difficulty following this, so I'll break down my understanding
of CRDTs and see if someone can help me out.

A CRDT can be thought of as an algebraic structure, consisting of a data type
D and a join function. So for all a, b, c in D, it's:

Associative:

    join(a, join(b, c)) == join(join(a, b), c)

Commutative:

    join(a, b) == join(b, a)

Idempotent:

    join(a, a) == a

Partially ordered:

    if join(a, b) == b then a <= b
    a <= a == true
    if (a <= b) and (b <= a) then a == b
    if (a <= b) and (b <= c) then (a <= c)

So given all of that, I am not sure why the example in the article holds. I
assume it's a consequence of the partial ordering, but I don't know what the
partial ordering is. What's the join operation and what's the data type?

~~~
BoiledCabbage
I'm just learning this, but it seems that in some CRDTs "a" and "b" represent
two different edits to the common state made by two different users, and the
join operation is how you combine those two edits into a single joint edit
that will produce the same result regardless of which order the edits arrive
at the "server".

In other CRDTs, "a" and "b" each represent the new states after two users have
made different edits to the same common source, and the "join" function is how
you combine those independently edited docs/states into a single state again.

I.e., if you think of git branches: a set of rules on edits that ensures that
no matter the order in which you merge branches back together, conflicts can
be automatically resolved and you will always reach the same end state with
the changes "appropriately" incorporated.

------
josephg
I replied with my thoughts to the github issue, but they might be of interest
to people reading along here too. I've got some experience on these systems
(wave, sharejs, sharedb, etc).

> As a side note, I've heard an interesting theory about why CRDT-type
> solutions are relatively popular in the cloud. To do OT well, you need to
> elect a centralized server, which is responsible for all edits to a
> document. I believe the word for this is "server affinity," and Google
> implements it very well. They need to, for Jupiter-style OT (Google Docs) to
> work.

You don't need to do this. (Although I'm not sure if we knew that on the wave
team). You can implement an OT system on top of any database that has a
transactional write model. The approach is to enter a retry loop where you
first try to apply the operation (but in a way that will reject the operation
if the expected version numbers don't match). If an error happens, fetch the
concurrent edits, transform and retry. Firepad implemented this retry loop
from the client, and it worked much better than I expected. Here is a POC of a
collaborative editor on top of statecraft -
[https://home.seph.codes/edit/test](https://home.seph.codes/edit/test). The
only OT code on the server is this middleware function:

[https://github.com/josephg/statecraft/blob/b6a82f34268238c90a8b3e600ea39ad1558cd12b/core/lib/stores/ot.ts#L36-L110](https://github.com/josephg/statecraft/blob/b6a82f34268238c90a8b3e600ea39ad1558cd12b/core/lib/stores/ot.ts#L36-L110)
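
A sketch of that retry loop (TypeScript; the store interface and the transform
function are hypothetical stand-ins, not Statecraft's or Firepad's actual
APIs):

    type Op = unknown;

    // A store with a conditional append: the write is rejected if the expected
    // version no longer matches (i.e. someone else committed first).
    interface OpStore {
      opsSince(version: number): Promise<Op[]>;
      tryAppend(op: Op, version: number): Promise<boolean>;
    }

    // Client-side retry loop: try to append at the version the edit was based
    // on; on conflict, fetch the ops we missed, transform ours against them,
    // and retry at the newer version.
    async function submitOp(
      store: OpStore,
      transform: (op: Op, concurrent: Op[]) => Op, // OT transform for the doc type
      op: Op,
      version: number,
    ): Promise<void> {
      for (;;) {
        if (await store.tryAppend(op, version)) return; // fast path: no conflict
        const concurrent = await store.opsSince(version);
        op = transform(op, concurrent);
        version += concurrent.length;
      }
    }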

In my experience the reason why semi- or fully-centralized systems are popular
in products like Google Docs is that they're easier to implement. Access
control in a decentralized system like git is harder. Gossip networks don't
perform as well as straight offset-based event logs (Kafka and friends). And
if you have a canonical incoming stream of edits, it's easier to reason about.

---

> I have a stronger conclusion: any attempt to automate resolving simultaneous
> editing conflicts that, e.g., git merge could not resolve, will fail in a
> way that fatally confuses users.

I think you have to act with intent about what you want to happen when two
users edit the same text at the same time. There are basically 2 approaches:

1. Resolve to some sort of best-effort outcome (e.g. "DE F G" or "E F GD").

2. Generate an error of some sort (e.g. via conflict markers) and let the user
explicitly resolve the conflict.

As much as it pains me to say, for code I think the most correct answer is to
use approach (1) when the code is being edited live and (2) when the code is
being edited offline / asynchronously. When we can see each other's changes in
realtime, humans handle this sort of thing pretty well. We'll back off if
someone is actively editing a sentence; we'll see them typing and let them
finish their thought. If anything goes wrong we'll just correct it (together)
before moving on. The problem happens when we're not online, and we edit the
same piece of code independently, "blind" as it were. And in those cases, I
think version control systems have the right approach - because the automated
merge is often wrong.

(More: [https://github.com/xi-editor/xi-editor/issues/1187#issuecomment-491551004](https://github.com/xi-editor/xi-editor/issues/1187#issuecomment-491551004))

------
EGreg
From these comments it seems that OT requires a central server while CRDT can
have a far more flexible topology. Is this true? And don’t we have robust
implementations of CRDT for simple trees?

~~~
jahewson
No, OT can handle decentralization without issues. It’s just usually far more
desirable to centralize it.

------
microcolonel
I think it would be interesting to let the language mode control the rope, or
delegate subtrees of the rope to a mode. This way, you could represent things
like lexical scope in the tree of the rope, and a language-specific tokenizer
could further reduce the complexity of syntax formatting.

Emacs has the concept of "faces", and many Emacs major modes have proper
parsers, lexers, and even some static analyzers that they use to apply the
faces. If the rope resembled the AST, then many of the issues Raph talks about
could be greatly reduced by localizing edits to their area of influence. If
you edit inside a token, and somebody else deletes that whole token, then it
is pretty clear how to resolve that. You could conceive of natural language
modes which produce humanistic hierarchies, or modes with internal formats
other than text (which may have a cached text view on them) like spreadsheets
or debuggers.

------
marknadal
TL;DR:

CRDTs cannot be "bolted ontop".

----

I really don't like this answer, but it is sadly true - even as an expert in
the space (my database
[https://github.com/amark/gun](https://github.com/amark/gun) is one of the few
CRDT-based systems out there). And there is a simple reason for this:

Distributed systems are composable: they can be used to build higher-level,
strongly consistent systems on top. (Note: you sacrifice AP along the way, but
then you can have a "tunable" system where, for each record you save, you
decide what consistency requirement you need, fast or slow.)

However, centralized systems are not composable; you can't go "down" the
ladder of abstraction by adding more stuff.

~~~
raphlinus
What you say is certainly true, but also not a fair characterization of what I
wrote; we deliberately designed xi-editor from the early days to be consistent
with the CRDT model (see the "rope science" series for thinking from the very
early days).

But yes, if you have an existing editor or application and you try to just add
CRDT, there are a lot of things that can go wrong.

~~~
marknadal
That is what I thought, though when I read your sections titled "Actual
collaborative editing is a lot harder" and "CRDT is a tradeoff", it seemed to
suggest otherwise, particularly with this comment:

"The flip side of the tradeoff is that you have to express your application
logic in CRDT-compatible form"

Previously I assumed this was speaking of xi, but it sounds like this was
meant generically (not specific to xi)?

I am curious now: it seems like the decision wasn't a matter of CRDTs not
working out (technically), but more a matter of the amount of effort not being
worth it compared to other, more synchronous approaches?

Absolutely the right call to make. Though, the last thing we want is people
giving up on CRDT research, which is how the Hacker News title reads ("CRDTs
not working"). So I was just trying to clarify things for a future audience.

~~~
raphlinus
For sure. If you're doing collaborative editing, then CRDT is a reasonable
choice, but it comes with tradeoffs compared with other approaches such as OT,
as I hope I've explained. (The comparison with differential synchronization is
not as well understood as it should be - I suspect it works pretty well in
practice but doesn't lend itself to academic analysis.)

On the other hand, the idea of using CRDT for something _other_ than
collaborative editing (or certain types of databases where eventual
consistency is a reasonable consistency model) is almost certainly not worth
the complexity. That's what I wish I had known when I started.

~~~
marknadal
Ah, yes. Very well said!!

Hey, I'm in the Bay Area also, we should meet up. I've seen you've commented
here on HN on Rik's makepad WebGL stuff and Joseph Gentle's OT (chatting with
him on Monday!), which are both people & projects I'm fascinated by too.

------
atheowaway4z
Parsing & syntax highlighting before CRDT might give better results

