
CRDTs: The Hard Parts [video] - benrbray
https://martin.kleppmann.com/2020/07/06/crdt-hard-parts-hydra.html
======
benrbray
See also automerge [1], discussed at the end. They are currently working on
performance improvements [2]. Quoting from the repo, "automerge is a library
of data structures for building collaborative applications in JavaScript:

* You can have a copy of the application state locally on several devices (which may belong to the same user, or to different users). Each user can independently update the application state on their local device, even while offline, and save the state to local disk. (Similar to git, which allows you to edit files and commit changes offline.)

* When a network connection is available, Automerge figures out which changes need to be synced from one device to another, and brings them into the same state. (Similar to git, which lets you push your own changes, and pull changes from other developers, when you are online.)

* If the state was changed concurrently on different devices, Automerge automatically merges the changes together cleanly, so that everybody ends up in the same state, and no changes are lost. (Different from git: no merge conflicts to resolve!)"

[1]
[https://github.com/automerge/automerge](https://github.com/automerge/automerge)
[2]
[https://github.com/automerge/automerge/pull/253](https://github.com/automerge/automerge/pull/253)

~~~
vlovich123
It’s hard enough getting your own code to run and work correctly. When would
this be a useful development paradigm?

~~~
topicseed
Realtime collaboration on documents (e.g. source code, rich text editing,
etc).

~~~
ickyforce
When I edit code it's broken most of the time. It wouldn't make sense to
collaboratively break it in various parts for other people...

~~~
dkersten
I have done remote pair programming many times though and in that case it’s
perfectly ok, because you’re communicating with the other person/people and
can let each other know if you will break something.

For that use case, I can see this being very useful.

------
gritzko
"Data laced with history" (2018) [1] is very relevant here. Interestingly,
that extensive post obsessively vivisects RON 1.0 (Replicated Object Notation
[2] as of 2017) which was based on columnar compression techniques Automerge
recently implemented (53:24 in the talk).

Columnar formats have their upsides and downsides, though.

[1]: [http://archagon.net/blog/2018/03/24/data-laced-with-history/](http://archagon.net/blog/2018/03/24/data-laced-with-history/)

[2]: [http://replicated.cc](http://replicated.cc)

~~~
lann
It's worth noting that you are the author of RON :)

~~~
gritzko
... and working hard to release the new version. Hearing the fuzzer buzzing as
we speak...

~~~
archagon
Are there changes coming to RON that aren't currently mentioned on
[http://replicated.cc](http://replicated.cc)? Any clues as to what they might
be?

~~~
gritzko
I think, the new RON oplog is the biggest change.

------
mike_red5hift
Does anyone take issue with the fact that CRDTs seem to require keeping a
history of every change ever made to the document?

Seems like it could get unwieldy very fast. Especially, in the face of a bad
actor spamming a document with updates.

I've considered using CRDTs in a few projects now, but the requirement to keep
a running log of updates forever has ruled them out. I've ended up using other
less sound (more prone to failure), but more practical methods of doing sync.

Perhaps, I'm missing something. Wouldn't be the first time.

Are there alternatives without this requirement, or that would at least allow
a cap on the update log?

~~~
appwiz
> I've ended up using other less sound (more prone to failure), but more
> practical methods of doing sync.

Could you share the alternative methods that you’ve used?

~~~
mike_red5hift
Stuff like diffing multiple json documents and merging adds, updates and
deletes where conflicts did not exist and overwriting conflicting attributes
based on timestamps (latest wins). There's some reasonably good jsonpatch
libraries out there that will do the heavy lifting.

Plenty of room for problems to arise, but it did not require keeping a log of
updates. My use cases did not require real-time collaboration and the
structure of the document was known beforehand, though.
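
A minimal sketch of the timestamp-based merge described above, assuming each document is a flat mapping from attribute name to a (value, timestamp) pair and that deletions are kept as tombstones. The names `lww_merge` and `visible` and the data layout are hypothetical and do not come from any particular jsonpatch library.

```python
from typing import Any, Dict, Tuple

# Hypothetical layout: attribute -> (value, timestamp). A value of None
# marks a deletion (tombstone) so that a delete can win over an older write.
Doc = Dict[str, Tuple[Any, float]]


def lww_merge(a: Doc, b: Doc) -> Doc:
    """Merge two documents; on conflict the later timestamp wins."""
    merged: Doc = dict(a)
    for key, (value, ts) in b.items():
        # Non-conflicting adds are copied; conflicts are decided by timestamp.
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged


def visible(doc: Doc) -> Dict[str, Any]:
    """Drop tombstones and timestamps to get the user-facing document."""
    return {k: v for k, (v, _) in doc.items() if v is not None}


phone = {"title": ("Groceries", 1.0), "done": (False, 5.0)}
laptop = {
    "title": ("Shopping", 3.0),   # newer than the phone's title
    "note": ("buy milk", 2.0),    # only exists on the laptop
    "done": (None, 4.0),          # delete, but older than the phone's write
}

result = visible(lww_merge(phone, laptop))
print(result)
```

On equal timestamps this sketch keeps the first argument's value, so the two sides can disagree after merging in opposite orders. That is one of the places where problems can arise with this approach.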

~~~
appwiz
Got it. I’ve typically seen Operational Transforms brought up as the
alternative to CRDT for real-time collaboration. But, if you don’t need to
solve for that use case and the format is JSON, it’s a different problem to
solve and you don’t need that journal of changes.

If you have code available in the public domain, I’d love to see it.

------
infogulch
There was a recent post about a series that dives into CRDTs [1]. Someone linked
to a paper, _Chronofold: a data structure for versioned text_ [2] dated this
April, that attempts to map CRDT semantics onto text editing. It's still on my
list.

[1]:
[https://news.ycombinator.com/item?id=23737639](https://news.ycombinator.com/item?id=23737639)

[2]:
[https://arxiv.org/pdf/2002.09511v4.pdf](https://arxiv.org/pdf/2002.09511v4.pdf)

------
alextheparrot
This spawned an idea, which I feel a need to post.

It seems the hard part about CRDT is choosing the correct commutative
function, as merging two operations in line with user intent is non-trivial.
Would it be possible to use a combination of superposition (Please correct me
if this word is wrong) and pruning to derive user intent?

The idea being that instead of combine being (A, A): A (A commutative
semigroup), couldn’t we represent the operation as (A, A): Set[A] and have a
way of showing the user set results in a way that their next operation shows
us the “correct” interpretation.

He’s doing this implicitly with the file tree example, wherein operations that
don’t create trees usually defy user expectations (Symlinks aside), so he
decides to prune those choices from the result Set[A] before introducing a
heuristic to further prune the set. There’s still an issue of users having
opposite intent, but at that point it just seems like we need to introduce
“Set pruning” or “Superposition collapse” operations as a user-level primitive
and then rely on recursion to resolve the possible nesting of these result
sets.

Does this resonate with anyone / does anyone have further thoughts on this
formulation?

~~~
benrbray
This sounds similar to the concept of a "multi-value register" [1] I've seen
in a few places while reading about CRDTs the last couple days. Is that what
you're looking for? The idea is that each process maintains a vector clock
with the latest timestamp from all other processes, and we don't delete values
until one version can be verified to be "later" than all the others.

[1] Section 3.2ish of
[https://hal.inria.fr/inria-00555588/document](https://hal.inria.fr/inria-00555588/document)
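
A rough sketch of the multi-value register idea, assuming string values and vector clocks stored as plain dicts. The class and function names are hypothetical and the code is simplified compared to the specification in the paper.

```python
from typing import Dict, List, Tuple

Clock = Dict[str, int]  # replica id -> number of writes seen from it


def dominates(a: Clock, b: Clock) -> bool:
    """True if a is strictly 'later' than b: a >= b everywhere, > somewhere."""
    keys = set(a) | set(b)
    return (all(a.get(k, 0) >= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) > b.get(k, 0) for k in keys))


class MVRegister:
    def __init__(self) -> None:
        self.entries: List[Tuple[str, Clock]] = []

    def write(self, replica: str, value: str) -> None:
        # The new clock covers every value currently held, plus one new event,
        # so this write supersedes all of them.
        clock: Clock = {}
        for _, seen in self.entries:
            for k, n in seen.items():
                clock[k] = max(clock.get(k, 0), n)
        clock[replica] = clock.get(replica, 0) + 1
        self.entries = [(value, clock)]

    def merge(self, other: "MVRegister") -> None:
        combined = self.entries + other.entries
        kept: List[Tuple[str, Clock]] = []
        for value, clock in combined:
            # Discard a value only when some other value is provably later.
            if any(dominates(c, clock) for _, c in combined):
                continue
            if (value, clock) not in kept:
                kept.append((value, clock))
        self.entries = kept

    def values(self) -> List[str]:
        return sorted(v for v, _ in self.entries)


a, b = MVRegister(), MVRegister()
a.write("A", "red")
b.write("B", "blue")       # concurrent with the write on a
a.merge(b)
concurrent = a.values()    # both values survive the merge

a.write("A", "green")      # a has now seen both, so this supersedes them
b.merge(a)
resolved = b.values()
```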

~~~
alextheparrot
This was 100% the path I think I was traveling (Thank you for the link). I had
a hard time grokking the complete specification from the paper, but the README
of this repository that implements CRDTs in that frame was helpful [1].

The core part is separating the "merge" from the "resolve" state. Merging
state can be done in a variety of ways, so in the default formulation there
seems to be a focus on making the "merge" operation also "resolve" to only one
value, when really there could be multiple formally valid merges that the
client may desire (Which is part of the difficulty that the video notes as he
proposes a variety of possible file system mutations for a set of two
uncoordinated operations).

The cleanest clarification of my thought process is similar to the following:

Given the operation Merge(4, 2), I could propose that there are two valid ways
to perform this merge, addition and multiplication. This means the result
would be either 6 or 8. The act of returning a single value (6 or 8) changes
that proposal into a statement, though, which is the "resolve".

One subversion of this restriction is to return a set of all possible results
for the operations we think are valid, so {6, 8}. At this point, the user can
say (Either explicitly or implicitly) "I actually want 6" and we resolve it to
the single value of {6}. There are also special cases like Merge(2, 2), where
this whole situation is especially ergonomic because the merge operations are
equivalent.
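
As a toy illustration of separating the two steps, here is a sketch in which "merge" proposes every candidate result and "resolve" picks one. The strategy catalogue and the function names are made up for this example.

```python
import operator
from typing import Callable, Dict

# Hypothetical catalogue of merge strategies the user might consider valid.
CANDIDATES: Dict[str, Callable[[int, int], int]] = {
    "add": operator.add,
    "mul": operator.mul,
}


def merge(a: int, b: int) -> Dict[str, int]:
    """'Merge' step: propose one result per strategy, without choosing."""
    return {name: fn(a, b) for name, fn in CANDIDATES.items()}


def resolve(proposals: Dict[str, int], choice: str) -> int:
    """'Resolve' step: the user's choice collapses the proposals to one value."""
    return proposals[choice]


proposals = merge(4, 2)
candidate_set = set(proposals.values())   # the set shown to the user
chosen = resolve(proposals, "add")        # user says "I actually want 6"

# Special case: every strategy agrees, so nothing needs resolving.
agreeing_set = set(merge(2, 2).values())
```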

There are problems, of course, with this approach.

One issue is the need to categorize all possible operations the user may think
are valid for Merge(4,2). If the user intended to do division, the result set
proposed above will not include the state that they would expect. Still, this
seems more general now, as we just need to gather the set of operations that
the user may think are valid instead of assuming which one is valid. There's
also a ranking problem that then exists at the UX level, as we need to find a
way to cleanly propose this set of alternatives.

Another issue, of course, arises if both users propose conflicting
resolutions (Actor1 says "I want multiplication" and Actor2 says "I want
addition"). This is the issue with decoupling the "merge" and "resolve" steps,
as now we may cause a fork in the model which causes a fundamental divergence
of the collaborators' data.

[1] [https://github.com/rust-crdt/rust-crdt](https://github.com/rust-crdt/rust-crdt)

------
sfvisser
From my (admittedly very limited) experience implementing CRDTs it became
clear that even technically correct is not good enough. Optimistic merge
strategies require a very clear understanding of user intent and expectation.

Properly consistent can still mean utterly confusing.

~~~
topicseed
That is exactly what is pushing many people away from CRDTs, despite them
being mathematically proven to converge, and towards Operational Transformation.

~~~
heavenlyblue
OT is just a fancy name for a CRDT.

~~~
sagichmal
No, it isn't.

------
hencq
First off, this is an excellent talk. The presenter explains the different
topics in a way that even a layman like me can easily follow along. The
compression scheme he presents at the end seems very interesting as well.

I do wonder if in practice OT isn't a simpler solution for most applications.
He mentions the differences in the beginning of the presentation and the main
advantage of CRDTs is that they don't need a central server. It seems to me
that for e.g. a web app you have a central server anyway so all the extra
complexity of CRDTs isn't needed. I know almost nothing about this though, so
would love for someone more knowledgeable to explain why I might be wrong.

~~~
vivekseth
For single server/database web-apps, CRDTs might be useful because they allow
offline edits, and (to me at least) they are simple to understand and
implement. OT does allow offline edits too, but (I think) has poor performance
if there are many offline edits.

For multi server/database web-apps, CRDTs might be useful because they reduce
the centralization required for collaboration, and increase the fault
tolerance. In a load balanced web app, different clients could connect to
different servers/databases and still achieve eventual consistency when those
back-end systems sync up. If any of those systems go down, in theory traffic
could be routed to other systems seamlessly.
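
To make the multi-server case concrete, here is a minimal sketch of a grow-only counter (a standard state-based CRDT) replicated on two back-end servers. The server names are hypothetical; the point is only that each server accepts writes independently and both converge once they exchange state.

```python
from typing import Dict


class GCounter:
    """Grow-only counter: one slot per server, merged by pointwise max."""

    def __init__(self, server_id: str) -> None:
        self.server_id = server_id
        self.counts: Dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        # A server only ever writes to its own slot.
        self.counts[self.server_id] = self.counts.get(self.server_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        # Max is commutative, associative and idempotent, so the order and
        # number of syncs do not matter.
        for sid, n in other.counts.items():
            self.counts[sid] = max(self.counts.get(sid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())


east, west = GCounter("east"), GCounter("west")
east.increment(3)   # clients routed to the east server
west.increment(2)   # clients routed to the west server

east.merge(west)    # back-end systems sync up
west.merge(east)
west.merge(east)    # a repeated sync changes nothing
```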

------
migueloller
I recently shared a thread [1] on Twitter with CRDT resources I found useful
if you’re interested in that kind of thing.

[1]
[https://twitter.com/ollermi/status/1279067350269124609?s=21](https://twitter.com/ollermi/status/1279067350269124609?s=21)

------
sradman
CRDTs are great examples of what the NoSQL movement called Eventual
Consistency. I never understood why this movement assumed that abandoning
ACID-style consistency automatically gave you Eventual Consistency.

As a side question, have any new algorithms been developed over the last
decade that have significantly improved automatic source branch merging?

~~~
Taek
Eventual consistency and ACID are not at odds, you can have one or the other
or neither or both.

~~~
dboreham
Not true. Eventual consistency can not (provably) provide consensus.

~~~
IggleSniggle
sure it can! A particle can be _either_ this or that until someone looks at
it, and when they do, the act of looking at it causes the state to collapse
and propagate. Sure, a partitioned group may come to alternate realities
before they reunite, but then you just get to do the same thing again! Like
Conway's Game of Life, except with application state.

(I am joking of course)

------
benwr
If anyone is interested, I've been trying to think about the problem of moving
ranges in a list-structured CRDT for a couple of weeks now for a side project,
and I've got a candidate that seems to satisfy the most obvious constraints.
I'd be really interested in any feedback / holes you can poke in my solution!

Rough notes are here:

[https://docs.google.com/document/d/1p1K3sxgKGYMEBH72r-lnP9Gn...](https://docs.google.com/document/d/1p1K3sxgKGYMEBH72r-lnP9GnBm5N15h77C81W15kPiE/edit?usp=sharing)

~~~
anchpop
This is interesting! I've been looking for a good sequence crdt to implement
in my Rust CRDT toolkit [0], which is still very much a work in progress but I
want to make it really useful. Do you know how this compares to yjs [1]?

[0]: [https://github.com/anchpop/crdts](https://github.com/anchpop/crdts)

[1]: [https://github.com/yjs/yjs](https://github.com/yjs/yjs)

~~~
benwr
I hadn't looked at yjs; I'll check it out! [edit: It looks to me like yjs is
much more flexible than my design here, but doesn't include an ability to move
ranges of lists to different locations]

Darn, and just after I'd implemented it myself in terrible beginner Rust! I
might get started reimplementing it using your tool :)

------
mdptt
What an excellent talk, many thanks.

One idea comes to my mind (a bit out of topic): as we can store the complete
editing history in a document (even including mouse movements) with a fairly
small overhead using the ideas of automerge, would this be useful for
distinguishing texts that are generated by machines and by humans? Or to
detect plagiarism?

~~~
williamstein
I wrote the realtime sync and edit history code in cocalc.com, which is used
by instructors to teach courses that use jupyter notebooks with realtime
collab. Instructors do regularly suspect and diagnose instances of cheating by
students due to us storing the complete edit history. We don't automate this -
it's more that they are grading, get suspicious, and use the edit history
manually to investigate further.

------
6510
I thought about this once and came to the conclusion that the problem doesn't
exist: Anger is the only thing you can accomplish by typing letters into the
sentence I'm writing. If the goal is not anger but to write text, only one
person can write a sentence at a time.

The solution then becomes really simple: You insert your cursor where you want
to edit. You wait for the text and/or background of that paragraph to change
color. Light green would do nicely.

Other users will see that paragraph turn red.

The paragraph is now under your control. If someone else desires to edit a
sentence in YOUR paragraph that sentence changes color again and the side bar
displays a dialog allowing you (the owner) to 1) hand over the paragraph to
the new author, 2) hand over only that sentence or 3) ignore the request.

Dumb software wins!
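
The locking scheme above could be sketched roughly as follows. This assumes a single authoritative copy of the lock state that every client can reach, and it leaves out the sentence-level handover (option 2); the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParagraphLock:
    owner: Optional[str] = None      # user currently allowed to edit
    requester: Optional[str] = None  # user waiting for the owner's decision

    def acquire(self, user: str) -> bool:
        """Place the cursor: succeeds if the paragraph is free or already ours."""
        if self.owner in (None, user):
            self.owner = user
            return True
        self.requester = user  # the owner would now see the side-bar dialog
        return False

    def respond(self, hand_over: bool) -> None:
        """Owner answers the dialog: hand over the paragraph, or ignore."""
        if self.requester is not None and hand_over:
            self.owner = self.requester
        self.requester = None

    def color_for(self, user: str) -> str:
        """Color the paragraph is shown in for a given user."""
        if self.owner is None:
            return "none"
        return "green" if user == self.owner else "red"


lock = ParagraphLock()
alice_got_it = lock.acquire("alice")
bob_got_it = lock.acquire("bob")      # refused, recorded as a request
colors_before = (lock.color_for("alice"), lock.color_for("bob"))
lock.respond(hand_over=True)          # alice hands the paragraph to bob
colors_after = (lock.color_for("alice"), lock.color_for("bob"))
```

Nothing in this sketch ever releases the lock on its own, so an owner who never answers the dialog holds the paragraph indefinitely.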

~~~
sleepinseattle
That falls apart as soon as the author with the locked paragraph leaves the
document open while they’re doing something else, preventing others from
editing.

~~~
Ozzie_osman
Or if network connectivity isn't guaranteed.

~~~
6510
I really don't get it. You want to add thousands of lines of code and complex
data structures with tons of edge cases so that I can interrupt you while you
write??? It's just not desired functionality. Poor network connectivity is not
an excuse to break your workflow.

You simply do not assign 2 people to a task that can only be done by a single
person.

It is like a system where 2 people can pour coffee into the same cup at the
same time. We deal with the cup overflowing with some drainage (marvel at our
creation of course!) but then 12 people put sugar in the cup. Solution: we
overflow the cup further to dilute it! We solve the mobility issue by spilling
a bit more coffee out of the cup and wipe the bottom with just the right type
of towel or napkin. The only problem that remains is both drinking from it at
the same time.

------
sillysaurusx
Hmm. Does anyone know of a transcript? I can’t watch right now, only read.

I wonder if there’s some free YouTube transcription service... even just
showing the captions would work.

Oh, hm. It’s not on YouTube anyway.

~~~
benrbray
Actually, the video on the page is a YouTube embed and the English captions
are pretty good (although understandably it struggles with CRDT jargon).

I found that the last 4-5 references listed in the link are pretty accessible,
and most of the diagrams from the talk are taken from one of the papers by
Kleppmann.

------
archagon
Great talk! I look forward to diving into "Interleaving anomalies in
collaborative text editors" to see how Kleppmann et al fixed the RGA
interleaving issue.

The section on moving items in a list makes my mind jump to Causal Trees
(≈RGA). The problem here is that a) we need the concept of a list bucket or
slot, and b) we also need to preserve the sequential integrity of runs of
items as in text editing. In CT/RGA, each letter/item is simultaneously a list
bucket and its contents. I wonder if this problem could be solved by adding a
"reanchor" op that moves the "contents" (and some/all children) of a letter op
to the "bucket" of another letter op?

Each reanchor op would have the job of "moving" a subtree of text to a new
parent bucket. (Under the hood, the subtree would obviously remain
topologically unchanged for convergence purposes, but the reanchor op would
inform how the tree is parsed and turned into a user-visible list or string.)
First, the reanchor op would need to reference the start and end letter/item
ops in a given subtree; and second, the reanchor op would need to reference
the letter/item op into whose bucket the subtree contents will be moved. If
the set of subtrees for a range of text/items contains multiple independent
subtrees, they would each need their own reanchor op. Concurrent moves of the
same subtree would be resolved as LWW, similar to the Kleppmann proposal.

Let's say you have a graph for string "ABCD" that looks roughly like this,
sans metadata:

    
    
            +---+
            | A |
            ++-++
             | |
          v--+ +--v
        +---+   +---+
        | B |   | C |
        +---+   +-+-+
                  |
                  v
                +-+-+
                | D |
                +---+
    

If you wanted to move "CD" after "A", you would add the following op:

    
    
            +---+
            | A +<---------+
            ++-++          |
             | |           |
          v--+ +--v        |
        +---+   +---+      |
        | B |   | C +<---+ |
        +---+   +-+-+    | |
                  |      | |
                  v      | |
                +-+-+   ++-++
                | D +<--+ m |
                +---+   +---+
    

"m" references "C" and "D" as the subtree range, and "A" as the target bucket.
The string would render as "ACDB".

We can get cycles with this scheme, but as far as strings or lists are
concerned, this doesn't actually matter, since the parent references aren't
reflected in the output (as they would be with an explicit tree data
structure). If, concurrently to this, another device wanted to move "A" after
"C", the merged graph would look like this:

    
    
        +---+   +---+
        | m +-->+ A +<---------+
        +-+-+   ++-++          |
          |      | |           |
          |   v--+ +--v        |
          | +---+   +---+      |
          | | B |   | C +<---+ |
          | +---+   +-+-+    | |
          +---------^ |      | |
                      v      | |
                    +-+-+   ++-++
                    | D +<--+ m |
                    +---+   +---+
    

How does this get resolved? Well, we can simply read this as "move the
contents of 'A' to the bucket for 'C', and move the contents of subtree 'C'
through 'D' to the bucket for 'A'." The string would consequently render as
"CDBA", as we would expect.

This is just a sketch, not sure if it would actually work in practice. Pain
points are a) moving multiple subtrees in a somewhat atomic way (when the
range to be moved covers more than a single run of items), b) sane results
when overlapping ranges are moved concurrently, c) moving items to a bucket
whose contents had already been moved before, and d) parsing performance,
especially given the fact that we'll now have ops (and their children) whose
contents might end up in arbitrary places in the output string or list. Might
just end up with a slow, confusing, and labyrinthine data structure.

There's also the question of intent: does the user want to move a range of
items to the slot currently occupied by a different item, or do they want to
move that range to the right of that item? Lists of notes will want the
former, while strings will usually want the latter. Perhaps the reanchor op
could explicitly differentiate between these two cases.

~~~
martinkl
Interesting idea, but the devil is in the details. Especially two concurrent
moves of partially overlapping ranges of characters is a tricky case to
handle, and it is not obvious to me how your scheme would deal with this.

Another fun case is two concurrent range moves in which the destination of the
first move falls within the second move's source range, and the destination of
the second move falls within the first move's source range. How do you handle
this?

I expect that any algorithm solving this problem will need a formal proof of
correctness, because it's very easy to miss edge cases when using informal
reasoning.

------
jwr
Incidentally, Martin's book ("Designing Data-Intensive Applications") is
excellent and highly recommended reading. If you find yourself saying things
like "this database is ACID compliant", "we have an SQL database with
transactions, so we're fine" or "let's just add replication to Postgres and
we'll be fine", you need to read this book.

~~~
swyx
> If you find yourself saying things like "this database is ACID compliant",
> "we have an SQL database with transactions, so we're fine" or "let's just
> add replication to Postgres and we'll be fine", you need to read this book.

can you elaborate why? are these sentences fundamentally wrong? they dont
appear so.

~~~
robto
The entire book is about when and how these statements turn out to not be
absolutely true. And it's not a short book. I don't have my copy in front of
me right now, so I won't get into specifics. But I consider it one of the most
important books I've read, if only for making me realize how difficult it is
to get distributed systems correct. Or, rather, learning that getting
distributed systems correct is impossible and what sort of tradeoffs you can
make in order to keep things mostly working.

And it turns out that most services that I find myself working on these days
are distributed systems, so having a healthy respect for all the ways things
can break is a useful place to be.

