
OT and CRDT trade-offs for Real-Time collaboration - andfrob
https://www.tiny.cloud/blog/real-time-collaboration-ot-vs-crdt/
======
migueloller
I’ve been doing a pretty deep dive on CRDTs and OTs to get the right user
experience for our collaborative website builder[1]. I’m still not done with
the implementation and subsequent testing (i.e., splitting text nodes, as
mentioned in the post). But the core algorithm behind Automerge[2], formalized
in the OpSets paper[3] is extremely promising. A good example of its power is
the ability to do an atomic tree “move”, whilst preserving strong eventual
consistency.

I'm so glad to see that the HN community cares about this problem space. Maybe
I should finally write that blog post on CRDTs and how to implement them
yourself following the OpSets spec.

[1] [https://www.makeswift.com](https://www.makeswift.com)

[2]
[https://www.github.com/automerge/automerge](https://www.github.com/automerge/automerge)

[3] [https://arxiv.org/abs/1805.04263](https://arxiv.org/abs/1805.04263)

~~~
sipjca
I think that blog post would be a great idea, would love to read it

~~~
migueloller
Really? I’ll get something out this week. My Twitter handle is in my profile.
I’ll probably tweet about it when I write it.

------
phsource
Wanderlog ([https://wanderlog.com](https://wanderlog.com)) is a Google Docs
for planning travel, and naturally, we had to figure this out early in the
process.

If you're on a Node.js/React stack, we highly recommend using the combination
of OT-JSON0 [1] and ShareDB [2], two excellent libraries. OT-JSON0 lets you
perform operational transforms on any JSON-serializable structure pretty
intuitively, and ShareDB handles synchronizing it between multiple clients and
resolving conflicts. Building your own is going to be a pain, so unless your
core business is a text editor like TinyMCE, it's probably best to stand on
the shoulders of giants.

Another approach that we didn't end up using is Figma's: especially if you're
working on anything other than text editing, ask if you actually want to use
operational transform. Something simpler might be good enough -- in Figma's
case, they use "last write wins" because they were able to break the document
into small enough, atomic elements. [3]

[1] [https://www.npmjs.com/package/ot-json0](https://www.npmjs.com/package/ot-
json0)

[2]
[https://www.npmjs.com/package/sharedb](https://www.npmjs.com/package/sharedb)

[3] [https://www.figma.com/blog/how-figmas-multiplayer-
technology...](https://www.figma.com/blog/how-figmas-multiplayer-technology-
works/stack)

~~~
migueloller
We’re actually using ShareDB at Makeswift right now and for us it has become a
horrible burden. There are very subtle bugs on their data persistence layer
with concurrency and the API is a bit of a mess to work with in my opinion.
I’m glad to see other people are having success with it, though.

~~~
spyder81
I've been given advice that ShareDB is "pretty antiquated and not well
supported". I'm told it is not difficult to rewrite in modern NodeJS,
replacing things like callbacks and event emitters with async/await produces a
much simpler and more reliable server.

------
raphlinus
There's one thing I'd add here, as advice to potential implementers of
collaborative editing algorithms. Test the system rigorously, autogenerating
concurrent workloads of all the operations (including undo), and also various
delays. If this had been done early on, all the broken algorithms (some
published with formal proofs of correctness!) would have been caught. A lot of
the broken algorithms work under light loads - TP2 requires at least three
concurrent edits, two inserts and one delete, to trigger.

~~~
spyder81
We're using property-based testing where we can, which I hope to cover in a
future post

------
gvkhna
At Mixcut.com (Building real time collaborative video editing) we use a
combination of OTs and a variant/simpler version of CRDTs that provide the
most coverage of the various features needed. The usage of low latency
websockets to shuttle OTs and building completely around OTs is key.

It’s working great for us in beta, we’ve recently shipped multiplayer to our
users and overall it’s very stable. With a lot more in store soon, please
check it out!

------
phedkvist
Thanks for linking to my blog post :)

"This is the exact "split node" scenario I described earlier; applying bold
near the start of a text node necessarily has to split the text node into
three parts (before, bold, and after)."

Not sure why you split it up into sections. In my CRDT implementation, I would
add meta-data to each character, with the boolean property which is either
bold or not. It's certainly cumbersome to keep the cursor at the right place
when inserts are being made, but it's doable.
[https://pierrehedkvist.com/posts/1-creating-a-
collaborative-...](https://pierrehedkvist.com/posts/1-creating-a-
collaborative-editor)

I personally never understood how OT actually works, clearly, Google Docs and
others find it useful. But to me, CRDT has more solid proof and reasoning
behind it, and it is easier to comprehend.
[https://medium.com/@pierrehedkvist/creating-a-
collaborative-...](https://medium.com/@pierrehedkvist/creating-a-
collaborative-app-using-crdts-5013349f4682)

~~~
spyder81
I definitely read a few of your posts during my research, thanks for your hard
work :)

The best CRDTs do treat each character separately (as I think I mentioned at
one point?). But not all of them - the video and my example is an example of
one that doesn't.

It's also very difficult to support arbitrary nested trees with a per-
character data type, which leads to the compromises I talked about.

------
strictfp
Is the article suggesting to apply a string CRDT to an entire JSON structure?
Are people doing that?

At the very least, I would expect different JSON data types to use different
CRDTs. For a collaborative model, I would expect the semantics of the data
model to be reflected in the choice of CRDTs for different substructures in
the JSON.

~~~
ergl
No, the state of the art CRDT solution for json is something like Automerge
([https://github.com/automerge/automerge](https://github.com/automerge/automerge)),
that treats the document like a collection/combination of CRDTs of different
types. (the author doesn't mention it explicitly, but it's linked at the end)

~~~
codebeaker
I have been using Automerge recently for a project, and I have found it to be
very, very user-friendly.

Our use-case is offline-editing of documents with an eventual sync-with-
yourself-online. It's _mostly_ there as a sync tool, not as a p2p colalb
editing. Unless you count yourself as a peer, I guess!

We store a document in local storage which is the result of
`Automerge.save(automergedoc) => serializable string`. That can be loaded with
`Automerge.load(s) => doc`.

"Changes" are a first-class concept in automerge (makes sense, I guess) so you
can do

    
    
       before = Automerge.from({..some doc..})
       after = Automerge.change(before, "optional commit msg", (doc) => doc.hello = "world);
    

With that done, you can get a diff, `Automerge.getChanges(before, after) =>
[changes]`.

Those `[changes]` are what we try and send to a server and keep them in a
`Set()` for each document (background sync, service worker, etc make a nice
experience.

This is all wrapped up in Redux, and a middleware that captures the changes,
and a reduxStore subscriber that puts the document back into local storage
after any changes.

It is for the application programmer virtually transparent, and so far we've
yet to find a fault with it.

I'm sure I'll come to regret making "changes" my principle data type, as the
server implementation just sees serialized blobs of data, and has no concept
of the structure of the document. That's solveable if I need it ever, because
there's a protcol compatible implementation in Rust, and I could always use
the JavaScript one on the server too, just haven't needed yet.

The serialization is also quite large, in the order of kilobytes for what are
actually quite small docs, but they do contain the original commit messages
too, and the Automerge team is planning to release an optimized serialization
format early this year, which they say will be especially useful for fields
with lots of changes, such as text fields, which currently pay a heavy toll in
terms of tracking `[changes]`. Fortunately that also doesn't affect me.

I can also +1 some of the advice to keep your whole document in a CRDT. We use
React+Redux, but since React Hooks made stateless functional components a
reality, we felt it was appropriate to use React exclusively for the UI state,
and track only document states in Redux reducers. It feels light-weight and
sustainable, although we're only about a month and 5k LOC into the project.
Time will tell I guess.

~~~
ergl
Yeah, personally the feature I'm looking forward to is materializing "changes"
into a document without losing editing history.

At some point your set of changes grows too large or you want to trim it to,
say, a few weeks or months. Afaik that isn't possible yet without losing the
ability to merging later on.

~~~
eudoxus
I'm farmilliar with Automerge, but not its specific features on state
compression. In general change set compression is possible with CRDTs but
there are some "gotchas" you need to look out for. Most notably, all peers
interested in a changeset, have to be completely in sync at the moment of
compression, otherwise, depending on which underlying CRDT type you use, there
can end up with duplicate entries in for example a Set type. Depending on the
use case, 100% peer sync is possible, in others its more challenging.

------
jiofih
So.. we’re left with no real explanation to why OT is better than CRDT other
than a 5s video showing a poorly managed text node update that moves the
cursor. I feel tricked.

~~~
slau
Really?

> In OT, every user action is broken down into one or more operations. These
> operations are transmitted between clients along with their baseline
> reference; if two users perform actions at the same time, incoming
> operations must be transformed to include the local operations that have
> happened since that baseline. They are then applied locally and form the new
> baseline.

> This constant transformation of operations turned out to have too many edge
> cases where clients were found to not produce the same baseline (the "wrong"
> papers above). When that happens, the clients will never converge on the
> same result and break the fundamental assumption of collaboration.

And that’s exactly right. It’s fairly easy to have an OT system that is very
vulnerable, because clients can cheaply generate change sets that are
extremely computationally expensive on the server side. I’ve seen a system
where a single mobile phone could, in a few seconds, lock up the
synchronisation server for days.

~~~
jiofih
But the article is arguing _in favor_ of OT, not CRDT. A couple examples of
those edge cases would go a long way, the one given is not very helpful to
make a point.

~~~
Vinnl
What's wrong with the one given? If my cursor moves because some other user
makes some other text bold, that doesn't sound like a desirable experience?

~~~
jiofih
Because it doesn’t highlight a particularly relevant deficiency of the CRDT
protocol. Preserving cursor position is not hard in that case. The post even
mentions stronger downsides of OT, the example does not support the
conclusion. Why is OT better?

~~~
Vinnl
Huh, the article gave me the impression that preserving the position was a
crucial problem there.

As for the last question, I think the later posts will get to that.

~~~
eudoxus
Preserving the position of the cursor is a crucial feature for collab editors.
However, he doesnt go into detail explaining why CRDTs make this feature
impossible. Sure the CRDT route splits the text, but I fail to see why that
makes cursor pos preservation impossible?

~~~
spyder81
It's not impossible, particularly in cases where the model is fairly flat.

In complex models it is symptomatic of a larger issue with complexity in
CRDTs; the post was already quite long and I didn't want to repeat any more of
the research I'd linked to.

~~~
jiofih
You mention that example as a turning point for choosing OT, confirming your
research. But to an outsider it is not particularly convincing. Especially
since you’d expect a CRDT implementation to use the rope/tombstone structure,
applying the transform to that sequence of characters and not actually have to
split a text node, which looks like a limitation of this particular
implementation, or tight coupling between the data model and the DOM.

The post is a good introduction to OT and CRDT, but is very light on actual
trade-offs and seems to end with a gut-feeling conclusion, when the title
promised “trade-offs between”.

~~~
spyder81
It's hard to offer more detail than that in a blog post. I'm not going to
repeat the research papers or write one of my own. The post is already so long
and complex, covering so many surface details, I couldn't reasonably push it
any further.

~~~
jiofih
I’d suggest skipping right to the meat of the subject (pros/cons in this
case), assuming some previous knowledge, if that’s what the article proposes
to share, and simply point readers to learning resources; it’s not possible to
start from scratch on every topic. The deeper the subject, the more glaring
this problem becomes.

------
lxpz
> the prospect of peer-to-peer editing with end-to-end encryption is an
> exciting one

Have you heard of CryptPad ? Not exactly peer-to-peer but has encryption so
that the server sees nothing.

~~~
spyder81
I haven't! This is what our server team is looking at as an option for
encrypting our RTC traffic.

[https://tweetnacl.js.org/](https://tweetnacl.js.org/)
[https://github.com/dchest/tweetnacl-
js/wiki/Examples](https://github.com/dchest/tweetnacl-js/wiki/Examples)

------
billconan
this is a wonderful writeup! thank you!

I'm researching the same thing.

Any tutorial on OT implementation? Especially, I need to handle structured
data, like json.

~~~
DennisP
This article popped up on HN recently and is a great intro to CRDT
implementation. It says OT is much more difficult.

[http://archagon.net/blog/2018/03/24/data-laced-with-
history/](http://archagon.net/blog/2018/03/24/data-laced-with-history/)

~~~
spyder81
That's included in my research links, yes :D

------
fyp
The only thing I remember from the CKEditor post from last year was that it
took "42 man-years" to implement:
[https://news.ycombinator.com/item?id=18220020](https://news.ycombinator.com/item?id=18220020)

I wonder if it's better understood now and can cost less R&D time.

~~~
spyder81
It is, as I'll cover in the second post ;)

------
marknadal
w00h00, this is my area of expertise.

After 8 years of working on this, I have changed my thoughts:

\- The correct algorithm is not always the correct user experience.

\- End-to-end encryption is too important to not have.

\- Offline support is great, but it behaving consistently is more important
than it behaving "intently".

\- Biggest pain points can most easily be solved at the editing layer, not
data layer.

As a result, OT gets thrown out the door immediately.

I built the most popular CRDT powered database on the market
([https://github.com/amark/gun](https://github.com/amark/gun), 11K+ stars,
11M+ monthly installs), but have some harsh words for CRDT obsessed people:

CRDTs are great, but if you create an infinitely complex one to handle "every
word editing operation" it'll actually result in an inferior user experience.

As such, for example, GUN supports lock-free concurrent operations on any
depth of data in a graph, but keeping text/strings behavior atomic is way more
important than having built-in automatic string merges.

Why? It is much simpler to build a correct text/string CRDT on top of a
predictable, stable, graph CRDT that is understandable.

Another example is, in our blog editing tools, despite having spent years
researching & implementing (+ others in our community) precise character-by-
character CRDT resolution schemes, I found I personally had a much better
offline & local first user experience with a predictable per-paragraph sync
scheme.

This is a stupid simple approach, but what it means is that I as a user, can
easily predict whats going to happen, so if I'm offline I know I should just
copy a new paragraph and fiddle with things there, cause it isn't gonna get
"auto delete regrammar merged".

Generally speaking, me and colleagues don't "fight" over the same paragraph,
we're usually concurrently writing different "sections" at the same time, it
is pretty rare to grammar fix their edit same paragraph as they're typing it -
that is also just kinda, rude looking.

Anyways, finally, is that the majority of text styling and DOM edge cases can
be handled by having a deterministic canonical DOM hierarchy that always gets
applied at the editing layer, BEFORE any CRDT operations even occur.

So for instance, re-arrange the DOM tree such that <i> is always inside <u>
inside <b>, or something like that.

We built this into a library called normalize, and it instantly eliminated so
much complexity, especially at the CRDT level. Ping me if you want a demo of
this library.

Finally, for everyone else, we also built an _INTERACTIVE CARTOON EXPLAINER_
of an extremely basic text CRDT to help others understand the more detailed
concepts:

[https://gun.eco/explainers/school/class.html](https://gun.eco/explainers/school/class.html)

~~~
throwaway739103
You are a charming and energized speaker and perhaps that is why Tim Draper
thinks you a good investment as with others he has made. Acronyms like 'PTSD'
and 'PARTY' in the project is likely good at attracting people who think they
sound fun and do not question many details.

It is difficult to get brief clear details about how gun.js works. There are
distracting or simple documents, discussions about high level ideas and
tangentially-related topics. In some cases 'worse is better' yes but it sets a
tone that the project is hiding internal problems of implementation or can not
easy explain its own design.

The transparency of gun.js enterprise organizations also raises questions. Are
people on your team related to you? Do you pay yourself and team and open
source contributors?

~~~
marknadal
Thank you!

They also like the math & science behind it, some investors do DD, some don't.

You're tone is pretty assuming, smug, and insulting.

It is running in production with millions of people, yes there are some
hiccups, but surviving & we're improving it to scale even more.

I rebuild GUN from scratch in 30min on stage, that is how little/simple GUN
is:

[https://gun.eco/docs/Porting-GUN](https://gun.eco/docs/Porting-GUN)

Do I pay myself? Lol. Yes. No, I do not pay OSS, but yes people contribute
regardless, strangers help me, my kids help me, investors help me, family &
friends help me.

