Automerge: A JSON-like data structure (a CRDT) that can be modified concurrently (github.com/automerge)
274 points by yamrzou on Feb 21, 2022 | 69 comments



If you are interested in CRDTs, take a look at Yjs[0] by Kevin Jahns too; it has bindings for the most popular rich text editors but is also brilliant for general data syncing. It seems to be more performant than Automerge [1]. For an example of using it for shared state, take a look at SyncedStore [2][3].

I think Yjs probably has more potential than Automerge, the community around it is incredible. They have created an Open Collective to fund development of the connected projects[4].

0: https://github.com/yjs/yjs

1: https://github.com/dmonad/crdt-benchmarks

2: https://syncedstore.org/docs/

3: https://news.ycombinator.com/item?id=29483913

4: https://opencollective.com/y-collective


Kevin is also actively working on a Rust port of yjs [0] which is great for any native app that doesn't run on Javascript but would like to use CRDTs (or interop with other Javascript clients).

[0]: https://github.com/y-crdt/y-crdt


There is also an active rust port of automerge fwiw -

https://github.com/automerge/automerge-rs


Indeed, yjs looks much more advanced.


Here are some development stats for yjs and automerge for the last 4 months. Ignoring bots, 15 people created pull requests for yjs. 7 people created pull requests for automerge.

https://oss.gitsense.com/insights/github?p=targets&q=authors...

What I found interesting is automerge has 3 long time contributors that are still actively contributing:

https://oss.gitsense.com/insights/github?q=authors%3Aept%2Cn...

yjs has 2

https://oss.gitsense.com/insights/github?q=authors%3AboschDe...

And something that I found very interesting is narnagon contributed to both yjs and automerge but hasn't contributed to yjs in the last 4 months.

https://oss.gitsense.com/insights/github?p=authors&q=authors...

Full disclosure: The above insights are from my tool

Edit: Just an FYI, the gift icon tells you when they first committed, which is how I know if they are a long-time contributor or not.


[flagged]


How is that relevant to a thread about CRDTs?


I see that you’re testing how much you can derail a thread before getting banned. 6 comments in this thread, none of which were made after looking up what a CRDT is.


> The only case Automerge cannot handle automatically, because there is no well-defined resolution, is when users concurrently update the same property in the same object (or, similarly, the same index in the same list). In this case, Automerge arbitrarily picks one of the concurrently written values as the "winner"

I think using the term 'arbitrarily' makes this seem less useful than it is. When I initially read it, I scoffed at this library.

What's critical is that it consistently and deterministically picks the same winner, so all copies of the data structure, once the same updates are applied, have a consistent view. That's so important and the authors do a disservice by not being explicit about this here.
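
To make that concrete, here is a minimal sketch (hypothetical, not Automerge's actual internals, though Automerge orders concurrent operations in a similar spirit) of how an "arbitrary" winner can still be deterministic: compare a (counter, actorId) pair, so every replica picks the same value with no coordination.

```javascript
// Last-writer-wins register with a deterministic tie-break (sketch).
function lwwMerge(a, b) {
  // a and b are { value, counter, actorId }
  if (a.counter !== b.counter) return a.counter > b.counter ? a : b;
  // Same counter: break the tie by actor ID, identically on every node.
  return a.actorId > b.actorId ? a : b;
}

const fromAlice = { value: 'red',  counter: 3, actorId: 'alice' };
const fromBob   = { value: 'blue', counter: 3, actorId: 'bob' };

// Both replicas compute the same winner regardless of argument order.
const w1 = lwwMerge(fromAlice, fromBob);
const w2 = lwwMerge(fromBob, fromAlice);
// w1.value === w2.value === 'blue'
```

The pick is "arbitrary" only in the sense that neither user's write is semantically preferred; it is never random.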


I really like the idea of this library and I've bookmarked it for future use, but following on from your point, it does feel disingenuous to make this promise in the introduction:

> (Different from git: no merge conflicts to resolve!)

As you've pointed out, there _can_ be merge conflicts, they're just resolved arbitrarily. In theory, git could do this too but obviously nobody would use it then!

CRDTs themselves are inherently conflict-free, but if the problem you're solving is not, implementing via CRDTs is not the silver bullet you're looking for.


If I'm not mistaken, the solution is strongly eventually consistent. While one change is picked arbitrarily, it is the same on each node.


An interesting challenge in the CRDT era today is that there isn't an ideal CRDT for rich text yet. I mean not just simple boundary-based formatting like bold/italics but also complex block-based elements like tables and lists.

While recent CRDTs like Peritext have done an incredible job of focusing on this specific problem of rich text, even that project hasn't extended its types to cover table and list operations. I've been thinking about this problem deeply and my intuition is that anything that resembles a semantic tree is tricky to deal with using a generic CRDT.

For example, a generic tree/JSON CRDT cannot be used for an AST - it will most likely fail. I wrote about this a while back - https://news.ycombinator.com/item?id=29433896

If you have ideas on how to approach this problem, join us on this thread https://github.com/inkandswitch/peritext/issues/27#issuecomm...


I can't find a link right now but I believe Kevin is in the process of implementing "move" in Yjs, so that if you copy and paste from a section or split it, it is a move rather than a delete and insert.

Agreed on tables, they are so much more complex and may need further operation types to ensure no dropped edits.

Obviously with Peritext you are implementing a very rich text focused CRDT. The way I see it is there are general purpose and domain specific CRDTs, Yjs being more general purpose but with a concept of 'marks' on strings for formatting. Ultimately we need both and I like the concept of one that covers both bases so you can have a db document that contains rich text fields along with more traditional structured data.

Without exactly domain specific CRDTs for your application you will always have to do some level of schema correction after merging documents.


Joe started an interesting discussion about block and table CRDTs on the Peritext github that you will find interesting: https://github.com/inkandswitch/peritext/issues/27

My perspective from working on a centralized collaborative editor (Notion) is that last-write-wins semantics are often fine in some places; I wouldn’t stress out about column split/join for that reason. I proposed a CRDT for tables that handles the rest of the features reasonably well in the GitHub thread; we use a similar structure for Notion’s “simple tables” feature.


Re: inlined tabular data in CRDT distributed systems for collaboration on documents that may be required to validate:

Atomicity: https://en.wikipedia.org/wiki/Atomicity_(database_systems)

> An atomic transaction is an indivisible and irreducible series of database operations such that either all occurs, or nothing occurs.[1] A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright. As a consequence, the transaction cannot be observed to be in progress by another database client. At one moment in time, it has not yet happened, and at the next it has already occurred in whole (or nothing happened if the transaction was cancelled in progress).

> An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations, withdrawing the money from account A and saving it to account B. Performing these operations in an atomic transaction ensures that the database remains in a consistent state, that is, money is neither lost nor created if either of those two operations fail. [2]

IIRC, Apache Wave (Google Wave (2009)) solves for tables but is not built atop a CRDT, like Docs and Sheets? https://en.wikipedia.org/wiki/Google_Wave

Jupyterlab/rtc - Jupyterlab, JupyterLite (WASM), - is built upon a CRDT for .ipynb JSON, at least. https://github.com/jupyterlab/rtc

(URI-) Named Graphs as JSON-LD would work there, too. https://json-ld.org/playground/

Does Dokieli have atomic table operations for ad-hoc inlined tables as RDF Linked Data? https://github.com/linkeddata/dokieli


I think there is some confusion over "tables", the GP I believe was referring to typographic "rich text" tables, like you have in a word document, rather than a "data table" in a DB.

The issue with "rich text" tables and CRDTs is that you effectively have two overlapping "blocks": columns and rows. Current CRDTs are good at managing a list of blocks, splitting them, reordering them. Most rich text tables are represented as a list of rows of cells.

If you have two clients, one adds a new row (represented as a new row in rows list), and the other adds a new column (represented as a new cell in each row in the rows list), and then merge, you have a conflict where the new row and column meet. There is a missing cell, and so you end up with a row that is one cell too short.

What is needed is a CRDT that has a concept of a table, or table-like structure, so that when the documents are merged it knows to add that extra cell in the new row.

This has nothing to do with Atomicity in the traditional DB transaction sense.

Edit:

Just thinking about this a little more: if you model the table as a list of maps, with each column having a unique id, it would overcome the misalignment issue, and it potentially helps with other table conflicts too. You would, however, have to store the order of the columns somewhere, and deleting columns could conflict.
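
A quick sketch of that modeling idea (illustrative only, not any particular library's API): rows keyed by stable column ids, with column order stored separately, so a concurrently added row just renders an empty cell where it meets a concurrently added column instead of misaligning.

```javascript
// Rows keyed by column ID instead of positional cells (illustrative sketch).
const columns = ['c1', 'c2'];            // column order, stored separately
const rows = [
  { c1: 'a1', c2: 'b1' },
  { c1: 'a2', c2: 'b2' },
];

// Client A adds a row, client B adds a column; after merging both edits:
columns.push('c3');                      // B's new column
rows.forEach(r => { r.c3 = ''; });       // B filled the rows it knew about
rows.push({ c1: 'a3', c2: 'b3' });       // A's new row, unaware of c3

// Rendering tolerates the missing cell instead of producing a short row.
const rendered = rows.map(r => columns.map(c => r[c] ?? ''));
// rendered[2] is ['a3', 'b3', ''] - the row/column intersection is just empty.
```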


> This has nothing to do with Atomicity in the traditional DB transaction sense.

Partial application of conflicting additive schema modifications is an atomicity issue as much as it is a merge issue. If the [HTML] doesn't validate before other nodes are expected to synchronize with it, that changeset shouldn't apply at all; atomicity.

It looks like Dokieli supports embedded tables.

With RDF (and triplestores, and property graphs), you can just add "rows" and "columns" (rdfs:Class instances with rdfs:Property instances) without modifying the schema or the tabular data. Online schema migration is dangerous with SQL, too, because the singular db user account for the app shouldn't have [destructive] ALTER TABLE privileges.

"CSV on the Web: A Primer" > "Validating CSVs" (CSVW Tabular Data) https://www.w3.org/TR/tabular-data-primer/#validating-csvs


Kleppman is no idiot, and I learned a lot from his book. Highly recommended :)

This seems very useful, but this part destroys some of the magic:

> The only case Automerge cannot handle automatically, because there is no well-defined resolution, is when users concurrently update the same property in the same object (or, similarly, the same index in the same list). In this case, Automerge arbitrarily picks one of the concurrently written values as the "winner"

I guess that's a pragmatic choice, but isn't avoiding this the whole point of conflict free types?


It goes on:

> Although only one of the concurrently written values shows up in the object, the other values are not lost. They are merely relegated to a conflicts object.

If you have a procedure for resolving conflicts on scalars (for example, a field is a maximum, and you can take the maximum of the conflicting values), you can detect the conflicts and resolve them manually.

On a brief inspection, it appears that the conflict objects hold the conflicting values, but not the previous value, so there's no way to do a three-way merge here. That makes it impossible to resolve conflicts on a field which is a counter, for example.
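
For the maximum example, though, a three-way merge is indeed unnecessary; a sketch (the conflicts shape here is illustrative - check your Automerge version's getConflicts() for the real structure):

```javascript
// Resolve a concurrent-write conflict on a "maximum" field by hand.
function resolveMax(winner, conflicts) {
  // conflicts: { actorId: value } of the losing concurrent writes
  return Object.values(conflicts).reduce(
    (max, v) => (v > max ? v : max),
    winner
  );
}

// Automerge picked 7 as the winner; 12 and 3 were the other concurrent writes.
const resolved = resolveMax(7, { 'actor-a': 12, 'actor-b': 3 });
// resolved === 12
```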


Although it's not implemented, you could create a Counter CRDT using an array of numbers, appending each addition or subtraction, the 'value' being the sum of the array. This can be compressed by each client changing its most recent value if a sync has not been sent. There are probably other optimisations you can do too, probably around keeping track of which client made which change and updating its value.
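
A sketch of that idea (hypothetical - not a shipped Automerge type): per-client append-only logs of signed deltas, where the value is the sum and merging keeps the longer log per client.

```javascript
// Counter as per-client append-only logs of signed deltas (sketch).
function counterValue(logs) {
  // logs: { clientId: [delta, delta, ...] }
  return Object.values(logs)
    .flat()
    .reduce((sum, d) => sum + d, 0);
}

function counterMerge(a, b) {
  // Per client, keep the longer log: append-only logs only grow,
  // so the longer one contains the shorter as a prefix.
  const merged = { ...a };
  for (const [client, ops] of Object.entries(b)) {
    if (!merged[client] || ops.length > merged[client].length) {
      merged[client] = ops;
    }
  }
  return merged;
}

const alice = { alice: [5, -2] };        // alice incremented 5, then -2
const bob = { alice: [5], bob: [10] };   // bob saw only alice's first op
const merged = counterMerge(alice, bob);
// counterValue(merged) === 13  (5 - 2 + 10)
```

Merging is commutative here, which is the CRDT property that matters: both sides compute the same total.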


Well, two concurrent changes to the same property is basically a race condition and that can't be cleanly resolved in an automated way. You either pick a winner or you do manual merging like Git, but I guess in applications of such data structures you don't want to do anything manually.


Within a string, most CRDTs will keep both changes, effectively concatenating them.

If you have something like an Int value and need to handle it more gracefully in your application than the arbitrary selection by the CRDT toolkit, you can keep it within an array/list and then handle the merging yourself. So it starts off as [10]; each user, when changing it, would delete the value and append a new one. That way, if two users simultaneously change it, one to [15] and the other to [5], the resulting synced value would be [15,5], and you can then handle how to combine the values yourself (addition, average, etc.) in a deterministic way.

I haven't looked into it, but some CRDT toolkits may store Ints by keeping track of changes to them; the above allows you to do it yourself.
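
The [15,5] case can be collapsed with any pure, order-insensitive function, as long as every client uses the same one; a sketch:

```javascript
// Application-level, deterministic resolution of a multi-value field (sketch).
// After concurrent edits, the synced list holds every written value.
function collapse(values, combine) {
  return values.length <= 1 ? values[0] : combine(values);
}

const synced = [15, 5]; // two users concurrently replaced the original 10

// Any pure, order-insensitive function works; all clients pick the same one.
const averaged = collapse(synced, vs => vs.reduce((a, b) => a + b, 0) / vs.length);
// averaged === 10
```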


No, this is not the way. The proper way to handle it is that each client trying to update the property tries to get a write lock on it; that write lock has to wait for any other write lock to complete, and if it can't get a lock it needs to let the client know it failed to update. At a bare minimum this is what has to happen to maintain any kind of data integrity.

But by all means, if you think it's basically a race condition any time you have ten clients trying to update the same row in a database and you don't care which one wins, you should be able to write a very compelling and unusual poker website.


You've posted a lot of negative comments but seem to not understand or have any experience with CRDTs.

The whole point is that state modifications on the data structure by separate parties can converge to a final representation without any coordination. The math behind it is proven, however turning mathematical set-theory into a usable JSON interface is where the problem occurs.

A free-form JSON doc can't be completely supported but these kind of edge-cases around property updates are easily handled by using an array to hold modifications instead. CRDTs even have state-based and operation-based usage models to handle different scenarios, and then there's an entirely separate but parallel tech called operational transforms for other situations.

All of this is not only well-studied but widely used in many collaborative applications (eg: Google Docs).


Nitpick: Google Docs uses the server-mediated Operational Transformation (OT), not peer-to-peer CRDTs.


The whole point of CRDTs is that they work offline with no central authority and therefore no locks.


Oh. This is just a local storage thing? Well what's the point of worrying about concurrency at all then?

Just FWIW; I'm hostile to this notion because it breaks decades of best practice about data integrity and claims to solve issues that aren't actually issues in real databases; plus the people boosting it seem to not understand anything about concurrency. If it's just a replacement for indexedDB or sqlite or something, well, who cares...


It's not a replacement for indexedDB or sqlite, and it's not about concurrency. It's about merging conflicting changes.

This is a good article explaining the concept and implementation of a CRDT: https://www.inkandswitch.com/peritext/


Let's say you are developing a design application (think figma) which works in the browser. You don't want to lock the whole project just because someone needs to change a label somewhere. With CRDTs n people can open and edit the project at the same time and then push their changes and have them merged.

Very similar to git. When working with git, you don't lock the file or line you are working on.


That's a very narrow use case of a conflict-free data type. Sure, if you are willing to wait however long to get a lock before you can edit something.

The main goal of CRDTs is handling long-diverged branches, such as can happen in federated systems or with offline editing (think git).


Bad design. Would you ever put something into production that couldn't tell if two people were trying to write the same object at the same time and roll one back?


You seem to only conceive of the web as "html+JS frontend communicating in real time to some server"

We have decades of distributed systems without a central server. Such as git, bit-torrent, Mastodon, Matrix, and the whole web3 mess. It's for these use-case that CRDT helps solve real problems.


Why would you ever put a hyper focused distributed system in a centralized architecture?

You're choosing a sledge hammer for a screw.

This is a database that can work when you're offline. Your central server has nothing to do with this and is entirely incapable of even working.


A distributed lock. Amazing.


CRDT ensures a final state for all users.


CRDTs are mathematical data structures being applied here to create a usable JSON API, but they don't perfectly handle every possible mutation to a document.

The most common approach here is to not rewrite the same property but use an array with modifications continually appended. This has the benefit of also supporting undo/redo behavior, while the state can be compressed in snapshots for performance.


Randomly choosing a value means that references can be broken.

I'm guessing that Automerge would not be able to handle JSON that needs to adhere to a JSON Schema. A system that only applies changes when the resulting JSON meets the schema would be a challenge to develop and very valuable.

If there is no data model being followed, the JSON structure must be assumed to be just a bag of values.


>> Randomly choosing a value means that references can be broken

It also means no data is reliable; hence the entire thing is worse than useless.


It's only stochastically useless. I'm sure it's very useful most of the time.

And keep in mind, if software didn't randomly break for mysterious reasons we might be out of a job, eh?


I think CouchDB/PouchDB do this similarly, the winner is deterministic though?


CouchDB and PouchDB have no idea how to merge a conflict; if two copies are edited prior to syncing, one version is marked as the 'winner' and the other as a conflict version. As the developer you can then either choose to dispose of conflicts or have your own way of merging them.

This is a perfect use case for CRDTs such as Automerge and Yjs; I actually built a proof of concept combining Yjs with PouchDB to handle the conflicting edits:

https://discuss.yjs.dev/t/distributed-offline-editing-with-c...


No, the point is avoiding conflicts where possible. Cases like "users concurrently update the same property in the same object" are rare in real life.


No, this is what row-level locks are for.

And concurrent updates happen all the time in real life. This is why every DB in (serious) production has transactions and rollbacks. It happens all the time. If you're not aware of concurrency issues on your DB or your mid-level backend code isn't equipped to handle them, you're writing bad code indeed.


Please reread the "same property in the same object" part. You need row+column level locks for that.

In cases where users update the same row but different [independent] columns, you can avoid unnecessary conflicts. Either by redesigning relational model or by using CRDTs.
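
A sketch of the difference (illustrative, using explicit per-field change tracking rather than any real CRDT library): merging field by field only conflicts when both sides touched the same column, whereas whole-row last-write-wins would discard one side's independent edit.

```javascript
// Field-level merge of two versions of the same row (illustrative sketch).
// Each side records which fields it touched; disjoint edits never conflict.
function mergeRow(base, left, right) {
  const merged = { ...base };
  const conflicts = {};
  const keys = new Set([...Object.keys(left), ...Object.keys(right)]);
  for (const k of keys) {
    const inLeft = k in left, inRight = k in right;
    if (inLeft && inRight && left[k] !== right[k]) {
      conflicts[k] = [left[k], right[k]];   // true same-cell conflict
      merged[k] = left[k];                  // deterministic arbitrary pick
    } else {
      merged[k] = inLeft ? left[k] : right[k];
    }
  }
  return { merged, conflicts };
}

const base = { name: 'Ada', email: 'ada@old.example' };
const { merged, conflicts } = mergeRow(
  base,
  { name: 'Ada L.' },             // user 1 edited only the name
  { email: 'ada@new.example' }    // user 2 edited only the email
);
// merged keeps both edits; conflicts is empty.
```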


I tried to use this library for a side project, but ultimately I found that I couldn't predict how it would merge two documents(at least without reading the essay).

The case where two actors add items to a list in parallel was especially troubling. I would expect that the result would be a two-element list, but that was not the case:

https://runkit.com/embed/2hv6rx14lp12

Code:

  const Automerge = require('automerge');

  let doc1 = Automerge.from({ cards: [] });
  let doc2 = Automerge.from({ cards: [] });

  doc1 = Automerge.change(doc1, doc => {
    doc.cards.push({ title: 'card1' });
  });

  doc2 = Automerge.change(doc2, doc => {
    doc.cards.push({ title: 'card2' });
  });

  doc2 = Automerge.merge(doc2, doc1);

  console.log(doc1);
  console.log(doc2);


This is because you are updating properties, not adding items to a list in parallel. It's totally not obvious, but the lines

  let doc1 = Automerge.from({ cards: [] });
  let doc2 = Automerge.from({ cards: [] });
describe two different property updates, as I understand it. This is equivalent to

  let doc1 = Automerge.change(Automerge.init(), doc => {
    doc.cards = [];
  });
  doc1 = Automerge.change(doc1, doc => {
    doc.cards.push({ title: 'card1' });
  });
for each document. It's that initial assignment that is causing you trouble. If you start from a shared state, then adding items in parallel works exactly how you would expect:

  let doc1 = Automerge.from({ cards: [] })
  let doc2 = Automerge.init()
  // Merge doc1 into doc2
  doc2 = Automerge.merge(doc2, doc1)
  
  doc1 = Automerge.change(doc1, doc => {
      doc.cards.push({ title: 'card1' })
  });
  
  doc2 = Automerge.change(doc2, doc => {
      doc.cards.push({ title: 'card2' })
  })
  
  doc1 = Automerge.merge(doc1, doc2)
  doc2 = Automerge.merge(doc2, doc1)
Both docs are equal.

N.B. this is just my understanding of how this works based on reading the docs; I haven't done much to confirm it beyond running the above code. In particular the warning about properties is extremely important:

> The only case Automerge cannot handle automatically, because there is no well-defined resolution, is when users concurrently update the same property in the same object (or, similarly, the same index in the same list). In this case, Automerge arbitrarily picks one of the concurrently written values as the "winner"

which is exactly what you are seeing here.


Thanks for the writeup, makes sense now.


Unless I'm mistaken, according to their own TS types[0], `Automerge.merge` should return a `{ readonly [P in keyof T]: Freeze<T[P]> }`. Getting `undefined` here is probably a bug.

[0] https://github.com/automerge/automerge/blob/main/%40types/au...


I think that undefined at the end is not related to my code - it's there before those console logs appear and they're appended to the start of this table.

In any case I get the same result locally: I suspected doc2.cards would have length: 2, but it's only 1.

I couldn't explain why so I rolled my own conflict-resolution scheme that fit my particular use case. It's primitive and doesn't actually give any guarantees, but gets the job done.


Related:

- "A simple way to build collaborative web apps" - https://news.ycombinator.com/item?id=28209736

- "Downsides of Offline First" - https://news.ycombinator.com/item?id=28717848

- "CRDT Resources" - https://news.ycombinator.com/item?id=28998767

- "Show HN: SyncedStore CRDT" - https://news.ycombinator.com/item?id=29483913


I'd love to see a library like this with compatible implementations across multiple languages. I'd particularly like one that lets a JavaScript frontend and a Python backend work together to manage a data structure.

The best trick I've seen for encouraging this kind of thing is to offer a language-agnostic test suite. I had a glance through some of the Automerge tests just now and, while they're written in JavaScript, it looks like it might be possible to extract most of them out into JSON or YAML files which could then be easily used to exercise multiple implementations.
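
A sketch of what an extracted, language-agnostic case might look like (hypothetical format - not Automerge's actual test suite): each case is plain data, so a JavaScript, Python, or Rust runner can consume the same files.

```javascript
// Hypothetical language-agnostic test case: plain data in, plain data out.
// A real suite would ship these as JSON/YAML files; shown inline here.
const testCase = {
  name: 'concurrent inserts into a list converge',
  initial: { cards: [] },
  changesA: [{ op: 'insert', path: ['cards', 0], value: 'card1' }],
  changesB: [{ op: 'insert', path: ['cards', 0], value: 'card2' }],
  // Any conforming implementation must produce one of these orders:
  expectedOneOf: [
    { cards: ['card1', 'card2'] },
    { cards: ['card2', 'card1'] },
  ],
};

// A runner in any language applies the changes through its own
// implementation and checks membership in expectedOneOf.
const isValid = (result) =>
  testCase.expectedOneOf.some(e => JSON.stringify(e) === JSON.stringify(result));
// isValid({ cards: ['card2', 'card1'] }) === true
```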


Here: https://github.com/automerge/automerge-rs

Yjs is also working on a Rust implementation: https://github.com/y-crdt/y-crdt


So I am planning to use CRDT sometime in the future.

Any thoughts on Automerge vs. yjs? – I am not doing a text editor. I just want to build a solid offline-first web application.

Also, is there any way to "squash" the history of changes? Let's say I have a central server through which all changes are synced (no peer-to-peer syncing). Does it make sense to force clients that haven't synced for a long time (let's say a week) to just discard their non-synced changes and use the "current" state as stored on the server?

Okay, one more question: Let's say I want to add an API to my server that uses the data that was synced to the server (assuming the sync state of Automerge/yjs is stored somewhere). Would the server in this case just be another client that gets the data from the synced state and stores it in an appropriate store (say a SQL database, Elasticsearch, etc.)?


Here’s what I know from going on a similar journey recently.

1. Choose Yjs for now.

2. Look at Yjs’s binary “update” format. That is what you should store in your database’s “blob” column. This also allows your backend to receive and transmit updates without hydrating the CRDT into JavaScript class instances. https://docs.yjs.dev/api/document-updates

3. Yjs has its own “gc” that discards deleted content. Without GC, deleted content remains in the CRDT but is hidden from the user’s perspective. You will need to hydrate the CRDT into memory for the GC feature. I’m not sure how to run this GC; maybe it runs whenever you apply an update on a Y.Doc with doc.gc=true.

4. As long as GC is disabled, you can use “snapshots” to restore old versions of the doc. https://docs.yjs.dev/ecosystem/editor-bindings/prosemirror#v...

So, knowing the above, how to design a system like your question? I think you could go with a kind of hot/cold storage. Keep the “hot” version of your document in the “current” row of your Postgres table for a document. Send/receive updates to the hot row. Take snapshots on the server whenever you’d like to.

Then, the cold storage. Periodically, you want to GC the hot storage. Before you do that, apply it as an update to some cold storage, maybe a blob in S3 so you don’t permanently lose those deleted values, and your snapshots can work in perpetuity against the cold storage data. Then GC the hot storage.

I am more unsure about squashing. The naive way I implemented it is to just copy all the data from OldHotDoc into a totally new independent NewHotDoc, and then archive/discard OldHotDoc. This will start a totally new history. What I’ve considered is that if any writes come from old clients before the squash, you can still apply the straggler writes to the old hot doc/old cold storage, then manually diff the OldHotDoc before/after the change and try to patch NewHotDoc the same way. Eventually you arrange for all clients to switch to the new doc history, and you can choose how long you’ll continue to try this janky patch strategy to accept straggler writes or just discard them.

I’m also not sure when you want to squash. I suggest fuzzing your system with the hot/cold storage part first to figure out what the rate of data growth of the “hot” storage is before you consider the squashing part.


> Yjs has its own “gc” that discards deleted content. Without GC, deleted content remains in the CRDT but is hidden from the user’s perspective.

This alone makes Yjs the clear choice for me. If you're building an app where a user prepares a record and then shares it, senders assume recipients can't view the record's previous revisions from before it was shared (unless your app has an obvious 'history' feature). If a CRDT doesn't do garbage collection, recipients receive past revisions, and could extract those states from the CRDT if they wished.

Without GC, you have to address this by creating a new CRDT with no history each time the recipient list changes, and that breaks offline changes made against the old CRDT.


> Also, is there any way to "squash" the history of changes?

My understanding is that this is one of the areas that Yjs does a little better than Automerge, it has a heavily optimised binary representation that combines consecutive changes into a single action.

Most people (who have looked into it) probably associate Yjs with its editor bindings but it’s brilliant for any type of syncing. I used it for automatic conflict resolution for Pouch/CouchDB, works really well.

On your server question, you can go either way: load the Yjs document on the server to read it, or store a JSON representation of the most recent state alongside it. Personally I would go for the latter as it gives you flexibility.

There are two implementations of Yjs, the JavaScript one and a newer Rust one which will have bindings for other languages. Last I looked the Rust one was still a work in progress, but that was a few months ago. It will provide great support for Yjs on the server side once it's complete.


Are there any good posts or videos or anything that motivates CRDT? As in you start with one user editing and add users and discover limitations and hack together support until you've ultimately got something like a CRDT?

All I know is that CRDTs can solve simultaneous editing. But you also don't need CRDTs to do this. There are other solutions.

I would like to read about where we came from and what the options and implications of various solutions are today.


It's a new name for "eventually consistent replication". The key factor is: no single point of reconciliation for conflicts/concurrency control. It's not really the application that's important, but rather the arrangement of the nodes running the application (peers, no central server).


I encourage anybody who enjoys Martin Kleppmann’s writing or work to check out his Patreon.


Find a way to give money to Martin Kleppman if his work is interesting or valuable, reader!


I don't think I understand the usecases of CRDTs. Operational Transformation algorithms work well, are fast, well understood and are already deployed in real production systems like Google Docs so it's battle tested.

The OT problems that CRDTs try to solve seem a bit academic to me - you are almost always going to have a centralized server in your application (the edits need to be stored somewhere, after all, outside of your client's browser memory). Is avoiding the central server really worth all the additional complexity and the difficulty of reasoning about CRDTs compared to the relatively easier to think about OT?

I am sure I am missing something as people like Martin Kleppmann working on this are much smarter than me so I am sure this has practical advantages over OT but right now I can't see them.


I believe the consensus is that OTs are not that great if a client goes offline for too long; they work perfectly for situations where multiple people are concurrently editing, even with brief disconnections. But you can’t go offline, massively diverge, and then merge back. That’s what CRDTs do well.

A good example of successful CRDT usage is the Apple Notes app: you can heavily edit a note on two different devices (maybe different users on a shared doc) with one disconnected from the internet, and they will successfully merge the changes. It’s perfect for that sort of “asynchronous” collaboration.


OT depends on a central server to reconcile conflicts. The whole point of CRDT is to not have a central coordinator. Yes they're mostly academic in that the set of problems reasonably solved with eventual consistency is quite small, but not vanishingly small. Many "web scale" data stores rely on eventual consistency, for example.




One of the best intros to CRDTs and distributed editing by Jonathan Martin at NDC Oslo 2017: https://www.youtube.com/watch?v=pMMDVphop40


Are CRDTs used in multiplayer games? I see a lot of CRDT ideas on collaborative web development stuff but are they really necessary? I think most complex games are able to handle updates without them, am I right?


The core benefit of CRDTs is to avoid coordination between parties so they can independently mutate data structures and still converge on the final state.

Multiplayer games almost always have a central game server/service that has authority on state so it already provides the coordination. CRDTs are unnecessary and would be extremely slow and limiting.


Similar problems but very different challenges and goals indeed. The main challenges with document editing are correctness and avoiding conflicts and corruption/data loss. The main challenges with shared game state (which could include very complicated things like modifiable geometry) are scaling it to massive amounts of users with minimal latency and minimizing undesirable/temporary inconsistencies due to people randomly disappearing, having networking issues, etc.


Another consideration is UDP. Games anywhere outside browsers use UDP because it's fast - no round trip necessary, no acknowledge packets, no waiting on missing ranges to free up data from the buffer.

CRDTs need TCP/IP to guarantee receiving all of the data without loss. Only when a CRDT is fully up to date can you be sure that the resulting data structure is synchronized. This means UDP wouldn't be ideal for communicating CRDTs.




