Hacker News new | past | comments | ask | show | jobs | submit login
Automerge: JSON-like data structure for building collaborative apps (github.com)
409 points by goblin89 on Feb 5, 2018 | hide | past | web | favorite | 98 comments

Wow! I really, really like this because the goal appears to be to expose CRDTs in a developer-friendly way. I truly believe that well-designed CRDTs are the future for a lot of distributed systems problems, but they've been a bit hard to get into for people who just want to solve problems and not read 10 papers first.

The only thing I miss here is a clear spec of the CRDT itself so that people could implement compatible versions in other languages.

Conflict-free Replicated Data Type[0], for anyone else who didn't know what CRDT means.

[0]: https://en.wikipedia.org/wiki/Conflict-free_replicated_data_...

I've spent almost a decade working on CRDTs, largely from an industry perspective but also some academic side.

They're great, and not too hard to understand if you take time to think about them.

Here are some animated/interactive cartoon explainers we've made to help teach the concepts:

- How a CRDT version of Operational Transformation (Google Docs) works: http://gun.js.org/explainers/school/class.html

- Why CRDTs are better than centralized alternatives like PAXOS/RAFT: http://gun.js.org/distributed/matters.html

Martin Kleppmann's work (author of Automerge) is outstanding, especially check out: https://youtu.be/yCcWpzY8dIA?t=29m36s

I have a rather specific question, in case you know. What would be the best CRDT algorithm for editing a document consisting of (i) nested lists, (ii) strings, and (iii) some primitive values?

I'm looking at Kleppermann's JSON work. But it doesn't look like it handles collaborative editing of strings very well: they have to either be treated as an immutable value in a register, which doesn't allow for collaborative editing of the string, or treated as a list of characters, which would have to represent large edits (e.g. deleting a word) as a sequence of character edits, which sounds inefficient. Kleppermann's work also handles maps, which I don't need, which is fine.

For context, I plan to make a tree/structure editor, and am considering representing the document as a CRDT.

(not an expert)

Treating the long string like a list of substrings should work, I guess the optimum substring length could be determined experimentally based on real-world interaction patterns.

Thanks, I didn't and wondered if it had anything to do with Computer-supported Collaborative Work or Computer-supported Collaborative Learning (CSCW and CSCL, respectively[0][1]).

Not really, obviously, but given that one of the big sub-topics of CSCW and CSCL is about designing for remote collaborative work, CRDT looks like a technology that naturally fits in there.

[0] https://en.wikipedia.org/wiki/Computer-supported_collaborati...

[1] https://en.wikipedia.org/wiki/Computer-supported_cooperative...

Yes, thanks!

I agree that this effort is in the right place. Having the opportunity to change Realtime networks like Pusher, PubNub and Firebase would be great for apps.

I'm surprised those apps haven't made a bigger effort to implement this logic for production. It would certainly make them unique and hard to leave.

Is CRDT a good way to add sync for a classic invoice app (that need to track multi-tables changes? like invoice header, items, payments, shipments?)

I imagine it is based on the author's earlier paper? https://arxiv.org/abs/1608.03960

I haven't compared them though.

It’s not the same as the paper for performance reasons. This is noted in the github README. Martin said on Software Engineering Daily late last year that Automerge was three orders of magnitude faster than his published version of the JSON CRDT.

This is my second time highly recommending a Software Engineering Daily podcast with Martin Kleppman on it. He is really smart and able to break down really complex topics in a way that's easy to understand. If you've never heard of CRDTs before, have a listen - https://softwareengineeringdaily.com/2017/12/08/decentralize...

Funny thing is, we have had a easy to understand CRDT datastructure and ecosystem for years. The semantic web. If you ignore the closed world stuff and only use open world semantics RDF becomes a GSet representing a graph.

A pretty neat workable representation. Trivial to implement.

I don't see how. Care to elaborate what is conflict-free and replicated in the semantic web?

Nothing is, he's just trolling.

I'm not actually. Open world semantics means that you can never assume that a fact that hasn't been explicitly given doesn't happen to be on some server that is just temporarily unreachable. If you combine this with a monotonic query language and inference engine, which never retracts facts, you get a GSet with pretty powerfull operations. You can even arbitrarily cache facts without worry of conflicts.

CRDTs can provide an elegant solution to some problems but they are not powerful enough to be used as a general tool. If you change the title of your task card on two different devices, there is not much you can do to automatically merge those changes besides using some simplistic policy like last write wins which will not yield the desired result in general.

CRDTs require the operations on them to commute or equivalently that state changes are monotonic. If your application fits those constraints you can probably build a quite elegant solution but unfortunately that is the exception rather than the norm, most operations are not commutative.

EDIT: This was meant to be a reply to [1] instead of a top-level comment.

[1] https://news.ycombinator.com/item?id=16310656

In fact, one of the purposes of this project was to test exactly what it was like to use a Trello-like tool implemented this way.

What happens is that humans have intuitive coordination systems and tend to fix the same problems.

In other words, if the card title is "rename auotmerge" then both of us are likely to change it the same way, or at least either resolution is fine. In the event of a conflict we flag the multi-value in the UI so that a human user can decide if they care.

Where this approach shines is in domains where it helps avoid conflicts rather than resolving them. Naively adding replies to a comment thread might result in collisions or misordered comments across nodes of a distributed team. Automerge captures enough metadata from the operations created by users that changes "tend" to merge well.

I realize this is an argument not from first principles but from experience. We weren't able to find examples of other people building web applications with quite this approach (though we would love to hear from them if they are out there) and so the question was "will this feel alright?"

Apparently it does.

If you find this result provocative, I would encourage you to challenge it by attempting to build something with automerge and letting us know how it goes. I believe it will work well in applications where all user operations can always commute to produce a subjectively reasonable result. To make that concrete, a bank account is a poor domain to model with a tool like this, but a shared drawing is much better.

As you say, I am certainly working in the wrong domain for this and such an solution would never be acceptable. You certainly don't want edits to things like insurance claims, medical records, or financial transaction made in a way that edits somewhat randomly disappear and hope that people will notice and fix it.

This might work to some extend for a toy example like a simple task management application but I would bet even then users would quickly get annoyed about losing their edits. And you also can not really claim that your are not losing edits because you keep them in a list of conflicts because for all practical purposes they are gone unless you build a solution to explicitly deal with them which renders trying to use CRDTs mood.

I think unless your operations map cleanly to CRDTs and you get the expected and correct behavior all the time, trying to use CRDTs will just cause additional pains and not provide any benefits. Trying to force general data structures with general, non-commuting operations and CRDTs together just seems ill-fated to me.

Worth noting is the fact that this does not require a central transformation server unlike Google Docs' OT approach.

This means you can sync peer-to-peer without an Internet connection if say you have a Bluetooth connection between your phone and your laptop on an airplane.

Martin Kleppmann explains CRDT in detail in an episode of Software Engineering Daily.


When I learned one thing about collaborative apps, it's that there is no automerge that works for all use-cases.

Couchdb does it the right way, it simply keeps all versions and lets the application logic decide which is the "one true state".

Automerge is very similar.

If there are conflicts, it returns a _conflicts property on the object so you can retrieve the other conflicting alternatives so the application logic can write another change if it wants to use a different alternative than the one that was automatically chosen.

I'm confused. What's the point of `_conflicts` property if conflicts are automatically merged?

The linked project README, and even the CRDT Wikipedia page, seem to claim that conflicts are avoided, whereas, at least in my mind, these are more like strategies to automatically privilege one version in the event of a conflict.

From my reading of the docs, Automerge will arbitrarily pick a "winning" document in the case of a conflict, but guarantees that the same winner will be chosen if the merge were to occur, for instance, in a different order:

    // Pretend doc1 and doc2 are already Automerge objects
    let doc1 = { x: 1 }
    let doc2 = { x: 2 }

    // x will be either 1 or 2 (arbitrarily chosen), but
    // res1 will always == res2 regardless of choice
    let res1 = Automerge.merge(doc1, doc2)
    let res2 = Automerge.merge(doc2, doc1)
    res1 == res2 // true
However, there may be cases where you want to manually resolve conflicts, and if that is the case, the `_conflicts` property is there so that you can undo whatever merge occurred automatically and set the "winner" yourself.

The more interesting case is when you have the following:

doc1 = { 'a': {'b': {'c': ... }}}

doc2 = {}

So one has been deleted entirely and the other has modified some attribute deep down inside of it.

If you always have to check and resolve conflicts yourself, then it's not really useful.

If it's "arbitrarily chosen", but the result is always the same regardless of order, what is the algorithm that decides?

There's different ways you could do it, like calculating some hash of the value and picking the one with the greater value. If the hash function is good, then you get an "arbitrary" result that would still be consistent regardless of order.

Take a look at operational transforms as another great solution to this sort of problem. Many collaborative editors rely on them (Google docs, etc)


Keep in mind that OT relies on a centralized server where as CRDTs do not.

OT does not depend on a central server. The particular implementation in Google Docs does. Take a look at the GOTO algorithm and TP2 property.

Some users also "select all" + "paste"

Well, it depends on what you define as 'works'. As long as you have an accurate clock (which most devices have nowadays), it works many times to take the latest state. Combine that with merging on an attribute level and you will have something most humans will accept as 'works'.

If you want to build an editor you probably have to ask yourself what your 'attributes' are, but probably the individual letters.

Last year I built something fairly similar to automerge but with a focus on offline clients. I use it to sync my app's data from different clients to an WebDAV server. As some Clients are sometimes offline when changes occur the, resolving conflicts can occur easily but resolving them in an expected way isn't that hard if you do the time trick.

Several years ago I made jsonmerge, a Python library for merging standard JSON structures. The rules for resolving conflicts are encoded in a standard JSON schema document that is extended with some (non-standard) keywords.



This is how you're supposed to write documentation. Excellent descriptions, explanations and examples.

    To change the state, you can use the regular JavaScript array mutation
    methods such as push(). Internally, Automerge translates this mutable API
    call into an update of the immutable state object. Note that we must pass in
    doc1, and get back an updated object which we assign to the same variable
    doc1. The original document object is not modified.
Hmm. I like having the _option_ to use mutations to describe changes to the document, but in many cases I would actually prefer to use pure functions. Can I just return a new document with the necessary changes? I don't see any such examples in the docs here.

> If the state was concurrently changed on different devices, Automerge automatically merges the changes together cleanly, so that everybody ends up in the same state, and no changes are lost.

> Different from git: no merge conflicts to resolve!

This is impossible unless there are significant restrictions on what kind of operations are possible.

If I have a bag of 15 apples, and I take 10 of those apples at the same time as somebody else takes more than five apples then we have a merge conflict right there because we can't end up with a negative amount of apples in the bag.

It's right there in the README:

> The only case Automerge cannot handle automatically, because there is no well-defined resolution, is when users concurrently update the same property in the same object (or, similarly, the same index in the same list). In this case, Automerge arbitrarily picks one of the concurrently written values as the "winner":


This is very disheartening, because there are a few reasonable strategies:

1. CRDT like linear change rules.

2. Vector clocks.

3. Forking the entire tree and allowing merge queries, creating a DAG of the state that can be interactively and speculatively remerged.

Just picking one winner arbitrary and silently discarding all peers sounds like the kind of thing that gets labeled as a bug when we examine concurrent data stores.

Automerge uses vector clocks internally to track visibility and minimize conflict and retains all alternative values in the _conflicts list for that value.

Automerge must select one result to be the default consistently across all uncoordinated peers. If you don't do this, different nodes may see different documents during a conflict state, which is undesirable.

Okay, good answer. I can see why for ease of use you might not just demand that clients work with the raw vector clocks.

Well, so much for "no changes are lost", instead it should be "one of the change is lost, and you can't know which one until it's done ! Suprise !!!"

Conflicts are surprisingly rare, and when they are detected they show up in the _conflicts list for resolution per your needs. What automerge guarantees is that unless you care about the conflict you don't need to take any action.

There is a change history. No changes are lost.

Can clients quickly access the change history to figure out what happens? Can the relevant change history for an object or a proposed change be retrieved in linear time?

Merge conflicts are handled transparently and seemingly indeterminately? Yikes.

This isn't for a banking system, this is for building collaborative apps. If you press "A" and your friend press "B" into a Google Docs document at the same time, you also don't know if the document is going to read "AB" or "BA". In practice, this is not an issue.

Then there'd be a negative amount of apples. It's not impossible if you disregard context.

There is nothing new about doing things like this, OT and CRDT have existed for ages. Check out ShareDB (https://github.com/share/sharedb) or Webstrates (https://webstrates.net/) (based on ShareDB). In Webstrates, we don't have merge conflicts that need to be resolved and there are never any practical synchronization issues. The server orders the operation and if you try to delete something that's already been deleted, then we just ignore your operation.

Also quite courageous to say something is impossible when you have the code that does it right in front of you. ;-)

I think when we say "impossible" we mean "well sure any old thing is possible but there are hard limits on what's consistent and sane."

I will go read the docs in more detail but what you've just describes sounds pretty awful. Perhaps it's more just you being glib about non-cooperative actors in an environment that expects cooperatuon, but an "available balance" abstractions are a pretty reasonable thing to ask for, if only to encode natural numbers.

I'm not one to dictate how technologies like these are to be used, but all I can say is that in the almost two years I've been working on Webstrates – a collaborative system using a very similar technology – this has not been an issue.

But surely, if you allow malicious actors to modify your document, then that's your problem right there.

It's still impossible in the general case. You merely proved him right. Limiting yourself to OTs and CRDTs is a signficant restriction.

Why is that a restriction? You can make any changes to the JSON you wish, you just have to use their API.

ShareDB looks quite interesting. We wanted similar capabilities but without a central authority, thus automerge was born.

We've been wanting to try a CRDT-based implementation with Webstrates for some time, but haven't found a suitable implementation, so this looks very promising!

Any particular reason why you choose CRDT over OT?

I'm told OT gets very complicated, and I hoped a good CRDT would be general-purpose enough to cover a variety of applications and domains. Also, I like not needing a central server to coordinate action. Would love to compare notes -- feel free to reach out to me on twitter at the same nick.

Yet another problem is access-control: what if a user is allowed to access only part of the data-structure? And what if you'd want to encode access rules in the data-structure itself?

Another problem is this. Let's say there is a counter that is at value 100. I increment the counter. Simultaneously another user increments the counter, so the stable value should be 102. However, a operation-agnostic "merging" approach like described here can never catch that, and sets the final value to 101.

CRDTs generally benefit from "intent preservation" in their design. In automerge's case, this would mean that instead of storing a "set" value, you'd store either an "increment" value or a "set" value to support cases like the one you describe.

Automerge has fairly robust support for these kinds of use-cases around lists, which we use quite a lot, but we haven't actually needed them for numbers (though I expect we may want them eventually.)

My experience is with OT, so I can't speak for CRDT, but I can imagine it's similar.

With OT, you send a na[1] (number add) operation, so this would work fine. Indeed, if you treat the number as a string, then you have a problem.

[1]: https://github.com/ottypes/json0#summary-of-operations

This doesn't compose with all operations. One users adds and another user multiplies, for example. That is to say, not all operations are associative.

I bet you can run really far with this general idea. But there is no panacea.

You're absolutely right, but I'd also never suggest a collaborative system for anything critical. As I've said earlier in this thread: If you're in a Google Docs document with a buddy and you write "A" and he writes "B" at the same time, you also don't know whether that'll show up as "AB" or "BA". In practice, this isn't an issue.

To underscore our point, concatenation on strings is not associative, either.

To that end, the basics of math in associative, cumulative, distributive, etc., go together to create an algebra of what you can automate rather easily. I think most of us stopped thinking in terms of those laws years ago. To the point that it is probably odd to folks that stayed close to them.

> To underscore our point, concatenation on strings is not associative, either.

I think you mean commutative. It certainly is associative.

Ha! Yes. I meant to draw attention to all of the fundamental laws people are used to. Messed up in some edit.


As I understand it, CRDTs don’t work on atomic types you might use in normal code - they work on larger structures where the problem has been solved.

Wikipedia has some examples: https://en.m.wikipedia.org/wiki/Conflict-free_replicated_dat...

If you model all state changes as a serializable list of actions, like with Redux or Vuex, this is a non issue. You'd simply get the same two commands and your reducers would compute the same state for all clients, thanks to immutable data structures we don't need to mutate the state & do not have the issue you described.

The challenge is when there are multiple observers that have a different opinion on what order the commands happened in. Redux and Vuex don't have that problem, that's why it's a nonissue there.

Using lamport/logical timestamps you can ensure a distributed and totally consistent ordering of actions.

I wrote a middleware for Redux which propagates actions peer-to-peer in a consistent order using these timestamps and the scuttlebutt gossip protocol, you might find it interesting. https://github.com/grrowl/redux-scuttlebutt

That sounds like expected behavior. Both users are incrementing a value of 100, which would be 101 in both cases. It's not atomic.

If User A incremented 100, and saved it down; then User B loaded this saved state and incremented 101, it'd be 102.

But the point was: what if you wanted the behavior to be different than what you described. What if the intent was really "increment"?

For example, say you have a list AND a counter. Everytime you add an item to the list, you need to increment the counter, as an invariant of your system.

Of course, it's a contrived example, but you get the point: things can get more complicated and the system may break as a result, unless you're very careful.

We had this problem in our iOS app, solved it by using an array of ints wich are the increment-operations (well, we had uuids on them as well so we could determine what was new and do a proper merge).

But we merge by using differential synchronization (kinda) to resolve local changes and remote changes (to construct a "patch") by keeping an unchanged copy of the upstreams latest version (from the clients perspective).

Good point! I think you'd have to implement a lock of some sort, and check for it in application logic-- that way both can be updated simultaneously... not ideal!

The description contradicts itself:

> Different from git: no merge conflicts to resolve!

But later:

> The only case Automerge cannot handle automatically, because there is no well-defined resolution, is when users concurrently update the same property in the same object (or, similarly, the same index in the same list). In this case, Automerge arbitrarily picks one of the concurrently written values as the "winner":


> Although only one of the concurrently written values shows up in the object, the other values are not lost. They are merely relegated to a _conflicts object


>git merge --strategy-option=theirs

This is cool! I've been thinking about this a lot in a similar domain (JSON / YAML for configuration), and I've come to the conclusion that anything approaching manual merge is pretty much the worst thing ever.

The domain I'm specifically thinking of is managing routing rules, like nginx configs or control-plane configs for Envoy. It's really tempting to simplify it down to a config file, then you version-control it, then you get merge conflicts, and all of a sudden a bad merge makes 10% of your services unroutable. Which is hilariously bad.

Having an auto-merge for those sort of configs seems like the only reasonable way to do it. Escalating conflicts up, from easy merge, to trying an automatic strategy, to surfacing conflicts to the user in a domain-specific way, seems like the only way to do it!

Does anyone have any experience and can offer a suggestion on javascript libraries that will handle text editing (like google docs or etherpad) and which can be dropped into a site and which aren't locked into a particular service or backend?

I.e. basically something like this but for text?

[ShareDB](https://github.com/share/sharedb) is the most commonly used, we use it extensively - also has bindings to rich text or code editors. [Y-js](http://y-js.org/) is a more academic project, also based on CRDTs I think.

Prosemirror is another option that has the editor bits implemented.


I've been working on Observed-Remove Map and Set implementations that replicate the native Javascript API, as well as versions that use IPFS PubSub for synchronization.

* https://github.com/wehriam/observed-remove

* https://github.com/wehriam/ipfs-observed-remove

The inherent tradeoffs can be challenging to explain. For example in these implementations, there are caching constraints on certain delete operations.

Automerge looks fantastic, and I look forward to more progress in the space.

How does this compared to differential synchronization used by google drive? https://neil.fraser.name/writing/sync/

I love Differential Synchronization. Howevr, Google Drive doesn't use it.

More specifically, by Google Drive, I assume you mean Google Docs. And Google Docs doesn't use diffsync, it uses OT.

What's OT?

why doesn't google docs use DS? Who uses DS?

What's the best choice of making something similar to google docs' co-edit feature?

I found this page informative and interesting


it has a live visualization of the OT algorithm

The live visualization was a lot of fun to play around with.

WRT "What's OT?": https://en.wikipedia.org/wiki/Operational_transformation

The wiki article includes information that should answer your last question as well.

Do you happen to know why Google Docs uses OT instead of a CRDT-based system? This is something I've often wondered.

Might just be that Google docs came out before CRDTs were a thing - 2006 vs 2011.

So this is to paths (keys and indexes) in a JSON object what git is to lines in a text file ?

Could git be configured with a different diff algorithm to have this functionality, or is it too tied to lines ?

This would work great with Firebase too. I remember writing a collaborative document editor similar to Google docs and the "document" was basically JSON with nodes for headlines, paragraphs, etc that multiple users could edit in real time.

That time I just used to write the whole JSON to firebase (with flock like locking) and it was really bad but since it was just a weekend project I didn't care much at that time. Bet a library like this could have saved me a lot of headache.

This seems to be Martin Kleppman's work - worth noting that his book, "Designing data intensive applications" is also very good :)

This looks great, depending on the nature of the "collaborative app".

E.g. perhaps the most popular collaborative app is Google Docs. This would be a terrible choice for Google Docs, as it doesn't have specialized transforms for text editing/formatting (which turn out to be rather challenging).

My experience with collaborative sync is that it is really application specific. Most of the time, I used a dumb store (just storing all versions) and had the app decide of the logic to produce the current state.

But, this library looks nice, and I'm sure it can fit some use cases.

Very similar to Realm (also syncing CRDTs), though this seems more lightweight and works in client-side web apps. https://realm.io

Recently I created jkt library for my NodeJs project. https://github.com/slaveofcode/jkt

This is pretty interesting. I've seen similar problems at various places I've worked, and having a library that uses a CRDT to solve it is a winning proposition.

Thank you very much for the publication, may I ask a few questions:

* How to integrate this with a decent pagination library?

* How to integrate this with vue.js?

* How does this compare to http://y-js.org ?

* How does this compare to http://gun.js.org ?

Thanks for your attention!

I am planning to doing the same thing, but using OT.

My idea about conflicts is, since OT is a centralized protocol, the server can refuse some conflicts and let user to resolve it.

Another thing I don't like about CRDT is for text editing, it requires a uuid for each char you type. this is a huge waste of memory and disk imho. on the other hand OT requires no metadata

Can't wait to try this.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact