
> If the state was concurrently changed on different devices, Automerge automatically merges the changes together cleanly, so that everybody ends up in the same state, and no changes are lost.

> Different from git: no merge conflicts to resolve!

This is impossible unless there are significant restrictions on what kind of operations are possible.

If I have a bag of 15 apples, and I take 10 of those apples at the same time as somebody else takes more than five apples, then we have a merge conflict right there, because we can't end up with a negative number of apples in the bag.
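To make the objection concrete, here is a state-based PN-counter, a textbook CRDT (not necessarily what Automerge uses). It merges concurrent "take" operations without any conflict, but nothing stops the merged total from going negative. This is a minimal Python sketch; the class and replica names are hypothetical.

```python
import copy
from collections import Counter


class PNCounter:
    """State-based PN-counter: per-replica totals of increments (p) and decrements (n)."""

    def __init__(self):
        self.p = Counter()
        self.n = Counter()

    def add(self, replica, amount):
        self.p[replica] += amount

    def take(self, replica, amount):
        # Each replica can only check its own local view before taking.
        if self.value() < amount:
            raise ValueError("not enough apples locally")
        self.n[replica] += amount

    def merge(self, other):
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge no matter how often or in what order they merge.
        for r in other.p:
            self.p[r] = max(self.p[r], other.p[r])
        for r in other.n:
            self.n[r] = max(self.n[r], other.n[r])

    def value(self):
        return sum(self.p.values()) - sum(self.n.values())


bag = PNCounter()
bag.add("origin", 15)
alice = copy.deepcopy(bag)
bob = copy.deepcopy(bag)
alice.take("alice", 10)  # locally, 5 apples remain
bob.take("bob", 6)       # locally, 9 apples remain
alice.merge(bob)
bob.merge(alice)
# Both replicas agree, but on 15 - 16 = -1 apples.
```

Convergence is guaranteed; the "never below zero" invariant is not, which is exactly the kind of restriction the comment is pointing at.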




It's right there in the README:

> The only case Automerge cannot handle automatically, because there is no well-defined resolution, is when users concurrently update the same property in the same object (or, similarly, the same index in the same list). In this case, Automerge arbitrarily picks one of the concurrently written values as the "winner":

https://github.com/automerge/automerge#conflicting-changes


This is very disheartening, because there are a few reasonable strategies:

1. CRDT-like linear change rules.

2. Vector clocks.

3. Forking the entire tree and allowing merge queries, creating a DAG of the state that can be interactively and speculatively remerged.
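Option 2 hinges on telling "happened before" apart from "concurrent". A minimal Python sketch of the standard vector-clock comparison, with clocks modeled as dicts from a (hypothetical) replica ID to a counter; only the "concurrent" outcome is a genuine conflict.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two vector clocks mapping replica id -> event counter."""
    keys = a.keys() | b.keys()
    # A missing entry means that replica's events have not been seen yet (counter 0).
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b: b can simply win
    if b_le_a:
        return "after"       # b happened before a: a can simply win
    return "concurrent"      # neither saw the other: a real conflict
```

For example, `compare({"x": 1}, {"x": 2})` is `"before"`, while `compare({"x": 2}, {"x": 1, "y": 1})` is `"concurrent"`.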

Just picking one winner arbitrarily and silently discarding the values from all other peers sounds like the kind of thing that gets labeled as a bug when we examine concurrent data stores.


Automerge uses vector clocks internally to track visibility and minimize conflict and retains all alternative values in the _conflicts list for that value.

Automerge must select one result to be the default consistently across all uncoordinated peers. If you don't do this, different nodes may see different documents during a conflict state, which is undesirable.
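The "pick one default, keep the rest" behavior can be sketched as a multi-value register: every replica applies the same deterministic rule, so all peers show the same default regardless of arrival order, while the losing values stay inspectable. Minimal Python sketch; the tie-break on the largest actor ID and the `conflicts` property are assumptions standing in for Automerge's actual rule and its `_conflicts` field.

```python
from dataclasses import dataclass, field


@dataclass
class Register:
    # Concurrent writes to one property: actor id -> value (hypothetical structure).
    writes: dict = field(default_factory=dict)

    def receive(self, actor: str, value) -> None:
        # Assumes every write delivered here is concurrent with the others.
        self.writes[actor] = value

    @property
    def value(self):
        # Deterministic tie-break: the lexicographically largest actor id wins,
        # so every replica shows the same default.
        return self.writes[max(self.writes)]

    @property
    def conflicts(self) -> dict:
        # The losing values are kept rather than discarded.
        winner = max(self.writes)
        return {a: v for a, v in self.writes.items() if a != winner}


r1, r2 = Register(), Register()
r1.receive("alice", "red")
r1.receive("bob", "blue")
r2.receive("bob", "blue")    # same writes, opposite delivery order
r2.receive("alice", "red")
```

Both replicas report `"blue"` as the value and `{"alice": "red"}` as the conflict, so "arbitrary" here means "by a fixed rule", not "different on each peer".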


Okay, good answer. I can see why for ease of use you might not just demand that clients work with the raw vector clocks.


Well, so much for "no changes are lost". Instead it should be "one of the changes is lost, and you can't know which one until it's done. Surprise!!!"


Conflicts are surprisingly rare, and when they are detected they show up in the _conflicts list for resolution per your needs. What automerge guarantees is that unless you care about the conflict you don't need to take any action.


There is a change history. No changes are lost.


Can clients quickly access the change history to figure out what happens? Can the relevant change history for an object or a proposed change be retrieved in linear time?


Merge conflicts are handled transparently and seemingly indeterminately? Yikes.


This isn't for a banking system; this is for building collaborative apps. If you press "A" and your friend presses "B" in a Google Docs document at the same time, you also don't know whether the document is going to read "AB" or "BA". In practice, this is not an issue.


Then there'd be a negative amount of apples. It's not impossible if you disregard context.

There is nothing new about doing things like this; OT and CRDTs have existed for ages. Check out ShareDB (https://github.com/share/sharedb) or Webstrates (https://webstrates.net/) (based on ShareDB). In Webstrates, we don't have merge conflicts that need to be resolved, and there are never any practical synchronization issues. The server orders the operations, and if you try to delete something that's already been deleted, we just ignore your operation.

Also quite courageous to say something is impossible when you have the code that does it right in front of you. ;-)
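The server-ordering approach described for Webstrates can be sketched roughly like this: one sequencer applies operations in arrival order and silently drops a delete whose target is already gone. This is a simplified Python illustration with hypothetical names; it is not ShareDB's actual API or transform logic.

```python
class Server:
    """Central sequencer: assigns one total order and ignores no-op deletes."""

    def __init__(self, items):
        self.items = set(items)
        self.log = []  # accepted operations, in server order

    def submit(self, op: str, item: str) -> bool:
        if op == "delete":
            if item not in self.items:
                return False  # someone already deleted it: ignore, no conflict raised
            self.items.remove(item)
        elif op == "insert":
            self.items.add(item)
        else:
            raise ValueError(f"unknown op {op!r}")
        self.log.append((op, item))
        return True


s = Server({"a", "b"})
s.submit("delete", "a")  # accepted
s.submit("delete", "a")  # a concurrent duplicate delete is dropped
```

Clients then replay `log` in server order, so everyone converges; the price is the central authority that Automerge is trying to avoid.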


I think when we say "impossible" we mean "well sure any old thing is possible but there are hard limits on what's consistent and sane."

I will go read the docs in more detail, but what you've just described sounds pretty awful. Perhaps it's more just you being glib about non-cooperative actors in an environment that expects cooperation, but "available balance" abstractions are a pretty reasonable thing to ask for, if only to encode natural numbers.


I'm not one to dictate how technologies like these are to be used, but all I can say is that in the almost two years I've been working on Webstrates – a collaborative system using a very similar technology – this has not been an issue.

But surely, if you allow malicious actors to modify your document, then that's your problem right there.


It's still impossible in the general case. You merely proved him right. Limiting yourself to OTs and CRDTs is a significant restriction.


Why is that a restriction? You can make any changes to the JSON you wish, you just have to use their API.


ShareDB looks quite interesting. We wanted similar capabilities but without a central authority, thus automerge was born.


We've been wanting to try a CRDT-based implementation with Webstrates for some time, but haven't found a suitable implementation, so this looks very promising!

Any particular reason why you chose CRDT over OT?


I'm told OT gets very complicated, and I hoped a good CRDT would be general-purpose enough to cover a variety of applications and domains. Also, I like not needing a central server to coordinate action. Would love to compare notes -- feel free to reach out to me on twitter at the same nick.


Yet another problem is access-control: what if a user is allowed to access only part of the data-structure? And what if you'd want to encode access rules in the data-structure itself?


Another problem is this. Let's say there is a counter at value 100. I increment the counter. Simultaneously, another user increments the counter, so the stable value should be 102. However, an operation-agnostic "merging" approach like the one described here can never catch that, and sets the final value to 101.
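The lost update is easy to reproduce if each increment is recorded only as a "set" of the new value. A minimal Python sketch; the function name is hypothetical.

```python
def concurrent_increments_as_sets(start: int, users: int) -> int:
    # Each user reads the same starting value and writes back start + 1.
    writes = [start + 1 for _ in range(users)]
    # The merge only sees concurrent "set" operations and keeps one winner.
    # Whichever write it keeps, the result is start + 1.
    return max(writes)


merged = concurrent_increments_as_sets(100, 2)  # 101, not 102
```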


CRDTs generally benefit from "intent preservation" in their design. In automerge's case, this would mean that instead of storing a "set" value, you'd store either an "increment" value or a "set" value to support cases like the one you describe.

Automerge has fairly robust support for these kinds of use cases around lists, which we use quite a lot, but we haven't actually needed them for numbers (though I expect we may want them eventually).
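Storing the increment rather than the resulting value is what a grow-only counter (G-counter) CRDT does. A minimal Python sketch of the textbook construction, not Automerge's API; the shared `base` starting value is an assumption for the example.

```python
class GCounter:
    """Grow-only counter: each replica only ever bumps its own slot."""

    def __init__(self, replica: str, base: int = 0):
        self.replica = replica
        self.base = base  # starting value shared by all replicas
        self.counts: dict[str, int] = {}

    def increment(self, by: int = 1) -> None:
        # Record the intent ("I added 1") in this replica's own slot.
        self.counts[self.replica] = self.counts.get(self.replica, 0) + by

    def merge(self, other: "GCounter") -> None:
        # Per-replica max: merging the same state twice changes nothing.
        for r, c in other.counts.items():
            self.counts[r] = max(self.counts.get(r, 0), c)

    @property
    def value(self) -> int:
        return self.base + sum(self.counts.values())


a = GCounter("alice", base=100)
b = GCounter("bob", base=100)
a.increment()
b.increment()
a.merge(b)
b.merge(a)
```

After merging, both replicas read 102, which is the result the parent comment expects.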


My experience is with OT, so I can't speak for CRDT, but I can imagine it's similar.

With OT, you send a na[1] (number add) operation, so this would work fine. Indeed, if you treat the number as a string, then you have a problem.

[1]: https://github.com/ottypes/json0#summary-of-operations
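With json0, each user's edit is sent as an op of the form `{"p": ["count"], "na": 1}`. Because two number additions commute, transforming one against the other leaves it unchanged, and both sites converge. A simplified Python sketch that handles only a top-level key; `apply_na` and `transform_na` are hypothetical helpers, not functions from the ottypes/json0 library.

```python
def apply_na(doc: dict, op: dict) -> dict:
    """Apply a json0-style number-add op {"p": [key], "na": n} to a top-level key."""
    key = op["p"][0]
    new = dict(doc)  # leave the input document untouched
    new[key] = doc[key] + op["na"]
    return new


def transform_na(op: dict, other: dict) -> dict:
    # Two additions commute, so op needs no adjustment against a concurrent na.
    return op


doc = {"count": 100}
alice_op = {"p": ["count"], "na": 1}
bob_op = {"p": ["count"], "na": 1}
# Each site applies its own op first, then the other op transformed against it.
site_a = apply_na(apply_na(doc, alice_op), transform_na(bob_op, alice_op))
site_b = apply_na(apply_na(doc, bob_op), transform_na(alice_op, bob_op))
```

Both sites end at `{"count": 102}`; with a string-typed number, the same scenario would need a text transform instead, which is where the trouble starts.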


This doesn't compose with all operations. One user adds and another user multiplies, for example. That is to say, not all operations are associative.

I bet you can run really far with this general idea. But there is no panacea.
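Concretely, an add and a multiply applied to the same starting value give different results depending on which one a replica applies first, so any merge or transform for this pair has to pick an interpretation. A minimal Python illustration:

```python
def add(n: int):
    return lambda x: x + n


def mul(n: int):
    return lambda x: x * n


start = 100
add_first = mul(2)(add(1)(start))  # (100 + 1) * 2 = 202
mul_first = add(1)(mul(2)(start))  # 100 * 2 + 1 = 201
```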


You're absolutely right, but I'd also never suggest a collaborative system for anything critical. As I've said earlier in this thread: If you're in a Google Docs document with a buddy and you write "A" and he writes "B" at the same time, you also don't know whether that'll show up as "AB" or "BA". In practice, this isn't an issue.


To underscore our point, concatenation on strings is not associative, either.

To that end, the basic laws of math (associative, commutative, distributive, etc.) together form an algebra of what you can automate rather easily. I think most of us stopped thinking in terms of those laws years ago, to the point that it probably seems odd to folks who stayed close to them.


> To underscore our point, concatenation on strings is not associative, either.

I think you mean commutative. It certainly is associative.
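A quick check of the correction, in Python: grouping doesn't matter for concatenation, but order does, which is the "AB" vs "BA" case from earlier in the thread.

```python
a, b, c = "A", "B", "C"
assert (a + b) + c == a + (b + c)  # associative: both are "ABC"
assert a + b != b + a              # not commutative: "AB" vs "BA"
```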


Ha! Yes. I meant to draw attention to all of the fundamental laws people are used to. Messed up in some edit.

Thanks!


As I understand it, CRDTs don’t work on atomic types you might use in normal code - they work on larger structures where the problem has been solved.

Wikipedia has some examples: https://en.m.wikipedia.org/wiki/Conflict-free_replicated_dat...


If you model all state changes as a serializable list of actions, like with Redux or Vuex, this is a non-issue. You'd simply get the same two commands, and your reducers would compute the same state for all clients. Thanks to immutable data structures, we don't need to mutate the state, so we don't have the issue you described.
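The idea is that state is a pure fold over the action log, so any two clients holding the same actions in the same order compute the same state. A Python sketch in the style of a Redux reducer; the action shapes are hypothetical.

```python
from functools import reduce


def reducer(state: dict, action: dict) -> dict:
    # Pure function: returns a new state instead of mutating the old one.
    if action["type"] == "increment":
        return {**state, "count": state["count"] + 1}
    if action["type"] == "add_item":
        return {**state, "items": state["items"] + [action["item"]]}
    return state


initial = {"count": 100, "items": []}
actions = [
    {"type": "increment"},
    {"type": "add_item", "item": "apple"},
    {"type": "increment"},
]
state = reduce(reducer, actions, initial)
```

Because both increments are actions rather than "set" values, the result is a count of 102; the open question, raised in the reply below, is whether every client sees the actions in the same order.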


The challenge is when there are multiple observers that have a different opinion on what order the commands happened in. Redux and Vuex don't have that problem, that's why it's a nonissue there.


Using lamport/logical timestamps you can ensure a distributed and totally consistent ordering of actions.

I wrote a middleware for Redux which propagates actions peer-to-peer in a consistent order using these timestamps and the scuttlebutt gossip protocol, you might find it interesting. https://github.com/grrowl/redux-scuttlebutt
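Lamport timestamps give every action a (counter, replica ID) stamp; sorting by that pair yields the same total order on every peer, with the replica ID breaking ties. A minimal Python sketch with hypothetical names; it is not the redux-scuttlebutt implementation.

```python
class LamportClock:
    def __init__(self, replica: str):
        self.replica = replica
        self.time = 0

    def tick(self) -> tuple:
        # Local event: advance the clock and stamp the action.
        self.time += 1
        return (self.time, self.replica)

    def observe(self, stamp: tuple) -> None:
        # On receipt, move past the sender's time.
        self.time = max(self.time, stamp[0]) + 1


a = LamportClock("alice")
b = LamportClock("bob")
log_a = [(a.tick(), "add_item apple")]
log_b = [(b.tick(), "increment")]
for stamp, _ in log_b:
    a.observe(stamp)
for stamp, _ in log_a:
    b.observe(stamp)
# Each peer sorts the combined log; delivery order no longer matters.
order_at_a = sorted(log_a + log_b)
order_at_b = sorted(log_b + log_a)
```

One caveat: a peer that later receives an action stamped earlier than ones it already applied has to rewind and replay, which is why this pairs naturally with pure reducers.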


That sounds like expected behavior. Both users are incrementing a value of 100, which would be 101 in both cases. It's not atomic.

If User A incremented 100, and saved it down; then User B loaded this saved state and incremented 101, it'd be 102.


But the point was: what if you wanted the behavior to be different than what you described. What if the intent was really "increment"?

For example, say you have a list AND a counter. Every time you add an item to the list, you need to increment the counter, as an invariant of your system.

Of course, it's a contrived example, but you get the point: things can get more complicated and the system may break as a result, unless you're very careful.
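Here is the contrived example made concrete: each replica keeps `count == len(items)` locally, but if the list merges by keeping both inserts while the counter merges as a plain single-winner value, the merged state breaks the invariant. A minimal Python sketch; the merge rule is an assumption chosen to show the failure, not a description of Automerge's behavior.

```python
def add_item(state: dict, item: str) -> dict:
    # Application code maintains count == len(items) on each replica.
    return {"items": state["items"] + [item], "count": state["count"] + 1}


def naive_merge(base: dict, a: dict, b: dict) -> dict:
    n = len(base["items"])
    # The list keeps both replicas' appended items (a's first, for determinism)...
    items = base["items"] + a["items"][n:] + b["items"][n:]
    # ...but the counter is a plain register, so one concurrent write simply wins.
    count = max(a["count"], b["count"])
    return {"items": items, "count": count}


base = add_item({"items": [], "count": 0}, "x")
alice = add_item(base, "a")
bob = add_item(base, "b")
merged = naive_merge(base, alice, bob)
# merged has 3 items but a count of 2
```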


We had this problem in our iOS app and solved it by using an array of ints which are the increment operations (well, we had UUIDs on them as well, so we could determine what was new and do a proper merge).

But we merge by using (a form of) differential synchronization to resolve local and remote changes (constructing a "patch"), keeping an unchanged copy of the upstream's latest version (from the client's perspective).
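The array-of-increments approach from the first paragraph can be sketched as a map from a unique ID to a delta: merging is a union on IDs, so an increment both sides already have is not double-counted, and the value is the base plus the sum of the deltas. Python sketch under those assumptions; the function names are hypothetical, and it does not attempt the differential-synchronization patching described in the second paragraph.

```python
import uuid


def increment(ops: dict, delta: int = 1) -> dict:
    # Record the increment itself, tagged with a unique id, instead of a new total.
    return {**ops, str(uuid.uuid4()): delta}


def merge(local: dict, remote: dict) -> dict:
    # Union by id: an increment present on both sides is counted once.
    return {**local, **remote}


def value(base: int, ops: dict) -> int:
    return base + sum(ops.values())


shared = increment({})      # an increment both devices already have
alice = increment(shared)   # then each device increments offline
bob = increment(shared)
merged = merge(alice, bob)  # three distinct increments in total
```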


Good point! I think you'd have to implement a lock of some sort and check for it in application logic; that way both can be updated simultaneously... not ideal!



