For an approach that doesn't ever force you to throw away or merge data, see the data structures in my OSCON talk. https://speakerdeck.com/jchris/couchbase-lite-oscon
I owe the world a write-up explaining these slides. Or come see me talk at StrangeLoop in St. Louis or Couchbase SF in September. http://www.couchbase.com/couchbase-sf-2013
For some end users it's already confusing to have different states in the same linear history on different devices because of upload times and devices being offline. I think giving most users multi-branch capabilities would overpower them and they might just ignore all branches but one, manually merging parts of older branches into the newer ones.
So I guess you assume usecases for application writers? I'm also not deep enough into this topic that I could imagine any usecases. Would you mind giving some examples?
For instance on a customer record you might have three sales folks all qualify a lead as a 4, one adds a fax number, one adds an email address, and one adds the same fax but a different email, and a note about the best time to call. If the database forced the application to merge on encountering each update, there'd never be a chance to set all three versions side by side and make an informed comparison. Instead it would be luck that A and B merge, and then the result merges with C. Or some other order.
Where it gets really useful is if you have different actors that have ownership of the merge on different fields. So for instance you could partially resolve the conflict by merging some fields automatically but present a human with the full set of options for a bigger octopus merge on, say, the notes field.
Let's say I have 10 devices d1,d2....d10 making updates to "a" on the server and went offline.
a==20 and last update was by d5 before everyone went offline.
When the devices come back up, the fate of "a" depends on the rulesets. Following are 3 possible high-level combinations.
i) All devices have "remote" rule.
On reconnection, everyone rollback "a" to 20. They are essentially back to the time before going offline. Even the device which did the last update(d5) before going offline is rolled back too, which seems bit odd. Still simple to reason with..
ii) All devices have "local" rule.
On reconnection, the last device to reconnect updates "a". It is then broadcasted to all other devices. Note that
it is not the last device to update "a". Rather it is the last to reconnect (Now, even if all of them reconnect at same time, depending on the queueing at server, the one at the tail wins). Not really simple..
iii) Mix of "remote" and "local"
Let's say d1 had "local" rule and all others had "remote". On reconnection, d1's "a" will be propagated to everyone. This is irrespective of the order of reconnection (I am assuming that between reconnections "a" is not modified). This is pretty simple and perfectly predictable. Now, if we have more than one "local", we start getting non-deterministic, and at the extreme move to case ii)
So when devices d1 through d10 make a (simultaneous, I assume) change, they all submit their change with the same parent revision. Assuming they were up-to-date before they submitted that change, exactly one of the device's changes will succeed (whichever reaches the server first). For example, if the previous revision was 100, they'll all submit changes with the parent revision 100, and the first one to reach the server will succeed, at which point the server revision will be increased to 101. When the other changes come in, they'll all fail because their parent revision doesn't match the server.
I have to slightly revise your scenario and say all these devices went offline and then made a local change. This may be what you meant, but I want to clarify that the changes were queued up locally but not yet sent to the server.
So in (i), where the conflict resolution strategy is "remote," what will happen is that one device's change will win (whichever reaches the server first), and all other devices will throw out their change in favor of the change that made it to the server. It's not the case that everybody's changes are rolled back.
In (ii), the first device to connect submits its change and is accepted by the server. Subsequent devices submit their change (with parent revision of 100), see that they're out of date (server revision is now 101 or higher), and resubmit their change with the new parent revision, effectively clobbering any changes that have been made on the server. So each device in turn clobbers the value on the server, and the ultimate value is whichever change was submitted last.
For (iii), really don't do that. Having different devices use different conflict resolution strategies is a bad idea, and I can't really think of a valid scenario fro that. Can you?
So to sum up, (i) would mean "first change to reach the server wins," and (ii) would mean "last change to reach the server wins."
Think we agree on case ii). On case iii) I still think that having 1 local and N-1 remote might be useful when we want to prioritize a particular writer over others. Borrowing the sales example from commenter jchrisa, consider a new user sales-head (local rule). He syncs the data uploaded by his sales folks (remote rule) and then goes offline. When he is done editing and comes online, he wants to make sure his delta takes precedence irrespective of any previous changes by sales folks during his being offline. Since he has local rule, his update will just win. Further, he does not want sales guys who were offline and come online after him to overwrite his last update immediately. I am assuming that the sales folks with "remote" rule will see the data with newer server version and accept it.
BTW, (for now) the Datastore API doesn't support sharing among multiple users.
Given the relationship the Datastore API has between the client and the server, it's also possible for a client to use a custom resolution rule (though this isn't yet exposed in any of our clients), which would let you do smarter things like look at the changes that were made and decide based on app-specific logic which one should win. I personally think that's a big advantage, but as I said, it's not one that's yet been fully realized.
The best engineers not only needs to write great code, they need to write the best documentations too :)
How would this algorithm handle changes to the same text file? Is the file a record, and each row considered a value in that record?
EDIT: Looks like this API is only for "structured data like contacts, to-do items, and game state."
(https://www.dropbox.com/developers/core and https://www.dropbox.com/developers/sync)
This is pretty much all that's possible to do generally for files, since their structure is opaque. One of the ideas behind the Datastore API is to support developers structuring their data in such a way that changes can be merged automatically.
I'm not logged in, and it doesn't log me in.