
Why Sync Is So Difficult - peter123
http://gigaom.com/2009/05/10/why-sync-is-so-difficult/
======
rantfoil
As far as I can tell, Dropbox has actually solved the desktop file-sync
problem. It's the first sync product of its kind that I've used that works
out of the box.

Yes, sync is hard. Where I think things fail is mainly around user experience.
You just have to make smart decisions and let users recover if they see
something they don't expect. Dropbox does this quite well.

I learned this when I was at Microsoft on the ActiveSync team -- we actually
were trounced by RIM not because our sync was worse (it was probably superior)
but because we initially made the mistake of OVER-reporting status (even minor
conflicts that you would normally just want to ignore).

Sync should be a silent experience with little or no UI: a utility that just
works in the background. Any attempt to make it more than that will cause the
product to fail miserably.

~~~
dryicerx
Regarding _Sync should be silent_: in a perfect world it would be, but we are
far from a perfect world.

There is no way to automate conflict resolution for binary files (when the
same file is modified in two places). You can simply pick the most recently
modified version, but this is less than optimal (the other modifications are
lost). Even Dropbox isn't totally silent: in a conflict like this, their
server keeps revisions so you can pick and choose the correct one, which
means it isn't silent.

Silent sync (in an automated, no-user-interaction way) has been, is, and
always will be a dream (at least for the foreseeable future).
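A minimal Python sketch of the trade-off described above: a hypothetical server that uses last-writer-wins to decide which version is visible, but keeps every upload as a revision so the losing edit can still be recovered by hand. `Version` and `ServerFile` are made-up names for illustration, and comparing device-reported modification times assumes the devices' clocks roughly agree.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Version:
    device: str
    mtime: float      # modification time as reported by the device (assumes sane clocks)
    content: bytes

@dataclass
class ServerFile:
    current: Optional[Version] = None
    revisions: List[Version] = field(default_factory=list)

    def upload(self, v: Version) -> None:
        # Keep every upload, so the "losing" edit is never destroyed.
        self.revisions.append(v)
        # Last-writer-wins decides which version every device sees by default.
        if self.current is None or v.mtime > self.current.mtime:
            self.current = v

f = ServerFile()
f.upload(Version("laptop", 100.0, b"edit A"))
f.upload(Version("desktop", 105.0, b"edit B"))
print(f.current.content)                 # b'edit B'
print([r.content for r in f.revisions])  # [b'edit A', b'edit B']
```

The sync itself stays silent; the revision list is what lets a user dig out "edit A" later, which is the non-silent part.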

~~~
sachinag
But this is _exactly where_ Dropbox is genius. They just keep both files -
they don't scream at you that there's a conflict. In the case that you open up
a file and it's not quite right, you're going to look into it. You'll find the
alternate file and then copy/paste your changes in. Then you'll re-save and
_ta-da_, you're OK. But there are going to be all sorts of situations where
you'll have minor conflicts, and the user will never, ever notice.
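The "keep both files" approach can be sketched in a few lines of Python. This assumes the caller has already determined that the local and remote edits were made independently since the last sync; `keep_both` is hypothetical, and the "conflicted copy" naming scheme is made up here rather than Dropbox's actual format.

```python
from __future__ import annotations
import os
from typing import Dict

def conflict_copy_name(path: str, device: str) -> str:
    """Hypothetical naming scheme for the second copy (not Dropbox's real format)."""
    base, ext = os.path.splitext(path)
    return f"{base} (conflicted copy from {device}){ext}"

def keep_both(folder: Dict[str, bytes], path: str, remote: bytes, device: str) -> Dict[str, bytes]:
    """Resolve two concurrent edits of `path` without prompting the user or dropping data."""
    result = dict(folder)
    if folder.get(path) == remote:
        return result  # both sides made the same change: nothing to keep twice
    # The local version keeps the original name; the remote one gets a sibling file.
    result[conflict_copy_name(path, device)] = remote
    return result

folder = {"report.txt": b"local edit"}
print(keep_both(folder, "report.txt", b"remote edit", "laptop"))
# {'report.txt': b'local edit', 'report (conflicted copy from laptop).txt': b'remote edit'}
```

Nothing is reported to the user; the extra file simply shows up next to the original if they ever go looking.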

~~~
jgmorard
SugarSync also does just that: the file is duplicated and the user is left to
resolve the conflict at a later time. Actually it does that in a really clever
way so that if users keep editing over the already conflicting versions, it
doesn't create new conflict versions but rather treats the new files as
further versions of existing conflict duplicates...
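One way to get the behavior described, sketched in Python under the assumption that a conflict duplicate is keyed by (original path, device): later conflicting edits from the same device append a version to the existing duplicate instead of spawning yet another file. This is a guess at the idea, not SugarSync's implementation; `ConflictTracker` and its naming scheme are hypothetical.

```python
from __future__ import annotations
import os
from typing import Dict, List, Tuple

class ConflictTracker:
    """Hypothetical: one conflict duplicate per (path, device), reused on later edits."""

    def __init__(self) -> None:
        self.duplicates: Dict[Tuple[str, str], str] = {}  # (path, device) -> duplicate path
        self.history: Dict[str, List[bytes]] = {}          # duplicate path -> its versions

    def record_conflict(self, path: str, device: str, content: bytes) -> str:
        key = (path, device)
        if key not in self.duplicates:
            base, ext = os.path.splitext(path)
            self.duplicates[key] = f"{base} ({device} conflict){ext}"
        dup = self.duplicates[key]
        # Further edits on an already-conflicting file become new versions of the same duplicate.
        self.history.setdefault(dup, []).append(content)
        return dup

t = ConflictTracker()
first = t.record_conflict("notes.txt", "laptop", b"v1")
second = t.record_conflict("notes.txt", "laptop", b"v2")
print(first)             # notes (laptop conflict).txt
print(first == second)   # True
print(t.history[first])  # [b'v1', b'v2']
```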

------
coolestuk
Lotus Notes has been doing multi-master replication for over 20 years. The
very architecture is built on it. Replication can be push-pull, push or just
pull. And can be scheduled or user-initiated. It automatically merges changed
documents during replication. When automatic conflict resolution fails, it
saves both copies and requires human intervention. Damien Katz
(creator of CouchDB) was a programmer for a core part of Notes until about 5
years ago. And Notes was the major influence on CouchDB.

There are a ton of other good features in Notes (and a few bad ones too).
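A rough Python sketch of the "merge automatically, otherwise save both copies" rule, assuming documents are dictionaries of fields and the common ancestor from the last replication is available. It is not meant to reproduce Notes' actual replicator; `merge_documents` is a hypothetical helper that only shows the field-level idea.

```python
from __future__ import annotations
from typing import Any, Dict, Optional, Tuple

Doc = Dict[str, Any]

def merge_documents(base: Doc, ours: Doc, theirs: Doc) -> Tuple[Optional[Doc], Optional[Tuple[Doc, Doc]]]:
    """Three-way, field-level merge.

    Returns (merged, None) if every field resolves automatically, or
    (None, (ours, theirs)) if the same field changed differently on both sides,
    meaning both copies are kept for a human. A missing field is treated as None.
    """
    merged: Doc = {}
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            value = o          # both replicas agree
        elif o == b:
            value = t          # only the other replica changed this field
        elif t == b:
            value = o          # only this replica changed this field
        else:
            return None, (ours, theirs)
        if value is not None:  # None means the field was removed
            merged[key] = value
    return merged, None

base = {"title": "Q3 plan", "owner": "ann"}
print(merge_documents(base, {"title": "Q3 plan v2", "owner": "ann"},
                      {"title": "Q3 plan", "owner": "bob"}))
# ({'owner': 'bob', 'title': 'Q3 plan v2'}, None)
```

Different fields edited on each side merge cleanly; only a true same-field collision falls back to keeping both documents.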

------
dangrover
Sing it, sister.

I implemented my own 2-way syncing for ShoveBox and its new iPhone app
(wonderwarp.com/shovebox).

I was very thorough in the way I did it, but there are still small issues that
I wasn't able to resolve by release time.

I'm going to write a blog post on this soon.

------
lpgauth
Wouldn't building a sync on top of something like Git simplify the whole
process?

~~~
defen
The problem is that not every change can be auto-merged, and a "normal" user
does not want to deal with resolving merge conflicts. Binary file formats
present another problem - imagine trying to merge two different changes to the
same image or Word document.

~~~
lisper
It's actually even worse than that. Even plain text files can cause problems.
You make a change to a file over here, then you delete the file over there.
What should be the result of the merge? There are an infinite number of such
screw cases because the "right answer" depends on the semantics of the data.
For example: you add a line to a file over here, then you add the same line to
the same file in the same location over there. What should be the result of
the merge? Should you end up with one extra line or two? What if one line has
an extra trailing space? What if it was an extra leading space and the file
contains Python code?
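To make the point concrete, here is a deliberately naive whole-file three-way classifier in Python (`classify` is hypothetical; `None` stands for a deleted file). Because it can only compare contents, it silently decides the duplicate-line case one way and reports the trailing-space case as a conflict, regardless of what the data means.

```python
from __future__ import annotations
from typing import Optional

def classify(base: Optional[str], ours: Optional[str], theirs: Optional[str]) -> str:
    """Decide a whole-file three-way merge purely by comparing contents (None = deleted)."""
    if ours == theirs:
        return "same"                   # identical edits, or both sides deleted
    if ours == base:
        return "take theirs"            # only the other side changed (or deleted) it
    if theirs == base:
        return "take ours"              # only this side changed (or deleted) it
    if ours is None or theirs is None:
        return "modify/delete conflict"
    return "edit/edit conflict"

base = "import os\n"
print(classify(base, base + "x = 1\n", base + "x = 1\n"))   # -> "same" (one added line, not two)
print(classify(base, base + "x = 1\n", None))               # -> "modify/delete conflict"
print(classify(base, base + "x = 1\n", base + "x = 1 \n"))  # -> "edit/edit conflict"
```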

~~~
boryas
But don't these problems exist regardless of your approach? Git and its way
of handling trees of old commits is a decent place to start. It clearly isn't
a final solution, but building on top of it seems like a worthwhile direction,
at least to me.

~~~
lisper
> But don't these problems exist regardless of your approach?

Yes, of course. That was exactly the point I was trying to make: it's a
fundamentally hard problem.

> Git and its way of handling trees of old commits is a decent place to
> start. It clearly isn't a final solution, but building on top of it seems
> like a worthwhile direction, at least to me.

That depends on what problem you want to solve. If you want to solve the
_general_ data-storage-in-the-cloud problem, then Git is fundamentally flawed
because 1) one of the inescapable aspects of the problem is that the solution
depends on the semantics of the data and 2) Git _by design_ knows nothing
about the semantics of the data.

------
rarrrrrr
SugarSync IMHO takes a backwards approach to syncing. Their original model
didn't even include historical versions: wrong-way syncs destroyed data! The
system works by observing and then replaying the events (file creation, moves,
etc.) observed on one device to others.

SpiderOak implements a different approach, having initially built a
comprehensive journaling backup. Sync happens as a result of logically
combining the journal entries from all available end points. There's no "event
replay." The final state for a folder is calculated based on the user's
likely intent from the totality of all actions taken in each folder over time.
The set of actions to perform locally on any device is the diff between the
calculated end state and the local state.
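A minimal Python sketch of the journal-combining idea, under a loudly simplifying assumption: "likely intent" is reduced to "the latest journal entry for a path wins", which is surely cruder than what SpiderOak does. `Entry`, `end_state` and `local_actions` are hypothetical names.

```python
from __future__ import annotations
from typing import Dict, List, NamedTuple, Optional, Tuple

class Entry(NamedTuple):
    timestamp: float
    device: str
    path: str
    content: Optional[bytes]  # None means the file was deleted

def end_state(journals: List[List[Entry]]) -> Dict[str, bytes]:
    """Combine every device's journal into one target folder state (latest entry per path wins)."""
    state: Dict[str, bytes] = {}
    all_entries = [entry for journal in journals for entry in journal]
    for entry in sorted(all_entries, key=lambda x: (x.timestamp, x.device)):
        if entry.content is None:
            state.pop(entry.path, None)
        else:
            state[entry.path] = entry.content
    return state

def local_actions(target: Dict[str, bytes], local: Dict[str, bytes]) -> List[Tuple[str, str]]:
    """The work to do on one device is the diff between the computed end state and its local state."""
    ops = []
    for path in sorted(set(target) | set(local)):
        if path not in local:
            ops.append(("create", path))
        elif path not in target:
            ops.append(("delete", path))
        elif target[path] != local[path]:
            ops.append(("update", path))
    return ops

laptop = [Entry(1, "laptop", "x.txt", b"1"), Entry(3, "laptop", "y.txt", b"y")]
desktop = [Entry(2, "desktop", "x.txt", b"2"), Entry(4, "desktop", "y.txt", None)]
target = end_state([laptop, desktop])
print(target)                                                     # {'x.txt': b'2'}
print(local_actions(target, {"x.txt": b"1", "z.txt": b"z"}))  # [('update', 'x.txt'), ('delete', 'z.txt')]
```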

There are still some good points here. The cross-platform issues mentioned are
subtle (and SugarSync doesn't even support Linux). For instance, ":" is a
valid character in Mac/Linux filenames but not Windows. And the case
sensitivity/insensitivity can create conflicts where they wouldn't otherwise
exist.
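Both issues can be detected before syncing. A short Python sketch (the helper names are hypothetical): it flags names containing characters Windows rejects (`< > : " / \ | ? *` and control characters) and groups names that only differ by case. `str.casefold()` is an approximation; NTFS and HFS+ have their own case-folding tables, and this ignores Windows reserved names such as `CON`.

```python
from __future__ import annotations
from collections import defaultdict
from typing import Dict, List

WINDOWS_FORBIDDEN = set('<>:"/\\|?*')

def windows_problems(names: List[str]) -> List[str]:
    """Names containing characters that Windows will not accept in a filename."""
    return [n for n in names if any(c in WINDOWS_FORBIDDEN or ord(c) < 32 for c in n)]

def case_collisions(names: List[str]) -> List[List[str]]:
    """Groups of names that are distinct on a case-sensitive filesystem but collide on a case-insensitive one."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for n in names:
        groups[n.casefold()].append(n)
    return [sorted(g) for g in groups.values() if len(g) > 1]

names = ["Notes: draft.txt", "Readme.md", "README.md", "todo.txt"]
print(windows_problems(names))  # ['Notes: draft.txt']
print(case_collisions(names))   # [['README.md', 'Readme.md']]
```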

For character encoding, Windows is actually the easiest, since it stores
filenames natively as Unicode. Mac and most Linux distros use UTF-8, but
there's nothing stopping users from dumping a bunch of filenames with
arbitrary heterogeneous encodings all in the same folder.
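On Linux a filename is just a byte string, so a sync client can't assume it decodes cleanly. A small Python sketch that flags names which are not valid UTF-8; on POSIX systems, `os.listdir(b".")` returns raw byte names like these. `non_utf8_names` is a hypothetical helper.

```python
from __future__ import annotations
from typing import List

def non_utf8_names(raw_names: List[bytes]) -> List[bytes]:
    """Return the raw filenames that do not decode as UTF-8."""
    bad = []
    for raw in raw_names:
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(raw)  # e.g. a Latin-1 name copied over from an older system
    return bad

names = ["café.txt".encode("utf-8"), "café.txt".encode("latin-1")]
print(non_utf8_names(names))  # [b'caf\xe9.txt']
```

What to do with such a name (skip it, escape it, or guess its encoding) is a policy decision the sketch deliberately leaves open.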

~~~
jgmorard
The single reason why SugarSync listens to filesystem events is to build the
journals and then merge them. I am not sure how that is different, or why
that wouldn't match the user's likely intent.

You're right that there weren't historical versions, but that was a year
back. They're here now; haven't you seen them?

Unicode encoding is very subtle. The difficulty is that the same string can
have several different but equivalent Unicode representations (for example, a
precomposed "é" versus "e" followed by a combining accent), and different
filesystems use different normalizations so that they can compare their
strings byte-by-byte.
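A small Python illustration using the standard `unicodedata` module: the two spellings of "café" render identically but differ as code points and as UTF-8 bytes, and only compare equal after normalization. Normalizing to NFC in `same_name` is one possible choice of canonical form, assumed here for illustration; `same_name` is a hypothetical helper.

```python
import unicodedata

composed = "caf\u00e9"     # "café" with precomposed U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"  # "café" as "e" followed by U+0301 COMBINING ACUTE ACCENT

def same_name(a: str, b: str) -> bool:
    """Compare filenames after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(composed == decomposed)                                          # False
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 5 6
print(same_name(composed, decomposed))                                 # True
```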

------
terpua
We are working on a syncing product for companies and everything he wrote is
spot on.

I would add that file locking is also a huge problem, even when it comes to
a _simple_ conflict resolution.

Throw in case sensitivity issues, among others, and yeah, sync is difficult.

~~~
jgmorard
Aaaah yes locked files are fun too, I forgot to mention that in the article.

------
sachinag
Some day, we'll have IMAP for files and this will all go away.

------
known

       Why Sync Is So Difficult?

Two-phase commit is all-or-nothing, and asynchronous replication does not use
it.
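The point seems to be that two-phase commit avoids divergent replicas by committing everywhere or nowhere, while asynchronous replication (which lets each device accept writes offline and ship them later) gives that up, so conflicts become possible. A toy Python sketch of the all-or-nothing part, with no networking, timeouts, or crash recovery; `Participant` and `two_phase_commit` are hypothetical names.

```python
from __future__ import annotations
from typing import Dict, List, Optional, Tuple

class Participant:
    """Hypothetical replica: stages a write on prepare, applies it only on commit."""

    def __init__(self, name: str, will_vote_yes: bool = True) -> None:
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.staged: Optional[Tuple[str, bytes]] = None
        self.data: Dict[str, bytes] = {}

    def prepare(self, key: str, value: bytes) -> bool:
        if not self.will_vote_yes:
            return False
        self.staged = (key, value)
        return True

    def commit(self) -> None:
        assert self.staged is not None
        key, value = self.staged
        self.data[key] = value
        self.staged = None

    def abort(self) -> None:
        self.staged = None

def two_phase_commit(participants: List[Participant], key: str, value: bytes) -> bool:
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [p.prepare(key, value) for p in participants]
    # Phase 2: commit only if everyone voted yes; otherwise everyone aborts.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False

a, b = Participant("a"), Participant("b")
print(two_phase_commit([a, b], "k", b"v"))  # True
c, d = Participant("c", will_vote_yes=False), Participant("d")
print(two_phase_commit([c, d], "k", b"v"))  # False
print(d.data)                               # {}
```

The price is availability: a laptop that is offline cannot vote, so nothing commits, which is why sync tools tend to replicate asynchronously and deal with conflicts afterwards.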

