
Show HN: Domain-tailored CRDTs for collaboration without server involvement - archagon
http://archagon.net/blog/2018/03/24/data-laced-with-history/
======
archagon
Hi HN! I wanted to research more elegant ways to enable document sync and
collaboration in my apps sometime last year, and ended up discovering a new
class of data structure that made it possible to build collaboration into
documents right on the data level, completely separate from the network
architecture. (Maybe you're already familiar with CRDTs. Well, these aren't
just ordinary CRDTs, but almost _meta-CRDTs_ that can represent a variety of
convergent data types in a generic way. They also make it super easy to
implement things like past revision viewing and delta patches.) As a proof of
concept, I built a basic real-time collaborative text editor for iOS that
works equally well over regular iCloud sync, CloudKit Sharing, and offline, as
well as a simple mesh network simulator for macOS that has support for
collaborative text editing and simple Bézier editing.
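
To make the "data laced with history" idea concrete, here is a minimal sketch
in Python. It is not the causal tree structure from the article, only an
illustration under simplifying assumptions: a document is a set of immutable
operations, each stamped with a Lamport clock and a site ID. The names `Op`,
`merge`, `delta`, and `materialize` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Op:
    """One immutable edit. Field order doubles as the sort order."""
    clock: int   # Lamport timestamp (assumed to be maintained by each replica)
    site: str    # replica ID, breaks ties between equal clocks
    key: str     # which field of the document is written
    value: str   # the new content of that field


def merge(a: frozenset, b: frozenset) -> frozenset:
    """Set union is commutative, associative, and idempotent,
    so replicas converge regardless of delivery order."""
    return a | b


def delta(mine: frozenset, theirs: frozenset) -> frozenset:
    """The patch a peer needs: every op we hold that they lack."""
    return mine - theirs


def materialize(ops: frozenset, up_to_clock=None) -> dict:
    """Replay ops in (clock, site) order. Passing up_to_clock
    replays only a prefix, which yields a past revision."""
    doc = {}
    for op in sorted(ops):
        if up_to_clock is not None and op.clock > up_to_clock:
            break
        doc[op.key] = op.value
    return doc
```

Because the history is the data, viewing an old revision is just replaying a
prefix of the operations, and a delta patch is just a set difference.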

The code is strictly educational for the time being, but I'm working on a
Swift library that can hopefully put this stuff to use in real apps. Hope to
start dog-fooding in release software by the end of the year!

~~~
urbit
Unfortunately I wouldn't expect a lot of substantive technical discussion for
a post like this on HN -- this post will expire before anyone has time to read
it properly and produce thoughtful responses.

(Which is broadly the problem with posting enormous applied CS-research
braindumps to HN. Don't know a better place to do it, though. And I see people
are upvoting you. Also, the absence of thoughtless responses is itself a great
response.)

The key point here, as far as I can tell, is really building a CRDT system as
a true platform layer, rather than as a one-off solution for a special-purpose
app. I think it's fairly clear that _generic_ sync is a pretty essential part
of a modern decentralized computing environment, and I don't think it's clear
at all what the best way to solve it is.

But... it seems to me that you've solved a large piece of the problem but not
the whole thing. Because the most important document sync and collaboration
platform is, of course, source code control. If you have a document sync and
collaboration model that doesn't at least generalize to classic revision
control, why not?

Now, there's a sensible reason to separate these problems -- CRDT and OT
solutions tend to specialize in the zero-maintenance case where a user-
resolved merge is just impossible. Whereas if it was possible to build a zero-
maintenance revision-control system, which automatically resolved all merges
and conflicts, someone would have done so already.

This certainly suggests that the two are different problems. But generalizing
across slight differences is what system software does. Maybe one is a special
case of the other?

A generalized layer for lightweight collaboration is pretty powerful. But if
you could generalize across lightweight collaboration and heavyweight revision
control, you would have something incredibly powerful.

Or is this too ambitious? I don't know so I'm asking you.

~~~
blake8086
I suspect you could build such a thing if you could more rigidly define edit
operations on source code files. The way we edit code now is roughly
"insert/edit/delete string <x> at position <y>", which doesn't really carry
enough context to auto-merge.

If edits carried all the semantic information of what they were doing to the
source ("rename symbol <x> to <y> everywhere", "perform step <x> after step
<y>"), then we could probably build a zero-maintenance revision-control system.
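
As a rough sketch of what that could look like (in Python, with hypothetical
`Insert` and `Rename` operations, and assuming every line has a stable ID),
two concurrent edits merge cleanly because the rename is replayed as an intent
rather than as a text diff:

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    after_id: str  # stable ID of the line to insert after
    new_id: str    # stable ID assigned to the new line
    text: str


@dataclass(frozen=True)
class Rename:
    old: str
    new: str


def apply_ops(base, ops):
    """base is a list of (line_id, text) pairs; ops is a set of edits.

    The policy assumed here: structural edits are applied first, then
    renames, each group in a deterministic order. Any replica holding
    the same set of ops therefore produces the same text.
    """
    lines = list(base)
    inserts = sorted((o for o in ops if isinstance(o, Insert)),
                     key=lambda o: o.new_id)
    for op in inserts:
        idx = next(i for i, (lid, _) in enumerate(lines)
                   if lid == op.after_id)
        lines.insert(idx + 1, (op.new_id, op.text))
    renames = sorted((o for o in ops if isinstance(o, Rename)),
                     key=lambda o: (o.old, o.new))
    for op in renames:
        # Match whole identifiers only, so "total" does not hit "subtotal".
        pattern = re.compile(rf"\b{re.escape(op.old)}\b")
        lines = [(lid, pattern.sub(lambda m: op.new, text))
                 for lid, text in lines]
    return [text for _, text in lines]
```

If one user renames `total` to `subtotal` while another concurrently inserts a
line that still mentions `total`, the merged result renames the new line too.
A positional text merge would have left the stale name behind.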

------
saurik
The author is wondering why iCloud Sharing is used by so few developers. I
will point out that, in addition to fundamentally being an API whose usage
implies "I will never be able to build an Android app which interoperates with
this data" (which is already pretty damning for most developers, as a door
they don't want to permanently close), the first few releases of iCloud were
so bad that documents and even entire clients would end up in permanently
wedged states that developers could not fix for users. It was bad enough that
at WWDC one year I remember the Apple iCloud person on stage during the
"developer state of the union" talk apologizing for iCloud being so broken and
begging the audience for another chance.

~~~
archagon
While true, I don't think this explains the disparity between regular CloudKit
use and CloudKit Sharing use. For example, this snippet from Apple's CloudKit
paper[1] particularly struck me: "We identified the top apps using CloudKit,
based on their number of active users in the past month, and examined their
use of private and public databases. We found that 20% and 49% of the apps use
only the public or the private database, respectively, and 31% of apps use
both databases (20 apps use the shared database)."

So few apps from their sample set were using CloudKit Sharing that they didn't
even bother using a percentage! That's quite unusual for an Apple framework.

[1]: http://www.vldb.org/pvldb/vol11/p540-shraer.pdf

------
anilgulecha
> I admit that this revelation made some wily political thoughts cross my
> mind. Could this be the chance to finally break free from the shackles of
> cloud computing? It always felt like such an affront that our data had to
> snake through a tangle of corporate servers in order to reach the devices
> right next to us. We used to happily share files across applications and
> even operating systems, and now everything was funneled through these
> monolithic black boxes. What happened? How did we let computing become so
> darn undemocratic? It had gotten so bad that we actually expected our
> content and workflows to regularly vanish as companies folded or got
> themselves acquired! Our digital assets—some our most valuable property—were
> entirely under management of outside, disinterested parties.

This was such an enjoyable read!

------
codetrotter
> Most obviously, users should be able to edit their documents immediately,
> without even touching the network [...]

> The user should never be faced with a “pick the correct revision” dialog
> box.

These two goals cannot be reconciled.

If we both synchronize a document with content and then we both go offline and
we make conflicting edits then someone will have to resolve the conflict.

~~~
repsilat
I think the author is ok with "one of the edits wins," so long as all users
agree on the one that will win (without communicating).
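
A minimal sketch of that behavior, as I understand it, is a last-writer-wins
register (Python, with a hypothetical `Register` type; the article's actual
structures are richer than this). Every replica applies the same
deterministic rule, so they agree on a winner with no coordination:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Register:
    value: str
    clock: int  # logical clock, bumped on every local write
    site: str   # replica ID, used only to break clock ties

    def set(self, value: str, site: str) -> "Register":
        """Local write: advance the clock and record who wrote."""
        return Register(value, self.clock + 1, site)

    def merge(self, other: "Register") -> "Register":
        """Keep the write with the larger (clock, site) pair. The rule
        is symmetric, so both sides pick the same winner."""
        return max(self, other, key=lambda r: (r.clock, r.site))


base = Register("draft", 0, "A")
alice = base.set("Alice's edit", "A")  # made offline
bob = base.set("Bob's edit", "B")      # made offline, concurrently

assert alice.merge(bob) == bob.merge(alice)
```

Both replicas end up with Bob's edit, because the clocks tie and "B" sorts
after "A". Alice's edit is gone without anyone being asked.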

I'm still reading TFA, but this is probably the biggest reason I won't use
this system as described. (I will copy any useful ideas, though, and keep it
in mind for future projects.)

For my current project, it just doesn't make sense to blindly merge documents.
The result will be broken, or worse -- valid but silently incorrect. Having
changes you made and saw applied silently vanish is not acceptable to me, and
I think having users resolve merge conflicts _might_ be. Not sure, I'm aiming
at non-technical users...

We wouldn't accept this for code, though. If that is a principled stance, and
not just a side-effect of traditional code syntax being brittle, we should
wonder whether our users deserve the same.

------
xenadu02
Nice overview of the state of the art on CRDTs. Like the OP, I agree that we
are more likely than not to see a lot of clarity brought to the field thanks
to the latest compsci research. In the future we may wonder how we lived
without the ability to do distributed sync of data structures.

------
hesdeadjim
Wonderful article, appreciate the detail. I've had great success in the past
using event sourcing, so anything that is even remotely similar is always a
fun read. Thanks!

