
Pure Operation-Based Replicated Data Types - blopeur
https://arxiv.org/abs/1710.04469#
======
burntsushi
I've been following the literature on this topic for a while now, and I've
always wished for more examples of _actual systems_ built with these
techniques. Are there any? What kind of scales do they deal with? Are there
any write-ups on them? (I know of at least one good one[1].)

In particular, I've wondered how a simpler solution to the problems posed in
this paper stacks up. At what point does it fail? e.g., Consider:

    
    
        * An op-based model.
        * A client-server architecture (not p2p).
        * Operations are assigned a total ordering by a central server.
        * Given any two dependent operations A and B, A is always
          transmitted before B.
    

This still gives you a lot of the stated advantages of an eventually
consistent system, where each client communicating with the server will
eventually converge even if they all temporarily diverge. The central server
and its total ordering are the key ingredient, because they let you guarantee
an ordering between any two causal operations by having the server "choose"
one.

I'm naturally interested in tradeoffs. For what use cases does the central-
server model break down? Is it still useful for other things at smaller
scales?

[1] - [https://medium.com/@raphlinus/towards-a-unified-theory-of-op...](https://medium.com/@raphlinus/towards-a-unified-theory-of-operational-transformation-and-crdt-70485876f72f)

~~~
ryuuseijin
What you describe looks a lot like a Google Wave-style OT system. Wave-style
OT is eventually consistent, like CRDTs, but you need a central server to give
the event history a total order. That's necessary because Wave-style OT is a
1-1 model: clients connect 1-1 to the server, but not to each other, whereas
CRDTs can operate n-n, with peers exchanging operations directly.

The total order from the central server can make the system simpler and more
efficient, but by itself it doesn't solve the problem Wave has to solve:
letting a client edit their text or message without being interrupted by
network latency or outages -- imagine typing a letter and having to wait for
the server to acknowledge each keypress at >100ms latencies. To solve this
problem, you still need some form of the xform/merge algorithm that OT and
CRDT systems provide.
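To illustrate why the xform step is still needed even with a central server, here is a toy transform for concurrent character inserts (this is a simplified sketch, not Wave's actual algorithm): each client applies its own insert immediately, and when the concurrent remote insert arrives, positions are shifted so both replicas end up with the same text.

```python
def xform(op, other):
    """Transform insert `op` = (site, pos, ch) so it applies correctly
    after `other` has already been applied. Equal positions are tie-broken
    by site id so both replicas interleave characters identically."""
    site, pos, ch = op
    osite, opos, _ = other
    if pos > opos or (pos == opos and site > osite):
        pos += 1
    return (site, pos, ch)


def apply_op(text, op):
    _, pos, ch = op
    return text[:pos] + ch + text[pos:]


base = "abc"
a = ("siteA", 1, "X")   # client A typed X after 'a' without waiting
b = ("siteB", 0, "Y")   # concurrently, client B typed Y at the front

# Each replica applies its own op immediately (no round-trip latency),
# then the transformed remote op once it arrives.
at_a = apply_op(apply_op(base, a), xform(b, a))
at_b = apply_op(apply_op(base, b), xform(a, b))
assert at_a == at_b == "YaXbc"   # both replicas converge
```

The local keypress is never blocked on the network; convergence is recovered after the fact by the transform.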

EDIT: I assumed you were not familiar with OT systems since you didn't mention
them in your post, but now that I've followed your link I can see that you
are. In that light, your comment seems to be more a question about the
tradeoffs between OT and CRDT systems than about whether a central server can
solve all problems without xform/merge logic.

One tradeoff that comes to mind is the way operations track locations in the
data structure. In OT systems you have offsets (small); in CRDTs you have
uuids (large) or dynamically growing identifiers (usually small, but possibly
large). This has implications for the byte-size of operations and of the
in-memory data structure.
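The size difference is easy to see with back-of-the-envelope encodings (both op shapes here are invented for illustration): an OT insert carries a small integer offset, while a uuid-based CRDT insert carries one or more globally unique position identifiers.

```python
import json
import uuid

# Hypothetical wire encodings for a single character insert.
ot_op = {"type": "ins", "pos": 42, "ch": "x"}
crdt_op = {"type": "ins",
           "id": str(uuid.uuid4()),       # this op's unique identifier
           "after": str(uuid.uuid4()),    # position: insert after this id
           "ch": "x"}

# The CRDT op is several times larger than the OT op on the wire.
assert len(json.dumps(crdt_op)) > 2 * len(json.dumps(ot_op))
```

Dynamically growing identifiers (as in Logoot/LSEQ-style designs) usually land between these two extremes, which is the "usually small but possibly large" case mentioned above.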

Another is that CRDTs have a pruning problem. It has been some time since I
looked at CRDTs, but I remember that Wave-style OT didn't have the same
problem due to the central server. The pruning problem can cause a CRDT to
grow larger than it needs to by forcing it to keep more historic data around
just in case it gets an old operation it hasn't seen yet. The central server
solves this problem by guaranteeing that it will have sent you all old
operations before sending you a newer one. If you know all actors in an n-n
system you can also solve this issue, but in an unbounded n-n system I saw no
way to solve it when I was researching this.
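The "known actors" solution can be sketched as follows (the class and its API are mine, purely for illustration): each replica tracks the highest sequence number every known peer has acknowledged, and an entry can be dropped once all peers have acknowledged it. The sketch also shows exactly why this breaks down when the peer set is unbounded: the minimum over an unknown set of acknowledgements can never be computed.

```python
class PrunableLog:
    """Op log that prunes entries once every known peer has acked them."""

    def __init__(self, peers):
        self.acked = {p: -1 for p in peers}    # highest seq acked per peer
        self.log = {}                          # seq -> op

    def append(self, seq, op):
        self.log[seq] = op

    def ack(self, peer, seq):
        self.acked[peer] = max(self.acked[peer], seq)
        self.prune()

    def prune(self):
        """Drop every entry that all known peers have acknowledged."""
        stable = min(self.acked.values())      # everyone has seen <= stable
        for seq in [s for s in self.log if s <= stable]:
            del self.log[seq]


plog = PrunableLog(["a", "b"])
plog.append(0, "op0")
plog.append(1, "op1")
plog.ack("a", 1)
assert set(plog.log) == {0, 1}   # "b" hasn't acked yet; nothing pruned
plog.ack("b", 0)
assert set(plog.log) == {1}      # op0 acked by everyone, so it's pruned
```

A peer that never acknowledges (or was never known to exist) pins `stable` at -1 forever, which is the unbounded-n-n problem described above.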

EDIT2: Just want to add that there are lots of other problems that are more
practical than theoretical -- authorization, the authoritative copy of the
data, a REST API, things like that -- but those depend more on the exact use
case.

~~~
zzzcpan
With CRDTs, log entries can be purged once they have synchronized with
everyone; there's no need to keep them just in case. It's rather
implementation-specific, though.

~~~
ryuuseijin
Thanks -- I did address that when I said you can solve it if you know all
actors in an n-n system, but I should have been clearer about the solution,
which is (as you said) having every known actor acknowledge each entry.

In an unbounded n-n system I still don't see a solution.

------
archagon
I'm not an expert or an academic, but I've been spending a lot of time with
CRDTs lately. It seems to me that there are two major issues with this
approach.

First, little is said about performance. As the paper explains, the meat of
each CRDT is pushed into the eval function, which to my understanding is
simply a function over the PO-Log. However, is it always possible to adapt a
convergent data structure in such a way that eval takes a reasonable amount of
time? I notice that sequences — perhaps the most important data type in CRDT-
land! — have not been implemented using this approach. If we assume that a
sequence can be retrieved from an insert/delete PO-Log by simply sorting it
and removing the deleted operations, does that mean that every eval is
O(N log N) at best? And if your solution is to cache the output sequence as an
array, a) how do you ensure a correct mapping between the PO-Log and cache on
every new operation, and b) what happens if you lose your data and need to
replay your PO-Log from scratch? Can the cache be reconstructed in O(N log N) at
worst? A complete guess, but maybe having CRDT-specific bits in the
prepare/effect steps is what actually allows CRDTs to be performant in the
first place! PO-Log representation is alluringly flexible but seems to come
with some hefty tradeoffs.
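To make the worry concrete, here is one guess at what a sequence `eval` over a PO-Log might look like (the paper does not implement sequences, and the op shapes here are invented): sort the logged inserts by their position identifiers and drop anything a delete targets. The sort is where the O(N log N) per evaluation comes from.

```python
def eval_sequence(po_log):
    """Derive the current text from a PO-Log of sequence operations.

    po_log holds ("ins", ident, char) and ("del", ident) entries, where
    `ident` is assumed to be a totally ordered position identifier
    (here, a tuple). Every call re-sorts: O(N log N) each time.
    """
    deleted = {e[1] for e in po_log if e[0] == "del"}
    inserts = [e for e in po_log if e[0] == "ins" and e[1] not in deleted]
    return "".join(ch for _, _, ch in sorted(inserts, key=lambda e: e[1]))


po_log = [
    ("ins", (1,), "h"),
    ("ins", (2,), "x"),
    ("ins", (3,), "i"),
    ("del", (2,)),       # tombstone the 'x'
]
assert eval_sequence(po_log) == "hi"
```

Caching the sorted output avoids re-sorting on every read, but then every new operation has to be mapped into the cache incrementally, which is precisely the bookkeeping concern raised above.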

Second, one of the cleanup steps relies on causal stability, i.e. knowing that
every replica has advanced past a given op, so that no op concurrent with it
can still arrive. This is a problem in
pure, decentralized P2P environments. First, depending on your network
architecture, it's not necessarily possible to identify each peer until they
actually start sending messages around. Maybe they got their hands on an early
revision of the data and have been chipping away for weeks before going
online. Second, nothing prevents a peer from connecting for a bit and then
leaving forever, thus ensuring that their last edits will never be causally
stable. This can be solved with some centralized logic, but then what's the
point of using a CRDT at all?
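The causal-stability check itself is simple to state with version vectors (this sketch and its names are mine, not the paper's formulation): an operation with vector timestamp t is causally stable once every known replica has reported a version vector that dominates t. The sketch also exposes both failure modes above: an unknown peer isn't in the map at all, and a silent peer's vector never advances.

```python
def dominates(vv, t):
    """True if version vector vv has seen everything in timestamp t."""
    return all(vv.get(r, 0) >= c for r, c in t.items())


def causally_stable(op_timestamp, reported_vvs):
    """reported_vvs: latest version vector heard from each known replica.
    Stability requires every known replica to have covered the op."""
    return all(dominates(vv, op_timestamp) for vv in reported_vvs.values())


t = {"a": 2, "b": 1}   # timestamp of some operation

# All known replicas have seen the op: it is causally stable.
assert causally_stable(t, {"a": {"a": 3, "b": 1},
                           "b": {"a": 2, "b": 2}})

# Replica "c" went silent before seeing it: the op is never stable
# as long as "c" is counted among the known replicas.
assert not causally_stable(t, {"a": {"a": 3, "b": 1},
                               "c": {"a": 0, "b": 0}})
```

A peer that connects once and leaves forever keeps its last reported vector frozen, so any op concurrent with its final edits stays unstable indefinitely, blocking that cleanup step.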

Finally, and less critically, the lossy cleanup steps make it impossible to
retrieve old revisions or identify the author of a particular change.

------
thdxr
This is the exact topic I've been trying to think on for the past month.
Excited to dig through it.

