
Roshi: A large-scale CRDT set implementation for timestamped events - striking
https://github.com/soundcloud/roshi
======
techie128
Twitter ran into the same problem and solved it a different way: they merged
approaches #1 & #2 described in the article. For high-fan-out accounts, you
don't write to the inbox of every follower. This avoids high write
amplification when a popular celebrity tweets. Instead, you send notifications
and have the followers pull tweets from popular celebrities while constructing
their timelines.
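
A minimal sketch of that hybrid, assuming in-memory dicts stand in for the inbox and outbox stores. `FANOUT_LIMIT`, `follow`, `post` and `timeline` are hypothetical names for illustration, not anything from Twitter or Roshi:

```python
from collections import defaultdict

FANOUT_LIMIT = 2  # hypothetical threshold; a real system would use a far larger one

followers = defaultdict(set)   # author -> readers who follow them
following = defaultdict(set)   # reader -> authors they follow
inboxes = defaultdict(list)    # reader -> items pushed at write time
outboxes = defaultdict(list)   # author -> every item the author posted


def follow(reader, author):
    followers[author].add(reader)
    following[reader].add(author)


def is_high_fanout(author):
    return len(followers[author]) > FANOUT_LIMIT


def post(author, ts, text):
    item = (ts, author, text)
    outboxes[author].append(item)
    # Push (fan out on write) only when the author has few followers.
    if not is_high_fanout(author):
        for reader in followers[author]:
            inboxes[reader].append(item)


def timeline(reader):
    items = set(inboxes[reader])
    # Pull (fan out on read) from the high-fan-out authors this reader follows.
    for author in following[reader]:
        if is_high_fanout(author):
            items.update(outboxes[author])
    # The set drops items that were pushed before an author crossed the limit.
    return sorted(items, reverse=True)


for reader in ("a", "b", "c"):
    follow(reader, "celeb")
follow("a", "bob")
post("bob", 1, "hi")         # pushed into a's inbox
post("celeb", 2, "hello")    # stored once, pulled at read time
print(timeline("a"))         # [(2, 'celeb', 'hello'), (1, 'bob', 'hi')]
```

The celebrity post costs one write no matter how many followers there are; the price is an extra read per followed celebrity when a timeline is built.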

Having worked with Cassandra & Redis extensively, I think this might work
better with Cassandra. Cassandra already has sets, maps, and lists as data
types, and they can be leveraged to build a CRDT without having to write a new
database. Oh well, I'm guessing we're past that conversation? ;)

~~~
ryanpetrich
Cassandra’s sets, lists and maps are already CRDTs

------
marknadal
I've praised several other CRDT implementations here on HN, but this one seems
a bit odd - asking the author for more explanation behind their reasoning:

- a time series CRDT is... just an append-only log, sure you can call it a
CRDT, but it doesn't really have any meaningful properties. So what's the
point of calling this a CRDT?

- LWW Last Write Wins is also an odd choice, because the notion of "last"
doesn't exist in a distributed system - last according to who? When? What
about clock drift? If who, then do you rely upon an authority? Then you aren't
a CRDT.

Finally, regardless of CRDT stuff, what benefits does this give over just
using timestamps series into Redis directly?

I applaud and have called for more people to build databases and distributed
systems tools, so please keep it up. But I'm a tad worried this is just trying
to cash in on the emerging hype (finally!) around CRDTs. Could you explain
yourself more?

~~~
peterbourgon
Principal author here.

> a time series CRDT is... just an append-only log, sure you can call it a
> CRDT, but it doesn't really have any meaningful properties. So what's the
> point of calling this a CRDT?

I'm confused by this question. I don't think it's true that "a time series
CRDT is just an append-only log". In Roshi, each object is identified by a
key, and LWWSet semantics are provided via the timestamp. It's a CRDT because
writing the same key repeatedly with different timestamps results in a single
"winner" being kept in storage; if it were an append-only log, that wouldn't
be true. Right?
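
To make the "single winner" point concrete, here is a minimal in-memory sketch of LWW element-set semantics. It is not Roshi's code: the class and method names are hypothetical, and breaking timestamp ties in favor of insert is an arbitrary assumption made so that the result does not depend on arrival order.

```python
class LWWSet:
    """Per-key LWW element set: each member keeps only its winning operation."""

    def __init__(self):
        # key -> member -> (timestamp, is_present)
        self._state = {}

    def _apply(self, key, member, ts, present):
        members = self._state.setdefault(key, {})
        current = members.get(member)
        candidate = (ts, present)
        # Larger timestamp wins. On a tie, True > False, so insert beats delete
        # (an assumption of this sketch). Older or equal writes are no-ops.
        if current is None or candidate > current:
            members[member] = candidate
            return True
        return False

    def insert(self, key, member, ts):
        return self._apply(key, member, ts, True)

    def delete(self, key, member, ts):
        return self._apply(key, member, ts, False)

    def select(self, key):
        members = self._state.get(key, {})
        live = [(ts, m) for m, (ts, present) in members.items() if present]
        return [m for ts, m in sorted(live, reverse=True)]


s = LWWSet()
s.insert("stream", "event-a", 10)
s.insert("stream", "event-a", 5)   # older write: ignored, nothing is appended
s.delete("stream", "event-a", 7)   # older than the winning insert: ignored
print(s.select("stream"))          # ['event-a']
```

Replaying the same three operations in any order leaves the same state, which is what an append-only log would not give you.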

> LWW Last Write Wins is also an odd choice, because the notion of "last"
> doesn't exist in a distributed system - last according to who? When? What
> about clock drift? If who, then do you rely upon an authority? Then you
> aren't a CRDT.

I'm also confused by this question. LWWSet requires a concept of "last", it's
even right there in the name. And the documentation clearly indicates that
Roshi's "last" is clock time of the node that processes the request. As long
as you ensure that time is unique and monotonic (usually by including things
like node ID and a per-node monotonic counter) then it's absolutely sufficient
to act as the timestamp component of a LWWSet. If there is clock drift between
nodes, then a write W1 against some node may actually "beat" a later write W2
against another node with a slower clock, but in terms of correctness, that
doesn't actually matter: the write conflict still resolves deterministically.
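
A rough sketch of that argument, assuming a stamp of (wall time, per-node counter, node ID) compared lexicographically. `Stamp`, `StampSource` and `resolve` are hypothetical names, not Roshi's API:

```python
import itertools
from dataclasses import dataclass


@dataclass(order=True, frozen=True)
class Stamp:
    wall_time: float   # compared first
    counter: int       # then the per-node counter
    node_id: str       # node ID makes stamps unique across nodes


class StampSource:
    def __init__(self, node_id, clock):
        self.node_id = node_id
        self.clock = clock            # callable returning this node's clock reading
        self.counter = 0
        self.last_wall = float("-inf")

    def next(self):
        # Never let the wall component go backwards on this node.
        self.last_wall = max(self.last_wall, self.clock())
        self.counter += 1
        return Stamp(self.last_wall, self.counter, self.node_id)


def resolve(writes):
    """LWW: the write carrying the greatest stamp wins."""
    return max(writes, key=lambda w: w[0])[1]


fast = StampSource("A", clock=lambda: 105.0)   # clock runs ahead
slow = StampSource("B", clock=lambda: 100.0)   # clock runs behind
w1 = (fast.next(), "W1")
w2 = (slow.next(), "W2")                       # issued later in real time
outcomes = {resolve(p) for p in itertools.permutations([w1, w2])}
print(outcomes)                                # {'W1'}
```

W1 "beats" the later W2 because of drift, but every replica picks the same winner regardless of the order in which the writes arrive.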

> What benefits does this give over just using timestamps series into Redis
> directly?

I'm also confused by this question, and I'm not quite sure how to answer it.
The operation semantics, and the benefits they confer, are quite exhaustively
described in the README. You might also check the READMEs in the cluster and
farm subdirectories, which go into more detail. What specifically don't you
understand?

~~~
zzzcpan
> I don't think it's true that "a time series CRDT is just an append-only log"

You are correct. An append-only log as an implementation detail doesn't
converge and cannot be a CRDT. Semantically, though, logging can and should be
a CRDT, as it doesn't need to be append-only at all.

> As long as you ensure that time is unique and monotonic (usually by
> including things like node ID and a per-node monotonic counter)

As another comment noted, it's not enough. You need some causality here, at
least a Lamport timestamp (which is a good rule of thumb for timestamps anyway).
Because even with synced clocks some will always be faster and always win
writes.
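
For reference, a minimal Lamport clock sketch (the class and method names are hypothetical). The stamp is a logical counter plus node ID, so a node that has seen a write always stamps its own next write higher, whatever the wall clocks say:

```python
class LamportClock:
    def __init__(self, node_id):
        self.node_id = node_id
        self.time = 0

    def tick(self):
        """Local event or send: advance and return a (time, node_id) stamp."""
        self.time += 1
        return (self.time, self.node_id)

    def observe(self, stamp):
        """On receipt: jump to at least the sender's time, then tick."""
        self.time = max(self.time, stamp[0])
        return self.tick()


a = LamportClock("A")
b = LamportClock("B")
for _ in range(5):
    a.tick()                # A has been busy; B has done nothing yet
seen = a.tick()             # (6, 'A')
blind = b.tick()            # (1, 'B'): B has not seen A's write, so it loses
after = b.observe(seen)     # (7, 'B'): B saw A's write, so its next stamp wins
print(seen, blind, after)
```

This orders causally related writes correctly; concurrent writes are still resolved arbitrarily but deterministically by the tuple comparison.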

~~~
peterbourgon
> even with synced clocks some will always be faster and always win writes.

That's true, but, in the use cases that Roshi targets, it's also not a
problem. Users will notice and re-issue writes.

