I hadn't seen RON It looks really interesting - I'll definitely take some time to play with it sometime soon. It's so great how much work is happening in this space right now.
> 1-3 is generally true for ORDTs (in archagon's terms).
That makes sense to me except for calculating diffs efficiently. Are you saying that can be done in time proportional to the _diff_? (as opposed to the number of changes?)
> P.S. I think, I will elaborate. So, imagine a real-time collaborative editor. Assuming every edit (1 char) is also a git-like commit, you have to mention its hash and its parent hashes. noms trims hashes to 20 bytes, that's reasonable, so we have at least 40 bytes of metadata, incompressible by definition. That means 1:40 data-to-metadata ratio or 1:200 if we count compression in. That is more or less the reason we did not use hashes in one our project some time ago.
> What is your perspective on that?
We don't actually transmit the hash of a chunk if we transmit the chunk itself since that's duplicate information. So the overhead would be half your calculation.
My perspective is that this frequently an entirely acceptable tradeoff. Your calculation is the absolute worst case, and there are many applications for which that worst case occurs infrequently. Today's collaborative text editors, for example, usually damp the liveness on purpose for ux reasons. It's cool to see someone type live, until you are that person, and realize that the other side is watching you misspell in realtime :).
The minimum size of an update in Noms is currently one "chunk". Chunks are set to average 4kb in the code currently, though that is tunable (the cost is that cost of access and write locally will go up).
If you wanted to reduce the size of updates in Noms further, you could do that transmitting patches to know chunks rather than new chunks.
It looks like the identifiers in RON are 16 bytes, so I think if you continue to push this direction in Noms you end up with wire formats that cost about the same. It just hasn't been something we've worked on yet.
In exchange for this tradeoff, what you get in Noms is:
- No need to ever accumulate a history of changes in order to know the current state. Each snapshot directly and efficiently represents the current state. It looks like in RON you can transmit a state, but it's not clear to me when nodes would choose to do that vs a patch.
- Local storage properties comparable to a traditional database. Store large amounts of data with efficient queries, sorted scans, mutations, etc.
I'm not sure how this all compares to RON, I'm looking forward to digging into it.
It is like the Lamport event model married to a LSMT database. And their children are CRDTs.
> That makes sense to me except for calculating diffs efficiently. Are you saying that can be done in time proportional to the _diff_? (as opposed to the number of changes?)
Slightly different abstractions here, but yes, in most cases a "diff" is a tail of an op log. On request, it gets compacted and sent. I guess, in noms you leverage immutable data structures for that.
Thanks, now I understand your trade-offs better.
What happens in the case where the op log represents changes that went back and forth a bunch between two states. Does that ever get compacted down locally?
The difference here is that Noms will never have to read or compact those intermediate states. If a character gets inserted and deleted a bunch of times, and then that set of changes get synced with a peer, the source peer diffs the beginning and end state and ignores the intermediate ones. The local work done will just be proportional to the actual change between the two states (in this case finding a diff of a single char). This work is always proportional to the change in the two states, not the amount of ops that transpired between them.
> I guess, in noms you leverage immutable data structures for that.
That, and content-addressing.