Thank you! To your point about industrial use, yes – this is interesting. For example, Actyx makes a software system for coordination within a factory floor, and Ditto performs sync between devices of cabin crew on an aircraft. These are nice examples of industrial local-first systems.
I agree that for most engineers, getting into formal proof is intimidatingly hard and perhaps not a good use of time, because lighter-weight specification languages and model-checking can help develop a good understanding of a system's behaviour with a much lower time investment.
But the intended audience for this post was not engineers working on production systems; it was intended for researchers studying distributed algorithms (e.g. consensus algorithms), and researchers who already know proof assistants and want to know how to use their skills to verify distributed algorithms. Note that this post appeared on a blog called "Machine Logic", not "Practical Distributed Systems Engineering"!
I have personally got a lot of value out of formal verification because the distributed algorithms I work on (CRDTs) are sometimes so subtle that without a proof of correctness, I simply don't believe that they are correct. I have several times developed algorithms that I believed to be correct, only later to find out that I was wrong. Here's a case study of one such example: https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-969.pdf
For people like myself, who are designing such algorithms, I believe that formal proof is worth the effort. But most people are not in this situation.
Hi, one of the authors here. You're right that in order to ensure that you won't receive timestamps lower than some threshold, you need to know all the nodes in the system, and you need to hear from all of them (if even just one node is unreachable, that will be enough to hold up the process). That's quite a big assumption to make, but unfortunately there doesn't really seem to be a good way around it. Lots of CRDT algorithms have this problem of requiring causal stability for garbage collection.
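To make the causal-stability condition concrete, here is a minimal sketch (with hypothetical names, not code from the paper): each node tracks the highest timestamp it has heard from every other node, and anything at or below the minimum of those values is stable. It assumes, as discussed above, that the full set of nodes is known in advance.

```python
class StabilityTracker:
    """Hypothetical sketch of a causal-stability threshold."""

    def __init__(self, node_ids):
        # Full membership must be known up front -- the big assumption
        # discussed above. A node we have never heard from is None.
        self.latest = {n: None for n in node_ids}

    def record(self, node_id, timestamp):
        # Remember the highest timestamp seen from each node.
        cur = self.latest[node_id]
        if cur is None or timestamp > cur:
            self.latest[node_id] = timestamp

    def stable_threshold(self):
        # If even one node has never been heard from, nothing is stable
        # and no metadata can be garbage-collected.
        if any(t is None for t in self.latest.values()):
            return None
        return min(self.latest.values())

tracker = StabilityTracker(["a", "b", "c"])
tracker.record("a", 5)
tracker.record("b", 3)
print(tracker.stable_threshold())  # None: "c" not heard from yet
tracker.record("c", 7)
print(tracker.stable_threshold())  # 3
```

Note how a single unreachable node (`"c"` before its `record` call) holds the threshold at `None` indefinitely, which is exactly the problem described above.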
Using a consensus protocol is possible, and would have the advantage that it only requires communication with a quorum (typically a majority of nodes), rather than all nodes. However, it has the downside that now a node cannot generate timestamps independently any more — generating a timestamp would then also require a round trip to a quorum, making the algorithm a lot more expensive.
Martin here. I cited your paper in the related work section. It's a good start, but it does not cover everything that's required to achieve BFT — in particular, the issue that an operation may be valid in some contexts and invalid in others, and all correct nodes need to agree on whether an operation is valid or not.
If you have more details on the caveats you have discovered, it would be great if you could write them up so that others can learn from your experience!
I have now had time to read more closely. Thank you for citing us!
In our paper and original approach/algo (iirc), CRDT updates are applied from the bottom of the DAG to the heads (oldest to newest). This is also what you propose, although we don't discuss how this helps wrt BFT. I think it is a very good point to highlight that hash-graph-based CRDTs (or Merkle-CRDTs, as we call them) can provide this property when processed "in order".
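The bottom-to-heads order can be sketched as buffering each update until everything it references by hash has been applied (a hypothetical illustration, not the paper's actual code; here each update names the hashes of the earlier updates it builds on):

```python
def apply_in_order(updates, apply_fn):
    """Apply a hash-DAG of updates bottom-to-heads.

    updates: dict mapping hash -> (prerequisite_hashes, payload).
    An update is applied only once all the earlier updates it
    references have been applied; otherwise it is buffered.
    """
    applied = set()
    pending = dict(updates)
    progress = True
    while pending and progress:
        progress = False
        for h, (prereqs, payload) in list(pending.items()):
            if all(p in applied for p in prereqs):
                # Validity can be checked here, with the update's whole
                # causal history already applied.
                apply_fn(h, payload)
                applied.add(h)
                del pending[h]
                progress = True
    # pending is non-empty if some referenced updates never arrived.
    return applied, pending

# Usage: one update references a block that never arrives.
log = []
dag = {
    "h1": ([], "a"),
    "h2": (["h1"], "b"),
    "h4": (["h3"], "d"),  # "h3" is missing
}
applied, stuck = apply_in_order(dag, lambda h, p: log.append(p))
print(log)    # ['a', 'b']
print(stuck)  # {'h4': (['h3'], 'd')} stays buffered
```

The buffered `stuck` entries are exactly the memory cost that the next comment identifies as the practical caveat of strict in-order processing.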
The caveat is that, in practice, we found this quite impractical: when receiving an update with unknown descendants (hash-links to blocks we do not yet have), you have to choose whether to hold on to it and apply it later once the missing sub-DAG is received (which may never happen), or to drop it and rely on re-transmission at a later point. The first opens the door to abuse by a bad actor, because you spend memory holding on to un-applied updates. The latter causes additional work, as updates can pile up behind a missing one, incurring re-transmission costs for the missing element and all the later ones.
This also means that a peer's pubsub system should probably decide whether to re-broadcast an update based on whether the CRDT operation it contains is valid or not, which can have bad side-effects: a peer that misses an update due to a network issue may block the broadcasting of any later updates, even correct ones, thus worsening delivery for everyone and forcing the network to issue more re-transmissions.
And as a result, a peer should also decide whether to issue/broadcast updates based on whether the previous update has at least been received by a non-Byzantine replica, as otherwise updates built on top of it will not be accepted by the network until re-transmission. That makes another potential bottleneck.
In our implementation, we found it more practical to process updates immediately as they are received, and then process their descendants until we reach the "known" part of the DAG. This means every update can be broadcast, gossiped around, and potentially applied without any waiting for parents, and the occasional loss of an update does not block the processing of new updates before re-transmission. If an update has descendants that do not exist, it can simply be considered a DAG with a different "root", which does not have many consequences in terms of convergence. Note that here we are talking about DAGs with a depth of 100k+ items, where sometimes there are 200 heads and processing a single update may take a few seconds. We need to avoid blocking a replica as much as possible and get as much work done as soon as possible.
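For an add-only set keyed by content hash, the order-independence that makes this immediate processing safe can be sketched as follows (a hypothetical illustration, not the actual implementation):

```python
import hashlib

class AddOnlySet:
    """Hypothetical sketch: a hash-keyed add-only set converges no
    matter the order in which updates arrive, so a replica can apply
    each update immediately instead of waiting for missing parents."""

    def __init__(self):
        self.elements = {}

    def apply(self, payload: bytes):
        # The element's ID is its own hash, so re-deliveries are
        # idempotent and IDs cannot collide across replicas.
        h = hashlib.sha256(payload).hexdigest()
        self.elements[h] = payload

# Two replicas receive the same updates in different orders,
# with one duplicate delivery thrown in.
a, b = AddOnlySet(), AddOnlySet()
updates = [b"x", b"y", b"z"]
for u in updates:            # replica a: in order
    a.apply(u)
for u in reversed(updates):  # replica b: reverse order
    b.apply(u)
b.apply(b"x")                # re-delivery is a no-op
assert a.elements == b.elements  # converged despite different orders
```

Because applying an element never depends on any other element having been applied first, out-of-order and duplicate delivery are harmless, which is why convergence survives even under Byzantine behaviour, as noted below.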
I think some CRDTs can get away with this (in our case, add-only sets, using the hashes as IDs) and be Byzantine-fault resistant (things would converge). In the paper you mention some examples where CRDT rules can be abused more easily, so I'm guessing it is more difficult for other CRDT types. Ensuring that IDs are unique is one of the main advantages of using hash trees.
In general, I think an attacker can usually find ways to mess with a non-permissioned CRDT system without breaking convergence (e.g. by submitting many valid updates). Your approach, however, makes a very good point wrt misbehaving nodes which are not necessarily malicious.
Hello, Martin here :) I use the Paper app (https://wetransfer.com/paper) on an iPad, then transfer the images to a computer, do a little post-processing in Gimp and ImageMagick, and then drop the images into PowerPoint. You might be able to present directly from the Paper app, but it's more designed for drawing than for presenting.
Interesting idea, but the devil is in the details. In particular, two concurrent moves of partially overlapping ranges of characters are a tricky case to handle, and it is not obvious to me how your scheme would deal with this.
Another fun case is two concurrent range moves in which the destination of the first move falls within the second move's source range, and the destination of the second move falls within the first move's source range. How do you handle this?
I expect that any algorithm solving this problem will need a formal proof of correctness, because it's very easy to miss edge cases when using informal reasoning.
Just for the record, I was already aware of GUN previously, and think it is a good project to include in the list. We had just forgotten about it when putting together this list of links. I guess I don't check HN all that often. ;)
That makes sense, thanks for adding that explanation. Just wanted to refute the statement that Mark didn't prompt you to add it, as we seem to be a bunch of HNers that are getting tired of the spam.