There may be systems where you can reliably detect if a client has a lock at all...

jamesblonde · on Feb 10, 2016

In this case, we transformed the problem from detecting whether the client holds the lock into detecting whether the client is dead, for which there are many more protocols (with different levels of accuracy). The point stands: auto-releasing distributed locks is not the only possible solution.
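A minimal sketch of the idea in Python: a lock service whose locks have no expiry timer and are revoked only when the holder releases them or a failure detector suspects the holder has crashed. The class and method names and the injected `is_suspected` callback are hypothetical, not the poster's system. Because any practical detector can be wrong, each grant also bumps a fencing token, so a downstream resource can reject writes that carry an older token.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class LockState:
    holder: Optional[str] = None
    token: int = 0  # fencing token, bumped every time the lock is granted anew


class FailureDetectorLockService:
    """Hypothetical lock service: locks never expire on a timer."""

    def __init__(self, is_suspected: Callable[[str], bool]) -> None:
        self._is_suspected = is_suspected  # injected failure detector
        self._locks: Dict[str, LockState] = {}

    def acquire(self, name: str, client: str) -> Optional[int]:
        """Return a fencing token if `client` now holds `name`, else None."""
        state = self._locks.setdefault(name, LockState())
        if state.holder == client:
            return state.token  # re-entrant: client already holds the lock
        if state.holder is not None and not self._is_suspected(state.holder):
            return None  # holder believed alive: the lock is NOT auto-released
        # Lock is free, or its holder is suspected dead: grant it to `client`.
        state.holder = client
        state.token += 1
        return state.token

    def release(self, name: str, client: str) -> None:
        state = self._locks.get(name)
        if state is not None and state.holder == client:
            state.holder = None


# Usage sketch: B cannot take the lock until the detector suspects A.
suspected = set()
svc = FailureDetectorLockService(lambda c: c in suspected)
token_a = svc.acquire("db", "A")   # A is granted token 1
blocked = svc.acquire("db", "B")   # None: A is not suspected
suspected.add("A")                 # failure detector now suspects A
token_b = svc.acquire("db", "B")   # B is granted token 2
```

If `A` was only falsely suspected and later writes with token 1, a storage layer that remembers the highest token it has seen (2) can reject that stale write.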

antirez · on Feb 10, 2016

It is impossible to write a reliable distributed failure detector in this use case, in a practical system model.

jamesblonde · on Feb 10, 2016

You mean that in the asynchronous system model, in the presence of failures, it is impossible to write a reliable failure detector: http://the-paper-trail.org/blog/a-brief-tour-of-flp-impossib... However, you can write a failure detector for a practical system with the help of Omega (Ω), the weakest failure detector for solving consensus: http://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/p685-c...
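For illustration, one textbook way to obtain Ω, assuming each process already has an eventually perfect failure detector (◇P) underneath: every process trusts the smallest-id process it does not currently suspect. Once the underlying detectors stop making mistakes, all correct processes compute the same correct leader. This is a sketch, not the construction from the linked paper; `omega_leader` and `is_suspected` are hypothetical names.

```python
from typing import Callable, Iterable, Optional


def omega_leader(processes: Iterable[int],
                 is_suspected: Callable[[int], bool]) -> Optional[int]:
    """Return the smallest process id this process does not currently suspect.

    Assumes `is_suspected` is backed by an eventually perfect detector: it
    eventually suspects every crashed process and no correct one. From then
    on, every correct process returns the same correct leader (the Omega
    property). Before that point, different processes may disagree.
    """
    trusted = [p for p in processes if not is_suspected(p)]
    return min(trusted) if trusted else None
```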

In practice, failure detectors are implemented with timeouts. What I am arguing is that you can build systems (such as ours) that adapt their failure-detection timeouts to the current level of network congestion, load, and so on. Timeout changes are averaged over a period of time, so that the timeout stays high enough not to produce false positives.

You could apply the same approach at the level of individual messages, but doing it for every possible message may be prohibitively expensive; with only a small number of messages, it may work. The problem with messages, compared to failure detection, is this: how do you find out that you have had a false positive? For failure detection it's easy: the timeout expired, but I'm still alive and my heartbeat arrives late, so we increase the timeout for that node. Logical clocks such as vector clocks or fencing tokens make it easier to reliably detect messages that arrive very late, that is, to identify false positives.
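A minimal sketch of a heartbeat detector with per-node adaptive timeouts, along the lines described above. The specifics are assumptions, not the poster's implementation: the timeout is `safety_factor` times the mean of the last `window` heartbeat inter-arrival times (never below `min_timeout`), and each detected false positive (a heartbeat from a node we had suspected) adds `backoff` seconds to that node's margin. Timestamps are passed in explicitly so the behavior is deterministic.

```python
from collections import deque
from typing import Deque, Dict, Set


class AdaptiveTimeoutDetector:
    """Hypothetical heartbeat failure detector with adaptive per-node timeouts."""

    def __init__(self, window: int = 10, safety_factor: float = 2.0,
                 min_timeout: float = 1.0, backoff: float = 1.0) -> None:
        self.window = window                # inter-arrival samples kept per node
        self.safety_factor = safety_factor  # multiplier on the mean interval
        self.min_timeout = min_timeout      # lower bound on the base timeout
        self.backoff = backoff              # extra seconds per false positive
        self._last: Dict[str, float] = {}
        self._intervals: Dict[str, Deque[float]] = {}
        self._margin: Dict[str, float] = {}
        self._suspected: Set[str] = set()
        self.false_positives = 0

    def timeout(self, node: str) -> float:
        """Current timeout: averaged base plus any false-positive margin."""
        base = self.min_timeout
        intervals = self._intervals.get(node)
        if intervals:
            mean = sum(intervals) / len(intervals)
            base = max(base, self.safety_factor * mean)
        return base + self._margin.get(node, 0.0)

    def heartbeat(self, node: str, now: float) -> None:
        if node in self._suspected:
            # We declared the node dead, but it is alive: a false positive.
            self._suspected.discard(node)
            self.false_positives += 1
            self._margin[node] = self._margin.get(node, 0.0) + self.backoff
        last = self._last.get(node)
        if last is not None:
            samples = self._intervals.setdefault(node, deque(maxlen=self.window))
            samples.append(now - last)
        self._last[node] = now

    def check(self, node: str, now: float) -> bool:
        """Return True (and remember it) if `node` is suspected at time `now`."""
        last = self._last.get(node)
        if last is not None and now - last > self.timeout(node):
            self._suspected.add(node)
        return node in self._suspected


det = AdaptiveTimeoutDetector(window=4)
for t in (0.0, 1.0, 2.0, 3.0):
    det.heartbeat("n1", t)        # steady 1 s heartbeats: timeout is 2.0 s
early = det.check("n1", 4.5)      # False: 1.5 s since the last heartbeat
late = det.check("n1", 5.5)       # True: 2.5 s exceeds 2.0 s, n1 suspected
det.heartbeat("n1", 6.0)          # late heartbeat: a false positive
```

After the late heartbeat, the 3 s gap also enters the moving average, so the timeout for `n1` grows both through the average and through the margin; a real system would likely tune or decay these values.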