
There may be systems where you can reliably detect whether a client holds a lock at all, and whether it is still alive or is failing. In most systems this is not possible, so your locks need an auto-release feature. Distributed locks with auto release are a necessary evil: you avoid them at all costs if you can, but there are problems where they are needed.
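To make "auto release" concrete, here is a minimal single-process sketch, assuming a lease with a fixed time-to-live. The `LeaseLock` class, its `ttl` parameter and the injectable `clock` are hypothetical names for illustration only. A real distributed lock service also has to deal with networks and clock drift, which this sketch ignores.

```python
import time
from typing import Callable, Optional


class LeaseLock:
    """Single-process stand-in for a lock that auto-releases after a TTL."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.holder: Optional[str] = None
        self.expires_at = 0.0

    def acquire(self, client_id: str) -> bool:
        now = self.clock()
        # An expired lease counts as released, even if the old holder never said so.
        if self.holder is None or now >= self.expires_at:
            self.holder = client_id
            self.expires_at = now + self.ttl
            return True
        return False

    def release(self, client_id: str) -> bool:
        # Only the current holder of an unexpired lease may release it.
        if self.holder == client_id and self.clock() < self.expires_at:
            self.holder = None
            return True
        return False
```

Note what the sketch cannot do: a client whose lease expired while it was paused has no way of knowing that from the lock alone, which is why the auto release is an evil at all.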



In this case, we transformed the problem from detecting whether the client has the lock into one of detecting whether the client is dead - for which there are many more protocols (with different levels of accuracy). The point stands that auto-releasing distributed locks are not the only possible solution.


It is impossible to write a reliable distributed failure detector in this use case, in a practical system model.


You mean in the asynchronous systems model, in the presence of failures, it is impossible to write a reliable failure detector: http://the-paper-trail.org/blog/a-brief-tour-of-flp-impossib... However, you can write a failure detector for a practical system with the help of omega (weakest failure detector): http://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/p685-c...

In practice, failure detectors mean timeouts. What I am arguing is that you can have systems that adapt their failure timeouts (such as ours) based on the current level of network congestion, load, or whatever. The timeout is set from an average taken over a period of time, to ensure it is high enough not to give false positives. You could take the same approach at the per-message level, but doing it for all possible messages may be prohibitive. If you have a small number of messages, it may work. The problem with messages, compared to failure detection, is the following: how do you figure out that you have had a false positive? For failure detection it's easy: the timeout expired, but I'm still alive and my heartbeat arrives late, so we increase the timeout for that node. Vector clocks (fencing tokens) make it easier to reliably find out if messages arrive very late (identify false positives).
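A rough sketch of the adaptive-timeout idea, not our actual implementation: the class name, the exponentially weighted average, and the `alpha` and `margin` parameters are all assumptions chosen for illustration. Each node's timeout follows a smoothed average of its heartbeat gaps, and a heartbeat from a node we had already suspected is treated as a false positive that pushes that node's timeout up.

```python
from typing import Dict, Set


class AdaptiveFailureDetector:
    """Heartbeat failure detector with a per-node, self-adjusting timeout."""

    def __init__(self, initial_timeout: float, alpha: float = 0.2, margin: float = 2.0):
        self.initial_timeout = initial_timeout
        self.alpha = alpha      # weight of the newest heartbeat gap in the average
        self.margin = margin    # timeout = margin * averaged gap
        self.last_seen: Dict[str, float] = {}
        self.avg_gap: Dict[str, float] = {}
        self.suspected: Set[str] = set()

    def timeout_for(self, node: str) -> float:
        if node in self.avg_gap:
            return self.margin * self.avg_gap[node]
        return self.initial_timeout

    def heartbeat(self, node: str, now: float) -> bool:
        """Record a heartbeat; return True if it exposes a false positive."""
        false_positive = node in self.suspected
        if node in self.last_seen:
            gap = now - self.last_seen[node]
            prev = self.avg_gap.get(node, gap)
            avg = (1 - self.alpha) * prev + self.alpha * gap
            if false_positive:
                # The node was alive after all: make sure a gap this long
                # would no longer trip the timeout.
                avg = max(avg, gap)
            self.avg_gap[node] = avg
        self.last_seen[node] = now
        self.suspected.discard(node)
        return false_positive

    def check(self, node: str, now: float) -> bool:
        """Return True if the node is currently suspected of having failed."""
        if node not in self.last_seen:
            return False
        if now - self.last_seen[node] > self.timeout_for(node):
            self.suspected.add(node)
            return True
        return False
```

The false positive is detectable here only because a late heartbeat eventually shows up. For ordinary messages there is no such built-in signal, which is the asymmetry described above.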



