
Is Redlock Safe? Reply to Redlock Analysis - antirez
http://antirez.com/news/101
======
carllerche
This response is incorrect at a very fundamental level.

First, Antirez claims that requiring the "materialization" store to be able to
tie break with a monotonically increasing token requires linearization? This
is completely false. The monotonically increasing token allows for eventual
consistency. That's the entire point of it. It's monotonically increasing.

Anyone who claims that once you have coordination (locking) in your system
you have already lost is completely ignoring the research coming out of BOOM
(Berkeley Orders of Magnitude). You can design your system to push
coordination out to the edge and away from your "choke points", and use these
monotonically increasing tokens to keep your bottlenecks coordination free.
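A minimal sketch of why a monotonic tie-break token permits eventual consistency: replicas can apply updates in any order and still converge, with no coordination at write time (names are illustrative, not from the BOOM work):

```python
# Sketch: replicas accept writes tagged with monotonically increasing
# tokens and converge without coordination. The token tie-break makes
# merging order-insensitive, so no replica ever waits for another.
# (All names here are illustrative.)

class Replica:
    def __init__(self):
        self.token = 0      # highest token seen so far
        self.value = None

    def apply(self, token, value):
        # Last-writer-wins by token: stale updates are simply ignored.
        if token > self.token:
            self.token, self.value = token, value

a, b = Replica(), Replica()
updates = [(1, "x"), (3, "z"), (2, "y")]

for t, v in updates:            # replica a sees updates in one order
    a.apply(t, v)
for t, v in reversed(updates):  # replica b sees them in another
    b.apply(t, v)

assert a.value == b.value == "z"   # both converge on the token-3 write
```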

Secondly, Antirez's argument that you can use a compare & swap in a
transactional storage layer is also wrong. This is not possible to write
safely.

I'm not even going to touch his argument that using system clocks in a
distributed locking algorithm is safe...

~~~
dvirsky
> I'm not even going to touch his argument that using system clocks in a
> distributed locking algorithm is safe...

Please do

~~~
Guvante
Clock drift is a real thing in distributed systems.

If I have a 5 second lock that expires at 12:01:00 and my clock is off by 1
second I could write at 12:01:01 potentially after someone else has a lock, or
worse, after someone else has written.

~~~
dvirsky
but the expiration time is relative to the redis server's time, and the time
measurement on the client side is done relative to the client's time. Other
clients don't care about your clock and you don't care about redis' clock
AFAIK. From TFM:

> The client computes how much time elapsed in order to acquire the lock, by
> subtracting from the current time the timestamp obtained in step 1. If and
> only if the client was able to acquire the lock in the majority of the
> instances (at least 3), and the total time elapsed to acquire the lock is
> less than lock validity time, the lock is considered to be acquired.
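The quoted validity check can be sketched as follows; the client compares only readings of its own clock, never Redis's (constants and names are illustrative):

```python
import time

# Sketch of the quoted validity check: the client measures how long
# acquisition took on its own clock only. Constants are illustrative.
LOCK_VALIDITY_MS = 10_000

def try_acquire(acquire_on_majority):
    start = time.monotonic()                 # step 1: note the time
    got_majority = acquire_on_majority()     # step 2: talk to the instances
    elapsed_ms = (time.monotonic() - start) * 1000
    # Step 4: the lock is held only if we won a majority fast enough;
    # what remains of the TTL is the time we may safely use the lock.
    remaining_ms = LOCK_VALIDITY_MS - elapsed_ms
    if got_majority and remaining_ms > 0:
        return remaining_ms
    return None

validity = try_acquire(lambda: True)
assert validity is not None and validity <= LOCK_VALIDITY_MS
```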

~~~
dsp1234
What is the effect on the client of a VM live migration "pause"?

or daylight savings change?

or ntp updates which change the time?

Since the system clock can change relative to itself at any time, what effect
does that have on the algorithm?

~~~
xorcist
Distributed time is tricky, but it depends on what the intended use is. Under
normal operation it will only move forward, for example.

> VM live migration "pause"?

System clock will take a larger step than usual. It won't go backwards.

> daylight savings change?

None. System clock is UTC for a reason.

> ntp updates which change the time?

NTP only drifts the clock (under normal operation).

~~~
dsp1234
Not that I think it affects your point, but just off the top of my head:

 _System clock will take a larger step than usual. It won't go backwards._

Assumes that checkpoint and restore to a previous time won't happen.

 _NTP only drifts the clock (under normal operation)._

Unless it's a system like this. Note that this was pulled at about 1pm EST.

    
    
      Waiting for clock tick...
      ...got clock tick
      Time read from Hardware Clock: 2016/02/09 22:36:50
      Hw clock time : 2016/02/09 22:36:50 = 1455057410 seconds since 1969
      Tue 09 Feb 2016 02:36:50 PM PST  -1.038948 seconds
    

_Under normal operation it will only move forward_

'Under normal operation' is not really a high bar for distributed systems.
After all, the network is reliable 'under normal operation' too.

~~~
Dylan16807
Use CLOCK_MONOTONIC. Or you could make your client abort when the clock goes
backwards.

If someone forks a VM from a running snapshot, you're screwed whether they
mess with the clock or not.
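In Python, for example, `time.monotonic()` (backed by `CLOCK_MONOTONIC` on Linux) gives elapsed-time measurements that wall-clock steps cannot corrupt:

```python
import time

# time.monotonic() (CLOCK_MONOTONIC on Linux) is unaffected by NTP
# steps, DST, or an operator setting the wall clock backwards, so it
# is the right tool for measuring "how long did acquisition take".
start = time.monotonic()
time.sleep(0.01)            # stand-in for the lock-acquisition round trips
elapsed = time.monotonic() - start

assert elapsed > 0          # can never be negative, even if the wall clock jumps
```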

------
tptacek
Whoah, whoah, whoah, no:

 _1\. Most of the times when you need a distributed lock system that can
guarantee mutual exclusivity, when this property is violated you already lost.
Distributed locks are very useful exactly when we have no other control in the
shared resource._

In context, Sanfilippo seems to be arguing that providing monotonic logical
timestamps or sequence numbers is futile, since a system that relies on locks
that break is going to fail one way or another.

That is probably true, but wildly misses the point: when a sound system that
relies on distributed locks fails, _it doesn't silently corrupt data_. But
that's exactly what happens in the "unfenced" lock-breaking scenario Kleppmann
describes.

There are worse things than crashing, is the point.

 _slightly later_

The argument about storage servers taking fencing tokens being "linearizable"
also seems flimsy: aren't they "linearizable" (in whatever sense Sanfilippo
means) in the case Kleppmann describes precisely _because the lock service
arranges for them to be_? That's functionality Kleppmann says is missing from Redlock.

~~~
antirez
Maybe you missed the point ;-) I said this and later showed how the Redlock
unique token per lock can be used to guarantee the same. So it's equivalent,
and yet a _very non-credible model_ in production. When you have a way to
avoid a mess when a race condition happens, it's better to avoid a distributed
lock at all, to start with.

~~~
tptacek
Can you further explain your "check" and "set" storage example? I feel like
you might be begging the question. "check" and "set" seem like they describe
_another lock_. How does that lock time out?

~~~
antirez
Whatever you can mount with a random token is always simpler than checking if
Token_ID_A < Token_ID_B, which needs a linearizable store. But basically this
is how it works:

You want to modify locked resource X. You set X.currlock = token. Then you
read, do whatever you want, and when you write, you "write-if-currlock ==
token". If another client did X.currlock = somethingelse, the transaction
fails.

Edit: note that I noticed some confusion about this. Any other client is free
to replace X.currlock even if set to a past value.
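A minimal sketch of the scheme described above, using a toy in-memory store standing in for the transactional storage layer (in production the conditional write would be the store's own transaction; all names here are illustrative):

```python
# Sketch of "write-if-currlock == token" against a toy transactional
# store. In a real deployment the conditional write would be the
# storage layer's own transaction; names are illustrative.
import uuid

class Resource:
    def __init__(self, data):
        self.currlock = None
        self.data = data

    def write_if_currlock(self, token, new_data):
        # The write commits only if no other client replaced the token
        # since we set it.
        if self.currlock == token:
            self.data = new_data
            return True
        return False

x = Resource(data=0)

token_a = uuid.uuid4().hex
x.currlock = token_a            # client A marks the resource

token_b = uuid.uuid4().hex
x.currlock = token_b            # client B is free to overwrite the mark

# A's deferred write now fails instead of silently clobbering B's work.
assert x.write_if_currlock(token_a, 42) is False
assert x.write_if_currlock(token_b, 42) is True
assert x.data == 42
```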

~~~
kapitalx
The problem with this solution is that if A writes Token_A to the resource,
and then A dies, the resource can never be written to again. How do you avoid
needing to time out the 'currlock'?

~~~
SomeCallMeTim
No, when you acquire a lock, you are allowed to write to currlock immediately.
It's AFTER you've done whatever processing you need to do that you do a
compare-and-set transactional write that verifies no one else has written to
currlock between your acquisition and the completion of your task.

------
eliwjones
A lot of the back-and-forth here reminds me of:

"Note that, with careful optimization, only 14 gajillion messages are
necessary. This is still too many messages; however, if the system sends fewer
than 14 gajillion messages, it will be vulnerable to accusations that it only
handles reasonable failure cases, and not the demented ones that previous
researchers spitefully introduced in earlier papers in a desperate attempt to
distinguish themselves from even more prior (yet similarly demented) work."

from James Mickens' "The Saddest Moment" \-
[https://www.usenix.org/system/files/login-
logout_1305_micken...](https://www.usenix.org/system/files/login-
logout_1305_mickens.pdf)

~~~
ixtli
This is such an elegant way to phrase my general sense of frustration I felt
reading certain exchanges here. Thanks =)

------
justinsb
As with all distributed systems algorithms, we should assume Redlock is unsafe
until proven otherwise. A proof looks like
[http://ramcloud.stanford.edu/raft.pdf](http://ramcloud.stanford.edu/raft.pdf)
(Raft), [http://research.microsoft.com/en-
us/um/people/lamport/pubs/p...](http://research.microsoft.com/en-
us/um/people/lamport/pubs/paxos-simple.pdf) (Paxos) or
[http://www.tcs.hut.fi/Studies/T-79.5001/reports/2012-deSouza...](http://www.tcs.hut.fi/Studies/T-79.5001/reports/2012-deSouzaMedeiros.pdf)
(Zookeeper).

Personally I'd prefer stronger proofs than even those, but I think those
papers get us to the point where implementation issues are as important as
theoretical issues.

In my personal opinion, Redlock is not at the point where we even need to
consider the implementation.

~~~
eternalban
A good starting point. But those proofs are notoriously based on fairy tales
such as reliable networking, Fail/Stop, and other white lies.

The fact that a protocol (e.g. Raft) has been proven correct sets the floor
but it is certainly not sufficient to assume correctness of an
_implementation_.

~~~
sseveran
It should always be a prerequisite to have a proof before one begins an
implementation. If one cannot construct such a proof then they have no
business trying to design new distributed algorithms. Fortunately there are a
few proven algorithms for replicated state machines available, as previously
noted.

Proofs will typically show the limitations of an algorithm; for instance, the
original Paxos paper notes that the contained algorithm does not provide
liveness.

A system based on an algorithm with a proof has some chance of being correct;
a system based on an algorithm with no proof may technically have a non-zero
probability of being correct, but that probability approaches zero.

Having worked on such systems, and having talked to people who have debugged
issues with Paxos implementations at extreme scale: while it is virtually
impossible to ensure that the system will always be able to make forward
progress, it is possible to ensure that the state of the state machine remains
correct.

When I look at an implementation of such an algorithm to evaluate it I pay
very careful attention to the test suites. The more bizarre the tests they
contain the more likely the implementation is to handle very weird network
edge cases.

------
jedberg
If you need a global lock in your distributed system, then you're already in
trouble. In _most_ cases, the best solution is to rewrite your software so
that you don't need the lock. I don't want to say all, because I'm sure there
are use cases where it would be impossible to do so. However, for my own
research, I'm having trouble finding examples where a global lock is
absolutely necessary. If anyone has any examples, please send them to me.

For example, a lot of people say, "updating a bank balance". But even that can
easily be eventually consistent. You send increments and decrements and settle
up at the end. Yes, your balance might go negative, but that's what ATM limits
are for -- the most you could go negative is that limit. Most other systems
can be written the same way. And for non-numeric values, there are always
vector clocks or timestamps or quorum to determine which update came next, or
you could send a client a list of all recent updates and let the client "do
the right thing".

~~~
eva1984
Is leader election somehow a usage of a global lock? I do agree it should be
avoided actively though.

~~~
jodah
It's related.

Leader election generally[1] requires consensus among distributed processes,
and global locks generally[1] require consensus as well. The benefit of this
is that both problems can be solved on top of a common consensus
implementation which is what [http://atomix.io](http://atomix.io) does.

1: I say _generally_ because you can do fancy things with fancy clocks to
avoid running operations through a quorum under certain circumstances, but
these carry caveats that preclude them from being reliable enough to use in
many use cases.

------
theptip
Another potential issue with antirez's argument:

""" If you read the Redlock specification, that I hadn't touched for months,
you can see the steps to acquire the lock are:

1\. Get the current time.

2\. … All the steps needed to acquire the lock …

3\. Get the current time, again.

4\. Check if we are already out of time, or if we acquired the lock fast
enough.

5\. Do some work with your lock.

...

The above also addresses “process pauses” concern number 3. Pauses during the
process of acquiring the lock don’t have effects on the algorithm's
correctness. """

What if the process GCs for 2 minutes between 4 and 5? Then the lock times
out, but the process still does its work with the lock, even though another
process may think it has the lock as well.

This is a specific example of the criticism that you cannot rely on clock
checks for correctness that has been made elsewhere.
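The objection can be made concrete: even a validity re-check immediately before the write leaves a window for a pause (simulated here with a sleep; the TTL is illustrative):

```python
import time

# Concretizing the objection: a pause between the final validity check
# and the write defeats the check. The pause is simulated with sleep();
# the TTL is illustrative.
LOCK_TTL = 0.05  # seconds

deadline = time.monotonic() + LOCK_TTL

check_passed = time.monotonic() < deadline        # step 4: "still in time"
time.sleep(LOCK_TTL * 2)                          # GC / VM pause lands here
write_after_expiry = time.monotonic() > deadline  # the write goes out anyway

assert check_passed and write_after_expiry
```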

~~~
antirez
This is clearly addressed in the blog post. A GC pause after step 3 is
equivalent to any other pause after the lock is acquired. For example, it's no
different than a lock server replying OK to a lock request, but the client
only reading the reply after two minutes.

~~~
cyphar
Yes, but you can't predict how long such a pause will take, so you can't be
sure you still hold the lock you acquired while operating on a shared
resource.

------
bsaul
I recently had to help a developer who was stuck trying to have mongodb lock
a resource and started using redis+redshift to prevent multiple webservers
from updating a document in parallel (mongodb's basic locking features weren't
enough in his case).

As an RDBMS aficionado, that whole stack made me puke, BUT, to all the really
smart people here with a PhD in distributed systems, please: try to help
antirez build something that works. Many developers are trying to reach the
shore of consistent systems after having been lost in the ocean of "NoSQL is
great for everything".

Note : and i'm not saying redis is to blame here. It's obviously a great
software. It's mostly the users that are to blame in this case.

~~~
tlrobinson
Redlock, not Redshift, I assume?

~~~
bsaul
Yeap, typo

------
jamesblonde
I'll take issue with one point here: Antirez's claim that clients _have_ to
time out on locks, otherwise the resource will be deadlocked, is not correct.
We implemented an alternative model based on distributed shared memory.
Clients take a lock on the root of a subtree (of potentially millions of
inodes in our version of HDFS), persist the lock to shared memory, and look
for existing locks on the subtree in shared memory. If the client dies while
processing the subtree, its lock is still persisted in shared memory along
with the clientID. If another client tries to take the lock and notices that
the client holding the lock has died, it takes the lock. We used heartbeats to
shared memory to implement the failure detector for clients (based again on
shared memory
[http://www.jimdowling.info/sites/default/files/leader_electi...](http://www.jimdowling.info/sites/default/files/leader_election_using_newsql_db.pdf)
). Importantly, timeouts for heartbeats expand when the system is congested.
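A rough sketch of that model, with the failure detector stubbed out (names and structure are mine, not from the paper):

```python
# Sketch of the model described above: locks persisted with the
# holder's id, plus a failure detector; a new client may take over a
# lock whose holder is known to be dead. Names are illustrative and
# the failure detector (heartbeat-based in the real system) is stubbed.

class SubtreeLocks:
    def __init__(self, is_alive):
        self.locks = {}          # subtree root -> client id, persisted in shared memory
        self.is_alive = is_alive # failure detector callback

    def try_lock(self, root, client_id):
        holder = self.locks.get(root)
        if holder is None or not self.is_alive(holder):
            self.locks[root] = client_id   # free, or holder died: take it
            return True
        return holder == client_id         # re-entrant for the holder

dead = set()
locks = SubtreeLocks(is_alive=lambda c: c not in dead)

assert locks.try_lock("/user/a", "client-1")
assert not locks.try_lock("/user/a", "client-2")  # holder still alive
dead.add("client-1")                              # heartbeats stop
assert locks.try_lock("/user/a", "client-2")      # takeover, no TTL needed
```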

~~~
antirez
There may be systems where you can reliably detect if a client has a lock at
all, and if it's still alive or is failing. In most systems this is not
possible, so your locks need to have an auto-release feature. Distributed
locks with auto release are a necessary evil: you avoid using them at all
costs if you can, but there are problems where they are needed.

~~~
jamesblonde
In this case, we transformed the problem from detecting if the client has the
lock or not, to one of detecting if the client is dead or not - for which
there are many more protocols (with different levels of accuracy). The point
stands that auto-releasing distributed locks is not the only possible
solution.

~~~
antirez
It is impossible to write a reliable distributed failure detector in this use
case, in a practical system model.

~~~
jamesblonde
You mean in the asynchronous systems model, in the presence of failures, it is
impossible to write a reliable failure detector: [http://the-paper-
trail.org/blog/a-brief-tour-of-flp-impossib...](http://the-paper-
trail.org/blog/a-brief-tour-of-flp-impossibility/) However, you can write a
failure detector for a practical system with the help of omega (weakest
failure detector):
[http://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/p685-c...](http://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/p685-chandra.pdf)

In practice, failure detectors mean timeouts. What I am arguing is that you
can have systems that adapt their timeouts for failure (such as our one),
based on the current level of network congestion or load or whatever. The
setting/changing of the timeout is averaged over a period of time, to ensure
it is high enough to not give false positives. You could do the same approach
at the per message level, but for all possible messages, it may be
prohibitive. If you have a small number of messages, it may work. The problem
with messages, compared to failure detection, however, is the following: how
do you figure out you have had a false positive? For failure detection it's
easy: the timeout expired, but I'm still alive and my heartbeat arrives late,
so we increase the timeout for that node. Vector clocks (fencing tokens)
make it easier to reliably find out if messages arrive very late (identify
false positives).
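A minimal sketch of such an adaptive timeout: a heartbeat that arrives late (a false positive) raises the suspicion threshold (coefficients and names are illustrative):

```python
# Sketch of an adaptive failure detector as described above: when a
# node we suspected turns out to be merely slow, the timeout is raised
# so congestion stops producing false positives. The 1.5x headroom
# factor is illustrative.

class HeartbeatDetector:
    def __init__(self, timeout=1.0):
        self.timeout = timeout   # seconds of silence before suspecting

    def observe(self, interval):
        if interval > self.timeout:
            # False positive: the node was alive, just late.
            self.timeout = 1.5 * interval

    def suspects(self, silence):
        return silence > self.timeout

d = HeartbeatDetector(timeout=1.0)
assert d.suspects(1.5)       # would have declared the node dead...
d.observe(1.5)               # ...but its heartbeat arrived, late
assert not d.suspects(1.5)   # timeout adapted; congestion tolerated
```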

------
andy_ppp
I feel a bit like how a juror must feel when two expert witnesses are making
opposing arguments...

~~~
davidw
No kidding. Unfortunately, it seems people sometimes get kind of snarky with
these discussions, which I don't like. It starts to remind me of the Comic
Book Guy.

I would encourage everyone to be civil and patient, and if you have the time,
expand on what you're writing so that those of us who are not experts can
learn something.

------
alblue
Short answer: no.

Longer answer: provided that things never go wrong, it works. But the
assumption that things never go wrong is false. In particular, the whole basis
of the original point is that there is no true universal time, and this post
(essentially) says "provided that there is universal time, then I can't see
what you are complaining about".

Swing and a miss.

------
pinkunicorn
I'm actually exploring the usage of Redlock in one of my projects, and the
first question I have is: how can you count on a program to finish in a given
time? That feels similar to the halting problem. The only option I have is to
use an external program to kill the process that's using the lock after the
lock's expiration period.

~~~
antirez
Note: you have the same issue with all the other dist locks with auto-release.
If you have a mission critical use case you could:

1\. Try real time kernel extensions.

2\. Try a time bomb or watchdog process as you suggested.

3\. Put monitoring systems in place to avoid load going over a certain level.

4\. Tune the Linux scheduler params.

5\. Use TTLs that are very large compared to the work you need to do while
holding the lock.
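Option 2 can be sketched as an in-process time bomb using POSIX `SIGALRM` (a real watchdog would be a separate process, since the worker itself may be paused wholesale; the TTL here is illustrative):

```python
import signal
import time

# Sketch of option 2: an in-process "time bomb" that aborts the work
# when the lock's validity time runs out. POSIX-only (SIGALRM); a real
# watchdog would be a separate process, since this whole process can
# itself be paused. The TTL is illustrative.
VALIDITY = 1  # seconds

class LockExpired(Exception):
    pass

def on_alarm(signum, frame):
    raise LockExpired

signal.signal(signal.SIGALRM, on_alarm)
signal.alarm(VALIDITY)          # fire if the work outlives the lock
try:
    time.sleep(0.1)             # the actual work, well within the TTL
    signal.alarm(0)             # disarm: we finished in time
    finished = True
except LockExpired:
    finished = False

assert finished
```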

------
codys
I'm not a distributed systems expert, but I don't see any attempt at
responding to:

> 1\. Distributed locks with an auto-release feature (the mutually exclusive
> lock property is only valid for a fixed amount of time after the lock is
> acquired) require a way to avoid issues when clients use a lock after the
> expire time, violating the mutual exclusion while accessing a shared
> resource. Martin says that Redlock does not have such a mechanism.

The section "Distributed locks, auto release, and tokens" talks about why
Martin's solution isn't right (I'm not well-read enough to tell if that
argument is correct), but never actually argues that Redlock has a mechanism
to avoid violation of mutual exclusion due to timeouts.

Perhaps I'm missing some background here: does such a mechanism exist?

~~~
squeaky-clean
(tl;dr, I don't think he does)

I'm also not a distributed systems expert, but this is how I interpreted the
arguments. Antirez is responding to the scenario where a client acquires a
lock, hangs, a new client acquires a lock, and the old client comes back and
tries to write something based off of a now-stale lock. As demonstrated in
this image [0]

So the random token ensures that even though many clients may "think" they
have a lock, at most a single client can ever actually have one. The write
in the diagram which corrupts the database shouldn't happen, because you
need to check that your lock token matches the one the service holds. [1]
(excuse the terrible MSPaint modifications).

Which covers the common cases of lock timeout. But the clients aren't the only
part of the system that can violate the mutex. Martin seems to have already
thought of this rebuttal:

    
    
        You cannot fix this problem by inserting a check on the lock expiry
        just before writing back to storage. Remember that GC can pause a
        running thread at any point, including the point that is maximally
        inconvenient for you (between the last check and the write operation).
        [ ... ]
        If you still don’t believe me about process pauses, then consider
        instead that the file-writing request may get delayed in the network
        before reaching the storage service. Packet networks such as Ethernet
        and IP may delay packets arbitrarily
    

And I don't believe Antirez answers this (or I'm not understanding when he
does :P ). Even if a client makes a write request at a perfectly valid point
in time, when it owns the lock, how do I know that by the time the request
reaches storage the lock is still valid? The difference boils down to
"Redlock guarantees requests are sent during a valid lock", not "Redlock
guarantees requests transact during a valid lock".

Martin's fix [2] is similar to the token method suggested, except: A) the
write-if-token-matches logic is handled directly by the storage layer. B) It
uses an auto-incrementing token.

He spends a few paragraphs making the point that redlock cannot generate good
fencing tokens, because it has no good auto-increment consensus. But I don't
see how a random token is any worse than an incrementing one if you're just
going to straight up reject requests where the token doesn't match. Antirez
addresses this, without addressing A.
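One concrete difference: an incrementing token lets the storage layer order writes it has never seen before, so a delayed write from an older lock generation is rejected even after the newer holder has finished. A sketch (names are mine):

```python
# Making the ordering check concrete. With monotonic tokens the store
# only remembers the highest token it has seen; any delayed write from
# an older lock generation is rejected, even if it arrives after the
# newer holder has come and gone. A random token can only be tested for
# equality against "the current" lock, so the store cannot order two
# writes it has never seen before. Names are illustrative.

class FencedStore:
    def __init__(self):
        self.max_token = 0
        self.data = None

    def write(self, token, value):
        if token <= self.max_token:
            return False            # stale lock generation: reject
        self.max_token = token
        self.data = value
        return True

s = FencedStore()
assert s.write(34, "B's update")     # token-34 holder writes first
assert not s.write(33, "A's update") # token-33 straggler arrives late: rejected
assert s.data == "B's update"
```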

I would also really like someone to correct me if I'm missing something. The
Redlock implementations are very bare-bones when it comes to tests or examples
of how to actually use them. The Ruby library has a more "advanced" example
[3], which says this at the point where it tries writing to the file system:

    
    
                    # Note: we assume we can do it in 500 milliseconds. If this
                    # assumption is not correct, the program output will not be
                    # correct.
    

Which doesn't give me good faith in this...

[0] [http://martin.kleppmann.com/2016/02/unsafe-
lock.png](http://martin.kleppmann.com/2016/02/unsafe-lock.png)

[1] [http://i.imgur.com/8rezAaX.png](http://i.imgur.com/8rezAaX.png)

[2] [http://martin.kleppmann.com/2016/02/fencing-
tokens.png](http://martin.kleppmann.com/2016/02/fencing-tokens.png)

[3] [https://github.com/antirez/redlock-
rb/blob/master/example2.r...](https://github.com/antirez/redlock-
rb/blob/master/example2.rb)

~~~
zzzcpan
> But I don't see how a random token is any worse than an incrementing one if
> you're just going to straight up reject requests where the token doesn't
> match.

No, random tokens are not worse per se; they just have no place in the
algorithm Martin described. It's a completely different system. In that
system, incremental tokens presume a mechanism that gives you some order of
events, and one can rely on that order to decide which requests, with which
tokens, to reject.

------
fweespeech
> All the time Martin says that “the system clock jumps” I assume that we
> covered this by not poking with the system time in a way that is a problem
> for the algorithm, or for the sake of simplicity by using the monotonic time
> API. So:

> About claim 1: This is not an issue, we assumed that we can count time
> approximately at the same speed, unless there is any actual argument against
> it.

[https://aphyr.com/posts/299-the-trouble-with-
timestamps](https://aphyr.com/posts/299-the-trouble-with-timestamps)

> Timestamps, as implemented in Riak, Cassandra, et al, are fundamentally
> unsafe ordering constructs. In order to guarantee consistency you, the user,
> must ensure locally monotonic and, to some extent, globally monotonic
> clocks. This is a hard problem, and NTP does not solve it for you. When wall
> clocks are not properly coupled to the operations in the system, causal
> constraints can be violated. To ensure safety properties hold all the time,
> rather than probabilistically, you need logical clocks.

> A somewhat less safe but reasonable option is to use NTP rigorously on your
> machines, and sync it to TAI or GPS instead of POSIX time or UTC. Make sure
> you measure your clock skew: everyone I know who says they run NTP has
> discovered, at one point or another, that a node was way out of sync. If you
> want rough correspondence to POSIX time, you can still ensure monotonicity
> by running your own NTP pool and slurring leap seconds over longer time
> frames.

Clock skew is a very real problem and virtually impossible to avoid 100% of
the time, particularly when you have things like leap seconds, where the
number of seconds in a day changes.

[https://en.wikipedia.org/wiki/Leap_second](https://en.wikipedia.org/wiki/Leap_second)

 _System clocks are fundamentally unsafe if you just use NTP._ That isn't
to say you can never do it. In some use cases it is "safe enough" [e.g.
analytics, metrics], where the occasional fuck up isn't substantial, but
Antirez isn't making that argument with Redlock as far as I can tell.

> “Okay, so maybe you think that a clock jump is unrealistic, because you’re
> very confident in having correctly configured NTP to only ever slew the
> clock.” (Yep we agree here ;-) he continues and says…)

That still isn't reliably safe because there is no guarantee all nodes will
perform the slew identically.

------
SEJeff
This entire debate reminds me quite strongly of Peter Bailis's talk at
MesosCon last year on designing distributed systems to avoid coordination
where possible. It is a really excellent talk for anyone into this sort of
thing:

[https://www.youtube.com/watch?v=EYJnWttrC9k](https://www.youtube.com/watch?v=EYJnWttrC9k)

[https://mesoscon2015.sched.org/event/3Aot/keynote-silence-
is...](https://mesoscon2015.sched.org/event/3Aot/keynote-silence-is-golden-
coordination-avoiding-systems-design-peter-bailis)

------
jodah
The article calls on someone(?) to Jepsen test Redlock, which is a good idea,
but I think that the author(s) of such systems should release their own Jepsen
test suite along with the software they're purporting to be safe (as others
have done [1][2][3]). I would suspect that operations on Redlock, when
subjected to nemeses that partition nodes, kill processes, and skew clocks,
would not be found linearizable.

In general, I don't understand why one would build a system that attempts to
approximate consensus without just using one of the proven consensus
algorithms. Redlock is not the only one here, there are other systems that do
this as well.

[1] [https://github.com/atomix/atomix-
jepsen](https://github.com/atomix/atomix-jepsen)

[2] [https://www.datastax.com/dev/blog/testing-apache-
cassandra-w...](https://www.datastax.com/dev/blog/testing-apache-cassandra-
with-jepsen)

[3] [https://foundationdb.com/blog/call-me-maybe-foundationdb-
vs-...](https://foundationdb.com/blog/call-me-maybe-foundationdb-vs-jepsen)

~~~
takeda
The problem with Jepsen is that it can prove that systems are unsafe, but it
can't be used to prove that systems are safe.

For example Consul did their own Jepsen testing[1] in which they passed them,
but Kyle pointed out [2] that they passed them only because they changed
timeouts from 1 s to 300 ms effectively making the race condition window
smaller.

[1]
[https://www.consul.io/docs/internals/jepsen.html](https://www.consul.io/docs/internals/jepsen.html)
[2] [https://aphyr.com/posts/316-call-me-maybe-etcd-and-
consul](https://aphyr.com/posts/316-call-me-maybe-etcd-and-consul)

~~~
aidenn0
It's typically not tractable to prove that a system is safe. For example,
RethinkDB uses the Raft consensus algorithm, which on its own is proven
correct, but due to a bug could in some cases break linearizability. Jepsen
was able to uncover this.

The best method we currently have for demonstrating that large systems are
safe is to try really hard to prove that they are unsafe and fail.

------
skybrian
For problem 1, you'd use a distributed lock to update more than one resource.
A single resource could very well implement its own lock in which case you
don't need a distributed lock. But once you have more than one resource to
update, you do.

A lock should result in linearized access (all resources updated while holding
the lock should be updated in the same order based on the time when the lease
was granted). A counter is enough to put the updates in order. Giving leases a
unique id doesn't do that - the network could reorder writes seen by two
different resources. Fencing ensures that if the network does reorder writes,
one of them fails.

The lock is acting sort of like a clock that ticks whenever it hands out a
lease. The resources effectively update their local clock whenever they see an
update with a new token from the central clock, and use this local clock to
detect stale writes.

Note that fencing doesn't guarantee that writes don't happen late - all
updates could be delayed by 10 seconds and it wouldn't catch it. But it isn't
feasible to prevent this on systems with bad clocks connected by a flaky
network.

------
cyphar
I don't understand many of the leaps you make in your argument. From what I
can see, you're defending that Redlock is "good enough" in cases where you
don't care about your computers hating you. This is not a good environment for
distributed systems when you need strong correctness properties.

Reading through the original description of Redlock[1] and your "rebuttal" is
interesting. The algorithm described in the spec isn't safe if a GC pause or
process pause happens between <checking how long it took to acquire a
majority> and <doing work and releasing the lock>. You don't even touch this
argument (pretending to not understand which step is the problem and defending
that everything before the gap I just described is safe).

You claim that fencing tokens require linearizability. I don't know if this is
true, but even if it is, all you've proven is that the solution offered isn't
good enough. The fact that something that tries to fix your system doesn't fix
it properly doesn't mean that your system isn't broken.

------
eternalban
Have you read this Salvatore? [http://research.microsoft.com/en-
us/um/people/lamport/tla/fo...](http://research.microsoft.com/en-
us/um/people/lamport/tla/formal-methods-amazon.pdf)

> ... The good thing is that distributed systems are, unlike other fields of
> programming, pretty mathematically exact, or they are not, so a given set of
> properties can be guaranteed by an algorithm or the algorithm may fail to
> guarantee them under certain assumptions. In this analysis I’ll analyze ..

Saw no mathematics in your article.

~~~
antirez
Unfortunately, testing Redlock with a formal method is pretty useless. The
algorithm is so trivial, for the part you can mathematically model, that it is
a wasted effort to model it: even with a proof of the lock acquisition
process, it all boils down to "is the system model credible?" And that's what
we are debating. Note that, similarly, Martin doesn't even try to address the
problem of _how_ the lock is acquired, for the same reason I guess. Also note
that if you try to model the problem adding unbounded process pauses, and put
as a requirement mutual exclusivity with auto-release, I don't think there is
a way the model will ever be validated, _whatever_ system you model
implementing this.

~~~
agentultra
I'm still relatively new to formal specification but it seems that some form
of weak/strong fairness would suffice for the correctness of your invariant
and then it's a matter of testing your liveness properties while holding the
invariant with a model checker.

TLA+ seems pretty good at this. Why would this be a waste of effort?

------
abritishguy
I don't see a distributed lock implementation being both 100% correct, without
any support from the resources guarded by the lock, and actually feasible.

For data which absolutely must never violate the lock then the underlying data
storage has to be involved. The vast majority of times where a distributed
lock is used is where the underlying storage doesn't provide any support and
the distributed lock is good enough, failures are exceptional and not
disastrous.

If you want perfect correctness then you can just not have auto-release but
practically this will cause more problems than it solves.

------
reza_n
I understand how a fencing token can prevent out of order writes when 2
clients get the same lock. But what happens when those writes happen to arrive
in order and you are doing a value modification? Don't you still need to rely
on some kind of value versioning or optimistic locking? Wouldn't this make the
use of a distributed lock unnecessary?

~~~
Terr_
> But what happens when those writes happen to arrive in order and you are
> doing a value modification?

I believe the "first" write fails, because the token being passed in is no
longer "the latest", which indicates the lock was already released or
expired.

------
newman314
Would something like Google TrueTime help in this scenario? My takeaway being
that it is accurate with time...

------
agentultra
Neither the specification nor the analysis contain a formal, verifiable
specification. How can one test an invariant and properties and make real
conclusions about the safety/liveness of Redlock without it?

Is it not useful to do so?

------
cbsmith
I don't understand how the migration to a monotonic clock source is treated as
a trivial move. In practice, using a monotonic clock source makes
coordinating lease time records correctly even more complicated.

------
lukasm
What does "linearizable" mean in this context?

~~~
querulous
there's a formal definition but informally it means every participant agrees
on an order of events in a system. if a system is linearizable you can safely
compare-and-set to mutate state without introducing conflicts

------
melted
The first step to getting out of the hole is to stop digging. The author of
this article clearly hasn't learned this lesson yet.

------
onetwotree
There's no salty like antirez salty. I have mad respect for the guy as a
programmer, but he needs to take things a bit less personally IMO.

~~~
antirez
Sorry, I did not mean to be rude.

~~~
hyperpape
Fwiw, I didn't perceive your piece to be rude. I'm not sure who's right, but
it seems like everyone is being civil.

