
Everything you never wanted to know about file locking - duck
http://apenwarr.ca/log/?m=201012#13
======
tedunangst
Every BSD system I'm aware of has something called lockf, which may or may not
be layered on top of flock (or the other way around).

Note that all of these locking schemes fall down hard on NFS, depending on
client and server, regardless of what any man pages say about support.
Homework question: How does a stateless protocol like NFS remember when one
client locks a file and a different client tries locking it?

The failure mode on NFS can vary from "lock always succeeds, regardless of
actual status" to "lock always fails, hanging forever", or my favorite, "locks
work, unless your process crashes, and then you can never lock that file again
until you reboot the NFS server".

~~~
DanielRibeiro
I discovered this the hard way and replaced it with Apache ZooKeeper, which is
much more stable now than it was at the time. It was inspired by Google's
distributed lock service, Chubby.

------
tptacek
Worth reading all the way to the bottom; the Python stuff made me giggle.

------
glhaynes
Good article. I don't quite understand this part, though:

 _Still not convinced_ [that you shouldn't use mandatory locking] _? Man, you
really must like punishment. Look, imagine someone is holding a mandatory lock
on a file, so you try to read() from it and get blocked. Then he releases his
lock, and your read() finishes, but some other guy reacquires the lock. You
fiddle with your block, modify it, and try to write() it back, but you get
held up for a bit, because the guy holding the lock isn't done yet. He does
his own write() to that section of the file, and releases his lock, so your
write() promptly resumes and overwrites what he just did._

If you're going to read some data and then potentially write modified data
back over what you just read, shouldn't the first step before you even read it
be to get a lock on the file or that range? Or, if your calculation might take
a long time, at least get a lock right before you write it back, verifying
first that you're writing over data that hasn't changed while you were off
calculating?
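Here is a minimal Python sketch of the first option: take the exclusive lock before reading and hold it until the write is done. It assumes cooperating processes that all use flock() from the standard fcntl module. The counter file and increment_counter are hypothetical.

```python
import fcntl
import os


def increment_counter(path: str) -> int:
    """Read an integer from `path`, add one, write it back, return the new value.

    The exclusive lock is taken *before* the read and held until after the
    write, so no cooperating process can slip an update in between. It is
    advisory: it only protects against code that also calls flock().
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)           # lock first...
        data = os.pread(fd, 64, 0)               # ...then read
        value = int(data or b"0") + 1            # ...compute (empty file counts as 0)
        os.ftruncate(fd, 0)
        os.pwrite(fd, str(value).encode(), 0)    # ...and write back
        return value
    finally:
        os.close(fd)                             # closing the fd releases the lock
```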

~~~
stingraycharles
That was my first intuition too. It sounds like a classic race condition, in
which case you would simply acquire the highest level of locking (exclusive)
before reading the data, to make sure all your accesses to the shared data are
synchronized.

But I'm pretty sure I'm misunderstanding what the author was trying to say.

~~~
IgorPartola
He is talking about mandatory locks, where if I lock the file and do things
properly, you cannot do operations on it, even if you disregard the locks. In
other words, it's a "convenience": I can structure my program in such a way as
to "guarantee" that I will have the exclusive lock on the file and even your
"buggy" code won't be able to overwrite it despite the fact that your code
does not acquire any locks. Seems like the perfect solution, until the madness
described in the OP ensues.

Also, mandatory locks are used in Windows. Ever try to delete that virus.exe
while it's still running? Yeah.

~~~
glhaynes
I think I understand that, but it's still not clear to me how the _type_ of
the lock is the cause of the problem: the problem as far as I can see it is
that the hypothetical programmer isn't acquiring a lock (regardless of type)
before starting their operation and then holding it until completion.
Mandatory locking certainly has its issues/annoyances, but in terms of the
situation I quoted, if used properly [1], it works fine and there is no risk
of unexpected results due to races.

[1] acquire lock for maximum level of operations you might need and hold it
all the way through to completion, _or_ to avoid holding a lock on the object
for too long, get and release multiple locks verifying as needed that the data
hasn't changed each time
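The second option in [1] is essentially optimistic concurrency. You read under a short lock and compute with no lock held. Then you take an exclusive lock, re-check, and write only if nothing changed. Below is a rough Python sketch. It assumes all writers cooperate via flock(), and update_optimistically and read_locked are hypothetical helpers. Comparing full contents stands in for a version number or checksum.

```python
import fcntl
import os


def read_locked(fd: int) -> bytes:
    """Read the whole file under a shared lock, then release it."""
    fcntl.flock(fd, fcntl.LOCK_SH)
    try:
        return os.pread(fd, os.fstat(fd).st_size, 0)
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)


def update_optimistically(path: str, transform, retries: int = 5) -> bool:
    """Apply transform(old_bytes) -> new_bytes without holding a lock while it runs.

    Before writing, re-read under an exclusive lock and only write if the
    contents still match what transform saw; otherwise retry.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        for _ in range(retries):
            snapshot = read_locked(fd)
            new = transform(snapshot)            # possibly slow, no lock held
            fcntl.flock(fd, fcntl.LOCK_EX)
            try:
                current = os.pread(fd, os.fstat(fd).st_size, 0)
                if current == snapshot:          # nobody changed it meanwhile
                    os.ftruncate(fd, 0)
                    os.pwrite(fd, new, 0)
                    return True
            finally:
                fcntl.flock(fd, fcntl.LOCK_UN)
        return False                             # gave up after `retries` conflicts
    finally:
        os.close(fd)
```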

~~~
tedunangst
Mandatory locks lead people to believe that they are safe. Advisory locks come
with no implicit promises, so people know that they only protect against well
behaved apps (generally other instances of the same program) and adjust
program behavior accordingly.
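Here is a small Python demonstration of what "advisory" means in practice, assuming a Unix system with flock(). A second cooperating locker is refused, but a program that never asks for the lock writes to the file anyway. demo_advisory is a hypothetical name.

```python
import fcntl
import os


def demo_advisory(path: str):
    """Return (cooperating_locker_blocked, rude_writer_succeeded)."""
    holder = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    fcntl.flock(holder, fcntl.LOCK_EX)          # the well-behaved app holds the lock
    try:
        # Another well-behaved instance asks for the lock and is refused.
        polite = os.open(path, os.O_RDWR)
        try:
            fcntl.flock(polite, fcntl.LOCK_EX | fcntl.LOCK_NB)
            blocked = False
        except BlockingIOError:
            blocked = True
        finally:
            os.close(polite)
        # A program that never calls flock() just writes anyway.
        with open(path, "a") as rude:
            rude.write("written regardless\n")
        with open(path) as f:
            wrote = "written regardless" in f.read()
        return blocked, wrote
    finally:
        os.close(holder)
```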

------
js2
This excellent writeup inspired me to publish some old dotlocking code I had
lying about - <https://github.com/jaysoffian/dotlock>

------
kelnos
So it turns out only one sentence near the end really matters:

"I guess lockfiles are the answer after all."

Yup. Don't use the locking APIs. Just use lockfiles and be done with it.
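For reference, the lockfile approach boils down to an atomic create. Below is a minimal Python sketch. It assumes a local filesystem where O_CREAT|O_EXCL is atomic, and acquire_lockfile/release_lockfile are hypothetical names. A crashed holder leaves the file behind, so writing the PID into it is a common (but not foolproof) way to spot stale locks.

```python
import os


def acquire_lockfile(path: str) -> bool:
    """Create `path` atomically; return True if we now own the lock.

    With O_CREAT|O_EXCL, open() fails with EEXIST (FileExistsError in
    Python) if the file already exists, so only one caller can win.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    # Record our PID so a human or cleanup script can identify stale locks.
    os.write(fd, f"{os.getpid()}\n".encode())
    os.close(fd)
    return True


def release_lockfile(path: str) -> None:
    os.unlink(path)
```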

I'd actually go a step further and suggest only using lock _directories_.
Using lockfiles assumes O_CREAT|O_EXCL works properly everywhere (which is
probably a safe assumption, but...). mkdir() fails with EEXIST if someone else
already holds the 'lock'.
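The lock-directory variant is even shorter, since mkdir() is atomic and reports EEXIST (FileExistsError in Python) when the directory already exists. Here is a minimal sketch with hypothetical helper names. As with lockfiles, a crashed holder leaves a stale directory behind.

```python
import os


def acquire_lockdir(path: str) -> bool:
    """Take the 'lock' by creating a directory; False if someone else has it."""
    try:
        os.mkdir(path)        # atomic: exactly one concurrent caller succeeds
        return True
    except FileExistsError:   # mkdir() failed with EEXIST
        return False


def release_lockdir(path: str) -> None:
    os.rmdir(path)
```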

------
yread
Cool writeup!

 _This is apparently because some versions of Windows don't understand shared
locks_

Which versions? It seems even Windows 95 had FILE_SHARE_READ and
FILE_SHARE_WRITE, which are effectively shared locks (it just didn't have
FILE_SHARE_DELETE).

~~~
TimJYoung
The author is referring to the LockFile API call, which only supports
exclusive byte-range locks, vs. the LockFileEx API call, which does support
shared byte-range locks, but is only available on the Windows NT variants (NT
4 and above), not the Windows 9x variants (95, 98, and ME).

------
known
<http://www.beej.us/guide/bgipc/output/html/singlepage/bgipc.html#flocking>

