
How to change symlinks atomically (2005) - ffggvv
http://blog.moertel.com/posts/2005-08-22-how-to-change-symlinks-atomically.html
======
Animats
Ah, yes, atomic file system operations in UNIX-like systems.

May not be available for your operating system variant. May not be available
for your file system. Not applicable to some remote file systems. May not
function properly in some virtual machines. Consult your storage area network
vendor for additional information.

~~~
wfunction
Yeah, it's another one of the many things that the Windows kernel gets right
but *nix programmers never care to give it credit for. `FILE_LINK_INFORMATION`
in Windows has a `ReplaceIfExists` flag just for this purpose.

~~~
pif
It's funny how you name the Windows kernel in a comment about symbolic links,
considering that it _lacks_ symbolic links.

~~~
ordinary
[https://en.wikipedia.org/wiki/NTFS_symbolic_link](https://en.wikipedia.org/wiki/NTFS_symbolic_link)

~~~
geocar
These things are called symbolic links, but are not what UNIX calls a symbolic
link. The name is unfortunate and perhaps cannot be helped.

~~~
icebraining
What's the difference? Because MS claims it has created them "to function just
like UNIX links", and I'm not seeing how they're wrong.

~~~
geocar
On UNIX, anyone can create a symbolic link, and a symbolic link can contain
arbitrary text, while Windows requires admin privileges and that the target be
a valid file path.
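
A minimal sketch of the "arbitrary text" point on a UNIX-like system, in Python. The helper name `symlink_roundtrip` and the sample string are made up for illustration; the only calls used are `os.symlink`, `os.readlink` and `os.path.exists`.

```python
import os
import tempfile


def symlink_roundtrip(text):
    """Create a symlink whose "target" is arbitrary text, then read it back."""
    with tempfile.TemporaryDirectory() as d:
        link = os.path.join(d, "note")
        # symlink(2) does not check that `text` names an existing file.
        os.symlink(text, link)
        stored = os.readlink(link)     # the stored text, returned as-is
        exists = os.path.exists(link)  # follows the link; False when dangling
        return stored, exists


stored, exists = symlink_roundtrip("pid=1234 host=example")
print(stored, exists)
```

The link is dangling (nothing by that name exists), yet creating and reading it works without special privileges.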

The new WSL _does_ have symbolic links, but these are only available to Linux
programs.

~~~
icebraining
The admin privilege requirement is just the default security policy; you can
configure it to allow regular users.

~~~
geocar
That is really bad advice.

An application should not change the default security policy.

~~~
icebraining
I'm not saying it should. I'm saying it's not a property of the symlink
feature. Some Linux distro could implement the same policy, but symlinks would
still be symlinks.

~~~
lisivka
No, it cannot be disabled via a policy in Linux, because it is regular FS
operation, not a filter.

~~~
icebraining
Sure it could, SELinux policies can operate on regular FS operations just
fine.

------
josefbacik
This is just an accident of how some file systems are implemented and isn't
actually guaranteed. If you did this on xfs you could still end up with a
0-length symlink if you crashed at just the right time.

~~~
jjnoakes
xfs doesn't guarantee atomic renames in the same directory?

I thought that requirement was from POSIX.

Does xfs not conform to POSIX?

~~~
josefbacik
The rename is atomic; the data being in the file is the problem. There is no
guarantee unless you fsync.

~~~
trav4225
Sorry, would you mind clarifying? "The data being in the file"?

The way I understand the article's proposal is this:

1\. Create new symlink pointing to desired file (assumed to already exist in a
stable state).

2\. Move new symlink over old symlink.
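
Those two steps could be sketched in Python roughly as follows. This is an illustration, not the article's code: `replace_symlink`, `demo_swap` and the temporary-name scheme are assumptions, and it says nothing about crash durability, only about what other processes observe.

```python
import os
import tempfile


def replace_symlink(target, link_path):
    """Repoint link_path at target via a temporary symlink and a rename."""
    link_dir = os.path.dirname(os.path.abspath(link_path))
    # Hypothetical naming scheme; the temp link must be in the same directory
    # (same filesystem) so the rename cannot degrade into a copy.
    tmp = os.path.join(
        link_dir, ".%s.tmp.%d" % (os.path.basename(link_path), os.getpid())
    )
    os.symlink(target, tmp)         # step 1: create the new symlink
    try:
        os.replace(tmp, link_path)  # step 2: rename(2) over the old one
    except BaseException:
        os.unlink(tmp)              # do not leave the temporary behind
        raise


def demo_swap():
    with tempfile.TemporaryDirectory() as d:
        link = os.path.join(d, "current")
        os.symlink("v1", link)
        replace_symlink("v2", link)
        return os.readlink(link)


print(demo_swap())
```

Readers of `current` see either the old target or the new one, never a missing link, because `rename` replaces the directory entry in one step and does not follow the final symlink.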

~~~
asdfaoeu
Symlinks are just special files whose contents are the link target.
His argument is that it is not atomic when considering a server crash. But I
don't think that matters anyway.

~~~
geocar
Symlinks are not files, they do not have inodes and you cannot open them to
fsync their contents.

They exist as _directory entries_ with a small in-directory content only, thus
syncing the directory they exist in is sufficient to persist them.

~~~
icebraining
Symlinks do have inodes, and their content (the link information) will be
stored in that inode structure, if it fits, not in the directory.

See e.g.
[https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Symb...](https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Symbolic_Links)
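
One way to check this from user space is `lstat`, which reports on the link itself rather than its target. A small sketch (the function name and sample target are made up; where the target string lives on disk is not visible through this API):

```python
import os
import stat
import tempfile


def symlink_inode_info(target="some/target"):
    """lstat() a fresh symlink and report what the kernel says about it."""
    with tempfile.TemporaryDirectory() as d:
        link = os.path.join(d, "link")
        os.symlink(target, link)
        st = os.lstat(link)  # lstat: do not follow the link
        return {
            "is_symlink": stat.S_ISLNK(st.st_mode),
            "inode": st.st_ino,  # the link's own file serial number
            "size": st.st_size,  # POSIX: length of the stored pathname
        }


print(symlink_inode_info())
```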

~~~
geocar
Sorry, you're right, however this is an implementation detail, and nothing to
do with POSIX which did not define any use for the `d_ino` field for a
symbolic link[1]. I think this is made unnecessarily difficult by Linux/ext4
calling something an "inode" that is not what UNIX traditionally called an
inode (or what POSIX calls a "file serial number").

I think it is more useful to think of the symbolic link as "inside" the
directory it is in, because this tells the programmer what to do (and what to
sync: the directory).
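
For illustration, syncing the directory after the rename might look like this in Python. The names `fsync_dir` and `demo_durable_swap` are hypothetical, and whether a directory fsync is sufficient on a given filesystem is exactly what this thread is arguing about, so treat it as a sketch of the advice rather than a guarantee.

```python
import os
import tempfile


def fsync_dir(path):
    """Open a directory read-only and fsync it to flush its entries."""
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_DIRECTORY", 0))
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def demo_durable_swap():
    with tempfile.TemporaryDirectory() as d:
        link = os.path.join(d, "current")
        tmp = os.path.join(d, "current.tmp")
        os.symlink("v1", link)
        os.symlink("v2", tmp)
        os.replace(tmp, link)  # atomic with respect to other processes
        fsync_dir(d)           # ask the kernel to persist the directory
        return os.readlink(link)


print(demo_durable_swap())
```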

[1]:
[http://pubs.opengroup.org/onlinepubs/009695399/functions/rea...](http://pubs.opengroup.org/onlinepubs/009695399/functions/readdir_r.html)

~~~
yuubi
An inode, traditionally, is the per-file data structure on disk (or in memory)
that contains the user/group IDs, file type (plain/directory/device/etc; not
jpeg/png/executable/etc), permissions, file size, and the location of the
content (either a list of block numbers, or the address of a block containing
the list if it wouldn't all fit into the fixed-size inode), among other
things. These structures were identified by a number. This is an
implementation detail not specified by POSIX.

A directory, historically (on systems with 14- or 30-character file name
limits like we had in the 1990s, at least 4.2 or 4.3 BSD-derived), consisted
of a file containing a list of (16-bit inode number, file name) structs, thus
making a nice round 16 or 32 bytes per directory entry, evenly dividing a
512-byte block. On some older systems you could see this structure by opening
and reading a directory with open and read (I think every system everywhere
prohibited writes to a directory because modifying a directory would allow you
to access files in unreadable directories, uncleanly delete any file given its
inode number, etc). The only special thing about a directory on disk was that
its inode said it was a dir, so the filesystem code would trust it as a source
of inode numbers and otherwise use it as a dir.

Some of the 14-byte-name systems supported symlinks. As there was no space in
the directory entry for anything other than an inode number, the link contents
would have to be accessed through an inode. (I think I've heard of systems
that could store very small file contents like typical symlink target names in
the inode instead of allocating a data block, but can't name one).

The number you see in ls -i, which is sometimes called just the "inode" (and
apparently also called the "file serial number"), is the number in the
directory entry that's used to find the inode. I guess someone somewhere had
or intended to have a system that could store symlinks in the directory
instead of consuming an inode.

------
userbinator
As much as the Win32 API is criticised for many of its functions containing
often-unused parameters, symlink() having a flag parameter to control whether
to overwrite the original link's contents if it exists would've made this much
easier.

~~~
adrianratnapala
You would need the extra parameter + the guarantee that the symlink syscall
did the overwrite atomically. And I don't think proliferating such guarantees
throughout the API is going to end well.

In Unix we have this convention that `rename()` is our atomicity swiss-army
knife. I guess it makes it easier for implementers to only have to make their
strong guarantees for one system call (+ a few related friends like
`renameat`).

Now you could argue that `rename()` is a bit too under-powered for this job,
and maybe we want transactions or something. But NTFS tried that too and
deprecated them.

What I would like to see in Unix is better support for anonymous files and
directories. You could use this with things like `renameat()` so that the
temporary never touches the filesystem and is automatically cleaned up if the
rename fails.

~~~
JoshTriplett
> What I would like to see in Unix is better support for anonymous files and
> directories. You could use this with things like `renameat()` so that the
> temporary never touches the filesystem and is automatically cleaned up if
> the rename fails.

Exactly. This approach makes the various calls orthogonal. Various syscalls
create anonymous reference-counted inodes on a filesystem (which disappear
when all references to them do), and syscalls like linkat or renameat
install such an inode (referenced by file descriptor) into the filesystem
with a name.

O_TMPFILE allows creating an anonymous file inode. Ideally, a syscall would
exist to do the same for an anonymous symlink inode.

~~~
cyphar
On GNU/Linux there's also the (badly designed) memfd_create() syscall which
allows you to create an anonymous inode even if / is entirely read-only and
you don't have mounting permission. While you could do it with a user+mount
namespace and sending the fd over a UNIX socket, sometimes making a syscall is
a better idea. :P

~~~
geocar
memfd_create() does not create an inode. It only creates a file descriptor.

With a file descriptor you can create multiple mappings to the same physical
address -- something otherwise impossible on UNIX or Linux (although Mach
allows you to vm_remap[1], which is often sufficient).

[1]:
[http://web.mit.edu/darwin/src/modules/xnu/osfmk/man/vm_remap...](http://web.mit.edu/darwin/src/modules/xnu/osfmk/man/vm_remap.html)

~~~
JoshTriplett
> memfd_create() does not create an inode. It only creates a file descriptor.

It creates both; memfd_create (and timerfd_create, signalfd, eventfd,
userfaultfd, epoll, and others) use a kernel subsystem called "anon_inode" to
create an in-memory inode to back the file descriptor.

~~~
JoshTriplett
(Minor correction: memfd_create creates an in-memory inode without using
anon_inode, while the various other syscalls use anon_inode.)
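
Both points (an inode behind the descriptor, and several mappings of the same memory) can be poked at from Python on Linux. A sketch, assuming Python 3.8+ where `os.memfd_create` is available; `memfd_demo` is a made-up name and it returns `None` on platforms without the call.

```python
import mmap
import os
import stat


def memfd_demo():
    """Create a memfd, fstat it, and map it twice to show shared contents."""
    if not hasattr(os, "memfd_create"):
        return None  # not Linux, or Python/libc too old
    fd = os.memfd_create("demo")
    try:
        os.ftruncate(fd, mmap.PAGESIZE)
        st = os.fstat(fd)  # the descriptor is backed by an in-memory inode
        a = mmap.mmap(fd, mmap.PAGESIZE)  # MAP_SHARED is the default
        b = mmap.mmap(fd, mmap.PAGESIZE)  # second mapping of the same pages
        try:
            a[:5] = b"hello"
            seen = bytes(b[:5])  # the write through `a` is visible through `b`
        finally:
            a.close()
            b.close()
        return {
            "regular": stat.S_ISREG(st.st_mode),
            "inode": st.st_ino,
            "seen": seen,
        }
    finally:
        os.close(fd)


print(memfd_demo())
```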

------
ashitlerferad
Some related articles:

[https://yakking.branchable.com/posts/moving-files-1-copying/](https://yakking.branchable.com/posts/moving-files-1-copying/)
[https://yakking.branchable.com/posts/moving-files-2-sparsene...](https://yakking.branchable.com/posts/moving-files-2-sparseness/)
[https://yakking.branchable.com/posts/moving-files-3-faster/](https://yakking.branchable.com/posts/moving-files-3-faster/)

------
johnwheeler
As a Python/web developer who's never written a line of production C/C++ code
in his 16-year career, it's always humbling to stumble into these threads on
HN.

~~~
Rapzid
I highly, highly recommend getting in the habit of consulting OS syscall
documentation for stuff. The behaviour of the syscall determines most of the
behaviour for this stuff in every language (with a few exceptions like stdlib
buffering, etc). The Linux syscall docs are pretty easy to consume even if you
don't write C. "The Linux Programming Interface" is an awesome book based on
the man pages (by the main man page maintainer). Not sure what exists for
Windows in the same vein book-wise, but TechNet or something surely has what
you NEED.

~~~
johnwheeler
Thank you. I will check it out. I've been looking into the `select` call a
little because I'm working with websockets, but I'm not sure if that's related
to syscall. It's amazing what you can get away with _not_ knowing with
computers nowadays. It doesn't necessarily mean you should strive for
ignorance, of course.

