
PATH_MAX Is Tricky - eklitzke
https://eklitzke.org/path-max-is-tricky
======
problems
Windows has a similarly disastrous situation where most tools and APIs follow
MAX_PATH - which is defined to be 260 chars. But that doesn't affect the
actual filesystem or syscall interface, just common APIs and tools. This makes
it impossible, for example, to delete files with over-long paths from Windows
Explorer.

If you want to fix this you basically have to bypass it by using "\\?\" on
the front of the full path. The situation gets messy when you're trying to
write an installer with node packages especially.

[https://msdn.microsoft.com/en-
us/library/windows/desktop/aa3...](https://msdn.microsoft.com/en-
us/library/windows/desktop/aa365247\(v=vs.85\).aspx#maxpath)
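
As a rough illustration of the prefix trick described above, here is a minimal sketch in C that only builds the prefixed string. The function name `add_long_path_prefix` is hypothetical, and the sketch assumes a drive-letter absolute path that already uses backslashes; UNC paths take a different form (`\\?\UNC\server\share\...`) and are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "\\?\" followed by abs_path into out.
 * Assumes abs_path is an absolute drive-letter path such as C:\dir\file.
 * Returns 0 on success, -1 if out is too small. */
int add_long_path_prefix(const char *abs_path, char *out, size_t out_size)
{
    static const char prefix[] = "\\\\?\\";   /* the four characters \\?\ */
    assert(abs_path != NULL && out != NULL);

    size_t needed = strlen(prefix) + strlen(abs_path) + 1; /* +1 for NUL */
    if (needed > out_size)
        return -1;

    snprintf(out, out_size, "%s%s", prefix, abs_path);
    return 0;
}
```

The result would then be handed to a wide-character Windows API after conversion. Note that with this prefix the path is passed through largely as-is, so forward slashes and `.`/`..` components are not normalized for you.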

~~~
dom0
That's an issue that lies in the unfortunate intersection of 16/32 bit
Windows, Windows NT and MS libc (MSVCRT), which support some combination of
two and a half system designs.

~~~
frik
Microsoft could have fixed it with Win 64bit (Win64).

They had done such a step before with the switch from Win16 to Win32 and the
help of Win32s. With Win32 they cleaned up the old API, fixed things yet kept
it source compatible when possible. Microsoft could have fixed so many things
with Win64 starting with Windows 2003 64-bit. But no, Microsoft invested
little on native Windows API between 2002 and 2012 - Longhorn (later Vista)
and dotNet were the latest hype.

~~~
mikequinlan
Microsoft has partially fixed this in Windows 10 but it is not simple or easy.
[https://blogs.msdn.microsoft.com/jeremykuhne/2016/07/30/net-...](https://blogs.msdn.microsoft.com/jeremykuhne/2016/07/30/net-4-6-2-and-
long-paths-on-windows-10/)

------
derefr
> The problem is that you can’t meaningfully define a constant like this in a
> header file. The maximum path size is actually to be something like a
> filesystem limitation, or at the very least a kernel parameter.

AFAIK, "paths" aren't a thing filesystems think about. As far as a filesystem
driver is concerned, paths are either—for open/read/write/etc.—plain inodes (a
uint64), or—for directory-manipulation calls—an inode plus a ptrdiff_t to
index into the dirent list†. The only things that care about NAME_MAX are
lookup(2) [get inode given {inode, dirent}], and link(2) [put inode in {inode,
dirent}].

So it's really only the kernel, through its syscall interface, that cares
about paths—and so PATH_MAX is just a representation of the maximum size of a
path the kernel is willing to accept in those syscalls. As if they each had a
statically-allocated path[PATH_MAX] buffer your path got copied into.
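
One way to see the limit at the syscall boundary is to hand the kernel a path that is semantically just "/" but textually longer than PATH_MAX. The sketch below is an illustration only: `stat_overlong_path` is a made-up name, and the 4096 fallback is an assumption for systems that leave PATH_MAX undefined. On Linux the call would be expected to fail with ENAMETOOLONG before any filesystem is consulted.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/stat.h>

#ifndef PATH_MAX
#define PATH_MAX 4096 /* assumption: fallback where the macro is undefined */
#endif

/* Builds "/./././..." with more than PATH_MAX bytes and calls stat() on it.
 * Returns 0 if stat() succeeded, the errno value if it failed, or -1 if
 * the buffer could not be allocated. */
int stat_overlong_path(void)
{
    size_t len = (size_t)PATH_MAX + 2;
    assert(len > (size_t)PATH_MAX);

    char *path = malloc(len + 1);
    if (path == NULL)
        return -1;

    path[0] = '/';
    for (size_t i = 1; i < len; i++)
        path[i] = (i % 2 == 1) ? '.' : '/';  /* short "." components only */
    path[len] = '\0';

    struct stat st;
    errno = 0;
    int err = (stat(path, &st) == 0) ? 0 : errno;
    free(path);
    return err;
}
```

Every component here is a single ".", so NAME_MAX is never exceeded; only the total length of the string is over the limit.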

† Writing a FUSE filesystem is a great way to learn about what the kernel
thinks a filesystem is. It's very different from the userland perspective. For
example, from a filesystem driver's perspective, "file descriptors" don't
exist! Instead, read(2) and write(2) calls are stateless, each call getting
passed a kernel-side io-handle struct that must get re-evaluated for matching
permissions on each IO operation. (You can do _some_ up-front evaluation
during the open(2) call, but, given that a file's permissions might _change_
while you have a descriptor open to it, there's not much point.)

~~~
jstimpfle
> For example, from a filesystem driver's perspective, "file descriptors"
> don't exist! Instead, read(2) and write(2) calls are stateless

That seems hard to believe. It would be inefficient to look for a file for each
e.g. write(). I guess a filesystem defines its own implementation-defined
open-file handle and the kernel translates that into an (integer) FD.

> each call getting passed a kernel-side io-handle struct that must get re-
> evaluated for matching permissions on each IO operation. (You can do some
> up-front evaluation during the open(2) call, but, given that a file's
> permissions might change while you have a descriptor open to it, there's not
> much point.)

That's not how it works. File modes are checked when you open the file. Once
you have an open file (a file handle), the file modes no longer matter.

File modes are more a "PATH" thing (there aren't per-filesystem variations and
I assume permission checking is not done in file system code), although the
file system must allocate the space to save the file mode bits and implement
the VFS API.

~~~
derefr
> there aren't per-filesystem variations and I assume permission checking is
> not done in file system code

There are! Keep in mind that "filesystems" includes things like NFS and SMB.
The local kernel has no ultimate authority over security policy of a remotely-
mounted filesystem, right? In open(2), the kernel has to ask the filesystem
what's what, because the filesystem might know something the kernel doesn't.

Now, the interpretation of stat(2) values (UID/GID, ACLs, etc.) is up to the
VFS—but the filesystem is the one that created that stat struct when the
kernel called its stat(2) impl, and it expects to get that stat-struct passed
back with no changes to open(2), and then make the decision for itself what
those stat-struct members _mean_.

Which is to say, it's perfectly possible to write a filesystem that says you
have 0000 permissions on a file, but which still lets you open(2), read(2),
write(2), readdir(2), etc. that file! It's up to the filesystem to _enforce_
file permissions or ACLs, and it can do that however it wishes; stat(2) is
just an _indication_ , in a common VFS language, of the policy the filesystem
is (probably) going to enforce. It's not a baton passed to the kernel to do
the enforcement for it. Linux has no equivalent to NT's kernel-object ACLs.

> File modes are more a "PATH" thing (there aren't per-filesystem variations
> and I assume permission checking is not done in file system code), although
> the file system must allocate the space to save the file mode bits and
> implement the VFS API.

Ah, sorry, I didn't mean file permissions; that was a typo. I meant things
like the process on the other end of a pipe closing its write end, which will
cause your read(2) call on that pipe's FD to fail, because the _IO
permissions_ (not file permissions) on the FD have changed between the two
successive read(2) calls.

When your open(2) impl gets called, you receive a stat(2) [that you previously
created yourself when the kernel called your stat(2) implementation], and a
set of open(2) flags, compare the two, and decide whether to grant each
requested permission from open(2) given the stat-struct. Essentially, the
open(2) impl is a pure mask-function on the requested flags, to determine what
permissions actually end up put into the descriptor. (Conveniently, the kernel
then returns a permissions error if it doesn't get returned the perms it asks
for. But it could always end up with _more_ perms than it asked for!)

Then, later, the kernel can modify that set of IO permissions without telling
you, and your next read(2) or write(2) might get called with different IO
permissions.

Another interesting fact: in the VFS struct file_operations (where you put
your pointers to your filesystem's implementations of file operations), there
is no member representing close(2). No FS-driver-level function gets called by
the kernel in response to close(2)! Instead:

• There is a flush(2) that gets passed a file struct, to indicate to the FS
that a given file's handle has been closed—but this is only there so that, if
the file is part of a filesystem with synchronous-commit (e.g. NFS in sync
mode), closing the file will trigger a flush of the entire _device_. This is
stateless and idempotent; a given file-struct might get flush(2)ed any number
of times. It's there to ask the file's extents' backing store to
checkpoint itself, not to do anything with the file itself.

• There is a release(2) that gets called when _all_ handles to a file have
been closed—i.e. when the kernel's "open(2) refcount" on the file-struct drops
to zero. If the filesystem, say, caches some things about the file when you
open(2), you can release that cache-entry on release(2).

Notice that neither of these operations has semantics that would let you clean
up local state allocated in a table keyed off anything passed to open(2),
because there's no call that happens 1:1 with open(2) calls. Thus, you really
_can't_ key local state to a struct-file in a way where you can later look it
up again. And there's nowhere _inside_ a struct-file to stash a key for your
local state, either. So, like I said, read(2) and write(2) are "stateless."

~~~
jstimpfle
> There are! Keep in mind that "filesystems" includes things like NFS and
> SMB...

Of course - I would call that "augmentations". But still each has to implement
the dreaded POSIX modes to be compliant.

Thanks for the thing about there not being any state for open files in the
filesystems themselves. I browsed a bit around the LXR and couldn't find any.
That's insightful! (and I think it's a sensible design choice)

------
Animats
All the UNIX/Linux/POSIX functions which take output "char *" params without a
length should have been moved to deprecated header files a long time ago. Like
1990 or so. It's not too late.

~~~
Tempest1981
Sometimes I dream of a world where C had a built-in string type. Imagine how
much time could have been saved, and how many crashes prevented.

------
loeg
The GNU Hurd approach to PATH_MAX is to set it to something ridiculous like
SIZE_MAX, something that cannot possibly be allocated, to illustrate to
programmers that it is a fiction.

I don't think that's necessarily the best approach, but it matches reality
more closely than typical Linux/BSD values (1024 or 4096).

~~~
the_mitsuhiko
That seems like a bad idea because people will then use inconsistent max path
lengths.

~~~
loeg
The idea is you need to allocate and resize larger if your initial buffer was
too small. It doesn't matter what inconsistent length you start with as long
as you scale it up as needed.
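
The allocate-and-grow idea can be sketched with getcwd(3), which reports ERANGE when the buffer is too small. This is a minimal sketch, not a reference implementation: `getcwd_grow` is a hypothetical name, and the doubling strategy is just one reasonable choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns a malloc()ed string with the current working directory, or NULL
 * on error with errno set. The caller frees the result. The starting size
 * does not matter much because the buffer doubles until the path fits. */
char *getcwd_grow(size_t initial_size)
{
    assert(initial_size > 0);
    size_t size = initial_size;
    char *buf = NULL;

    for (;;) {
        char *bigger = realloc(buf, size);
        if (bigger == NULL) {
            free(buf);
            errno = ENOMEM;
            return NULL;
        }
        buf = bigger;

        if (getcwd(buf, size) != NULL)
            return buf;                 /* the path fit */

        if (errno != ERANGE || size > SIZE_MAX / 2) {
            int saved = (errno != ERANGE) ? errno : ENOMEM;
            free(buf);                  /* real error, or cannot grow more */
            errno = saved;
            return NULL;
        }
        size *= 2;                      /* too small: retry with more room */
    }
}
```

Starting from a deliberately tiny buffer, for example `getcwd_grow(1)`, exercises the retry path without relying on PATH_MAX at all.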

~~~
ygra
I actually like that some Windows APIs you pass a buffer to will tell you if
the buffer was too small _and_ the necessary size to accommodate the result,
requiring a bit less guessing.

~~~
dom0
It's still a race though - you _have_ to loop (possibly indefinitely).

~~~
ygra
Ok, didn't think of another thread changing the directory. Still, there's an
upper bound around 32 kilo code units, so not indefinitely. Unless UNC paths
are not bounded, don't know about that right now.

~~~
dom0
True, thought the wrong way about it.

------
TheAceOfHearts
This post got me curious, so I did a quick search on macOS 10.12. I found the
values defined in
"/System/Library/Frameworks/Kernel.framework/Headers/sys/syslimits.h".
PATH_MAX is 1024, and NAME_MAX is 255.

There's also an amusing todo question that looks like it might've been there
for at least close to 20 years now:

    #define	OPEN_MAX		10240	/* max open files per process - todo, make a config option? */

------
jstimpfle
The essence of the article: PATH_MAX applies to the syscall interface. It's
not related to file systems. Paths aren't a file system thing, but simply a
convenient means of addressing files. Basically they are URLs for local
resources.

And that totally makes sense once you understand that files are basically
"objects" (in the OO sense) identified by inodes instead of memory addresses.
A file system implements the graph of these objects (linked by special file
objects called _directories_ ). The fact that one can cross file system
boundaries using file paths also indicates that file paths are none of a file
system's business.

~~~
masklinn
> It's not related to file systems. Paths aren't a file system thing, but
> simply a convenient means of addressing files.

> The fact that one can cross file system boundaries using file paths also
> indicates that file paths are none of a file system's business.

The filesystem knows about file names, stores them, and puts limits on them
(often 255 code units though some are lower — FAT16's 8.3, HFS's 31 — and some
are higher — Reiser4's 3976 bytes).

A file path is nothing but a concatenation of a bunch of file names and
separators ergo file paths are, in fact, an FS's business.

And while that's mostly fallen out of style there are still length-limited-
path filesystems: ISO-9660 and UDF for instance.

~~~
jstimpfle
> A file path is nothing but a concatenation of a bunch of file names and
> separators ergo file paths are, in fact, an FS's business.

This is a non sequitur.

------
teddyh
> _This constant_ [PATH_MAX] _is defined by POSIX_ …

Well, it’s _allowed_ by POSIX. A POSIX compatible system doesn’t _have_ to
define it if it has no such inherent restriction on path lengths. Indeed, the
GNU Hurd does not have such a restriction, and consequently does not define
it. This leads to many porting adventures for those trying to compile a
program on GNU Hurd, believing their source code to be correct for any POSIX-
compliant system.

------
recentdarkness
And to add to the confusion, Unix domain socket paths have a maximum length of
something between 92 and 108 bytes. That's an implementation detail of the
platform it's running on. This in particular has bitten me already.
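
Since the limit varies by platform, one defensive option is to ask the compiler for the size of `sun_path` rather than hard-coding a number. The sketch below assumes a POSIX system with `<sys/un.h>`; the helper names `sun_path_capacity` and `fill_sockaddr_un` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Size in bytes of sun_path on this platform, including room for the NUL. */
size_t sun_path_capacity(void)
{
    return sizeof(((struct sockaddr_un *)0)->sun_path);
}

/* Copies path into addr->sun_path. Returns 0 on success, or -1 if the path
 * (plus its terminating NUL) does not fit, instead of silently truncating. */
int fill_sockaddr_un(struct sockaddr_un *addr, const char *path)
{
    assert(addr != NULL && path != NULL);

    size_t len = strlen(path);
    if (len >= sizeof(addr->sun_path))
        return -1;

    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path, path, len + 1);  /* includes the NUL */
    return 0;
}
```

Rejecting an over-long path up front tends to be easier to debug than a bind(2) or connect(2) call that quietly operates on a truncated name.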

------
leni536
I played around with glibc's getcwd() some time ago. With strace one can
easily see how getcwd() works. If the current path is larger than PATH_MAX
then the getcwd syscall fails. Then as I recall glibc uses '..'s recursively
so it never has to call a syscall with a long relative path.

If there is a non-user-readable directory in the path then the fallback method
fails but the getcwd syscall works if the path is short enough.

Bash also "cheats" by caching the working directory and updating it on 'cd'
commands.

------
manwe150
If I understand the conclusion of the article right, it's that we _should_
actually just use PATH_MAX? In particular, he points to the glibc
implementation of realpath as being very correct. But it (like the man page
description of it says), appears to prefer to use the hard-coded value of
PATH_MAX, unless that value is unavailable and it is forced to query for the
kernel _PC_PATH_MAX value instead.
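
For reference, the runtime query mentioned here looks roughly like the sketch below. It uses pathconf(3) with `_PC_PATH_MAX`; the wrapper name `query_path_max` and its return convention (0 for "no fixed limit") are assumptions made for this example, not what glibc's realpath does internally.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Returns the _PC_PATH_MAX value reported for dir, 0 if the system reports
 * no fixed limit, or -1 on error with errno set. */
long query_path_max(const char *dir)
{
    assert(dir != NULL);

    errno = 0;
    long limit = pathconf(dir, _PC_PATH_MAX);
    if (limit == -1)
        return (errno == 0) ? 0 : -1;  /* errno untouched means "no limit" */
    return limit;
}
```

The value is documented as applying to relative paths when `dir` is the working directory, and it can be -1 with errno unchanged when there is no limit, so callers still need a growth strategy for that case.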

That's not what I would have expected. Did I miss something obvious?

------
josteink
That's tricky indeed. And doing things properly seems quite involved.

For now I'll keep my limits.h. At least until I get real-world bug reports
telling me this is causing real-world issues :)

