
Non-Posix File Systems - nsm
https://weinholt.se/articles/non-posix-filesystems/
======
Animats
I've wanted somewhat different file system semantics, but close to the POSIX
model:

- Unit files. The unit of consistency is the entire file. Files are opened
for writing, written, and closed. They then become openable by other programs.
Overwriting a file replaces the file as a unit; other readers see the old file
until the writer closes and the reader reopens. This is the default and the
case for most files. On a crash, the file reverts to the previous good file,
if any. Can be memory-mapped as read only. File replacement can be done now
through non-portable renaming gyrations. It should just work.

- Log files. Append-only mode, enforced. Can't seek back and overwrite. On a
crash, the file recovers to some recent good write, with a correct end of file
position. It does not tail off into junk.

- Temp files. Not persistent over restarts. Not backed up. Can be
memory-mapped as read/write. On a crash, disappears.

- Managed files. These are for databases. Async I/O available. May have
additional locking functions. Separate completion returns for "accepted data"
(caller can reuse write buffer) and "committed data" (safely stored), so the
database knows when the data has been safely stored. Can be memory-mapped as
read/write. Intended for use by programs which are very aware of the file
system semantics. On a crash, "committed data" should be intact but data not
yet committed may be lost.
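
The "renaming gyrations" for unit files can be sketched in a few lines of Python; this is the portable POSIX idiom (temp file on the same filesystem, fsync, atomic rename), not a new primitive:

```python
import os
import tempfile

def write_unit_file(path, data):
    """Replace `path` as a unit: readers keep seeing the old file until
    the new one is complete and atomically renamed into place."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)  # same filesystem as target
    try:
        os.write(fd, data)
        os.fsync(fd)           # data is durable before it becomes visible
    finally:
        os.close(fd)
    os.replace(tmp, path)      # atomic: readers see old file or new, never junk

write_unit_file("config.txt", b"version 1")
write_unit_file("config.txt", b"version 2")
print(open("config.txt", "rb").read())  # b'version 2'
```

On a crash before the rename, `path` still holds the previous good file, which is exactly the revert-on-crash behavior described above.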

~~~
wahern
> Overwriting a file replaces the file as a unit; other readers see the old
> file until the writer closes and the reader reopens.

This is the semantic rename provides, and the strength of that semantic is a
common complaint from many in the anti-POSIX crowd. I suspect you knew that
already, but the number of syscalls would be similar in either case except
that the latter file would need to be visible before the rename. Linux does
provide O_TMPFILE + AT_SYMLINK_FOLLOW, which gives nearly ideal behavior.
(Nearly, because it still requires /proc/self/fd access, AFAIU.)
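
A Linux-specific sketch of that route in Python, with the /proc/self/fd detail made explicit; it falls back to the ordinary temp-name dance where O_TMPFILE or procfs is unavailable, and the scratch name is my invention:

```python
import os
import tempfile

def replace_via_tmpfile(path, data):
    """Write an invisible file, then link it in. Note the wrinkle: linkat()
    cannot replace an existing name, so the file must briefly become
    visible under a scratch name before the final rename."""
    dirname = os.path.dirname(os.path.abspath(path))
    try:
        fd = os.open(dirname, os.O_TMPFILE | os.O_WRONLY, 0o644)
    except (AttributeError, OSError):
        fd = None                     # no O_TMPFILE on this platform/fs
    if fd is not None:
        try:
            os.write(fd, data)
            os.fsync(fd)
            visible = os.path.join(dirname, ".incoming-link")  # scratch name
            # linkat(AT_SYMLINK_FOLLOW) through the magic procfs symlink:
            os.link("/proc/self/fd/%d" % fd, visible, follow_symlinks=True)
            os.replace(visible, path)
            return
        except OSError:
            pass                      # e.g. no /proc; fall through
        finally:
            os.close(fd)
    # Portable fallback: visible temp name + atomic rename.
    tfd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        os.write(tfd, data)
        os.fsync(tfd)
    finally:
        os.close(tfd)
    os.replace(tmp, path)

replace_via_tmpfile("unit.bin", b"old")
replace_via_tmpfile("unit.bin", b"new")
assert open("unit.bin", "rb").read() == b"new"
os.remove("unit.bin")
```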

> Log files. Append-only mode, enforced

Both chflags (BSD) and chattr (Linux) provide append-only modes attached to
the file/inode (instead of the file descriptor or open file table
entry)[1].[2] Neither command nor their options are defined by POSIX, but
_adding_ more requirements to POSIX filesystem conformance goes against the
grain of prevailing sentiments.

> Temp files. Not persistent over restarts. Not backed up. Can be
> memory-mapped as read/write. On a crash, disappears.

Again, the semantics of unlinking deliberately provide this, though it does
make it invisible in the normal namespace. But there's a tension between
leveraging the namespace for implicit semantics (Plan 9) vs adding a multitude
of options and modes attached to each particular file (Windows).
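
The unlink idiom in question, sketched in Python:

```python
import os
import tempfile

# The POSIX temp-file idiom: create, unlink, keep using the descriptor.
# The name vanishes from the namespace immediately, and the storage is
# reclaimed when the last descriptor closes -- including on a crash.
fd, name = tempfile.mkstemp()
os.unlink(name)                      # invisible to everyone else from here on
assert not os.path.exists(name)
os.write(fd, b"scratch data")
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 7)                # the descriptor still works fine
os.close(fd)
print(data)                          # b'scratch'
```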

It's also not uncommon for temporary filesystems like /tmp to be reformatted
on boot if the backing store is persistent. That's one reason to use multiple
filesystems for a Unix install.

[1] Note that O_APPEND mode is attached to the open file table entry, not the
file descriptor, so the mode is inherited across dup and interprocess
descriptor passing. Interestingly, on Linux /proc/self/fd (also /dev/fd, which
is a symlink) has the semantics of open, not dup, so file table entry state
like O_APPEND is lost; whereas on BSD /dev/fd has the semantics of dup so
O_APPEND is preserved/enforced.

[2] Solaris lacks chattr and chflags, but chmod supports an append-only
option.
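
Footnote [1] can be demonstrated in a few lines of Python; the /proc check is Linux-specific and skipped where procfs is absent:

```python
import fcntl
import os
import tempfile

tmp_fd, path = tempfile.mkstemp()
os.close(tmp_fd)

fd1 = os.open(path, os.O_WRONLY | os.O_APPEND)
fd2 = os.dup(fd1)
# O_APPEND lives in the open file table entry, so a dup'd descriptor
# shares it.
dup_append = bool(fcntl.fcntl(fd2, fcntl.F_GETFL) & os.O_APPEND)
print(dup_append)                    # True

# On Linux, "reopening" via /proc/self/fd has open() semantics, not
# dup() semantics, so the O_APPEND status flag is not carried over.
try:
    fd3 = os.open("/proc/self/fd/%d" % fd1, os.O_WRONLY)
    print(bool(fcntl.fcntl(fd3, fcntl.F_GETFL) & os.O_APPEND))  # False
    os.close(fd3)
except FileNotFoundError:
    pass                             # no procfs on this platform
os.close(fd1)
os.close(fd2)
os.unlink(path)
```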

~~~
skissane
> Both chflags (BSD) and chattr (Linux) provide append-only modes attached to
> the file/inode (instead of the file descriptor or open file table entry)

On Linux, by default only the superuser can set the append-only flag. That
severely limits its usefulness for many applications

> adding more requirements to POSIX filesystem conformance goes against the
> grain of prevailing sentiments

POSIX is of declining relevance. What really matters nowadays is whether you
can get Linux and *BSD (including macOS) to agree on implementing something new. Get
one to add it, and convince the others to copy it. If you can do that, then
getting it added to the POSIX standard is likely to be easy.

Vendors hardly care about POSIX certification anymore. The latest version of
POSIX/SUS, UNIX V7, currently has zero certified implementations. (Oracle
Solaris did achieve V7 certification, but has since lost it; I don't know
exactly what happened, but I suspect Oracle refused to pay the certification
renewal fees.)

~~~
wahern
> On Linux, by default only the superuser can set the append-only flag. That
> severely limits its usefulness for many applications

Ah, good point.

> Get one to add it, and convince the others to copy it. If you can do that,
> then getting it added to the POSIX standard is likely to be easy.

In actuality it usually works in reverse: Red Hat (now IBM), which has the
most dominant presence on the committee, convinces POSIX to add or modify
something, and then everybody else adopts it. Examples: asprintf, fmemopen,
stpcpy, vdprintf. I can't really think of an example that worked the other way
around, though there may be some in the next specification.

The fact that nobody is certified to V7 is beside the point. Nobody is
certified to the latest HTML5 standard, either, AFAIK. The point is to provide
a shared target. Few people want to copy Linux or glibc because the semantics
are invariably underspecified and in part accidental, and people would prefer
to avoid those aspects even if they have no other choice but to nominally
adopt the interface. Standardization provides a chance to clarify behavior and
to fix the bounds of what a portable program can expect long term. If I have
to support epoll + kqueue in perpetuity (and I assume I do), I'll structure my
program differently.[1] However, if I want to use pselect (or the forthcoming
ppoll), I'll target the POSIX-defined semantics, provide a best effort wrapper
(or none at all), and tell users on slower evolving platforms to complain to
their vendor.

The Linux kernel strives for strong ABI backward compatibility, but it hardly
has a perfect track record in that regard (e.g. sysctl(2)), and the future is
even more unclear given the various directions it's being pulled. Even POSIX
can change course, but it does so more methodically than with a Github code
search. And while Linux doesn't make promises regarding POSIX compliance, it's
less likely to break POSIX semantics than its own semantics, ceteris paribus.
Whether that's because kernel developers value POSIX compliance, or merely
because POSIX-vetted semantics are narrower and less accidental is irrelevant.

[1] Also, I'll take the bet that epoll is never standardized, precisely
because of its accidental and sometimes very undesirable semantics as compared
to kqueue. At best we'll get a greatest common denominator interface that can
be built around both--or possibly taking into consideration Solaris ports.

~~~
skissane
> In actuality it usually works in reverse: Red Hat (now IBM), which has the
> most dominant presence on the committee, convinces POSIX to add or modify
> something, and then everybody else adopts it. Examples: asprintf, fmemopen,
> stpcpy, vdprintf.

I'm not convinced your timeframe is right. stpcpy was added to FreeBSD in
2001, didn't officially become part of POSIX until 2008. Red Hat didn't invent
stpcpy either; it was invented on the Amiga in the 1980s, and the GNU project
adopted it from there and then Linux acquired it from the GNU project. I think
if we studied the history of the other functions you mention, we'd also find
that their formal addition to POSIX wasn't the primary cause of their spread,
just a formal recognition of an existing _de facto_ reality.

~~~
wahern
My Ubuntu LTS man page says:

> This function was added to POSIX.1-2008. Before that, it was not part of the
> C or POSIX.1 standards, nor customary on UNIX systems. It first appeared at
> least as early as 1986, in the Lattice C AmigaDOS compiler, then in the GNU
> fileutils and GNU textutils in 1989, and in the GNU C library by 1992. It is
> also present on the BSDs.

(See also [https://man7.org/linux/man-pages/man3/stpcpy.3.html](https://man7.org/linux/man-pages/man3/stpcpy.3.html))

OpenBSD didn't adopt stpcpy until 5.1, circa 2012
([https://man.openbsd.org/stpcpy](https://man.openbsd.org/stpcpy)) and NetBSD
until 6.0, also circa 2012
([https://man.netbsd.org/stpcpy.3](https://man.netbsd.org/stpcpy.3)). In that
time frame there was a flurry of activity in both projects regarding POSIX
compliance.

My experience with the others was similar, though not uniformly. Support on
OpenBSD, NetBSD, macOS, and AIX typically postdated their addition to POSIX.
Whether they would have been added independent of POSIX I can't say, but there
are plenty of GNU extensions that remain unsupported, and some, like
strerror_r (specifically the return type and behavior on error), that likely
will never be.

While not specifically on point, I think it's noteworthy that the proposal to
add stpcpy to C2X was made by Martin Sebor, a Red Hat employee:
[http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2352.htm](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2352.htm)
Red Hat doesn't share the seemingly
pervasive cynicism regarding standardization. Precisely what motivates that,
I'm hesitant to speculate. Like most things I'm sure there are mixed motives.

~~~
skissane
macOS's man page says "The stpcpy() function first appeared in FreeBSD 4.4". I
believe FreeBSD 4.4 was released in September 2001. So that's approximately 7
years before it was added to POSIX. Given that timeframe, it seems unlikely
that POSIX has much to do with its presence in FreeBSD. A much more likely
explanation is to ease porting of GNU projects (and software developed on
Linux) to FreeBSD.

From checking opensource.apple.com, I conclude that OS X added it in 10.3,
released in 2003, so that's around 5 years before POSIX 2008. Again, given the
timeframe, it seems hard to argue that POSIX triggered Apple's action here.

Given Linux, FreeBSD and Darwin all already supported it, I wonder to what
extent NetBSD/OpenBSD's decision was motivated by formal POSIX conformance
versus improving compatibility with Linux/FreeBSD/Apple. It is hard to say,
but I think the latter was likely just as important as the former, and given
the importance of the latter, they might still have added it even if it had
never formally made it into POSIX.

~~~
wahern
> Given Linux, FreeBSD and Darwin all already supported it, I wonder to what
> extent NetBSD/OpenBSD's decision was motivated by formal POSIX conformance
> versus improving compatibility with Linux/FreeBSD/Apple. It is hard to say,
> but I think the latter was likely just as important as the former, and given
> the importance of the latter, they might still have added it even if it had
> never formally made it into POSIX.

One of the benefits of standardization is, "here is a definitive list of
things that need to be added". With rare exception (e.g. async I/O), there's
no hand-wringing regarding whether a POSIX interface should be added. Search
for the keyword "POSIX" among the OpenBSD 5.1 (circa 2012) changes at
[https://www.openbsd.org/plus51.html](https://www.openbsd.org/plus51.html) Now
search for Linux. For whatever reason stpcpy isn't listed (a tweak is listed
in plus52.html), but you'll see where the emphasis lay.

Linux POSIX conformance is incredibly good, particularly at the libc level
notwithstanding lack of formal certification[1], so invariably it's tough to
say, absent direct confirmation, what was on people's minds. But at least on
the OpenBSD mailing-lists, more often than not the explicit reason is POSIX
compliance. And I get the same impression from reading NetBSD changelogs.
FreeBSD is unique because they've been far more proactive with not only adding
POSIX interfaces, but also adding their own extensions. And historically they
were a bigger player and in many ways the heir to the "BSD" mantle; certainly
more so than NetBSD. But pre-existing support doesn't negate my point (see
plus51.html for evidence concerning the immediate motive behind adding
stpcpy), nor does the fact that an interface supported by multiple vendors is
more likely to be adopted by POSIX. I could likewise dredge up some SysV and
other extensions that weren't supported in Linux or glibc until they were
adopted by POSIX.

Which isn't to say platforms don't adopt Linux interfaces (whether or not they
originated with Linux) with an eye toward Linux compatibility; of course they
do, and do so increasingly. For example, I know that OpenBSD refactored their
getpeerid interface as a wrapper around newly added getsockopt+SO_PEERCRED
support, which I've always assumed was a nod to Linux compatibility, or at
least an admission of the obscurity of getpeereid. OpenBSD also adopted
MSG_NOSIGNAL (POSIX, SysV'ish, Linux) instead of the SO_NOSIGPIPE that already
existed on FreeBSD, macOS, and NetBSD (likewise for F_SETNOSIGPIPE and
O_NOSIGPIPE supported by macOS and NetBSD), which goes to show that OpenBSD
doesn't merely add whatever interface might marginally aid portability.[2]
MSG_NOSIGNAL was a no-brainer as compared to the alternatives, and that would
have been true regardless of whether Linux supported MSG_NOSIGNAL. If you want
to add an interface, the default answer is "yes" if it's POSIX; if it's not
POSIX the default answer is "tell me more". POSIX is self-justifying. That
difference in friction might seem de minimis, but ask anyone who has
maintained or tried to contribute to a large open source project. The ability
to appeal to a standard model which already has buy-in by the project is a
major convenience in terms of everybody being on the same page.[3] In that way
it can drive behavior unintentionally, which isn't an accident as it pertains
to the salience of a formal standard. Of course, Linux is a de facto
target for most projects, and one of far more concern than nominal POSIX
conformance, but that doesn't diminish the value of POSIX targeting, and IME
neither has it diminished the interest in POSIX conformance among the people
for whom it actually matters--those interested in portability, which tend to
be disjoint from the set of people for whom POSIX is a dirty word.

[1] IME it's better than macOS even though macOS is UNIX03 certified. glibc
and musl take POSIX very seriously, both the letter and the spirit. Developers
from both projects actively file tickets on
[https://austingroupbugs.net](https://austingroupbugs.net) to remediate errors
and omissions in the text, and problems with actual semantics, explicit and
accidental. Various BSD developers are also active there, but I get the sense
of a pecking order, if only because of Red Hat's large presence (literally and
figuratively).

[2] Contrast that with kqueue. FreeBSD, macOS, NetBSD, and OpenBSD kernels
have diverged significantly since kqueue was adopted. You can't copy+paste
kqueue-related kernel code across them. But they still look to each other for
prior art when it comes to filling gaps or extending the behavior of kqueue,
and uniformity is given significant weight. (Especially across *BSD. macOS
extensions don't always make the most sense, like with macOS's poorly
considered EV_OOBAND. See [https://sandstorm.io/news/2015-04-08-osx-security-bug](https://sandstorm.io/news/2015-04-08-osx-security-bug)).

[3] "Running a successful open source project is just Good Will Hunting in
reverse, where you start out as a respected genius and end up being a janitor
who gets into fights." [https://diff.substack.com/p/working-in-public-and-the-econom...](https://diff.substack.com/p/working-in-public-and-the-economics)
Fights over whether or not to support POSIX are relatively rare. The ones that
happen are infamous because they're the exception.

------
skissane
Article describes Multics’ ability to have a file's directory entry on disk
but its contents on tape, so trying to access its contents will cause it to be
retrieved from tape, and then asks "Is something like this offered on any
POSIX-compatible file system?"

What the article is describing is basically just HSM (Hierarchical Storage
Management), which is a commercially available technology – e.g. Sun/Oracle
SAM-QFS on Solaris, IBM Tivoli Storage Manager on AIX, DFSMShsm on z/OS.

Windows NTFS also supports HSM, although the core NTFS itself only provides
features necessary to implement HSM (such as FILE_ATTRIBUTE_OFFLINE and
reparse points), and you need an add-on to Windows to actually use those
features to produce a full HSM solution. (Actually Windows itself used to
include such a solution, Remote Storage Service, but Microsoft removed it in
Windows Server 2008 onwards; but the underlying functionality is still there
in NTFS, and available for third party HSM implementations to exploit.)

~~~
a1369209993
You could also implement this pretty easily in FUSE: if the dirent is present
on the underlying (ext or whatever) filesystem, just forward the operations,
otherwise leave the syscall blocked and hunt down the relevant backup. I don't
know that anyone's actually written that, though.

~~~
acdha
Nothing about this is “easily” once you work through the edge cases for
performance and reliability. The only places which need HSM have enough data
volume and range of applications to stress any simple solution (e.g. with the
approach you outlined: what happens when someone runs find on that volume?).
One of the more interesting challenges is how to deal with not having quite
enough fast storage to stage the slow storage. Your system can appear to work
well with one test workload and then fail miserably when two people start
running different tasks at the same time.

After a couple decades of this, I generally think this class of software is a
mistake. Any time you misrepresent one class of storage as another it
inevitably leads to very complex software which is still pretty fragile and
confuses its users on a regular basis, and the cost savings never deliver to
the hoped-for degree.

~~~
a1369209993
Well obviously the performance is going to be terrible, but it's probably
better than the zero performance you get if you block everything while waiting
for the system to fully restore from backup.

> what happens when someone runs find on that volume?

It stalls until all the directories are restored? And hopefully pushes those
directories to the front of the to-be-restored-from-backup queue, but even
without that it's still better than not being able to run _any_ operations on
that volume.

~~~
acdha
It can be worse than zero: your tape drive gets hit with lots of small-file
requests, running much slower than it would when streaming a restore of a large
batch containing all of the files you need, and causing increased failure
rates on the hardware and media because tape drives are designed to stream,
not seek. I’ve had to explain this to multiple HSM admin teams who were trying
to save a few bucks on staging HDD capacity and surprised to see it taking
over a month to restore a terabyte of data (not joking - and that was with
multiple drives!) and hardware failing at like 5x the manufacturer’s
estimates.

What you’re trying to do is akin to saying you can write an interface layer to
make a railroad look like Uber: at some point the fundamental differences
between the architectures are too much to paper over. The situation has
improved now that the major operating systems have offline file support so you
can make it more obvious that some files are not instantly available but you
still need all of your client software to handle that gracefully.

------
sillysaurusx
Surprised no mention of plan9's FS. It always seemed like the core innovation
in plan9.

Basically, everything is a file. But to a ridiculous extent, far beyond what
you'd normally think of as files. It's been years since I looked into it
though, so maybe I'm misremembering.

Designing it that way has lots of advantages. For example, you can connect
computers together via networks using the equivalent of `cat`. (And yes, we
have netcat, but it's not quite the same thing as having the abstraction built
into the OS.)

~~~
ridiculous_fish
How did Plan 9 allow its APIs to evolve? For example, reading /dev/mouse
returns records that are exactly 49 bytes, in text format [1]; how did they
add new fields without breaking every app?

1: [http://man.cat-v.org/plan_9/3/mouse](http://man.cat-v.org/plan_9/3/mouse)

~~~
eadmund
As a sibling notes, they could have just added a new file while supporting the
old format, but they had some leeway in the format itself: the buttons value
only uses three bits.

Had they been a little more clever and used a delimited format like
S-expressions, they could have simply specified that each mouse event would
generate one S-expression, and clients could have read one expression at a
time.
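
For concreteness, here is a sketch of the fixed-size record format in Python, assuming the layout is a one-byte event tag plus four 12-character decimal fields (1 + 4×12 = 49 bytes); the field widths are my reading of the man page, not verified against Plan 9 source:

```python
def format_mouse_event(x, y, buttons, msec):
    """Pack one event in the assumed /dev/mouse layout: 'm' plus four
    right-justified 11-digit decimal fields, each followed by a space."""
    return b"m" + b"".join(b"%11d " % v for v in (x, y, buttons, msec))

def parse_mouse_event(record):
    """Unpack one fixed-size 49-byte record."""
    assert len(record) == 49 and record[:1] == b"m"
    x, y, buttons, msec = (
        int(record[1 + 12 * i : 13 + 12 * i]) for i in range(4)
    )
    return {"x": x, "y": y, "buttons": buttons, "msec": msec}

evt = format_mouse_event(100, 200, 0b100, 123456)
assert len(evt) == 49                # exactly the fixed size clients expect
assert parse_mouse_event(evt)["buttons"] == 4
```

The fixed size is precisely what makes the format hard to extend: every reader counts on those 49 bytes, so new fields either squeeze into spare bits (as with the three-bit buttons value) or go in a new file.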

------
_delirium
This is incidentally why the Common Lisp file I/O functions are a bit more
complicated than you might expect. You can ignore most of it if you assume
something vaguely POSIX-like, but if you want to be portable to e.g. both Unix
and VMS (as was once desirable), there are functions like _make-pathname_ that
build a representation of a file location from up to 6 parameters (host,
device, directory, name, type, and version). The 3 examples in the docs are
interesting:
[http://clhs.lisp.se/Body/f_mk_pn.htm](http://clhs.lisp.se/Body/f_mk_pn.htm)

~~~
wglb
But do they go as far as the features noted in the Hydra file system?

~~~
gumby
No, it was a tagged architecture but not a capability system. In fact
everything ran in a single address space.

------
ChuckMcM
This is a good read. I believe that Steve Bourne (of Bourne Shell fame)
implemented the file system in UNIX at Bell Labs but I may be mis-remembering
that. He was also a big Multics fan.

What is unsaid in this article is that nearly all file systems that are in
widespread use today, started when "disk space" was a constrained resource. It
is a very reasonable thing to ask, "now that stable storage space is much more
plentiful, how might we design systems that are better than the current ones?"

The ability to scavenge blocks to re-create state is a good example of that.

One of the cool things about WAFL (the Write Anywhere File Layout) system that
NetApp used (uses?) was that its very design made snapshots 'trivial' since
every write to disk was to un-allocated blocks. What that meant in practice
was that the file system on disk was _always_ sane. This was what let you pull
the power from the box at any time, and assuming its non-volatile RAM was
still available, it could always recover. Something that you could do with
Intel Cross Point memory and a bunch of disks.
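
The write-anywhere property can be caricatured in a toy Python model: writes only ever fill fresh blocks, so any saved root pointer is automatically a consistent snapshot. Names here are illustrative, nothing like NetApp's actual on-disk layout:

```python
class CowStore:
    """Toy copy-on-write store: blocks are written once, never mutated."""

    def __init__(self):
        self.blocks = {}       # block id -> bytes (write-once)
        self.next_id = 0
        self.root = {}         # live "file table": name -> block id

    def _alloc(self, data):
        self.blocks[self.next_id] = data   # always an unallocated block
        self.next_id += 1
        return self.next_id - 1

    def write(self, name, data):
        # Build a new file table pointing at the new block, then switch
        # roots in one step. The old root remains valid and consistent.
        new_root = dict(self.root)
        new_root[name] = self._alloc(data)
        self.root = new_root               # the single "atomic" update

    def snapshot(self):
        return dict(self.root)             # trivial: just keep the old root

    def read(self, name, root=None):
        table = self.root if root is None else root
        return self.blocks[table[name]]

store = CowStore()
store.write("a", b"v1")
snap = store.snapshot()
store.write("a", b"v2")
assert store.read("a") == b"v2"
assert store.read("a", snap) == b"v1"   # the snapshot still sees old data
```

Pulling the power at any point leaves some root (old or new) intact, which is the "always sane on disk" property described above.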

Microsoft research built a number of interesting file systems, some more
successful than others, which incorporated many of the ideas from Multics and
other OSes.

All in all, it's a fun place to experiment.

~~~
diegocg
The WAFL trick is COW, which was later popularised massively by ZFS

~~~
ChuckMcM
As you have observed they have an identical design philosophy.

There is a not so funny story there where a person interned at NetApp and then
went to work at Sun and re-implemented the patented technology as ZFS.

------
notacoward
I was expecting another article about the current crop of _less than_ POSIX
filesystems, but I was pleasantly surprised to find it was about older systems
with features _absent_ in POSIX. Very interesting stuff. Kudos. I wonder if I
can find some information about MTS's filesystem, which also had some neat
features especially with respect to access control (PKEYs). Might be a
worthwhile addition.

~~~
formerly_proven
The whole point of the Unix FS was that the developers felt the mainframe
approach (~"files are actually sorta like databases") was too complex and
heavyweight.

~~~
notacoward
What the OP should teach us is that there wasn't any one mainframe approach.
There were many approaches, each involving many components. The process of
standardizing what we now know as POSIX involved pruning a lot of unnecessary
pieces, but it also inevitably involved leaving out some features that might
actually have been useful. Some of them ( _ahem_ ACLs) even had to be added
back in later versions of POSIX. Just as we should never stop looking for new
ideas that can make our lives as programmers and users easier, we should also
never forget old ideas whose time might well have come back around. It has
happened too many times for the possibility to be ignored except by fools.

~~~
bregma
Engineering has always been a tug-of-war between parsimony and completeness.
The pendulum is constantly swinging.

------
macintux
I was fortunate enough to be working for Basho and attended the RICON where
mrb gave his distributed systems archaeology talk. It really highlighted for
me how so many important ideas are captured in historical whitepapers and
often forgotten today.

[https://speakerdeck.com/mrb/distributed-systems-archaeology](https://speakerdeck.com/mrb/distributed-systems-archaeology)

Regrettably many RICON talks were lost when Basho went out of business, but
someone rescued this one. Thanks Alex Ott!

[https://youtu.be/om_mAaM5sL8](https://youtu.be/om_mAaM5sL8)

------
LeoPanthera
Acorn's "ADFS", as used in RISC OS, uses "." as the directory separator. Fully
qualified paths look like this:

fs::drive_id.$.directory.directory.filename

Where "$" means "root directory". (On network filesystems, you can also use
"&" which means "home directory".)

The top level identifier is the filesystem type, usually "adfs", which is a
slightly unusual way of doing it.

Bringing in files from other systems, which invariably have filename
extensions, involves converting the . to a /, so you end up with filenames
like "readme/txt". ADFS stores the file type not as a filename extension, but
as a three-character hex ID in the filesystem instead. (Text, for example, is
FFF.)
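
The naming convention above reduces to a small sketch in Python; the path grammar here is simplified from this description, not taken from Acorn documentation:

```python
def to_riscos_name(name):
    """Foreign "readme.txt" becomes "readme/txt" on import."""
    return name.replace(".", "/")

def from_riscos_name(name):
    return name.replace("/", ".")

def split_riscos_path(path):
    """Split "fs::drive.$.dir.file" into parts; '$' is the root directory."""
    fs, rest = path.split("::", 1)
    drive, *components = rest.split(".")
    return fs, drive, components

assert to_riscos_name("readme.txt") == "readme/txt"
assert from_riscos_name("readme/txt") == "readme.txt"
assert split_riscos_path("adfs::HardDisc4.$.Docs.readme/txt") == (
    "adfs", "HardDisc4", ["$", "Docs", "readme/txt"]
)
```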

~~~
fanf2
On the BBC Micro-era pre-hierarchical DFS, there was a single level of single
character directories, so files were called things like “d.file”. $ was the
default directory rather than the root.

The Acorn/Norcroft ARM C compiler used c and h directories which effectively
swapped the file extension around, so sources were named like c.main, and
#include would try swapping name and extension for compatibility. It was about
as Unix-flavoured as was possible on the Archimedes...

------
gumby
Having used both Multics and the Alto, I can add a few points:

Multics as implemented used a slightly different syntax, such as > for path
separation.

Segments were not file descriptors but were intended to be what might today be
called blocks. The original idea was that the entire memory would essentially
be a single address space, with segments being blocks of memory that might be
in core, on disk, or on tape (called very slow storage if I remember correctly
but it’s been decades). Security was at the segment level, so you could for
example have (in posix parlance) an suid shared library that an ordinary
program could, with appropriate authentication and permissions, call into in a
controlled manner. Multics’ permission structure was more fine grained than
the binary of suid. You can see how the backup system kind of falls out of
this automatically. I trust some multician will step up and correct any
memory-corruption-based thinkos in the above.

Of course reality didn’t quite match the dream and Multics was cancelled
before some of that research could be completed. But if you look at the x86
segment registers they could implement something like that. I think this was
also in the Intel iAPX 432 but fortunately even the name of that ancient dead
processor is hazy in my mind.

As for the alto filesystem and its descendants: the labels don’t have to be
next to the blocks they describe. After all Unix filesystems have multiple
copies of their basic breadcrumbs at least. There’s certainly plenty of room
for these in modern storage systems; every modern drive, whether spinning or
ssd, does a small amount of this in block remapping and wear leveling.

------
tyingq
Was surprised not to see the VAX/VMS filesystem on the list. It had both
stream and record based files, versioning, and a fairly rich ACL setup.

~~~
protomyth
It also allowed for amazing clustering.

I would guess there are quite a few filesystems lost in history. My personal
favorite will be the Newton's Soups. A modern version with replication would
be amazing.

------
guerby
FUSE allows you to experiment with new (or old) features easily nowadays; too
bad it isn't mentioned in the article:

[https://en.wikipedia.org/wiki/Filesystem_in_Userspace](https://en.wikipedia.org/wiki/Filesystem_in_Userspace)

I use borg mount to access my backups for example.

~~~
bityard
FUSE effectively provides a POSIX-like interface to arbitrary code, whereas
the author is lamenting that these features weren't built into Unix/POSIX-like
systems to make them widely available in the first place.

The "problem" with modern POSIXish systems is that the definition of what is
"POSIX" seems to be set in stone. All the various Unix-likes offer POSIX
compatibility (or aim toward it) while new features that extend capability end
up being implemented in completely different ways across different systems.

So for example while Linux has inotify (arguably an implementation of the
filesystem traps that Multics had), SGI had FAM, FreeBSD has kqueue, MacOS has
FSEvents, and they're all incompatible with each other... a nightmare for
developers of portable applications.
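
Absent a common notification API, the lowest common denominator that portable software actually falls back to is polling stat() — a costly stand-in for inotify/kqueue/FSEvents/FAM, sketched here:

```python
import os

def poll_change(path, last):
    """One polling step: compare (mtime, size) against a previous state.
    Returns (changed, new_state)."""
    st = os.stat(path)
    state = (st.st_mtime_ns, st.st_size)
    return (last is not None and state != last), state

with open("watched.txt", "w") as f:
    f.write("a")
_, state = poll_change("watched.txt", None)
with open("watched.txt", "w") as f:
    f.write("bb")                       # size changes 1 -> 2
changed, state = poll_change("watched.txt", state)
print(changed)                          # True: size differs even if mtime is coarse
os.remove("watched.txt")
```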

~~~
anthk
>a nightmare for developers of portable applications.

man 2 fstat.

~~~
comex
fstat is not a mechanism for being notified when a file changes.

~~~
anthk
I've just shown you another example of incompatibility.

------
peter_d_sherman
>"The very first hierarchical file system was developed for Multics. It is
described in the paper A General-Purpose File System For Secondary Storage
(1965) by Robert C. Daley and Peter G. Neumann. There are several things that
I find astounding about this paper:

There were apparently no hierarchical file systems before Multics. The
references do no cite any previous work on this and I haven’t found any."

(PDS: As an amateur computer historian, I'd be interested in this myself, if
there existed any hierarchical file system before Multics, or if one was
conceived in any academic paper before the one cited...)

[...]

>"Directory entries that point to secondary storage. _This is a game changer
for file system management._ More on this below."

(PDS: Which, to this day, is one of the great abstractions, one of the _great
ideas_ , in computing!)

------
qubex
For anybody who has the desire and opportunity to interact with something
currently in production and truly alien-feeling, I warmly suggest IBM’s AS/400
(now iSeries) OS/400’s filesystem. Libraries versus folders, logical versus
physical files, and the integrated DB2 database are just a few of the many
head-scratchers you’ll encounter until you suddenly “_get it_”.

------
heisenbit
Portability was important and so more complex capabilities were not
sufficiently used to survive and spread. I wonder how the similar struggle
ends for advanced cloud features.

------
ruslan
And this guy has not heard about IBM OS/400 and VMS yet :)

------
garmaine
Strange that BeFS/BeOS is not on this list.

~~~
nayuki
A good read: [https://arstechnica.com/information-technology/2018/07/the-beos-filesystem/](https://arstechnica.com/information-technology/2018/07/the-beos-filesystem/)

------
ulzeraj
Anyone remember Novell NSS? Even when it got ported to Linux as part of Open
Enterprise Server it lacked a lot of POSIX features. The ACLs were the
weirdest part since they had to be configured in eDirectory’s LDAP console.

------
t0astbread
What I really want that isn't there in POSIX is not directly a file system
feature but something more general: I really want some sort of "middleware-
system" to intercept all sorts of events (like file system access, binary
execution, network or device access etc.). There should be multiple
intercepting programs that handle one or more events each and can decide to
block or pass the event to the next interceptor. They could also log or modify
parameters of the event (like redirect a file read or wrap a binary that's
about to be executed in another program like script or torsocks).

You could even use this system to implement some Unix features as
interceptors: Shebangs and even file system permissions could be handled this
way. You could also implement containers with this or provide some kind of
"switchboard" UI akin to uMatrix for letting the user decide on permissions.
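The interceptor chain described above can be sketched in a few lines of Python. This is purely illustrative: `Event`, `dispatch`, and the sample interceptors are hypothetical names I've chosen, not any real API. Each interceptor receives the event plus a `next_handler` and may log it, rewrite its parameters, or block it outright.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    kind: str                       # e.g. "open", "exec", "connect"
    params: dict = field(default_factory=dict)

def logger(event, next_handler):
    # Observe the event, then pass it on unchanged.
    print(f"[log] {event.kind}: {event.params}")
    return next_handler(event)

def redirect_passwd(event, next_handler):
    # Modify an event: redirect reads of one path to another.
    if event.kind == "open" and event.params.get("path") == "/etc/passwd":
        event.params["path"] = "/tmp/fake_passwd"
    return next_handler(event)

def deny_untrusted_exec(event, next_handler):
    # Block an event: refuse to execute anything under /untrusted.
    if event.kind == "exec" and event.params.get("path", "").startswith("/untrusted/"):
        return ("denied", event)
    return next_handler(event)

def dispatch(event, interceptors):
    """Fold the interceptor list into a single chained handler."""
    handler = lambda ev: ("allowed", ev)          # default: let it through
    for icept in reversed(interceptors):
        handler = (lambda i, nxt: lambda ev: i(ev, nxt))(icept, handler)
    return handler(event)

chain = [logger, deny_untrusted_exec, redirect_passwd]
print(dispatch(Event("open", {"path": "/etc/passwd"}), chain))
print(dispatch(Event("exec", {"path": "/untrusted/evil"}), chain))
```

A real system would of course run the interceptors out-of-process with kernel support (in the spirit of FUSE or seccomp filters), but the chaining and pass/modify/block decisions would look much like this.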

~~~
lxpz
Have a look at the Genode operating system (genode.org), I think you'll like
it!

------
nanomonkey
No mention of the immutable content-addressable hash file system presented in
the Artifacts System[1]. One of the benefits of a system like this is that you
can have multiple versions of a library coexisting without conflict with your
requirements system. Each program can reference a different hash, duplicates
are easily made unnecessary because they resolve to the same hash, etc.

[1] [https://apps.dtic.mil/dtic/tr/fulltext/u2/a276589.pdf](https://apps.dtic.mil/dtic/tr/fulltext/u2/a276589.pdf)

~~~
narwally
Sounds like having your libraries stored in git repos and then having your
programs point to the commit they want to use. Something like this would solve
headaches for which there are innumerable tools; a much more elegant solution
to the problems that things like pyenv or nix are attempting to solve.

~~~
nanomonkey
Yes, exactly, except local-first, applied to all the files in your filesystem
(documents, libraries, program source code, etc.), and collaborative.

------
mjevans
It would be interesting if a "file type" existed to provide these more complex
operations, and if those could also be bound to a daemon running with provided
parameters.

This would allow for a filesystem name linked to a database API / any other
daemon or program.

I also love the TRAP idea (related to the above) and both directory
operations. Quotas can sort of fulfill the space-limits idea (though I've
never bothered using such a system and haven't been on a system that did).
Append-only directories are a far harder beast to slay, but would very nicely
serve many queue-submission systems.

------
vermilingua
_> Why can your browser run sudo? ...Suppose only your SSH client had the
operation required to use your SSH keys._

Perhaps I’m missing something, but why could a malicious application (say,
Chrome in the context of a browser running sudo) not also include the
capability to access the file system with impunity?

It seems that Hydra simply turns the technical problem of managing filesystem
protections into a political one, of auditing the applications we use; which
may have been practical in the 80’s, but is now demonstrably a failed method
of protecting user data.

------
phendrenad2
Interesting POSIX history. It used to come down to Windows/Mac and POSIX, but
for the past 20 or so years we've been seeing new filesystem styles emerge,
which blend cloud and on-device storage. Navigating the iOS file save menu is
a good example. 3rd party apps like Google Drive and DropBox hook into the OS
and show up in the list. I think this is probably the way things are headed,
and POSIX and the "Windows/DOS/Classic Mac" style will both fade away.

~~~
boogies
Menus ≠ filesystems.

POSIX will last forever because it’s simple, standard, solid, portable and
fairly flexible (see FUSE). DOS will last forever because inertia is the only
reason Windows still exists, and MS knows this and prioritizes backwards
compatibility accordingly: e.g., it’s still basically impossible to name a
file CON on Windows 10.

~~~
kccqzy
That's a shortsighted and very programmer-centric view of what a filesystem
is.

Increasingly we are seeing a bifurcation between the kind of filesystems
developers use for programming (POSIX style) and the kind of "filesystems"
being exposed to end users. I put the word in quotes because they are not
thought of as filesystems by the traditional definition. But it's a reality
that many users now think of these siloed-by-app systems as the real
filesystem.

The iOS menu mentioned by GP is a great example because that's what's user
facing, not the developer-facing one like /private/var/mobile/Containers/blah-
blah-blah. Today even many developers won't have to deal with POSIX-style
filesystems that often, since they generally write code to store structured
data in a database (sqlite on mobile, more sophisticated ones on the server
side) while large blobs are stored in services like S3, making the idea of a
POSIX filesystem quaint.

My own prediction is that POSIX filesystems in another decade or so will be
like assembly language today: it's still there, still being taught and
learned, but users and a majority of developers won't need to know about it.

~~~
_jal
Casual users' confusion about the presentation of technical details is a
different conversation. This one is about actual filesystems.

It may well be that end users are abstracted away from their filesystems, but
you seem to assume both that 'The Cloud' is the natural end-state of systems
evolution and that everyone comes along for the ride.

I don't think either of those are true.

------
deepstack
_> When you use a file system through a library instead of going through the
operating system there are some extra possibilities. You are no longer
required to obey the host operating system’s semantics for filenames. You get
to decide if you use / or \ to separate directory components (or something
else altogether)_

It's really about time in 2020 that someone (Apple, MS, Linux, FreeBSD,
Plan 9, I'm looking at you) implemented this!

------
mnem
Fascinating to see other possibilities that were out there. Were the
capabilities in Hydra the inspiration for what Fuchsia OS has, or are they a
long-standing concept?

~~~
kentonv
Hydra and Fuchsia are both operating systems that use capability-based
security. Hydra would have been one of the very earliest ones, while Fuchsia
is a very new one. There have been many others in between, like KeyKOS or
EROS, as well as use of capabilities in "normal" operating systems, like
FreeBSD's Capsicum, or arguably even just passing file descriptors over Unix
sockets with SCM_RIGHTS on any POSIX system. There have also been capability-
based programming languages like E and Pony, and capability-based network
protocols like CapTP or (my own) Cap'n Proto. So it's unlikely that the
designers of Fuchsia directly based it on Hydra, but there was probably
indirect inspiration, yes.
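The SCM_RIGHTS mechanism mentioned above is easy to demonstrate. The sketch below uses Python's `socket.send_fds`/`socket.recv_fds` (real stdlib wrappers over SCM_RIGHTS, available on Unix since Python 3.9); the two ends of a socketpair stand in for two processes. The point is the capability flavor: the receiver can read the file only because it was explicitly handed the descriptor, not because of its own ambient permissions.

```python
import os
import socket

# A connected pair of Unix sockets standing in for two processes.
parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# An open descriptor the "parent" holds: a pipe with some data in it.
r, w = os.pipe()
os.write(w, b"secret")
os.close(w)

# Hand the descriptor itself across the socket (SCM_RIGHTS under the hood).
socket.send_fds(parent, [b"here is a capability"], [r])
msg, fds, _flags, _addr = socket.recv_fds(child, 1024, 1)

# The receiver got its own fd referring to the same open pipe.
data = os.read(fds[0], 1024)
print(msg, data)                    # b'here is a capability' b'secret'

for fd in (r, fds[0]):
    os.close(fd)
parent.close()
child.close()
```

Between real processes you would create the socketpair before forking, or use a named Unix socket; the transfer itself works the same way.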

~~~
gumby
Capability systems have been an area of research since the 60s

------
venamresm__
This is a great article. For anyone wondering about the current data storage
stack on Unix-like systems today, I wrote a simple article that goes over it
[1].

[1] [https://venam.nixers.net/blog/unix/2017/11/05/unix-
filesyste...](https://venam.nixers.net/blog/unix/2017/11/05/unix-
filesystem.html)

------
GuB-42
Maybe another filesystem worth mentioning would be WinFS.

It should have been one of the major features of Windows Longhorn, which later
became Vista.

Simply put, it was a relational database used as a filesystem, with features
you typically find in an RDBMS like Postgres or Microsoft's own SQL Server
(which itself later took advantage of some of the work done on WinFS).

------
amelius
All this innovation and meanwhile, when I pull out a USB drive that was not
properly unmounted, I risk corrupting it to the point it becomes unusable.

------
beervirus
How did this get to be the top link on HN in under 20 minutes with 0 comments?

~~~
solarkraft
Comments are not a ranking factor. Per the FAQ:

> The basic algorithm divides points by a power of the time since a story was
> submitted.

... So what matters is that this story was upvoted _quickly_.

~~~
saagarjha
Comments are actually usually a downranking factor, to try to prevent flamewar
posts from staying on the front page.

