
Filesystem devs should aim to make “badly written” app code “just work” (2009) - pcr910303
https://lwn.net/Articles/326505/
======
Animats
I'd argue that UNIX-type file systems should offer several types of files:

* Unit files. When you create a file and write it, it's not visible for other opens until you close it. If you open a file with O_CREAT|O_WRONLY|O_TRUNC, you create a new file, which replaces the old one on close. In the event of a program or system crash, or exiting via "abort" without closing first, the old file remains. So there's always one completely written file. Creating a unit file is an atomic operation. Most files are unit files. (This isn't original with me; it comes from a distributed UNIX variant developed at UCLA in the 1980s.)

Replacing an existing file currently requires elaborate renaming gyrations
which vary from OS to OS and file system to file system (a sketch of that
dance follows below). At least for Linux, this should Just Work in the
normal case.

* Log files. If you open a file with O_APPEND, you can only write at the end. The file system should guarantee that, even after a crash, you get the file as written out to some previous write. If you call "fsync", the recovery guarantee should include everything written up to the "fsync" point. No seeking backwards and overwriting on a log file.

* Temporary files. You can do all the file operations, and the file disappears on a reboot.

* Managed files. These are for databases and such. They have some extra API functions, for async reads and writes and commits. Async I/O should have two callbacks - "the data has been taken and you can now reuse the buffer", and "this write is definitely committed and will survive a system crash". That's what databases really need, and try to fake with "fsync". Only a few programs will use this, but those are important programs.
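
For contrast, here is a minimal sketch of the renaming dance mentioned
above, as it has to be written on Linux today (write_file_atomic is a
made-up helper name; short writes and EINTR handling are elided):

    /* Write a temp file, fsync it, rename it over the target, then
       fsync the directory so the rename itself is durable. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int write_file_atomic(const char *dir, const char *name,
                          const void *buf, size_t len) {
        char tmp[4096], dst[4096];
        snprintf(tmp, sizeof tmp, "%s/.%s.tmp", dir, name);
        snprintf(dst, sizeof dst, "%s/%s", dir, name);

        int fd = open(tmp, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, buf, len) != (ssize_t)len || fsync(fd) < 0) {
            close(fd);                  /* data may not be durable yet */
            unlink(tmp);
            return -1;
        }
        if (close(fd) < 0 || rename(tmp, dst) < 0) {
            unlink(tmp);                /* the old file stays intact   */
            return -1;
        }
        int dfd = open(dir, O_RDONLY | O_DIRECTORY);
        if (dfd >= 0) {                 /* persist the rename itself   */
            fsync(dfd);
            close(dfd);
        }
        return 0;
    }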

~~~
skissane
> * Temporary files. You can do all the file operations, and the file
> disappears on a reboot.

It would be nice to be able to have a process tree own a temporary file, such
that when the last process in the tree exits (not necessarily the process
which created the file), the file is automatically deleted, rather than having
to wait for the next reboot.

~~~
TeMPOraL
I would prefer it to not be the default; it's sometimes useful to keep
temporary files in the event of a process crash, especially on a server that
would then immediately restart said process. I had such a case very recently,
and was thankful for the existing behavior of tmp files.

~~~
skissane
Rather than delete the file straight away, put it in a “trash can” or “recycle
bin”. A background process deletes files from “trash can” at a later date. It
could normally give them a grace period (e.g. 7 days, configurable) but the
files could be deleted early if storage space is running low. The same feature
could be used to provide undelete for non-temporary files too.

~~~
JetSpiegel
mktemp is enough for this.
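
(A minimal sketch of that approach: mkstemp plus an immediate unlink. Once
the name is gone, the file lives exactly as long as some process holds a
descriptor -- children inherit it across fork -- which gets close to the
process-tree-owned semantics asked for above.)

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void) {
        char path[] = "/tmp/scratch-XXXXXX";
        int fd = mkstemp(path);         /* create with a unique name  */
        if (fd < 0) {
            perror("mkstemp");
            return 1;
        }
        unlink(path);                   /* name gone; data still here */

        /* fd (and any copy a forked child inherits) keeps working:   */
        if (write(fd, "hello\n", 6) != 6)
            perror("write");
        close(fd);                      /* last close frees the space */
        return 0;
    }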

------
zer00eyz
"Anybody who wants more complex and subtle filesystem interfaces is just
crazy. Not only will they never get used, they'll definitely not be stable."

I think there is a more universal truism here - that "complex and subtle" are
sources of pain, problems and headaches.

I want to write "cool" and "magical" code as much as the next person, but
that's the stuff I look at later and think WTF, because I am no longer in
the same state of mind. Clear, simple, straightforward, plain-as-day code is
better than anything else.

And if you do have to do something "magical", something "odd" or hard to
understand, then for the love of god please leave notes explaining what, why
and how you did what you did. And if you are replacing a "clean" and
"readable" version, leave it there commented out. Sure, it is "in the repo",
but almost no one ever looks at the history, and you are just making it
harder for me to figure out the original intent of whatever was there.

~~~
Gene_Parmesan
The more time I spend as a dev, the more I realize that writing clear,
simple, straightforward code is actually the greater challenge. When I
started progressing beyond learning the basics, the sort of projects I was
building were quite simple, so writing the simple code for them came to feel
boring. I would read complex codebases, see all the fascinating tricks they
employed, and wish I was writing code like that. It felt like, "Those are
the smart programmers. I should emulate them."

As the complexity of the projects I work on has increased (especially in the
time since I became part of a professional dev team), I've come to realize
that most of that tricky, complicated-looking code I had read years earlier
was actually kind of the easy way out. When you are just trying to get stuff
done, the complexity of your code mirrors the complexity of the project. No
effort is put in (and often there is no time for it) to create a smoother
interface that shelters the code from the complexity of the task at hand.

It's much more interesting to me now to be faced with a complex task and to
figure out how to make the code simple and clear. Elegant, well-designed
libraries/APIs belie the challenge of writing them. The code looks so simple
that it feels immediately obvious as you're reading it. I've come to realize
that how simple a code base seems -- so simple anyone could instantly
understand it -- is often inversely related to how easy it was to create.

Yes, necessary complexity does exist, and sometimes there's no way around
doing something fairly nasty in your code. But as an ideal to strive for, I
find 'simple' to be fascinating.

~~~
sojmq
The problem is when you have a shitty language like Python that does almost
no optimisation, so you're forced to write "clever" code if you want it to
run reasonably fast.

~~~
TeMPOraL
If you _must_ use Python, then pawn off the important work to C. Half the
reason this language exists is easy FFI, and that's how all the popular
libraries get enough performance to be usable for non-toy applications.
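
(The usual escape hatch is a few lines of C behind ctypes -- a minimal
sketch; fastmath.so and dot_product are made-up names:)

    /* fastmath.c -- pawning a hot loop off to C.
       Build:  cc -O2 -shared -fPIC fastmath.c -o fastmath.so
       Python: lib = ctypes.CDLL("./fastmath.so")
               lib.dot_product.restype = ctypes.c_double            */
    double dot_product(const double *a, const double *b, long n) {
        double sum = 0.0;
        for (long i = 0; i < n; i++)   /* the loop Python makes slow */
            sum += a[i] * b[i];
        return sum;
    }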

------
RcouF1uZ4gsC
> The undeniable FACT that people don't tend to check errors from close()
> should, for example, mean that delayed allocation must still track disk full
> conditions, for example. If your filesystem returns ENOSPC at close() rather
> than at write(), you just lost error coverage for disk full cases from 90%
> of all apps. It's that simple.

Most programmers' abstraction of a computer system is synchronous and
consistent. If you make the simple synchronous cases do the wrong thing, like
write returning success on disk full when the write did not happen, you are
going to break people's code.
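
(To make the lost coverage concrete: with delayed allocation, ENOSPC may
only surface at fsync() or close(), so a program that checks write() alone
can "succeed" without having stored anything. A minimal sketch of the checks
Linus says ~90% of apps never make:)

    #include <unistd.h>

    int save(int fd, const void *buf, size_t len) {
        if (write(fd, buf, len) != (ssize_t)len)
            return -1;              /* the only check most apps make   */
        if (fsync(fd) < 0)
            return -1;              /* delayed ENOSPC can land here... */
        if (close(fd) < 0)
            return -1;              /* ...or even here                 */
        return 0;
    }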

This also explains why, when you choose a database, a relational DB with
strong consistency and linearizability should be your default. Going for
eventual consistency in the database layer and expecting the application
logic to deal with it will lead to grief in many, many cases.

~~~
TazeTSchnitzel
This could be summarised as: don't take on the responsibility for doing
something correctly if you don't have to and the party who would otherwise
have to do it is more competent than you.

------
bvinc
I think he's got a point.

As a developer I find myself in a different scenario. I'm usually trying to
find out what exactly the 100% guaranteed way to do something is. Instead, I
find incomplete documentation and different people with different opinions on
what the guarantees are, and most people writing bad code that they assume
will usually work.

Just modifying a file in an atomic way requires a complicated dance of
multiple files and multiple syncs and a rarely tested cleanup routine the next
time the file is opened. No one does this.

I don't know what the solution is.

~~~
jiggawatts
I do, but it's not a popular opinion.

POSIX, and by extension the classic 1960s-1980s era UNIX way of doing
things, just needs to die a long overdue death.

This stuff was designed at a time when every CPU instruction mattered,
everything was optimised to death for frugality, and commands were
abbreviated from "copy" to "cp" because ermahgerd, two bytes is a huge
saving! That mentality got us Y2K. That was an era where latencies were not
the bottleneck; CPU cycles and memory bytes were.

A lot of stuff in filesystems is just plain stupid. For example, why do
applications install their files. one. at. a. time? Like... what the fuck? How
does it make any sense for an application to be _partially_ installed? Who
actually codes their application with 500 modules and dynamic libraries to be
able to handle the scenario where one of them is inaccessible due to an ACL or
a mismatched version because of an overwrite by something or someone else?
NOBODY, that's who. Meanwhile, I can make a cup of tea while Adobe Lightroom
launches on an SSD drive because it is 99% OS API overhead and 1% usermode
action.

This is why Docker is popular. Not because Docker is good, but because OS APIs
are retarded.

Every application install should be a union fs. This union fs should be
entirely user-mode, so that if an application has 10,000 files, it doesn't
take 10,000 round-trips to the OS kernel, with the Intel mitigations,
context switches, and cache flushes that each one brings with it.

Copying a file shouldn't require a user-mode buffer to feed the data through,
forcing it to come down WAN links just to go back up the same WAN link again
on the way out.
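
(Linux has since grown copy_file_range() -- 4.5 and later -- which lets the
kernel, or on NFSv4.2 even the server itself, move the bytes, so they never
pass through a user-mode buffer or cross the WAN link at all. A minimal
sketch:)

    #define _GNU_SOURCE
    #include <unistd.h>

    int copy_file(int in_fd, int out_fd, size_t len) {
        while (len > 0) {
            ssize_t n = copy_file_range(in_fd, NULL, out_fd, NULL,
                                        len, 0);   /* in-kernel copy */
            if (n <= 0)
                return -1;          /* error, or source exhausted    */
            len -= (size_t)n;
        }
        return 0;
    }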

Overwriting a file shouldn't require more than a single API call, because it's
nearly 2020, and we should have long since realised that kernel transitions
are _expensive_ , so we should optimise to minimise the number of round-trips.
open(), write(), write(), write(), flush(), sync(), close(), poke(), prod(),
jesusfuckingchrist(). Just take a buffer bigger than 4KB, or better yet,
standardise an API to take a stream from user mode.

Just take a buffer and a filename, and atomically replace. Done. Bang. No
lost data, no torn writes, just DO IT. How hard can this be? Is it
impossible to do this? Are we forever stuck with POSIX, which was created in
1988, before most of its modern users were born?
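
(Something like this entirely hypothetical one-call interface -- no such
syscall exists today:)

    #include <stddef.h>

    /* The kernel either replaces the file's contents atomically and
       durably, or fails leaving the old file untouched. Hypothetical. */
    int replace_file(const char *path, const void *buf, size_t len);

    /* Usage:
     *     if (replace_file("app.conf", data, data_len) < 0)
     *         handle_error();     // old app.conf still intact
     */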

~~~
blt
Your comment reminds me of the situation of game engines and 3D graphics APIs
a few years ago before Vulkan and Metal were released. Too high-level for
developers who want control (and understand how the hardware actually works),
but too low-level for developers who want to minimize complexity.

Now, Vulkan and Metal offer the detailed control for library developers, and
everyone else uses some higher-level wrapper.

Does it make sense to split the file system in a similar way? I guess the main
challenge is avoiding too many competing wrappers.

~~~
TazeTSchnitzel
The problem with Vulkan is even graphics driver developers struggle to use it
correctly.

------
pdonis
The thread title omits a crucial word: "Filesystem" (as in "filesystem
people", not just "people"). The point he is making is that filesystems are
supposed to be utterly reliable; applications should not _have_ to take
extreme precautions to avoid having the filesystem lose their data. And the
fact that practically nobody actually takes any precautions, let alone extreme
ones, is strong evidence that programmers do in fact expect filesystems to be
that reliable.

~~~
blazespin
“File system people should make badly written application code just work.”

~~~
sn41
That makes much more sense. And it turns out to be the opposite of what I
thought when I read the title. I had initially read it as: just write bad
code whose only merit is that it works.

------
iradik
Kyle wants to add a new API called barrier() which will improve consistency
and remove the need for fsync.

Linus's point here is that the file system API is already complicated to the
point that few use or implement it correctly. Further complicating the API
will likely create more problems than it fixes.
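
(If I understand the proposal, the idea was to decouple ordering from
durability. A hypothetical sketch of the semantics -- barrier() was never
merged:)

    /* Where fsync() both orders writes and forces them to stable
       storage before returning, barrier() would promise ordering only:
       everything written before it reaches disk before anything
       written after it, without stalling the caller on a flush. */
    int barrier(int fd);            /* hypothetical, not a real API */

    /* Journal-style usage:
     *     write(fd, record, reclen);   // log record
     *     barrier(fd);                 // record lands before commit
     *     write(fd, commit, comlen);   // commit marker
     */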

------
slimsag
Suggestion: I think the title should be:

> Linus: Theory and practice sometimes clash. And when that happens, theory
> loses. Every single time.

His point seems to be more about accepting reality and allowing the practice
(which in this case is theoretically "badly written" code) to "just work".

As the title stands now:

> Linus: People should aim to make "badly written" code "just work"

One might incorrectly assume Linus is suggesting that defensive programming
should be practiced heavily -- but that does not appear to be what he is
saying here.

~~~
sophiebits
Or perhaps “Filesystems should aim…”. Which is what he actually seemed to
mean.

~~~
slimsag
Yeah, that seems a bit more precise!

------
1gor
APIs you provide to consumers should aim to make their badly written code just
work. That's what Linus said.

~~~
blazespin
Yeah. So many API writers aim to force clients to do all the heavy lifting.
The whole point of a good API is that it reduces heavy lifting. Anyone can
write pass-through APIs that don't do anything.

~~~
dpark
> _The whole point of a good api is that it reduces heavy lifting._

Isn’t that the opposite of what Torvalds is saying? He seems to be arguing for
simplicity. APIs that do a bunch of magic for you are the opposite of simple
and tend to be mountains of subtle bugs and unexpected behavior.

~~~
Const-me
> APIs that do a bunch of magic for you are the opposite of simple

You're mixing up simplicity of API with simplicity of implementation. More
often than not, you can have one but not both.

Modern Linux or Windows does a huge amount of magic when you call a kernel
API like open (POSIX) / CreateFile (Windows), yet the API is simple and
easy.

You can expose all the implementation details; your code will be simple, but
hard to build upon. Speaking of data storage: once upon a time I programmed
Nintendo consoles. Their file system API was probably very simple for
Nintendo to implement, but using it wasn't fun: the SDK documentation
specified delays, specified how to deal with corrupt flash memory, etc.

Or you can go the other way: you'll have to do a lot of work handling all
the edge cases, and your code will be very complex, but this way you might
make a system that's actually useful. Again on data storage, SQL servers
have tons of internal complexity -- even SQLite does -- but the API, SQL, is
high-level and easy to use even by non-programmers.

------
kstenerud
To put this more generally: a properly designed API makes it easy and
natural to do things right, and difficult (but not impossible) to do things
wrong.

------
bsder
I wonder if ZFS had these issues in the same timeframe?

These strike me more as "Linux doesn't believe in _actual_ testing" rather
than bugs that are inherent because it's a filesystem.

------
capdeck
Is there a point, though, when badly written code becomes so hard to
maintain and improve that people would just avoid writing it altogether?

The reason "badly written" works for Open Source is that if code is useful,
there will be someone in the future who will refactor it. In a proprietary
setting that only happens when the fate of the company itself (or a large
chunk of the business) is at stake. Otherwise stagnation is king.

~~~
diminoten
It's easy to pretend that "badly written" code is intentional or desirable,
because then when we write it, we can excuse ourselves by saying, "BUT IT
WORKS!"

That's why this idea is popular, and your comment is controversial. You're
taking away a crutch that _many_ of our peers cling desperately to as a way to
justify their shortcuts and poor decision making.

------
ianai
Slightly OT: is there any new emerging tech for Linux file systems that
isn't ext4/xfs/zfs? I was surprised the other day that those were the
options for / on CentOS.

~~~
Lammy
Btrfs was included as experimental in CentOS 6 and 7, but removed for CentOS
8.

------
purplezooey
Missing the W. Richard Stevens books right about now...

------
NathanOsullivan
(2009)

------
newnewpdro
I feel like Linus is basically just arguing in favor of the principle of least
surprise.

There are definitely some serious pitfalls when it comes to unix file IO;
just look at how long we lived with postgres and its broken assumptions
surrounding fsync behavior on linux.

~~~
rodgerd
> how long we lived with postgres and its broken assumptions surrounding fsync
> behavior on linux.

You mean its assumption that the API wasn't lying about the integrity of the
data it claimed to be writing? Very broken indeed.

------
ymerej
I think the message is that we should make the basic stuff "just work" before
we start adding the complexity of more bells and whistles.

~~~
perl4ever
More features don't _have_ to be more complex or "subtle" (I wish).

------
l1g2d5
At the end of the day, the client/users/etc just want it to work. They don't
care about the quality of the code.

------
justinator
Told you: Perl is built by the gods.

~~~
peterwwillis
Perl has subtle complexity just like everything else. But it also has taint
mode, which depending on the way you squint is either a great example of
"security that just works", or a great example of "you must use X method to
get good security".

~~~
mapgrep
I vote for the second option :-) IMO taint mode is a great example of what
Linus is talking about. Too subtle.

I cut my teeth on Perl and actually believed the books when they said you
should always use taint mode when touching data that came e.g. over the
network. I wrote all my web code under -T (-wT actually but you get the idea).

Then one day I went to drop in some full-text search via a then-popular
library (Plucene). And what do you know, Plucene would not work under taint
mode, because it was not developed under taint mode. The maintainer would not
accept my simple patch essentially because he did not understand that you
could not untaint without a regex somewhere (i.e. he did not know how taint
mode worked at a basic level). So I maintained a patched version of that lib
privately. Only to later hit the same issue with another popular library.

So I had to stop using taint mode. If it’s just an option — even one
aggressively marketed in O’Reilly books back when people actually mostly read
O’Reilly books to learn various systems — it’s not going to win much adoption.

------
mehrdadn
Quite a bizarre read considering Linux's decision to e.g. overcommit memory
makes 100% _correctly_ written code _break_ nondeterministically...
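
(A minimal sketch of the hazard: with Linux's default overcommit heuristic,
this allocation can succeed even though the memory does not exist, and the
failure arrives later as a SIGKILL from the OOM killer while pages are first
touched -- a failure no return-value check in a correct program can catch.)

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        size_t sz = (size_t)64 << 30;       /* 64 GiB, likely > RAM   */
        char *p = malloc(sz);
        if (p == NULL) {                    /* the check that "works" */
            perror("malloc");
            return 1;
        }
        for (size_t i = 0; i < sz; i += 4096)
            p[i] = 1;                       /* OOM killer may strike  */
        puts("survived");
        free(p);
        return 0;
    }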

~~~
perl4ever
It seems to me that a basic requirement for not being an idiot is that you
recognize hard problems as being hard, and I see a lot of supposedly smart
people declaring hard problems are easy because they're just ignoring
tradeoffs or aspects of an approach that undermine it.

I'd think "everybody" knows, certainly I would expect Linus to know, about
Postel's law and the subtle ways it ends up causing problems. Whenever you
make things easy or difficult, you shape the evolution of how people do those
things. There's no simple universal answer to "do we make things easy or
difficult". Or "do we blame the user or the toolmaker?"

I don't really understand the psychology of going around arguing one side of
an insoluble problem, observing that others are totally convinced of the
other side, and occasionally flipping sides, but never acknowledging the
meta-problem of integrating both or deciding when to apply each.

~~~
dooglius
What do you think the "insoluble" problem is here?

~~~
perl4ever
Deciding whether and how to influence the way people use a tool or product.

Do you say "you're holding it wrong", or do you adjust to what people seem to
be like?

I mean, it's insoluble if treated as a single binary decision and not
contextual.

