
Files Are Hard (2015) - signa11
http://danluu.com/file-consistency/
======
JohnStrangeII
I strongly recommend using SQLite for your own document format. SQLite is one
of the most well-tested pieces of software on earth and is ACID-compliant. You
can even make it safer than the default if you don't need maximum performance
and only need it for storing documents. It is very resistant to crashes and
corruption, especially with the full sync option and if you use it from one
thread only.

~~~
cjfd
It was a while ago, so maybe things are better now, but I have seen SQLite
be a disaster when the time comes to upgrade the schema. Not all of the
common operations, like deleting/renaming a column, are supported. Since it has
been a while, I don't remember exactly which ones. Then someone writes an SQL
script to provide this functionality. Then it turns out that a freshly created
database using the latest schema is subtly different from one that was created
from an older schema and then upgraded. Things like a column that has a default,
but only in one of these two cases. Lots of fun, but not really.

~~~
blattimwind
For most uses of SQLite it's acceptable (or even advantageous) to copy the
entire database for doing an upgrade, instead of the more RDBMS-y way of using
DDL.

~~~
Hackbraten
Which brings you the additional challenge of how to overwrite the old database
file with the new one atomically in the face of crashes.

~~~
tylerhou
[http://man7.org/linux/man-pages/man2/rename.2.html](http://man7.org/linux/man-pages/man2/rename.2.html)

> If newpath already exists, it will be atomically replaced, so that there is
> no point at which another process attempting to access newpath will find it
> missing.

~~~
Hackbraten
Did the original article not specifically make the point that Linux's `rename`
is atomic only in the happy case, and not when a crash happens during the
rename?

~~~
jng
Rename is the most atomic you can get on Unix. The original article talks about
partial file updates. A new SQLite file + rename should be as failproof as
possible. And in any case nothing is 100% safe, so redundancy is a hard
requirement for real safety.
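
A minimal C sketch of the "new file + rename" approach, assuming a POSIX
system. The helper names (`write_all`, `replace_file`) and the paths are
hypothetical. It fsyncs the new file before the rename and fsyncs the
containing directory afterwards, so the ordering is right on the happy path;
whether the rename itself is atomic across a crash still depends on the
filesystem, which is exactly the point being argued in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>   /* rename() */
#include <unistd.h>

/* Write all of buf, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: replace `path` (inside directory `dir`) with `data`,
 * going through the temporary file `tmp` in the same directory. */
static int replace_file(const char *dir, const char *tmp, const char *path,
                        const char *data, size_t len) {
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    /* 1. Make the new contents durable before they become visible. */
    if (write_all(fd, data, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    /* 2. Swap the names. */
    if (close(fd) < 0 || rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }
    /* 3. fsync the directory so the new directory entry is persisted too. */
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0) return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

The temporary file has to live on the same filesystem as the target, since
rename() cannot move a file across filesystems.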

~~~
Hackbraten
I was referring to the last section of the article, the part where it says
rename isn’t atomic on crashes. Why not migrate traditionally via DDL in a
transaction?

~~~
mehrdadn
It's not clear to me if they're correct about that. The way it's written, it
seems to be saying " _my read_ of the POSIX standard suggests rename may not
be atomic on crashes", rather than "there are POSIX implementations in common
use that have been observed to have non-atomic renames on crashes".

------
squarefoot
I have no idea how much mail the author receives, but anyway I would
suggest taking a look at Claws Mail.
[https://www.claws-mail.org/](https://www.claws-mail.org/)

I have all my mail since forever online, that is, tens of thousands of posts
including attachments in multiple inboxes belonging to multiple accounts:
everything since my Internet day one (20+ years), the earlier messages
imported from Eudora before moving to Linux. It's all in my $home, and
searches are damn fast: fractions of seconds for header searches which are
indexed, tens of seconds to minutes for searches into the body. As far as I
recall, I never lost a single file.

    
    
      ~$ du -chs Mail
      3.7G Mail
      3.7G total
    

Might post some spam from the late 90s if anyone is interested:)

~~~
Diederich
> Might post some spam from the late 90s if anyone is interested:)

Would love to see a few (:

I've had dana@realms.org since 1995 but sadly I haven't kept the stuff older
than 15 years.

~~~
squarefoot
There were very few of them at that time. About once a month I received a
newsletter from Programmer's Paradise, which I probably signed up for somewhere,
then some "adults only" links (message was empty) from a members.xoom.com
address, then the usual "enlarge it" spam. Here's an example (addresses
probably fake or long dead, but redacted anyway):

    
    
      --------------------------------
    
      From: XXXX@123india.com
      To: "XXXX@yesitsmail.net"<>
      Subject: YeS It Works!.....Gotta Have  It          15643
      Date: Wed, 18 Nov 1998 02:53:45 -0500
      X-Mailer: QUALCOMM Windows Eudora Pro Version 4.1
    

JULY SALE

Try This POTENT Pheromone Formula That Helps Men and Women To Attract Members
of The Opposite Sex Click here to learn more:

Ever wonder why some people are always surrounded by members of the opposite
sex?

Now YOU Can................... _Attract Members of The Opposite Sex Instantly_
Enhance Your Present Relationship * Meet New People Easily _Give yourself that
additional edge_ Have people drawn to you, they won't even know why

Click Here For Information Read What Major News Organizations Are Saying About
Pheromones!

To be removed from our mailing list, Click Here

    
    
      ----------------------------
    

That spam also contained some quoted text coming from a yahoogroups list I was
part of with a bunch of old friends, so either some of them or the server had
been compromised and was spitting out our addresses.

Spamming was occasional at that time, and I probably deleted most of them, but
in a few years it became unsustainable and I installed IpCop to identify junk
and redirect it to its own folder.

One more example:

    
    
      --------------------------
    
      From: "InvestTXXXX@uasc.com.kw" <InvestXXXX@uasc.com.kw>
      To: "XXXX@prodigy.com" <XXXX@prodigy.com>
      Reply-To: <XXXXXGreatPicksDaily@trk.kht.ru>
      Subject: Fwd: Investor's Alert
      Date: Sun, 9 Jun 2002 21:07:41 -0400
    

Immediate Release

Cal-Bay (Stock Symbol: CBYI) Watch for analyst "Strong Buy Recommendations"
and several advisory newsletters picking CBYI. CBYI has filed to be traded on
the OTCBB, share prices historically INCREASE when companies get listed on
this larger trading exhange. CBYI is trading around $.30¢ and should skyrocket
to $2.66 - $3.25 a share in the near future. Put CBYI on your watch list,
acquire a postion TODAY.

REASONS TO INVEST IN CBYI • A profitable company, NO DEBT and is on track to
beat ALL earnings estimates with increased revenue of 50% annually! • One of
the FASTEST growing distributors in environmental & safety equipment
instruments. • Excellent management team, several EXCLUSIVE contracts.
IMPRESSIVE client list including the U.S. Air Force, Anheuser-Busch, Chevron
Refining and Mitsubishi Heavy Industries, GE-Energy & Environmental Research.

RAPIDLY GROWING INDUSTRY Industry revenues exceed $900 million, estimates
indicate that there could be as much as $25 billion from "smell technology" by
the end of 2003.

ALL removes HONERED. Please allow 7 days to be removed and send ALL address
to: XXXXXAgain@btamail.net.cn

Certain statements contained in this news release may be forward-looking
statements within the meaning of The Private Securities Litigation Reform Act
of 1995. These statements may be identified by such terms as "expect",
"believe", "may", "will", and "intend" or similar terms. We are NOT a
registered investment advisor or a broker dealer. This is NOT an offer to buy
or sell securities. No recommendation that the securities of the companies
profiled should be purchased, sold or held by individuals or entities that
learn of the profiled companies. We were paid $27,000 in cash by a third party
to publish this report. Investing in companies profiled is high-risk and use
of this information is for reading purposes only. If anyone decides to act as
an investor, then it will be that investor's sole risk. Investors are advised
NOT to invest without the proper advisement from an attorney or a registered
financial broker. Do not rely solely on the information presented, do
additional independent research to form your own opinion and decision
regarding investing in the profiled companies. Be advised that the purchase of
such high-risk securities may result in the loss of your entire investment.
The owners of this publication may already own free trading shares in CBYI and
may immediately sell all or a portion of these shares into the open market at
or about the time this report is published. Factual statements are made as of
the date stated and are subject to change without notice. Not intended for
recipients or residents of CA,CO,CT,DE,ID,
IL,IA,LA,MO,NV,NC,OK,OH,PA,RI,TN,VA,WA,WV,WI. Void where prohibited. Copyright
c 2001 ________*

    
    
      --------------------
    

edit: formatting to make headers readable.

------
throwGuardian
What kind of email volume are you dealing with? Middle to higher level execs
routinely receive 500+ legitimate emails, and an equal amount of spam, every
day, and they use Outlook without issues.

~~~
vxNsr
I do desktop support for a medium-size office. We use O365, Outlook sucks at
managing email, and I regularly find that people who moved emails to a folder
can't find them months or years later.

~~~
AA-BA-94-2A-56
When I worked in marketing, I had Outlook search flat out refuse to find
emails in my Inbox, through both header searches and body searches.

Totally nondeterministic behaviour: it would turn around and work fine on
other searches.

------
loopz
The sane old-school way to store mail is using directories and files. Mozilla
Thunderbird does this and I've never had a corruption issue. If need be, you
can index or open individual files in a text editor, as they're all plain text.
I'd hope for a revitalization of the application, but it works for my personal
use cases.

Yeah, files are hard.

~~~
Avamander
I wish there were a native e-mail client that used an RDBMS properly. I
want really fast full-text search, I want fast filters. The current ones
have all been very unsatisfactory, mostly because the file-based approach
doesn't allow for that. It really isn't sane, in my opinion.

~~~
bronson
Have you tried notmuch? [https://notmuchmail.org/](https://notmuchmail.org/)

It uses a Xapian index, not an RDBMS, but it's impressive. RDBMSes aren't
known for producing good natural language search results.

~~~
Avamander
I have not, but I am tempted; I really dislike the power Google has over
me.

------
wheybags
I'd be interested to see a current version of ZoL (ZFS on Linux) in that
table. I suspect it would hold up quite well.

------
jolmg
Why?

    
    
      creat(/dir/log);
      write(/dir/log, "2,3,foo", 7);
      pwrite(/dir/orig, "bar", 3, 2);
      unlink(/dir/log);
    

and not?

    
    
      creat(/dir/new);
      write(/dir/new, "foo", 3);
      rename(/dir/new, /dir/orig);
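
For comparison, the first (log-based) protocol can be sketched in C with
explicit fsync() barriers between the steps, assuming a POSIX system.
`logged_pwrite` and `fsync_path` are hypothetical names. The log record
mirrors the pseudocode's "2,3,foo" (offset, length, bytes) and is assumed here
to hold the old bytes, i.e. an undo log; the sketch also assumes small text
data. A real log would additionally need a checksum so recovery can tell a
torn log write from a complete one, and recovery itself is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* fsync a file or directory by path (directories must be opened read-only). */
static int fsync_path(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    int rc = fsync(fd);
    if (close(fd) < 0) rc = -1;
    return rc;
}

/* Hypothetical helper: overwrite len bytes of `orig` at offset `off` with
 * `data`, guarded by the undo log `logpath` in directory `dir`.
 * Assumes small text data (no NUL bytes, len <= 256). */
static int logged_pwrite(const char *dir, const char *logpath, const char *orig,
                         off_t off, const char *data, size_t len) {
    char old[256], rec[320];
    if (len > sizeof old) return -1;

    int ofd = open(orig, O_RDWR);
    if (ofd < 0) return -1;
    if (pread(ofd, old, len, off) != (ssize_t)len) { close(ofd); return -1; }

    /* 1. Record "offset,length,old bytes" and make the log durable,
     *    including its directory entry. */
    int n = snprintf(rec, sizeof rec, "%lld,%zu,%.*s",
                     (long long)off, len, (int)len, old);
    int lfd = open(logpath, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (lfd < 0) { close(ofd); return -1; }
    if (write(lfd, rec, (size_t)n) != n || fsync(lfd) < 0) {
        close(lfd);
        close(ofd);
        return -1;
    }
    close(lfd);
    if (fsync_path(dir) < 0) { close(ofd); return -1; }

    /* 2. Only now apply the in-place update, and make it durable. */
    if (pwrite(ofd, data, len, off) != (ssize_t)len || fsync(ofd) < 0) {
        close(ofd);
        return -1;
    }
    close(ofd);

    /* 3. The update is on disk, so drop the log and persist the unlink. */
    if (unlink(logpath) < 0) return -1;
    return fsync_path(dir);
}
```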

~~~
timcederman
Check the update at the bottom:

> Update: many people have read this post and suggested that, in the first
> file example, you should use the much simpler protocol of copying the file
> to be modified to a temp file, modifying the temp file, and then renaming the
> temp file to overwrite the original file. In fact, that's probably the most
> common comment I've gotten on this post. If you think this solves the
> problem, I'm going to ask you to pause for five seconds and consider the
> problems this might have.

> The main problems this has are:

> rename isn't atomic on crash. POSIX says that rename is atomic, but this
> only applies to normal operation, not to crashes.

> even if the technique worked, the performance is very poor

> how do you handle hardlinks?

> metadata can be lost; this can sometimes be preserved, under some
> filesystems, with ioctls, but now you have filesystem-specific code just for
> the non-crash case

> etc.

> The fact that so many people thought that this was a simple solution to the
> problem demonstrates that this problem is one that people are prone to
> underestimating, even when they're explicitly warned that people tend to
> underestimate this problem!
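
A small C demonstration of the hardlink problem from that list, assuming a
POSIX filesystem that supports hard links. `put` and `rename_splits_hardlink`
are hypothetical names. After the temp-file-plus-rename dance, the original
name points at a new inode while the other hard link still points at the old
one, so the two names silently diverge.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create (or truncate) `path` and write the string `s` into it. */
static int put(const char *path, const char *s) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    size_t len = strlen(s);
    int ok = write(fd, s, len) == (ssize_t)len;
    return (close(fd) == 0 && ok) ? 0 : -1;
}

/* Returns 1 if replacing `a` via "write temp file + rename" leaves the hard
 * link `b` on the old inode, 0 if they still match, -1 on error. */
static int rename_splits_hardlink(const char *a, const char *b, const char *tmp) {
    struct stat sa, sb;
    if (put(a, "old") < 0 || link(a, b) < 0) return -1;
    if (put(tmp, "new") < 0 || rename(tmp, a) < 0) return -1;
    if (stat(a, &sa) < 0 || stat(b, &sb) < 0) return -1;
    return sa.st_ino != sb.st_ino;
}
```

A caller could check `st_nlink > 1` from stat() before choosing this strategy;
the same kind of care applies to permission bits and other metadata, which the
freshly created temp file does not inherit from the original.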

------
m45t3r
> With that tool, they find that most filesystems drop a lot of error codes:

I recently switched my work notebook to XFS, so looking at this table makes me
kinda happy, even if I had bad experiences with XFS in the past (a forced
power-off caused by a crash in X11 was enough to put my system in an
unbootable state, and trying to recover it with xfs_repair broke it completely;
I also suffered from 0-size files randomly appearing after forced reboots
quite often).

I've known for a long time that XFS is probably one of the best-written
filesystems for Linux, even if its use case seems to be more focused on
servers with an uninterruptible power supply than on desktops and notebooks.

~~~
temac
Well, I don't have very good experiences with XFS in the presence of system
crashes either; you are not alone in ending up with an irreparable FS. And even
without crashes, the tooling is not very good.

And the thing is that _even_ with a UPS, you'd better handle crashes more
gracefully than XFS apparently does, because e.g. kernel panics can
occasionally happen.

So on my side, I'm moving from XFS to ext4. I'm not even sure this will be
better; I'll see...

------
StillBored
Too much of the storage industry has been consumed by the
performance-at-any-cost metric, even when that means making engineering
decisions that put data at risk.

Back when the original POSIX specs were being worked on, a common assumption
was that anyone serious about their data ran all disks/RAID
controllers/filesystems/etc. in write-through (or equivalent) mode. Combined
with what the spec doesn't guarantee, that leaves us in a world where it's
pretty much impossible to make any real data retention guarantees. A large part
of this is that write()-style APIs should never have been allowed to be
anything but synchronous with the data being committed to non-volatile storage.
That's because if you move the error handling to fsync() or close() (or pick
your place), then when one of the writes fails it's impossible to report to the
application accurately enough what failed for it to know how to recover.

This goes beyond just filesystems these days. There is a major RAID chipset
vendor whose parts are/were shipped by a couple of tier 1 vendors, and which
only ran in write-through/FUA mode if it wasn't provided a battery. The
performance was so abysmally bad that nearly everyone ended up buying the
battery to enable write-back. The problem with this controller is that it
didn't honor any kind of fencing or FUA when in write-back mode, instead
depending on an intermittent timer-based flush to force the data to disk. If
there was a disk failure on write, it was impossible to know what had actually
been flushed, and depending on firmware it would either report the error or
silently drop the data. Neither helps the end application, which might have
quit, and had definitely passed its close/rename/flush/etc. sequences by the
time the error occurred.

Bottom line, there is an API mismatch from top to bottom of the storage stack.
It starts with the simple idea that if a filesystem operation doesn't provide
an async completion notification, it should be forced to be consistent at
completion. Anything else _WILL_ create the opportunity for data loss from
even simple power loss, let alone the more complex cases of delayed writes in
RAID controllers, async replication, etc.

Put another way, as this article points out and others linked here
reference, there are a ton of "bugs" in most OSes' storage stacks, enough that
they generally behave badly in the face of actual failures.
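
To make the error-reporting point concrete, here is a C sketch, assuming a
POSIX system, that checks every step of a save (`save_chunks` and
`enum save_stage` are hypothetical names). Several write() calls can all
succeed, and a failure may only show up later at fsync() as a single errno, so
the application can tell that something was lost but not which write it was.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* The stage at which a "durable" save failed. */
enum save_stage { SAVE_OK = 0, SAVE_OPEN, SAVE_WRITE, SAVE_FSYNC, SAVE_CLOSE };

/* Hypothetical helper: write several string chunks to `path` and report
 * which stage failed, if any. */
static enum save_stage save_chunks(const char *path, const char *const chunks[],
                                   size_t n) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return SAVE_OPEN;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(chunks[i]);
        /* Success here usually only means the bytes reached the page cache. */
        if (write(fd, chunks[i], len) != (ssize_t)len) {
            close(fd);
            return SAVE_WRITE;
        }
    }
    /* A writeback error for any chunk above tends to surface here, as one
     * errno with no indication of which chunk (or which bytes) was lost. */
    if (fsync(fd) < 0) {
        close(fd);
        return SAVE_FSYNC;
    }
    return close(fd) == 0 ? SAVE_OK : SAVE_CLOSE;
}
```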

------
zzzcpan
Related to the post, some disk and filesystem reliability research in
chronological order, including things from the author.

Disks:

An Analysis of Latent Sector Errors in Disk Drives (2007)
[https://research.cs.wisc.edu/adsl/Publications/latent-sigmetrics07.ps](https://research.cs.wisc.edu/adsl/Publications/latent-sigmetrics07.ps)

Failure Trends in a Large Disk Drive Population (2007)
[https://static.googleusercontent.com/media/research.google.com/en//archive/disk_failures.pdf](https://static.googleusercontent.com/media/research.google.com/en//archive/disk_failures.pdf)

Cycles, Cells and Platters: An Empirical Analysis of Hardware Failures on a
Million Consumer PCs (2012)
[https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/eurosys84-nightingale.pdf](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/eurosys84-nightingale.pdf)

Flash Reliability in Production: The Expected and the Unexpected (2016)
[https://www.usenix.org/system/files/conference/fast16/fast16-papers-schroeder.pdf](https://www.usenix.org/system/files/conference/fast16/fast16-papers-schroeder.pdf)

Filesystems:

IRON File Systems (how filesystems behave on disk errors) (2005)
[https://research.cs.wisc.edu/wind/Publications/iron-sosp05.pdf](https://research.cs.wisc.edu/wind/Publications/iron-sosp05.pdf)

EIO: Error Handling is Occasionally Correct (2008)
[https://www.usenix.org/legacy/event/fast08/tech/full_papers/gunawi/gunawi.pdf](https://www.usenix.org/legacy/event/fast08/tech/full_papers/gunawi/gunawi.pdf)

SQCK: A Declarative File System Checker (2008)
[https://www.usenix.org/legacy/events/osdi08/tech/full_papers/gunawi/gunawi.pdf](https://www.usenix.org/legacy/events/osdi08/tech/full_papers/gunawi/gunawi.pdf)

All File Systems Are Not Created Equal: On the Complexity of Crafting
Crash-Consistent Applications (2014)
[https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-pillai.pdf](https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-pillai.pdf)

Filesystem error handling (2017)
[https://danluu.com/filesystem-errors/](https://danluu.com/filesystem-errors/)

Files are fraught with peril (2019)
[https://danluu.com/deconstruct-files/](https://danluu.com/deconstruct-files/)

And a DRAM reliability paper, for a more complete picture:

DRAM Errors in the Wild: A Large-Scale Field Study (2009)
[https://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf](https://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf)

