
Atomic Commit in SQLite (2007) - jeffreyrogers
https://sqlite.org/atomiccommit.html
======
jeffreyrogers
I was interested in this topic because people were telling me that SQLite (and
other databases) have more sophisticated methods of preventing corruption to
database files than just calling fsync (or the platform equivalent) on the DB
and hoping that the OS and disk controllers behave appropriately. What this
document shows is that while SQLite has a somewhat sophisticated mechanism for
allowing rollback of transactions in the case that they are not completed, it
does not have a better means of ensuring data actually gets to disk than
relying on fsync.

In fact, this document explicitly calls out that the DB can be corrupted if
the OS and disks do not implement flushing properly. Further, when I was
looking at how SQLite does testing I noticed that there are no power failure
or actual disk crash tests (these are obviously hard to test in an automated
way). Instead SQLite simulates faults by using an OS abstraction layer that
injects fake disk faults, reorders writes, etc. This means that the actual
behavior of SQLite (and probably other DBs) under actual power failure
situations is probably worse than commonly acknowledged.

Edit: I should add that this issue is obviously an OS/filesystem/hardware bug,
and it's not really a criticism of SQLite to say it can't address it.

~~~
arielweisberg
Testing via software fault injection is quite reasonable if you introduce all
the necessary cases. What SQLite tests for is more than most databases do!

There aren't really build any special mechanisms besides fsync/msync if the
database is built on top of a filesystem. You have to tell the filesystem to
make it's metadata consistent as well.

Writing to a raw disk you might do something different. I don't have much
experience with that as there isn't a huge amount to be gained. It's still
going to look and work fairly similarly as you are writing your data to the
disk and then emitting a write barrier to make all prior writes durable.

If a disk lies about what data is durable there is no way to work around it
from inside the database. If you can't durably store data you can't
acknowledge to clients the data is durably stored. You can YOLO and build
structures that probably won't get corrupted in practice, but that's no way to
live life.

~~~
jeffreyrogers
I agree! SQLite is very high quality and goes to greater lengths than any
other software I'm aware of to ensure correctness and reliablity... but
several people told me that SQLite has ways of improving on fsync and I was
interested to find that isn't the case.

Edit: And I don't think testing actual power failures is realistic even for
something like SQLite. Maybe if you are using SQLite and have a critically
important application you'd do something like that, but it's too labor
intensive in almost all cases.

~~~
jasonwatkinspdx
> several people told me that SQLite has ways of improving on fsync and I was
> interested to find that isn't the case.

The way you're talking about this is simplistic. To have a reliable datastore
you need to know not just fsync exists, but a lot of specific details in
exactly how you use it, and what you can and cannot count on. There's a lot
for you to learn here if you care to look a bit deeper than "oh, just uses
fsync, boring".

~~~
jeffreyrogers
> There's a lot for you to learn here if you care to look a bit deeper than
> "oh, just uses fsync, boring".

That wasn't my conclusion at all. That's why I took the time to read the docs
and the source code. Because I was interested in how SQLite provided the
guarantees that it does and what the limitations of those guarantees are.

