
How to Corrupt a SQLite Database File - pcr910303
https://www.sqlite.org/howtocorrupt.html
======
cpuguy83
In another life (2007) I worked for a company that does digital signage. We
had about 8000 systems deployed across the US. The software on them relied
heavily on SQLite. At some point we started seeing random corrupted SQLite
databases.

Trying to figure out what was causing this we realized our pc manufacturer was
using the same sku but hardware inside could vary... and in particular the
hard drive. Every system with a corrupt DB had a particular brand of hard
drive (WD); the ones that didn't fail had another (Seagate).

We never really spent much more time on this other than to lock down the
hardware config and replace all the hard drives in the failed systems with the
"good" brand.

Super interesting to see this. I'm assuming it was "Disk drives that do not
honor sync requests".

We were of course using consumer grade equipment, because the company was
pretty cheap, but also we had so many to deploy. I wonder how much this cost
the company to fully resolve. Between systems not playing, having to send
technicians, shipping systems overnight, time troubleshooting, etc.

~~~
pilsetnieks
With how widespread both SQLite and WD are, wouldn't half the world have
apps crashing left and right if just the hard drive brand were the cause of
the problem?

~~~
wolf550e
The case of the disks not honoring fsync is famous. I learned about it from
Brad Fitzpatrick, author of LiveJournal (later on the Go team at Google)
writing about it in 2005:
[https://brad.livejournal.com/2116715.html](https://brad.livejournal.com/2116715.html)

~~~
da_chicken
One of the first optimizations at basically every level is to not honor fsync
requests. Network file systems lie. Operating systems lie about local volumes.
RAID controllers lie. On-disk controllers lie. That's because these lies are
caches, and cache buffers are one of the easiest ways to boost performance.
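
Software can at least be explicit at the layers it does control. A hedged Python sketch (file name made up) of a write that flushes both the userspace buffer and the OS cache; note that nothing in it can force a drive that acknowledges flushes early to actually persist the data:

```python
import os
import tempfile

def durable_write(path, data):
    """Write data and explicitly ask each layer we control to flush it."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push it down to the device

# Even after fsync() returns, a drive with a lying write cache may still
# hold the data in its volatile track buffer; software can't see past that.
path = os.path.join(tempfile.mkdtemp(), "state.bin")
durable_write(path, b"hello")
```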

------
bob1029
I have found that corruption of SQLite files in practice is a very difficult
thing to achieve. On Windows, the worst I've ever seen is a wall of "Database
File is Locked" which was resolved with a rewrite of the DB access logic.
After much experimentation, I have found that the happy path for SQLite is as
follows:

1. Use a SQLite distribution that is compiled or otherwise configured for
serialized threading mode (this is the default in most cases).

2. Use a single connection from a single process to access the SQLite
database file. This connection can be safely shared between multiple threads
with optional locking as required (due to the guarantees offered by #1 above).

3. Transactions are handled with locking in the application code (i.e. on the
SQLiteConnection instance), not in the database. This is predominantly
required for inserts with PK generation, but also applicable in cases where
you don't want 2 different threads to interleave updates during a certain
multi-step operation (e.g. debiting one account and crediting another).

So far, we have found this to be the most performant and simple way to get
business data in and out of a SQLite database file. The "one connection per
operation" approach is NOT ideal for SQLite because there is an actual
database file that needs a handle created/closed each time. One connection per
makes sense when you are using a hosted database designed for this access
model.
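
A minimal Python sketch of that happy path, assuming the stdlib sqlite3 module and using an in-memory database for illustration (table names and amounts are invented):

```python
import sqlite3
import threading

# One connection for the whole process, shared across threads.
# check_same_thread=False relies on the underlying SQLite library being
# built in serialized threading mode (the default in most distributions).
conn = sqlite3.connect(":memory:", check_same_thread=False)
lock = threading.Lock()  # application-level lock for multi-step operations

conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts (balance) VALUES (100), (100)")

def transfer(src, dst, amount):
    # Hold the lock across the whole multi-step operation so two threads
    # cannot interleave the debit and the credit.
    with lock:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))
        conn.commit()

transfer(1, 2, 25)
```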

~~~
blattimwind
In WAL mode you can write from multiple threads without any problem, but you
need to keep serializability in mind. Specifically, no concurrent write
transactions are allowed. In other words, you can't have a transaction start
at X, have another transaction start at X+1, issue DML and commit, and expect
your original transaction from X to be able to commit - it never will.

Some SQLite wrappers (notably, Python's) mess with the start and rollbacks of
transactions and make this appear buggy.
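
A small Python illustration of the "no concurrent write transactions" rule (file path and table invented): with the wrapper's implicit transaction handling disabled via isolation_level=None, a second concurrent write transaction fails immediately when timeout=0:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None stops Python's sqlite3 wrapper from issuing implicit
# BEGINs behind your back, so the transactions below are exactly as written.
c1 = sqlite3.connect(path, isolation_level=None, timeout=0)
c2 = sqlite3.connect(path, isolation_level=None, timeout=0)
c1.execute("PRAGMA journal_mode=WAL")
c1.execute("CREATE TABLE t (x INTEGER)")

c1.execute("BEGIN IMMEDIATE")        # first writer holds the write lock
busy = False
try:
    c2.execute("BEGIN IMMEDIATE")    # second concurrent writer: SQLITE_BUSY
except sqlite3.OperationalError:
    busy = True                      # "database is locked"
c1.execute("COMMIT")
print(busy)  # True
```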

~~~
oefrha
> Some SQLite wrappers (notably, Python's) mess with the start and rollbacks
> of transactions and make this appear buggy.

@coleifer (author of peewee) has a pretty informative blog post exploring — in
part — transactional semantics, which might be of value to people who want to
know more about this.

[https://charlesleifer.com/blog/going-fast-with-sqlite-and-py...](https://charlesleifer.com/blog/going-fast-with-sqlite-and-python/)

~~~
mox1
+1 for @coleifer, peewee is an awesome ORM, he knows his DB stuff very well.

------
adrianmonk
> _Systems that run automatic backups in the background might try to make a
> backup copy of an SQLite database file while it is in the middle of a
> transaction. The backup copy then might contain some old and some new
> content, and thus be corrupt._

I am surprised by this! Is it really not entirely safe to take a backup while
some software is running, has a SQLite database open, and might be writing to
it?

That limitation is fine for a big database that has someone knowledgeable
administering it who will take care to follow any required backup procedures.
But SQLite is often included in software that nontechnical end users will use
without even knowing that there's a SQLite file. For example, Firefox and
Google Chrome use it.

This seems to mean that if a regular end user is browsing the web while
backups run on their computer, it's possible they could be unlucky and get a
bad backup that would corrupt the database if they had to restore it.

Am I interpreting that right? If so, it suggests that many/most software
projects that include SQLite are doing it in a slightly unsafe way.

(It seems possible that the -journal and -wal files could protect against
that, but this SQLite documentation only mentions that they protect you when
transactions fail, which is a different case.)

~~~
GlitchMr
Yes. However, keep in mind it's not just SQLite, but pretty much every program
that has any kind of state.

For instance, a program may save its state into a file every time it changes
by truncating the file and then writing the new contents. If a user were to
copy the file after truncation but before the write, the user would end up
with a useless backup.

That said, this is rather unlikely to happen in practice. Programs usually do
stuff (including saving files) when they are interacted with, so a backup
copied with File Explorer has a very high probability of working, even when
the application is running.

Programs can deal with the issue by performing atomic full writes via
renames, but most programs don't bother. SQLite cannot do that, because it
cannot afford to rewrite the full file: databases tend to store a lot of
data.
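
For programs that can afford full rewrites, the rename trick looks roughly like this in Python (function and file names are illustrative):

```python
import os
import tempfile

def atomic_write(path, data):
    """Replace path so readers (and backup tools) only ever see either
    the complete old contents or the complete new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # new contents hit disk before the rename
        os.replace(tmp, path)     # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise

target = os.path.join(tempfile.mkdtemp(), "config.json")
atomic_write(target, b'{"version": 2}')
```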

~~~
adrianmonk
Writing a copy and renaming works, but is it the only way, or is it just an
approach that a lot of software uses because it's simple and it works? In
other words, are there ways that might be more work to implement but also more
efficient that would be suitable for a database?

I agree the risk is pretty small in practice. Though copying with File
Explorer isn't the use case I had in mind. It was automatic scheduled backups.
Both Windows and macOS make it pretty easy to set these up with an external
hard drive. And those could start running while you're using the computer
and/or you could start using the computer while backups are running.

~~~
lolc
An append-only format should work fine with most backup software that copies
files. A vacuum phase would still have to do an atomic rename.

------
arminiusreturns
I wonder if dqlite
([https://github.com/canonical/dqlite](https://github.com/canonical/dqlite))
or rqlite
([https://github.com/rqlite/rqlite](https://github.com/rqlite/rqlite)) are
enough to help prevent this.

~~~
samatman
No, but they should be able to _recover_ from it, which is almost as good.

------
justinator
I must be getting old - an article with such a title would once have been
appended with, "For Fun, and Profit!"

~~~
anon73044
I would have expected that if this were posted on milw0rm or a personal blog,
but not on the actual SQLite info page.

~~~
justinator
I honestly feel it's more a sea change type of thing:

[https://mitpress.mit.edu/books/fun-and-profit](https://mitpress.mit.edu/books/fun-and-profit)

------
tracker1
Using a mounted volume in Docker for Windows/Mac to edit the same database
from both the desktop and inside a container.

Don't do this, btw - the FS sync in Docker for Windows/Mac is really slow
and prone to causing corruption and errors if you do.

------
logicallee
This section was particularly interesting (I add my own thoughts below it):

>Unfortunately, most consumer-grade mass storage devices lie about syncing.
Disk drives will report that content is safely on persistent media as soon as
it reaches the track buffer and before actually being written to oxide. This
makes the disk drives seem to operate faster (which is vitally important to
the manufacturer so that they can show good benchmark numbers in trade
magazines). And in fairness, the lie normally causes no harm, as long as there
is no power loss or hard reset prior to the track buffer actually being
written to oxide. But if a power loss or hard reset does occur, and if that
results in content that was written after a sync reaching oxide while content
written before the sync is still in a track buffer, then database corruption
can occur.

>USB flash memory sticks seem to be especially pernicious liars regarding sync
requests. One can easily see this by committing a large transaction to an
SQLite database on a USB memory stick. The COMMIT command will return
relatively quickly, indicating that the memory stick has told the operating
system and the operating system has told SQLite that all content is safely in
persistent storage, and yet the LED on the end of the memory stick will
continue flashing for several more seconds. Pulling out the memory stick while
the LED is still flashing will frequently result in database corruption.

-----

If you were a human rather than sqlite you would stop believing that a sync
has occurred immediately, trust it a little bit after a few seconds, and if
it's been a minute since you yourself last touched the file, you would think
"surely it has been written by now".

So it seems by tracking wall time, the library could do better than have to
rely on the drive, and could be more on guard against lying drives.

~~~
magila
The bit about disk drives lying about syncing writes is a persistent myth.
There have probably been some HDDs at some point which had notable bugs in
their flush command, but in general drives do actually write data to disk
before reporting complete status for a flush.

I spent several years working for a major HDD manufacturer and their firmware
QA process included a test rig which would repeatedly pull power to the drive
while sending write and flush commands. The drive would then be powered up and
the data on the drive examined to verify that all data which should have been
flushed actually made it to disk.

I think this myth persists mostly because it can be surprisingly difficult
to coax an OS into sending a flush command, for much the same reason disk
drives get accused of lying about it: performance.

These days the most reliable solution is to reconfigure the drive with write
caching disabled. This way the drive will ensure every write hits the disk
before reporting completion to the host. Write caching remains enabled by
default on SATA drives mostly for historical reasons. Back before command
queuing was commonly supported write caching gave a large speedup. These days
everything supports NCQ so disabling write caching only has a small
performance impact.

~~~
xenadu02
This is absolutely not a myth. There have been drives that behaved this way,
including from "name brand" manufacturers. Maybe those were firmware bugs,
maybe they stopped doing it, but it is not something made up.

Of course there have also been controller bugs, file system bugs, and so on
over the years. A decade ago the behavior of most file systems was quite poor
in the face of underlying hardware failures but a lot of work has gone into
improving the situation.

SSDs are slowly improving too. Most of them have claimed to go into read-only
mode when enough blocks wear out but in reality most of them would crash, fail
to respond, or just lose data. It turns out manufacturers tend to be a bit
loose in their claims until independent third-parties put their claims to the
test. More recent models seem to be a bit better behaved in failure states.

------
andreareina
Is open() on something other than /dev/stderr returning 2 a thing? Or is there
something screwy that was causing it to get associated with the open database
file?

~~~
emilfihlman
The kernel gives you the lowest numbered fd available.

If you close stderr, 2 becomes available and can be handed out.

Also it's not exactly guaranteed that 0, 1 and 2 are the fd numbers for stdin,
out and err on program start, but they are hard coded and standard.

E: It's implementation-defined, i.e. not guaranteed. Your platform may make
guarantees about it, but it is not an actual standard.
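
The lowest-numbered-fd behavior is easy to see from Python without touching the real stderr (paths invented):

```python
import os
import tempfile

d = tempfile.mkdtemp()
a = os.open(os.path.join(d, "a"), os.O_CREAT | os.O_WRONLY)
b = os.open(os.path.join(d, "b"), os.O_CREAT | os.O_WRONLY)

os.close(a)   # a's number is now the lowest free descriptor
c = os.open(os.path.join(d, "c"), os.O_CREAT | os.O_WRONLY)

print(c == a)  # True: POSIX hands out the lowest-numbered free fd
os.close(b)
os.close(c)
```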

~~~
blattimwind
> Also it's not exactly guaranteed that 0, 1 and 2 are the fd numbers for
> stdin, out and err on program start

Yes, it is.

[https://pubs.opengroup.org/onlinepubs/9699919799/functions/s...](https://pubs.opengroup.org/onlinepubs/9699919799/functions/stdin.html)

~~~
emilfihlman
You must differentiate between a guarantee by an actual standard and platform
specific documentation.

~~~
makomk
That is a guarantee by an actual standard. More specifically, it's one of the
things that POSIX.1-2017 guarantees but the ISO C standard does not. In
practical terms this means that all Unix-like systems use the traditional fd
numbers for stdin, stdout and stderr but it's possible that non-Unix systems
might not.

~~~
caf
(Standard C doesn't have file descriptors at all, it has FILE * file handles).

------
drderidder
It doesn't get more than a passing mention, but section 2.1 is worth noting.
Don't try to use sqlite files on an NFS mounted drive.

------
jhatemyjob
> an SQLite

Does this settle it, then? This implies it's pronounced ess-queue-lite. I
personally prefer the sequel-lite pronunciation, but if the developers use the
former, then I guess I'll do that.

~~~
tingletech
I thought it was ess-que-ell-ite

yea, seems so [https://english.stackexchange.com/questions/431329/what-is-t...](https://english.stackexchange.com/questions/431329/what-is-the-correct-pronunciation-of-sqlite)

------
Const-me
> Backup or restore while a transaction is active

I wonder whether that applies to Windows.

Well-designed Windows backup software uses shadow copy NTFS feature. Shadow
copy creates a snapshot of the complete volume when backup starts. All writes
happening after that won’t be included in the shadow copy, so the backup
software should never read some old and some new content.

Does it mean when using VSS on Windows, SQLite databases will be backed up
just fine? Or am I missing something?

~~~
DenisM
You're probably right. Taking a VSS snapshot is identical to pulling the power
plug and imaging the drive. If SQLite can handle the latter, it should be
able to handle the former.

Another thing that should work well is an AWS EC2 EBS snapshot as Amazon
promises a consistent point in time copy.

------
umvi
Somewhat unrelated, I feel like the stars aligned to prevent my company from
using SQLite.

We use a lot of CSV files in our embedded devices. I personally hate CSV and
advocated for SQLite. However, our CSVs are 1 writer/many readers, and the
only way to do that in SQLite is to change the mode to WAL. However,
apparently WAL doesn't work on JFFS2 filesystems :(...

I'm going to give it another shot on our next-gen product which uses ubifs
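
One way to find out whether WAL works on a given filesystem, rather than assuming: the journal_mode pragma reports the mode actually in effect, so a quick probe like this hedged Python sketch (path invented) can tell you before you commit to the design:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "probe.db")
conn = sqlite3.connect(path)

# The pragma returns the journal mode actually in effect, so on a
# filesystem that can't support WAL you get a fallback mode instead.
(mode,) = conn.execute("PRAGMA journal_mode=WAL").fetchone()
print(mode)  # "wal" where the filesystem supports it
```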

------
kbumsik
Can they be applied to other DBMSs? I mean, I don't think other DBMSs treat
database files too differently from SQLite, do they?

~~~
pjc50
Most DBMS don't let you at the files directly - you speak to the DB over a
socket. Because SQLite is in-process, bugs in the process that access random
file descriptors can write over the SQLite files.

(This is more ammunition for the idea that the real software isolation
boundary on a desktop computer should not be "user", but "software author"!)

------
wolco
The most common by far.

I/O error while obtaining a lock leads to corruption. Probably speaks to my
drive/OS choices.

------
abinaya_rl
Thanks for the post. We have been using SQLite with Django for the past year
and I'm so happy that it exists. I even store it in the git repo, and it
makes backups so easy.

------
silvestrov
> Continuing to use a file descriptor after it has been closed

It would be nice if file descriptors on 64-bit platforms used tagging so the
full value could never be reused.

E.g. the lower 32 bits are the normal fd and the upper 32 bits are a serial
number (or random value). When using the fd, the kernel can use the lower 32
bits as an index into the table and check that the upper 32 bits are correct.
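
The tagging scheme can be modeled in a few lines of Python (a toy user-space model, not a kernel change; names are invented): the low bits index a table, the high bits carry a serial, and a stale handle is detected at lookup time:

```python
# A toy model of the tagged-descriptor idea: the low 32 bits index a
# table, the high 32 bits carry a serial number that changes on reuse.
table = {}          # index -> (serial, resource)
next_serial = 0

def alloc(index, resource):
    global next_serial
    next_serial += 1
    table[index] = (next_serial, resource)
    return (next_serial << 32) | index   # tagged "descriptor"

def lookup(handle):
    index, serial = handle & 0xFFFFFFFF, handle >> 32
    entry = table.get(index)
    if entry is None or entry[0] != serial:
        raise ValueError("stale descriptor")  # reuse is detected
    return entry[1]

h1 = alloc(3, "old file")
del table[3]                 # "close" index 3
h2 = alloc(3, "new file")    # same index, fresh serial
print(lookup(h2))            # new file
```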

~~~
fanf2
Sadly that isn't allowed by POSIX

[https://pubs.opengroup.org/onlinepubs/9699919799.2018edition...](https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/functions/V2_chap02.html#tag_15_14)

> _2.14. File Descriptor Allocation_

> _All functions that open one or more file descriptors shall, unless
> specified otherwise, atomically allocate the lowest numbered available file
> descriptor_

~~~
codetrotter
Would badness ensue if we allowed ourselves to define all previously used fds
as “unavailable”, and thus conforming to the rules as stated when we increment
our serial number before handing out the next fd? If yes, what sorts of
badness and why?

~~~
samatman
It strikes me that you'd use up STDIN, STDOUT, and STDERR pretty fast that
way.

One could make an unprincipled exception for them, of course. And SQLite
would still need the special-casing to prevent using low-numbered fds. But
this would prevent other problems and is worth considering.

~~~
mras0
Didn't see your reply before commenting, but it gave me a thought: We really
only need a "local" solution. If there were a way of requesting fd's that
weren't required to be the lowest available ones (perhaps even 64-bit ones
with a process-unique upper part), that'd probably be good enough here. Of
course we're still sharing address space with buggy code, so maybe it's not
worth it.

~~~
codetrotter
I’ll reply here since then both you and parent can see it and comment further.

In the case that you linked to in the sibling comment that you wrote in reply
to me, the dup2 syscall is used.

So the question then that I have is, in the case of

    
    
      foo 2>&1
    

Does bash then:

a) First close fd 2, and then call dup(1), or does it

b) Call dup2(1, 2), or

c) Do something else entirely, or something more complex or convoluted?

In case the answer is b) then it seems to me, though I may be overlooking
something, that we could allow calls like b) and, inside of the dup2
syscall, keep the id of the fd that we are using as newfd, since it is
unambiguously requested that this is intended, while at the same time never
reusing fd ids in the dup syscall.

In this comment I am referring to the dup and dup2 syscalls as defined in the
Linux man pages:

dup() uses the lowest-numbered unused descriptor for the new descriptor.

dup2() makes newfd be the copy of oldfd, closing newfd first if necessary, but
note the following: [...]

[https://linux.die.net/man/2/dup2](https://linux.die.net/man/2/dup2)

With the modification that “lowest-numbered unused descriptor” would be unique
as previously stated in the comment that the two of you are replying to.

More generally, the question is: does any widespread piece of code manually
close an fd and then call the dup syscall, expecting the fd id to be exactly
the same as the one they themselves closed, instead of using dup2?

And furthermore, likewise, does any widespread piece of code use the dup2
syscall to specify the newfd where actually it should not have cared about
the id of the newfd, and should have used dup instead and let the system
decide?

~~~
mras0
I don't think it matters what bash actually does, the issue (as I see it) is
that when a program calls write(N, ...) it sometimes _wants_ to refer to the
"stale" file descriptor N. Consider this contrived example:

    
    
        write(STDERR_FILENO,"test",4); // OK, write to normal stderr
        close(STDERR_FILENO);
        open(....); // Open some log file
        // later
        write(STDERR_FILENO,"x",1); // OK, write to log file
    

Even though fd 2 refers to different files I think the above is required to
work for POSIX compliance.

~~~
codetrotter
That case is covered too as long as the program makes use of the dup2 syscall
to make the switch instead of close followed by open. But it is probable
that some pieces of code currently in widespread use do it with close and
expect a later call to open to give them the fd id for stderr, like you say.

So the conclusion is that the answer to my original question is probably like
you say, we can’t just do that.

But it would be interesting to find out how many widespread pieces of open
source code do it that way; for all code that does, one could submit patches
to change it to use the explicit dup2.
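
For reference, the dup2-based switch being discussed looks like this in Python (log path invented); fd 2 is retargeted atomically, with no window in which it is closed:

```python
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "err.log")
log_fd = os.open(log_path, os.O_CREAT | os.O_WRONLY)

saved = os.dup(2)      # keep a handle on the original stderr
os.dup2(log_fd, 2)     # atomically retarget fd 2 at the log file
os.write(2, b"to the log\n")
os.dup2(saved, 2)      # restore the original stderr
os.close(saved)
os.close(log_fd)
```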

------
endorphone
Related question - does SQLite checksum blocks (pages) of data? Of course the
filesystem usually does, and the physical device itself does, but in the
scenario where the OS somehow delivered corrupted data would SQLite at a
minimum detect it?

~~~
Liquid_Fire
> Of course the filesystem usually does

Does it really? Certainly a few filesystems like ZFS have features like that,
but AFAIK none of the "mainstream" ones do (the things actually running on
billions of devices, like ext4, NTFS, etc)

------
laurent123456
2009 (at least) + previous discussions:
[https://hn.algolia.com/?q=How+to+Corrupt+an+SQLite+Database+...](https://hn.algolia.com/?q=How+to+Corrupt+an+SQLite+Database+File)

------
tgsovlerkhgsel
> Backup or restore while a transaction is active

Is there any information whether taking a consistent snapshot (that contains
both the DB file and journals) is safe?

~~~
egdod
Seems like that would have the same problems if you do it while transactions
are ongoing.

~~~
dunham
I believe they are subtly different.

If you copy while transactions are ongoing, you're reading different parts
of a file at different points in time. For example, the beginning of the
sqlite3 file has already been copied, but by the time you get to the log
file, blocks have been committed back into the beginning of the sqlite file.

If you take a consistent snapshot of the filesystem first, I would think that
would be the same as a crash in the middle of a transaction. Things should be
in a consistent state that the recovery code at startup can handle.

------
poooood
Had a team mate use "dos2unix" on everything in a directory, and that
definitely corrupted the database.

------
Lex-2008
Very interesting read! Although some points might sound too strange to
include in this list (like "memory corruption" - in case of faulty memory,
everything is in danger, not only the database), some are worth keeping in
mind (multiple hardlinks to the file, moving a single DB file around without
the accompanying *-wal file, etc.)

~~~
matsemann
> strange to include in this list (like "memory corruption" - in case of
> faulty memory, everything is in danger, not only database)

But theoretically one could build something that handles up to X random bit
flips, so it's maybe worth mentioning that sqlite doesn't handle this (even
though probably no one else does either).

~~~
Franciscouzo
Could you? You would also have to take into account that the instructions
that make up the program could also be randomly flipped.

~~~
blattimwind
In most cases the size of the code is much smaller than the data handled, so
for random errors (perfectly working DRAM has a bit error rate somewhere in
the general vicinity of 10^-14 per bit-hour) the distribution of errors
follows that proportion.

------
dicytea
Any good introduction to sqlite3 for a Python user?

~~~
kccqzy
Read the sqlite SQL documentation. Then read the built-in Python library
documentation. Or read the SQLAlchemy documentation, for a different way to
interact with it. Build something.

[https://docs.python.org/3/library/sqlite3.html](https://docs.python.org/3/library/sqlite3.html)

[https://docs.sqlalchemy.org/en/13/core/tutorial.html](https://docs.sqlalchemy.org/en/13/core/tutorial.html)

[https://www.sqlite.org/lang.html](https://www.sqlite.org/lang.html)
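
As a starting point, a hedged minimal example with the stdlib sqlite3 module (table and data invented):

```python
import sqlite3

# In-memory database; pass a filename instead to persist to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")

with conn:  # as a context manager, the connection commits or rolls back
    conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))

rows = conn.execute("SELECT id, body FROM notes").fetchall()
print(rows)  # [(1, 'hello')]
```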

