
How to Corrupt an SQLite Database File - naftaliharris
http://www.sqlite.org/howtocorrupt.html
======
maxdeliso
The amount of engineering rigor that has gone into sqlite over the years
boggles the mind. It's a project that has been held up as a sterling example
of successful software development many times, rightfully so. I especially
like this part of their license, from the top of sqlite3.h

 __" The author disclaims copyright to this source code. In place of __a legal
notice, here is a blessing: ____May you do good and not evil. __May you find
forgiveness for yourself and forgive others. __May you share freely, never
taking more than you give. "

~~~
ricardobeat
Unfortunately the FSF holds the view that a clause like that makes software
"non-free". See [http://www.gnu.org/licenses/license-
list.en.html#JSON](http://www.gnu.org/licenses/license-list.en.html#JSON)

~~~
Stratoscope
It's a blessing, not an obligation or a demand or a condition of use.

Unlike the JSON license clause "The Software shall be used for Good, not
Evil", the SQLite blessing doesn't put any restriction on your "freedom to run
the program, for any purpose (freedom 0)" or on any of the other freedoms
listed by the FSF.

It's no different from putting this in a library:

# This work is dedicated to the public domain.

# I hope you enjoy using it!

No one could argue that if you didn't _enjoy_ using the library (maybe the API
is a confusing mess) you would somehow be in violation of the license.

p.s. I upvoted your comment because I think you raised an interesting point!

~~~
derleth
> It's a blessing, not an obligation or a demand or a condition of use.

That's how we interpret it. The problem is that we don't know what the rights-
holder has in mind, and what that entity's willing to sue over.

Lawyers can be comically risk-averse. Comical, that is, until you see the
kinds of things people actually sue over.

~~~
computer
See this 3 minute presentation fragment about the "Do not use JSLint for Evil"
license clause:
[https://www.youtube.com/watch?v=-hCimLnIsDA](https://www.youtube.com/watch?v=-hCimLnIsDA)

~~~
aw3c2
It's so idiotic. Who of those who do evil _in my view_ , do agree that what
they are doing is evil?

------
CraigJPerry
I know SQLite "does testing right" and they rightly get credit for this.

They also "do documentation right" and this is another accessible, clear
example of that.

It's also all public domain. I've used their documentation as an example in
the past when driving through documentation improvements and i'll no doubt
point to them again!

~~~
mattmanser
Am I missing something about this "do documentation right" as while I've got
sqlite working a few different times, generally speaking it wasn't their docs
but SO that actually enabled me to do so. As far as I remember it is
definitely anti-noob. The very first page in the documentation repeatedly uses
KiB instead of Kb for example. The first example is in TCL. The 2nd example is
in C.

Not Java, not Python, not Ruby. TCL & C.

Documentation done right for a small percentage of programmers elbow deep in
neckbeards maybe. Alienating and weird for the rest of them.

Still a fantastic product ofc!

~~~
eropple
This is a pretty weak argument. Using "KiB" instead of "KB" is _precise_ , not
"anti-noob". And I don't think that there's a serious reason for a C library
to provide a non-C tutorial on the off chance that you don't know C and have
to bind it to a language without libraries--if you're using almost anything
modern, you already have a SQLite library and shouldn't need to do the binding
yourself.

Even assuming the lack of a library, however, I still find the "but it's all
in C!" argument unpersuasive. The idea that C is some deep "neckbeard" stuff
is just silly. Part of understanding programming is understanding how your
language of choice interacts with the _lingua franca_ that is C, even if you
don't know C yourself (which you really should, even if you're working in Ruby
or Python or Java on a regular basis). So learn what you need to learn before
using SQLite if you have to write your own bindings. That is not a big deal.
If it "alienates" you, learn more. It's all out there for you.

------
seiji
That's why the D in ACID confused me for a long time. There's no way to
guarantee durability. They say "durable" but they mean "durable, kinda, we
think, we hope, if nothing is lying to us, if your storage array doesn't lose
power without a battery to flush the cache, if your system doesn't power down
while your drives are lying about their committed write status, or if your
write succeeded then an elf ran away with your platters."

~~~
asperous
Sqlite guarentees Durability by waiting and checking the disk to make sure it
is stored correctly before reporting a success. This default behavior is
rather slow and can be turned off.

This is in stark contrast to something like MongoDB, which barely parses the
request before reporting a success, and doesn't make any guarantees of when or
even if it will ever save the data (though it usually does).

~~~
gizmo686
What happens if the disk or kernel lie. At both levels, I can see a potential
performance enhancement by buffering changing while transparently acting as if
they have actually been committed. My guess would be that hard-drives have
explicit instruction(s) than ask for what actually happened; but I have never
worked with this type of thing.

~~~
taspeotis
> What happens if the disk or kernel lie

My understanding is that the kernel (for Windows and Linux values of "kernel")
will never lie to you. They will accurately report what the storage drivers
told it, and the lying occurs at that level.

Storage drivers seem to be a big source of corruption for MSSQL [1]:

> The most common cause of database corruption (more than 95% of all
> corruption cases) that we in PSS encounter turn out to be caused by a
> platform issue, which is a layer below the SQL Server. The most common
> individual cause is a 3rd party driver or firmware bug.

And as well

> My guess would be that hard-drives have explicit instruction(s)

They do, but the problem is that the storage drivers lie about what actually
happened. E.g. Basically they implement write caching for the "flush the write
cache" instruction.

[1]
[http://blogs.msdn.com/b/suhde/archive/2009/04/08/introductio...](http://blogs.msdn.com/b/suhde/archive/2009/04/08/introduction-
to-database-corruption-in-sql-server.aspx)

~~~
Sami_Lehtinen
It's same story with PostgreSQL aka Postgres. "Almost all corruption issues
are related to hardware problems, and most of the rest are due to documented-
unsafe configurations like disabling fsync."

But of course doing that is up to the user. Using write-back caching with long
sync interval and disabling fsync. It's really nice, gives better than SSD
performance with regular HDD. Until you shut it down uncleanly, then you're
screwed. But of course any sane person would use this only for temporary or
other really non-important data which can be regenerated or lost without
problems in such situation.

I'm using such configuration with ERP,BI/ETL (Extract, Transform, Load) tasks.
When I start the task, I anyway drop and recreate any tables required for the
task. SO I don't really mind if data gets corrupted. That's just life. Doing
safe commits would make task very slow.

Only good question is how to balance smartly, in application data / caching,
database engine caching and file system caching. In cases where database runs
on same server as the processing application.

~~~
Sanddancer
I'd say that this is a good task for tablespaces and intelligent disk
partitioning. Set up a partition for the data you really don't care about with
all the speedup options, and create a tablespace that points to it.

~~~
lobsterdiner
Also keep this in mind: [http://thebuild.com/blog/2013/03/10/you-cannot-
recover-from-...](http://thebuild.com/blog/2013/03/10/you-cannot-recover-from-
the-loss-of-a-tablespace/)

------
j_s
If you do happen to corrupt any SQLite database file, it can usually be
repaired... unless it is on an Android phone and the app tries to re-open it
before you can make a copy:

Android automatically deletes corrupt SQLite files!

[http://stackoverflow.com/questions/7764943](http://stackoverflow.com/questions/7764943)

~~~
ghaff
I managed to corrupt the SQLite database that Lightroom uses last December.
Around that time my old Windows system was having some hardware issues (random
crashes) which led me to upgrade to a Mac Mini. However, after the upgrade,
the problems started reoccurring and eventually Lightroom wouldn't read in the
database at all and wouldn't repair it. I suspect that something bad happened
during one of the system's random overhear or whatever shutdowns.

As it turned out, I was able to fix the problem by dumping the database and
creating a new one. Oddly enough, Adobe's repair function apparently didn't
try this fairly straightforward (once you knew to do it) repair process.

------
joe_the_user
Great stuff and all but can I ask why SQL folks make it hard for an
application to get simple _read_ access to a SQLite database? Especially, I
have an application which wants to use the Firefox places-bookmarks-etc db
file. I can copy the file to a different directory and then run a query on it
but can't open it in place for reading, for _just_ reading. WTF is the problem
with that? Is reading a file a way to corrupt it?

~~~
simscitizen
Use sqlite3_open_v2 with the SQLITE_OPEN_READONLY flag. It's been there for
several releases now.

------
tomcam
This also serves as a clear illustration of SQLite internals and DBMS
internals in general without thick clots of rubbish jargon. Nice. If you're
interested in database implementation this neatly covers a multitude of
topics.

------
eksith
I can tell you one of the easiest ways to corrupt an SQLite db file is to not
let a library specifically designed for file operations, handle file
operations. And of course, not using provided libraries for db access (E.G.
PDO if you're using PHP).

I ran a forum on SQLite a while back, and things worked extremely well as long
as I didn't pretend I knew better than the library.

There's no good excuse to not use PDO and there hasn't been one for years now.
Before that, I've implemented a write queue, which seems redundant in
retrospect, in case the file lock issue came about, but it never did even
though I did hit the write queue a few times.

As for crashes, periodic snapshots of the db _and journals_ (that's very
important) is usually the best way to avoid recovery issues.

------
zackmorris
I stumbled onto this article a couple of days ago when I was trying to find a
library to integrate sqlite with a cloud service like dropbox. Does anyone
know of one that has:

* Read

* Write

* Callback (to sort out paradoxes when two or more devices update the store independently offline and need to merge)

It's possible to do this with CoreData, but poorly, with a high burden on the
developer to learn the entirety of Apple's APIs and no way to alert the user
as to what it is doing under the hood, which causes the app to hang for
minutes or even forever until the managed object context says it's ready.

~~~
j_s
This is one of the main reasons CouchBase was created. Its community has
fragmented as the company tries to figure out how to make money, but it would
be worth your time to check it out. Here the Xamarin folks explain their new
CouchBase Lite wrapper: [http://blog.xamarin.com/simplifying-persistence-with-
a-docum...](http://blog.xamarin.com/simplifying-persistence-with-a-document-
database/)

~~~
zackmorris
This is just exactly what I was looking for, thank you! I especially like that
it's document oriented and that replications/merges are handled in the
simplest way possible, by presenting the most recent revision ID and giving
the application the option to inspect previous revision IDs and use custom
logic or ask the user which revision to use.

I think where they perhaps missed the boat is that nearly every app that needs
to synchronize across devices needs this, but I'm having trouble finding a
simple SAAS plan. When I was younger I was interested in hosting my own
database but now I'm just not. I want to pay a few bucks and have someone else
do it.

So on that note a possible startup idea is to host couchbase and charge for
it. I think the core of the problem is that app sales are one-time, so a
subscription model may not be appropriate. But the bandwidth will typically be
so small that it won't matter. So that puts the total value per user maybe in
the 25 cent range. How many million new users per year would it take to gross
a million dollars.. I can see their dilemma. But, I think there is something
to this. Whoever pulls it off could be the next dropbox but for databases. If
this all clicks for someone, look me up!

------
bch
Tcl (the test harness) at work.

------
Daniel_Newby
The message I got was that POSIX advisory locking needs to be ripped out.

------
Glyptodon
On seeing this first thought was "oh, another Ess Kyoo El person" because
reading a "An See Kwul" instead of "A See Kwul" just grated. Pardon my attempt
at phonetic spelling.

~~~
mikeash
The official pronunciation of "SQLite" is "ess kyoo el ite". That's straight
from the author.

