
Ask HN: Is sqlite good for long-term persistence? - avh02
Hi All,<p>so basically I&#x27;ve started to put all important stuff in (private&#x2F;personal) git repos, examples are my quote selections and another is visa&#x2F;passport&#x2F;travel information that I intend to start to keep a track of and backfilling - the question is would something like sqlite be suitable for long-term storage for this kind of data? i.e: if i dig up this file in 30 years, will I be able to read it? (like i would a text file)<p>The problem is that 1) text files don&#x27;t give the kind of organisational power i&#x27;m looking for and 2) if i ever want to build something on top of these files, having them in a format like sqlite should make for easy implementation.<p>Any other long-term, open options? it doesn&#x27;t really make for easy google-ing
======
RcouF1uZ4gsC
The answer is yes, you will be able to read it. According to
[https://twitter.com/copiousfreetime/status/67583454330408140...](https://twitter.com/copiousfreetime/status/675834543304081409?lang=en)

SQLite will be supported for the life of the Airbus A350 frame. Airframe have
lifetimes measured in multiple decades, so having it work in 30 years is a
good bet.

------
dguo
I was fortunate enough to attend a talk by Dr. Hipp (the creator of SQLite),
and it was very clear that he is dedicated to supporting SQLite for the long
term and keeping it backwards compatible.

From [https://sqlite.org/lts.html](https://sqlite.org/lts.html):

"In addition to "supporting" SQLite through the year 2050, the developers also
promise to keep the SQLite C-language API and on-disk format fully backwards
compatible. This means that application written to use SQLite today should be
able to link against and use future versions of SQLite released decades in the
future."

~~~
avh02
everything so far was definitely convincing, but this is the last nail in the
coffin, thank you!

edit: i was already convinced by all the responses, i'm now peacefully
convinced

------
brennen
I think SQLite stands a better chance of still being readable in 30 years than
many possible storage mechanisms, but I'd hedge my bets if I were you: Export
the SQL routinely, like pryelluw suggests, and make regular backups.

Other than that, it's hard to beat simple text formats for sheer longevity and
future-proofness. You might well consider printouts of some core stuff if your
goal is for the info to survive multiple decades. Whatever you do, document
your own process while you're at it, so future you can figure out how to
reassemble things if a software dependency shifts out from under you.

~~~
jschwartzi
If you have a lot of data that you're printing, choose a font that's easy to
OCR, or consider using a 2D or 1D-stacked barcode symbology like QR Code or
PDF417. PDF417, in particular, can be encoded with up to 50% redundancy and
uses an open standard, plus its stacked nature makes it easy to scan with 1D
and 2D scanners. It's likely to remain supported in a variety of devices for
the foreseeable future.

[https://en.wikipedia.org/wiki/PDF417](https://en.wikipedia.org/wiki/PDF417)

Information density is much higher with barcode than it is with simple text
printed on paper, although you'll need a lot of time and some pretty sharp
eyes to decode them without electronics.

------
jdormit
Your real problem isn't the data format, it's the host machine - you need to
be sure that the computer where your files live is still around/working in 30
years. Either make frequent backups to multiple machines, or choose a cloud
provider that will be around for a while, e.g. AWS or similar (and even then,
still do frequent backups to other machines).

The data format you choose should depend on the relationships present in the
data. If there are relational connections between data points (e.g. data1 HAS
A data2, data3 HAS MANY data4s), then SQLite is a fine choice. If all you want
to do is keep the data around, there is nothing wrong with text files -
perhaps the directory structure could represent the organizational structure
you require.

~~~
avh02
oh yeah of course, it was a very theoretical "if i dig up this file in 30
years" \- for the most part i access and update these files on the regular and
have multiple backups. - but not much will beat stability of a format when
things do go very wrong.

I can't find it now, but I remember reading about an organisation trying to
archive work that had been on floppies and even when the data was read, the
file formats were obsolete. so I wanted to draw the question on that kind of
parallel.

as for the relationship format - definitely with visas/passports, a passport
will have several visas on it, and will have taken journeys on it, etc. -
again, i haven't done this _yet_ but it's been on my mind for a good year now
and am just trying to take down my barriers to doing it.

~~~
jdormit
Ah understood. In that case, I'll echo the comments below - I would choose
SQLite to preserve data relationships with automated dumps to plain text
(maybe JSON? Something parseable, anyway), since I can't imagine utf8-encoded
text going obselete to the point where no programs exist that can read in the
foreseeable future.

------
detaro
Sqlite should be around for a while, and since the source is around it should
be possible to compile it in the future. I don't know how stable the file
formats are, or if you'd have to do regular migrations.

Text-based formats in my opinion still are a better bet for really long-term
storage, since I think they are easier to recover from partially-broken
states. (E.g. if you give me an sqlite file with a few bytes broken, I don't
know how how to recover that. In a text-based format I can fix it by hand if
necessary, or modify my tools do deal with it). More complex formats can be
derived from them and used as caches: e.g. for a large text-file archive, you
might want to have a full-text index in some database format for faster
querying. As long as that index is fully generated from the base dataset, its
stability is not as important.

~~~
dividuum
> I don't know how stable the file formats are, or if you'd have to do regular
> migrations.

Reading this made me wonder why there isn't a binary file format that has a
mandatory header with a plain text description about itself in it. I can't
imagine the SQLITE format being too complicated, so it would probably fit into
8k maximum. Basically a way to make it easier to create a parser, if the
original description or source code is lost. Given that the SQLITE format is
everywhere it's probably not too relevant in that case, but I still wonder if
there's any format that does something like this.

~~~
throwawayish
The format itself is not complex, but recovering the transaction _is_ (very
much). So better make sure to leave it clean.

------
justinclift
Not sure about a 30 year timeframe. 10 years seems like it wouldn't be any
kind of problem (personal opinion), but 30 years is an awful long time.

That being said, the SQLite development Community are very very big on
backwards compatibility. eg:

[https://youtu.be/ZvmMzI0X7fE?t=19m56s](https://youtu.be/ZvmMzI0X7fE?t=19m56s)

Hmmmm, there's a better talk by D. Richard Hipp around, but I'm not finding it
atm. In the better talk he explains they (SQLite) are planning to be around
for the 30 year timeframe you mention. Something to do with an airplane
manufacturer looking for long term guarantees, but I don't remember the
details.

~~~
__derek__
> Hmmmm, there's a better talk by D. Richard Hipp around, but I'm not finding
> it atm. In the better talk he explains they (SQLite) are planning to be
> around for the 30 year timeframe you mention. Something to do with an
> airplane manufacturer looking for long term guarantees, but I don't remember
> the details.

Here's the video[1] linked to from a tweet[2] linked to above.[3]

[1]:
[https://www.youtube.com/watch?v=Jib2AmRb_rk&feature=youtu.be...](https://www.youtube.com/watch?v=Jib2AmRb_rk&feature=youtu.be&t=3134)

[2]:
[https://twitter.com/copiousfreetime/status/67583454330408140...](https://twitter.com/copiousfreetime/status/675834543304081409)

[3]:
[https://news.ycombinator.com/item?id=13569301](https://news.ycombinator.com/item?id=13569301)

~~~
justinclift
Excellent, thanks. Was hoping someone would dig that up. :)

------
pmontra
You'll be OK. The first answer now is about the link between SQLite and the
Airbus A350 but even if that wouldn't be the case... Make regular dumps in
plain SQL [0] and the data will survive changes in the architecture of your
host system and even to the end of SQLite. In the worst case you'll have to
write a script to upload them to any db we'll be using by then. Or the db will
look at the file and figure what to do with it :-)

[0] Command .dump a
[https://www.sqlite.org/cli.html](https://www.sqlite.org/cli.html)

------
CJefferson
Take a virtual machine image which can read your data, and store that. If you
have everything you need, down to the kernel, you'll know you can still access
the data.

Experience of emulators tells us in 30 years someone will definitely have a
good quality 32-bit intel emulator, capable of running linux. Such things
already exist, even in Javascript, and I'm sure someone (or many someones)
will want a good quality emulator, for running old games if nothing else!

------
pryelluw
Why not keep an sqlite db and the text files? Keep a back up of both and have
a script dump the db data to text every n days. Then tar/zip it.

------
haldean
C will definitely be around in 30 years, and the source of SQLite is a single
C file; if you want to be really careful, just store both the C source and
your data?

------
ind33d
> Is sqlite good for long-term persistence?

More than any data format in existence.

SQLite is used in every single mobile phone on the planet.

For those of you who are not familiar with D. Richard Hipp's work, SQLite is
the best tested and supported library in the history of software. Look at the
test suite and be amazed.

~~~
avh02
the lts link posted by dguo summarizes their testing methodology, definitely
impressive.

When i originally asked this question i knew sqlite was everywhere (used to do
android dev) and didn't expect it to (edit: completely) disappear even if it
lost favour, but had no idea it was THAT ubiquitous - I'm completely convinced
it's here for a few decades at least. I'm happy with that.

------
rpcope1
If I had to bet on anything being still here and supported in 30 years, it
would definitely be sqlite3 (but maybe not git), given how much it's used in
literally everything, and how much the sqlite team has given consideration
around support and compatibility.

~~~
avh02
thankfully i use git on a daily basis so would be immediately aware of it's
obsolescence, sqlite i can't say the same for (yet), so would probably be
blind to it slowly losing favour (should that happen).

tl;dr: git is not the risk for me here.

------
StavrosK
I can't imagine that you won't be able to run some sort of old linux emulator
with sqlite in 30 years. A simpler format is better, but SQLite is good too.

Even better if you keep SQL dumps with your backups every few months. Those
are going to be readable much more easily.

------
Klathmon
If you are worried about it, just make a "backup" system which will dump your
sqlite DB to CSV or "text" or even raw SQL commands every so often.

That way you get to have all the convenience of any tools you might want right
now, and have a very easily "parsable" copy that you should be able to write
something to put it in something in the future.

------
bjt
Personally, I wouldn't trust today's sqlite files to be readable in 30 years.

+1 to text-based formats. For long text I'd choose markdown. For anything with
a more nested structure, yaml. Yaml can do references between objects as well.

I'm not sure what you mean by "the kind of organisational power I'm looking
for".

~~~
avh02
Wasn't aware yaml could do this, but in terms of organisational power,
basically, my prime example which i've only mentioned at a high level so far
would be to record all my travel information, because every time i have to
fill out a visa form that asks, "which countries have you visited in the last
10 years" i usually end up going through passports. But this is the start for
these things, I will later want to make indexes for photo/video archives, etc.

but back to the example at hand:

if i wanna know which year i went to country x, boom! query

if i want to know where i went between 2005-2008, boom! query.

if i want to know when i used a certain visa (e.g: multi-entry visas) boom!
query.

this gets difficult with flat files, not saying it's impossible with something
like denormalised CSV, but i'd have to write my own scripts (or string
together a frustrating sequence of bash commands) for these things.

------
B1FF_PSUVM
> text files don't give the kind of organisational power i'm looking for

Somewhere, somewhen, someone put together something (a bunch of shell
scripts?) that gave you some fraction of a database's power on plain text
files.

(Somewhat fuzzy, I know ... wetware not competitive these days.)

~~~
avh02
partially responded to this on bjt's comment.

my original premise for sqlite was that it's in-place on the file (no daemon,
etc), until this Ask thread I never knew sqlite had such longevity planned.

edit: I don't doubt it either, I could probably gun-to-my-head do this with
csv's and bash kung-fu, but i'd rather not because it's a barrier to usage.

~~~
B1FF_PSUVM
This may have been what may memory was dredging up - "fsdb, a flat-text
database for shell scripting" :
[http://www.isi.edu/~johnh/SOFTWARE/FSDB/Fsdb-2.23_README.htm...](http://www.isi.edu/~johnh/SOFTWARE/FSDB/Fsdb-2.23_README.html)

------
asn0
You can use whatever format you want if you also archive (1) the software
needed to read whatever formats you use and (2) a computer that can run that
software. A well-built computer should easily survive 30 years of non-use if
stored properly.

~~~
avh02
This is a bit extreme for my use case, but I do agree that while possible,
it's also risky - what happens if your basement with the computer floods or
similar - where with a data-only strategy you can proliferate it pretty
easily.

------
DougN7
Sqlite database files can (rarely) become corrupt, which could cause you to
lose the whole file. I'd lean towards text and individual files myself.

------
z3t4
Most likely cause of data loss is probably accidentally deletion, so keep
backups and make sure they work.

------
user5994461
Don't use sqlite for that. Or any other application specific format that only
one application can read.

There is no application that has a 30 years lifespan. Expect to already have
massive issues in 10 years. If you depend on a single application, you're
screwed.

You want a readable long term format, you use a good old txt/csv/xml. Whatever
fancy computer may come up in 2050, it's guaranteed they will ship with
applications to read that.

If needs be, gzip it to save space.

~~~
SQLite
I intend to support SQLite for 33 more years. I take that commitment seriously
and plan accordingly.

But even if my plans do not work out, the SQLite file format is fully
documented and relatively simple. See
[https://www.sqlite.org/fileformat2.html](https://www.sqlite.org/fileformat2.html)
for details. Writing a script to convert an SQLite database into a series of
CSV files shouldn't take a competent hacker more than a few of hours and a
novice more than a few days.

Reasonable estimates are that there are more copies of SQLite in use today
than any library other than zLib. I dare say that SQLite will live as long or
longer than gzip, and should both disappear, I suspect you will have a much
easier time recovering content from an SQLite database than you would from a
gzip file.

~~~
avh02
Oh wow, hello! * blush *

Serious (if redundant) question though: on year 34, is backward compatibility
out the window?

(that said, 33 years is probably enough time to make the standard fairly
permanent anyway, so even if the developers decide to break it, there will be
enough residue that I don't believe the then-old standard could disappear)

