Ask HN: Is sqlite good for long-term persistence?
62 points by avh02 109 days ago | 46 comments
Hi All,

So basically I've started to put all my important stuff in (private/personal) git repos. Examples are my quote selections and my visa/passport/travel information, which I intend to start keeping track of and backfilling. The question is: would something like SQLite be suitable for long-term storage of this kind of data? i.e. if I dig up this file in 30 years, will I be able to read it (like I would a text file)?

The problem is that 1) text files don't give the kind of organisational power I'm looking for, and 2) if I ever want to build something on top of these files, having them in a format like SQLite should make for easy implementation.

Any other long-term, open options? It doesn't really make for easy googling.




The answer is yes, you will be able to read it. According to https://twitter.com/copiousfreetime/status/67583454330408140...

SQLite will be supported for the life of the Airbus A350 airframe. Airframes have lifetimes measured in multiple decades, so having it work in 30 years is a good bet.


I was fortunate enough to attend a talk by Dr. Hipp (the creator of SQLite), and it was very clear that he is dedicated to supporting SQLite for the long term and keeping it backwards compatible.

From https://sqlite.org/lts.html:

"In addition to "supporting" SQLite through the year 2050, the developers also promise to keep the SQLite C-language API and on-disk format fully backwards compatible. This means that application written to use SQLite today should be able to link against and use future versions of SQLite released decades in the future."


Everything so far was definitely convincing, but this is the last nail in the coffin. Thank you!

Edit: I was already convinced by all the responses; now I'm peacefully convinced.


I think SQLite stands a better chance of still being readable in 30 years than many possible storage mechanisms, but I'd hedge my bets if I were you: Export the SQL routinely, like pryelluw suggests, and make regular backups.

Other than that, it's hard to beat simple text formats for sheer longevity and future-proofing. You might well consider printouts of some core stuff if your goal is for the info to survive multiple decades. Whatever you do, document your own process while you're at it, so future you can figure out how to reassemble things if a software dependency shifts out from under you.


If you have a lot of data that you're printing, choose a font that's easy to OCR, or consider using a 2D or 1D-stacked barcode symbology like QR Code or PDF417. PDF417, in particular, can be encoded with up to 50% redundancy and uses an open standard, plus its stacked nature makes it easy to scan with 1D and 2D scanners. It's likely to remain supported in a variety of devices for the foreseeable future.

https://en.wikipedia.org/wiki/PDF417

Information density is much higher with barcodes than it is with simple text printed on paper, although you'll need a lot of time and some pretty sharp eyes to decode them without electronics.
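A minimal sketch of the QR route, assuming the third-party "qrcode" Python package (the chunk size and file names are illustrative, and PDF417 would need a different encoder): split a plain-text dump into chunks and write one QR image per chunk, tagging each with its position so the pages can be reassembled later.

    # Sketch only: one QR code per chunk of a text dump; chunk size and names are made up.
    import qrcode

    def text_to_qr_pages(text, chunk_size=1000, prefix="archive"):
        chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        for n, chunk in enumerate(chunks):
            # Tag each chunk with its position so printed pages can be reassembled in order.
            img = qrcode.make("%d/%d\n%s" % (n + 1, len(chunks), chunk))
            img.save("%s-%04d.png" % (prefix, n + 1))

    # Hypothetical input: a plain-SQL dump of the database discussed elsewhere in the thread.
    text_to_qr_pages(open("travel-dump.sql").read())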


Never thought about self-documenting the process - that's a good insight.

Regular backups, like you and pryelluw have mentioned, are I guess the ultimate solution here - I'll probably enforce this with a pre-commit hook so the dump is always up to date, since it's easy to get lazy.
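For what it's worth, a minimal sketch of such a hook using Python's built-in sqlite3 module - the file names are hypothetical, and the script would live at .git/hooks/pre-commit (marked executable):

    #!/usr/bin/env python3
    # Sketch of a pre-commit hook: regenerate a plain-SQL dump of the database
    # and stage it so it lands in the same commit. File names are assumptions.
    import sqlite3
    import subprocess

    DB_FILE = "travel.db"     # hypothetical SQLite database tracked in the repo
    DUMP_FILE = "travel.sql"  # plain-text dump kept alongside it

    conn = sqlite3.connect(DB_FILE)
    with open(DUMP_FILE, "w") as out:
        for line in conn.iterdump():  # yields CREATE/INSERT statements as SQL text
            out.write(line + "\n")
    conn.close()

    # Stage the refreshed dump so it goes into the same commit.
    subprocess.run(["git", "add", DUMP_FILE], check=True)

iterdump() produces the same kind of plain-SQL text as the sqlite3 CLI's .dump command, so the dump stays readable even without SQLite itself.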


Save it as a spreadsheet. Microsoft will keep publishing new versions of Excel every 2 years for the next 30 years, but you are guaranteed they will never change the format or add a single new feature over that period!


Your real problem isn't the data format, it's the host machine - you need to be sure that the computer where your files live is still around/working in 30 years. Either make frequent backups to multiple machines, or choose a cloud provider that will be around for a while, e.g. AWS or similar (and even then, still do frequent backups to other machines).

The data format you choose should depend on the relationships present in the data. If there are relational connections between data points (e.g. data1 HAS A data2, data3 HAS MANY data4s), then SQLite is a fine choice. If all you want to do is keep the data around, there is nothing wrong with text files - perhaps the directory structure could represent the organizational structure you require.


Oh yeah, of course - it was a very theoretical "if I dig up this file in 30 years". For the most part I access and update these files regularly and have multiple backups - but not much will beat the stability of a format when things do go very wrong.

I can't find it now, but I remember reading about an organisation trying to archive work that had been stored on floppies, and even when the data could be read, the file formats were obsolete. So I wanted to frame the question along that kind of parallel.

As for the relationship format - definitely, with visas/passports: a passport will have several visas in it, journeys will have been taken on it, etc. Again, I haven't done this _yet_, but it's been on my mind for a good year now and I'm just trying to take down my barriers to doing it.


Ah, understood. In that case I'll echo the other comments - I would choose SQLite to preserve the data relationships, with automated dumps to plain text (maybe JSON? Something parseable, anyway), since I can't imagine UTF-8-encoded text going obsolete to the point where no programs exist that can read it in the foreseeable future.


SQLite should be around for a while, and since the source is available it should be possible to compile it in the future. I don't know how stable the file formats are, or whether you'd have to do regular migrations.

Text-based formats, in my opinion, are still a better bet for really long-term storage, since I think they are easier to recover from partially broken states. (E.g. if you give me an SQLite file with a few bytes broken, I don't know how to recover that. In a text-based format I can fix it by hand if necessary, or modify my tools to deal with it.) More complex formats can be derived from them and used as caches: e.g. for a large text-file archive, you might want to have a full-text index in some database format for faster querying. As long as that index is fully generated from the base dataset, its stability is not as important.


> I don't know how stable the file formats are, or if you'd have to do regular migrations.

Reading this made me wonder why there isn't a binary file format with a mandatory header containing a plain-text description of itself. I can't imagine the SQLite format being too complicated, so such a description would probably fit into 8k at most. Basically, a way to make it easier to create a parser if the original description or source code is lost. Given that the SQLite format is everywhere, it's probably not too relevant in this case, but I still wonder if there's any format that does something like this.


The format itself is not complex, but recovering an in-flight transaction is (very much so). So better make sure the file is left in a clean state.


Corruption is unlikely here (famous last words), and if it happens it should generally be recoverable (the data is copied to several places and versioned, so accidental corruption can be rolled back).


Not sure about a 30-year timeframe. 10 years seems like it wouldn't be any kind of problem (personal opinion), but 30 years is an awfully long time.

That being said, the SQLite development community is very, very big on backwards compatibility, e.g.:

https://youtu.be/ZvmMzI0X7fE?t=19m56s

Hmmmm, there's a better talk by D. Richard Hipp around, but I'm not finding it atm. In the better talk he explains they (SQLite) are planning to be around for the 30 year timeframe you mention. Something to do with an airplane manufacturer looking for long term guarantees, but I don't remember the details.


> Hmmmm, there's a better talk by D. Richard Hipp around, but I'm not finding it atm. In the better talk he explains they (SQLite) are planning to be around for the 30 year timeframe you mention. Something to do with an airplane manufacturer looking for long term guarantees, but I don't remember the details.

Here's the video[1] linked to from a tweet[2] linked to above.[3]

[1]: https://www.youtube.com/watch?v=Jib2AmRb_rk&feature=youtu.be...

[2]: https://twitter.com/copiousfreetime/status/67583454330408140...

[3]: https://news.ycombinator.com/item?id=13569301


Excellent, thanks. Was hoping someone would dig that up. :)


sounds like solid backing. awesome.


You'll be OK. The first answer now is about the link between SQLite and the Airbus A350, but even if that weren't the case... Make regular dumps in plain SQL [0] and the data will survive changes in the architecture of your host system, and even the end of SQLite. In the worst case you'll have to write a script to upload them to whatever db we'll be using by then. Or the db will look at the file and figure out what to do with it :-)

[0] The .dump command, at https://www.sqlite.org/cli.html


Take a virtual machine image which can read your data, and store that. If you have everything you need, down to the kernel, you'll know you can still access the data.

Experience with emulators tells us that in 30 years someone will definitely have a good-quality 32-bit Intel emulator capable of running Linux. Such things already exist, even in JavaScript, and I'm sure someone (or many someones) will want a good-quality emulator, for running old games if nothing else!


Why not keep both an SQLite db and the text files? Keep a backup of both and have a script dump the db data to text every n days. Then tar/zip it.


C will definitely be around in 30 years, and the source of SQLite is a single C file; if you want to be really careful, just store both the C source and your data?


> Is sqlite good for long-term persistence?

More than any data format in existence.

SQLite is used in every single mobile phone on the planet.

For those of you who are not familiar with D. Richard Hipp's work, SQLite is the best tested and supported library in the history of software. Look at the test suite and be amazed.


The LTS link posted by dguo summarizes their testing methodology - definitely impressive.

When I originally asked this question I knew SQLite was everywhere (I used to do Android dev) and didn't expect it to (edit: completely) disappear even if it lost favour, but I had no idea it was THAT ubiquitous - I'm completely convinced it's here for a few decades at least. I'm happy with that.


If I had to bet on anything still being here and supported in 30 years, it would definitely be sqlite3 (though maybe not git), given how much it's used in literally everything and how much consideration the SQLite team has given to support and compatibility.


Thankfully I use git on a daily basis, so I would be immediately aware of its obsolescence. I can't say the same for SQLite (yet), so I would probably be blind to it slowly losing favour (should that happen).

tl;dr: git is not the risk for me here.


I can't imagine that you won't be able to run some sort of old Linux emulator with SQLite in 30 years. A simpler format is better, but SQLite is good too.

Even better if you keep SQL dumps with your backups every few months. Those are going to be readable much more easily.


If you are worried about it, just make a "backup" system which will dump your sqlite DB to CSV or "text" or even raw SQL commands every so often.

That way you get all the convenience of whatever tools you want right now, plus a very easily "parsable" copy that you could write something against to import into whatever you're using in the future.


Personally, I wouldn't trust today's sqlite files to be readable in 30 years.

+1 to text-based formats. For long text I'd choose Markdown. For anything with a more nested structure, YAML. YAML can do references between objects as well.

I'm not sure what you mean by "the kind of organisational power I'm looking for".


Wasn't aware YAML could do that. As for the organisational power: my prime example, which I've only mentioned at a high level so far, would be recording all my travel information, because every time I have to fill out a visa form that asks "which countries have you visited in the last 10 years?" I usually end up going through passports. But this is just the start; I will later want to make indexes for photo/video archives, etc.

But back to the example at hand:

If I want to know which year I went to country X: boom, query.

If I want to know where I went between 2005 and 2008: boom, query.

If I want to know when I used a certain visa (e.g. a multi-entry visa): boom, query.

This gets difficult with flat files. I'm not saying it's impossible with something like denormalised CSV, but I'd have to write my own scripts (or string together a frustrating sequence of bash commands) for these things - see the sketch below.
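A rough sketch of what such a schema and those queries could look like, using Python's built-in sqlite3 module (the table and column names here are purely illustrative, not a recommendation):

    import sqlite3

    conn = sqlite3.connect("travel.db")       # hypothetical database file
    conn.execute("PRAGMA foreign_keys = ON")  # enforce passport -> visa -> trip links
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS passport (id INTEGER PRIMARY KEY, number TEXT,
                                             issued DATE, expires DATE);
        CREATE TABLE IF NOT EXISTS visa (id INTEGER PRIMARY KEY,
                                         passport_id INTEGER REFERENCES passport(id),
                                         country TEXT, kind TEXT);
        CREATE TABLE IF NOT EXISTS trip (id INTEGER PRIMARY KEY,
                                         visa_id INTEGER REFERENCES visa(id),
                                         country TEXT, arrived DATE, departed DATE);
    """)

    # Which year did I go to country X? Boom, query.
    conn.execute("SELECT strftime('%Y', arrived) FROM trip WHERE country = ?", ("Japan",))

    # Where did I go between 2005 and 2008? Boom, query.
    conn.execute("SELECT DISTINCT country FROM trip "
                 "WHERE arrived BETWEEN '2005-01-01' AND '2008-12-31'")

    # When did I use a multi-entry visa? Boom, query.
    conn.execute("SELECT trip.arrived FROM trip "
                 "JOIN visa ON visa.id = trip.visa_id WHERE visa.kind = 'multi-entry'")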


> text files don't give the kind of organisational power i'm looking for

Somewhere, somewhen, someone put together something (a bunch of shell scripts?) that gave you some fraction of a database's power on plain text files.

(Somewhat fuzzy, I know ... wetware not competitive these days.)


Partially responded to this in bjt's comment.

My original premise for SQLite was that it works in-place on the file (no daemon, etc.); until this Ask thread I never knew SQLite had such longevity planned.

Edit: I don't doubt it either. I could probably, gun to my head, do this with CSVs and bash kung-fu, but I'd rather not because it's a barrier to usage.


This may have been what my memory was dredging up - "fsdb, a flat-text database for shell scripting": http://www.isi.edu/~johnh/SOFTWARE/FSDB/Fsdb-2.23_README.htm...


You can use whatever format you want if you also archive (1) the software needed to read whatever formats you use and (2) a computer that can run that software. A well-built computer should easily survive 30 years of non-use if stored properly.


This is a bit extreme for my use case, but I do agree that while it's possible, it's also risky - what happens if the basement with the computer floods, or similar? - whereas with a data-only strategy you can replicate the data pretty easily.


SQLite database files can (rarely) become corrupt, which could cause you to lose the whole file. I'd lean towards text and individual files myself.


The most likely cause of data loss is probably accidental deletion, so keep backups and make sure they work.


Don't use SQLite for that, or any other application-specific format that only one application can read.

There is no application that has a 30-year lifespan. Expect to already have massive issues in 10 years. If you depend on a single application, you're screwed.

If you want a readable long-term format, use good old txt/csv/xml. Whatever fancy computers come along in 2050, it's guaranteed they will ship with applications that can read those.

If needs be, gzip it to save space.


In other instances this would be my recommendation, but OP needs to capture relationships between data items in a way that maintains referential integrity. This is a non-trivial challenge, because it requires application logic to enforce that integrity, and therefore requires an application, not just a format.

If the requirement were merely to store some flat or hierarchical data, any run-of-the-mill filesystem would fit the bill, and folder structure alone could be used to express relationships; but without simplifying the data model, that does not seem sufficient.

SQLite provides this capability and logic, and its installed base is very large, which lessens the likelihood that it or compatible systems disappear outright. For your use case, I believe leaning on SQLite is preferable to coming up with a custom format to express data relationships and custom code to enforce them. But consider the points raised elsewhere: plain text remains partially recoverable if damaged, while binary files tend to be more brittle.


> There is no application that has a 30 years lifespan.

There are plenty of applications that have had much longer lifespans.

Usually mainframe rather than microcomputer apps so far, but that's because 30 years ago microcomputers were fairly young.

(Anyway, the SQLite format is a library-specific format of a library used in hundreds [thousands?] of applications, not an application-specific format.)


I intend to support SQLite for 33 more years. I take that commitment seriously and plan accordingly.

But even if my plans do not work out, the SQLite file format is fully documented and relatively simple. See https://www.sqlite.org/fileformat2.html for details. Writing a script to convert an SQLite database into a series of CSV files shouldn't take a competent hacker more than a few hours, or a novice more than a few days.

Reasonable estimates are that there are more copies of SQLite in use today than any library other than zlib. I dare say that SQLite will live as long as or longer than gzip, and should both disappear, I suspect you will have a much easier time recovering content from an SQLite database than you would from a gzip file.
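A hedged sketch of that kind of conversion script, using only Python's standard library (the database file name is an assumption):

    # Sketch: dump every table in an SQLite database to its own CSV file.
    import csv
    import sqlite3

    conn = sqlite3.connect("travel.db")
    tables = [row[0] for row in
              conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

    for table in tables:
        cur = conn.execute("SELECT * FROM %s" % table)  # table names come from sqlite_master
        with open(table + ".csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow([col[0] for col in cur.description])  # header row of column names
            writer.writerows(cur)
    conn.close()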


Oh wow, hello! * blush *

Serious (if redundant) question though: on year 34, is backward compatibility out the window?

(that said, 33 years is probably enough time to make the standard fairly permanent anyway, so even if the developers decide to break it, there will be enough residue that I don't believe the then-old standard could disappear)


Tsss. Amateur mistake.

It doesn't matter how long your brand will live; the problem is how long the version used to make the backup will live.

Let's make a backup with SQLite(Lib) 2017. Sometime in 2030, we try to recover that backup.

First, we realize that SQLite[Lib] 2030 can't read it, because there were subtle changes to the format or the application/library binaries for whatever reason, and it cannot read that file anymore (even though everyone guaranteed it would be readable for the next 33 years).

So we look for a version of SQLite 2017 (or as close as we can get). Maybe we find one, maybe we don't. Let's say we find one... we try it and it doesn't work on Linux Redbian 2030.

So we need to find an old OS from 2017. Let's assume we find one... but it doesn't recognize the current motherboard and cannot run on Intel 96-bit processors.

So we gotta find a VM or an old computer in a garage...

And we'll stop there because it's endless.


> There is no application that has a 30 years lifespan.

You literally made me "chuckle out loud".


You can serialize the content of an SQLite db as a SQL file, i.e. plain text. Therefore, best of both worlds: text, plus you get rich structure, a query language, transactions...


This should be better advertised.

I've known about SQLite for fifteen or so years, and I hadn't read about that.



