
SQLite as an Application File Format (2014) - kjeetgill
https://www.sqlite.org/appfileformat.html
======
anonymouzz
I wish more people would publish data as SQLite databases (if the size
permits, of course; usually it does). It's so much more reliable than CSVs,
which have at least a few dimensions of significant differences
(quoted/unquoted, comma vs semicolon vs tab vs space, headers/no headers,
comments). Not to mention that initial data exploration can be done right in
an SQLite explorer/browser tool.
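For instance, a small dataset can be published as a single, self-describing SQLite file with nothing but Python's standard library (the path and schema here are illustrative):

```python
import os
import sqlite3
import tempfile

# Illustrative path -- a real dataset would ship this .db file directly.
path = os.path.join(tempfile.gettempdir(), "people.db")
if os.path.exists(path):
    os.remove(path)

conn = sqlite3.connect(path)
# Unlike CSV, column names, types, and constraints travel with the data.
conn.execute("CREATE TABLE people(name TEXT NOT NULL, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("alice", 30), ("bob", 25)])
conn.commit()

# Consumers can explore it immediately with SQL -- no dialect guessing.
adults = conn.execute("SELECT name FROM people WHERE age >= 30").fetchall()
conn.close()
```

There is no quoting convention, delimiter, or header ambiguity to negotiate: the file either opens as a database or it doesn't.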

~~~
_wmd
As much as I love SQLite, and while it is open source, it is a single
implementation that AFAIK has no published open specification. The only way to
read an SQLite file is using SQLite, and in that respect, it is for many users
just as closed as wrapping something in a word document.

CSV isn't perfect, but it provides a ton of flexibility. For example, CSVs can
be streamed, or support parallel segmented download across a network with
useful work possible during the transfer. The format is so simple that it can
be almost free to parse (see e.g. my own
[https://github.com/dw/csvmonkey](https://github.com/dw/csvmonkey) ).

CSV is also distinguished in that regular home users with spreadsheet programs
can usually do most things a developer can do with the same file. For me user
empowerment trumps all other goals in software, including warts. Things like
JSON, XML or SQLite definitely don't fit in that category, although I guess
SQLite is at least better due to the wide availability of decent GUIs for it.

Finally as a data transfer format, SQLite has the potential to be massively
inefficient. Done incorrectly it can ship useless indexes that can inflate
size >100%, and even in the absence of those, depending on how amenable the
data is to being stored in a btree and the access patterns used to insert it,
can leave tons of wasted space inside the file, or AFAIK even chunks of
previously deleted data.
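That wasted space is observable directly: SQLite keeps freed pages on an internal freelist, and the file does not shrink until a VACUUM rebuilds it. A minimal sketch with Python's built-in sqlite3 module (file name is illustrative):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.gettempdir(), "bloat_demo.db")
if os.path.exists(path):
    os.remove(path)

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE t(v TEXT)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [("x" * 500,) for _ in range(1000)])
conn.commit()
size_full = os.path.getsize(path)

conn.execute("DELETE FROM t")  # pages move to the freelist...
conn.commit()
free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]
size_after_delete = os.path.getsize(path)  # ...but the file stays big

conn.execute("VACUUM")  # rebuild the file from live content only
conn.close()
size_after_vacuum = os.path.getsize(path)
```

Shipping such a file without vacuuming it first transfers all of that dead weight over the wire.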

~~~
darkpuma
> _" As much as I love SQLite, and while it is open source, it is a single
> implementation that AFAIK has no published open specification. The only way
> to read an SQLite file is using SQLite, and in that respect, it is for many
> users just as closed as wrapping something in a word document."_

That's an extreme position to take, particularly since the SQLite code is
public domain. Furthermore it's one of the formats recommended by the Library
of Congress for archival/data preservation:

[https://www.loc.gov/preservation/resources/rfs/data.html](https://www.loc.gov/preservation/resources/rfs/data.html)

[https://www.sqlite.org/locrsf.html](https://www.sqlite.org/locrsf.html)

~~~
_wmd
> The only way to read an SQLite file is using SQLite

This part unfortunately isn't a position; it's absolute. It's hard to imagine
a situation where, as developers, we would not have access to a C runtime or
for any reason whatsoever would not be able to use SQLite, but the hard
dependency on its code is real, and represents a real hazard in the wrong
environment. A super easy example would be parsing data on, say, a tiny
microcontroller in an IoT device. This can start to hurt quickly:

> Compiling with GCC and -Os results in a binary that is slightly less than
> 500KB in size

Open formats at least give you the option of implementing whatever minimal
hack is necessary to finish your job without, say, introducing some
intermediary to do an upfront conversion, and at least for this reason SQLite
cannot really be considered a perfectly universal format.

~~~
darkpuma
By 'position' I was _very obviously_ referring to

> _" as closed as wrapping something in a word document."_

Sure, CSV makes it trivial to waste your time reinventing the wheel, making
your own parser. The situations where there are technical limitations that
prevent the use of sqlite are becoming vanishingly rare. (Not to mention that
the resources necessary to use sqlite are unrelated to how many
implementations there are or whether it's 'open' or 'closed'.)

~~~
Skunkleton
1) For practical purposes the vast majority of platforms that are
inappropriate for SQLite are not appropriate for CSV.

2) SQLite has no standard. The same is true for CSV in practice. At least
SQLite has a high quality reference implementation.

3) It’s a shame that SQLite doesn’t have a standard of some sort.

------
derefr
Cool fact you might not know: SQLite has direct support for “SQLite archives”
(i.e. an SQLite database containing a conventional table of {file metadata,
BLOB data} records representing archived files) built into its CLI client.
This functionality lives behind the `.archive` dot-command (and the `-A`
option) of the `sqlite3` shell, which exposes command-line functionality
similar to tar(1).

Read more at:
[https://www.sqlite.org/sqlar.html](https://www.sqlite.org/sqlar.html)

The nice thing about this is that, since this “sqlar” table is just one table
in the file, and the commands work whether or not other such tables exist, a
file can be an “SQLite archive” while also being a regular SQLite database
containing other tables at the same time. It’s sort of like how Fireworks used
to add special chunks to PNG—the file was still a PNG, but now it was _also_ a
Fireworks project. (But, in this case, the “chunks” aren’t opaque binaries,
but rather are SQL tables you can manipulate using regular SQL queries.)

And, of course, the SQLar “standard” isn’t all that complex—it’s designed so
that you can easily construct your own SQLite archive files just by issuing
regular DDL+DML queries through the SQLite binding of your language of choice.
(Though, if you want support for inserting files compressed—which the
`.archive` command supports extracting—you'll need a zlib binding as well.)
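As a sketch of that DIY route, here is the `sqlar` table built through plain DDL+DML, with the stdlib zlib standing in for "the SQLite binding of your language of choice" (the schema is reproduced from the sqlar.html page; file name and content are made up):

```python
import sqlite3
import time
import zlib

conn = sqlite3.connect(":memory:")  # a real archive would be e.g. docs.sqlar
# The sqlar table definition, as documented at sqlite.org/sqlar.html:
conn.execute("""
CREATE TABLE IF NOT EXISTS sqlar(
  name  TEXT PRIMARY KEY,  -- name of the file
  mode  INT,               -- unix-style access permissions
  mtime INT,               -- last modification time
  sz    INT,               -- original (uncompressed) file size
  data  BLOB               -- content; zlib-compressed unless sz==length(data)
)""")

content = b"hello, archive! " * 32
packed = zlib.compress(content)
if len(packed) >= len(content):  # store uncompressed if zlib doesn't help
    packed = content
conn.execute("INSERT INTO sqlar VALUES (?,?,?,?,?)",
             ("hello.txt", 0o644, int(time.time()), len(content), packed))
conn.commit()

# Extraction: decompress only when the stored blob is smaller than sz.
sz, data = conn.execute(
    "SELECT sz, data FROM sqlar WHERE name = 'hello.txt'").fetchone()
restored = zlib.decompress(data) if len(data) < sz else data
```

Any other table can coexist in the same file, which is exactly the "also a regular database" property described above.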

------
romwell
I migrated a binary file format to an SQLite-based ORM schema at one of
my jobs. It worked out amazingly well, with (limited) forward and backward
compatibility nearly for free.

A lot of hairy if-else code was made unnecessary.

SQLite is IMHO the best option for data serialization/persistence/interchange
in many cases, especially when some data is a blob, or, say, a huge array of
floating-point numbers that you need to store with full precision (which makes
text-based formats clunky to deal with).
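For example, a float array round-trips through a BLOB bit-exactly with just the stdlib (table and column names are illustrative):

```python
import array
import sqlite3

samples = array.array("d", [0.1, 2.0 / 3.0, 3.141592653589793])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples(id INTEGER PRIMARY KEY, data BLOB)")
# Eight raw bytes per double -- no decimal rounding, no text escaping.
conn.execute("INSERT INTO samples(data) VALUES (?)", (samples.tobytes(),))

blob = conn.execute("SELECT data FROM samples").fetchone()[0]
restored = array.array("d")
restored.frombytes(blob)
```

A text format would either lose precision or need 17 significant digits per value plus escaping rules to achieve the same guarantee.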

------
LeoNatan25
> Single-File Documents. An SQLite database is contained in a single file,
> which is easily copied or moved or attached. The "document" metaphor is
> preserved.

Is this still true by default? With write-ahead logging (WAL) and shared
memory (SHM) enabled, it is no longer safe to consider only the database file.
Pragmas exist to disable those, but people should be familiar with these
features and use them with care. The document should be updated to introduce
these concepts.
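For instance, a document-style application can pin the classic rollback journal so that no -wal/-shm side files appear next to the document (a minimal sketch; the file name is illustrative):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.gettempdir(), "single_file_doc.db")
if os.path.exists(path):
    os.remove(path)

conn = sqlite3.connect(path)
# WAL mode would create single_file_doc.db-wal / -shm beside the document.
# DELETE journaling keeps the document a single file at rest; the rollback
# journal exists only transiently during a write.
mode = conn.execute("PRAGMA journal_mode=DELETE").fetchone()[0]
conn.close()
```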

~~~
adzm
Why would you realistically need wal or shm for an application file format
anyway? That's for high concurrency in general.

~~~
gumby
The article explicitly mentions concurrent access.

------
tommoor
Sketch ([https://sketchapp.com](https://sketchapp.com)) actually used a zipped
SQLite database as its file format until this past year.

~~~
glhaynes
Any idea why they changed?

~~~
tommoor
They changed to a zip of JSON files. I believe the main reason was likely
hackability – one of the main advantages Sketch has in the design ecosystem is
its plugin architecture and the array of tools that exist around it.

If you make your file format easy to work with then it only widens this moat.

------
saagarjha
CoreData has an option to write to a SQLite database, so a lot of data on iOS
and macOS apps ends up being written to disk this way.

~~~
thechao
If you code in C/C++ and have even a modest OS/RTOS available, you should use
SQLite as your persistence layer, full stop. I switched to this methodology 5
years ago & never looked back. You know what happens when another engineer
gets your on-disk format & asks how to look at it? You say “open it in SQLite
browser”; all further questions disappear.

~~~
romwell
Seconding this.

One thing though: last time I did that, I ended up writing my own
ORM/serialization library (with templates and all), but I wish I didn't have
to.

Is there anything freely available out there that would allow one to store
object hierarchies with the ease of boost::serialization, but into an SQLite
EAV scheme?

~~~
thechao
I always _trace_ rather than _log_. That means I have one table per ‘tracing
function’, and just record the arguments to each table. Recording an object
means recording the function calls that generated the object; regenerating the
object is just playing those functions back. I’ll admit that this works far
better in C than C++, as C++ tends to treat new types as opaque primitives.
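A minimal sketch of that trace-and-replay idea in Python (the traced function and table names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table per traced function; columns mirror the function's arguments.
conn.execute("CREATE TABLE trace_add_point(x REAL, y REAL)")

def add_point(obj, x, y, record=True):
    """Hypothetical traced function: mutating the object also records the call."""
    if record:
        conn.execute("INSERT INTO trace_add_point VALUES (?, ?)", (x, y))
    obj.append((x, y))

original = []
add_point(original, 1.0, 2.0)
add_point(original, 3.0, 4.0)

# Regenerating the object is just playing the recorded calls back:
regenerated = []
for x, y in conn.execute("SELECT x, y FROM trace_add_point"):
    add_point(regenerated, x, y, record=False)
```

The persisted state is then the call history rather than a snapshot of the object.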

~~~
trampi
Sounds like
[https://martinfowler.com/eaaDev/EventSourcing.html](https://martinfowler.com/eaaDev/EventSourcing.html)

------
tedunangst
I believe ZIP supports updates about as well as SQLite. You append the new
file to the end (and write the "header" after that). This leaves a hole in the
middle, but that can be reused by a later addition. There are some details
about internal page size and fragmentation, but I think it could amortize to
about the same. There's something to be said about quality of implementation,
but that's not an argument about the file format per se.

~~~
manigandham
How? There's a massive difference in usability and performance with a SQL
database and all the data querying and manipulation it provides compared to a
ZIP file that just straps together a bunch of compressed payloads with a
key/value file structure.

The linked page even has a case study of the OpenDocument file format that
shows exactly how SQLite is better than a Zip file:
[https://www.sqlite.org/affcase1.html](https://www.sqlite.org/affcase1.html)

~~~
tedunangst
If you're using SQLite as a pile of files, you're not doing much querying of
the compressed data inside.

> Newer machines are faster, but it is still bothersome that changing a single
> character in a 50 megabyte presentation causes one to burn through 50
> megabytes of the finite write life on the SSD.

That's funny, since Firefox would burn through gigabytes of writes to SQLite.
It all comes down to quality of implementation.

~~~
manigandham
The point is that you don't just use it as a pile-of-files when you have SQL
access to the data and instead use a high-level data model. This is not
possible at all with ZIP.

------
qlk1123
When using Kubernetes, I always wonder if we could transform all the tedious
"kubeconfig"s into a collection that can be easily queried, modified, saved
and retrieved, rather than maintaining small text pieces in some directories.
More ambitiously, maybe include container/image metadata as well?

After reading this document, well, I think I will give it a try. It might not
be a perfect fit, but I don't much like YAML.

~~~
tyingq
LXD uses distributed SQLite for its config:
[https://github.com/lxc/lxd/blob/master/doc/database.md](https://github.com/lxc/lxd/blob/master/doc/database.md)

And there's this simplified fork of K8S that replaces etcd with sqlite:
[https://github.com/ibuildthecloud/k3s/blob/master/README.md](https://github.com/ibuildthecloud/k3s/blob/master/README.md)

~~~
cyphar
It should be noted that the distributed SQLite is a patchset written by the
LXD folks. Last time I talked to them, they said they're working on upstreaming
it but want to be sure that everything is sane (SQLite has crazy backwards-
compatibility requirements, not to mention the testing).

------
qwerty456127
Yeah, I have always felt this is a great idea for almost anything that is not
so small and simple that plain text would make more sense. BLOB-by-nature data
like multimedia, pictures, etc. is the only reasonable exception.

~~~
sjwright
The third great file format (after text and SQLite) is the zip file with
custom suffix. That is ideal for BLOB-by-nature data.

~~~
qwerty456127
I haven't really studied the formats myself, but as far as I know the 7z
container format is way better than zip.

~~~
sjwright
Very likely true however zip has somehow become a de-facto go-to standard for
this stuff. From Java (JAR files) to Microsoft (.docx et al) and numerous open
formats including ODF and EPUB.

------
zubairq
Using SQLite as our format has worked well for us at dannea. It also means we
can have web-only documents, since SQLite can be embedded in a web page.

~~~
tenken
> since sqlite can be embedded in a web page

How so? ... beyond querying the database server-side ...

~~~
lioeters
Perhaps they're using SQLite compiled to JS via Emscripten:
[https://github.com/kripken/sql.js/](https://github.com/kripken/sql.js/)

~~~
zubairq
Yes, that is correct: at Dannea we use SQLite compiled to JS. We can even load
the data from the same HTML page, so it allows us to have SQLite-powered web
apps which can be emailed to others and are truly serverless.

------
phreack
I've used SQLite for a distributed project lately where different offline
teachers in rural schools would back up their app data to a thumb drive as
file databases, take it back to a town with connectivity and upload that data
to the app server, so they'd get extra functionality and updates. It was
incredibly straightforward to do and worked wonders!

------
Const-me
It's a good idea unless you're exchanging these files. I'd love for a couple
of my clients to switch from HDF5 to SQLite; HDF5 is horrible.

But if you exchange them, it's too easy for app developers to misuse the
library, causing users' data to be leaked to third parties in free DB pages,
and even in the way the DB is fragmented.

~~~
romwell
What you suggest sounds like security by obscurity.

~~~
naniwaduni
Based on the mention of free DB pages, I suspect the issue is information
leaking because it isn't scrubbed after being deleted, so that it's still in
the db file when it's sent off to a third party, just not "accessible" via
SQLite APIs. For example, a redaction procedure which consists of deleting one
table that contains all the sensitive data doesn't necessarily _erase the
data_; you'd probably want to VACUUM or reconstruct the database from a dump.
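A minimal sketch of that redaction pitfall and the VACUUM fix, using only the stdlib (file name and the "secret" token are made up):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.gettempdir(), "redaction_demo.db")
if os.path.exists(path):
    os.remove(path)

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE notes(body TEXT)")
conn.execute("INSERT INTO notes VALUES ('SECRET-TOKEN-123')")
conn.commit()

conn.execute("DELETE FROM notes")  # "redacted" -- but only from query results
conn.commit()
conn.close()

# The raw bytes of the deleted row typically linger on freed pages:
leaked = b"SECRET-TOKEN-123" in open(path, "rb").read()

conn = sqlite3.connect(path)
conn.execute("VACUUM")  # rebuild the file from live content only
conn.close()
scrubbed = b"SECRET-TOKEN-123" not in open(path, "rb").read()
```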

Moreover, it's a general issue with file formats that can encode the same data
in multiple ways that, in addition to the data itself, the data file also
incidentally encodes information about the process by which the data was
encoded. This metadata is usually ignored and abstracted out by APIs for
working with the data, so application developers tend to overlook it as a
place where sensitive data can leak.

For comparison, you might consider fingerprinting API clients based on the
order in which they send HTTP headers or keys in JSON objects, which can often
be correlated with language or library versions in environments where the most
convenient map structure is an arbitrarily ordered hash.

It's not necessarily easy to think of nefarious uses for this sort of
information, but SQLite database fragmentation can reveal, albeit in rather
rough detail, the order in and frequency with which changes were made.

Quite the opposite of "security by obscurity", in fact.

~~~
justinclift
The SECURE_DELETE pragma sounds like it'd be useful for this:

[https://sqlite.org/pragma.html#pragma_secure_delete](https://sqlite.org/pragma.html#pragma_secure_delete)

Needs to be manually enabled though, as that doc says it's generally not
enabled by default. :/
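Enabling it is one statement per connection (minimal sketch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Overwrite deleted content with zeros as it is deleted, instead of
# leaving it on free pages. Off by default in most builds, so it must
# be requested explicitly; the pragma echoes back the new setting.
enabled = conn.execute("PRAGMA secure_delete=ON").fetchone()[0]
```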

~~~
Const-me
This indeed helps, but only partially. It will still leak data in the way the
DB is fragmented.

E.g. if you have a text document with 1 row per paragraph, fragmentation will
reveal where it was edited, and what kinds of edits were made. If you insert a
bunch of images into the document, the physical order of pages will reveal the
order in which they were inserted, even if you re-arrange them afterwards.

It's possible to work around this with careful programming, but still, it's
very easy to screw up and leak data.

OTOH, with the zipped XMLs people normally use instead, you overwrite the
complete file each time, so it's not fragmented and hopefully doesn't contain
any extra info.

------
henridf
For those wishing to see Sqlite used to publish data sets:

Datasette
([https://github.com/simonw/datasette](https://github.com/simonw/datasette))
is a new tool to publish data on the web. It uses SQLite under the hood.

------
hyperpallium
Why not a text serialization format that SQLite can read/write?

e.g. tables and schema in JSON.

A table as a list of row-objects of attributes (or, more efficiently, an
object of lists; or, following the schema, just a list of rows, each a list of
attribute values).
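One way the row-objects variant could look with the stdlib (the table and the wrapper keys are illustrative, not any established format):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points(x INTEGER, y INTEGER)")
conn.executemany("INSERT INTO points VALUES (?, ?)", [(1, 2), (3, 4)])

# Schema plus rows, serialized as a list of row-objects:
cols = [row[1] for row in conn.execute("PRAGMA table_info(points)")]
doc = {
    "schema": conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'points'").fetchone()[0],
    "rows": [dict(zip(cols, r)) for r in conn.execute("SELECT * FROM points")],
}
text = json.dumps(doc, indent=2)
```

The trade-off is that types, indexes, and constraints now need their own conventions on top of JSON, which the binary file carries for free.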

------
nafey
I was thinking of using SQLite for creating game saves. Does anyone have
experience with that? The more I think about it the more I wonder why we don't
see SQLite in games more often.

~~~
justinclift
It has been used for game saves before. Not sure how commonly, but we
(sqlitebrowser.org) do have links to our website from various places giving
tips on how to change things (in the saves). ;)

------
MikusR
[https://news.ycombinator.com/item?id=18685296](https://news.ycombinator.com/item?id=18685296)

~~~
oconnor663
I'm surprised this is the only comment in the thread that mentions this
security issue. It's only a week old!

------
hestefisk
I have done this on a large .NET Win32/XAML client / server application in two
ways:

1) Saving application state and config/settings (window layout, ... very easy
to marshal XAML state to an SQLite file)

2) Caching common data sets to avoid unnecessary polling of the server for
data that is accessed all the time (i.e. customer master data)

It works beautifully.

------
ramraj07
What if I need to store binary data like images or PDFs? Is storing them as
blobs an acceptable method?

~~~
combatentropy
In SQLite, a database file can be 140 terabytes,
[https://www.sqlite.org/limits.html](https://www.sqlite.org/limits.html)

And reading and writing can be even faster than if they were stored directly
in the filesystem,
[https://www.sqlite.org/fasterthanfs.html](https://www.sqlite.org/fasterthanfs.html)
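Storing and retrieving a file as a BLOB is a round-trip of two statements (sketch; the PDF bytes here are a placeholder, not a real document):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real app would use a file database
conn.execute("CREATE TABLE docs(name TEXT PRIMARY KEY, body BLOB)")

pdf_bytes = b"%PDF-1.7\n...placeholder content..."  # stands in for a real file
conn.execute("INSERT INTO docs VALUES (?, ?)", ("report.pdf", pdf_bytes))
conn.commit()

stored = conn.execute(
    "SELECT body FROM docs WHERE name = ?", ("report.pdf",)).fetchone()[0]
```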

~~~
qwerty456127
Why don't we replace entire filesystems with SQLite then?

~~~
darkpuma
> _" This file contains some example code demonstrating how the SQLite vfs
> feature can be used to have SQLite operate directly on an embedded media,
> without using an intermediate file system."_

[https://www.sqlite.org/src/doc/trunk/src/test_onefile.c](https://www.sqlite.org/src/doc/trunk/src/test_onefile.c)

[https://www.sqlite.org/vfs.html](https://www.sqlite.org/vfs.html)

IIRC, during one of his talks D. Richard Hipp mentions this actually being
done for real.

------
geocar
You can't stream SQLite files.

~~~
SQLite
I'm not sure what you mean by "stream" here. However, there are two APIs
designed to facilitate sending SQLite database files over a socket. The
sqlite3_serialize() interface
([https://www.sqlite.org/c3ref/serialize.html](https://www.sqlite.org/c3ref/serialize.html))
takes a database and converts it into a blob of contiguous memory that can be
sent over a wire. The sqlite3_deserialize() interface
([https://www.sqlite.org/c3ref/deserialize.html](https://www.sqlite.org/c3ref/deserialize.html))
does the opposite, accepting a blob and using it as a database file. These
interfaces were created for the purpose of "streaming" database files across
network connections. Depending on your specific needs, they might work for
you.
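Some bindings expose these C interfaces directly; for example, Python 3.11+ wraps them as Connection.serialize()/deserialize(). A sketch, guarded because older Pythons lack the methods:

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t(x)")
src.execute("INSERT INTO t VALUES (42)")
src.commit()

value = None
if hasattr(src, "serialize"):      # Python 3.11+ wraps sqlite3_serialize()
    blob = src.serialize()         # the whole database as one bytes object
    # ...send blob over a socket, then on the receiving side:
    dst = sqlite3.connect(":memory:")
    dst.deserialize(blob)          # wraps sqlite3_deserialize()
    value = dst.execute("SELECT x FROM t").fetchone()[0]
```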

~~~
shaklee3
I'm assuming they mean that some parts of the file reference parts that appear
farther on, so you'd have to receive the whole file?

------
revskill
YAML is stupid. JSON is fine, because it can be `import`ed from JavaScript as
a real JavaScript object. In general, make any configuration a JavaScript
object and you can program it however you want.

~~~
dagenix
1. "YAML is stupid" is not constructive. Why is it "stupid"?

2. One place where YAML is nice is if you want to have a user-edited file.
JSON can be a pain to edit by hand due to its lack of comments and its
pickiness about where commas can and cannot go - YAML isn't perfect, but it
does help with issues like that.

3. There are languages outside of JavaScript, and it's not possible to import
a JavaScript file directly from most of those languages.

4. There is a difference between the file that you use to configure an app -
for which JSON or YAML may be a good fit - and the format in which you export
data - where JSON or YAML may or may not be a good fit.

5. JSON has some very real issues with different parsers not implementing the
JSON corner cases the same way - which doesn't really matter, until you hit
one of those cases, and then it really matters.

6. JSON doesn't support streaming access to file data - it's not impossible
to process a JSON file incrementally, but it's hard to do, as JSON is easiest
used when you can load the whole thing into memory.

~~~
hliyan
I've been trying to decipher this irrational hatred of YAML coming from a
certain subset of developers for years. I have only three or four data points
so far, but all of them were very gifted, opinionated developers who were
unable to explain their hatred.

Recently, it began to dawn on me that _maybe_ the problem is one of the visual
acuity required for discerning whitespace. I always assumed that because I
could visualize a YAML hierarchy just as well as I could JSON, and could edit
YAML without messing up the indentation, others could too. But I've witnessed
highly capable developers bang their heads against the keyboard trying to fix
repeated parsing errors in YAML they've written. Perhaps not everyone can pick
out whitespace quite so easily...

~~~
ojosilva
My experience failing with YAML: I've converted a visual workflow tool into a
YAML file mode so I could give my users a file format they could edit, share
and version with any text editor and commit to a git repo.

The result was that they were at a loss trying to figure out the free-form map
and array structures. They could not grasp when to use dashes (arrays) mixed
with maps (array of maps) and would fail to indent structures all the time.
Trying to create meaningful error reporting was very hard since their mistakes
were also valid YAML.

YAML is definitely powerful but can become too complex too fast. And being a
format full of sigils, with a TIMTOWTDI ("there's more than one way to do it")
approach, it can easily enrage lean purists and complexity detractors, hence
the hatred.

------
xte
Mh,

> Simplified Application Development

It depends; "application development" is a vast area...

> Single-File Documents

Not much related to SQLite; this can be done, or not done, with nearly any
format you like. Also remember, as a cautionary tale, how badly things went
for Catia v6 R2009 (Catia is the best CAD/CAE/CAM suite in the world, used
everywhere from automotive to aerospace & defence; its sole real competitors
are Creo (formerly Pro/E) and NX) when it started to offer single-file PLMs
for "light usage"...

> High-Level Query Language

Depending on the format you choose, you may already have the best/highest-level
query language: your own app's programming language itself, so IMO this may
not be an SQLite plus at all.

> Accessible Content

Any binary content is FAR LESS accessible than text.

> Better Applications

Sorry, that has no meaning at all.

I like SQLite in many respects, but I certainly don't look at it as a file
format for most applications I use or can imagine. For some usages it is
really nice; for other usages {Tokyo,Kyoto}Cabinet is another nice format; for
MANY other usages, text files in various formats, or binary files in various
formats (including the classic pickle), may be better...

