
SQLite small blob storage: 35% Faster Than the Filesystem - i_feel_great
https://www.sqlite.org/fasterthanfs.html
======
rusanu
I must point out Jim Gray's paper To Blob or Not To Blob[0]. His team
considered NTFS vs. SQL Server, but most rationale applies to any filesystem
vs. database decision.

The summary was "The study indicates that if objects are larger than one
megabyte on average, NTFS has a clear advantage over SQL Server. If the
objects are under 256 kilobytes, the database has a clear advantage. Inside
this range, it depends on how write intensive the workload is," but keep in
mind this is spinning media from 2006. Modern SSDs change the equation quite a
bit, as they are much friendlier to random IO and benefit less from a
database's write-ahead log and buffer-pool behavior.

Also, when deciding between blobs vs. the filesystem, blobs bring
transactional and recovery consistency. The DB is self-contained, and all
blobs are contained in it. A restore of the DB on a different system yields a
consistent system: it won't have links to missing files, and there won't be
orphaned files left over (files not referenced by any record in the DB).

Despite all this, my practical experience is that the filesystem is better
than blobs for things like uploaded content: images, PNGs and JPEGs, etc.
Blobs bring additional overhead, require bigger DB storage (usually more
expensive, think AWS RDS), and the increased size cascades into operational
overhead (bigger backups, slower restores, etc.).

[0] [https://www.microsoft.com/en-us/research/publication/to-
blob...](https://www.microsoft.com/en-us/research/publication/to-blob-or-not-
to-blob-large-object-storage-in-a-database-or-a-filesystem/)

~~~
patio11
My experience (for an application which had a working set of under 1 GB of
files in the 50 KB to N MB range, and approximately 50 GB persisted at any
given time) was that preserving access to the toolchain which operates
trivially with files was worth the additional performance overhead of working
with the files and, separately, occasionally having to retool things to e.g.
not have 10^7 files in a single folder, which is something that Linux has
some opinions (none of them good) about.

Trivial example: it's easy to delete any file most recently accessed more than
N months ago [+] with a trivial line in cron (the exact line escapes me -- it
involves find), but doing that with a database requires that you roll your own
access tracking logic. Incremental backups of a directory structure are easy
(rsync or tarsnap); incremental backups of a database appeared to me to be
highly non-trivial.

[+] Since we could re-generate PDFs or gifs from our source of truth at will
(with 5~10 seconds of added latency), we deleted anything not accessed in N
months to limit our hard disk usage.

~~~
troutaway123
¿Por qué no los dos? (Why not both?)

Store the files in a database but expose them via FUSE.

~~~
pjc50
That gives you the time cost of a filesystem plus the time cost of a database
plus an extra round-trip in and out of kernel space.

~~~
troutaway123
> preserving access to the toolchain which operates trivially with files was
> worth the additional performance overhead of working with the files

Sounds like they're OK with that.

In any case, couldn't you avoid the kernel trips with some dylib foolery
(assuming the toolchain is dynamically linked)?

~~~
firethief
So: store the files in a database, expose them via FUSE, bypass FUSE and
access the database?

~~~
troutaway123
Yep ;D

------
craigds
For web map tiles (millions of tiny PNGs), everyone who's anyone stores their
tiles in SQLite rather than on disk: [https://www.mapbox.com/help/an-open-
platform/#mbtiles](https://www.mapbox.com/help/an-open-platform/#mbtiles)

~~~
asmosoinio
Interesting and very relevant to the discussion!

Hmm... Does SQLite have some geo data or 2d coordinate lookup capabilities as
well?

~~~
shawn-butler
Try SpatiaLite[0] if you need advanced geometry capabilities. It has R-Tree
spatial indexes if you use SQLite 3.6+, and a rudimentary MBR index for prior
versions. It also has rudimentary support for curves.

[0]: [https://www.gaia-gis.it/fossil/libspatialite/index](https://www.gaia-
gis.it/fossil/libspatialite/index)
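
Worth noting: SQLite also ships a built-in R*Tree module (when compiled with
SQLITE_ENABLE_RTREE) that handles plain 2D bounding-box lookups without
SpatiaLite. A minimal sketch, with an illustrative table and coordinates and
error handling omitted:

    #include <sqlite3.h>
    #include <stdio.h>

    /* Print the id of each tile whose box intersects the query box. */
    static int print_id(void *arg, int ncols, char **vals, char **names) {
        (void)arg; (void)ncols; (void)names;
        printf("tile %s\n", vals[0]);
        return 0;
    }

    int main(void) {
        sqlite3 *db;
        sqlite3_open(":memory:", &db);
        sqlite3_exec(db,
            "CREATE VIRTUAL TABLE tiles USING rtree(id, minX, maxX, minY, maxY);"
            "INSERT INTO tiles VALUES(1, 13.3, 13.4, 52.4, 52.5);", 0, 0, 0);
        /* Classic R-tree intersection test against a search box. */
        sqlite3_exec(db,
            "SELECT id FROM tiles WHERE maxX >= 13.35 AND minX <= 13.45"
            " AND maxY >= 52.45 AND minY <= 52.55;", print_id, 0, 0);
        sqlite3_close(db);
        return 0;
    }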

~~~
justinclift
As a data point, one of our users figured out how to load the SpatiaLite
extension for DB Browser for SQLite (on Windows), then wrote up step by step
instructions:

[https://github.com/sqlitebrowser/sqlitebrowser/wiki/SpatiaLi...](https://github.com/sqlitebrowser/sqlitebrowser/wiki/SpatiaLite-
on-Windows)

(people had been asking previously, but we had no idea. ^\o/^)

------
tarasglek
This is weird benchmarketing. They are comparing reading/writing 100,000
individual files vs. writing 100,000 entries into a single file (SQLite
database). For comparison one could concatenate the same data into a single
big file even faster than into sqlite.

They also don't offer a logical analysis of why things are faster.

My understanding is that reads are probably faster due to operating system
readahead being able to predict reads better when they are within a single
file. Writes are faster because they do one bulk COMMIT instead of many
individual fsyncs.
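
As a sketch of that write-side pattern through the C API (the schema and blob
size are my own illustration; error handling omitted):

    #include <sqlite3.h>

    /* Group all inserts under one transaction so the journal is synced
     * once at COMMIT rather than once per item. */
    void write_blobs(sqlite3 *db, int n) {
        sqlite3_stmt *st;
        sqlite3_exec(db, "BEGIN", 0, 0, 0);
        sqlite3_prepare_v2(db,
            "INSERT INTO blobs(id, data) VALUES(?1, ?2)", -1, &st, 0);
        for (int i = 0; i < n; i++) {
            sqlite3_bind_int(st, 1, i);
            sqlite3_bind_zeroblob(st, 2, 10 * 1024); /* stand-in 10 KB blob */
            sqlite3_step(st);
            sqlite3_reset(st);
        }
        sqlite3_finalize(st);
        sqlite3_exec(db, "COMMIT", 0, 0, 0); /* the one durable sync */
    }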

~~~
imron
> For comparison one could concatenate the same data into a single big file
> even faster than into sqlite.

One could, but then one would have great difficulty retrieving the individual
files back when needed.

The point is not about what has the greatest raw speed; the point is that
applications that read lots of small files from the filesystem will possibly
get better performance, and will almost certainly use less disk space, by
sticking the files in SQLite instead.

~~~
nimchimpsky
> have great difficulty retrieving

Not that great, and the point is it's comparing apples to oranges, which
makes it pointless.

~~~
function_seven
Writing your own methods to retrieve arbitrary chunks of data from a
monolithic file would be a lot of work. Using map tiles as the example, how do
you easily retrieve just the tiles in a specific region? Ok, how about all
tiles that have the “hasLand” attribute set? Or the “containsCoastline”
attribute?

You’d end up rewriting your own version of a database.

~~~
inetknght
> Writing your own methods to retrieve arbitrary chunks of data from a
> monolithic file would be a lot of work.

Oh goodness have we come a long way if interacting with a file is harder than
interacting with a SQL database.

(it really depends on what language and/or library(ies) you use and for what
purpose)

~~~
imron
> Oh goodness have we come a long way if interacting with a file is harder
> than interacting with a SQL database.

It's not interacting with a file, it's interacting with a file that contains
other files of varying sizes that need to be accessed randomly with good
performance.

~~~
diroussel
If you only want to read then both zip and tar are good for this purpose.
They're commonly used and known to work well. There are many formats that are
really just a renamed zip file, like jar and epub. File entries don't have to
be compressed, so you can mmap them.

If you want to do random writes then more consideration is needed.

~~~
imron
> If you only want to read then both zip and tar are good for this purpose

And programmatically accessing them is significantly more complicated than
using SQLite, which involves dropping a single header in your project and
about 10 lines of code.

> If you want to do random writes then more consideration is needed.

Which is almost certainly what you want to do for the sorts of use cases
where SQLite is also under consideration.
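
For the skeptical, a sketch of roughly what those ten lines look like (the
table name and schema are mine for illustration; error handling omitted):

    #include <sqlite3.h>
    #include <stdio.h>

    int main(void) {
        sqlite3 *db;
        sqlite3_stmt *st;
        sqlite3_open("files.db", &db); /* hypothetical blob store */
        sqlite3_prepare_v2(db,
            "SELECT data FROM blobs WHERE name = ?1", -1, &st, 0);
        sqlite3_bind_text(st, 1, "tile.png", -1, SQLITE_STATIC);
        if (sqlite3_step(st) == SQLITE_ROW) /* dump the blob to stdout */
            fwrite(sqlite3_column_blob(st, 0), 1,
                   sqlite3_column_bytes(st, 0), stdout);
        sqlite3_finalize(st);
        sqlite3_close(db);
        return 0;
    }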

~~~
dkersten
_which involves dropping a single header in your project and about 10 lines of
code._

10 lines of user code + the SQLite library. Using other formats like zip is
hardly any more user code (+ some external library).

~~~
imron
The SQLite 'library' is a single header file and a single C file that you can
just drop into your project and use.

~~~
dkersten
So are many libraries for other formats.

Btw I'm personally a big fan of SQLite, I just think the argument here is a
bit of a straw man.

------
dsacco
_> So let your take-away be this: read/write latency for SQLite is competitive
with read/write latency of individual files on disk. Often SQLite is faster.
Sometimes SQLite is almost as fast. Either way, this article disproves the
common assumption that a relational database must be slower than direct
filesystem I/O._

This is under the _1.1 Caveats_ heading, which makes me feel the title is _a
little_ misleading (but only in the way most benchmark headings probably are,
I guess).

Incidentally, can someone more experienced with filesystem and database I/O
confirm or contest the assertion here? Specifically, I'm not sure it's fair to
generalize these results (even if valid) to categorical relational databases.
But this is not a special area of expertise for me.

~~~
peterwwillis
You're asking whether a database can read/write records faster than disk? Yes.
The opposite is also true. I'm not an expert in either fs or db, but it's
obvious that the design and use of one is not the same as the design and use
of the other, and given the right circumstances one will do a particular job
better than the other.

An append-only transaction log is probably going to perform better on a
spinning disk than random writes. An intelligent app sorting and committing
writes in order is probably going to perform better than an fs on a disk with
no queue ordering facility. As they said in the post, block-aligned writes in
bulk are going to be more efficient than spreading out a bunch of writes not
block-aligned. And latency for a db write is going to be lower if you don't
close an fd, assuming that closing an fd would always trigger an fsync.

But a properly tuned filesystem, with the right filesystem, with the right
kernel driver, with the right disk, with the right bus, with the right control
plane, etc etc may work fantastically better than an unoptimized database, and
a given workload may fit that fs perfectly.

It's important to remember that a benchmark is only meaningful to the person
who is running the benchmark. If you want it to be meaningful to you, you have
to try it yourself (and then presumably blog about it).

------
rusanu
From OP: "SQLite is much faster than direct writes to disk on Windows when
anti-virus protection is turned on. Since anti-virus software is and should be
on by default in Windows, that means that SQLite is generally much faster than
direct disk writes on Windows."

I don't get this. If scanning the content is important (as acknowledged by
the author), then bypassing the scan via blob storage is a security issue,
and the application _should_ go through some extra hoops to scan the content
before saving it to a blob; this should be measured and be part of the
comparison.

Also, if the SQLite files are exempt from AV scanning, then a level playing
field would also exempt the uploaded-files folder in the test. I mean,
knowing the dice are loaded and then claiming it as an advantage does not
seem professional.

~~~
Dylan16807
Data chunks inside another file are effectively quarantined by default. There
is no exempting or security issue, it's safe by design. It will get scanned
occasionally but not on every write.

~~~
45h34jh53k4j
Except when they are not. Just because malcode is encapsulated doesn't mean
it is neutralised. The issue is untrusted data, not how it is stored.

Word docs with embedded content can trigger code execution on parsing, data
structures can break AV parsers, and image or font data can cause system
libraries to overflow/UAF/etc. These all could potentially be a security
issue in a DB, the same as in a FS.

~~~
Dylan16807
You don't need to worry about a single process writing data to a file and then
reading it back and performing malicious actions. That process could just as
easily have been exploited entirely in-memory. Scan the database when it exits
and you're fine.

------
cyberferret
Interesting. We built a couple of systems many years ago using SQLite as the
storage engine for binary files - one was a kind of mini version control
utility for storing report files for a legacy ERP system.

I always found the SQLite system far quicker at searching and retrieving
files from the dataset than our previous version, which used hierarchical
file folders.

Interestingly, we canned the project because the ERP vendor themselves
released a similar tool using - wait for it - individual files in a folder
system... (Addendum: much later on they switched to - wait for it - CVS for
version control... in 2010!!)

------
Const-me
SQLite is good when you’re mostly reading.

For writing, the major drawback of SQLite is it doesn’t support concurrent
writes.

All filesystems do (at least when writing different files), all full-fledged
RDBMS-es do, even some embedded databases do (like ESENT). Yet, in SQLite only
a single thread can write.

Even embedded chips are multicore these days…

~~~
JBReefer
Isn't that because SQLite's whole use case is "simple, small, fast database
that competes with fopen() and not Postgres"? At least in my experience on
NTFS, unless you know the section that 2 threads are writing to, and that
section is different, you can't write to a file concurrently (N.B.: in C#).

I'd be happy to be wrong (seriously, it would make my day if someone could
explain a clean way to do the above), but if I want "big boy database
features" then I'd use a heavy database like SQL Server. I think it's great
that there's an option like SQLite out there for rapid prototyping and "store
all your music metadata" type applications!

~~~
Const-me
> "simple, small, fast database that competes with fopen() and not Postgres"

The hardware has evolved such that even very small systems (phones, embedded,
Raspberry Pi and other IoT devices) are now multi-core. Multithreading is
required to benefit from a multi-core CPU. For IO-heavy tasks, it's sometimes
a good idea to implement multi-threaded IO as well.

> you can't write to a file concurrently

I can concurrently write to different files, or to different
streams/blobs/records with most other IO methods, except SQLite.

------
WaxProlix
Consistently happy with SQLite, just a lovely little database. Performance is
basically a cherry on top of the simplicity it provides.

------
cassandra1024
"To blob or not to blob" \- a paper by Microsoft Research from 2006.
[https://www.microsoft.com/en-us/research/publication/to-
blob...](https://www.microsoft.com/en-us/research/publication/to-blob-or-not-
to-blob-large-object-storage-in-a-database-or-a-filesystem/)

------
flohofwoe
Storing many small files in a few (compressed) bundle files has been common
since at least the '90s for storing game assets, though usually not with
SQLite but with some sort of compressed archive format (simple zip archives
are quite common, but there are faster alternatives now). The bundling gives
you better overall compression, _and_ faster read performance compared to
individual files (at least on Windows), as long as you open the file only
once and from then on only seek and read.

SQLite is a _big_ code base, using a simple archive file format instead gives
you most of the advantages, but without the bloat.

PS: This is mostly for read-only situations though. Using SQLite probably
starts to make sense when the application needs to write to and create new
files in the archive.

~~~
rusanu
But that would work only for static content, like the game assets. The
discussion of BLOBs vs. filesystem comes up mostly in the context of content
management and user/app uploaded content.

~~~
flohofwoe
Yes you are right. I wrote the 'PS' before I saw your comment. Apologies :)

------
mpweiher
Not surprised by the results, but very surprised by the explanation given:
the overhead of open/close calls.
As far as I know (from many measurements and talking to kernel people and
researching the mechanism involved), the difference is due to the fact that
buffers are shared between all pages of a single file, and not between files.

So for reads, the filesystem will do read-ahead of significantly more data
than requested and keep that in buffers, and future reads will profit. The
same applies to buffers shared when writing.

The same effect will be reproducible with any format storing multiple objects
in a single file, it has virtually nothing to do with "SQLite" or "Databases".

One tradeoff is the greater potential for inconsistencies, which despite all
the measures taken is higher when you modify a file in place than when you
only ever write complete new files. Another is the inconvenience and
duplication of effort, because all your default file-system management tools
aren't available.

It would be interesting to see if a pseudo-filesystem that is mapped to a
single underlying file would show the same effects (preferably implemented in
user space to avoid overheads of multiple kernel round trips).

~~~
wyoung2
> The same effect will be reproducible with any format storing multiple
> objects in a single file

Given unbounded developer time, any advantage SQLite has here can of course be
matched or beaten by custom code. It's just another C library, not magic.

The real issue is, how much work will it take you to do that?

SQLite is billed as competing with `fopen()`. That's the proper way to compare
it: given equal development time spent talking to the C runtime library vs.
integrating SQLite, how much speed and robustness can you achieve?

> the greater potential for inconsistencies

How? In this application, SQLite is filling a role more like a filesystem than
a DBMS or `fopen()` call. You can corrupt a filesystem just as easily as a
DBMS. SQLite protects itself in much the same ways that good filesystems do,
and it can be defeated in much the same sort of ways.

> Another is the inconvenience and duplication of effort, because all your
> default file-system management tools aren't available.

It's a classic tradeoff: do you need the speed this technique buys or not?

A better reason to avoid this technique is when the files you're considering
storing as BLOBs need to be served by a web server that uses the `sendfile(2)`
system call. There, the additional syscalls caused by the DBMS layer will
probably eat up the speed advantage.
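
To illustrate the contrast (a Linux-specific sketch; socket setup omitted and
names illustrative): a plain file has a descriptor the kernel can stream
directly, while a blob inside a database file does not:

    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Stream a file to an already-connected socket without copying the
     * bytes through user space. A blob inside an SQLite file has no fd
     * of its own: it must be read out (sqlite3_column_blob) and pushed
     * with write(2), costing extra syscalls and copies. */
    void serve(int sock, const char *path) {
        int fd = open(path, O_RDONLY);
        struct stat st;
        fstat(fd, &st);
        off_t off = 0;
        sendfile(sock, fd, &off, st.st_size); /* kernel-to-kernel copy */
        close(fd);
    }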

Software development is all about tradeoffs. No technique or technology is
perfect for everything.

Now you know one more technique. Maybe it will be of some use to you someday.

> a pseudo-filesystem that is mapped to a single underlying file

That pretty much describes SQLite. Both SQLite and a good filesystem use tree-
based structures to index data, both have ways to deal with fragmentation,
both have ways to ensure consistency, both strive for durability, etc., etc.

> preferably implemented in user space to avoid overheads of multiple kernel
> round trips

How are you going to avoid kernel round trips when I/O is involved?

If you think user-space filesystems are fast, go try FUSE.

~~~
mpweiher
> how much work will it take you to do that?

Dunno, use a zip library?

> [inconsistencies]

No, SQLite doesn't have the same tools available as filesystems. It is very
good, but not quite as good.

> [Pseudo FS mapped onto a single file] That pretty much describes SQLite.

Not really. For one, it doesn't provide a filesystem-like interface.

>> avoid overheads of _multiple_ kernel round trips

> How are you going to avoid kernel round trips when I/O is involved?

Added emphasis.

> If you think user-space filesystems are fast, go try FUSE.

Exactly, because you tend to make _multiple_ round trips. If you can flatten
that to just one, you avoid those overheads.

------
Johnny555
_SQLite is much faster than direct writes to disk on Windows when anti-virus
protection is turned on. Since anti-virus software is and should be on by
default in Windows, that means that SQLite is generally much faster than
direct disk writes on Windows._

Isn't this because writing to a SQLite database file bypasses most (or all)
of the antivirus file scanning, since the scanner can't see a complete file
and can only look at raw blocks of data?

So if you find value in having your antivirus scan all of your files, that's a
disadvantage for using SQLite to store them?

------
TekMol
The reason I still use files rather than SQLite is that I don't know how
SQLite handles concurrency.

For example I have a PHP app that is used by 10k users a day and it happily
handles 100k tmp files in a single directory. On each request, it checks the
file age via filemtime() and if new enough, includes the tmp file with a
simple include(). (I write PHP arrays into the tmp files). If too old, it
recalculates the data and writes it via fopen(), fputs() and fclose().

This might be archaic, but it has been working fine for years and never gave
me any problems.

Somehow I would expect that if I simply replaced it with SQLite, I would run
into concurrency problems.

~~~
wyoung2
> I don't know how SQLite handles concurrency.

Why remain in ignorance? There are several articles in the SQLite
documentation that address concurrency:

[https://www.sqlite.org/wal.html](https://www.sqlite.org/wal.html)
[https://www.sqlite.org/lockingv3.html](https://www.sqlite.org/lockingv3.html)
[https://www.sqlite.org/whentouse.html](https://www.sqlite.org/whentouse.html)

> Somehow I would expect that if I simply replaced it with SQLite, I would run
> into concurrency problems.

SQLite wouldn't really be ACID-compliant if multiple threads were enough to
defeat the Durability guarantee, would it?

That's not to say that there are no concurrency problems in SQLite, but that
they're more in the way of potential bottlenecks than data corruption risks.
For a reader-heavy application like yours, I suspect SQLite will perform just
as well or better than your existing solution.

If you're solely after speed, I'm not sure it would be worth rewriting your
app to use SQLite. This technique's value is simply in the benefit it gives
when you were already going to use SQLite for some other reason. If you just
want a 35% speed boost, wait a few months or buy a faster SSD. Both are going
to be easier and cheaper than rewriting the data storage layer of your
application.

That said, maybe there are other things in SQLite that you could use. Easy
schema changes, full-text searching, more advanced indexing than the
filesystem allows, etc. If you go for one of those, then the extra speed is a
nice bonus.
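
For a reader-heavy case like yours, the usual first knob is WAL mode, which
lets readers proceed while a single writer commits. A one-line sketch:

    #include <sqlite3.h>

    /* Switch the database to write-ahead logging; the setting persists
     * in the database file across connections. */
    void enable_wal(sqlite3 *db) {
        sqlite3_exec(db, "PRAGMA journal_mode=WAL", 0, 0, 0);
    }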

~~~
TekMol

        SQLite wouldn't really be ACID-compliant if
        multiple threads were enough to defeat the
        Durability guarantee, would it?
    

I'm not so much concerned about durability. More about SQLite not responding
to "SELECT v FROM t WHERE id=123" with value v but instead with something like
"Error: v is currently being written by another process. Try again later" or
something.

No idea if that is a realistic scenario. I'm kind of surprised I don't have
this kind of problem with my file-based solution. What happens if process A
reads from a file while process B writes to it? No clue.

~~~
wyoung2
> What happens if process A reads from a file while process B writes to it?

If process B started first, the writer blocks access to the table being
written to, so the reader waits for the writer to complete. There are timeout
and retry behaviors, but within those configurable limits, that's what
happens.

You can make SQLite behave as you worry about if you set the retries to 0 and
timeout to 0, but that's not the default.
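
Through the C API that knob is a single call (path and timeout value are
illustrative):

    #include <sqlite3.h>

    /* Open a connection that waits up to five seconds for a competing
     * writer instead of failing immediately. Without a busy handler a
     * locked database returns SQLITE_BUSY at once; most language
     * wrappers set a default timeout for you. */
    sqlite3 *open_patient(const char *path) {
        sqlite3 *db;
        sqlite3_open(path, &db);
        sqlite3_busy_timeout(db, 5000); /* retry on lock, up to 5000 ms */
        return db;
    }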

------
codewiz
_35% faster than the same blobs can be read from or written to individual
files on disk using fread() or fwrite()._

How about reading efficiently with unbuffered I/O directly on the file
descriptor?

------
wyc
I know that SQLite performs exceptionally on embedded devices including most
phones, IoT devices, and more. However, for better or worse, it's a flat file.
Does anyone know of a TCP/IP-speaking SQL database that would work well on an
embedded device? PostgreSQL/MariaDB seem kinda heavy, and the net couplers for
SQLite look pretty unsupported.

~~~
Quarrelsome
And what exactly do you think any database is at the end of the day? It's
not magic; it all ends up on the disk as a file. I mean, why is a page file
called a FILE?

~~~
wyc
Sure, we can agree that (almost) everything's a file. However, through their
related programs and daemons, we can ask files for more useful functionality
such as indexes, versioning, and yes, socket communications.

------
YZF
What about mixed delete/write/read workloads? How well does SQLite deal with
fragmentation?

~~~
wyoung2
> What about mixed delete/write/read workloads?

The source code to the test program is part of the SQLite source tree. It
should be fairly easy to modify it to test the scheme you have in mind.

> How well does SQLite deal with fragmentation?

SQLite operates a lot like a modern filesystem: tree-based indexing
structures, writes go to empty spaces, deletes leave holes that may later be
filled by inserts, etc.

SQLite has active mechanisms to reduce fragmentation on the fly: b-tree page
defragmentation
([https://www.sqlite.org/fileformat.html](https://www.sqlite.org/fileformat.html))
and auto-vacuum
([https://www.sqlite.org/pragma.html#pragma_auto_vacuum](https://www.sqlite.org/pragma.html#pragma_auto_vacuum))
being the main ones. If the DB file gets too fragmented, you can run a manual
VACUUM
([https://www.sqlite.org/lang_vacuum.html](https://www.sqlite.org/lang_vacuum.html))
on it.
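
For reference, a sketch of invoking those from the C API (the pragma choice
is illustrative; see the linked docs for caveats):

    #include <sqlite3.h>

    /* Fragmentation housekeeping. Note auto_vacuum must be set before
     * the first table is created, or the file must be VACUUMed once
     * afterward for the setting to take effect. */
    void maintenance(sqlite3 *db) {
        sqlite3_exec(db, "PRAGMA auto_vacuum = INCREMENTAL", 0, 0, 0);
        /* Reclaim up to 100 free pages without a full rebuild: */
        sqlite3_exec(db, "PRAGMA incremental_vacuum(100)", 0, 0, 0);
        /* Or rebuild the whole file, defragmenting tables and indexes: */
        sqlite3_exec(db, "VACUUM", 0, 0, 0);
    }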

~~~
YZF
Sounds good. I might give it a test in our product to replace some file-based
storage scheme.

------
_pmf_
> All filesystems do (at least when writing different files)

The article is about writing 100,000 files, which will not scale well (since
there will be massive contention on the directory inode).

------
ComodoHacker
Can we draw a side conclusion from these measurements that open()/close()
overhead on Win10 is significantly higher than on Win7? This seems doubtful.

~~~
wyoung2
See section 2.1 in the report: the Win10 and Win7 installations are on
different hardware.

If you have a dual-boot Win10/7 box, it's easy to run the test yourself.

------
JepZ
So anybody knows where I can get this blazing fast SQLiteFS? ;-)

------
bedros
It's basically storing in memory vs. storing on a disk.

It's not SQLite vs. the filesystem, it's memory storage vs. disk storage.

~~~
jdmichal
Are you stating that the SQLite database did not write the files to disk? Or
simply that it's caching them in-memory? In the latter case, doesn't an
operating system typically offer in-memory file buffering / caching?
Otherwise, why would we need O_DIRECT or O_SYNC or fsync()?

~~~
bedros
I'm no expert in FS or SQLite, but as everyone knows, filesystems are not
good at processing small files; having an application that bundles small
blobs in memory and then flushes them to disk as a single file is what makes
it faster.

ReiserFS 4 was (I believe still is) the only FS that improves small-file
performance by bundling small files and storing them on disk as a single
blob.

As for the person who downvoted my original comment, I'd love to hear why you
think I'm wrong. You don't just downvote comments because you disagree.

~~~
jdmichal
I do think your original claim is wrong. Your explanation in this response is
not the same as your original claim. And what you wrote here is exactly in
line with claims in the article:

"The size of the blobs in the test data affects performance. The filesystem
will generally be faster for larger blobs, since the overhead of open() and
close() is amortized over more bytes of I/O, whereas the database will be more
efficient in both speed and space as the average blob size decreases."

Yes, of course SQLite is going to do some operations in-memory. Raw file
writes using fwrite are also going to be initially written in-memory, unless
O_DIRECT or some other mechanism is involved. And the article explicitly
outlines that they made no effort to bypass file buffering, to the point of
not even explicitly flushing to disk.

If both processes are writing the BLOBs to disk, then I don't see how your
original claim applies. Writing to disk in a manner that is more efficient
for small files is still writing to disk, so the test is not in-memory vs.
on-disk as your original claim suggested.

~~~
bedros
sqlite works in memory most of the time, on disk sometime when flushing,
faster

filesystem, works on disk most of time, in memory some time (some buffering),
slower

which makes my original comment true,

------
jstimpfle
This page does not live up to the standard of the other high-quality
technical articles on sqlite.org. I couldn't even find a reference to the
file system used for comparison. But the whole setup is a farce anyway.

~~~
eridius
They tested it on 5 different OSes, which presumably included 3, probably 4
different filesystems (Windows, Mac, and Ubuntu would be using different ones,
and I'm guessing the Ubuntu system has a different filesystem than Android
too, though I don't actually know), so it doesn't really seem like it matters.

And why is the whole setup a farce anyway?

~~~
jstimpfle
Of course it matters. Different file systems have different performance
characteristics. Make SQLite into a file system, and I'm pretty sure you
can't beat it by putting another SQLite on top.

~~~
eridius
When the author has tested 3 or 4 different filesystems against SQLite (and at
least 2 of them you can be pretty sure as to what they are simply based on the
OS, with a good guess at the other 2 as well), and SQLite beats them all, then
it really doesn't matter.

As for turning SQLite into a filesystem, that's not really going to work. SQL
isn't designed to support things like cheap "file" appends or reading only
portions of a value or seeking or anything like that, so you'd end up having
to read the entire value for any read, and write a new copy of the entire
value for any write, and your performance would be _really really bad_. So
yeah, putting SQLite inside of SQLite isn't going to work, because SQLite
isn't a filesystem and wasn't designed to behave like one. Not to mention this
entire article is about small blob storage, and embedding a SQLite database
inside of SQLite isn't a small blob.

~~~
jstimpfle
As for benchmarking multiple file systems with consistent results, point
taken.
To a degree. Some file systems optimize for directory lookups, and some don't.

As for turning SQLite into a filesystem, we could make it into a file system
that would be fast in this particular benchmark. Right?

------
copenja
I mean, is SQLite not also using the filesystem? Maybe I'm confused here.
Seems like saying using C is 15% faster than ASM. But that includes user
error, right?

~~~
mcpherrinm
The article is basically entirely an answer to that question.

The third sentence in particular is:

The performance difference arises (we believe) because when working from an
SQLite database, the open() and close() system calls are invoked only once,
whereas open() and close() are invoked once for each blob when using blobs
stored in individual files. It appears that the overhead of calling open() and
close() is greater than the overhead of using the database. The size reduction
arises from the fact that individual files are padded out to the next multiple
of the filesystem block size, whereas the blobs are packed more tightly into
an SQLite database.

~~~
copenja
The advice to use a single open/close call instead of multiple is great, but
that is advice on how to effectively _use_ the filesystem. It isn't in any
way, shape, or form beating the filesystem.

~~~
jdmichal
You can't use a single open/close call to write multiple BLOBs, unless you are
concatenating them into one file. That's not typically what happens nor what
people are advocating when they recommend not using database BLOBs and using
the filesystem instead.

------
meritt
If I asserted "It's faster to put 10,000 rows of CSV data in a single file
instead of 10,000 individual files", even the most junior programmer would
likely say "Well, duh, it's 1 file instead of 10,000".

Yet this benchmark is at the top of HN for some reason.

~~~
joesb
So you never thought that using a database to store files would be slower
than using files to store files?

"Duh, I always knew that" is easy to say when you don't have to provide
proof.

Also, are you sure that CSV will be faster as well? Do you have a benchmark
for it?

~~~
jstimpfle
"Measure before optimizing" has limited application. There are many things
that need not be measured. Appending to a file is definitely not slower than
doing memory management inside the file to allocate a new chunk, seeking to
that position and then writing out the chunk. Period. (And the serialization
overhead is negligible compared to the disk I/O).

~~~
joesb
It's not only "writing" though. It's "writing" _AND_ "reading".

Can you read the correct line in CSV faster? Can you find a row matching
specific criteria faster than SQLite? Can you handle concurrency correctly
like SQLite? Can you make sure your CSV is always in a consistent state like
SQLite?

The point of the article is that SQLite is still fast _even with all the
benefits of a database_.

The idea of SQLite is that you can default to using it, and move away when it
hits its limits. Nobody should default to using CSV; you use CSV when you
have to, despite all its limitations.

~~~
jstimpfle
No. Read parent again.

------
songzme
During my first week at American Express I did not have the credentials to
install any applications. My manager wanted me to build a prototype, and I
couldn't install a database. Firebase and other third-party options were
impossible because I had to work on an internal server. I thought SQLite
would save my day, but I remembered it being really hard to install and set
up. So I had to write my own DB that I called 'stupid-db'. It literally just
reads/writes data into a file: [https://github.com/songz/stupid-
db](https://github.com/songz/stupid-db)

~~~
jitl
???

SQLite is a library, not a database server. There's no install step: it's a
single .c file (and a header, if you're into that). Most of the major
scripting languages have rock-solid bindings that don't require any
additional system libraries or software installs.

Python (in the stdlib!):
[https://docs.python.org/2/library/sqlite3.html](https://docs.python.org/2/library/sqlite3.html)

Nodejs (prebuilt):
[https://www.npmjs.com/search?q=sqlite](https://www.npmjs.com/search?q=sqlite)

Ruby (needs libs, which are preinstalled on macOS):
[https://rubygems.org/search?utf8=&query=sqlite3](https://rubygems.org/search?utf8=&query=sqlite3)

~~~
songzme
You are right. This was also many years ago, so my memory might fail me.
Perhaps my internal server didn't have access to npmjs.

