
Internal Versus External BLOBs in SQLite - raphaelj
https://www.sqlite.org/intern-v-extern-blob.html
======
mattbillenstein
How old is this benchmark? HDD or SSD? (I think the former, given the 2011
reference.)

A filesystem is the canonical means for storing files - just use it - it's
incredibly handy to have regular tools at your disposal for managing those
files outside of the database. The only case I can think of where files in
the db might make sense is if you have millions of really tiny files (100-200
bytes).

Having files in the db maybe avoids an extra seek - but in any real
high-throughput system, you're probably leaning towards SSDs, where you get
many more seeks/s than you do on a spinny disk.

If your application needs something more like a distributed filesystem, there
are better things (gluster, moosefs, even nfs, etc).

~~~
gilgoomesh
> A filesystem is the canonical means for storing files - just use it

Sure, but what about the many use cases where you're not really storing files
but metadata or other generated data?

A simple example might be photo libraries. The photos themselves are files but
are the 5kB small and 50kB medium thumbnails generated from the photos worth
storing as files? Or should they be stored as blobs in the database along with
their other metadata?

It's good to know where the cutoff should be.
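
For a concrete picture, here's a minimal sketch of the blob-in-db side, using
Python's stdlib sqlite3 with a made-up schema (table and column names are
hypothetical). The linked benchmark suggests blobs this small (5-50kB) tend to
read faster from inside the database than from individual files, at least on
its 2011-era hardware:

    import sqlite3

    # Hypothetical schema: small thumbnails inline as blobs, the full-size
    # original left on the filesystem and referenced by path.
    db = sqlite3.connect("photos.db")
    db.execute("""
        CREATE TABLE IF NOT EXISTS photos (
            id            INTEGER PRIMARY KEY,
            original_path TEXT NOT NULL,  -- big file stays on disk
            taken_at      TEXT,
            thumb_small   BLOB,           -- ~5kB, stored inline
            thumb_medium  BLOB            -- ~50kB, stored inline
        )""")

    def add_photo(path, taken_at, small, medium):
        db.execute(
            "INSERT INTO photos (original_path, taken_at, thumb_small, thumb_medium)"
            " VALUES (?, ?, ?, ?)",
            (path, taken_at, small, medium))
        db.commit()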

~~~
mattbillenstein
Metadata makes sense in the db - it's structured non-binary data.

But I think the binary thumbnails and so forth should almost always be files
on the filesystem.

------
xpaulbettsx
Caution that on Windows (and even OS X to some extent), this graph will be
_very_ different - NTFS perf for accessing a lot of small files is very slow
compared to Linux.

------
Spooky23
I've always pushed back on storing blob data in databases in most cases --
probably 80-90% of the time, depending on the job.

Why? Databases are all about storing data in a structure defined by the
relationships between the records. In most cases with files, the relevant data
is the metadata or other structured data, not the file.

If you do the math, it's rarely cost effective to store unstructured data in a
structured database. Databases are always harder to scale.

On the file side, there are dozens of ways to effectively deliver files that
are cheaper, more scalable and simpler for many use cases.

~~~
electrum
Databases also have advantages: transactions, durable logging, replication,
integrated backups, etc.

Tom Kyte has some insightful points in this thread:
[https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUE...](https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1011065100346196442)

"Do I want you to keep MY medical stuff in a directory, unsecured, unmanaged,
not backed up like the database is - we have no idea if recovery is possible,
who accessed it last? Who has access to it."

"if this data is valuable to your business, if this data is necessary for your
business, if the loss of this data would damage your ability to do business -
it has no business NOT being in the database."

~~~
xorcist
Relational databases have their advantages compared with hierarchical
filesystems of blobs, but that comment is really completely beside the point.

File systems are simpler, and backups and replication are really much more
straightforward. As anyone who has done disaster recovery of Oracle databases
can attest, there are a lot of details to get wrong.

It boils down to what data you want to store and what your access patterns
look like.

~~~
Spooky23
Great point -- Oracle has a solid capability that is easy to operationally
screw up. Database recovery is usually a real shitshow.

When you go to other platforms, you need solid process AND deep expertise in
whether the database can do what you want. Can MySQL handle the same recovery
scenarios that Oracle can? With which storage engines?

------
bch
I've run into situations where this is pertinent information to know w/ a
music pet-project. I ended up jamming all my .mp3s into the DB though, because
not only is it storage and access in this case, but the collection (in a DB)
is a de facto archive; all I do is copy the (admittedly large) DB where I
need it, and I've got my data and metadata. If I were doing it again, I'd
rethink this design, which isn't to say I'd abandon it, but I'd reconsider it;
it's unclear whether the all-in-one solution is worth the bulk, though it
might still be.

~~~
bane
I'd always be concerned that some minor level of corruption to the big file,
something that would only ruin a song or two, could render your entire DB
useless.

~~~
icefox
So the natural question is: why don't "file systems" (aka big databases)
become useless with some minor level of corruption?

~~~
hobs
I was about to comment that a database will only be corrupt if it is
corrupted in the same way a file system would be corrupted - or it's not a
great database file format.

I work with SQL Server and while database corruption can/does happen, I often
can get away with a row level restore from a backup, no problem.

I still don't like storing big ass files in a database though; I don't love
the additional level of abstraction when you generally don't need to have a
high transactional/relational system.

~~~
emn13
I think you've got that backwards: if you have a database for _something_,
is it worth the additional levels of abstraction to _also_ store some data in
files?

If your data is fairly small, and perf not the biggest issue (which is pretty
common), you might as well go for the simplest solution, which is going to be
purely DB or purely FS, not some mixture of the two.

~~~
hobs
Agreed, the overhead of maintaining two solutions can be a huge burden
(especially when troubleshooting something going wrong via the synthesis of
two solutions), but I definitely called out "big ass files" because the
difference between some text data and some smaller blob is almost nothing in
terms of performance.

------
herf
You could compare with an append-only-file (with offline compaction), in which
case saving tons of "file open" operations would actually be a big win,
especially if you ever use a network file system. In some cases, a "simple"
log+index can beat both schemes.
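
For illustration, a toy version of that log+index scheme (hypothetical format,
stdlib only; a real one would want checksums, an fsync policy, and crash-safe
index recovery):

    import os, struct

    # Append-only blob log: each record is a 4-byte big-endian length
    # followed by the payload. The in-memory index maps key -> offset;
    # offline compaction would rewrite the log, dropping dead records.
    class BlobLog:
        def __init__(self, path):
            self.f = open(path, "ab+")   # appends always go to the end
            self.index = {}

        def put(self, key, data):
            self.f.seek(0, os.SEEK_END)
            offset = self.f.tell()
            self.f.write(struct.pack(">I", len(data)) + data)
            self.f.flush()
            self.index[key] = offset

        def get(self, key):
            self.f.seek(self.index[key])
            (length,) = struct.unpack(">I", self.f.read(4))
            return self.f.read(length)

One "file open" total, no matter how many blobs - which is exactly the saving
over one-file-per-blob.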

Also, I've stored lots & lots of photos in a filesystem, and backup is very
hard. Backing up big files is just a lot easier.

~~~
pjc50
How is backing up lots of small files harder than a small number of big files?
With which software?

~~~
herf
Using rsync and 20M files is a good enough example:

- To check the date of each file on UNIX you must run "stat" once per file
(unless you have an external log that says what to skip), so that's very slow.

- To back up a big file, it's "--append-verify" or something like this, and
one streaming read per file.
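
To make the stat cost concrete, here's the kind of scan an incremental backup
tool has to do on a cold tree - one stat syscall per file (a sketch, stdlib
only):

    import os

    # One os.stat() per file: with 20M files that's 20M metadata syscalls
    # (and, on spinning disks, seeks) before a single byte gets copied.
    def files_newer_than(root, cutoff):
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.stat(path).st_mtime > cutoff:
                    yield path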

------
PythonicAlpha
I think that is a rather complicated question. You should also take into
account how accessing other data is affected by large blobs that reside in the
same database. For example: you have a table with large data blobs and other
tables with conventional relational data. The blobs will take a lot of the
cache size away from the other data. Also, by increasing the database size,
you will potentially have longer access times, at least when you use spinning
hard drives.

So there are several things to think about (making the cache bigger, how much
the overall db size grows) that can change the database behavior.

Of course those numbers can give some direction, but since only the blob data
was taken into account, they cannot give the full picture.

I personally would most of the time prefer smaller db sizes, at least in edge
cases. Relational databases are great for structured data -- unstructured data
is not their specialty.
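
If you do mix blobs and relational data, you can at least give the cache more
room; a sketch (the size here is arbitrary, for illustration only):

    import sqlite3

    db = sqlite3.connect("mixed.db")
    # Negative cache_size means KiB, so this asks for ~64 MiB of page
    # cache - blob reads then evict fewer of the hot relational pages.
    db.execute("PRAGMA cache_size = -65536")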

~~~
cefstat
About 9 years ago I had a small site where both page content and some large
PDF files were stored in the same SQLite database. Nothing extravagant: about
20 pages and 10 PDF files, each averaging 2-3 MB. The website was
extremely and consistently slow: it would take several seconds to load any
page. I eventually figured out that the problem was the binary blobs. After
moving the PDF files from the database to the filesystem the load times went
below 0.1s.

Since then I have not put large binary blobs in a database but I wonder if I
would get reasonable performance by putting them in a _separate_ database
instead of mixing them with other small-size rows.
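
For what it's worth, SQLite can do that split within a single connection via
ATTACH; a minimal sketch with made-up file and table names:

    import sqlite3

    db = sqlite3.connect("site.db")                    # small rows live here
    db.execute("ATTACH DATABASE 'blobs.db' AS blobs")  # big blobs live here
    db.execute(
        "CREATE TABLE IF NOT EXISTS blobs.pdfs (id INTEGER PRIMARY KEY, data BLOB)")
    db.execute("INSERT INTO blobs.pdfs (data) VALUES (?)", (b"%PDF-1.4 ...",))
    db.commit()
    # Queries can join across both files, but the blob pages sit in their
    # own file instead of being interleaved with the small rows' pages.
    pdf = db.execute("SELECT data FROM blobs.pdfs WHERE id = 1").fetchone()[0]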

~~~
emn13
That's weird, and should not have happened. I've done similar things with
sqlite databases many times that size, and though the FS might have been even
faster, sqlite was still more than fast enough (as in, sub-10ms responses the
norm).

I suspect there's some way in which your access pattern or usage was
suboptimal. Perhaps you had big reads with small writes in a transaction, or
you streamed directly from the db over the network, leaving connections open a
long time, or... something, because there's no way this should have taken so
long.

------
geku
Wondering how MySQL and Postgres perform. Does anybody have links to such
benchmarks?

------
btrask
SQLite supports a single writer at a time, so writing large amounts of data
limits your write throughput. A file system can cache data for several files
concurrently and fsync them independently when you're done.

I believe SQLite also has to write the data twice, using the WAL. Maybe there
is an optimization to avoid that though.

~~~
tines
> I believe SQLite also has to write the data twice, using the WAL. Maybe
> there is an optimization to avoid that though.

If I understand correctly, it writes the data twice if you're _not_ using WAL.
With WAL, it only writes the data once [1].

[1]: The "Performance Considerations" section of
[https://www.sqlite.org/wal.html](https://www.sqlite.org/wal.html)

~~~
seppo0010
The data is written twice when using WAL. From the same link you pasted:

> The original content is preserved in the database file and the changes are
> appended into a separate WAL file.

The difference is whether the journal keeps the new data (WAL) or the old data
(not WAL). If you are writing big chunks of data, WAL will probably be more
I/O intensive.

Taking a wild guess, I'll say that if you are adding data to the database and
not using WAL, the data will only be written once, since the journal won't
keep a reference to old data if no pages are being overwritten.
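
For reference, switching modes is a single pragma; a minimal sketch:

    import sqlite3

    db = sqlite3.connect("app.db")
    # The default rollback journal copies the *old* page images aside before
    # overwriting them; WAL instead appends the *new* pages to app.db-wal and
    # checkpoints them back later - the second write discussed above.
    mode = db.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    print(mode)  # prints 'wal' if the switch succeeded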

~~~
tines
Ah, I see the difference now, thanks.

------
microcolonel
HTTP headers are telling me that this was last modified on 2014-12-18; I
really hope they didn't JUST get around to testing Ubuntu 11.04.

------
Sami_Lehtinen
Another great question is whether there's any reason to use smaller than
4096-byte pages with current operating and file systems.
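
Checking or changing it is straightforward; a sketch (4096 matching the common
filesystem block size):

    import sqlite3

    db = sqlite3.connect("fresh.db")
    # page_size only takes effect on an empty database, or when the file
    # is rebuilt afterwards with VACUUM.
    db.execute("PRAGMA page_size = 4096")
    db.execute("VACUUM")
    print(db.execute("PRAGMA page_size").fetchone()[0])  # -> 4096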

------
Sami_Lehtinen
Isn't a file system just a hierarchical key-value store? Think about it.
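
Taken literally, the interface falls out in a few lines - a toy sketch, with
none of the locking, atomicity, or durability a real store would need:

    import os

    # Path = hierarchical key, file contents = value.
    class FsKV:
        def __init__(self, root):
            self.root = root

        def put(self, key, value):
            path = os.path.join(self.root, key)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(value)

        def get(self, key):
            with open(os.path.join(self.root, key), "rb") as f:
                return f.read()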

------
Animats
Note that the performance difference rarely exceeds 2:1 either way. So this
isn't a performance issue, unless you're using SQLite for something bigger
than SQLite is for.

~~~
delinka
I don't think I'd define "rarely" as "nearly 25% of cases." Seems it happens
with enough frequency to at least be aware.

