
Performance/Avoid SQLite in Your Next Firefox Feature (2014) - mpweiher
https://wiki.mozilla.org/Performance/Avoid_SQLite_In_Your_Next_Firefox_Feature
======
inshadows
In 2018 Mozilla ditched JSON files for extension preferences in favor of
IndexedDB (SQLite)[1]. This new format is undecodable unless you pull the
Firefox codebase. See [2] for what is required just to decode the keys (I
haven't found anything that can decode the content aside from Firefox itself).
Forget poking at extension preferences anywhere except in Firefox. Forget easy
transforming and sharing between your machines. They turned a perfectly usable
format into unusable binary blobs in SQLite, 4 years after this article was
published, all in the name of performance, as if extension preferences/data
manipulation overhead was something anyone cared about. I guess this was some
student project.

[1] https://blog.mozilla.org/addons/2018/08/03/new-backend-for-storage-local-api/

[2] https://stackoverflow.com/questions/22917139/how-can-i-access-firefoxs-internal-indexeddb-files-using-python

------
Yoric
Wow, that is old. The last meaningful edit to this document is 2014.

Everything has changed several times since then. Please ignore this link :)

~~~
weare138
This. Firefox Quantum doesn't even use the same codebase as Firefox circa
2014.

~~~
jaas
Yes it does. Quantum is not new from scratch; it's an (excellent) improvement
and reorganization of largely the same code base. Some code and components are
new, but that's rare all in all.

~~~
weare138
> _reorganization of largely the same code base_

Quantum uses a new UI framework called Photon, new CSS engine and rendering
architecture called Servo and the compositor has been completely overhauled to
integrate Servo. So no, it is _not_ largely based on a 6 year old codebase.
Just Google it like I did if you don't believe me.

~~~
michaelmrose
Got some numbers on total line count and total modified line count? I know
lines isn't a great proxy for complexity but don't know of a better one
available in this context.

~~~
cmiles74
I don't think the number of lines changed, removed and added is going to be
all that useful. If anything, it will likely lead to erroneous snap
judgements.

------
ipython
Skimming through the bullet points, it seems that a lot of that advice would
be applicable to _any_ storage engine. Storing too much data and not expiring
it would seem to be just as much of a problem for flat files.

I would argue that it’s much harder to meet SQLite’s performance and stability
targets with a home grown storage system than it would be to learn how to use
SQLite effectively.

~~~
komarov_om
Speaking of stability: https://www.deconstructconf.com/2019/dan-luu-files

~~~
Lex-2008
_Pseudo-transcript_ of the talk by the talk author:
https://danluu.com/deconstruct-files/

------
adrianmonk
> _Features that regularly add data to the DB (e.g. visited URIs), but have no
> expiration policy for the data, can easily cause the DB to balloon into tens
> or hundreds of MBs._

I don't know if it's fair to call that a SQLite pitfall. If they can recommend
an alternative storage option where this problem doesn't occur, I'd be
extremely interested to hear about it.

~~~
hobs
I think the only real critique is that a file system already has systems to
identify individual ballooning items, whereas the database would be a single
container that is opaque to many basic users.

------
zbuf
For a long time I've been running many apps with a wrapper to disable fsync in
SQLite integrations. It's a harsh solution.

Am I correct that out of the box it implements full integrity around
"commit" -- i.e. the data has to fully hit the disk before the call returns?

For most applications I don't want this. Browser history for example is fine
if the most recent transactions are lost on a power failure. Of course I do
not want the file to be corrupted and lost.

At a previous employer we had similar issues with Chrome's internal SQLite
database (Linux) in an environment with network mounted home directories. It
was dealt with by a Chrome wrapper to change one of the internal flags on the
user's SQLite database.

Possibly we just have a case of a bad default here, particularly in the
context of other posts, e.g. "SQLite as a file format".

In the docs I only find "PRAGMA synchronous=OFF". Did I dream it, or is there
a similar setting that preserves the integrity of the file through write
barriers, journaling, etc., but doesn't do a full fsync on every transaction?

Because then on a power failure the file would contain "a stable commit" but
not necessarily "the most recent commit" ... And it seems that for most SQLite
integrations that is what we want?

Instead we're now all burning through SSDs for our trivial user settings
files, a bit like everything is O_SYNC and the page cache was never invented.
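For what it's worth, WAL mode combined with `PRAGMA synchronous=NORMAL` seems
close to the setting being asked about: SQLite then syncs the WAL only around
checkpoints, so a power failure may lose the most recent commits but should
leave the file consistent. A minimal sketch (file and table names made up):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "settings.db")
conn = sqlite3.connect(path)

# WAL keeps changes in a separate log; the main DB file is only
# updated at checkpoints.
conn.execute("PRAGMA journal_mode=WAL")

# In WAL mode, NORMAL syncs only around checkpoints: commits made
# just before a power failure can be lost, but the database stays
# consistent -- unlike synchronous=OFF, which also gives up the
# corruption guarantee.
conn.execute("PRAGMA synchronous=NORMAL")

conn.execute("CREATE TABLE prefs (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO prefs VALUES ('theme', 'dark')")
conn.commit()

print(conn.execute("PRAGMA synchronous").fetchone()[0])  # 1 == NORMAL
```

This is not a durability-free mode: the file survives power loss intact, you
just trade away "the most recent commit is on disk" for far fewer fsyncs.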

~~~
snuxoll
> Because then on a power failure the file would contain "a stable commit" but
> not necessarily "the most recent commit" ... And it seems that for most
> SQLite integrations that is what we want?

You HAVE to commit the journal to disk in a stable fashion - that requires
fsync(), because without it there's no guarantee that the data is actually on
disk. Without the journal there's no way to back out a partially applied
write, leaving you with a corrupt database.

Could an application maintain the state in memory and periodically flush out
to SQLite to get around it? Sure. But many people would be pissed to lose
their browser history for the past N time before a crash or power loss after
just finding a page they needed for some purpose.

Get a better storage solution for your home directories, ZFS with high
lifespan Optane drives as a SLOG would dramatically increase the write
endurance of standard NAND flash by batching writes while maintaining
acceptable fsync performance (since synchronous writes don’t hit the pool
itself until the transaction group commits every X seconds).

~~~
zbuf
> You HAVE to commit the journal to disk in a stable fashion - that requires
> fsync() because without there no guarantee that the data is actually on
> disk.

But doesn't this indicate the issue: that userland only has the fsync() API --
it's a fully synchronous write, or nothing. Whereas the SQLite file could
contain its own journal and data, and maintain integrity using a write
barrier.

I still have no need for my browser history to be fully synchronous to disk.
Yet that's what we're doing, and then...

> Get a better storage solution for your home directories, ZFS with high
> lifespan Optane drives as a SLOG would dramatically increase the write
> endurance of standard NAND flash by batching writes while maintaining
> acceptable fsync performance

... spending on high grade tech in an attempt to solve the problem we just
created.

Fundamentally, we're asking these systems to do synchronous writes, when it's
not what we actually want. When everything's a priority, nothing's a priority.

------
catalogia
One of the more prominent uses of sqlite in Firefox is the 'places' db where
your history and bookmarks are stored. If you go to _History -> Show All
History_, then select a few thousand entries and press delete, you can lock
Firefox up for several minutes, tens of minutes even.

I don't think this is necessarily damning of sqlite, though; I'm pretty sure
Firefox is not quite doing things right in some cases. I can delete thousands
of rows in sqlite in mere milliseconds, so something else is going on. A
shitty schema, maybe? I'm not sure.
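For comparison, raw deletes at that scale are quick when batched in a single
transaction; a toy schema (not Firefox's actual places.sqlite) illustrates:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany(
    "INSERT INTO history(url) VALUES (?)",
    [(f"https://example.com/{i}",) for i in range(100_000)],
)
conn.commit()

start = time.perf_counter()
with conn:  # one transaction for the whole batch of deletes
    conn.execute("DELETE FROM history WHERE id % 2 = 0")
elapsed = time.perf_counter() - start

remaining = conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]
print(remaining, f"{elapsed * 1000:.1f} ms")  # 50000 rows left
```

Deleting 50,000 rows this way typically takes milliseconds, which supports the
point that the slowness is happening above the database layer.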

~~~
komarov_om
This is the relevant bug report:
https://bugzilla.mozilla.org/show_bug.cgi?id=734643

Apparently, it has something to do with the UI rendering the selected items,
and not with the database.

~~~
catalogia
Hot damn, opened 8 years ago? That report describes the behavior I experience
to a T, it's definitely the same bug.

Is this 'Library' window a dead component? Besides this nearly decade-old
bug, it still doesn't get themed like the rest of Firefox either; dark mode
doesn't affect it.

~~~
ta17711771
If you think that's bad, wait til you see the bug date for X11 exploit on
Linux systems.

------
aeturnum
This reminds me of a problem I've been having on newer versions of Chrome.
Chrome, even if I'm not using it for anything, will sometimes consume so much
power it drains my battery while plugged in (albeit to a previous-gen 85W
brick with an adapter). There are a few threads on chrome support[1], but one
of the theories is that Chrome's internal SQLite database that is used for
tracking history locally is the source of the drain.

[1] https://support.google.com/chrome/thread/17537877?hl=en

~~~
jve
Run a profiler to see what exactly drains your power.
https://randomascii.wordpress.com/2016/03/08/power-wastage-on-an-idle-laptop/

That guy makes sure Chrome is performant and not wasting your battery.

~~~
aeturnum
Uh yah, there are people in that thread who have profiled their Chrome
installs and found that SQLite was draining power. Thus my post.

Since I needed to use Chrome, it didn't really matter to me why the problem
was happening. But, since this was a thread on performance problems with
SQLite, I posted it.

------
Groxx
Issues like large-WAL and the need for periodic blocking maintenance (e.g.
vacuum, to free space and un-interleave data) are _excellent_ reasons to
aggressively try to keep your insert/trigger/etc logic as simple as possible.
In all databases.

When it's simple enough, you can safely abstract over the DB and do stuff
like:

1) maintain an in-memory operation queue, so you can copy the DB when idle,
vacuum as long as necessary on the copy, and then switch to the vacuumed copy
after replaying the queue. (SQLite allows only a single writer, so statement
replay is safe; with concurrent-writer databases it sometimes isn't, since you
can't recreate the DB's row-visibility logic.)

2) in-memory caches can alleviate a lot of load, but are hideously difficult
or impossible to maintain if you allow complex or DB-only insertion logic
(e.g. 'on conflict' rules). Simple table+rowid based caches can be utterly
trivial and _correct_ if you have nothing complex. Or if you really want, a
full-blown lambda architecture is possible, and sometimes not even very
difficult.

3) simple logic almost always completes more quickly in the DB, and long-
running transactions are the bane of your DB's existence, causing large WALs,
worse contention, etc. Aim to keep small atomic operations safe, e.g.: don't
require a transaction to modify 100k records correctly, allow each one to be
atomic. Then you can batch in any way you please, optimizing for throughput or
non-blocking-ness as desired.

Complex _queries_ are far safer since you can just ignore your abstraction
layer. But complex _modifications_ are worth avoiding whenever possible.
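A minimal single-writer sketch of idea (1), assuming writes are recorded as
replayable statements (all class and file names here are hypothetical, not
from any real codebase):

```python
import os
import shutil
import sqlite3
import tempfile

class QueuedDB:
    """Sketch: route writes through one place so the live file can be
    copied, vacuumed offline, and swapped back in."""

    def __init__(self, path):
        self.path = path
        self.conn = sqlite3.connect(path)
        self.replay = None  # (sql, params) recorded while a copy exists

    def write(self, sql, params=()):
        self.conn.execute(sql, params)
        self.conn.commit()
        if self.replay is not None:
            self.replay.append((sql, params))

    def begin_vacuum(self):
        self.replay = []
        shutil.copy(self.path, self.path + ".tmp")  # copy while idle

    def finish_vacuum(self):
        tmp = sqlite3.connect(self.path + ".tmp")
        tmp.execute("VACUUM")            # long-running, off the live file
        for sql, params in self.replay:  # statement replay is safe here:
            tmp.execute(sql, params)     # SQLite has a single writer
        tmp.commit()
        tmp.close()
        self.conn.close()
        shutil.move(self.path + ".tmp", self.path)
        self.conn = sqlite3.connect(self.path)
        self.replay = None

path = os.path.join(tempfile.mkdtemp(), "app.db")
db = QueuedDB(path)
db.write("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
db.write("INSERT INTO kv VALUES ('a', '1')")
db.begin_vacuum()
db.write("INSERT INTO kv VALUES ('b', '2')")  # lands in the replay queue
db.finish_vacuum()
print(db.conn.execute("SELECT k FROM kv ORDER BY k").fetchall())
```

The write made between `begin_vacuum` and `finish_vacuum` survives the swap
because it was replayed onto the vacuumed copy; this only works because the
insertion logic is simple enough to replay verbatim, which is exactly the
point above.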

------
trashburger
I guess this is a nice complement to the pro-SQLite articles posted here in
the last few days. It gives an interesting perspective on the possible
problems one might face when using SQLite as their application format (from
the link posted here yesterday).

~~~
The_Colonel
SQLite is one of the unfortunate victims of reddit/hackernews circlejerk (the
other common victims in this space are MySQL, Postgres, Mongo).

I use and love SQLite for what it is. But it's not a silver bullet for every
use case.

I have an anecdote - a few months ago a problem surfaced with SQLite: a
relatively simple query (3-4 joins) had been working completely fine for a
long time, but suddenly started taking a very long time. It turned out it had
stopped using the correct indexes - probably some heuristic threshold(s) had
been crossed and the query planner started returning a very inefficient plan
that was 100 times slower. It was a very easy fix - force the use of the
correct indexes - but it would have been very hard to debug if I hadn't had
access to the database (as is often the case with embedded databases). This is
the price of the abstraction: you only declare what you want and hope the
database will do the right thing to execute your request.
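SQLite's `INDEXED BY` clause can pin a plan like that; a toy illustration
(schema and index names made up, not the anecdote's actual database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders(id INTEGER PRIMARY KEY, customer INTEGER, total REAL);
    CREATE INDEX idx_orders_customer ON orders(customer);
""")
conn.executemany(
    "INSERT INTO orders(customer, total) VALUES (?, ?)",
    [(i % 10, i * 1.5) for i in range(1000)],
)

# INDEXED BY fails with an error if the planner cannot use the named
# index, pinning the plan instead of trusting optimizer heuristics.
rows = conn.execute(
    "SELECT COUNT(*) FROM orders INDEXED BY idx_orders_customer "
    "WHERE customer = ?",
    (3,),
).fetchone()
print(rows[0])  # 100
```

The trade-off is the one described above: you give up the planner's
flexibility in exchange for a predictable plan, which matters when you can't
inspect the deployed database.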

------
TheChaplain
Note: The info on that page is 3 years old, the situation could be different
today.

~~~
dralley
It was last edited 3 years ago, but it was only a minor edit, most of the
actual content is from 2014.

------
WhatIsDukkha
Paraphrasing this page -

"Firefox has a lot of performance problems already from our architecture, we
didn't necessarily integrate Sqlite in the best way on top of that"

"If you make a bad sql schema it will make it worse so maybe you should think
of doing something off the cuff with json files that might be just as likely
to be terrible"

It's really nice to be able to read a sql schema when you are trying to
understand an application.

It would be a shame to lose that because core issues weren't addressed more
broadly.

~~~
BossingAround
I think there's a balance. Pretty much every performance issue can be reduced
to "if you don't know what you are doing, it's gonna be slow".

But, there are systems that enable you to make mistakes quite easily, and
there are systems that make it more difficult to make mistakes. I think these
issues are often transferable to regular DBs, but you typically don't run a
full-on DB on your smartphone, which has very different performance and memory
footprints/limitations.

It's good to take it into consideration when using DBs on mobile devices (or,
consider whether you really do need a DB, albeit small and nimble like
SQLite).

------
cbsmith
I hate that this resulted in a lot of people using JSON, which ought to be
barely considered a data format.

~~~
theamk
I dunno, I think that json is likely one of the best things that could have
happened to data formats in general.

A machine readable and writeable format, without embedded type info or
external schemas, and supported by all major programming languages? That’s
exactly what is needed to allow third-party apps to access that data.

~~~
lokedhs
Yes, that would be great. The problem is that JSON doesn't actually provide
this.

One of the most important aspects of a universal format is, in my opinion, the
ability to rely on the fact that when I store a value on one system, I
retrieve the same value when I read it back on another.

JSON doesn't even do this for numbers. The specification explicitly leaves the
precision undefined. I might be able to store and retrieve the value
1099511627776 using one JSON implementation, but when using another I may get
a different value.
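For what it's worth, the cutoff for a parser that maps every JSON number to an
IEEE-754 double is 2^53; Python's json module keeps integers exact, which
makes the contrast easy to show:

```python
import json

big = 2 ** 53 + 1  # 9007199254740993: first integer a double can't hold

# Python's json module parses integers exactly, so it round-trips:
assert json.loads(json.dumps(big)) == big

# But a parser that stores every JSON number as a 64-bit double
# (e.g. JavaScript's JSON.parse) would see the nearest representable
# value instead:
print(float(big) == 2 ** 53)  # True: the +1 is silently lost
```

So whether a given integer survives depends entirely on which implementation
reads it back, which is the portability problem being described.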

~~~
theamk
Yes, the fact that JSON only does floats is annoying. It is especially bad
when I want to store large, high precision numbers, like microsecond
timestamps.

The right approach is to treat "high precision floating point numbers" as just
another type which cannot be represented by JSON natively, and thus store it
as a string. After all, we cannot store dates, binary data or 128-bit integers
in JSON either.
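A sketch of that string-encoding convention (field names hypothetical):

```python
import json

# Hypothetical convention: encode 64-bit IDs and microsecond timestamps
# as strings, so even a parser that maps all JSON numbers to doubles
# round-trips them intact.
record = {"id": str(2 ** 63 - 1), "ts_us": str(1_700_000_000_000_000)}
wire = json.dumps(record)
back = json.loads(wire)
print(int(back["id"]) == 2 ** 63 - 1)  # True: the digits survive as text
```

The consumer has to know which fields to re-parse, but that is the same
schema knowledge it would need for dates or binary data anyway.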

You might argue that JSON should support doubles, and dates, and binary
strings, and a ton of others -- but this will make it much less portable. If
you add a "high precision float" type, then every parser ever will need to
support it.

Right now, there are JSON parsers entirely implemented in a single 3000-line
C89 file with no external dependencies. That would be impossible with a bigger
library of data types.

~~~
lokedhs
My point was that JSON doesn't even reliably store _integers_. If you could
only store a single datatype outside of strings, it should be integers, and
JSON doesn't even do that.

I'm guessing a lot of software is storing 64-bit integer IDs in a JSON
numeric field. How many people know that this is unreliable, and that when it
fails for larger numbers you won't even get an error message but will silently
get the wrong number?

~~~
theamk
What exactly would you want it to do? Have separate "float"/"integer" types?

That would break round-tripping/comparison in languages which do not
distinguish between integers and floating point types representing integers.

And it would still make you unhappy, because there are systems out there
without simple 64-bit integer support (some examples that come to mind are
perl5 on 32-bit machines, TCL, and, you know, JavaScript before 2019).

So JSON has a single numeric type, a float. Yes, you have to remember that you
cannot store a 64-bit value there. This is an annoying gotcha. But it does not
come from a deficiency of the serialization format; it comes from the primary
goal: if you want to support many languages, you need to conform to a common
subset of features. And since there are languages which cannot natively handle
64-bit integers, you should play it safe and store those as strings.

~~~
lokedhs
I would likely only have an integer type. Perhaps even limited to 32 bits.
That is because you can't rely on more than that anyway, but if the standard
specified it then at least you'd have predictable behaviour.

If I can't rely on being able to store a number of certain magnitude, what
good is it that some platforms are able to do it?

------
wendelmaques
Are we going to have MongoDB in Firefox?

------
talideon
This amuses me as it reminds me of the performance issues Mork
(https://wiki.mozilla.org/Mork) had, and introducing SQLite was originally a
way to fix that problem.

------
sitkack
These all seem like _great_ reasons to _use_ sqlite, not avoid it. All of that
complexity, all of those bugs (security bugs included), would now live in your
custom code for saving and storing data?

