In 2018 Mozilla ditched JSON files for extension preferences in favor of IndexedDB (SQLite)[1]. This new format is undecodable unless you pull in the Firefox codebase. See [2] for what is required just to decode the keys (I haven't found anything that can decode the content aside from Firefox itself). Forget poking at extension preferences anywhere except in Firefox. Forget easy transformation and sharing between your machines. They turned a perfectly usable format into unusable binary blobs in SQLite, 4 years after this article was published, all in the name of performance, as if extension preference/data manipulation overhead was something anyone cared about. I guess this was some student project.
What exactly has changed? It’s not like sqlite suddenly stopped calling fsync all the time, and the memory consumption is still relevant.
It might be just the opposite, in fact - faster SSDs may mean that directories with many files are faster, and packing many objects into one big sqlite archive does not have as many benefits as before.
What's the latest, then? Preferably this document could be updated to point to more up-to-date advice on using SQLite, while retaining the old information for historical reference.
Yes it does. Quantum is not new from scratch; it's an (excellent) improvement and reorganization of largely the same code base. Some code and components are new, but that's the exception all in all.
Quantum uses a new UI framework called Photon, new CSS engine and rendering architecture called Servo and the compositor has been completely overhauled to integrate Servo. So no, it is not largely based on a 6 year old codebase. Just Google it like I did if you don't believe me.
Photon is just the UX design guidelines that were overhauled with Quantum. Having said that, they were developed in close collaboration between the UX designers and UI engineers to ensure that it performs as best as it possibly can.
> new CSS engine and rendering architecture called Servo
Servo is an experimental engine developed by Mozilla Research. Its style system, dubbed Stylo, was adapted for use by Gecko.
> and the compositor has been completely overhauled to integrate Servo.
Gecko’s compositor was moved into its own process during Quantum, and Servo’s graphics subsystem, WebRender, is in the process of being gradually deployed on release, with several more cohorts pending.
Quantum does not use Servo. Servo is a research project, and _ideas_ from Servo get reimplemented into Firefox. As it stands, the main things that came from Servo are Stylo and WebRender. While they're two large things, there's still a _lot_ of Firefox that is old code. And a lot of things they still want to reintegrate.
Got some numbers on total line count and total modified line count? I know lines isn't a great proxy for complexity but don't know of a better one available in this context.
I don't think the number of lines changed, removed and added is going to be all that useful. If anything, it will likely lead to erroneous snap judgements.
That's not really true... Firefox code base is just in constant evolution. Quantum is not a rewrite from scratch, so a lot of code dates back from before 2014.
Skimming through the bullet points, it seems that a lot of that advice would be applicable to any storage engine. Storing too much data and not expiring it would seem to be just as much as a problem for flat files.
I would argue that it’s much harder to meet SQLite’s performance and stability targets with a home grown storage system than it would be to learn how to use SQLite effectively.
> Features that regularly add data to the DB (e.g. visited URIs), but have no expiration policy for the data, can easily cause the DB to balloon into tens or hundreds of MBs.
I don't know if it's fair to call that a SQLite pitfall. If they can recommend an alternative storage option where this problem doesn't occur, I'd be extremely interested to hear about it.
I think the only real critique is that a file system already has systems to identify individual ballooning items, whereas the database would be a single container that is opaque to many basic users.
For a long time I've been running many apps with a wrapper to disable fsync in SQLite integrations. It's a harsh solution.
Am I correct that out of the box it's implementing a full integrity around "commit" -- ie. The data has to fully hit the disk before the call returns?
For most applications I don't want this. Browser history for example is fine if the most recent transactions are lost on a power failure. Of course I do not want the file to be corrupted and lost.
At a previous employer we had similar issues with Chrome's internal SQLite database (Linux) in an environment with network mounted home directories. It was dealt with by a Chrome wrapper to change one of the internal flags on the user's SQLite database.
Possibly we may just have a case of a bad default here. Particularly in the context of other posts eg. "SQLite as a file format"
In the docs I only find "PRAGMA synchronous=OFF." Did I dream it, or is there a similar setting to preserve the integrity of the file through write barriers, journaling etc., but not have a full fsync on every transaction?
Because then on a power failure the file would contain "a stable commit" but not necessarily "the most recent commit" ... And it seems that for most SQLite integrations that is what we want?
Instead we're now all burning through SSDs for our trivial user settings files, a bit like everything is O_SYNC and the page cache was never invented.
> Am I correct that out of the box it's implementing a full integrity around "commit" -- ie. The data has to fully hit the disk before the call returns?
Yes, as is the default of any ACID-compliant RDBMS. This is what the "D" in ACID stands for (durability):
> Durability guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure (e.g., power outage or crash).
There is no way to guarantee durability without fsync, as without fsync there is no guarantee that the data has been written to disk and is not simply cached by the OS. Any database that guarantees durability performs an fsync after every transaction.
> Possibly we may just have a case of a bad default here. Particularly in the context of other posts eg. "SQLite as a file format"
I would argue it is a very sane default. The database system does not know how you value the data that you ask it to store. By default it takes the position that your data is very valuable, and that you would not want to lose it even in the case of a crash.
In some cases that is overkill, but in some cases it is not. The safer default is to be cautious. You can always disable fsync if you find that you need the performance boost. Valuable data that is lost is lost forever.
I do agree that a mode that you describe ("I don't care if I lose the latest writes - just don't corrupt my database") would be very useful in many situations, and should perhaps be added/promoted in a more clear fashion.
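To make the durability point above concrete, here is a minimal Python sketch of what a durable commit means at the file level (the file name is made up for illustration): the data is only guaranteed to be on disk once fsync() returns, which is exactly the per-transaction cost being discussed.

```python
import os
import tempfile

# Write a pretend "commit record" durably. Without the fsync() call,
# the bytes may sit in the OS page cache and vanish on power failure.
path = os.path.join(tempfile.mkdtemp(), "journal.bin")
with open(path, "wb") as f:
    f.write(b"commit record")
    f.flush()              # push Python's userspace buffer into the OS page cache
    os.fsync(f.fileno())   # force the OS to flush the page cache to the device
# Only after fsync() returns can the commit be considered durable.
```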
> ("I don't care if I lose the latest writes - just don't corrupt my database") would be very useful in many situations, and should perhaps be added/promoted in a more clear fashion.
PostgreSQL has such a capability; "synchronous_commit=off". If you do want performance at all costs at the risk of integrity there's a separate "fsync=off". The former is very useful, exactly what we needed on a database application I was developing. The latter I can't see a benefit to using. I'm interested now as to how it's implemented -- which APIs and what assumptions of the filesystem it makes. And therefore whether SQLite would be capable of something similar.
If it exists in SQLite (I can't find it) then it needs promoting and using more.
In general my interactions with SQLite have been good, but its ubiquity seems to be spreading a little plague of disk chatter and bottleneck effects on I/O. It's anecdotal, but a lot of the pauses on my Android phone seem like they might be rooted in this, too.
WAL mode can address a large part of the fsync issues, but it does break the "one file" niceties SQLite has. If you don't care about losing some recent commits, WAL is definitely the way to go: https://www.sqlite.org/wal.html
I would also say that WAL has been receiving improvements from what I can see. For example, 3 years ago you couldn't open a WAL-mode SQLite file in read-only mode, but since 3.22.0 you can.
> Because then on a power failure the file would contain "a stable commit" but not necessarily "the most recent commit" ... And it seems that for most SQLite integrations that is what we want?
You HAVE to commit the journal to disk in a stable fashion - that requires fsync() because without it there's no guarantee that the data is actually on disk. Without the journal there’s no way to back out a partially applied write, leaving you with a corrupt database.
Could an application maintain the state in memory and periodically flush out to SQLite to get around it? Sure. But many people would be pissed to lose their browser history for the past N time before a crash or power loss after just finding a page they needed for some purpose.
Get a better storage solution for your home directories, ZFS with high lifespan Optane drives as a SLOG would dramatically increase the write endurance of standard NAND flash by batching writes while maintaining acceptable fsync performance (since synchronous writes don’t hit the pool itself until the transaction group commits every X seconds).
> You HAVE to commit the journal to disk in a stable fashion - that requires fsync() because without it there's no guarantee that the data is actually on disk.
But does this indicate the issue; that userland only has fsync() API -- it's a case of full synchronous write, or not. Whereas the SQLite file could contain its own journal and data, and maintain the integrity using a write barrier.
I still have no need for my browser history to be fully synchronous to disk. Yet that's what we're doing, and then...
> Get a better storage solution for your home directories, ZFS with high lifespan Optane drives as a SLOG would dramatically increase the write endurance of standard NAND flash by batching writes while maintaining acceptable fsync performance
... spending on high grade tech in an attempt to solve the problem we just created.
Fundamentally, we're asking these systems to do synchronous writes, when it's not what we actually want. When everything's a priority, nothing's a priority.
Or maybe stop using sqlite, and serialize your state to disk periodically, just like this document says.
Modern filesystems are pretty good at maintaining file integrity if you use the “atomic overwrite via rename” trick, and for extra assurance you can store an older version of the config, or five.
This will give you a “pretty robust” solution which will work “most of the time”, and that will completely avoid fsyncs and prolong your SSD life. What’s not to like about it?
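The rename trick above can be sketched in a few lines of Python (the function name and JSON payload are just for illustration). On POSIX systems `os.replace()` is atomic, so a reader always sees either the old file or the new one, never a torn write; the fsync is deliberately skipped, trading the very latest write for less disk churn, as described above.

```python
import json
import os
import tempfile

def atomic_write_json(path, obj):
    """Write obj as JSON, replacing path atomically via rename.

    Readers see either the complete old file or the complete new file.
    No fsync is issued: on power failure we may lose this write, but
    the previous version of the file stays intact.
    """
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(obj, f)
        os.replace(tmp, path)  # atomic overwrite of the destination
    except BaseException:
        os.unlink(tmp)
        raise
```

Keeping the temp file in the same directory as the destination matters: a cross-filesystem rename is not atomic.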
Sure, use WAL journal mode and then run "PRAGMA synchronous=NORMAL". Documentation is at https://sqlite.org/pragma.html#pragma_synchronous . The default is FULL (compile time option SQLITE_DEFAULT_SYNCHRONOUS ) which makes SQLite durable, but naturally causes a fsync() at every commit. With NORMAL it only runs fsync() once it checkpoints from WAL to db file.
I run it with synchronous=NORMAL, but set it to FULL for specific transactions which I want to be durable. Then I turn off auto checkpointing and manually run WAL checkpointing in another thread once the WAL is relatively large. That way it runs fsync() really rarely. That more performant setup is a bit complicated (and error prone), so it is no wonder that it is rarely done. Though I hope Firefox does it nowadays.
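A minimal sketch of that setup using Python's sqlite3 module (file name, table and checkpoint policy are made up for illustration): WAL journaling with synchronous=NORMAL means commits don't fsync, and the fsync cost is paid only when you checkpoint the WAL back into the main database file.

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(db)

conn.execute("PRAGMA journal_mode=WAL")       # commits append to the WAL file
conn.execute("PRAGMA synchronous=NORMAL")     # no fsync at every commit
conn.execute("PRAGMA wal_autocheckpoint=0")   # disable automatic checkpoints

conn.execute("CREATE TABLE IF NOT EXISTS history (url TEXT)")
with conn:  # commit happens here, without an fsync
    conn.execute("INSERT INTO history VALUES (?)", ("https://example.com",))

# Later, e.g. from a maintenance thread, checkpoint manually once the
# WAL has grown large enough to be worth the single fsync:
conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
```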
One of the more prominent uses of sqlite in Firefox is the 'places' db where your history and bookmarks are stored. If you go to History->Show All History, then select a few thousand entries and press delete, you can lock firefox up for several minutes, tens of minutes even.
I don't think this is necessarily damning of sqlite though, I'm pretty sure Firefox is not quite doing things right in some cases. I can delete thousands of rows in sqlite in mere milliseconds, so something else is going on. A shitty schema maybe? I'm not sure.
This is almost always a case of a programming language loop that implicitly does..
begin transaction
delete from history where id = @id
commit
.. on every iteration. Which is asking the storage engine to write the new data and make sure it's flushed to disk between each deleted entry. No wonder it's slow. You can get massively better just by hoisting the transaction begin and commit to outside the loop, and even better if you can drop the loop altogether and specify range of entries to delete directly.
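The difference between per-row transactions and a hoisted one can be sketched with Python's sqlite3 module (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany("INSERT INTO history (url) VALUES (?)",
                 [("https://example.com/%d" % i,) for i in range(5000)])
conn.commit()

ids_to_delete = list(range(1, 4001))

# Slow pattern: one transaction (and, with a durable setup, one fsync)
# per deleted row:
#
#   for i in ids_to_delete:
#       with conn:
#           conn.execute("DELETE FROM history WHERE id = ?", (i,))

# Hoisted: a single transaction wrapping the whole batch.
with conn:
    conn.executemany("DELETE FROM history WHERE id = ?",
                     [(i,) for i in ids_to_delete])

# Better still, when the entries form a range, skip the loop entirely:
#   conn.execute("DELETE FROM history WHERE id BETWEEN ? AND ?", (1, 4000))
```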
Hot damn, opened 8 years ago? That report describes the behavior I experience to a T, it's definitely the same bug.
Is this 'Library' window a dead component? Besides this nearly decade-old bug, it still doesn't get themed like the rest of Firefox either; dark mode doesn't affect it.
It might be that they are deleting each entry in its own transaction. A transaction causes file system level sync and with the millisecond latency of common hard drives it causes transactions to have a millisecond overhead. Multiply this by a few thousand and you get the latency you are talking about.
I've made this mistake myself in a project of mine and after tracking down the bad performance to this issue ended up adding a layer that emits explicit transaction begin/end statements after N operations.
Edit: See the bug thread linked in the sibling comment. The transaction latency seems to be a component, but other issues remain.
This reminds me of a problem I've been having on newer versions of Chrome. Chrome, even if I'm not using it for anything, will sometimes consume so much power it drains my battery while plugged in (albeit to a previous-gen 85W brick with an adapter). There are a few threads on chrome support[1], but one of the theories is that Chrome's internal SQLite database that is used for tracking history locally is the source of the drain.
Uh yah, there are people in that thread who have profiled their Chrome installs and found that SQLite was draining power. Thus my post.
Since I needed to use Chrome, it didn't really matter to me why the problem was happening. But, since this was a thread on performance problems with SQLite, I posted it.
Issues like large-WAL and the need for periodic blocking maintenance (e.g. vacuum, to free space and un-interleave data) are excellent reasons to aggressively try to keep your insert/trigger/etc logic as simple as possible. In all databases.
When it's simple enough, you can safely abstract over the DB and do stuff like:
1) maintain an in-memory operation-queue, so you can copy the DB when idle, vacuum as long as necessary on the copy, and then switch to the vacuumed copy after replaying the queue. (SQLite allows only a single writer, so statement-replay is safe; with concurrent-writer databases it sometimes isn't, since you can't recreate the DB's row-visibility logic)
2) in-memory caches can alleviate a lot of load, but are hideously difficult or impossible to maintain if you allow complex or DB-only insertion logic (e.g. 'on conflict' rules). Simple table+rowid based caches can be utterly trivial and correct if you have nothing complex. Or if you really want, a full-blown lambda architecture is possible, and sometimes not even very difficult.
3) simple logic almost always completes more quickly in the DB, and long-running transactions are the bane of your DB's existence, causing large WALs, worse contention, etc. Aim to keep small atomic operations safe, e.g.: don't require a transaction to modify 100k records correctly, allow each one to be atomic. Then you can batch in any way you please, optimizing for throughput or non-blocking-ness as desired.
Complex queries are far safer since you can just ignore your abstraction layer. But complex modifications are worth avoiding whenever possible.
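Point 3 above (small atomic operations batched for throughput) can be sketched in Python with sqlite3; the schema, function name and batch size are made up for illustration. Because each row update is individually correct, the batching factor is a free tuning knob: larger batches for throughput, smaller ones to keep transactions short and the WAL small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, flag INTEGER)")
conn.executemany("INSERT INTO items (flag) VALUES (?)", [(0,)] * 1000)
conn.commit()

def mark_flagged(ids, batch_size=100):
    """Apply independent single-row updates in short transactions.

    No single transaction needs to see all the rows at once, so a crash
    mid-run leaves a prefix of the batches applied and nothing corrupt.
    """
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        with conn:  # one short transaction per batch
            conn.executemany("UPDATE items SET flag = 1 WHERE id = ?",
                             [(i,) for i in batch])

mark_flagged(list(range(1, 501)))
```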
I guess this is a nice complement to the pro-SQLite articles posted here in the last few days. It gives an interesting perspective on the possible problems one might face when using SQLite as their application format (from the link posted here yesterday).
SQLite is one of the unfortunate victims of reddit/hackernews circlejerk (the other common victims in this space are MySQL, Postgres, Mongo).
I use and love SQLite for what it is. But it's not a silver bullet for every use case.
I have an anecdote - a few months ago a problem surfaced with SQLite. A relatively simple query (3-4 joins) had been working completely fine for a long time, but suddenly started taking a very long time. It turned out it had stopped using the correct indexes - probably some heuristic threshold(s) had been crossed and suddenly the query planner started returning a very inefficient plan which was 100 times slower. It was a very easy fix - force use of the correct indexes - but it would have been very hard to debug if I didn't have access to the database (as is often the case with embedded databases). This is the price for the abstraction of only defining what you want declaratively and hoping the database will do the right thing to execute your request.
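For reference, the fix described above uses SQLite's INDEXED BY clause, which pins the planner to a specific index (and errors out if it can't use it, rather than silently picking a different plan). A minimal sketch in Python, with a made-up schema and index name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user INTEGER, ts INTEGER)")
conn.execute("CREATE INDEX idx_events_user ON events(user)")

# EXPLAIN QUERY PLAN lets you verify which index the planner chose;
# INDEXED BY forces it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM events INDEXED BY idx_events_user WHERE user = ?",
    (42,),
).fetchall()
```

Forcing an index is a blunt instrument; running ANALYZE so the planner has fresh statistics is usually the gentler first step.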
Agreed. For me, I think the internal fragmentation is well-understood. I would love to know more about the external fragmentation critique they had in this article though.
"Firefox has a lot of performance problems already from our architecture, we didn't necessarily integrate Sqlite in the best way on top of that"
"If you make a bad sql schema it will make it worse so maybe you should think of doing something off the cuff with json files that might be just as likely to be terrible"
It's really nice to be able to read a sql schema when you are trying to understand an application.
It would be a shame to lose that because core issues weren't addressed more broadly.
I think there's a balance. Pretty much every performance issue can be reduced to "if you don't know what you are doing, it's gonna be slow".
But, there are systems that enable you to make mistakes quite easily, and there are systems that make it more difficult to make mistakes. I think these issues are often transferable to regular DBs, but you typically don't run a full-on DB on your smartphone, which has very different performance and memory footprints/limitations.
It's good to take it into consideration when using DBs on mobile devices (or, consider whether you really do need a DB, albeit small and nimble like SQLite).
I dunno, I think that json is likely one of the best things that could have happened to data formats in general.
A machine readable and writeable format, without embedded type info or external schemas, and supported by all major programming languages? That’s exactly what is needed to allow third-party apps to access that data.
Yes, that would be great. The problem is that JSON doesn't actually provide this.
One of the most important aspects of a universal format is in my opinion the ability to rely on the fact that when I store a value on one system, that I retrieve the same value when I read it back on another.
JSON doesn't even do this for numbers. The spec explicitly leaves the precision undefined. I might be able to store and retrieve the value 1099511627776 using one JSON implementation, but when using another, I may get a different value.
Yes, the fact that JSON only does floats is annoying. It is especially bad when I want to store large, high precision numbers, like microsecond timestamps.
The right approach is to treat "high precision floating point numbers" as just another type which cannot be represented by JSON natively, and thus store it as a string. After all, we cannot store dates, binary data or 128-bit integers in JSON either.
You might argue that JSON should support doubles, and dates, and binary strings, and a ton of others -- but this will make it much less portable. If you add a "high precision float" type, then every parser ever will need to support it.
Right now, there are JSON parsers entirely implemented in a single 3000-line C89 file with no external dependencies. This would be impossible to do with a bigger library of data types.
My point was that JSON doesn't even reliably store _integers_. If you could only store a single datatype outside of strings, it should be integers, and JSON doesn't even do that.
I'm guessing a lot of software are storing 64-bit integer ID's in a JSON numeric field. How many people know that this is unreliable and when it fails for larger numbers you won't even get an error message but silently get the wrong number?
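The silent failure mode described above is easy to demonstrate: an IEEE-754 double (how JavaScript and many other JSON consumers represent numbers) cannot exactly hold integers above 2**53. A short Python sketch, using the string workaround as the fix:

```python
import json

big_id = 2**53 + 1  # 9007199254740993, a plausible 64-bit row ID

# The nearest double rounds silently to the neighbouring even value:
assert float(big_id) != big_id
assert int(float(big_id)) == 2**53

# Python's json module happens to round-trip ints exactly, which is
# why the bug is easy to miss in testing with Python-only tooling:
assert json.loads(json.dumps({"id": big_id}))["id"] == big_id

# Portable workaround: store such IDs as strings.
safe = json.dumps({"id": str(big_id)})
```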
What exactly would you want it to do? Have a separate "float"/"integer" type?
This would break round-tripping/comparison on languages which do not distinguish between integers and floating point types representing integers.
And it would still make you unhappy, because there are systems out there without simple 64-bit integer support (some examples which come to mind are Perl 5 on 32-bit machines, Tcl, and, you know, JavaScript before 2019).
So JSON has a single numeric type, a float. Yes, you have to remember that you cannot store a 64 bit value there. This is an annoying gotcha. But it does not come from deficiency of the serialization format, it comes from the primary goal: if you want to support many languages, you need to conform to common subset of features. And since there are languages which cannot natively handle 64-bit integers, you should play safe and store those as strings.
I would likely only have an integer type. Perhaps even limited to 32 bits. That is because you can't rely on more than that anyway, but if the standard specified it then at least you'd have predictable behaviour.
If I can't rely on being able to store a number of certain magnitude, what good is it that some platforms are able to do it?
Eh? It is easy to read and write, the syntax is simple and unambiguous. It also maps easily to native types of most languages.
The backslash escaping is annoying, I think HDLC-like symmetric escapes are much nicer - but this is a minor detail and not a deal breaker.
There are some people out there who claim that JSON has a problem because many parsers accept some forms of malformed input. I don’t consider this a problem. All of the JSON data I have seen is either well-formed, or so broken any parser will reject it.
This amuses me as it reminds me of the performance issues Mork (https://wiki.mozilla.org/Mork) had, and introducing SQLite was originally a way to fix that problem.
These all seem like great reasons to use sqlite, not avoid it. All of that complexity, all of those bugs, security bugs, will now be your custom code to save and store data?
[1] https://blog.mozilla.org/addons/2018/08/03/new-backend-for-s...
[2] https://stackoverflow.com/questions/22917139/how-can-i-acces...