Hacker News new | past | comments | ask | show | jobs | submit login
We’re pretty happy with SQLite and not urgently interested in a fancier DBMS (beets.io)
397 points by samps on June 19, 2016 | hide | past | web | favorite | 146 comments

I never stop being impressed at how often people will jump to odd, unsupportable, conclusions like, "using MySQL will make this thing faster".

I've seen it so many times over the years regarding users and email configurations. I can't count the number of times I've dropped into someone's badly behaving mail configuration and found they had MySQL hosting the users, and explained it was for "performance" reasons. Somehow they didn't grasp that /etc/passwd fits entirely in memory, and the map files Postfix uses for various lookups and stuff are already a database (just one specifically designed for the task at hand) and also fit entirely in memory. Putting that MySQL layer in there is disastrous if performance matters; it is orders of magnitude slower for any case I've seen...still plenty fast for most cases, but it's ridiculous that this idea gets cargo-culted around that if you store your mail users in MySQL your mail server will be faster.

A little knowledge is a dangerous thing, is what I'm trying to say, and people who know MySQL is "fast" may not know enough to know that it's not the right tool for the job in a lot of cases...and is probably slower for many use cases. I'm pretty confident this is one of those cases. SQLite is wicked fast on small data sets, and being smaller means more of it will fit in memory; I can't think of any way MySQL could be a more performant choice for this workload.

Also, I don't even want to try to imagine shipping an installable desktop application for non-technical users that relies on MySQL!

Also, I don't even want to try to imagine shipping an installable desktop application for non-technical users that relies on MySQL!

It's really not that hard to imagine. There is an embedded library version of MySQL called libmysqld which is made for this very purpose. Of course as you point out it's quite overkill for something that would work perfectly well with Berkeley DBs.

I've never seen it used like that, but Microsoft's equivalent SQL Server Express is a nightmare in deployment and support, because you need to administer a full-blown SQL Server on every customer PC (backups, migration debugging, …). I've no idea why people don't use SQLite instead.

_SQL Server Express_ is not the MS equivalent of SQLite.

_SQL Server Express_ is just a free (as in beer) limited version of _SQL Server_. It's still a full-blown traditional server like MySQL, Oracle and other that require services and administration tools to be installed.

Mirosoft has 2 equivalents to SQLite: The Jet database engine is pre-installed in all versions of Windows and allows creation of .mdb databases (those used by older versions of MSAccess). You don't need to include any dll file to use it.

The other is _SQL Server Compact_ which, like SQLite, is an embedded engine that you can bundle with your application by including a library.

All of these embedded databases are able to do multi-user writes to some extent. Jet is actually quite good if careful with locking (emphasis on careful). SQLIte is a de-facto standard because it's simple, performant, cross-platform and flexible.

While it can replace full-blown databases in some cases, it's far from being always true. There are still many cases where using something like SQL Server Express may make more sense, for instance if you want to offer a path to your customer for drop-in replacement of the database based on the growth of their needs over time. Not saying it's not a costly lock-in, but it's an easy one to sell.

It should be noted, of course, that both Jet and SQL Server Compact are essentially considered "deprecated" technology and are not recommended for new development efforts. The compact database format Microsoft recommends these days (and which comes bundled with the Universal Windows Platform SDK these days) is now actually SQLite.

SQLite is popular enough for Windows applications that SQLite had to rename its temporary files to "etilqs_..." (SQLite backwards) to avoid getting unnecessary bug reports from naive users (https://answers.microsoft.com/en-us/windows/forum/windows_7-...)

Certianly from my experience the main reason for using it is that once it becomes to large for a desktop machine it's trivial to migrate to a 'proper' MsSql server.

I also don't find the admin side much of a faff. Provide your users a backup/restore option within your app and ensure you install your own instance.

SQL Express has some rather shitty default configuration when you install it, that you usually have to change to actually use it for anything useful.

I've done also some tests, and so far in most of cases SQLite3 has been fastest option and it's also trivial to manage compared to other options. SQLite3 is absolutely awesome. Until you'll need concurrent writes. It's also important to notice that configuration options will make a big performance difference. http://www.sami-lehtinen.net/blog/sqlite3-performance-python... http://www.sami-lehtinen.net/blog/database-performance-tests... http://www.sami-lehtinen.net/blog/sqlite3-performance-testin...

SQLite is a practical option if concurrent writes are not required. It is not meant for that, and that is also one of the reasons why it is both fast and simple, since without concurrency, a whole set of really complex problems goes away. SQLite is great when the communication and arbitration between two services takes place at the application level, as is the case with DNS AXFR requests, for example, and each of those services may have its own private data store. In such a situation, an immediate benefit is realized by having a domain specific, uniform data manipulation tool (SQL) without having to write custom data manipulation code. This is especially well suited to scenarios where configuration management is performed via OS packaging.

Yup, I did some benchmarking (for a webapp which runs a separate process for each customer) and SQLite had the highest TPS and lowest memory usage by a large margin.

Probably because it runs in the same address space as each server process and there is no message passing (TCP/unix socket) overhead.

Its mostly about concurrency, sqllite serialises all writes, for an embedded database in a single user application that is ok, but for a web application that may have a lot of concurrent writes going on its a performance disaster area. Note that for good perfomamce on a write heavy web application even mysql can be a problem unless you use innodb. As myisam has the same write table locking behaviour.

I'm a little ignorant of the situation on the sqlite side, so this may be a dumb question...

Say you have an MVC web app, could you have asynchronous calls to a thread performing the actual DB writes? From your app's perspective, concurrent writes would be placed into a queue that performs them serially.

Are there any functional issues to this approach? Thanks in advance for your thoughts or info.

That might work, as long as you don't ever need to read the data you've just written (logging?).

Otherwise you'll end up serving stale data that might be several minutes out of date.

Ah, longer caching is the factor with web I always forget after focusing on apps / games. Thanks for pointing it out.

> I never stop being impressed at how often people will jump to odd, unsupportable, conclusions like, "using MySQL will make this thing faster".

That is a good opportunity to ask questions and learn about the person, their knowledge and their use case.

Yeah sometimes the answer is they don't know how stuff works underneath and are just following a blog post or a trend or heard someone talk about it at meetup.

Programmers love programming tools / frameworks / languages. There are so many new shiny things out there. New ones pop up every month. New language, new build system, new db, new UI library, new message queue etc. Interstingly over the years, lots of those have had great performance improvements. Maybe they took advantage of new kernel API or some new algorithm and so on. That feeds the hype -- put new shiny things in and performance doubles.

So people have been trained to expect all the new things to just magically be better. But like said, you have to probe and find out the reason they propose the change, only then you can only see if it is justified or not (one of the resonable responses could be "I've actually measured with a production load and it does look better"),

So many times I see people handing out a root password to edit /etc/password. Sure, it may be crazy for performance but, even a trivial number of users and aliases are more easily managed with a remote mysql client. Everyone that has used postfix in a production environment knows the pain otherwise, the rest feel free to vote down.

There are practically countless ways to delegate management of mailboxes and users without granting root. That's not even a hard problem to solve, and certainly not one that justifies introducing a hugely complex additional variable to the equation.

Anyway, I'm not saying "never use MySQL for mail users" (though, I think the percentage of deployments where it makes sense is closer to none than it is to one), I'm just trying to make the point that MySQL is, in some folks minds, a magical solution to performance problems. Often, it not only introduces needless complexity, it won't even improve performance. It's a classic example of "when all you have is a hammer, everything starts to look like a nail". MySQL is a very fine hammer. It just isn't the right tool for every job.

The article we're talking about is another case where a little knowledge is a dangerous thing. A desktop app serving one user with a tiny data set (as I understand it, we're talking about the metadata for a person's music collection) is exactly the right workload for SQLite. I'd be shocked if a naive port to MySQL were faster (though they acknowledge that there's room for query optimization), and not at all surprised if it were slower. And, I know it'll require more memory and disk space for the same working set.

> So many times I see people handing out a root password to edit /etc/password.

Shouldn't user groups give a bit more accountability there?

The next step up is userdb though, not MySql

Why MySQL and not LDAP?

> Also, I don't even want to try to imagine shipping an installable desktop application for non-technical users that relies on MySQL!

Amarok music player on KDE linux desktop environment (at least on version 3.5) kept the music collection metadata in database. You could choose sqlite, mysql or postgress during installation if I remember correctly. It worked ok, because of the package system.

I guess if you used separately installed databases there would be some conflicts, though.

Not faster perhaps but with sqlite3 I had locking issues and after I switched to mysql I don't. It's really that simple. The app was used by a small office of 40 people at first, then started being used by close to a hundred people and that's when the locking errors began.

That's a pretty big difference in scale. The app in the original article seems to be a desktop application, not a client/server system with many users. There's plenty of use cases where MySQL (or PostgreSQL) is a great choice. Yours is probably one of them.

Were you using wal mode with sqlite? It helps a lot with locking.

I don't even know what it is so I wish I had looked it up first but now it's too late. Also mysql will aide in later replicating the application between different locations to ensure availability for each office.

Yeah the WAL basically makes "concurrent processes reading/writing to the same database" work magically, where as without you'll get all sorts of issues and timeouts. I don't know why it isn't enabled by default, or at least more prominently advertised as an option.

Initially, WAL mode was off by default because older versions of SQLite do not support it, and it is a property of the database file. Thus, a database created in WAL mode would be unreadable by older SQLites. But WAL has been available for 6 years now, so it might be reasonable to make it the default. We will take your suggestion under consideration. Thanks.

Kodi media player uses mysql and runs on most platforms. As an end user mysql is completely invisible

No, Kodi uses SQLite internally (I've fixed garbage in its databases with the sqlite CLI tool before). It does also support external MySQL, though.

Wow, the article is such a fresh breath of air, primarily because the author demonstrates common sense.

He (they?) picked SQLite for all the correct reasons:

- best tool for the job for their situation;

- write-light and read-heavy;

- zero configuration;

- easy to embed;

- understanding that optimizing queries by far gives the best performance in the shortest amount of time.

As an aside, I'm currently using SQLite for Bacula and Postfix, and it's a joy to use; the only drawback I found so far is lack of REGEXP REPLACE in the SQL dialect which the database supports (must be loaded with .load /path/to/lib/libpcre.so, but it is not part of the language). I used the Oracle RDBMS for my PowerDNS deployments, but in retrospect, the way PowerDNS works, SQLite would have been an even better match. All in all, it is great to read that someone picked it for all the correct reasons, rather than some fashion trend, as is often the case in computer industry.

Premature optimization is evil, but preemptive optimization is necessary unless you want to paint yourself into a corner. I realized this after implementing a bitcoin full node.

In my bitcoin implementation, as an experiment, I tried storing the blockchain in sqlite, postgres, and leveldb. I gathered up a bunch of data from the first ~200k blocks of the blockchain and benchmarked all three databases. I queried for something like 30,000 utxos out of a set of a couple million. What took 300-400ms in leveldb took 1.6 seconds in postgres (on the repl. in my actual node it would have taken longer due to deserialization of the utxos). What took 1.6 seconds in postgres took over 30 seconds in SQlite.

Now, you can tell me I did the benchmarks wrong, and "oh, if you just did this it would be faster!", but 30+ seconds is slower to an absolutely insane level. Needless to say, I went the key-value store route, but I was still astounded at how slow sqlite was once it got a few million records in the database.

I actually like sqlite, but when you know you're going to be dealing with 70gb of data and over 10 million records, preemptive optimization is the key. If I were the author, I would consider switching to postgres if there are over 500k-1m records to be expected. That being said, if they're partial to sqlite, SQLightning (https://github.com/LMDB/sqlightning) looks pretty interesting (SQLite with an LMDB backend).

edit: To clarify, these weren't particularly scientific benchmarks. This was me timing a very specific query to get an idea of the level of data management I was up against. Don't take my word for it.

What operations were you doing in your benchmark? What was SQLite actually doing (CPU?, disk I/O?) while taking 30 seconds to finish? This would be a lot more useful and less FUDdy with more detail.

I'll clarify here: these weren't very scientific benchmarks, as I alluded to. So, don't take my word for it. I didn't measure ops per second, I didn't take into account system load or cpu usage, etc. This is all anecdotal evidence. I'm not saying this an announcement that "you shouldn't use sqlite". I measured the time of one query.

It was a benchmark I ran just to personally give me a general idea of what I was up against in terms of data management. This is primarily why I didn't post them in the first place. This was several months ago. I'll try to find the code that ran them and put them up on github if I can, but I doubt it will be that useful without the actual data, which isn't easy to upload.

So yes, to clarify, don't take my word for it. This is just my experience.

I would be interested in a repo or post about your setup for the actual project; downloading, storing and using blockchain data. Sounds super interesting so if it(article, repo, site) isn't private & exists, would be keen. Cheers.

Not the poster, but I did some benchmarks with embedded databases and Python if you're interested in completely useless, unscientific data:


The project itself an alternative bitcoin full node, one of the few (along with btcd, bitcoinj, etc), and the only one that runs in the browser: http://bcoin.io/browser.html. I've posted it here before but no one seemed to be interested.

It's probably not good if you want to index anything more than addresses, but you could easily modify it to index other things if you wanted to. The actual blockchain database resides here: https://github.com/bcoin-org/bcoin/blob/coinviews5/lib/bcoin...

^ That's the current branch I'm working on which, as it happens, optimizes a lot of the storage of utxos. Before, it was storing them the messy bitcoinj way. Now it uses coin viewpoints similar to bitcoind and btcd.

this is super awesome, starred the repo. thanks!

I've found sqlite's performance on bulk insertions to be massively (100x) improved by wrapping the bulk insertion in a transaction.

fsync()ing a couple hundred thousand individual INSERTs isn't fast.

Batching inserts, wrapping the batches in transactions, waiting til after to add indexes: best way to go.

Journaling mode and sync modes etc, also make a big difference: http://www.sami-lehtinen.net/blog/sqlite3-performance-testin...

Those optimizations also apply to PostgreSQL.

It would still allow you to use a simpler solution and not paint yourself in a corner if you optimized on sqlite; no preemptive optimization necessary wrt choice of dbms if you can perform the work on the simpler solution.

Optimizing in SQLite is just as much of a sunk cost as optimizing in PostgreSQL if you switch away to another database.

> What took 300-400ms in leveldb took 1.6 seconds in postgres (on the repl. in my actual node it would have taken longer due to deserialization of the utxos). What took 1.6 seconds in postgres took over 30 seconds in SQlite.

I'm sorry to say but you're almost certainly doing something wrong then.

It's possible, but it was a very simple database layout. Pretty hard to screw up. Then again, I screw up simple things all the time. But whatever I screwed up would likely have been duplicated on the postgres side since the sqlite and postgres setups were nearly identical. Postgres performed fine compared to sqlite. It was a bit behind leveldb, but I figure the overhead of flushing the data to a socket has an added cost. Leveldb has the advantage of being in the same process.

If I remember right: Querying for leveldb was iteration from [20-byte-hash][lo-utxo-id] to [20-byte-hash][hi-utxo-id] and subsequent lookups on the keys containing the actual data. The sql table setup was `id` as a char primary key and `address` as a char with an index. The query was a simple `select data from utxos where address = ?`. The actual `data` blob was on average maybe 60 bytes in size.

Maybe there was something I could have done better there. I'm not sure. This was just my experience. This is all beside the point anyway. My point was whatever database was better, it was worth testing each one, and I don't consider it to be premature optimization.

`select * from utxos where address =?` should not take 300ms no matter the size or type of database (assuming there is an index on address).

Not really. If your table does not fit into memory, there is a good chance that a disk seek is required. Now all bets are off.

By table you mean index, right ? What matters most is the size of the index, and most index data structures (see btrees) work so that very few disk seeks are needed, even if the index is large.

Perhaps if you're using a really slow spinning disk and have no indexes, or the results are distributed evenly across the entire dataset (i.e lots of random, non sequential access).

It's just 300ms is really slow, even with a largeish dataset that doesn't fit into memory. Perhaps you hit a corner case in some way that destroyed performance in sqlite, but I'd be surprised if those results were representative.

Until there are millions of records grep will win

With relational databases, you absolutely can see subsecond to 30+ second ranges for the same query on the same data, if you don't have proper indexing or stats on your tables.

Indeed! An index can be the difference between a 2-5minute full table scan and a subsecond query.

Depends from your disk system and data set size ... It can also mean 2-5 hours or days versus subsecond query. Try having 6 TB database with integer column. Then select just one row from it. Either it's indexed or not. You don't need to explain that, if query ends up taking forever it wasn't indexed.

I'm curious to know how come Sqlite performed so bad on couple millions records.

Can you share what the table schema is? What the queries are? And what indices built for the Postgres and Sqlite tables?

While we're speaking of SQLite; one thing that has little exposure that could probably use more is that it now ships with Windows as a system DLL:


Between that, and packages readily available on most Linux and BSD distros out there (and, in most cases, installed by default), it's well on its way to become a de facto standard system API for relational storage.

It's amusing how Microsoft's different departments keep trying to kill off each other. They spent so much time trying to shove SQL Server Express down everyone's throat, and now everyone gets SQLite included instead.

SQLite is not a competitor to SQL Express, because the latter is still an out-of-proc server. It is a competitor to SQL CE (which shipped its last release to date in 2011, so...).

The nice thing about SQLite is how little it assumes about the world outside. That made it easy to run in WinRT application sandbox with minimal changes; and for quite a while, it was the only local DB readily available to WinRT code written in C# or C++ (JS got IndexedDB).

Also, Visual Studio's Intellisense replaced the SQL server compact single database file format backend with sqlite!

If to speak about desktop applications then any embedded DB will be unbeatable.

So I am speaking about embeddable DBs here.

Konstantin Knizhnik have implemented impressive set of various embedded DBs: http://garret.ru/databases.html

Like his POST++ has direct mapping to C++ classes so if you use C++ then you don't need any ORM.

In my Sciter[1] Engine I am using his DyBase library [3] as a bult-in persistence for Sciter's script [2] (JavaScript++).

With the DyBase in script you have features similar to MongoDB (noSQL free-form DB) but without any need for ORM and DAL - you can declare some root object as be persistable and access those data trees as if they are JavaScript objects. The engine pumps objects from DB into memory when they are needed:

  var storage = Storage.open(...);
  var dataRoot = storage.root; // all things inside are persitable
  dataRoot.topics = []; // flat persistable list
  dataRoot.topics.push({ foo:1, bar:2 }); // storing object
  /* create indexed collection with string keys, keys can be unique or not */ 
  dataRoot.titles = storage.createIndex(#string);
DyBase has Python bindings too.

[1] http://sciter.com - multiplatform HTML/CSS UI Engine for Desktop and Mobile Application

[2] TIScript - http://www.codeproject.com/Articles/33662/TIScript-Language-...

[3] DyBase - http://www.garret.ru/dybase.html

I was unfamiliar with this project and assumed it was a hosted service at first. Not so, this is a local application, so an embedded database makes sense.

It took until the very last paragraph for the blog post to make that point.

FWIW, I've run SQLite in a few production sites (low 6 figure pageviews per month) and it has worked fantastically. If you understand and work with the limitations, it really is amazing how much you can get out of it.

I'm actually surprised WordPress hasn't ever moved over to it for ease of installation/deployment - WordPress and PHP seem more likely to trip over in most deployments I've seen before SQLite would.

> I'm actually surprised WordPress hasn't ever moved over to it for ease of installation/deployment

For the same reasons Wordpress hasn't moved to Postgres or doesn't include feature X,Y or Z. Wordpress has to remain relatively stable. A change in the database means breaking Wordpress ecosystem, as many plugins add tables to the DB. There is very little abstraction when it comes to Wordpress API and its interaction with the DB. Wordpress doesn't ship with an ORM.

What are the limitations? I love SQLite, and have vaguely heard that it has issues with concurrency, but what, exactly? I use it on production for www.tithess.gr and it seems to be working beautifully, no concurrency problems whatsoever there.

What problems should I be expecting in a multi-access scenario? I've never had that question answered adequately.

Only one process can have a SQLite database file open for writes at a time. Multiple processes/threads can read, however.

That's about it. I'm hesitant to describe it as an "issue" with concurrency because that has connotations that it's a problem with SQLite that the authors should address. Rather, it's part of the simplicity that is a key design feature of SQLite.

As SQLite's own documentation[0] says, it doesn't compete with client/server databases like Oracle/Postgres/MySQL - it competes with fopen() (edit: not a system call, see below), hand-maintained serialization/pickling formats, etc.

0: https://www.sqlite.org/whentouse.html

> it competes with the fopen() system call

Nitpick: fopen() is not a syscall, it's part of stdio which is in libc (in user mode). The nearest POSIX syscall equivalent is open(2), but stdio does things like buffering and formatting in user mode on top. It's also more portable to non-Unix because stdio is in the C language spec.

I had heard this phrase before as simply "it competes with fopen()".

Andrew has already spoken to the single writer matter. Another issue is it's not very easy to scale horizontally at all. You could shard, but you're not going to be doing replication very easily (although a project does exist to provide replicated SQLite - https://github.com/rqlite/rqlite - but then you might as well finally use Postgres or whatever).

My attitude to this is if my project can be working well and turning a profit within SQLite's limitations, we can worry about migrating to a different stack later on. The same reason I use Ruby for almost everything initially too.

Consider multi-user wordpress site for example. You can have one SQLite DB per user - there is simply no information to share between users. If so you can split all your users between multiple backend machines and do per user request routing - like one machine serving users from A to F. SQLlite has cheap DB loading sequence so you can open/close it for each page request. That would be perfect horizontal scaling as all your users will not fight for single DB access.

well, you kinda just move the problem to another layer. You'll face same horizontal scaling issues once you need to scale on each user.

Or just remove one [DB] layer completely. Such databases can be stored directly on machines executing http requests. And if you need to rebuild DB structure you don't need to stop the world - do them one by one.

Of course, all this depends on amount of data you need to store for each user and informational structure of the app.

You cannot have multiple writers to a SQLite database. Only a single file descriptor may be open with 'write' access. As long as you can get the performance you need out of a single writer, then you're good!

Clarification: You can have multiple connections (aka file descriptors) open on the same database file for writing at the same time. But only one can be actively writing at a time. SQLite uses file locking to serialize writes. Multiple writers can exist concurrently, they merely have to take turns writing.

WAL allows multiple readers and writers:


Yes, but in some cases it could be faster to work in sole- writer mode. It is kind of cooperative multitasking demonstrating by Node.js.

I thought sqlite didn't allow any readers while a write was in progress, but apparently that is no longer the case https://www.sqlite.org/wal.html

It only supports a subset of ALTER TABLE and the work around for the limitations is very annoying: https://www.sqlite.org/lang_altertable.html

I've run into this issue before with Django's automatically generated migrations breaking in SQLite. As a workaround, I rm my local test db or make the change myself.

The worst thing that I experienced was actually the lack of basic support for routine schema alterations that you find in other DMBS's. I don't have time to recall exactly but relatively mundane things like altering the names or data types of columns required basically dropping and rebuilding the columns / tables from scratch. It really made the ongoing maintenance quite painful compared to other DBMS's.

Your entire database is in one file, so you can only write from one process at a time. You can read from multiple processes at a time though.

From other limitations there can be the aspect of security. If you have a more complex application / set of applications you may want to use different database users for different purposes. Maybe some audit / append-only tables as well.

With sqlite whoever controls the app can do whatever they want with the database file.

Two questions:

Wouldn't a "full" RDBMS like Mysql/Postgres offer a ton of benefits over SQLite (like the features to handle edge cases as they arise) to the point where, even if SQLite would work, SQLite still wouldn't be the ideal choice?

Does wordpress completely abstract the database or do third-party plugins use their own interfaces to the database, in which case a migration to SQLite would break a lot of them?

Wordpress does not abstract the database, so migration to SQLite would break tons of plugins.

Wordpress is also a PHP application, and it's common to have multiple PHP processes running in parallel, which makes it much harder to use SQLite.

Wordpress does attempt some abstraction of the database via its various wrapper classes/functions like wpdb, WP_Query, etc. It's definitely not ORM-level abstraction by any stretch of the imagination, though.

Yeah I've written many small websites which, such as a dynamic DNS service[1], which use SQLite as their sole storage.

Providing there aren't too many updates at once, such that locking becomes a problem, it works really really well.

[1] - http://dhcp.io/

Thanks; I've added a phrase to the intro section to mention that beets is a desktop app.

Agreed. Not that the 3 items are not relevant, but the fact that it's a local application sounds to me like the most decisive argument in favor of an embedded DB.

Presumably the target audience is people asking why his application does not use MySQL, and not people who have never heard of it at all.

SQLite also does remarkably well with recovering from all manner of power loss / crashes / worst case scenarios. We created a "power loss" rig just to test this facility for one particular system. Really SQLite's biggest weakness is concurrency, and if your app needs that in any serious amount you probably ought to look elsewhere. If you're just dealing with occasional concurrency though SQLite shouldn't be dismissed out-of-hand.

The breadth and depth to which SQLite is tested[0] is both admirable and inspiring.

It's not just "internal" tests, like unit tests or things that can be run from client C programs, but tests of all kinds of hard-to-simulate external situations like out-of-memory and power failure situations.

[0]: https://www.sqlite.org/testing.html

That is simply amazing write-up. I'd still not call it a high-assurance system given some things missing. Yet, the amount of rigor in the testing phase could easily be the baseline for that niche in that it exceeds about anything I've seen. There's basically so little I could suggest on code or testing side that I'm not going to even bother here given how marginal it would be due to effort already put in. Just too well-done for mere speculations.

I also noted that the assert section essentially does Design-by-Contract. This is a subset of formal specification that's prime value is in preventing interface errors (vast majority in big apps) and supporting formal verification. Done in design/spec phase in high-assurance since both Edsger Dijkstra and Margaret Hamilton independently discovered technique's value. Done at language-level since Eiffel due to Bertrand Meyer. Good to see that, even if not all techniques, they're doing the 80/20 rule to get most benefit out of what formal specs they're using. Also allow you to easily enable run-time checks if you can accept performance hit. Nice.

I gained a newfound respect for SQLite recently after reading Dan Luu's post on file consistency (http://danluu.com/file-consistency/). I had always thought of SQLite as a bit of a toy database, but having read that post I was surprised by how rigorously it appears to have been developed.

Reading that post makes me want to investigate LMDB as an alternative "competition for fopen()". I'd be very interested if anyone had opinions/experience on that idea.

(Disclaimer: LMDB author here) When your data is simple, too simple for SQL, LMDB is the rational choice. (And as others have already pointed out, you can use SQLightning, SQLite built on LMDB, if you really want the relational data model.)

Nothing else even approaches LMDB for small footprint, efficiency, and reliability.

It works very well for concurrent readers and can handle one concurrent writer when set to WAL mode. Saying that it's no good at concurrency is selling it short. It's blazing fast as long as the use case doesn't require multiple concurrent writers. In fact, it will be faster than client-server databases when you aren't hitting a weak point like that, since it has no IPC overhead.

Classic HN bait.

You don't even need to read the comments to know what people will say:

"SQLite is a great fit for this type of application. It's a replacement for fopen people. fopen."

"What about ALTER TABLE?"

"It's just a toy database, it doesn't even support concurrent writers"

----- "WAL mode"

"Hey, golang, rqlite"

----- "Whoa I wrote something similar for a ..."

----- "Why would you use this? Just use postgres"

"SQLite is the best database ever"

"SQLite is the worst database ever"

For any database that isn't huge, a library embedded into your application is going to be faster than anything that has to communicate with a server over a socket connection. Though both execute SQL queries, SQLite is completely different than relational database servers and appropriate many places where running a full RDBMS is not. For example, you can't run MySQL or Postgres on the iPhone, but you can use SQLite.

I've found SQLite absolutely amazing for getting the power of SQL for R data frames.

Could you expand on some of your use cases? An R data frame, by definition, has to fit into memory, so it would seem that any sort of map/filter/group/fold operation would be fastest if performed in-memory, as well. And I assumed that e.g. joining data frames (where you would run out of memory really quickly if your datasets are large to begin with) would be uncommon... am I wrong?

Depends on what you mean by uncommon.

I for example often need to score/model data which doesn't fit in RAM (on my PC) so I use libraries like bigGLM which can use out-of-memory data to build the models. One of the options is SQLite, but you can use an ODBC connection.

Additionally, I can explore slices of the data, which resides only on disk. I don't even need to import it. I can use dplyr (very famous package for aggregations and slicing) which will map the R syntax to SQL which is executed by SQLite.

The set up is quite interesting. The sql data frames library will embed a dataframe in sqllite and let you use sql. It has been quite useful in transitioning our SQL data analysts to R.

> we’re read-heavy and write-light

> we have almost no indices, no principled denormalization

Sounds like an easy win. People are probably suggesting a database switch because they're finding issues with the current speed, but they're not using their current database to its full potential yet.

Is the bottleneck even the database in the first place? From the article, that doesn't seem to be the case:

> The main case when beets writes to its database is on import, and import performance is dominated by I/O to the network and to music files.

Small file I/O is universally slow, even on SSDs, and if you're hitting rate-limited MusicBrainz servers for each file, database performance is almost irrelevant.

lots of people use beets to query their music to build playlists. those cases don't require network access.

This appears to be a desktop application, so SQLite seems like a good fit. Although if some people are requesting it perhaps they have their reasons. For example if I had RDBMS already setup with proper backups I would generally prefer to use that, since there's no effort to make yet another allocation use it. It most likely would also enable me to have access to the same databases from multiple computers. With SQLite you would have to solve the same problems again (e.g. store the database in Dropbox account, and deal with shortcomings of that approach)

IMO instead writing how SQLite is the best choice for the project I think it would be better to add support for multiple backends. Something that is good for you and majority of users does not mean it's good for everyone.

If they don't need indices, almost anything will work. Few people have a big enough music library that the data won't fit in memory.

Sqlite is fine for small scale systems. It is not a "web scale" database, but not every web site is "web scale." SQLite does have performance limits, and will break at certain load, but until that, it's okay. For single user databases, like desktop applications, SQLite is awesome! What the others bring to the table is concurrent sever performanc, user management, and such. There's nothing surprising about this, right?

I'm a long time user and lover of SQLite, since way back when. Use it in a lot of our projects (web and Win32) that require local databases for logging etc.

Sure for larger or concurrent user access to a db, we use other databases, but nothing beats the 'zero footprint' install of SQLite. I even enjoyed the fact that earlier versions of SQLite forced me to think optimise my queries due to the lack of nested SELECTs.

SQLite still kind of reminds me of my early days in MySQL. I was recently trying to downgrade MySQL 5.7 to 5.6 after a slew of issues, which forced me to reminisce about how simple things used to be when MySQL was a lot like SQLite still is now...

I find that Firebird has been pretty nice as well, with the option to move from embedded to server.

And much overlooked. It doesn't get half the love it deserves. While it could do with some work around making the sysadmin experience a bit better and have a way of aliasing short 'public' database names to the file-system level database, it's quite a good RDBMS.

About a decade ago, I wrote a system of intermittently connected systems that would run independently and sync up to a central db, all using firebird (uuid custom type) and a few utilities that ran via C# (mono under suse). It worked surprisingly well for the purpose it was designed.

Firebird was definitely nice in terms of being able to utilize the same db from the local systems (embedded) to the centralized server (service mode). As you say, it could use some love, but some of that could be done via symlink and/or consistent structure (/var/firebird/db/*). I haven't even looked at it in a while, wonder how hard it would be to use with electron/node, may have to look.

Beet is an awesome program, you should really check it out if you still are among the minority of people who actually have a music collection and don't rent access from spotify/itunes/etc.

I'm glad to see this post; one of the reasons that I like beet so much is that everything is self-contained.

You're doing it right for your application! MySQL or PostgreSQL would most probably be slower and introduce a lot more overhead as they are client/server oriented systems. Don't listen to those armchair architects!

I don't get the point of this article. SQLite is fine, especially in an embedded database, but once you have concurrent access, it starts to suffer because it has very coarse grained locks, so a "real" database is better for a distributed single-DB design. It's more about using the right tool for the job, and the author seems to be talking himself out of some kind of guilt for SQLite being the right tool for him.

I think you're misreading the post. They're just trying to explain why sqlite is the right tool for their particular job, so they don't have to keep explaining it to other people who keep trying to shoehorn in the wrong tool. Lack of concurrency in their app is specifically mentioned as a reason sqlite is appropriate. There's no guilt there.

Yep! This suggestion comes up shockingly often, and I got tired of re-explaining my reasoning every time.

I can't blame them. I've been a huge fan of SQLite for years. Anytime I need storage it's my default choice unless there is a specific reason to use something else.

Another nice advantage of it is if you are distributing something that requires a small dataset[0][1]. If I give you both the code and the data already imported into a sqlite database, then you can use the code right away, or you can dump the data to a different database very easily.

[0] https://github.com/jedberg/wordgen

[1] https://github.com/jedberg/Postcodes

What people actually seriously suggest that a desktop application needs more than sqlite offers in the way of databases?

Desktop apps are like the sweet spot for sqlite. It's practically made for them.

Suggesting you add a server dependancy to your desktop app as a solution to a problem that isn't there is pretty braindead.

I have used SQLite for similar use cases, but occasionally it's led to a corrupted db. I had a cron task writing to it once a day, but an issue with the scheduler led to 2 tasks one day with the latter one finishing before the former.

Of course I can add locking or something in my code, but I'd prefer to handle at a lower level — for example, have SQLite take the latest write without corrupting. I'm hoping someone has solved this problem with SQLite elegantly.

SQLite is supposed to handle multiple concurrent writes by returning SQLITE_BUSY. I'm imagining hypothetical other causes for your corruption problem, like power loss at a bad moment.

SQLite is designed to handle power loss at bad moments (indeed, anything that calls itself a "database" ought to be resilient against power failure).

Relevant: https://www.sqlite.org/howtocorrupt.html

The article you linked has an interesting line: "Unfortunately, most consumer-grade mass storage devices lie about syncing." That's the kind of problem I was talking about.

This is helpful; it's possible I'm assuming too much. The corruption did happen only once and was also during a time period that the server (a PaaS) was running maintenance including some downtime.

I may be imagining things but I seem to recall that at one point you had to build your own SQLite to get safe concurrent writes, but that was made the default years ago.

Haven't used SQLite in a while, but this is how I did it.

You should enable WAL (pragma journal_mode=wal) I believe you need to do this on every connection.

Now, whenever you are accessing the data do it within transactions (https://www.sqlite.org/lang_transaction.html) generally SQLite will acquire proper locks when needed (it will minimize amount of locked time by deferring locks until they are needed, but you can use immediate or exclusive modifiers. If you for example put exclusive modifier only one program will be able to read/write at the time, you generally will want to use defer (default) behavior since it still provides the safety but multiple applications still can access the database at the same time)

Also note that locking might not work if you keep the database on network file system.

I didn't know beet, but it looks exactly like what I've been wanting for years.

Even a file can be convenient. It's all about how you integrate it into the system.

Some people are unfamiliar with the phrase "right tool for the job".

As the developers behind the project, I'd have to think the authors are in the best position to make the determination about which tool is appropriate.

"right tool for the job" ends up being almost meaningless unless the players involved have enough experience with multiple tools to make a useful judgement. Too often "right tool" is "whatever I've already used before". :/

That is true. I failed to adjust my perspective for the fact that not everyone has nearly thirty years of programming experience given I still consider myself an amateur.

Still we should recall the utility gained in using the right tool is finite and variable. In some cases can be like night and day while in other cases a moderate or negligble improvement might be all you see. Under some circumstances familiarity with a given tool may outweigh the advantages of using a better suited tool - especially if a team lacks well-rounded expertise.

Reinforcing the other side of the argument is the fact that programmers are often subject to constraints placed on them from on high that restrict the choices they might make in such.

Given the specific situation, the author seems to build a pretty good case for SQLite at least by my own limited understanding of the facts.

The "constraints" part is def something to remember. I've done a lot of short term consulting or training over the years, and I'll come in to a new team and see what they're doing. The "right tool" line is often used to justify something, and it's often either because it's the only thing person X knew, but often the 'right tool' might take an extra several weeks to get the team up to speed on. It's definitely the 'right' tool, but they're denied that decision.

My reaction to "right tool" was less about this particular article (and I agree with you) as it was to the phrase in general. I've seen it used to justify just about any tech decision ever made.

The other reality is... if you're a C# shop, with a team of 15 C#/.NET devs, the 'right tools', if there's any deadline at all, will probably involve Windows, C# and .NET tools, even if they are demonstrably inferior to, say, a Linux-based stack. Deadlines, existing code and budgets do play a role in decision making, further confounding the usefulness of the "right tool" rule.

For everyone loving SQLite, you should consider donating to them. I remember a post this last year about the maintainers working on it full time, but making much less than most of us probably do.

I wish HTML5 storage standardized on Sqlite. The inconsistent story on HTML5 storage across browsers is kind of sad.

Implementing WebSQL was so much work that all the browser vendors just basically bundled SQLite and called it a day. The problem was that the working group required at least one other implementation, before it became a standard. But that was such a monstrous project that none of the browser vendors wanted to take it on.

What the browser vendors agreed on is that they would rather have IndexedDB. I've never used it, but they say it is a lower-level API than SQL, and using it you could build your own SQL abstraction layer above it.

So we have as standards the key-value-based localStorage and then, halfway between that and SQL, is IndexedDB.

More: https://hacks.mozilla.org/2010/06/beyond-html5-database-apis...

I participated in HTML5 specification work group at W3C as Invited Expert. We had such a proposal to add SQL storage to HTML5. But it was pretty much unanimous opinion that adding specification of any SQL flavor to the HTML spec would be too much.

Yeah, basically it would've had to specify all of SQL, or to say "do what SQLite does", and neither option was very palatable to all the parties. Some browsers [1] did implement WebSQL before it was formally abandoned, though.

[1] http://caniuse.com/#feat=sql-storage

Some browsers can use Open Source libraries, some simply cannot , for whatever reasons. Yet formally we need at least two independent implementations of the same thing in order the spec to get Recommendation status.

So each UA (a.k.a. browser) has to implement any SQL spec from scratch adding its own SQL flavor.

In fact I am pretty sure that it is possible to add some basic SQL features by JS on top of IndexedDB. It should be such things already I think.

SQLite is Ok, but write access must be synchronized. I used it for my Flask (Python) application and was forced to switch to PostgreSQL because of synchronization problems. I would prefer sticking with SQLite which was simpler to manage.

The author doesn't say a word about synchronization when writing to SQLite.

It's intended for desktop, single-user use from the command line. In such cases, you're not going to have much of an issue with write contention.

Then SQLite is obviously a better solution than MySQL and PostgreSQL.

The bigger news here is they arent using an ORM to make moving between databases trivial.

Firefox uses SQLite


This was a useful and informative post.

Where on earth do you think the author is asking for a 'pat on the back' in this?

Where was the useful information? I don't like negative comments as much as the next person, and have written many posts about Sqlite...

I'm glad the author of beet uses Sqlite, but to use a client/server database system would be ludicrous!

It's not that I am trying to hate on the guy, he had good intentions and wanted other people to listen up.

At the end of the day, though, dudes just another dev who made a reasonable decision. Bfd.

The entire thing reeks of self-congratulation on how smart he is for not engaging in "premature-optimization" for using a embedded database engine...as the database engine embedded in an single-user application.

You say it is useful, I find it bizarre.

I have written several "Use Sqlite!" posts that have made the rounds on hackernews... reading this watered down post, which is devoid of any new, surprising or usable info, it strikes me that repping SQLite has achieved meme status.

If you want tangible info you can actually use, read sqlites documentation. There's a wealth of information there.

Here are some of posts, for the Python crowd:




Registration is open for Startup School 2019. Classes start July 22nd.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact