
We’re pretty happy with SQLite and not urgently interested in a fancier DBMS - samps
http://beets.io/blog/sqlite-performance.html
======
SwellJoe
I never stop being impressed at how often people will jump to odd,
unsupportable, conclusions like, "using MySQL will make this thing faster".

I've seen it so many times over the years regarding users and email
configurations. I can't count the number of times I've dropped into someone's
badly behaving mail configuration and found they had MySQL hosting the users,
and explained it was for "performance" reasons. Somehow they didn't grasp that
/etc/passwd fits entirely in memory, and the map files Postfix uses for
various lookups and stuff are already a database (just one specifically
designed for the task at hand) and also fit entirely in memory. Putting that
MySQL layer in there is _disastrous_ if performance matters; it is orders of
magnitude slower for any case I've seen...still plenty fast for most cases,
but it's ridiculous that this idea gets cargo-culted around that if you store
your mail users in MySQL your mail server will be faster.

A little knowledge is a dangerous thing, is what I'm trying to say, and people
who know MySQL is "fast" may not know enough to know that it's not the right
tool for the job in a lot of cases...and is probably _slower_ for many use
cases. I'm pretty confident this is one of those cases. SQLite is wicked fast
on small data sets, and being smaller means more of it will fit in memory; I
can't think of any way MySQL could be a more performant choice for this
workload.

Also, I don't even want to try to imagine shipping an installable desktop
application for non-technical users that relies on MySQL!

~~~
mianos
So many times I see people handing out a root password to edit /etc/password.
Sure, it may be crazy for performance but, even a trivial number of users and
aliases are more easily managed with a remote mysql client. Everyone that has
used postfix in a production environment knows the pain otherwise, the rest
feel free to vote down.

~~~
SwellJoe
There are practically countless ways to delegate management of mailboxes and
users without granting root. That's not even a hard problem to solve, and
certainly not one that justifies introducing a hugely complex additional
variable to the equation.

Anyway, I'm not saying "never use MySQL for mail users" (though, I think the
percentage of deployments where it makes sense is closer to none than it is to
one), I'm just trying to make the point that MySQL is, in some folks minds, a
magical solution to performance problems. Often, it not only introduces
needless complexity, it won't even improve performance. It's a classic example
of "when all you have is a hammer, everything starts to look like a nail".
MySQL is a very fine hammer. It just isn't the right tool for every job.

The article we're talking about is another case where a little knowledge is a
dangerous thing. A desktop app serving one user with a _tiny_ data set (as I
understand it, we're talking about the metadata for a person's music
collection) is exactly the right workload for SQLite. I'd be shocked if a
naive port to MySQL were faster (though they acknowledge that there's room for
query optimization), and not at all surprised if it were slower. And, I _know_
it'll require more memory and disk space for the same working set.

------
Annatar
Wow, the article is such a fresh breath of air, primarily because the author
demonstrates common sense.

He (they?) picked SQLite for all the correct reasons:

\- best tool for the job for their situation;

\- write-light and read-heavy;

\- zero configuration;

\- easy to embed;

\- understanding that optimizing queries by far gives the best performance in
the shortest amount of time.

As an aside, I'm currently using SQLite for Bacula and Postfix, and it's a joy
to use; the only drawback I found so far is lack of REGEXP REPLACE in the SQL
dialect which the database supports (must be loaded with .load
/path/to/lib/libpcre.so, but it is not part of the language). I used the
Oracle RDBMS for my PowerDNS deployments, but in retrospect, the way PowerDNS
works, SQLite would have been an even better match. All in all, it is great to
read that someone picked it for all the correct reasons, rather than some
fashion trend, as is often the case in computer industry.

------
chjj
Premature optimization is evil, but _preemptive_ optimization is necessary
unless you want to paint yourself into a corner. I realized this after
implementing a bitcoin full node.

In my bitcoin implementation, as an experiment, I tried storing the blockchain
in sqlite, postgres, and leveldb. I gathered up a bunch of data from the first
~200k blocks of the blockchain and benchmarked all three databases. I queried
for something like 30,000 utxos out of a set of a couple million. What took
300-400ms in leveldb took 1.6 seconds in postgres (on the repl. in my actual
node it would have taken longer due to deserialization of the utxos). What
took 1.6 seconds in postgres took over _30 seconds_ in SQlite.

Now, you can tell me I did the benchmarks wrong, and "oh, if you just did this
it would be faster!", but 30+ seconds is slower to an absolutely insane level.
Needless to say, I went the key-value store route, but I was still astounded
at how slow sqlite was once it got a few million records in the database.

I actually like sqlite, but when you know you're going to be dealing with 70gb
of data and over 10 million records, preemptive optimization is the key. If I
were the author, I would consider switching to postgres if there are over
500k-1m records to be expected. That being said, if they're partial to sqlite,
SQLightning
([https://github.com/LMDB/sqlightning](https://github.com/LMDB/sqlightning))
looks pretty interesting (SQLite with an LMDB backend).

edit: To clarify, these weren't particularly scientific benchmarks. This was
me timing a very specific query to get an idea of the level of data management
I was up against. Don't take my word for it.

~~~
mappu
I've found sqlite's performance on bulk insertions to be massively (100x)
improved by wrapping the bulk insertion in a transaction.

fsync()ing a couple hundred thousand individual INSERTs isn't fast.

~~~
lomnakkus
Those optimizations also apply to PostgreSQL.

~~~
Programmatic
It would still allow you to use a simpler solution and not paint yourself in a
corner if you optimized on sqlite; no preemptive optimization necessary wrt
choice of dbms if you can perform the work on the simpler solution.

~~~
lomnakkus
Optimizing in SQLite is just as much of a sunk cost as optimizing in
PostgreSQL if you switch away to another database.

------
int_19h
While we're speaking of SQLite; one thing that has little exposure that could
probably use more is that it now ships with Windows as a system DLL:

[https://engineering.microsoft.com/2015/10/29/sqlite-in-
windo...](https://engineering.microsoft.com/2015/10/29/sqlite-in-windows-10/)

Between that, and packages readily available on most Linux and BSD distros out
there (and, in most cases, installed by default), it's well on its way to
become a de facto standard system API for relational storage.

~~~
creshal
It's amusing how Microsoft's different departments keep trying to kill off
each other. They spent so much time trying to shove SQL Server Express down
everyone's throat, and now everyone gets SQLite included instead.

~~~
int_19h
SQLite is not a competitor to SQL Express, because the latter is still an out-
of-proc server. It is a competitor to SQL CE (which shipped its last release
to date in 2011, so...).

The nice thing about SQLite is how little it assumes about the world outside.
That made it easy to run in WinRT application sandbox with minimal changes;
and for quite a while, it was the only local DB readily available to WinRT
code written in C# or C++ (JS got IndexedDB).

------
c-smile
If to speak about desktop applications then any embedded DB will be
unbeatable.

So I am speaking about embeddable DBs here.

Konstantin Knizhnik have implemented impressive set of various embedded DBs:
[http://garret.ru/databases.html](http://garret.ru/databases.html)

Like his POST++ has direct mapping to C++ classes so if you use C++ then you
don't need any ORM.

In my Sciter[1] Engine I am using his DyBase library [3] as a bult-in
persistence for Sciter's script [2] (JavaScript++).

With the DyBase in script you have features similar to MongoDB (noSQL free-
form DB) but without any need for ORM and DAL - you can declare some root
object as be persistable and access those data trees as if they are JavaScript
objects. The engine pumps objects from DB into memory when they are needed:

    
    
      var storage = Storage.open(...);
      var dataRoot = storage.root; // all things inside are persitable
      dataRoot.topics = []; // flat persistable list
      dataRoot.topics.push({ foo:1, bar:2 }); // storing object
      /* create indexed collection with string keys, keys can be unique or not */ 
      dataRoot.titles = storage.createIndex(#string);
    

DyBase has Python bindings too.

[1] [http://sciter.com](http://sciter.com) \- multiplatform HTML/CSS UI Engine
for Desktop and Mobile Application

[2] TIScript - [http://www.codeproject.com/Articles/33662/TIScript-
Language-...](http://www.codeproject.com/Articles/33662/TIScript-Language-A-
Gentle-Extension-of-JavaScript)

[3] DyBase -
[http://www.garret.ru/dybase.html](http://www.garret.ru/dybase.html)

------
niftich
I was unfamiliar with this project and assumed it was a hosted service at
first. Not so, this is a local application, so an embedded database makes
sense.

It took until the very last paragraph for the blog post to make that point.

~~~
petercooper
FWIW, I've run SQLite in a few production sites (low 6 figure pageviews per
month) and it has worked fantastically. If you understand and work with the
limitations, it really is amazing how much you can get out of it.

I'm actually surprised WordPress hasn't ever moved over to it for ease of
installation/deployment - WordPress and PHP seem more likely to trip over in
most deployments I've seen before SQLite would.

~~~
StavrosK
What _are_ the limitations? I love SQLite, and have vaguely heard that it has
issues with concurrency, but _what_ , exactly? I use it on production for
www.tithess.gr and it seems to be working beautifully, no concurrency problems
whatsoever there.

What problems should I be expecting in a multi-access scenario? I've never had
that question answered adequately.

~~~
andrewguenther
You cannot have multiple writers to a SQLite database. Only a single file
descriptor may be open with 'write' access. As long as you can get the
performance you need out of a single writer, then you're good!

~~~
SQLite
Clarification: You can have multiple connections (aka file descriptors) open
on the same database file for writing at the same time. But only one can be
actively writing at a time. SQLite uses file locking to serialize writes.
Multiple writers can exist concurrently, they merely have to take turns
writing.

------
Feneric
SQLite also does remarkably well with recovering from all manner of power loss
/ crashes / worst case scenarios. We created a "power loss" rig just to test
this facility for one particular system. Really SQLite's biggest weakness is
concurrency, and if your app needs that in any serious amount you probably
ought to look elsewhere. If you're just dealing with occasional concurrency
though SQLite shouldn't be dismissed out-of-hand.

~~~
eudox
The breadth and depth to which SQLite is tested[0] is both admirable and
inspiring.

It's not just "internal" tests, like unit tests or things that can be run from
client C programs, but tests of all kinds of hard-to-simulate external
situations like out-of-memory and power failure situations.

[0]:
[https://www.sqlite.org/testing.html](https://www.sqlite.org/testing.html)

~~~
nickpsecurity
That is simply amazing write-up. I'd still not call it a high-assurance system
given some things missing. Yet, the amount of rigor in the testing phase could
easily be the baseline for that niche in that it exceeds about anything I've
seen. There's basically so little I could suggest on code or testing side that
I'm not going to even bother here given how marginal it would be due to effort
already put in. Just too well-done for mere speculations.

I also noted that the assert section essentially does Design-by-Contract. This
is a subset of formal specification that's prime value is in preventing
interface errors (vast majority in big apps) and supporting formal
verification. Done in design/spec phase in high-assurance since both Edsger
Dijkstra and Margaret Hamilton independently discovered technique's value.
Done at language-level since Eiffel due to Bertrand Meyer. Good to see that,
even if not all techniques, they're doing the 80/20 rule to get most benefit
out of what formal specs they're using. Also allow you to easily enable run-
time checks if you can accept performance hit. Nice.

------
coleifer
Classic HN bait.

You don't even need to read the comments to know what people will say:

"SQLite is a great fit for this type of application. It's a replacement for
fopen people. fopen."

"What about ALTER TABLE?"

"It's just a toy database, it doesn't even support concurrent writers"

\----- "WAL mode"

"Hey, golang, rqlite"

\----- "Whoa I wrote something similar for a ..."

\----- "Why would you use this? Just use postgres"

"SQLite is the best database ever"

"SQLite is the worst database ever"

------
omarforgotpwd
For any database that isn't huge, a library embedded into your application is
going to be faster than anything that has to communicate with a server over a
socket connection. Though both execute SQL queries, SQLite is completely
different than relational database servers and appropriate many places where
running a full RDBMS is not. For example, you can't run MySQL or Postgres on
the iPhone, but you can use SQLite.

~~~
greenleafjacob
I've found SQLite absolutely amazing for getting the power of SQL for R data
frames.

~~~
int_19h
Could you expand on some of your use cases? An R data frame, by definition,
has to fit into memory, so it would seem that any sort of
map/filter/group/fold operation would be fastest if performed in-memory, as
well. And I assumed that e.g. joining data frames (where you would run out of
memory really quickly if your datasets are large to begin with) would be
uncommon... am I wrong?

~~~
blahi
Depends on what you mean by uncommon.

I for example often need to score/model data which doesn't fit in RAM (on my
PC) so I use libraries like bigGLM which can use out-of-memory data to build
the models. One of the options is SQLite, but you can use an ODBC connection.

Additionally, I can explore slices of the data, which resides only on disk. I
don't even need to import it. I can use dplyr (very famous package for
aggregations and slicing) which will map the R syntax to SQL which is executed
by SQLite.

------
chillacy
> we’re read-heavy and write-light

> we have almost no indices, no principled denormalization

Sounds like an easy win. People are probably suggesting a database switch
because they're finding issues with the current speed, but they're not using
their current database to its full potential yet.

~~~
creshal
Is the bottleneck even the database in the first place? From the article, that
doesn't seem to be the case:

> The main case when beets writes to its database is on import, and import
> performance is dominated by I/O to the network and to music files.

Small file I/O is universally slow, even on SSDs, and if you're hitting rate-
limited MusicBrainz servers for each file, database performance is almost
irrelevant.

~~~
johnny22
lots of people use beets to query their music to build playlists. those cases
don't require network access.

------
jwatte
Sqlite is fine for small scale systems. It is not a "web scale" database, but
not every web site is "web scale." SQLite does have performance limits, and
will break at certain load, but until that, it's okay. For single user
databases, like desktop applications, SQLite is awesome! What the others bring
to the table is concurrent sever performanc, user management, and such.
There's nothing surprising about this, right?

------
cyberferret
I'm a long time user and lover of SQLite, since way back when. Use it in a lot
of our projects (web and Win32) that require local databases for logging etc.

Sure for larger or concurrent user access to a db, we use other databases, but
nothing beats the 'zero footprint' install of SQLite. I even enjoyed the fact
that earlier versions of SQLite forced me to think optimise my queries due to
the lack of nested SELECTs.

SQLite still kind of reminds me of my early days in MySQL. I was recently
trying to downgrade MySQL 5.7 to 5.6 after a slew of issues, which forced me
to reminisce about how simple things used to be when MySQL was a lot like
SQLite still is now...

~~~
tracker1
I find that Firebird has been pretty nice as well, with the option to move
from embedded to server.

~~~
talideon
And much overlooked. It doesn't get half the love it deserves. While it could
do with some work around making the sysadmin experience a bit better and have
a way of aliasing short 'public' database names to the file-system level
database, it's quite a good RDBMS.

~~~
tracker1
About a decade ago, I wrote a system of intermittently connected systems that
would run independently and sync up to a central db, all using firebird (uuid
custom type) and a few utilities that ran via C# (mono under suse). It worked
surprisingly well for the purpose it was designed.

Firebird was definitely nice in terms of being able to utilize the same db
from the local systems (embedded) to the centralized server (service mode). As
you say, it could use some love, but some of that could be done via symlink
and/or consistent structure (/var/firebird/db/*). I haven't even looked at it
in a while, wonder how hard it would be to use with electron/node, may have to
look.

------
nickysielicki
Beet is an awesome program, you should really check it out if you still are
among the minority of people who actually have a music collection and don't
rent access from spotify/itunes/etc.

I'm glad to see this post; one of the reasons that I like beet so much is that
everything is self-contained.

------
pskisf
You're doing it right for your application! MySQL or PostgreSQL would most
probably be slower and introduce a lot more overhead as they are client/server
oriented systems. Don't listen to those armchair architects!

------
oppositelock
I don't get the point of this article. SQLite is fine, especially in an
embedded database, but once you have concurrent access, it starts to suffer
because it has very coarse grained locks, so a "real" database is better for a
distributed single-DB design. It's more about using the right tool for the
job, and the author seems to be talking himself out of some kind of guilt for
SQLite being the right tool for him.

~~~
andrewflnr
I think you're misreading the post. They're just trying to explain why sqlite
_is_ the right tool for their particular job, so they don't have to keep
explaining it to other people who keep trying to shoehorn in the wrong tool.
Lack of concurrency in their app is specifically mentioned as a reason sqlite
is appropriate. There's no guilt there.

~~~
samps
Yep! This suggestion comes up shockingly often, and I got tired of re-
explaining my reasoning every time.

------
jedberg
I can't blame them. I've been a huge fan of SQLite for years. Anytime I need
storage it's my default choice unless there is a specific reason to use
something else.

Another nice advantage of it is if you are distributing something that
requires a small dataset[0][1]. If I give you both the code and the data
already imported into a sqlite database, then you can use the code right away,
or you can dump the data to a different database very easily.

[0] [https://github.com/jedberg/wordgen](https://github.com/jedberg/wordgen)

[1]
[https://github.com/jedberg/Postcodes](https://github.com/jedberg/Postcodes)

------
zaphar
What people actually seriously suggest that a desktop application needs more
than sqlite offers in the way of databases?

Desktop apps are like the sweet spot for sqlite. It's practically made for
them.

------
qwertyuiop924
Suggesting you add a server dependancy to your desktop app as a solution to a
problem that isn't there is pretty braindead.

------
tedmiston
I have used SQLite for similar use cases, but occasionally it's led to a
corrupted db. I had a cron task writing to it once a day, but an issue with
the scheduler led to 2 tasks one day with the latter one finishing before the
former.

Of course I can add locking or something in my code, but I'd prefer to handle
at a lower level — for example, have SQLite take the latest write without
corrupting. I'm hoping someone has solved this problem with SQLite elegantly.

~~~
dietrichepp
SQLite is supposed to handle multiple concurrent writes by returning
SQLITE_BUSY. I'm imagining hypothetical other causes for your corruption
problem, like power loss at a bad moment.

~~~
kentonv
SQLite is designed to handle power loss at bad moments (indeed, anything that
calls itself a "database" ought to be resilient against power failure).

Relevant:
[https://www.sqlite.org/howtocorrupt.html](https://www.sqlite.org/howtocorrupt.html)

~~~
dietrichepp
The article you linked has an interesting line: "Unfortunately, most consumer-
grade mass storage devices lie about syncing." That's the kind of problem I
was talking about.

------
franciscop
I didn't know beet, but it looks exactly like what I've been wanting for
years.

------
partycoder
Even a file can be convenient. It's all about how you integrate it into the
system.

------
kefka_p
Some people are unfamiliar with the phrase "right tool for the job".

As the developers behind the project, I'd have to think the authors are in the
best position to make the determination about which tool is appropriate.

~~~
mgkimsal
"right tool for the job" ends up being almost meaningless unless the players
involved have enough experience with multiple tools to make a useful
judgement. Too often "right tool" is "whatever I've already used before". :/

~~~
kefka_p
That is true. I failed to adjust my perspective for the fact that not everyone
has nearly thirty years of programming experience given I still consider
myself an amateur.

Still we should recall the utility gained in using the right tool is finite
and variable. In some cases can be like night and day while in other cases a
moderate or negligble improvement might be all you see. Under some
circumstances familiarity with a given tool may outweigh the advantages of
using a better suited tool - especially if a team lacks well-rounded
expertise.

Reinforcing the other side of the argument is the fact that programmers are
often subject to constraints placed on them from on high that restrict the
choices they might make in such.

Given the specific situation, the author seems to build a pretty good case for
SQLite at least by my own limited understanding of the facts.

~~~
mgkimsal
The "constraints" part is def something to remember. I've done a lot of short
term consulting or training over the years, and I'll come in to a new team and
see what they're doing. The "right tool" line is often used to justify
something, and it's often either because it's the only thing person X knew,
but often the 'right tool' might take an extra several weeks to get the team
up to speed on. It's definitely the 'right' tool, but they're denied that
decision.

My reaction to "right tool" was less about this particular article (and I
agree with you) as it was to the phrase in general. I've seen it used to
justify just about any tech decision ever made.

The other reality is... if you're a C# shop, with a team of 15 C#/.NET devs,
the 'right tools', if there's any deadline at all, will probably involve
Windows, C# and .NET tools, even if they are demonstrably inferior to, say, a
Linux-based stack. Deadlines, existing code and budgets _do_ play a role in
decision making, further confounding the usefulness of the "right tool" rule.

------
therealdrag0
For everyone loving SQLite, you should consider donating to them. I remember a
post this last year about the maintainers working on it full time, but making
much less than most of us probably do.

------
ww520
I wish HTML5 storage standardized on Sqlite. The inconsistent story on HTML5
storage across browsers is kind of sad.

~~~
c-smile
I participated in HTML5 specification work group at W3C as Invited Expert. We
had such a proposal to add SQL storage to HTML5. But it was pretty much
unanimous opinion that adding specification of any SQL flavor to the _HTML_
spec would be too much.

~~~
niftich
Yeah, basically it would've had to specify all of SQL, or to say "do what
SQLite does", and neither option was very palatable to all the parties. Some
browsers [1] did implement WebSQL before it was formally abandoned, though.

[1] [http://caniuse.com/#feat=sql-storage](http://caniuse.com/#feat=sql-
storage)

~~~
c-smile
Some browsers _can_ use Open Source libraries, some simply cannot , for
whatever reasons. Yet formally we need at least two independent
implementations of the same thing in order the spec to get Recommendation
status.

So each UA (a.k.a. browser) has to implement any SQL spec from scratch adding
its own SQL flavor.

In fact I am pretty sure that it is possible to add some basic SQL features by
JS on top of IndexedDB. It should be such things already I think.

------
chmike
SQLite is Ok, but write access must be synchronized. I used it for my Flask
(Python) application and was forced to switch to PostgreSQL because of
synchronization problems. I would prefer sticking with SQLite which was
simpler to manage.

The author doesn't say a word about synchronization when writing to SQLite.

~~~
talideon
It's intended for desktop, single-user use from the command line. In such
cases, you're not going to have much of an issue with write contention.

~~~
chmike
Then SQLite is obviously a better solution than MySQL and PostgreSQL.

------
Ultimatt
The bigger news here is they arent using an ORM to make moving between
databases trivial.

------
known
Firefox uses SQLite

------
coleifer
I have written several "Use Sqlite!" posts that have made the rounds on
hackernews... reading this watered down post, which is devoid of any new,
surprising or usable info, it strikes me that repping SQLite has achieved meme
status.

If you want tangible info you can actually use, read sqlites documentation.
There's a wealth of information there.

Here are some of posts, for the Python crowd:

[http://charlesleifer.com/blog/five-reasons-you-should-use-
sq...](http://charlesleifer.com/blog/five-reasons-you-should-use-sqlite-
in-2016/)

[http://charlesleifer.com/blog/using-the-sqlite-json1-and-
fts...](http://charlesleifer.com/blog/using-the-sqlite-json1-and-
fts5-extensions-with-python/)

[http://charlesleifer.com/blog/my-list-of-python-and-
sqlite-r...](http://charlesleifer.com/blog/my-list-of-python-and-sqlite-
resources/)

