
Multi-threaded SQLite without the OperationalErrors (2017) - throwaway1492
https://charlesleifer.com/blog/multi-threaded-sqlite-without-the-operationalerrors/
======
ebg13
Everything old is new to somebody.

Python recipe for multithreaded sqlite access from 2007:
[http://code.activestate.com/recipes/526618-multithread-support-for-sqlite-access/](http://code.activestate.com/recipes/526618-multithread-support-for-sqlite-access/)

Which is itself based on this one from 2006:
[http://code.activestate.com/recipes/496799/](http://code.activestate.com/recipes/496799/)

A quick google search for 'python multithread sqlite' also brings up
[https://pypi.org/project/sqlite3worker/](https://pypi.org/project/sqlite3worker/)
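[Editor's sketch] The pattern those recipes (and sqlite3worker) share can be summarized roughly as follows: one dedicated thread owns the connection and drains a queue of statements, so no other thread ever touches SQLite directly. Class and method names here are mine, not from either recipe:

```python
import os
import queue
import sqlite3
import tempfile
import threading

class SqliteWriter:
    """Single writer thread owning the connection; other threads enqueue SQL."""

    def __init__(self, path):
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, args=(path,))
        self._thread.start()

    def _run(self, path):
        # The connection is created and used only inside this thread, so the
        # default check_same_thread=True guard is never tripped.
        conn = sqlite3.connect(path)
        while True:
            item = self._queue.get()
            if item is None:  # sentinel: flush and shut down
                conn.commit()
                conn.close()
                return
            sql, params = item
            conn.execute(sql, params)
            conn.commit()

    def execute(self, sql, params=()):
        # Returns immediately; results and errors are not reported back
        # in this simplified sketch.
        self._queue.put((sql, params))

    def close(self):
        self._queue.put(None)
        self._thread.join()

# Demo: enqueue writes from the main thread, then read back directly.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = SqliteWriter(path)
writer.execute("CREATE TABLE kv (k TEXT, v TEXT)")
writer.execute("INSERT INTO kv VALUES (?, ?)", ("a", "1"))
writer.close()
value = sqlite3.connect(path).execute("SELECT v FROM kv WHERE k = 'a'").fetchone()[0]
```

The real recipes add result queues and error propagation; this shows only the core queue-and-owner idea.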

------
lacampbell
I feel like the set of scenarios in which it makes sense to use SQLite for
multi-user apps is growing, for a few reasons:

\- SSDs are ubiquitous

\- WAL mode: readers don't block writers, and multiple processes can access a
single database

\- DB-per-customer is getting more common due to privacy laws

\- The browser itself is increasingly being used as a client side cache DB in
its own right

IIRC Nomadlist uses (or used) SQLite. I wonder how many more web sites are
using it in anger.

~~~
bob1029
I am 100% on board with this sentiment. We are already exclusively using
SQLite to back our B2B application and it has absolutely zero issues even
under heavy workloads. I would say it's easily faster (especially in terms of
request latency) than the SQL Server instance we previously used for the
same purpose. WAL is an absolute must if you are dealing with any sort of
concurrency. As long as you can satisfy the following constraints, I see zero
reason to use anything other than SQLite:

\- Only one process will ever own the authoritative copy of the database at a
time.

\- You don't have any near-term plans to cluster or otherwise go wide on your
persistence layer across multiple processes. Keeping in mind that even if you
do need to do this, it shouldn't be a huge deal to import your SQLite files
into something that natively supports clustering like Postgres, or use some
partitioning strategy that can distribute SQLite databases across multiple
nodes.

I feel like a lot of misinformation has been put forth that would make SQLite
seem unacceptable for larger-scale concurrent business applications, but every
benchmark I've personally run against SQLite makes every other database engine
look like a joke (within the constraints laid out above; obviously a 30 node
Postgres cluster is going to destroy a single instance of SQLite). You just
need to understand a few magic PRAGMAs to get things up and running (journal
mode = WAL), because it certainly isn't capable of any of this directly out of
the box.
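[Editor's sketch] For concreteness, here is what that PRAGMA setup can look like from Python. `journal_mode=WAL` is the one the comment names; `synchronous` and `busy_timeout` are common companions I am adding as assumptions, not something the commenter specified:

```python
import os
import sqlite3
import tempfile

# WAL requires a file-backed database (not :memory:).
path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)

# journal_mode is persistent: it is stored in the database file and
# survives reconnects. The PRAGMA returns the mode now in effect.
mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]

# These two are per-connection and must be reapplied on every connect:
conn.execute("PRAGMA synchronous=NORMAL;")  # fewer fsyncs; still safe under WAL
conn.execute("PRAGMA busy_timeout=5000;")   # wait up to 5s on locks instead of erroring
```

After this, `mode` is the string `"wal"` and readers no longer block the writer.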

I don't know about the 'anger' aspect of it, but I certainly feel dejected by
the current approach that most take regarding business object persistence.
SQLite feels like a way to sidestep an entire universe of non-value-add
concerns (creating a separate sql instance, networking, security,
configuration, policy, etc).

In terms of management, SQLite absolutely cannot be beaten. It is literally as
easy as copying any other file when performing backups or restores. You need
to be a little careful with the .WAL and .SHM files, but nothing that can't be
managed with a few lines of script in a cron job.
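[Editor's note] As an alternative to hand-copying the database plus its -wal/-shm siblings, SQLite's online backup API produces a consistent snapshot even while writers are active; Python exposes it as `Connection.backup` (3.7+). A minimal sketch:

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
src = sqlite3.connect(os.path.join(tmp, "app.db"))
src.execute("CREATE TABLE t (x)")
src.execute("INSERT INTO t VALUES (42)")
src.commit()

# The online backup API copies pages transactionally, so you never have
# to reason about the .wal/.shm files yourself.
dst = sqlite3.connect(os.path.join(tmp, "backup.db"))
with dst:
    src.backup(dst)

restored = dst.execute("SELECT x FROM t").fetchone()[0]
```

This keeps the "backup is just a file" simplicity while removing the timing hazards of a raw copy.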

~~~
lacampbell
I'd be interested to hear as much about your set up as you can divulge. How
many concurrent writing users you have, you use a cache, etc.

> In terms of management, SQLite absolutely cannot be beaten. It is literally
> as easy as copying any other file when performing backups or restores.

This - I have a lot of respect for DBAs, enough to know that I am not one.

~~~
tracker1
Can't speak for GP, but it really will vary on workload. How much data is in
each DB, how much workload is write vs read. In a mostly read scenario (say a
blog), I'd be surprised if you couldn't handle 100k simultaneous users on a
fast drive, more with a good caching strategy.

For modest read/write workloads, you should be able to handle 1,000-10,000
users without much issue.

A little advice: if you are logging changes, etc., make sure logging goes to a
separate db/drive from the main data.
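[Editor's sketch] That layout amounts to two database files with independent connections, so log writes never contend for the main database's write lock. Temp directories here stand in for the separate drives; all names are hypothetical:

```python
import os
import sqlite3
import tempfile

data_dir = tempfile.mkdtemp()  # stand-in for e.g. a data volume
log_dir = tempfile.mkdtemp()   # stand-in for e.g. a logging volume

# Separate files, separate connections, separate write locks.
data = sqlite3.connect(os.path.join(data_dir, "app.db"))
log = sqlite3.connect(os.path.join(log_dir, "changes.db"))

data.execute("CREATE TABLE users (name TEXT)")
log.execute("CREATE TABLE audit (event TEXT)")

data.execute("INSERT INTO users VALUES ('alice')")
log.execute("INSERT INTO audit VALUES ('user created')")
data.commit()
log.commit()

users_count = data.execute("SELECT COUNT(*) FROM users").fetchone()[0]
audit_count = log.execute("SELECT COUNT(*) FROM audit").fetchone()[0]
```

The trade-off is that a data write and its audit entry are no longer in one atomic transaction.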

The only downside is distribution with high availability, but many scenarios
don't actually demand the extra overhead of true HA.

------
kijin
> _By default, pysqlite opens a transaction when you issue your first write
> query (it does this by parsing every query you execute), and then committing
> that transaction when you issue any other query that is not a SELECT,
> INSERT, UPDATE or DELETE, e.g. CREATE TABLE or PRAGMA._

What's the rationale for this default? Performance? Consistency? Even without
the problems mentioned in the article, I'm not sure if I would want my
database to commit multi-query transactions without an explicit command to do
so.

~~~
ebg13
Definitely performance. Writes without a transaction are slow. SQLite's
99.999% use case is single-thread, single-access, so you lose little and gain
a lot by deferring commits until you actually need them done.

> _I'm not sure if I would want my database to commit multi-query
> transactions without an explicit command to do so._

You can easily disable it by setting the isolation level per
[https://docs.python.org/3/library/sqlite3.html#controlling-transactions](https://docs.python.org/3/library/sqlite3.html#controlling-transactions)
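[Editor's sketch] Concretely, passing `isolation_level=None` puts the stdlib driver in autocommit mode: pysqlite stops opening implicit transactions behind your back, and you issue BEGIN/COMMIT yourself exactly where you want them:

```python
import sqlite3

# isolation_level=None disables pysqlite's implicit transaction handling.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (x)")

# Transactions are now fully explicit.
conn.execute("BEGIN")
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("INSERT INTO t VALUES (2)")
conn.execute("COMMIT")

rows = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```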

------
marktangotango
In my testing, putting a mutex around a single write connection gives
incredible performance, beating Postgres by a wide margin. SQLite with a
single db file per user is incredibly compelling for the one-db-per-user
privacy story alone.
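[Editor's sketch] The mutex-around-one-writer pattern the comment describes can look like this with the stdlib module (an in-memory database here just for illustration; with a file database and WAL, readers would use their own connections):

```python
import sqlite3
import threading

# One shared write connection, guarded by a lock. check_same_thread=False
# allows the connection to be used from multiple threads; the lock then
# serializes all writers in-process.
write_lock = threading.Lock()
write_conn = sqlite3.connect(":memory:", check_same_thread=False)
write_conn.execute("CREATE TABLE counters (n INTEGER)")
write_conn.execute("INSERT INTO counters VALUES (0)")
write_conn.commit()

def increment():
    with write_lock:  # only one thread writes at a time
        write_conn.execute("UPDATE counters SET n = n + 1")
        write_conn.commit()

threads = [threading.Thread(target=increment) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

n = write_conn.execute("SELECT n FROM counters").fetchone()[0]
```

With ten threads each incrementing once under the lock, `n` ends up at 10 with no "database is locked" errors.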

------
tedunangst
What's the benefit of pushing everything to a thread and faking sync
operations instead of putting a mutex around writes?

~~~
tedivm
In cases where you don't care about the query results, your logic thread
doesn't end up blocking.

~~~
cperciva
In cases where you don't care about query results _and you don't care if the
operation completes_, you mean.

------
siscia
I use a similar approach in RediSQL ([http://RediSQL.com](http://RediSQL.com),
fast in-memory SQL over the Redis protocol).

There is a single thread that reads from a queue of statements.

The downside is that it is hard to hold a lock on the db while doing
application-level operations, but that is a really advanced use case that I
rarely see in the wild.

~~~
le-mark
RediSQL sounds interesting, are you getting any customers/traction?

~~~
siscia
Thanks!

Some, not as much as I would hope. Definitely still need a job!

I believe it's quite a tough sell: there are many open source databases that
you can use for free, but very few (if any?) offer in-memory as a first-class
feature.

And databases are worse than marriages, quite difficult to migrate out of, so
a new player needs to sell itself very well before even being considered.

If you are interested, or you would like to consider some form of partnership
feel free to drop me a few lines at simone@redbeardlab.com

------
mmargerum
I'm looking to build a SaaS app on AWS and was seriously considering using a
SQLite db for each company. I was also looking at RDS (Postgres), but I'm
really thinking that with SSDs this isn't going to be necessary. The only
downside I can see is that I'd have to handle backups and provide read
instances if that became necessary.

~~~
atombender
Have you looked at ActorDB [1]? It's a distributed version of SQLite that may
be particularly suited to multitenant SaaS applications.

[1] [http://www.actordb.com](http://www.actordb.com)

~~~
mmargerum
Did not know about this. Thanks!

------
adontz
I think the real answer starts with using APSW, which peewee already supports.
APSW, while not 100% compatible with pysqlite, is much more predictable and,
to my taste, an overall more transparent wrapper.

------
stijnsanders
Judging by the title only, I expected the post to handle 'BEGIN IMMEDIATE
TRANSACTION' and sqlite3_busy_timeout()...
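[Editor's sketch] For reference, that pairing looks like this from Python's sqlite3 module (the busy timeout is set via PRAGMA rather than the C-level `sqlite3_busy_timeout()` call; the `accounts` table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA busy_timeout=5000;")  # Python-level analogue of sqlite3_busy_timeout()
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")

# BEGIN IMMEDIATE takes the write lock up front, so contention surfaces
# at BEGIN (where the busy timeout applies) instead of at COMMIT.
conn.execute("BEGIN IMMEDIATE")
try:
    conn.execute("UPDATE accounts SET balance = balance - 1 WHERE id = 1")
    conn.execute("COMMIT")
except sqlite3.OperationalError:
    conn.execute("ROLLBACK")
    raise

balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
```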

