Multi-threaded SQLite without the OperationalErrors (2017) (charlesleifer.com)
76 points by throwaway1492 on May 30, 2019 | 19 comments



Everything old is new to somebody.

Python recipe for multithreaded sqlite access from 2007: http://code.activestate.com/recipes/526618-multithread-suppo...

Which is itself based on this one from 2006: http://code.activestate.com/recipes/496799/

A quick Google search for 'python multithread sqlite' also brings up https://pypi.org/project/sqlite3worker/


I feel like the set of scenarios in which it makes sense to use SQLite for multi-user apps is growing, for a few reasons:

- SSDs are ubiquitous

- WAL mode: readers don't block writers, and multiple processes can access a single database

- DB-per-customer is getting more common due to privacy laws

- The browser itself is increasingly being used as a client-side cache DB in its own right

IIRC Nomadlist uses (or used) sqlite. Wonder how many more web sites are using it in anger.


I am 100% on board with this sentiment. We are already exclusively using SQLite to back our B2B application and it has absolutely zero issues even under heavy workloads. I would say it's easily faster (especially in terms of request latency) than the SQL Server instance we previously used for the same purpose. WAL is an absolute must if you are dealing with any sort of concurrency. As long as you can satisfy the following constraints, I see zero reason to use anything other than SQLite:

- Only one process will ever own the authoritative copy of the database at a time.

- You don't have any near-term plans to cluster or otherwise go wide on your persistence layer across multiple processes. Keep in mind that even if you do need to do this, it shouldn't be a huge deal to import your SQLite files into something that natively supports clustering, like Postgres, or to use some partitioning strategy that can distribute SQLite databases across multiple nodes.

I feel like a lot of misinformation has been put forth that would make SQLite seem unacceptable for larger-scale concurrent business applications, but every benchmark I've personally run against SQLite makes every other database engine look like a joke (within the constraints laid out above; obviously a 30-node Postgres cluster is going to destroy a single instance of SQLite). You just need to understand a few magic PRAGMAs to get things up and running (journal mode = WAL), because it certainly isn't capable of any of this directly out of the box.
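
Roughly what I mean, in Python's sqlite3 (filename and exact values are placeholders; the synchronous setting is a common WAL pairing, not something the article mandates):

    import sqlite3

    conn = sqlite3.connect("app.db", timeout=5.0)  # wait on locks instead of erroring
    conn.execute("PRAGMA journal_mode=WAL")        # the must-have for concurrency
    conn.execute("PRAGMA synchronous=NORMAL")      # small durability trade-off for speed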

I don't know about the 'anger' aspect of it, but I certainly feel dejected by the current approach that most take regarding business object persistence. SQLite feels like a way to sidestep an entire universe of non-value-add concerns (creating a separate SQL instance, networking, security, configuration, policy, etc.).

In terms of management, SQLite absolutely cannot be beaten. It is literally as easy as copying any other file when performing backups or restores. You need to be a little careful with the -wal and -shm files, but nothing that can't be managed with a few lines of script in a cron job.
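
If you'd rather not reason about the -wal/-shm files at all, Python 3.7+'s sqlite3 exposes SQLite's online backup API, which takes a consistent snapshot even while writers are active (filenames here are made up):

    import sqlite3

    src = sqlite3.connect("app.db")
    dst = sqlite3.connect("backup.db")
    with dst:
        src.backup(dst)  # consistent snapshot, safe while the db is live
    dst.close()
    src.close()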


I'd be interested to hear as much about your setup as you can divulge: how many concurrent writing users you have, whether you use a cache, etc.

> In terms of management, SQLite absolutely cannot be beaten. It is literally as easy as copying any other file when performing backups or restores.

This - I have a lot of respect for DBAs, enough to know that I am not one.


Can't speak for GP, but it really will vary with workload: how much data is in each DB, how much of the workload is write vs. read. In a mostly-read scenario (say, a blog), I'd be surprised if you couldn't handle 100k simultaneous users on a fast drive, more with a good caching strategy.

For modest read/write, you should be able to get 1,000-10,000 users without much issue.

One piece of advice: if you are logging changes, etc., ensure the logging goes to a separate db/drive from the main data.
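
One way to do that through a single connection is ATTACH (paths and schema here are invented for illustration):

    import sqlite3

    conn = sqlite3.connect("/data/main.db")
    # Keep the write-heavy change log on a different database/drive
    conn.execute("ATTACH DATABASE '/logs/changes.db' AS logdb")
    conn.execute("CREATE TABLE IF NOT EXISTS logdb.changes (ts REAL, msg TEXT)")
    conn.execute("INSERT INTO logdb.changes VALUES (julianday('now'), 'example')")
    conn.commit()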

The only downside is distribution with high availability, but so many scenarios don't actually demand the extra overhead of true HA.


> By default, pysqlite opens a transaction when you issue your first write query (it does this by parsing every query you execute), and then committing that transaction when you issue any other query that is not a SELECT, INSERT, UPDATE or DELETE, e.g. CREATE TABLE or PRAGMA.

What's the rationale for this default? Performance? Consistency? Even without the problems mentioned in the article, I'm not sure if I would want my database to commit multi-query transactions without an explicit command to do so.


Definitely performance. Writes without a transaction are slow. SQLite's 99.999% use case is single-threaded, single-user access, so you lose little and gain a lot by deferring writes until you actually need them done.

> I'm not sure if I would want my database to commit multi-query transactions without an explicit command to do so.

You can easily disable it by setting the isolation level per https://docs.python.org/3/library/sqlite3.html#controlling-t...
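For example (the table is hypothetical; isolation_level=None puts the connection in autocommit mode, so pysqlite stops issuing implicit BEGINs and you manage transactions yourself):

    import sqlite3

    conn = sqlite3.connect("app.db", isolation_level=None)
    conn.execute("BEGIN")
    conn.execute("UPDATE users SET name = ? WHERE id = ?", ("alice", 1))
    conn.execute("COMMIT")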


In my testing, putting a mutex around a single write connection gives incredible performance; it beats Postgres by a wide margin. A single SQLite db file per user is incredibly compelling for the db-per-user privacy story alone.
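
A minimal sketch of that pattern (table and params invented; readers would use their own connections and, in WAL mode, never take this lock):

    import sqlite3
    import threading

    write_conn = sqlite3.connect("app.db", check_same_thread=False)
    write_conn.execute("PRAGMA journal_mode=WAL")
    write_lock = threading.Lock()

    def write(sql, params=()):
        with write_lock:  # one writer at a time
            write_conn.execute(sql, params)
            write_conn.commit()

    write("INSERT INTO events (msg) VALUES (?)", ("hello",))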


What's the benefit of pushing everything to a thread and faking sync operations instead of putting a mutex around writes?


In cases where you don't care about the query results, your logic thread doesn't end up blocking.
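
i.e. something like this fire-and-forget sketch (not the article's exact code; names are made up): callers enqueue a write and return immediately, and one thread owns the db.

    import queue
    import sqlite3
    import threading

    writes = queue.Queue()

    def writer():
        conn = sqlite3.connect("app.db")
        while True:
            sql, params = writes.get()
            conn.execute(sql, params)
            conn.commit()

    threading.Thread(target=writer, daemon=True).start()
    writes.put(("INSERT INTO events (msg) VALUES (?)", ("hello",)))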


In cases where you don't care about query results and you don't care if the operation completes, you mean.


I use a similar approach in RediSQL (http://RediSQL.com, fast in-memory SQL over the Redis protocol).

There is a single thread that reads from a queue of statements.

The downside is that it is hard to hold a lock on the db while doing application-level operations, but that is a really advanced use case that I rarely see in the wild.


RediSQL sounds interesting; are you getting any customers/traction?


Thanks!

Some, not as much as I would hope. Definitely still need a job!

I believe it is quite a tough sell: there are many open-source databases that you can use for free, but very few (if any?) offer in-memory as a first-class feature.

And databases are worse than marriages: quite difficult to migrate out of, so a new player needs to sell itself very well before it will even be considered.

If you are interested, or you would like to consider some form of partnership, feel free to drop me a few lines at simone@redbeardlab.com


I'm looking to build a SaaS app on AWS and was seriously considering using a SQLite db for each company. I was also looking at RDS (Postgres), but I'm really thinking that with SSDs this isn't going to be necessary. The only downside I can see is that I'd have to handle backups and provide read instances if that became necessary.


Have you looked at ActorDB [1]? It's a distributed version of SQLite that may be particularly suited to multitenant SaaS applications.

[1] http://www.actordb.com


Did not know about this. Thanks!


I think the real answer starts with using APSW, which peewee already supports. APSW, while not 100% compatible with pysqlite, is much more predictable and, to my taste, overall a more transparent wrapper.
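
A minimal sketch of what I mean (table is hypothetical; the key difference is that APSW never opens transactions behind your back, you are in autocommit until you BEGIN yourself):

    import apsw

    conn = apsw.Connection("app.db")
    cur = conn.cursor()
    cur.execute("BEGIN IMMEDIATE")
    cur.execute("INSERT INTO events (msg) VALUES (?)", ("hello",))
    cur.execute("COMMIT")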


Judging by the title alone, I expected the post to cover 'BEGIN IMMEDIATE TRANSACTION' and sqlite3_busy_timeout...
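
For reference, that combination looks roughly like this from Python (table and timeout value are arbitrary; pysqlite's 'timeout' parameter maps to SQLite's busy handler, so on a locked db it retries for up to 5 seconds instead of raising OperationalError):

    import sqlite3

    # isolation_level=None so the explicit BEGIN doesn't collide with
    # pysqlite's implicit transaction handling.
    conn = sqlite3.connect("app.db", timeout=5.0, isolation_level=None)

    # BEGIN IMMEDIATE grabs the write lock up front, so contention is
    # handled here by the busy handler rather than at COMMIT time.
    conn.execute("BEGIN IMMEDIATE")
    conn.execute("UPDATE counters SET n = n + 1")
    conn.execute("COMMIT")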



