
Getting the Most Out of Sqlite3 with Python - pythux
https://remusao.github.io/posts/2017-10-21-few-tips-sqlite-perf.html
======
chubot
For a project a few years ago I chose to use apsw:

[https://github.com/rogerbinns/apsw](https://github.com/rogerbinns/apsw)

It is a different wrapper for sqlite than the one in the Python standard
library. As far as I remember, the stdlib one had some brain-dead choices with
regard to concurrency and transactions.

There may have been some other reasons I chose it too, but I don't recall at
the moment. The issue is that Python has this spec called "DB-API" which is
supposed to make all databases look alike. But I don't want that -- I just
want to use sqlite!

So 'apsw' is a thin layer on top of sqlite. I'm able to read the sqlite
documentation in C, and be reasonably confident in what it's doing. But as far
as I remember the Python stdlib version had a bunch of magic in the wrappers
that was annoying. I might be mistaken but I think it parsed its own SQL? I
think it had some sort of silly cache of parsed queries, rather than letting
you simply store/control that in the application.

I think it also inserted SQL statements for transactions between your own
calls!!! I just want the raw SQL to go to sqlite as I intended.

I definitely ran into issues where non-obvious flags to the API increased
insert performance by 10x, and it had nothing to do with sqlite itself! So I
abandoned it for a thinner wrapper.

~~~
tyingq
Thanks for this pointer. It appears to have Python bindings for the sqlite VFS
layer and virtual tables too. That would be fun to experiment with.

~~~
coleifer
You might also like this library, which allows you to create virtual tables
using a very simple API (2 functions).

[https://github.com/coleifer/sqlite-vtfunc](https://github.com/coleifer/sqlite-vtfunc)

------
masklinn
> 1\. Use Bulk Operations

> If you need to insert a lot of rows at once in your database, you really
> should not use execute. The sqlite3 module provides a way to do bulk
> insertions: executemany.

Note that this is actually part of DBAPI 2.0[0], so pretty much every Python
database driver should have an executemany(). Worst case scenario, it's
implemented on top of execute() and you get little to no benefits (that's the
case for psycopg2 currently[1]), best case scenario it can use much more
efficient methods (seems to be the case for sqlite3).
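A minimal sketch of the difference with the stdlib sqlite3 module (table and column names here are just illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT)")

rows = [(i, f"event-{i}") for i in range(1000)]

# One call hands the whole sequence to the driver, which can reuse the
# prepared statement instead of re-preparing it once per execute() call.
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 1000
```

Whether this is faster than a loop of execute() calls is, as noted above, entirely up to the driver.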

> 3\. Cursors Can Be Iterated Upon

That is an optional extension but part of the DBAPI 2.0 spec as well[2], and
most drivers should implement it.

> for row in connection.execute('SELECT * FROM events'):

The return value of execute() is spec-undefined, so beware of that specific
pattern: it is absolutely not portable, and neither is execute/executemany on
the connection object.
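The spec-portable version of the same loop goes through an explicit cursor (table name is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()  # DBAPI 2.0: cursors come from connection.cursor()
cur.execute("CREATE TABLE events (id INTEGER)")
cur.executemany("INSERT INTO events VALUES (?)", [(1,), (2,), (3,)])

# Iterating the cursor itself is the optional-but-common DBAPI extension;
# iterating the return value of connection.execute() is an sqlite3-specific
# shortcut that other drivers need not support.
cur.execute("SELECT id FROM events ORDER BY id")
for (event_id,) in cur:
    print(event_id)
```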

> 7\. Use Placeholders to Interpolate Python Values

Warning: the actual placeholder style is driver-dependent and not portable.
You can query the driver for its preferred param style[3], and you can also
ask it for its thread-safety guarantees (the sqlite3 module advertises module-
level thread-safety but not connection-level, let alone cursor-level).
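Both pieces of information are exposed as module-level attributes required by DBAPI 2.0:

```python
import sqlite3

# 'qmark' means the driver expects ? placeholders; other drivers may
# report e.g. 'pyformat' (%(name)s) or 'numeric' (:1).
print(sqlite3.paramstyle)

# An integer 0-3 per PEP 249, describing what may safely be shared
# between threads (module, connections, cursors). The exact value can
# vary with the Python version and how the underlying SQLite was built.
print(sqlite3.threadsafety)
```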

[0]
[https://www.python.org/dev/peps/pep-0249/#executemany](https://www.python.org/dev/peps/pep-0249/#executemany)

[1]
[http://initd.org/psycopg/docs/cursor.html#cursor.executemany](http://initd.org/psycopg/docs/cursor.html#cursor.executemany)

[2]
[https://www.python.org/dev/peps/pep-0249/#next](https://www.python.org/dev/peps/pep-0249/#next)

[3]
[https://www.python.org/dev/peps/pep-0249/#paramstyle](https://www.python.org/dev/peps/pep-0249/#paramstyle)

~~~
auxym
In the past when I've had to insert many rows (for example), I wrapped it
inside a transaction. Why and when would someone use executemany() instead? Or
does executemany() do a transaction behind the scenes?

~~~
gaius
_Why and when would someone use executemany() instead?_

What kills you in this use case is roundtrips - send one row, wait for an
acknowledgement (even if you are not committing it yet), then send one more,
and so on. There is a lot of dead time spent in protocol overheads or just
idle. A well implemented executemany() sends as many rows at a time as will
fit in a network packet and keeps streaming them as fast as it can.
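For sqlite specifically there is no network, but the two optimizations mentioned in this subthread are independent and combine naturally: the explicit transaction avoids a commit (and fsync) per statement, while executemany() avoids per-row statement overhead. A sketch (table name is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER)")
rows = [(i,) for i in range(10_000)]

# The connection as a context manager commits on success and rolls
# back on exception, so all 10,000 inserts land in one transaction.
with conn:
    conn.executemany("INSERT INTO events VALUES (?)", rows)

print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 10000
```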

------
brittohalloran
Been meaning to try this, good write up.

An alternative between keeping all data in memory and going full relational
database that's worked well for me is Python + h5py. Write a simple class
where your data getters / setters interact directly with the HDF5 file and
don't keep it in memory.

Super fast (data handling and setup), easy on memory, and doesn't lock you
into the schema of a relational database.

------
nrjames
I often use the Dataset library (from Mozilla, I believe). It provides a lot
of convenience for incorporating SQLite (and other databases) into Python
projects and it's very easy to use it to fall back to straight SQL queries and
other statements.

[https://dataset.readthedocs.io/en/latest/](https://dataset.readthedocs.io/en/latest/)

~~~
jgalt212
Dataset is neat, but it relies on the monster that is sqlalchemy. I have
recently been using Pony ORM for new experiments with sqlite, and then if the
project sticks, data transfer to MySQL is trivial.

Pony is not as light as Dataset, but way less heavy/confusing than sqlalchemy.

~~~
StavrosK
I agree, but Pony is also less flexible than SQLAlchemy :/ For anything other
than trivial queries, you'll end up writing some horrible code.

Having used Pony for a month or so, it feels like someone saw generators,
thought SELECTs would be cool using them, and then implemented all the other
query types as well instead of shelving the whole thing like they should have.

------
maxpert
Some really nice tips. I have worked quite a few times with the Python +
SQLite3 combination and I always enjoyed the API; that's why I chose Python
for my toy document store that I created on top of SQLite3 (shameless plug:
[http://blog.creapptives.com/post/47494540287/a-document-store-with-sqlite3-and-python](http://blog.creapptives.com/post/47494540287/a-document-store-with-sqlite3-and-python)).

------
coleifer
The docs cover all this and much more. Do _yourself_ a favor and read them.
Sqlite is a very capable library and this post is just the tip of the iceberg.

[https://docs.python.org/2/library/sqlite3.html](https://docs.python.org/2/library/sqlite3.html)

~~~
jrimbault
[https://docs.python.org/3/library/sqlite3.html](https://docs.python.org/3/library/sqlite3.html)

------
tyingq
> 7\. Use Placeholders to Interpolate Python Values

Well, where you can. You can't use placeholders for table names, for example.

When you can't, just don't derive them from unsafe input.
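A quick sketch of where the line falls (table name is illustrative) - placeholders bind values, never identifiers, so the driver rejects a `?` in identifier position and you fall back to an allow-list:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT)")

# Placeholders work for values...
conn.execute("INSERT INTO events VALUES (?)", ("deploy",))

# ...but not for identifiers such as table names:
try:
    conn.execute("SELECT * FROM ? LIMIT 1", ("events",))
except sqlite3.OperationalError as e:
    print("rejected:", e)

# For identifiers, validate against an allow-list instead and only
# then interpolate - never derive the name from untrusted input.
table = "events"
assert table in {"events"}
rows = conn.execute(f"SELECT name FROM {table}").fetchall()
print(rows)  # [('deploy',)]
```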

------
NelsonMinar
One thing I was hoping to see; building sqlite3 for Python with json1
extension support. I've tried a few times and gave up, confused. How hard is
it?

------
Scaevolus
Entirely disabling synchronous is incredibly dangerous. Use "pragma
journal_mode=wal" for a safe performance boost.

~~~
mythrwy
Disabling synchronous is dangerous and I don't know where it could be used
practically (if you care about your data:
[https://www.sqlite.org/howtocorrupt.html#_failure_to_sync](https://www.sqlite.org/howtocorrupt.html#_failure_to_sync)).

That being said, 'journal_mode=wal' is a bit faster than the default journal
mode but nowhere even close to the speed of disabling synchronous completely.

~~~
justinclift
> ... don't know where it could be used practically.

Maybe with read only databases?

~~~
mythrwy
Would you get a speed increase in reads? I thought the synchronous and WAL
speedups would just be for writes (no good for a read-only database). Maybe
I'm misinformed. Have to play with it and see.

~~~
justinclift
Hmmm, I'm not sure either. If you got around to testing it, it'd be
interesting to hear about. :)

------
purplezooey
sqlite3 is the most underrated database ever, especially with python. You can
get a ton done with it. Who needs HBase and MongoDB...

------
luord
Nice writeup. I'll keep this in mind for when PostgreSQL + SQLAlchemy would be
overkill.

------
naveen99
I used AutoHotkey for this kind of mostly-read, in-memory string lookup and
nosql-type queries without sqlite. AutoHotkey is way faster than Python. Maybe
it's the immutable strings slowing down Python, but there was something like a
>10x performance difference. I even had a Linux port of AutoHotkey at one
time, but I think you can also just use wine if you are on Linux.

~~~
naveen99
Python is slow, you can downvote me all you want. Case in point:
[https://news.ycombinator.com/item?id=15548904](https://news.ycombinator.com/item?id=15548904)

