
SuperSQLite: SQLite library for Python (2018) - jkldotio
https://github.com/plasticityai/supersqlite
======
dest
Interesting pick from one of the links in the article:

"SQLite has fantastic write performance as well. By default SQLite uses
database-level locking (minimal concurrency), and there is an “out of the box”
option to enable WAL mode to get fantastic read concurrency — as shown by this
test. But lesser known is that there is a branch of SQLite that has page
locking, which enables for fantastic concurrent write performance."

[https://blog.expensify.com/2018/01/08/scaling-sqlite-to-4m-qps-on-a-single-server/](https://blog.expensify.com/2018/01/08/scaling-sqlite-to-4m-qps-on-a-single-server/)

~~~
marktangotango
_When you have a write-heavy workload with multiple servers that need to write
concurrently to a shared database (backend to a website), you would probably
want to choose something that has a client-server model instead like
PostgreSQL_

It's easy to get really stellar concurrent performance out of SQLite using a
many-reader, single-writer model (i.e. many threads, single process). In the
testing we did, it easily surpassed Postgres.

~~~
thunderbong
That's really interesting.

I've always been a big fan of SQLite and this is the one challenge I've always
faced.

Can you give some more insights as to how you achieved that?

~~~
marktangotango
For reads just create a new connection for every request (be sure to set
connection properties for wal etc). Then create a global (or something
equivalent to global, like a singleton) connection to serve as the writer and
put a mutex around it when doing writes. Easy, scales like a mofo.
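The pattern described above can be sketched roughly like this; the `DB_PATH` name and the helper functions are illustrative, not from the original setup:

```python
import sqlite3
import threading

DB_PATH = "app.db"  # hypothetical database file

# Writer: one global connection, guarded by a mutex.
_write_lock = threading.Lock()
_write_conn = sqlite3.connect(DB_PATH, check_same_thread=False)
_write_conn.execute("PRAGMA journal_mode=WAL")

def write(sql, params=()):
    # Serialize all writes through the single connection.
    with _write_lock:
        with _write_conn:  # transaction: commit on success, rollback on error
            _write_conn.execute(sql, params)

def read(sql, params=()):
    # Readers: a fresh connection per request; under WAL they
    # proceed concurrently with the single writer.
    conn = sqlite3.connect(DB_PATH)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()
```

The key point is that only one connection ever writes, so SQLite's write lock is never contended, while WAL mode lets the per-request reader connections run without blocking on the writer.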

~~~
firebacon
What does "mofo" mean here? Can you give us a rough estimate of the
transaction rate you achieved with this setup? My own experience and all
independent benchmarks I can find seem to indicate a limit of 100-1000 TPS on
reasonable hardware.

Note that you can "batch" up many inserts into a transaction, which gives you
a high "nominal" insert rate but still only ~100 actual transactions or so per
second. To see why this is not the most useful number when comparing to a
database like Postgres, consider that the limiting factor in a SQLite/Postgres
design is the cache flushes, which outweigh the cost of actually writing the
data, so the number of rows per batch is mostly arbitrary; using this metric
you can always claim huge insert performance by choosing a suitable N. Also,
if you do the batching, you of course lose some of SQLite's
consistency/durability guarantees for your writes, which is probably fine if
you didn't need them in the first place, but raises the question of whether an
embedded ACID database is the best tool for the job at hand.
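The batching being described looks roughly like this; the table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

rows = [("event-%d" % i,) for i in range(10000)]

# One transaction wrapping all 10,000 inserts: a single commit (and a
# single cache flush) instead of one per row, so the "nominal" insert
# rate looks huge while the actual transaction rate stays at 1.
with conn:
    conn.executemany("INSERT INTO events (payload) VALUES (?)", rows)

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

If the process dies mid-batch, the whole batch rolls back; that is the durability trade-off the comment is pointing at.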

~~~
skrebbel
> What does "mofo" mean here

Motherfucker

~~~
no_wizard
I think they meant what makes it a positive `mofo`, if you will.

~~~
skrebbel
Ah yes of course. Not deleting my comment because I think my misunderstanding
is funny.

------
coleifer
Looks like it just bundles a full build of the sqlite shared library, along
with some kind of gross hacks. It bundles pysqlite2, which notably is missing
a ton of features and improvements from upstream. I'll just compile my own,
thanks...

~~~
patelajay285
Hi @coleifer, I'll disagree; I'm the author. I don't think they are gross
hacks, just a more sensible default configuration. In fact, I used a lot of
your advice from your blog post [https://charlesleifer.com/blog/going-fast-with-sqlite-and-python/](https://charlesleifer.com/blog/going-fast-with-sqlite-and-python/) while making this library.

Fair enough if you want to compile your own version, but there are a lot of
people, including Python devs, who don't know how or won't go through the
effort, and that's who this project is for.

It's also a work in progress and not meant to be released yet. Shoot me an
e-mail ajay@plasticityai.com if you have suggestions for how to improve it.

~~~
coleifer
The advice in my post that you linked can be supplemented by this:

[http://charlesleifer.com/blog/compiling-sqlite-for-use-with-python-applications/](http://charlesleifer.com/blog/compiling-sqlite-for-use-with-python-applications/)

Describes several handy ways to compile sqlite for use with python (2 or 3),
as a system-wide shared library or compiled directly into the python module.
This can also be applied to sqlcipher, etc.

If you don't know how to compile sqlite, I'd argue that you have no business
trying to use its more advanced features. How can you tune or optimize
something you don't understand?

Furthermore, your library is stale. Anyone using it is dependent on you to
merge in upstream changes constantly. And based on what I saw, it's already
well out of date.

~~~
patelajay285
Yes, I've seen that post as well. Thanks! It's good advice.

>> If you don't know how to compile sqlite, I'd argue that you have no
business trying to use it's more advanced features. How can you tune or
optimize something you don't understand?

I'm not sure I agree with that; some people understand the advanced features
but have a miserable time with compilers and compiling something like SQLite.
There are a lot of great engineers with Python and SQL expertise who just
don't know how to compile a C project. That might sound contradictory, but
it's just something a lot of engineers don't do day to day, since so many
people are using scripting languages (Node/Python/Ruby).

For example, there's probably a lot of people who want to use the JSON1 or
FTS5 modules with SQLite in Python. That doesn't require advanced knowledge,
but requires them to re-compile SQLite!
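As a rough illustration of why this matters, you can probe a given Python build at runtime to see whether these modules were compiled in; the helper names here are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Compile-time options of the SQLite linked into this Python build.
opts = [row[0] for row in conn.execute("PRAGMA compile_options")]

def has_json1(conn):
    # JSON1 is present if json_extract() resolves as a function.
    try:
        conn.execute("SELECT json_extract('{\"a\": 1}', '$.a')")
        return True
    except sqlite3.OperationalError:
        return False

def has_fts5(conn):
    # FTS5 is present if we can create a virtual table with it.
    try:
        conn.execute("CREATE VIRTUAL TABLE temp.fts_probe USING fts5(content)")
        conn.execute("DROP TABLE temp.fts_probe")
        return True
    except sqlite3.OperationalError:
        return False
```

If either probe fails, the only remedy is recompiling SQLite with the module enabled, which is exactly the hurdle being described.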

>> Furthermore, your library is stale. Anyone using it is dependent on you to
merge in upstream changes constantly. And based on what I saw, it's already
well out of date.

It is stale, I agree. But it's still a WIP. This was posted here early. My
plan is to pull in changes from the upstream sqlite module with the latest
CPython 2.7 tag and latest CPython 3 tag in the source.

------
patelajay285
Hi, I'm the author and founder at Plasticity (YCS17), and I built this for
some internal use cases at our startup. This is actually a work in progress
and not meant to be released yet, but it seems someone has found it online. I
will re-submit it here as a Show HN when it is ready, given the interest;
shoot me an e-mail at ajay@plasticityai.com if you would like to be kept up to
date!

------
nabdab
This looks like a dead project that just bundles together the built-in sqlite3
and another wrapper library, APSW, which itself is more active than this
project. Why is this on HN?

~~~
stedaniels
No commits in 9 months, "looks like a dead project".

------
tony
Are there any features in this library you'd like to see in the standard
library's sqlite3 [1]? Maybe a PEP [2, Python Enhancement Proposal] could do it.

[1]
[https://docs.python.org/3/library/sqlite3.html](https://docs.python.org/3/library/sqlite3.html)

[2]
[https://www.python.org/dev/peps/pep-0001/](https://www.python.org/dev/peps/pep-0001/)

~~~
nerdponx
In the standard library? Probably nothing.

But if someone published an alternative APSW wheel with JSON1, ICU, and FTS5
enabled, I'd be happy.

~~~
rogerbinns
I'm the APSW author. The binary builds for Windows are distributed with those
extensions all compiled in, although my doc needs some updating. It is also
only a single flag for other platforms to include all extensions during
compilation. What can I do?

~~~
no_wizard
What's holding back getting it installable via pip as a wheel? I'm curious
(happy APSW user here). It's not well explained from what I could find. This
is something I would love to help with, but I'm not sure where to start, as
I'm not sure where in the build process it falls down.

Would Cython help, perhaps?

~~~
rogerbinns
Cython wouldn't help. The reason[1] is in the docs.

I already do Windows builds[2] - 21 of them, becoming 23 in the next release -
so that isn't the difficulty.

The problem is tying all this together in a way that is useful, with sensible
defaults and appropriate tools. APSW supports Python versions going back 16
years! I welcome discussion at python-sqlite[3].

[1] [https://rogerbinns.github.io/apsw/download.html#easy-install-pip-pypi](https://rogerbinns.github.io/apsw/download.html#easy-install-pip-pypi)

[2] [https://rogerbinns.github.io/apsw/download.html#source-and-binaries](https://rogerbinns.github.io/apsw/download.html#source-and-binaries)

[3] [https://groups.google.com/forum/#!forum/python-sqlite](https://groups.google.com/forum/#!forum/python-sqlite)

------
johnisgood
> SQLite is extremely reliable and durable for large amounts of data (up to
> 140TB). It is considered one of the most well-engineered and well-tested
> software solutions today, with 711x more test code than implementation code.

I keep seeing this statement. Why is it considered one of the most well-
engineered software?

~~~
firebacon
About the well-tested bit: per its own documentation, SQLite has a massive
test suite. [Not all of] the test suite is actually open source though, so
the overlap between commenters selling you on how well tested SQLite is and
those who have seen how the sausage is made is probably [close to] zero.

However, pointing this or any of SQLite's other practical shortcomings out on
Hacker News is blasphemy and will invariably get you downvoted into oblivion
by people who (apparently) never ran into them.

~~~
smush
What are some other shortcomings of SQLite? I'm genuinely curious even though
the below probably sounds like some text from the ~Rust~ Sqlite evangelism
strike force.

With the default settings, which I semi-affectionately refer to as paranoid
mode, an untuned database can start to have worse performance after reaching
500,000+ records. Then things like indexes, RediSQL, and WAL mode start being
necessities rather than just best practices.

But if you set your pragmas correctly and so on, SQLite scales up just fine. I
haven't done a large-scale SQLite deployment, simply due to caution around
needing to use a 'real' database in production like SQL Server, Postgres,
Maria, etc. SQLite is excellent for ephemeral databases to be created, seeded
with test data, run tests against, and deleted in repeatable automated testing.
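A sketch of the kind of pragma tuning being alluded to; the specific values here are common suggestions, not ones from this thread:

```python
import sqlite3

def tuned_connection(path):
    # Trades a little durability (synchronous=NORMAL) for far fewer
    # fsyncs; WAL lets readers run concurrently with the writer.
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # readers don't block the writer
    conn.execute("PRAGMA synchronous=NORMAL")  # fsync at checkpoints, not every commit
    conn.execute("PRAGMA cache_size=-64000")   # negative = KB, so ~64 MB page cache
    conn.execute("PRAGMA temp_store=MEMORY")   # keep temp tables/indices in RAM
    return conn
```

Note that `journal_mode=WAL` is persistent in the database file, while the other pragmas are per-connection and must be set each time.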

Based on the link to the Expensify article, it sounds like SQLite can scale up
even better than SQL Server under some circumstances.

But I have barely tried using it in production because of that aforementioned
caution. What are some pitfalls to watch out for?

~~~
zzzeek
the elephant in the room with SQLite is that they refuse to support most forms
of ALTER TABLE, that is, the ability to change the structure of a schema.

Their rationalization for this is that, given the way SQLite stores data, this
is more efficiently performed by simply creating a new table and copying all
the rows from the old table into the new one, and that one would want to batch
all the table changes together rather than emitting individual ALTER
statements.

from the POV of people who make tools like my own Alembic Migrations, this is
an annoyingly insufficient answer, because the logistics of recreating table
schemas and copying data over are much more complicated than just emitting an
ALTER directive. I'd like it if SQLite at least had the ability to be
_extended_ to support a third-party "ALTER" plugin that would run the
intricacies of copying the tables around under the hood, rather than pushing
this out to the tool creators. For most of SQLite's typical use cases, it
doesn't really matter that ALTER would be inefficient.

instead, my users bugged me for years to solve this problem and I have to
maintain this thing
[https://alembic.sqlalchemy.org/en/latest/batch.html](https://alembic.sqlalchemy.org/en/latest/batch.html)
which I mostly hate completely.
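The create-copy-swap dance the comment describes can be sketched like this (the `users` table and the dropped column are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, legacy_col TEXT)"
)
conn.execute("INSERT INTO users (name, legacy_col) VALUES ('ada', 'x')")

# Simulating "ALTER TABLE users DROP COLUMN legacy_col" the SQLite way:
# build the new shape, copy rows, drop the old table, rename.
with conn:
    conn.execute("CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users_new (id, name) SELECT id, name FROM users")
    conn.execute("DROP TABLE users")
    conn.execute("ALTER TABLE users_new RENAME TO users")

cols = [r[1] for r in conn.execute("PRAGMA table_info(users)")]
```

A real migration tool also has to recreate indexes, triggers, views, and foreign-key references that pointed at the old table, which is exactly the complexity being complained about.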

SQLite's typing model is also very idiosyncratic and is based on a naming-
convention approach, which I think most users of SQLite don't understand very
well, because it works in a completely strange way: it looks for substrings
inside the completely arbitrary names you can assign to types. I can create a
column with the datatype ELEPHINT and that is a legitimate datatype which will
store integers. There's also the "INTEGER PRIMARY KEY" / "INTEGER PRIMARY KEY
AUTOINCREMENT" silliness, but that's a relatively mild API wart compared to
the things MySQL does all over the place, I suppose.
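The ELEPHINT example can be demonstrated directly, since SQLite's type-affinity rules match the substring "INT" inside the declared type name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "ELEPHINT" contains "INT", so the column gets INTEGER affinity
# under SQLite's substring-based affinity rules.
conn.execute("CREATE TABLE t (v ELEPHINT)")
conn.execute("INSERT INTO t VALUES ('123')")  # text that looks numeric...

# ...is coerced to an integer on storage because of the affinity.
value = conn.execute("SELECT v, typeof(v) FROM t").fetchone()
```

The stored value comes back as the integer `123` with `typeof` reporting `integer`, even though the insert supplied a string and the declared type is a nonsense word.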

------
Gys
It supports 'Remote Streaming over HTTP' without explaining what that means.
Maybe someone here knows?

~~~
niea_11
From what I saw in the source code, it's a feature that lets sqlite open a
database file stored on a web server. There is a class in the code, HTTPVFS
[1], that proxies sqlite's filesystem operations to http requests.
[1]:[https://github.com/plasticityai/supersqlite/blob/01e54bbb829...](https://github.com/plasticityai/supersqlite/blob/01e54bbb829a9441627c416342b1f1e50f94333e/supersqlite/__init__.py#L1203)

------
ddorian43
Title is misleading. It needs to add something like `python-library`. I
thought they had supercharged sqlite itself.

~~~
patelajay285
Hi I'm the author, see my parent comment in the thread. This is still a WIP.
You're right that it's mostly a Python library, but when it's released, it
will actually be released with pre-built native static SQLite libraries that
can be linked into any C application, so it is actually supercharged SQLite
itself :).

------
jgalt212
> SQLite is extremely reliable and durable for large amounts of data (up to
> 140TB).

This has not been our experience. Our experience is that it starts to bog down
around 12GB in file size. Linux box, ext3/4, spinning disk (not SSD).

~~~
patelajay285
Hi @jgalt, I'm the author of this library, and we've been using it to serve
extremely large files, over ~130GB, at Plasticity (YCS17) for AI model data!
Shoot me an email at ajay@plasticityai.com if you'd like to chat, but it
should remain extremely responsive even at that file size.

~~~
jgalt212
Thanks, I may take you up on that after running some new tests. It was years
ago that we had the performance issues (approx. 3-4 years), so we'll see if
time has healed all wounds.

------
d_burfoot
I've been toying with the idea of using SQLite as a data exchange format
instead of JSON or XML. I can't stand navigating through complex JSON trees, I
want to just use SQL queries. Has anyone else tried this?
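A minimal sketch of the idea using the stdlib sqlite3 module; the schema and helper functions are hypothetical:

```python
import sqlite3

# Producer side: pack records into a single .sqlite file to ship around,
# the way you might otherwise emit a JSON or XML document.
def export(path, records):
    conn = sqlite3.connect(path)
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS people (name TEXT, age INTEGER)")
        conn.executemany("INSERT INTO people VALUES (?, ?)", records)
    conn.close()

# Consumer side: query with SQL instead of walking a nested tree.
def adults(path):
    conn = sqlite3.connect(path)
    try:
        return conn.execute(
            "SELECT name FROM people WHERE age >= 18 ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
```

The consumer gets indexes, joins, and ad-hoc filtering for free, at the cost of the file being opaque to tools that only speak JSON/XML.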

~~~
sametmax
Works well if your data is big and doesn't have to be read in a web browser.

------
mtw
How accurate is this statement? "SQLite is faster than nearly every other
database". It links to a page that's decades old.

------
trollied
The title would be better as "SuperSQLite: a supercharged SQLite library for
Python".

