
> "But nobody writes production applications with SQLite, right?"

We've been doing it for 5 years now. Basic tricks we employ are:

Use PRAGMA user_version to manage automatic migrations, à la Entity Framework. You can actually do one better than Microsoft's approach here, because you don't need a special unicorn table to store migration info. All it takes is comparing the stored integer against your latest migration number and executing the SQL in that range.
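
For illustration, here's a minimal sketch of the pattern using Python's built-in sqlite3 module (the table names and migration SQL are made up, not our actual schema):

    import sqlite3

    # Each entry bumps user_version by one; index in this list + 1 == version.
    MIGRATIONS = [
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);",  # v1
        "ALTER TABLE customers ADD COLUMN email TEXT;",                 # v2
    ]

    def migrate(con: sqlite3.Connection) -> None:
        current = con.execute("PRAGMA user_version").fetchone()[0]
        for version, sql in enumerate(MIGRATIONS, start=1):
            if version > current:
                con.executescript(sql)  # a migration may hold several statements
                con.execute(f"PRAGMA user_version = {version}")  # record progress
        con.commit()

    migrate(sqlite3.connect("app.db"))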

Use PRAGMA synchronous=NORMAL alongside PRAGMA journal_mode=WAL for maximum throughput while still covering most reasonable IT recovery concerns. If you are running your SQLite application on a VM somewhere and have an RTO that is satisfied by periodic hot snapshots (which WAL is quite friendly to), this is a very good way to manage recovery of all business data while also giving writers good throughput. If you are more paranoid than we are, you can use synchronous=FULL for a moderate performance penalty; that's for situations where your RTO requires the exact state of the system to be recoverable from the moment it lost power. We can afford to lose the last few minutes of work without anyone getting yelled at. Some modern virtualization technologies help a lot in this regard; running bare metal, you need to be a little more careful.
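
In code, that setup is just a couple of pragmas at connection open (a sketch in Python; file name arbitrary):

    import sqlite3

    con = sqlite3.connect("app.db")
    con.execute("PRAGMA journal_mode=WAL")    # readers don't block the writer
    con.execute("PRAGMA synchronous=NORMAL")  # fsync at checkpoints, not every commit
    # Paranoid mode: exact state recoverable at power loss, for a penalty:
    # con.execute("PRAGMA synchronous=FULL")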

For development & troubleshooting, being able to copy a .db file (even while it's in use) is tremendously powerful. I can patch up a QA database I mangled with a bad migrator in 5 minutes by stopping the service, pulling the .db down locally, editing it, and pushing it back up. We can also ask our customers to zip up their entire db folder so we can troubleshoot the entire system state.
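
If you'd rather not copy the raw file by hand (in WAL mode you'd also need the -wal sidecar file), SQLite's online backup API gives a consistent snapshot of a live database; a quick sketch in Python:

    import sqlite3

    src = sqlite3.connect("app.db")
    dst = sqlite3.connect("app-copy.db")
    src.backup(dst)  # consistent copy, safe even while writers are active
    dst.close()
    src.close()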

Being able to use SQLite as our exclusive data store also meant that our software delivery process could be trivialized. Our application needs zero external hosts, not even localhost, to be installed or started. We don't even require a runtime to exist on the base operating system. Unzip our latest release build onto a blank Win2019 server, sc.exe the binary path, net start the service, and it just works. Anyone can deploy our software because it's literally that simple. We didn't even bother writing a script, because it's a bigger pain in the ass to set the PowerShell execution policy.
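
Concretely, the install is something like this (service name and path hypothetical):

    sc.exe create OurService binPath= "C:\Apps\OurService\OurService.exe"
    net start OurService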

So, it's not just about the core data storage, but also about the higher-order implications of choosing a database solution that can be wholly embedded within your application. Because of decisions like these, we don't have to screw around with things like Docker or Kubernetes.




Those are awesome tips. I didn't even think of using "user_version" for storing a migration version. I'm definitely stealing that trick.


This blog post gives a nice example of how that could be implemented:

https://levlaz.org/sqlite-db-migrations-with-pragma-user_ver...


This is how we manage SQLite migrations in Notion’s native apps.


Out of curiosity, what kind of read and write concurrency is your application dealing with?

In my experience, SQLite performance becomes problematic quite quickly, even with the settings you mentioned (WAL etc.).


We are able to get reads on the order of 10k/s+, and writes on the order of 5k/s+ using NVMe drives and practical serialized business object sizes (0.1~5 megabytes). I can easily saturate an NVMe drive using SQLite. In fact, it is substantially easier to max out storage devices with SQLite and carefully-tuned code than it is with something like SQL Server.

I should amend my original post, because I know a lot of developers fall into the trap of thinking that you should always do the open/close connection pattern with these databases. That is a huge trap with SQLite. If you want to add some extra zeroes to your benchmark figures, only use a single connection for accessing SQLite databases. Use application-level locking primitives, rather than relying on the database for purposes of getting consistent output from things like LastInsertRowId and in cases where transactional scopes are otherwise required. This alone can take you from 100 inserts/second to 10k without changing anything else.
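
A minimal sketch of that pattern in Python (the orders table is hypothetical): one long-lived connection shared by all threads, guarded by an application-level lock:

    import sqlite3, threading

    con = sqlite3.connect("app.db", check_same_thread=False)
    con.execute("PRAGMA journal_mode=WAL")
    lock = threading.Lock()

    def insert_order(payload: bytes) -> int:
        # The lock guarantees lastrowid belongs to *our* insert, not one
        # raced in by another thread on the shared connection.
        with lock:
            cur = con.execute("INSERT INTO orders (payload) VALUES (?)", (payload,))
            con.commit()
            return cur.lastrowid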


> Use application-level locking primitives, rather than relying on the database for purposes of getting consistent output from things like LastInsertRowId

You mean for generating unique primary keys? Why would last insert row id be slow?

> and in cases where transactional scopes are required

Could you elaborate on what you mean by this?


LastInsertRowId is not slow, but if you are inserting on the same connection from multiple threads, you need a mutex or you will get other threads' row ids.

Transactional scopes meaning scenarios like debiting one account and crediting another. This is something you can also manage with application-level locking primitives.
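
A sketch of what I mean, assuming the shared connection + lock setup described earlier (the accounts table is hypothetical):

    import sqlite3, threading

    con = sqlite3.connect("app.db", check_same_thread=False)
    lock = threading.Lock()

    def transfer(src_id: int, dst_id: int, amount: int) -> None:
        # The lock makes the debit/credit pair atomic across threads;
        # `with con` wraps both statements in one database transaction.
        with lock, con:
            con.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                        (amount, src_id))
            con.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                        (amount, dst_id))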


So the mutex in sqlite (for multiple connections) is worse than the one you implement in your own application?

I'd assume the DB would be most efficient at handling its own. At least to the extent that doing it in-app wouldn't yield a 100x speedup.


Yes, it is substantially worse to use multiple connections vs a single connection. This is fairly easy to test in a few lines of code.

We need to remember that opening a connection to SQLite is like opening a file on disk. Creating/destroying file handles requires far more resources and ceremony than taking out a mutex on a file that is never closed.
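
For example, a quick benchmark sketch (file and table names arbitrary) that isolates the connection-churn cost:

    import sqlite3, time

    DB = "bench.db"

    def setup() -> None:
        con = sqlite3.connect(DB)
        con.execute("PRAGMA journal_mode=WAL")
        con.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
        con.commit()
        con.close()

    def churn_connections(n: int) -> float:
        start = time.perf_counter()
        for i in range(n):
            con = sqlite3.connect(DB)  # new file handle on every insert
            con.execute("INSERT INTO t VALUES (?)", (i,))
            con.commit()
            con.close()
        return n / (time.perf_counter() - start)

    def single_connection(n: int) -> float:
        con = sqlite3.connect(DB)      # one handle, held for the whole run
        start = time.perf_counter()
        for i in range(n):
            con.execute("INSERT INTO t VALUES (?)", (i,))
            con.commit()
        con.close()
        return n / (time.perf_counter() - start)

    setup()
    print(f"open/close per insert: {churn_connections(2000):.0f} inserts/s")
    print(f"single connection:     {single_connection(2000):.0f} inserts/s")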


That doesn't sound right. SQLite's write lock is not the best, but it is still a pthread mutex under the hood. Are you sure your compilation options for SQLite are right? One common pitfall is compiling without `-DHAVE_USLEEP`. In the absence of that flag, SQLite falls back to sleep() on conflict, which has a time resolution of 1 second, causing a 1s delay on every lock contention. The flag tells SQLite to use usleep() instead, which makes the busy-timeout path substantially faster.

Here are my SQLite compilation flags: https://github.com/liuliu/dflat/blob/unstable/external/sqlit...

Here is where the flag is used: https://github.com/sqlite/sqlite/blob/d46beb06aab941bf165a9d...
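
If you're compiling the amalgamation by hand, adding the define is all it takes; a sketch (your other flags will vary):

    gcc -c -O2 -DHAVE_USLEEP sqlite3.c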


>We can afford to lose the last few minutes of work without anyone getting yelled at. Some modern virtualization technologies do help a lot in this regard. Running bare metal you need to be a little more careful.

How does virtualization help with data loss? I would expect that a VM can't have guarantees better than the underlying physical hardware provides.


VMs can be migrated to other nodes, so you can, for example, mitigate failures that don't occur out of the blue.


> How does virtualization help with data loss? I would expect that a VM can't have guarantees better than the underlying physical hardware provides.

E.g. storage virtualisation.


It sounds likely that it can. Software solutions to hardware problems have been a common pattern for years now.


Great writeup, thank you.

I'm used to leaning on SQLite for desktop apps, but now I'm keen to think about using it for web apps with these tips.


Thanks for the details.

I have yet to see somebody serve 1M users on a web service with SQLite. It sounds like you can do all of this because you're building a desktop app; for that, you almost never need anything more than SQLite.



Surely that doesn't qualify as an answer, since it was a read-only test. Also, it looks like there is not much interaction between users in their core domain.


> We can afford to lose the last few minutes of work without anyone getting yelled at. Some modern virtualization technologies do help a lot in this regard. Running bare metal you need to be a little more careful.

Can you say more about how (and which) modern virtualization technologies help? RTO is something I've never found a happy answer to, since piecing together any missing data is painful, but avoiding a clustered DB setup (or the cost of Aurora) is always welcome.



