Ask HN: SQLite in Production?
111 points by martinbaun 8 months ago | 59 comments
I am using it for a small app, and the latency is so low that the app feels like 10x faster than if it were running PostgreSQL. Of course scaling will not be the same, but still.

Are any of you using SQLite in production, and if so, what has your experience been with it? Have you hit any limits?




Yes, we use it to back several services in production at an 8 figure ARR business, with Litestream as our streaming backup option. I'm quite happy with it because it's trivial to debug and test against a local copy of the production data if necessary, but there are a few things to think about.

* You need to have your code repopulate index statistics with ANALYZE or PRAGMA OPTIMIZE now and again, or you may get a confused query planner.

* It's best if your code defines meaningful types to scan SQLite values into, and you ideally avoid ever writing to the DB via the CLI. Alternatively, be aggressive in writing CHECK constraints and triggers to ensure data validity.

* You need to be working with a team of developers that are pretty comfortable with databases to get a lot of value out of SQLite. I've been surprised at how performant features can be if a single mind is designing the schema, constraints, triggers, and queries, examining the query plan to choose indexes, and writing the application code that accesses the database. SQLite minimizes the barrier to doing this with real snapshots of production data - but that still doesn't mean it's going to be easy.

* You need to think about how long your code holds a write transaction open. Individual statements in SQLite run very quickly (I've seen ~250 microseconds per insert on an EBS-backed EC2 instance), but if you have a Django app with @transaction.atomic everywhere, you're going to run into lock contention quite quickly because your Python functions take much longer to run. If possible, "organize" all the data for your writes outside the transaction and then hold the lock for as short a time as possible.

* Any transaction that may eventually write must start with BEGIN IMMEDIATE, otherwise SQLite may throw an error to keep its promise of serialized isolation on writes.
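The last two bullets can be sketched with Python's stdlib sqlite3 (an assumption for illustration, since the thread's apps are Python and Go; the table and data are made up): prepare the data first, take the write lock explicitly with BEGIN IMMEDIATE, and keep the transaction short.

```python
import sqlite3

# isolation_level=None puts the stdlib driver in autocommit mode so we
# can manage BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)")

# "Organize" the data for the write *before* taking the lock...
rows = [("a",), ("b",), ("c",)]

# ...then BEGIN IMMEDIATE grabs the write lock up front, so a later
# upgrade from a read lock can't fail mid-transaction.
conn.execute("BEGIN IMMEDIATE")
try:
    conn.executemany("INSERT INTO events (payload) VALUES (?)", rows)
    conn.execute("COMMIT")
except Exception:
    conn.execute("ROLLBACK")
    raise

# Refresh the planner's index statistics now and again (first bullet).
conn.execute("PRAGMA optimize")

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 3
```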

All said, I'd do it again for any other services-based backend. It also incidentally enforces the good hygiene of "services can only access their own data." Happy to answer any questions you have.


To expand on the BEGIN IMMEDIATE part in the context of Django (since this is/was a pet peeve of mine), they recently added the option to do so after keeping the bug open for years: https://code.djangoproject.com/ticket/29280


Cool, I am using Django on some projects so I'll check this out.


This is good insight.

I've been experimenting with the newer features to get concurrent writes working, but I don't like that I have to have more than one database file, which means I end up having to move lots of files around - might as well use MySQL and... move files around.

https://www.sqlite.org/cgi/src/doc/begin-concurrent/doc/begi...


Using the best tool for the job is, most of the time, the best thing to do :)

I never dabbled in multiple files, but I can see how that would be beneficial. Especially if you have a service that has completely isolated instances - say, on a per-organization basis.


How do you manage cron jobs with a setup like that? Does each lambda pull down its own SQLite db, write changes, and then litestream replicates the changes to the web and other lambda instances?

Similar scenario for multiple web nodes and saving data. Also, do you use sticky sessions so that any routes that write to a db also read from the same node/db so you don’t have to wait for litestream?

What do you do for BI? Are you able to ETL the data from sqlite to a warehouse? If so, what does that look like?


Interesting question. Our services currently "own" all of their cron jobs, so the jobs run as background tasks on the host containing the db. I think we used Huey for the Django app, and just some CLI applications fired via systemd timers for the Go services.

Every service is running on a single (somewhat meaty) host, so we get true snapshot isolation on reads and serialized isolation on writes without doing anything extra. We run weekly tests to determine how long AWS takes to spin everything back up upon catastrophic failure, and it's between 6 and 10 minutes, depending on how much data the service has. It does bother me that this can only get longer, but we've never actually had a host crash in production.

Like the other commentator, we did our BI for a while by restoring the Litestream backup to another server. Then that started to get expensive (because the file was very large), so we just added a scraping endpoint to each table. I think there's probably a more elegant way of doing this.


Sounds complicated. What object storage do you use? If on AWS, 's3 sync' might help.


S3, yes. Litestream doesn't store the backup as a single DB file - rather, as snapshots and WAL segments that are streamed in, so it's unfortunately not so simple.


Re: the BI and Metabase use case:

If it fits the use case, you can use a SQLite VFS so the process will just pull the needed byte ranges from storage.


Not the author of this reply, but what we do is just scp the file onto another instance and run Metabase on it. It is pretty sweet, as we can keep Metabase off production and avoid "polluting" the environment. Extremely easy.


Wow this is an awesome breakdown!

Have you written anything more extensive about this? Especially the ANALYZE or PRAGMA OPTIMIZE part, and/or Litestream?

Or maybe you can refer some links?


I have not, but I keep meaning to collate everything I've learned into a set of useful defaults just to remind myself what settings I should be enabling and why.

Regarding Litestream, I learned pretty much all I know from their documentation: https://litestream.io/


I'd be the first one to read about that and share it :)

And thank you, I'll try to set it up for one of our projects and see what happens!


I have spent the last 18 months building a web app using SQLite, python (starlette) and htmx. It is a back office app that now runs daily operations for a 60 person tax firm. The load is low, and it runs on a very cheap ec2 instance with litestream backups to s3.

The low latency is wonderful because I can write serial queries to construct a complex response. All the queries are written by hand and I do not use an ORM.

I have had a couple of outages, and all but one were basic operator errors. The notable exception was when tax season started, load went up, and I found that I was leaking (or creating too many) db connections. This was scary and I never quite understood the low level failure mechanism. The solution I came up with is to issue warnings whenever a connection (subclass of std sqlite3 Connection) is destroyed (__del__()) without having been explicitly closed. Then I found all of the usage sites and put them into `with` contexts. Explicitly managing the connection lifetimes took the pressure off and I haven’t had to think about it since. I’d still like to reproduce the problem better though.
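That explicit-lifetime pattern can be sketched with stdlib sqlite3 (the in-memory path and table are illustrative, not the commenter's actual code): `contextlib.closing` guarantees the close that the warning was hunting for, while `with conn:` scopes the transaction.

```python
import sqlite3
from contextlib import closing

DB_PATH = ":memory:"  # stand-in for a real database file

# closing() runs conn.close() even on error; note that `with conn:`
# on a sqlite3.Connection is *transaction* scope (commit/rollback),
# not close - an easy pair to confuse.
with closing(sqlite3.connect(DB_PATH)) as conn:
    conn.execute("CREATE TABLE t (x INTEGER)")
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO t VALUES (1)")
    total = conn.execute("SELECT SUM(x) FROM t").fetchone()[0]

print(total)  # 1
```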

I do wonder what I would do if we scaled up to a point where things started to fall apart again, but there’s a good chance it won’t happen. It would require a lot more traffic, which implies massive staff growth.

The other architectural decision worth noting is that I keep complete local copies of various saas api data. A lightweight CRM, customer support, billing, call center, etc. Each of these has a background api fetcher/poller, and writes to its own SQLite file. Then the web app attaches each file to the main db. This gives me schema.table namespacing in the sql, and allows for separate backup policies for the different files. The trade off is no atomicity across the files, but for my purposes it is not significant.
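The attach-per-source layout might look like this sketch (assuming Python's stdlib sqlite3; the "crm" and "billing" names are illustrative, and in-memory databases stand in for the per-SaaS files):

```python
import sqlite3

main = sqlite3.connect(":memory:")
# Each data source gets its own database; ATTACH gives schema.table
# namespacing in plain SQL.
main.execute("ATTACH DATABASE ':memory:' AS crm")
main.execute("ATTACH DATABASE ':memory:' AS billing")

main.execute("CREATE TABLE crm.contacts (name TEXT)")
main.execute("CREATE TABLE billing.invoices (total REAL)")
main.execute("INSERT INTO crm.contacts VALUES ('Ada')")
main.execute("INSERT INTO billing.invoices VALUES (99.5)")

# Cross-source queries work directly against the attached schemas.
row = main.execute(
    "SELECT c.name, i.total FROM crm.contacts AS c, billing.invoices AS i"
).fetchone()
print(row)  # ('Ada', 99.5)
```

With real files instead of in-memory stand-ins, each attached database can get its own backup policy, but as the comment notes, atomicity across separate files is not guaranteed (notably in WAL mode).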


Hey GW King,

Interesting, I never had this issue, but I also run it through Peewee (on Python), so that might manage it. Or maybe it is because I set

PRAGMA journal_mode = 'wal';

which helps by putting writes in a write-ahead log to avoid locking so aggressively. Maybe this will help you scale?
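A typical WAL setup with stdlib sqlite3 might look like the sketch below (the busy_timeout and synchronous values are common pairings, not anything prescribed in this thread). WAL needs a real file, since it is a no-op for in-memory databases, hence the temp file.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)

# The pragma returns the journal mode actually in effect, so read it
# back rather than assuming the switch worked.
mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
conn.execute("PRAGMA busy_timeout = 5000")   # wait up to 5s on a locked db
conn.execute("PRAGMA synchronous = NORMAL")  # common pairing with WAL

print(mode)  # wal
```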

About the Litestream to S3, how is that? I considered it, but it seems very new, so I am unsure how stable it is.


I am using WAL mode. Without it the site cannot really function.

I am starting to think that the connection errors I see have to do with a combination of the crash recovery process, which takes an exclusive lock, and the fact that I am using ATTACH on multiple auxiliary databases. I have a suspicion that I have entered less tested territory with the latter. I also might be getting the system into tight recovery/crash loops when systemd restarts these processes due to "database locked" errors.


Running SQLite in production for our mid-six figure SaaS for a year now. We use it with Go and run it in WAL mode that allows for concurrent reads and serialized writes. Litestream for backups. No issues, and extremely good performance.

Our app is read-heavy so it works great with WAL mode only allowing one writer at a time. If you have a write-heavy workload, I would go with Postgres.

We use Go's mattn/sqlite3 library and open a connection pool for read-only queries, and a single connection for write queries. The read-only connection pool uses DEFERRED transactions, and the write-only connection uses IMMEDIATE transactions.

All tables are STRICT which helps some with type safety, and foreign key checks are always turned on.
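Foreign key checks in particular are off by default in SQLite and must be enabled per connection; a sketch with stdlib sqlite3 (illustrative schema — STRICT tables additionally require SQLite 3.37+, so they're omitted here for portability):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default, per-connection

conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL REFERENCES users(id))"
)

# Inserting an order for a nonexistent user now fails loudly instead
# of silently storing a dangling reference.
try:
    conn.execute("INSERT INTO orders (user_id) VALUES (42)")
    err = None
except sqlite3.IntegrityError as e:
    err = str(e)

print(err)  # FOREIGN KEY constraint failed
```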


How do you deal with horizontal scaling or failovers in general? These are two of the reasons I am still unconvinced about using SQLite in production:

- if only one process can write to the db, how do you architect your system so that writes are performed by just one of the instances (the writer)?

- how do you gracefully upgrade your app without any downtime?


> How do you deal with horizontal scaling or failovers in general?

We prefer vertical scaling over horizontal. The entire company is one server, which works for us since most of our customers are close to it.

In the future if we needed geographically distributed read replicas and failover protection I would probably go with LiteFS [1].

> if only one process can write to the db, how do you architect your system so that writes are performed by just one of the instances (the writer)

For multiple machines, LiteFS has ways of dealing with this. But the general idea applies to our system as well. All of our SQL is in either a read.sql or write.sql file. On startup, we open a connection pool for all the read queries and open a single connection for all the write queries. This keeps things clean and separated.
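On a single machine, that read/write split can be sketched in Python (the commenter's original is Go with mattn/sqlite3; the class and file names here are illustrative): a pool of read connections plus exactly one write connection behind a lock.

```python
import os
import sqlite3
import tempfile
import threading

class DB:
    """One guarded write connection; reads go to a separate pool
    (a single read connection stands in for the pool here)."""

    def __init__(self, path: str):
        self._write = sqlite3.connect(path, check_same_thread=False)
        self._write_lock = threading.Lock()
        self._read = sqlite3.connect(path, check_same_thread=False)

    def write(self, sql: str, args=()):
        # Serialize writers in-process, and wrap each call in a
        # transaction (`with conn:` commits or rolls back).
        with self._write_lock, self._write:
            self._write.execute(sql, args)

    def read(self, sql: str, args=()):
        return self._read.execute(sql, args).fetchall()

db = DB(os.path.join(tempfile.mkdtemp(), "app.db"))
db.write("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
db.write("INSERT INTO kv VALUES ('a', '1')")
result = db.read("SELECT v FROM kv WHERE k = 'a'")
print(result)  # [('1',)]
```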

> how do you gracefully upgrade your app without any downtime?

We use a systemd socket and service. When the service is restarted with a new binary, systemd delays request processing until it's back online. It's really simple (thanks to our monolith setup) and entails zero downtime except a 1-2 second delay on unlucky requests. But as far as I can tell, no requests are lost.

[1]: https://fly.io/docs/litefs/


Hey Tazu, great breakdown. How much is "write-heavy" in this context? And does it also depend on the hardware?


By "read-heavy" I mean 95% reads and 5% writes (GET vs POST/PUT). So I'd classify anything more than that as "write-heavy". I'd say hardware doesn't matter that much, because SQLite writes are always serialized.


"SQLite is likely used more than all other database engines combined. Billions and billions of copies of SQLite exist in the wild." https://sqlite.org/mostdeployed.html

Although this isn't what most people usually mean by "production". See "Appropriate Uses For SQLite": https://www.sqlite.org/whentouse.html


Yea, I know. My Chromium, Firefox, and others use it, of course. What I meant by "production" is a server production env.


https://kerkour.com/sqlite-for-servers is a good read if you've not seen it before


Thank you, I have, but I think I could improve a lot from it. I'll put it on my reading list, thanks Jamie!


oh, I am getting an

> Something went wrong! Try reloading the page. If the problem persists, please update your web browser to the latest version.

error?


I write custom apps for enterprise clients. Most think that they need Oracle or DB2. In the majority of cases apps run with SQLite just as effectively. In the early days I used Postgres with Python. Moved to SQLite for ease of admin and nobody even noticed.

The database is just one component of a system; without measuring things you can only guess (usually wrongly) where the bottlenecks are. See @qweqwe14's response.


Yeah exactly, I have been doing the same, moving from PostgreSQL to SQLite.

It greatly simplifies the server environment when you just have a file. And backup is so much easier.


I've been using SQLite/Litestream for https://extensionpay.com for about 3 years now! Serves about 120m requests per month (most of those are cached and don't hit the db), but it's been great!

I was convinced that SQLite could be a viable db option from this great post about it called Consider SQLite: https://blog.wesleyac.com/posts/consider-sqlite

Using SQLite with Litestream helped me to launch the site quickly without having to pay for or configure/manage a db server, especially when I didn't know if the site would make any money and didn't have any personal experience with running production databases. Litestream streams to Backblaze B2 for literally $0 per month, which is great. I already had a Backblaze account for personal backups and it was easy to just add B2 storage. I've never had to restore from backup so far.

There's a pleasing operational simplicity in this setup — one $14 DigitalOcean droplet serves my entire app (single-threaded still!) and it's been easy to scale vertically by just upgrading the server to the next tier when I started pushing the limits of a droplet (or doing some obvious SQLite config optimizations). DigitalOcean's "premium" intel and amd droplets use NVMe drives which seem to be especially good with SQLite.

One downside of using SQLite is that there's just not as much community knowledge about using and tuning it for web applications. For example, I'm using it with SvelteKit and there's not much written online about deploying multi-threaded SvelteKit apps with SQLite. Also, not many example configs to learn from. By far the biggest performance improvement I found was turning on memory mapping for SQLite.


How do you deploy updates of your app without downtime? (I know not all web apps need that but in case you do I would love to know about your approach)


I don't! I take the moment of downtime. It's usually instant — only as long as it takes to stop and restart the server. Occasionally a DB migration will take longer — the longest I've had so far was 10-15 seconds — and the site is down for that time. I think this is acceptable, but it would be nice to have a more seamless automated solution.


We run sqlite at https://tender.run, both in our backend (as a primary data store, backed by litefs) and on the client (as an index and cache).

In general it works really great, esp. in terms of speed. There are some limitations worth knowing, though - for instance, figuring out how to make write throughput reasonable with the combination of WAL mode, busy timeout, and BEGIN IMMEDIATE.

Another that I've had to work around recently is the lack of materialized views - you can make and query from a view, but it's just a select statement that gets shoved into queries that use it. To get the perf of a materialized view, you'd have to do something manual and custom. For the most part, this isn't an issue simply due to how fast sqlite is, but in the browser it's a bit more significant because of how much slower the wasm-compiled sqlite is.
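One way to do the "manual and custom" materialized view is a summary table kept current by triggers; a sketch in stdlib sqlite3 (illustrative schema, not Tender's — a real version would need an UPDATE trigger too):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL NOT NULL);

-- Hand-rolled "materialized view": a summary table the triggers keep
-- current, so reads pay no aggregation cost.
CREATE TABLE order_stats (n INTEGER NOT NULL, revenue REAL NOT NULL);
INSERT INTO order_stats VALUES (0, 0);

CREATE TRIGGER orders_ai AFTER INSERT ON orders BEGIN
  UPDATE order_stats SET n = n + 1, revenue = revenue + NEW.total;
END;
CREATE TRIGGER orders_ad AFTER DELETE ON orders BEGIN
  UPDATE order_stats SET n = n - 1, revenue = revenue - OLD.total;
END;
""")

conn.execute("INSERT INTO orders (total) VALUES (10.0)")
conn.execute("INSERT INTO orders (total) VALUES (5.5)")
conn.execute("DELETE FROM orders WHERE total = 10.0")

stats = conn.execute("SELECT n, revenue FROM order_stats").fetchone()
print(stats)  # (1, 5.5)
```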


Cool, I was thinking of making something like Tender but for businesses. I always have an issue with expenses being everywhere and nowhere (to be found).

Have you written about your experiences? Especially the write-throughput?


There’s an overview for how our app is built here: https://tender.run/blog/tender-and-crdts

We haven’t written much about our server-side sqlite. I’d say our write workloads are easily within what sqlite can do for now. At one point we looked into doing per-customer databases for scaling write perf (since sharding by customer can be a lot cheaper in sqlite than other dbs), but it’s not necessary yet.


Considering the access patterns of the clients are well known, would it make sense to populate new tables as materialized views before sending them to the client, or do you think it is not worth it perf-wise?


We use the client-side sqlite as part of making the app work locally and offline, so it wouldn’t make sense.

The materialized view isn’t a huge perf problem for now, and I expect we’d switch to using triggers or similar without too much difficulty.


I use it a lot for production ETL pipes, and sqlite3 is wonderful. The only issue we see is that writes slow down when table row counts hit 8–9 figures during a fresh load.

In client/server, our biggest issue is data replication and sync. We have an application that uses sqlite3 as the db for a mostly read-only web application. It really acts as a cache server. We can't figure out a clean way to update the data from remote systems. We ended up writing a web socket process on the box to pull data from remote systems and write it one batch at a time. This was/is a big pain point for us, especially when updates are coming in fast.


I'm building a hosting platform / system for SQLite databases at https://www.airsequel.com. Just like you we discovered that for many (most?) use-cases SQLite is the better database. Ordinary projects / companies don't have "webscale" requirements and they are better off with a system that is fast, simple, low resource, and easy to deploy and manage.


There's a company (http://turso.tech) that can host sqlite as a service. One of the features allows a replica local sqlite file to be used for reads and the writes still go to the primary on their edge network.

Been using it (without the local feature) for a couple of months on a project and I will do it again for future products without a second thought.


You might be interested in how Expensify use(s/d) SQLite for their product. https://use.expensify.com/blog/scaling-sqlite-to-4m-qps-on-a...


I assume you are thinking about server side applications. Otherwise, it is extremely common in "desktop" applications for saving data. And it is a common approach for saving user data in Android applications (based on my knowledge from a few years ago, could be outdated)


there are tens of thousands of apps that use it, for billions of instances; most all of the most popular apps use it in some fashion. it’s a beast.


I use sqlite as a caching storage for anacrolix/torrent. It's also the main database for my DHT indexer.

The only real limits are transactions per second and the single writer. If you batch things, that becomes a non-issue.
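The batching point is easy to see with stdlib sqlite3 (a sketch; in-memory here for self-containment, though the win is biggest on disk, where every commit is an fsync):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (v INTEGER)")

# One transaction per row pays the commit cost N times; one
# transaction for the whole batch pays it once.
rows = [(i,) for i in range(10_000)]
with conn:  # single transaction around all 10k inserts
    conn.executemany("INSERT INTO samples (v) VALUES (?)", rows)

n = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
print(n)  # 10000
```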


Cool! I can see I probably used your library at one place or another in one of my many torrent usages :P


If it’s an embedded app, like an iOS or Android app, it’s the right choice.


I run a bunch of blogs from a single Django app with a CMS, SQLite, and extremely optimized SEO hacks.

It’s the perfect use case, light on writes, super heavy on reads. Almost as fast as any caching solution would be, except it’s actually running live SQL queries on the data. I am not sure why more people don’t opt for this to start.


I think some HN user talked about his DNS company using sqlite for years and years without a glitch.


How are you syncing data?


There are some options nowadays, but in general you couldn't do that before. This is one of the limitations of SQLite. But then again, most websites could probably fit on one of these beasts here

https://www.hetzner.com/dedicated-rootserver/matrix-ax/

for 200 euros per month.


I've created a local app in Dart/Flutter. I have clients for Android and desktop with a local sqlite database for persistence. I'm obviously trying to work out the best way to sync the data, hopefully without a cloud/server. I'm not sure it's possible, but I'd prefer it over running a server - maybe a manual export. It's likely I'll just have to bite the bullet and teach myself how to go down the server route.


I use it for some document making in a small family business :)

However, SQLite has gained "SELECT * FROM /etc/passwd" and other system-stats-gathering features, so I'm looking for a replacement...

A fork dedicated to eradicating more complicated things like e.g. WINDOW functions would be nice. And ex-Redis-style simplifications and others too!


Perhaps the OP means OsQuery: https://github.com/osquery/osquery

OsQuery is an SQLite extension consisting of hundreds of virtual tables that provide SQL access to operating system status and configuration.


Holy smokes, that's the coolest thing I've found out about recently.

I'm trying to wrap my head around where this sits within the OS. Is it a layer "on top" of everything? Or does it sit "underneath" the higher level os functionality?

Apologies if that doesn't make any sense, it looks to be a really cool tool and I'll be investigating it regardless.


IIRC it reads from the same sources as command line tools do, so /proc/, /sys/ etc.

So it sits in the same place as `top` or similar.


wait, Sqlite3 can access files this way?


SELECT * FROM "/etc/passwd"; does not work for me.


select readfile('/etc/passwd');

This only works in the CLI, which also has .import:

    sqlite> create table pwd(name, pass, uid, gid, gecos, home, shell);
    sqlite> .sep :
    sqlite> .import /etc/passwd pwd
    sqlite> .mode column
    sqlite> select * from pwd where uid = '0';
    name  pass  uid  gid  gecos  home   shell    
    ----  ----  ---  ---  -----  -----  ---------
    root  x     0    0    root   /root  /bin/bash


Ok, it is extra beautiful and a very, very sharp knife...



