Siren Call of SQLite on the Server (pid1.dev)
113 points by todsacerdoti 3 days ago | 101 comments





Be brave. Just try it.

Removing the network latency is a massive gain, especially for small/medium apps, which is most sites. And if you find yourself asking whether it will work, the answer is yes, it will.

SQLite will force you to rethink your architecture a bit; don't forget it doesn't HAVE to be _one_ file. This alone helps you scale further.

There are some chafing points:

1. Schema migrations

2. Configuration: PRAGMAs applied at design time or during each connection
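
On point 2, a quick sketch of the distinction (values here are only illustrative, not recommendations): some PRAGMAs stick to the database file, while others only apply to the connection that issued them and have to be re-run on every open.

    -- Set once: WAL is a persistent property of the database file.
    PRAGMA journal_mode = WAL;

    -- Connection-scoped: these must be re-applied on every new connection
    -- (including each connection handed out by a pool).
    PRAGMA foreign_keys = ON;       -- off by default
    PRAGMA busy_timeout = 5000;     -- wait up to 5s on a locked database
    PRAGMA synchronous = NORMAL;    -- commonly paired with WAL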


> Removing the network latency is a massive gain, especially for small/medium apps, which is most sites.

I've seen some apps whose architecture I can only describe as singleton apps, not even monoliths - where you can only ever have one vertically scaled instance due to their reliance on local memory for something like storing user sessions (yes, everyone has to log in again or use SSO after a restart), or on the local file system for something like ingesting documents or generating reports.

At that point, one might as well ditch Oracle DB (used as an example here, which is surprisingly common in those setups) and just work with SQLite directly, because if people made that their architecture then N+1 issues are also pretty much inevitable. It's not like SQLite will magically fix your code, but at least it will get rid of the network round trip when you try to execute 400 SQL SELECTs to render one view.

I will admit that generally I enjoy architectures where the components are properly separated: front end served from a separate container, the back end API in another one, something like Redis/Valkey or RabbitMQ in that stack as well, in addition to a traditional RDBMS like PostgreSQL or even MariaDB/MySQL. On the other hand, I have made smaller projects for my own needs that use Ruby on Rails or Laravel or Django, and SQLite goes really well with those.


> Removing the network latency is a massive gain

People are comparing sqlite to RDS or whatever, which completely biases the comparison. We have small apps running in containers talking to Postgres in a container via kubernetes networking on the same VM.

    Benchmark.bm{ |x| x.report{ 1000.times{ ActiveRecord::Base.connection.execute("select 1") }}}

           user     system      total        real
       0.040279   0.012958   0.053237 (  0.080539)
0.08 seconds for 1000 DB queries is not causing network latency problems for any real world app. Just use Postgres unless you really have good reasons not to.

"Just use Postgres" is as useless as any other "Just use X".

You're trying to compare using SQLite which is stored in a file on the system... with using Postgres in a container via kubernetes on the same VM? What happens when you need to move that Postgres container somewhere else? How do you scale it? If you don't, then you're just using a solution orders of magnitude more complicated than SQLite to get the same functionality and locality as SQLite.


The important nuance here is that with SQLite you can achieve similar numbers even if each of the 1000 queries is dependent on prior results.

SELECT 1 is hardly representative of a real world situation wherein you need to logically serialize each transition of business state.


0.08 seconds for a few bytes, which are fully cached, yet it can barely reach 12,500 requests per second.

Your computer is a multicore 2-4GHz machine


It kind of does have to be one file, because foreign keys can't be enforced across files.

For a simple example, if you have one db file per user, as others recommend, how do you check that a username or an email is unique? Do you maintain a "global" db for fields with unique constraints? But then when those fields are updated you need to manage this at the application level, and atomicity can't really be guaranteed... Or is there another way?


The application I'm building (I haven't launched yet) has one database per customer that tracks the things the customer does in the application, plus a single database that tracks the metadata about all of the customers. The metadata changes very slowly (each customer will probably make at most one update per year), so there will be (almost) no write contention on the metadata database. Each customer has an id (a random 64-bit integer) which is used (in hexadecimal) as the filename for the individual customer database.

The total addressable market for my application is about 2,500 customers, and they won't use it at all in most weeks. I'm aware that this usage pattern is very uncommon, so I don't recommend what I'm doing as a general solution.


I've done pretty much that, but without SQLite, as the origin of that system predated SQLite's first release, with millions of users. It works, and it scales. Since then I've increasingly preferred this pattern when it's been suitable, and as SQLite has grown in maturity, it increasingly is suitable.

> The total addressable market for my application is about 2,500 customers, and they won't use it at all in most weeks. I'm aware that this usage pattern is very uncommon

Is it even?


I'm with you on this. I think this is a widely desirable pattern. If it's uncommon it's because of how poorly traditional dbms's address it.

I suspect as server-side sqlite matures, this pattern will explode in popularity.


The pattern I've used most often is to maintain a db/table of meta records which handles identity and transaction. Then, for each metadata row there is a SQLite database instance.

This is useful for storing instances of very complex type systems (e.g., ones with circular dependencies) and managing customer tenants.


Keep user information in one database, keep user _data_ in another.

It all depends on your app, really. For some apps the above won't make sense, but maybe it would make sense to keep one db per org, or something like that.


You can also use a different but similar tool, e.g. use sqlite to keep user information and duckdb to keep user data. I find this combo very effective since sqlite is best at fast btree-based indexed operations and duckdb is best at aggregate computations over many rows (get me user X vs get me all users with property Y).
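
A rough sketch of how that split falls out in practice (table and column names made up): the first query is the indexed point lookup sqlite is great at, the second is the scan-and-aggregate shape duckdb is built for.

    -- sqlite side: "get me user X" (btree index lookup, assuming user_id is indexed)
    SELECT * FROM users WHERE user_id = ?;

    -- duckdb side: "get me all users with property Y" (aggregate over many rows)
    SELECT country, count(*) AS n
    FROM users
    WHERE plan = 'pro'
    GROUP BY country;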

For the one DB file per user question, I'll take a stab at a solution:

* Global store of user identifiers mapped to user IDs. Identifiers can change but the underlying user ID is immutable.

* One DB per user ID.

The per-user DB doesn't really need to know the user identifier, and at auth time you don't need to access the business logic, so cross-DB updates should not be a problem.

In the general case, similar logic applies: you can't partition just anywhere, but where you can partition is pretty well defined.
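
A minimal sketch of that layout (schema and names are hypothetical): uniqueness lives only in the global db, and the per-user file is keyed purely by the immutable user ID.

    -- global.db: the only place identifiers and uniqueness are enforced
    CREATE TABLE users (
        user_id  INTEGER PRIMARY KEY,    -- immutable
        username TEXT NOT NULL UNIQUE,   -- identifiers may change...
        email    TEXT NOT NULL UNIQUE    -- ...but stay unique here
    );

    -- one file per user, e.g. data/<user_id>.db; no identifiers inside
    CREATE TABLE documents (
        doc_id     INTEGER PRIMARY KEY,
        title      TEXT NOT NULL,
        created_at TEXT NOT NULL DEFAULT (datetime('now'))
    );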


> Do you maintain a "global" db for fields with unique constraints?

Yes.

> But then when those fields are updated you need to manage this at the application level

Yep. BTW, that's not really so different from a traditional db. You're doing the same thing, just in a different way, and one you probably aren't used to, but it's equally valid.

In a traditional dbms you express this constraint in application code using the database's native DDL, loaded into the db, and executed by the db server whenever a relevant data change is made.

With sqlite you'd express this constraint in application code using practically any language you want, loaded to your app execution environment, and executed by your app whenever a relevant data change is made.

With a traditional dbms you can get away with a bit less of an application data access layer because the built-in DDL can handle some things. But it can't handle that much, so you really want that application level layer anyway.

> and atomicity can't really be guaranteed

Right. That becomes a concern of the application (could be the data access layer, though, which I think you want whether we're talking about traditional db or sqlite).

As with many horizontal scaling designs, we're talking about (1) choosing divisions to minimize the possibilities for inconsistencies; (2) eventual consistency.

In your specific example, I'd question whether that global data -- user name and email -- needs to be in the per-user db at all. Links between databases might use uuids that don't change, which limits the tricky cases.
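
Concretely, the data access layer usually ends up doing an "authoritative write first, propagate second" dance, something like this (hypothetical schema, just to sketch the shape of it):

    -- 1) In the global db: the UNIQUE index is the gatekeeper, so a duplicate
    --    email fails right here and nothing else needs to be undone.
    UPDATE users SET email = 'new@example.com' WHERE user_id = 42;

    -- 2) In the per-user db (only if it keeps a denormalized copy at all):
    --    if this write fails, retry it later; the global db already holds
    --    the authoritative value, so you get eventual consistency.
    UPDATE profile SET email = 'new@example.com';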


It most definitely does not. Yes, you will lose foreign keys across silos. But this will usually be one association, like a user id for example.

You are also thinking of this defensively, "x value cannot exist here if it doesn't exist in y store". Just think how you'd access this, and you'll realize at worst this creates a void of data which your app likely handles anyway.

If there are numerous associations required across these silos, you don't have the right silos yet.


> Be brave. Just try it.

This is the exact opposite attitude I would want to have when it comes to choosing the best way to store and manage my data. My data is actually important. I understand that might not be true for a lot of other people. I also have a lot of data, and non-trivial amounts of data have incredible inertia. It's very difficult to move and "redo" if you mess it up.

For all these reasons, I just use postgres.


On the flip side, sqlite is also a great choice in those circumstances. So regardless of attitude, the outcome seems good to me.

It's incredibly well tested for scenarios where data loss is likely, the code is in the public domain, it's available on all major operating systems, included in language standard libraries... it's one of the most ubiquitous pieces of software out there.


I'm not going to use a database that doesn't even have a proper date/time type and treats everything as a string.

ok, that seems like a bit of a moved goal post, but you do you.

Is there any particular impact you see that having? Postgres has had its share of data integrity issues due to things like locale differences between primary and replica, where strings would get sorted differently.

I'm not advocating for one over the other. I love both, but neither are perfect.

https://www.citusdata.com/blog/2020/12/12/dont-let-collation...

https://wiki.postgresql.org/wiki/Todo:ICU


Yes, we've used the Litestream + SQLite + Go webserver route in production for several years and while we've had to deal with all of these issues, on the whole I don't have a strong argument for going back and doing it differently.

I'm curious what issues the author had with high availability - I totally understand the issue in theory, but in practice our SQLite-backed service has had 6 minutes of downtime in the last 3 years when we were upgrading the host. That works out to being quite a few 9s.

I do agree with the article on a few points - distributed SQLite doesn't really seem all that worthwhile to me (yet, anyway).


It appears we are essentially stack-buddies (I use those exact technologies too). I have 0 issues with perf, and I love the hermetic unit tests that literally don't even use disk (let alone networking).

However, I find it a bit annoying that I have to restore from backup for one-off queries and data exploration. Do you have some tooling or tricks to recommend?


We have some analytics queries powered by Airflow running on a separate box. We set up Litestream to replicate to that box via SFTP, and then dump changes hourly into Snowflake.

I'm curious what SQLite package you use? Are you also using cgo or not? cgo has been a bit of a pain point for me in the past with SQLite.

Yeah, cgo has been necessary for me; I forgot why but made that determination long ago. I have many pains with cross-platform builds but never had an issue with cgo, and I'm targeting Android, iOS, macOS, Windows, and two Linux flavors. I'm using the mattn/go-sqlite3 package.

> Removing the network latency is a massive gain, especially for small/medium apps, which is most sites.

No it's not. Network latency is negligible, assuming your database isn't halfway across the world or something.

And as soon as you have to scale to just two webservers, you've either got to implement a client-server network layer on top anyways, or else the complexity of a replication layer.

Just start with Postgres or MySQL in the first place. It's just as performant, just as easy to set up, and can actually scale as you need it. Don't be "brave" when there isn't even any payoff involved.


> Network latency is negligible

Yes but now you also have a network to maintain.

> Just start with Postgres or MySQL in the first place. It's just as performant, just as easy to set up

Yes it's easy to set up. But then you have to maintain it. It's another daemon that needs care.


You don't have a network to maintain. If you have a server, it's already on a network.

And you don't have to do any more maintenance on Postgres/MySQL than you have to do on SQLite. You can leave them running for years without touching them. Maybe you want to upgrade them, but maybe you want to upgrade SQLite too.


> You don't have a network to maintain. If you have a server, it's already on a network.

Well, I'm sure the db server has an IP address? Is it IPV4 or IPV6? How do you receive the IP address? Can it change or not? How do you secure the connection between your web server and the database? Should the database only accept network traffic from your web server or from the wide internet? If the former, you have to keep that in mind when adding a new web server. If the latter, make damn sure the credentials don't leak!

Congratulations, you now maintain a network!


You call that maintaining a network?

I call it five minutes of setup, if even.

And if you're dealing with SQLite, are you making backups somewhere on a network? Securely? With credentials? Where do you keep the credentials for your webserver itself? If you're running a website, you're dealing with networking and credentials from the start.


Yes, it's five minutes to set up now. But it's an extra liability to have to maintain the network in the future, when things inevitably change.

OK. Maybe an extra 10 min of work in the future?

Compare that to the days or even weeks it's going to take you to migrate to a client-server database platform as soon as you need to scale to two webservers. All the queries that need to be tested and modified as necessary... potentially different table structures... the migration of data... no thanks.


Most projects will never need to scale to two web servers. The ones that do will be successful enough to have the resources to migrate.

But I guess this all depends on whether it's B2B (likely not to have that many customers), B2C (potentially more users), or a hobby project (gimme operational simplicity!). And many other variables. YMMV!


What's difficult about schema migration? Not being snarky, genuinely curious :)

> What's difficult about schema migration? Not being snarky, genuinely curious :)

SQLite's ALTER TABLE support isn't on par with other dbs. That's primarily (perhaps solely) because it stores table definitions as their raw SQL, rather than in some "idealized form." When SQLite first needs to understand a table's structure, it reads the original SQL which was used to create that table. Retroactively modifying arbitrary string input from the user (in this case SQL code) is a minefield.

Edit: see <https://www.sqlite.org/lang_altertable.html> in particular the final section, currently titled "Why ALTER TABLE is such a problem for SQLite"
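
For anything ALTER TABLE can't do directly, the workaround that page describes boils down to rebuilding the table. A condensed sketch with a hypothetical users table (the documented procedure has a few extra steps for indexes, triggers, views, and a foreign_key_check):

    PRAGMA foreign_keys = OFF;        -- must be set outside the transaction
    BEGIN;
    CREATE TABLE users_new (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE    -- e.g. the changed column definition
    );
    INSERT INTO users_new (id, email) SELECT id, email FROM users;
    DROP TABLE users;
    ALTER TABLE users_new RENAME TO users;
    COMMIT;
    PRAGMA foreign_keys = ON;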


Interesting, thanks for sharing! This sounds like a nightmare haha

What kinds of problems does this cause in practice? Do you have any examples? Not questioning you, just curious to learn more.


honest but dumb question(s): how does sqlite handle "connections" from multiple instances of an application? will it constrain to single-instances (or single-machine) running the application?

The historical answer has generally been: it doesn't (at least, not well). The traditional recommendations were not to open multiple connections to the same file, and especially not to share the same file over a network filesystem. This remains the general advice, as far as I can tell, but it may be out of date against some newer features (like WAL mode) and better filesystems (like NFSv4).

You are still correct for network filesystems, but there are no issues opening multiple connections on the same host (even across OCI container boundaries).

There are no data corruption risks in general for multiple processes on the same local filesystem, but the lock obtained by write operations is pretty aggressive. Multiple readers are okay but a single writer will cause transient errors in all other concurrent read and write operations (edit: though you can turn this into just blocking instead with pragma busy_timeout).

> Multiple readers are okay but a single writer will cause transient errors in all other concurrent read and write operations

They don't cause "transient errors", they cause transaction failures, which you should be handling by retrying the transaction anyway.


If you could retry it, then obviously the error was transient. I don't know what the issue is with me describing it that way.

But you generally won't have to write retry logic if you have the file open in only one place. The other issues that can arise, like constraint violations or no space left on disk, usually aren't transient and thus can't be fixed with retry anyway.


> Multiple readers are okay but a single writer will cause transient errors in all other concurrent read and write operations

While in WAL mode, readers and writers don't block one another.

> WAL provides more concurrency as readers do not block writers and a writer does not block readers. Reading and writing can proceed concurrently.

https://www.sqlite.org/wal.html


As long as the connections are from the same machine (local filesystem), it's fine.

Reads are concurrent, writes block reads (and other writes) while they write. (There's also something called WAL mode that enables reads during a write, which uses multiple files for each database.)


They don't just get blocked, they error out. This is a well defined error condition, so you can detect it easily, but it's not going to happen if you stick to the one connection rule. This means you have to intentionally think about handling it everywhere if you want to use another connection anywhere.

Edit: looks like you can set up blocking on the client using pragma busy_timeout [1] so the above is only true in default configuration

[1]: https://www.sqlite.org/pragma.html#pragma_busy_timeout
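
For reference, it's a one-liner per connection (the 5000 ms is just an example); it turns an immediate SQLITE_BUSY failure into a bounded wait:

    PRAGMA busy_timeout = 5000;   -- block/retry for up to 5000 ms before returning SQLITE_BUSY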


> This means you have to intentionally think about handling it everywhere if you want to use another connection anywhere.

so things like mutexes (one process, multiple threads) or mutexes over shared memory (multiple processes, multiple threads) ?

sounds complicated (and error-prone).

but at the end of the day i guess that kind of juggling has to be done somewhere, either a database server does it for you or you do it yourself in the database client.

i wouldn't mind sqlite on the server, but i'm not sure I would want to trade zero-downtime-deployments (spin up new version, move load from old to new, spin down old version) for single-file ease of operations.


You don't have to really think about it anywhere.

It's handled by the file lock. You don't need to track anything.

Yes there is a timeout value that will return an error if the database file hasn't unlocked in time. In Python the default timeout is 5 seconds, which is plenty if your queries execute on the order of milliseconds or tens of milliseconds.


We do zero-downtime deployments with a single Docker volume containing the db. Spin up a container running the new code, wait til it's healthy, then kill the old container.

You definitely want WAL mode (unless you’re accessing SQLite over NFS). It’s only not the default for backwards compatibility. I’d enable:

    PRAGMA journal_mode = WAL;
    PRAGMA foreign_keys = 1;
    PRAGMA mmap_size = 1099511627776;

Unless you know a reason that you can’t (like NFS or using a 32-bit cpu).


> Unless you know a reason that you can’t (like NFS or using a 32-bit cpu).

Interesting, why is there that limitation when NFS or a 32-bit CPU is involved?


WAL mode requires shared memory, so doesn’t work over the network.

mmap_size = 1TiB requires 1TiB of address space per connection. This is fine on 64-bit, unless you have millions of connections. Doesn’t work with 32-bit (4GiB) address space.

You want mmap_size to be greater than the size of your database. If you know your database will be small then you could set it to a smaller number, but you could still become address space constrained with many connections.
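
(If you do go smaller, it's the same pragma with a smaller byte count; 256 MiB here is just an arbitrary example.)

    PRAGMA mmap_size = 268435456;  -- map at most 256 MiB per connection; the rest uses normal reads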


You can have as many connections from as many processes as you want.

Good points, but it doesn't mention the reason I think that Fly (and Cloudflare) are doing server-side SQLite in the first place, which is to have a db "at the edge" to go with their edge compute model

Whether that's necessary or useful for your use case is another question


Also "at the edge" sometimes often implies read-heavy applications and places where you can bake essentially "static" DBs. In theory all the SQL query power benefits of a full RDBMS, but with the presumed simplicity of (mostly) static file deployments via traditional CDN file copies.

Many applications are going to be write-heavier than that. Many applications don't need to distribute database deployment by "simple file copying" to potentially hundreds of "edges". But there are certainly interesting use cases for that when you can be read-heavy, deal with write trade-offs, and can easily "simple file copy everywhere" a DB. (Though I think most of the ones I find interesting are more Datasette and its plugins than Fly/Cloudflare style.)


Also worth mentioning Turso's efforts in a similar vein.

Definitely interesting conceptually. But I question if something like cockroach could be just as good in practice.


I think the point of "edge db" is mostly to reduce latency

Otherwise the point of the article stands - using a conventional db server solves all the bits that are difficult with sqlite as a backend


I started using SQLite for all new Apps a couple of years ago and haven't looked back. It was the catalyst for us to get off AWS (and RDS); we now deploy (.NET) Docker Apps with Kamal to a shared Hetzner Linux VM, which uses Litestream for replication to Cloudflare R2.

We've adopted several ways to work around SQLite's single concurrent writer [1], including: splitting system functionality into multiple databases, separate locks per db, performing writes within a single worker thread for each db, etc.

Despite being significantly cheaper, its next-to-zero latency actually performs better than our Apps using RDS. We also benefit from monthly rolling databases for request logs and db jobs, and make use of SQLite's application-defined SQL functions (i.e. calling C# functions in SQL queries) for some features, which isn't really possible with other networked RDBMSs.
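
For anyone curious what the "multiple databases" part looks like mechanically, ATTACH is one way to wire it up; a rough sketch with made-up file and table names, since each file carries its own write lock:

    -- One connection can see several database files at once
    ATTACH DATABASE 'requests_2025_05.db' AS reqlog;   -- this month's rolling request-log db
    ATTACH DATABASE 'jobs.db'             AS jobs;

    -- A statement that only touches reqlog takes the write lock on that
    -- file alone, not on the main application database.
    INSERT INTO reqlog.request_log (path, status, duration_ms)
    VALUES ('/api/users', 200, 12);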

[1] https://servicestack.net/posts/scalable-sqlite


There are exceptions to every rule, but most of the time you're way better off when you use simple and reliable technology. If you need to copy something between servers you probably don't need anything more complicated than rsync. For blob storage you probably don't need anything more complex than S3. For database backups, do you really need multi-region streaming backups or does mysqldump suffice?

Every product has limitations. Well designed and simple products/tools have limitations that are easy to understand and easy to work around. The more complicated your architecture gets the more devilishly complicated the edge cases become. That's where you get punished for choosing complexity.

MySQL and Postgres are larger than Sqlite by more than an order of magnitude. What do you get in exchange for all those lines of code? Do you actually need the extra functionality? If you do, great. If you don't, choose the simplest solution that works. When you use simple tools changing your mind later is cheap.


Proper concurrency, and the ability to scale out your applications across multiple servers / nodes. I’ve never worked on any application that had only a single web server, and yes, most of my applications have had to scale out a few times under load— even the smaller ones. Doing this with Postgres is trivial. Doing this with SQLite requires engineering. I’m not scared of engineering. I just don’t think it’s warranted when you have such an obvious, tried and true off the shelf database like Postgres available to you.

Scaling with SQLite and a datastore-per-user model is trivial. If you build your app as if it's using Postgres when you're using SQLite, then, yes, you'll suffer. If you pick SQLite, you should pick it because you will actually build your application in the way that makes most sense with SQLite.

Datastore-per-user makes sharding/scaling trivial, and tends to drive an architecture that is easier to reason about. You're never going to e.g. accidentally leak data between users because of a broken query, for example.


> MySQL and Postgres are larger than Sqlite by more than an order of magnitude.

But they're still small, since you can run them on the smallest available cloud VMs and have plenty of memory/CPU/disk left over for webserving.

> What do you get in exchange for all those lines of code? Do you actually need the extra functionality?

The ability to scale beyond a single webserver.

It's just as easy to use Postgres as SQLite, but by choosing SQLite you've made life much more complicated for yourself if you ever need to scale to two servers. Why would you inflict that headache upon yourself?


The smallest available (burstable) instance type on AWS (t2.nano) has 1 vCPU, 3 CPU credits/hour (~= 5% average CPU utilization), and 512 MB of RAM. I don't think you're going to run very much on there and definitely not with room to spare.

You can run MySQL on that perfectly fine, together with Apache and PHP or something.

If you've just got a small website without tons of traffic, it's no problem at all.


One alternative is to use pglite [1] which is Postgres compiled to WASM. It can act as a sqlite replacement during development, then when you are ready to move to a server model your SQL is already in Postgres format.

[1]: https://pglite.dev/


In the WASM Component Model, there's a Postgres example available where the database runs in Docker. I'm hoping they whip up a version that uses SQLite instead, but with a similar WIT interface, and drop the Docker nonsense.

Isn't its use limited to Typescript?

Or anything that can embed a WASM runtime I imagine.

They're considering a native build as well, search term "libpglite".

Though I feel like providing it as a library would work very well initially. Roll a suitable minimal WASM runtime and a WASM blob into a single package, separately for every language.


This is something that I am a huge fan of, particularly from a latency & performance perspective. NVMe + SQLite is a hell of a package deal. If you just need to go as fast as possible no matter what, this is a great place to start. At the highest end the hosted solutions can outpace SQLite but it's like that meme [0].

That said, I'm working on a new contract/project that is somewhat antagonistic to the idea of using SQLite on the server. The biggest reasons are due to employees needing direct (transactional) access to the live data and the ability to do online schema changes.

There are ways to force this with additional tools, but I'm finding that just paying for MSSQL provides a lot of uplift on this particular problem. The possibility that other employees already have experience with tools like SSMS should not be overlooked when making technology decisions. People are generally the biggest bottleneck in any business.

[0] https://knowyourmeme.com/memes/look-what-they-need-to-mimic-...


Good article.

I currently use Litestream (https://litestream.io/) for efficient SQLite backups to S3, and I'm happy with that.

For me, the main point to consider wrt SQLite is whether you need high availability or not, and how easy it is to set up Litestream and persistent storage.


IMO it's always going to be a problem if you rely on hype instead of common sense. All the hype around SQLite is because you often don't _really_ need to have high availability, online schema changes, etc, and 99% of the web sites can easily fit on a single (large) server.

It's also worth mentioning that not using SQLite doesn't mean you automatically get backups, replication, high availability, etc. There are few (if any) production-ready databases that are even capable of high availability, and those that are typically achieve it at the cost of catastrophically low write performance, especially compared to SQLite in WAL mode.


The "Whys" aren't strong.

#1 makes the point that once you use it in production you start needing production things, like backups and horizontal scaling, and those can get complicated, but for non-sqlite dbs you can use a managed solution to handle the complications for you. But (1) managed solutions for sqlite exist too, and (2) anyway, successful projects soon outgrow the limitations of managed solutions, so that's a limited runway -- probably about the same runway where unmanaged sqlite remains simple.

#2 points out "Migrations are not great in SQLite." Migrations aren't great in any db. I think easy migrations can be easier in some other dbs, but those weren't the problem. Tough migrations are essentially rooted in evolving understanding of business rules, which no dbms can make "great" to do.

#3 points out that decoupling storage from compute is a good idea. But with traditional client/server dbs, compute is soldered directly to storage, so that's a big win for sqlite. I know they aren't thinking about the db server as compute, but they certainly should be -- it's compute in all the ways that hurt you and help you.

#4 points out that migrating away from sqlite is extra work. But that's not really an independent argument for not choosing sqlite since it presupposes sqlite was the wrong choice in the first place.

I have nothing against, say, postgres. But while sqlite on the server has some real drawbacks, it has some really nice properties too. Its simplicity and flexibility make it a strong default choice these days.

I would be wary of the current managed providers -- their edge bias tends to lead you to system designs that aren't great.


You can't build a fault-tolerant system without network lag. Your hardware can fail you any second, and every backup will be inconsistent with your local db. To solve it, you need a system with a proper control plane and a consensus algorithm like Paxos inside.

SQLite is great for simple projects that can lose data, but if you need to have no downtime, you need a distributed db.


In practice, there are very few systems that need "no downtime", and even fewer that achieve it, even with a distributed db.

SQLite is great for databases that don’t need management and won’t change their structure and don’t need remote access.

what databases are those

I've used SQLite in a project once, client-server application configuration management interface; I picked SQLite because the application I was replacing also used it, and because it was standalone; the application was run on-premise, and one of the issues was that the version of PHP it ran on was horribly outdated (plus it was PHP, so it needed some setup work). My replacement was intended to be fully standalone, no external dependencies, and so also no requirement to have a database server running somewhere.

I would've preferred something like postgres but embedding that and running it with redundancy is nontrivial compared to sqlite. Mind you I never did get as far as running sqlite redundantly (using e.g. rqlite), I quit after 2.5 years since the project was bigger than any project I've ever done, and those were fully staffed teams.


There are also rqlite and dqlite. SQLite is good enough for small to medium projects, which are the vast majority.


from that page: A program accessing the database is free to decide how the data is to be stored in a record. Berkeley DB puts no constraints on the record's data. The record and its key can both be up to four gigabytes long.

Berkeley DB supports database features such as ACID transactions, fine-grained locking, hot backups and replication.


Ever since I saw Cloudflare D1, I have been obsessed with distributed SQLite. The primary reason is that the cost of managing and owning it is relatively low, but even so it was still not easy managing Rqlite. I wanted managed, and I recently found Turso and it is amazing, but it's hard to understand the reads and writes with such databases. I knew about Fly.io's investment in this area but I did not know about their LiteFS offering. The author made some good arguments, and it is probably wise to listen to the challenges and think twice before making it a default database out of sheer excitement and cost.

https://litesync.io/en/ is a mature way of distributing SQLite with arbitrary topologies and no modifications to client code. Supports multi primary write and DDL replication.

It’s a paid solution, but that follows the model of all commercial SQLite mods (one time license fee).

I use it, though just for a simple case of having a live replica of a production server running SQLite so I can try out migrations and generally work in an isolated copy.


I once built an app for compliance reasons that needed a little bit of dynamic data and quarterly updates, and that allowed us to stay in business in a few of our minor markets (worth cool tens of millions); a SQLite DB with some basic turnkey CRUD admin tools was fantastic. The data was larger than would be tolerable to live with in a simple YAML/TOML file and needed some love and attention from lawyers. I could update the DB in dev (or let them update it) and check in the DB file (it was smaller than the very unreasonably large TOML file we used for app config), and had the same app in every environment without managing MySQL or Postgres migrations. The whole thing happily ran on three EC2 micro instances, and given the relatively low traffic (low thousands of visitors per month) it was a great set of technical decisions.

For that narrow use case at least, I am a huge believer in doing this.

Another prod use case for SQLite I've implemented was in taking a market-specific search engine component (eventually replaced with Elasticsearch after a few years of R&D, but I digress) where I needed to implement sharding, reordering of results, and a few extensions to the engine that nominally relied on INI files for the limited extensibility it did support. The engine itself lacked any network coordination, so I scheduled things through updated files lobbed into S3 and let a scheduled job pull them down to each machine. If a specific shard was to be out of service, that data was in SQLite along with what operations to run. There are a lot of reasons for why this was the case, and it happened a bit over a decade ago, so there are much better ways of doing this now. RQLite didn't exist yet, for example, and there was a very dumb license restriction keeping MySQL and Postgres off the table (kill me). Simply syncing SQLite files across S3 sufficed since the engine itself had a more severe write lock on indexing than SQLite did. I could just swap the files while I waited for that operation to complete. Since I could actually change how SQLite worked, some full text search operations moved to SQLite, since the licensed product wasn't good at that; it was marketed/intended to be used for its faceted search and proprietary ontology. Overall SQLite was the nicest part of that experience as well. Well, at least until we moved to Elasticsearch and I no longer had insane license terms I had to code around.


I've only really suggested sqlite for a client/server web app once. It was mostly because the application had data which was naturally tied to specific events and needed a good archival strategy.

I felt that being able to run the event and simply copy out the data at the end was a very prudent option.

In the end what we got was, IMO, a very convoluted solution with a custom archival format, combined with most of the application logic in stored procedures.

To this day I wish I'd fought that battle a bit harder.


This is all true. Like most people here I'm deeply, deeply in love with SQLite, but using it on the server is not worth it, exactly for the reasons listed in the post.

To paraphrase: a big part of the appeal of SQLite is that there's nothing to set up. But if one wants to use it in production then that advantage is gone.


SQLite on the server is worth it when the server side is an application and you treat SQLite as a per-user datastore for an application that is mostly per user, or where shared data across users can be treated as a queuing/async messaging problem.

Decomposing a system to treat it as a per-user application working on per-user data works for a whole lot of problems and solves a whole lot of problems. It scales effortlessly. I've done that kind of system even with flat files, handling millions of user accounts on a set of servers with an aggregate capacity less than my current $300 mini PC.


> where shared data across users can be treated as a queuing/async messaging problem

Really asking: Why would you want that complexity when a single database that provides ACID with multiple users can provide that out of the box?

Sqlite shines when it's on the edge (be it a fat desktop app, or a thin client which can hold a cache, or any other modern definition of edge), and while I absolutely adore the tech[0], I don't get why people try to repurpose it to simplify server architecture, especially in this day and age when even the database with the most complex installation is one docker pull away.

[0]: I honestly think Sqlite is one of the best software ever developed.


You're assuming complexity, whereas my experience is that it reduces complexity. E.g. you get infinite scalability. You get no need to ensure all your queries are scoped correctly, because they can't directly access other data.

Most large systems end up with async processing and/or eventually sharding anyway, and then you get to restructure around those flows anyway, while still having the complexities of semi-centralised databases.

The world is async - once you embrace async designs from the start and design your UX around accepting that there are very few situations where you need global serialisation of events, it gets a lot simpler.

> Sqlite shines when it's on the edge (be it a fat desktop app, or a thin-client which can hold a cache, or any other modern definition of edge),

Which is not materially different from a server spinning up to serve requests for a single user.

> I don't get why people try to repurpose it to simplify server architecture, especially in this day and age even the most database with the most complex installation is one docker pull away.

Because maintaining a large database cluster, scaling it to large user sets, and ensuring the application software is using it correctly to avoid problems is a lot of work. I built my first datastore-per-user system in '99, without the luxury of SQLite. It was a revelation. It was fundamentally a much more pleasant way to work.

In some cases, datastore-per-user is the right pattern. In other cases datastore-per-tenant. In some cases, a big RDBMS is still the better option. But don't dismiss the former two without actually exploring them. Over the years, I'm finding it fits more and more places, especially thanks to SQLite.


Realistic applications can be mostly per-user data, but with a residue of global data (e.g. the tables that list all those beautifully segregated users). Even if per-user databases work well and can be managed simply (e.g. doing backups by briefly taking them offline when idle and copying entire files) the global part usually has more requirements (e.g. cannot be shut down for maintenance, requiring "live" backups) and requires greater complexity (for example sharing it through a traditional DBMS network server or through application-layer services).

There's rarely any reason for not being able to shut them down, because you can trivially replicate. E.g. I've built systems where the only global piece of data was which backend held which user, and that data was trivially lazily updated via gossip-style protocols. Login requests were broadcast, and the backends responded, and the central services acted only as "backups" storing the most up to date info seen in case a backend happened to be unavailable. The whole thing was trivial to set up.

(And you can back up sqlite "live" just fine with some care if you want to, or things like litestream).


That's a pretty good take. Would make sense for a whole range of apps focused on 1 user or client. Accounting software is personal, company A doesn't need to query/join with data of company B.

I am currently experimenting with multi-read/single-write via async FastAPI endpoints, and my benchmarks tell me it will scale well enough for the first 6 months of anticipated growth, while the major part of that time will be beta-testing with users.

In any case, we do things that scale from the start, on principle.


To my surprise, it is performing better than a ZMQ-based solution I began with (in terms of concurrent connections).

I don't know why, but much more appealing to me is PostgreSQL on the server.

Something I've always wondered... How do downstream DB integrations like Fivetran or other ELT services play with SQLite? It seems like that's another consideration for running it in production at a real company.

I don’t think most touch it. As much as I’ve always thought embedded dbs have a place in data analytics and warehousing, I think the issue is that it’s really a sql api on top of a file system and it’s easiest to just copy the db with something like rclone and treat it like a file. Like others mentioned there’s also litestream for replication.

Fivetran, I believe, is still involved with and committed to duckdb. Duckdb has a SQLite extension, so it's entirely possible they could add SQLite support. I would say the issue then is familiarity and demand within the industry.

https://www.fivetran.com/blog/how-fast-is-duckdb-really

They also have a talk given at duckcon that goes over the architecture.


gobackup[0] and sqlite_rsync are good tools to backup sqlite databases.

[0]: https://github.com/gobackup/gobackup
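
SQLite itself can also take a consistent snapshot while the database is in use via VACUUM INTO (available since SQLite 3.27); the path here is just an example:

    VACUUM INTO '/backups/app-snapshot.db';   -- writes a consistent, vacuumed copy of the live database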


SQLite is great for what it is and its stated purpose, but it's a fundamentally poor database if you actually care about your data. I'm not talking about durability. It's perfectly durable, as in it won't lose your data (well, it might). I'm talking about the lack of actual data types and "everything is a string". I'm talking about the general flimsiness. Hipp even said that is a non-goal because it will bloat the embedded object size in your binary. Because that's what it's for....

Insert a string into an int column? Go ahead, it doesn't care. Use a date function on a text value it doesn't recognize as a date? Go ahead, you'll just get a NULL instead of any indication that your value is wrong for what you think it should be.

It's fine for the address book on your phone, or the bookmarks in your browser. But terrible for server things that are important.


That's not quite true. There is support for "strict" tables, which do have more stringent rules around types:

https://sqlite.org/stricttables.html
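
A quick sketch of the difference (hypothetical table names): the STRICT table rejects the bad insert, while the ordinary table silently stores the string.

    CREATE TABLE t_loose  (n INTEGER);
    CREATE TABLE t_strict (n INTEGER) STRICT;

    INSERT INTO t_loose  (n) VALUES ('not a number');   -- accepted; the value is stored as TEXT
    INSERT INTO t_strict (n) VALUES ('not a number');   -- error: cannot store TEXT value in INTEGER column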


Mongo DB also disables safety features by default to improve their benchmarks. But Mongo DB gets lots of criticism for that and SQLite gets none.

I believe foreign key constraint validations are also disabled by default.


