HC-tree is an experimental high-concurrency database back end for SQLite (sqlite.org)
524 points by nalgeon 11 days ago | 190 comments

Concurrent writes and replication!

See also this thread on the SQLite forum: https://sqlite.org/forum/forumpost/d9b3605d7ff40cf4

The linked performance graphs look remarkably promising, indicating hctree with 25% smaller pages scales to 1 million transactions per second.


Nit: 1kb pages vs 4kb pages, so 75% smaller (25% of the size).

Thank you. One of my (granted, OCD driven) peeves is the recent ubiquity of framing that sort of comparison as "3 times smaller!". Ohmygod, no.

Oops, yes thank you for the correction!

It makes me wonder why I would use anything else?

For workloads that are on the same node: putting the database in the same memory space as my code, as well as the same I/O and scheduling contexts, isn't always what I want. I want to be able to prioritize database I/O far above app I/O. In addition, I want to make sure my database isn't prioritized "lower" than my application, as this can trigger priority inversion issues.

Also, the scaling characteristics of database nodes vs service nodes are very different. I might not want to put reliable storage into every node I have, but I'm probably willing to shell out a couple bucks for my database server's storage.

An example from my last project that used a lot of postgres: the database node was about 20x the price of the workers.

From an operational perspective, having the database be maintained separately from the app also has its benefits.

To add to this, you want your application to be stateless and immutable. This makes your life so much easier in so many different ways:

(1) You can know exactly what code was running when debugging past issues

(2) Scaling can be done dynamically and relatively easily. Just turn application nodes on/off.

(3) Rollbacks are (relatively) trivial. Just go back to the known good immutable version

(4) Memory leaks can be "solved" by restarting nodes every so often

(5) If, for whatever reason, your application node dies you shrug your shoulders and start a new one

Once you add DBs with state to your application, all of the above become much harder. You want that state in something that's rock solid and rarely changes, to avoid all the above problems.

You just gave me a tiny database education in about 8 sentences.

One scaling paradigm that's occurred to me for SQLite would be to run lots of small replicated instances in a system like Nomad or Kubernetes and even allow many running on the same physical node. So you're replicating multiple times on the same box which wastes some space but storage is cheap.

Wouldn't be good for huge data sets but these days huge means like a terabyte or more. You don't really have to start thinking about sharding or smart fractional replication until you hit many terabytes.

Even many successful SaaS companies don't have databases that big. Several hundred gigs these days is not too big for a brute force replicate everywhere approach and it comes with some benefits like less locking and stronger fault tolerances. If you have 50 replicas the odds of them all being screwed at once gets very low.

Postgres focuses more on this use case. It's probably still going to do a better job with concurrency, or at least have more features. It'll also cost more to pay for as a service, so you decide if it's worth. Whereas HC-tree is an obvious win if you're already using SQLite. It's fine for two things to approach similar use cases from different sides.

Personally I want the opposite direction, a sort of Postgres-lite that uses a single flat file and lives inside a lib, for things where the power of the full thing isn't needed yet but I want to design my backend code against a Postgres interface. I've seen Wasm Postgres examples, and maybe there are others, but nothing mainstream. And yes there was this SQL standard that everything was supposed to conform to, but that went out the window long ago.

I have an absolutely horrifying Python script that makes a little sort-of-container that spins up a brand new MySQL instance, initializes it with a provided set of statements (from mysqldump), and runs it. And cleans it up when it’s done.

And I have a wrapper that does it in a shell.

This makes it possible to run MySQL-dependent tests, in parallel, on a dev machine, without too much misery.

(MySQL is awful. Don’t use it for new designs. MariaDB isn’t obviously better. AWS RDS MySQL is, in my considered opinion after wasting days on it, even worse. RDS CANNOT INGEST THE OUTPUT OF mysqldump WITHOUT ILL-CONSIDERED AND POORLY DOCUMENTED KLUDGES!!! WTF!! Don’t get me started on how defective every MySQL replication mode is.)

(Don’t get me started on how defective every MySQL replication mode is.)

I was actually reading up on this the other day (doing a bit of a breadth-first search of multi-master replication in industry).

I thought the way it worked was... interesting. Sounds like it's just as interesting in production.

Yeah, I use Postgres. It's good. MySQL only has the advantage of being offered by more cloud providers and thus being cheaper sometimes.

Doesn't MySQL also not need you to configure a bouncer/proxy if you want more than a dozen connections at a time?

Postgres in my experience can handle like 100 connections with the default config I get installing it on my Ubuntu server, but at some point you want pg-bouncer, and I vaguely remember MySQL having higher limits. Idk, it's been a while cause I've almost always limited the number of connections in my own code instead.

I use an embedded postgres testing library for the JVM that does something along those lines.

Well, no, actually it just unpacks the tar file in a temp dir and runs the full postgres, but it mostly feels like what you describe (minus the single-file part) and starts surprisingly fast. That would totally work for a little proof of concept (https://github.com/zonkyio/embedded-postgres)

I used hypersql/hsqldb for unit tests. The Oracle mode was usually close enough. It's pretty close for postgres. Sometimes there are small issues (I think using “INSERT INTO … RETURNING” required the column name in caps for some reason), but overall it's almost always what you want for unit testing with Java.

H2db has a Postgres-compatibility mode too that would work for local testing.


That's pretty neat, but not the same level of portability as SQLite.

Duckdb should be compatible with postgres to a high degree.


> SQL Parser: We use the PostgreSQL parser that was repackaged as a stand-alone library. The translation to our own parse tree is inspired by Peloton.

Watch out with DuckDB: if you have an open reader, you can't open a writer. A writer must be the only connection.


> direction, a sort of Postgres-lite that uses a single flat file and lives inside a lib,

Can you expand on this? How did you package postgres as a lib?

I haven't done this, but I'd want it. I'd like to write my code to use a Postgres database, except swap in something similar to a SQLite database instead when I'm either testing or not yet at scale. SQLite does everything in just a library linked with your code, unlike Postgres which is a separate running server. I can't simply use SQLite for this because its API is different from Postgres's.

Doesn't almost every language have some kind of abstraction layer that allows this?

JDBC/ODBC/DB-API/Sequelize ORM (ignoring the ORM part).

This used to be one of the major use cases for SQLite - running automated tests and being able to throw away the database file (or manage it at different points in the tests).
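For what it's worth, Python's DB-API makes this swap reasonably painless as long as you stick to the shared SQL subset. A sketch with the stdlib sqlite3 module (schema and names invented for illustration; note the placeholder style differs between drivers, e.g. sqlite3's `?` vs psycopg2's `%s`):

```python
import sqlite3

# Throwaway in-memory database for a test run; the same DB-API calls
# would work against a Postgres driver, as long as the SQL sticks to
# the subset both engines support.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
conn.commit()

rows = conn.execute("SELECT name FROM users").fetchall()
# Database state vanishes when the connection is closed -- nothing
# to clean up between test cases.
conn.close()
```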

Yeah, but the abstraction layers get in the way if/when I need to optimize queries or use special features, plus it's easier to just write SQL. I'd rather target a specific DBMS and rewrite my queries if I ever change it.

> ignoring the ORM part

As it should be :)

No, all (?) these things let you pass raw SQL straight to the database.

In the case of JDBC and ODBC (and I think DB-API), that's all they do.

But raw SQL made for Postgres doesn't necessarily work for SQLite and vice versa. That's what I mean by the SQL standard being thrown out the window. You might be able to write SQL that works in both if you're careful and avoid a lot of features.

ODBC doesn't help you if you use any sort of pgsql-specific extension, however.

You should read up on the ancient tech called ODBC.

I almost said ODBC too but wasn't sure if I fully understood it.

even when this is not the case, sqlite works well enough to replace maybe 80% of the heavier database setups out there.

Scaling and ability to scale is very valuable, but I would guess massive premature scaling is much more common than scaling cliffs - especially as they don't get reported or have nearly as much visibility.

the fact that SQLite is a library (embedded DB) means only one node can access the DB at a time. This would not be appropriate for many apps that require HA.

Only one node can write to a given SQLite database at a time. SQLite handles concurrent readers swimmingly.

The fact that SQLite is a library has not that much to do with its concurrency model?

What do you do if the node running the SQLite instance goes down?

The same thing you do in a multi-reader, single-writer MySQL cluster, except with SQLite there's one less thing to go down, because the database is embedded in your process.

I’m not familiar with MySQL specifically, but people worried about HA have standby replicas. Enterprise DBs make this almost trivial to do and it’s very possible with PG extensions. I’m sure someone has built a system like that with SQLite, but it’s much less ideal than other database systems.

Exactly what is "less ideal" about it? SQLite has distinct advantages over n-tier databases: it's embedded, so you're not round-tripping to the network for fetches, which makes simple query patterns that are untenable for things like Postgres reasonable for SQLite. It's also, of course, faster (about as fast as you can possibly get) for some access patterns.

We’re not talking about latency though. The argument is around the HA story. Totally agree that if your app and DB are on the same machine and if HA isn’t a hard requirement, SQLite is probably what you should go to before PG.

However, if you can’t have any downtime, you’d have to build some bespoke solution to make sure you have standby replicas ready to go if there’s an issue on the primary instance. For a lot of enterprises, that’s a total nonstarter.

We're going around in circles. You can replicate SQLite. What makes replicated SQLite "less ideal" than n-tier databases?

Just playing devil's advocate (I don't have much of a dog in the fight):

* SQLite's replication isn't built in; you have to use another tool (Litestream, LiteFS, etc.) to achieve it. That in itself is an indication it's not inherently designed for such, and at a minimum it will require a lot more testing to ensure your HA/DR failover scenarios work like you're envisioning. Perception matters.

* Litestream, LiteFS today are in "beta mode", fairly DIY, mostly CLI and YAML configs, making it comparatively complex and error-prone to configure vs. most n-tier databases which offer OOTB replication via GUI, and it's usually just a checkbox with cloud PaaS databases.

* "No one ever got fired for choosing IBM" there are tons of blogs and walkthroughs of setting up HA/DR for all the major RDBMSes, it's a safe choice if you run into any issues, whereas you might find yourself deep in an obscure dev forum debugging Docker dependencies or networking protocols. It's a risk.

* Needless to say, commercial support also matters.

* On Litestream's own tips page ( https://litestream.io/tips/ ) there's a subheader called "Data Loss Window"; if I were competing in any way with this solution, I think I'd bring that up pretty early.

Broadly, I think it'd be fine for a lightly written, mostly read OLTP system that's relatively stable. Anything else and I might get itchy about relying on just the WAL for replication.

SQLite fills a great ethos of "no muss, no fuss", but replication a lot of the time is 100% fuss.

It seems like you keep missing the SQLite caveat — it’s the same process as your application. HA doesn’t make sense here in the context of SQLite by itself.

Here’s how I think about it, but I don’t do replicates with SQLite often, so I might be wrong.

You really can’t compare SQLite to a Postgres or MySQL or Oracle server. With a database server you worry about replication/HA etc for the server. If the server is up, it can be queried, independent of the application.

With SQLite, you worry about replicating the data. If your application is running, it can access the data. You just need to make sure the data is consistent between nodes (if you use multiple nodes).

The data can be replicated, but if your SQLite node goes down, there’s no need to worry about the application because that means the application is also down. Similarly, if the application is up, your database is also up!

I mean this is our entire point. SQLite isn’t built for HA. You’re stuck replicating it yourself or using non-enterprise-ready solutions. It’s not SQLite’s fault, it’s just not the tool for the job if you need HA.

SQLite is one of the most reliable codebases in the industry. You're ignoring what the comment says. In an n-tier architecture, both your app server and your database server need to be HA, because they're servers. With SQLite, the app server is the database server. It's still HA, there's simply one fewer thing to HA-ify.

I don't know what "non-enterprise-ready" means, but it sounds like a selling point to me. The XML can come later.

When the app server is stateless, though, HA for the app server is usually a near-trivially easy problem to solve. And a whole lot of web devs are used to treating the database as a black box that's always up, so I get the fear and uncertainty of suddenly having the HA setup interfere with "their" domain.

Coming at it from a devops angle, I'm used to being the one to have to deal with the database HA myself, and then having one tier less becomes a lot more appealing, and not really more scary than ensuring any other database setup is resilient.

The comment I replied to upthread was also talking about n-tier replication features that weren't built in.

LiteFS isn't the only way to replicate a SQLite database.

Do you want Oracle servers? Because "Nobody got fired for choosing IBM" is how you get Oracle servers.

If you read just one additional sentence in to the Litestream "Data Loss Window" section, you'll find: "This is how many replication tools work including PostgreSQL’s log-shipping replication".

I don't know what you mean by "relying on just the WAL". The WAL is the database. What else would you replicate? How are you contrasting this against Postgres replication?

Sometimes I honestly think that the problem is that the SQLite ecosystem makes it easy to understand how things work, so people get scared. Whereas other things come off as magic, and magic is comforting: you don't need to know how it works, you can just trust it.


On so many topics, this is the answer. People want a pleasant fiction of clean abstracted magic.

It is so hard for me to imagine ever trusting magic, but I think you are very right that people seem to prefer it. I think this discrepancy explains so much of the conflict I get into in my career.

> LiteFS isn't the only way to replicate a SQLite database.

What are other ways to replicate that you're aware of?

Another approach I like is to use replicated block storage like GCP Regional Persistent Disks. This makes failover trivial. But then I'm not sure the replica can be used as a read-only copy.

> What are other ways to replicate that you're aware of?

Litestream (WAL -> object storage), Rqlite (wrap in http layer + raft + streaming changes between nodes; rqlite kinda defeats much of the purpose for me), Dqlite (in-process server-thread + raft + streaming changes; dqlite obscures that there are server threads plus network communication involved, but it's there), LiteReplica (VFS extension streaming to a replica), Verneuil (VFS extension -> object storage), SQLite Ceph VFS (VFS extension storing to RADOS)

Very different sets of tradeoffs. E.g. several of these require you to run a separate tool to do the replication. Several depend on object storage, either for the primary storage (the Ceph extension) or as the replication target / to restore from, which may be fine if you already have a compatible object store. Some can use e.g. Consul to control which server is the writer.

> Another approach I like is to use a replicated block storage like GCP Regional Persistent Disks. This is making failover trivial. But then I'm not sure the replica can be used as a read-only copy.

Block storage that can guarantee in-order read/writes is fine. Network filesystems that can guarantee in-order writes and support proper locking can also allow writes (with the usual caveats) from multiple nodes. The problem here is that you really must be sure, and often - e.g. for NFS - it can appear to work just fine but be just broken enough you'll run into it at just the wrong moment.

I will throw my weight behind this reply as what I would have written if I had the time and energy :)

Abstractions. Sometimes you need them. Different operating systems, CPU architectures, applications, clients, libraries, versions, hell different companies running proprietary apps, licensing, regulations... A database with a dumb network socket on a random host is fairly well abstracted.

As far as HA goes, that abstraction can then go toward allowing your apps and DB to be built for different reliability/durability/etc requirements, potentially saving money, gaining better performance or reliability, meeting weird compliance requirements, contractual requirements, vendor or software limitations, etc.

It's also easier to troubleshoot and maintain the database separate from the application in some circumstances. If SQLite is bundled with your application, it might be harder to deal with a problem affecting one or the other. In traditional organizations, you can hire people to manage databases and people to manage applications, rather than one person who has to deal with both, or even weirder, two people that would be working on one db-in-your-app.

Sometimes this is true, and sometimes it's what CORBA advocates say to get you to adopt object brokers. Sometimes your app benefits from an explicit networked database tier, and sometimes it doesn't. I'm fine if you think most of the time it does. I'm not fine if you think it rarely does: that's not true. There are big advantages to not having a separate database tier, and the industry is sleeping on them.

(To be clear: the company I work for sells a Postgres service, and we benefit from n-tier databases! I have strong opinions here because I think this is an interesting controversy.)

"you’d have to build some bespoke solution"

Or you could use one of the many off the shelf solutions that has sprung up over the last few years.

What is the point of having a HA database if your frontend app isn't running to be able to query it?

SQLite gives the guarantee that if your frontend app is running, then the DB is available as well. Try doing that with a non-embedded DB.

The point of disaggregating your replicated database is you can scale your app and db tiers independently. Typically you only want to have a handful of database replicas (e.g. to failover after your primary fails), especially if you want strongly consistent replication. But your stateless app tier may require many more servers.

yes this is true, I was imprecise

The only real difference between SQLite and something like Postgres or MySQL is that Postgres and MySQL bundle a networking layer on top of their embedded DB engines.

If you are worried about high availability, chances are you too are building a networking layer on top of a database, so what do you need two networking layers for?

Do people build their own networking layers above SQLite (besides for fun)? If you’re doing that, then you need to build some kind of replication story. At that point it makes sense to just use PG or MySQL

Unquestionably. Web apps and the like which are little more than networking frontends to a database are probably the most common type of software written these days.

Ah, we’re talking about different things. I took “network layer” to mean something like the ability to connect directly to the database over the network, not through some shim application.

Meaning something like rqlite[1]? The age of fat desktop clients all connecting back to the central SQL server is long behind us, so yeah, there is probably little reason beyond fun for something like that these days. But there is fun!

[1] https://github.com/rqlite/rqlite

Ha! PostgreSQL still needs some kind of replication story! ;)

(PS. Mostly just pointing out that replication is still a bit awkward.)

Most apps nowadays are HTTP APIs on top of a database, which means we have one networking layer – the HTTP API – on top of another networking layer – the database protocol. The idea behind using SQLite is to just skip the latter, which simplifies the stack significantly and reduces database latency, making things like N+1 queries a non-problem.

With a tool like LiteFS, it is now possible to get replication done by a sidecar process: https://fly.io/docs/litefs/

It's because people are using super slow languages which incur lots of overhead in running the queries and formatting the results into human-readable form. As such they need many more machines to run these webapps than they do to run the database itself.

I broadly agree; however, running a cache/networking layer is difficult, and is handled well by existing open-source implementations.

If someone already wrote that networking layer, why would I want to do it again? And run into all of the bugs that they already discovered and solved?

Agreed, although we are having this discussion now exactly because those same problems are starting to get solved for SQLite in an open and polished way, so it is no longer a case of needing that second networking layer to integrate with such solutions.

And stored procedures, debugging, JIT compilation, extensions in multiple languages, ...

May I introduce you to https://litestream.io

Litestream didn’t exist until late 2020, meanwhile postgres has existed for decades.

Moreover, SQLite requires the place you run your application to have durable storage, which is a huge departure from the status quo.

It’s definitely neat, but the stack as a whole doesn’t strike me as mature enough to replace Postgres just yet.

You're writing this as if LiteFS and Postgres were basically the same thing, and the selection criteria just boils down to the Postgres pedigree and maybe all the Postgres-specific SQL features. But that's not the case at all. The difference between replicated SQLite and Postgres is that SQLite is in-core: it doesn't have to round-trip on the network to fetch data; it can burn through a large set of N+1 queries faster than Postgres can handle a single select. The difference is that SQLite is much faster.

You sacrifice things to get that speed (Postgres features, set-and-forget write concurrency). Nobody is saying there's no reason to use Postgres, or maybe even that Postgres is the right call most of the time. But the idea that SQLite is rarely appropriate for concurrent serverside applications? It's received wisdom and it's wrong.

Somebody across the thread actually suggested that WordPress was an example of the kind of application that SQLite wouldn't work for, that needed an n-tier database. (Leave aside the fact that WordPress doesn't support SQLite and instead has a longstanding MySQL dependency.) WordPress! WordPress is a perfect example of a concurrent serverside application that probably should almost exclusively use SQLite. As I said in a different comment: the whole movement towards static site generators is, in large part, a reaction to how bad n-tier databases are for a very large, popular class of concurrent serverside applications.

My second paragraph was specifically talking about the gigantic difference between operating a stateful application vs. a more typical stateless app + stateful DB. Most people aren’t deploying to Fly.io (yet!), so making Litestream work sounds pretty high risk to me :) combined with the lack of pedigree I led with, I just don’t see litestream as a good answer to “I need a HA database solution” for most people.

For most of the world (outside Silicon Valley), stateless is not typical. The typical setup is stateful app + stateful DB run on stateful VMs.

For most enterprises, running a stateful app with SQLite is not only possible, it's easier than running stateless apps, as their entire IT infrastructure is set up to support stateful apps and it's what their sysadmins know how to do.

I have tried to deploy stateless docker container apps into a typical enterprise network and it's a nightmare. The people don't understand it, are not interested in understanding it, none of the infrastructure supports it, and if you want to set up something like Kubernetes then you end up in configuration hell, as none of the easy-to-deploy standard setups actually work out of the box in an enterprise network; everything ends up needing highly customized configuration to work.

Litestream isn't a Fly.io feature.

No, but the ability to deploy it on stateful containers is.

I'm sorry, I don't understand any word of that sentence (for instance: we don't run containers?). We sponsor Litestream and LiteFS (in the sense that we pay Ben to work on it full time), but no part of Litestream is baked into or in any way coupled with our platform. If you can run Litestream on Fly.io, you can run it anywhere else.

> for instance: we don't run containers?

You "transmogrify container images into Firecracker micro-VMs" - surely you knew what I meant?

And there is no denying that fly.io is a _particularly_ well suited platform to run stateful SQLite apps (with or without Litestream), since fly makes running stateful applications easy. I think that's awesome.

I'm really confused why you seem to be trying to distance yourself from that, and why you are latching on so hard to my passing mention of fly. I wasn't making a dig, just pointing out that most people are stuck with (arguably worse) platforms that don't lend themselves to stateful, "in-core" databases.

My confusion is that there's nothing container-y about Litestream. It works anywhere SQLite does. I like our platform a lot, but I mean, Litestream works just as well on AWS.

I'm just confused, is all. :)

If your application exceeds the capacity of a single server (e.g. a Java Web Service that is backed by a database), do you see SQLite as a useful solution for that case?

That seems like when you'd need a standalone database, no?

If you think the capacity will never exceed that one standalone database. If not, you'll need to plan for sharding anyway, in which case sqlite can work just fine.

SQLite has been around for a very long time, it is easily the worlds most deployed SQL database. Litestream is just a way to do live sync/replication to make HA easier.

There are certainly use cases where SQLite is not a good fit. There are likewise use-cases where PostgreSQL is not a good fit either.

One is not better than the other, it just depends on your particular needs for that particular project.

My point is, SQLite is a totally sane and reasonable storage/DB solution for many server side applications as well.

> SQLite requires the place you run your application to have durable storage, which is a huge departure from the status quo.

Having durable storage used to be the status quo for decades. It changed only recently with cloud providers (or their customers) pushing for stateless workloads because they are much easier to manage in a distributed system than stateful workloads.

I don’t see how sqlite replicated to other servers can be faster than any other relational DBs replicated on those same servers, at least without taking more chances of losing data. The limit is most likely network latency anyway, isn’t it?

Other relational databases require network roundtrips to fetch information from the database once it's replicated. SQLite doesn't. It's a huge performance difference, to the point where it changes how you access your database; for instance, there's not much point in hunting down N+1 query patterns anymore.
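For anyone unfamiliar, the N+1 pattern being discussed looks like this (toy schema, invented for illustration). Over a network socket, each loop iteration is a round-trip; in-process, it's just a function call:

```python
import sqlite3

# Toy schema: a list of posts, each referencing an author.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 2, 'second');
""")

# Query 1: fetch the posts...
posts = conn.execute("SELECT id, author_id, title FROM posts").fetchall()

result = []
for _, author_id, title in posts:
    # ...then N more queries, one per row. Untenable over the network,
    # cheap when the database is a library call away.
    name = conn.execute(
        "SELECT name FROM authors WHERE id = ?", (author_id,)
    ).fetchone()[0]
    result.append((title, name))
```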

hm i see.

so the cost is in the initial setup of the client instance, which needs to download the whole database to its local sqlite?

From what I understand, that streams the file to a separate location, but it doesn't change the file-locking, does it?

Right, for a SQLite-based app, you are mostly limited to 1 instance running at any given time (there are multi-node things shoved behind SQLite done by 3rd parties, but that's a whole different can of worms).

The point being, you run 1 instance and you have litestream replicate to your backup node for HA purposes.

Now you are going to think, what about scaling?!!? How many apps actually need to scale beyond 1 node? Very few. If you run into scaling problems, that is when you deal with solving your scaling problem. Because scaling is unique to each application. But before you remotely think about scaling past one node, you just build the node bigger. Nodes can get pretty massive these days.

There's rqlite (https://github.com/rqlite/rqlite), which looks cool on the surface but... it's a layer on top of sqlite, at which point you should probably think long and hard whether it's still the right tool or you should switch to e.g. postgres.

To get HA, it is now possible to replicate the SQLite database using LiteFS, which is similar to how you would get HA with MySQL or PostgreSQL: https://fly.io/docs/litefs/

Not only that, but typical devs write slow application code, which needs to be distributed horizontally across many workers.

Indeed, SQLite should not be used anywhere that scaling or concurrency is a concern, or in general any external customer facing features.

In a similar vein, an internal subsystem can probably do away with a dedicated database server.

Latter is likely more common.

That's silly. SQLite works fine for all sorts of customer features, and, deployed carefully, is fine with concurrency (writes are serialized, per database, but SQLite makes it easy to use multiple databases). SQLite has this weird reputation, I think, because frameworks like Rails used it as their "test" database, and the industry has broadly slept on how capable SQLite is in serverside applications.

I think if you use a lot of database features, sqlite isn't really powerful enough for a lot of use cases, IMO. Or at least that's the impression I have.

Which features would those be?

Concurrent inserts. There are applications where you don't care about that, but that excludes a lot of web-apps, or things running with multiple processes.

If that changes, I'd agree with your point, but currently that's a big constraint.

Maybe I'm using SQLite wrong, but I have a website scraper. It's written in python and so uses processes for concurrency to avoid GIL issues.

On my laptop, 16 processes accessing the same SQLite db file can do something like 5,000 write transactions/second. That seems like plenty for many purposes?

A caveat: it's in WAL mode, with synchronous mode set to NORMAL. This means that it's not fsyncing the WAL on each individual transaction commit. Instead, they get synced in batches roughly every 200ms. If you pull the power plug on the server, it may lose some amount of writes that happened in that time. Terminating the process is fine.

Yeah, yeah, we're giving up some amount of durability. In practice, not very much (200ms of transactions) and only very rarely (when the server dies). That seems like a fine trade-off for many backend processes and many web apps. You could tune this, as well: for the transactions that you _really_ want durability on, run a checkpoint before rendering your response to the user.
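For concreteness, the configuration described above looks roughly like this in Python's sqlite3 module (the file, table, and column names here are illustrative, not from the scraper itself):

```python
import sqlite3

# Illustrative sketch of the WAL + synchronous=NORMAL setup described above.
conn = sqlite3.connect("scraper.db", timeout=30)

# WAL lets readers proceed concurrently with a single writer.
conn.execute("PRAGMA journal_mode=WAL")

# NORMAL skips the fsync on every commit; the WAL is synced in batches at
# checkpoint time instead, trading a sliver of durability for throughput.
conn.execute("PRAGMA synchronous=NORMAL")

conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT, body TEXT)")

with conn:  # commits on success, rolls back on exception
    conn.execute("INSERT INTO pages (url, body) VALUES (?, ?)",
                 ("https://example.com/", "<html>...</html>"))
```

Each process opens its own connection like this; WAL mode coordinates them through the shared database file.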

How exactly does that exclude a lot of web apps? Most web apps are very read heavy. And SQLite serializes writes to individual databases, not across all of them. It's trivial in SQLite to query across multiple databases.

Further, some form of write serialization is common to most database clusters: the most common cluster architecture is single write leader, multiple write replicas, which have the effect of serializing writes as well. Those clusters are all also taking advantage of the read-heaviness of applications.
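As a sketch of the multiple-databases point, here is what ATTACH looks like from a single connection (file and table names are invented for illustration):

```python
import sqlite3

# Two separate database files, queried together via ATTACH.
conn = sqlite3.connect("users.db")
conn.execute("ATTACH DATABASE 'orders.db' AS orders")

conn.execute("CREATE TABLE IF NOT EXISTS main.users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE IF NOT EXISTS orders.orders (user_id INTEGER, total REAL)")
conn.execute("INSERT INTO main.users VALUES (1, 'alice')")
conn.execute("INSERT INTO orders.orders VALUES (1, 9.99)")
conn.commit()

# SQLite serializes writes per database file, so a long write transaction
# against orders.db does not block a writer on users.db.
rows = conn.execute("""
    SELECT u.name, o.total
    FROM main.users u JOIN orders.orders o ON o.user_id = u.id
""").fetchall()
```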

I did an experiment of switching an app which uses Postgres to use SQLite. I ran into issues where I had a long running transaction in one process doing writes and another process doing reads and writes. As soon as the second process tried to do any writes, I would often get the queries running into locks and aborting. Otherwise, it worked great for reads and sped up the app a lot as there were lots of N+1 queries issues.

This is also after playing with settings about WAL mode.

It would not be a viable solution to split these tables into separate databases to avoid the concurrency issue. Also, most of the issues involve writes to the same tables.

Is there honestly any way to get that working well with current SQLite or do we need to wait for hctree or begin concurrent to mature?

I would be concerned that to get this working in its current state would require a large refactoring of the app which would require a lot of effort and would introduce limitations that make coding more difficult.

Many web apps also write a lot. Are you suggesting to have individual databases per process and then read from all of them and aggregate the data you want in memory? That would sound too much like I'm building a database, which I don't want to do.

I'm not talking about clusters and replication, just a single database on a single server, like Wordpress for example. It can handle large amounts of concurrent users if you're using InnoDB (which has been the default since how long?) and nobody will even notice that there's anyone else because it's not locking the table (or the whole database) to write, and you don't have to wait or deal with frequent busy timeouts.

Most SQLite writes can be measured in microseconds. If your app is doing less than 1,000 writes a second you probably won't even notice that SQLite is serializing your writes rather than applying them concurrently.

I must've been doing something wrong then. I've regularly hit longer locks as soon as the DB grew, to the point that it was not usable for me with multiple processes writing to the same DB because it was constantly locked by one.

Having one writer and multiple readers was fine, but even having a dozen writers who update something a few times per second quickly led to the DB becoming the bottleneck. I tried different modes, but saw no difference so I just gave up on using it concurrently.

I spent a bunch of time wrestling with this last year (and into this one), and the general solution I settled on was to serialize writes through a single writer; multiple reader threads and multiple writer threads (or processes) is not a fun configuration to put SQLite in.

There were 500 MB to 40 GB databases, probably ~1.5 updates per second at peak load?
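A minimal sketch of that single-writer arrangement, assuming Python threads (a multi-process version would funnel writes over a pipe or socket instead of a queue; names here are illustrative):

```python
import queue
import sqlite3
import threading

write_queue = queue.Queue()

def writer_loop(db_path):
    # The only thread that ever writes; SQLite never sees competing writers.
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE IF NOT EXISTS events (payload TEXT)")
    while True:
        item = write_queue.get()
        if item is None:  # sentinel: shut down cleanly
            break
        conn.execute("INSERT INTO events (payload) VALUES (?)", (item,))
        conn.commit()
    conn.close()

writer = threading.Thread(target=writer_loop, args=("app.db",))
writer.start()

# Any number of worker threads can enqueue writes without ever
# holding a SQLite write lock themselves.
for i in range(100):
    write_queue.put(f"event-{i}")

write_queue.put(None)
writer.join()
```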

I think "web app" is too broad a category to have this kind of discussion over.. for a lot of web apps 1.5 updates per second could be considered a reasonable limit. Many people work on web apps that do 100x that and don't consider themselves to be working at any particular level of scale. 1.5K updates/sec is not uncommon either, in different settings -- high-traffic internal web apps can hit that in a single company, and the people working in those contexts don't consider that too special either, not to mention public facing websites like GH where many people start their careers without having seen anything different.

I'm not benchmarking SQLite, I'm relating the environment in which I was tuning SQLite concurrency, because I feel like it doesn't mean much to say "here's what worked for me" without at least some basic facts about what I was working on. SQLite itself can do tens of thousands of writes per second on commodity hardware without breaking a sweat.

There's more to most applications than interactions with the database. Once your web app exceeds the capacity of a single machine, is SQLite a good choice for state?

I think he’s saying to shard writes by tenant, or something else that makes sense for your app - so you can route reads to an individual shard. Same as you’d do to shard any other database, if a little more DIY.

Speaking of Wordpress though, that’s a pretty great example of a web app that typically handles almost exclusively reads. I’ll bet almost any Wordpress instance would work great on SQLite.

WordPress is the archetypical application that SQLite would kill for, but for historical reasons, WordPress has always wanted MySQL. It's funny, because there's a whole movement towards static sites that is, in part, a reaction to how bad the n-tier architecture behind WordPress is!

They're working on supporting SQLite as a backend: https://make.wordpress.org/core/2022/12/20/help-us-test-the-...

It's crazy that this is only happening as of 2022, right? They could have kept a whole generation of web sites on WordPress by eliminating the MySQL dependency, and gained a performance and deployability advantage in the process. Even multitenant WordPress would have come out great with SQLite. I'm waiting for someone closer to WordPress to tell me how wrong I am about this, because I have to be missing something.

I'm not sure that it'll bring a lot of sites back to or keep them on WP, but it'll be great for testing. Much of it is always integration testing with WP; because of how filters and actions work, your code is rarely ever isolated. SQLite will make that much easier and faster.

It's also really not a big switch for the most part. I'm not aware of that many MySQL-specific things happening in WP. Things like fulltext indexes aren't in core, so it's really just using a different driver to get the basics working. I'm sure it would break a lot of plugins, but that's fine, they have platform requirements anyhow, adding "does work with sqlite" isn't a huge step and can likely be automated to a high degree (if the plugin never uses wpdb->query, ->get_results etc, it's compatible).

And to be clear, this would be a single instance of the WordPress php application that all traffic would route to, and it would access the SQLite database on that node?

If that node failed, then the php application and database would be restored to a new machine and traffic would resume?

Won't long-running write transactions block each other? Not all apps can avoid the need for these kinds of transactions.

Care to justify that? The evidence seems to be against you for most read heavy workloads, especially if latency is any concern.

>"It makes me wonder why I would use anything else?"

this: "There is no support for recovering from a power failure or operating system crash. Such an event may corrupt the database."

You don't always need this at the db level. Depending on your use case, backups or high availability in multiple datacenters can do fine.

But then what is the recovery scenario? The story could get hairy and complex quickly.

You either restore from your snapshot/backup of the database, or replicate from the HA node that didn't fail.

It depends on the nature of your app as to whether either of those scenarios is acceptable. The former comes with downtime.

Because it won't have as many years of rigorous testing and polishing as the current back-end, or PostgreSQL, or whatever.

Perhaps because as of today:

> The prototype is still missing features and contains many bugs. But it works well enough to test.

Lack of JSON CONTAINS would be a good reason.

Do these developers, who require JSON CONTAINS, know about basic normal forms and RDBMS in general?!

I understand that sometimes for performance it is necessary to de-normalize huge data lakes, but a typical website controlled by a CMS is not huge by any means.

Heh, this is the reason Takahe is currently broken on SQLite. Hashtags currently need a JSON contains.

It's possible to simulate JSON contains in SQLite using a devious subselect.

Here's an example:


That page shows every row where the tags column (a SQLite JSON list of strings) contains the string "Data Import"

Here's the underlying SQL (the table and ordering column names here are illustrative):

    select * from entries
    where :tag in (
      select value from json_each(entries.tags)
    )
    order by entries.id desc
I haven't tried it yet, but I have a strong hunch that this could be dramatically accelerated using a join against a FTS table on that "tags" column in order to first filter the rows down to a likely subset.
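That hunch might look something like the following sketch: a plain FTS5 index over the JSON text acts as a fast, approximate prefilter, and the json_each subselect confirms exact membership. Table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, tags TEXT)")
conn.execute("CREATE VIRTUAL TABLE entries_fts USING fts5(tags)")
conn.executemany(
    "INSERT INTO entries (id, tags) VALUES (?, ?)",
    [(1, '["Data Import", "SQL"]'), (2, '["Other"]')],
)
# Mirror the raw tags text into the FTS index, keyed by rowid.
conn.execute("INSERT INTO entries_fts (rowid, tags) SELECT id, tags FROM entries")

rows = conn.execute("""
    SELECT e.id
    FROM entries e
    JOIN entries_fts ON entries_fts.rowid = e.id
    WHERE entries_fts MATCH '"Data Import"'                        -- fast prefilter
      AND 'Data Import' IN (SELECT value FROM json_each(e.tags))   -- exact check
""").fetchall()
```

The FTS match can produce false positives (the phrase appearing inside a longer tag), which is why the json_each check is kept as the final filter.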

Django would need to be patched to do that, since the default SQLite provider would make it hard to do that cleanly.

Here’s the relevant Takahe issue: https://github.com/jointakahe/takahe/issues/325

Yep. That’s exactly what I stumbled upon myself.

Exactly what I was thinking!

Hugged to death currently:

The server load is currently too high. Please try again later. Current load average: 5.190000. Load average limit: 4.000000

The server is on Linode. Looking at the stats, they appeared to have had an outage of some kind last night at about the time you got this message. Everything seems to be running fine now. Please try again.


Should've used SQLite on the backend.

SQLite is a great database engine. I often wonder if it could even be better without extreme backward compatibility.

It could be nice to have foreign keys enabled by default, and to make all tables strict by default. A new binary format that reduces the size of the database by leveraging table strictness and type info could also be great.

Agreed, but priorities: I think that new high concurrency backend is a more important improvement.

This is very interesting. Thank you for submitting this and thank you for working on this.

I am highly interested in parallelism and high concurrency. I implemented multiversion concurrency control in Java.


I am curious how to handle replication with high concurrency. I'm not sure how you detect dangerous reads+writes to the same key (tuples/fields) across different replica machines. In other words, multiple master.

I am aware Google uses truetime and some form of timestamp ordering and detection of interfering timestamps. But I'm not sure how to replicate that.

I began working on an algorithm to synchronize database records, do a sort, then a hash for each row where hash(row) = hash(previous_row.hash + row.data)

Then do a binary search on hashes matching/not matching. This is a synchronization algorithm I'm designing that requires minimal data transfer but multiple round trips.

The binary search would check the end of the data set for hash(replica_a.row[last]) == hash(replica_b.row[last]), then split the hash list in half and check the middle item; this will tell you which row and which columns are different.
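A rough sketch of that scheme as I read it, with hypothetical helper names (real replicas would exchange only the hashes requested at each round trip, not the full lists):

```python
import hashlib

def chain_hashes(rows):
    """Chained row hashes: each hash folds in its predecessor, so a match
    at index i implies the replicas agree on all rows 0..i."""
    hashes, prev = [], b""
    for row in rows:
        prev = hashlib.sha256(prev + row).digest()
        hashes.append(prev)
    return hashes

def first_divergence(a, b):
    """Binary search for the first index where two hash chains differ.
    Assumes equal lengths; returns len(a) if the chains fully agree."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] == b[mid]:
            lo = mid + 1   # prefix 0..mid matches, divergence is later
        else:
            hi = mid       # divergence is at mid or earlier
    return lo
```

The chaining is what makes the binary search valid: an ordinary per-row hash would tell you whether one row differs, but not whether everything before it matches.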

My assumption here is that the replication model described is one leader, multiple replicas - so all writes are applied to the leader and the followers are only used for reads.

This is a pattern commonly used with other databases such as PostgreSQL and MySQL which avoids a huge amount of complexity involved in multi-leader setups.

There's also the hot standby leader configuration that the higher Heroku Postgres tiers do. The standby leader receives writes asynchronously, so during failover you can actually suffer a little permanent data loss. Really should have a bigger warning label.

I only ever do single leader, multi follower like you describe.

I'm not sure about the details of Spanner, but one thing I still remember: it was much cheaper to read stale data (<10 secs ago), as it would've been available from any server, not just the leader, and also that transactions were limited to 20 MB, or was it 40 MB. Although that was 8 years ago, so things might be different now.

Check out: Living without atomic clocks: Where CockroachDB and Spanner diverge


I always thought it would be awesome if SQLite got something like this. The decision to use SQLite for small web apps would become a lot more compelling if there was no "I have to migrate to postgres when I scale" anxiety.

SQLite is maybe the most extreme success example of a real MVP: efficiently solving a narrow problem and adding complexity incrementally as needed.

While not without limitations (e.g. it is conceivable that the concurrency backend is made more difficult by earlier choices), overall it is a remarkable pattern which has produced gems like SpatiaLite.

Is there anything else like this out there?

What does it mean for SQLite to use something as a backend? How much of it is still SQLite at that point?

Bloomberg's comdb2 does something similar, using the SQL grammar and query planner but swapping out the storage layer. To quote from their paper:

"In SQLite a query plan is compiled into a bytecode program that is executed by a virtual machine, named Virtual DataBase En-gine (VDBE)... In the original SQLite implementation the VDBE operates on the real B-trees of the database... In contrast the Comdb2 VDBE operates on the cdb2 abstraction of a table during the OCC phase of transaction execution..."


My guess would be the b-tree layer down based on "the database backend module that it uses to store b-tree structures in its database file" and the architecture diagram at https://www.sqlite.org/arch.html

This feels to me a little bit similar to MyISAM vs. InnoDB in MySQL - two different "backends" that are low-level details about how the data is actually stored and accessed.

> How much of it is still SQLite at that point?

I'm guessing just the interface. If there are any well-understood quirks of the regular implementation, maybe they'd want to replicate them too, idk.

Can someone explain why you would want a database larger than 16 TiB? Imagine there's database corruption, or you need to change the schema or something... why would you want to deal with that on an insane scale like hundreds of TiB? If you split your data up into smaller databases or shard, there's less chance of a problem affecting all of your data, and it being so gigantic that it takes forever to fix it.

You want high-density storage to minimize the number of servers required when data models get large. If you limited yourself to 16 TiB, then it would not be uncommon to require thousands of servers for large data models. If your database supports high-density storage then you could fit this in a single rack. For somewhat smaller data models you could fit this on a single server. It doesn't just save a lot of money, some edge environments -- essentially single servers -- are already PiB+. Storage bandwidth is typically greater than network bandwidth these days so many workloads scale well this way.

Quantity has a quality all its own. Fragmenting your database over thousands of servers introduces new classes of failure and greatly increases the risks of other types of failures that you don't have to deal with when you only have dozens of servers or a single server.

High-density storage does require strategies for recovery that reflect the bandwidth-to-storage ratios. Designers are usually aware of the implications and architect appropriately.

16TiB is not a problem. I have admin'd postgres databases larger than that. If you use good hardware (ECC RAM) and an error-correcting filesystem (ZFS) you are generally good to go.

Typically for postgres, DB corruption issues are isolated to single tables/indices, not the entire database. Sqlite's architecture might be a bit less "resilient" in this sense (I'm not sure, just guessing), but it also might have a lower chance of introducing corruption in the first place.

I guess it really depends on your use case? Something like GMail it probably makes sense to have a database more than 16 TiB.

John Titor told me that this isn't an insane scale at all. You know, on the IBM 5100 computer they could only run like 1000 KB.

Sometimes I don't mind waiting for the query to finish. Makes for a good coffee break.

Are there benchmarks comparing it against postgres/mysql/mariadb?

Once this becomes the default, next is networking code.

There was a time where desktop client software connecting to the RDBMS server directly was all the rage, necessitating that the RDBMSes of the era have networking capabilities to listen for those clients. But is there any point in a modern database providing that when we've all but completely moved the client->server networking into the application layer?

As cyclical as tech can be, I find it difficult to think that we'll go back to that style of architecture. There are just so many benefits to having an application server sit in between the client and the database. Going back to green screen terminals hanging off a central computer seems more likely, and that model doesn't need its database networked either.

Not having a network-capable database limits you to only one server machine, which is a problem if your application is more than just a thin layer on top of a database. You usually want a separate worker process (or multiple) to do heavy computation in the background and often, it's useful to run it on a dedicated machine.

For example, you might run the app server and DB on an expensive highly available server, but keep your background workers on cheaper spot instances that might randomly get killed. Or you're running some heavy processing that needs different hardware like a GPU for machine learning.

You could, of course, implement API endpoints for that in your app, but then you need to keep updating it when the workers change. Or you could implement a more generic DB access endpoint, but then you're just reinventing a networked DBMS, but with worse performance and no library support.

> Not having a network-capable database limits you to only one server machine

Not true. What has been true until recently is that replication solutions for SQLite have been lacking, necessitating the use of an old school database to lean on existing multi-machine database solutions, but that is no longer the case.

> You usually want a separate worker process (or multiple) to do heavy computation in the background

Also not true. Network overhead introduces things like the n+1 problem which only adds unnecessary computation and breaks the relational model, requiring some pretty insane hacks to work around. What is true is that SQLite write contention has been a problem, necessitating the use of an old school database in high write environments, but that is also a problem on its way out.

Networked databases have been the norm until recently because they're older and more mature and the only practical solution in many cases thanks to that maturity, but SQLite is starting to gain the same maturity and we are now able to rethink the model and gain the benefits of data locality.

Besides, if you really need networking for your niche use case, and have some reason to use SQLite, there is already rqlite. Tightly coupling networking with the database engine doesn't add any value. They are decidedly distinct layers of concern. If Postgres or MySQL were rewritten from scratch today, even if protocol compatible, no doubt the separation between the database engine and the networking layer would also be more explicit.

While that's true, I also think that the current rage is stateless applications, outsourcing state to centralised database/cache/object storage layers over a network connection.

That allows you to easily spin up/down e.g containers, migrate the application between nodes etc without having to maintain a high performance network file system.

Using sqlite in such an environment requires you to either solve the persistence layer in general (≈ rook/ceph) or specialised for sqlite (≈ litestream/litefs/rqlite depending on your need). I think one probably could argue that rqlite is essentially a network protocol for sqlite.

This really depends on what you mean with "application". It is very popular to have the "backend" stateless, yes, but it is also popular that each service has their own dedicated database, and to consider the service/application to be the combination of the stateless backend and the dedicated stateful database.

The application is stateful even if the backend is stateless.

While that is very true, I think my point still stands.

Even if you consider the application to be the combination of stateless binary + stateful auxiliary services (like a database), the statelessness of the binary allows for some neat things.

I replied to a comment talking about how network-based database access is not necessary, and I think it very much is if you want to host stuff with the current popular architecture. Unless you implement a general shared persistence layer on which you can run e.g. sqlite, but that doesn't seem better/easier to me.

Ah, SQLite3 grows up! That is, with this (and external projects like rqlite), SQLite3 really begins to break out of the embedded RDBMS space.

So the SQLite4 stuff didn't get completely abandoned after all. I guess it was a good experiment.

I hope this gets stable enough so I can test it with an e-commerce demo just for fun!

Do we have any idea what the timing might be?

Unbounded spinning is absolutely disastrous when cores are oversubscribed. If their spin locks are really unbounded, this does not belong in production.

At the bottom of the linked article:

> In all cases above, "spin-lock" actually means invoke the SQLite xBusy callback if one is registered, or to literally spin-lock otherwise.

Seems the wait behavior is user configurable.

That seems to only be true for the BEGIN EXCLUSIVE statement, which sounds like it's the entire point. SQLite is generally a bit more bring-your-own-batteries by design, so if you care you would add timeouts at the writer. Unless I'm missing something.
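For reference, the configurable wait behavior corresponds to sqlite3_busy_handler / sqlite3_busy_timeout in the C API; from Python it might look like this (the database name is illustrative):

```python
import sqlite3

# Retry busy locks for up to 5 seconds before giving up.
conn = sqlite3.connect("app.db", timeout=5.0)

# The same thing expressed at the SQL level, in milliseconds.
conn.execute("PRAGMA busy_timeout = 5000")

# Once the timeout expires, SQLite raises "database is locked"
# rather than spinning indefinitely.
```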

rqlite[1] author here. It's very interesting to see this. Not sure whether it can bring more functionality to rqlite, but it's definitely cool to see the SQLite database evolve with more client-server applications in mind.

I will say that one misconception people often have about rqlite is that it's a drop-in replacement for SQLite. You see that a bit even in the comments section here. That was never its goal. Its goal is a lightweight and super-simple to use distributed relational database. Using SQLite as the database engine was the perfect way to meet that requirement.

[1] https://www.rqlite.io

I will say the promise of "improved concurrency" is very interesting to me. If nothing else, I would definitely pull that into rqlite (probably as an option). That said, working with Raft log has always been the real bottleneck, not SQLite.

Here's an oddball question: why did the K8s devs go with etcd and not SQLite?

Etcd is a key-value store based on a distributed consensus algorithm. This (sort of) means that you can have a source of cluster truth in the db and make sure it lives through certain distributed-system bad scenarios.

SQLite is a (relatively) simple in-process SQL RDBMS. You get proper reliable SQL on a single computer, which is practically useful for many more cases.

It is possible to build distributed dbs on top of a bunch of SQLite instances, but one would need to solve the very distributed-system problems that etcd solves.

In fact, some of the popular distributed dbs use etcd or ZooKeeper or custom implementations of the same algorithms on top of traditional RDBMSs.

K8s is a distributed system that needs to have a reliable view of its nodes. Clearly, a consensus-based db is necessary.

So that would make etcd consistent and available (CP of the CAP)?

edit: almost -- available and partition tolerant at least in the default config:


CA doesn't exist.

CAP is about what happens in case of a partition (P). You can either remain available (AP) or you can remain consistent (CP).

If you don't have partitions then CAP doesn't apply. But I wouldn't recommend depending on that. For example Google's Spanner is CP, but they work really hard to make partitions rare, giving them 99.999% availability.

etcd is strongly consistent, it will only be available if you have a majority of nodes up and reaching each other (CP over AP). If you don't have a majority it will not be available (but won't get inconsistent).

This is more resilient than SQLite of course, which runs on a single node, and therefore can't remain available if any node (= the only node) fails.

There are distributions of Kubernetes that use SQLite instead of etcd, for example k3s: https://docs.k3s.io/architecture

The drawback is that it's not replicated. With etcd, you set up multiple control-plane nodes (ex: 5) and you can tolerate a minority of them being down without any effect (ex: 2 of 5 down). With SQLite, you can only have one control-plane node, and if it's down your control-plane is unavailable. This is fine for small clusters where you don't want to run multiple control-plane nodes, or you don't think it will go down, or you don't mind fixing it.

Because they are different things.

It's still pretty amazing that their "stock" (actually a tuned branch) can do 75k updates/s. That's enough for a lot of people.

This sounds a lot like Postgres or MySQL. Is the point of this to have compatibility with the huge number of existing SQLite applications?

The SQLite team are good at listening to the people who use the software.

People who use SQLite want to be able to scale it to handle higher concurrent write loads, including from multiple processes.

Yeah, that's what I was thinking. If you're already using SQLite, here's a faster version of it.

SQLite is MUCH easier to deploy.

But HC-tree sounds like it's a full-on server like Postgres, and unlike regular SQLite. Maybe I'm mistaken.

Edit: I was wrong, this makes more sense now. This isn't comparable to Postgres.

Wondering what the HC in HC-tree is for?

Looks like it's right there in the title: "high-concurrency".

Sounds like the right answer. I feel stupid now :) Thanks!

Apple powers iCloud Core Data with SQLite. This way it uses the same DB engine locally on your Mac and iPhone, and the server backend in their data centers.

I don't know if they've swapped the backend, or just use it as-is. SQLite can scale vastly if you use it for file IO and do sharding/replication on top of it as a separate concern. I've done ad-hoc (extremely crude) versions of that in some apps.

Isn't each cloud CoreData store just for a single user, so high concurrency isn't needed?

Yes, we have natural sharding by user. However Core Data also can create shares which multiple users can access. I don't think this typically causes high concurrency, it's more like a Dropbox file share or a Google docs file share for a few users at a time.

This definitely helped them make the choice to run SQLite on the server.

But I still think it's super cool. We're talking 1.8 billion devices with data stored as SQLite files in the cloud.

I would assume single account but multiple devices, so you could have multiple connections syncing at once.

Doing that serially wouldn't be particularly slow. Most likely 4 devices at most.

For all SQLite fans here: We are currently working on a full-fledged SQLite hosting platform. Check it out at https://airsequel.com. So far we provide an instant GraphQL API, an SQLite workbench, a spreadsheet UI, and dashboards.

Why would you ever want to put a network between SQLite and the application logic? That is SQLite's one cool trick, as it were. At that point the db seems like an implementation detail rather than a selling point.
