See also this thread on the SQLite forum: https://sqlite.org/forum/forumpost/d9b3605d7ff40cf4
Also, the scaling characteristics of database nodes vs service nodes are very different. I might not want to put reliable storage into every node I have, but I'm probably willing to shell out a couple bucks for my database server's storage.
An example from my last project that used a lot of Postgres: the database node was about 20x the price of the workers.
From an operational perspective, having the database be maintained separately from the app also has its benefits.
(1) You can know exactly what code was running when debugging past issues
(2) Scaling can be done dynamically and relatively easily. Just turn application nodes on or off.
(3) Rollbacks are (relatively) trivial. Just go back to the known-good immutable version
(4) Memory leaks can be "solved" by restarting nodes every so often
(5) If, for whatever reason, your application node dies you shrug your shoulders and start a new one
Once you add DBs with state to your application, all of these become much harder. You want that state in something that's rock solid and rarely changes, to avoid all the above problems.
Wouldn't be good for huge data sets but these days huge means like a terabyte or more. You don't really have to start thinking about sharding or smart fractional replication until you hit many terabytes.
Even many successful SaaS companies don't have databases that big. Several hundred gigs these days is not too big for a brute force replicate everywhere approach and it comes with some benefits like less locking and stronger fault tolerances. If you have 50 replicas the odds of them all being screwed at once gets very low.
Personally I want the opposite direction, a sort of Postgres-lite that uses a single flat file and lives inside a lib, for things where the power of the full thing isn't needed yet but I want to design my backend code against a Postgres interface. I've seen Wasm Postgres examples, and maybe there are others, but nothing mainstream. And yes there was this SQL standard that everything was supposed to conform to, but that went out the window long ago.
And I have a wrapper that does it in a shell.
This makes it possible to run MySQL-dependent tests, in parallel, on a dev machine, without too much misery.
(MySQL is awful. Don’t use it for new designs. MariaDB isn’t obviously better. AWS RDS MySQL is, in my considered opinion after wasting days on it, even worse. RDS CANNOT INGEST THE OUTPUT OF mysqldump WITHOUT ILL-CONSIDERED AND POORLY DOCUMENTED KLUDGES!!! WTF!! Don’t get me started on how defective every MySQL replication mode is.)
I was actually reading up on this the other day (doing a bit of a breadth-first search of multi-master replication in industry).
I thought the way it worked was... interesting. Sounds like it's just as interesting in production.
Well no actually it just unpacks the tar file in a temp dir and runs the full postgres, but it mostly feels like what you describe (minus the single file part) and starts surprisingly fast. That would totally work for a little proof of concept (https://github.com/zonkyio/embedded-postgres)
> SQL Parser: We use the PostgreSQL parser that was repackaged as a stand-alone library. The translation to our own parse tree is inspired by Peloton.
Can you expand on this? How did you package postgres as a lib?
JDBC/ODBC/DB-API/Sequelize ORM (ignoring the ORM part).
This used to be one of the major use cases for SQLite - running automated tests and being able to throw away the database file (or manage it at different points in the tests).
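That throwaway pattern is essentially just this (file name and schema are illustrative):

```python
import os
import sqlite3
import tempfile

def make_test_db():
    """Create a fresh, disposable database file for one test run."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    return conn, path

conn, path = make_test_db()
conn.execute("INSERT INTO users (name) VALUES ('alice')")
conn.commit()
# ...run the test's assertions against the database here...
conn.close()
os.remove(path)  # throw the whole database away afterwards
```

Each test (or test worker) gets its own file, so tests can run in parallel with zero shared state.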
> ignoring the ORM part
As it should be :)
In the case of JDBC and ODBC (and I think DB-API), that's all they do.
Scaling, and the ability to scale, is very valuable, but I would guess massive premature scaling is much more common than scaling cliffs, especially as the cliffs don't get reported or have nearly as much visibility.
The fact that SQLite is a library doesn't have that much to do with its concurrency model, though?
However, if you can’t have any downtime, you’d have to build some bespoke solution to make sure you have standby replicas ready to go if there’s an issue on the primary instance. For a lot of enterprises, that’s a total nonstarter.
* SQLite's replication isn't built in; you have to use another library (Litestream, LiteFS, etc.) to achieve it. That in itself is an indication it's not inherently designed for such, and at a minimum it will require a lot more testing to ensure your HA/DR failover scenarios work like you're envisioning. Perception matters.
* Litestream, LiteFS today are in "beta mode", fairly DIY, mostly CLI and YAML configs, making it comparatively complex and error-prone to configure vs. most n-tier databases which offer OOTB replication via GUI, and it's usually just a checkbox with cloud PaaS databases.
* "No one ever got fired for choosing IBM" there are tons of blogs and walkthroughs of setting up HA/DR for all the major RDBMSes, it's a safe choice if you run into any issues, whereas you might find yourself deep in an obscure dev forum debugging Docker dependencies or networking protocols. It's a risk.
* Needless to say, commercial support also matters.
* On LiteStream's own tips page ( https://litestream.io/tips/ ) there's a subheader called "Data Loss Window", if I were competing in any way with this solution, I think I'd bring that up pretty early.
Broadly, I think it'd be fine for a lightly written, mostly read OLTP system that's relatively stable. Anything else and I might get itchy about relying on just the WAL for replication.
SQLite fills a great ethos of "no muss, no fuss", but replication is, a lot of the time, 100% fuss.
Here’s how I think about it, but I don’t do replication with SQLite often, so I might be wrong.
You really can’t compare SQLite to a Postgres or MySQL or Oracle server. With a database server you worry about replication/HA etc for the server. If the server is up, it can be queried, independent of the application.
With SQLite, you worry about replicating the data. If your application is running, it can access the data. You just need to make sure the data is consistent between nodes (if you use multiple nodes).
The data can be replicated, but if your SQLite node goes down, there’s no need to worry about the application because that means the application is also down. Similarly, if the application is up, your database is also up!
I don't know what "non-enterprise-ready" means, but it sounds like a selling point to me. The XML can come later.
Coming at it from a devops angle, I'm used to being the one to have to deal with the database HA myself, and then having one tier less becomes a lot more appealing, and not really more scary than ensuring any other database setup is resilient.
LiteFS isn't the only way to replicate a SQLite database.
Do you want Oracle servers? Because "Nobody got fired for choosing IBM" is how you get Oracle servers.
If you read just one additional sentence in to the Litestream "Data Loss Window" section, you'll find: "This is how many replication tools work including PostgreSQL’s log-shipping replication".
I don't know what you mean by "relying on just the WAL". The WAL is the database. What else would you replicate? How are you contrasting this against Postgres replication?
On so many topics, this is the answer. People want a pleasant fiction of clean abstracted magic.
What are other ways to replicate that you're aware of?
Another approach I like is to use replicated block storage like GCP Regional Persistent Disks. This makes failover trivial. But then I'm not sure the replica can be used as a read-only copy.
* Litestream (WAL -> object storage)
* rqlite (wraps SQLite in an HTTP layer + Raft + streaming changes between nodes; rqlite kinda defeats much of the purpose for me)
* Dqlite (in-process server thread + Raft + streaming changes; Dqlite obscures that there are server threads plus network communication involved, but it's there)
* LiteReplica (VFS extension streaming to a replica)
* Verneuil (VFS extension -> object storage)
* SQLite Ceph VFS (VFS extension storing to RADOS)
Very different sets of tradeoffs. E.g. several of these require you to run a separate tool to do the replication. Several depend on object storage, either for the primary storage (the Ceph extension) or as the replication target / something to restore from, which may be fine if you already have a compatible object store. Some can use e.g. Consul to control which server is the writer.
> Another approach I like is to use replicated block storage like GCP Regional Persistent Disks. This makes failover trivial. But then I'm not sure the replica can be used as a read-only copy.
Block storage that can guarantee in-order read/writes is fine. Network filesystems that can guarantee in-order writes and support proper locking can also allow writes (with the usual caveats) from multiple nodes. The problem here is that you really must be sure, and often - e.g. for NFS - it can appear to work just fine but be just broken enough you'll run into it at just the wrong moment.
As far as HA goes, that abstraction can then go toward allowing your apps and DB to be built for different reliability/durability/etc requirements, potentially saving money, gaining better performance or reliability, meeting weird compliance requirements, contractual requirements, vendor or software limitations, etc.
It's also easier to troubleshoot and maintain the database separately from the application in some circumstances. If SQLite is bundled with your application, it might be harder to deal with a problem affecting one or the other. In traditional organizations, you can hire people to manage databases and people to manage applications, rather than one person who has to deal with both, or, even weirder, two people working on one db-in-your-app.
(To be clear: the company I work for sells a Postgres service, and we benefit from n-tier databases! I have strong opinions here because I think this is an interesting controversy.)
Or you could use one of the many off the shelf solutions that has sprung up over the last few years.
SQLite gives the guarantee that if your frontend app is running, then the DB is available as well. Try doing that with a non-embedded DB.
If you are worried about high availability, chances are you too are building a networking layer on top of a database, so what do you need two networking layers for?
(PS. Mostly just pointing out that replication is still a bit awkward.)
With a tool like LiteFS, it is now possible to get replication done by a sidecar process: https://fly.io/docs/litefs/
If someone already wrote that networking layer, why would I want to do it again? And run into all of the bugs that they already discovered and solved?
Moreover, SQLite requires the place you run your application to have durable storage, which is a huge departure from the status quo.
It’s definitely neat, but the stack as a whole doesn’t strike me as mature enough to replace Postgres just yet.
You sacrifice things to get that speed (Postgres features, set-and-forget write concurrency). Nobody is saying there's no reason to use Postgres, or maybe even that Postgres is the right call most of the time. But the idea that SQLite is rarely appropriate for concurrent serverside applications? It's received wisdom and it's wrong.
Somebody across the thread actually suggested that WordPress was an example of the kind of application that SQLite wouldn't work for, that needed an n-tier database. (Leave aside the fact that WordPress doesn't support SQLite and instead has a longstanding MySQL dependency.) WordPress! WordPress is a perfect example of a concurrent serverside application that probably should almost exclusively use SQLite. As I said in a different comment: the whole movement towards static site generators is, in large part, a reaction to how bad n-tier databases are for a very large, popular class of concurrent serverside applications.
For most enterprises, running a stateful app with SQLite is not only possible, it's easier than running stateless apps, as their entire IT infrastructure is set up to support stateful apps and it's what their sysadmins know how to do.
I have tried to deploy stateless docker container apps into a typical enterprise network and it's a nightmare. The people don't understand it and are not interested in understanding it, none of the infrastructure supports it, and if you want to set up something like Kubernetes you end up in configuration hell, as none of the easy-to-deploy standard setups actually works out of the box in an enterprise network; everything ends up needing highly customized configuration to work.
You "transmogrify container images into Firecracker micro-VMs" - surely you knew what I meant?
And there is no denying that fly.io is a _particularly_ well-suited platform to run stateful SQLite apps (with or without Litestream), since fly makes running stateful applications easy. I think that's awesome.
I'm really confused why you seem to be trying to distance yourself from that, and why you are latching on so hard to my passing mention of fly. I wasn't making a dig, just pointing out that most people are stuck with (arguably worse) platforms that don't lend themselves to stateful, "in-core" databases.
I'm just confused, is all. :)
That seems like when you'd need a standalone database, no?
There are certainly use cases where SQLite is not a good fit. There are likewise use-cases where PostgreSQL is not a good fit either.
One is not better than the other, it just depends on your particular needs for that particular project.
My point is, SQLite is a totally sane and reasonable storage/DB solution for many server side applications as well.
Having durable storage used to be the status quo for decades. It changed only recently with cloud providers (or their customers) pushing for stateless workloads because they are much easier to manage in a distributed system than stateful workloads.
So the cost is in the initial setup of the client instance, which needs to download the whole database into its local SQLite?
The point being, you run 1 instance and you have litestream replicate to your backup node for HA purposes.
Now you are going to think: what about scaling?! How many apps actually need to scale beyond one node? Very few. If you run into scaling problems, that is when you deal with them, because scaling is unique to each application. But before you remotely think about scaling past one node, you just build the node bigger. Nodes can get pretty massive these days.
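For concreteness, the Litestream setup described above is just a short config file; the database path and replica URL here are placeholders (see the Litestream configuration reference for the full format):

```yaml
# /etc/litestream.yml: continuously replicate one SQLite database
dbs:
  - path: /var/lib/app/app.db
    replicas:
      - url: s3://my-backup-bucket/app
```

Litestream then runs as a sidecar (`litestream replicate`), tailing the WAL and shipping it to the replica, from which you can restore on your backup node.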
In a similar vein, an internal subsystem can probably do away with a dedicated database server.
The latter is likely more common.
If that changes, I'd agree with your point, but currently that's a big constraint.
On my laptop, 16 processes accessing the same SQLite db file can do something like 5,000 write transactions/second. That seems like plenty for many purposes?
A caveat: it's in WAL mode, with synchronous mode set to NORMAL. This means that it's not fsyncing the WAL on each individual transaction commit. Instead, they get synced in batches roughly every 200ms. If you pull the power plug on the server, it may lose some amount of writes that happened in that time. Terminating the process is fine.
Yeah, yeah, we're giving up some amount of durability. In practice, not very much (200ms of transactions) and only very rarely (when the server dies). That seems like a fine trade-off for many backend processes and many web apps. You could tune this, as well: for the transactions that you _really_ want durability on, run a checkpoint before rendering your response to the user.
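The setup described above is just two pragmas; a minimal sketch in Python, where the file name and `events` table are illustrative:

```python
import sqlite3

conn = sqlite3.connect("app.db")
# WAL mode lets readers proceed while a writer is active.
conn.execute("PRAGMA journal_mode=WAL")
# NORMAL fsyncs the WAL only at checkpoints, not on every commit,
# so a power failure can lose the last few hundred ms of commits;
# a clean process exit loses nothing.
conn.execute("PRAGMA synchronous=NORMAL")

conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("INSERT INTO events (payload) VALUES (?)", ("hello",))
conn.commit()

# For a transaction you really want durable, checkpoint before replying:
conn.execute("PRAGMA wal_checkpoint(FULL)")
```

Both pragmas are per-connection settings, except that WAL mode is persistent: once set, the database stays in WAL mode for all connections.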
Further, some form of write serialization is common to most database clusters: the most common cluster architecture is a single write leader with multiple read replicas, which has the effect of serializing writes as well. Those clusters are all also taking advantage of the read-heaviness of applications.
This is also after playing with settings about WAL mode.
It would not be a viable solution to split these tables into separate databases to avoid the concurrency issue. Also, most of the issues involve writes to the same tables.
Is there honestly any way to get that working well with current SQLite, or do we need to wait for hctree or BEGIN CONCURRENT to mature?
I would be concerned that to get this working in its current state would require a large refactoring of the app which would require a lot of effort and would introduce limitations that make coding more difficult.
I'm not talking about clusters and replication, just a single database on a single server, like Wordpress for example. It can handle large amounts of concurrent users if you're using InnoDB (which has been the default since how long?) and nobody will even notice that there's anyone else because it's not locking the table (or the whole database) to write, and you don't have to wait or deal with frequent busy timeouts.
Having one writer and multiple readers was fine, but even having a dozen writers who update something a few times per second quickly led to the DB becoming the bottleneck. I tried different modes, but saw no difference so I just gave up on using it concurrently.
There were 500MB-40G databases, probably ~1.5 updates per second at peak load?
Speaking of Wordpress though, that’s a pretty great example of a web app that typically handles almost exclusively reads. I’ll bet almost any Wordpress instance would work great on SQLite.
It's also really not a big switch for the most part. I'm not aware of that many MySQL-specific things happening in WP. Things like fulltext indexes aren't in core, so it's really just using a different driver to get the basics working. I'm sure it would break a lot of plugins, but that's fine, they have platform requirements anyhow, adding "does work with sqlite" isn't a huge step and can likely be automated to a high degree (if the plugin never uses wpdb->query, ->get_results etc, it's compatible).
If that node failed, then the php application and database would be restored to a new machine and traffic would resume?
this: "There is no support for recovering from a power failure or operating system crash. Such an event may corrupt the database."
it depends on the nature of your app as to whether either of those scenarios are acceptable. the former comes with downtime.
> The prototype is still missing features and contains many bugs.
But it works well enough to test.
I understand that sometimes, for performance, it is necessary to de-normalize huge data lakes, but a typical website controlled by a CMS is not huge by any means.
Here's an example:
That page shows every row where the tags column (a SQLite JSON list of strings) contains the string "Data Import"
Here's the underlying SQL:
:tag in (
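The quoted SQL is cut off, but a query of the shape it describes (matching a string inside a JSON-array column) can be sketched with SQLite's json_each() table-valued function; the table and column names here are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, tags TEXT)")
conn.executemany(
    "INSERT INTO docs (tags) VALUES (?)",
    [('["Data Import", "ETL"]',), ('["Visualization"]',)],
)
# json_each expands the JSON array into rows, so the :tag parameter
# is compared against each individual element.
rows = conn.execute(
    """
    SELECT d.id FROM docs d
    WHERE :tag IN (SELECT value FROM json_each(d.tags))
    """,
    {"tag": "Data Import"},
).fetchall()
```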
Here’s the relevant Takahe issue: https://github.com/jointakahe/takahe/issues/325
The server load is currently too high. Please try again later.
Current load average: 5.190000.
Load average limit: 4.000000
It would be nice to have foreign keys enabled by default, and to make all tables STRICT by default. A new binary format that reduces the size of the database by leveraging table strictness and type info could also be great.
I am highly interested in parallelism and high concurrency. I implemented multiversion concurrency control in Java.
I am curious how to handle replication with high concurrency. I'm not sure how you detect dangerous reads+writes to the same key (tuples/fields) across different replica machines. In other words, multiple master.
I am aware Google uses truetime and some form of timestamp ordering and detection of interfering timestamps. But I'm not sure how to replicate that.
I began working on an algorithm to synchronize database records: do a sort, then compute a hash for each row where hash(row) = hash(previous_row.hash + row.data).
Then do a binary search on matching/non-matching hashes. This is a synchronization algorithm I'm designing that requires minimal data transfer but multiple round trips.
The binary search first checks the end of the data set, comparing hash(replica_a.row[last]) == hash(replica_b.row[last]), then splits the hash list in half and checks the middle item; this tells you which row, and which columns, are different.
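A toy version of that scheme, with hypothetical helper names, assuming both replicas hold the same number of rows in the same sort order:

```python
import hashlib

def chain_hashes(rows):
    """h[i] = sha256(h[i-1] + row_bytes): each hash covers all rows so far."""
    hashes, prev = [], b""
    for row in rows:
        prev = hashlib.sha256(prev + row).digest()
        hashes.append(prev)
    return hashes

def first_divergence(h_a, h_b):
    """Binary search for the first index where the chained hashes differ.

    Because each hash folds in everything before it, hashes match at
    index i iff the replicas agree on rows 0..i, so the match/mismatch
    sequence is monotonic and binary search applies. In a real system,
    each probe would be one network round trip fetching a single hash.
    """
    if h_a[-1] == h_b[-1]:
        return None  # replicas identical: nothing to transfer
    lo, hi = 0, len(h_a) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if h_a[mid] == h_b[mid]:
            lo = mid + 1
        else:
            hi = mid
    return lo

# Two replicas, sorted the same way, diverging at row index 2:
replica_a = [b"r0", b"r1", b"r2", b"r3"]
replica_b = [b"r0", b"r1", b"rX", b"r3"]
```

This gives O(log n) round trips per divergent row; handling inserts/deletes (rows present on only one side) would need extra bookkeeping not shown here.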
This is a pattern commonly used with other databases such as PostgreSQL and MySQL which avoids a huge amount of complexity involved in multi-leader setups.
I only ever do single leader, multi follower like you describe.
While not without limitations (eg it is conceivable that the concurrency backend is made more difficult by earlier choices) overall it is a remarkable pattern which has produced gems like spatialite.
Is there anything else like this out there?
"In SQLite a query plan is compiled into a bytecode program that is executed by a virtual machine, named Virtual DataBase Engine (VDBE)... In the original SQLite implementation the VDBE operates on the real B-trees of the database... In contrast the Comdb2 VDBE operates on the cdb2 abstraction of a table during the OCC phase of transaction execution..."
I'm guessing just the interface. If there are any well-understood quirks of the regular implementation, maybe they'd want to replicate them too, idk.
Quantity has a quality all its own. Fragmenting your database over thousands of servers introduces new classes of failure and greatly increases the risks of other types of failures that you don't have to deal with when you only have dozens of servers or a single server.
High-density storage does require strategies for recovery that reflect the bandwidth-to-storage ratios. Designers are usually aware of the implications and architect appropriately.
Typically for postgres, DB corruption issues are isolated to single tables/indices, not the entire database. Sqlite's architecture might be a bit less "resilient" in this sense (I'm not sure, just guessing), but it also might have a lower chance of introducing corruption in the first place.
As cyclical as tech can be, I find it difficult to think that we'll go back to that style of architecture. There are just so many benefits to having an application server sit in between the client and the database. Going back to green screen terminals hanging off a central computer seems more likely, and that model doesn't need its database networked either.
For example, you might run the app server and DB on an expensive highly available server, but keep your background workers on cheaper spot instances that might randomly get killed. Or you're running some heavy processing that needs different hardware like a GPU for machine learning.
You could, of course, implement API endpoints for that in your app, but then you need to keep updating it when the workers change. Or you could implement a more generic DB access endpoint, but then you're just reinventing a networked DBMS, but with worse performance and no library support.
Not true. What has been true until recently is that replication solutions for SQLite have been lacking, necessitating the use of an old school database to lean on existing multi-machine database solutions, but that is no longer the case.
> You usually want a separate worker process (or multiple) to do heavy computation in the background
Also not true. Network overhead introduces things like the n+1 problem which only adds unnecessary computation and breaks the relational model, requiring some pretty insane hacks to work around. What is true is that SQLite write contention has been a problem, necessitating the use of an old school database in high write environments, but that is also a problem on its way out.
Networked databases have been the norm until recently because they're older and more mature and the only practical solution in many cases thanks to that maturity, but SQLite is starting to gain the same maturity and we are now able to rethink the model and gain the benefits of data locality.
Besides, if you really need networking for your niche use case, and have some reason to use SQLite, there is already rqlite. Tightly coupling networking with the database engine doesn't add any value. They are decidedly distinct layers of concern. If Postgres or MySQL were rewritten from scratch today, even if protocol compatible, no doubt the separation between the database engine and the networking layer would also be more explicit.
That allows you to easily spin up/down e.g containers, migrate the application between nodes etc without having to maintain a high performance network file system.
Using SQLite in such an environment requires you to solve the persistence layer either in general (≈ rook/ceph) or specifically for SQLite (≈ litestream/litefs/rqlite, depending on your need). I think one probably could argue that rqlite is essentially a network protocol for SQLite.
The application is stateful even if the backend is stateless.
Even if you consider the application to be the combination of stateless binary + stateful auxiliary services (like a database), the statelessness of the binary allows for some neat things.
I replied to a comment talking about how network-based database access is not necessary, and I think it very much is if you want to host stuff with the current popular architecture, unless you implement a general shared persistence layer on which you can run e.g. SQLite, but that doesn't seem better/easier to me.
So the SQLite4 stuff didn't just get completely abandoned. I guess it was a good experiment.
> In all cases above, "spin-lock" actually means invoke the SQLite xBusy callback if one is registered, or to literally spin-lock otherwise.
Seems the wait behavior is user configurable.
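For example, in Python's sqlite3 this surfaces as a configurable busy timeout (the file name here is illustrative):

```python
import sqlite3

# The connect() timeout installs a busy handler that retries for up to
# 5 seconds when another connection holds a conflicting lock.
conn = sqlite3.connect("shared.db", timeout=5.0)

# The same knob is also reachable at the SQLite level, in milliseconds:
conn.execute("PRAGMA busy_timeout = 5000")
```

With no handler set, a lock conflict fails immediately with SQLITE_BUSY (an OperationalError: "database is locked" in Python).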
I will say that one misconception people often have about rqlite is that it's a drop-in replacement for SQLite. You see that a bit even in the comments section here. That was never its goal. Its goal is a lightweight and super-simple to use distributed relational database. Using SQLite as the database engine was the perfect way to meet that requirement.
SQLite is a (relatively) simple in-process SQL RDBMS. You get proper, reliable SQL on a single computer, which is practically useful in many more cases.
It is possible to build distributed DBs on top of a bunch of SQLite instances, but one would need to solve the very distributed-systems problems that etcd solves.
In fact, some of the popular distributed DBs use etcd or ZooKeeper, or custom implementations of the same algorithms on top of traditional RDBMSes.
K8s is a distributed system that needs to have a reliable view of its nodes. Clearly, a consensus-based db is necessary.
edit: almost -- available and partition tolerant at least in the default config:
CAP is about what happens in case of a partition (P). You can either remain available (AP) or you can remain consistent (CP).
If you don't have partitions then CAP doesn't apply. But I wouldn't recommend depending on that. For example Google's Spanner is CP, but they work really hard to make partitions rare, giving them 99.999% availability.
This is more resilient than SQLite of course, which runs on a single node, and therefore can't remain available if any node (= the only node) fails.
The drawback is that it's not replicated. With etcd, you set up multiple control-plane nodes (e.g. 5) and can tolerate a minority of them being down without any effect (e.g. 2 of 5 down). With SQLite, you can only have one control-plane node, and if it's down your control-plane is unavailable. This is fine for small clusters where you don't want to run multiple control-plane nodes, or you don't think it will go down, or you don't mind fixing it.
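The fault-tolerance arithmetic behind that majority-quorum trade-off, as a quick sanity check:

```python
def tolerated_failures(n: int) -> int:
    """A majority quorum of n voting nodes stays available as long as a
    majority is up, i.e. it tolerates floor((n - 1) / 2) failures."""
    return (n - 1) // 2

# 1 node (single SQLite instance): tolerates 0 failures.
# 3 etcd nodes: tolerates 1 failure. 5 nodes: tolerates 2.
```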
People who use SQLite want to be able to scale it to handle higher concurrent write loads, including from multiple processes.
Edit: I was wrong, this makes more sense now. This isn't comparable to Postgres.
I don't know if they've swapped the backend, or just use it as-is. SQLite can scale vastly if you use it for file IO and do sharding/replication on top of it as a separate concern. I've done ad-hoc (extremely crude) versions of that in some apps.
This definitely helped them make the choice to run SQLite on the server.
But I still think it's super cool. We're talking 1.8 billion devices with data stored as SQLite files in the cloud.