Hacker News new | past | comments | ask | show | jobs | submit login
Consider SQLite (wesleyac.com)
362 points by wesleyac on Dec 29, 2021 | hide | past | favorite | 267 comments

We've been using SQLite in production as our exclusive means for getting bytes to/from disk for going on 6 years now. To this day, not one production incident can be attributed to our choice of database or how we use it.

We aren't using SQLite exactly as intended either. We have databases in the 100-1000 gigabyte range that are concurrently utilized by potentially hundreds or thousands of simultaneous users. Performance is hardly a concern when you have reasonable hardware (NVMe/SSD) and utilize appropriate configuration (PRAGMA journal_mode=WAL).

In our testing, our usage of SQLite vastly outperformed an identical schema on top of SQL Server. It is my understanding that something about not having to take a network hop and being able to directly invoke the database methods makes a huge difference. Are you able to execute queries and reliably receive results within microseconds with your current database setup?

Sure, there is no way we are going to be able to distribute/cluster our product by way of our database provider alone, but this is a constraint we decided was worth it, especially considering all of the other reduction in complexity you get with single machine business systems. I am aware of things like DQLite/RQLite/et.al., but we simply don't have a business case that demands that level of resilience (and complexity) yet.

Some other tricks we employ - We do not use 1 gigantic SQLite database for the entire product. It's more like a collection of microservices that live inside 1 executable with each owning an independent SQLite database copy. So, we would have databases like Users.db, UserSessions.db, Settings.db, etc. We don't have any use cases that would require us to write some complex reporting query across multiple databases.

> It is my understanding that something about not having to take a network hop and being able to directly invoke the database methods makes a huge difference. Are you able to execute queries and reliably receive results within microseconds with your current database setup?

with SQL Server you can get a very fast local connection by specify "server=(local)" in the connection string - this uses shared memory protocol bypassing the net stack.

This is true - The counterpoint is that now you have this leviathan that is the SQL Server process(es) running on the same machine as the application.

If I am constraining my SQL Server usage to fit on 1 box, I might as well use SQLite (assuming no future plans for horizontal scaling).

How is this handling privledge separation? Typically you'd have all accesses as root in SQLite.

>Typically you'd have all accesses as root in SQLite.

So? It's the backend talking to the db.

Systems of least privilege prevent the levels of access exploitation (either vuln or misconfigure) from going further than is risk assessed. Or from "accidents" like a little utility script nuking the database with root privilege. Or having multiple entry points with their own tables of data. That's the difference between "hackers stole X but not Y and Z" and "hackers dumped everything".

Your "backend" should not be assumed to be the infallible security. Outside of limiting DB access procledges, we have proper selinux and other configs to lock down the software abilities so even in the case of Arbitrary code execution caused by a common exploit it may be very difficult to use. This is important when you're running a box with multiple other applications with their own DB's as well.

Infact these reasons are why many times the "backend" software and DB are hardware isolated or at least VM isolated so that those db access limitations aren't bypassed.

I haven't done anything like this before, but simple FS level file permissions should resolve the issue as long as the data with separate priv requirements is put into different sqlite databases.

Oracle clients can do the same by setting the ORACLE_SID and ORACLE_HOME environment variables.

For remote databases, setting the TWO_TASK environment variable to the server's TNS descriptor is one way to force a network login.

How is this setup fault tolerant? What happens if there is a hardware failure? How do you partition access in a way that means an extremely active user doesn't impact availability?

> How is this setup fault tolerant?

It is not.

> What happens if there is a hardware failure?

The product would suffer a total outage until manual intervention takes place. A restore of the VM from snapshot would be carried out by the customer. Some loss of the most recent business data would occur (i.e. between latest snapshot and time of failure). All of this is understood and agreed to by our customers.

> How do you partition access in a way that means an extremely active user doesn't impact availability?

Partitioning is not applicable. Our entire product fits on 1 machine and dividing the I/O along any dimension does not add much value to the performance equation.

Wait a minute....are you implying that I don't need Azure and FB level infrastructure for my local recipe application?


I'm pretty sure you do need FB and Azure level infra for you recipe app. I've read a couple blog posts, watched a video on YouTube and copied the code from Stack Overflow, so I'm pretty much an expert on this, so trust me.

My recipe application is going to be written in a purely functional language with CQRS and a message bus that provides "guaranteed once" determinism. I should be able to spin up no more than a half dozen containers on two VM's to handle this architecture. The database will probably be redundant MSSQL and Oracle (just in case one technology is fundamentally broken). Database writes will be proxied by a Redis instance. I will use 3x 3080 GPU to train a ML model used for scaling the recipe depending on the number of people you wish to serve. Hmmm...I might need a third VM.

You can't do any serious neural stuff on 8GB of VRAM. I'd say save yourself some pains and buy the $3000 professional models with ECC so you can train transformers.

It's never gonna scale. Get on the NoSQL train now to future proof and woo the VCs.

Yes. Make it web scale.

Also, think early on about your compensation packages. You don't want to lose a 10x engineer to a FAANG, do you?

> Make it web scale.

Oh god whenever I read/hear that it reminds of the ridiculously funny video about nodejs and Apache servers. [1]

[1]. https://youtu.be/bzkRVzciAZg

Glad you liked it! The reference to this series of videos was intended of course.

The specific video I had in mind was actually https://youtu.be/b2F-DItXtZs


Oh yes! I’ve seen this before as well. Should’ve guessed it’s this considering we were talking about sql.

This video is just gold.

asyncio is probably one of the worst things that happened to programming in the last 2 decades. It's essentially a step backwards, to the technology of the 90s (cooperative multitasking), when a badly written text editor could cause an entire OS to freeze.

Curious why you think that. We have an app that does computation using thread pools, and moving from manually-created threads to asyncio has made it much more efficient and robust.

Some advantages: cancellation and error handling are trivial, there's no need for error-prone thread coordination, threads don't have to waste time waiting on locks (because we use asyncio sync primitives instead, so we don't submit work to a thread pool unless it's doable right now with no wait).

Of course, it depends on the tasks being written correctly. But "don't block the event loop" is a much easier rule to follow than "don't deadlock these 10 different threads that share queues and locks".

We didn't write an entire OS with it, but I don't think that was ever the point of asyncio, was it?

> The database will probably be redundant MSSQL and Oracle (just in case one technology is fundamentally broken)

Only one technology?

Interesting. For an extremely specific use case and with users who understand and accept the caveats of this approach I'm sure it would work well enough. The most confusing thing to me is that there is apparently an intersection of users who are ok with an outage and data loss with users who want a product which can

> execute queries and reliably receive results within microseconds

What is your product? Who are these users?

I can think of plenty of services that an occasional (once a year? less?) outage is okay. Heck, anything relying on AWS us-east-1 is going to have outages that frequently based on the last few months. Meanwhile, almost any service is better off when its response times are cut drastically. I’ve seen many instances where a service’s response times are more than half waiting for a db to respond.

It's not the threat of an outage with data loss that is concerning to me- I just want to understand use case that needs fractions of a second shaved off of query times by using SQLite in this way that is also ok with the possibility of data loss.

Interactive interfaces. There's a huge difference between moving a slider and seeing the reaction in real time and moving the slider and seeing the reaction a second later. If you define "real time" as 30 fps, you have 33ms to process each query and show the result. That could involve multiple database queries with computation in between, if your business logic isn't easily expressible in SQL.

Come to think about it, that covers most apps with UI. Apps where you are exploring data are definitely more impacted however.

I'd rather have a service that is down one day per year, rather than death by thousand papercuts of a slow UI every day.

Imagine some frustratingly slow app like Jira for example, i would be much happier with a snappy UI there, where one could theoretically still work for a downtime of week or more without updating your task status.

If we are talking about shaving the last milliseconds, it could be something very interactive like autocompletion. Which is something with a graceful fallback of typing in the whole thing manually.

If my webpage loading makes ten database calls, and each database call takes 45 milliseconds, my webpages takes 300 milliseconds longer to load than if each call only took 15 milliseconds. 0.3 seconds across each page load, especially if the rest of the call is less than a second, can significantly improve the usability of a website and increase user retention. This applies for any user facing website whether its a social media platform, an online shop, or just a blog.

Obviously there are other improvements that can be made too, like batching db requests or in-memory caching, but this is one option.

All users understand and empathise when you say "Sorry, the system is down right now" once or twice a year.

None of them display any understanding or empathy whatsoever when you say "Your trade will always be executed after a 1 second delay, even if the price has moved"

Users find occasional downtime awful, but they find consistent lethargy worse.

No customer in fintech is going to accept the "we lost some data transactions" and buy the software so your use case is covered in that they are up front with the customer that if the server goes down any transaction in progress will not complete.

Financial transactions, especially payments, work under the assumption that any node in the path from "payment attempt made" to "Payment attempt successful" can go down and not have an incorrect record of payment.

Authority? Me - I worked in the payment space for EMV for a number of years. There are (maybe) a minimum of five nodes in the path from swiping a card/making a payment to a successful payment, and each of these computers/systems are owned by different companies, all of which don't trust the other not to be a MitM, and all of which are designed to not lose transactions if any of the systems involved suddenly power off.

It's almost a myth at this point that a system doing financial transactions needs 100% uptime or it will lose transactions.

Trust me, even a system with only 80% uptime is never going to mislay transactions. The worst that happens is that transactions won't be processed during the downtime, payment or transfers are declined and the entity making payment/transfer can use the decline message to try again later.

If you're quadrupling your development and infrastructure cost in an attempt to reduce your downtime from twice a year to once every decade you're simply throwing money away[1].

[1] If you're doing that, it's also possible that you don't know of the many fail-safes built into the protocols for message exchange. Remember that these systems worked fine prior to HA/Always-Available cloud systems. Adding HA cloud tech is not going to make them more resilient than they are now, it just introduces additional points of failure.

There is a difference between losing data, and losing transactions. Transactions can be replayed from the message log (depending on the implementation) and reprocessed if needed.

I have seen plenty of recovery operations in finance, sometimes cases where the entire database server was lost, and even though the transactions were not saved, no actual financial data was ever lost, as this data could always be reloaded.

Of course, certain systems exist that do store data in a way where loss is not accepted, but even in finance these are somewhat rare compared to the vast number of other systems.

> users who are ok with an outage and data loss

It depends on how you couch it. Working with large fleets of cloud servers has jaded us (well, me) into forgetting just how high you can get the uptime of a single server or well-managed VM if you're careful.

Sure, power cords get tripped over and hardware fails sometimes, but I'd imagine a lot of users who do care about reliability are making totally reasonable judgements that single points of failure in their systems are acceptable if the SPOFs are hardened and the oh-shit plan (restoring from a VM snapshot in this example) is something that they have confidence in.

Heck, given the availability issues that arise from the complexity of redundant/HA systems, I think there are probably plenty of seasoned enterprise customers who prefer hardened SPOFs managed by experts over distributed systems, especially once the dimension of cost enters into the discussion.

For starters, how likely is a data loss? SQLite may lose a transaction if the server shuts down or the application crashes, but this doesn't mean the db gets corrupted.

Secondly, even if that happens, the fact that the database is a simple file on the production server, makes it easy to back up and restore.

Thirdly, a "traditional" db is no proof against loss either...sure, the production server with the SQLite can go down...so can the container running Postgre.

And lastly, actually most services are doing perfectly fine when a data loss occurs. Take a message board for example. Worst case scenario: A post isn't written and someone has to type a few lines and emojis again.

I agree that 'reliably' would be a poor choice of diction when considered in your context.

The user group of our product is effectively only professional bankers and their managers. No one in the general public has any direct access.

For the extremely specific use case of "almost everything" and the users "most everyone, maybe at the right price point", yeah.


No need to be condescending when I'm trying to learn. The premise of using SQLite because it was easier and to save fractions of a second didn't make sense to me as a tradeoff for potentially losing data

As long as you're estimating your future costs correctly then you're golden. If you aren't and your application becomes complex through growth or you need resiliency then you'll need to pay that cost and that cost can be big.

Interesting. So I would ordinarily want to put a foreign key constraint on the user_id column of a UserSessions table (or similar). In general, presumably you have relationships across the tables that are in those discrete databases. Do you just enforce these constraints/do joins in code? It seems like splitting related tables across multiple databases loses some (possibly a lot?) of the benefits of relational DBs, so I'm curious how you handle/manage that.

That said, I love the idea this architecture. Might use it for whatever next dumb little web service I cook up! I love how this simplifies a lot of dev/deployment ops, perfect for a side project.

You'd have to do the join on the client. This is a problem in general for scaling SQL - its not impossible, but with sharded SQL servers you have to be very deliberate in how you do joins so as to coincide as much as possible with your sharing strategy.

It would be similar here. According to the author each sqlite DB belongs to a single microservice, which will naturally group together the most common of joins. Anything else will indeed have to be manually joined.

Part of the reason nosql became popular for a bit. That's reversed, and sharded SQL is pretty common now, but it definitely adds more cognitive load to schema design.

SQLite has ATTACH (https://sqlite.org/lang_attach.html) for cross database operations. I've never tried it for foreign key constraints across databases, but I think it would work?

> So, we would have databases like Users.db, UserSessions.db, Settings.db, etc.

How do you do joins?

Not the OP, but you can attach and join tables from multiple sqlite databases in the same process.

Really??? How? Because I use sqlite, and the query functions that I know take a single DB as an argument.

I would guess they do parallelized calls to the microservices to collect the data.

Are you capable of achieving no downtime deployment ? I mean, on the product I currently work on, we have one mongo database, and a cluster of 4 pods on which our backend is deployed. When we want to deploy some new feature without having any downtime, one of the pod is be shut down, our product still work with the 3 remaining pods, and we start a new pod with the new code, and do this for the 4 pods. But with SQLite, if I understand correctly, you have one machine or one VM, that is both running your backend, and storing your SQLite.db file. If you want to deploy some new features on your backend, can you achieve no downtime deployment ?

This is really interesting. Would you mind sharing a bit about how you keep the data that is shared across services in sync? Or is there a hard separation of concerns so that services would only commuincate with the service that owns that data to obtain it?

To my knowledge, WAL mode still needs to serialize writes for each database file. I'm assuming this is not a setup where there are too many concurrent writers?

Correct - We are using the typical SQLite build which serializes all writes by default and we have made no effort to undo this behavior. We actually rely on this to ensure things happen in the right order and take advantage for performance reasons.

Because SQLite is effectively serializing all the writes for us, we have zero locking in our code. We used to have to lock when inserting new items (to get the LastInsertRowId), but the newer version of SQLite supports the RETURNING keyword, so we don't even have to lock on inserts now.

Also, the fact that we have the databases divided across function helps free up some of the lock contention in the provider. We don't really have any mixed workload databases - its either "slow" gigantic operations (JSON blob access), or super quick tiny things (updating session timestamps). So, there is some minor isolation of contention issues on a subsystem basis.

Testing with JMeter, I got upto 10,000 concurrent writes on SQLite with WAL mode enabled

Is this a situation where multiple web servers in a farm are accessing the SQLite databases on a file server?

This is a situation where a single process owns all of its SQLite databases and handles everything out of 1 machine.

Is the process multithreaded? I guess WAL takes care of that.

How do you join?

I believe SQLite is about to explode in usage into areas it’s not been used before.

SQL.js[0] and the incredible “Absurd SQL”[1] are making it possible to build PWAs and hybrid mobile apps with a local SQL db. Absurd SQL uses IndexedDB as a block store fs for SQLite so you don’t have to load the whole db into memory and get atomic writes.

Also I recently discovered the Session Extension[2] which would potentially enable offline distributed updates with eventual consistency!

I can imagine building a SAAS app where each customer has a “workspace” each as a single SQLite db, and a hybrid/PWA app which either uses a local copy of the SQLite db synced with the session extension or uses a serveless backend (like CloudFlare workers) where a lightweight function performs the db operations. I haven’t yet found a nice way to run SQLite on CloudFlare workers, it need some sort of block storage, but it can’t be far off.

0: https://sql.js.org/

1: https://github.com/jlongster/absurd-sql

2: https://www.sqlite.org/sessionintro.html

I evaluated sqlite for a web extension but ultimately decided it wasn't worth it. There is no easy way to save the data directly to the file system. And saving the data in other ways meant I was probably better off with IndexDB instead. Still it is a tempting option and one that seems to work well for separate tenancy.

> There is no easy way to save the data directly to the file system.

That's what absurd SQL is for (link in the parent comment).

I read that one and agree it feels absurd. Not something I want to depend on.

I believe the absurd things are: - the idea (i.e. storing a database in a database) - the fact that the resulting storage is faster than vanilla indexed db.

The repo's readme is a bit misleading and it does make it look like a mad scientist experiment but you can read more about it here (really interesting stuff): https://jlongster.com/future-sql-web

Obviously this project is still young, but I find it convincing (disclaimer: I don't use it yet).

I’m 100% sure this is (very nearly) suitable for production. It works and works well.

It is only a matter of time before browsers implement the proposed sandboxed virtual file system apis with block level access and this would be superseded.

Do you know of any oss projects using absurd? Or maybe an ORM built on top of it? It doesn't seem like there's any ecosystem built around it yet, and their example project is too trivial to be useful. I'd have to have to built even the most basic CRUD functionality from scratch.

It’s really not that much trouble. Most of the ORM like things over IndexedDB introduce extra bugs and slowness for you to deal with anyways - SQLite is millions of times easier to use than IndexedDB, so you’re getting a good deal out of considering AbsurdSQL “just” an ORM for IndexedD… as long as you don’t need to support Safari.

Yeah but you don't need to go through a web worker to use any other database product. Seems like with absurd you'd have to create a separate worker for each query and write your CRUD stuff in a potentially clunky, asynchronous style by passing messages to/from those workers. Sounds painful to me.

When they tried to put SQLite into browsers I was excited but I had no project that needed it. It took years to finally get there, and wouldn’t you know it, they had just marked it as deprecated. KV stores are just not the same.

You can run it on AWS Lambda and store the SQLite file(s) on EBS [1].

[1] https://aws.amazon.com/blogs/compute/using-amazon-efs-for-aw...

That's EFS, not EBS. EFS is file-level storage (i.e., NFS); EBS is block-level storage (SAN). AWS Lambda only supports EFS.

Correct, I mispelled. The link points to the correct service.

I wouldn’t use a SQLite file over NFS. It has caused me locking problems when used by multiple clients.

NFS does have a relatively low limit in terms of lock. For some use cases it can handle it fine.

Having clients running on read-only mode, when possible, can help.

Also partitioning the database can help, but then it's too much trouble already.

> I can imagine building a SAAS app where each customer has a “workspace” each as a single SQLite db

I did just that at my (now defunct) startup a few years ago. We were building a collaborative database query tool. The software we built used sqlite to keep a local db for storing user credentials, caching query results, etc. Bonus, we were able to have the local database file be encrypted protecting the data at rest.

> we were able to have the local database file be encrypted protecting the data at rest. How do you store the password locally? Or do you requesting the passwords from a remote server?

The golang equivalent of native SQL can be found here - https://pkg.go.dev/modernc.org/sqlite

Very useful for pure go applications!

Wouldn't that be leaving one huge advantage of sqlite behind (namely it's insane test regimen and battle tested history)?

It’s cross compiled, so all the tests are brought along and test fine. And god, there are so many tests.

How would we deal with conflicts (e.g. syncing back several conflicting offline clients for the same user) with something based on the Session Extension?

You have to handle the merge conflicts yourself, see https://www.sqlite.org/session/sqlite3changeset_apply.html

So you need to be carful how you design your schema, but very possible.

One option is to use something like Yjs and a JSON column to get proper CRDTs for merging.


How does one deal with fault tolerance ?

Using the transaction API for modifications, and ordinary fault tolerance measures for the hosting application. It is no more difficult than any other app-foundation technical feature.

I've always thought it interesting that there was a time when large(ish) websites were hosted using servers that would struggle to outperform a modern smart toaster or wristwatch, and yet modern web applications tend to demand a dramatic distributed architecture. I like the examples in this article showing what a single modern server can do when you're not scaling to Google's level.

As an aside, what about distributed derivatives of sqlite, like rqlite, as a response to the criticism that sqlite requires your database server to also be your web server. Could something like rqlite also provide a way for an sqlite database to grow into a distributed cluster at a later point?


> and yet modern web applications tend to demand a dramatic distributed architecture

The dirty secret is: Most of them don't really need that architecture. Most, and I mean +95% of websites would run just fine on a single box, running the websrv, db and whatever script-interpreter the backend runs on.

Sure, it may be a bit slower if its not hooked up to a global CDN, and if the box goes down the user gets 404. But its also alot easier for an admin to simply give the one box a good wallop and put it back online, since it so simple.


I don't disagree with you, a single server can go a really, really long way in scale before you run into problems. I know because I've done it a few times.

The problem to me isn't ability to scale on one server, it's the single point of failure. My biggest site is a wordpress box with one instance on a pretty big VPS. In the last year I've had several outages big enough to require a post-mortem (not complete outages, but periods with high levels of error rates/failures), and every time it has been because of internal data center networking issues at my preferred cloud provider (and thankfully their customer service is amazing and they will tell me honestly what the problem was instead of leaving me to wonder and guess). So the main incentive for me to achieve horizontal scalability in that app is not scaling, it's high availability so I can survive temporary outages because of hardware or networking, and other stuff outside of my control.

I don't mean to be too glib, but to me this reads like: "I am concerned about SPOF so I outsourced that."

Except now you are not in control of plugging the cable back in, and pay (less?) for that convenience.

I'm not sure I understand your point?

Even if I ran it on my own hardware, where I was responsible for plugging the cable in, it would still have the same single point of failure problem in the event that my ISP started suffering Network issues. I've never had an ISP that didn't have some sort of issues at some point.

Alternatively, if you think I am blaming my cloud provider for being the single point of failure, please read again. I specifically mentioned that I want to fix up the application so that it is highly available. I entirely and completely blame myself for the current situation. I didn't build it (otherwise it would be 12 factor[1] from the start and horizontally scalable), but I maintain it now.

[1]: https://www.youtube.com/watch?v=REbM4BDeua0

Wouldn't a snapshot mechanism plus load-balancer handle this case and still keep the architecture simple?

The biggest problem is that the snapshot mechanism would need to traverse multiple data centers. The current arch uses the disk for storage in several ways which aren't easy to fix, and because of the way several of the plugins were built (with lots of the nastiest custom code I've ever seen) each time I start experimenting I run into weird breakage and debugging hell. Enough so that I decided it's not worth it for HA at this point. Will be rebuilding much of it in Elixir/Phoenix though, and that will be 12 factor/HA[1] from the start.

[1]: Me explaining 12 factor for anyone not familiar already (and I've actually had people who were well experienced with 12 factor tell me they got a lot out of this video): https://www.youtube.com/watch?v=REbM4BDeua0

> and thankfully their customer service is amazing and they will tell me honestly what the problem was instead of leaving me to wonder and guess

Which cloud provider is this, if you don’t mind sharing?

Sure! I wasn't going to say because I didn't want to people to judge them for the network issues, but it is Linode. I've become a huge fan of them and use them for almost everything now. The Newark data center has been struggling a bit with networking issues but they are working hard on the problem.

I really can't say enough good things about the Linode customer service. They respond timely and with honest info, which is invaluable to me as a head of engineering trying to make important platform and infrastructure decisions. After a terrible experience with Digital Ocean, before moving over to Linode (so I still had a pretty small account, less than $10/month) I actually opened some mostly stupid support tickets just to see how they would respond. They were professional, courteous, and helpful. For the record I have no affiliation with Linode beyond being a happy customer.

Consider, though, that in the days past (when your server would be probably on an equivalent of dual-socket Pentium 166-MMX), most clients would be coming from slow links like 33.4-56.1kbps dialup, and it wouldn't be a problem to serve them at all. Links were slow, users were patient, timeouts were high, webpages were sort of slim. Although if you ask me, they always have been heavy, just within the constraints of their time.

Then, of course, there was ISDN and xDSL, which would give you true to god whopping 128 kbits/s for a while. 64 kpbs if you were cheap. It took a while to get to affordable multiples of Mbits per second.

Now that there's at least 10 Mbps uplink from each residential subscriber, doesn't take long to DoS even a beefy server.

And I'd say that server-side, things improved vastly with advent of FastCGI and its equivalents. Back in that heyday of your P166-MMX server, it was CGI with Perl, spawning a process for each incoming request, or "blazing-fast" Apache's server-side includes, or other things like that. Maybe mod_perl with its caveats on memory sharing.

Anyway, you're right in that whenever you show them a wider pipe, they will find more stuff to congest it with.

Not even a decade ago we were hosting our web-facing services and our postgres DB on the same server. A lot of smaller projects had their own db and credentials on the same instance. The idea of having to spin up independent DB servers for each one seemed wasteful to us, since back then each box would cost at least £10/mo or more on Linode or Rackspace and it meant more secrets to keep, more stuff to secure, more stuff to maintain.

It was only with the advent of heroku and the sudden shift to the cloud (AWS) that the DB stopped being software you ran on your box as a daemon, and became essentially a dedicated resource. Even for services running at a completely trivial scale.

It's mostly because there is a demand for HA which requires multiple replicas. The moment you start going down that route you increase complexity.

Whether most things actually require HA is debatable, but a lot of businesses make it a core requirement and so it gets baked into the architecture. Personally I feel like most stuff would be better suited to having fast fail-over and recovery early on, but my advice rarely gets taken. Instead you end up with complicated HA architectures that nobody totally understands, which then (inevitable still) fall over and take hours to recover.

HA is generally needed because users of a SaaS can crash the instance especially in products that offer customisation abilities.

Sure, but that becomes a requirement at that particular design junction. A lot of stuff is built with HA that isn't even close to that complex.

Edge networking should work for SaaS, especially if your tenants tend to be geographically clustered. Stand up stateless front end web servers in half a dozen regions, and have them talk back to a more central region, with caches and circuit breakers to control the amount of cross-region traffic. That geographical separation will also make diurnal traffic patterns more obvious because you need more servers later in the day in PST than in EST.

You don’t have to distribute everything to scale pretty high.

There are some important things that SQLite does not do.

It is not client/server; a process must be able to fopen() the database file. NFS and SMB are options that can convey access to remote systems, but performance will not likely be good.

Only a single process can write to the database at any time; it does not support concurrent writers.

The backup tools do not support point-in-time recovery to a specific past time.

If your application can live with these limitations, then it does have some wonderful features.

> NFS and SMB are options that can convey access to remote systems, but performance will not likely be good.

And the safety of your data depends on the quality of your network filesystem's locking implementation. It's not too difficult to design a locking method that works most of the time, but it's a lot harder to build something that guarantees mutual exclusion over an imperfect network.

On a single machine, file locking is generally reliable (mediated by the kernel). Multiple clients can access the same database simultaneously, with readers and writers taking turns, and all that happens is you pay the performance penalty of serialized access.

But if your client thinks it has a lock on a remotely-stored database file, but the server-side lock has actually expired and another client is writing to the database, your data is likely to get hosed.


Databases on remote filesystems should be limited to SQLITE3_OPEN_READONLY access, agreed.

I'm experimenting with using SQLite to store users' history in fish shell, but the remote filesystems problem seems likely to be a showstopper. What can be done about it?

I'm just reading this, and learning a few new things:


SQLite has an alternate lock mode with dotfiles that seems to prevent database corruption over NFS. It is important that all SQLite accesses from all connected processes use the same lock mode.

"2.3. Two processes using different locking protocols

"The default locking mechanism used by SQLite on unix platforms is POSIX advisory locking, but there are other options. By selecting an alternative sqlite3_vfs using the sqlite3_open_v2() interface, an application can make use of other locking protocols that might be more appropriate to certain filesystems. For example, dot-file locking might be select for use in an application that has to run on an NFS filesystem that does not support POSIX advisory locking.

"It is important that all connections to the same database file use the same locking protocol. If one application is using POSIX advisory locks and another application is using dot-file locking, then the two applications will not see each other's locks and will not be able to coordinate database access, possibly leading to database corruption."

The problems with dotfile locking are:

1. A crash risks leaving the lock file in place; it must be manually removed or else you hang forever.

2. The same home directory may be local and remote on different machines, meaning different locking protocols.

I am considering a mode where the database is loaded fully into memory, and writes are atomically saved via the classic adjacent-file-renamed mechanism. You risk losing data due to races, but it can't corrupt the database.

talk to a server on the remote "storage" where the history is rather than directly to the sqlite server? Queue up changes and make them serial in the "service"

None of those matter to me, but I'm being forced to switch from SQLite simply because of its lack of uint64 support.

Being forced as in you can't make things work without that support? If so, is parsing/processing blobs infeasible for some reason?

I want to use arithmetic functions like sum().

Sure, I could just sum them myself, but not only would that still leave me SOL with ad-hoc, interactive queries, but actually be more work than just firing up postgres.

In WAL mode, a write can mean writing some data in memory on the local machine. With Postgres/MySQL/etc, it has to go over the network.

I can't parse [0] right now about the overhead of TCP and round trips with the database server, but it's basically a question of whether one SQLite write is 10x, 100x, 1000x, or more faster than a database server write. That should make a lot of difference.

[0] https://gist.github.com/jboner/2841832

If you have PostgreSQL or MySQL on the same machine as your application, you can use UNIX sockets; they won't have much latency at all (I think Linux also optimizes TCP on localhost to skip all the pointless protocol overhead).

In terms of latency it'll still be difficult to beat a database that lives in the same process as your application, but it won't be as bad as going over the network might be.

> If you have PostgreSQL or MySQL on the same machine as your application, you can use UNIX sockets; they won't have much latency at all (I think Linux also optimizes TCP on localhost to skip all the pointless protocol overhead).

Sounds like a nice topic for a study; I'd expect the latency using local sockets to connect to a DB server to be about 3x as much, minimum, as the latency to simply write to a file.

Writing to a file: a single context-switch from process to kernel (CS). Writing to a socket: CS on write to socket, CS when receiving process writes the data to file, CS when receiving process sends ACK to sender via a socket.

The worst thing you can do in your program is to cause a context switch by passing data to kernel or recieving data from a kernel.

That being said, a trial of the two approaches is not a bad idea for an easily-publishable paper :-)

The context switch buys you safety from many cases of corruption due to uninitialized memory mistakes etc. though, if using an unmanaged language. In C-type code with sqlite embedded, you might write garbage over its internal datastructures.

A program error might still make you issue the wrong command or write corrupt data, but you shouldn't corrupt the internals with process isolation.

Well, for "local" TCP, it's pretty straightforward to see how the kernel can be smart and just copy the sent data into the receiving process' buffer directly and skip ACKs and other overhead if no-one's looking. Neither the sender nor the receiver actually care whether network traffic occurs as long as the semantics of the operation stay the same.

> Well, for "local" TCP, it's pretty straightforward to see how the kernel can be smart and just copy the sent data into the receiving process' buffer directly and skip ACKs and other overhead if no-one's looking.

I didn't mean the TCP ACK packet, I meant the application layer return saying "Thanks, I got your query".

It is never a good idea to blast some data off into a transmission channel without knowing if the receiver ever got it, hence the receiver always sends something back to say that they received it.

> Neither the sender nor the receiver actually care whether network traffic occurs as long as the semantics of the operation stay the same.

Whether you are sending over the network or over a local UNIX socket is irrelevant, the context switch is going take place for each `write()` and for each `read()`.

I don't know what backup tools you have in mind... But since a SQLite database is a single file (modulo write ahead journal and whatnot), making whatever you need is trivial.

> modulo write ahead journal and whatnot

That's like saying immortality is trivial modulo death and whatnot. If you don't integrate with sqlite's locking system, you can easily "backup" a file mid-write, corrupted and unusable.

That's why sqlite has a built-in backup command.

Litestream does streaming backups using a separate process: https://litestream.io/

I guess my secret is having almost no writes (the couple things I deployed with SQLite would see a dozen writes per month).

Be careful! If you copy a database file while it is being modified, you may end up with an inconsistent view of the database depending on the order in which different parts of the file are modified.

There are ways around this (filesystem snapshots and potentially file locking) but it's not trivial and failing to handle this correctly is a very good way to build a system which passes tests under low load (when the file-copying operation will win all the relevant races) but fail catastrophically under heavy load.

The sqlite3 utility can be used to create a transaction-consistent backup of a live database. This interface is the only way that a backup should be taken, either from the utility or the C API.

    .backup ?DB? FILE        Backup DB (default "main") to FILE

The state of the art is that you click a few buttons in the UI of the cloud provider of your choice. That takes a few seconds.

What you are proposing may be trivial but in comparison it might just as well be rocket science. Doing more work to use a simpler database that is harder to setup and has less features is not exactly compelling.

In Oracle, I can do this:

SQLite does not implement such a feature.

There is a diff tool [0], and an online backup tool [1]. That kinda thing should be pretty easy to cobble together. You'll just need ~3x disk space to store 2 backups at an interval, diff them and discard the older backup, then apply backward diffs on your most recent backup to achieve a specific backup time.

EDIT: or maybe the Session Extension [2]

[0]: https://www.sqlite.org/sqldiff.html

[1]: https://www.sqlite.org/backup.html

[2]: https://www.sqlite.org/sessionintro.html

Litestream may be able to do that on top of SQLite: https://litestream.io

See the -timestamp option here: https://litestream.io/reference/restore/

> Only a single process can write to the database at any time; it does not support concurrent writers.

I think you can do concurrent writing now with Write-Ahead Logging (WAL): https://www.sqlite.org/wal.html

I've never tried it though, so I don't know how suitable it is for web apps that might potentially have multiple processes trying to write to the DB at the same time.

Nope. In the default rollback-journal mode, SQLite supports either multiple readers or a single writer at a time. WAL mode improves this to multiple readers and at most one writer.

But transactions in SQLite are often fast enough that this could still be acceptable for a lot of purposes, especially on an SSD.

There is no concurrent writing with WAL mode

> It is not client/server; a process must be able to fopen() the database file.

The one time I actually wanted to do that, I wrote the server that `accept`ed incoming connections and used the single `fopen`ed SQLite DB.

It can be very flexible that way, TBH, but if you really need that type of thing a more traditional DB is better.

> Only a single process can write to the database at any time; it does not support concurrent writers.

Potential plus in cloud/container deployments.

> It is not client/server; a process must be able to fopen() the database file. NFS and SMB are options that can convey access to remote systems, but performance will not likely be good.

There are other possibilities given a bit of additional wrappers (over the db process) and federating nodes with some (consensus) protocol. It may actually be a good fit as the 'node-unit' for very large scale distributed metadata store that would benefit from relational DB semantics instead of basic K/V.

Note -- the single process write at any one time is a killer for most web apps, where for example within SaaS you have many users doing things at the same time.

Even in write-heavy environments, I've used SQLite effectively with multiple thousands of concurrent (as in logged in and changing things at the speed of human site interactions, not as in simultaneous web requests at the same instant) users. In this context, SQLite is effectively a write linearizer.

With appropriate pragmas and database handles stored persistently between requests in server processes, the rate (in bytes of data affected by INSERT/UPDATE/DELETE) at which users could make DB changes got pretty darn close to the rate at which a file could be sequentially written on the filesystem, which is more than enough aggregate throughput for 1000s of concurrent non-bulk changes in most web applications, I'd imagine.

Can you elaborate? I’m imagining that the DB web server process would have no problem opening a thread per user and multiple threads can write to a SQLite store. Or, optionally you could buffer multi-threaded requests into a single queue for DB I/O. I’m not seeing why the single process constraint is a major impediment, or maybe I guess I’m not sure I understand why multiple separate processes might be required. Am I misunderstanding your comment?

Most writes only take tens to hundreds of microseconds and you can perform those writes from different processes serially. As long as you set WAL mode & a busy timeout then you should be fine:

PRAGMA journal_mode = WAL;

PRAGMA busy_timeout = 30000;

It's not really an issue if you have 1 db per customer

If a customer has 1000 employees all using your app, it is.

Not always. In fact, I'd strengthen that and say that this is not a problem in general/in most cases with thousands of concurrent users. See my other comment one level up.

If the app is designed correctly, then the thousand employees would write to their own temporary databases, and a background job would pull their changes into the main database sequentially.

If the app is not specifically designed to do this, then SQLite would not be an option.

Serious question, is this just a "how do I get SQLite to work in this scenario?" thing, or is there actually some other benefit to having this sort of data architecture?

This can actually relate to SMTP servers using mbox or maildir formats. Maildir is more resistant to corruption, and doesn't rely (as much) on file locks.


What's the benefit over just using the main database?

Does an event sourcing architecture help with this? I'm thinking a bunch of client piping events into a queue with a single writer working through that queue. Might be stretching the eventual part of eventual consistency if the latency gets too bad, but I don't think SQLite is top of mind for most SaaS solutions anyway so more of a thought exercise.

Sure, though that's if they're all doing CRUD at the same time. I maintain our company blog on a self-hosted install of Ghost backed by sqlite, and it's a been great (since the only inserting or updating is likely to be one person editing a post, and the frontend is mostly cached).

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact