Hacker News
I'm all-in on server-side SQLite (fly.io)
1353 points by dpeck on May 9, 2022 | 404 comments

> SQLite isn't just on the same machine as your application, but actually built into your application process. When you put your data right next to your application, you can see per-query latency drop to 10-20 microseconds. That's micro, with a μ. A 50-100x improvement over an intra-region Postgres query.

This is the #1 reason my exuberant technical mind likes that we use SQLite for all the things. Latency is the exact reason you would have a problem scaling any large system in the first place. Forcing it all into one cache-coherent domain is a really good way to begin eliminating entire universes of bugs.

Do we all appreciate just how much more throughput you can get in the case described above? A 100x latency improvement doesn't translate directly into the same # of transactions per second, but it's pretty damn close if your I/O subsystem is up to the task.
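The order of magnitude is easy to sanity-check with Python's stdlib sqlite3. A rough sketch (the schema is made up for illustration, and the absolute numbers will vary wildly with hardware and with whether the database is in memory or on disk):

```python
import sqlite3
import time

# In-process database: no network hop, no connection pool, no protocol parsing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [(f"user{i}",) for i in range(1000)])

N = 10_000
start = time.perf_counter()
for i in range(N):
    conn.execute("SELECT name FROM users WHERE id = ?", (i % 1000 + 1,)).fetchone()
elapsed = time.perf_counter() - start
print(f"{elapsed / N * 1e6:.1f} µs per query")  # typically single-digit µs
```

Compare that to the hundreds of microseconds to low milliseconds a round trip to an intra-region database server costs, and the 50-100x figure in the quote looks plausible.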

How do any writes end up on other horizontally scaled machines though? To me the whole point of a database on another machine is that it is the single point of truth that many horizontally scaled servers can write to and read each others' updates from. If you don't need that, you might as well read the entire dataset into memory and be done with it.

I know TFA says that you can "soon" automagically replicate your sqlite db to another server, but it only allows writes on a single server and all others will be readers. Now you need to think about how to move all write traffic to a single app server. All writes to that server will still take several milliseconds (possibly more, since S3 is eventually consistent) to propagate around all replicas.

In short, 100x latency improvement for reads is great but a bit of a red herring since if you have read-only traffic you don't need sqlite replication. If you do have write traffic, then routing it through S3 will definitely not give you a 100x latency improvement over Postgres or MySQL anymore. Litestream is definitely on my radar, but as a continuous backup system for small apps ("small" meaning it runs and will always run on a single box) rather than a wholesale replacement of traditional client-server databases.

PS: Congrats Ben!

Litestream does a couple of things. It started as a way to continuously back sqlite files up to s3. Then Ben added read replicas – you can configure Litestream to replicate from a "primary" litestream server. It's still limited to a single writer, but there's no s3 in play. You get async replication to other VMs: https://github.com/fly-apps/litestream-base

We have a feature for redirecting HTTP requests that perform writes to a single VM. This makes Litestream + replicas workable for most fullstack apps: https://fly.io/blog/globally-distributed-postgres/

It's not a perfect setup, though. You have to take the writer down to do a deploy. The next big Litestream release should solve that, and is part of what's teased in the post.

> We have a feature for redirecting HTTP requests that perform writes to a single VM. This makes Litestream + replicas workable for most fullstack apps: https://fly.io/blog/globally-distributed-postgres/

Thereby making it a constraint and (without failover) a single point of failure? What’s the upper limit here?

This constraint is common to most n-tier architectures (with Postgres or MySQL) as well. Obviously, part of what's interesting about Litestream is that it simplifies fail-over with SQLite.

Once you have that constraint, it means you will either have the same network latency when writing to SQLite (if it is fronted by some lightweight proxy), or have a lot more frequent failover of SQLite (if it is running embedded within the app, thus following the app's deployment schedule).

I suppose if someone decides to deploy Postgres/MySQL replicas as a sidecar, then it will be the same as what you will end up with?

Yes: nobody is claiming otherwise. SQLite drastically speeds up reads, and it speeds up writes in single-server settings. In a multi-server setting, writes have comparable (probably marginally poorer, because of database-level locking, in a naive configuration) performance to Postgres. The lay-up wins of SQLite in a multi-server environment are operational simplicity (compared to running, say, a Postgres cluster) and read acceleration.

> The lay-up wins of SQLite in a multi-server environment are operational simplicity (compared to running, say, a Postgres cluster) and read acceleration.

What's the operational simplicity? You still have to do backups and replication and SSL. Maybe you don't have to worry about connectivity between the app and the database? Maybe auth?

You don't have to manage a database server if there is no database server.

I mean, there are managed SQL services too. Comparing managed SQLite to DIY Postgres seems disingenuous.

EDIT: I didn’t expect this to be controversial, but I’d like to know where I’ve erred. If you need Litestream to make SQLite operationally simple (beyond single servers, anyway), that seems pretty analogous to needing RDS to make Postgres operationally simple, right?

>I didn’t expect this to be controversial, but I’d like to know where I’ve erred

I don't think what you are saying is controversial, but it feels like you are being contentious for no reason. Your rebuttal doesn't even make sense - telling me I can pay someone to manage it for me is not the same as it not needing management at all.

Whether I manage it, or someone else does; there is still an operational or financial cost.

The parent's point seems to be that using litestream to replicate your sqlite data also is not the same as not needing management at all.

Whether you do your own sqlite replication, or litestream does it for you; there is still an operational or financial cost.

I didn't downvote you. Postgres as a database server is operationally more complex when compared to Sqlite. Since Postgres is a network service, you have to deal with networking and security. Upgrading Postgres is a big task in and of itself. Backups have to happen over the network.

Number of network connections is another sore point. One of Postgres' biggest pain points is the low number of connections it supports. It is not uncommon to have to run a proxy in front of Postgres to increase the number of connections.

Sqlite gives you so much for free as long as you can work within its constraints, the main one being a single writer (for the most part).

You have a more complex network setup actually. You have north-south traffic between your client -> LB -> servers, and you have east-west traffic between your servers for sqlite replication. Both happening on the same nodes and no isolation whatsoever. More things can go wrong, and it will require more tooling to disambiguate between different potential failures.

W.r.t security, you have the same challenges securing east/west vs north/south traffic. W.r.t # of connections, Postgres has a limit on number of connections for a reason – if you are running a multi-process or multi-threaded app framework that's talking to sqlite, you have just traded the connection limit for a concurrent process/thread access limit to sqlite. I don't know if one is better than the other – it all depends on your tooling to debug things when things inevitably fail at redline stress conditions.

You're technically right, which is to say: Not at all.

You don't replicate east-west with Litestream, you replicate north -> south where the "North" is your writer and your "South" is some S3-like service. Basically every application has "Some S3 Like Service" as a dependency now, so that's not a new network connection.

You make a really good point about trading Sqlite write limits to the limited number of Postgres connections. My comment is mostly about having to run a proxy, which is another service that ends up being a liability. Regardless, if you need multi-writers, then the obvious solution is Postgres.

"Both happening on the same nodes and no isolation whatsoever. More things can go wrong and will require more tooling to disambiguate between different potential failures."

As a Postgres fan, the extra Postgres service to me is a far bigger liability than Sqlite. The east-west networking is true for Postgres replication as well, so it's a wash to me.

> Upgrading Postgres is a big task in and of itself.

Learnt this the hard way when I first upgraded the major version, only to realize that the data needs to be migrated first. pg_upgrade requires binaries of the older version, so unless you manually dump the data you need copies of the data directory as well as binaries of both the old and new versions of Postgres[1]. Fortunately it was just my home server.

[1] https://wiki.archlinux.org/title/PostgreSQL#Upgrading_Postgr...

It isn't comparable to a managed PostgreSQL. There is no server. SQLite is just a file format and a library to access it.

It still has to run in a server process on a server host, and virtually all of the challenges of managing a database server are just moved up to the application layer. There are fewer actual hosts, but for the likes of fly.io or a cloud provider a difference of a handful of hosts is negligible because hosts are cattle rather than pets.

There might be advantages to SQLite (e.g., maybe Litestream's replication/backup/etc story is simpler than Postgres's), but "number of hosts" or "number of server processes" doesn't seem compelling to me.

Of course, it has nothing to do with the number of hosts or server processes, and I didn't see anyone claim that it does.

It's difficult-to-impossible to truly run Postgres as a proper "herd of cattle", due to the need to failover replication roles between reader and reader-writer, ensure replication is in sync on startup before allowing reads, handle upgrades carefully, etc. If you're using something like RDS or another managed Postgres, this is still the case, it's just being handled for you by someone else.

So it's not that you're just reducing the number of hosts; you're eliminating an entire class of hosts that have complex state and maintenance requirements. Your application processes are hopefully a herd of identical cattle and can be killed off and relaunched at will, and that property remains the same after introducing Litestream.

Litestream is a database server, isn’t it?

No; there's no such thing as a sqlite3 server. The database is the file(s). Litestream runs alongside everything else using sqlite3 and ensures that it's replicating. If Litestream crashes, reads from the database keep working fine (though, of course, they'll start to go stale if it doesn't come back up).

This is why we called out in the post that Litestream is "just sqlite3". It's not sitting between apps and the database.

That seems disingenuous. If sqlite3 isn't a server, then neither is apache2. But in reality they're both binaries 'serving' 'files' over an interface. You're just hosting them on the same machine, reverting to a monolith-style deployment. Which is fine, but then lets call it what it is.

> That seems disingenuous. If sqlite3 isn't a server, then neither is apache2.

Your argument really is with Dr. Richard Hipp: https://sqlite.org/serverless.html

But in reality they're both binaries 'serving' 'files' over an interface.

By that definition fopen() is also a server.

According to Plan9, fopen() is also a server.

That’s my point though, if you want to use SQLite in production / with Litestream, you’re comparing two databases that need servers to function just implemented quite differently.

In the same vein as your article, it seems very fair to say that if the Litestream server went down in production you’d have a broken app for most use cases within a few minutes. So in practical effect, the server of Litestream is about as essential.

> Litestream crashes, reads from the database keep working fine.

fly-apps' litestream-base Dockerfile suggests that the litestream process supervises the app process... I guess then that's a limitation specific to fly.io's deployment model and not litestream?

> have a lot more frequent failover of SQLite (if it is running embedded within the app, thus following the app's deployment schedule).

That does sound like it's going to be difficult to get right. But if Litestream eventually implements a robust solution for this problem, then I think some added complexity in the deployment process will be a reasonable price to pay for increased app performance the rest of the time.

For what it's worth, I think this problem (the complexity that bleeds into the app for handling leaders) is mostly orthogonal to the underlying database. You have the same complexity with multi-reader single-writer Postgres. But the code that makes multi-reader SQLite work is a lot easier to reason about.

Let me know if you think I'm off about that.

Unless I'm misunderstanding something, I do think using SQLite makes a significant difference in the complexity of app deployment. When using multi-region Postgres, it's true that you only want the Postgres leader to be accessed by app instances in the same region, so the app instances all have to know which region is running the leader. But multiple app instances in that region can connect to that Postgres leader, so it's easy to do a typical rolling deploy. With SQLite, only one app instance at a time can write to the database, so IIUC, there will have to be a reliable way of doing failover with every app deploy. I suppose the same thing has to happen in the Postgres scenario when updating Postgres itself, but that's way less frequent than deploying new versions of the app.

> multiple app instances in that region can connect to that Postgres leader, so it's easy to do a typical rolling deploy

This is mentioned as a drawback towards the end of the blog post, isn't it? It does seem it would make deployments rather awkward.

You can have two app instances writing to the database concurrently, as long as they are running on the same machine. Then it’s possible to deploy a new version without downtime by starting the new binary, switching traffic to the new instance, draining connections to the old instance, and shutting it down. That’s kind of how nginx, for example, upgrades itself without downtime. That’s not the usual way nowadays with containers and VMs, but it was very common before. It’s definitely doable with SQLite, and not very complex.
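The same-machine handoff works because SQLite already serializes writers within one host, especially in WAL mode with a busy timeout set. A minimal sketch of the idea, with two connections standing in for the old and new app instances (the file path and table are made up):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")

def connect(db_path):
    # WAL mode lets readers proceed while a writer commits; busy_timeout
    # makes a second writer wait on contention instead of failing fast.
    conn = sqlite3.connect(db_path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=5000")
    return conn

old_instance = connect(path)   # the binary being drained
new_instance = connect(path)   # the freshly deployed binary

old_instance.execute("CREATE TABLE IF NOT EXISTS events (msg TEXT)")
old_instance.commit()

# During the handoff window, both instances can commit to the same file.
old_instance.execute("INSERT INTO events VALUES ('last request on old')")
old_instance.commit()
new_instance.execute("INSERT INTO events VALUES ('first request on new')")
new_instance.commit()

print(new_instance.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # → 2
```

The serialization is per-commit, so this only holds for processes sharing the file on one machine; it doesn't relax the single-writer-machine constraint discussed above.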

Your app machine has to be overspecced so it can support running 2 copies of the app at once

No. The number of requests processed is unchanged. They are just dispatched between two instances: the new one and the old one. Actually, all new requests go to the new instance, and only the pending requests are processed by the old instance.

Sure but most applications need a certain amount of memory regardless of the amount of requests they're servicing

Agreed, RAM usage may be slightly elevated during the switch. But it doesn’t seem to be a big issue in practice. nginx for example is updating itself like this and it works fine.

On the other hand, your application code is probably less reliable than Postgres or MySQL and now it can bring down your master, so failover is a more likely scenario. Probably not "worse", just "different".

I don't think so. Single-master, multi-reader is the most common clustering pattern for all these databases. If you lose the app server that's handling writes in any of those systems, you have the same problem. Meanwhile, when your app server is down, the sqlite database is just fine, and so is Litestream.

I'm not thinking very carefully in answering this, but I think there's a subtlety I'm not capturing well here, and that it's not the case that this pattern has poorer fail-over than clustered Postgres. I think there are more things that can go wrong with Postgres.

IMO the benefit to n-tier is that you can have multiple instances on your app tier with a single-master multi-reader database tier, and that makes the system resilient to app server crashes.

For a real-world example, some time ago I wrote an admin tool as part of a production monolith, for a rare (but frequent-enough) internal use case where a user could put in a URL, the app server would fetch the remote HTML, process it, and save the results to a database. A few months later, we start getting weird error reports that the server is crashing. Sure enough, a malformed document could cause enough recursion to exceed the Python maximum recursion depth and crash the process.

If this had been the single process running server-side SQLite, even if multithreaded and/or concurrent (with e.g. something like gevent), the crash would have taken down our entire site for however long it would take to restart the process. But since there were other app server processes eager and ready to take ongoing requests, the only disrupted user was the one trying to use that admin tool, not all the other customers using the product at the time. Said user was confused, but was very glad they didn't disrupt the critical work our customers were doing at the time!

Of course, one size doesn't fit all, and there are many services that are limited enough in scope, working with known and battle-tested components, that this wouldn't be a problem. But if I make the conservative assumption that any process that is running code that I myself wrote, no matter how simple-seeming, has a much higher chance of segfaulting or otherwise crashing than a standalone professional database... then that late-night ping is much more likely to be something that can be handled the following morning.

It's worth noting that "single-writer" refers to a single machine, not a single process. Multiple programs can write to the same SQLite DB at once (although contention can be a problem, etc etc). So, if that admin tool was running on the same machine plugged into that same SQLite file, it could crash and leave the main customer-facing services running fine.

Is simplified failover referring to the currently-in-beta async replicas? Or is there something else on the way that will make it easier to failover?

There's nothing that's quite ready to look at for easy failover, but this is a big priority. We absolutely have to figure out failover during deploys. It's doable; we just want to reduce the operational complexity as much as we can.

I _think_ he's referring to the ease of use of streaming the WAL to S3 and reloading from it upon starting a new write "leader".

> If you don't need that, you might as well read the entire dataset into memory and be done with it.

Over in-memory data structures, SQLite gives you:

- Persistence

- Crash tolerance

- Extremely powerful declarative querying capabilities
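That third point is easy to undersell. Even ignoring persistence (using an in-memory database here), you get joins and aggregation declaratively instead of hand-rolling loops over nested dicts. A small illustration with made-up data:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (user_id INTEGER, amount REAL);
    INSERT INTO users  VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

# Total spend per user, sorted: one query instead of a grouping loop,
# a sort, and a join written by hand over plain data structures.
rows = db.execute("""
    SELECT u.name, SUM(o.amount) AS total
    FROM users u JOIN orders o ON o.user_id = u.id
    GROUP BY u.id ORDER BY total DESC
""").fetchall()
print(rows)  # → [('ada', 15.0), ('bob', 7.5)]
```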

> if you have read-only traffic you don't need sqlite replication.

I agree with you that the main use-case here is backup and data durability for small apps. Which is a pretty big deal, as a database server is often the most expensive part of running a small app. That said, there are definitely systems where the latency of returning a snapshot of the data is important, but which snapshot isn't (if updates take a while to percolate, that's fine).

There are in-memory DBs that also write to disk. They can offer both persistence and crash tolerance.

I tried one that even had a better solution to the object-relational mismatch: you just got objects, which made development very easy.

I’d argue that persistence also includes data portability. It’s very handy to be able to just copy your data around in a SQLite file. That’s not really a feature that’s terribly useful in a remote deployed application, but very handy if you have multiple applications all reading the same data.

It does! "Get a to-go database" is already on the roadmap. :)

I do understand the point of running SQLite in-process to speed up reads.

I do not understand why SQLite must also handle intense write load with HA, failover, etc.

I would rather have the best of both worlds: a proper DB server (say, Postgres) replicated to super-fast and simple read replicas in SQLite on every node.

(My ideal case would be some kind of natural sharding where each node keeps its own updates, or just a highly available data browsing app, with data in SQLite files updated as entire files, like a deployment.)

> a proper DB server (say, Postgres) replicated to super-fast and simple read replicas in SQLite on every node.

Having your replicas be different database software than your master seems to me like asking for, at least, weird edge case bugs.

Is this something anyone does? Interested to hear experiences if so!

I've not done this but it's intriguing; potentially a best-of-all-worlds solution.

I think "proper" automatic replication is not possible given the mismatch between Postgres and SQLite - not everything in Postgres maps to a thing that is possible in SQLite.

That said, there are a variety of ways to get data out of Postgres, and a variety of ways to get things into SQLite.

You could periodically export CSVs or whatever from Postgres and periodically import them into SQLite.

Or you could do a more realtime sync by using a Postgres foreign data wrapper like this one: https://github.com/pgspider/sqlite_fdw which would let you write directly to SQLite. Combine that with database triggers on the Postgres side of things and you've got something pretty close to realtime replication.

Those sorts of solutions wouldn't be as robust as "real" replication (specifically, what happens when one of the SQLite replicas is unavailable? do you catch things up later?) but could be very useful for a lot of scenarios. You could have Postgres importing gobs of data, and "replicating" it over to your read-only reporting server which uses SQLite as a data source.
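The "periodically export CSVs" option is close to trivial on the SQLite side. A sketch, assuming a CSV dump produced on the Postgres side (e.g. via COPY ... CSV HEADER); the table, columns, and the in-memory stand-in for the exported file are all made up:

```python
import csv
import io
import sqlite3

# Stand-in for a CSV file exported from Postgres each cycle.
exported = io.StringIO("id,name\n1,ada\n2,bob\n")

replica = sqlite3.connect(":memory:")  # would be a local file on each node
replica.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")

reader = csv.DictReader(exported)
with replica:  # one transaction: readers never observe a half-loaded batch
    replica.execute("DELETE FROM users")  # full refresh each cycle
    replica.executemany("INSERT INTO users VALUES (:id, :name)", reader)

print(replica.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # → 2
```

A full refresh like this sidesteps the "what if a replica missed an update" problem entirely, at the cost of re-shipping the whole table; incremental sync is where the trigger/FDW approaches earn their complexity.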

My idea was to either use triggers or a process that reads Postgres's WAL and replay transactions to SQLite, by sending updates, rebuilding files entirely, or anything else.

Such replicator processes can be horizontally scaled, and made into an HA configuration.

Of course you should be careful to use on the Postgres side only the SQL features that will map well on SQLite. The range of such features is wide enough for practical applications though.

My real-life example would be a realty site, where changes are infrequent, but the desire to slice and dice the current large(ish) dataset in interesting ways is very high. The latter could be done using a per-node SQLite DB.

That sounds very possible and very performant.

I've worked on an 'eventually consistent' system with read/write SQLite dbs on each host with a background worker that replayed the log into a central source of truth db and workers that made updates across each host instance of SQLite.

It could have been made a lot faster; I think the replication SLA for the service was 10 minutes, but it was usually done in seconds. Our specific workflow only progressed in one direction, so replaying a step now and again wasn't a huge issue, though that was quite rare. If you were to put a little more effort than we did into the replication layer and tuning your master db, it could be a really effective setup.

One of the best parts is that when instances were stopped or isolated, they were also almost isolated from everything that used the service, so if you went into a black box with your clients you kept working as normal, and when connectivity or other hosts were brought back up they replayed the db before accepting connections. We could take entire availability zones offline and the workers and clients would keep humming and update neighbors later.

One of the ideas I tried to sell at my last company was to bake SQLite into the lambda images used to run the app. It wouldn’t have been for transactional data, but for the mess of other data you wind up needing (country codes, currency codes, customer name to ID, etc.). It was all stuff that changed infrequently if ever. Unfortunately nobody wanted to do it. I think they wanted to have DynamoDB on their resume.
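For slowly-changing reference data like that, the file can even be opened read-only so the app can't corrupt it. A sketch with made-up lookup data (in a Lambda/container setting, the build step would run at image-build time):

```python
import os
import sqlite3
import tempfile

# Build the reference file ahead of time (or ship it in the repo/image).
path = os.path.join(tempfile.mkdtemp(), "refdata.db")
build = sqlite3.connect(path)
build.execute("CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT)")
build.executemany("INSERT INTO country VALUES (?, ?)",
                  [("US", "United States"), ("DE", "Germany")])
build.commit()
build.close()

# At runtime, open it read-only via a URI: lookups with no write path at all.
ref = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
print(ref.execute("SELECT name FROM country WHERE code = ?", ("DE",)).fetchone()[0])
# → Germany
```

Since nothing ever writes at runtime, there's no locking or replication story to worry about; updating the data is just shipping a new image.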

This is a solid approach; used several times in my previous adtech companies to distribute a large amount of core data to every server instance.

What if, due to ridiculous latency reductions, your business no longer requires more than 1 machine to function at scale?

I'm talking more about sqlite itself than any given product around it at this point, but I still think it's an interesting thought experiment in this context.

I have to imagine having your service highly available (i.e. you need a failover machine) is far more likely to be the reason to need multiple machines than exhausting the resources on some commodity tier machine.

I'll point out that the ridiculous latency reductions don't apply to replicating the writes to S3 and/or any replica servers; that still takes as long as it would to any other server across a network. The latency reductions are only for pure read traffic. Also, every company I ever worked at had a policy to run at least two instances of a service in case of hardware failure. (Is it reasonable to extrapolate this policy to a company which might want to run on a single sqlite instance? I don't know, but just as a datapoint I don't think any business should strive to run on a single instance.)

This write latency might be fine, although more than one backend app I know renewed the expiry time of a user session on every hit and would thus do at least one DB write per HTTP call. I don't think this is optimal, but it does happen, and simply going "well don't do write traffic then" does not always line up with how apps are actually built. Replicated sqlite over litestream is very cool, but definitely something you need to build your app around, and also definitely something that costs you one of your innovation tokens.

There's no magic here (that there is no magic is part of the point). You have the same phenomenon in n-tier Postgres deployments: to be highly available, you need multiple instances; you're going to have a write leader, because you don't realistically want to run Raft consensus for every write; etc.

The point of the post is just that if you can get rid of most of the big operational problems with using server-side SQLite in a distributed application --- most notably, failing over and snapshotting --- then SQLite can occupy a much more interesting role in your stack than it's conventionally been assigned. SQLite has some very attractive properties that have been largely ignored because people assume they won't be able to scale it out and manage it. Well, you can scale it out and manage it. Now you've got an extremely simple database layer that's easy to reason about, doesn't require you to run a database server (or even a cache server) next to all your app instances, and happens to be extraordinarily fast.

Maybe it doesn't make sense for your app? There are probably lots of apps that really want Postgres and not SQLite. But the architecture we're proposing is one people historically haven't even considered. Now, they should.

I'm not sure "litestream replicate <file>" really costs a whole innovation token. It's just SQLite. You should get an innovation rebate for using it. :)

> But the architecture we're proposing is one people historically haven't even considered. Now, they should.

I think this offering, and this idea, are absolutely fantastic, and if not the future, at least a big part of it, for the reason outlined in the post: namely, that for a lot of apps and use cases, sqlite is more than enough.

But I also suspect this is probably already the case, and we don't know about it because people don't talk about it.

Amusingly, I was recently scolded here on HN for suggesting to use sqlite, by someone who said HN was a place for "professionals":


Once sqlite is synced to a virtually indestructible storage solution (or several!) then that removes a lot of objections from "professionals".

Could not be more excited by this. Congrats.

Directing specific queries to a write connection (dbserver) vs directing requests to specific application servers (potentially mid-request) does seem operationally “harder” though.

I’m coming at this from the perspective of a traditional django app that has .using(“write”) when wanting to write data. Otherwise you’re replaying requests at a completely different app server.

Am I understanding this correctly?

This may or may not be that hard, depending on your server. You could proxy all "mutation" HTTP verbs to your one writer instance, and probably do similar if you are using GraphQL.

If you are using something like gRPC I feel this might be more complicated because it's not obvious which actions are read/write.

I'm in the same boat as you though overall - I'm not sure what the ideal strategy is, or if one even exists, since this seems to create a problem that does not normally exist.

If you are greenfield, maybe you create one service that only does writes - this may be CQRS-like.
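The verb-based routing idea can be sketched framework-agnostically. fly.io's actual mechanism is a replay header set on the response, but the exact names here (regions, the header concept, the function) are illustrative, not any real API:

```python
# Decide per-request whether to serve locally (reads) or ask the edge
# proxy to replay the request in the region holding the write leader.
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}
PRIMARY_REGION = "ord"  # hypothetical region running the writer

def route(method: str, current_region: str):
    """Return ('local', None) to handle here, or ('replay', region) to
    have the request resent to the writer's region."""
    if method.upper() in WRITE_METHODS and current_region != PRIMARY_REGION:
        return ("replay", PRIMARY_REGION)
    return ("local", None)

print(route("GET", "ams"))   # → ('local', None)
print(route("POST", "ams"))  # → ('replay', 'ord')
print(route("POST", "ord"))  # → ('local', None)
```

As noted above, this only works cleanly when the verb reliably signals a write; RPC-style protocols like gRPC would need a per-method annotation instead.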

This is great and I'm definitely going to be using it this week in a client project.

That being said, you don't get an innovation rebate for using a new tool, even if, as here, it's a parsimony-enabler. It's still a new tool.

A description from TFA reads "The most important thing you should understand about Litestream is that it's just SQLite." (This reminds me an awful lot of the tagline for CoffeeScript: "It's just JavaScript" -- where did that leave us?) But that info box is just under a description of how the new tool is implemented in a way that makes it sound (to someone who's never looked at the SQLite codebase) like it's breaking some assumptions that SQLite is making. That's the sound of an innovation token being spent.

CoffeeScript was not in fact just Javascript. Litestream literally is just sqlite3; it's not an app dependency, and doesn't interpose itself between your app and sqlite3. You could add it to an existing single-process SQLite-backed app without changing any code or configuration, other than running the Litestream process.

It's brilliant that a person can ship their WAL from an app that doesn't know anything about Litestream. That's cool. But it is not in fact just SQLite. If it were, there wouldn't be a blog post, or an additional binary to download, or a backup of my database in S3, or...

This is like saying simply rsync'ing the database file is "not just SQLite". Sure, in one particularly unuseful parsing of the words "just SQLite".

I think saying it is “just SQLite” is (unintentionally) misleading. Your app may not know it’s anything else but operationally it’s a sidecar, so another process to manage.

I actually had to go look that up because it was a little unclear from the blog post and this comments section.

It would appear that sqlite is merely "deceived" by a secondary client, its normal operation is not affected. https://litestream.io/how-it-works/

Keep in mind that sqlite is most likely the most tested 100k lines of C code in existence. They have tests covering WAL behavior.

If we're designing a system that relies on an unconventional and otherwise quite rare use-case of a dependency in order to make critical long-term stability guarantees, I would rather that dependency be SQLite, for sure.

> The latency reductions are only for pure read traffic.

Well, no, because every insert will still be fast (until there are too many). It does not block until it's written to e.g. s3.

So there's a window, let's say 1 second, of potential data loss.

I assume syncing the wal to s3 is much faster than inserting to sqlite, so it will never fall behind, but I have not tested.

> Also, every company I ever worked at had a policy to run at least two instances of a service in case of hardware failure.

Yeah, but the goal is not to have X instances, the goal is to limit downtime. In my experience the complicated setups have downtime as well, often related to how complicated they are.

In my mind a setup like this would only be used where some downtime is OK. But that's quite common.

With Postgres, you might have one server, or one cluster of servers that are coordinated, and then inside there you have tables with users and the users' data with foreign keys tying them together.

With SQLite, you would instead have one database (one file) per user, as close to the user as possible, holding all of that user's data, and you would just read/write to that database. If your application needs to aggregate multiple users' data, you use something like Litestream to routinely back each database up to S3; when you need to aggregate, you can access it all there and use a distributed system to do the aggregation over the SQLite database files.
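The routing side of this pattern is nearly trivial. A sketch (directory layout and schema are hypothetical):

```python
import os
import sqlite3

DB_DIR = "tenant_dbs"  # hypothetical layout: one SQLite file per user

def db_for_user(user_id):
    """Open (creating if needed) this user's private database. Every
    query is single-tenant by construction -- there is no customer_id
    column to filter on, because the file *is* the tenant boundary."""
    os.makedirs(DB_DIR, exist_ok=True)
    conn = sqlite3.connect(os.path.join(DB_DIR, f"user_{int(user_id)}.db"))
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, kind TEXT)"
    )
    return conn
```

Each file can then be replicated or backed up independently, e.g. one Litestream replica path per tenant.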

This sounds a lot like ActorDB [1], which is a horizontally replicated and sharded SQLite.

With ActorDB, each shard (what it calls actors) is its own dedicated SQLite database (but efficiently colocated in the same block storage as all the others, so essentially a virtual database). The intention with the sharding is to allow data that logically belongs together to be stored and replicated together as a shard; for example, all of a single user's data. When you want to run an SQL query you tell it which shard to access (as part of the query), and it routes the request to the right server.

It has some pretty clever stuff to allow querying across multiple such "actors", so you can still get a complete view of all your data.

Sadly, I don't think it's maintained anymore. Last commit was in 2019.

[1] https://www.actordb.com/

Looks neat, I'll check this out.

For a project I currently have, I am partitioning not only the database but also the software for each "customer" (let's call them that for now). I have a single-threaded Python application that uses SQLite, put it in a container with Litestream, and run one of those containers per customer on one or more Kubernetes clusters, or on anything else that runs a container (or you can run the software directly).

Then you can take the data from one to N of those customers and aggregate it for monitoring, central configuration control, and backup, with another bit of software that does a distributed query over the S3 data that Litestream is saving.

I can also control several of the systems in a coordinated way, if needed, by linking them together in a configuration file and letting them elect a leader. That leader becomes the central point of contact for a source of "work" (let's call it that for now), notifies the other instances of the work to be done, and helps dole it out in a reasonable manner based on the resources and capabilities of the group.

A few similar projects I worked on that were centralized were a huge mess in both the database and the codebase due to trying to do the same thing for thousands of customers with several dozen horizontal-scaling services that had to be coordinated. In my system, I just write one single-threaded bit of software that does the whole thing, but just for one customer at a time, so the whole system is horizontally scaleable instead of just each service within the system. I can still do the same things to aggregate data from the individual customer software units, and as described above I can still add coordination features for groups of individual customer units, so there is no loss in features, but the whole system is simple to reason and think about with none of the usual synchronization pitfalls you get in the usual distributed system.

Hold on, doesn’t one-database-per-user totally absolve all ACID guarantees? You can’t do cross-database transactions (to my knowledge), which means you can end up with corrupted data during aggregations. What am I missing?

One database per tenant only makes sense in multi-tenant applications that don't have any cross-tenant actions. I imagine there are many B2B applications that fall into this category.

If you have a use case with data that is extremely partitionable (like controlling individual industrial devices and collecting data to improve their performance or monitor the processes), then SQLite and Litestream could be a great option, but if you can't reasonably partition the data then it's probably better to use a centralized database server.

There can also be shared, mostly read-only databases, with no transactions crossing database boundaries.

For example, one database per retail store with sharded customers, orders and inventory (most transactions involve one order of one customer fulfilled from that store) and a shared one with item descriptions and prices (modified, say, daily).
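SQLite makes the "shared catalog + per-store database" split quite natural, because a connection can ATTACH a second database file and join across both with plain SQL. A sketch with made-up file names and schema (note that cross-file transactions have caveats in WAL mode, but read-mostly catalog joins like this are fine):

```python
import sqlite3

# Hypothetical shared catalog DB: item descriptions and prices,
# modified (say) daily.
cat = sqlite3.connect("catalog.db")
cat.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
cat.execute("INSERT OR REPLACE INTO items VALUES (1, 'widget', 9.99)")
cat.commit()
cat.close()

# Hypothetical per-store DB: this store's orders only.
store = sqlite3.connect("store_042.db")
store.execute("CREATE TABLE IF NOT EXISTS order_lines (id INTEGER PRIMARY KEY, item_id INTEGER, qty INTEGER)")
store.execute("INSERT OR REPLACE INTO order_lines VALUES (1, 1, 3)")

# ATTACH makes the shared DB visible on the same connection, so the
# cross-database join is just SQL -- no cross-DB transactions needed.
store.execute("ATTACH DATABASE 'catalog.db' AS catalog")
rows = store.execute("""
    SELECT o.qty, c.name, c.price
    FROM order_lines AS o
    JOIN catalog.items AS c ON c.id = o.item_id
""").fetchall()
```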

>Hold on, doesn’t one-database-per-user totally absolve all ACID guarantees?

No it doesn't. What gave you that idea? You still have all "ACID guarantees" within each database.

>You can’t do cross-database transactions (to my knowledge),

That's true of most databases. If you have two apps and they use two different databases you won't have transactions across those two apps.

>which means you can end up with corrupted data during aggregations.

No, aggregations within each database work as you would expect.

>What am I missing?

As others have said, you use this pattern only if you don't intend to cross databases. By the way, in NoSQL databases like MongoDB, every document is its own little database so having a per user database is a massive upgrade over NoSQL.

To clarify:

- I was talking about ACID guarantees across databases (i.e. across users)

- I was talking about aggregations across databases (i.e. across users)

Of course working inside one database works as you would expect it to. My point was that this pattern of database-per-user seems to be a totally different design than people have used with traditional n-tier designs.

Good point about NoSQL! But, wasn’t part of the reason MongoDB fell out of favor because it was lacking consistency?

Yeah, I think MongoDB went through a set of steps:

- atomically consistent at the document level (which is fine for many apps, since you have most related data in the single document anyway)

- atomically consistent within a collection

- and now, with MongoDB 4.0 and higher (released in 2017? 2018? whatever, a long time ago), full transactional consistency across documents, collections, and shards

It took them a while, but they got there.

How would it handle conflicts?

A lot depends on your consistency requirements and data model here.

I use SQLite heavily, and have evaluated litestream and rqlite but not deployed them, so bear that in mind.

If the application is set up so that it serves a user for a session, so a given session ID is reading and writing from the same SQLite database, there are many opportunities to replicate that data optimistically, so that you won't lose it if a meteor hits the server, but it might not live in all the replicas right away, since applying patchsets off the gossip network happens in downtime.

If concerns can't be isolated like this then yes, dedicated swarms of database servers are the way to go. Frequently they can be, and using SQLite punches way above its weight here.

There are many systems with a much higher ratio of read to write traffic, so writes only need to scale logarithmically, or perhaps with the square root of the system size. Waiting for faster hardware worked for these systems for a long time and, to a small extent, still does.

The dirty secret is that a lot of systems that require very high write traffic are essentially systems built for narcissists. "Social websites" have higher write traffic than simpler consumption based systems, but we've gone beyond those initial steps into very aggressive systems that are based on recording every interaction with the user and providing them instant gratification for many of those.

These applications don't scale in a way that others do, easily. And maybe it's a feature, not a bug, if the tools I use discourage me from jumping into the maelstrom by making it difficult to even consider doing so. Constraints are where creativity comes from, not possibility.

S3 is strongly consistent now: https://aws.amazon.com/s3/consistency/

What a ridiculous marketing term. This is a RYW (Read your writes) level of consistency which is a far cry from Strong consistency (see https://jepsen.io/consistency). Seems like eventual consistency with some affinity bolted on.

That page does not give a hard definition for strong consistency, it says that it uses them informally as relative terms. AWS is not claiming serializability, they call it "strong read-after-write consistency." I don't see the problem here? S3 famously wasn't guaranteed to read data you had just written for a long time, and now it is. That's significant.

Here's more about the specifics: https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcom...

In particular:

> Any read (GET or LIST request) that is initiated following the receipt of a successful PUT response will return the data written by the PUT request.

So this is stronger than RYW (emphasis mine).

I have always understood strong consistency to refer to linearizability or sequential consistency - i.e. all clients have the same view of the global order but with sequential consistency permitting slightly more flexibility in how operations of different clients can be reordered wrt each other.

I asked below but which property of linearizability is missing here? Is it the way it handles concurrent writes?

Hey, having actually looked at the link you provided, in fact both examples they give are linearizations so they could plausibly be providing linearizability (with respect to a single key). It's hard to say whether there are corner cases in which different clients could observe different orderings but if not then I stand corrected!

There was another thread where somebody claimed it was causally consistent. I’m sort of surprised Amazon hasn’t been clearer about this, but my feeling is that they would say it was linearizable if they were sure it was linearizable. Would love to read a real deep dive on this, I checked to see if Kyle Kingsbury had looked into it yet but he hasn’t.

> > Any read (GET or LIST request) that is initiated following the receipt of a successful PUT response will return the data written by the PUT request.

> So this is stronger than RYW.

I'm not sure that it is? The examples listed below that description only specify making an update and then immediately reading it back from the same process.

Look at the graphics in the section "Concurrent applications," specifically the first one.

At T0 Client 1 writes 'color = red.' Write 1 completes at T1.

At T2 Client 2 writes 'color = ruby.' Write 2 completes at T3.

At T4 Client 1 reads 'color = ruby,' the result of Write 2 from Client 2.

The explanation above says "Because S3 is strongly consistent, R1 and R2 both return color = ruby." There are clearly some subtleties (as explained further down the page) but I don't think Amazon are really being deceptive here.

Maybe it's just my suspicious-of-everything-distributed brain, but that diagram seems to assume a single universal time scale without any discussion of the implications.

You successfully nerdsniped me and I'm having a lot of trouble finding discussion of the formal implications of what they call "strong consistency" here, other than reading that they did in fact formally verify it to some extent. The best that I could find is this other HN thread where people claim it is causally consistent in a discussion about a Deep Dive (frustratingly shallow, as it happens): https://news.ycombinator.com/item?id=26968627

I have never heard strong consistency used to describe such a weak guarantee before - i.e. it's marketing bs. Usually strong consistency refers to linearizability (or at the least sequential consistency). The diagram a few pages in to this paper gives a nice overview: https://arxiv.org/abs/1512.00168

Yes I actually read that paper while I was digging around but it didn’t seem to help in this case because Amazon don’t specify whether reads made after a concurrent write is made are guaranteed to return the same value as each other. If they are I think the system would be linearizable, yes? Either way they don’t say linearizable anywhere and they describe it specifically as “read-after-write” so I think it would be wrong to assume linearizability. What’s missing from this model for linearizability? S3 doesn’t have transactions after all.

Isn't this definition CAP consistency?

CAP is defined wrt linearizability yes.

I'm not convinced that issuing requests from multiple clients for the same key actually matters. My speculation is that they map a key to their backend via some type of (consistent/rendezvous) hash and then ensure that all requests for that key land on the same process/server* that contains the state for the key.

This means that for a specific key, you end up on 1 specific process. If you can ensure this, you basically get monotonic reads/writes along with RYW and Writes-Follow-Reads. All this maps to causal consistency so it is believable.

* The request could probably be sent to a read-only replica first but it could then forward it to the leader replica handling writes by examining some logical timestamp.

> Latency is the exact reason you would have a problem scaling any large system in the first place.

Let's not forget why we started using a separate database server in the first place…

A web server does quite a lot of things: parsing/formatting HTTP/JSON/HTML, restructuring data, calculating stuff. This is typically very separate from the data-loading aspect, and as you get more requests you'll have to add more CPU to keep up (regardless of the language).

By separating the web server from the database server you introduce more latency in favor of enabling scalability. Now you can spin up hundreds of web servers which all talk to a single database server. This is a typical strategy for scalability: decouple the logic and scale up individually.

If you couple them together it's more difficult to scale. First of all, in order to spin up a server you need a full version of the database. Good luck autoscaling on-demand! Also, now every write will have to be replicated to all the readers. That's a lot more bandwidth.

There are definitely use cases for Litestream, but it's far from a replacement for your typical Node + PostgreSQL stack. I can see it being useful as a lower-level component: You can use Litestream to build your "own" database server with customized logic which you can talk to using an internal protocol (gRPC?) from your web servers.

I don't think anyone's seriously arguing that the n-tier database architecture is, like, intrinsically bankrupt. Most applications are going to continue to be built with Postgres. We like Postgres; we have a Postgres offering; we're friends with Postgres-providing services; our product uses Postgres.

The point the post is making is that we think people would be surprised how far SQLite can get a typical application. There's a clear win for it in the early phases of an application: managing a database server is operationally (and capitally) expensive, and, importantly, it tends to pin you to a centralized model where it really only makes sense for your application to run in Ashburn --- every request is getting backhauled there anyway.

As the post notes, there's a whole ecosystem of bandaids --- err, tiers --- that mitigate this problem; it's one reason you might sink a lot of engineering work into a horizontally-scaling sharded cache tier, for instance.

The alternative the post proposes is: just use SQLite. Almost all of that complexity melts away, to the point where even your database access code in your app gets simpler (N+1 isn't a game-over problem when each query takes microseconds). Use Litestream and read-only replicas to scale read out horizontally; scale the write leader vertically.
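The N+1 point is easy to demonstrate. A toy benchmark (in-memory SQLite, made-up schema) running the classic one-query-per-row pattern that would be a serious bug against a networked database:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
    CREATE INDEX comments_post ON comments (post_id);
""")
conn.executemany("INSERT INTO posts (title) VALUES (?)",
                 [(f"post {i}",) for i in range(500)])
conn.executemany("INSERT INTO comments (post_id, body) VALUES (?, ?)",
                 [(i % 500 + 1, "hi") for i in range(5000)])

# Classic N+1: one extra query per post. Over a network that's 500
# round trips; in-process it's 500 cheap function calls.
start = time.perf_counter()
for post_id, _title in conn.execute("SELECT id, title FROM posts").fetchall():
    conn.execute("SELECT body FROM comments WHERE post_id = ?",
                 (post_id,)).fetchall()
elapsed = time.perf_counter() - start
```

Exact numbers vary by machine, but the whole N+1 loop typically finishes in a handful of milliseconds.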

Eventually you'll need to make a decision: scale "out" of SQLite into Postgres (or CockroachDB or whatever), or start investing engineering dollars into making SQLite scale (for instance: by using multiple databases, which is a SQLite feature people sleep on). But the bet this post is making is that the actual value of "eventually" is "surprisingly far into the future", "far enough that it might not make sense to prematurely optimize for it", especially early on when all your resources, cognitively and financially and temporally, are scarce.

We might be very wrong about this! There isn't an interesting blog post (or technical bet) to make about "I'm all in on the n-tier architecture of app servers and database servers". We're just asking people to think about the approach, not saying you're crazy if you don't adopt it.

I just wanna quibble about a single phrase: scale out.

I have a few years of experience w/ SQLite as a backend for dev/test purposes, and the biggest lesson has been in reinforcing best practices... the kind Postgres demand and you're lucky if MySQL reminds you about.

So my newb two cents is that Litestreams makes some unique and significant contributions: it's not "scale out" but "pivot out" to reflect that you've got great backup/replica solutions with impressive agnosticism around what the future solution is.

That's a lot of leeway to prove that "X's brainchild is a viable product" while minimizing "Y's devops team and infrastructure," without compromising on durability essentials, especially where the solution to so many things is "dump/serve a(n) [encrypted] copy to/from S3" already.

Eager to drink some more kool-aid. :-)

Does Fly give some magic around splitting reads vs. writes to instances? In a typical Django/Rails app I’m not sure how I’d mark some API endpoints as routed to the single node which is allowed to write to the DB. (I know you guys have some sauce around how you route requests between regions, maybe this falls out of your existing infra here?)

I’m just not seeing how I can operate my traditional Django app in this model, other than just having one instance. I’m probably missing something though!

Ah, I think this is answered elsewhere in this thread: https://news.ycombinator.com/item?id=31320995

Short answer: yes.

> There are definitely use cases for Litestream, but it's far from a replacement for your typical Node + PostgreSQL stack

If you're using a language like Node.js then horizontal scaling makes a lot of sense, but I've been working with Rust a lot recently. And Rust is so efficient that you typically end up in a place where a single application server can easily saturate the database. At that point moving them both onto the same box can start to make sense.

This is especially true for low-traffic apps. I could probably run most of my Rust apps on a VM with 128MB RAM (or even less) and not even a whole CPU core and still get excellent performance. In that context, sticking a SQLite database that backs up to object storage on the same box becomes very attractive from a cost perspective.

This is "vertical scaling" and that is indeed a very valid approach! You just have to be aware that vertical scaling has some fundamental limits and it's going to suck big time if it comes at a surprise to you.

Considering that more powerful machines continue to become more affordable, it's a safe bet that most of us will never hit those limits.

Alternatively, instead of just betting on it, you could do a benchmark, figure out the limits of your system and check if your current implementation is capable of handling the future needs.

You're not just looking at hardware limits, there are OS limits to be aware of, like the maximum number of concurrent connections per port.

Not only is it possible that your user base grows, it's also quite possible that your application complexity does. If you suddenly need to run expensive queries joining multiple datasets, or do some heavy computation, your app's resource usage might jump quite a bit. Of course, you can always throw money at it, but the point where it gets seriously expensive can come quite fast.

Not sure about that. It would be smarter to just failure test your apps. Once you cross some threshold, you scale. Lots of companies build formulas costing out their cloud spend based on infra needs and failure tests.

As they say, "you are not twitter" ;)

Access to monstrous machines is easy today and you have very fast runtimes like Go and the JVM that can leverage this hardware.

This is a large part of what Rich Hickey emphasizes about Datomic, too. We're so used to the database being "over there" but it's actually very nice to have it locally. Datomic solves this in the context of a distributed database by having the read-only replicas local to client applications while the transaction-running parts are remote.

Only trouble with that particular implementation is that the Datomic Transactor is a single threaded single process that serializes every transaction going through it. As long as you don't need to scale writes it works like a charm. However, the workloads I somehow always end up working with are write heavy or at best 50/50 between read and write.

That's another trait it would share with SQLite + litestream, until SQLite's BEGIN CONCURRENT is released.

It's exciting to see Datomic's architecture realized using more conventional technology.

If you're pushing the database up into the application layer, do you have to route all write operations through a single "master" application instance? If not, is there some multi-master scheme, and if so, is it cheaper to propagate state all the time than it is to have the application write to a master database instance over a network? Moreover, how does it affect the operations of your application? Are you still as comfortable bouncing an application instance as you would otherwise be?

The answer is: yes, you do have to write through a single primary application instance.

So far.

The two important things here are:

1. Fly.io makes it really easy to write through a single primary application instance

2. There are ways to solve this problem so your application doesn't have to worry about it.

Right now, you have to be a little careful bouncing app instances. If you bounce the writer, you can't perform writes for 15s or whatever. This is a big problem during deploys.

There are a tremendous number of Fly.io users that are fine with this limitation, though. It's pretty valuable for some segment of our customers right now.

It definitely seems like it could be useful for some use cases; I'm just trying to get my head around the constraints. :)

What are some ways alluded to in number 2?

This is exactly the reason I am so skeptical of the cloud. I don't care how easy it is to stand up VMs, containers, k8s, etc. What I need to know is how hard it is to lug my data to my application and vice versa. My feelings on this are so strong as I work mostly on database read-heavy applications.

Local-first software is the future:

[1] Local-First Software: You Own Your Data, in spite of the Cloud:


What confuses me about this architecture I guess is: why have a SQL database at all? This sounds like a local cache. Which sure, of course those are super fast. But why does it need to be relational if all the data fits on the edge?

You get SQL and ACID. If you don't need those then you pay a performance price for having them. If you do need them, then you pay a price for not having them.

The best solution depends on the unit economics of the problem you are trying to solve. If you have a small number of high value users, then these approaches are premature optimisation, just use Postgres. If your business model is ad eyeballs then squeezing every last drop begins to seem very attractive because you can multiply your profitability (potentially).

You usually want ACID, but with N+1 queries not being a problem you no longer need SQL. The database can now be a really low-level key-value store with your favorite query language on top.

Litestream author here. I wrote an embedded key/value database called BoltDB a while back and I experimented with using it for application development. It had its pros and cons. The pros were that it was crazy fast and a lot of data access is just basic iterators. Some of the biggest cons were having to implement indexes on my own and that my schema was tied to my application language. Migrations can be more difficult if everything is tied to structs in your app and it's also difficult to inspect your database since there's no CLI REPL.
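What "implementing indexes on my own" looks like in practice: a toy illustration where a plain Python dict stands in for a KV-store bucket (BoltDB itself is Go). Every secondary access path is bookkeeping the application must keep in sync by hand, exactly what a SQL engine does for you with CREATE INDEX:

```python
# Primary "bucket": user_id -> record. The KV store gives you fast
# lookup by key and ordered iteration, and nothing else.
users = {}
# Hand-rolled secondary index: email -> user_id. The application is
# responsible for keeping this consistent with the primary bucket.
by_email = {}

def put_user(uid, record):
    old = users.get(uid)
    if old is not None and old["email"] != record["email"]:
        del by_email[old["email"]]   # forget the stale index entry
    users[uid] = record
    by_email[record["email"]] = uid

def find_by_email(email):
    uid = by_email.get(email)
    return users.get(uid) if uid is not None else None
```

Forget the `del` on update and you have a dangling index entry, the kind of bug a relational database structurally rules out.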

So the answer to my question might be "because SQLite already does some stuff really well, like indexes and its CLI, and it just happens to be relational and use SQL as its query language, but that's not the important part". I buy that.

So what you're saying is we need FoundationDB-like layers on top of the fundamental data store? :D

Yeah, but what if my favorite query language is SQL? And what if I want to grab things not by the key (all items in area X, between times Y and Z)? Key-value isn't really great at that.

Most data is relational, so why not store it that way?

Or, from another angle, what would your “local cache” be?

Objects / documents / "key-value". If the data is small enough to fit on a single application server node, it seems to me that the relational model is likely overkill. But I don't doubt that there are good counter-examples to this, it's just my intuition.

Key-value pretty trivially fits into the relationship model if need be. But yeah, if you’re very sure of the key-valueness of your data, something like RocksDB could be a more direct fit. It’s just a little harder to work with in my experience.

I guess my experience is the opposite. We're probably both right, just about different things.

if you can tolerate eventual consistency and have the disk/ram on the application vms, then sure, keeping the data and the indices close to the code has the added benefit of keeping request latency down.

downside of course is the complexity added in synchronization, which is what they're tackling here.

personally i like the idea of per-tenant databases with something like this to scale out for each tenant. it encourages architectures that are more conducive for e2ee or procedures that allow for better guarantees around customer privacy than big central databases with a customer id column.

> personally i like the idea of per-tenant databases with something like this to scale out for each tenant.

So do I. And that type of architecture has come up a few times now in this comment thread. Given that Fly has the lead developer of the Phoenix web framework on staff, maybe it would make sense for him to work on integrating this type of architecture, with Litestream-based replication and the ability to have different master regions for different tenants, into Phoenix.

Not sure that would make sense. Phoenix uses the Erlang OTP platform, which ships with its own distributed database, Mnesia: https://www.erlang.org/doc/man/mnesia.html

Phoenix apps don't typically use Mnesia; they use a relational database through Ecto [1].

[1]: https://github.com/elixir-ecto/ecto

In practice, Phoenix apps tend to be using Ecto and a mainstream database.

> SQLite isn't just on the same machine as your application, but actually built into your application process.

How is that different from what's commonly happening? Android and iOS do this... right? ... but it's still accessing the filesystem to use it.

Am I missing something, or is what they're describing just completely commonplace, and only interesting to people who use microservices and never knew what was normal?

This is how client apps use sqlite, yes. Single instance client apps. Litestream is one method of making sqlite work for server side apps. The hard part on the server is solving for multiple processes/vms/containers writing to one sqlite db.

> the hard part on the server is solving for multiple processes/vms/containers writing to one sqlite db.

I feel like if you have multiple apps writing to the database then you shouldn't be using SQLite. That's where Postgres etc completely earn their place in the stack. Where litestream is really valuable is when you have a single writer, but you want point-in-time backups like you can get with postgres.
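For the single-machine, multi-process case, SQLite does cope on its own: writers are serialized via file locking, and with WAL mode readers never block the writer. A sketch of the usual configuration (pragma values are common choices, not mandates):

```python
import sqlite3

def open_db(path):
    """Open a SQLite DB for shared use by processes on ONE machine.
    WAL lets readers proceed while the single active writer commits;
    busy_timeout makes a second writer wait briefly instead of failing
    immediately with SQLITE_BUSY. It's machines, plural, where you
    graduate to Postgres or to Litestream-style replication."""
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=5000")
    conn.execute("PRAGMA synchronous=NORMAL")  # common pairing with WAL
    return conn
```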

We disagree, a little. I feel like that if you have multiple apps writing to the DB, Litestream is pretty close to making sqlite viable for a lot more apps.

interesting, such a weird way to describe it then. but I guess some people are more familiar with that problem.

It's normal (and HN does something similar, working from in-process data) for systems that don't have to scale beyond one server. If you need multiple servers you have to do something, such as Litestream.

A few years back I was working on a Java project. We used H2 instead of Postgres, embedded in-process as an in-memory database. It sped up queries tremendously. There is just no beating an in-application db.

Just wait until (some) devs realize they don't even need sqlite, and can serialize their data directly to binary flat files with simple locking synchronization for backups.

I'm half joking but I've witnessed many devs use databases when a binary file will do. I've done this personally for years for most of my 'fits-in-RAM', non-transactional, denormalized datasets, which is almost all of them.

Better yet, use both if you have both types of data. The performance benefits are enormous and well worth the complexity tradeoff in my experience.
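The fits-in-RAM pattern is mostly just "serialize, fsync, atomic rename." A sketch of one common way to do it safely (pickle is a stand-in for whatever serializer you prefer):

```python
import os
import pickle
import tempfile

def save_snapshot(state, path):
    """Persist an in-RAM dataset as a binary flat file atomically:
    serialize to a temp file in the same directory, fsync, then rename
    over the old copy. Readers and backup jobs always see a complete
    old or new file, never a torn write -- the OS's atomic rename is
    the only "lock" needed for a single writer."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise

def load_snapshot(path):
    with open(path, "rb") as f:
        return pickle.load(f)
```

This gives you crash-safe persistence for single-writer, fits-in-RAM data, but none of SQLite's concurrent access, partial updates, or queryability.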

That seems exactly opposite to the growing trend of "sqlite-as-application-file-format". There's a lot of nice features you get "for free" doing this, primarily way better consistency, than you do rolling your own binary format.

I don't want to have to deal with locks if at all possible. Binary works fine if each file is atomic, but that does not sound like the case you are advocating.

Who said anything about rolling your own? I'm talking about writing your data models directly to disk, the serialization and deserialization are done for you with a simple annotation.

And in most applications you don't ever have to deal with locks as most applications don't need multiple threads writing and reading, that can be done with a single thread and a lockless queue that multiple threads write to. You would need a lock for making backups of the files themselves but this is trivial and takes the form of error handling. The OS itself handles the lock, you just need to handle the case where you can't open the file for writing.

This approach is not all that rare, lightweight, and very useful for minimizing latency. Why would you ever use a database if you don't actually need the features? It is much simpler to immediately have access to all your data in your application's data models.

The usual use case for a database is that it has multiple users in different places which would be difficult with SQLite. But for other use cases I don't see a problem. I don't know how it scales ad infinitum, but you can manage a lot of data with it and the usual SQL server has limits too. Could be a good choice even before you care about latency.

> Latency is the exact reason you would have a problem scaling any large system in the first place.

Not always. It depends on the architecture and your hosting strategy. I think it’s more likely for an instance of a web app to receive more requests than it can handle, causing the app to not service any requests.

Just the latency is really important to me! I even built an ERP system that has a response time below 100 ms for all operations, it's a design goal.

My thought is that if consumer behavior visibly changes with latency (for example on Amazon or Google), it is equally important for internal tools. Employee time is expensive.

Throughput for a single service / app improves, but does it really scale? Across a cluster, you will need data replication and sync routines, which are a whole mess themselves.

The latency is not reduced, it is shifted elsewhere.

I want functions.

Fly is putting together a pretty great team and interesting tech stack. It's the service I see as a true disruptor to Heroku because it's doing something novel (not just cheaper).

I'm still a little murky on the tradeoffs with Fly (and litestream). @ben / @fly, you should write a tutorial on hosting a todo app using rails with litestream and any expected hurdles at different levels of scale (maybe comparing to Heroku).

If only they could keep their website reachable, that would be the icing on the cake. Like every time I see them linked on HN, I click and cannot connect to their website.

Last time somebody from fly said they'd look into it, but alas. It was related to IPv6 on their end, as far as I could tell.

We have been chasing this down for weeks and can't find the actual bug/workaround here. It's definitely IPv6 related, we think having something to do with weird MTUs. Are you using an IPv6 tunnel or connecting via a vpn by chance?

Weird MTUs: DSL with PPPoE encapsulation, maybe? iirc they tend to use 1492 or 1452 rather than the usual 1500.

Wow that's a trip down memory lane!

It would be 1492 to allow for another 8 bytes of PPPoE overhead, there are some scenarios where 1508 byte frames might be floating around but I don't really know if anyone adopted that standard.

That is exactly what I have.

According to Fly's latest logbook post [1], they now have a workaround for this.

[1]: https://fly.io/blog/logbook-2022-05-13/


Disable ipv6 like everybody does.

Can we cut this shit out? It's 2022, IPv6 has worked fine for decades, and Google is seeing 40% IPv6 usage on their services:


If only IPv6 "just worked". It's better, but still not quite there.

I've been running IPv6 at home for a couple of years now after replacing my pfSense router with OpenWRT. It mostly works. Every now and then there's an issue and, guess what, disabling IPv6 makes things work again.

The latest one was Android ignoring the IPv4 DHCP DNS setting if the Android device has an IPv6 address.

Which is mostly bots scanning SERP results to avoid rate limits.

How does Vercel fit in? I am having a lot of pleasure using their free tier and would be happy to pay if needed. My only concern is the pricing model being 0/20/Call us. I think clear usage-based pricing plans going 0-infinity should be the norm.

Render is more of the successor IMO. Fly is a bit of a wildcard — they are bleeding edge, certainly, but they seem to shy away from focusing on implementation of some of the “boring” but extremely useful features present in most managed services (e.g., scaling volumes for Postgres)

We're not shying away from "boring" stuff at all. We just have a small team with bigger priorities that's spread too thin. There's a million things like resizable volumes we need to ship and we're aggressively hiring to get them done.

Are the job listings on your site the source of truth, or are there other listings out there? I’ve been keeping my eye out for a senior full stack role but no luck yet.

There's no one "successor to Heroku". The successor to Heroku is a collection of different companies that work well together. What's important is the Heroku idea of what an application is, as a developer-first prospect rather than an ops-first prospect like Kubernetes running on a cloud platform.

Uhm... experience from a large project that used SQLite was that we were hit with SQLite only allowing one write transaction at a time. That is madness for any web app really.

Why does everyone seem so hyped on this when it can't really work properly IRL? If you have large amounts of data that need to be stored, the app would die instantly, or leave all your users waiting for their changes to be saved.

What am I missing?

> Uhm... experience from a large project that used SQLite was that we were hit with SQLite only allowing one write transaction at a time. That is madness for any web app really.

"Lots of readers, few writers" is an extremely common application property though. Your average HN-style site has significantly more reads than it has writes, especially if you bump the "worthless" writes (e.g. vote counts) out of the DB entirely and only flush them to the durable DB once in a while, for instance.

And with SQLite's WAL support it's supported even better: while it still has a singular writer, the writer doesn't block the readers anymore, which is a workload issue in the exclusive case (as the single writer would bring read concurrency down to 0).

Another pattern to avoid "worthless" writes is using statistical sampling. I.e. if you have votes in the >100,000 range, generate a random number p in [0, 1], and only perform a write if p < 0.01; when reading votes, multiply by 100, etc. Of course, you have to assess individually whether it's feasible for your operation, hence the "worthless".
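A sketch of that sampling trick (table and column names are invented; only worth it when approximate counts are acceptable):

```python
import random
import sqlite3

SAMPLE_RATE = 0.01  # persist roughly 1 in 100 votes


def record_vote(conn: sqlite3.Connection, item_id: int) -> None:
    # Skip the write for most votes; only a random sample hits the DB.
    if random.random() < SAMPLE_RATE:
        conn.execute(
            "UPDATE items SET sampled_votes = sampled_votes + 1 WHERE id = ?",
            (item_id,),
        )
        conn.commit()


def estimated_votes(conn: sqlite3.Connection, item_id: int) -> int:
    # Scale the sampled count back up on read.
    (sampled,) = conn.execute(
        "SELECT sampled_votes FROM items WHERE id = ?", (item_id,)
    ).fetchone()
    return round(sampled / SAMPLE_RATE)
```

The relative error shrinks as counts grow, which is why this only makes sense well above the ~100,000 range mentioned above.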

Is there something you can point to that explains this "flush them to the durable DB once in a while" pattern in more detail?

Instead of running each small write query as the event occurs (a vote), collect them in memory and at some time interval (1m, 5m, or whatever your UX will tolerate) write them to the DB in batch with one (or a few) writes.
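A sketch of that write-behind pattern (the `items` table and vote semantics are invented; note that increments buffered between flushes are lost if the process dies):

```python
import sqlite3
import threading
from collections import Counter


class VoteBuffer:
    """Accumulate vote increments in memory; flush them in one transaction."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.pending: Counter = Counter()
        self.lock = threading.Lock()

    def add_vote(self, item_id: int) -> None:
        # Cheap in-memory increment on the hot path; no DB write here.
        with self.lock:
            self.pending[item_id] += 1

    def flush(self) -> None:
        # Call this from a timer (1m, 5m, whatever the UX tolerates):
        # one transaction replaces thousands of tiny ones.
        with self.lock:
            batch, self.pending = self.pending, Counter()
        if not batch:
            return
        with self.conn:  # a single BEGIN...COMMIT around the whole batch
            self.conn.executemany(
                "UPDATE items SET votes = votes + ? WHERE id = ?",
                [(n, item_id) for item_id, n in batch.items()],
            )
```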

Thank you. This fits a current need of mine perfectly.

Do you have any other helpful tips you’d like to share? Any thoughts between Redis and Memcache?

Not the GP, but if the GP described a scenario that is useful to you, redis can #1 act as the memory cache if the processes themselves don't last long enough for the optimization to be useful, and #2 can act as a cheap message queue to funnel similar requests into a single (or small number of) processes for aggregation before writing to disk.

- Most transactions are read-only

- "Large" applications can usually be sharded by account. This means 1 file per account, and can easily be put on the most optimal geolocation of the account

- Deferring locking until commit allows multiple writers ( https://www.sqlite.org/cgi/src/doc/begin-concurrent/doc/begi... ). This is good enough for most applications anyway.

- SQLite is simple, fast enough for almost anything, supports a good set of features and datatypes, and is very easy to embed.
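The one-file-per-account sharding from the list above can be sketched like this (paths and schema are invented):

```python
import sqlite3
from pathlib import Path


def open_account_db(base_dir: Path, account_id: str) -> sqlite3.Connection:
    """One SQLite file per account; created and initialized on first use."""
    base_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(base_dir / f"{account_id}.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, total REAL)"
    )
    return conn
```

Writes for different accounts never contend, since each account owns its own single-writer database file, and an account's file can be placed (or replicated) wherever that account's users are.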

Why would I bake all of those assumptions and limitations into my system, though, just on the hope it won't ever become a problem?

because you can move faster, explore your problem domain more cheaply, and validate your solution earlier. then, if you "struck gold" and happened to arrive at some product-market fit, then even if you need to rewrite big chunks of your solution to swap out your persistence layer, you have a solid specification to follow, which is your sqlite-based implementation!

such a rewrite is a much more predictable endeavor than building the initial solution, so it's a great problem to have :)

meanwhile, your UI doesn't have to change and a lot of your other glue code or business-logic code doesn't have to change either, IF you haven't hardcoded direct calls to SQLite everywhere in your program :)

eg. I used HoneySQL with great success! My queries are safely assembled from Clojure data structures and I had a single function, which I used to format them to the desired SQL dialect H2DB/SQLite/MySQL/Postgres, execute them and parse the results back into Clojure data structures and even take care of lazily paginating through long result sets, without burdening the consumer of the data with such details.


Depending on your expected lifecycle of the app, you should use an adapter between the DB and the rest of your app.

Have an interface made up of clear methods which you do all your business through. That way, if you want to change the DB, you only need to rewrite the adapter.

I believe SQLite is a great, low-installation-needs, portable DB you can start your project with. If it turns out your startup gets millions of users, you can afford to switch.

> Depending on your expected lifecycle of the app, you should use an adapter between the DB and the rest of your app.

A sane design, but realise that limitations tend to leak through abstractions.

This question goes both ways: why put effort into something that might never even be a problem?

Simpler backups. Simpler restores. Easier multi tenancy. Easier to have data+app near customer. No firewalls. No passwords. No keys.

Why do you assume you'll run into problems? The moment you're running into problems, you better have a good plan with any RDBMS.

How hard is it to migrate if/when?

Quite right, it's not one-size-fits-all, but for any site that's mostly read-only it's a brilliant solution.

Simon Willison has written about it and coined the term “baked data”: https://simonwillison.net/2021/Jul/28/baked-data/

Mozilla.org uses this architecture, Django app running off SQLite with the db rsync’ed to each application server.

The confusion is probably a lot of us work at smaller companies that serve a wide solution to a niche customer, and that kind of app has a lot of reads and writes but doesn't need to scale. This app might be doing the invoicing/shipments/specialist parts of a business for example.

Whereas there is another different kind of Engineering which I probably will never be a part of (simply due to mathematics of available positions doing it) where you are scaling something up for millions of users but the app is much simpler like a Twitter or Reddit, and the challenge is in the scaling.

Even for those small niche apps for businesses there are a huge number that are unlikely to be doing more than the 10-20 write transactions / second that SQLite can support.

Probably on average correct, but there can be bursts.

The default settings of SQLite are very conservative and essentially enforce serial writes. With tuning and loosening that enforcement, you can go from 50 writes per second to 50,000.

Edit: forgot to mention that yes a major part of that is batching writes into fewer, bigger transactions; AFAIK you can't really get around that.
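Concretely, the loosening usually meant here is WAL mode plus a relaxed synchronous level; a sketch below (the right `synchronous` setting depends on how much recently committed data you can afford to lose on power failure):

```python
import sqlite3


def tune_for_writes(conn: sqlite3.Connection) -> None:
    # WAL: readers no longer block behind the single writer.
    conn.execute("PRAGMA journal_mode=WAL")
    # NORMAL skips the fsync on every commit; with WAL a crash may lose
    # the most recent transactions but won't corrupt the database.
    conn.execute("PRAGMA synchronous=NORMAL")
    # Wait (up to 5s) instead of failing immediately when the DB is locked.
    conn.execute("PRAGMA busy_timeout=5000")
```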


>INSERT is really slow - I can only do few dozen INSERTs per second

>Actually, SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second.

>By default, each INSERT statement is its own transaction. But if you surround multiple INSERT statements with BEGIN...COMMIT then all the inserts are grouped into a single transaction.

>Another option is to run PRAGMA synchronous=OFF. This command will cause SQLite to not wait on data to reach the disk surface, which will make write operations appear to be much faster. But if you lose power in the middle of a transaction, your database file might go corrupt.
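In Python's sqlite3, for example, grouping the inserts per the FAQ looks like this (trivial schema, just for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")

# One BEGIN...COMMIT around all 50,000 inserts instead of one per row;
# the context manager commits on success and rolls back on error.
with conn:
    conn.executemany("INSERT INTO t (x) VALUES (?)", ((i,) for i in range(50_000)))

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]  # 50000
```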

What am I missing?

Many sites are Read (almost) Only. For sites where users interactively query/view/explore the data, but (almost) never write their own, it works great.

Speaking of this, I really wish there was SQLite support in WordPress...

A blog is the perfect example of where SQLite should be used over a DB server.

If you chuck Varnish in front of it, does it matter what you use?

Edit: was being serious: if your data is that static you can statically generate it. But I get that CMS is convenient so with that caching is where you get the performance win. A blog post either never updates or gets 1 or 2 edits max.

use more than one SQLite file? we have one per day and project for example.

I don't know if you're joking or not, but this would just be reinventing the Postgres/SQL Server/Oracle/MySQL wheel using duct tape and wishes.

If you're doing something that multiple systems have had millions of hours of development to do, just use one of those.

You are wrong. Using multiple SQLite files solves quite a few problems. E.g. let's say your user can have multiple products where each product has its own properties. This can be represented as 3 tables in an SQL database: users, products, properties. Now with SQLite you can drop one table (users). This simplifies queries; depending on your situation you can use simpler indexes, etc.

Extra benefit: you can let each user download their whole SQLite DB, as per GDPR requirements.

How do you join data across hundreds of databases?

And how is the query plan for indexes when doing so?

Using ATTACH. Works a treat. SQLite query planner knows about ATTACH.
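A minimal sketch of a cross-file join via ATTACH (file names, schema, and data are all invented):

```python
import os
import sqlite3
import tempfile

d = tempfile.mkdtemp()
users_path = os.path.join(d, "users.db")
orders_path = os.path.join(d, "orders.db")

# Two independent database files.
with sqlite3.connect(users_path) as u:
    u.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    u.execute("INSERT INTO users VALUES (1, 'alice')")
with sqlite3.connect(orders_path) as o:
    o.execute("CREATE TABLE orders (user_id INTEGER, total REAL)")
    o.execute("INSERT INTO orders VALUES (1, 9.99)")

# Open one file, ATTACH the other, and join as if it were one database.
conn = sqlite3.connect(users_path)
conn.execute("ATTACH DATABASE ? AS odb", (orders_path,))
row = conn.execute(
    "SELECT u.name, o.total FROM users u JOIN odb.orders o ON o.user_id = u.id"
).fetchone()
# row == ('alice', 9.99)
```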

I have seen it suggested to use one table per db to minimize lock contention and simplify backups/vacuums. Do you know if there really is nothing to lose with ATTACH spam? Any tradeoffs at all?

I wouldn't design that way. SQLite has a default limit of 10 attached databases and a hard limit of 125.

The more natural use pattern is having several areas of concern managed by distinct databases, with occasional joins across the domains when warranted. ATTACH loses foreign key constraints and triggers, among other things.

My understanding is that it just works. They're still all in the same process, and SQLite has all the information it needs to do joins. Transactions get more complex but are still atomic.

You don’t.

But why? That seems such an unnecessary hack.

Until we know how they are used, we are just speculating: I for one am curious ;)

Wow Litestream sounds really interesting to me. I was just starting on an architecture, that was either stupid or genius, of using many SQLite databases on the server. Each user's account gets their own SQLite file. So the service's horizontal scaling is good (similar to the horizontal scaling of a document DB), and it naturally mitigates data leaks/injections. Also opens up a few neat tricks like the ability to do blue/green rollouts for schema changes. Anyway Litestream seems pretty ideal for that, will be checking it out!

An architecture like yours has certainly been done before, though AFAIK it never went mainstream. In particular, check out this post from Glyph Lefkowitz of Twisted Python fame, particularly the section about the (apparently dead) Mantissa application server:


Same pattern is ActorDB: https://github.com/biokoda/actordb

I actually did something very similar to this for an app that produced a lot of data. I wrote a small middleware that automatically figured out which shard to use so the app logic could pretend that it was all just one big db. The app ultimately ended up in the can so it never needed to scale, but I always wonder how it would have gone.

Each user's account gets their own SQLite file.

So now you need one database connection per user...

And? It's SQLite; it's a file handle and some cache, not a connection pool.

Depending on how you define "account", that can be quite reasonable. In a B2B application, each business customer could get their own SQLite database, and the number of SQLite connections would likely be quite manageable, even though some customers have many users.

Without knowing details about the app, it's hard to know if that would matter. If a small number of concurrent users would ever be using it, I would think it would be NBD.

If by connection you mean in-process database.

This a great and interesting offering! I think this fits well with fly.io and their model of computing.

I now wish that I had engaged with this idea, very similar to Litestream, that I had about a year and a half ago. I always thought SQLite just needed a distribution layer to be extremely effective as a distributed database of sorts. Its flat-file architecture means it's easy to provision, restore, and back up. SQLite also has incremental snapshotting and reproducible WAL logs that can be used to do incremental backups, restores, writes, etc. It just needs a "frontend" to handle those bits. Latency has gotten to the point where you can replicate a database by propagating its continued snapshots (which is, at a high level, what Litestream appears to be doing) out to object / blob storage. You could even achieve brute-force consensus with this approach if you ran it in a truly distributed way (though Raft is probably more efficient).

Reason I didn't do this? I thought to myself - why in the world in 2020 would someone choose to use SQLite at scale instead of something like Firebase, Spanner, Fauna, or even Postgres? So after I did an initial prototype (long gone, never pushed it to GitHub) I just felt like...there was no appetite for it.

Now I regret!

Just a long winded way of saying, congrats! This is awesome! Thanks for doing exactly what I wanted to do but didn't have the guts to follow through with.

there’s some stuff out there:

- https://github.com/rqlite/rqlite

- https://github.com/chiselstrike/chiselstore

- https://dqlite.io/

I’m sure there’s more, those are just the ones I remember.

I implemented exactly this setup, in Rust, last year for a client. Distributed WAL with write locks on a RAFT scheme. Custom VFS in Rust for sqlite3 to handle the IO. I asked the client to opensource it but it's probably not gonna happen... It's definitely doable though.

Did you write your own rust raft implementation or reuse something already available?

Reused a well known library that uses raft. I don't know if I should mention any more details since it was a private project.

@dang, the actual title is “ I'm All-In on Server-Side SQLite”

Maybe I missed it but where in the article does it say Fly acquired Litestream?

EDIT: Ben Johnson says he just joined Fly. Nothing about Fly “acquiring” Litestream.


Elsewhere in this thread he says "the project was acquired" which is more or less "Fly.io Buys Litestream" (the submitted title).

I'm honestly not sure whether we should change it or not - minimizing complaints is the goal - what's it called when a function has two points that it keeps unstably jumping between?

that function is correct when it agrees with me.

> Litestream has a new home at Fly.io, but it is and always will be an open-source project. My plan for the next several years is to keep making it more useful, no matter where your application runs, and see just how far we can take the SQLite model of how databases can work.

As far as I understood it, Fly.io hired the person working on Litestream and pays them to keep working on Litestream.

That’s how I understood it and that’s radically different than how this HN post got titled.

Ben Johnson confirms how you framed it here:


We wrote a different title for this blog post, and we did in fact buy Litestream (to the extent that anyone can "buy" a FOSS project, of course).

> (to the extent that anyone can "buy" a FOSS project, of course).

Does this mean that, in addition to offering a salary / options, you provided some sort of additional one-time compensation for copyright assignment?

All the code that has already been written/published already has the FOSS license (in this case APLv2). No take-backsies.

So presumably no, there was not a one-time compensation for copyright assignment.

> All the code that has already been written/published already has the FOSS license (in this case APLv2). No take-backsies.

You do realize that this fact does not entitle you to the copyright of the work, right? It entitles you to use it, modify, redistribute, etc, with continued attribution of the copyright holders.

As such, copyright re-assignment is possible for any code that Ben wrote. And, any contributions are probably a grey area of sorts since there was probably no agreement of copyright assignment when contributing as there is in some projects. Any who.

So if I take code that was released with a license like the APLv2, at any point the person that wrote that code can change the license and then sue me for using the code without permission? That doesn't sound right.

It is my understanding that the contract with you is APLv2, and he can't one-sidedly change that.

However, nothing is guaranteed about the future license of the project. The rights holder could release future versions only under a proprietary license, meaning buying rights assignment is buying the future of the project. Of course everyone forks and leaves if you get too draconian up front, so you have to slow-boil that frog.

Not really though right? Copyright assignment applies to the existing code, which is again already licensed as APLv2. Future versions of software could be licensed differently because there would be new contributions. However now that Ben is working for fly.io, the primary way to do this would be to say fly.io is the rightsholder for contributions Ben makes to Litestream while working for the company.

This does not require selling the copyright of existing code, nor does it seem to me like that sale would be beneficial for achieving the above.

>Copyright assignment applies to the existing code, which is again already licensed as APLv2.

? If you have a copy of the code in your hands, then the license that applies to your use of it is APLv2.

The owner of the rights can take the official public project repo private, and either never publish it again or change the license on the official repo to whatever they like.

It doesn't matter if the code used to be APLv2 if you didn't get a copy of it. If you don't have and want the code, and it is now available with N license, you either accept N license, find some unofficial copy of the APLv2 version, or you don't get access to the code.

... That's not at all what this means.

1. I didn't say anything about changing the LICENSE, but it certainly would be possible to change the LICENSE. Typically, you need to get all the copyright holders to agree to it (e.g. all previous contributors), OR, move to a license that ensures all of the previous rights as well (e.g. a compatible license). Additionally, there's a moment when the LICENSE changes. Previous releases would be under the previous license and available indefinitely under those terms (assuming you have the source code at that version).

2. The LICENSE itself has provisions around use of the code. If you fail to adhere to the agreement, then, yes, you could be sued by the copyright holder. Effectively, for APLv2, here is a summary: https://tldrlegal.com/license/apache-license-2.0-%28apache-2...

3. What I did say is changing the _copyright holder_, e.g., the owner. This is the grantor of the LICENSE, who is providing the software / source code (typically gratis, but it doesn't have to be) provided you abide by the rules stated in the LICENSE. APLv2.0 definitely doesn't give you the right to pretend you wrote the entire thing. In fact, if you redistribute the source code with modifications, the APLv2.0 requires that you include a statement of your significant changes.

I know that's not what it means, which is why I was confused when you initially responded with the comment that you did. I guess partly I was confused by the condescending phrasing you used when nothing I said was incorrect. Any who.

My point was only – why would fly.io pay an appreciable sum to transfer the copyright of code already written when a FOSS license has already been applied to that code? Clearly that was a connection I was making in my head that I failed to write down in my comment.

I guess, fundamentally, the question comes down to: "will we expect Litestream to be developed under the name Ben Johnson, or Fly.io?" This _might_ have implications for what the project becomes.

My intention was not to be condescending, fwiw, so I am sorry for my failure there.

"Litestream has a new home at Fly.io, but it is and always will be an open-source project"

Very bottom of the post. Technically, Litestream remains an open-source project, so it's more accurate to say that Fly.io acquired the brand IP and the owner of that IP.

The tweet[1] links the blog post and says Litestream is part of fly.io now.


This is interesting! I like using Fly.io today, but I’m currently using a single node for most stuff with SQLite. Having some kind of failover and replication would be pretty awesome. I have yet to try Litestream and it does sound like there’s some issues to work out that could be pretty nasty, but I’ll definitely be watching.

Fly.io is very nice. It’s what I hoped Hyper.sh would be, except it isn’t dead. That said, there are a couple things I worry about… like, there’s no obvious way to resize disks, you pretty much need to make a new disk that’s larger, launch a new instance with it mounted, and transfer data from an existing instance. If it was automated, I probably wouldn’t care, though a zero downtime way of resizing disks would be a massive improvement. Another huge concern is just how good the free tier is. I actually am bothered that I basically don’t get billed. Hyper.sh felt a bit overpriced, and by comparison Fly.io does scale up in price but for small uses it feels like theft.

> there’s no obvious way to resize disks

Yes, this sucks right now. Resizable disks is on our list, we just need somebody to spend a few days on it. Luckily we're hiring platform engineers [1] to work on fun problems like that.

> I actually am bothered that I basically don’t get billed.

We actually had a bug that skipped charging a bunch of accounts. :) Regardless, we're not overly concerned about making $1/mo from small accounts. Large customers more than make up for it. Turns out building something devs _choose_ to use on their free time often leads to using it at work too.

[1] https://fly.io/jobs/platform-product-engineer/

> Yes, this sucks right now.

If I may, you really need to hire sudhirj back or get someone doing the tedious work of answering dumb/advanced questions in the forums and doing follow-ups! Even if it doesn't scale, this high-touch forum engagement may not only help inform the product roadmap but eventually help cultivate a stronger community.

Dqlite is also interesting, and in a similar space. It seems to have evolved from the LXC/LXD team wanting a replacement for Etcd. It's Sqlite with raft replication and also a networked client protocol.


There's also rqlite. There's definitely a place for this kind of stuff. But we already use a bunch of stuff that does distributed consensus in our stack, and the experience has left us wary of it, especially for global distribution. We almost used rqlite for a statekeeping feature internally, but today we'd certainly just use sqlite+litestream for the same kinds of features, just because it's easier to reason about and to deal with operationally when there's problems.


rqlite author here. Anything else you can tell me about why you decided against it? Just simpler, as you say, to avoid a distributed system when you can (something I understand).

We like rqlite a lot. There's some comments in your issue tracker from Jerome about it at the time. The decision wasn't against rqlite as a piece of software so much as it was us deliberately deciding not to introduce more Raft into our architecture; any place there is Raft, we're concerned we'll essentially need to train our whole on-call rotation on how to handle issues.

The annoying thing about global consensus is that the operational problems tend to be global as well; we had an outage last night (correlated disk failure on 3 different machines!) in Chicago, and it slowed down deploys all the way to Sydney, essentially because of invariants maintained by a global Raft consensus and fed in part from malfunctioning machines.

I think rqlite would make a lot of sense for us for applications where we run multiple regional clusters; it's just that our problems today tend to be global. We're not just looking for opportunities to rip Raft out of our stack; we're also trying to build APIs that regionalize nicely. In nicely-regionalized, contained settings, rqlite might work a treat for us.

While the title is about a business acquisition, the article is mostly about the technology itself -- replicating SQLite, suggested as a superior option to a more traditional separate-process rdbms, for real large-scale production workloads.

I'd be curious to hear reactions to/experiences with that suggestion/technology, inside or outside the context of fly.io.

In which scenario would you use litestream[1] vs rqlite[2]?

1 - https://github.com/benbjohnson/litestream

2 - https://github.com/rqlite/rqlite

rqlite author here. The way I think about it is that both systems add reliability to SQLite, but in addition rqlite also offers high-availability. Another important difference is that Litestream does not require you to change how your application interacts with the SQLite database, but rqlite does.

Another way I think about it (I'm sure Ben may have other ideas!) is that if you want to add a layer of reliability to a SQLite-based application, Litestream will work very well and is quite elegant. But if you have a set of data that you absolutely must have access to at all times, and you want to store that data in a SQLite database, rqlite could meet your needs.

Check out the rqlite FAQ for more.



Litestream author here. I agree with Philip. Litestream relaxes some guarantees about durability and availability in order to make it simpler from an operational perspective. I would say the two projects generally don't have overlap in the applications they would be used for. If your application is ok with the relaxed guarantees of Litestream, it's probably what you want. If you need stronger guarantees, then use rqlite.

Agreed, they generally solve different problems. It's important to understand that rqlite's goal is not to replicate SQLite per-se. Its primary goal is to be the world's "easiest to operate, highly-available, distributed relational database". :-) It's trivial to deploy, and very simple to run. As part of meeting that goal of simplicity it uses SQLite as its database engine.
