Who is doing this and where can I read more? What are the tradeoffs?
I imagine you get a dataset that is significantly smaller, but it's much trickier to keep a dataset in memory the way you could with MySQL.
It's like having a free implicit index on the customer (because you had to look up the sqlite db file before you could start querying).
I spend a lot of time thinking about tenancy and how to handle it. Tenancy is such a common problem.
Performance is the number one reason tickets are hard to estimate. The second in my experience is security.
Time and tenancy are the biggest opportunities for SQL to just be better (I always need tenancy, and my ORDER BY, or at least one constraint, can typically be satisfied with time).
I'm doing it, though I haven't written anything up. Happy to share my opinion though, with a bit more experience than you have.
The databases I'm working with are pretty small - ballpark 4MB of data per "tenant". So, I guess, a single large database server with half a terabyte of RAM could keep well over a hundred thousand tenants in memory at the same time (I don't have anywhere near that many tenants, so I haven't tested that... and honestly if I did have that many I'd probably split them up between different servers).
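(As a rough sanity check: half a terabyte is ~512 GB, and 512 GB / 4 MB ≈ 130,000 databases fully resident at once, before any overhead.)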
Without getting into too much detail - "tenant" isn't really a good fit for how we split them up. Our business is largely based on events that happen on a specific date, with maybe a few months of activity before that date. We have an sqlite database for each event (so ~4MB per event). Once the event passes, it's essentially archived and will almost never be accessed. But it won't actually never be accessed, so we can't delete it.
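The lookup itself is nothing fancy. Here's a minimal sketch of roughly the shape of it, in Python - the directory, the event id, and the table name are made up for illustration, not our real code:

    import sqlite3
    from pathlib import Path

    EVENTS_DIR = Path("/var/data/events")   # hypothetical location of the per-event files

    def open_event_db(event_id: str) -> sqlite3.Connection:
        # Picking the file by event id is the "free implicit index" mentioned above:
        # everything you query afterwards is already scoped to that one event.
        db_path = EVENTS_DIR / f"{event_id}.sqlite"
        conn = sqlite3.connect(str(db_path))
        conn.execute("PRAGMA journal_mode=WAL")  # optional: lets readers coexist with a writer
        return conn

    # Usage: no WHERE event_id = ? anywhere - the file boundary does that for you.
    conn = open_event_db("2024-06-14-gala")      # made-up event id
    rows = conn.execute("SELECT * FROM ticket_sales ORDER BY created_at").fetchall()
    conn.close()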
I haven't run into any performance issues so far, just with regular sqlite databases on the filesystem. I expect the kernel is doing its thing and making sure "hot" databases are in RAM, as with any other frequently accessed file on the disk.
My understanding (it's a theoretical problem I haven't actually encountered...) is SQLite only really struggles when you have a bunch of simultaneous writes. Our business model doesn't have that. The most actively written table is the one where we record credit card payments... and unfortunately we don't make tens of thousands of sales per second.
If we did have that "problem" I'm sure we could allocate some of our billions of dollars per day in profits to finding a way to make it work... my gut instinct would be to continue to use SQLite with some kind of cache in front of it. All writes would go to something faster than SQLite, then be copied to SQLite later. Reads would check the write cache first, and SQLite if the cache misses.
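To be clear, that's a back-of-the-napkin design, not something we've built. A sketch of the shape I have in mind - the in-memory dict stands in for "something faster than SQLite", and the payments table and columns are invented:

    import queue
    import sqlite3
    import threading

    class CachedWriter:
        # Writes land in memory immediately and are copied to SQLite in the background.
        # Reads check the write cache first and fall back to SQLite on a miss.

        def __init__(self, db_path: str):
            self.db_path = db_path
            self.cache = {}                    # the "something faster than SQLite"
            self.pending = queue.Queue()       # writes waiting to be copied to SQLite
            conn = sqlite3.connect(db_path)
            conn.execute("CREATE TABLE IF NOT EXISTS payments (key TEXT PRIMARY KEY, data TEXT)")
            conn.commit()
            conn.close()
            threading.Thread(target=self._flush_loop, daemon=True).start()

        def write(self, key, data):
            self.cache[key] = data             # fast path: accept the write immediately
            self.pending.put((key, data))      # ...and queue it for SQLite

        def read(self, key):
            if key in self.cache:              # check the write cache first
                return self.cache[key]
            conn = sqlite3.connect(self.db_path)   # miss: go to SQLite
            row = conn.execute("SELECT data FROM payments WHERE key = ?", (key,)).fetchone()
            conn.close()
            return row[0] if row else None

        def _flush_loop(self):
            conn = sqlite3.connect(self.db_path)   # single background writer, so no write contention
            while True:
                key, data = self.pending.get()
                conn.execute("INSERT OR REPLACE INTO payments (key, data) VALUES (?, ?)", (key, data))
                conn.commit()

Nothing evicts from the cache in this sketch; the point is just that the SQLite file stays the source of truth and the cache only absorbs the write burst.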
My experience working with a single large database is you end up with a lot of stale data that is almost never needed. When a table has a hundred million rows, with indexes on multiple columns, even the simplest operation, like adding a new row, can get slow. My approach with SQLite eliminates that - I'll often have just hundreds of rows in a table and access is blazingly fast. When I need to access another database that hasn't been touched in a long time (years possibly), having to wait, what, an entire millisecond, for the SSD to load that database off the filesystem into memory isn't a big deal. No user is going to notice or complain.
Obviously that's more challenging with some data sets and if you're constantly accessing old data, those milliseconds will add up to significant iowait and things will fall over. I definitely don't use SQLite for all of my databases... but in general if you're doing enough writes for SQLite's simultaneous write performance issue to be a problem... then chances are your data set is going to get very large, very quickly, and you're going to have performance headaches no matter what database you're using.
Finding some way to divide your database is an obvious performance win... and SQLite makes that really easy.