Hacker News new | past | comments | ask | show | jobs | submit login
Storage and transaction in mvSQLite, the distributed SQLite on FoundationDB (univalence.me)
79 points by losfair on Sept 24, 2022 | hide | past | favorite | 17 comments



What's the use case for something like this? I'd think that instead of building a distributed version of SQLite on top of another database engine, you'd just choose a distributed database in the first place. Are there benefits to the way SQLite works that outweigh the choice to just use some other flavor of SQL?


FoundationDB is an open-source transactional KV store where you can write to any replica, and continue writing even if a certain number of replicas are lost. The transactions maintain strict serializable semantics, meaning that transactions happening at the same time against different nodes will be consistent. (You can't read bank account A's balance, add to it, and then write it back if someone else committed a change to the bank account balance, even if that happened on another node.) This is honestly a pretty rare property in distributed databases, and even in databases that claim to implement it, it is ridiculously easy to get wrong. (Reading the analyses here are fascinating: https://jepsen.io/)

One of the many things that SQLite brings is the ability to treat arbitrary storage as a relational database that can be queried (and mutated) with SQL.

Both of these things are generally rare in open source databases. There are very few multi-master distributed databases that have ACID transactions, and even fewer of them have SQL support. So marrying these two technologies is extremely compelling; you get the power of SQLite's relational data layer with the power of FoundationDB's distributed transactions. This is really high tech stuff for free!

A different implementation of similar semantics is rqlite. rqlite uses the Raft consensus protocol to replicate the text of your database writes to each node, which commits it to a local on-disk sqlite database. I haven't actually looked at mvSQLite enough to know how different the approach is, but both should be on everyone's list to take a look at and poke at. (For most "work" you'll be doing, it's probably completely fine to not create a distributed system like these; run in one datacenter, run a monolithic RDBMS, and tell users you're down when a tornado flattens the server racks. Do keep some offsite backups either way, though. Your application code will always have bugs, and these databases will happily replicate an accidental delete to 5 data centers at the speed of light.)


so FoundationDB is user as a file system for storing SQlite files?

In this case, the write contention performance would not be much better than using SQLite over SMB.


Unlike a regular filesystem, mvsqlite offloads both storage and transaction processing to FoundationDB. Fine-grained optimistic locking is implemented, and massive write concurrency across a cluster is possible.


You should read up on FDB, I doubt you’ll have issues like that until extreme scale. I guarantee it isn’t just like sharing a file over SMB, lol.


I have zero issue with FDB but some worry about mvSQLite.

After reading more about the design it seem to use FDB to store Pages and using FDB transaction to make those update have have Strict serializable isolation.

This would work correctly but it seem that mvSQLite also try to do its own MVCC logic instead of simply using FDB transactions.

It is well known that implementing Serializable transaction on top of MVCC is not trivial. There was a bug in Postgresql from version 9.1 to version 12.3 making it not truly Serializable.

See: https://jepsen.io/analyses/postgresql-12.3


The idea that mvSQLite is based on (decoupled storage and compute) has been implemented for the other two most popular SQL databases. MySQL and PostgreSQL have Aurora and AlloyDB, and PostgreSQL additionally has Neon. This is something missing from the SQLite ecosystem (before mvSQLite).

And SQLite has a unique advantage when implemented on top of decoupled storage and compute. Decoupled MySQL and PostgreSQL can't really linearly scale from zero in a serverless use case: you need to spin up a compute instance when you want to do queries. This isn't necessary with mvSQLite, because the embedded SQLite library can just connect directly to the multi-tenant storage engine (mvstore).


> This is something missing from the SQLite ecosystem (before mvSQLite)

This is also something missing from self hosted, open source, SQL rdbms.

As far as I'm aware, this is the closest thing to Aurora and AlloyDB that I can run myself.

Also, by leveraging FoundationDBs ability to have multi-writer deployments, it gives us the potential to scale writes far beyond what Aurora and AlloyDB can handle. I'd love to see an extreme benchmark comparing a single 24xlarge Aurora writer vs a few dozen nodes in FoundationDB. The Aurora writer is likely more efficient per CPU but ultimately can't scale past 96 cpus.


Is it possible to do read replicas with this? Would that be a FoundationDB level concern?


You can manually configure the replication factor in FoundationDB and the client is intelligent enough to load balance across them.

Three data hall is R.F == 3

Three datacenter is R.F == 6

You can configure it higher manually as well, we have a few test clusters we configured to 9 and it seemed to work fine.


Could this be used to synch an in browser ephemeral SQLite to a server?


The article mentions it not being bug-free, what are some of the bugs?


Software that is not formally verified contains bugs.

All known correctness bugs are fixed (except experimental features) - this is just a reminder that you need more than one layers of defense for something that stores your critical data.


Normally when people say "it's not bug free" they mean that there are important known bugs that haven't been fixed.

Otherwise it is a pretty meaningless thing to say. Not even formally verified software is definitely bug free.


How does this compare to Mycelial?


what is mycelia’s?


https://mycelial.com/

It is another distributed SQLite database solution, with emphasis on making an offline-first focus.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: