This approach is more complex to implement but a lot more versatile and flexible. Most of the time you wouldn't want to version or branch the whole database, but only parts of it.
So, I was wondering why they didn't use fossil itself
Fossil is a SCM for files. LiteTree is a branching RDBMS.
One of the reasons why SQLite is so widely used is that it is carefully tested and shown to be reliable even in potentially faulty conditions. As detailed on https://sqlite.org/testing.html, there are three test sets, one of which is public (the TCL set). I’d love to see test results to assure the safety of any data stored in LiteTree.
I went looking for them a while back, but didn't find them.
It's completely possible I missed them though. :)
The generated makefile should contain the test targets, I guess. In Makefile.in I see the targets tcltest, quicktest, test, valgrindtest, smoketest, and many others.
In my experience, claims like these usually end up showing that the author didn't understand the `PRAGMA synchronous` setting at all, or they chose to ignore it to juice their stats.
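A quick way to check what a benchmark is actually measuring is to read the pragma back. A minimal sketch (file path and table name are arbitrary):

```python
import os
import sqlite3
import tempfile

# Sketch: PRAGMA synchronous controls how often SQLite calls fsync().
# synchronous=OFF is fast but gives up power-loss durability, so numbers
# measured this way are not comparable to the default setting.
path = os.path.join(tempfile.mkdtemp(), "bench.db")
con = sqlite3.connect(path)
con.execute("PRAGMA synchronous=OFF")  # 0=OFF, 1=NORMAL, 2=FULL, 3=EXTRA
sync = con.execute("PRAGMA synchronous").fetchone()[0]
print(sync)  # 0 -- this benchmark is running without fsync durability

con.execute("CREATE TABLE t(x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
con.commit()
print(con.execute("SELECT count(*) FROM t").fetchone()[0])  # 1000
```

If a published benchmark doesn't state its `synchronous` (and journal mode) setting, the numbers don't say much about durable writes.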
In this benchmarking test are the data durability guarantees the same for both LiteTree and vanilla SQLite?
If I want a fast database, I will just write my data to /dev/null. It has well-understood data durability guarantees and is very quick to write to.
Check your fonts if those two look the same.
Who cares if you get more bongoliomarks.
Instead of storing the transactions as a separate LMDB commit, I decided to store the database in a git repository and expose the diffs using SQLite's sqldiff utility. This let my workflow stay almost unchanged and limits the dependencies to git, sqlite, sqldiff, and bash.
I don't have stress test results, but it should be similar to git. I think I remember getting it up to several hundred megabytes at one point and it was fine. I mostly use it for smaller sets of highly relational data that I want to track like I would source code.
By leveraging git & sqlite it lets me avoid writing a network sync implementation, architecture specific code, or patching any C code to recompile.
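For readers without sqldiff at hand, the same "diff the database as SQL text" idea can be approximated in stock Python by diffing `.dump` output. This is a rough sketch of the concept, not the commenter's actual tooling:

```python
import difflib
import sqlite3

# Sketch: diff two states of a database as SQL text. The workflow above uses
# git + SQLite's sqldiff utility; here difflib over iterdump() output
# approximates the same kind of SQL-level diff.
def dump(con):
    return list(con.iterdump())

old = sqlite3.connect(":memory:")
old.executescript(
    "CREATE TABLE t(id INTEGER PRIMARY KEY, name TEXT);"
    "INSERT INTO t(name) VALUES ('alice');")

new = sqlite3.connect(":memory:")
new.executescript(
    "CREATE TABLE t(id INTEGER PRIMARY KEY, name TEXT);"
    "INSERT INTO t(name) VALUES ('alice');"
    "INSERT INTO t(name) VALUES ('bob');")

diff = list(difflib.unified_diff(dump(old), dump(new), lineterm=""))
for line in diff:
    print(line)  # the added row shows up as a "+INSERT INTO ..." line
```

sqldiff is smarter (it diffs the databases directly rather than their dumps), but the output is the same in spirit: changes expressed as SQL statements that version control can store as text.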
And thank you for your work! I may use it someday.
There are simple techniques for adding branched versioning to key-value (particularly ordered key-value) stores. We are using one for our research data service that holds 25+ TB of connectomics data, including 3D image and segmentation data (http://dvid.io). Our paper is currently under review but should have been out several years ago :) We can use a variety of key-value storage backends and are experimenting with versioned relational DBs, so I'll definitely give LiteTree a look.
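The basic ordered key-value trick can be sketched in a few lines: encode `(key, version)` in the sort key and read by seeking to the newest version at or before the requested one. A toy illustration (the class and method names are made up, and real branching would additionally record each branch's parent and fork version):

```python
import bisect

# Sketch of versioning over an ordered key-value store: entries are sorted
# by (key, version), and a read at version V seeks to the newest entry for
# that key with version <= V.
class VersionedKV:
    def __init__(self):
        self._keys = []   # sorted list of (key, version) pairs
        self._vals = {}

    def put(self, key, version, value):
        bisect.insort(self._keys, (key, version))
        self._vals[(key, version)] = value

    def get(self, key, version):
        # Newest entry for `key` at or before `version`, if any.
        i = bisect.bisect_right(self._keys, (key, version)) - 1
        if i >= 0 and self._keys[i][0] == key:
            return self._vals[self._keys[i]]
        return None

kv = VersionedKV()
kv.put("a", 1, "v1")
kv.put("a", 3, "v3")
print(kv.get("a", 2))  # 'v1' -- version 3 is not visible at read-version 2
```

On a real ordered store (LMDB, RocksDB, etc.) the same seek is a single range scan on the composite key, which is why ordered backends make this cheap.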
You can see how, with just the effort of repointing queries, a reporting app could show both the real world and the world you are modelling.
0) The most obvious use case is to replace git, because git proper doesn't handle big files very well, let alone bigger-than-RAM structured data. Mind that you might still need an ad-hoc solution for binary data; WT does have a limitation on value size.
1) There are various data science use cases, like sharing data or A/B testing.
2) Anywhere you need an audit trail, i.e. you need to look back at previous states, whether for debugging purposes or domain logic.
A versioned datastore is a compromise between event sourcing (a pure log) and non-versioned databases.
 Collaborative Open Data versioning: A Pragmatic Approach Using Linked Data
3) Take a copy of the data store offline, and later merge it back into the online database
Manually resolving conflicts is fine. The alternative is to rebuild a system on Couchbase/pouchdb, or rewrite on event sourcing, neither of which - when the system otherwise lends itself to RDBMS - is my choice.
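For the audit-trail case 2), a scoped version of this can also be had in stock SQLite with a history table plus a trigger. A minimal sketch (all table and trigger names here are made up); note this is per-table change tracking, not the whole-database versioning LiteTree gives you:

```python
import sqlite3

# Sketch: record the previous state of each updated row in a history table
# via an AFTER UPDATE trigger, giving a queryable audit trail.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE account(id INTEGER PRIMARY KEY, balance INTEGER);
CREATE TABLE account_history(
    id INTEGER,
    balance INTEGER,                          -- the value being replaced
    changed_at TEXT DEFAULT (datetime('now')),
    op TEXT);
CREATE TRIGGER account_audit AFTER UPDATE ON account
BEGIN
    INSERT INTO account_history(id, balance, op)
    VALUES (OLD.id, OLD.balance, 'update');
END;
""")
con.execute("INSERT INTO account(id, balance) VALUES (1, 100)")
con.execute("UPDATE account SET balance = 50 WHERE id = 1")
history = con.execute("SELECT id, balance, op FROM account_history").fetchall()
print(history)  # [(1, 100, 'update')] -- the previous state is preserved
```

The trade-off versus a versioned datastore is that you must write and maintain a trigger per audited table, and you only capture what the triggers capture.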
AFAIK this can be a foundation for some form of Snapshot Isolation https://www.sqliteconcepts.org/SI_index.html (?)
Also, are there guarantees that no two branches can be created at the exact same point in time?
Thanks for the excellent work! I can actually see some use cases for this in one of my side projects. :-)
Branch off, do some queries, inserts/updates and then merge back in
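In LiteTree that workflow is driven through pragmas on the connection. A hypothetical session, written from memory of the pragma names in LiteTree's README (verify the exact syntax against the project docs; note that merging itself is not something LiteTree provides):

```sql
PRAGMA branches;                        -- list existing branches
PRAGMA new_branch=devel at master.2;    -- fork 'devel' at commit 2 of master
PRAGMA branch=devel;                    -- point this connection at 'devel'
INSERT INTO products VALUES ('widget', 9.99);
PRAGMA branch=master;                   -- switch back; master is unchanged
SELECT * FROM products;                 -- 'widget' is not visible here
```

The merge-back step would have to be done externally, e.g. by replaying or diffing the branch's changes into master.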
Basically, git-sqlite already supports merges (see link above)
However I need a production ready solution.
There is also:
But the project does not seem mature enough.
Do you know of any way to achieve this that's suitable for production? What would be the best way/stack to get this result with currently available tools?
This also explains why merging is not included (it is simply not needed for this application).
I did an SQL-on-blockchain project a while ago and ran into this problem as well. I solved it by not committing the last N transactions and re-executing them for each read query inside a transaction. This is obviously not a very efficient solution (and it still runs the risk of having to re-execute the whole chain of queries if a chain split occurs more than N blocks before the current head).
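The scheme above can be sketched with stock sqlite3: keep the last N transactions as SQL text instead of committing them, and replay that tail inside a rolled-back transaction for every read. All names here are illustrative:

```python
import sqlite3

# Sketch: the committed database holds the "settled" chain state; `pending`
# holds the SQL of the last N uncommitted transactions. Each read replays
# the tail and then rolls back, so a chain split only requires dropping
# entries from `pending`.
con = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
con.execute("CREATE TABLE kv(k TEXT PRIMARY KEY, v INTEGER)")
con.execute("INSERT INTO kv VALUES ('height', 100)")     # committed state

pending = ["UPDATE kv SET v = 101 WHERE k = 'height'"]   # uncommitted tail

def read(query):
    con.execute("BEGIN")
    for stmt in pending:
        con.execute(stmt)              # re-execute the uncommitted transactions
    rows = con.execute(query).fetchall()
    con.execute("ROLLBACK")            # committed state is left untouched
    return rows

print(read("SELECT v FROM kv WHERE k = 'height'"))                    # [(101,)]
print(con.execute("SELECT v FROM kv WHERE k = 'height'").fetchall())  # [(100,)]
```

The cost is obvious: every read pays for N replayed writes, which is exactly the inefficiency the comment describes.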
In my implementation, the default is to take the records from the "other" branch, because you want 'devel' in 'master', not the other way around. Not sure I'm being clear. LMK.
Yes, LMDB has proven to be safe. It is used in many applications and even in other database implementations, including Monero.
What I have not tested is SQLite with WAL (Write-Ahead Log). In this case it may be faster than the default journal mode.
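Turning WAL on for such a comparison is a one-pragma change. A minimal sketch (WAL requires a file-backed database, not :memory:; the path is arbitrary):

```python
import os
import sqlite3
import tempfile

# Sketch: switch a file-backed database to WAL mode for benchmarking.
# The pragma returns the journal mode actually in effect.
path = os.path.join(tempfile.mkdtemp(), "wal_demo.db")
con = sqlite3.connect(path)
mode = con.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # 'wal'

con.execute("CREATE TABLE t(x INTEGER)")
con.execute("INSERT INTO t VALUES (1)")
con.commit()  # writes append to the -wal file instead of a rollback journal
```

WAL is persistent: once set, the database stays in WAL mode across connections until changed back.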
Is it possible to see a history of a column, table, schema, etc? Is it possible to tag a certain point in time?
It would be liberating for many schema designs if we could just change stuff and be sure that the database knew what was changed and when, with the ability to roll changes back.
In LiteTree the database pages are stored in LMDB instead of being written to the WAL.
LiteTree has MVCC, which comes from LMDB and SQLite's WAL: it can have many readers and one writer at the same time.
It seems like you do not rely on range queries at all.
Memory-mapped file database with zero-copy reads, just returning a pointer to the data in memory. LMDB is awesome!
Noms doesn't have the appeal of SQL, but it is versioned and forkable and strongly typed data.
Or this could be used as some elementary partitioning logic where each branch is effectively a partition.
Why are you doing it like that? Does it lead to some limitation of some sort? Like making merge very costly?
It leads to an easier implementation and more robustness, as the main data processing is done by SQLite and LMDB. LiteTree is something like a glue layer, and the code is simple, which leads to better maintainability.