Hacker News new | past | comments | ask | show | jobs | submit login
LiteTree: SQLite with Branches (github.com/aergoio)
343 points by kroggen on Aug 29, 2018 | hide | past | favorite | 75 comments



Fossil[1] is a SCM system (like git) created by the very same author of SQLite (D. Richard Hipp). It uses SQLite as its database and implements versioning and branching[2] and even merging (which LiteTree doesn't do) on its own, by recording the changes on each item on a separate table.

This approach is more complex to implement but a lot more versatile and flexible. Most of times you wouldn't want to version or branch the whole database, but only parts of it.

[1] https://www.fossil-scm.org

[2] https://www.fossil-scm.org/index.html/doc/trunk/www/branchin...


I had the same thought.

So, I was wondering why they didn't use fossil itself


> So, I was wondering why they didn't use fossil itself

Fossil is a SCM for files. LiteTree is a branching RDBMS.


Thanks for posting this. My first thought was - has this been sent through the official SQLite battery of tests? If so, have the tests been adapted to validate branches, rapid branch switches, branching under failure conditions (malloc fails, power outages, etc) and concurrent access patterns?

One of the reasons why SQLite is so widely used is that it is carefully tested and shown to be reliable even in potentially faulty conditions. As detailed on https://sqlite.org/testing.html, there are three test sets, one of which is public (the TCL set). I’d love to see test results to assure the safety of any data stored in LiteTree.


Hi! It was not tested with the official SQLite public tests yet, but this will happen in the next month. Some tests will need to be adapted (maybe a lot) so it may take a while. But stability and reliability is our focus as it will be included in the Aergo blockchain (https://www.aergo.io/).


Are you sure the SQLite tests are public?

I went looking for them a while back, but didn't find them.

It's completely possible I missed them though. :)


https://www.sqlite.org/src/dir?ci=tip

In the generated makefile there should be the tests targets, I guess. In the Makefile.in I see the targets : tcltest, quicktest, test, valgrindtest, smoketest and many others


Excellent, thank you. :)


> LiteTree is more than TWICE AS FAST than normal SQLite on Linux and MacOSX!!!

In my experience, claims like these usually end up showing that the author didn't understand the `PRAGMA synchronous` setting at all, or they chose to ignore it to juice their stats.

In this benchmarking test are the data durability guarantees the same for both LiteTree and vanilla SQLite?


I also read this sentence and my immediate thought was that I would only use three exclamation marks in the documentation for a database engine to say something like "TWICE AS MUCH test coverage!!!"

If I want a fast database, I will just write my data to /dev/null. It has well-understood data durability guarantees and is very quick to write to.


I prefer /dev/urandom or /dev/zero. They have a much higher likelihood of returning the data I wrote.


True! For maximum I/O efficiency, obviously you write to /dev/null and read from /dev/urandom.


Just how I like my CQRS!


No, no, /dev/urandom is really slow...


> In this benchmarking test are the data durability guarantees the same for both LiteTree and vanilla SQLite?

Who cares if you get more bongoliomarks.


Looks like the speed comes from using lmdb rather than sqlites own storage layer.


I first read this as IMDd for Internet Movie Database and got confused, but probably you mean LMDB for Lightning Memory-Mapped Database, which makes much more sense. :)


lmdb != Imdb

Check your fonts if those two look the same.


I wonder if they got the inspiration from https://github.com/LMDB/sqlightning


But, AFAIK, lmdb is not write optimized (as opposed to something that uses LSM trees like levelDB).


Neat. I'll have to compare this to my own implementation.

https://github.com/cannadayr/git-sqlite

Instead of storing the transactions as a separate lmdb commit, I decided to store the database in a git repository and expose the diffs using sqlite's sqldiff utility. This allowed my workflow to be almost unchanged and limits the dependencies to git, sqlite, sqldiff, & bash.


Nice! I did not know your repo. I guess they have different use cases. Yours is good to store the db inside of a git repo. I had only read about storing the dump of SQL commands on the repo and then use the git diff. But it requires dumping and rebuilding the db each time. LiteTree focuses on very big databases, in which making a file copy is not feasible on a branch creation.


Yes, I saw similar ideas ~ a year ago, & figured it was possible. It was just a matter of using sqlite's (excellent) sqldiff utility and wrapping a custom git diff driver. Took about an evening to get the proof-of-concept working and a little more fiddling for ease of use.

I don't have stress test results, but it should be similar to git. I think I remember getting it up to several hundred megabytes at one point and it was fine. I mostly use it for smaller sets of highly relational data that I want to track like I would source code.

By leveraging git & sqlite it lets me avoid writing a network sync implementation, architecture specific code, or patching any C code to recompile.


Yeah, I will use the sqldiff code internally to implement diff between 2 branches. Excellent work from Richard Hipp and its team indeed.

And thank you for your work! I may use it someday.


yup, np. shoot me an email sometime, looks like we're on a similar page.


There has been earlier work on getting git-style branched versioning on top of databases. For relational databases, OrpheusDB (http://orpheus-db.github.io/) puts a layer over PostgreSQL. They also supply a gRPC layer for interacting with the server.

For key-value systems, there are simple techniques for adding branched versioning to key-value (particularly ordered key-value) stores. We are using it for our research dataservice that holds 25+ TB of Connectomics data, which includes 3d image and segmentation data (http://dvid.io). Our paper is currently under review but should have been out several years ago :) We can use a variety of key-value storage backends and are experimenting with versioned relational DBs, so I'll definitely give LiteTree a look.


Thanks for posting this, it's useful to understand where LiteTree sits in context.


Can anyone elaborate a use case for something like this? I’m guessing there’s some blockchain connection but it’s not immediately obvious


You can use it for "what if" analysis. - Take a fork of the database - Transform data in the fork. i.e. costs+5% - Point your current queries to run on the fork

You can see how with just the effort repointing queries a reporting app could show the real world and the world you are modelling.


That’s helpful, maybe I’m naive but isn’t that like an Excel workload? I haven’t seen scenarios where the entire database needs that kind of functionality. Can you elaborate on what kind of data and business problem one would be working on where they need that level of robust versioning?


Uses cases for a versioned in branches data store that are not blockchain related.

0) The most obvious use case is to replace git, because git proper doesn't handle big files very well, let alone bigger than RAM structured data. Mind the fact you might still need an ad-hoc solution for binary data. WT does have a limitation on value size.

1) There is various data science uses cases, like sharing data [0] or A/B testing stuff.

2) Anywhere you need an audit trail .ie you need to look back at previous states, whether it is for debugging purpose or domain logic

versioned datastore is a compromise between event sourcing (pure log) and non-versioned databases.

[0] Collaborative Open Data versioning: A Pragmatic Approach Using Linked Data

[1] https://en.wikipedia.org/wiki/Audit_trail

edit: cosmit.


A real-life situation I ran into:

3) Take a copy of the data store offline, and later merge it back into the online database

Manually resolving conflicts is fine. The alternative is to rebuild a system on Couchbase/pouchdb, or rewrite on event sourcing, neither of which - when the system otherwise lends itself to RDBMS - is my choice.


Does the OP support merging? I didn’t see that.


No it doesn't, which seems to eliminate a lot of use cases. Merging is listed in the nice-to-have-in-the-future section.


Data modeling under the many worlds interpretation.


Is the function similar to PostgreSQL's deprecated "Time Travel" https://www.postgresql.org/docs/6.3/static/c0503.htm ?

AFAIK this can be a foundation for some form of Snapshot Isolation https://www.sqliteconcepts.org/SI_index.html (?)


Well, we can read the database at any point-in-time by selecting the branch and commit number. So it is similar to a snapshot. And yes, it is a type of Snapshot Isolation or MVCC (Multi Version Concurrency Control) but more than that because we can have concurrent readers on different branches.


How do you find the branch associated with a specific point in time? Didn't see that in the docs.

Also, are there guarantees that no two branches can be created at the exact same point in time?

Thanks for the excellent work! I can actually see some use cases for this in one of my side project. :-)


I guess I wrote it wrong. The access is made by specifying the branch and desired commit number (integer value). It is not done by a date-time value. Although adding metadata (like date-time) can be supported if required by the users.


If merge gets supported than it could serve as an alternative for program development -- using tables to store function definitions, constants, etc. instead of using flat files.


I think it could be pretty cool for concurrency.

Branch off, do some queries, inserts/updates and then merge back in


Could you expand on the benefits to this?

Basically, git-sqlite already supports merges (see link above)


I am looking for exactly for this kind of implementation for my work project - having a DB using version control model.

However I need a production ready solution.

There is also: https://github.com/attic-labs/noms But the project does not seem mature enough.

Do you know if there is any way to achieve this with an aim for production? What would be the best way/stack to get this result with current available tools?


Good to know your interest! LiteTree is implemented with production use in mind. It will be used in the Aergo blockchain (https://www.aergo.io/). New tests will be released in the next weeks.


I'm looking at this will little knowledge of how this makes the blockchain application easier. What seems odd to me is that merging branches isn't supported? So you can't perform a bunch of "transactions" and then merge them back into your master state. Maybe someone could illuminate the purpose this solves a little more clearly, as I'm guessing it has nothing to do with my naive understanding.


This is apparently intended to be used for a blockchain that contains SQL transactions in the blocks [1]. As there can be multiple 'heads' of a blockchain (before the network settles on a longest chain), there is a chance the head you picked as being the currently 'longest chain' eventually will not be on the longest chain at all. Hence you'd need the ability to go back to a point in time and execute the queries for another head, should it become the longest one.

This also explains why merging is not included (it is simply not needed for this application).

I did an SQL-on-blockchain a while ago [2][3] and ran into this problem as well. I solved it by not committing the last N transactions, and re-executing them for each read query in a transaction. This obviously is not a very efficient solution (and still runs the risk of having to re-execute the whole chain of queries in case a chain split occurs further than N blocks before the current head).

[1] https://github.com/aergoio/aergo [2] https://catena.one [3] https://github.com/pixelspark/catena


Bitcoin-like block chains don’t merge.


We would need some rules for how to resolve collisions. If I change a record in two different ways, which one wins?


That is up to the upper layers to decide which record wins.

In my implementation, the default is to take the records from the "other" branch because you want 'devel' in 'master' not the other way around. Not sure I am clear. LMK.


Interesting project. How do you achieve theses performances ?


I guess that LMDB is the root of the outstanding peformance. Check it out, it is an innovative key/value database that uses a memory-mapped db file, the OS cache, zero-copy on reads and copy-on-write.


Are you really comparing the same thing, then? Do you have the same consistency guarantees if the system shuts down unexpectedly?


When I saw the results I did not believe, so I reviewed the benchmark code. It is basically the same SQL commands being run in normal SQLite and with the branching enabled.

Yes, LMDB is proved to be safe. It is used in many apps and even other DB implementations, including Monero.

What I have not tested is SQLite with WAL (Write-Ahead Log). In this case it may be faster than the default journal mode.


Yes, it would be nice to see results with WAL, as it's significantly faster in most cases.


Maybe libmdbx would be interesting too.

https://github.com/leo-yuriev/libmdbx


According to Wikipedia[1], SQLightning -- a port of Sqlite using LMDB -- was 20x faster than original sqllight. It could thus be interesting to compare LiteTree with SQLightning.

[1] https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Databa...


Indeed. But the last version of SQLightning is using SQLite 3.7. It would be useful to have an updated version. I was going to update it but then I decided to implement LiteTree in another way, using pages instead of rows. This is because our customers at Blocko/Aergo prefer higher performance over small storage.


Very interesting stuff!

Is it possible to see a history of a column, table, schema, etc? Is it possible to tag a certain point in time?

It would be liberating for many schema designs that we could just change stuff and be sure that the database knew what was changed and when with the ability to roll changes back.


Looking at the README it's not clear how indexes are managed. Like when we create a branch and add some data to an existing table and move back to a previous branch and try to add data with the same index keys ?


If you create an index on a branch it will only be visible to the child branches starting from the point where the index was created. I encourage you to clone and test it. It is easy to compile, and it's fun! I will make a video in the next weeks.


Interesting, I implemented something similar a long time ago, have to see if I can dig up the source code. The goal was to support forking data without duplicating unchanged data.


Yeah, this is also used in virtual machines disk images (I guess)


This looks great. Thank you for creating it and sharing it


Why did you choose LMDB among leveldb, wiredtiger and bsddb or even gdbm?

It seems like you do not rely on range queries at all.


Because of the performance. Have you seen the results?

Memory-mapped file database with zero-copy on reads, just returning the pointer to the data on memory. LMDB is awesome!


Interesting. The branches could solve the "date-effective" table designs. In the past I had used Git as a database to store multiple versions of a document efficiently.

Or this could be used as some elementary partitioning logic where each branch is effectively a partition.


The use case seems to overlap with noms dB - https://github.com/attic-labs/noms

Noms doesn't have the appeal of SQL, but it is versioned and forkable and strongly typed data.


A few other and I are attempting to use noms specifically as a versioned database. I haven't seen anything else that can do everything that noms does. Its only problem is that it's really a database engine, not a full fledged database... so some development work is needed to get it usable within a project.


This is interesting and I hope I can find a use case for it. However, the performance compared to vanilla SQLite makes me anxious that there is a trade-off elsewhere, such as crash integrity.


LMDB is crash-proof and its speed is unbelievable! I hope LiteTree will be too. The trade-off is storage space as all the history is recorded. But is uses way less space than copying the entire db on each branch creation.


> LiteTree is implemented storing the SQLite db pages on LMDB.

Why are you doing it like that? Does it lead to some limitation of some sort? Like making merge very costly?


It is way easier to store the pages in a key/value db than storing them in separate files. And I don't need to implement concurrency access control as it is dealt by LMDB.

It leads to an easier implementation and robustness, as the main data processing is done by SQLite and LMDB. LiteTree is something like a glue, and the code is simple. What leads to better maintainability.


great stuff :) this is innovation :)


Cool project thanks for sharing your work. There's an older project using lmdb (which doesn't support branching or anything, just for storage)...is litetree's usage of lmdb comparable to what sqlightning does? How does litetree work with the write-ahead log? How do multiple concurrent connections interact? Are multiple writers allowed? Can readers and writer(s) coexist?


In SQLightning the database rows are stored on LMDB instead of writing them to SQLite's B-tree.

In LiteTree the database pages are stored on LMDB instead of writing them to the WAL.

LiteTree has MVCC, that comes from LMDB and SQLite's WAL. It can have many readers and one writer at the same time.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: