
Show HN: VersionDB – A key, value store inspired by Git - josephsweeney
https://github.com/josephsweeney/versionDB
======
josephsweeney
Creator here. This project is still at an early stage, but I'd be happy to
answer any questions.

------
iovoid
Why choose SHA1 and not something that is collision-resistant like SHA256 or
SHA3?

~~~
josephsweeney
Mainly because SHA1 was convenient, but also Git uses SHA1. See this Linus
rant:

https://marc.info/?l=git&m=148787047422954

Most of that argument applies, but if it ever becomes a problem, we should be
able to move to something like SHA256 fairly easily.

~~~
iovoid
The Git creators refuse to migrate because they selected SHA1 at the start, and
backwards compatibility makes it hard to just change it. Also, with Git it's
harder to get a maintainer to push your binary blob. In a database, it's more
probable that a user includes malicious data. The hash used is not so easy to
change unless you are willing to make the change backwards-incompatible (break
existing DBs).

~~~
josephsweeney
You're definitely correct. This project is still in its early stages and no one
is really using it yet, so it's easy in the sense that I just have to change
the hashing algorithm. No need to worry about backwards compatibility.
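
To illustrate why the swap is a one-line change in code but still breaks
existing databases, here is a minimal sketch in Python (the project itself is
written in C; `object_name` is a hypothetical helper, not a function from
versionDB). Every object's filename is derived from the hash, so changing the
algorithm changes every filename:

```python
import hashlib


def object_name(data: bytes, algorithm: str = "sha1") -> str:
    """Return the hex digest that would be used as the object's filename.

    `algorithm` is any name accepted by hashlib.new, e.g. "sha1" or "sha256".
    """
    return hashlib.new(algorithm, data).hexdigest()
```

With this sketch, `object_name(b"hello")` is 40 hex characters long and
`object_name(b"hello", "sha256")` is 64, so objects written under one
algorithm cannot be found by name under the other.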

------
aennyta
This sounds very interesting. Which database does it use: MongoDB, MySQL, ...?
It is not very clear.

~~~
josephsweeney
It actually doesn't use another database. Just uses plain old files for
storage.

It takes after Git where it stores each piece of data in a file with the name
of the file as the hash of the data.
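
A rough sketch of that idea in Python (the real project is in C; the function
names and the flat directory layout here are assumptions for illustration
only):

```python
import hashlib
import os


def put_object(objects_dir: str, data: bytes) -> str:
    """Store data in a file named after its SHA-1 hex digest; return the digest."""
    digest = hashlib.sha1(data).hexdigest()
    os.makedirs(objects_dir, exist_ok=True)
    path = os.path.join(objects_dir, digest)
    # Identical data always hashes to the same name, so it is stored only once.
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(data)
    return digest


def get_object(objects_dir: str, digest: str) -> bytes:
    """Read back the data stored under the given digest."""
    with open(os.path.join(objects_dir, digest), "rb") as f:
        return f.read()
```

Storing the same bytes twice produces a single file, since the name depends
only on the contents.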

~~~
aennyta
So does it export the database to a file and then version-control that file?

~~~
josephsweeney
There actually isn't any database outside of a directory of files. The version
control is done the same way that Git works under the hood but written from
scratch in C.

Essentially, we have a database directory with two sub-directories, refs and
objects. In refs we have a file for the id of each piece of data stored. The
id file contains the hash of the latest commit for this id. A commit is just
another file that contains a time, a hash of the previous commit, and the hash
of the data.

The objects directory stores all the data and the commits. Each entry's
filename is the hash of its contents.

So all we're doing is making a linked list per id, where each commit points to
one version of that data and to the previous commit. No external database or
version control needed.

