
Git as a NoSQL datastore - joshfraser
https://github.com/pauldowman/gitmodel
======
trjordan
I want ACID in my databases:

    - Atomic -

Yep, but done with a global lock.

    - Consistency -

Yep. Since it's NoSQL, there are no referential checks, so this is easy. Git
itself probably needs the index / commit tree to be readable, and all
attributes.json files to be valid. I don't see any reason they wouldn't be.

    - Isolated -

Sure, works fine, due to that global lock.

    - Durable -

Check. Once it's written, it's written.

So, git doesn't get you any of this for free, and getting Atomic / Isolated is
done by introducing a big ugly global lock. I guess I don't see the appeal of
using git (or even a model like git) because it doesn't get you any
interesting necessary features for free. I see the backup and history thing as
a secondary feature -- neat, but not worth sacrificing ACID for.

That said, cool idea -- it doesn't take a lot of code, so you end up with a
reasonably small codebase.
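
For anyone wondering what "Atomic / Isolated via a global lock" looks like in practice, here is a minimal sketch. It is not gitmodel's code: the class name `LockedJsonStore`, the single-JSON-file layout, and the use of `threading.Lock` are all assumptions made for illustration. The lock only serializes writers inside one process; a real multi-process store would need a file lock or similar.

```python
import json
import os
import tempfile
import threading


class LockedJsonStore:
    """Toy document store (hypothetical): one JSON file, one global lock."""

    def __init__(self, path):
        self.path = path
        self._lock = threading.Lock()  # the "big ugly global lock"
        if not os.path.exists(path):
            self._write({})

    def read(self):
        with open(self.path, "r", encoding="utf-8") as f:
            return json.load(f)

    def _write(self, data):
        # Write a temp file, fsync it, then rename it over the old file,
        # so a reader sees either the old state or the new one.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def transaction(self, update):
        """Run update(data) under the global lock; all-or-nothing."""
        with self._lock:
            data = self.read()
            update(data)       # if this raises, nothing is written
            self._write(data)
```

Every writer queues behind the same lock, which is exactly why this gives you isolation cheaply and scales badly under write contention.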

~~~
po
Maybe you should just think of it as the NoSQL version of sqlite3 but with
added version control.

~~~
jemfinch
What does that even _mean_?

~~~
JoachimSchipper
I think "easy to set up" combined with "less crappy than expected". Although
SQLite isn't very crappy at all...

------
JeffJenkins
I wrote code to do this in Python for a project. The biggest problem I ran
into was that if you wanted the full history of X/Y/Z.json to contain the full
history of Z.json, even if it had been moved, it ended up requiring two
parallel structures. One structure with directories for the tree data, one for
the raw data of that node.
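
A rough in-memory sketch of that two-structure idea, to make it concrete. This is not the original code and does not touch Git at all; `MovableHistoryStore` and its method names are made up for illustration. One dict plays the role of the directory tree (path to node id), the other holds the raw per-node history, so a move only touches the tree.

```python
import itertools


class MovableHistoryStore:
    """Hypothetical sketch: path tree and per-node history kept separately."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.tree = {}      # path -> node id (the directory structure)
        self.history = {}   # node id -> list of versions (the raw data)

    def create(self, path, content):
        if path in self.tree:
            raise KeyError(f"{path} already exists")
        node_id = next(self._ids)
        self.tree[path] = node_id
        self.history[node_id] = [content]
        return node_id

    def update(self, path, content):
        # New versions are appended to the node, not to the path.
        self.history[self.tree[path]].append(content)

    def move(self, old_path, new_path):
        if new_path in self.tree:
            raise KeyError(f"{new_path} already exists")
        # Only the tree changes; the node id and its history are untouched.
        self.tree[new_path] = self.tree.pop(old_path)

    def full_history(self, path):
        return list(self.history[self.tree[path]])


store = MovableHistoryStore()
store.create("Z.json", "v1")
store.update("Z.json", "v2")
store.move("Z.json", "X/Y/Z.json")
store.update("X/Y/Z.json", "v3")
print(store.full_history("X/Y/Z.json"))
```

Because history hangs off the node id rather than the path, the history of X/Y/Z.json includes everything written while the node still lived at Z.json.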

My intent was to use it in a document system using Operational Transforms,
which side-stepped the issue of concurrent access; only the canonical
representation of client data would need to be written, so it was serial
writing.

------
derefr
Huh, I was half-way to creating something in the same vein, while trying to
use Git as the "world-state synchronization protocol" for a distributed MOO.
Back to the (much-simplified) drawing-board :)

------
Vitaly
I don't think it's intended to be used for frequently changed models like
'user'. It is much more suited to low-change-frequency, document-like models.
For example, a 'content' model in a blog or CMS.

Having history might not be a big deal, but having branches IS!

It is great for stuff like testing some new set of pages on a staging server
and then 'adding' them to the currently running production server which kept
changing while you were working on the stage. Try that with SQL!
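
To illustrate the staging-into-production case, here is a minimal three-way merge over whole documents. It is only a sketch under assumptions: the function name `three_way_merge` and the dict-of-pages representation are invented for the example, and it works at whole-document granularity, whereas Git itself can also merge line-level changes inside a file.

```python
_MISSING = object()  # sentinel meaning "page does not exist on this side"


def three_way_merge(base, production, staging):
    """Merge two sets of pages that both diverged from a common base.

    Each argument maps page name -> content. Returns (merged, conflicts).
    On a conflict, production's version is kept until someone resolves it.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(production) | set(staging)):
        b = base.get(key, _MISSING)
        p = production.get(key, _MISSING)
        s = staging.get(key, _MISSING)
        if p == s:
            winner = p          # both sides agree (or both deleted it)
        elif p == b:
            winner = s          # only staging changed this page
        elif s == b:
            winner = p          # only production changed this page
        else:
            conflicts.append(key)
            winner = p
        if winner is not _MISSING:
            merged[key] = winner
    return merged, conflicts


base = {"about": "v1", "home": "v1"}
production = {"about": "v1", "home": "v2"}                 # edited live
staging = {"about": "v1", "home": "v1", "pricing": "new"}  # new page added
merged, conflicts = three_way_merge(base, production, staging)
print(merged, conflicts)
```

The common base is what makes this work: without it you cannot tell "staging added a page" from "production deleted it".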

~~~
pauldowman
Yes, it would be a lot more appropriate for things that don't change that
frequently. I actually started writing a blogging app for coders, and then
succumbed to a bout of extraction distraction by generalizing the part that
read the pages/posts from a Git repo: <https://github.com/pauldowman/balisong>

------
po
Also of interest and in a similar sort of spirit:

Bup: Git based backups - <https://github.com/apenwarr/bup>

~~~
pygy_
In the same vein: Gibak.

<http://eigenclass.org/hiki/gibak-backup-system-introduction>

<http://eigenclass.org/hiki/gibak-0.3.0>

<https://github.com/pangloss/gibak> for an improved fork (last commit in 2008,
but still...)

------
yatsyk
I also thought about a similar project. A Git-based active model has some
drawbacks and is certainly not for every project. It's not so good if there
are many users making a lot of changes to data (like Facebook). But it's very
interesting for applications where a few users make changes on the site and
the site looks static to the rest of the visitors (applications like any CMS).
And we get a lot of cool Git features for free.

------
silentbicycle
Well, sure, git provides an append/log-based distributed hash store. It's
probably not what you're looking for, though: It doesn't have a library for
efficient in-process access, so you need to spawn a git shell command per
operation, with somewhat opinionated semantics. It's also GPL'd.

It's good for prototyping, though. (And I hear it's good at managing changes
for source code!)
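
As a small illustration of the "hash store" part: in a SHA-1 repository, Git names a blob by hashing a short header (`blob <size>` plus a NUL byte) followed by the content. The sketch below reproduces just that addressing scheme in memory. The `HashStore` class is hypothetical, and real Git additionally zlib-compresses objects on disk and has tree and commit objects, none of which is modelled here.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object id Git assigns to a blob with this content (SHA-1 repos)."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class HashStore:
    """Hypothetical in-memory content-addressed store keyed like Git blobs."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        oid = git_blob_id(content)
        # Identical content always maps to the same id, so writes only
        # ever add objects; nothing is overwritten in place.
        self._objects.setdefault(oid, content)
        return oid

    def get(self, oid: str) -> bytes:
        return self._objects[oid]

    def __len__(self):
        return len(self._objects)
```

Because the key is derived from the content, two machines that store the same document independently end up with the same id, which is what makes syncing such a store between hosts cheap.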

~~~
WALoeIII
"It doesn't have a library for efficient in-process access"

<https://github.com/mojombo/grit>

~~~
silentbicycle
Ah, via Ruby. _RUBY?!??!_ Ah. Delightful.

 _YECCH_

So, uh, can't we just make an efficient, well-designed, C library for
accessing those data structures, write BSD-licensed wrappers as consenting
adults, and get on with our lives? I'm speechless.

Is writing an append-only log data-structure system really that hard? Ok,
then.

~~~
aditya
What's stopping you?

~~~
silentbicycle
Git itself works fine, for my purposes. If _I_ wrote a C library for managing
git's data structures, it would basically be out of spite. I may make a Lua
wrapper for libgit2, though - it's more what I had in mind (though it's too
bad it's GPL'd).

I have a library for managing distributed graphs on top of an append-only log
file, but it's got different design trade-offs than git does - It's for a
specific C/Lua project, and the semantics of the data don't match git's.

~~~
asb
libgit2 is GPLed, but importantly also has a libgcc-style exception.

------
Sirupsen
This could be interesting for projects where keeping revisions would be handy,
for instance a note-taking or document platform.

~~~
rlpb
Yes - how about Tomboy/gnote integration?

------
epynonymous
have you done any performance tests? how well does git scale in general when
you have billions of files and versions?

this gives me some ideas, good stuff.

~~~
zoomzoom
i have seen huge deployments of git run into problems; that is why git is
less than perfect as a system-wide backup tool.

------
uriel
On a somewhat related note, see also Venti:
<http://doc.cat-v.org/plan_9/4th_edition/papers/venti/>

