

Is Git more than a version control system? Reimplementing CouchDB with Git+Bash... - ivanstojic
http://www.ordecon.com/2009/04/22/is-git-more-than-just-a-version-control-system/

======
davidmathers
I've been waiting for this. Ever since:

1\. I learned from Linus that git is a decentralized "content" database that
gives identity to different versions of the same content and allows them to be
compared and merged. (as opposed to a traditional VCS which is more of a
"delta" database)

2\. I learned from Damien that CouchDB is a decentralized "document" database
that gives identity to different versions of the same document and allows them
to be compared and merged.

I've been wondering just what the difference really is.
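
To make the "content database" point concrete, here is a minimal sketch of how Git derives a blob's identity purely from its content. It assumes a repository using the default SHA-1 object format; the function name `git_blob_id` is made up for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the id Git gives a blob: SHA-1 over a small header plus the content.

    Assumes the default SHA-1 object format (not a SHA-256 repository).
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two versions of the "same" document get two distinct identities,
# while identical content always maps to the same id.
v1 = git_blob_id(b"hello\n")
v2 = git_blob_id(b"hello, world\n")
print(v1)
print(v2)
```

The ids should match what `git hash-object` prints for the same bytes, which is why the same content gets the same name in every clone.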

~~~
iamwil
Git retains its change history, and if I remember correctly, uses the sha hash
to verify that no one changed the history. Couchdb retains the change history
until someone compacts the db (garbage collection). Its revision history is
only used as a means of merging out of sync documents.

In addition, Git makes individual merges in a document, and when there's a
conflict, it resorts to human intervention. Couchdb does not make merges
inside a document, and will not do conflict resolution. Instead, using a brief
set of rules, it picks one version of the document over another.
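
A rough sketch of what such a "brief set of rules" can look like. This is my simplified reading of CouchDB's deterministic pick (live revisions beat deleted ones, then the longer revision history wins, then the higher revision id in sort order), not its actual code; `Revision` and `pick_winner` are hypothetical names.

```python
from typing import NamedTuple


class Revision(NamedTuple):
    rev: str               # "<generation>-<hash>", e.g. "2-abc"
    deleted: bool = False


def pick_winner(conflicts):
    """Pick one revision deterministically; no merging inside the document."""
    def rank(r):
        generation, _, digest = r.rev.partition("-")
        # Tuples compare left to right: live first, then generation, then hash.
        return (not r.deleted, int(generation), digest)
    return max(conflicts, key=rank)


winner = pick_winner([Revision("2-aaa"), Revision("2-bbb")])
print(winner.rev)  # 2-bbb
```

Because every node applies the same ranking to the same set of revisions, they all pick the same winner without talking to each other. The losing revisions stay around as conflicts until someone resolves them.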

~~~
peregrine
Thanks for that, but couldn't you theoretically very easily script git to
automatically choose based on prior requirements?

~~~
ivanstojic
Of course you could. I guess one of the more important points that are made
somewhat subtly by this hack is that you can easily fiddle with the
functionality of a simple yet flexible piece of software (git) - that's what
the Unix philosophy preaches. You start by taking lots of simple components
that are flexible and make no assumptions about their use, and glue them
together into bigger pieces.

Large homogeneous systems that only provide one functionality are hard if not
impossible to extend to make them do just what you want. Unless you want what
they do in the first place, which makes the whole customization thing
pointless ;-)

But that is self evident, or at least should be.

------
ivanstojic
I hope nobody minds this butchery, but I really wanted to finally learn some
more about Git and refresh my Bash skills.

It's true that the road to hell is paved with good intentions :-)

~~~
vdm
Au contraire; congratulations on a great hack.

I think a lot of us have had this one cooking in our minds for a while and it's
great to see that somebody got it Done.

Bravo.

~~~
ivanstojic
Much obliged for your comment. It's comments like this one, and others I have
received that drive me to more hacking :-)

------
ams6110
Not really a new idea to use a VCS as a document database, other folks have
back-ended Wikis and document management systems with SVN, for example. But a
nice story regardless, and another nudge for me to really take a closer look
at git.

First distributed VCS I heard of was darcs. I have not used it either, but
from what I've read about the two, git has some real advantages in speed and
robustness.

~~~
igorgue
Darcs is just different. Also, when this Git buzz started, the Darcs people were
wondering why the Git folks didn't just contribute to Darcs, since their ideas are
almost the same (except for some things like the merge algorithm)... they just
realized that some people just don't program in Haskell :(

~~~
mbrubeck
Git and Darcs have very different underlying models. Most importantly, Git is
content-centric (diffs are generated only as byproducts) while Darcs is patch-
centric (any given "version" is just a sum of all the patches that produce
it). Linus, for one, feels strongly that this is an important distinction.
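
A toy sketch of the two models, just to show the shape of the distinction. It is nowhere near how either tool actually stores data, and the names (`snapshots`, `patches`, `apply_patches`) are invented for the example.

```python
import difflib

# Content-centric: every version is stored whole; a diff is computed on demand.
snapshots = {
    "v1": ["alpha\n", "beta\n"],
    "v2": ["alpha\n", "beta\n", "gamma\n"],
}


def diff_versions(old, new):
    """Derive a diff as a byproduct of two stored snapshots."""
    return list(
        difflib.unified_diff(snapshots[old], snapshots[new], fromfile=old, tofile=new)
    )


# Patch-centric: only patches are stored; a "version" is whatever applying them yields.
patches = [
    ("insert", 0, "alpha\n"),
    ("insert", 1, "beta\n"),
    ("insert", 2, "gamma\n"),
]


def apply_patches(patch_list):
    """Build a version by summing patches, starting from an empty file."""
    lines = []
    for op, index, text in patch_list:
        if op == "insert":
            lines.insert(index, text)
        elif op == "delete":
            del lines[index]
    return lines


print("".join(diff_versions("v1", "v2")))
print(apply_patches(patches))
```

In the first half the versions are the primary data and the diff is derived; in the second half the patches are the primary data and the version is derived.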

------
Readmore
I've built a Git datastore that works in much the same way. I constructed it
to interface with Rails in a manner as close to ActiveRecord as I could. I
actually ended up basing it on the CouchRest module for interfacing Rails and
CouchDB, so there are a lot of similarities.

As soon as I find some time I'll do a full write up for anyone that's
interested.

------
spooneybarger
Funny that this should appear this morning.

Last night I was thinking about how much I hate existing file managers. How
the metaphor was great when you had 2 meg hard drives and a few folders and
maybe a couple hundred files tops, but how now it falls apart. There was a
project called lifestreams at Yale
(<http://cs-www.cs.yale.edu/homes/freeman/lifestreams.html>) that had some interesting
ideas about allowing you to see your documents as a versioned timeline. Using
a DVCS-type system as a backend would get you a lot of that plus more, you
could explore different ideas for files on different branches. I do however
want more...

I want usable metadata like the BeFS had. Where any arbitrary metadata
key/value pairs could be attached to a 'file'. For example, contacts in the
BeFS were basically empty files with metadata attached. That metadata included
name, address etc. Any application could use the data, augment it etc. The
file manager ( tracker ) could query the data to create live 'searches' that
looked just like folders. You could add new types and what not. Very powerful
when the filesystem is a database. BeOS ( and now haiku ) kept the traditional
file manager around as well. Others have approached this idea, but haven't
really gone after it.

What I thought of that I really wanted to see was something like a document-
centric database like couchdb that is backed with DVCS-like features that
operates just like a regular filesystem, i mount it, i can drag and drop
content into it, save into it, make it look like a 'normal' fs to existing
applications, but all that path info etc is just saved searches on metadata
that group information together and you can tag those searches with metadata
themselves.
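
Something like this is what I have in mind for "folders as saved searches", as a very rough sketch. Everything here (the `files` list, the attribute names, `saved_search`) is made up for illustration and has nothing to do with the real BeFS API.

```python
# Hypothetical in-memory "filesystem": each file is just a bag of attributes.
files = [
    {"name": "alice", "type": "contact", "city": "Oslo"},
    {"name": "bob", "type": "contact", "city": "Zagreb"},
    {"name": "notes.txt", "type": "text"},
]


def saved_search(**criteria):
    """Return a query that is re-evaluated against the store on every call."""
    def run(store):
        return [
            f["name"]
            for f in store
            if all(f.get(key) == value for key, value in criteria.items())
        ]
    return run


# A "folder" is just a stored query over metadata.
contacts = saved_search(type="contact")
print(contacts(files))  # ['alice', 'bob']

# New files show up in the "folder" as soon as their metadata matches.
files.append({"name": "carol", "type": "contact", "city": "Oslo"})
print(contacts(files))  # ['alice', 'bob', 'carol']
```

The point is that the "folder" never stores a list of files, so nothing has to be moved or copied when metadata changes.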

~~~
spooneybarger
an additional thought... then filesystem backups just become a push/pull
scenario between different instances. man, i really like this general idea.

------
euroclydon
This has been on my mind for a while: Does anyone know if DropBox is developed
on top of an existing source control system?

~~~
troystribling
Running git on top of file-based storage would also be an interesting project.
There is a proprietary commercial implementation of a similar idea from
Caringo called CFS <http://www.caringo.com/products_cfs.html>. An opensource
version based on git would be nice.

------
vdm
Can Git be made to handle binary files as nicely as
[Dropbox](<http://getdropbox.com/>) does?

On [one](<https://www.getdropbox.com/tour#3>) of the pages from their tour, it
shows how if you save a 10 MB PSD to Dropbox twice, it shows you a table with
a row for the original file and another row for '400 extra bytes' or whatever.

> Dropbox is also smart with how it tracks changes to files. Every time you
> make a change, Dropbox only transfers the piece of the file that changed
> (also known as block-level or delta sync), making it easy to work with big
> files like Photoshop or Powerpoint documents.

~~~
silentbicycle
First off, the "block-level" sync is _exactly_ what rsync is for. You'll
probably have better results using that. (<http://www.samba.org/rsync/>) You
could track a list of local file paths in git, but sync the binaries
themselves with rsync. (They complement each other well.)

Tracking changes in binary files (which cannot be merged in any reasonably
generic fashion) is a fundamentally different issue than tracking changes in
text, particularly source code. Git is designed to do the latter. While you
_can_ use it to track changes in binaries, merging doesn't make sense anymore,
and hashing / scanning big binary files for changes is significantly slower.
(A bunch of images generally won't matter, but I wouldn't use it to track,
say, video, or large database dumps.)
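
For the curious, a deliberately simplified sketch of the block-level idea: hash fixed-size blocks and only resend the ones whose hash changed. Real rsync is smarter (it uses a rolling checksum so that inserted bytes don't shift every later block), and I have no idea what Dropbox actually does; `BLOCK_SIZE` and the function names are assumptions for the example.

```python
import hashlib

BLOCK_SIZE = 4  # absurdly small so the demo is readable; real tools use far larger blocks


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE):
    """Hash each fixed-size block of the data."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE):
    """Return indexes of blocks in `new` that differ from, or don't exist in, `old`."""
    old_h = block_hashes(old, block_size)
    new_h = block_hashes(new, block_size)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or old_h[i] != h]


# Only the middle block was modified, so only that block needs to be transferred.
print(changed_blocks(b"aaaabbbbcccc", b"aaaaBBBBcccc"))  # [1]
```

Note that this only saves bandwidth; it does nothing to make two divergent binaries mergeable.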

------
troels
Quite interesting. On a related theme, a lot of people have begun using VCSs
(especially git, but also sometimes svn) as the data-storage backend for
their applications, where they would have used an RDBMS a few years ago. E.g.
[Jekyll](<http://github.com/mojombo/jekyll/tree/master>) etc.

~~~
ivanstojic
Git is really nifty. What I like about it is that, when you get down to it,
it's amazingly fast. There are many reported cases of people holding their
home directories as Git repos on a day-to-day basis.

It's just amazing what a good VCS can let you do.

~~~
durin42
People store their home dir with hg too (I do, many coworkers do), and before
the DVCS era people used svn and cvs. This is far from a git-unique thing.

~~~
silentbicycle
Indeed. I've used svn, monotone, hg, and git for this (moving from each to the
next over the last three years or so), and they all worked fine. In my
experience, the special features of each have relatively little significance
for typical operations on home directories.

It's worth picking any and just doing it, though.

~~~
icefox
I have often heard of people putting their /etc in cvs/svn, but I never
bothered due to having to setup a repo. That is until I had git at which point
it became just a git init.

~~~
silentbicycle
Right. Monotone (an earlier DVCS, inspiration for git), while a really nice
system technically, expected you to set up keys for authorizing changes to the
repository before you could use it* . It was a little thing, but reducing
setup to just "git init" (or "hg init") makes for less friction to putting
stuff in VC _just because_...and then you find out it's good for things you
would have never anticipated.

* One style difference between the two. See Graydon Hoare's comment here ([http://www.mail-archive.com/monotone-devel@nongnu.org/msg080...](http://www.mail-archive.com/monotone-devel@nongnu.org/msg08012.html)).

------
silentbicycle
So, um, has anybody tried using this?

Don't have time to set it up at the moment, but I'd been wondering exactly the
same thing.

------
middayc
Don't have time to read it now, but it looks like very interesting thinking, so I will
read it later.

I am using bazaar to do daily backups for a webapp that stores serialized
data in files. I was thinking _a little_ about syncing it between servers with
bazaar if needed but you seem to go a few steps further. :)

~~~
ivanstojic
There are some things that I find extremely enticing about Git. First off,
there's the speed which I already mentioned. The second is how it's designed
to be decentralized - and my amazement at this probably stems from the fact
that this is the first decentralized VCS that I've ever worked with.

I've been using rsync to keep certain directories of my various computers
synchronized, but I think I'll go one step further soon enough: for my Linux
computers, I'll keep some parts of my home directory in Git in order to
maintain an identical user interface / configuration on those machines.

------
AndrewO
Thanks for reminding me how paltry my Bash skills are. Yeah, thanks a lot...
:)

------
ivanstojic
You bastards killed my webhost's MySql instance :-)

