

Time and Space Tradeoffs in Version Control Storage - Rexxar
http://www.ericsink.com/entries/time_space_tradeoffs.html

======
ars
He missed some:

First try a zlib of the entire thing, not each version, since it will
effectively find all the common lines and not repeat them.
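
A rough way to check this idea, sketched in Python. The revision history
below is made up (the article's test data is not linked), so treat the numbers
as illustrative only. One caveat: deflate looks back at most 32 KB, so zlib
over the whole history only finds a repeated line if the previous copy is
within that window.

```python
import zlib

def make_versions(n=50, lines=40):
    """Hypothetical history: each revision edits one line of the previous one."""
    current = [f"line {i}: some fairly ordinary source text\n" for i in range(lines)]
    versions = []
    for rev in range(n):
        current = list(current)
        current[rev % lines] = f"line {rev % lines}: edited in revision {rev}\n"
        versions.append("".join(current).encode())
    return versions

versions = make_versions()

# Compress every version on its own and add up the sizes.
per_version = sum(len(zlib.compress(v)) for v in versions)

# Compress the whole history as one stream.
history = b"".join(versions)
whole = len(zlib.compress(history))

print(f"raw: {len(history)}  per-version zlib: {per_version}  whole-history zlib: {whole}")
```

The catch is that reading any single version then means decompressing the
whole stream up to that point.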

Next, instead of pure keyframes, let the keyframes themselves be chained. That
should save a ton of space, without being really slow. (And then recurse if
you like: every ten keyframes, add a supra-keyframe, and so on, all the way
down.)
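
One possible reading of this, as a sketch: version 0 is the only full copy,
every 10th version is a keyframe stored as a delta against the previous
keyframe, every 100th is a supra-keyframe stored as a delta against the
previous supra-keyframe, and so on. Under that assumption the number of deltas
applied to rebuild a version is the digit sum of its number in base 10. The
function name and the interval are hypothetical.

```python
def chain_hops(version, interval=10):
    """Delta applications needed to rebuild `version` when every level of
    keyframes is itself a delta chain (assumed layout, see above)."""
    hops = 0
    while version > 0:
        version, digit = divmod(version, interval)
        hops += digit  # `digit` steps along the chain at this level
    return hops

for v in (7, 42, 499, 500):
    print(v, "plain chain:", v, "chained keyframes:", chain_hops(v))
```

For version 499 that is 4 + 9 + 9 = 22 delta applications instead of 499.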

~~~
jrockway
Yeah, the zlib of the entire thing would be interesting. Also interesting
would be gzip(original + set of deltas). I would also like to see how his data
did with a git packfile, and when stored in an svn repository. (This would
answer an important question: Are the theoretical ideas better than what is
currently used in practice?)

------
jrockway
I am confused as to why the delta-without-keyframes takes too long to measure.
At most, you have to do 500 disk reads (which is more than 1, but still not
very slow), and then do 500 delta applications, which should also not be
particularly slow.
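
A quick sketch of what I mean, in Python. The delta format (copy/insert ops
built from difflib) and the 500-revision history are my own stand-ins, not
whatever the article used, so this only shows the shape of the cost: one pass
over the chain, one delta application per revision, all in memory with no
disk reads.

```python
import difflib
import time

def make_delta(old, new):
    """Encode `new` as copy/insert operations against `old` (lists of lines)."""
    ops = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
        # "delete" needs no op: the old lines are simply not copied
    return ops

def apply_delta(old, ops):
    """Rebuild the newer version from the older one plus its delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out

# Hypothetical history: 500 revisions, each changing one line of a 200-line file.
versions = [[f"line {i}\n" for i in range(200)]]
for rev in range(1, 501):
    nxt = list(versions[-1])
    nxt[rev % 200] = f"line {rev % 200} changed in rev {rev}\n"
    versions.append(nxt)

deltas = [make_delta(a, b) for a, b in zip(versions, versions[1:])]

# Walk the whole chain from version 0 to the newest version.
start = time.perf_counter()
current = versions[0]
for delta in deltas:
    current = apply_delta(current, delta)
elapsed = time.perf_counter() - start

print(f"applied {len(deltas)} deltas in {elapsed:.4f}s")
```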

I would like to see his code and test data, but I couldn't find a link.

