Author here. I discovered this while working on a clean-room implementation of Git in pure Go. While there are a lot of references to packfiles online, surprisingly, the actual format of packfiles was rather underdocumented. Most resources just mention that they exist, and describe how to use `git verify-pack` to inspect a packfile, without explaining how to parse packfiles and apply deltas.
I decided to write this up to save others the trouble of having to reverse-engineer it from scratch!
Yes, I link to a different version of that same file in the article (my link points to the version hosted on kernel.org, rather than GitHub). It provides a bit of high-level context, but by itself it doesn't provide enough detail to actually reimplement the corresponding Git functions.
Aside from being more terse and (IMHO) more difficult to read than prose with examples and non-ASCII diagrams, that file doesn't explain the context and motivation for packfiles, and it doesn't cover the parsing and application of deltas at all.
If you found that piece of documentation deficient while implementing a packfile parser, then it would be nice to update it to include those details that were lacking to help the next person to reimplement git.
- A chance to learn about the really dark, thorny corners of Git
And for what it's worth, source control is Git's intended use case, but people do use it for other purposes as well (like managing personal media collections across multiple devices[0]). Git has become a protocol or a platform in addition to a VCS[1].
But there aren't very many FOSS clean-room implementations of Git, at least not this far down the chain (packfiles). One of the best ways to discover hidden implementation issues or oversights in a spec or existing documentation is to try and reimplement it, which has the effect of strengthening the platform itself in the long run.
[1] Bitcoin is a bit of a hot-button topic, but Git is similar to Bitcoin in this regard: the tool itself is intended for financial transactions, but people have already started to use it for all sorts of unrelated use cases.
> But there aren't very many FOSS clean-room implementations of Git, at least not this far down the chain (packfiles).
I know of at least gogit (https://github.com/speedata/gogit), if only a little, because I've contributed to it once. I don't want to belittle your project, but I'd like to know: what are the differences between gitgo and gogit?
(I see at least one similarity: the name is extraordinarily unimaginative :)
Kudos. Just noting that one of libgit2's primary strengths is 100% cross-platform portability. Is this different from your cross-compilation goal? I'm not familiar with Go.
Yes, because Go generates static binaries and requires no toolchain. Once you link a C library, you lose these benefits and cross-compiling returns to the "normal" difficulty. In addition to that, Go code may be buggy but it's safe, while linking a C library means dropping this guarantee.
So in the Go ecosystem it is actually preferred to use pure Go libraries, so as not to lose these benefits.
I doubt it. A clean-room implementation is intended to demonstrate to a court that no knowledge of the copyrighted material was available to the implementers during the creation of the work. Any similarities to the "original" work must then be caused by functional constraints resulting from compatibility requirements, thus free from creative elements (not protected by copyright).
If the internal documentation includes any creative elements which are reflected by the git implementation, then the clean-room would be contaminated.
This ("Unpacking Git packfiles") was a CTF challenge a few weeks ago (at the Haxpo CTF in Amsterdam), except we weren't given the original repository; we only got a pcap dump of the traffic. Using `git unpack-objects` I was able to unpack them into object files (stored in .git/objects/xx/*), but even these were not readable because loose objects are zlib-compressed. I eventually found some zpipe command that did the trick. What a pain to do this with common tooling if you don't have the time to dive into the format and write a real unpacker.