
Git's initial commit - olalonde
http://github.com/git/git/commit/e83c5163316f89bfbde7d9ab23ca2e25604af290
======
jordigh
Well, while we're looking at FIRST POSTS, here's Mercurial's, self-hosting a
month after git, and like git, also created to replace bitkeeper:

[http://selenic.com/hg/rev/0#l10.1](http://selenic.com/hg/rev/0#l10.1)

The revlog data structure from then is still around, slightly tweaked, but
essentially unchanged in almost a decade.

~~~
coldpie
Mercurial is impressive for making Git's UI look intuitive.

~~~
EGreg
Other way around

~~~
coldpie
C'mon, man, you make branches by _cloning the repository_ [1]. That's
insanity.

[1] [http://hginit.com/05.html](http://hginit.com/05.html)

~~~
jordigh
git at revision 0 worked the same way. You can see that there are no
references in git at that time either. They're both copying bitkeeper, which
worked the same way.

Nowadays git has references (branches), and hg has bookmarks which are the
same, plus hg also has the option to label every commit with a permanent
branch name. They also still have branching-by-cloning, and if you listen to
Linus's original Google code talk about Git, you can see that he conflates
"branch" and "clone" because that's what he originally envisioned! Even in
2007 he was still thinking in bitkeeper terms too. I bet that branching with
references was Junio Hamano's idea, after Linus did the code hand-off.

I find branching-by-cloning a bit more natural in hg, because you can push to
any repo. It's useful for quick, throwaway, local, easy testing out of ideas.
In git, you can only push if your push doesn't modify HEAD, which typically
translates into only being able to push to bare repos.

~~~
coldpie
Interesting, thanks for the info. I've only been using Git since 2009 or so. I
love Git's model of commits being objects in their own right, allowing you to
cherry-pick them across branches, or rebase them to reorder or squash several
commits together, for example.

My usual development routine is to make a ton of small commits that add up to
a small set of good commits, to promote bisect-ability. I do dozens of
rebases, squashes and amends when working on a topic branch. I have to use
Mercurial for one of my clients, and it's a nightmare doing my development
model in an SCM where I can't toss commits around willy-nilly like I can in
Git.

~~~
jordigh
> I have to use Mercurial for one of my clients, and it's a nightmare doing my
> development model in an SCM where I can't toss commits around willy-nilly
> like I can in Git.

Yes you can. `hg histedit` is a lot like `git rebase -i`, and `hg rebase` is
like `git rebase` without -i and `hg commit --amend` is a lot like `git commit
--amend`.

There are also some really cool things that we're working on with hg:

[https://www.youtube.com/watch?v=4OlDm3akbqg](https://www.youtube.com/watch?v=4OlDm3akbqg)

------
brandonbloom
I love checking out very early versions of projects. You often get to see the
essence before the real world came in and ruined the beauty of it.

~~~
akkartik
I do this as well. It really should be more widely broadcast.

(I've also spent some time thinking about how it's kind of a hack, and what we
can do to make it better:
[http://akkartik.name/post/wart-layers](http://akkartik.name/post/wart-layers))

~~~
Monkeyget
There is The Architecture of Open Source Applications series of books,
[http://aosabook.org/en/index.html](http://aosabook.org/en/index.html), where
one of the authors of each piece of software explains the essence of the program.

------
afandian
My god... the comments. Looks like the reddit culture (i.e. fun for in jokes
but not particularly professional)

~~~
scintill76
"A marathon of clicking 'next page,' but the view is worth it." So, this
commenter practically worships git, but apparently doesn't actually understand
it well enough to know a better way to find the hash of the first commit and
punch that into Github. Or, it was just a joke and they got there the quick
way, but still felt obliged to post a dumb joke to inflate their own ego by
"leaving their mark" on git. Maybe I'm being too mean, but yeah, I also think
a lot of the comments are pointless.

~~~
dlitz
> Maybe I'm being too mean, but yeah, I also think a lot of the comments are
> pointless.

Yeah, I think you're being a little mean. If you browse to that user's GitHub
page, it looks like it's just somebody new who's excited about software. Good
for them.

The comments are pointless, sure, but also harmless. Similar comments might
crowd out productive discussion if they were on (say) the head of the master
branch, but I doubt that any serious development is happening on git's initial
commit anyway. Let the new people have their fun.

As far as newbie disruptiveness goes, it could be far worse. When I was
getting started with Linux, I posted this cringeworthy gem to LKML, now
enshrined in the archives for all eternity:
[https://lkml.org/lkml/2000/10/22/69](https://lkml.org/lkml/2000/10/22/69) If
newbies today are merely posting "yay, git!" and "thank you!" to a secondary
forum where it doesn't disrupt development, I'd say they're doing pretty well
in comparison. :)

~~~
scintill76
Yeah, fair enough. Good on you for linking your own cringey post. I think a
lot of developers have those early cringe moments, especially if they were
young when they started.

As far as disruption, it did occur to me later that somebody may be getting
notification emails about these comments. But it's not too bad, as I assume
they could just send the emails to /dev/null, since Github is not the official
host of git. (As a tangential note, I sort of wish Github would handle this
better. So many Github-mirrored projects end up with something like "don't
submit pull requests or open issues here, they will be ignored" in their repo
description.)

------
jeffreyrogers
An interesting fact about Git is that it was self-hosting in two weeks, IIRC.

~~~
TazeTSchnitzel
How can something that isn't a programming language be self-hosting?

~~~
jonesetc
Overloading the term. The OP presumably meant that the source for git was
under git source control.

~~~
vacri
'Hosting' means 'contain', 'serve'. A building can host a department or a
convention, and a married couple can host a dinner party, with neither being
required to be a webserver or programming language.

~~~
tripa
To add to that, IMHO self-hosting for VCSs is closer to the original meaning
of the phrase than for compilers.

------
stinos
Maybe I've been drilled too hard by a couple of programming gurus, but I
immediately noticed there are quite _a lot_ of repeated yet unnamed magic
constants in the (otherwise pretty clean) code. According to wikipedia [1] the
rule to not use them is even one of the oldest in programming. Curious what
kind of profanity Linus would come up with when confronted with this :]

[1]
[https://en.wikipedia.org/wiki/Magic_number_%28programming%29...](https://en.wikipedia.org/wiki/Magic_number_%28programming%29#Unnamed_numerical_constants)

------
d0m
I've read so many git tutorials, I wish I had seen that README file before.

~~~
danra
This. I find that learning from original documentation tends to be _much_ more
efficient than learning from third party blogs/tutorials which try to
"simplify" things, and usually do the opposite.

------
hyp0
It's so short.

The readme is the best explanation of git I've seen.

~~~
jastanton
Does anyone know if the structure of git has changed much? I would like to
read this assuming it is pretty close to the current implementation, but I
have no idea whether it is. Anyone?

~~~
asdfaoeu
You can just see the structure with git cat-file

    
    
        -> % git cat-file -p 8c48d1a36c3d11db44c75a431d4f09cb0035222f
        tree 288c2d5379768f685f391bdbffd31b8965318c63
        parent 002ae35061beef02453b7fb1045a50fa2f7f30f8
        author Denis Bilenko <denis.bilenko@gmail.com> 1246939605 +0700
        committer Denis Bilenko <denis.bilenko@gmail.com> 1246939605 +0700
    
        MANIFEST.in: include libevent.h and libevent-internal.h
        -> % git cat-file -p 288c2d5379768f685f391bdbffd31b8965318c63
        100644 blob 6e543dc13df1b556fd95530061ac0c77a9178309    .hgignore
        100644 blob 79c7beb2227ce149c7a71e58e2f7379071b7a189    MANIFEST.in
        100644 blob 0d05178544942a035a82599900bec27fbac1c9c5    README.eventlet
        040000 tree edb8f37fa622315dcf7bf4f7316d5e85c48cfdbd    examples
        040000 tree 64cf252d77a4162099442bb0153985fc20ed5ba3    gevent
        040000 tree 261052e04b4aece469b2e767e394aafbc9d88a32    greentest
        100644 blob 488e805c563dfeeb6af5e7a1a8953b706d9676e3    setup.py
        -> % git cat-file -p 6e543dc13df1b556fd95530061ac0c77a9178309
        syntax: glob
        *~
        *.pyc
        *.orig
        dist
        gevent.egg-info
        build
        htmlreports
        results.*.db
        gevent/core.so
    

And yeah, it's still very similar, though git now usually packs objects
together into packfiles rather than keeping each one as an individual file.
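
For anyone who wants to poke at the object model without a repository, here is a minimal sketch of how current git (with the default SHA-1 object format) names an object: it hashes a small header, `<type> <size>\0`, followed by the raw content. The function name `git_object_id` is my own, not anything from git's source, and the very first commit discussed here computed its hashes somewhat differently.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 id modern git assigns to an object of this type and content.

    The id covers a header of the form b"<type> <size>\\0" plus the raw body.
    """
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# The empty blob and the empty tree have well-known ids.
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_object_id("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Because the id depends only on type and content, two files with identical bytes share one blob, which is why the tree listing above can refer to everything by hash.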

~~~
alblue
I wrote about the format of git trees (and other object types) here:

[http://alblue.bandlem.com/2011/08/git-tip-of-week-trees.html](http://alblue.bandlem.com/2011/08/git-tip-of-week-trees.html)

------
fivedogit
Thread from 829 days ago.
[https://news.ycombinator.com/item?id=4395014](https://news.ycombinator.com/item?id=4395014)

~~~
Sevein
Good memory!

~~~
fivedogit
Nah. I just use Hackbook.

[https://chrome.google.com/webstore/detail/hackbook/logdfcelf...](https://chrome.google.com/webstore/detail/hackbook/logdfcelflpgcbfebibbeajmhpofckjh)

------
EGreg
Linus wrote:

> Side note on trees: since a "tree" object is a sorted list of
> "filename+content", you can create a diff between two trees without
> actually having to unpack two trees. Just ignore all common parts, and
> your diff will look right. In other words, you can effectively (and
> efficiently) tell the difference between any two random trees by O(n)
> where "n" is the size of the difference, rather than the size of the
> tree.

Um, What?

~~~
pja
Since a git hash points to a sorted list of filenames and content hashes, to
diff two git commits you lookup the commit objects by their hash, run down the
resultant list of filename/hash pairs & then only lookup & diff the content of
those files that have differing hashes (if they have the same hash, they must
have the same content according to the git data model, so they can be safely
ignored).

Hence diffing arbitrary commits with git is always O(N) in the number of
changed files, regardless of the number of interstitial commits.
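
A rough sketch of that walk, assuming each tree is just a list of (filename, hash) pairs already sorted by filename. The names `diff_trees` and `Entry` are made up for illustration; this is not git's actual code.

```python
from typing import Iterator, List, Tuple

Entry = Tuple[str, str]  # (filename, content hash), lists sorted by filename


def diff_trees(old: List[Entry], new: List[Entry]) -> Iterator[Tuple[str, str]]:
    """Yield (status, filename) for entries that differ between two sorted lists.

    Only the hashes are compared; file contents are never opened here.
    """
    i = j = 0
    while i < len(old) and j < len(new):
        old_name, old_hash = old[i]
        new_name, new_hash = new[j]
        if old_name == new_name:
            if old_hash != new_hash:
                yield ("modified", old_name)  # same name, different content
            i += 1
            j += 1
        elif old_name < new_name:
            yield ("deleted", old_name)  # present only in the old list
            i += 1
        else:
            yield ("added", new_name)  # present only in the new list
            j += 1
    for name, _ in old[i:]:
        yield ("deleted", name)
    for name, _ in new[j:]:
        yield ("added", name)


old_tree = [("a", "1"), ("b", "2"), ("d", "4")]
new_tree = [("a", "1"), ("b", "9"), ("c", "3")]
print(list(diff_trees(old_tree, new_tree)))
# [('modified', 'b'), ('added', 'c'), ('deleted', 'd')]
```

Only the entries this reports would then need their contents looked up and diffed.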

~~~
throw_away
In particular he's saying that for a tree, you can quickly skip sub-trees if
they are the same, regardless of how deep they go. Kind of like a Merkle tree:
[http://en.m.wikipedia.org/wiki/Merkle_tree](http://en.m.wikipedia.org/wiki/Merkle_tree)

I'm no git internals expert, but I suspect for a flat list of files the
complexity is still O(n) where n is the number of files (not changes) because
at very least you must check that n checksums are the same.
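
To make the sub-tree skipping concrete, here is a small sketch under my own assumptions: directories are nested dicts, files are bytes, and every node carries a hash covering everything below it. `Node`, `build` and `changed_paths` are hypothetical names, and the hashing scheme is simplified rather than git's real tree format.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Union

Source = Union[bytes, Dict[str, "Source"]]  # file bytes or a nested directory


@dataclass
class Node:
    digest: str  # hash covering this node and everything below it
    children: Dict[str, "Node"] = field(default_factory=dict)
    is_dir: bool = False


def build(source: Source) -> Node:
    """Hash a nested dict bottom-up, once, so later comparisons reuse the digests."""
    if isinstance(source, bytes):
        return Node(hashlib.sha1(b"file\0" + source).hexdigest())
    children = {name: build(child) for name, child in source.items()}
    hasher = hashlib.sha1(b"dir\0")
    for name in sorted(children):
        hasher.update(f"{name}\0{children[name].digest}\n".encode())
    return Node(hasher.hexdigest(), children, is_dir=True)


def changed_paths(old: Node, new: Node, prefix: str = "") -> List[str]:
    """List paths that differ, never descending into subtrees whose digests match."""
    if old.digest == new.digest:
        return []  # identical subtree, however deep: skip it entirely
    if not (old.is_dir and new.is_dir):
        return [prefix]  # a file changed, or a file and a directory swapped
    paths: List[str] = []
    for name in sorted(set(old.children) | set(new.children)):
        path = f"{prefix}/{name}" if prefix else name
        if name in old.children and name in new.children:
            paths.extend(changed_paths(old.children[name], new.children[name], path))
        else:
            paths.append(path)  # entry added or deleted
    return paths


before = build({"README": b"hi", "docs": {"x": b"x"}, "src": {"a.c": b"1", "b.c": b"2"}})
after = build({"README": b"hi", "docs": {"x": b"x"}, "src": {"a.c": b"1", "b.c": b"3"}})
print(changed_paths(before, after))  # ['src/b.c']
```

Here `docs` is dismissed with a single comparison no matter how much it contains, while `src` still costs one comparison per entry, which is the flat-list case I mean.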

~~~
pja
_I'm no git internals expert, but I suspect for a flat list of files the
complexity is still O(n) where n is the number of files (not changes) because
at very least you must check that n checksums are the same._

Sure. The constant factors make a huge difference though - even if you've
cached all the data in memory walking all those structures and diffing the
actual file data is going to be enormously slower than simply walking a list
of hashes, so you're really saying that the total time is big * O(number of
files changed) + small * O(number of files). If small*N ~ big then it's
reasonable to just disregard that cost - it's going to be lost in the noise.

~~~
throw_away
I'm not arguing that, but rather that this ability to skip unchanged trees
because the hash of all contents is bubbled up is specifically what Linus is
referring to in the comment, not simply the comparison of hashes in the flat-
directory use-case.

------
zabcik
Why are there multiple main() functions? I've never seen this style before. Is
it multi-process?

~~~
GauntletWizard
There's a bunch of different utilities in there. Each has its own main()
function, and they're compiled into a bunch of binaries.

------
royragsdale
[https://github.com/git/git/commits?page=1091](https://github.com/git/git/commits?page=1091)

If you want to see the commits going forward from here.

------
DodgyEggplant
This is a great lesson in writing focused & succinct specs, when one clearly
sees what his/her program is going to do.

------
Jackcor
Was all of the initial commit code written by Linus Torvalds?

~~~
gpvos
Yes. The interesting thing is actually that it isn't that much code.

------
hnmcs
Gotta love the fact that there are open pull requests.

[https://github.com/git/git/pulls](https://github.com/git/git/pulls)

------
hw
Does Github offer an easy way to get to the first commit of a project?
Traveling page by page back in time is time-consuming (yeah, I did that).

~~~
ChristianBundy
No, but if you have the full history you can grab it with a shell command.

    
    
        echo https://github.com/git/git/commit/$(git log --pretty=format:%H | tail -1)

------
Fizzadar
Great to see the original command set, and the title of course: "GIT - the
stupid content tracker"

~~~
derekp7
If I recall, Linus was highly pissed at the time he wrote GIT. Lots of his
comments at the time were meant as a slam against the guy who was reverse-
engineering the Bitkeeper protocol, which resulted in the license for
Bitkeeper getting yanked for the kernel project. I wonder if Linus is still
angry with Tridgell?

~~~
sjwright
I've been in the situation where a combative party has spurred me on to do
some of my best work. I doubt Linus holds a grudge... and considering the
consequences I wouldn't be surprised if he wrote a tongue-in-cheek thank you
letter!

------
dirtyaura
I only realised reading the README that git is a great lesson in branding.

------
justintbassett
I wonder what the first commits for big sites/projects look like?

~~~
josephcooney
I tried to compile a list of a few of them a while ago:

[http://jcooney.net/post/2011/06/22/First-Check-in-Comments-f...](http://jcooney.net/post/2011/06/22/First-Check-in-Comments-from-Popular-Open-Source-Projects.aspx)

------
dbdr
Where are the tests?

------
byteCoder
Following the tradition of sports, I propose that commit id
e83c5163316f89bfbde7d9ab23ca2e25604af290 be officially retired.

~~~
CUViper
Given that the only way to reuse it is to duplicate the tree and commit
metadata exactly, or find an sha1 collision, I think it's pretty safe. :)

I wonder if there are any git sha1 collisions out there in aggregate, say
across all of github. Would they even notice if there were?

~~~
meowface
> I wonder if there are any git sha1 collisions out there in aggregate, say
> across all of github.

Despite the incredibly high number of all commits there must be, I think the
chance of a collision is still very unlikely. 2^160 is a pretty big number.

~~~
MichaelGG
The number of inputs before a likely collision is more on the order of 2^80.
Which is still pretty large.

~~~
thret
This is comparable to the number of atoms in the universe. Pretty large! We
will never see an accidental collision.

~~~
zxcdw
Not quite, atoms in the universe is in the range of 10^80, which is a bit less
than 2^266.

On the other hand, 2^80 is "only" approx. 1.2 * 10^24. Still, good luck
colliding with that without big effort.
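
The numbers in this sub-thread are easy to check. A quick sketch, using the standard birthday approximation p ≈ 1 - exp(-n(n-1) / 2^(bits+1)); the helper name `collision_probability` and the sample object count of 10^12 are just my own illustrative choices.

```python
import math

HASH_BITS = 160  # SHA-1 output size

# Birthday bound: roughly 2^(bits/2) random inputs before a collision becomes likely.
birthday_bound = 2 ** (HASH_BITS // 2)
print(birthday_bound)           # 1208925819614629174706176
print(f"{birthday_bound:.2e}")  # 1.21e+24

# 10^80 atoms expressed as a power of two.
atoms_log2 = 80 * math.log2(10)
print(round(atoms_log2, 2))     # 265.75


def collision_probability(n_objects: int, bits: int = HASH_BITS) -> float:
    """Approximate chance of at least one collision among n random hashes."""
    exponent = n_objects * (n_objects - 1) / 2 ** (bits + 1)
    return -math.expm1(-exponent)  # 1 - exp(-x), accurate even for tiny x


print(round(collision_probability(2 ** 80), 3))  # 0.393
print(collision_probability(10 ** 12) < 1e-24)   # True
```

So even a trillion objects leave the odds of an accidental collision far below anything worth worrying about.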

------
benihana
Is there a reason there aren't any braces around single-line _if_ statements?
Is that a C thing? It seems kind of inviting to bugs to me.

~~~
desdiv
It's pointless to argue over these kinds of things. Every major project/company
has their own codified code style guide, and if you want to contribute/earn
your salary then you must follow that style guide to the T. Here's the
relevant quote from the Linux kernel coding style[0]:

    
    
        Do not unnecessarily use braces where a single statement will do.
    
        if (condition)
            action();
    

[0]
[https://www.kernel.org/doc/Documentation/CodingStyle](https://www.kernel.org/doc/Documentation/CodingStyle)

~~~
guelo
I consider that a bug in their style spec. Single line if statements are known
to cause bugs.

~~~
sytelus
That's what I thought for maybe over a decade. About 3 years ago I
revamped my personal coding style to eliminate as much unnecessary baggage as
possible. As part of that I stopped using braces for single-line ifs, and I have
yet to hit a bug _because_ of that. Overall I find the code looks more compact
and cleaner, and maybe even has less friction to read. Nowadays when I see
braces around a single-line if I get that "oh, that's clunky code" feeling in
my stomach. Things are worse with C# and a lot of Java code, where people
insist not only on having braces around single-line ifs but also on putting {
on its own line.

I think a good language shouldn't have braces to mark blocks in the first
place. Given indentation, they are redundant most of the time and they just
add clunk. This is exactly the case with Python, where this is essentially the
default style and people haven't been complaining about it causing bugs.

~~~
scott_s
There's an enormous difference with Python, because the indentation is syntax.
These two code snippets, one in Python, the other in C, do _not_ mean the same
thing:

C:

    
    
      if (condition)
        statement_1();
        statement_2();
    

Python:

    
    
      if condition:
        statement_1()
        statement_2()
    

Personally, I always use braces in C and C++, even though it is more clunky. I
want the assurance. I also frequently have to make changes to code that does
_not_ use braces, and then I have to add the braces in because I am adding
statements to a conditional. To me, that is _more_ clunky.

------
tempodox
Code comment about git:

    
    
      stupid. contemptible and despicable.
    

That sums it up quite well. Every day I pay thanks to The One Who Programmed
Me that my workflow doesn't put me in need of that shitload of crap that is
git. I pity those who do need git.

