
A look back: Bram Cohen vs. Linus Torvalds (2007) - geertj
http://www.wincent.com/a/about/wincent/weblog/archives/2007/07/a_look_back_bra.php
======
solutionyogi
This article brings back memories. Back in 2007, I had just watched Linus'
presentation on Git at Google
([http://www.youtube.com/watch?v=4XpnKHJAok8](http://www.youtube.com/watch?v=4XpnKHJAok8))
where he called all non-distributed version control systems useless. I
could not make any sense of the DVCS from his talk. I tried to play with Git
and it was extremely frustrating due to the poor CLI. I thought maybe Git is
just a fad. But then more and more people kept talking about how awesome it
is.

This was one of the articles where Git finally clicked for me. The key quote:

 _There is no need for fancy metadata, rename tracking and so forth. The only
thing you need to store is the state of the tree before and after each change.
What files were renamed? Which ones were copied? Which ones were deleted? What
lines were added? Which ones were removed? Which lines had changes made inside
them? Which slabs of text were copied from one file to another? You shouldn't
have to care about any of these questions and you certainly shouldn't have to
keep special tracking data in order to help you answer them: all the changes
to the tree (additions, deletes, renames, edits etc) are implicitly encoded in
the delta between the two states of the tree; you just track what is the
content._
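
To make the quote concrete, here is a minimal sketch of recovering renames, additions, deletions and edits from nothing but two snapshots. The names `snapshot` and `infer_changes` are made up for illustration, and the rename check only matches byte-identical content; Git's real rename detection uses similarity heuristics, so treat this as a toy under those assumptions.

```python
import hashlib


def snapshot(files):
    """Map each path to a hash of its content (a stand-in for a stored tree)."""
    return {path: hashlib.sha1(data).hexdigest() for path, data in files.items()}


def infer_changes(before, after):
    """Classify changes using nothing but the two snapshots."""
    removed = {p: h for p, h in before.items() if p not in after}
    added = {p: h for p, h in after.items() if p not in before}
    modified = sorted(p for p in before if p in after and before[p] != after[p])

    # A path that vanished while identical content appeared elsewhere is
    # reported as a rename; nothing was recorded when the change was made.
    renamed = []
    for old_path, digest in sorted(removed.items()):
        for new_path in sorted(added):
            if added[new_path] == digest:
                renamed.append((old_path, new_path))
                del added[new_path]
                break

    renamed_sources = {old for old, _ in renamed}
    deleted = sorted(p for p in removed if p not in renamed_sources)
    return {
        "renamed": renamed,
        "added": sorted(added),
        "deleted": deleted,
        "modified": modified,
    }


before = snapshot({"greeting": b"hello\n", "README": b"v1\n", "old.txt": b"x\n"})
after = snapshot({"saludo": b"hello\n", "README": b"v2\n", "new.txt": b"y\n"})
changes = infer_changes(before, after)
```

Because the classification is computed on demand, the heuristic can be improved later without touching any stored data.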

It's been 7 years since I have been using Git and I can't imagine how I ever
worked with version control which didn't work on the entire tree.

------
shubhamjain
The brilliant thing about Linus that never ceases to amaze me is his level of
knowledge and how he is never 'wrong'. He has always defended his decisions,
maybe in an arrogant tone, against countless arguments and each one stands tall.

Lately, Linus announced the use of the Git object database format for
Subsurface[1]. One of the respondents asked, "Why not use JSON?" Linus
defended the choice well, saying that putting everything in one file was not
great. So, even though he is not a web guy, he was still aware of why using
the Git object file format had more merits than any other option.

[1]:
[https://plus.google.com/+LinusTorvalds/posts/X2XVf9Q7MfV](https://plus.google.com/+LinusTorvalds/posts/X2XVf9Q7MfV)

------
ayrx
"Me _personally_, I want to have something that is very repeatable and non-
clever."

This is what all software engineers should aim for.

~~~
colanderman
Ironic from Torvalds, given the haphazard way some git commands interpret
their arguments. Maybe he didn't write those.

------
atmosx
> I knew Torvalds was smart, but seeing as I was never really more than an
> occasional Linux user I never realized just how smart;

Hm, when a guy writes his own _kernel_ he is _smart_. I mean, as far as
implementing goes, _as smart as it gets_. The amazing thing is that he was
pretty young when he did it (1991-2). And then, there's this[1]. When people
talk about "hackers", Linus is the first person that comes to mind.

[1]
[http://lwn.net/2000/0824/a/esr-sharing.php3](http://lwn.net/2000/0824/a/esr-sharing.php3)

~~~
Erwin
Around 20 years ago when I was taking my computer education, low-level
programming was all there was.

OS programming from absolute scratch is nothing special by the standards of
what was done that day (some years later you had OS toolkits and a huge amount
of tools to make that far easier, like virtualization; 20 years ago we could maybe
beep to debug our programs). Many in the programme grew up on Commodore or
Spectrum which also meant a lot of low level tricks.

So Linux 0.1 didn't really have any amazing contributions to computer science
(on the contrary, you may recall the famous Tanenbaum-Torvalds thread on
microkernels vs monolithic kernels). It was pragmatic and, quite quickly,
useful.

I think where Linus did extremely well was a) successfully managing a huge
number of contributions while being highly technically involved and b)
relentlessly changing the internal design to improve it. If Linux had been a
commercial product, there'd be a lot of senior people greatly invested in their
own designs that'd be unwilling to modify them.

For comparison, here's another famous kernel programmer who has the technical
skills, but not the collaboration skills:
[http://www.templeos.org/](http://www.templeos.org/)

~~~
njharman
> So Linux 0.1 didn't really have any amazing contributions to computer
> science

That's a key point. Linus is not a Computer Scientist, he is a programmer. CSs
advance the theory of computation. Progs make shit we can use.

------
ethomson
I'm surprised that the author of this post would point out a rename conflict
as something that "git gets right", in part because I'm relatively certain
that git-merge-recursive did not exist when this mailing list exchange
occurred (I'm actually surprised that it was the default already in 2007) and
git-merge-resolve would have done something completely different, treating
`greeting` as deleted in both and `saludo` as added in left. There would be no
conflicts and `saludo` would merrily be created, which seems like the wrong
thing.

But I'm mostly surprised because rename conflicts are this transient thing.
git-merge-recursive will _detect_ a rename conflict, but you're hosed when it
comes time to resolve it, since the information that it's a rename conflict
isn't captured anywhere except, briefly, in the phosphors of your CRT.

In the author's example, when you run `git status`, it will simply tell you
that `saludo` was added by them. Which is exactly the behavior of the rename-
deficient git-merge-resolve. The expectation in resolving this, I suppose, is
that you saw the message that this was a rename/delete conflict, remembered
the original filename and could somehow make a decision based on that.

This is not terrible in a rename/delete conflict, but for some other types of
rename conflicts, it's much more difficult. For example, branch 'A' renames a
file from 'foo' to 'bar', branch 'B' renames it from 'foo' to 'baz'. Now you
have two files in your working directory and git-status can only tell you that
they were each added, which is not indicative of a conflict.
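
As a rough illustration of what a tool would have to recompute here, the sketch below finds the foo/bar/baz case from three snapshots (common ancestor plus both branches). The helpers `renames` and `rename_conflicts` are hypothetical and match only byte-identical content; this is not how git-merge-recursive is implemented, just the shape of the information that gets lost after the merge message scrolls by.

```python
def renames(base, side):
    """Return {old_path: new_path} for exact-content renames from base to side."""
    gone = {p: c for p, c in base.items() if p not in side}
    new = {p: c for p, c in side.items() if p not in base}
    found = {}
    for old_path, content in sorted(gone.items()):
        for new_path in sorted(new):
            if new[new_path] == content:
                found[old_path] = new_path
                del new[new_path]
                break
    return found


def rename_conflicts(base, ours, theirs):
    """List (old, our_new, their_new) where both sides renamed one file differently."""
    ours_r, theirs_r = renames(base, ours), renames(base, theirs)
    return [
        (old, ours_r[old], theirs_r[old])
        for old in sorted(ours_r)
        if old in theirs_r and ours_r[old] != theirs_r[old]
    ]


base = {"foo": "hello\n"}
branch_a = {"bar": "hello\n"}
branch_b = {"baz": "hello\n"}
conflicts = rename_conflicts(base, branch_a, branch_b)
```

Here `conflicts` ties 'bar' and 'baz' back to their common ancestor 'foo', which is exactly the link that a plain "added" status does not carry.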

This is annoying for a user on the console. This is impossible for somebody
trying to build a UI to resolve a merge conflict: 'bar' was added in one of
the branches... why does this conflict? Well, if it's only on one side of the
merge, then it must have come from some rename conflict. But with which other
file? What's the common ancestor that git-merge-recursive decided was a
rename? Meh.

(Please do not mistake this rant as a suggestion that Codeville's merge is
superior to Git's. I'm not suggesting that, just that git-merge-recursive has
a few rough edges that could use polish.)

~~~
wincent
That's exactly the point: Git didn't handle the rename conflict so well at the
time of the mailing list exchange, but it did handle it better by the time the
blog post was written. And it may handle it better still in the future,
precisely because the repo format isn't laden with metadata[0], and the
handling of edge cases like this can be improved by evolving the heuristics
that Git uses to infer what happened.

It's a bet that "future self" (improving heuristics) will be more effective
than "past self" (attempting to design a future-proof repo format). It
looked like the bet was paying off in 2007 when the blog post was written, and
7 years later that still seems to be the case.

[0] Metadata which would need to be carefully managed for compatibility across
versions, and which would be missing any time the user forgot to explicitly
record it (with a Git command) and instead made a change directly to the
worktree.

~~~
ethomson
Yeah, we're in agreement about that. The simplicity of the git repository is
very nice. The repository format is a thing so beautiful that it makes you
want to cry.

With a few horrible warts thrown in that make you actually cry.

------
wirrbel
I think this is a good example of complexity management. Linus has a bottom-
up approach to this. With a few building blocks you build up a system where
you can define and work with simple algorithms that are both understandable
and approachable by a single human mind.

The underlying assumption is that simple approaches can lead to an "easy"
solution. To contrast this with a complex algorithm, a complex algorithm is in
a lot of cases harder to implement and reason about.

I would like to object that generally, you cannot assume that simple means
easy and complex means hard, there are complex systems that actually turn out
to be easy to reason about and simple systems that turn out to be quite hard.

I actually would not be surprised if the next generation of VCS will feature
more complexity than GIT to make working with rewritten history easier and to
pave the way for certain workflows that git makes possible but not convenient.
Then I hope that these approaches will be complex but easy.

PS: Subversion, for example, is a case of complex and hard. While the
interface of subversion aims at being quite easy and usable, the
implementation is very complex with a lot of corner cases, exceptions and an
abundance of leaky abstractions. It is a primary example of top-down design
gone wrong.

------
jobigoud
The two approaches also exist for in-application undo/redo stacks.

You can either try to track the _operation_ you did that will need to be
undone, or you can track the _state_ of the document prior to the change,
whatever change that be. I have found the second approach to be more robust
and simpler to think about.
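
A minimal sketch of the second (state-based) approach, assuming document states are immutable values such as strings or tuples; the class name `SnapshotHistory` is made up for illustration.

```python
class SnapshotHistory:
    """Undo/redo that stores document states, not the operations between them."""

    def __init__(self, initial):
        self._undo = []
        self._redo = []
        self.state = initial

    def commit(self, new_state):
        """Remember the current state, then move to new_state."""
        self._undo.append(self.state)
        self._redo.clear()  # a fresh change invalidates the redo stack
        self.state = new_state

    def undo(self):
        if self._undo:
            self._redo.append(self.state)
            self.state = self._undo.pop()

    def redo(self):
        if self._redo:
            self._undo.append(self.state)
            self.state = self._redo.pop()


history = SnapshotHistory("")
history.commit("hello")
history.commit("hello world")
history.undo()
after_undo = history.state
history.redo()
after_redo = history.state
```

Note that `commit` never needs to know what kind of edit happened, so no per-operation inverse has to be written or kept correct.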

~~~
anon4
The first approach is mostly an optimisation when you need to operate in a
memory-tight environment and can't afford to keep several complete copies of
past states.

~~~
xorcist
Not necessarily. The point here is that you _reason_ about the complete state,
not that you _store_ it as-is.

See for example git itself which has grown quite an efficient storage system
despite the design ideas being as described in the article.
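
A small sketch of that distinction, assuming a content-addressed store where each snapshot is only a path-to-digest map. `ContentStore` is a made-up name, and Git goes further than this (compression and delta-packed objects), so this only shows the basic sharing of unchanged content.

```python
import hashlib


class ContentStore:
    """Keeps each distinct blob once; snapshots are just path -> digest maps."""

    def __init__(self):
        self.blobs = {}

    def put(self, data):
        digest = hashlib.sha1(data).hexdigest()
        self.blobs.setdefault(digest, data)  # identical content is stored once
        return digest

    def save_snapshot(self, files):
        return {path: self.put(data) for path, data in files.items()}

    def load_snapshot(self, snapshot):
        return {path: self.blobs[digest] for path, digest in snapshot.items()}


store = ContentStore()
v1 = store.save_snapshot({"a.txt": b"alpha\n", "b.txt": b"beta\n"})
v2 = store.save_snapshot({"a.txt": b"alpha\n", "b.txt": b"beta, edited\n"})
```

Both snapshots are complete as far as the caller is concerned, yet the unchanged `a.txt` occupies storage only once.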

------
AnimalMuppet
Linus didn't just wake up one day with these ideas. He'd been using source
code control systems for a while on a huge project (the kernel) and had been
growing dis-satisfied with what they did. He knew, by direct experience, what
he wanted to be different, and why.

------
riffraff
And yet, at the time I was hoping we'd get darcs-like cherry picking, and 7
years later the incumbent VCS still doesn't :(

~~~
sanderjd
Care to elaborate on what is better about darcs' cherry picking?

~~~
defen
In git, a cherry pick essentially just looks at the diff specified by that
revision, then applies that diff as a new commit. The system doesn't record
the context (unlike with a merge where you at least know the parents).

darcs' system is way too complicated to get into in a short comment reply, but
the basic idea is that if you cherry-pick a commit you get all the context
along with it. That's because darcs stores a series of patches rather than a
series of tree states.

One nice thing about git's way is that since it's just pulling a diff, you can
cherry-pick from anything, (e.g. add a remote that's a totally separate
unrelated project) as long as the diff applies cleanly.
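
A toy version of the git-style behaviour described above, assuming trees are plain dicts of path to content and working at whole-file granularity (Git itself works at the level of lines). The functions `diff` and `cherry_pick` are hypothetical names for this sketch only.

```python
def diff(parent, commit):
    """Per-path changes between two trees: path -> (old, new); None means absent."""
    paths = set(parent) | set(commit)
    return {
        p: (parent.get(p), commit.get(p))
        for p in paths
        if parent.get(p) != commit.get(p)
    }


def cherry_pick(target, parent, commit):
    """Apply the parent->commit change to target; fail if it does not apply cleanly."""
    result = dict(target)
    for path, (old, new) in diff(parent, commit).items():
        if target.get(path) != old:
            raise ValueError(f"conflict in {path}")
        if new is None:
            result.pop(path, None)
        else:
            result[path] = new
    return result


parent = {"main.c": "v1"}
commit = {"main.c": "v1", "util.c": "helper"}
target = {"main.c": "v1", "other.c": "unrelated"}
picked = cherry_pick(target, parent, commit)
```

The result carries no trace of where the change came from, which is the missing context mentioned above, and `target` does not have to share any history with `parent`.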

~~~
sanderjd
Thanks for the explanation - very interesting!

------
pjungwir
This makes me think it'd be handy to embed git into a desktop application and
use it as the datastore. But I suppose the GPL prevents this unless the app is
open source.

~~~
rakoo
Embedding git _the application_ (or even the library) itself can be difficult,
as shown by github's experience [0] (I guess they know what they're talking
about).

What you can do, on the other hand, is use the git _format_. There's already
something pure-python [0] and something pure-go [1], and I'm pretty sure the
same exists for other languages.
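
For a feel of how small the core of the format is, here is a sketch that computes the ID Git assigns to a file's content (a "blob"): the SHA-1 of a short header followed by the bytes. The function name `git_blob_id` is mine; this covers only blob hashing with SHA-1, not trees, commits or on-disk compression.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID of a blob: SHA-1 over the header 'blob <size>' + NUL, then the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known ID.
empty_id = git_blob_id(b"")
```

The value can be compared against `git hash-object` on the same bytes.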

Oh and by the way, the pure-python I linked to is used for bup, a backup tool
that stores its data in git format. Because it's extremely efficient.

[0]
[https://github.com/bup/bup/blob/master/lib/bup/git.py](https://github.com/bup/bup/blob/master/lib/bup/git.py)

[1] [https://github.com/speedata/gogit](https://github.com/speedata/gogit)

~~~
ash
> as shown by github's experience [0]

Missing link? bup project is not related to github.

~~~
rakoo
Woops ! I was talking about this one: [https://speakerdeck.com/tanoku/my-mom-
told-me-that-git-doesn...](https://speakerdeck.com/tanoku/my-mom-told-me-that-
git-doesnt-scale)

------
baldfat
Linus = Genius. People who take his personality first miss the man that really
is more like the public persona of Steve Jobs than the actual Steve Jobs.

~~~
jeremysmyth
Did you read the article? I don't believe the article is "personality first"
at all.

There's little in there that highlights his personality aside from a throwaway
comment that the author believes they're both somewhat arrogant. In fact, in
all the Linus quotes in the article there's not even a single shred of
arrogance.

~~~
wirrbel
the article does not claim that they were arrogant in this discussion. It was
very much not an article of the "Linus-Torvalds-is-rude" kind but more of a
comparison of visions.

In a way, Torvalds showed with git that he is a good software engineer by
putting together established techniques to form an excellent "product". He did
not get side-tracked by reinventing the wheel but focused on a useful feature
set and a performant implementation. A very good job indeed.

~~~
baldfat
I actually never said anything about arrogant BUT the author did in his
introduction. I only said "Personality First"

