This was one of the articles where Git finally clicked for me. The key quote:
There is no need for fancy metadata, rename tracking and so forth. The only thing you need to store is the state of the tree before and after each change. What files were renamed? Which ones were copied? Which ones were deleted? What lines were added? Which ones were removed? Which lines had changes made inside them? Which slabs of text were copied from one file to another? You shouldn't have to care about any of these questions and you certainly shouldn't have to keep special tracking data in order to help you answer them: all the changes to the tree (additions, deletes, renames, edits etc) are implicitly encoded in the delta between the two states of the tree; you just track what is the content.
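A toy sketch of that idea (hypothetical helper names, not Git's actual implementation): two snapshots that map paths to content hashes are enough to recover a rename after the fact, with no tracking metadata at all.

```python
import hashlib

def snapshot(tree):
    # The only thing stored per state: path -> hash of the content.
    return {path: hashlib.sha1(data).hexdigest() for path, data in tree.items()}

def find_renames(before, after):
    # A rename is implicit: the same content disappears at one path
    # and reappears at another.
    removed = {p: h for p, h in before.items() if p not in after}
    added = {p: h for p, h in after.items() if p not in before}
    return [(old, new) for old, oh in removed.items()
                       for new, nh in added.items() if oh == nh]

before = snapshot({"hola.txt": b"greeting text"})
after = snapshot({"saludo.txt": b"greeting text"})
print(find_renames(before, after))  # [('hola.txt', 'saludo.txt')]
```

The rename falls out of the delta between the two tree states; nothing ever had to record "this file was renamed".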
I've been using Git for 7 years now, and I can't imagine how I ever worked with a version control system that didn't operate on the entire tree.
Recently, Linus announced that Subsurface would use the Git object database format for its data. One of the respondents asked, "Why not use JSON?" Linus defended the choice well, pointing out that putting everything in one file was not great. So even though he's not a web guy, he was still aware of why the Git object file format had more merit than anything else.
This is what all software engineers should aim for.
Hm, when a guy writes his own kernel, he is smart. I mean, as far as implementing goes, as smart as it gets. The amazing thing is that he was pretty young when he did it (1991-92). And then, there's this. When people talk about "hackers", Linus is the first person that comes to mind.
OS programming from absolute scratch was nothing special by the standards of that day (some years later you had OS toolkits and a huge number of tools that made it far easier, like virtualization; 20 years ago we could maybe beep to debug our programs). Many programmers of that generation grew up on the Commodore or the Spectrum, which also meant knowing a lot of low-level tricks.
So Linux 0.1 didn't really have any amazing contributions to computer science (on the contrary, you may recall the famous Tanenbaum-Torvalds thread on microkernels vs monolithic kernels). It was pragmatic and, quite quickly, useful.
I think where Linus did extremely well was a) successfully managing a huge number of contributions while being highly technically involved and b) relentlessly changing the internal design to improve it. If Linux had been a commercial product, there'd be a lot of senior people greatly invested in their own designs who'd be unwilling to modify them.
For comparison, here's another famous kernel programmer who has the technical skills, but not the collaboration skills: http://www.templeos.org/
That's a key point. Linus is not a computer scientist, he's a programmer. Computer scientists advance the theory of computation. Programmers make shit we can use.
I'm a fairly anti-social person. I don't know many people. Yet at the time Linux came out, I personally knew a dozen people who could easily have written it when they were young.
So why didn't they?
Several of them had satisfied their urge to hack on operating systems by getting jobs hacking and porting Unix (and a couple of them "ported" Unix by essentially writing a new implementation).
The others who could have done it had no need for it. They all had easy access to Unix workstations and Unix VAXes, and were busy dealing with their urges to hack on other things like graphics or AI or networks or scientific computing.
The amazing thing about Linus is not his considerable technical ability--plenty of people have that--but rather his management ability. As I said earlier, I know at least a dozen people who could have written a kernel...but I don't think any of us could have taken it from a one man kernel to a worldwide project with hundreds of contributors.
In a hundred years, Linus Torvalds will have a footnote in technical textbooks, and a whole chapter in business textbooks.
Here's the thread context, for anyone else who is curious how Linus responded: http://lkml.iu.edu//hypermail/linux/kernel/0008.2/0240.html
For the record, he didn't seem to address esr's email.
But I'm mostly surprised because rename conflicts are this transient thing. git-merge-recursive will detect a rename conflict, but you're hosed when it comes time to resolve it, since the information that it's a rename conflict isn't captured anywhere except, briefly, in the phosphors of your CRT.
In the author's example, when you run `git status`, it will simply tell you that `saludo` was added by them. Which is exactly the behavior of the rename-deficient git-merge-resolve. The expectation in resolving this, I suppose, is that you saw the message that this was a rename/delete conflict, remembered the original filename and could somehow make a decision based on that.
This is not terrible in a rename/delete conflict, but for some other types of rename conflicts, it's much more difficult. For example, branch 'A' renames a file from 'foo' to 'bar', branch 'B' renames it from 'foo' to 'baz'. Now you have two files in your working directory and git-status can only tell you that they were each added, which is not indicative of a conflict.
This is annoying for a user on the console. This is impossible for somebody trying to build a UI to resolve a merge conflict: 'bar' was added in one of the branches... why does this conflict? Well, if it's only on one side of the merge, then it must have come from some rename conflict. But with which other file? What's the common ancestor that git-merge-recursive decided was a rename? Meh.
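To make the rename/rename case concrete, here's a toy reconstruction (hypothetical names, content-hash maps rather than real Git objects) of exactly what such a UI would have to recompute itself, since git-merge-recursive doesn't persist it anywhere:

```python
import hashlib

def h(data):
    return hashlib.sha1(data).hexdigest()

# Common ancestor and the two branch tips, as path -> content-hash maps.
base     = {"foo": h(b"the file body")}
branch_a = {"bar": h(b"the file body")}  # A renamed foo -> bar
branch_b = {"baz": h(b"the file body")}  # B renamed foo -> baz

def renames_since(base, tip):
    gone = {p: v for p, v in base.items() if p not in tip}
    new  = {p: v for p, v in tip.items() if p not in base}
    return {old: dst for old, v in gone.items()
                     for dst, dv in new.items() if dv == v}

a = renames_since(base, branch_a)  # {'foo': 'bar'}
b = renames_since(base, branch_b)  # {'foo': 'baz'}

# A rename/rename conflict: both sides moved the same ancestor path,
# but to different destinations.
conflicts = {p: (a[p], b[p]) for p in a if p in b and a[p] != b[p]}
print(conflicts)  # {'foo': ('bar', 'baz')}
```

The information is all recoverable from the three tree states, which is the point of the complaint: the merge machinery computes it once, prints it, and throws it away.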
(Please do not mistake this rant as a suggestion that Codeville's merge is superior to Git's. I'm not suggesting that, just that git-merge-recursive has a few rough edges that could use polish.)
It's a bet that your "future self" (improving heuristics) will be more effective than your "past self" (attempting to future-proof the repo format). The bet looked like it was paying off in 2007 when the blog post was written, and 7 years later that still seems to be the case.
Metadata like that would need to be carefully managed for compatibility across versions, and it would be missing any time the user forgot to explicitly record it (with a Git command) and instead made a change directly to the worktree.
With a few horrible warts thrown in that make you actually cry.
The underlying assumption is that simple approaches can lead to an "easy" solution. By contrast, a complex algorithm is in many cases harder to implement and to reason about.
I would like to object that, in general, you cannot assume that simple means easy and complex means hard; there are complex systems that actually turn out to be easy to reason about, and simple systems that turn out to be quite hard.
I actually would not be surprised if the next generation of VCSs features more complexity than Git, to make working with rewritten history easier and to pave the way for certain workflows that Git makes possible but not convenient. I hope those approaches will be complex but easy.
PS: Subversion, for example, is complex and hard. While its interface aims at being quite easy and usable, the implementation is very complex, with a lot of corner cases, exceptions, and an abundance of leaky abstractions. It is a prime example of top-down design gone wrong.
You can either try to track the operation you performed, which will later need to be undone, or you can track the state of the document prior to the change, whatever that change may be. I have found the second approach to be more robust and simpler to think about.
See, for example, git itself, which has grown quite an efficient storage system despite its design ideas being as described in the article.
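A minimal sketch of the state-based approach (toy class and method names, purely illustrative): undo never needs to know what kind of edit happened, because it just restores the recorded prior state.

```python
class Document:
    def __init__(self, text=""):
        self.text = text
        self._history = []

    def edit(self, new_text):
        # Record the state *before* the change, not the operation itself.
        self._history.append(self.text)
        self.text = new_text

    def undo(self):
        # Undo is uniform: restore the previous state,
        # whatever the change was (insert, delete, replace...).
        if self._history:
            self.text = self._history.pop()

doc = Document("hello")
doc.edit("hello world")
doc.edit("goodbye world")
doc.undo()
print(doc.text)  # hello world
```

The operation-tracking alternative would need a correct inverse for every kind of edit; here there is exactly one code path, which is what makes it simpler to think about.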
Darcs's system is way too complicated to get into in a short comment reply, but the basic idea is that if you cherry-pick a commit, you get all the context along with it. That's because darcs stores a series of patches rather than a series of tree states.
One nice thing about git's way is that since it's just pulling a diff, you can cherry-pick from anything (e.g. add a remote that's a totally separate, unrelated project) as long as the diff applies cleanly.
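That mechanism can be sketched like this (a toy model with dict "trees", not Git's actual patch machinery): the cherry-pick is just the parent-to-commit delta, so it applies to any target whose files match the "before" side, related history or not.

```python
def diff(parent, commit):
    # A patch is just: path -> (content before, content after); None = absent.
    paths = set(parent) | set(commit)
    return {p: (parent.get(p), commit.get(p))
            for p in paths if parent.get(p) != commit.get(p)}

def cherry_pick(patch, target):
    result = dict(target)
    for path, (old, new) in patch.items():
        if result.get(path) != old:
            raise ValueError(f"does not apply cleanly at {path!r}")
        if new is None:
            del result[path]
        else:
            result[path] = new
    return result

patch = diff({"a.txt": "v1"}, {"a.txt": "v2"})
# Applies to a totally unrelated tree, as long as the 'before' side matches.
print(cherry_pick(patch, {"a.txt": "v1", "other.txt": "x"}))
# {'a.txt': 'v2', 'other.txt': 'x'}
```

Nothing in the patch refers to any shared history, which is why the source can be an arbitrary remote; the only failure mode is the "does not apply cleanly" case.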
What you can do, on the other hand, is use the git format. There's already something pure-python and something pure-go, and I'm pretty sure the same exists for other languages.
Oh, and by the way, the pure-python implementation I linked to is used for bup, a backup tool that stores its data in git format, because it's extremely efficient.
Missing link? bup project is not related to github.
There's little in there that highlights his personality aside from a throwaway comment that the author believes they're both somewhat arrogant. In fact, in all the Linus quotes in the article there's not even a single shred of arrogance.
In a way, Torvalds showed with git that he is a good software engineer by putting together established techniques to form an excellent "product". He did not get side-tracked by reinventing the wheel but focused on a useful feature set and a performant implementation. A very good job indeed.
Thanks for keeping up the downvotes when I point out something that is clearly in the article, but someone decides I'm stretching ~~~
"Now, I've never had a particular liking for either of these personalities, although I've had to recognize that they're very clever individuals. Both of them have been known for occasional demonstrations of arrogance."