
What’s Wrong with Git? A Conceptual Design Analysis - adns
http://people.csail.mit.edu/sperezde/onward13.pdf
======
shadowmint
Hmm, well, you can summarize this paper with:

1) The proposition that all systems (and thus git also) should be composed of
operations that are orthogonal (no side effects), general (can be used for
multiple purposes) and that the system displays a high level of propriety
(available operations are limited to those strictly required).

2) Git isn't very good at any of these because it has weird side effects, the
operations it does have are often very specific, and there are a whole bunch
of different ways of doing similar or related tasks. ...and the author
_really_ doesn't like the idea of staging. Which seems irrelevant to the
argument, but turns up 4 or 5 times in the paper.

3) Gitless is amazing because it doesn't have the concept of staged changes.
See (2) about how the author really likes this:

    
    
        The elimination of the staging area was enthusiastically received as a
        major reduction in complexity, though one student missed being able to
        stage files and then only diff those staged files prior to committing 
        (using git diff --staged). We believe this to be a limitation not so much
        in the conceptual model of Gitless but rather in the detailed functionality 
        of the gl diff command, which appears to be insufficiently versatile.
    

Well, I believe the author would be more plausible if they didn't have a clear
bias towards a result they favoured.

How about looking up, for example, a bunch of things that people use staging
for, and then objectively evaluating those, compared to the alternative in a
'no staging' world?

Perhaps you could also include potential limitations of using the
orthogonal/general/propriety comparison as the _single test_ for evaluating
features? Like, does following that path lead to slower implementations, where
the 'concepts' are inherently limiting to the implementation, as with
Bazaar?

Food for thought~

~~~
jwr
I also found the paper to be very biased. I use the staging area all the time
and like it a lot. Also, I really enjoy git's practical approach.

It seems the authors would find hg to be much more enjoyable. It has a
stricter approach ("this is how you should do it, because conceptually this is
what things should be"), which some people like.

------
stephen_g
I really don't get the part, "Despite its widespread adoption, Git puzzles
even experienced developers and is not regarded as easy to use"

Am I the only person who finds Git exceedingly simple? I had significant
experience with Subversion, and then tried Git on some of my personal
projects. I didn't find the learning curve very steep (perhaps some of the
articles and documentation I read helped a lot, or maybe some of the graphical
tools I used at first), but compared to SVN, git just feels to me how a
version control system should work...

Sure there are some little quirks with the interface, but it didn't take all
that long to work out...

~~~
SwellJoe
I didn't find the switch any harder than learning about revision control in
the first place. When I moved from RCS to CVS, it was pretty easy (though I
was pretty weak in RCS, and so I had to learn a lot of new concepts). When I
switched from CVS to Subversion, it was really easy (Subversion is designed
from the ground up to just be a better CVS). When I switched from Subversion
to git, it was a challenge...but, not a really big one.

There are some things I still don't get about git. And I still find myself
with some weird commits that look like I'm committing what other people _just_
committed (because my tree was behind theirs/HEAD, and I had commits that
happened and then I had to pull in order to push...I _still_ don't know how to
avoid that, other than always pulling immediately before doing any commits and
then pushing immediately after committing, which feels clunky).

I tinkered with some other distributed revision control systems before it was
apparent git would win the majority of the mindshare (and actually before git
even existed). Someone I worked on the Squid project with also happened to be
an arch and then bzr developer (and I think now works at Canonical on DVCS),
so I spent quite a bit of time using those. I also found them kinda confusing.
But, not so much that I couldn't get work done.

~~~
Skinney
> There are some things I still don't get about git. And I still find myself
> with some weird commits that look like I'm committing what other people just
> committed (because my tree was behind theirs/HEAD, and I had commits that
> happened and then I had to pull in order to push...I still don't know how to
> avoid that, other than always pulling immediately before doing any commits
> and then pushing immediately after committing, which feels clunky).

git pull --rebase?
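
A rough way to picture the difference, as a toy Python model rather than anything git actually does internally. The commit names (A, B, C, M) and both helper functions are made up for illustration; the only point is that a plain pull adds a commit with two parents, while a rebase replays the local work on top of what was fetched.

```python
# Toy model: history is a dict of commit name -> list of parent names.
# This is an illustration only, not how git stores commits.

def pull_merge(graph, local_tip, remote_tip):
    """Model a pull that merges: add one commit with two parents."""
    merged = dict(graph)
    merged["M"] = [local_tip, remote_tip]
    return merged, "M"

def pull_rebase(graph, local_commits, remote_tip):
    """Model a pull with rebase: replay local commits on the remote tip."""
    rebased = {name: parents for name, parents in graph.items()
               if name not in local_commits}
    tip = remote_tip
    for name in local_commits:      # oldest first
        new_name = name + "'"       # a replayed commit is a new commit
        rebased[new_name] = [tip]
        tip = new_name
    return rebased, tip

def merge_commits(graph):
    """Commits with more than one parent."""
    return [name for name, parents in graph.items() if len(parents) > 1]

# Shared start A. A teammate pushed B; C was committed locally on top of A.
graph = {"A": [], "B": ["A"], "C": ["A"]}

merged, merged_tip = pull_merge(graph, "C", "B")
rebased, rebased_tip = pull_rebase(graph, ["C"], "B")

print(merge_commits(merged))    # ['M']
print(merge_commits(rebased))   # []
print(rebased[rebased_tip])     # ['B']
```

The extra commit M is the "weird commit" in the merge case; in the rebase case the history stays linear, at the cost of C being replaced by a new commit.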

------
stiff
This seems to boil down to a criticism of the staging area, so it is strange
its purpose is never clearly explained in the paper. The reason why the
staging area exists, is, I think, that for larger teams working long term on a
code-base, it is crucially important for the version control history to be
very, very neat, with logically separate changes in separate commits, with
very clean commit messages for each change, with the right changes going to
the right branch (e.g. you don't want to commit a change that can go live
immediately to a branch that's going to be released in 3 months) and so on.
That's also why git makes rebase such a big deal. I guess Linus spends a lot
of time getting people to use the VCS right even after the changes themselves
are more or less right, and thanks to rebase and the distributed model he is
able to make corrections related to version control and branch/release
management before changes enter the main repository.

People aren't really good at remembering about those things up front, that's
why Git introduces the staging area, so that you can work as usual and only
after you are finished with whatever was occupying your mind you can consider
splitting your work into nice commits, which can be quite a task in itself to
do right. If you remove the staging area, and want to incrementally build up a
few commits from the changes you made, you end up having to pass in again and
again a lot of parameters to the commands executed, first to git diff, then to
git commit, and it's easy to make a mistake and diff something other than the
changes that will actually be committed in the end.

A lot of people, especially in small teams, use version control very sloppily,
and then get confused about conflicts, changes are hard to track down in
history etc. Remember that Git was built for maintaining Linux, which has an
absolutely huge number of people working in parallel - in this case you
really, really have to care about using the VCS tidily, actually understanding
its concepts very well, and not just churn away commits, or you will just fail
to integrate the changes correctly. So, frankly, while I like the general
discussion in this paper and its approach, it seems to me a bit confused with
respect to Git. I wonder whether the authors have any experience in doing
long-term software development in a team and especially doing software
_integration_ and in using a VCS for that purpose. Once you have a few people,
or more than one team working on a project, a few testing servers, and a few
different release branches, the concepts in Git do make a lot of sense.

~~~
haberman
Staging seems like a perilous way of creating clean commits. If you make a
commit out of only _some_ of the changes in your working tree, the result of a
commit will be a filesystem state that never existed in your working
directory, and thus was very likely not tested as committed.
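
A minimal sketch of the concern in Python, assuming a toy model where a tree is just a dict of path to content. The file names and the `stage`/`commit` helpers are hypothetical and only mimic the idea that a commit is built from the index, not from the working tree.

```python
# Toy model: a "tree" is a dict of path -> content. Not real git internals.

def stage(index, worktree, path):
    """Model `git add path`: copy the file's current content into the index."""
    updated = dict(index)
    updated[path] = worktree[path]
    return updated

def commit(index):
    """Model `git commit`: the snapshot comes from the index."""
    return dict(index)

head = {"lib.py": "def f(): return 1", "app.py": "print(f())"}
index = dict(head)

# Two edits that depend on each other: f is renamed and the caller updated.
worktree = {"lib.py": "def g(): return 1", "app.py": "print(g())"}

# Only one of the two files is staged and committed.
index = stage(index, worktree, "lib.py")
snapshot = commit(index)

print(snapshot == worktree)   # False
print(snapshot == head)       # False
print(snapshot["app.py"])     # print(f())
```

The committed snapshot matches neither the old commit nor the working directory, and here it still calls a function that no longer exists.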

~~~
etherealG
if you don't trust yourself to make that judgement, try this workflow:

stage, commit, stash

now your working copy does match the state you committed, and you can test
away to your heart's content. if you find a problem, fix it, and commit
--amend. keep this cycle going till you're done.

stash pop, carry on working

almost every time i see a criticism of git it's about the way you use it not
the tool itself.

~~~
caster_cp
> almost every time i see a criticism of git it's about the way you use it not
> the tool itself.

In my point of view, you cannot separate "the way you use the tool" from the
"tool itself". A hammer is easy and straightforward to use because of its
design features. That is what makes the hammer useful. Granted that there are
complicated tools, that require a steep learning curve (a pipe organ, for
instance), but that should not be the case of git.

The whole point is that the tool should make it easy for you to do what you
intend to do. That's the whole point of criticizing an application (or tool,
as you put it). I am pretty sure that you can make beautiful and exact things
with Git, but the fact is that sometimes they are difficult to perform or
counter-intuitive, and that's the crux of the criticism.

A tool should not be designed only to "allow" people to do certain things. It
should also make these things easy and straightforward.

It's impressive how (most of the time) our usage of a tool is directly linked
to how it was designed. Therefore, design features (like the ones proposed in
the article) cannot be distinguished from the "core" of the tool, or the
functionalities it allows one to perform. The design, in some sort of way,
_is_ the tool. And that's what conditions our usage of it.

~~~
revscat
> The whole point is that the tool should make it easy for you to do what you
> intend to do. That's the whole point of criticizing an application (or tool,
> as you put it). I am pretty sure that you can make beautiful and exact
> things with Git, but the fact is that sometimes they are difficult to
> perform or counter-intuitive, and that's the crux of the criticism.

I'm not sure I agree. There are tools that are inherently difficult, because
the problems they attempt to help with are inherently complex:
architecture, MRIs, corporate taxation, managing pilot and crew schedules for
airlines, etc.

Managing source code for any system of sufficient complexity falls squarely
into this domain. Git tackles this -- nicely, I would argue. Among other
needs, VCS's need to separate code changes into manageable chunks, store them
in a compact manner, and be able to distribute those changes efficiently over
a network.

Git handles these quite nicely. Separately, you would like developers to have
the ability to commit changes in small, related chunks, all while
simultaneously preventing conflicts -- or at least making them difficult. Git
does this as well.

> A tool should not be designed only to "allow" people to do certain things.
> It should also make these things easy and straightforward.

Again, I'm not sure I agree with this premise for all cases. Tools should be
as complicated as they need to be, and no more. The basic workflow behind git
-- add, commit, pull/push -- is not overly complicated, and I must be honest
in admitting that it puzzles me when it is otherwise claimed. Is it _easy_?
Apparently not for some. My personal path was CVS, SVN, MKS, Perforce, then
git, and it did not take me long to understand the benefits of git over the
others I had used.

It was pretty straightforward. Different, but hardly intractable, especially
for a tool which is so singularly important to me as a developer. In that case
I do not mind complexity, given the flexibility that is gained and, frankly,
since it's what I do for a living.

------
verteu
New users find Git difficult because it has extensive hidden state. The effect
of "git commit/diff/reset" is completely dependent on the invisible state of
the stage/branch/history DAG.

A competent user always knows what "git status" will output. But novices don't
even understand which hidden state they must keep track of.
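
To make the "hidden state" concrete, here is a small Python sketch under the assumption that there are three snapshots to keep in your head: the last commit, the stage, and the files on disk. The `diff` and `status` helpers are invented for illustration and ignore untracked files, branches and everything else `git status` really reports.

```python
# Toy model: each snapshot is a dict of path -> content.

def diff(old, new):
    """Paths whose content differs between two snapshots."""
    return sorted(p for p in set(old) | set(new) if old.get(p) != new.get(p))

def status(head, index, worktree):
    """Rough analogue of the two main lists a status report shows."""
    return {
        "staged": diff(head, index),          # what a commit would record
        "not_staged": diff(index, worktree),  # what a commit would leave out
    }

head = {"X": "v1", "Y": "v1"}
index = {"X": "v2", "Y": "v1"}       # X was staged while it held v2
worktree = {"X": "v3", "Y": "v1"}    # X was edited again afterwards

print(status(head, index, worktree))
# {'staged': ['X'], 'not_staged': ['X']}
```

The same file shows up in both lists, which is exactly the kind of thing a novice does not expect until they know the stage exists.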

~~~
pedalpete
By 'hidden state' do you mean that Git doesn't easily show the user where they
are in the process? Is something like sourcetree a good solution?

~~~
verteu
Exactly -- I've found SourceTree and the Visual Git Reference
([http://marklodato.github.io/visual-git-guide/index-en.html](http://marklodato.github.io/visual-git-guide/index-en.html))
very useful to teach beginners.

------
Mithaldu
In short: This guy wrote a wrapper around git [1] that practically makes it
SVN, then wrote a paper about how students don't like the staging area and
actually got published. [2]

The real interesting story here is how peer review managed to let something
slip through that ignores the main target demographic for git, professionally
working and experienced developers; and also ignores such extremely simple
hypotheses such as "some people think Git is hard because most learning
materials about Git are bad at communicating".

[1]
[http://people.csail.mit.edu/sperezde/gitless](http://people.csail.mit.edu/sperezde/gitless)
[2]
[http://dl.acm.org/citation.cfm?doid=2509578.2509584](http://dl.acm.org/citation.cfm?doid=2509578.2509584)

------
shepik
The paper assumes that git is hard because there are lots of concepts, like
"Tracked file", "Ignored file", "File staged for removal", "Untracked file"
etc. They also propose removing the index as a way to reduce complexity.

But those are not the reasons why git is hard.

~~~
pedalpete
can you elaborate on why you think git is hard?

~~~
shepik
imo git is hard, because it is hard to think of history as a tree. also,
problems we solve with git are harder than what we used to solve with
subversion or cvs. i mean, most of the git problems i or my colleagues ever
encountered happened because of a rebase, or rebase + merge, or merge +
rebase, or similar combinations, when we tried to rewrite history to make it
clean and readable, and we never even thought about "clean history" back then.

You can grasp in about an hour concepts like "index", "staging area" and
others that the article mentions.

~~~
Someone
I don't think it is hard to think of history as a tree. SCM tools have been
doing that for decades (with fewer branches)

Also, git history is more like a DAG.

Also, about that 'grasping in about an hour': way too often, reading some more
confuses the hell out of you again. Witness:

 _You can grasp in about an hour concepts like "index", "staging area" and
others that the article mentions._

So, is there a difference between index and staging area? Google "git index vs
staging area" gives me top hit
[http://stackoverflow.com/questions/12138207/is-the-git-stagi...](http://stackoverflow.com/questions/12138207/is-the-git-staging-area-just-an-index),
which does not help me.

Second hit is
[http://stackoverflow.com/questions/4084921/what-does-the-git...](http://stackoverflow.com/questions/4084921/what-does-the-git-index-exactly-contain).
Again, far from a clear answer.

Further googling/clicking gets me to
[http://stackoverflow.com/questions/6716355/why-staging-direc...](http://stackoverflow.com/questions/6716355/why-staging-directory-is-also-called-index-git-index).

And no, the git-scm book at
[http://git-scm.com/documentation](http://git-scm.com/documentation) did not
help _me_ either. It seems to have banned the
use of 'index' as the (almost? More or less?) synonym for staging area.

I think a large part of complaints about git are caused by its confusing user
interface and confusing terminology. Yes, terminology may have been cleaned up
officially, but 'the Internet' is littered with remnants of its history.

Apart from that, one thing that I find confusing about the staging area is
that it is invisible. Consequently, there doesn't seem to be a way to build
what would get committed on a 'git commit' (do a 'git add X', then edit X.
'git commit' will commit the old content of X, but 'make' will use the edited
content of X. Or am I confused again?)

~~~
_ikke_
index and staging area refer to the same thing. The staging area is a
high-level concept, while index is more of an implementation detail (exists in
.git/index).
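
If you want to see that the file is nothing mysterious, here is a hedged Python sketch that reads only the 12-byte header at the start of an index file: a `DIRC` signature, a version number, and an entry count, all big-endian. The function name `read_index_header` is my own, the sample bytes are synthetic, and nothing past the header is parsed.

```python
import struct

def read_index_header(data: bytes):
    """Parse the 12-byte header at the start of a git index file."""
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a git index file")
    return version, entry_count

# Synthetic header for illustration: version 2, three entries.
sample = b"DIRC" + struct.pack(">II", 2, 3)
print(read_index_header(sample))   # (2, 3)

# Against a real repository (assumes the path exists):
# with open(".git/index", "rb") as f:
#     print(read_index_header(f.read(12)))
```

The entry count is the number of paths currently recorded in the stage.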

------
banachtarski
Really? Is git _really_ that hard? Please. The man pages are super readable
and explain pretty much everything. Nobody's complaining about awk, sed, find,
and the like. Git exhibits functionality on par with those tools and the user
should expect a similar degree of complexity.

~~~
dscrd
It's the little, seemingly unnecessary complications, like
[http://bitflop.com/document/111](http://bitflop.com/document/111)

If there were alternatives to awk, sed, and find which did the same thing but
were significantly easier to use, _I_ would complain about the originals. At
least if I was forced to use them.

~~~
banachtarski
My point is that you'd be hard pressed to make awk, sed, and find simpler
without removing functionality. In the same way, I think git is really compact
considering its feature set. The issues like the one you point to in the
article occur once in a blue moon.

------
YuriNiyazov
Man, I was obscenely happy when I discovered "git stash", and these jokers
want to get rid of it. Not interested.

------
wrs
The word "merge" occurs only once in this paper, in the overview. That is an
indicator of the superficiality of the analysis overall.

Yes, if your needs are so simple that you never perform a merge, Git is too
complicated. If you work on a large team with a lot of parallel efforts going
on, you know why all the Git functions are there, including the staging area.

I applaud the idea of a "single-user" or "training wheels" Git that has a
simplified model like this, but claiming you're "analyzing Git" when you limit
the domain of your analysis so severely is rather misleading.

EDIT: And also, I don't think the approach of working backwards from "here's
how Git fails to support what the users thought Git was supposed to be doing"
rather than forwards from "here's how Git fails to communicate to users what
it's actually doing" is the most productive way to do this.

------
__david__
The authors seem confused by the purpose of the "Assume Unchanged" feature.
According to the initial commit [1] it seems intended to be used as a speed
optimization for crappy filesystems, and not as some way to avoid committing
files with changes.

They also say:

> Of course, the user might make the set of files explicit on every single
> commit (leaving out the database configuration file), but this is laborious
> and error-prone.

I find this amusing given that I don't _ever_ use "git add -u" or "git add -A"
in my daily git life—"git add -p" is as close as I get (and commit-patch [2]
is nice, too).

[1]
[https://github.com/git/git/commit/5f73076c1a9b4b8dc94f77eac9...](https://github.com/git/git/commit/5f73076c1a9b4b8dc94f77eac98eb558d25e33c0)

[2] [http://porkrind.org/commit-patch/](http://porkrind.org/commit-patch/)

------
midas007
Git absolutely fails at the claim of being good for remote developers on slow
links. If the network drops during such an op, feel free to enjoy starting
from scratch.

~~~
babas
What "op" exactly? git push/pull? git push/pull are bandwidth efficient and
atomic. A failed pull/push won't destroy your local/remote copy.

~~~
Argorak
It tends to drop your connection on large packs.

That's more a server configuration thing, many git hosts disconnect slow
clients.

I was in Botswana for a few weeks in November and I can relate to that
sentiment: git was unusable down there.

~~~
midas007
Yuck, BTDTBTTS. Satellite latency is eye-stabbing, almost as bad as the
utterly worthless Mountain View's Google Wi-Fi. Local internet elsewhere, in
random countries, GFL... bring your own or start a service (yeah, a friend
made some serious cash putting up a service on some Greek island).

Anyone on OSX can feel some pain just by enabling the Network Link Conditioner
prefpane and creating a profile as follows:

    
    
      Download Bandwidth: 256 Kbps
      Downlink Packets Dropped: 90%
      Downlink Delay: 1000 ms
    
      Uplink Bandwidth: 256 Kbps
      Uplink Packets Dropped: 90%
      Uplink Delay: 1000 ms
    
      DNS Delay: 2000 ms
    
    

Setup instructions:

[http://mattgemmell.com/network-link-conditioner-in-lion/](http://mattgemmell.com/network-link-conditioner-in-lion/)

------
dschiptsov
One could almost smell MIT - emphasis on proper methodology, conceptual
integrity, and links to Gabriel's "Worse is Better".) The guys made my day.)

------
NumberSix
Git has many problems.

(1) Many git commands have many variants that do different, sometimes very
different things.

(2) There are many different ways, including variants of many different base
commands, to do very similar, but not identical things. These are not aliases
for the same action, but rather commands with similar, overlapping, but still
different effects. For example, should I do a "git reset", "git revert", "git
checkout", or more complicated acrobatics with branches, merges, rebasing etc.
just to discard some work that I don't want?

(3) One of the many adverse consequences of this is that if you Google how to
do many things in Git, you will often find several conflicting answers. Unless
you already know a lot about Git, in which case you don't need to Google how
to do something in Git, this is very unhelpful.

(4) Git uses cryptic 40 character hexadecimal SHA-1 codes to identify commits
rather than a sequential numbering system. This means for example that one
cannot tell automatically from two SHA-1 codes which commit or file came
before or after the other.

(5) Git's branching scheme makes it difficult to set up a traditional
test/development/production system where a developer can easily checkout the
production code for the system except for their sub-system _AND_ the
development version of one or more other sub-systems.

(6) Git's branching system tends to result in work being scattered across
dozens of private branches belonging to different developers or teams making
integration difficult.

(7) Git user interfaces fail to hide the low level, complicated command line
interface from users. Something usually goes wrong that requires reverting to
the command line to sort out what happened and fix it.

(8) It is difficult to attach human readable names such as "release candidate
1" to git commits. There are "tags" but by default they are not pushed to
remote repositories. It is possible to set up repositories to block a push
with human readable tags; the remote git user lacks permission to push a
commit with the tags.

(9) Git's extreme complexity means it is used in very different ways at
different companies and organizations. Different companies and organizations
often add further systems on as wrappers around Git, e.g. the Gerrit code
review system. For example, some Git users essentially never use rebase while
others have a process that makes heavy use of rebase.

(10) Git enthusiasts usually respond to criticisms such as this by proceeding
to explain or attempt to explain some complicated set of acrobatics in Git,
often involving several cryptic variants of several commands, that may solve
the problem but is impractical to remember and reuse or document.

(11) Git uses many different names for the same concept or component of Git
such as index and staging area. This is confusing even after several months of
using Git.

~~~
Crito
1) True. This is a mild annoyance. Particularly I think that the -b flag of
checkout should be removed and instead checkout functionality should be added
to git-branch.

2) I don't see the problem. reset, revert, and checkout each do different
things...

3) This hasn't been my experience. When in doubt, go with the top SO answer I
guess?

4) Absolutely not a problem; this is a feature. Sequential numbering is not a
powerful enough concept to fit what is possible in git. Just for starters it
begins to break down when you realize that relativity of simultaneity kicks in
with a DVCS.

5) What? How? I do that all the time...

6) If your team is refusing to work with each other, that is an organizational
failing. You've got some problems there that are not related to git.

7) Wait, what UI are you normally using?

8) I suppose I can see this would be annoying if you are frequently pushing
very large numbers of tags. Though the second complaint does not make sense to
me; why would you configure your repo to reject pushed tags if you want to
push tags? If you want to do that, then don't do that.

9) _"That's a feature."_™

10) Zero cryptic commands above. I aim to please.

11) Eh, such is language. I wouldn't be opposed to tightening up the docs, but
I don't believe anybody really has serious issues with this.
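
To make point 4 concrete, here is a toy sketch in Python (not how git stores anything). The commit names and the `is_ancestor` helper are assumptions for illustration: "before" and "after" only mean something when one commit is reachable from the other, and two commits made independently have no such relation.

```python
# Toy model: history is a dict of commit name -> list of parent names.

def is_ancestor(graph, older, newer):
    """True if `older` is reachable from `newer` by following parent links."""
    stack, seen = [newer], set()
    while stack:
        node = stack.pop()
        if node == older:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False

# A is the shared start; B and C were committed independently on two
# machines; M merges them.
graph = {"A": [], "B": ["A"], "C": ["A"], "M": ["B", "C"]}

print(is_ancestor(graph, "A", "M"))   # True
print(is_ancestor(graph, "B", "C"))   # False
print(is_ancestor(graph, "C", "B"))   # False
```

Any sequential number handed to B and C would claim an order that the history itself does not contain.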

------
doubleshotmocha
There's a lot of things I felt like I've said before when reading this.

------
molalala
love it

