
Distrusting git - vu3rdd
http://benno.id.au/blog/2011/10/01/git-recursive-merge-broken
======
gst
The real problem here is:

... "and I was getting ready to commit a series of important changes" ...
Before doing so, I want to merge in the recent changes from the remote master,
so I do the familiar git pull. ... "maybe I’m going slightly crazy after 3
days straight hacking" ...

Do I interpret this correctly, that the author has not committed any changes
for 3 days?

With SVN there may be an excuse for this, but with Git the right way is to
commit as often as possible, and then squash your commits before pushing them.
With such a workflow the problem would have been a non-problem - just use git
reflog and checkout your previous version.

Of course you wouldn't use a git pull then, but just rebase your local commits
on top of master.

Learn how to use your tools, instead of complaining about them!
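The commit-often workflow described above can be sketched in a throwaway repo; once a change is committed, even a destructive reset can be undone via the reflog. (A minimal sketch; the repo path, file names, and commit messages are all illustrative.)

```shell
# Sketch: a committed change survives a destructive reset via the reflog.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev

echo base > file.txt
git add file.txt
git commit -qm "base"

echo important >> file.txt
git commit -qam "WIP: important changes"   # commit early, commit often

git reset -q --hard HEAD~                  # oops: branch rewound, tree clobbered

# The WIP commit is still reachable through the reflog; restore from it:
git checkout -q 'HEAD@{1}' -- file.txt
```

After the last step the "lost" edit is back in the working tree, because the reset only moved the branch pointer; the commit object itself was never deleted.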

~~~
regularfry
How is it relevant how long it was since he last committed? It sounds from the
description like this could happen if he'd committed five minutes before.

It's _still_ not cool to clobber work like that.

~~~
ajross
Of course it's not cool. It's a bug, and it was fixed. But the relevance is to
the title and summary. When you say "git destroyed my data" it _sounds_ like
you're saying that git lost committed data in the repo.

What actually happened here is that the working tree got clobbered. That is a
_vastly_ less problematic situation. Working trees get clobbered all the time:
rogue "make clean" changes, system crashes, someone-stole-my-laptop, errant rm
-rf, forgetting which tree your changes are in... I've lost working data to
every one of these, and I never felt the need to blame my tools in a blog
post.

So yeah. It's a bug (and a pretty embarrassing one). It was fixed. Is there
anything more to say? Move along.

~~~
joe_the_user
If "make clean" clobbered a source directory without me making an error in my
make file, I would be rather miffed.

 _"All software has bugs, but bugs that destroy data are pretty devastating."_
It's hard not to agree with this...

~~~
ajross
The point was more that if you've ever written a "make clean" rule, you've
probably blown away your source tree a few times trying to do it. The software
development working directory is the wild west. Bugs that destroy data here,
frankly, don't rise anywhere near "devastating" in my book, sorry.

------
jmount
The article isn't as anti-git as the title might lead you to believe. I
enjoyed the article for the research and up-voted it. I use and like git, but
the idea of silent data loss is scary (as it spreads).

Long story below.

However, if you want real fun try out what one centralized repository did for
me once. I was (against my will) using Visual Source Safe in the 1990s (ick
ick ick). Visual Source Safe at the time represented its data on a server with
two RCS style history files (called a and b). When you committed both of these
were updated (no idea why there were two) and then as a matter of policy
Visual Source Safe re-wrote your local content from the repository. That is on
a check-in: it wrote back over your stuff. Fast forward to the day the disk
filled up on the server and a single check-in attempt corrupted both a and b (so
even if redundancy was the reason for the two files, it didn't work), and the
server stayed up just enough to force-overwrite my local content. Everything was lost
for the files in question (no history, no latest version, no version left on
my system, forget about even worrying about recent changes). Off to tape
backups and polling colleagues to see if we could even approximate the lost
source code.

------
silentbicycle
My ears perked up when I heard that it involved renames on OSX. I don't know
about the exact issue he had, but I recently found out the hard way that OSX's
HFS+ is a case-insensitive* filesystem. You can get subtle issues by importing
multiple files whose names differ only in case (such as "README.txt" and
"ReadMe.txt") into the same repository; this isn't specific to git.

I had a similar issue with Perforce on Windows - Perforce was case sensitive,
Windows wasn't, and thanks to CamelCase, there were two files that had the
same letters but different casing. (I don't remember the names.)

* Technically, "case-insensitive but case-preserving", which in practice seems to mean, "case-sensitive, _except when you need it to be_ ".

~~~
illumen
yeah, or it can be sensitive. It's got various options that you can tweak
depending on needs.

~~~
silentbicycle
I didn't know that. (Some info here:
[http://hints.macworld.com/article.php?story=2003102722460311...](http://hints.macworld.com/article.php?story=20031027224603111))

I still think it's a terrible default, though. I'm coming to OSX from BSD and
using it _as_ a Unix. Retrofitting case-insensitivity onto a Unix is bound to
lead to ugly corner cases.

~~~
masklinn
> Retrofitting case-insensitivity onto a Unix is bound to lead to ugly corner
> cases.

It generally works well, but the other way around does _not_ work: a lot of OSX
software (most prominently, as usual, Adobe's) plays fast and loose with
casing, and _will_ break down on case-sensitive HFS+.

~~~
gthank
I'm pretty sure it hoses Norton's "security" software, too. Of course, if you
can't handle documented file system options, I don't trust you to detect
exploits….

------
garethsprice
"OK, so the bug never trigged in 16,000+ Linux kernel merges."

If your team is avoiding a tool with a 1 in 16,000 chance of failure then
they'd probably also want to avoid flying (1 in 20,000 chance of death by
failure), large bodies of water (1 in 8,942) and run terrified from cars (1 in
100) (source: <http://www.livescience.com/3780-odds-dying.html>).

The car stat seems rather high, and git won't kill you, but the general point
is that a 1 in 16,000+ chance of losing a few hours of work is "s--t happens,
find a workaround and get over it" odds.

~~~
jvm
Actually though, there was a 0% chance of this happening to anyone who uses
git properly and commits before merging, and a 100% chance of it happening to
this guy.

~~~
kenjackson
If this is how you properly use Git then why do they allow the improper way to
do so? This seems like broken UX if you allow people to do something that will
100% cause catastrophe -- unless there is virtually no way to design it
otherwise (but it's obvious that this is not the case with Git).

~~~
eropple
Because you should have read the man pages.

Right?

Right?

No, of course not, and your point is obviously clear. "Blame the user" is
still in vogue in some circles, though.

~~~
bennoleslie
In this case the git pull man page is relatively clear that:

"If any of the remote changes overlap with local uncommitted changes, the
merge will be automatically cancelled and the work tree untouched"

~~~
kelnos
... which git failed to do properly. So the git-pull man page is clear that it
won't clobber your uncommitted changes. Except in the (buggy) case when it
will.

------
cheald
> Before doing so, I want to merge in the recent changes from the remote
> master, so I do the familiar git pull. It complained about some files that
> would be overwritten by the merge, so I saved a backup of my changes, then
> reverted my changes in those specific files, and proceeded.

Yikes. git stash / git stash apply. Pulling into a dirty working tree is
asking for trouble.
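The stash round-trip recommended here looks roughly like this; the working tree is clean between the stash and the re-apply, which is exactly when a pull or merge is safe. (A sketch in a throwaway repo; file names are illustrative.)

```shell
# Sketch: park uncommitted work, leaving a clean tree for pull/merge.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev

echo base > app.txt
git add app.txt
git commit -qm "base"

echo "local edit" >> app.txt   # uncommitted work in progress

git stash push -q              # park it; the tree is now clean
git diff --quiet               # assert: no local modifications left
# ... a pull/merge against the clean tree would go here ...
git stash pop -q               # bring the parked work back
```

`git stash apply` would do the same as `pop` but keep the stash entry around, which is the safer choice if you are not sure the re-apply went cleanly.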

git losing data is Very Very Bad (and _massive_ kudos to the author for
tracking down the bug rather than just bitching about it), but if you're
following a proper git workflow (pull to clean working trees, save often), you
shouldn't ever be in a position to trigger this bug. That's not an excuse for
git to break like that, but the reason that it was likely never seen in the
16k Linux commits is that it's not the "right" way to do things.

------
yason
To quote: _OK, so the bug never trigged in 16,000+ Linux kernel merges_
—kernel developers are probably sane people quite proficient in git so that's
quite unlikely to happen. That's probably the reason the bug was out there for
a year without anybody ever bumping into it. I would bet some money on none of
the kernel developers ever having git-pulled into a dirty working tree. (Most of
the newbies around the world who probably bumped into it didn't understand git
was in error there—excluding the author.)

I can't explain why the opposite happens. Most of the people I know
intuitively commit or stash their local changes before merging. They have this
intuition even though git is a relatively young piece of software. But then there
are always a handful of people that I imagine who could do something like
that. And I'm not quite sure _why_.

One possibility is that it could come down to the level of trust in computers.
I don't think I _could issue_ git-merge without git-stash/git-commit
first—probably because I don't instinctively trust programs to handle complex
operations too well in the first place. Operations such as handling unsaved
data or letting random commits from different places three-way merge themselves
into a single branch. _Or both._

This mechanism of distrust might be similar to how drivers who think they're
bad drivers are, in fact, the best drivers. They underestimate their
capabilities enough to assume everything won't always go right, and then
they're a few steps ahead when something goes wrong.

------
jder
For anyone that's interested, the version of git that fixes this issue (1.7.7)
has just been released:

Download: <http://code.google.com/p/git-core/downloads/list>

Announcement: [http://git.661346.n2.nabble.com/ANNOUNCE-
Git-1-7-7-tc6849424...](http://git.661346.n2.nabble.com/ANNOUNCE-
Git-1-7-7-tc6849424.html)

------
fr0sty
I'm really late to this party but I want to stress a point that doesn't get
mentioned often enough:

Do not use "cp". Please.

Copying changes to save them and reapply later is nearly guaranteed to quietly
lose changes, reintroduce removed code, or otherwise screw up your work.

If you want to move changes stash them or commit them and then apply them
elsewhere. Using cp throws out all of git's ability to help you do what you
mean and not what you say.

Also, the data corruption was caused by a bug, yes, but the cp based workflow
being used will result in a nasty surprise sometime in the future.

------
gwern
That's kind of an odd response to a bug - not adding a new test, but just
noting that it didn't hit one particular project. Is Git's entire test-suite
just 'the Linux kernel changelog'?

------
biot
Even if you choose to keep three days worth of changes uncommitted, you're
still doing local backups of your machine anyways, right? He'd be facing the
same amount of information loss if his hard drive died.

If you're on OS X, Time Machine will get you back to where you were recently
(except if your home directory is encrypted, then it backs up on logout). Or
use Dropbox/SpiderOak/other to keep the last _n_ versions of your changes.

------
jessedhillon
At first, it seemed that this was another rant about a misbehaving piece of
software.

But I was impressed that, unlike so many others (myself included), the author
went beyond just complaining. He actually made a real effort to identify the
conditions under which the issue occurs. But I was blown away when he actually
examined the source code and identified when the bug was introduced. Great
work!

------
codenerdz
It's good to know that my 'feature development' git flow would make sure that
this bug would be avoided or at the very least easily worked around.

My favorite git flow in 143 easy to remember steps

1) git checkout -b MyFeatureBranch # create a feature branch

2) Code/Hack/Fall Asleep on Keyboard

3) git commit -am "Wow finally done with this tiny feature"

4) Go back to 2 if needed

5) git checkout master

6) git pull # get all the latest changes

7) git checkout MyFeatureBranch

8) git rebase -i # squash commit comments if necessary

9) Fix merge conflicts, git add, then git rebase --continue

10) git checkout master

11) git merge MyFeatureBranch

12) git push

13) PROFIT!!!

This is borrowed from [http://reinh.com/blog/2009/03/02/a-git-workflow-for-
agile-te...](http://reinh.com/blog/2009/03/02/a-git-workflow-for-agile-
teams.html)

~~~
diab0lic
Not to mention this "microcommit" approach in your feature branch allows you a
fine level of control over your working branch without polluting the master
branch (because of the squash). We use this one at work, and I do on my own
personal projects as well. Never have I been happier with a VCS than I am with
git after moving to this workflow.

------
ezyang
I suspect that certain types of people (including myself, at times), actually
want continuous backups taken of the state of their working copy prior
to actually performing a commit. Bugs or not, Git doesn't do very well with
unversioned changes: an accidental 'git reset --hard' can easily blow out lots
of work (happened to me), even if that was exactly what the command was
supposed to do. The correct thing to do is commit early and commit often (git
commit -am "Wibble"; git reset HEAD~ works well for me) but from a user
experience standpoint this ought to be automatic.
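The checkpoint trick mentioned above (commit, then rewind) can be sketched like so; after the mixed reset the edit is still in the working tree, and the throwaway commit stays reachable through the reflog. (A sketch in a throwaway repo; names are illustrative.)

```shell
# Sketch: "git commit -am Wibble; git reset HEAD~" as a cheap checkpoint.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev

echo base > file.txt
git add file.txt
git commit -qm "base"

echo wip >> file.txt
git commit -qam "Wibble"   # checkpoint: the state now lives in the object store
git reset -q HEAD~         # mixed reset: branch rewinds, working tree keeps the edit

grep -q wip file.txt                               # the change is still here...
git rev-parse -q --verify 'HEAD@{1}' > /dev/null   # ...and the checkpoint is in the reflog
```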

~~~
joeyh
Yes, it would be nice to have the assurance that the WC state was stored in
the reflog before every git command that could possibly change it. This would
probably be a relatively easy patch to write, although it could also slow git
down too much to be accepted.

------
ldng
I admit I skimmed through the article, but "git destroyed my data"... hmm...
how come?

Git is a versioning system that doesn't free you of making backups of your
central master. And by master I mean the global central reference repository
or whatever you call it.

So you screw up your repository using an unconventional workflow and now you
and your co-worker don't trust git anymore?

Well maybe you shouldn't have blindly trusted it in the first place. It's a
better tool than many but still is just a tool you should use with care. As
any tool. It has bugs.

That said, I feel your pain. Finding bugs in other tools can be a very
frustrating experience. Well, shit happens :-)

------
kwamenum86
Git can be a beast conceptually speaking if you don't learn it the right way.
This was undoubtedly a bug but the author's story surfaced some suboptimal git
habits.

------
bcl
The way I avoid problems like this is:

1 - Always do new work in a branch off whatever branch you plan to commit to
eventually.

2 - Commit often.

3 - Use rebase -i to squash commits when everything is looking good.

4 - Use rebase parent-branch to replay your commits on top of whatever new
stuff is in the parent and resolve any conflicts.

5 - Only then go back and merge the working branch back into the parent-branch.

------
leeoniya
Have a habit of doing "git stash save" before any pulls if you have
uncommitted changes. Problem solved.

------
Vitaly
Cool article about tracking down a bug in git.

But this is really a small corner case and I can see how it went unnoticed for
a year.

I almost never use 'git pull' (I do git fetch and then "git merge" or "git
rebase" depending on the results), but more importantly I never ever use pull
when I have changes in my current working repository. I commit, and _then_ I
pull or pull --rebase etc. This way I'm _really_ sure that my data is safe as
git has a lot of safety features for committed stuff. All files are stored as
objects, and there is a reflog to help if you lose track of a rebased branch,
etc.

Another thing that I sometimes do is 'git stash' before pull/merge/rebase etc.
'git stash apply' later is also a very safe op.
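The fetch-then-rebase habit described here can be sketched end to end with a local bare repository standing in for the remote. (All paths, file names, and commit messages are illustrative; the default branch name is read from git rather than assumed, since it varies by version.)

```shell
# Sketch: commit locally first, then fetch and rebase on top of the remote.
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"         # stand-in for the remote

git clone -q "$tmp/origin.git" "$tmp/team"   # a colleague's clone
git clone -q "$tmp/origin.git" "$tmp/you"    # your clone
for d in team you; do
  git -C "$tmp/$d" config user.email dev@example.com
  git -C "$tmp/$d" config user.name Dev
done

# The colleague publishes the first commit:
cd "$tmp/team"
branch=$(git symbolic-ref --short HEAD)      # default branch name varies by git version
echo base > file.txt
git add file.txt
git commit -qm "base"
git push -q origin "$branch"

# You sync and commit locally; meanwhile the colleague pushes more work:
cd "$tmp/you"
git pull -q origin "$branch"
echo mine >> file.txt
git commit -qam "my local commit"            # committed *before* touching the remote
(
  cd "$tmp/team"
  echo theirs > other.txt
  git add other.txt
  git commit -qm "theirs"
  git push -q origin "$branch"
)

# Fetch first, inspect if you like, then rebase your commit on top:
git fetch -q origin
git rebase -q "origin/$branch"
```

Because the local commit exists before the rebase, nothing in the working tree is ever at risk: even a botched rebase can be undone from the reflog.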

------
nahname
The biggest issue here is not committing before pulling. Always commit all of
your changes before updating your history (either through git pull, git
fetch/merge or git pull --rebase).

------
kayoone
I have my working copies inside of Dropbox. If anything goes wrong with git, I
can still go back to older versions of any file using Dropbox.

~~~
dorianj
Likewise, I keep all my git repos backed up to Time Machine. As long as I'm at
my desk, I can never lose more than an hour's worth of work. I don't think
I'll ever understand git well enough to solely rely on it.

------
phzbOx
Personally, I always stash or commit my files before merging with another
branch. I feel like not doing so is asking for trouble.

------
vog
The article has a serious character set issue. It contains stuff like
"doesnâ€™t" instead of "doesn't".

~~~
Xurinos
Your browser -- and my browser -- did not autodetect the character encoding
correctly. Switch to Unicode (UTF-8), and it should look fine. I used Firefox.

~~~
EdiX
The reason our browsers don't autodetect the character encoding as utf-8 is
because that page doesn't say it's utf-8. In that case browsers are allowed to
assume it's latin-1.

~~~
graywh
Actually, it did say. In line 1.

    <?xml version="1.0" encoding="utf-8" ?>

Perhaps your browser and mine disagree on when that declaration is acceptable.

For reference, [http://www.w3.org/International/questions/qa-html-
encoding-d...](http://www.w3.org/International/questions/qa-html-encoding-
declarations)

~~~
nbpoole
If you want to get pedantic about it, the content is being served as text/html
(as opposed to application/xhtml+xml) which means the XML declaration isn't
valid.

~~~
wnight
Well, I'd like to know why it doesn't work. If that's pedantic, so be it.

Edit: I think people took this the wrong way. I mean it's not pedantic to find
the real problem. We're all better off because the poster I'm responding to
showed us the real problem.

------
grammaton
It's not cool that the guy lost some data, but as he points out - all software
has bugs. This is just one more reason to make your commits as atomic as
possible - which it sounds like he wasn't doing at all.

------
jarin
Well, at least thanks for the reminder to update git :)

------
mkramlich
Two lessons here:

1\. don't do what that guy was doing. asking for trouble

2\. upgrade to git 1.7.7+. just to be sure.

------
davvid
tl;dr: User still hasn't learned that committing early and often is a good
idea. User blames tools for his ignorance.

~~~
cheald
While he could improve his workflow, git had a legit bug in this case. If he
was complaining that "OMFG, git reset --hard wiped out my changes!" then we
could laugh at him, but git did a Bad Thing here by any reasonable measure.

~~~
davvid
I appreciate the replies but down-voting is silly.

Look, I have to deal directly with co-workers that misuse git. When I see them
doing something bad, I tell them about it. The biggest resistance has been
from those that refuse to change.

Here's a real-world example: "git messed up my merge again". What? I go to
look into it -- well it turns out they don't understand git and work around it
by copying files out of their sandbox, running "git pull", and then copying
stuff back in. This is such a recipe for disaster (it easily can and will lose
others' changes) that _not_ telling him to "improve his workflow" is
disastrous for the project as a whole.

Yes, git messed up his merge. But as others have noted, so could have a bad
Makefile rule. All I'm suggesting is that by improving his workflow
(committing early and often), there is no chance of work getting lost,
ever.

~~~
cheald
I don't think anyone takes issue with the "fix your workflow" part of your
statement; bad workflows should be fixed. They aren't an excuse for bugs in
the software, though. A version control system should never unrecoverably wipe
out data unless you explicitly ask it to.

~~~
davvid
Heya, yes, bugs are completely unacceptable. Trust me, the git project
completely understands this.

Sorry if my comments seemed rude or snarky. Actually, I tend to dislike
"tl;dr" comments altogether. I don't think my comment added much to this
discussion. If I could I'd delete it. cheers

