
A Git catastrophe cleaned up - asymmetric
http://blog.plover.com/prog/git-tastrophe.html
======
mcherm
The author concludes by saying:

>I think I've written before that this profusion of solutions is the sign of a
well-designed system. The tools and concepts are powerful, and can be combined
in many ways to solve many problems that the designers didn't foresee.

I disagree. I consider this to be a failure of Git. The set of different
options (normal merge, rebase, filter-branch, etc) is complex and not cleanly
orthogonal which makes for a very messy "mental model". Even experienced
experts would have difficulty finding the clear, simple way to solve this
problem and those less experienced would have little chance of proceeding
cleanly.

I really wish some tool other than Git had "won" the version-control race; I
honestly believe Git to be the worst of the contenders in the most recent
generation of version control systems (albeit better than the previous
generation in important ways).

~~~
deanCommie
Allow me to disagree with your disagreement.

Git fails often for the "basic" use cases. I won't lie. The number of options
is intimidating for a beginner and one can easily get themselves into trouble.
This is why there are Git guides and tutorials and UIs and they are all
ultimately unsatisfying when you're a beginner.

However, the use case discussed in this article is NOT a beginner use case. It
is an advanced case, and the fact that Git is an advanced tool that supports
it (both natively with git-filter-branch, as the author found out, and ad hoc,
as was their original plan) is a testament to the power of the system.

Tell me how you would better achieve the requirements that were the premise
for this article in a different VCS?

~~~
ap22213
DVCS is fundamentally a simple concept. And, every use case should be possible
with a series of elemental operations. Any 'advanced' use cases can and should
just be 'macros'.

~~~
jack9
> DVCS is fundamentally a simple concept

It is, if you're going to simplify it so far as to avoid the problem space by
nomenclature. It's just transforming nonlinear events into a linear sequence!
I guess multithreading is also a fundamentally simple concept, in that vein.

~~~
ap22213
I don't mean to minimize the effort that went into Git. But, have you ever
looked at the source code? It's like someone dropped the kitchen sink into the
global namespace. The whole code base is full of premature optimizations. And,
there's a lot of low-level infrastructure code coupled in with the core logic.

I know abstraction is out of fashion these days, but it may have helped a
little in organizing things. Abstraction, when used properly, can almost
always make complicated things more simple.

Basically, if a _regular_ person had created Git, and not a demigod, would it
even have become what it is?

~~~
Hello71
we would have hg.

as a user, I'll take premature optimizations over slow as balls any day.

~~~
ap22213
The problem with premature optimization is that most of it has little effect
on overall performance. Software performance almost always follows a Pareto
distribution.

But, wtf do I know, I'm just some regular person.

~~~
SOLAR_FIELDS
It's hard to tell - did the premature optimizations allow Git to extend way
beyond the initial release with few performance worries? It's not something
that's easily quantifiable.

------
peff
The simplest solution is:

    # try the merge; you'll get conflicts on those files
    git merge topic

    # discard the versions from the topic branch;
    # you know you already merged those changes in
    # the funny "git checkout" commit, so any differences
    # are due to changes on master
    git checkout --ours new-file-{1..18}
    git add new-file-{1..18}   # mark the conflicts resolved

    # now you are free to fix up any real conflicts
    # and resolve the merge
    git commit

This has the advantage of representing the true history. You had two lines of
development (the original topic, and the "squashed" history created for
deployment), and the merge shows them coming together and choosing the
deployment-side content.

~~~
cyberpunk
Yeah; I know it sounds evil and shouldn't be done, but why not just make a new
branch from the borked master, force-push master back to sanity (pre-massive
merge), have everyone else cherry-pick and push their commits, and then rebase
the new branch, fix the conflicts in there as you should, and merge normally?

~~~
btym
Because it's selfish. Your time isn't more valuable than that of the
developers already working against that branch (and as stated in the article,
there were hundreds). It's your job to merge your changes cleanly.

~~~
cyberpunk
Agree -- I was talking about recovery though..

And I missed the hundreds of devs bit apparently. My bad.

....Hundreds of devs though? On the same repo? I started sweating thinking
about that one ...

~~~
mjrbrennan
Pretty sure Facebook has one massive repo for all their projects for thousands
of devs.

------
guomanmin
Participate in Atlassian Research

My name is Angela and I do research for Atlassian. I’m kicking off a round of
discussions with people who use Git tools. Ideally, I’d like to talk to people
that sit on a team of 3 or more. If this is you, I would love to talk to you
about your experience using Git tools, or just some of the pain points
that are keeping you up at night when doing your jobs.

We’ll just need 30 mins of your time, and as a token of my thanks to those
that participate, I’d like to offer a US$50 Amazon gift voucher.

If you’re interested, just shoot me an email with your availability over the
next few weeks and we can set up a time to chat for 30 minutes. Please also
include your timezone so we can schedule a suitable time (as I’m located in
San Francisco). Hope to talk to you soon!

Cheers, Angela Guo aguo@atlassian.com

------
msvalkon
While an unorthodox merge strategy was used, this is what happens when you
hole up in a topic branch for a long time. I bet this would've been easier had
they merged smaller commits or PRs into master constantly. If one is afraid of
deploying unfinished features, don't make them functional until they are
ready. Tie them together once finished. Or did I miss something here?

~~~
sangnoir
I'm also very surprised - I thought it was standard to merge master (target
upstream branch) into the topic branch frequently (daily seems reasonable). I
know some people do not appreciate "merged <branch> into <other branch>"
commits in their history, but that is a small price to pay IMO.

~~~
kosinus
Or rebase. Don't hide merge fixes in merge commits; keep your changes relevant
when read in the context of current master.

Unless you're working on a topic branch together with another dev, but I find
that rarely happens in practice.
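
Concretely, the two styles being contrasted, run from the topic branch (branch
names assumed):

    # merge style: record master's changes as a merge commit in the topic's history
    git checkout topic
    git merge master

    # rebase style: replay the topic's commits on top of current master instead
    git rebase master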

~~~
nogridbag
Yes I just completed a large refactoring in a feature branch which affected
lots of files and the only way to stay sane throughout the process was to
constantly rebase my work on top of master (within my feature branch).

Once my refactoring was complete I squashed it into a single commit to prepare
to merge (or rebase) into master. I don't really think it's useful to keep the
history of how I implemented that refactoring. Squashing it into a single
commit is far easier to revert than if I merged multiple commits into master.
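
A minimal sketch of that squash-then-land step (branch names assumed):

    # replay the feature branch on top of master, then collapse it to one commit
    git rebase master feature
    git reset --soft master
    git commit -m "Refactor: ..."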

------
lmm
> The next day he wanted to go ahead and merge the front-end changes, but he
> found himself in “a bit of a pickle”. The merge didn't go forward cleanly,
> perhaps because of other changes that had been made to master in the
> meantime. And trying to rebase the branch onto the new master was a complete
> failure. Many of those 406 commits included various edits to the 18 back-end
> files that no longer made sense now that the finished versions of those
> files were in the master branch he was trying to rebase onto.

Can one not instead merge master into the feature branch?

~~~
antocv
I also don't see why he didn't just git revert to the parent commit, the
commit before the other guy changed 18 files seemingly randomly.

Then rebase the feature branch on that, and that's it. Sure, the history would
contain the mistake, but so what?

~~~
cyberpunk
I'm with ya; I don't see the evil in just force pushing head back to where it
was before though -- do you really need 800 useless commits in the history
when you're going to just be adding the code back in sanely later anyway?

Reverting them seems a bit more risky to me than just rebuilding it properly
-- you'll need to write some scripts to make sure you got all the commits, and
then the whole dev team is going to end up pushing commits which unrevert your
reverts just so you can then rebase the feature branch..

Sometimes push -f is the best option ;)

~~~
antocv
But not in this case, a force push is not needed.

~~~
cyberpunk
Depends on the situ really; if it got people working again faster (yay, saved
money!) or if production was unavailable and it helped get things back up
faster (yay! more business!) then it'd be the right move in my book --
otherwise, yeah, it's a horribly hacky thing to do and shouldn't be done
unless you really have to...

I'm not saying I would have done one or the other if I was the OP -- just that
I'd have no issues doing it should the situation call for it and it saved
time/money at the cost of pure feelings :}

------
soft_dev_person
Why not just revert the offending commit? It would be a valid blip in history:
a mistake made and corrected.

~~~
OJFord
> _It occurred to me while I was writing this that it would probably have
> worked to make one commit on master to remove the back-end files again, and
> then rebase the entire topic branch onto that commit. But I didn't think of
> it at the time. And it's not as good as what I did do, which left the
> history as clean as was possible at that point._

~~~
soft_dev_person
I read that, but it doesn't read like he is aware of the option to revert a
commit directly. More that he would remove them manually and commit that.

Little difference, but if "as clean as possible" was the goal, I think that
would be the cleanest.

[edit] added the little words that my keyboard seems to filter out.

~~~
OJFord
Maybe you're right; I agree that's probably what I would have done.

Apart from anything else it allows a clean-slate "this is how it _should_ have
been handled in the first place", rather than a separate git-fu move that was
necessary or even relevant only because of the original indiscretion.

------
fpig
I don't understand the problem here: why didn't he just do a merge and resolve
the 18 conflicts by using the version of the file from master?

And the problem wasn't in checkout-add-commit, that is a trivial issue, the
WTF here is producing 406 new commits in a branch without ever thinking of
merging master back into it or rebasing on master, to avoid having a giant
merge later.

------
rurban
400 commits not cleanly applying? Not a big deal. I routinely merge 1000-2000
commits and rebase 30 active branches on top of that as well. The solution is
git rerere. It stores all the resolved merge resolutions forever, and
cherry-picks or rebases then apply cleanly, without any trouble. E.g.
[https://medium.com/@porteneuve/fix-conflicts-only-once-with-...](https://medium.com/@porteneuve/fix-conflicts-only-once-with-git-rerere-7d116b2cec67#.8pj73vnex)
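
Enabling it is a one-time config (a minimal sketch):

    # record each conflict resolution and reuse it on later merges/rebases
    git config rerere.enabled true
    # optionally, also stage the hunks rerere resolves automatically
    git config rerere.autoUpdate true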

------
kazinator
> _X decided to merge and deploy just the back-end changes, and then, once
> that was done and appeared successful, to merge the remaining front-end
> changes._

> " _What should X have done in the first place to avoid the pickle?_ "

0\. (Of course, not develop a 406 patch changeset and then have to pick it
apart. Make smaller pushes, frequently.)

1\. Create a topic branch right there at the tip where the 406 changes are
locally committed.

2\. Then use git's interactive rebase to rewrite this branch such that just
the back-end commits are picked first, followed by the front end.

3\. Make a back-end topic branch from the last back-end commit and test that.
If it's cool, master can be rebased to that and pushed to origin/master
upstream.

4\. Test remaining front-end changes, rebase master to them, push.

Also:

3\. a) If back-end changes need fixing, fix them on the back-end-topic branch.
Then rebase the original topic to the back end topic to pick up these changes
"under" it. (I.e. replay the front end over the new back end, and install as
new front end).
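
A rough sketch of steps 1 and 2 (branch name assumed):

    # step 1: branch at the tip holding the 406 local commits
    git checkout -b split-topic
    # step 2: reorder the series so the back-end commits come first
    # (in the editor, move the back-end "pick" lines to the top)
    git rebase -i master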

~~~
forgottenpass
_2. Then use git's interactive rebase to rewrite this branch such that just
the back-end commits are picked first, followed by the front end._

Probably how I'd do it, but I know it's not a popular workflow. The only
project that I've seen publicly talk about expecting a changeset to be
rewritten from a "history of fiddling around" into a "series of incremental
improvements to a codebase" is the Linux kernel.

It's more work for every developer, and digs into more advanced git
operations, but it really helps keep the tree clean. Most projects don't have
a velocity that requires such well-groomed changes, but this is an example of
how failing at that can make the history ugly and changes more difficult to
work with.

For those interested in why/how you'd rewrite a patch series:
[https://www.kernel.org/doc/html/latest/development-process/5...](https://www.kernel.org/doc/html/latest/development-process/5.Posting.html#patch-preparation)
The 406 changes probably compress to fewer than 20.

~~~
azernik
My ex-employer (Meraki) was VERY insistent on people cleaning up their
histories. It was very good not just for keeping a clean tree, but also for
code quality - once you have code changes in neatly-separated commits with
discrete chunks of functionality, any leftover test code or unrelated changes
pop out immediately to both the original coder and the reviewer.

By the way, this also helps with the issue this developer faced because part
of the problem seems to have been that the backend and frontend changes were
mixed in the same commits (hence the ugly hack for committing). Doing this
kind of history-cleaning as you go makes it much easier to manipulate the
order of committing changes to master.

~~~
kazinator
My experience exactly: Gerrit reviews, changes separated out: no "patch
bombs", etc.

------
jdonaldson
Reading this was like watching a traffic accident in slow motion. I could hear
myself yelling at the author as if he were a student driver:

"Use filter-branch!!! Use filter-branch!!! NOOOOOOOO NOT merge union with
manual deletes!!!"

But... he went and did it anyways. Honestly, reading back through commit logs,
you always find the part where the driver runs off the road, plows through a
clearly marked gate, runs on a train track for a mile or two, then merges back
onto the main street, carrying part of a mailbox and a deer carcass.

You can fault git if you want, but it seems like some of these cases just
arise naturally no matter what VCS is used. It would be great to have a "git
education" repo that contains situations just like these to work through...
sort of a "drivers ed" for managing a repo.

------
mjd
The Reddit discussion of this, though brief, was interesting and to the point.

[https://www.reddit.com/r/git/comments/5i3mpz/another_git_cat...](https://www.reddit.com/r/git/comments/5i3mpz/another_git_catastrophe_cleaned_up_story_of_a/)

~~~
coolgeek
This is the author of the blog post we're all discussing.

------
guard-of-terra
Two fun things about git: It is deterministic, and it doesn't delete anything
(readily).

This means you can't really have a catastrophe.

Just git reflog your way out.
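
Roughly (the SHA is a placeholder):

    # find the pre-catastrophe state...
    git reflog
    # ...and bring it back on a rescue branch
    git checkout -b rescue <sha-from-reflog>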

~~~
tomp
That's a bit of an overstatement. You can't have a catastrophe _with stuff
already committed_ (but it's very easy to delete in-progress work).

~~~
duiker101
If you are committing so rarely that you can have a CATASTROPHE with your in-
progress work, maybe you should try breaking the work into smaller parts and
committing more often.

~~~
lmm
Unfortunately git's staging area concept encourages you to do exactly the
opposite of this.

~~~
reitanqild
I almost only use it to cherry-pick and then immediately commit?

~~~
lmm
Right, all the good use cases involve not actually using it - if you're
immediately committing then it would be better if it just became a commit.

~~~
reitanqild
I meant: I cherrypick from my changes and commit bugfixes and drive-by update
of comments separately.

My bad. I shouldn't have used the word cherrypick there.

------
krupan
People in the discussion here are saying, somewhat blindly it appears to me,
that using mercurial would have avoided this mess. I'm a huge mercurial fan
and have dealt with some tricky situations similar to this (but never this
situation exactly), and I'm not so sure how mercurial would have handled it.
The best I can say is that I've never known even the most adventurous
mercurial user to use checkout (actually revert in mercurial) on individual
files in that way. Is that something git people do more often?

Aside from that, it'd be fun to see how mercurial handles this, but I'm not
sure from reading the original post if I could exactly reproduce it.

Mercurial would let you do the checkout (revert) trick that started it all. I
can imagine it causing merge conflicts as described. Mercurial does let you
specify how to resolve merge conflicts for the whole merge, or you can tell it
not to resolve conflicts at all and then you can run hg resolve on a file-per-
file (or glob of files) basis and tell it to pick default (equivalent of
master) for the files you want. I didn't quite follow the git way of doing
this he described with .gitattributes, but using hg resolve sounds easier (but
neither are things a non-expert user of either tool would know).
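
Something like this, I think (merging on the default side, file name assumed):

    # merge, deliberately leaving every conflict unresolved
    hg merge --tool internal:fail topic
    # per file, re-resolve picking the local (default) side
    hg resolve --tool internal:local back-end-file-1
    # check what's still unresolved
    hg resolve --list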

In the end some other solutions were proposed. I would not recommend using
checkout (revert in mercurial) either. I don't know of a filter-branch
equivalent in mercurial, but that sounds like a cool way to deal with this. In
mercurial I probably would have reached for graft (equivalent of git's cherry
pick), which isn't very different from git.

------
SadWebDeveloper
The problem with Git is that everyone is trying to use it as a central
repository rather than a distributed one, as if it were SVN. Personally I
blame GitHub for promoting the wrong tool for the job among new developers,
causing all this unnecessary drama. Git is the best version control system if
and only if the project has a good leader checking everyone's merges before
committing, and letting everyone know who is working on what and which parts
it will affect.

------
cyberpunk
> But I couldn't think of anything, so I asked Rik Signes. Rik immediately
> said that X should have used git-filter-branch to separate the 406 commits
> into two branches, branch A with just the changes to the 18 back-end files
> and branch B with just the changes to the other files. (The two branches
> together would have had more than 406 commits, since a commit that changed
> both back-end and front-end files would be represented in both branches.)
> Then he would have had no trouble landing branch A on master and, after it
> was deployed, landing branch B.
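
For reference, that split might look roughly like this (branch and path names
assumed):

    # branch A: keep only the changes to the back-end files
    git checkout -b branch-A topic
    git filter-branch --prune-empty --index-filter '
        git rm -rq --cached --ignore-unmatch -- . &&
        git reset -q $GIT_COMMIT -- back-end/
    ' HEAD
    # branch B: the same recipe, keeping the front-end paths instead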

Well. Okay. That's a technical solution and it'd work; it's probably no less
time-consuming than going and fixing the code in a new branch and merging
cleanly (every time I end up needing to filter-branch stuff I have to RTFM,
and it takes ages) -- this problem is _NOT_ a technical one though; it's a
process one.

Why are you landing 400 commits in one go? Half of those were on files which
then start causing merge conflicts for your team and wasted a huge amount of
your time?

Use feature flags, fix your conflicts in branch, don't merge anything into
master unless it's using the 'merge' button on github/gitlab/gogs/whatever.
And really think/discuss/roundtable about how you're introducing features
because it sounds like this is running away from you a bit here..

It doesn't need to be this complex, and these kinds of messes can't really be
put on the tools -- although git certainly makes it easy to set a lot of
things on fire..

~~~
edibleEnergy
> this problem is NOT a technical one though; it's a process one.

No, the cause of the problem was a process one, but now it's become a
technical problem.

That isn't really helpful when someone has already created a git nightmare
like what's described here.

~~~
cyberpunk
Hmm, I wasn't trying to come off that way, although I wasn't trying to be
helpful with the specifics of their git nightmare either, since happily they
sorted it out anyway: I was only trying to suggest that the only way you
"clean up" after a nightmare like that is to start looking at what seems to be
an obviously fragile way of dealing with their code, since that's the real
problem..

Unfucking the repo, while mildly technically interesting, should really be the
smaller of the two lessons learned from such a mess... No?

~~~
greghatch
It's a little tone deaf to offer "should've done it this way!" as advice to a
person already in a bad way.

Unless OP is asking for better processes to avoid this in the future, I think
it's not a good idea to offer preventative advice as a first response. YMMV,
but this sort of comment is often unwelcome (especially if you phrase the
questions in an interrogative way... example - "Why are you landing 400
commits in one go? Half of those were on files which then start causing merge
conflicts for your team and wasted a huge amount of your time?" comes across
like you kind of enjoyed typing it and doesn't really occur as a kind stranger
leaning in to help)

If your post was directed more to the readers of the thread than at OP, I
think it would've come across as less admonishing. To be clear, I'm not
judging you for your comment, just providing feedback on why it might rub
people the wrong way (apologies for length).

~~~
cyberpunk
Hm, well -- OP; I wasn't trying to have a go if you thought that, and the
question wasn't rhetorical -- if you're up for sharing how this sort of thing
comes to pass I'd love to have your story/discussion..

As for coming across like I enjoyed typing it, I'm pretty much at a loss on
how to respond to that one besides: NEIN!

------
forrestthewoods
I like Perforce. It may not be perfect. But it's idiot proof.

"Days since gitastrophe" is a common phrase. There is no Perforce equivalent.
You can't blow your leg off. There aren't thousands of "Perforce made easy"
blog posts because it's actually easy. There are no "fixing my p4 repo" tales
because it never breaks.

Thanks Perforce.

~~~
reitanqild
Sorry for ruining the party (seriously).

_I like Perforce. It may not be perfect. But it's idiot proof._

That's because it is almost impossible to do anything, good or bad, with it.
;-)

_"Days since gitastrophe" is a common phrase._

Never heard it before.

_There is no Perforce equivalent. You can't blow your leg off._

The equivalent of a hand saw: you will have a harder time cutting off your leg
with a hand saw than with a chain saw.

_There aren't thousands of "Perforce made easy" blog posts because it's
actually easy. There are no "fixing my p4 repo" tales because it never
breaks._

Or because nobody uses it, and those who do keep the knowledge to themselves
;-)

Seriously: I used perforce for 18 months and I tried to understand why people
loved it but no luck. Git isn't exactly perfect but compared to "checking out"
a file before even editing it or not being able to commit without access to
the server (IIRC) git is perfect. Oh, and not being able to check out a file
that someone else has checked out, so if a dev goes on holiday without
"checking in" the files first then you have to hack a bit to get ready for
work again.

The best argument for git, however, is that even the Perforce guys have
integrated with it.

Thanks Perforce.

~~~
AstralStorm
Bonus points when you get conflicts or try to use their terrible Swarm review
system which relies on shelves.

Also, branching hides the history from before the branch point, for no reason.
There's no real branch-merge functionality, and automatic merging is inferior
to what Git has.

And more...

------
cousin_it
In a corporate environment, I think I prefer a simpler workflow with a plain
old centralized VCS and without using any branches at all. As code gets
written, each commit goes on the trunk behind a feature flag (which you need
anyway). That way each commit can benefit from continuous builds and testing,
and other people can notice problems early. Branches would only be used for
releases.

I've worked like that for years on some pretty big projects, and it never
caused complicated problems like in the OP. The only caveat is that you need a
strong safety net against breaking the trunk (lots of tests, mandatory code
review, etc.)

------
isaac_is_goat
Why wouldn't they reintegrate the mainline development branch with the new
branch if it was so long lived? And/Or have the new code behind a feature flag
so you could potentially have it deployed but disabled? So many ways this
could have been avoided with some basic forward thinking...

------
luos
So if I get this right, he just added the files to master, then tried to work
on the topic branch?

That really seems weird. I don't think it's git's fault. Also, he could have
done `git merge -X theirs master` if he really wanted some kind of history,
but I guess it would be worthless.
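
That is, from the topic branch (a minimal sketch):

    # merge master in, preferring master's ("their") side on any conflict
    git merge -X theirs master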

------
chetanahuja
I humbly (re)submit this for your consideration:
[https://git-man-page-generator.lokaltog.net/](https://git-man-page-generator.lokaltog.net/)

------
marcinkuzminski
IMHO it's an odd approach to the problem. I'd rather ask the author (who knows
the code best in this case) to split this into two separate parts that can be
nicely merged.

------
bowmessage
Worth noting that cherry-pick takes commit ID ranges in the form
xxxxxx..yyyyyy, which might have simplified the driver.
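
For example:

    # replay every commit after A up to and including B onto the current branch
    git cherry-pick A..B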

Thanks for the tip re: --keep-redundant-commits!

------
hellofunk
The more I read stuff like this, the more I wonder how many problems would
just go away for so many people if they used Mercurial instead.

~~~
hackerboos
Why is Mercurial better than git? It's my understanding that you can still
rewrite history with Mercurial.

~~~
krupan
Mercurial keeps track of whether you have pushed a commit or not, and if you
have pushed it, it won't let you rewrite it without a manual override. This
feature is called phases.
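
Roughly (the revision is a placeholder):

    # show the current commit's phase (e.g. draft vs. public)
    hg phase -r .
    # manual override: force a public (pushed) commit back to draft
    # so it can be rewritten
    hg phase --draft --force -r <rev>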

------
gragas
>and published the changes to `master`

-_-

------
draw_down
I hate this shit. Rebasing always causes conflicts and dealing with them is
such a giant pain. I get that in this case the designer really brought the
pain on themselves but I wish using git didn't require this sort of surgery
periodically, which in my experience it does.

~~~
aninhumer
As opposed to what? Conflicts are an inevitable consequence of multiple people
working on the same code at the same time, and git is as good or better than
the alternatives for handling that.

~~~
draw_down
In my experience: as opposed to merging, which just works, but which nerds get
all persnickety about because they hate merge commits. I understand the
concern, I suppose, but working with `git rebase` just seems so unnecessarily
painful. It's really awful.

~~~
rplst8
rebase uses the same merge logic as merge. What some people don't get is that
if there were three commits pulled down from a remote, and you had three local
commits with conflicts (different ones) on each of them, rebase makes you
resolve them one by one so it can literally "replay" those commits on top of
the tip of the branch you are rebasing onto.

That's why rebase is so cool. It lets you solve each conflict during the merge
of each commit separately. Then you can do it interactively and squash it all
into one commit.

Learn the tools, don't complain.
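
A minimal sketch of that flow:

    # fetch the remote commits, then replay your local ones on top,
    # resolving each commit's conflicts in turn
    git fetch origin
    git rebase origin/master
    # or interactively, to also squash the series into one commit
    git rebase -i origin/master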

~~~
aninhumer
> It lets you solve each conflict during the merge of each commit separately.

Which is what they're complaining about, compared to just merging and doing it
once. Sometimes resolving smaller chunks at a time can be easier, but almost
as often it involves a lot of extra effort creating intermediate states for
little benefit, and often they don't even make sense.

I much prefer to keep an accurate history, rather than invent one that I think
looks neater (the same goes for squashing commits). And honestly I think a lot
of what people object to can be mitigated by learning how to use git log
properly.

~~~
rplst8
Look, if I make 15 commits to my local repo, some with major functionality
additions, some with bug fixes, and some with typo fixes, I shouldn't have to
share that whole history with the remote.

Nor would anyone want me to. The re-writing history part only really applies
to your local repo. If you re-write history on the remote, you are going to
piss some people off.

~~~
aninhumer
>I shouldn't have to share that whole history with the remote.

You don't have to share it, but why not?

It might be useful in some cases, and people don't have to look at it the rest
of the time. They can just look at merge commits, which are almost always
coherent changes, instead of relying on people to invent a continuity which
might not even make sense.

------
scarface74
I think everyone is overlooking the main point.

He said that one guy was only making changes to either the front end or back
end code.

They should have been two separate repos: one for the front-end code and one
for the back-end code.

~~~
icebraining
That's a huge discussion in itself:
[https://news.ycombinator.com/item?id=10007654](https://news.ycombinator.com/item?id=10007654)

~~~
scarface74
That was a great discussion. As someone said there, I agree: if it is
independently deployable, it should be in its own repo.

If you need to share code between repos, you need to create versioned packages
-- in the C# world, NuGet.

