How to undo almost anything with Git (github.com/blog)
411 points by agonzalezro on June 8, 2015 | 91 comments



I've used the following flow chart to help determine strategies to get myself out of messes:

http://justinhileman.info/article/git-pretty/

Just another way of presenting similar information. No affiliation, just a satisfied consumer of the info :)


Ha ha! I like the part of the flow where it goes: "Is anyone downstream?" -> yes -> "Enough to form a lynch mob?" -> no -> "Do you hate them?" -> yes -> "We're going to do an interactive rebase".


I use this one. It has the same information, in paragraph form like the OP, but with hyperlinked questions that let you jump around as you would in a flow chart.

http://sethrobertson.github.io/GitFixUm/fixup.html


My only real git crisis was when I accidentally `git push -f`ed what I thought was my own branch, but was actually the `develop` branch. On a Monday. After a weekend where about a hundred commits had been cherry-picked and merged into develop by a dev not in the office. None of my coworkers had recent versions of develop.

The thing that ended up saving me was our CI -- we autodeployed passing builds to our staging env; so we were able to ssh in and `git push -f` back from staging to our repo.


Git generally doesn't delete commits until a garbage collection is done (and assuming no other references). If you find yourself in this situation again, then try:

git reflog show remotes/origin/develop

This will show the reflog for the branch, then you can find the hash just before your push, check it out, then force push it back.

I've recovered from other developers' accidental force pushes in less than 5 minutes, with no commits lost.
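A minimal sketch of that recovery, run end to end against a throwaway bare repository. The directory names, the demo identity and the commit contents are all made up for illustration. It assumes this clone had seen the good commit before the force push; the reflog only records what your own clone has observed.

```shell
set -eu
# Throwaway identity so commits work without any git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git clone -q remote.git dev 2>/dev/null
cd dev
git symbolic-ref HEAD refs/heads/develop

echo one > file.txt
git add file.txt
git commit -q -m "good work"
git push -q origin develop
good=$(git rev-parse HEAD)

# The accident: rewrite history locally and force push over develop.
git commit -q --amend -m "oops"
git push -q -f origin develop

# The remote-tracking reflog still remembers where origin/develop used to be.
git reflog show origin/develop
previous=$(git rev-parse 'origin/develop@{1}')

# Force push the old commit back to the remote branch.
git push -q -f origin "$previous:refs/heads/develop"
restored=$(git --git-dir=../remote.git rev-parse refs/heads/develop)
```

In a real incident only the last three git commands matter; everything before them just stages the accident.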


But if the remote has advanced since you pushed, you'll still need to reflog on the remote server, right?


Someone must have pushed the version the remote was at, so the correct commit would be on someone's devbox.


It was a series of PRs merged on github, no one in the office had a current branch of develop.


It looks like you can use the GitHub API to get the events for the specific repo, which would include your force push, and the hash of the previous commit. Some instructions at https://objectpartners.com/2014/02/11/recovering-a-commit-fr...

edit: This should be in reply to jordigh's comment above


If it's just a matter of "merging" couldn't you just merge those branches again? Or do you resolve conflicts on github or some such (something that's never fit my workflow but I can imagine others using it)?


That's really the point. The guy who made the last change should have it locally as well. Even if he git pulls first.


During any push, the remote advertises its latest SHA-1. I don't know if we actually use it in 'push -f'. But if we don't, it would be a good idea to save it in the local reflog.


My most recent major git crisis wasn't reversible by git.

A little while ago, I accidentally did a `git clean -xdf` on my home directory (wrong tmux tab). I index my home directory: just the most important config files, among other notes and text files. That `git clean` call wiped half my home directory before I realized what was happening and frantically tried to ^C and ^\ it. I had to find other ways to recover my files.

The deleted files weren't essential, which is why I didn't back them up frequently or index them, but they were moderately important. That was a bad day.

I realize it's probably better to dump everything in ~/.config and index that instead, while maintaining symlinks in ~/. It was just the way I had it set up.


I suggest using GNU stow for managing config files and keeping your repo in a separate directory.


Places like bitbucket let you mark branches (like master and develop) as non-deletable and non-rebaseable. Basically it prevents errors like yours.


Seriously. I don't understand why github doesn't provide at least the ability to lock the master branch against any action other than a merge from another branch. Merging a branch that causes major damage is unlikely if there's any kind of review process and generating a revert to "undo" that damage is a one button thing in the github-webUI if you do it immediately, or at least before master gets other commits on top of it. But I'd think everyone's workflow would stop because they'd have a ton of conflicts from a screwed up master.

In fact, let me ask HN. Does anyone work in a place where devs commit directly to master? I can't even imagine that workflow. Everything I've seen or heard of is PullRequests, code-review, +1, then merge to master.


Yes I do work directly on master because I use the http://martinfowler.com/bliki/FeatureToggle.html pattern as opposed to http://martinfowler.com/bliki/FeatureBranch.html because in my experience branches come with an amount of technical debt that has been unacceptable to me (you just delay merging your code, making it a bigger more dangerous merge).

Branches also encourage a workflow [in my experience] where the devs implement multiple unfinished features on different branches. That's another form of technical debt. If everyone works right on master, it is more natural to finish a feature before moving onto the next.

I've also had to rewrite history on a git repository to remove things like binary files.

In the top level comment for this thread, I'd say the situation could have been avoided if that developer were cherry-picking to the git repo on his computer, then pushing that branch to the server. The history would be present on both that dev's computer as well as the server, so losing the history on the server would not affect the branch on that developer's machine. Not sure if the comment implies the developer had SSH'd to a live server & performed his cherry-pick there, but that's how it sounds.

The only use I've found for [feature] branches in my day to day work is a place for a junior developer or un-trusted OSS contributors to push changes for me to review before I merge them in. If I'm on an experienced team, I find things go smoothest if we all stay on master.


> in my experience branches come with an amount of technical debt that has been unacceptable to me (you just delay merging your code, making it a bigger more dangerous merge)

You can solve that by regularly merging or rebasing master onto your feature branch (as described in Fowler's article).


"Does anyone work in a place where devs commit directly to master?"

I think this is probably common in places that have used SVN/Perforce/Whatever since forever and migrated to Git because it's now a "best practice".


It's also quite common on new projects. Many developers see no point in developing on branches when there isn't even a completed prototype yet, so the commits tend to be very wide-ranging instead of focused.


> Everything I've seen or heard of is PullRequests, code-review, +1, then merge to master.

That won't help you when you think you're pushing to and from your own branch, but are accidentally on develop and force-pushing to the remote develop branch.


That's where locking down the branches and preventing rebases comes into play. It prevents someone from accidentally doing what you describe.

Most people only know github and I don't think they expose this locking feature in their UI which is a shame.


But if your process is only merging PRs, then locking the master against pushes as he proposes would create no further issues, preventing that from happening. If you need to modify the master directly, it would complicate things.


GitLab CEO here, in GitLab we do this too and call it protected branches, your master branch is automatically marked. See https://about.gitlab.com/2014/11/26/keeping-your-code-protec...


I always recommend that people use `git push --force-with-lease`. It's more to type, but it verifies that the remote branch is still where your remote-tracking branch says it is, so you don't accidentally overwrite commits someone else has pushed.
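To see the lease in action, here is a small sketch with two throwaway clones (`alice` and `bob` are hypothetical names, as are the paths and the demo identity). Bob pushes a commit Alice has not fetched; Alice then rewrites her history and tries to force push with the lease.

```shell
set -eu
# Throwaway identity so commits work without any git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git

# Alice creates develop and publishes it.
git clone -q remote.git alice 2>/dev/null
cd alice
git symbolic-ref HEAD refs/heads/develop
echo base > file.txt
git add file.txt
git commit -q -m "base"
git push -q origin develop
cd ..

# Bob clones, commits and pushes; Alice never fetches this commit.
git clone -q -b develop remote.git bob
cd bob
echo bob >> file.txt
git commit -q -am "bob's work"
git push -q origin develop
bob_commit=$(git rev-parse HEAD)
cd ../alice

# Alice rewrites her local history and tries to force it out.
git commit -q --amend -m "rewritten base"
if git push -q --force-with-lease origin develop 2>/dev/null; then
  lease_result=overwrote
else
  lease_result=rejected
fi
remote_tip=$(git --git-dir=../remote.git rev-parse refs/heads/develop)
echo "push was $lease_result, remote develop is at $remote_tip"
```

A plain `git push -f` at the same point would have replaced Bob's commit on the remote.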


You can just alias it in your bash profile, if it's too much to type.

alias gitpushf='git push --force-with-lease'


Mercurial Evolve is a safe and distributed alternative to `git push --force`:

https://www.youtube.com/watch?v=4OlDm3akbqg


Maybe that was why I always did `git fetch && git push -f`. But luckily, for a few months now it hasn't been a problem if you've configured your git correctly, because it now only force pushes your current branch.


Just make an alias pushf = push --force-with-lease, and it becomes even less to type! :-)


For what it's worth, you could have looked at the reflog of the git repo on the server and it would still have a reference (the hexadecimal hash) to the develop branch before your git push -f.


We had the hash, but no one had it locally and we could not pull it down since I had overwritten the history.


This isn't a criticism. I'm either missing info, and am making a fool of myself, or telling you about something you didn't know was possible:

You either had git configured on the server to git-gc after every push, or were unable to ssh into the server?

If neither one of those is true, then -IIRC- you could have either:

1) Logged in to the server, rewritten the affected repo's branch to point to the pre-disaster commit hash.

2) Pull down the repo's .git directory from the server, rewrite the branch, and force push that.

Would either one of those have been more work than working with your CI system?


He mentions in a different bit that it was actually on github, not their own git server. There is a way to get the reflog and write a new branch to point to the commit using github's API mentioned above.


Ah. Thanks for the info!


What kind of workflow do you use where a git push -f is necessary? Just curious, I'm no git expert.


How is this not undoable with whatever the reverse merge command is?

Second why ever use force merge?


This does not cover the one scenario I was hoping it would. I accidentally pushed my api key/password to github and I want to "undo" that push and completely remove the history locally and on origin. This is so obscure across many different outlets. And go ahead, flame me for pushing my password/api key to github, but all of you know you have done this at least once in your life!


If you pushed something (publicly) to GitHub, there's no undo: you HAVE TO change your keys now. Your keys were immediately ingested and compromised a few seconds after you pushed, and likely even before you realized what you've done.

To paraphrase from my own comment last month [1]:

"Some time ago I published my blog to GitHub, with my MailGun API key in the config file (stupid mistake, I know). In less than 12 hours, spammers had harvested the key AND sent a few thousand emails with my account, using my entire monthly limit.

Thankfully I was using the free MailGun account, which is limited to only 10,000 emails/month, so there was no material damage. And MailGun's tech support was awesome and immediately blocked the account and notified me, resetting the account after I had changed the keys."

If you're wondering how they were able to harvest GitHub commits so quickly, just read the article linked in that thread [2]. Basically you have bots drinking from GitHub's events firehose and the GHTorrent project. Every commit is monitored and harvested for passwords on the fly.

[1] https://news.ycombinator.com/item?id=8818035

[2] http://jordan-wright.com/blog/2014/12/30/why-deleting-sensit...


Nice, so now we need to make honeypots which post to GitHub tons of things that look like real secrets but with broken credentials, monitor logins for those things, and start a blacklist of compromised / malicious systems (which would be the things trying with those logins).


https://help.github.com/articles/remove-sensitive-data/

That said, as the article points out, you need to consider them compromised once they've been pushed and rotate the creds.


To add to this, this is not just good paranoid practice. Don't just think you're safe because you fixed it 5 minutes later and probably no one noticed. There are sites that monitor the global github commit feed looking for things like AWS credentials and SSH keys. If it's been pushed to a public github repo for even a moment, it's been grabbed.


Even slightly more obscure things, like the config file for Sublime SFTP (`sftp-config.json`) have been personally observed as a target of crawling.


It's still useful to know how to use a tool such as BFG for situations where you push a password to a private git / GH repo :)


If it is a public repo it's compromised, period. Change keys/passwords ASAP regardless of what you do with the repo.


This last week, I accidentally pushed a GitHub API key (in my dotfiles repo). GitHub sent me an email saying that they had disabled the API key, in less than 1 minute. So I created a new API key and changed my fish config file to read the key from a private file.


Was this taken from https://news.ycombinator.com/item?id=9661349 or is it just a coincidence?


I'm so glad I use Perforce. Yeah branches are a bit cumbersome. But it's idiot proof. There is literally no way for any artist to cause irreversible harm. They can't even do harm that isn't easy to fix with a few clicks in an easy to use, easy to understand, easy to discover GUI.

I suppose one of the key features of Git is the ability to rewrite history. It makes a lot of sense in the context of an open source project pulling in changes from the wild. For most of us such utilities aren't just useless, they're actively harmful.

Never leave me P4. Please God never leave me.


With Git though, I can and do have many micro repositories scattered around my disks. My bin and Documents directories almost always have their own repositories each. Git works great for this as well as being a tool to help teams with code management. I do have to Google a lot of stuff for it but it works really well and this article is a good summary.


I'm a huge git fan but I have to agree. At least some of the VCSs out there should be non self-destructive. Git is really bad if not all users of a repo take the time to learn to use it efficiently.


> Git is really bad if not all users of a repo take the time to learn to use it efficiently.

The "if" clause here happens rarely enough that I can quote just the first four words of the sentence: "Git is really bad."

There are 2000 words here on how to undo.


Well, git should not be the default, but that doesn't make it a bad tool. Think about different heads for screwdrivers. Just because you have bought a set and there is one you never or seldom use, doesn't mean it's a bad head. The moment someone has a screw which needs that head, suddenly all other heads are useless and that's the best head you can have. Right?

(I'm not so sure what the right English word for "head" is, I hope it still makes sense even if I use the wrong one)

Or another example. Cars are highly specialised tools. Nobody is allowed to use one if they haven't shown that they have learned to use it properly via standardized exams. Maybe if a company uses a lot of git the mistake is that they don't require people to learn it properly first, either by offering courses or by filtering in the hiring process.

And since I've spent months learning it and can now use it even in potentially destructive situations like "git rebase -i", I can tell you that for me it's way more useful than SVN or CVS ever were.


I always used to swear by Git, until I started working at a company that uses Perforce. It's literally fantastic and I want to use it for all projects now.


Something I'm still working out....

I have a directory tree full of test data. As the project goes along, the test data will evolve, and thus should go under revision control.

Testing needs to start with known files, so, hey!, git checkout test_data - except that means my latest code revisions need to go into the test_data branch even before they're tested :-(.

Then the tests make their changes to the data, which the tests check, and which I then want to throw away. So: "git checkout test_data -f; git clean -f" -- except that cleans out the source code area as well as the test data area.

I'm thinking the test data should be a separate repository. Is that a mistake?

[Edit] I've tried looking at stackoverflow.com, but searching for "git testing" returned ~7000 articles, the first few hundred of which didn't look relevant.


Keep the "pristine" version of the test data in src/ (checked into git), have the build process copy it into target/ (or build/ or whatever you want to call it; either way, ignored in git) before starting testing.


You can do:

git checkout test_data 'testdata/'

to grab only the 'testdata/' directory from that branch. You can also do this with a file pattern, like '*.c'.
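A small sketch of that in a throwaway repo, written with an explicit `--` before the path. The file names, the `test_data` branch contents and the demo identity are assumptions for illustration. It shows that only the directory is restored while the newer code on the current branch is left alone.

```shell
set -eu
# Throwaway identity so commits work without any git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master

mkdir testdata
echo pristine > testdata/input.txt
echo v1 > code.c
git add .
git commit -q -m "initial"
git branch test_data            # branch holding the known-good test data

# Code moves on, on master only.
echo v2 > code.c
git commit -q -am "new code"

# A test run scribbles over the data.
echo scribbled > testdata/input.txt

# Restore just the testdata/ directory from the test_data branch.
git checkout -q test_data -- testdata/
data=$(cat testdata/input.txt)
code=$(cat code.c)
echo "data=$data code=$code"
```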


Try putting test data templates in test_templates (under rev control), and doing rm -rf test_data, cp -r test_templates test_data at the start of your testing routine. Put test_data in .gitignore.

Does that help?
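Roughly, as a sketch (the `reset_test_data` function name and the file names are made up; the directory names are the ones suggested above):

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Versioned templates: the known starting state.
mkdir test_templates
echo pristine > test_templates/input.txt

# Leftovers from a previous test run.
mkdir test_data
echo scribbled > test_data/input.txt
echo junk > test_data/extra.txt

# Run this at the start of every test routine.
reset_test_data() {
  rm -rf test_data
  cp -r test_templates test_data
}

reset_test_data
contents=$(cat test_data/input.txt)
echo "test_data/input.txt now contains: $contents"
```

With `test_data/` listed in .gitignore, nothing the tests write there ever shows up in `git status`.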


Thanks! On the one hand, git seems to restore the test data to its initial state faster than rm -r; cp -r, but I'm getting a strong hint from these answers that keeping the test data explicitly in sync with the code is worth the minor delay.


Sounds like something that would be a lot easier to handle with "svn revert"..


Only if you're looking for excuses to prefer svn. You can do the exact same thing you would with 'svn revert' with 'git checkout --'.



As great as git is, I wish there were something more intuitive. Git is definitely one of the more confusing tools out there.


Honestly? There kind of is -- https://mercurial.selenic.com/. I learnt mercurial first but have now spent about 3x as much time with git and I still find mercurial to be more intuitive.


Cool. The problem is git is already pretty much ubiquitous. Would it be possible to put a Mercurial-like UI on top of git? Does one exist yet?


Git and Mercurial have different workflows. I don't think it would be reasonable to translate one to the other in the general case. The everyday operations for both are pretty straightforward. It's when you want to do something strange that it gets hairy. At that point you want to really understand what's going on under the hood anyway.

If you are interested in using Mercurial on your own project and are simply worried about hosting, here's a list of providers: https://mercurial.selenic.com/wiki/MercurialHosting

If you are thinking of using Mercurial when the rest of your team uses Git, then I would just suck it up and learn Git (or convince your team to switch). It is not, by any means, the most complicated thing you have to learn for your job ;-).

Back in the bad old days when I was forced to use Visual Source Safe for version control, we used to maintain our code in multiple repositories -- using CVS for development and simply pushing to VSS when features were complete. That kind of tactic is still open to you, but the overhead is rather large for the minimal difference between Git and Mercurial.


> Back in the bad old days when I was forced to use Visual Source Safe for version control

Sounds interesting. So was this basically a team of 'bandits' secretly using a non-endorsed VCS to collaborate and get shit done, and then pushing to VSS to keep the pointy-haired boss happy?


Hmm... I suppose the answer is yes. We got permission from our director to use CVS (at the time we were banned from using anything with a GPL license, so we had to cover our asses to a certain degree). There was a corporate mandate to store everything in VSS at some point, but I don't think anyone actually cared what you did before it was stored in VSS.

As you might imagine (the use of GPL software banned, the use of MS software enforced), the rules were politically rather than technically oriented. There were rumours that we had an agreement with MS to follow these rules and I wouldn't say it was out of the question. By and large, as long as you followed the letter of the rules nobody cared much after that. Using CVS was rather a major coup, though, given the GPL licence.

Possibly younger people will be amazed at the ridiculous conditions some people worked under back then. After I left that position I swore I would never agree to work with restrictions like that. In fact, the ability to work on free software is one of the first things I bring up as important to me in a job interview. Even a slight hesitation is enough to make me walk.


That's what a lot of people did. The DVCSs heavily advertised this workflow where you could subvert the ... limitations ... of VSS and its ilk.


This is the big issue with using hg anywhere I've worked. Not even that people are experienced with it -- just the name recognition usually wins out.

Honestly, I also think github's de facto status is a major obstacle to adoption.

There are actually projects that let you work with an hg/git repo and push to the other type. http://hg-git.github.io/ for instance. No idea if these things are production ready or not though.


Git is the easiest to work with SCM I've used, however I've only used SourceSafe, cvs and svn before.

Once you grok that it's more or less an immutable DAG with diffs for edges and hashes for node names, that tags are read-only labels for hashes and branches are read-write labels for hashes, all possible operations are obvious; you just need to find the right incantation.

The concept of git being so simple is what makes working with it much easier than something like svn or cvs, where doing the equivalent of a rebase, cherry-pick or merge of a diff into multiple different branches is sufficiently difficult that I developed my own tools and workflow to get around them. When I had to work with svn writing bug fixes or doing development, instead of committing my work, I saved my work to patch files which I saved / reapplied when I switched branches. I developed scripts to do 3-way merges. I haven't had to do any of that crap with git. Git is far more logical. It's just missing a consistent command-line UI.


"Once you grok that it's more or less an immutable DAG"

That's problem #1. More or less immutable = mutable. Other SCMs limit mutations to additions and use a separate command (svnadmin) to do things that may permanently lose information. The svn repository may be ugly, but it can be relied on to store history.

"you just need to find the right incantation."

Incantation is the right term. As you admit, it's "just" missing a consistent command-line UI.

The combination of these two makes me very wary whenever I do anything remotely difficult in git.

A third thing that scares me is the ease with which people talk about things still being there "as long as git hasn't garbage collected them". To me, that sounds like having a memory allocator with an 'unfree' call that you can use to try and recover accidentally freed memory.


The dag is immutable; it's a bit more than just a dag though. That's what I meant.

Your GC fears sound like superstition, sorry. Nothing to do with manual memory allocation, and the problems of manual memory management are irrelevant. GC just collects nodes no longer reachable from branches or tags. Very unscary once you understand the dag nature.


> "The dag is immutable; it's a bit more than just a dag though. That's what I meant."

https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History:

> and you can rewrite commits that already happened so they look like they happened in a different way. This can involve changing the order of the commits, changing messages or modifying files in a commit, squashing together or splitting apart commits, or removing commits entirely – all before you share your work with others.

That surely looks like changing the graph, not just its attributes.

> "Your GC fears sound like superstition, sorry. Nothing to do with manual memory allocation, and the problems of manual memory management are irrelevant. GC just collects nodes no longer reachable from branches or tags. Very unscary once you understand the dag nature."

Thanks for triggering me to reread the documentation. I thought that gc would (potentially) collect all unreachable roots, but rereading https://www.kernel.org/pub/software/scm/git/docs/git-gc.html, I find:

> The optional configuration variable gc.reflogExpire can be set to indicate how long historical entries within each branch’s reflog should remain available in this repository. [..] It defaults to 90 days.

> The optional configuration variable gc.reflogExpireUnreachable can be set to indicate how long historical reflog entries which are not part of the current branch should remain available in this repository. [...] This option defaults to 30 days.

So, it seems that they work hard to prevent collection of nodes that you may want to refer to.

That makes this lack of documentation:

> --auto With this option, git gc checks whether any housekeeping is required; if not, it exits without performing any work. Some git commands run git gc --auto after performing operations that could create many loose objects.

waaaaaaay less of a problem. I have looked hard, but cannot figure out what those 'some commands' are that may do a gc. The best I could find is http://stackoverflow.com/questions/5137447/list-of-all-comma.... That's 5 years old, and it greps the git source code rather than citing the official documentation.


Changing things produces different hash codes so they are different commits.
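A quick way to convince yourself, as a sketch in a throwaway repo (paths, messages and the demo identity are made up): amend a commit, then look at both hashes. The old commit object is untouched; the branch simply points at a new one.

```shell
set -eu
# Throwaway identity so commits work without any git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo

echo hello > file.txt
git add file.txt
git commit -q -m "original message"
old=$(git rev-parse HEAD)

# "Changing" the commit actually creates a new commit with a new hash.
git commit -q --amend -m "reworded message"
new=$(git rev-parse HEAD)

# The old object can still be read by its hash, with its old message.
old_subject=$(git log -1 --format=%s "$old")
new_subject=$(git log -1 --format=%s "$new")
echo "$old -> $old_subject"
echo "$new -> $new_subject"
```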


I'm curious if you don't like the concepts of how it works or the UI or maybe the workflow.

A workflow example: doing a rebase after a push is almost universally seen as naughty, so why permit it without some kind of UI like --I-really-know-what-i-am-doing=yes or something?

WRT the UI itself, I've read a couple people claiming the emacs magit package is easier to use than the CLI itself, which would isolate the problem to the GUI. I have not personally invested the time into magit and would find comments on that theory by people who have experimented to be interesting.

http://magit.vc/


Git's CLI is just awful in a bunch of ways. There's a bunch of commands that do two or more different things depending on how you use them, and whole piles of cryptic and incomprehensible error messages.

Heck, just the contrast between `git add <file>` and `git reset HEAD <file>` is terrible.


I wouldn't describe the CLI as "awful", though I hear your complaint.

Not sure where you're going with `add` versus `reset` though. `git rm --cached ${file}` would presumably do what you want with parallel syntax.


The sane way to do it would be to have something like `git stage` and `git unstage`, as distinct single-purpose commands.


The first example at https://git-scm.com/book/en/v2/Git-Basics-Git-Aliases is literally exactly what you want.
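For anyone who wants to try it, here is a sketch of the `unstage` alias from that page in a throwaway repo. It assumes a repo-local alias rather than `--global` so your real config is untouched, and the file names and demo identity are made up. `git stage` already ships as a synonym for `git add`, so only `unstage` needs defining.

```shell
set -eu
# Throwaway identity so commits work without any git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
echo one > file.txt
git add file.txt
git commit -q -m "initial"

# Repo-local alias; add --global to make it available everywhere.
git config alias.unstage 'reset HEAD --'

echo two >> file.txt
git stage file.txt
staged_before=$(git diff --cached --name-only)

git unstage file.txt >/dev/null
staged_after=$(git diff --cached --name-only)
last_line=$(tail -n 1 file.txt)
echo "staged before: '$staged_before', staged after: '$staged_after'"
```

The edit itself survives in the working tree; only the index entry is reset.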


"It's easy to make the CLI less bad, but the developers have never bothered to do so themselves" is reinforcement of my complaint, not dismissal of it.


Not trying to dismiss, just assist ;)


There is NO particular workflow built into git. Teams have to reach an informal agreement on how they're going to use it, otherwise chaos ensues.


Only as long as you don't learn it. If you really need to use git often, please take the time and go through the git book. Then most things make sense and using git becomes way more stable.

It's a powertool for power users. It sucks that it became the major tool of our time, but that doesn't change its design decisions. (And I'm a huge git fan. But I'm also a power user who is happy to spend weekends to learn all the details about such a tool)


Though I walk through the valley of the shadow of rebase, I will fear no evil: for reflog are with me (sorry, couldn't resist)


See also http://sethrobertson.github.io/GitFixUm/fixup.html which has helped me immensely in the past.


This is pretty good, I had never seen autosquash before. Another rebase flag that I find useful is --onto. For example, to rebase only the most recent N commits of a branch onto master:

git rebase HEAD~N --onto master
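A worked sketch with N = 2, in a throwaway repo (branch contents, file names and the demo identity are made up). The command is written with the option first, as in the git-rebase synopsis. A `feature` branch has three commits; only the last two are replayed onto master, so the first one drops out.

```shell
set -eu
# Throwaway identity so commits work without any git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
echo base > base.txt
git add base.txt
git commit -q -m "base"

# Three commits on a feature branch.
git checkout -q -b feature
for n in 1 2 3; do
  echo "$n" > "f$n.txt"
  git add "f$n.txt"
  git commit -q -m "feature $n"
done

# Meanwhile master gets a commit of its own.
git checkout -q master
echo m > m.txt
git add m.txt
git commit -q -m "master work"
git checkout -q feature

# Replay only the two most recent feature commits on top of master.
git rebase -q --onto master HEAD~2

subjects=$(git log --format=%s master..HEAD | tr '\n' ',')
echo "commits on top of master: $subjects"
```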


Instead of "git reset --hard stuff", I recommend "git reset --keep stuff" as it will not delete uncommitted files in the working directory.


That is reasonable, depending on what you are trying to achieve, but I think the explanation you gave is a bit misleading. Neither --hard nor --keep affects files that have never been committed. The difference is that --keep aborts if it would affect a file that is tracked, but has uncommitted changes.
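The difference is easy to reproduce. A minimal sketch in a throwaway repo, with hypothetical file names and a demo identity: one tracked file with a local edit, one file that was never committed.

```shell
set -eu
# Throwaway identity so commits work without any git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
echo v1 > tracked.txt
git add tracked.txt
git commit -q -m "first"
echo v2 > tracked.txt
git commit -q -am "second"

echo "local edit" > tracked.txt   # uncommitted change to a tracked file
echo scratch > untracked.txt      # never committed

# --keep aborts: tracked.txt differs between HEAD and HEAD~1 and has local changes.
if git reset -q --keep HEAD~1 2>/dev/null; then
  keep_result=moved
else
  keep_result=aborted
fi
after_keep=$(cat tracked.txt)

# --hard moves the branch and throws the local edit away.
git reset -q --hard HEAD~1
after_hard=$(cat tracked.txt)
echo "keep: $keep_result ($after_keep), hard: $after_hard"
```

In both cases untracked.txt is left where it was.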


This is pretty useful for Git noobs like me.


git checkout is what I want to do most of the time, but I've always confused it with svn checkout, so I could never remember it.


One particularly tricky thing that this page doesn't mention is removing binary blobs (or any file) from the history. Someone committed tens of megabytes of binary stuff to one of our repos. A later commit 'fixed' this by removing them. But those binary blobs are still there, because it's a historical commit, meaning every time you clone the repo (or similar) you get huge amounts of crap.

Maybe --squash will fix it? Something to look at, I guess.


As far as I know, you need to use rebase --interactive and delete the commit(s) that introduced the blobs. This will, however, create a new tree which can be painful for everyone involved.
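Another route, not the one described above, is `git filter-branch`, swapped in here because it runs non-interactively. This is only a hedged sketch: `big.bin`, the repo layout and the demo identity are hypothetical, and `FILTER_BRANCH_SQUELCH_WARNING` is set just to silence the notice newer git versions print before running it.

```shell
set -eu
# Throwaway identity so commits work without any git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
echo code > main.c
git add main.c
git commit -q -m "code"

# Someone commits a blob, and a later commit "fixes" it by deleting the file.
printf 'pretend this is tens of megabytes' > big.bin
git add big.bin
git commit -q -m "add blob by mistake"
git rm -q big.bin
git commit -q -m "remove blob"

# Rewrite every commit so big.bin never enters the index; drop commits left empty.
git filter-branch -f --prune-empty \
  --index-filter 'git rm -q --cached --ignore-unmatch big.bin' \
  -- --all >/dev/null 2>&1

blob_commits=$(git log --format=%H -- big.bin)
subjects=$(git log --format=%s)
echo "remaining history: $subjects"
```

Every rewritten commit gets a new hash, so everyone else has to re-clone or reset onto the new history. The old objects also stay on disk until the backup refs under refs/original and the reflog entries are gone and a gc has run.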



