Git Undo (megakemp.com)
288 points by dominicrodger on Aug 25, 2016 | 170 comments



I freely admit to being an Hg fan, but that this stuff is accepted as common practice kinda blows my mind. What's so wrong with keeping an accurate picture of history that people do all kinds of manipulation to keep the VCS from accurately reflecting the history of development?


Opinions differ on this matter because of different concepts of what 'history' is appropriate to maintain.

At one extreme, you could keep track of all your keystrokes in the editor so that you could have a full history of your work including backspaces to correct typos.

On the other extreme is the mythical programmer who crafts perfect commits in exactly the correct order on the first attempt.

Most mortal programmers need the ability to iterate over their code to get it to a reasonable place before they want it enshrined in the blessed commit history that is shared with others or otherwise retained over time. The intermediary states, with false starts, poor implementations, hand crafted functions instead of standard functions, poorly named classes, and so on are all part of development but aren't particularly interesting to keep in the permanent history of a project.

Git's ability to manipulate the existing commit tree (amend, reset, rebase, etc.) is extremely useful for this normal 'exploratory' development. Once a stable point has been reached though (often because the tree has been published or shared with others), these commands do become inappropriate and a different set of tools becomes relevant (revert, merge, etc.).
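As a rough sketch of that second set of tools (assuming a throwaway repository; the file name and commit messages are made up for illustration), `git revert` undoes an already-shared commit by appending a new one instead of moving the branch backwards:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo "stable" > config.txt
git add config.txt && git commit -q -m "Add config"
echo "broken" > config.txt
git commit -q -a -m "Change config (turns out to be wrong)"
bad=$(git rev-parse HEAD)

# Pretend the bad commit has already been shared: append an inverse commit
# instead of resetting or amending.
git revert --no-edit HEAD > /dev/null

git log --format='%h %s'   # three commits; the bad one stays in the history
cat config.txt             # prints "stable"
```

Nothing that others may have pulled is moved or replaced, which is why this is the safe option after publishing.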


My two main arguments for "cleaned up history" are

1) Reviews are much more enjoyable when the commits reflect the final understanding of the problem rather than false starts etc.

2) Looking back through history is much more enjoyable when the commits reflect the final understanding of the problem rather than false starts etc.


I agree 100%. In my mind the 'stable point' I mention above isn't reached until after the review process is complete and any recommended changes are applied, which often involves fixups, merging commits, splitting commits, re-arranging commits, and so on.


Couldn't there be an immutable approach to this, a folded view of history? Mark commits of value, hide the iteration ones. The log reflects the folded history first; if needed you can unwrap for full details.


One clarification: amend, reset, rebase and their ilk don't 'manipulate the commit tree' other than adding commits. The manipulation is with the branch names associated with the commits.

I've always hated the common description of 'rebase' as 'rewriting history'. None of the existing commits are modified by rebase, new commits are added and the branch names are shuffled around.
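A minimal sketch of this, assuming a throwaway repository (the file and messages are only for illustration): `--amend` leaves the original commit object in place and just points the branch at a new one.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "Add file with a tpyo"
old=$(git rev-parse HEAD)

git commit -q --amend -m "Add file"
new=$(git rev-parse HEAD)

# The branch label now points at a different commit object...
echo "old: $old"
echo "new: $new"
# ...but the original commit is still in the object store, unmodified.
git cat-file -t "$old"          # prints "commit"
git log -1 --format=%s "$old"   # prints "Add file with a tpyo"
```

The old commit is only unreferenced by any branch; it stays reachable through the reflog until it expires and is garbage collected.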


I think this is pretty pedantic. I count that shuffling as rewriting history - that's not what happens in the background but that's what appears to happen, and that is what matters. What would you term it instead?


What actually happens does matter. You can't understand how lots of git commands work if you don't understand that the git commit tree is an append-only data structure and that branches are just labels of leaves in the tree.


It does rewrite history in the sense of which events followed which events.

Imagine the following sequence of events:

I make a commit on my local master

Someone else makes a commit on their master

They push

I 'pull --rebase'

The history now shows their commit before mine, even though I made my commit first, directly on top of master.


Let me see if I can clarify. Here is a summary of the situation you described before any push or pull.

    origin repo: M (master)
    your repo:   M---C1 (master)
    other repo:  M---C2 (master)
    
If other pushes `master` to origin we have:
 
    origin repo: M---C2 (master)
    your repo:   M---C1 (master)
    other repo:  M---C2 (master)

If you then run `pull --rebase` from master, we have:

    origin repo: M---C2 (master)
    your repo:   M---C1
                  \
                   C2---C1' (master)
    other repo:  M---C2 (master)
 
Your master branch will be positioned at C1'. As you can see from the diagram, the `pull --rebase` didn't change any existing commits, it just added C2 (same SHA as in the origin and other repos) and added C1', which are the changes in C1 applied to C2 instead of M. If those changes can't be made automatically, you'll get a conflict that has to be resolved before C1' can be created.

I don't think it is helpful to describe this as adjusting the order of commits or re-writing history or any similar language that suggests some sort of mutation to the commit tree. The only thing that has happened here is that additional commits have been added to the tree and the label `master` has been moved to a new leaf commit.

I realize some other commenters have said I'm being pedantic but I would instead say that I'm being accurate. You can't really understand how rebase, rebase -i, rebase --onto, fixups, reflog manipulations, and so on work if you don't have the correct mental model of the git commit tree.
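Here is a sketch that reproduces the diagrams with three local repositories. The directory names (`seed`, `origin.git`, `yours`, `other`) and file names are my own choices for the demo, not anything git requires.

```shell
set -e
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Create commit M and publish it as a bare "origin", then clone it twice.
git init -q seed && cd seed
echo base > base.txt && git add base.txt && git commit -q -m "M"
branch=$(git symbolic-ref --short HEAD)
cd "$work"
git clone -q --bare seed origin.git
git clone -q origin.git yours
git clone -q origin.git other

# The other developer commits C2 and pushes.
(cd other && echo two > two.txt && git add two.txt \
  && git commit -q -m "C2" && git push -q origin "$branch")
c2=$(git -C other rev-parse HEAD)

# You commit C1 on top of M, then pull with --rebase.
cd "$work/yours"
echo one > one.txt && git add one.txt && git commit -q -m "C1"
c1=$(git rev-parse HEAD)
git pull -q --rebase origin "$branch"
c1_prime=$(git rev-parse HEAD)

echo "C1  = $c1"
echo "C1' = $c1_prime (parent: $(git rev-parse HEAD^))"
echo "C2  = $c2"
git cat-file -t "$c1"                      # C1 still exists: prints "commit"
git log --graph --format='%h %s' --reflog  # shows both C1 and C1'
```

C2 arrives with the same SHA it has in the other repositories, C1' is a new commit whose parent is C2, and C1 is still there with M as its parent.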


Given that commits are immutable objects, the only sane interpretation of "rewriting history" is that it rewrites your view of the history rather than somehow rewriting immutable content-addressed objects.


Sure, but lots of people using git don't really understand that commits are immutable or that the commit tree is an append-only data structure. The pervasive use of the phrase 'rewrite history' hinders this understanding.


I disagree. I've used Git for a long time, and talked to a lot of Git users, and I've never seen anyone say something that implied they thought they were literally modifying the commit objects, as opposed to rewriting the history of a branch.


Rebasing also rewrites all your commits to have a different parent commit.


Nope, parent is correct. If you use hg's changeset evolution and run rebase, and then do a hg log --graph --hidden you can see that your original commits have not been touched, other than to mark them as hidden and obsolete.


This is a discussion about git.


They both work the same here, I just used hg because it illustrates the inner workings nicely. With git you don't get the nice hidden commits view, just the reflog (which is trying to show you the same thing).


You can see hidden commits with git log --reflog.


You might want to have `git bisect` working, for instance. For bisect to be useful, each commit should leave the repo in a consistent, "green" state. If you commit something stupid by mistake, and add a commit later to fix it, then during the bisect you might get wrongly stopped at the stupid commit.

Arguably it's more important to have clean commit history in open source projects. I used to be hard on this at work but I relaxed a bit lately.

Generally development in companies moves much faster than in open source libraries (in terms of commits per week), and in the open source world it's generally expected to have clean, well tested, working solutions rather than hacks that can be fixed tomorrow if needed. Because of that, open source maintainers have higher standards for commits.


There seems to be a split among git users between those who think history should show what actually happened (i.e. it's left alone), and those who think history should tell a story about the changes (i.e. once you've finished something, turn it into a coherent set of commits like "stub out X", "add tests for X", "make first X test pass", etc.).

I agree with you that history should be left alone; mostly I think of the YAGNI argument that it's futile to think that you have a better idea of what future developers want to see, compared to those future developers themselves.

My repo histories are riddled with stuff like "finished X", "stubbed out Y", "fix typo in X", but at least nothing has been hidden from future devs who might be digging around for their purposes, regardless of whatever elegant story I might come up with.


As I understand "undo" it's a convenient feature for addressing quickly-discovered errors.

If you rapidly realize that you committed to the wrong branch, or left a line of code half-finished, or misspelled a word, there's very little value in logging that. If you had seen it two seconds before the commit you would have fixed it without a second thought, so why insist on preserving it seconds after the commit?

Presumably (if only for safety) no one is using 'undo' on anything pushed to a shared repo. I can appreciate the argument that we shouldn't rewrite history into a nice, streamlined narrative, but I don't see much reason to avoid tools like 'amend' for fixing commit messages, or 'revert' when some silly line of test code gets committed (and not pushed).

When I'm dealing with other people's Git histories, I appreciate the middle ground approach most. There's no point in spinning some imaginary, elegant story - if it's not real history then write up an essay instead of storing it in your 'history'. But I also don't need to see every line of "oops, un-stubbed X" - my experience is that at least for immediate fixes it only makes things harder to read.


I think a solid middle ground can be found. By focusing on making clean commits, instead of being tempted to make "whoops fixed" commits, you will become a better developer and your code will be cleaner. And your history, too.

At the very least, review your commits and clean them up before pushing - merge 'fix' commits if you didn't --amend them and review commit messages. What the commit does should be obvious from the subject.
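One way to do that cleanup, sketched in a throwaway repository (file names and messages are made up): record the fix with `--fixup`, then let `rebase --autosquash` fold it into the commit it belongs to. Setting `GIT_SEQUENCE_EDITOR=true` just accepts the generated todo list so the example runs without opening an editor.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo "base" > README && git add README && git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

echo "prnit('hi')" > app.py && git add app.py && git commit -q -m "Add greeting script"
target=$(git rev-parse HEAD)
echo "docs" >> README && git add README && git commit -q -m "Document the script"

# Oops, a typo in app.py. Record the fix as a fixup of the commit that introduced it.
echo "print('hi')" > app.py && git add app.py && git commit -q --fixup="$target"

# Fold the fixup into its target before pushing.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$base"

git log --format=%s
# Document the script
# Add greeting script
# Initial commit
```

Only do this on commits that haven't been pushed anywhere shared.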

I've also seen people write git commits as if they were a work log - things like "fixed a bug", or "implement feature X". That's the wrong way (IMO) to do git history; what the commit message should say is what the /commit/ does, not what you did.

There is no I in programmng.


One way I enforce the review of my own commits in my own workflow is by not using the CLI for committing. My git client shows me the diff in the status window and again when I'm writing the commit message. It also makes it easy to stage specific hunks without having to go down the add -p route. I personally use magit in emacs but I'm sure gitk and some of the more graphical ones have this ability. Not to say any of these don't exist in the CLI; the ergonomics are just not the same as getting the output of multiple commands in a single well designed interface.


But what good does it do future devs if the history is

-- Added this thing
-- Fixed typo
-- Capitalized the letter

Etc.


One important reason is to avoid wasting time on gilding lilies.

Another reason is that the git information (e.g. from git blame) tells us when the code was written and in what order, rather than some post-hoc rearrangement.

For example, we might notice that code X is doing some tricky work which elsewhere is done by a helper function Y. We look at the git info and see that X was added after Y, so we try to figure out what special edge-cases X is trying to deal with that Y wasn't suitable for. Little do we realise that X was actually written before Y existed, but the commits got rearranged.

That kind of archeology is difficult to predict in advance (mostly because, if we realised all of the issues with our code beforehand, we'd fix them immediately!).

Future devs are just as capable at traversing repos and collapsing diffs as you or I, so there's no need to lie to them. In fact, they might have access to much smarter tools and IDEs than we do.


Personally, I only care about when the code hit master. Because that's when it could potentially have broken shit for everyone.

That I committed it locally is pretty irrelevant: I could just as well NOT have committed it, made a backup of the files on the side, copied them back in...from the perspective of the rest of my team, my local history is an implementation detail.

If the only thing I do is manipulate my local history, then open a PR and merge, master's history will actually show something much closer to the truth: That on X date I added something to master.

That I spent 6 weeks and 300 commits locally to do it (kids, don't do this at home!), literally doesn't matter to anyone.


> Personally, I only care about when the code hit master.

So just look for the merge commit on the master branch that brought it in.

By having 300 separate commits (which you were doing anyway) it helps us know what your thought process was on the day that a given line changed. Maybe you were refactoring function X to do Y. If you don't mention that you were accounting for changes happening in someone else's branch, then we know we have to look closer at that code. Without the individual commit, all we know is that giant-project-x was accomplished with this commit, and the change to that line may or may not have the necessary update.


By having a history of every single key you typed to create this comment, it would help me know what your thought process was when you typed it up. Maybe you got pissed off and wrote a swear word or two and then backspaced. Maybe you worded something awkwardly and then refactored your sentence. Without all of your keystroke history, all we know is that a comment was made by you, and your opinion may or may not have taken into account certain arguments made by others in the same thread around the time that the comment was published.


> it helps us know what your thought process was on the day that a given line changed

This is not something someone who actually spends his days reading code would say.

Code is hard to read as it is. Presenting it in well-packaged, readable commits is the very least one can do.


>Future devs are just as capable at traversing repos and collapsing diffs as you or I, so there's no need to lie to them.

The ability of devs to collapse a bunch of commits into a useful summary is near zero right now. You can only achieve it by rewriting history. Unless you think that feature is going to be commonplace very very soon, there is a compelling reason to lie.


> The ability of devs to collapse a bunch of commits into a useful summary is near zero right now

You can get quite far with 'git diff START END'. Something more task-specific can probably be done with Emacs, Magit, Ediff mode, bash, elisp, etc.
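A tiny sketch of what I mean, in a throwaway repository with made-up files: the two-endpoint diff shows only the net change, so the typo and its fix cancel out.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo "v1" > notes.txt && git add notes.txt && git commit -q -m "Add notes"
start=$(git rev-parse HEAD)

echo "teh feature" > feature.txt && git add feature.txt && git commit -q -m "Add feature"
echo "the feature" > feature.txt && git commit -q -a -m "Fix typo"
end=$(git rev-parse HEAD)

git log --format=%s "$start..$end"   # two commits: "Fix typo", "Add feature"
git diff "$start" "$end"             # one added file containing "the feature"
```

The diff loses the commit messages, of course, which is the part a hand-written squash commit is supposed to add.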

Even if you think collapsing commits by rewriting history is useful for making summaries, etc. what makes you think you can produce a more useful summary right now than that future dev can, considering the fact that you don't know what they might want?

The nice thing about git is that anyone can make a new branch from any point in the repo's history, merge, cherry pick, rebase, etc. to their heart's content, then garbage collect it once they've learned what they needed.


What makes me think I can write better code than the thousand people downloading my repo later? I don't, but somebody should do the summarizing, and it might as well be me.

Smashing the diffs together gets you the least useful parts of a purposeful squash commit.


> somebody should do the summarizing, and it might as well be me.

Should they? See my earlier point about gilding lilies and YAGNI ;)

Of course, there are always exceptions! The most obvious ones are processes which work per-commit, e.g. bisecting, conflict resolution, per-commit code review, etc. where having a bunch of interleaved "stories" can be tedious.


The order in which code was written is pretty meaningless. What matters is how the function of the code changed over time. If I run `git blame` I should not be presented with a whole series of "fixed typo", "changed whitespace", etc commits. That makes it really hard to work with. When I use `git blame` I want to be presented with the commit that actually made a meaningful change to the code.


That code X should be clearly commented to explain the state you describe. That's the proper, most ergonomic, solution to the problem. Sure, it might not be documented and archaeology might be needed, but it shouldn't be considered as an excuse to not write comments and/or documentation.


I agree that if X avoids Y for some subtle edge-case or whatever, then there should be a comment explaining why.

However, in my example X is written first, but just so happens to have become redundant once Y gets written. We've just spotted this redundancy, and it's up to us to figure out whether X should be refactored to use Y or not.

If we look at an unaltered history, we would see that X was written first, so we can hypothesise that it's just a special case of Y which can be refactored away.

If we look at an altered history, the commits containing X may have been squashed/rebased/etc. into a coherent "story", which just-so-happens to appear on top of the story containing Y.

If there were a comment telling us that X was added due to some edge-case, etc. that makes Y unsuitable, we could leave it alone and get on with something else. Yet in this situation there is no such comment, but that doesn't imply that it's not there to handle some subtle edge-case; we'd need to do more investigation to convince ourselves that it is indeed redundant before we could refactor it in confidence, to counteract the contrary evidence which git is telling us.


Actually as a reviewer I don't mind that "fixed typo" style commits. It keeps them separate from the real work. I can just cleanly glide past them while scanning the history.

"Added this thing" sounds like a substantial commit -- at least in terms of meaning, even if (for some reason), the actual diff is a one-liner. In that case I would want a more explanatory commit message, but the change itself is fine.


> My repo histories are riddled with stuff like "finished X", "stubbed out Y", "fix typo in X"

Where I work, our commits from years ago are like that, and since practically all the people from that era have moved on, the history is practically useless when trying to determine what they were working on and why they were working on it.

In fact, I found what appeared to be a logical error in one of the many tools we have deployed. I tried to track down when it was added and the commit just said something like "fixing integration tests".

So, not only did they change some tests, but they also added some code as well.

In other words, the reason that line of code was added is very well hidden from this "future" (now present) dev.


The three messages you give above are perfectly good members of a cleaned-up history. Each has a clear meaning, and small is good when it comes time to bisect.

Messages like "stuff", "it works", "xxx" and "everything I did last month" are not so good, but very common. Moreover if you are in the habit of avoiding them -- that is, having each commit do one thing with a clear intention -- then you will keep finding times when you wish you had done something differently an hour ago. And then `rebase -i` is your friend.


Many people think that for later reference a history where each commit contains exactly one feature or bugfix is more useful than the usual development workflow of "start feature a, fix bug in related feature b, fix documentation of unrelated startup flag, implement feature c that turns out to be prerequisite to feature a, do more work on feature a, fix bug in c, [...], finish work on a".

And in case you want to commit to an open source project you are basically forced to rewrite history for any non-trivial change because changes that make sense to develop at the same time often form independent PRs.


You have to think about the purpose of the history.

Git history serves a few purposes. First, it provides an overview of development so that someone can use `git log` to quickly figure out what's been done. Second, it provides context to code changes so that someone using `git blame` can figure out why some code looks the way it does. Finally, it provides a set of distinct points for clean manipulation of history via `git revert`, `git bisect` etc.

From the point of those purposes, there's no real value in a history which accurately reflects the development process. The ideal commit has a few properties: it should address only one concern, it should contain all the immediate code changes addressing that concern, and it should not be overly long. Commits like that make navigating and manipulating the git history easy.

There's nothing wrong with keeping an accurate picture of history. It's just not actually useful.


> There's nothing wrong with keeping an accurate picture of history. It's just not actually useful.

Except for the relatively rare cases when it is incredibly useful. But that's why we have the reflog.
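For example (a sketch in a throwaway repository, with invented file contents), a commit dropped by `reset --hard` can be brought back from the reflog:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo "one" > file.txt && git add file.txt && git commit -q -m "First"
echo "two" >> file.txt && git commit -q -a -m "Second"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1       # "Second" is no longer on the branch
git log --format=%s              # prints only "First"

git reflog                       # the old position is still listed as HEAD@{1}
git reset -q --hard "HEAD@{1}"   # move the branch back to where it was
git log --format=%s              # prints "Second" then "First"
```

The reflog is local to each clone and its entries expire eventually, so this is a safety net rather than an archive.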


It's just a way of looking at it. Let's say I'm in charge of a project. You send me a patch. I commit the patch. What I want in my history is that I committed your patch. I really don't care what you did in your history to create that patch.

But it's equally valid to consider all your commits important in my history as well. It just depends on what you want. Personally, I never rebase, but I can understand why some people like that feature.


That's what I find hard with contributing to OSS often.

Many devs are happy with what you send them... as long as the history is right!

And right seems totally random to me.

Most are happy if you simply send them "one commit", but tell you to merge multiple commits before they accept it.

Others say "lets split this or that" before they accept it.

Then I have to go back and fiddle around with Git just to get my change landed...


I never had the urge to rewrite the history of my code until I started using CI & CD. Since then, it has happened to me a few times that I wanted to fix something small and ended up trying multiple times, pushing a new "maybe this time?!" commit over and over again. Obviously it's not best practice, but it's something you do when there's a rush.

Having 10 tiny commits that are just failed attempts to fix a bug isn't practical. It makes reading and understanding your repository code _harder_. Git rebase helps me keep my log clean and understandable, thus making it something I can work with in the future.


I'm using CI and CD. If I want to find out if I've fixed a bug, I just run the build locally before pushing.


Ideally, sure. But I've had to do this before due to the fact that a configuration value in the PaaS I was using was unmodifiable on remote, but modifiable locally, so I had to do a sort of binary search to narrow it down by observing the behavior of the server via changing my code's value a number of times.

He makes a good point, but ultimately I have to agree with someone's point about "pointlessly gilding lilies".


On some of the projects I've worked on, although I can easily run a subset of the tests as a sanity check, the full test suite takes hours to run and has to be run on several different platforms because there is a lot of platform-specific code. There's no way I or anyone else can have confidence in any substantial change until it's run through CI. This is a cycle that may repeat many, many times in some cases before some obscure, platform-specific bug can be tracked down. All of those intermediate attempts would quickly totally drown out the signal in the commit history. This is just one of many reasons why for larger projects, rewriting history is really the only practical option.


There can definitely be value in e.g. summarizing the final result and understanding after a number of 'failed' iterations. And that the many small commits can be considered "noise" if you only care about what happened at a high level. However, rewriting history with operations like squash or rebase IMO seems like a bad solution to a real problem - it really shows that we don't yet have the right abstractions or tools for doing this. If all the small commits are considered "noise" then it shows that we don't have good enough tooling for filtering and grouping when perusing our VC history. The information that people are currently putting into a squash commit is an aggregate or derivative of the original, and as such shouldn't replace it, it should supplement it. There are legitimate use cases where the details of those small commits are indeed valuable, even if the more high-level "just tell me the final result" use case is more common - you shouldn't have to be forced into making that tradeoff.

I know some people are using merge commits or pull requests as a place to put this information - but maybe we need an explicit mechanism for grouping together commits and summarizing them? I'm imagining something along the lines of code folding. Such a grouping might have other uses too (e.g. signal that there's a grouping of commits where the tests will fail, so skip to the last commit if bisecting).


So, here's a question. I make a commit to implement a feature, then I realize there's a bug in my implementation, so I fix the bug, and then squash it into a single commit. What is the scenario where anyone is going to be perusing the history and he'll actually want to know about that bugfix?

What's the actual value in that intermediate commit? Other than seriously contrived scenarios I can't think of any of the "legitimate use cases" you mention. If it's someone else's code I never want to see that intermediate commit.

Where's the tradeoff?


Let's imagine for a second that the bug-fix you make isn't perfect. Maybe you ought to have refactored something a bit more instead. Maybe you made the bugfix a couple days after making the main feature commit, and you'd forgotten some detail. At any rate, half a year later someone has to sit down and figure out why the code is behaving weirdly sometimes. If you've got an accurate history of how the code was written, they'll have an indication that the code from the bugfix was added post-hoc, and might be inclined to investigate here. They'll understand that all these lines of code were not written at once, so the ones in the bugfix are more likely not to be fully cohesive with the rest of them.

Sure, in the happy case where your code is perfect, all those extra commits are just 'noise'. But when debugging, there can be value in the forensic information about the evolution of the code. Which also documents the evolution of the understanding of the person that wrote it. It can help answer questions like "why is _this_ here?" or "what were they thinking?!?" I've fixed bugs that would have taken much longer to narrow down if I hadn't had clues like that.


As I understand it (and this is what I now do when on a team), the idea is to use commits as atomic, fully fleshed out parts. Commit 324rte may specifically "add support for PUT operations on widgets" or asde21 may "Refactor Business Rule Unit Tests into individual files", and looking at the log, you can cherry-pick that one commit (and deal with potential merge issues) onto your branch.

But if you're working at home on your own project and just want to sync between a few machines, what do you do? Commit, "Hashing on the Liststore kinda works, some bugs." Push. Go to your home machine, work some more, and finally squash all those kinda working commits into one commit ... potentially even need a "git push --force" (which you can safely do since you're the only developer?)

I agree with you totally though. It shouldn't just be a branch. There should be a way to group x edits into one big commit. That's the atomic unit that has a specific feature, and all the mini-commits inside of it should be totally abstracted except for specific deep searching commands.


I too admit that I'm a huge Hg fan (will not switch the company code to git) but I also admit history rewriting is a damn worthy feature. At the very minimum I think we can agree that safe rebasing is better than a plethora of merges that really serve no purpose other than to clutter history. The more distributed and diverse/disparate your team is the more I think history editing + cheap branching is worthwhile (but on the opposite end I think Hg consistency is better for corp).

One of my greatest challenges in using/understanding Git was(/is still) the reflog. I know the reflog isn't that complicated but there isn't anything really analogous in other SCM (of my limited knowledge of perforce, svn, hg, git). Also for some reason the presentation of the reflog UI wise is intimidating.

Reflog is a nice gem for git, particularly since the builtin Mercurial rollback (I wish they would just remove that command) is fairly awful (use histedit or rebase instead). That being said, Mercurial's new experimental changeset evolution stuff looks really promising [1].

That being said if you are looking to undo in hg like this article talks about you have to look at the

    hg unbundle backupfile

Unbundle is a pretty nasty command compared to the reflog commands, but on the other hand it is just restoring from some backup file. I'm not too sure how you can transfer reflogs around.

[1]: https://www.mercurial-scm.org/wiki/ChangesetEvolution


I think bundles are easier to understand than the reflog. When you run a command that modifies commits, the original commits are stored off in a bundle file. When you unbundle that file, the commits show back up in your repository history. What could be simpler? The git reflog gives you a weird truncated log view of old commits that is very hard to parse (for a human at least).

That being said, I've been using changeset evolution for over a year and it is awesome. Instead of creating a bundle your commits are just hidden. You can run any of your hg log commands with --hidden and it shows you those hidden commits. You can see exactly how your rebase removed (hid) some old commits and created new ones. It's very easy.


Rollback is deprecated, it doesn't show up in the docs. Because people use it in scripts it will never be removed due to mercurial's backward compatibility guarantees.

Mercurial 3.9 ships with the journal extension, which is a bit like the git reflog: https://www.mercurial-scm.org/wiki/JournalExtension


Vaguely remember seeing that extension. I did not know it ships with Hg now! Awesome! More reasons to stick with hg longer for my company :)


What does accurate history mean?

Do you commit after every 20 seconds of typing?

Why not?

The people cleaning up history have the same motivation.


The last time we had this discussion the consensus bottom line was that you try to avoid rewriting history once you have pushed it to a place where others have access to it.

Once you've published it, let it go. Do not mess with history. I think we can all agree on that.

The disagreement is whether I should squish my commits before I push it and I put that in the same pile of questions as am I obligated to my significant other and society at large to shave my legs before going out in public.


You aren't obligated, you just might make people not like you very much.


> What's so wrong with keeping an accurate picture of history

I think part of the problem is the distinction between history and audit. What you are thinking of is a full audit: every change made by everyone to get to and from each state.

History can sometimes be this, but sometimes you just want the solid states and the extra detail of each step between including failed steps that were back-tracked is more information than people want and can result in cognitive overload.

Different people want different detail.

Sometimes the same people want different detail for different tasks. One option might be for a feature to allow you to mark a commit as intermediate. Keep those with the flag around but don't display them or allow things like bisect to operate on it by default. Display them if an extra option is provided, allow action upon them similarly (not by default, you don't want a typo in a commit ID to result in a commit that exists but is not the one you are looking for to be accessed).


I think that's what feature branches are for. You merge to master/default after you're done with your feature, and commit an accurate history to the branch.


Probably, though that isn't something you can do after the fact where a "treat this as a partial result" flag on a commit could be.


In my opinion, it is only valuable to present a pristine history to make sure that every "official" commit provides an application that can be run. This allows git bisect to do amazing things. In other words, I will git merge --squash. When someone rewrites history after pushing, all they do is make everyone else have a bad day.
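For anyone who hasn't used it, here is a minimal sketch of that squash workflow in a throwaway repository. The branch name `feature` and the file name are made up for the demo, and the default branch is looked up rather than assumed.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name Demo

echo base > app.txt
git add app.txt && git commit -q -m 'base'
main_branch=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

# Three messy work-in-progress commits on a feature branch.
git checkout -q -b feature
for n in 1 2 3; do
  echo "step $n" >> app.txt
  git commit -q -am "wip $n"
done

# Collapse them into a single commit on the main branch.
git checkout -q "$main_branch"
git merge -q --squash feature   # stages the combined change, does not commit
git commit -q -m 'Add feature'

main_count=$(git rev-list --count HEAD)
feature_count=$(git rev-list --count feature)
echo "main: $main_count commits, feature: $feature_count commits"
```

The feature branch keeps its intermediate commits; only the main branch sees the single squashed one.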


If I'm pushing non-building commits on a feature branch in Hg, why should it matter? It documents the history of the development of that feature without interfering with default/master.


No version control system accurately reflects every bit of history that goes into creating and editing your code. You'd have to record every keystroke. The whole idea behind a vcs is to record helpful and relevant history. Is every little merge relevant and helpful history? Hard to say, maybe, but probably not.


It's the same reason that history books aren't a bunch of primary sources stapled together.

What's better than painstakingly accurate history is useful history. I don't care that Joe was distracted one day and had to make a fix-up commit. I care that he authored a certain change.


I'm also coming from Hg, and this picture [1] sums up the problem. The history tree in our company is way more messy - basically over 1/3 of all commits are just merges.

[1] https://twitter.com/michaelhenke/status/585142133167751169


> What's so wrong with keeping an accurate picture of history that people do all kinds of manipulation to their history to keep from the VCS system from accurately reflecting history of development?

What's wrong with organizing your source code in a single directory? Why do people insist on organizing them into subdirectories and subsubdirectories instead of letting the main directory accurately reflect the size of the code base?


How do you accurately reflect the history of development when it's nonlinear?

I was an hg person too, but I came over to the git side when I needed to collaborate with people. git's killer feature is merges. And merges benefit from fine-grained commits that rewrite history.


I wonder how much time and money has been wasted trying to operate Git's confusing UI. The repository format itself seems fine, but I'm surprised we're not all using a better frontend by now.


The basics (stage, commit, merge, branch, push, pull, log, diff, bisect) aren’t confusing. (The rest isn’t that confusing either, but…) If you prefer GUIs, feel free to use them, but that’s certainly not everyone.


I disagree, the basics are confusing, because there is too much state. I think gitless (http://gitless.com/#vs) shows how a less confusing version looks.


Reminds me of hg patch queues, which I've always found to be drastically more error prone than git's staging feature when trying to commit some but not all of the changes you've made, but I haven't tried gitless and it may do it better.


State in the form of staged changes is a huge convenience to me (e.g. with git add -p), but to each their own.


I was at least a little confused by stage for a while. I still think the distinction between stage and commit is unnecessary at best.


I had a recent epiphany about staging, which is that it really starts making sense if you stage as you go.

Let's say you want to write small feature X, and it truly makes sense for X to be a single commit. But doing X involves changing around both Y and Z. Git makes the following workflow easy:

* Fiddle around until you get Y working how you want it

* Stage it

* Fiddle around trying to get Z to work. Try something experimental. Nope, that's not how you want it. `git checkout .` Try again.

You don't have to worry about only blasting away the failed Z attempt while preserving Y—since Y is staged, it's easy to keep around.
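Roughly, in a scratch repository (the file names y.txt and z.txt are just stand-ins for "Y" and "Z", not anything from a real project):

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name Demo

printf 'y v1\n' > y.txt
printf 'z v1\n' > z.txt
git add . && git commit -q -m 'base'

printf 'y v2 (works)\n' > y.txt     # fiddle until Y works
git add y.txt                       # stage it

printf 'z experimental\n' > z.txt   # failed attempt at Z
git checkout -- .                   # restore the working tree from the index

cat y.txt   # the staged Y change is still there
cat z.txt   # Z is back to the committed version
```

This works because `git checkout .` restores files from the index, and the index already holds the good version of Y.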


Technically the same metaphor can be used for "commit" if we assume you stage the one thing and commit that one thing. Now look through the same workflow again but replace "stage" with "stage it and commit it."


You can also use shelve in mercurial. I have never needed to use it on git though because of staging/cheap branches.


But why stage when you can just commit? You can use rebase -i to modify that commit later if you need.


You might not want to commit all of your changes at once.


commit can be followed by a file name. Or there is commit --interactive or commit --patch to choose what to commit (or git add versions of those).


If you don't have `stage` how are you going to generate a commit which has only some of the changes you've done (potentially including partial commits from files)?

`stage` is unnecessary if you have a really simple workflow, but it provides a lot of flexibility at very little cognitive cost.


By specifying the parts you want to include when you commit. I think it's unnecessary for any reasonable work flow, and doesn't manage to break even on its benefit vs. cognitive cost.


So you want a long command line argument where you tag specific line numbers and have to get it all right at once? That sounds terrible to me.

Being able to add bits and pieces to your staging area and then once you've got it all ready commit is pretty useful.

I guess I just don't see the cognitive cost as being particularly high. It's a pretty simple model.


I really would prefer not to even use a CLI for this operation, since I think it's not well suited to the kind of interaction commits entail.

A "staging area" is a useful concept, but it's not one that needs to exist outside the UI of whatever tool you're using to generate the commit.


You can add bits and pieces to commits directly with commit --amend. The index is really not needed.


"git add -p" generally negates the need for incremental staging, unless you have made two unrelated changes to the same diff hunks that need to be untangled.


I teach Git. I find many who have learned just enough to get by, and their shallow understanding hobbles them.

It's not enough to memorize command line "incantations", you have to understand what's happening.

Git is a sophisticated tool. There are over 100 subcommands!

Once you fully understand the basic terms (e.g., "detached", "HEAD", "branch", "commit", etc.) Git becomes less confusing.

http://www.verticalsysadmin.com/git/flyer.html describes a free webinar we offer on Git basics -- people who have used Git for years come away surprised how much they've learned.


I think you should read back to yourself what you wrote -- you are essentially saying git is complex. Anything complex with enough time and understanding can become easy, but do you need the complexity in the first place? Most developers usually want a simple workflow where they don't want to deal with too many idiosyncrasies of the tool they use. You want to checkin and checkout mostly, but instead one needs to understand a lot of details or you keep tripping up.


I understand.

Git allows you to do a lot more than check in and check out. It's a powerful tool for collaborating on source code.

To the extent that one doesn't confront what Git actually is, it can seem mysterious or needlessly complex. There's a method to the madness. :)


It's the "boil the oceans" problem (http://www.urbandictionary.com/define.php?term=%22boiling%20...). Same thing that happens with vi, for example.

You have a widely adopted tool with some real or perceived flaws. Everybody knows them and wants to fix them.

But unless you somehow get mass adoption from the start, the project flounders because everyone will be pointing out that you can't install & use the new tool in restricted environments or on very old environments.

So we're left with the lowest common denominator.

At this point, to break the cycle, either the original developers come with the 2.0 interface and push it hard (which might cause backlash: https://xkcd.com/1172/) or someone with a ton of pull and resources does it from outside (which could trigger a fork or other unpleasantness).


Except we all knew how to write better UIs back in 2005 when Git was first written. It's not some brand-new research that only came to light a few years ago. We knew how to do usability tests in 2005. We knew what patterns worked and what didn't. We knew how to build discoverable software.

Why the heck wasn't the UI improved back then, before the thing was even released?

I mean, while your explanation is correct, it doesn't explain or excuse the pure incompetence of the original developers when it comes to usability issues. Their laziness or ignorance back then has confused and irritated thousands or millions of developers now, and continues to, and will continue to for the foreseeable future.

Make sure the shit you're going to set in stone is good before you grab the chisel, guys. You're professional software developers, not clowns.

Sorry for the rant.


One can argue semantics whether you call it a UI or a CLI, but hey. There's a number of GUI clients out there that depending on your criteria could be considered better. They're usually not as powerful as the CLI client though, and if they are, features like accessing the reflog are hard to find and use. There's also a few alternative CLIs out there, I've just done a quick googling and came across http://www.saintsjd.com/2012/01/a-better-ui-for-git/ and http://www.kennethreitz.org/essays/legit-the-sexy-git-cli.

Personally I prefer the CLI, it's the only tool that I can rely on to do what I tell it to do and to know what's happening. But it takes time and effort to get used to it.


> One can argue semantics whether you call it a UI or a CLI, but hey.

In the same way one can argue whether you call a braeburn a fruit or an apple.

CLI is a subset of UI


I prefer the git CLI mostly, except for merge conflicts. Exclusively for git merge conflicts I use the IDEA IDE's tools. Otherwise the CLI is my friend because I feel safe (git push and git commit have the best color-coded messages in most OSes).


A CLI is a user interface.

The problem with Git (well, one of many many problems with Git) is that it conflates its user interface with machine interfaces-- which means tools that have to work with Git (like those GUI clients) have to use the CLI to do so. They don't have a more powerful option, like an officially-supported API or a shared library they could call into. This is terrible software design.


libgit2 [1]

In fact, there exist many alternative 'frontends' to Git. There are even protocol translators like Hg-Git [2], and many importers that typically use the fast-import format to ingest Git-impl primitives [3]

[1] https://libgit2.github.com/

[2] http://hg-git.github.io/

[3] https://github.com/frej/fast-export


libgit2 is not official, not in-sync with the main Git tool development, and doesn't support many of the features Git supports. So no, it isn't a solution to the problem.

Separating the UI and machine interface is something that should have been done from day one. In fact, I've been told Git's codebase actually already does that (it just doesn't expose the machine interface to the outside world.) Human beings are not machines. They have entirely different needs.


> I'm surprised we're not all using a better frontend by now.

Chalk that up to the power of fashion and a misguided notion of technical proficiency.


Once you understand Git's data model, the UI is perfectly intuitive and very efficient. When you want to do something in git, it generally requires just a single command - you just have to know what you actually want to do.

Attempts at different UIs fail because they're all trying to put an abstraction over top of git that doesn't actually reflect the underlying data. As a result, they're limited to the set of git functionality that overlaps their abstraction, and the tools are less powerful.


This is a good article:

https://stevebennett.me/2012/02/24/10-things-i-hate-about-gi...

> Once you understand Git's data model, the UI is perfectly intuitive

So it's not intuitive at all.

Not to mention that every damn command is inconsistent with every other command! To remove a file, git rm. To remove a branch, git branch -D. To remove a commit, git reset --hard HEAD^. How is this intuitive, consistent, or even sane?

"I understand git" and "git is easily understandable" are completely different. Git is not easily understandable, at all.


`git reset` doesn't simply remove commits. Depending upon what you reset to it could result in `git log` showing additional commits, or an entirely different set of commits. Conversely, there's no invocation of `git rm` which creates files. In this case, the commands look different because they do totally different things.

You'd have a better argument with `git rm`, `git branch -D` and `git remote remove`. :)
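A quick sketch of the "reset can add commits" case, in a throwaway repository with made-up commit messages:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name Demo

for n in 1 2 3; do
  echo "$n" > file.txt
  git add file.txt
  git commit -q -m "commit $n"
done
tip=$(git rev-parse HEAD)

git reset -q --hard HEAD~2            # log now shows fewer commits
before=$(git rev-list --count HEAD)

git reset -q --hard "$tip"            # same command, log now shows more commits
after=$(git rev-list --count HEAD)

echo "$before -> $after"
```

The same `git reset --hard` moves the branch backwards in one call and forwards in the next; it just points the branch wherever you tell it to.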


Yeah, that was the first thing that popped into my head, but examples certainly abound.


I mean, I'll agree, if you insist on trying to create an abstraction to understand git, you can beat your head against the wall for days trying to figure out how it works. On the other hand, if you just take a couple of hours to really try to understand what it's doing and why, it's not that hard, and it's way more effective and powerful than any other tool out there. Personally I like tools that are powerful and efficient once learned over tools that I can use without any learning.

Note that I never said git is easy to learn, just that once you take the time to understand it, it's actually quite natural and intuitive.

That article you linked is clearly from someone who liked how simple subversion was and is annoyed that git requires more than 15 minutes to learn. But there's a reason almost no one is using subversion anymore.


> Personally I like tools that are powerful and efficient once learned over tools that I can use without any learning.

I prefer powerful and efficient tools that I can use without any learning, since the two aren't mutually exclusive.

> if you just take a couple hours to really try to understand what it's doing and why it's not that hard,

What is it in git's architecture and design that mandates that a file should be removed with "rm", a branch with "branch -D" and a remote with "remote remove"? What about its fundamental architecture makes it so that the commands can't be "git rm", "git branch rm", "git remote rm"?

Nothing, and this is the crux of the argument. Git's porcelain is inconsistent, unintuitive and poorly designed.

> Note that I never said git is easy to learn, just that once you take the time to understand it, it's actually quite natural and intuitive.

What does "intuitive" mean if not "easy to learn"?

> That article you linked is clearly from someone who liked how simple subversion was

Maybe so, but that doesn't invalidate its arguments.


> I prefer powerful and efficient tools that I can use without any learning, since the two aren't mutually exclusive.

They're not necessarily mutually exclusive, but in my experience there's often a trade off. I would argue that git is about as simple as it can be without taking power away from the user by forcing an abstraction on him.

> What is it in git's architecture and design that mandates that a file should be removed with "rm", a branch with "branch -D" and a remote with "remote remove"?

Yeah... I have to agree that the choice of command line arguments is the weakest element of git.

> What does "intuitive" mean if not "easy to learn"?

The two often go together, but aren't necessarily the same thing. Something is "intuitive" if the correct thing to do is natural and obvious without a whole lot of thought. Git isn't intuitive before you understand the data model, but once you do, you don't have to spend a lot of time figuring out how to accomplish things, so it becomes intuitive. I would argue that by contrast, subversion is intuitive out of the box, but as soon as you want to do more sophisticated things it becomes rapidly counter-intuitive and very difficult to work with.


> there's often a trade off

I agree with you there. It doesn't mean that things are as bad as "weak tools or hard-to-learn tools", but there's a tradeoff.

> I have to agree that the choice of command line arguments is the weakest element of git.

I think there's a fundamental misunderstanding here. Git's conceptual model is hard to learn, but there's no way around that. If you want to be proficient in Git, you have to understand the conceptual model, and people find it hard and mostly give up, and say that git's core is badly designed (which it's not).

This muddies the waters for people who claim that git's porcelain is badly designed (which it is), because then other people mistake that for the former argument, and we end up talking at cross-purposes.

I think we can both agree that git's core/architecture is great, and the porcelain is quite bad.

> Git isn't intuitive before you understand the data model

I think this meshes with my previous paragraph, but I think git could be much more intuitive (and require much less mandatory learning of the internals for someone to be productive with it) if the porcelain were better designed.


> Personally I like tools that are powerful and efficient once learned over tools that I can use without any learning.

You say that as if the two are mutually-exclusive. They aren't.


Most git command-lines are lengthy enough to give me time to consider them, so I don't often feel the need to "take back" a git invocation. What I do often screw up, though, is the (almost hypnotic) tapping of y/n when doing a `git {add, reset, checkout} -p` to prepare and clean up a commit.

Ideally, with all of the -p commands, git wouldn't actually apply any of the changes I specified until it was about to quit (i.e. either when I advance past the end of the set of potentially-affected hunks, or I manually type 'q'), and then would prompt me for whether {set of operations I specified} is what I wanted to do. This would leave the -p operations the flexibility to expose an 'u'ndo.


You'd probably really like Magit mode for Emacs then. What you're describing with interactive add becomes trivial since you can stage parts of the diff by selecting them in a region. I can hardly live without Magit these days. Bonus is that it even works on remote hosts via tramp. </emacs plug>


<vim plug>With the vim-gitgutter plugin installed, you can dynamically add/remove/undo hunks without even leaving the editor[0]. I use this instead of `git add -p` nowadays</vim plug>

[0]: https://github.com/airblade/vim-gitgutter#getting-started


<spacemacs plug />


I can enthusiastically second this recommendation. If you use Emacs and Git, but have not yet tried Magit, you are missing out.


As long as it's not the last one, you can go back with `K` to the previous hunk. And forward with `J`. (The lowercase versions only go to hunks you decided to skip, but uppercase go to all, even ones you already chose an option for)


I worry about naming a function undo that doesn't necessarily undo what the user expects. Undo has a strong user expectation, and I'm not convinced this matches that.


Git already has `git branch` which doesn't in fact do any kind of branching but creates a label which follows commits when it's checked out.


Git also has 'git revert' which creates a new commit, 'git reset' which actually reverts (among other things), 'git checkout' which switches branches... the fact that all the other commands are confusing doesn't mean this new one has to be.


That's all a Git branch is though; so it makes perfect sense for the command `git branch` to do that.


You are absolutely right and I'm sorry to see you getting downvoted. See:

http://bryan-murdock.blogspot.com/2013/06/git-branches-are-n...


It doesn't immediately make any kind of branch, but if you then create commits using both of the labels it will inevitably create a branch in the DAG so it's not terribly misnamed.
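A small sketch of that, assuming a scratch repository and a made-up branch name `topic`:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name Demo

echo one > f.txt
git add f.txt && git commit -q -m 'first'
trunk=$(git symbolic-ref --short HEAD)   # name of the default branch
fork_point=$(git rev-parse HEAD)

git branch topic                         # only adds a second label on the same commit
same=no
if [ "$(git rev-parse topic)" = "$fork_point" ]; then same=yes; fi

# Commit once through each label; now the DAG really forks.
echo trunk >> f.txt
git commit -q -am 'on trunk'
git checkout -q topic
echo topic > g.txt
git add g.txt && git commit -q -m 'on topic'

base=$(git merge-base "$trunk" topic)
echo "labels started on the same commit: $same"
echo "common ancestor is the original commit: $base"
```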


You can look at it as a copy-on-write optimisation. It may not be so poorly named if you look at it that way.


I think git branch only doesn't "branch" if you already have a differing idea of what a "branch" should be from another revision control system.

For example, I consider that CVS and Subversion don't have branches, but just "copies". To my mind, what git branch does is exactly what branching is.


What do you think `git branch` should do?


It creates a branch of zero length.


I have this alias in my ~/.gitconfig:

    cancel = reset --soft HEAD^
I don't want an alias to hard reset; it seems too dangerous and a good way to lose some work. However, a soft reset like this allows me to cancel the last commit and add an omitted file, or remove one from the commit, or simply correct the commit message easily.
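Here is roughly how that plays out, as a sketch in a scratch repository. The alias is set repo-locally for the demo instead of in ~/.gitconfig, and the file names are made up.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name Demo

echo a > a.txt
git add a.txt && git commit -q -m 'first'
echo b > b.txt
git add b.txt && git commit -q -m 'second, but c.txt was forgotten'

git config alias.cancel 'reset --soft HEAD^'   # repo-local for this demo
git cancel

staged=$(git diff --cached --name-only)        # the cancelled commit's files stay staged
echo "still staged: $staged"

echo c > c.txt
git add c.txt
git commit -q -m 'second'
count=$(git rev-list --count HEAD)
```

Nothing in the working tree or index is touched by the soft reset, which is why it is safe to alias.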


Actually, if that's all you want, you can do:

  git add <file>
  # or "git rm --cached <file>" to remove
  git commit --amend
and it will replace with a new commit that has what you want. It's like a mini rebase -i


True, but I prefer to be able to see my staged changes as a whole to be sure that I did everything as I wanted.


Yep, git gives you more than enough rope to hang yourself with.


Nice idea!

The only non-intuitive thing might be that calling e.g. 'git undo' twice doesn't undo the last two changes: the first call undoes the last change and the second one undoes the undo.


Could it be improved to have undo filter out its own reflog entries and instead have a 'redo' to undo those?


I don't think that 'git undo' can tell with certainty which entries in the reflog are from it.

It would be a crude heuristic that's going to break.


Is there maybe a way to annotate the commit message of HEAD without influencing the reflog?


You can't change the commit message of a commit without changing the commit, which in most cases is a bad idea if you don't know pretty much exactly what you're doing.

Replacing one heuristic with another won't make this a stable operation.

A lot of people have already had the idea of encoding relevant information inside documentation, and it has always been a bad idea in the long run.


I get that, I just wondered whether it's technically possible (with supported git operations, not hacking down at FS / byte level).


If you want to undo without reflog entries, just snip the most recent line off the reflog.

If you want to put annotations on commits, use tags with message bodies.

So yes, there are ways to accomplish what you want.


Well, there is git-notes(1), but I'm not sure how it can be used as a journal for reflog operations.
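For what it's worth, a note attaches text to a commit without rewriting it and without adding a HEAD reflog entry. A minimal sketch in a scratch repository (the note text is hypothetical, and this says nothing about whether notes make a good undo journal):

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name Demo

echo a > a.txt
git add a.txt && git commit -q -m 'first'

head_before=$(git rev-parse HEAD)
reflog_before=$(git reflog | wc -l | tr -d ' ')

git notes add -m 'undone once by git-undo' HEAD   # stored under refs/notes/commits

head_after=$(git rev-parse HEAD)
reflog_after=$(git reflog | wc -l | tr -d ' ')
note=$(git notes show HEAD)

echo "commit unchanged: $head_before = $head_after"
echo "HEAD reflog entries: $reflog_before -> $reflog_after"
echo "note: $note"
```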


Be careful with hard resets - they throw away working tree changes. Ironically, it's one of the few git operations you actually can't undo. (Usually I prefix such scripts with "git stash save" due to this.)
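A sketch of that safety net in a scratch repository. It uses `git stash push`, the current spelling of `git stash save`; the file names and stash message are made up.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name Demo

echo a > a.txt
git add a.txt && git commit -q -m 'first'
echo b > b.txt
git add b.txt && git commit -q -m 'second'

echo 'uncommitted work' >> a.txt   # tracked, but not committed

git stash push -q -m 'safety net before hard reset'
git reset -q --hard HEAD~1         # would otherwise have destroyed the edit
git stash pop -q                   # bring the uncommitted edit back

tail -n 1 a.txt
```

Note that a plain `git stash push` only covers tracked files; untracked files are left alone by both the stash and the hard reset.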


It should automatically skip reflog entries created by undo itself, so that git undo; git undo is equivalent to git undo 2, and for undoing an undo there should be a separate git redo.


I agree with that. So you could undo step by step without thinking about the number to put next to your undo command. It seems a bit trickier to implement, though


Only a bit of awk magic:

~/git-undo.awk:

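  # Needs gawk: the three-argument match() is a GNU extension.
  # i (passed in with -v) counts the operations still to undo;
  # jmp is the reflog index currently being examined.
  BEGIN { jmp = 0 }
  {
    match($2, "{([0-9]+)}", c);      # N from "HEAD@{N}:"
    if (c[1] == jmp)
    {
      jmp++;
      if ($3 == "reset:")
      {
        # "reset: moving to <branch>@{N}" left by an earlier undo:
        # jump past the entries it already undid.
        match($6, "{([0-9]+)}", x);
        jmp += x[1];
      }
      else
        i--;                         # an ordinary entry counts as one undone step
      if (i == 0)
      {
        print jmp;
        exit;
      }
    }
  }

  git config --global alias.undo '!f() { git reset --hard $(git rev-parse --abbrev-ref HEAD)@{$(git reflog | awk -v i=${1-1} -f ~/git-undo.awk)}; }; f'

Redo is also possible, but I don't have time to do it now.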


Warrants a big warning that using `reset --hard` will irreversibly wipe out any uncommitted changes.


Also doing `git checkout -- filename` will lose the changes to that file permanently. The only feature I miss from Bazaar was that it would create a backup file automatically for its equivalent command. I have actually created a bash function that overrides git to add this functionality (since there is no pre-checkout Git hook and Git doesn't allow you to create an alias named "checkout").
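Something along these lines, as a rough sketch rather than my exact function. The `.orig` suffix is an arbitrary choice, and it only intercepts the literal `git checkout -- <files>` form.

```shell
set -eu

# Hypothetical wrapper: back up each file before `git checkout -- <files>` overwrites it.
git() {
  if [ "$#" -ge 3 ] && [ "$1" = checkout ] && [ "$2" = -- ]; then
    shift 2
    for _f in "$@"; do
      if [ -f "$_f" ]; then
        cp -p -- "$_f" "$_f.orig"    # keep a copy of the local edits
      fi
    done
    command git checkout -- "$@"
  else
    command git "$@"                 # everything else goes straight to the real git
  fi
}

repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name Demo

echo committed > file.txt
git add file.txt && git commit -q -m 'first'

echo 'local edit' > file.txt
git checkout -- file.txt

cat file.txt        # back to the committed content
cat file.txt.orig   # the discarded edit, saved by the wrapper
```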


Came here for this. git undo won't restore the changes wiped out by reset --hard? They aren't stored on the reflog?


If it hasn't been committed, it won't be in the reflog. Working copies and stashed changes are especially vulnerable to being overwritten by accident. It would be great if git had a way to undo these operations too.


You could alias reset to instead commit all and then reset.
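A sketch of the idea, using a hypothetical helper called `safe_hard_reset` (not a git built-in) in a scratch repository:

```shell
set -eu

# Hypothetical helper: snapshot everything in a WIP commit, then hard reset.
safe_hard_reset() {
  target=$(git rev-parse --verify "$1")   # resolve the target before HEAD moves
  git add -A
  if ! git diff --cached --quiet; then    # only commit if there is something to save
    git commit -q -m 'WIP snapshot before reset'
  fi
  git reset -q --hard "$target"
}

repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name Demo

echo a > a.txt
git add a.txt && git commit -q -m 'first'
echo b > b.txt
git add b.txt && git commit -q -m 'second'

echo 'uncommitted work' >> a.txt
safe_hard_reset HEAD~1

# The snapshot commit is now reachable through the reflog.
recovered=$(git show 'HEAD@{1}:a.txt' | tail -n 1)
echo "recovered from reflog: $recovered"
```

Because the work was committed before the reset, it shows up in the reflog like any other commit.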


But what if I'm just trying to unstage something? I don't think shell aliases can solve this in the general case.


I've also written about undoing things in Git and other productivity tips here: http://eliasdorneles.github.io/2016/06/19/on-getting-product...


I always make sure to keep a `git log` in my terminal scrollback buffer, so I can easily `git reset --hard` back to some revision. And if that fails there's indeed always the reflog to fall back to.


if you're on a mac, the excellent & open-source GitUp graphical git client has undo built right in. it also makes it easy to slice-&-dice your commit graph.


I wish there was a Dropbox for developers where you never had to worry once about commits or history or tags. Maybe just a big green "Release" button and that's all there is to it. Surely it won't be ideal for writing the Linux operating system, but for most cases that would be more than enough to get the job done, keep everyone in sync and, yes, be a lot less confusing, allowing you to focus on things that really matter.


So what happens when two developers edit the same file, or make changes to different files that in combination break the system?

You need commits, you need the ability to merge. If you don't want to force all commits to happen online and everyone to resolve conflicts immediately then you need branches. You want tags if you're going to have releases (otherwise how do you refer to them?). At that point you basically have git.

All the complicated features were added because someone thought they needed them (there are certainly a few git features where I think that someone was wrong, but not many).


If that's what you want, you can just use Dropbox itself to store code and use one of the several deployment tools which can pull code from it (e.g. cloudcannon).

It quickly falls apart though as you don't have a proper history, reverts, or branching - it's OK for static sites but a disaster for anything more than that.


Have you ever tried subversion?


The alias is only useful in the simplest use case, where you just want to undo the single latest change. Reusing undo to revert the undo is counterintuitive.

If you want to know how many steps back you need to undo, you still need to check the reflog. This means you're better off just resetting manually to the change you want.


I find reflog one of the neatest git features, and regularly use it to get out of jail / help cherry-pick across branches. This seems a bit magical, but I think there are some pretty neat lessons within for lots of users which still makes this a great read.


I find it wise to give git a command for reflogging. Flogging it only once would definitely not be enough by a long shot. It might afford the tortured git user some release, although it won't change git any.


This seems dangerous to get used to. I always thought the reflog was supposed to be last-resort. Isn't it?


That doesn't sound right. There's nothing secret or internal about the reflog. "reset --hard" is more of a last resort, but only because it wipes uncommitted changes.


I have this. It's called cp(1).


Granted, you can shoot yourself in the foot with cp(1), but not nearly as much as git(1) lets you.


Every Git thread makes me so happy I use Perforce.


What's wrong with git revert?


Why not just use a filesystem which supports snapshots? It would allow you to go back without even invoking git.


Why not just boil the ocean?


Using a better filesystem isn't that hard. Sure, on windows you have to set it up as a network drive, but that's a few minutes of effort.


Because a simple, but clever, alias that integrates with everyone's existing workflow is a lot easier?


1. calculating differences

2. merging two or more different branches

3. transporting code (publishing it)

3a. accepting somebody else's patches

4. descriptions of history points

4a. pointers to parent code trees (especially with tree merges)

5. history traversals (bisecting, among the others)

Not to mention that you need either administrative privileges for creating a snapshot or a special kind of filesystem that supports this for non-administrator. And sysadmin still needs to prepare such a filesystem for your $HOME.


What are you talking about? amelius is mentioning an alternative to the hacky "git undo", not an alternative to git. You put your git repo onto the filesystem.

As far as administration, most devs will have elevated rights or can use FUSE or something. We could talk about needing administrative privileges to install git, too, but it's pretty far removed from the central topic.


> amelius is mentioning an alternative to the hacky "git undo", not an alternative to git.

Ah, so I misunderstood. Still, changing the filesystem under $HOME (or whatever is the working directory) is more difficult than using some user-level tool.


That was called ClearCase.



