A Git catastrophe cleaned up (plover.com)
274 points by asymmetric 301 days ago | 260 comments



The author concludes by saying:

>I think I've written before that this profusion of solutions is the sign of a well-designed system. The tools and concepts are powerful, and can be combined in many ways to solve many problems that the designers didn't foresee.

I disagree. I consider this to be a failure of Git. The set of different options (normal merge, rebase, filter-branch, etc) is complex and not cleanly orthogonal which makes for a very messy "mental model". Even experienced experts would have difficulty finding the clear, simple way to solve this problem and those less experienced would have little chance of proceeding cleanly.

I really wish some tool other than Git had "won" the version-control race; I honestly believe Git to be the worst of the contenders in the most recent generation of version control systems (albeit better than the previous generation in important ways).


Allow me to disagree with your disagreement.

Git fails often for the "basic" use cases. I won't lie. The number of options is intimidating for a beginner and one can easily get themselves into trouble. This is why there are Git guides and tutorials and UIs and they are all ultimately unsatisfying when you're a beginner.

However, the use case discussed in this article is NOT a beginner use case. It is an advanced case, and the fact that Git is an advanced tool that supports it (both natively with git-filter-branch, as the author found out, and ad hoc, as was their original plan) is a testament to the power of the system.

Tell me how you would better achieve the requirements that were the premise for this article in a different VCS?


I agree with your disagreement. Furthermore, I'm curious how this edge case could have been mitigated using an alternative tool? Most of my VCS experience is with git, and a very little with SVN.


As an experienced Hg user, I will say that what the author did would be about the same pain level in Hg (maybe even less if you use some extensions). Regardless, it is probably less painful with whatever tool you are most familiar with.

I would argue the author probably could even have used SVN (not that I would recommend that at all), given that he did some pretty manual scripting anyway (looping through commit ids, etc.).

Of course, it is hard for me to tell without seeing his exact repository (files).
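For what it's worth, the kind of manual "looping through commit ids" scripting the author did looks roughly like this in git; the branch names `master` and `topic` here are hypothetical stand-ins:

```shell
# Replay the commits unique to "topic" onto the current branch, one by one.
for sha in $(git rev-list --reverse master..topic); do
  git cherry-pick "$sha" || break   # stop at the first conflict and resolve by hand
done
```
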


In SVN, this case would completely explode into manually applying a big patch.


"The easiest strategy in a case like this is usually to (go) back in time [...] That strategy wasn't available because X had already published the master with his back-end files, and a hundred other programmers had copies of them."

"A hundred other programmers had copies of them" isn't a problem/cannot happen with svn, because svn uses a single master repository. Fix it, and everybody can fetch it.

That leaves the "going back in time" step. That isn't easy with svn. If you have a backup of your repository, you can create patches for the commits made after it, restore from backup, and replay those you want to keep.

If you don't have a backup, the svn administrator can use svnadmin to dump and restore revisions and remove them (http://superuser.com/a/315138). I think that will be a lengthy operation if your repository has lots of history. It also requires free space to create a copy of your repository. Disclaimer: I have never used this tool.


> If you have a backup of your repository, you can create patches for the commits made after it, restore from backup, and replay those you want to keep.

Maybe I'm just too young to have a good perspective on this (due to only becoming a developer after git was fairly well established), but I thought that version control itself is supposed to be your backup?


A backup is by necessity a copy; with centralized version control like svn, you don't have a copy of the repository, so you have to make one some way.

(Decentralized version control is only slightly different; if you lose the 'main' repository and don't have an exact clone (aka a backup), you may be able to glue together something from other repositories that is close to or equal to what you had, but there is no way to know for sure)


It is, for as long as your repository is undamaged.

It's rare, but sometimes a repository can get damaged beyond repair. Keeping it inside a cloud storage directory like Google Drive, Dropbox or similar makes it prone to fatal errors.

In such a scenario, a backup of the full repository is required. In git's case, remote sources can fill the 'backup' role pretty easily, as everything gets pushed to a remote server. This is obviously not enough safety for a bigger enterprise, which will need additional safeguards and backups, but it's sufficient for most people.


The edge case could be mitigated by having a pre-defined development workflow and not seat-of-the-pants cowboy commits of "backend" and "frontend" changes that were in reality interwoven. Or even a basic code review where someone hopefully would say "WTF are you doing!?"


Agree - this should have been a single push, and the dev should have squashed commits together where it made sense to. I get that in the dev stage you might be all over the place, but a well-designed codebase with separation of concerns should have allowed the dev to group like commits into single areas and bring it down to something more manageable, like 20 commits instead of 400.


DVCS is fundamentally a simple concept. And, every use case should be possible with a series of elemental operations. Any 'advanced' use cases can and should just be 'macros'.


> DVCS is fundamentally a simple concept

It is, if you're going to simplify it so far as to avoid the problem space by nomenclature. It's just transforming nonlinear events into a linear sequence! I guess multithreading is also a fundamentally simple concept, in that vein.


I don't mean to minimize the effort that went into Git. But, have you ever looked at the source code? It's like someone dropped the kitchen sink into the global namespace. The whole code base is full of premature optimizations. And, there's a lot of low-level infrastructure code coupled in with the core logic.

I know abstraction is out of fashion these days, but it may have helped a little in organizing things. Abstraction, when used properly, can almost always make complicated things more simple.

Basically, if a regular person had created Git, and not a demigod, would it even have become what it is?


We would have hg.

As a user, I'll take premature optimizations over slow-as-balls any day.


The problem with premature optimization is that most of it has little effect on overall performance. Software performance almost always follows a Pareto distribution.

But, wtf do I know, I'm just some regular person.


It's hard to tell - did the premature optimizations allow Git to extend way beyond the initial release with little worries of performance? It's not something that's easily quantifiable.


Not sure why you are being down-voted. What you describe is the philosophy of e.g. monotone, which works quite well.


Simple systems can appear complex because there are many buttons. You often only have to use a few buttons; the rest are for special occasions.


What is advanced about a partial checkin?


Nothing. The author stated though that the merge had to be done:

> without losing the commit history and messages

Otherwise a "partial checkin" (which was exactly what "co-worker X" did) would have worked fine.


Ah okay, my bad; I was misreading the article. I get it now. The topic branch commit history, where the co-worker built up the branch, would be lost if you just tacked the leaf node onto the master branch.


I see what you mean about the messy mental model, and I do 100% agree that in a well designed system most problems should have one obvious best solution. That said, I'm not sure that's a fair criticism of git.

For instance, in this case, as mentioned below, the author of the article should have just reverted the commit where he checked out the files, then done a normal merge. If you're thinking of everything at the commit level, it's pretty much the obvious thing to do. The first obvious thought is to just backtrack (as mentioned in the article), then once you realize that's off the table (since the code is out in the wild) the very next obvious idea is to revert. Revert is what you do when you can't just undo. It's not even as if his commit involved any merges, so the revert is super easy.

I guess I just don't agree that experts would have a hard time finding the simple solution - it was the first thought that popped into my head, and I read the article mostly looking for the explanation for why he didn't just do it.

While git does provide near complete control, you almost never need to use more than the basics, and it doesn't have to be confusing.

EDIT: Just to make very clear why I think the solution is obvious, the first thought that I think will occur to almost anyone is, "if I can just bring the master branch back to its prior state, then I can merge like usual". The beauty of git is if you can form a coherent idea like "bring the master branch back to its prior state" then you can probably do it, and it probably isn't hard. The mistake the author made was to have that thought, and then instead go off down a rabbit hole, and throw out the original simple idea.
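A minimal sketch of the revert-then-merge idea, assuming the offending commit is `BAD_SHA` on master and the co-worker's branch is called `backend` (both names hypothetical):

```shell
git checkout master
git revert --no-edit BAD_SHA   # a new commit that undoes the premature check-in
git merge backend              # now an ordinary merge; the topic history survives
```
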


If you revert, how do you guarantee that the final result is identical to the previously deployed and tested version?


He'd need to revert like 400 commits!

Woah that's a real messy one. I hope everyone walks away with a feeling that merging 400 commits in what sounds like a CD env is asking for trouble...

I'm probably not going to earn extra ninja points here, but in that situation I'd have turned off the deploy Jenkins job, force-pushed master back to sanity, squashed these commits into a sensible number in a new branch, rebased it onto master, and gotten someone (several someones) to go over the pull request, which should be able to merge without conflict. (Fix conflicts in the branch!)

Maybe you'll have to get your devs to make a new branch locally and cherry pick their stuff back in, but urgh; I think the solution in the article would have taken me ages to RTFM enough to work it out...

The lesson here is to merge less, more often, or to pay more attention when doing evil things like this, should you get into that kind of situation...


If you don't care about the history, you can just squash the merge. Otherwise, it's not like those commits on the topic branch cost you anything, and they might have important context.

I do agree though that my first step would have been merge from master to the topic, or rebase the topic onto master.
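A sketch of the squash option, with a hypothetical branch name `topic`:

```shell
git checkout master
git merge --squash topic          # stages the combined diff but creates no commit
git commit -m "topic, squashed"   # one commit on master; topic's history is untouched
```
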


After you merge everything you could just diff the originally committed files against the deployed and tested versions. If there are any differences, then those need testing!

Presumably if you're merging a bunch of other changes in anyway, you're going to need to do a bunch of testing in any case, that's just an inevitable result of splitting the deployment into two phases.
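One hedged way to do that check, assuming a hypothetical tag `deployed` marks the previously tested release and the back-end files live under `backend/`:

```shell
# Empty output means the merged result is byte-identical to what was deployed.
git diff deployed HEAD -- backend/
```
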


But how do you revert a commit if there were subsequent commits that changed the same files? (If I'm reading the original post correctly.)


I think grandparent meant a "git revert", which is actually a distinct operation (introduce a commit which is the inverse of a given commit) from resetting a branch pointer.
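The distinction, side by side (the before/after history is noted in the comments):

```shell
git revert --no-edit HEAD   # A-B  becomes  A-B-B', where B' inverts B; safe on published history
git reset --hard HEAD~1     # A-B  becomes  A; B is discarded from the local branch only
```
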


Well, in the worst case, you can literally revert all subsequent commits, merge the topic-branch, then reapply the commits.

In most real situations you'll probably find that only a small number of commits actually need to be reverted and reapplied, especially if your team is good about having reasonably self-contained commits.

If you're concerned about filling the git logs up with noise, you can always use rebase to combine any revert/re-revert commits into single log entries.


> The set of different options (normal merge, rebase, filter-branch, etc) is complex and not cleanly orthogonal

All of those commands you just listed are clearly defined and serve different purposes. In other cases, most commands or command flags simply help the user do something in an automated fashion that you could do by typing each specific command out (i.e., git pull --rebase). Git's core tooling is mainly centered around graph manipulation, everything else has been added due to convenience. And quite frankly I've never really understood the criticism people levy at git for being "complicated."

It's stupidly simple at the end of the day, especially compared to the hodgepodge of features that other VCSs provide, all of which don't really work together and are designed to cover some gap in what should be handled in the core design of the VCS (I'm looking at you, Mercurial branches and bookmarks). All tools require you to be familiar with them. There's no getting around that.

> Even experienced experts would have difficulty finding the clear, simple way to solve this problem

I'm not sure how you can possibly say that, the answer is pretty obvious to a lot of us reading this article. This just feels like baseless whining.

There is a litany of literature out there that could have easily pointed the author to a well defined, accepted, and documented solution.


> It's stupidly simple at the end of the day

That reminds me of a joke. In a particularly convoluted math lecture, the professor writes on the whiteboard and explains at the same time: "from here we can state that, obviously..." He suddenly stops, slowly turns his head down, and keeps mumbling, "so we can obviously...". He slowly walks around the room, trying to catch the thought, then leaves the room... Some minutes pass. Then he comes back, head up, eyes wide open, and confidently continues: "Yes, this is completely obvious, so from here we can easily get the following conclusion."

Git has a lot of nice explanations. However, either this is a particularly hard edge case, or all those great explanations somehow fail to convey those core "stupidly simple" ideas to actual Git beginners. At least to some of those beginners, and not an insignificant number.


I mean, I just find any defense of Git's "simplicity" hilariously unobservant. I'm sure that to many, Git is perfectly simple in design and UX: clear and well thought out.

But, for many others, Git is a confusing mess. And, i should think, nearly everyone has heard someone(s) call Git confusing. I don't think it is up for debate that Git is known for being confusing.

Who cares if it "is" confusing or not - that's relative. Most things aren't confusing if you've learned the subject matter.. right? It is known for being confusing though, and that is clearly to people who don't know Git. Ffs, there wouldn't be debates about Git's UX every month on HN if it was so utterly non-confusing, haha.

Some things are very simple and easy to learn, use, and adopt. Git is not one of them, clearly and objectively.


One thing often missing from the debate is this: version control is a fundamentally complex problem. In my view, it's unfair to blame this complexity on the tools. The cockpit of an airplane is famously complex not because Boeing and Airbus wanted it that way but because flying a plane is complex, and because they couldn't make it any simpler.

Git has a fundamentally sound architecture, and with a correct mental model of what's going on you'll be able to deal with pretty much any situation. Nevertheless, you won't get beyond the simple use cases if your conceptual picture is incorrect, but that'll be the case for any VCS. I don't believe that git is necessarily superior to mercurial or take-your-pick, but let's not disregard that we're dealing with something difficult.


Version control is complicated, but git adds a whole other layer of bullshit on top of it. Why do you use checkout to change a branch, create a branch, and revert a file? Not because version control is hard, but because, as a CLI, git does many things wrong (example from http://stevelosh.com/blog/2013/04/git-koans/).


> Why do you use checkout to change a branch, create a branch and revert a file

git checkout lets you check out files from a particular point in time. If you don't specify a path, it will check out every file. If you don't specify a point in time, it defaults to HEAD (like every git command). I fail to see what is confusing about that. You're not "reverting" changes to a file, you're just checking out a version of the file from the history.

`git checkout -b` is shorthand, and is shorthand that I personally don't use.
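Spelled out, the forms under discussion; `feature` and `src/f.c` are placeholder names:

```shell
git checkout feature             # switch to the branch "feature"
git checkout -b feature          # create "feature" and switch to it (the shorthand)
git checkout HEAD~3 -- src/f.c   # restore src/f.c as it was three commits ago
git checkout -- src/f.c          # restore src/f.c from the index
```
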


> One thing often missing from the debate is this: version control is a fundamentally complex problem.

No, it's not!

I've been teaching SVN to Electronics Engineers, Designers and other folks who have never used version control before. It's always been very straightforward. They'd be able to use it effectively after a quick tutorial and demo, with a few notes of advice.

I've been teaching Git recently... until I gave up. It's a giant WTF clusterfuck. Git is the ultimate failure in simplicity and user experience.

Just use SVN (it works on a git repo).


> No, it's not!

Yes it is. SVN is so "simple" because it doesn't have many of the features needed by large teams working on code. It's simply not fit for purpose.

> Just use SVN (it works on git repo).

Unless you need branches (and merge them), code that doesn't get silently corrupted, a proper history of all changes, the ability to work offline without connecting to a server. Y'know, the problems that git was built to solve.


> Unless you need branches (and merge them),

I use svn to merge branches every day. I do agree that git is probably more useful for more complicated setups, but svn is very capable of branching and merging.


> I use svn to merge branches every day.

Has it gotten any better with conflicts? From what I heard a few years ago, it's still absolutely shocking (if two people touch the same file you can forget about merging your branch).


I typically don't have any issues, but I guess that would heavily depend on your workflow.


My problem with Subversion is that it only really works well for trivial cases where you have linear and non-overlapping changes. It's great for configuration management, but not much else.

If you have more than 1 person doing simultaneous changes to a repository which might overlap, SVN becomes a pain to use because its merge handling is basically "Oh, there's a conflict. Here is the whole mess, good luck!" and you lose access to all the tools SVN provides for normal use.

Git (and hg) treat merges as an everyday thing, and a merge conflict does not leave you unable to do basic things such as commit or stage changes with the index, which is a huge help when resolving large conflicts.


I've met people who have given up entirely on merging SVN branches. They just copy changes around manually. Coming from git, I thought their overly-cautious usage of svn was ridiculous. My attitude changed when I ran into revprop issues merging my changes and struggled to discover why some commits seem to randomly be missing svn:mergeinfo. I only had so much time to deal with the issue, so eventually I gave up on fixing the problem and just generated patches to apply manually to the other branch.

One thing I always worried about when experimenting with commands in SVN was that I'd permanently screw up the server. With git, I was comfortable in the knowledge that as long as I didn't push, at worst I would screw up my local repository. Even better, I could just make a copy of that local repository before trying something I wasn't sure about. I haven't had to do that in a very long time, but it was nice while I was learning.

git is not the first version control I tried, but it's definitely the first that really made sense to me. Sure, there's a bunch of badly named commands to memorize, but the underlying concepts they represent are simple.


Personally, I find darcs lovely to use. It avoids complexities like branches (you already have a filesystem - if you want to work on multiple things, you can have multiple checkouts) and provides a simple UI that shows you what you're doing every step of the way by default.


Try Mercurial. Its mental model maps more directly to SVN's.

I have successfully taught it to people like CEOs and CTOs with very little difficulty.

The problem with Git is that it has 3 areas of state (working-index-repo) instead of 2 (working-repo).

That makes things HORRIBLY complicated for the 99% of useful use cases.


+ there are nice tools like RhodeCode to work with Mercurial repositories.


Git is not "simple" from the point of view of new users. But it is "simple" from the point of view of the programmers of Git. There's the graph structure and some tools to manipulate it. It doesn't do much to hide that.

The underlying code of the application I'm writing would be much "simpler" if I didn't have to add all kinds of features to make it easier for the user.

Also, Git is not so much a VCS application as a framework for version control.


> "from here we can state that, obviously..." He suddenly stops, slowly turns the head down, keeps mumbling "so we can obviously...". He slowly walks in the room, trying to catch the thought, then leaves the room... Some minutes pass. Then he comes back, with head up, widely open eyes and confidently continues - "Yes, this is completely obvious - so from here we can easily get the following conclusion."

I had a lecturer who did this! With the only difference being he didn't leave the room, he would just stand there speechless with a poker face and then - 0.5-1 minute later - he'd say "well, this is obvious".

SUDDENLY, during the exams he expected us to explain the obvious.


There is a good complementary story to this, of the math professor who is asked how to solve a math problem by a student. The professor pauses, then says the answer. The student asks again, "But how do you solve it?" The professor pauses again, says the answer again. When the student protests, the professor says, "I already solved it two different ways! What more do you want?"


My problem with git is that it has a simple model, with a horrible GUI on top.

For example, what does the following command do?

    git checkout a/b
I've had confused users manage to produce 3 different outcomes from this command:

* Get a clean copy of the file a/b from the current branch

* Checkout the branch b from remote a

* Checkout the local branch 'a/b'

Because it's confusing where you can use particular types of arguments, I've found students often end up in these situations (in particular, lots seem to end up with local branches called 'remote/branchname', or files called 'remote/branchname'), which tends to lead to massive confusion.

Life would be much better if git were a more direct mapping to its underlying model, and didn't try to be so clever about guessing what users want.


You asked what the command does, and it will do plainly what you tell it to; but you're treating the fact that a/b can be anything you make it as somehow a fault of checkout, and all because it decided to be nice and try to guess what you really want. I don't understand that line of thought.

The git documentation clearly states what happens in your scenario:

> ARGUMENT DISAMBIGUATION

> When there is only one argument given and it is not -- (e.g. "git checkout abc"), and when the argument is both a valid <tree-ish> (e.g. a branch "abc" exists) and a valid <pathspec> (e.g. a file or a directory whose name is "abc" exists), Git would usually ask you to disambiguate. Because checking out a branch is so common an operation, however, "git checkout abc" takes "abc" as a <tree-ish> in such a situation. Use git checkout -- <pathspec> if you want to checkout these paths out of the index.

See here: https://git-scm.com/docs/git-checkout#_argument_disambiguati...

If you somehow manage to purposefully name things in a confusing manner, you can still tell git exactly what you want it to grab at any point in time. I don't see how this could possibly be a fault of the tooling.


We might just have different points of view here.

Obviously one can read the documentation carefully and read what's going on, but in practice most users don't. Also, to be fair to users, they often don't know what they don't know -- they aren't aware they should be reading how checkout does argument disambiguation when they start getting weird error messages.

On the other hand, if git didn't allow '/' in branch names except as a remote/branchname separator, then we would know what '/' meant; this is what almost everyone does in practice anyway (there may be a deeper issue here I'm not aware of).

Similarly, if git didn't have commands which could take either a <tree-ish> or a <pathspec>, say have "checkout-file" (that's an awful name, but I'm sure we can find a better one), then there would be no need to have this special bit of documentation, and describe to people how to get around the issue.

My experience of teaching is that git guessing what people want makes learning git harder, as they have to learn what are the fundamentals, and what is git being "helpful".


Yep, there are a lot of best practices you learn after using git for a while that are not obvious initially, like not using '/' in branch names (though I used to have a coworker who always named branches like 'fix/something' or 'feat/something'; well, at least there was a pattern there).

I also don't use most of the git commands directly, I have a bazillion of aliases that are shorter and/or more intuitive.

To be honest, there were some changes for the better in the CLI. For instance 'git push' no longer defaults to the crazy 'matching' behavior, but 'simple' instead.

I guess the problem is perhaps that most of the time, people get used to the quirks and don't report them (maybe I'm mistaken).

In fact, if git's bug tracker were hosted on GitHub, I'd probably be more active in reporting and discussing things. I'm not an old-school mailing-list kind of guy, I guess, and maybe I'm not the only one (but I believe many git maintainers would say it's a feature that the barrier to entry is not low).


I always use '/' in the branch name and have never had any issues. Can you explain why this would be an issue?


The only issue I am aware of is that beginners often assume, based on basically every tutorial, that '/' has a special meaning: that it is used to denote remote/branchname.


To be fair, issues like this are really only UI issues with some git commands, not necessarily git as a whole. The GP post was probably lamenting a much deeper issue with the design of git.


Mercurial bookmarks are equivalent to git branches, but Mercurial branches have no equivalent in git. In Mercurial, branches give you a way to track which branches a commit is part of, which is good for long-running branches (I think of things like master, or maybe a long-running refactor or major feature branch), while bookmarks are great for the more normal workflow of small, focused feature branching. Neither of those is designed to cover a gap in the core design; they are a _feature_ of the core design that git doesn't even have.


>branches give you a way to track which branches a commit is part of, which is good for long-running branches

I think you have a slight misunderstanding of the differences between git and Mercurial, as git absolutely allows you to do this. Mercurial tends to track this information in a way that actually limits telling you which branches a commit is contained in. Git will tell you straight up which branches lead back to a commit (i.e., contain it). Mercurial tends to only give you immediate, direct ancestors; with any kind of complicated history, Mercurial can't be bothered to dive in and tell you which branches a commit is really a part of.

That seems to be strangely counter-intuitive for "long running branches."

Git doesn't need an equivalent of Mercurial branches and bookmarks, because git branches do everything both of those mercurial features do and more, in a far more lightweight and efficient manner contained in a clean, singular feature.
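For the record, the "which branches lead back to a commit" question in git, with `$SHA` standing in for the commit of interest:

```shell
git branch --contains "$SHA"      # local branches whose history includes $SHA
git branch -r --contains "$SHA"   # the same, for remote-tracking branches
```
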


> Git doesn't need an equivalent of Mercurial branches and bookmarks, because git branches do everything both of those mercurial features do and more, in a far more lightweight and efficient manner contained in a clean, singular feature.

Not quite everything. Since Mercurial branch names are baked into the commit, if you want to know which branch a commit was originally part of, you can find out.

Git does not support this. If two branches have parent commits in common, it technically belongs to both branches.

There is a valid conversation to be had on if this feature is actually necessary or makes sense in the context of a DVCS, but git branches are not a full substitute for Mercurial branches.


> If two branches have parent commits in common, it technically belongs to both branches.

It belongs to both branches as well in Mercurial, it just hides that information from you. An astoundingly bad idea, no? I'm not sure how mercurial hiding information from you is a "feature" in any sense of the word.


Hiding or encapsulating information is arguably part of the very nature of an abstraction; the CLIs provided by git and Mercurial could be considered an abstraction of the implementation details of the respective technologies.

As such, I fully expect each tool to hide some amount of information so that I don't have to think about the details.


There's no easy, quick way to represent the actual state of the commit. It's far easier in git because there's only one way to do it. Furthermore, the information being "encapsulated" can actually become stale or misleading in Mercurial.


Mercurial records in every commit the branch it was made on. It is part of its identity. That is what the parent is talking about. Git does not do this, because git does not have this concept of named branches.

In other words, Mercurial has feature branches, while git has developer branches. This makes many more workflows possible in Mercurial's case.

>Mercurial tends to only give you immediate, direct ancestors; with any kind of complicated history, Mercurial can't be bothered to dive in and tell you which branches a commit is really a part of...

http://stackoverflow.com/questions/7166011/mercurial-find-al...

Is that what you are talking about?

Mercurial contains vastly more querying capabilities in the form of revsets. No such thing exists for Git.

If you go head to head with git vs Mercurial, git will lose pathetically.


>Git does not do this. Because git does not have this concept of named branches.

It doesn't pre-bake the information into the commit, because it's entirely useless information to bake into a commit; and even in Mercurial's case it can paint the wrong picture about the nature of a commit, given the branches it's on.

>This gives much more possible workflows in case of Mercurial.

That is the exact opposite of what it gives you. I'm not even sure how you possibly arrived at that conclusion.

>Mercurial contains vastly more querying capabilities in the form of revsets.

Git doesn't need a query language, because you can do literally everything revsets give you with git log and maybe some very elementary bash scripting. You are literally arguing that Mercurial having its own arbitrary DSL is somehow better than git being able to seamlessly use popular tooling with conventional command-line syntax, which seems to be at odds with the whole "Mercurial is easier to use" claim. I take extreme issue with your assertion that Mercurial has "vastly" more querying capabilities when you don't even give one measly example.
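A few illustrative git counterparts to common revset-style queries; the names `alice` and `topic` are placeholders, and this is a sketch, not a complete mapping:

```shell
git log --oneline --author=alice --since="2 weeks ago"   # ~ author(alice) and date(...)
git log --oneline master..topic                          # ~ commits in topic but not master
git log --oneline --all --grep="fix"                     # ~ message grep across all branches
```
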


>because it's entirely useless information to bake into a commit;

I have found it very useful in untangling a complex history. You might not see the value in it because git does not care to maintain a truthful history, but instead promotes building a "clean" history. It makes sense to do that if you are maintaining something like the Linux kernel. But for smaller projects, maintaining a truthful history is much more valuable.

So I would say that it is extremely valuable information in the context of Mercurial.

And there are other version control systems that follow similar model. Take a look at the '3.3 Branches' heading in another fantastic version control system called 'Fossil' here [1].

>That is the exact opposite of what it gives you. I'm not even sure how you possibly arrived at that conclusion.

Well, Mercurial supports git's branching model and a lot of others. I am not sure how that is not clear.

>Git doesn't need a query language...

Of course. It is not meant to be used by mere mortals..

>because you can literally do everything revsets will give you with git log ...

Are you seriously saying that using command line flags is more powerful and easier than using a query language that lets you use arbitrary expressions to query a data set?

> I take extreme issue with your claim that Mercurial has "vastly" more querying capabilities while not giving even one measly example.

Checkout the link I mentioned in my earlier comment.

This is the manual page for revsets [2]. Please take a look.

[1] http://www.fossil-scm.org/fossil/doc/trunk:2012-01-01/www/fo...

[2] https://www.mercurial-scm.org/repo/hg/help/revsets
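For readers who don't want to click through, a few representative revset expressions of the sort that manual page documents, shown here purely as illustration (branch and author names invented):

```
# Used as: hg log -r "<expression>"
branch(stable) and not merge()        # non-merge commits on the stable branch
author(alice) and keyword(refactor)   # alice's commits mentioning "refactor"
head() and not closed()               # open branch heads
ancestors(tip) and merge()            # merges in the current history
```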


>But for smaller projects, maintaining a truthful history is much more valuable.

Elaborate.

>Mercurial supports git's branching model and a lot of others.

Except, it doesn't completely.

>Are you seriously saying that using command line flags is more powerful and easier than using a query language that lets you use arbitrary expressions to query a data set?

Yes, because command line flags are ubiquitous in nearly every single tool, versus an arbitrary DSL that has no convention behind it, nor can it be easily parsed by other tools in the same ecosystem.

> Please take a look.

You seem to be under the impression that I have not or do not use Mercurial. You would be mistaken. Both of those pages do not lend support to your claim that the ability to "query" (with or without a DSL) the history graph is somehow superior in Mercurial. They merely document the feature, but do not draw any meaningful comparisons highlighting what it can do that git log or even git rev-list (which git log uses) can't.


> Elaborate

For example, to answer questions like "What feature was this commit part of, originally?" and "How and where did the merges happen?". To answer these questions months after the merges happened, even when you didn't think you would need to answer them, something like the workflows Mercurial supports is essential.

>Except, it doesn't completely.

Elaborate?

>Yes, because command line flags are ubiquitous in nearly every single tool

Ok. But that does not mean it is powerful enough to replace, or be better than, a proper query language.

> arbitrary DSL that has no convention behind it

What convention? I am not getting you.

>can be easily parsed by other tools in the same ecosystem.

Parse what exactly?

>Both of those pages do not lend support to your claim that the ability to "query" (with or without a DSL) the history graph is somehow superior in Mercurial. They merely document the feature, but do not draw any meaningful comparisons highlighting what it can do that git log or even git rev-list (which git log uses) can't.

Oh, it does support it plenty. Take a look at the vocabulary supported by revsets, in the form of intuitive and easy-to-remember predicates. I don't think you are really being impartial if you are not convinced by it.


>to answer questions like "What feature was this commit part of, originally"

Neither system will answer that. If you're relying on branches to tell you what feature sets changes belong to, you should be adding that to your commits.

>"How and where did the merges happen?"

Git will do this faster and in a more correct manner. In a complicated history graph, Mercurial will actually hide information from you if you try to query it and sometimes even mislead you. I'm starting to think you haven't used either system extensively.

>Parse what exactly?

Arguments and output.

>Oh it does support it plenty.

It does not. If you can't give an example, you can just admit that you don't know what you're talking about.


>Neither system will answer that....

I am not sure what you are talking about. Every change set in Mercurial contains the branch on which it was made. Even after a merge, there is a clear distinction between the commits made on a certain branch.

I think you might be thinking of rebased change sets. In that case, you would be right.

>Git will do this faster and in a more correct manner..

Please show an example. I would also like to see an instance of Mercurial hiding information from you and giving misleading answers to your queries.

>Arguments and output.

But it is supposed to be used by people, so it should be human-friendly first. And there is nothing stopping you from parsing the output of Mercurial commands. And what is the use case for parsing arguments? Why did you bring that up? Mercurial even has a template feature letting you format the output any way you want. It seems to me that you are not making a whole lot of sense.

>If you can't give an example, you can just admit that you don't know what you're talking about.

There are some examples at the bottom of the revset page I linked. Knock yourself out. And do share some examples for your claims, because your last comments contained enough to make me doubt your experience (the stuff about conventions and DSLs; what was that again?).


> I would also like to see an instance of Mercurial hiding information from you and giving misleading answers to your queries.

A branch with two heads. Two branches that have the same name. Which commit came from where? What happens if that branch was merged into the other? Or what happens if only one got merged into a "mainline" branch, while the other one was reverted elsewhere? It's useless information at best, and misleading at worst.

>But it is supposed to be used by people, so it should be human-friendly first.

You're going to claim a DSL that you have to research is somehow easier to use than simple command line arguments that have decades of convention behind them? Odd.

Git allows you to format output as well. Not sure why you felt the need to bring that up. Again, unless you've extensively used both systems I'm not sure why you feel the need to argue because it's making you look foolish.

>There are some examples at the bottom of the revset page I linked.

And absolutely none of them help your argument.


> It's useless information at best, and misleading at worst.

It seems useless to you because you see it from a git viewpoint. That is clear when you say "two branches that have the same name": it is not "two branches", but one branch with two heads. In Mercurial, a branch is a set of commits. That view is incompatible with the thing that git calls "branches".

If you think about it, Mercurial's implementation of a branch is closer to the idea of a "branch". As I said earlier, Mercurial branches are "feature" branches. I am not sure if you have gone through the Fossil doc page I linked earlier. When you see it like that, it makes complete sense that a branch can have any number of heads; say, a second head containing a different implementation of the same feature.

(God I miss Mercurial)

>Which commit came from where? What happens if that branch was merged into the other? Or what happens if only one got merged into a "mainline" branch, while the other one was reverted elsewhere? It's useless information at best, and misleading at worst.

These questions are the result of trying to make sense of Mercurial branches in a git way.

>what happens if only one got merged into a "mainline" branch, while the other one was reverted elsewhere.

Again, this question is a result of you seeing Mercurial branches in a git way. Because git defines a branch as the set of commits starting from a "head", the idea of merging only one of the heads does not make sense to you.

But you can do the same with git, right? You can move back a couple of commits from a head and create a new head by starting to commit from there. Only in git, the previous head would be hidden, and you would have to go to the reflog to dig it up. But in Mercurial, things are much clearer and more exposed (contrary to your accusation of it hiding and misleading).

So again, it feels misleading to you because it is different from git. Mercurial is very consistent in conforming to its own model.

>You're going to claim a DSL that you have to research is somehow easier to use than simple command line arguments that have decades of convention behind them? Odd.

I am not getting you at all here. "Command line arguments that have decades of convention"? What exactly does that mean? And what is this about "a DSL that you have to research"? You don't have to look up git's command line arguments? And you call git's command line arguments simple?

>Again, unless you've extensively used both systems I'm not sure why you feel the need to argue because it's making you look foolish.

I didn't claim that git does not allow you to format the output, just that Mercurial allows it too, so your argument about easily parseable output is moot.


Could you maybe specify better how this is implemented in hg so that I can understand why it would be better than:

    git branch --contains <commit>


When you make the commit, hg records in the commit which branch you were on when you made it. It becomes a permanent part of the commit. Regardless of whether you later merge or even delete that branch, the association between that branch and that commit is still there.

I personally don't understand the use case, and prefer git's way of handling things, but I don't rule out that other people find it useful.


One use case is given branch X, tell me what the branch looked like N commits in the past. Git cannot tell you this because all parent commits are equal. DVCS workflows tend towards lots of short-lived branches where it is less useful (which is why hg has bookmarks).

It can also just simplify certain things. I can tell that a commit was made after the 1.2.0 release to fix a bug by checking that it is reachable from e.g. the v1.2 bugfix branch but not from master.

With hg branches you can just see that the commit was made on the v1.2 bugfix branch. This is very minor, but my experience with hg is that there are dozens of these minor niceties, which is why a lot of hg users are still around despite git having clearly won the popularity contest.
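For concreteness, here is a small sketch of the reachability check described above, done in a throwaway repo (the branch and commit names are invented):

```shell
# Throwaway repo: one release on the main branch, one fix on a bugfix branch.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email a@example.com
git config user.name a
git commit -qm "release 1.2.0" --allow-empty
main=$(git symbolic-ref --short HEAD)
git checkout -qb v1.2-bugfix
git commit -qm "fix crash" --allow-empty
fix=$(git rev-parse HEAD)
git checkout -q "$main"
# The fix is reachable from the bugfix branch...
contains=$(git branch --contains "$fix")
# ...but not from the mainline, so it must be a post-release fix.
if git merge-base --is-ancestor "$fix" "$main"; then on_main=yes; else on_main=no; fi
echo "$contains / on_main=$on_main"
```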


>One use case is given branch X, tell me what the branch looked like N commits in the past.

What am I not understanding here? You definitely can go N commits in the past on each branch, and even any commit. And if you rewrite history you can even use the reflog to really tell what the branch was like (as long as it hasn't been cleaned out).


    ┌─────┐◀──────── master
    │Merge│◀──────── bugfix
    └▲───▲┘
     │   │
     │   │
     │  ┌─┐
     │  │B│
     │  └▲┘
    ┌─┐  │
    │A│  │
    └▲┘  │
     └───┤
         │
    ┌────────┐
    │Ancestor│
    └────────┘
Which branch was commit `B` made on? What did the `bugfix` branch look like one commit ago?


Nice use of Unicode! Do you have a tool to help you construct a diagram like this?


If you were in master when you merged bugfix in ("git merge bugfix"), then you can

  git cat-file -p master
and see which commit, A or B, is listed as the first parent. That parent belongs to master.


> Which branch was commit `B` made on

You can do some basic heuristics to figure it out, but I'm not sure what the point of knowing this is.

>What did the `bugfix` branch look like one commit ago?

`git checkout bugfix; git log bugfix^1`


In the case of merges, how would you tell which parent to look at? Branches might have been deleted, moved, or merged.

> And if you rewrite history you can even use the reflog to really tell what the branch was like (as long as it hasn't been cleaned out).

The reflog is not moved across remotes, is it? If I make a fresh clone of a repository, that doesn't tell me what the branch was like a year ago.


You're right, it's not moved across remotes. I'm just confused as to why the information would ever be needed. It seems not to have any real use case. You can recreate a branch anywhere at any point in the history, so why would you care what commit a branch originally came from? It seems like a solution in search of a problem. Or maybe even a solution to a problem that shouldn't need solving in the first place!

>If I make a fresh clone of a repository, that doesn't tell me what the branch was like a year ago.

This isn't true with Mercurial either. A lot can happen in a year, and it might even make the information misleading (having similar named branches, history deletion, etc). It's useless at best and misleading at worst.


"This commit has a bad commit message. Why did I commit it? Oh, it's on the branch add-search-functionality."

I find that having the branch name that was current when the code was written gives context to the commit: why it was made and what its purpose was. Being reachable from a branch pointer doesn't give that same context. Yes, commit messages should handle this. But I find they don't.

> It's useless at best and misleading at worst.

This has not been my experience. I haven't found a great need for "what's the whole state a year ago"; more so "what was the context around this commit", which I think a mercurial-style branch name answers.


I'm sorry, in what practice are you in where your commit messages are somehow so lackluster that a branch name (not even a PR!) can give you more information about why something is being done?

And to make sure no one responds with some silly "just because you don't have a use for it, doesn't mean others don't have uses for it": I get what you're trying to say. I'm really just not buying it as an actual use case. If your commit message isn't sufficient enough to explain the purpose of the commit (ticket numbers, explanation, etc), it doesn't get approved. If you're relying on a <20 character branch name to give you sufficient context about a commit a year later, I simply have no words.

>which I think a mercurial-style branch name answers.

I find that extremely suspect.


> If your commit message isn't sufficient enough to explain the purpose of the commit (ticket numbers, explanation, etc), it doesn't get approved.

I'm glad this is the case where you work and develop. I have never worked at a company where this is the case. I'm also not perfect about commit messages in my personal code. The codebase I'm currently working on has recent commits with description "review", "add addressDto" and "post rebase fix".

> >which I think a mercurial-style branch name answers.

> I find that extremely suspect.

Ok. If I see "add addressDto" as a part of the branch "google-geocoding", that tells me the context. If I see it as part of "refactor-to-use-hibernate", that also tells me something useful.


>The codebase I'm currently working on has recent commits with description "review", "add addressDto" and "post rebase fix".

I guess what I don't get is that if you have people writing commit messages like that, how can you possibly expect them to appropriately name a branch, which most likely holds far fewer characters.


Far too often the commit message says "what" while the branch name says "why."

Just recently I had to triage a recently discovered bug whose symptoms disappeared about 4 years ago to determine if the bug was fixed or just masked. This was code that I was completely unfamiliar with because the other engineers that might have looked at it were otherwise indisposed working on things that were known for sure to be issues.

The "what" told me the specific change that masked the bug and the "why" informed me of the feature that this supported. That allowed me to generate a new test-case that showed up in the current version. Without both pieces of information I would likely have had to spend a couple of hours inspecting code changes to do the same.


If you can't trust your developers to write acceptable commit messages, why are you trusting them to write code?


Because you're less in the nitty-gritty when naming a branch, so you're more likely to name it with something referencing the large feature you're working on.


I work with two codebases with over 200k commits and over 10k branches each. Sometimes the branch names are useless. Sometimes the commit messages are poor. Often knowing one helps provide more context for the other.

Believe it or not, there were times in the last 20 years when there was insufficient review of commit messages!


It's sometimes very useful, but not that often. It's not great for short lived feature branches. It's for long-term branches. If you don't need those (and most often you do not): great, your life is (relatively) simple; don't over-complicate it. You don't buy the use-case, because the situation is not that common or desirable. But once you get in a situation where you really do need multiple long-term branches developed in parallel, you might be happy to have a tool that helps untangle the web of merges somewhat.


>It's for long-term branches.

Please elaborate how this helps you in long-term branches, instead of just assuming long-term branches aren't something most people deal with on a daily basis.

>you might be happy to have a tool that helps untangle the web of merges somewhat.

This has no relation to long-term branches; this happens with any kind of normal history, and even shared history between repositories (which is a far messier thing to untangle). Yet this feature is unneeded, because the information can be found within git accurately and without being misleading.


> branches give you a way to track the branches a commit is a part of which is good for long-running branches (I think of things like master, maybe a long running refactor or major feature branch)

If you always merge "from the right direction", you can follow the first parent line in git and reconstruct which parent is in the master branch (or in the "heavier" branch in general) and which isn't.

In more complicated situations, Mercurial named branches (commit forever carries the branch name) can be helpful. But you can get pretty far with git, too.
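A quick sketch of that "merge from the right direction" idea in a throwaway repo (branch and commit names invented; it assumes merges were always made from the mainline, here with --no-ff):

```shell
# Throwaway repo: one mainline commit, one feature commit, merged with --no-ff.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email a@example.com
git config user.name a
git commit -qm "mainline: initial" --allow-empty
main=$(git symbolic-ref --short HEAD)
git checkout -qb feature
git commit -qm "feature work" --allow-empty
git checkout -q "$main"
git merge -q --no-ff -m "merge feature" feature
# Following only first parents reconstructs the mainline view:
firstparent=$(git log --first-parent --format=%s)
echo "$firstparent"
```

The first-parent log lists only the merge commit and the mainline commits, not the feature commits themselves.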


> most commands or command flags simply help the user do something in an automated fashion that you could do by typing each specific command out (i.e., git pull --rebase)

A tangent perhaps, but would you mind expanding on this? I use `git pull --rebase` frequently, and I'm not sure how this fits into what you described. Is it the specific command, or the automated shortcut? And what's the counterpart?

Just trying to check my mental model.


"git pull" is just a shortcut for running "git fetch" and then merging (or rebasing) the current branch to its upstream.

IMO this is a stumbling block for a lot of people new to Git -- for example it covers up the fact that "remote" branches are a separate reference ("master" and "origin/master" are not the same branch!).
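The equivalence is easy to demonstrate in a pair of throwaway repos (paths and the branch name below are placeholders created on the fly):

```shell
# An "origin" repo and a clone that has fallen one commit behind.
work=$(mktemp -d)
cd "$work"
git init -q origin
cd origin
git config user.email a@example.com
git config user.name a
echo v1 > file.txt
git add file.txt
git commit -qm "v1"
branch=$(git symbolic-ref --short HEAD)
cd "$work"
git clone -q origin clone
cd origin
echo v2 > file.txt
git commit -qam "v2"
cd "$work/clone"
# The long-hand form of `git pull`:
git fetch -q origin                                    # only updates origin/<branch>...
behind=$(git rev-list --count "HEAD..origin/$branch")  # ...the local branch is still behind...
git merge -q "origin/$branch"                          # ...until you merge (or rebase) explicitly
merged=$(cat file.txt)
echo "behind=$behind merged=$merged"
```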


"remote" branches are called remote-tracking branches. And yes, it's very confusing. Especially when git status says "up to date with origin/master" but you still pull down loads of stuff when you pull...


From the "git-pull" man page:

> Incorporates changes from a remote repository into the current branch. In its default mode, git pull is shorthand for git fetch followed by git merge FETCH_HEAD.

> More precisely, git pull runs git fetch with the given parameters and calls git merge to merge the retrieved branch heads into the current branch. With --rebase, it runs git rebase instead of git merge.


I had a comment written up, but I decided to test my claim in my original post about a "litany of literature", and sure enough this has been documented elsewhere:

http://gitolite.com/git-pull--rebase

I searched for "how does git pull --rebase work"


Git is certainly the best of the version control systems to date, and I'm happy that it won. Experts have no problems with issues like this.

The described failure mode can also occur in any existing version control system --- the user essentially did

  cp -f ../mybranch/somefile-new.c somefile.c; hg commit somefile.c

The history in mybranch touching somefile.c is now partly useless, since all the patches are already applied. Of course, the branch can still be merged, if the conflicts are resolved --- the conflict resolution is then pretty much doing the above command again.


Honestly, I'm in the exact opposite camp - I'm glad mercurial is still alive and kicking, since I've found my experience with it (and tortoisehg) to be superior in a number of ways. Fewer cross platform issues, a much cleaner commandline interface, can hop over to a much more powerful gui if needed -- I generally use it even when I'm interacting with a git repo.

That said, I'm far from an expert in git. What do you see as its advantages? (outside of the larger mindshare, obviously :)


The UI story of Hg is much better than Git's. The conceptual model of Git looks superior, though: it's more general and more orthogonal.

I wish some well thought-out front-end to Git became hugely popular and eventually standard. I mean, including the CLI. Git even has a "plumbing mode" to interact with other programs more easily.

But doing that is hard work.


Although I'm quite familiar with git on the command line, I'm using git mostly through magit[1] for Emacs. It's a very functional and discoverable user interface.

[1]: https://magit.vc/


Same here! Magit is brilliant.

Emacs is a somewhat heavyweight frontend for a VCS, though. OTOH it can be considered a GUI client (and comes with a nice built-in editor).


The fact that you use a Windows-only GUI for Hg is extremely telling. Git is fundamentally a Unix tool that follows Unix design expectations, integrates with the Unix ecosystem, and expects the user to be a competent command line user.


I'd disagree with a number of assumptions in that statement.

For one, TortoiseHg isn't windows only. It's a python+qt based application that works uniformly pretty much everywhere.

The fact that you made that assumption makes me think you may not have explored many alternatives. I'd recommend trying out some other VCSes, just to get a broader picture of how they function. I think doing that helps to get a grip on what your specific needs actually are, as well as a better understanding of the tradeoffs of whatever tool you pick. None of the ones I've seen are 100% superior.

Secondly, I love tortoisehg specifically because of the command line. Doing a simple commit with mercurial just takes `hg ci`. Doing a commit using the gui just takes `thg ci`. The latter one pops up a gui for me to stage the commit, and then I'm back in the console again.

IMO it's the best of both worlds -- I don't have to get knocked out of the commandline until I need to, and I can trigger whatever gui action explicitly, as part of my existing workflow.

There are some things which I think are just fundamentally better with a gui -- whether it's a curses-style "TUI", or proper graphics. Under tortoisehg, if I'm cherry picking for a commit, I can double click to open up a file in meld to make last-minute edits, shelve bits away for later, all kinds of one-off actions which are much more complex from the command line. Not to mention performing complex searches on VCS history.

It's a lack of an equivalently powerful cli-controlled gui for git which has kept me using hg as my primary vcs.


You're making a number of assumptions yourself.

>For one, TortoiseHg isn't windows only. It's a python+qt based application that works uniformly pretty much everywhere.

Yeah, admitted this mistake elsewhere.

>The fact that you made that assumption makes me think you may not have explored many alternatives. I'd recommend trying out some other VCSes, just to get a broader picture of how they function. I think doing that helps to get a grip on what your specific needs actually are, as well as a better understanding of the tradeoffs of whatever tool you pick. None of the ones I've seen are 100% superior.

I have used, in this order, visual source safe, TFS, SVN, Hg, git.

>Secondly, I love tortoisehg specifically because of the command line. Doing a simple commit with mercurial just takes `hg ci`. Doing a commit using the gui just takes `thg ci`. The latter one pops up a gui for me to stage the commit, and then I'm back in the console again.

Doing a simple commit with git is just git commit. What am I missing here? You could alias it to git ci if you want.

>There are some things which I think are just fundamentally better with a gui -- whether it's a curses-style "TUI", or proper graphics. Under tortoisehg, if I'm cherry picking for a commit, I can double click to open up a file in meld to make last-minute edits, shelve bits away for later, all kinds of one-off actions which are much more complex from the command line. Not to mention performing complex searches on VCS history.

These aren't more complex on the command line, they're just done differently. They aren't as intuitive as a GUI, where the button is presented to you rather than having to read about a flag in the man pages. But VCS is a tool you use all day every day, it's worth it to learn about it in depth.


> Doing a simple commit with git is just git commit. What am I missing here? ...

> ... These aren't more complex on the command line, they're just done differently.

The process I outlined was a series of steps which were all part of what I'd consider an average commit (not necessarily "simple"). Invoking `thg ci` I could cherry pick lines, edit files as I'm reviewing the diffs, shelve away others, all within seconds, without shifting context. There isn't a single "flag" to read about which makes typing all those commands out necessarily faster.

For simple commits, I will frequently just do `hg ci` and be done. But if I'm doing a bunch of manipulation, I can click the hunks I want, commit, merge and push, all in far fewer seconds than it would take to type the equivalent set of commands.

I consider the overall job "tell the computer what I want, as efficiently as possible". For some specific tasks, a mouse just is a better method than a keyboard. I think it's good for complex multi-purpose tools to offer multiple UIs, allowing the user to get work done using the most efficient method tailored to their specific task at hand.

That said, I think having them based on the CLI, and using it to trigger a GUI, is a far superior meta-flow; having a GUI trigger a CLI just never seems to work out.


I just recently discovered `hg ci -i` which is a TUI for selecting which changes/hunks to include in the commit. I think it used to be a separate extension but now is part of the main hg distribution. I've tried to use it more now for scenarios where I happened to make multiple modifications to a file while working on a fix/feature and want them in separate commits.


The real reason there are few git GUIs is not that git expects advanced users who can use the command line. It is that it is very, very hard to create a GUI for git, due to the way it is built as a complex web of scripts written in C, Shell and Perl.

This architecture is the reason why projects such as libgit2 had to reimplement a lot of core git algorithms. https://libgit2.github.com/

Mercurial, on the other hand, is more amenable to extension due to being built out of a bunch of Python modules.


You're half right, but coming from the wrong place. The fact that git is hard to make a GUI for is an intentional design choice in git. It was built to be CLI first and shoehorning a GUI onto it is the wrong way to use git. The correct way is to learn how to be comfortable in a CLI. Git's design with that web of shell and perl and C is totally fine and suited to a CLI-first application.


I think different tasks are (more) suitable to different interfaces.

CLIs are great when the space of actions you could perform is large (to nigh uncountable), but the range of subjects is small (the file paths, etc are known). Since you know the subjects, a programming language like `sh` is the most concise way to express a complex action.

GUIs are better in the exact opposite case: when you have a constrained set of actions to perform, but a large number of subjects they may be performed on. In that case, the problem isn't expressing the action, it's picking which subjects to act on (pick these lines to commit, oh except let me edit that one, put that one aside, ok back to the commit i'm staging...).

Many complex tools like VCSes contain both kinds of situations. I think sometimes a CLI is more appropriate, and sometimes a GUI is, depending on the context. IMO a fully mature VCS should make it as easy to invoke a gui for complex tasks as it is to do them from the command line.


>GUIs are better in the exact opposite case: when you have a constrained set of actions to perform, but a large number of subjects they may be performed on. In that case, the problem isn't expressing the action, it's picking which subjects to act on (pick these lines to commit, oh except let me edit that one, put that one aside, ok back to the commit i'm staging...).

Those examples are not only possible, but easy to do in the CLI for an advanced user. I do that sort of thing every day.

I agree that some tasks are better suited to a GUI and some are better suited to a CLI, but your VCS is firmly in the latter camp imo.


As a semi-concrete example, say you have a VCS checkout with 40 edited chunks across 15 files. You want to commit 20 of 40 chunks (spread across 6 of 15 of the files). When you're halfway done, you notice a typo needs correcting in chunk 16, and fix it. You then proceed to commit your selection.

From the command prompt, invoking thg and doing that took me about 18 seconds.

I'm reasonably good on the command line, but I certainly don't see how to beat that time using just the cli. I think saying it's "easy for an advanced user" is a little loaded, but I'm still interested to know how to do that task faster. And if such a cli process could only be done with git, I'd consider switching.


git add -p


That's a good example of where I think a gui has an advantage.

For the example I gave, unless the hunks were asymmetrically distributed, `git add -p` would require pressing "n" 20 times to skip through the excluded hunks (among other things), even when the user could visually see all the hunks and know which ones they wanted. Whereas a GUI would only require the clicks to select the "y" hunks.

I realize this is arguing a difference in the efficiency, not capability, of the UIs; but that's basically my point. The use-cases where one or the other is optimal are too closely situated together in the problem space to say one is inherently the better choice, even for VCS tasks.

While command line interaction may allow the user to receive a whole screen's worth of information at once, it forces them to interact with it in a serial fashion, regardless of whether out-of-order interaction with on-screen elements would allow them to complete the task faster.
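For what it's worth, `git add -p` can also be driven non-interactively by piping answers, which narrows (though doesn't eliminate) the gap described above. A hedged sketch with an invented file:

```shell
# Throwaway repo with one file edited in two places far enough apart
# to produce two separate hunks.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email a@example.com
git config user.name a
seq 1 30 > f.txt
git add f.txt
git commit -qm "base"
sed -e 's/^1$/one/' -e 's/^30$/thirty/' f.txt > f.tmp && mv f.tmp f.txt
# Answer the per-hunk prompts from a pipe: stage hunk 1, skip hunk 2.
printf 'y\nn\n' | git add -p f.txt > /dev/null
staged=$(git diff --cached)
```

After this, only the first hunk ("one") is staged; the second ("thirty") remains in the working tree. It still answers serially, which is the parent's point.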


I tend to use Git primarily at the command line as well. I recently saw a colleague use an advanced IDE (I think it was IntelliJ IDEA) to perform a complex `git add -i` with staging individual hunks in multiple files. He did it quite a lot faster in his GUI than I could at the command line. His IDE displayed the files he was staging, with full syntax highlighting for the programming language, and background highlighting showing the lines being modified.

He quickly paginated through the file and just by clicking with his mouse was able to stage and unstage hunks. He had a neat side-by-side view with two panes of text showing the current text and staged commit, if I recall correctly. The IDE instantly showed if this would result in syntax errors, and could even automatically run the project build and tests.

`git add -i` and `git add -p` can do this, but it's much more clunky and serial and modal. For example, my colleague could easily scroll up and down the files, staging and unstaging hunks as needed. He could easily see which files in his file explorer needed this processing and switch between them. This information was all displayed on the screen at the same time, contextually, and the GUI enables navigation to any point in this workflow instantaneously with a click or scroll.

I don't tend to have to do this kind of operation very much, and I'm comfortable doing them from the CLI, so I haven't bothered to set up this GUI tool. But I've seen that GUI tools for tasks like this are faster. Anything involving editing the actual text of files or diffs of them will likely be faster in a GUI.


It seems unintuitive to me that manipulating a graph via a GUI should be the wrong way to operate. I find it difficult to imagine this argument would be made if git were not the DVCS of choice.

I would say, though, that the ideal GUI for a DVCS has probably not been invented yet.


Right and it's pretty sad, I think. Most contemporary software doesn't even try making use of graphic displays and mouse input to create a more efficient interface for trained professionals, often lacking even the most basic forms of ad-hoc programmability.


TortoiseHg is not Windows-only


My mistake. The point remains - users who are more comfortable in a GUI are in a similar boat.


> Git is certainly the best of the version control systems to date, and I'm happy that it won.

Me too.

What I found was that people do not recognize what kind of tool Git really is. People do not read the manuals and rely on world knowledge acquired from years of svn or other centralized systems. It's not the best sign that you have to really dig into Git to be able to use it properly. But once you understand the core ideas, you can basically derive a lot of the workflows yourself, in a very principled and clear manner. Also: once you get a bit more familiar with the data model, one almost cannot help but appreciate its beauty, I believe.


This is close to theological discussion territory, and I can see both sides of the argument, so I'm just going to mention that there is an interesting paper[1] from a couple years ago that uses a rigorous concept design process to attempt to simplify Git, and results in a VCS that does indeed require a simpler mental model (Gitless).

Personally I'm not going to abandon Git and all the tooling and community it comes with just for a simpler mental model, given that I have a moderately good handle on the (mainline) features of git -- and therein lies the problem. (Although note, Gitless does work on top of Git, so you should be able to use it with Github by the looks of things [2]).

I've linked Adrian Colyer's summary of the paper to make it easier to digest:

[1]: https://blog.acolyer.org/2016/10/24/whats-wrong-with-git-a-c... [2]: http://gitless.com/#vs


I have to say I am fan of both Mercurial and Git particularly because I'm a crusty old developer who came from some god awful VCS/SCM like PVCS, CVS, RCS, Source safe. I remember first using Perforce for the first time and thinking dear god it is expensive but totally worth it (this was before git).

Really what I think needs improvement is better diff/merging interfaces. There really isn't platform independent one that is opensource and free.

Right now I use p4merge because it works fairly well on macOS but I still think it sucks (it doesn't have keybindings and yeah I have tried vim and emacs diff ... they are ok but not as easy to visualize diffing).


I agree. I know there is a lot to Git and you can pretty much do everything. But whenever such discussions come up about how many options there are and how amazing it is, I just glaze over and completely fail to concentrate. I don't think I have the brain power to actually master Git. Luckily I work in a 2 man team, so rarely do I need to do anything particularly complex. I have my workflow and I do what I can not to stray from it.


Remember that git was written by and written for kernel hackers. Most of the people I've found having trouble with git are as far from kernel hackers as you can get and most who don't are quite a bit closer.

As a kernel hacker myself, I think Git's interface is superb and intuitive.


I don't think Junio Hamano (aka gitster) has done much kernel work at all.

But point taken.


I think there's an opportunity for a simpler-to-use alternative CLI Git client. Something more user-friendly than the /usr/bin/git program we're talking about and more developer-friendly than the existing Git GUI apps.


There is gitless [0][1], haven't used it though.

[0] http://gitless.com/

[1] https://news.ycombinator.com/item?id=12621837


Another parallel idea to consider is procedure and policy.

The cleanup after the foot-shot was pretty gory, but I'm not sure it's possible to replicate that foot-shot with disciplined git-flow, and if you can't shoot yourself in the foot then the goriness of the cleanup is a non-issue.

A third level up: you've got two devs carefully not coordinating while working together on code that is closely related. Aside from the git-splosion, that is not generally a recipe for success. Like, refactor the obviously independent concepts (obviously independent, because non-cooperating devs are working on them separately in parallel) in those files into multiple files, and then there aren't merges to fight.


Git is just getting in the way of the classic old philosophical debate: there should be only one correct way to think, versus the more multicultural view that the more ways there are to think, adapt, and overcome, the better.

The technical discussion of git innards and workflow just get in the way of the core philosophical discussion, and the innards would fall into place if the philosophical question were answered (as if philosophical questions are ever truly answered, LOL)

A classic example of the philosophical argument is Perl vs Python: there are infinite ways to express yourself, versus there is only one way that works.

Now it's very politically incorrect to say you like Perl in 2016, and I suspect Git is going to be in big trouble soon enough for the same philosophical reason.

I would be interested in hearing what is the philosophical equivalent of Python for source code revision control. I mean... people can't be seriously suggesting going back to RCS/CVS/SVN/Hg, can they? Or I guess they can? Maybe some closed source silo I'm unfamiliar with? Maybe no revision control at all and go back to tgz files with dates embedded in the filename?

A classic problem with limiting opportunity, limiting the range of thoughts and ideas, is finding a way to do it without impacting productivity or quality. In that way "use something simpler and enormously less capable" would be considered cheating the philosophical problem.


You're trying to find too much reasoning.

Nowadays many people hate Python as well and I'm speaking as an ex-Python developer that now hates Python.

And do you know what's ironic about Python? Because the language excluded certain features, it led to multiple non-orthogonal features that solve very specific use-cases and ways of doing things in its ecosystem, with hacks upon hacks meant to implement ideas from other ecosystems, most of them slightly broken.

The problem with language simplicity is that it is misunderstood. Simplicity in the language implementation is often correlated with complexity for the developer, being what happens when you limit what can be expressed.


WRT the reasoning: I have a gut-level feeling it's not technically solvable, so I'm going for an abstract proof that it's equivalent to a known, existing, unsolvable philosophical debate. Therefore any mention of technical problems or Git's feature list is bikeshedding, because it's provably unsolvable at a higher level.

So if the answer to the philosophical question is two opposing camps, and we have the source code revision control solution for one pole, then all we need to do is find the opposite pole and then ask the two never to merge (oh the extended pun..) groups, to realize the other side exists and please be civil with each other, etc. That could save a lot of wasted effort.

Clearly the equivalent of methodically twisting Python into Perl one little bit at a time isn't going to make anyone on any side happy, or vice versa.


I do think the GP has a point though.

Very roughly, by cliche:

Perl, Git: "The street finds its own uses."

Python: "How can you have any pudding if you don't eat your meat?"

(With apologies to those who don't read Gibson or haven't heard Pink Floyd.)


> I mean... people can't be seriously suggesting going back to RCS/CVS/SVN/Hg, can they?

...What's wrong with Mercurial? Or is there another Hg source control?

Personally, I started using Mercurial when I can and Git when I have to; I can't imagine someone considering Mercurial as backward even if they prefer Git.


Having used both, I consider Hg a little kid's Git, which is fine for many workflows, but I do prefer Git.


You would...

I've yet to find a workflow I prefer Git to hg for.


A) I've used both hg & git extensively and muuuuch prefer hg.

B) I find it crazy that people are so reductionist when it comes to source control. There are tons of systems out there beyond the rcs/cvs/svn tree that have explored the space immensely. If all you are comparing to is those then clearly you'd think git was the only choice.


> I consider this to be a failure of Git. The set of different options (normal merge, rebase, filter-branch, etc) is complex and not cleanly orthogonal which makes for a very messy "mental model".

Allow me to disagree... Does the CVS or SVN way of doing rebase and filter-branch work with your "mental model"? Well, there is your answer.

> I really wish some tool other than Git had "won" the version-control race; I honestly believe Git to be the worst of the contenders in the most recent generation of version control systems

Not to sound harsh, but why would I want to use an inferior tool just because you don't get git?


I disagree with the person you're replying to, but CVS and SVN are not the current generation of version control systems (and CVS is perhaps not even the previous generation), so it's unclear why you bring them up.

If you want to compare, compare to darcs, mercurial, bazaar, monotone, ... plenty of choice without going back to svn (let alone cvs).


I agree with you on all accounts ;)

The point I was half-jokingly making was that people don't understand what an extremely powerful tool they have been given and instead complain about shallow stuff that for them is simple to grasp

edit: the way mg, bzr and the rest do the advanced stuff [when available] is basically a carbon copy of git.


> the way mg, bzr and the rest do the advanced stuff [when available] is basically a carbon copy of git.

I'm a pretty big fan of git, but this is an unproductive stance. I don't know much about bzr, but my impression is that darcs pioneered the ideas that became git-rebase, and elsewhere in this thread people are having substantive discussions about hg (Mercurial; I assume that's what you meant by mg) bookmarks and branches vs git branches: https://news.ycombinator.com/item?id=13229582


Luckily there are a bunch of tools that wrap the git primitives to present more friendly UIs. I spend most of my source control time in SourceTree at this point.


I'll just leave this here http://xkcd.com/1597/


This could be written about any tool. Especially bad with Perforce and Subversion - which do not really support branching. They say they do, but then the merging is a huge pain.


I would be interested to know that in your view, which VCS would have solved this problem better assuming they would have done the same thing. It seems to me that they chose the worst possible option each time they hit a problem.


> I consider this to be a failure of Git.

Surely it's a failure of source code as streams of text.

If each function was versioned with dependencies then this problem would be eliminated.

Replaced by other problems.


The simplest solution is:

    # try the merge; you'll get conflicts on those files
    git merge topic

    # discard the versions from the topic branch;
    # you know you already merged those changes in
    # the funny "git checkout commit", so any differences
    # are due to changes on master.
    git checkout --ours new-file-{1,18}

    # now you are free to fix up any real conflicts
    # and resolve the merge
    git commit

This has the advantage of representing the true history. You had two lines of development (the original topic, and the "squashed" history created for deployment), and the merge shows them coming together and choosing the deployment-side content.


Yeah; I know it sounds evil and this shouldn't be done but why not just make a new branch from the borked master, force push master back to sanity (pre-massive merge), have everyone else cherry pick and push their commits and then just rebase new branch and fix the conflicts in there as you should and merge normally?


Because it's selfish. Your time isn't more valuable than that of the developers already working against that branch (and as stated in the article, there were hundreds). It's your job to merge your changes cleanly.


Agree -- I was talking recovery though..

And I missed the hundreds of devs bit apparently. My bad.

....Hundreds of devs though? On the same repo? I started sweating thinking about that one ...


Pretty sure Facebook has one massive repo for all their projects for thousands of devs.


hmm this is certainly not simple to me! what's new-file-{1,18} for ?


It's the names of the files, I believe.

so git checkout --ours {name_of_file1,name_of_file2,and_so_on}

The {} is standard bash brace expansion; you can try it with something like mkdir hello{one,two,three}. The parent poster went with the article's use of new-file as the name of the files that were affected. (Note that new-file-{1,18} expands to just the two names new-file-1 and new-file-18; the range form new-file-{1..18} would name all 18.)
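To make the expansion concrete, here's a quick bash sketch (the {1..3} range is shortened just for the demo):

```shell
# {a,b} is a literal comma-separated list; {1..N} is a numeric range.
two=$(echo new-file-{1,18})    # exactly two names
all=$(echo new-file-{1..3})    # every name in the range
echo "$two"
echo "$all"
```

So if the goal were all 18 files, the range form is the one you'd want.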


Participate in Atlassian Research

My name is Angela and I do research for Atlassian. I’m kicking off a round of discussions with people who use Git tools. Ideally, I’d like to talk to people that sit on a team of 3 or more. If this is you, I would love to talk to you about your experience with Git tools, or just some of the pain points that are keeping you up at night when doing your jobs.

We’ll just need 30 mins of your time, and as a token of my thanks to those that participate, I’d like to offer a US$50 Amazon gift voucher.

If you’re interested, just shoot me an email with your availability over the next few weeks and we can set up a time to chat for 30 minutes. Please also include your timezone so we can schedule a suitable time (as I’m located in San Francisco). Hope to talk to you soon!

Cheers, 
 Angela Guo aguo@atlassian.com


While an unorthodox merge strategy was used, this is what happens when you hole up in a topic branch for a long time. I bet this would've been easier had they merged smaller commits or PR's to master constantly. If one is afraid of deploying unfinished features, don't make them functional until they are ready. Tie them together once finished. Or did I miss something here?


If master is currently in production, a good way to do frequent deploys of code that's "not quite ready yet" is to use a feature toggle. This lets you deploy partial features while blocking any code path from using them. Plus, once it's time to start directing traffic towards that code path, you can turn the toggle back off if you find a bug you missed during testing.
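A minimal sketch of the idea in shell, with a hypothetical NEW_CHECKOUT_FLOW flag standing in for whatever flag system you actually use:

```shell
# Hypothetical feature toggle: the new code path is merged and deployed,
# but stays dark until the flag is flipped on.
NEW_CHECKOUT_FLOW=${NEW_CHECKOUT_FLOW:-off}

checkout() {
  if [ "$NEW_CHECKOUT_FLOW" = on ]; then
    echo "new checkout path"      # freshly merged, not yet trusted
  else
    echo "legacy checkout path"   # battle-tested fallback
  fi
}

result=$(checkout)        # toggle off: traffic takes the legacy path
NEW_CHECKOUT_FLOW=on
result_on=$(checkout)     # toggle on: traffic takes the new path
echo "$result / $result_on"
```

In a real system the flag would come from config, a database, or a flag service rather than an environment variable, but the shape is the same.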


This only works for isolated features. As soon as it's some kind of refactoring or a feature that affects a lot of existing code, it's not viable anymore.


If you really want to, you can often find ways to do continuous integration even in such cases.

For the most brute force example, you could copy the whole program and make a runtime switch for deciding which to run.


> you could copy the whole program and make a runtime switch for deciding which to run

Wouldn't that make merges even worse, not better?


Then you should be rebasing the branch often from master so that this is less painful when the flip happens. If there is an army of devs working on the same repo then it should be communicated out really obviously. If a dev is planning on submitting code that'll be merged after the massive branch gets into master then they can just rebase from the feature branch and any conflicts should be fixed in those branches...


I'm also very surprised - I thought it was standard to merge master (target upstream branch) into the topic branch frequently (daily seems reasonable). I know some people do not appreciate "merged <branch> into <other branch>" commits in their history, but that is a small price to pay IMO.


Or rebase. Don't hide merge fixes in merge commits, keep your changes relevant when read in context of current master.

Unless you're working on a topic branch together with another dev, but I find that rarely happens in practice.


Yes I just completed a large refactoring in a feature branch which affected lots of files and the only way to stay sane throughout the process was to constantly rebase my work on top of master (within my feature branch).

Once my refactoring was complete I squashed it into a single commit to prepare to merge (or rebase) into master. I don't really think it's useful to keep the history of how I implemented that refactoring. Squashing it into a single commit is far easier to revert then if I merged multiple commits into master.


Even so, if you can both adhere to "pull before you push, and use force-with-lease instead of force" the rebase workflow is possible too. However it's more of an abuse of Git rather than a use of it.
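A toy demo of why --force-with-lease is the safer habit: it refuses the push when someone else has updated the branch since you last fetched (repo names and commits are made up for the demo):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q -b master --bare origin.git

# alice publishes the first commit
git init -q -b master alice
cd alice
git config user.email a@x; git config user.name a
git remote add origin ../origin.git
git commit -q --allow-empty -m one
git push -q origin master

# bob clones and pushes a second commit on top
cd ..
git clone -q origin.git bob
cd bob
git config user.email b@x; git config user.name b
git commit -q --allow-empty -m two
git push -q origin master

# alice rewrites history without fetching first; her idea of
# origin/master is stale, so --force-with-lease refuses the push
# (plain --force would have silently clobbered bob's commit)
cd ../alice
git commit -q --amend --allow-empty -m one-amended
if git push -q --force-with-lease origin master 2>/dev/null; then
  result=pushed
else
  result=refused
fi
echo "$result"
```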


I find this to be the optimal workflow.


Well, I routinely `git rebase upstream/master` to keep my git history clean. It has the same effect as a merge from master, but keeps everything tidy.
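A throwaway-repo sketch of that tidiness: after the rebase the history is linear, with no merge commits (branch and file names are invented for the demo):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q -b master
git config user.email t@t; git config user.name t

echo base > f; git add f; git commit -qm base
git checkout -qb feature
echo feat > g; git add g; git commit -qm feat

# master moves on while the feature is in progress
git checkout -q master
echo more > h; git add h; git commit -qm upstream-change

# replay the feature commit on top of the new master: no merge commit
git checkout -q feature
git rebase -q master

merges=$(git rev-list --merges --count HEAD)
count=$(git rev-list --count HEAD)
echo "$merges merge commits, $count total"
```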


Whenever I do this I inevitably end up with merge conflicts in stuff I absolutely didn't touch. I don't know if it's because it's a monorepo with lots of people committing, or what, but it basically never works cleanly. It's very frustrating, particularly because merging master works without a hitch. So I just do that instead.


And also will make a huge mess if someone else has checked out your code already.


Is it really such a huge mess? They'll get a conflict if they try to pull, and then they simply have to rebase their changes onto your new rebased branch, right?


I don't put that much faith in everybody else to figure it out themselves.


It's part of the git workflows.

Everybody merge OR every rebase. The organization gotta decide on the workflow.


It's a very small price to pay! I can't believe how many people hate this practice because of the merge commits. It's extremely useful and not doing it makes git usage that much worse.


While I don't use git a lot (I use p4 for work), I think this is often a lot of pain. The trick is not to get too far from main: continually touch up your working branch from main as you work, so that when the time comes the final merge is essentially already done (again, I'm not a heavy-duty git user, maybe this is hard).

I'm also not a big fan of the everyone-has-their-own-branch world view. I worked on a big project (a chip design project) where we essentially lost track of what we were building because of this. I'm much more keen on requiring people to stay close to main and eat their own dogfood.


> The next day he wanted to go ahead and merge the front-end changes, but he found himself in “a bit of a pickle”. The merge didn't go forward cleanly, perhaps because of other changes that had been made to master in the meantime. And trying to rebase the branch onto the new master was a complete failure. Many of those 406 commits included various edits to the 18 back-end files that no longer made sense now that the finished versions of those files were in the master branch he was trying to rebase onto.

Can one not instead merge master into the feature branch?


ding ding ding. The original dev should have made a new feature branch off master. Then they should have dropped the 18 files in, or if the 18 files had a commit history, cherry picked those commits into the new feature branch.

The next day, when he was ready for the rest of the changes, merge those into the feature branch.

These devs were trying to be clever, and got bit by it :)

KISS


I also don't see why he didn't just git revert to the parent commit, i.e. the commit before the other guy changed the 18 files seemingly at random.

Then rebase the feature branch on that, and that's it. Sure, the history would contain the mistake; so what?


Agreed. He has the following to say:

>The easiest strategy in a case like this is usually to back in time: If the problem was caused by the unorthodox checkout-add-commit, then reset master to the point before that happened and try doing it a different way. That strategy wasn't available because X had already published the master with his back-end files, and a hundred other programmers had copies of them.

So resetting the commit isn't an option, but what about reverting it? It seems like the obvious alternative to me, I'm not sure why he didn't mention it.


I'm with ya; I don't see the evil in just force pushing head back to where it was before though -- do you really need 800 useless commits in the history when you're going to just be adding the code back in sanely later anyway?

Reverting them seems a bit riskier to me than just rebuilding it properly: you'll need to write some scripts to make sure you got all the commits, and then the whole dev team is going to end up pushing commits that unrevert your reverts just so you can rebase the feature branch.

Sometimes push -f is the best option ;)


But not in this case, a force push is not needed.


Depends on the situ really; if it got people working again faster (yay, saved money!) or if production was unavailable and it helped get things back up faster (yay! more business!) then it'd be the right move in my book -- otherwise, yeah, it's a horribly hacky thing to do and shouldn't be done unless you really have to...

I'm not saying I would have done one or the other if I was the OP -- just that I'd have no issues doing it should the situation call for it and it saved time/money at the cost of pure feelings :}


Why not just revert the offending commit? It would be a valid blip in history as mistake made and corrected.


Admittedly I did not read the whole post, but I agree that this sounds like the right answer. He made a mistake on master by merging a partial topic branch. The answer is to revert the mistake (git revert abc123) and then either rebase+merge or just merge the whole topic branch. I also don't agree with his conclusion that his way left the history in a better state - reverting the commit actually represents the mistake that was made.


> It occurred to me while I was writing this that it would probably have worked to make one commit on master to remove the back-end files again, and then rebase the entire topic branch onto that commit. But I didn't think of it at the time. And it's not as good as what I did do, which left the history as clean as was possible at that point.


He could have made the reversion commit on a local branch (branched from master), then rebased interactively to reorder / squash commits.

It presupposes that the commits are clean and well-separated though.


I read that, but it doesn't read like he is aware of the option to revert a commit directly. More that he would remove them manually and commit that.

Little difference, but if "as clean as possible" was the goal, I think that would be the cleanest.

[edit] added the little words that my keyboard seems to filter out.


Maybe you're right; I agree that's probably what I would have done.

Apart from anything else it allows a clean-slate "this is how it _should_ have been handled in the first place", rather than a separate git-fu move that was necessary or even relevant only because of the original indiscretion.


Yeah, that was my instinct as well. The problem was that the first user did that kooky way of bringing the 18 files over with `git checkout`. That's a terrible idea, both because it led to the catastrophe here, and because you lose all history of those changes; they appear brand new in that commit (which makes `git bisect` and `git blame` much less useful).

No, just `git revert` that terrible commit, then do a regular sensible merge/rebase. Solves all problems in, like, two commands.


Another thing is not to fear the non-fast-forward commit. Sure you'll need to tell the other developers (apparently "a hundred" in this case, but more often just a handful of people in the same room) that they have to reset back a few commits before pulling, but that's not the end of the world.


Not only that but the failure case when a developer pulls before receiving the memo isn't that bad. Git tells you that the branch was force updated and working out what to do from there is usually pretty easy.


I don't understand the problem here, why didn't he just do a merge and resolve the 18 conflicts by using the version of the file from master?

And the problem wasn't in checkout-add-commit, that is a trivial issue, the WTF here is producing 406 new commits in a branch without ever thinking of merging master back into it or rebasing on master, to avoid having a giant merge later.


400 commits not cleanly applying? Not a big deal. I routinely merge 1000-2000 commits and rebase 30 active branches onto that as well. The solution is git rerere. It stores all resolved merge resolutions forever, and cherry-picks or rebases then apply cleanly, without any trouble. Eg https://medium.com/@porteneuve/fix-conflicts-only-once-with-...
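A minimal sandbox demo of rerere in action (contents invented): the first merge conflicts and is resolved by hand; when the merge is thrown away and redone, rerere replays the recorded resolution into the working tree.

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q -b master
git config user.email t@t; git config user.name t
git config rerere.enabled true

echo base > f; git add f; git commit -qm base
git checkout -qb topic
echo topic-side > f; git commit -qam topic-side
git checkout -q master
echo master-side > f; git commit -qam master-side

# first merge conflicts; rerere records the conflict "preimage"
git merge -q topic >/dev/null 2>&1 || true
echo resolved > f
git add f; git commit -qm merged   # rerere records the resolution here

# throw the merge away and redo it: the same conflict appears,
# and rerere replays the recorded resolution automatically
git reset -q --hard HEAD^
git merge -q topic >/dev/null 2>&1 || true
result=$(cat f)
echo "$result"
```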


> X decided to merge and deploy just the back-end changes, and then, once that was done and appeared successful, to merge the remaining front-end changes.

> "What should X have done in the first place to avoid the pickle?"

0. (Of course, not develop a 406 patch changeset and then have to pick it apart. Make smaller pushes, frequently.)

1. Create a topic branch right there at the tip where the 406 changes are locally committed.

2. Then use git's interactive rebase to rewrite this branch such that just the back-end commits are picked first, followed by the front end.

3. Make a back-end topic branch from the last back-end commit and test that. If it's cool, master can be rebased to that and pushed to origin/master upstream.

4. Test remaining front-end changes, rebase master to them, push.

Also:

3. a) If back-end changes need fixing, fix them on the back-end-topic branch. Then rebase the original topic to the back end topic to pick up these changes "under" it. (I.e. replay the front end over the new back end, and install as new front end).


> 2. Then use git's interactive rebase to rewrite this branch such that just the back-end commits are picked first, followed by the front end.

Probably how I'd do it, but I know it's not a popular workflow. The only project that I've seen publicly talk about expecting a changeset to be rewritten from "history of fiddling around" into "series of incremental improvements to a codebase" is the linux kernel.

It's more work for every developer, and digs into more advance git operations, but it really helps keep the tree clean. Most projects don't have a velocity that requires such well-groomed changes, but this is an example of how failing that can make the history ugly and changes more difficult to work with.

For those interested why/how you'd rewrite a patch series: https://www.kernel.org/doc/html/latest/development-process/5... The 406 changes probably compress to less than 20.


My ex-employer (Meraki) was VERY insistent on people cleaning up their histories. It was very good not just for keeping a clean tree, but also for code quality - once you have code changes in neatly-separated commits with discrete chunks of functionality, any leftover test code or unrelated changes pop out immediately to both the original coder and the reviewer.

By the way, this also helps with the issue this developer faced because part of the problem seems to have been that the backend and frontend changes were mixed in the same commits (hence the ugly hack for committing). Doing this kind of history-cleaning as you go makes it much easier to manipulate the order of committing changes to master.


My experience exactly: Gerrit reviews, changes separated out: no "patch bombs", etc.


What to do with commits that touch both back-end and front-end code in step 2.? This amounts to manual run of git-filter-branch, which is a feature the author was unaware of. Without that tool this task is daunting. If the author is willing to manually rewrite potentially hundreds of commits that touch back-end code, then there are a lot of straightforward solutions. We're looking for something more elegant.


I have dealt with that sort of thing, and roughly as follows. Having somehow identified these commits, I would mark them "edit" in the interactive rebase workflow. Then when such a commit is applied and the interactive rebase stops on it, I would use "git reset --patch HEAD^" to undo some of its changes in the index (while keeping the working tree the same). Then do a "git commit --amend" to overwrite the commit with pruned one. At that point, all the rejected changes appear as local modifications. Another "git commit -a" creates a new commit out of them; so the original became two. If the commit needs to become three you can just "git commit --patch" twice. Or just reset the whole original commit to HEAD^ and do several "git commit --patch" ops. (Use -c <original sha> to re-use the commit message as a basis for the new commit messages).

Typically I will do a rebase pass through the commit stack to do nothing but these splits in that pass. Then when things are separated out, do another interactive rebase to do the re-ordering and squashing.
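The core of that split can be sketched in a throwaway repo (file names invented); inside an interactive rebase you'd do the same thing at an "edit" stop, using `git reset --patch HEAD^` instead when a single file mixes both kinds of changes:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q -b master
git config user.email t@t; git config user.name t
git commit -q --allow-empty -m init

# one commit touching both layers
echo a > backend.c; echo b > frontend.js
git add .; git commit -qm "mixed commit touching both layers"

# the split: undo the commit, keep the working tree, re-commit in parts
git reset -q HEAD^
git add backend.c;   git commit -qm "back-end only"
git add frontend.js; git commit -qm "front-end only"

result=$(git rev-list --count HEAD)
echo "$result commits"   # init + the two halves
```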


Reading this was like watching a traffic accident in slow motion. I could hear myself yelling at the author as if he were a student driver:

"Use filter-branch!!! Use filter-branch!!! NOOOOOOOO NOT merge union with manual deletes!!!"

But... he went and did it anyways. Honestly, reading back through commit logs, you always find the part where the driver runs off the road, plows through a clearly marked gate, runs on a train track for a mile or two, then merges back onto the main street, carrying part of a mailbox and a deer carcass.

You can fault git if you want, but it seems like some of these cases arrive naturally no matter what VCS is used. It would be great to have a "git education" repo that contains situations just like these to work through... sort of a "driver's ed" for managing a repo.


The Reddit discussion of this, though brief, was interesting and to the point.

https://www.reddit.com/r/git/comments/5i3mpz/another_git_cat...


This is the author of the blog post we're all discussing


Two fun things about git: It is deterministic, and it doesn't delete anything (readily).

This means you can't really have a catastrophe.

Just git reflog your way out.
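A sandbox illustration (file names invented): even after a hard reset "loses" a commit from the branch, the reflog still points at it, and HEAD@{1} is where HEAD was one move ago.

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q -b master
git config user.email t@t; git config user.name t

echo one > f; git add f; git commit -qm one
echo two > f; git commit -qam two

git reset -q --hard HEAD^        # "catastrophe": commit two is off the branch
git reset -q --hard 'HEAD@{1}'   # reflog remembers; jump right back
result=$(cat f)
echo "$result"
```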


This is the right answer. I think for whatever reason, software programmers are afraid to learn how git works. They'll spend countless hours learning how their other tools work - but not the one that keeps and protects changes to their work. It's maddening. The worst part is - it's really not that difficult to go into the .git directory and look at how the files are structured and learn it. Once you do, a lot of using git becomes second nature.


This is the video that I keep recommending to new people who want to know how it works on a technical level. I've always had amazing/euphoric feedback to that recommendation.

https://www.youtube.com/watch?v=ZDR433b0HJY


Cool. For me it is this video. https://www.youtube.com/watch?v=MYP56QJpDr4

I notice they are from the same publisher.


That's a bit of an overstatement. You can't have a catastrophe with stuff already committed (but it's very easy to delete in-progress work).


If you are committing so rarely that you can have a CATASTROPHE with your in-progress work, maybe you should try breaking the work in smaller parts and committing more often


Unfortunately git's staging area concept encourages you to do exactly the opposite of this.


I almost only use it to cherry-pick and then immediately commit?


Right, all the good use cases involve not actually using it - if you're immediately committing then it would be better if it just became a commit.


I meant: I cherrypick from my changes and commit bugfixes and drive-by update of comments separately.

My bad. I shouldn't have used the word cherrypick there.


Only if you never commit and then do checkout -f / reset --hard. Why would you be doing that?

Otherwise it's still in reflog. Anything you ever commited is just there.


Unreachable references will be pruned after 30 days (by default) if/when git-gc gets run (which will occasionally happen automatically by default). Not saying it's a problem, just note that they're not actually there forever.


For reflog it's 90 days. (So anything referenced by the reflog will be retained for that long.)


AFAICT, for unreachable entries (e.g. as generated by `git commit --amend`) the default is 30 days as defined by `gc.reflogExpireUnreachable`.
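
If you want longer retention, those are ordinary config knobs (values here are purely illustrative):

```shell
# Raise reflog retention above the defaults (90 days for reachable
# entries, 30 days for unreachable ones). Set per-repo here.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q
git config gc.reflogExpire 365.days
git config gc.reflogExpireUnreachable 365.days
git config gc.reflogExpireUnreachable   # 365.days
```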


No, because those commits are still reachable from the reflog.


well unless you haven't pushed in a while and you rm -rf your home directory. Of course that has never happened to me.

On a completely unrelated topic, extundelete is a great piece of software.


I've had occasional data-loss from git.

The problem is when you realise someone badly rebased, or forced a push, a while ago and now you have to try to find where the stuff was lost.


the reflog is GCed, IIRC. So if you wait long enough, and commit long enough, that history will be gone.


The reflog GC is time-based -- it defaults to retaining entries for 90 days.


People in the discussion here saying, somewhat blindly it appears to me, that using mercurial would have avoided this mess. I'm a huge mercurial fan and have dealt with some tricky situations similar to this (but never this situation exactly) and I'm not so sure how mercurial would have handled it. The best I can say is that I've never known even the most adventurous mercurial user to use checkout (actually revert in mercurial) on individual files in that way. Is that something git people do more often?

Aside from that, it'd be fun to see how Mercurial handles this, but I'm not sure from reading the original post if I could exactly reproduce it.

Mercurial would let you do the checkout (revert) trick that started it all. I can imagine it causing merge conflicts as described. Mercurial does let you specify how to resolve merge conflicts for the whole merge, or you can tell it not to resolve conflicts at all and then you can run hg resolve on a file-per-file (or glob of files) basis and tell it to pick default (equivalent of master) for the files you want. I didn't quite follow the git way of doing this he described with .gitattributes, but using hg resolve sounds easier (but neither are things a non-expert user of either tool would know).

In the end some other solutions were proposed. I would not recommend using checkout (revert in mercurial) either. I don't know of a filter-branch equivalent in mercurial, but that sounds like a cool way to deal with this. In mercurial I probably would have reached for graft (equivalent of git's cherry pick), which isn't very different from git.


The problem with Git is that everyone tries to use it as a central repository rather than a distributed one, as if it were SVN. Personally I blame GitHub for promoting the wrong tool for the job among new developers, causing all this unnecessary drama. Git is the best version control system if and only if the project has a good leader checking everyone's merges before they are committed and letting everyone know who is working on what and which parts will be affected.


> But I couldn't think of anything, so I asked Rik Signes. Rik immediately said that X should have used git-filter-branch to separate the 406 commits into two branches, branch A with just the changes to the 18 back-end files and branch B with just the changes to the other files. (The two branches together would have had more than 406 commits, since a commit that changed both back-end and front-end files would be represented in both branches.) Then he would have had no trouble landing branch A on master and, after it was deployed, landing branch B.

Well. Okay. That's a technical solution and it'd work; it's probably no less time consuming than fixing the code in a new branch and merging cleanly (every time I end up needing to filter-branch stuff I have to RTFM, and it takes ages) -- but this problem is NOT a technical one; it's a process one.

Why are you landing 400 commits in one go? Half of those were on files which then start causing merge conflicts for your team and wasted a huge amount of your time?

Use feature flags, fix your conflicts in branch, don't merge anything into master unless it's using the 'merge' button on github/gitlab/gogs/whatever. And really think/discuss/roundtable about how you're introducing features because it sounds like this is running away from you a bit here..

It doesn't need to be this complex, and these kinds of messes can't really be put on the tools -- although git certainly makes it easy to set a lot of things on fire..
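
For reference, Rik's filter-branch split can be sketched in a toy repo like this. The directory names are hypothetical, and this simplified index-filter drops the front-end paths from branch A entirely (rather than pinning them to their state on master, which is closer to what you'd want for real):

```shell
# Toy demo: split a topic branch into a back-end-only copy with
# git-filter-branch, pruning commits that become empty.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q -b master
git config user.email you@example.com
git config user.name you
mkdir backend frontend
echo b0 > backend/core.txt; echo f0 > frontend/ui.txt
git add -A; git commit -qm "initial"
git checkout -qb topic
echo b1 > backend/core.txt; git commit -qam "backend change"
echo f1 > frontend/ui.txt;  git commit -qam "frontend change"
echo b2 > backend/core.txt; echo f2 > frontend/ui.txt
git commit -qam "mixed change"
# Branch A: topic minus every front-end change; "frontend change"
# becomes empty and is pruned, "mixed change" keeps its back-end half.
git checkout -qb branch-A topic
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --prune-empty \
  --index-filter 'git rm -r -q --cached --ignore-unmatch frontend' \
  master..branch-A
git log --format=%s master..branch-A   # mixed change / backend change
```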


> this problem is NOT a technical one though; it's a process one.

No, the cause of the problem was a process one, but now it's become a technical problem.

That isn't really helpful when someone has already created a git nightmare like what's described here.


Hmm, I wasn't trying to come off that way, and I wasn't trying to be helpful with the specifics of their git nightmare since, happily, they sorted it out anyway. I was only trying to suggest that the real way you "clean up" from a nightmare like that is to start looking at what seems to be an obviously fragile way of dealing with their code, since that's the real problem..

Unfucking the repo, while mildly technically interesting, should really be the smaller of the two lessons learned from such a mess... No?


It's a little tone deaf to offer "should've done it this way!" as advice to a person already in a bad way.

Unless OP is asking for better processes to avoid this in the future, I think it's not a good idea to offer preventative advice as a first response. YMMV, but this sort of comment is often unwelcome (especially if you phrase the questions in an interrogative way... example - "Why are you landing 400 commits in one go? Half of those were on files which then start causing merge conflicts for your team and wasted a huge amount of your time?" comes across like you kind of enjoyed typing it and doesn't really occur as a kind stranger leaning in to help)

If your post was directed more to the readers of the thread instead of at OP, I think it would've come across less admonishing. Of course, I'm not judging you for your comment to be clear, just providing feedback on why your comment might rub people the wrong way (apologies for length).


Hm, well -- OP; I wasn't trying to have a go if you thought that, and the question wasn't rhetorical -- if you're up for sharing how this sort of thing comes to pass I'd love to have your story/discussion..

As for coming across like I enjoyed typing it, I'm pretty much at a loss on how to respond to that one besides: NEIN!


I like Perforce. It may not be perfect. But it's idiot proof.

"Days since gitastrophe" is a common phrase. There is no Perforce equivalent. You can't blow your leg off. There aren't thousands of "Perforce made easy" blog posts because it's actually easy. There are no "fixing my p4 repo" tales because it never breaks.

Thanks Perforce.


Sorry for ruining the party (seriously).

I like Perforce. It may not be perfect. But it's idiot proof.

That's because it is almost impossible to do anything, good or bad with it. ;-)

"Days since gitastrophe" is a common phrase.

Never heard it before.

There is no Perforce equivalent. You can't blow your leg off.

The equivalent of a hand saw: you will have a harder time cutting off your leg with a hand saw than with a chain saw.

There aren't thousands of "Perforce made easy" blog posts because it's actually easy. There are no "fixing my p4 repo" tales because it never breaks.

Or because nobody uses it and those who do keep the knowledge to themselves ;-)

Seriously: I used perforce for 18 months and I tried to understand why people loved it but no luck. Git isn't exactly perfect but compared to "checking out" a file before even editing it or not being able to commit without access to the server (IIRC) git is perfect. Oh, and not being able to check out a file that someone else has checked out, so if a dev goes on holiday without "checking in" the files first then you have to hack a bit to get ready for work again.

The best argument however for git is that even the perforce guys have integrated with it.

Thanks Perforce.


Bonus points when you get conflicts or try to use their terrible Swarm review system which relies on shelves.

Also, branching hides the history from before the branch point for no reason. No real branch merge functionality. Automatic merging is inferior to what Git has.

And more...


Game development is probably >95% Perforce. Has been for over a decade. Will be for the next decade.


In my personal experience, life (or, er, people) tend(s) to find a way to mess up any sufficiently complex, shared resource. In other words, I'm happy that it never breaks for you :3

Re. Perforce made easy, I'd suggest hitting up stack overflow or the p4 forums -- the community is much smaller than git's (i.e. practically the entire open source community vs. game developers), so there are fewer people available to write such blog posts (but enough to complain, evidently). I have a number of company internal-blog posts that explain Perforce to fellow newbs, but they can't be made public due to contracts and etc.


I worked at a company that used Perforce for a few years. We never used the p4 command-line tool because over the years people had written wrappers around all of the sub-commands to make things easier.

The wrapper command-line tool was about 10,000 lines of Perl, and called 'p5'. Good times.


In a corporate environment, I think I prefer a simpler workflow with a plain old centralized VCS and without using any branches at all. As code gets written, each commit goes on the trunk behind a feature flag (which you need anyway). That way each commit can benefit from continuous builds and testing, and other people can notice problems early. Branches would only be used for releases.

I've worked like that for years on some pretty big projects, and it never caused complicated problems like in the OP. The only caveat is that you need a strong safety net against breaking the trunk (lots of tests, mandatory code review, etc.)


Why wouldn't they reintegrate the mainline development branch with the new branch if it was so long lived? And/Or have the new code behind a feature flag so you could potentially have it deployed but disabled? So many ways this could have been avoided with some basic forward thinking...


So if I get this then he just added the files to master, then tried to work on the topic branch?

That really seems weird. I don't think it's git's fault. Also he could have done git merge -X theirs master if he really wanted some kind of history, but I guess it would be worthless.


I humbly (re)submit this for your consideration https://git-man-page-generator.lokaltog.net/


IMHO it's an odd approach to the problem. I'd rather ask the author (who knows best in this case) to split this into two separate parts that can be nicely merged.


Worth noting that cherry-pick takes commit ID ranges in the form xxxxxx..yyyyyy, which may have simplified the driver.
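
A toy-repo demonstration of the range form (the range is exclusive on the left, inclusive on the right):

```shell
# Toy demo: cherry-pick a range of commits in one command.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q -b master
git config user.email you@example.com
git config user.name you
echo 0 > f0.txt; git add -A; git commit -qm "c0"
git checkout -qb topic
for n in 1 2 3; do echo "$n" > "f$n.txt"; git add -A; git commit -qm "c$n"; done
git checkout -q master
git checkout -qb picked
git cherry-pick topic~2..topic   # replays c2, then c3 (not c1)
git log --format=%s              # c3 / c2 / c0
```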

Thanks for the tip re: --keep-redundant-commits!


The more I read stuff like this, the more I wonder how many problems would just go away for so many people if they used Mercurial instead.


How would this problem go away with Mercurial?


It's much harder to shoot yourself in the foot with Mercurial.

Which is why so many developers and companies prefer it:

https://news.ycombinator.com/item?id=10228399

https://news.ycombinator.com/item?id=9465084


Could you explain how this specific situation is better with Mercurial?

In nearly every thread about Git I hear vague claims that other systems are much easier, but almost never any specific reason to think so.


Seconded.

And sorry for the brief comment but that would be very interesting.


You get new problems with Mercurial, like almost every tool out there barfing when it encounters multiple heads in a repo. Don't have multiple heads in a repo, you say? Not a viable option if you want to use bookmarks as lightweight branches.


Why is Mercurial better than git? It's my understanding that you can still rewrite history with Mercurial.


Mercurial keeps track of whether you have pushed a commit or not, and if you have pushed it, it won't let you rewrite it without doing a manual override. This feature is called phases.


Most of the Mercurial use cases fit better with the mental model that people coming from SVN and similar systems are used to.

It is only when one wants to do advanced tricks that one needs to enter "Git-like" land.


>and published the changes to `master`

-_-


I hate this shit. Rebasing always causes conflicts and dealing with them is such a giant pain. I get that in this case the designer really brought the pain on themselves but I wish using git didn't require this sort of surgery periodically, which in my experience it does.


As opposed to what? Conflicts are an inevitable consequence of multiple people working on the same code at the same time, and git is as good or better than the alternatives for handling that.


In my experience: as opposed to merging, which just works but which nerds get all persnickety about because they hate merge commits. I understand the concern, I suppose, but working with `git rebase` just seems so unnecessarily painful. It's really awful.


rebase uses the same merge logic as merge. What some people don't get is that if you pulled three commits down from a remote, and you had three commits of your own, with different conflicts on each of them, rebase makes you resolve them one by one so it can literally "replay" your commits on top of the tip of the branch you are rebasing onto.

That's why rebase is so cool. It lets you solve each conflict during the merge of each commit separately. Then you can do it interactively and squash it all into one commit.

Learn the tools, don't complain.
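
The stop-resolve-continue loop looks like this in a toy repo (the resolution step here just writes a merged value; in real life it's whatever makes sense for the conflict):

```shell
# Toy demo: rebase stops on each conflicting commit in turn; you
# resolve, `git add`, and `git rebase --continue`.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q -b master
git config user.email you@example.com
git config user.name you
echo base > f.txt; git add f.txt; git commit -qm "base"
git checkout -qb feature
echo feature > f.txt; git commit -qam "feature change"
git checkout -q master
echo mainline > f.txt; git commit -qam "mainline change"
git checkout -q feature
git rebase master || true    # stops with a conflict in f.txt
echo merged > f.txt          # resolve the conflict
git add f.txt
GIT_EDITOR=true git rebase --continue
git log --format=%s          # feature change / mainline change / base
```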


> It lets you solve each conflict during the merge of each commit separately.

Which is what they're complaining about, compared to just merging and doing it once. Sometimes resolving smaller chunks at a time can be easier, but almost as often it involves a lot of extra effort creating intermediate states for little benefit, and often they don't even make sense.

I much prefer to keep an accurate history, rather than invent one that I think looks neater (the same goes for squashing commits). And honestly I think a lot of what people object to can be mitigated by learning how to use git log properly.


Look, if I make 15 commits, to my local repo, some with major functionality additions, some with bug fixes, and some with typo fixes - I shouldn't have to share that whole history with the remote.

Nor would anyone want me to. The re-writing history part only really applies to your local repo. If you re-write history on the remote, you are going to piss some people off.
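
Scripted, that clean-up-before-pushing step looks like this (GIT_SEQUENCE_EDITOR stands in for editing the todo list by hand, GIT_EDITOR=true accepts the combined commit message as-is; the sed call assumes GNU sed):

```shell
# Toy demo: squash the last three local WIP commits into one
# before sharing them.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q -b master
git config user.email you@example.com
git config user.name you
echo 0 > f.txt; git add f.txt; git commit -qm "initial"
for n in 1 2 3; do echo "$n" >> f.txt; git commit -qam "wip $n"; done
GIT_SEQUENCE_EDITOR="sed -i '2,\$s/^pick/squash/'" \
  GIT_EDITOR=true git rebase -i HEAD~3
git log --oneline    # two commits: the squashed result and "initial"
```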


>I shouldn't have to share that whole history with the remote.

You don't have to share it, but why not?

It might be useful in some cases, and people don't have to look at it the rest of the time. They can just look at merge commits, which are almost always coherent changes. Instead of relying on people to invent continuity, which might not even make sense.
