
On Git's lack of respect for immutability and the Best Practices for a DVCS - sarvesh
http://www.ericsink.com/entries/git_immutability.html
======
Locke
Here's, I think, the big philosophical difference: Some people only want _good_
code in the repo. Some people want _every_ change in the repo.

I'm in the second camp. I'll gladly commit broken code and then fix it later.
I don't _push_ broken code, of course, but sometimes it's useful to have a
snapshot of the broken code before you start fixing it.

For example, let's say I'm accepting a patch from someone or I found a useful
snippet of code on the internet. I integrate the code and it breaks a unit test
in some non-obvious way. _I_ like to commit the code as it was from the
original source before I start trying to fix it. That way I can quickly revert
if my fix attempt is way off base.

Further, I have in my history who / where the code came from _and_ how it was
fixed (in a later commit, complete with an easy to get diff).

I'm not saying Eric's best practice is _wrong_. Some people prefer to only
commit good code, some prefer to commit early and often. I don't think either
practice is inherently better than the other.

Edit: I realize I didn't really respond to the immutability issue... I don't
really have an opinion on that.

~~~
gecko
I kind of think you did respond, actually. Once you're committed to allowing
broken code into the repository, as long as you don't push it, then you're _de
facto_ in the immutability camp, since rebasing not only doesn't offer you
anything, but destroys what you're trying to deliver.

------
jerf
Git is actually as immutable as the next DVCS. Check out "git reflog"; it
records the state of the repository for most, if not all, cases where it
changes something around. Pragmatically, this allows you to recover from a
botched rebase -i or something.

Where git differs is not in its mutability, but in how it allows you to move
around the head of a branch in a more free fashion than most other DVCSs. (I
don't mean "more free" as a value judgment here, just an observation that it
is a superset of typical capabilities.)

It should be pointed out that this only addresses about half of the objections
you can raise to this rewriting; it is not true that you lose things because
of the rewriting, because you can get them back from the reflog. But it is of
course still true that history is rewritten, in the sense most people mean.

Basically, from a _technical_ point of view this is a non-issue, but from a
_human_ point of view it still can be.

Personally, I'm willing to concede this is a powertool and perhaps isn't for
everyone, and by the definition given it's not a "best practice", but I don't
think I'm willing to accept that definition in general. I'd prefer something
more relative to the local set of users, rather than a global definition. Perhaps
some teams would do nothing but shoot themselves, but in my experience with my
team and the local related teams, history re-writing is a net gain in terms of
how it impacts how we work. And it's hard to convince me of a point my
personal experience doesn't correspond to.

~~~
gecko
I don't mean this personally, but I'm getting a bit tired of git users
pointing to the reflog as protection against data loss. The reflog _does_
serve as a way to access no-longer-extant commits--until those reflog entries
expire, at which point they are garbage collected and no longer exist
anywhere, period. By default, this expiration time is two weeks.
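The disagreement over the reflog is easier to see with a toy model. The sketch below is a hypothetical Python model, not git's implementation: every move of a branch tip records the old tip, so a botched `rebase -i` can be undone, but only until that entry is expired. The 14-day window is taken from the comment above; in real git, reflog expiry and the later pruning of unreachable objects are separate steps, governed by settings such as `gc.reflogExpire`, `gc.reflogExpireUnreachable` and `gc.pruneExpire`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class ReflogEntry:
    old: str        # commit the ref pointed at before the move
    new: str        # commit the ref points at after the move
    when: datetime  # when the move happened


@dataclass
class Ref:
    """Toy branch ref with a reflog; not git's real on-disk format."""
    head: str
    reflog: List[ReflogEntry] = field(default_factory=list)

    def move(self, new: str, when: datetime) -> None:
        # Any update (commit, reset, rebase) records where the ref used to point
        self.reflog.append(ReflogEntry(self.head, new, when))
        self.head = new

    def previous_heads(self) -> List[str]:
        # Former tips, newest first: what you would reset to after a botched rebase
        return [entry.old for entry in reversed(self.reflog)]

    def expire(self, now: datetime, max_age: timedelta) -> None:
        # Forget entries older than max_age; the old tips are then unreachable
        self.reflog = [e for e in self.reflog if now - e.when <= max_age]


t0 = datetime(2009, 1, 1)
branch = Ref("a1")
branch.move("b2", t0)                        # ordinary commit
branch.move("c3", t0 + timedelta(days=1))    # e.g. the result of rebase -i
branch.previous_heads()                      # ['b2', 'a1']: still recoverable
branch.expire(t0 + timedelta(days=20), timedelta(days=14))
branch.previous_heads()                      # []: gone
```

Passing a ten-year `max_age` instead keeps both entries, which is the trade-off argued over further down the thread: a longer window loses less, but any finite window still loses something eventually.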

Yes, the reflog can help recover from certain classes of errors, but its band-
aid over git's emphasis on destructive history rewriting is incomplete, and
can, and does, crack in real-world cases.

The correct solution, I think--which none of the distributed version control systems
implement--is to provide a way to group related changesets into an übercommit.
This übercommit may still be sliced and diced back down to the buggy micro-
commits that detail its dirty development, but, by default, will be presented
as a single monolithic changeset to the user. No history is lost, and none is
rewritten; you're simply altering its presentation.
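Since gecko notes that no DVCS implements this, the following is only a hypothetical Python data model of the "übercommit" idea: the micro-commits are stored untouched, and the single squashed changeset is computed for display rather than written over them. File contents stand in for diffs, deletions are not modeled, and the class and field names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MicroCommit:
    message: str
    changes: Dict[str, str]  # path -> new file contents (toy stand-in for a diff)


@dataclass(frozen=True)
class UberCommit:
    """Hypothetical grouping: the parts are kept as-is and never rewritten."""
    summary: str
    parts: List[MicroCommit]

    def presented(self) -> MicroCommit:
        # Default view: one changeset equal to the net effect of all the parts
        combined: Dict[str, str] = {}
        for part in self.parts:
            combined.update(part.changes)  # later edits win, as when replaying them
        return MicroCommit(self.summary, combined)

    def expanded(self) -> List[MicroCommit]:
        # Drill-down view: the original, possibly buggy, development steps
        return list(self.parts)
```

The design choice is that "squashing" becomes a read-only view: `presented()` and `expanded()` both derive from the same stored parts, so neither view can destroy the other.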

~~~
jerf
"I don't mean this personally, but I'm getting a bit tired of git users
pointing to the reflog as protection against data loss.... By default, this
expiration time is two weeks."

So, set it to ten years, and lose nothing of interest. (It's not a Y2K-like
problem because the value of a commit decays over time. Nobody will need to go
back ten years to recover a commit that never made it into the main line,
because nobody will even know enough to ask a question it could answer.) Does
that answer your objection?

"This übercommit may still be sliced and diced back down to the buggy micro-
commits that detail its dirty development, but, by default, will be presented
as a single monolithic changeset to the user. No history is lost, and none is
rewritten; you're simply altering its presentation."

You could do that with git now. Branch for every commit, and squash it back
down onto the main branch. Tag the final HEAD of the branch and include the
tag in your squash commit record, and you can use it to recover all the
component commits. If it bothers you that git still won't understand what that
means, you're just a couple of shell scripts away from having the
functionality fairly well supported.
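A rough sketch of the record-keeping part of jerf's suggestion, assuming the squash itself is done with `git merge --squash` from a topic branch whose tip has been tagged: the helpers below write the tag name into the squash commit's message and read it back, so the component commits can later be found from that tag. The `Squashed-from-tag:` trailer is a made-up convention for this sketch, not something git itself interprets.

```python
import re
from typing import Optional

# Hypothetical trailer name; git attaches no meaning to it
TRAILER = "Squashed-from-tag"


def squash_message(summary: str, tag: str) -> str:
    """Message for the squash commit, recording where the micro-commits live."""
    return f"{summary}\n\n{TRAILER}: {tag}\n"


def tag_from_message(message: str) -> Optional[str]:
    """Recover the tag name from a squash commit message, if one is recorded."""
    pattern = rf"^{re.escape(TRAILER)}: (\S+)$"
    match = re.search(pattern, message, re.MULTILINE)
    return match.group(1) if match else None
```

A wrapper script would pass the result of `squash_message` to the commit, and a companion script would call `tag_from_message` on a mainline commit to find the tag whose history holds the original micro-commits.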

(git, thanks to its heritage, is easy to drive from shell scripts, and of course
anything else that can do "shell-script-like" things (Python, perl, etc.) can
handle it too. We have several simple scripts that sit on top of git and help
us impose policy on our branch management. So, I'm not terribly sympathetic to
criticisms of git that are one config change or a quick shell script away from
being fixed. Although, probably not for the reason you think; it's not because
I think anyone who doesn't customize their VCS is automatically a lazy developer,
it's because everybody has their own unique needs.)

~~~
gecko
> So, set it to ten years, and lose nothing of interest... Does that answer
> your objection?

No. It's still lossy. To turn the question around: if Subversion expired
changesets after a given length of time, I think you would complain. Likewise,
if git expired mainline commits after a certain length of time, you would also
complain (I sure hope). git is lossy, and I happen to think that's entirely
the wrong thing for a VCS to be. Making it "less" lossy is kind of like trying
to keep your teenage daughter "less" pregnant.

> You could do that with git now. [Lengthy explanation follows]

I can also do that right now in Mercurial using the group extension
(<http://www.selenic.com/mercurial/wiki/index.cgi/GroupExtension>),
which is possible because Mercurial, being written in Python, is trivial to
extend. But that's not the same as being part of the Mercurial workflow, any
more than the bookmark extension in Mercurial prior to hg 1.1 counted as a
real answer to git topic branches, or Loom counts as bzr's answer to
Mercurial's mq now: it's not part of the VCS. Yes, I can make these all work
however I wish--but, though people have now forgotten it, it's also relatively
easy to make CVS and Subversion work in similar ways through tools such as
Quilt (<http://savannah.nongnu.org/projects/quilt>), which allow for rebasing,
offline commits, and many other features that you think of as git/hg/bzr-
specific. We've abandoned those tools for good reasons: they required
extensions, shell scripts, and odd workflows to work in distributed
environments. We will do the same to our existing VCSes unless they can
approach a more ideal workflow.

------
jrockway
What VCS doesn't support mutability? With svn, you can just "svnadmin dump",
edit the file in your text editor, and "svnadmin load". Same idea as "git
reset", just much more difficult to do. People do it every day, though.

Anyway, the great news is that if you have a central repository, you can deny
non-fast-forward pushes. That then gives you the same "immutability" or
"security" as Subversion or anything else.

The moral of the story is, if you don't want to do something, don't do it.

(As an aside, I think this article is a classic case of, "SEE MY PRODUCT IS
STILL RELEVENT!!!11". Would he be saying this if he didn't have an inferior
product to sell?)

~~~
gecko
Ease-of-use does count for something. You're of course completely correct; I
can remove history by using "svnadmin dump", or editing the RCS files backing
CVS, or probably hacking Perforce's BDB file manually. But that's not part of
the default workflow, whereas "git rebase -i" seems to be an important part of
the daily git workflow. It's not a throwaway distinction, and has real
implications.

------
lnguyen
As has been mentioned in other comments, almost all VCS allow you to change
the history in some way, shape or form that probably wouldn't satisfy strict
"audit" requirements. There's usually a good reason for it, including in the
post author's own product. [The usual reason is that it allows you to
correct something obscenely stupid.]

And usually it's only available as an "admin" feature (at least in the
commercial tools I've used). So it's not as if it's an everyday thing
developers would use or have access to.

The difference with Git is that you're the admin of your local repository and
can pretty much do what you will with it.

But there's a difference between your work repository located on your laptop,
workstation, etc. and the "reference" shared repository that's used by your
team+. So go and set all kinds of hooks to prevent history rewrites (in the
form of non-fast-forward pushes and the like) on it. What happens in that case
is you can mess around all you like and clean up locally but once you've
decided to push up and share, your commit history isn't going to be changing.
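For the shared "reference" repository, git already has `receive.denyNonFastForwards` and `receive.denyDeletes` settings that cover the basic case. If you want custom policy instead, a `pre-receive` hook receives one `<old> <new> <ref>` line per updated ref on standard input and can reject the whole push by exiting non-zero. The following is a minimal Python sketch of such a hook, assuming a git recent enough to support `git merge-base --is-ancestor`; the function names are hypothetical, and the ancestry check is passed in so the policy can be tested without a repository.

```python
#!/usr/bin/env python3
import subprocess
import sys
from typing import Callable, Iterable, List


def is_zero(oid: str) -> bool:
    # git sends an all-zero object ID for ref creation and deletion
    return set(oid) == {"0"}


def git_is_ancestor(old: str, new: str) -> bool:
    # Exit status 0 means `old` is an ancestor of `new`; any error counts as "no"
    result = subprocess.run(["git", "merge-base", "--is-ancestor", old, new])
    return result.returncode == 0


def rejected_updates(lines: Iterable[str],
                     is_ancestor: Callable[[str, str], bool]) -> List[str]:
    """Return the refs whose update would rewrite or drop shared history."""
    bad = []
    for line in lines:
        if not line.strip():
            continue
        old, new, ref = line.split()
        if is_zero(old):
            continue  # brand-new ref: nothing to rewrite
        if is_zero(new) or not is_ancestor(old, new):
            bad.append(ref)  # deletion or non-fast-forward update
    return bad


def main() -> int:
    refused = rejected_updates(sys.stdin, git_is_ancestor)
    for ref in refused:
        print(f"refusing history rewrite of {ref}", file=sys.stderr)
    return 1 if refused else 0
```

Saved as `hooks/pre-receive` in the shared repository and made executable, the file would end with `sys.exit(main())`; that line is left out here so the sketch can be imported without reading standard input.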

+If there isn't a difference between the two repositories then either you're
prematurely optimizing your development and shouldn't worry about this issue
OR you've got larger separation of role issues to deal with beyond being able
to change your repository's history.

------
tptacek
Totally off-topic gripe: "And then I encrypted it with Schneier's latest
cipher" --- kill this meme. Schneier is many things, but the top
cryptanalyst/cryptographer in the world he is not.

~~~
yan
Schneier is already known to be a cryptographer and many people know this. If
you want to start a new meme with Wang Xiaoyun or someone else, go for it, but
I really doubt it'll catch on.

Plus, the reason people know him is because he's a pop security author first
and consciously strives to make a name for himself. Actual cryptographers are
too busy working with ciphers to build a reputation.

~~~
tptacek
That you are right about both of these points is not a refutation of my point.
If we can't eradicate the meme, let's go for ring vaccination.

------
biohacker42
You can't be everything to all people.

Git is aimed at people who know what they are doing.

Others will benefit from a more restricted feature set.

The bottom line for me is that criticizing Git for not being immutable is
like criticizing Ferrari for being too damn fast.

~~~
gecko
It's more like criticizing a file system for lacking journaling, and then
having its users respond that it does so because it gives you extra power to
do things that a journaling file system would stop you from doing--or to
implement your own if you see fit.

~~~
jrockway
It's not really like that.

It's like the guy sells a commercial non-journaling filesystem, and he is
trying to argue that journaling is a horrible idea... while everyone switches
to his competitor's journaling filesystems.

It's sad when your product is no longer relevant, but that's no reason to make
up reasons to attack its competitors. That just makes you (and your product)
look stupid.

~~~
biohacker42
I wouldn't be so harsh as to say his product is not relevant, it's just
relevant to a different market.

What I've seen in corporate America, makes me think his product is far too
permissive for a lot of places, places which won't dare touch Git.

~~~
jrockway
_What I've seen in corporate America, makes me think his product is far too
permissive for a lot of places, places which won't dare touch Git._

These people are silly, though. I don't know of a single version control
system that doesn't let you change history in a straightforward manner.

At least in Git, the cryptographic commit IDs will change or be invalid when a
commit is changed out from under you. Most other VCSes are blissfully unaware
of history rewriting, and thus someone nefarious with access to the central
repository can easily forge history.
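The point about commit IDs can be made concrete. A classic git object ID is the SHA-1 of a short header (`<type> <size>` followed by a NUL byte) plus the object's content, and a commit's content names its parent's ID, so editing an old commit changes its ID and the ID of every commit after it. The sketch below is a simplified Python illustration: the commit body uses a fixed, made-up author line, and it only shows that a rewrite is detectable by someone who still holds the old IDs to compare against.

```python
import hashlib
from typing import Optional

# Made-up identity and timestamp, fixed so the example is deterministic
IDENTITY = "A U Thor <author@example.com> 1230768000 +0000"


def git_object_id(kind: str, content: bytes) -> str:
    # Classic object ID: SHA-1 over "<type> <size>\0" followed by the content
    header = f"{kind} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def commit_id(tree: str, parent: Optional[str], message: str) -> str:
    # A commit's content includes its parent's ID, chaining the history together
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines.append(f"author {IDENTITY}")
    lines.append(f"committer {IDENTITY}")
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return git_object_id("commit", body.encode())


EMPTY_TREE = git_object_id("tree", b"")  # git's well-known empty tree ID

root = commit_id(EMPTY_TREE, None, "first")
child = commit_id(EMPTY_TREE, root, "second")

# Quietly "fix" the first commit and replay the second one on top of it
forged_root = commit_id(EMPTY_TREE, None, "first (edited)")
forged_child = commit_id(EMPTY_TREE, forged_root, "second")
print(root != forged_root, child != forged_child)  # True True
```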

I bet that never makes it into the marketing literature, though.

~~~
biohacker42
_I don't know of a single version control system that doesn't let you change
history in a straightforward manner._

<http://www.accurev.com/>

It has an _append only_ db.

~~~
jrockway
Are the revisions cryptographically validated, and are there other copies of
the repository to compare against? If not, it doesn't matter, you can just
edit the disk blocks.

(Yes, maybe you don't care about history _that much_. But if history is a tool
for developers rather than the legal team, you shouldn't care if the
developers mutate it.)

------
moe
Sounds like a case of "if you don't like it, don't use it" to me. It's not
like "--patch" was a mandatory flag.

------
LogicHoleFlaw
I think a large part of git's history rewriting stems from the desire to be
able to mail features around as atomic commits. You don't want to be
cluttering up your patch set with spurious interim changes.

~~~
lanaer
As a git user, I appreciate being able to squash commits together to keep
things tidy for those who pull down my changes later — rather than seeing
scattered changes that are merged with their own work, possibly interleaved
with their own local un-pushed commits, they get one atomic commit for a
feature.

That being said, I’m pretty sure Hg gives the same ability through the use of
patch queues, which probably makes more sense.

~~~
gecko
Mercurial's patch queues ultimately provide very similar benefits and
drawbacks to git rebasing, except that:

1. Patch queues may be versioned indefinitely, since they are their own
Mercurial repository, rather than eventually expiring, like rebased commits in
the git reflog; but

2. The above requires that you maintain discipline in committing the state of
your patch repository.

Although the second can be trivially automated, the default behavior--
rewriting your patch whenever you issue `hg qrefresh`--is far from optimal.

~~~
jrockway
Your post is making the site difficult to read (horizontal scrolling). Can you
reformat the bullet points to not be in a code block?

Edit: thank you :)

~~~
gecko
Done.

