

Empirical Evidence for the Value of Version Control? - gruseom
http://www.neverworkintheory.org/?p=451

======
jasonkester
Here's a real world example of Version Control saving thousands of dollars of
developer time. From three days ago:

Text is rendering blurry in the thing I'm building. It used to look nice and
crisp, but now it's as though it's all nudged half a pixel off the grid. It
looks terrible. And I don't remember having touched the text rendering stuff
or the vector code recently. wtf?

So I pull up the change log for the project and it gives me a giant list of
everything that has ever been done on it back to version one where it was
nothing but "test.html" moving a yellow box around the screen in time to an
audio track. I grab a version from a few weeks back, 50 checkins ago, and
click "update to this version". Reload the project in a new tab and sure
enough: beautiful crisp text. Tab over to the current version: blurry
blurriness.

So it's on. Jump forward to yesterday: still bad. 2 weeks ago: good. 1 week
ago: bad, and so on. Until I'm looking at one checkin that definitely contains
the change that broke this. As luck would have it, it was a big one, so I get to
try individually updating each file in turn. And since the culprit is nowhere
near touching either text rendering or vector stuff, I get to go line by line
on it until in disbelief I find the single "if" statement that's been flipped
to a configuration that allows opacity animations to occasionally get left at
0.9997 instead of locking them at 1.0.

And we're fixed. In ten minutes.

Now I can honestly say that I don't think I would have found the cause of this
bug without version control. All the places I would have thought to look were
innocent, and the most innocuous thing in the world turned out to be the
culprit. With nothing but tarballs and hand-renamed folders I would have been
completely screwed. Not just more time to fix this, but several orders of
magnitude more time. And most likely we would have simply resigned ourselves
to ship with blurry text and kept an issue in the tracker about it for years.

Now, in case you missed it, this happened to me _three days ago_. It's not
some isolated instance I heard about once. Version Control saves you like this
on a daily basis.

If you see a software company that's currently in business and shipping
software, I'd say you can use that as empirical evidence of the value of
version control.

~~~
pilif
Also, if you are using git, `git bisect` will do a great job of automating
this kind of history search: it runs a binary search over the history,
quickly narrowing down the culprit.

Additionally, you said "As luck would have it, it was a big one," - this is
what I personally use `git add -p` and `git rebase -i` for before pushing to
the public repo: I try to keep the commits as small and self-contained as
possible such that finding the faulty commit gets much simpler, because there
will be no "big one".

~~~
scott_w
Keeping commits small is crucial. I've been bitten by the "big commit" enough
times to force myself to keep commit diffs to a few lines across a few files.

I even go as far as to put "formatting" changes into one commit, and actual
code into a separate commit. Formatting changes tend to be things like
"unindent this large block of code", or "strip extraneous whitespace".

They can be larger than usual commits, but they allow me to separate
functional changes from non-functional changes. And yes, I've seen a non-
functional change break working code before.
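
One way to keep that discipline honest (a throwaway-repo sketch; the file is invented): `git diff -w` ignores whitespace, so an empty diff between a "formatting" commit and its parent confirms the commit really was non-functional:

```shell
#!/bin/sh
# Check that a formatting-only commit changed no actual code.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git config user.email demo@example.com
git config user.name demo

printf '    x = 1\n' > code.py
git add code.py
git commit -q -m "add code"

printf 'x = 1\n' > code.py            # unindent only
git commit -aq -m "formatting: unindent"

# Empty output here means the commit touched nothing but whitespace:
git diff -w HEAD^ HEAD
```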

~~~
steferson
>And yes, I've seen a non-functional change break working code before.

Really? How? I'm really curious about this.

~~~
scott_w
Honestly, I don't know how it happened.

The changeset was unindenting an HTML file with in-line JavaScript by 4
spaces.

The JavaScript stopped working, so I reverted the change and it worked again.

------
Spidler
The Linux kernel is a public example of a LARGE software project being
maintained for a long time (several years) without any source control, then
migrating to BitKeeper, and seeing a large increase in development rate as a
result.

BitKeeper press release on the subject:
<http://www.bitkeeper.com/press/2004-03-17.html>

Also the discussion and article at LWN as the kernel moves _away_ from
BitKeeper may be interesting: <https://lwn.net/Articles/130746/>

If you look at, for example, the ChangeLogs for the Linux kernel before and
after the BitKeeper migration, and again after the switch to git, you can see
the rate of development increasing quite a lot with the assistance of proper
tooling.

------
brudgers
_"but I have no experimental evidence to base that decision."_

"Empirical" and "experimental" are not synonyms. Experimental evidence
constitutes a small range within the set of all things which might count as
empirical evidence - e.g. anecdotes are empirical evidence but not
experimental evidence.

It is rare for something as vague as version control to undergo formal
investigation via experiment. As this thread shows, there is a broad range of
often incompatible activities which might be called "version control."

Typically, investigations would be via trial (and error), not experiment. The
_experience_ gained from such trials is empirical evidence - evidence from
experience is all that "empirical" means.

Although the strength of experimental evidence is typically based upon
statistical correlation, the strength of empirical evidence is based on
various other types of judgements.

It would not be unreasonable to give the reference to Moore substantial
empirical weight, if one attributed significant authority to Moore (perhaps
based upon previously finding that Moore's judgement about similar matters
corresponded with one's own experience). On the other hand, such an appeal to
authority would never be acceptable within an experiment.

In a sense, the idea of formally experimenting with version control is a bit
absurd - at the point where the benefits or disadvantages appear obvious, the
experiment would be abandoned, e.g. at the point where productivity rises
sharply or the shipping date is missed.

In a sense, the question seems to miscategorize version control as something
other than a tool. Tools are particular. So is version control. There are some
programming tasks where it is the hammer to their nails. Other tasks, however,
use screws.

Version control, as others in this thread have noted, is not a monolithic
thing. It's a cluster concept. Implementations may be hammers, screwdrivers,
rivet guns, or glue.

------
cheald
This seems like an awfully strange - and very academic - question. "Have there
been controlled studies to see if this thing that millions of developers have
personally experienced value from actually has value?"

No, I rather suspect not. What's more curious is the implied suggestion that
VC has no value if it has not been empirically proven through formal
scientific study.

I also suspect that you will find no papers on the effectiveness of building a
house using a hammer and nails and making it up as you go versus using a
nailgun and blueprints, but as man has known since he first discovered fire,
better tools pretty much always mean faster and higher-quality results.

~~~
antidoh
There might be something in here: [http://dblp.kbs.uni-
hannover.de/dblp/Search.action;jsessioni...](http://dblp.kbs.uni-
hannover.de/dblp/Search.action;jsessionid=069D77101E1A03C979BBB0DCE583D319?q=version+control&search=Search&_sourcePage=rbbrKZBl_XtRZ6WxuokaxpsZ4ZClrXslH_hO9BPqaoA%3D&__fp=6GwwjDZLvM0%3D)

------
rgbrenner
Let's say you're working by yourself on your pet project. It's entirely for
your personal benefit, and no one else matters. Why would you want version
control?

* when you make a change that introduces some bug, you can see exactly what was changed, and roll it back

* when you delete a feature because it's no longer useful, then some time passes, and you realize that feature is useful (or you can use that code for something else), you can just pull up your old code and re-add it.

* you can delete old stuff, knowing that it's not in fact gone.

* you can do more radical experiments by branching your code. You could do this without version control by just copying your code to a new location, but with VC you keep all of the file/code history.
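
The second and third bullets, sketched in git terms (the commenter uses SVN, where `svn log` and `svn copy` play the same roles; the file and messages here are made up):

```shell
#!/bin/sh
# Delete a feature, then recover it later straight from history.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git config user.email demo@example.com
git config user.name demo

echo "useful feature" > feature.txt
git add feature.txt
git commit -q -m "add feature"

git rm -q feature.txt
git commit -q -m "remove feature (no longer needed)"

# Months later: it was useful after all. Pull it back from the last
# commit that still had it -- nothing was ever really gone.
git checkout HEAD^ -- feature.txt
cat feature.txt
```

The final `cat` shows the recovered file contents, exactly as they were before the deletion.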

I work alone writing software projects. No one has seen my code in 10 years
(since I started). I use SVN. Every bit of code gets put in SVN.

If you aren't sure about version control, take my word for it: you want to use
version control. It will make your life easier. Just do it, and in a short
time, you'll see exactly what I mean.

~~~
bartwe
Doing a review of the diff of your own work for the day before commit is also
a good way to spot issues.
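
A minimal sketch of that review step on a toy repo (in daily use you'd just run the last two commands in your working copy):

```shell
#!/bin/sh
# Stage a day's work, then read the exact diff before committing.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git config user.email demo@example.com
git config user.name demo

echo "v1" > app.txt
git add app.txt
git commit -q -m "initial"

echo "v2" > app.txt                 # the day's work
git add app.txt
git diff --staged                   # review exactly what will be committed
git commit -q -m "end-of-day commit"
```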

------
exDM69
The OP considers versioning files but does not address collaborative sharing,
branching, merging, etc. at all. Dropbox and the other alternatives suggested
in the post simply do not offer anything for merging or dealing with conflicts
(or do they? what about versioning filesystems?).

When working alone on a fairly straightforward, linear project, you might
actually get away with using Dropbox. Try to work in a team of any substantial
size and there will be trouble.

~~~
icebraining
Dropbox's answer to conflicts is dumping a lot of copies of the files,
appending "([user]'s Conflicted Copy)" to the filename.

<https://www.dropbox.com/help/36/en>

------
gutnor
I'm sure they are trying to be useful, but I don't think it is the right
question to ask the community of a blog that pretends to bridge CS theory
with practice.

For a practicing developer, that is like finding a professional carpentry blog
with the question "Can you think of any practical situation where a hammer is
useful", so I fear that will taint the opinion of those whose first contact
with that blog was through this link.

At least they should have answered it themselves - or, since that's what they
are supposed to do, talked about some theory that could render source control
obsolete?

------
Tichy
I don't usually hit my thumb with a hammer, but I have no empirical evidence
that I shouldn't. Maybe I should smash my thumb with a hammer occasionally?

Seriously, does this need research? Just try it, and you'll see some pain go
away, not the least the mess of multiple copies of project directories and zip
files.

Not that there isn't anything research worthy about version control, but it is
entirely possible to recognize it as a good thing without it.

~~~
praptak
A similar point was raised in the following paper:

<http://www.bmj.com/content/327/7429/1459> _"Parachute use to prevent death
and major trauma related to gravitational challenge: systematic review of
randomised controlled trials"_

Results: _"We were unable to identify any randomised controlled trials of
parachute intervention."_

~~~
codeulike
_We think that everyone might benefit if the most radical protagonists of
evidence based medicine organised and participated in a double blind,
randomised, placebo controlled, crossover trial of the parachute._

That's quite a paper, thanks for the ref

------
tomlu
The lack of study for even the simplest, universally acknowledged principles
of software engineering has always troubled me. Proving any non-trivial
conjecture (for instance the proposition that version control reduces software
cost) _should_ be possible, but the cost of doing so is often prohibitive.

Until the evidence somehow materialises I think the best approach is to accept
that software engineering is mostly a craft, not a science. That way we can
heed the advice of respected software artisans with a clear conscience - all
of whom would recommend the use of source control.

------
bjourne
I can believe that keeping files under VC is generally beneficial. But what
I'm not so sure about, and what I wonder, is whether it is worth the trouble
of making elaborate commits. It's the difference between anal committing and
loose committing.

In the first style you ensure that your commit messages properly describe what
bugs your checkin fixes, that it follows the formatting convention you have
for writing checkin messages, that there are no extraneous whitespace changes
producing unnecessary hunks, and that you don't accidentally insert a big
block of commented-out code.

Loose committing is when you periodically commit the stuff you are working on.
Several times per day, when you feel you've reached something good, you
checkin what is in your repository and, at best, you write a comment like
"fixed the bug with bla".

Of course, there are several possible shades of gray between totally loose and
totally anal committing. What is the right strategy may very well vary
depending on how many developers are involved in the project.

For me, I've found that loose works best for my own projects. Prettifying
commits and thinking up good descriptions is annoying and can disrupt my flow.
I can't be bothered to force myself to do it just "because it's the right
thing" and I usually don't revisit old commits anyway. I'd definitely like to
hear other developers' thoughts on the issue, though.

~~~
kabdib
It depends on the project you're on, and what phase of development you're in.

If you work on your own, do anything you like.

If you work on something being used daily by tens of millions of people, upon
which your company's lifeblood depends, you're more careful. Actually checking
in code is somewhat anticlimactic; leading up to that, you've run a bunch of
tests, gotten a code review, and satisfied yourself that the change is
appropriate and necessary. Post-checkin, you monitor the build, make sure that
BVTs work, and basically make sure you haven't broken anything.

A good checkin isn't a big deal. It's the stuff around it that matters.

------
anarchitect
The company I work for (an online retailer) had no meaningful VCS until after
I joined and introduced Git.

Aside from the obvious software collaboration benefits, it's been particularly
valuable this week when we have had to manage multiple different releases for
the gifting season (promos, merchandising etc) across all six of our sites.
All of our post-Christmas sales are waiting in branches to be deployed.

------
Nursie
Without a vcs you don't have a development process.

It stores the code. It stores all versions of the code. On a rudimentary
level, when you break something, you still have the good version.

On a more sophisticated level, with a proper version control system and
branching strategy you can:

* Have multiple developers working on the same codebase, with most changes merged in automatically (though obviously you need some human oversight)

* Support multiple different releases of a product from the same tree.

* Roll forward patches and fixes from one version to the next, again largely automatically.

...

Maybe some of these are less important for things you run on your own servers
rather than stuff that gets deployed at customer sites, but to me it's
difficult to imagine working without it.

------
codeulike
I think this would be a useful study. People are saying that it's self-
evidently obvious that version control helps, but it's always good to
challenge things that are supposedly obvious.

I expect if it were properly studied we'd be able to identify a rough level of
complexity below which using version control takes extra time and delivers
little benefit.

e.g. the Linux kernel definitely couldn't be done without version control,
whereas some of the 24-hour things I've done solo at hackathons would probably
have moved faster without git.

edit: but don't get me wrong, for any serious, multi-developer project I'm
definitely using version control

------
mathattack
I think the question is bigger than just what's being asked. Of course we all
know that version control is vital. But has there been any formal study that
documents how much?

There are a lot of common accepted practices that actually have weak
scientific foundations. The waterfall method comes to mind. Asking for the
basis of a practice is still useful, even if the answer is just "look around."

------
fleitz
It's an issue of having more data / insight into your codebase.

Ideally your software has a fitness function that it must pass, or by which it
is compared with other iterations of your software. With each change made to
the software it ideally becomes more fit; however, we know that some changes
cause bugs, reduce fitness, etc.

Source control allows regular human beings to revert to an earlier more fit
stage and progress from there.

If it were possible to write a correct version of a program on the first try,
and the fitness function never changed, then there would be no use for source
control.

Frequently, small programs will be created without source control - a script
to print the numbers 1 to 100, FizzBuzz, etc. These kinds of software
generally don't benefit from source control, and thus it is not used. Software
of simple complexity can usually be written correctly in a few iterations.

When working with multiple programmers, the primary added benefit is file
syncing and visibility into who made what change, so that the purpose and
impact of each change can be asked about.

In short if someone isn't using source control tell them that the prototype
they showed you last week was perfect for a new client, you need what was
built last week in the next 15 minutes to demo for a client, but the menu
color should be blue instead of red.

OT: I've been trying to figure out a way to auto-create commits every time a
file is saved, then quickly squash those commits into a single new commit when
substantial changes have been made. Anyone know of a readymade solution?
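
Not a readymade tool, but in git this can be sketched with a scratch branch plus `git merge --squash` (the on-save hook is simulated by the loop below; wiring it to an editor is left out):

```shell
#!/bin/sh
# Auto-commit every save on a wip branch, then squash into one commit.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git config user.email demo@example.com
git config user.name demo
main=$(git symbolic-ref --short HEAD)   # master or main, per config

echo "start" > f.txt
git add f.txt
git commit -q -m "initial"

git checkout -q -b wip
for n in 1 2 3; do              # stands in for an editor's on-save hook
  echo "edit $n" >> f.txt
  git commit -aq -m "autosave $n"
done

git checkout -q "$main"
git merge -q --squash wip       # stages the combined diff, no commit yet
git commit -q -m "one substantial change"
git rev-list --count HEAD       # 2: initial + the squashed commit
```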

~~~
steeleduncan
OT: seconded. If anyone knows of a way to commit to a "staging" repository,
then bundle my broken wip commits to the main repository, I'd love to know.

I'd like to keep the main repo clean with one commit per feature/bugfix, but
still be able to commit regularly and walk back/bisect changes whilst working
on the feature.

~~~
cheald
You can do this with git and squashed commits, but while it seems like a good
idea, it's really not - when you're bug-hunting, smaller single-change commits
make it far, far easier to track down and kill the bug. Having to crawl
through a 3000-line squashed commit to find where a bug was introduced is
awful.

A better workflow is to use feature branches to work on a feature; when
you're done with it, rebase and merge it back to master and create a tag for
the feature. This gives you an easy timeline of what features or fixes were
introduced when, lets you commit often without breaking master, and doesn't
destroy your valuable commit history.
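
A sketch of that workflow in commands, on a toy repo (the branch and tag names are invented):

```shell
#!/bin/sh
# Feature-branch workflow: rebase onto master, merge back, tag it.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git config user.email demo@example.com
git config user.name demo
main=$(git symbolic-ref --short HEAD)

echo "base" > app.txt
git add app.txt
git commit -q -m "initial"

git checkout -q -b feature/blur-fix
echo "clamp opacity at 1.0" >> app.txt
git commit -aq -m "clamp opacity at 1.0"
echo "regression test" >> app.txt
git commit -aq -m "add regression test"

git rebase -q "$main"                 # replay onto the tip (a no-op here)
git checkout -q "$main"
git merge -q --no-ff feature/blur-fix -m "merge feature/blur-fix"
git tag blur-fix                      # timeline marker for the feature
```

`--no-ff` forces a merge commit, so the feature's individual commits survive for later bisecting while the tag marks when the feature landed.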

~~~
steeleduncan
Thank you. This looks like it will work well. The only problem is remembering
not to push.

It would be nice to be able to work with two repositories in parallel. One
repository for my development and the other one the "official" repository with
larger commits, better commit messages and the guarantee that all commits
work.

~~~
cheald
You might be interested in git-flow[1] which is a formalization of this
workflow.

[1] [http://jeffkreeftmeijer.com/2010/why-arent-you-using-git-
flo...](http://jeffkreeftmeijer.com/2010/why-arent-you-using-git-flow/)

------
raverbashing
I think anyone who questions the need for "empirical evidence" has never
written a single line of code in their life

Because it's in plain sight

No, this is not a matter of "trying to find it"

If someone questions this they're already wrong.

~~~
CJefferson
While I agree there is a clear need for version control, telling someone
looking for evidence that they are already wrong is extremely unhelpful.

Almost everyone had a time when they didn't use version control, at least not
properly. I wrote thousands of lines of code on an Amstrad CPC, and the
closest I came to version control was cycling between two different tapes when
it came to saving my data. When I moved to PC I did the same, with two floppy
discs.

~~~
raverbashing
So you see that even when you had no idea what version control was, you
invented a rudimentary form of it?

That's the thing, that is so basic people invent it if they don't have it.

I did similar things, though not on an Amstrad ;)

~~~
sliverstorm
I agree that the lack of empirical evidence is probably because it's a pretty
fundamental idea that is "common knowledge", but that doesn't mean evidence
isn't useful and it also doesn't mean the original poster deserves a bashing.

