
Cannot Measure Productivity - alexfarran
http://martinfowler.com/bliki/CannotMeasureProductivity.html
======
lifeisstillgood
I am going to get my drum out and bang on it again.

Software is a form of literacy - and we measure literacy completely
differently. In fact we measure it like we measure science - you are not a
scientist unless other scientists agree you are, and you are not a coder
unless other coders say you are.

What Fowler wants to measure is not the top echelons of productivity but the
lower bounds - presumably to winnow out the unproductive ones.

But that is not how we conduct ourselves in literacy or science. We _educate
and train_ people for a very long time, so that the lower bound of
productivity is still going to add value to human society - and the upper
bounds are limitless.

What Fowler is asking for is a _profession_.

~~~
foobarbazqux
> you are not a scientist unless other scientists agree you are

Science requires one thing: making and testing falsifiable hypotheses. A
priest is able to determine whether or not you are doing that. If anything,
it's philosophers who decide what science is, e.g. Karl Popper.

~~~
swombat
Doing science does not make you a scientist.

~~~
epenn
Can you elaborate on why you believe this to be the case? Saying that a
scientist is one who does science seems like a truism bordering on being
tautological. I'm curious why you disagree.

~~~
swombat
Does knowing a bit of physics make you a physicist? Does praying make you a
monk? Does mixing a few chemicals make you a chemist? Does having some
theories about people's motivations make you a behavioural psychologist? Does
balancing a budget make you an accountant?

"Scientist" implies a certain amount of knowledge, training, discipline, etc.
I'm not implying that every scientist needs to have undergone academic
training - there are other ways - but merely doing a scientific experiment is
not enough to call yourself a scientist.

A scientist is one who "does science" with some knowledge, consistency and
perseverance.

~~~
foobarbazqux
I guess you'd better edit Wikipedia:

> In a more restricted sense, a scientist is an individual who uses the
> scientific method. [...] This article focuses on the more restricted use of
> the word.

[https://en.wikipedia.org/wiki/Scientist](https://en.wikipedia.org/wiki/Scientist)

~~~
swombat
Wikipedia is not the ultimate repository of human knowledge, particularly when
it comes to more tricky questions like "what is a scientist?"...

If we're going to throw definitions around, how about dictionary.com:
[http://dictionary.reference.com/browse/scientist?s=t](http://dictionary.reference.com/browse/scientist?s=t)

> an expert in science, especially one of the physical or natural sciences.

~~~
foobarbazqux
Actually I think Wikipedia is pretty good for tricky questions, in that they
attract a lot of attention and receive a lot of edits.

If being a scientist is determined by the consensus of one's peers, it seems
like it makes sense to accept an article defining what scientists are that is
written as a consensus opinion.

But anyway, if you think it's wrong, why don't you edit it?

------
ChuckMcM
It has been more than 10 years; in fact, there have been moans about
productivity for at least 50, going back to the early '60s.

Feynman had some interesting thoughts on minimal computation that sort of
paralleled Shannon's information complexity. As you know, Shannon was
interested in absolute limits on the amount of information in a channel, and
Feynman was more interested in the amount of computation per joule of energy.
But the essence is the same: programs are processes that use energy either to
transform information or to comprehend & act on information, so 'efficiency'
at one level is the amount of transformation/action you get per joule, and
"productivity" is the time dimension of it: how long it takes to go from need
to production.

It has been clear for years that you can produce inefficient code quickly and,
conversely, efficient code more slowly, so from a business-value perspective
there is another factor, which is the _cost_ of running your process versus
the _value_ of running your process. Sort of the 'business efficiency' of the
result.

Consider a goods-economy comparison of the assembly line versus the craftsman.
An assembly line uses more people but produces goods faster; that is
orthogonal to the quality of the goods produced. So the variables are quantity
of goods over time (this gives a cost of goods), the quality of the goods
(which has some influence on the retail price), and the ability to change what
sort of goods you make (which deals with the 'fashion' aspect of goods).

So what is productivity? Is it goods produced per capita? Or goods produced
per $-GDP? Or $-GDP per goods produced? It's a bit of all three. Programmer
productivity is just as intermixed.

~~~
hcarvalhoalves
> It has been clear for years that you can produce inefficient code quickly,
> and conversely efficient code more slowly (...)

That's not only false; it's often the opposite.

The number-one symptom of an inexperienced programmer is wasting development
hours reinventing the (square) wheel, while a good programmer is lazy (they
already know which solution works best, and will probably just import it from
a tested library).

So an experienced programmer not only doesn't waste computation power, but
also doesn't waste hours in the development cycle.

I agree with everything else you pointed out.

~~~
ChuckMcM
Since I'm trying out my CODE keyboard [1] I thought I'd go into a bit more
detail.

My statement about producing inefficient code quickly was in terms of joules
per computation. So while it is absolutely true that a junior perl programmer
might slowly generate inefficient code and an experienced (lazy) perl
programmer might quickly generate optimal perl code, neither would produce the
same product as one written in assembly code (or better yet, pure machine
code).

To put that in a different perspective, I once wrote a BASIC interpreter in
Java (one of my columns for JavaWorld) and it was pretty quick to do, and yet
next to the "source" of Microsoft BASIC written in 8080 assembler, mine was
not very efficient. It took Bill a lot longer to write Microsoft BASIC in
assembler, but you couldn't even _begin_ to port a full-up Java VM to the 8080
(let's not argue about J2ME).

But step back from that precipice: you have two versions of BASIC, one that
runs in a browser and one that runs on a 16-line by 64-character TVText S-100
card (or a 24 x 80 CRT terminal). Now you can run the same program in both
contexts, unchanged, but the amount of energy you expend to do so varies a
lot. So which is more "efficient"? I'd argue the one written in 8080 assembly
is more efficient from a joules per kilo-core-second standpoint. Which was
written more quickly? Mine - it only took about a week.

That is why talking about efficiency and productivity without getting anally
crisp in your definitions can lead to two opposite interpretations of exactly
the same statement.

[1] I find the lack of a wrist pad to rest on a challenge.

------
integraton
The sad part is that even after decades of technologists debating this, the
reality is that most non-technologists working in the industry don't know,
don't care, and really just want their pet features. The real measure of
productivity in organizations with non-technical stakeholders therefore
becomes whether or not a stakeholder feels like they are getting what they
want. Attempts to measure productivity, whether via lines of code or
"velocity," are often little more than a way for everyone to pretend their
opinion is backed by something quantitative. In especially bad cases with non-
technical management, they'll just keep swapping out processes until they
either get what they want or have something with numbers and graphs that makes
it look like they should.

While I could be accused of excessive cynicism, I do believe this is common
enough that it should be addressed. There's a pervasive delusion that
decisions are made by rational, informed actors, when that is rarely the case.

~~~
lifeisstillgood
> becomes whether or not a stakeholder feels like they are getting what they
> want.

If the stakeholder you choose is a customer, then that is a valid measure of
business productivity.

Which I guess is kind of the point - we are trying to measure at a
granularity finer than we validly can.

Which indicates to me that a world of smaller organisations, made up of
software-literate people, will be one where rewards follow talent. That may
not be a world we want to live in - and my cynicism sees your cynicism and
raises :-)

------
mathattack
There are two types of productivity:

1) Are you doing the right things?

2) Are you doing things right?

They can be imprecisely measured, but every metric has problems and can be
gamed. Combining the measurements is extremely difficult.

Let's start with 1 - doing the right things. Someone who chooses to have
their team work on 3 high-value tasks, and stops them early on 6 low-value
tasks, is by one definition more productive than someone who forces their team
to do all 9 things. Or at the very least they are more effective. This is what
Fowler is getting at.

On point 2... Let's assume that the appropriateness of what you are doing is
immaterial. How fast are you doing it? This can be somewhat approximated. You
can say "speed versus function points" or "speed versus budget" or "speed
versus other teams achieving the same output," and then bake rework into the
speed. All of these metrics are doable. Lines of code isn't a good base,
though.
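
For instance, baking rework into a speed measure is simple arithmetic. A
minimal sketch (all the numbers are invented for illustration):

    # Toy "speed versus function points" measure with rework baked in.
    function_points = 40   # delivered this quarter (hypothetical)
    weeks = 10             # calendar time spent
    rework_weeks = 2       # extra time fixing what was already "done"

    effective_speed = function_points / (weeks + rework_weeks)
    print(f"{effective_speed:.1f} function points per week")  # -> 3.3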

The real question is, "What are you going to do with all of this productivity
data?" If the answer is systemic improvement, you're on the right track. If
you try to turn it into personal performance (or salary) then people wind up
gaming the metrics.

------
seiji
Is measuring productivity isomorphic to the hiring problem?

Everybody says there's a "shortage of developers," but I know good developers
who keep getting shitcanned after a few interviews where nothing seemingly
went wrong.

We can't tell who's going to be productive. Since we can't tell, we come up
with ten-foot-high marble walls to scale. Our sterile interview problems make
us feel "well, at least the candidate can do our Arbitrary Task, and since we
decided what Arbitrary Task would be, they must be good, because they did what
we wanted them to do."

Productivity is pretty much the same. There's "just get it done" versus
"solving the entire class of problems." Is it being productive if you do 50
copies of "just get it done" when it's really one case of a general problem?
I'm sure doing 50 copies of nearly the same thing makes you look very busy and
generates great results, but solving the general problem could take 1/20th the
time - and leave you sitting less than fully utilized afterwards (see:
automating yourself out of a job).

~~~
kasey_junk
They are absolutely the same problem. Because we can't measure productivity,
we can't determine relative quality in an objective way. If we could, it would
make the hiring process much simpler.

The question I have is: how is this much different from any other profession?
How do we measure doctor productivity? What keeps me up at night is that the
90/10 crap-to-good ratio in software developers is very likely the same ratio
as among surgeons.

~~~
vadman
Must be the same in every profession. How many of e.g. your school teachers
were good? About 10%.

I am wondering if the ratio holds for crap to good parents. The scarier aspect
of this is that people are actually being trained for their professions, as
opposed to parenting, so the ratio may be even worse.

------
RogerL
A very wise man said "there is no silver bullet". Yet we keep trying all these
schemes to automagically solve what are hard optimization problems only
amenable to heuristics and deliberate, intelligent introspection. Very simply,
you cannot run some tool to measure the information density of a large
project. Graphical programming isn't going to turn a bunch of marketers into
programmers. Doing user stories and forcing people to stand up as they talk
isn't going to remove all the need for planning and tracking. And so on.

You know how I figure out if something can be improved? I dig in, understand
it, and then look for ways to improve it. If I don't find anything, of course
it doesn't mean there is no room, but I'm a pretty bright guy and my results
are about as good as any other bright guy/woman.

I was subjected to endless amounts of this because I did military work for 17
years. You'd have some really tiny project (6 months, 2-3 developers), and
they'd impose just a _huge_ infrastructure of 'oversight'. By which I mean
bean counters, rule followers, and the like - unthinking automatons applying
rules and automatic tools, anything to produce a simple, single number. It was
all so senseless. I know that can sound like sour grapes, but every time I was
in control of schedule and budget I came in on time and at or under budget.
But that is because I took it day by day, looked at and understood where we
were and where we needed to go, and adjusted accordingly. Others would push
buttons on CASE tools and spend most of their time explaining why they were
behind and over budget.

I like Fowler's conclusion - we have to admit our ignorance. It is okay to say
"I don't know". Yet some people insist that you have to give an answer, even
if it is trivially provable that the answer must be wrong.

~~~
chromatic
Please excuse this small rant.

If you're referring to Fred Brooks, he wrote "[T]here is no _single_
development, in either technology or management technique, which by itself
promises even one order of magnitude improvement _within a decade_ in
productivity, in reliability, in simplicity." (emphasis mine)

The surrounding context makes his comment a very specific prediction which
means something different from what most people claim he meant. Much of the
rest of his essay suggests techniques which address the issue of essential
complexity and which, when applied together, he hoped would produce that
order-of-magnitude productivity improvement.

Perhaps there was no _single_ such improvement in the years 1986 to 1996, but
when people use the phrase "no silver bullet" to dismiss potential
improvements in productivity, I believe they're doing Brooks and the rest of
us a great disservice.

~~~
jacques_chester
You missed a key point of the essay, which is that no matter _how_ much
progress we make in accidental complexity, essential complexity does not go
away.

~~~
chromatic
Of course that's the key point of the essay, but I've never observed anyone
who says "there's no silver bullet in productivity" get past the desire to
misuse the title of a Fred Brooks essay as a middlebrow dismissal, and on to
the nuance of distinguishing between accidental and essential complexity.

After all, much of programming culture is stuck on the idea that the clarity
of a programming language's syntax to novices matters more to the
maintainability of programs written in that language than domain knowledge
does, for example.

------
nadam
You can measure productivity quite well if you set a task, write tests for
it, and tell two independent groups to implement it. You give them the same
amount of time.

Now the _more productive_ / better group is the one that can do the task with
_smaller complexity_.

Complexity measures gauge the size of the code and the number of dependencies
between blocks in different ways. But even the simplest complexity measure is
quite good: just count the number of tokens in the source code. (It is a bit
more sophisticated than LOC.) You can then run competitions between groups and
measure their productivity. (I am writing a book titled 'Structure of
Software' which discusses what good software structure is at a very
generic/abstract level. It relates to 'Design Patterns' as abstract algebra
relates to algebra.)
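
A minimal sketch of such a token counter, using Python's standard tokenize
module (the two snippets compared are invented examples, not from any real
competition):

    import io
    import tokenize

    def count_tokens(source: str) -> int:
        # Count real tokens; skip comments, newlines, indentation, EOF.
        skip = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
                tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}
        tokens = tokenize.generate_tokens(io.StringIO(source).readline)
        return sum(1 for tok in tokens if tok.type not in skip)

    # Two snippets that pass the same (imagined) tests:
    verbose = "total = 0\nfor x in xs:\n    total = total + x\n"
    concise = "total = sum(xs)\n"
    print(count_tokens(verbose), count_tokens(concise))  # 13 vs. 6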

~~~
alok-g
Genuinely asking: Why not just stop at "tell two independent groups to
implement it"? That is, why constrain to the same amount of time?

~~~
nadam
Because we measure the quality of their output. A weaker group can solve the
problem with the same quality as a stronger group if given much more time (for
example, by doing extra refactoring in the additional time).

~~~
alok-g
I see. The time constraint you set is on the tighter side. I was considering
it to be on the relaxed side which would allow the weaker group to improve as
you said.

On the other hand, setting the time constraint (as opposed to measuring both
time taken and solution complexity for the two groups) is important because
deadlines help.

------
artumi-richard
The book "Making Software: What Really Works, and Why We Believe It"
([http://www.amazon.co.uk/Making-Software-Really-Works-
Believe...](http://www.amazon.co.uk/Making-Software-Really-Works-
Believe/dp/0596808321/ref=sr_1_1?ie=UTF8&qid=1377809167&sr=8-1&keywords=making+software))
has a section on this.

Chapter 8 "Beyond lines of Code: Do we need more complexity metrics?" by
Israel Herraiz and Ahmed E Hassan.

Their short answer is that, in the case they looked at, all the suggested
metrics correlated with LOC, so you may as well use LOC as it's so easy to
measure.
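
The "easy to measure" part is certainly true; a throwaway LOC counter is a
few lines (my sketch, not from the book; "mymodule.py" is a stand-in path):

    # Count non-blank, non-comment lines - the crude LOC baseline.
    def loc(path: str) -> int:
        with open(path) as f:
            return sum(1 for line in f
                       if line.strip() and not line.strip().startswith("#"))

    print(loc("mymodule.py"))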

IIRC, however, they believe it's only fair to compare LOC between different
employees if they are doing pretty much the exact same task; but since LOC
correlates with code complexity, there is some measure there.

I recommend the book; it really focuses on the science of computer science.

------
gz5
Heisenberg principle variant for software:

Measure it. Or optimize it. Can't do both without impacting the other.

Software is a work of art and creativity, not the work of a rules-based
factory.

------
stonemetal
So two teams build identical databases in identical time frames. One becomes
popular and has sales in the millions of dollars. The other flops, with sales
in the hundreds of dollars. Sure, there is a difference in business results,
but I fail to see how the two teams were not equally productive at creating
software. Granted, I don't have a good definition of software development
productivity, but this one is open to so many elements outside software
development as to be nonsensical.

Basically I see this as marketing: we may not be the fastest, but who cares
about that? We have the special insight to build the hits that keep you in
business.

------
wciu
Most performance indicators are imprecise. The P/E ratio is one of the
stupidest measures of value, but it is widely used in finance. No one (at
least no value investor) would invest based on the P/E ratio alone, though;
there is a lot more due diligence done before investors put their money into a
stock. (At least that's what you hope happens.)

The problem with productivity measures is not how they are measured but what
they are used for. Most managers want to use productivity measures to evaluate
individual or team performance; however, performance is tied to incentives, so
you always end up with a lot of pushback from the team, or someone gaming the
system. (IMO, this comes from lazy managers wanting to "manage by numbers"
without really understanding how to manage by numbers.)

Rather than being used as a performance management tool, productivity
measures, however imprecise, can be used alongside other yardsticks as signals
of potential issues. For example, if the productivity measure is dropping for
a particular module/subsystem, and the defect rate is increasing, then one
might want to find out whether the code needs to be rearchitected or
refactored. In these cases it is okay to be imprecise, because the data are
pointers, not the end goal. When used correctly, even imprecise data can be
very useful.
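
A minimal sketch of combining two such imprecise signals (module names,
numbers and thresholds are all made up for illustration):

    # Flag a module for investigation only when two noisy signals agree:
    # velocity trending down AND defect rate trending up.
    stats = {
        "billing": {"velocity_change": -0.30, "defect_change": +0.25},
        "search":  {"velocity_change": +0.05, "defect_change": -0.10},
        "exports": {"velocity_change": -0.05, "defect_change": +0.02},
    }

    def needs_review(s, velocity_drop=-0.20, defect_rise=0.15):
        return (s["velocity_change"] <= velocity_drop
                and s["defect_change"] >= defect_rise)

    for module, s in stats.items():
        if needs_review(s):
            print(module, "- investigate: rearchitect or refactor?")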

------
dirtyaura
The quest for a single measure of a hard-to-define concept like productivity
is doomed. Even Fowler's article highlights the fact that we don't have a
shared understanding of what the word productivity means: writing quality
code, shipping useful products, or making money? All of them? It's no surprise
that there is no numerical measurement that captures a badly defined concept.

In my opinion, we should approach measurement from a different angle: can we
learn something useful about our profession by combining different types of
measurements? Can we, for example, easily spot a person who is doing what
Fowler calls important supportive work? Can we detect problem categories that
easily lead to buggy code, and allocate more time for code-quality work on
those tasks and less on those known to be more straightforward?

------
Jormundir
It drives me nuts when programmers brag about their productivity, measured by
how many lines of code they've written.

You end up with something like feature 1: +12,544 / -237 lines. Done in 2
weeks.

Then comes feature 2, 2 and a half months later, the stats: +5,428 / -9,845.

Look at that, you had to tear down everything they wrote because they cared
about amount of code over code quality. The more they brag, the more you think
"oh s$%t, every line they add is a line I'm going to have to completely
untangle and refactor."

I think software engineering productivity can be measured, though not well by
today's standards. There will probably be a decent algorithm to do it in the
future, one that takes into account the power of the code, how easy it is to
build on top of, how robust it is, etc.

~~~
matwood
Nothing makes me happier than removing code. If I can find ways to deliver the
same functionality in less code I get excited.

Now, I do like to look at my personal lines of code, because it gives me a
gauge to compare the features I implement on a relative basis. It also gives
me a rough, relative measure of how much effort a particular feature took to
produce.

~~~
RogerL
You will like this story from Apple, from the period when engineers were
required to report the LOC they produced each week.

[http://folklore.org/StoryView.py?project=Macintosh&story=Neg...](http://folklore.org/StoryView.py?project=Macintosh&story=Negative_2000_Lines_Of_Code.txt&sortOrder=Sort%20by%20Date)

------
kailuowang
The purpose of measuring productivity is to manage it. There are two
categories of factors that decide the overall productivity: the factors within
the developers (capability, motivation, etc) and the factors outside the
developers (tools, process, support, etc).

True, it's hard to objectively measure overall productivity by a universal
standard, but it is relatively easier to measure the productivity fluctuation
caused by the external factors. Velocity measurement in Agile practice is
mostly to that end.

For the internal factors, the best way, and arguably the only effective way,
to manage them is probably to hire good, motivated developers. I think most
top-level software companies have learned that.

~~~
lifeisstillgood
This is true - to an extent. Scrum screams out for measuring relative story
points and never providing the data for "management" purposes. But even the
same team estimating in succession will face external pressures - and if
those pressures can be alleviated by gaming story points, they will be. This
catch-22 has me convinced that the only way is to report nothing but an
estimated finish date. Any public posting of velocity eventually filters into
management by velocity - because that's the only metric management has. And
we are back on the same old loop: we can have a measure of productivity as
long as we never use it in any manner as a measure of productivity.

Add to this, I don't think scrum is set up to take this to its logical
conclusion - agile/scrum has been sold as a fairly fixed methodology, not as
a means to get a relative metric out of teams and use it in a series of
experiments to achieve productivity improvements. And even if it were, the
major wins we _know_ and can prove work (quiet conditions, minimal
interruptions, trust, respect, time for reflection and education) are a long
way from being accepted by today's enterprises.

In short, there is no silver bullet, and while agile looked like a magic
bullet, it just turned out to be plain old lead.

------
mtdewcmu
The article makes the point that the LOC metric is confounded by duplication:

> Copy and paste programming leads to high LOC counts and poor design because
> it breeds duplication.

This problem is not insurmountable. Compression tools work by finding
duplication and representing copies as (more concise) references to the
original.* The size of the compressed version is an estimate of the real
information content of the original, with copies counted at a significantly
discounted rate. The compressed size of code could be a more robust measure of
the work that went into it.

* Sometimes this is done explicitly, other times it's implicit
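
The idea is cheap to try with any off-the-shelf compressor. A rough sketch
using zlib (the code sample being measured is invented):

    import zlib

    def compressed_size(source: str) -> int:
        # Duplicated passages compress away, so copies are heavily discounted.
        return len(zlib.compress(source.encode("utf-8"), 9))

    original = "def add(a, b):\n    return a + b\n"
    duplicated = original * 50  # copy-and-paste "productivity", 50x

    print(len(original), compressed_size(original))
    print(len(duplicated), compressed_size(duplicated))
    # Raw size grows 50x; compressed size barely grows at all.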

------
alightergreen
And what about iteration? The learning value that can come from doing things
poorly?! Imagine if Microsoft had LEARNED something from what they did wrong
in Windows 95. Or Windows ME! Imagine how amazing their software would be now.
They couldn't have done it without having totally screwed up first. Of course,
they didn't do that in the end... so...

------
chipsy
Productivity by any volume measure seems meaningless in the software context.
That's like measuring writing productivity by word count. Nobody really likes
high-volume communication, unless the goal is to write a lot of trash.

Even if you deliver a system with a lot of features and no known bugs, if they
aren't the right features, it's not valuable software.

------
AlisdairSH
If you don't work >43 hours/week, you aren't productive. At least according to
one boss I've had. :|

------
scotty79
I think the one thing that enables science is that even though you cannot
measure everything you want, you can still measure some things, and those
measurements are useful, just not directly.

------
est
Because this

[http://en.wikipedia.org/wiki/Coastline_paradox](http://en.wikipedia.org/wiki/Coastline_paradox)

productivity of working on a software is like measuring fractals.

------
platz
Perhaps trying to measure true productivity reduces to the halting problem

------
dredmorbius
Software productivity management (as an end in itself) fails to account for
another fundamental axiom: software isn't the end product; it is a tool, or
defines a process, by which some task is accomplished.

Count lines of code, function points, bugfixes, commits, or any other metric,
and you're capturing a _part_ of the process, but you're also creating a
strong incentive to game the metric (a well-known characteristic of assessment
systems), and you're still missing the key point.

Jakob Nielsen slashed through the Gordian knot of usability testing a couple
of decades back by focusing on a single, simple metric: does a change in
design help users accomplish a task faster, and/or more accurately? You now
have a metric which can be used _independently_ of the usability domain (it
can apply to mall signage or kitchen appliances as readily as to desktop
software, Web pages, or a tablet app).

Ultimately, software does _something_. It might sell stuff (measure sales), it
might provide entertainment, though in most cases that boils down to selling
stuff. It might help design something, or model a problem, or create art. In
many cases you can still reduce this to "sell something", in which case, if
you're a business, or part of one, you've probably got a metric you can use.

For systems which don't result in a sales transaction directly or indirectly,
"usability" probably approaches the metric you want: does a change accomplish
a task faster and/or with more accuracy? Does it achieve an objectively better
or preferable (double-blind tested) result?

The problem is that there are relatively few changes which can be tested
conclusively or independently. And there are what Dennis Meadows calls "easy"
and "hard" problems.

Easy problems offer choices whose ranking is monotonic across time. Given
alternatives A and B, if choice A is better than B at time t, it will be
better at time t+n, for any n. You can rapidly determine which of the two
alternatives you should choose.

Hard problems offer options whose ranking _isn't_ monotonic. A may give us
the best long-term results, but if it compares unfavorably initially, this
isn't apparent: A compares unfavorably at some time t, but is _better_ than B
at some later time t+n, and continues to be better for all larger values of
t.
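
A toy numeric illustration of that crossover (both payoff curves are
invented):

    # Option B: safe, steady gains. Option A: up-front cost, steeper slope.
    def payoff_a(t):
        return -10 + 3 * t  # hard-problem option: worse early, better later

    def payoff_b(t):
        return t            # easy baseline

    for t in range(0, 9, 2):
        better = "A" if payoff_a(t) > payoff_b(t) else "B"
        print(f"t={t}: A={payoff_a(t)}, B={payoff_b(t)}, better: {better}")
    # B wins early; the curves cross at t = 5, and A is better for every
    # larger t.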

Most new business ventures are hard problems: you're going to be worse off
for some period of time before the venture takes off ... assuming it does.
Similarly with the choice of whether or not to go to college (and incur both
debt and foregone income), or to learn a skill, or to exercise and eat
healthily.

It's a bit of a marshmallow experiment.

And of course, there's a risk element which should also be factored in: in
hard problems, A _might_ be the better choice only _some_ of the time.

All of which does a real number on attempts to assess productivity and rank
employees.

Time to re-read _Zen and the Art of Motorcycle Maintenance_.

------
swombat
[2003]

------
a3voices
Also, if you procrastinate a lot, you might end up learning something useful
and gaining new insights that will make you more productive in the long run.

~~~
sgarman
I'm going to keep telling myself this as I sit on HN...

