Cannot Measure Productivity (martinfowler.com)
127 points by alexfarran 1543 days ago | 74 comments

I am going to get my drum out and bang on it again.

Software is a form of literacy - and we measure literacy completely differently. In fact we measure it like we measure science - you are not a scientist unless other scientists agree you are, and you are not a coder unless other coders say you are.

What Fowler wants to measure is not the top echelons of productivity but the lower bounds - presumably to winnow out the unproductive ones.

But that is not how we conduct ourselves in literacy or science. We educate and train people for a very long time, so that the lower bound of productivity is still going to add value to human society - and the upper bounds are limitless.

What Fowler is asking for is a profession.

> you are not a scientist unless other scientists agree you are

Science requires one thing: making and testing falsifiable hypotheses. A priest is able to determine whether or not you are doing that. If anything, it's philosophers who decide what science is, e.g. Karl Popper.

Ludwig Boltzmann was a very important scientist (or perhaps I should say that his scientific contributions were significant). However, if I recall correctly, his peers didn't agree with his theories and, I would assume, they wouldn't have called his theories scientific, seeing how they basically assumed atoms at a time when that was controversial.

So, I too consider that agreement from others isn't a prerequisite for being a scientist. I also agree that "making and testing falsifiable hypotheses" definitely qualifies as doing science.

However, perhaps that's not the only way to do science. In general, there is that whole set of criticisms on the limits of falsifiability (with Kuhn et al). In particular, I'm thinking of cases where arguably the technology isn't sufficiently advanced to perform the measurements necessary to directly test the hypotheses (e.g., how quantum physics progressed). Arguably those doing all the thought experiments, modeling, thinking through consequences of those hypotheses and comparing with what they could measure were doing science, though those weren't falsifiable hypotheses at the time.

So what I'm saying is that your requisite is sufficient but perhaps isn't necessary either.

Oh, I think it's okay if the hypotheses are falsifiable in the future, as long as the models are not presented as experimentally verified.

Doing science does not make you a scientist.

Can you elaborate on why you believe this to be the case? Saying that a scientist is one who does science seems like a truism bordering on being tautological. I'm curious why you disagree.

Does knowing a bit of physics make you a physicist? Does praying make you a monk? Does mixing a few chemicals make you a chemist? Does having some theories about people's motivations make you a behavioural psychologist? Does balancing a budget make you an accountant?

"Scientist" implies a certain amount of knowledge, training, discipline, etc. I'm not implying that every scientist needs to have undergone academic training - there are other ways - but merely doing a scientific experiment is not enough to call yourself a scientist.

A scientist is one who "does science" with some knowledge, consistency and perseverance.

"Scientist" is a fluffy title like "Doctor" or "Professor", conveyed by other people or authorities for classification in a hierarchical/segmented society. As a researcher, I wear many hats doing design, coding, science, writing, etc..., but my title is just "researcher" because of the role I fill professionally.

I guess you'd better edit Wikipedia:

> In a more restricted sense, a scientist is an individual who uses the scientific method. [...] This article focuses on the more restricted use of the word.


Wikipedia is not the ultimate repository of human knowledge, particularly when it comes to more tricky questions like "what is a scientist?"...

If we're going to throw definitions around, how about dictionary.com: http://dictionary.reference.com/browse/scientist?s=t

> an expert in science, especially one of the physical or natural sciences.

Actually I think Wikipedia is pretty good for tricky questions, in that they attract a lot of attention and receive a lot of edits.

If being a scientist is determined by the consensus of one's peers, it seems like it makes sense to accept an article defining what scientists are that is written as a consensus opinion.

But anyway, if you think it's wrong, why don't you edit it?

I'd say doing science requires a specific mindset. Using amount of knowledge as a criterion seems wrong - how did the first scientists come about, presumably they didn't know anything at all yet?

Your last sentence is exactly right. There are significant benefits to building a profession that is predictable and well-defined. Ensuring a reasonable lower bound reduces the risk of failure significantly.

The open questions are 1) whether it's possible to do that, and 2) whether it's possible to do that without inadvertently imposing an upper bound. Common wisdom among software developers is that it is not possible and even if it was it would impose a very tight upper bound. But as far as I know, that common wisdom is just guess work. No one has found a good solution yet, but there's also no proof that there is not one. That's why so many people continue to try.

I agree with what you're saying, but I feel that it's exactly what Fowler is saying too. The article is called "cannot measure productivity", after all.

What parts of what he writes do you disagree with?

Fairy Nuff

I suppose he ends with throwing his hands in the air and saying we cannot do it. Well actually we can achieve a measure of literacy, just not perhaps fluency.

We know if a child can read and read well. It's possible to take their written work and assess or mark it. We can (and do in code reviews) do the same for code written by adults.

The problem he is stating is IMO in two parts - measuring basic competence (is a person of net benefit, can they write competent code, are they literate?) which is possible, and how good are they compared to their peers, which is much more subjective and based on taste as much as anything.

So I agree we should not try to measure productivity, but I disagree that that's the end of the conversation, mostly because I disagree that productivity is one thing - I see it as more Hygiene/Motivator.

> you are not a coder unless other coders say you are

Not everybody needs the approval of an external entity. I couldn't care less what other coders think of me. What I ship speaks for itself. What I write speaks for itself. What studies scientists publish speak for themselves. The rest is politics.

Scientists must pass peer review to publish, and peer review can be very harsh. Double blind helps winnow out political bias.

Coders must get jobs to continue coding if they are not independently wealthy. Bias abounds in this case, as other coders with influence must vouch for you, and...in this case your projects can influence their opinion of you.

We are social creatures: we don't live as hermits (most of us anyways).

It was still an inflammatory thing to say, given that there's nothing stopping a person who does live as a hermit from learning to write working code.

In the days before the Web, I created two good-sized adventure games without ever having interacted with another programmer (save maybe 2 book authors). Nobody vetted me. I just made and released them. The lone wolf programmer has always been a thing.

I think there is a confusion between trying to define the line between scientist / non-scientist or coder / Non-coder (Which is difficult to the point of impossible) and the ability for someone who is already a scientist to look at the work of another and decide if it is the work of a scientist

This is why I use the term software literacy. My son is learning to read. He can write his own name and letters, and read some words phonetically. All of those things are necessary but not sufficient: he is not (yet :-) literate. Will it be at ten words? A hundred? A thousand? Those are silly arbitrary cut-offs.

Any one of us here can tell the difference between literate and illiterate because we have passed through that gateway.

The same goes for scientist or programmer.

But actually trying to write down the exact definition, the point one passes from being able to write a line of executable code and becomes a real programmer? Becomes software literate? Can't do it. Which is also why you can't measure productivity (plus all the reasons Robert Austin has)

I think you are talking about something different. I can practice basic medicine, but no one would call me a doctor. And, no person would accept me calling myself a doctor without some form of external validation that they believed was valid. I can build software, but being a programmer for someone else requires being validated, even minimally, by something external to myself. Even if the validating entity and the person looking to validate me are the same.

100 years ago, in many places on this earth, you could very well have passed yourself off as a doctor. The reason you can't now is because other doctors and governments decided to enforce a minimum standard.

100 years from now, perhaps there will be "Licensed Professional Programmer" certifications. Until then, you're a professional programmer if someone pays you. Even if that someone is yourself.

But if you are working alone as a hermit, why do titles even matter? Titles are just something someone uses to classify you, and in isolation mean nothing (unless you have split personalities; joking, schizophrenia is a serious subject).

The psychiatric definition of schizophrenia doesn't have anything to do with split or multiple personalities. That's actually more like dissociative identity disorder. I know this is a widespread misconception, I used to think the same thing.

> The rest is politics.

Really. No reality check? Spinning a wheel doesn't cause anything in particular, unless it meets the road.

Now I am not saying what you are saying is totally false. The truth is more nuanced, or say, more qualified.

The qualification comes from achieving something in the real world. If you can achieve something by talking to someone instead of writing code to work around it, both are equally valid.

I think of it like this. A pure function does not do any real work. The real work comes from the side effect it causes, the global variable it sets, the file it writes to, the program it talks to on the other end.

So yeah. You can cocoon yourself saying lalalalala, but you don't want to be the thread that spawned, did something to its local variables and exited. What's the point of such a life anyways?

You need to stop putting words in my mouth.

The context of my statement is in regards to self identification vs community identification. I write code, I release apps and I get paid. Why do I need a community to call me a coder before I can be considered one? What other people think of me, what image I put forth, etc is all marketing and politics. The only real evidence of me as the coder is what I ship. Does it scale? Does it have bugs? Does it work? That's the reality check, not whether Steve from SuperFrog Backup Solutions saw my code on github and thinks I write an elegant monad.

> If you can achieve something by talking to someone instead of writing code to work around it, both are equally valid.

Nothing I wrote disputes that.

> So yeah. You can cocoon yourself saying lalalalala, but you don't want to be the thread that spawned, did something to its local variables and exited. What's the point of such a life anyways?

Consider this: a man spends a lifetime writing novels. He thinks of himself as a writer, he identifies himself as a writer and he introduces himself as a writer. He never publishes, but he's always producing. One day, his house burns down, killing him and destroying everything he's ever written. Do you think his life was wasted? Do you think he thought his life was wasted?

I agree with the sentiment of your last paragraph, though in fairness Fowler seemed to end his piece saying that measurement was seductive and likely to make things worse.

Still, I can't help but cringe at some of the ideas posited -- for one, the idea that more features == a better product.

Or that profit is a measure of engineering productivity. As if it's that uncommon for great products to be badly marketed.

I think Fowler was just reaching for a stick to beat the metric he was deconstructing - "look Joe writes only 10K lines but makes ten times the profit - that means LoC is not a good metric."

However I do think that (free) market success is a reasonable measure of value / worth - backed up by my favourite quote for which I can find no author:

"Many books are unfairly forgotten, but none are unfairly remembered."

Measuring value (profit) isn't going to be a reliable indicator of how well something was engineered. There are too many badly engineered products that make tons of money, and too many brilliantly engineered products that are commercial flops. So many other elements are involved -- finding the right audience, marketing correctly to them, luck, etc.

This. A million times this.

It has been more than 10 years; it has been at least 50, since there were moans about productivity in the early '60s.

Feynman had some interesting thoughts on minimal computation that sort of paralleled Shannon's information complexity. As you know Shannon was interested in absolute limits to the amount of information in a channel and Feynman was more about the amount of computation per joule of energy. But the essence is the same, programs are a process that use energy to either transform information or to comprehend & act on information so 'efficiency' at one level is the amount of transformation/action you get per kW and "productivity" is the first derivative of figuring out how long it takes to go from need to production.

It has been clear for years that you can produce inefficient code quickly, and conversely efficient code more slowly, so from a business value perspective there is another factor, which is the cost of running your process versus the value of running your process. Sort of the 'business efficiency' of the result.

Consider a goods economy comparison of the assembly line versus the craftsman. An assembly line uses more people but produces goods faster, and that is orthogonal to the quality of the goods produced. So the variables are quantity of goods over time (this gives a cost of goods), the quality of the goods (which has some influence on the retail price), and the ability to change what sort of goods you make (which deals with the 'fashion' aspect of goods).

So what is productivity? Is it goods produced per capita? Or goods produced per $-GDP? Or $-GDP per goods produced? It's a bit of all three. Programmer productivity is just as intermixed.

> It has been clear for years that you can produce inefficient code quickly, and conversely efficient code more slowly (...)

That's not only false, but is often the opposite.

The number one symptom of an inexperienced programmer is wasting development hours reinventing the (square) wheel, while a good programmer is lazy (already knows which solution works best, and will probably just import it from a tested library).

So an experienced programmer not only doesn't waste computation power, but also doesn't waste hours on the development cycle.

I agree with everything else you pointed out.

Since I'm trying out my CODE keyboard [1] I thought I'd go into a bit more detail.

My statement about inefficient code quickly is in terms of joules per computation. So while it is absolutely true that a junior perl programmer might slowly generate inefficient code and an experienced (lazy) perl programmer might quickly generate optimal perl code, neither of them would produce the same product written in assembly code (or better yet pure machine code).

To put that in a different perspective, I once wrote a BASIC interpreter in Java (one of my columns for JavaWorld) and it was pretty quick to do, and yet looking at the "source" to Microsoft BASIC written in 8080 assembler it was not very efficient. But it took Bill a lot longer to write Microsoft BASIC in assembler, and you couldn't even begin to port a full up Java VM to the 8080 (let's not argue about J2ME).

But step back then from that precipice, you have two versions of BASIC, one runs in a Browser and one runs on a 16 line by 64 character TVText S-100 card. (or 24 x 80 CRT terminal). Now you can run the same program in both contexts, unchanged, but the amount of energy you expend to do so varies a lot. So which is more "efficient?" I'd argue the one written in 8080 assembly is more efficient from a joules per kilo-core-second standpoint. Which was written more quickly? Mine, it only took about a week.

That is why talking about efficiency and productivity without getting anally crisp in your definitions can lead to two opposite interpretations of exactly the same statement.

[1] I find the lack of a wrist pad to rest on a challenge.

I love Feynman's idea of minimal computational energy - it suggests that we could actually measure elegance.

And to be fair we do - the great works of art, craft and even science are almost always elegant. And it is something we strive for. Some buggers have it and the rest of us wade around with feet of clay - but it is conceivably measurable.

Going to be easier to go with the editor's taste I think though.

>> I love Feynman's idea of minimal computational energy - it suggests that we could actually measure elegance.

It's a lovely idea, but in practice would be hard. "I just updated my interpreter and my code got 20% more efficient!" "How does it compare to that other code?" "I don't know, it's incompatible with the new interpreter."

The sad part is that even after decades of technologists debating this, the reality is that most non-technologists working in the industry don't know, don't care, and really just want their pet features. The real measure of productivity in organizations with non-technical stakeholders therefore becomes whether or not a stakeholder feels like they are getting what they want. Attempts to measure productivity, whether via lines of code or "velocity," are often little more than a way for everyone to pretend their opinion is backed by something quantitative. In especially bad cases with non-technical management, they'll just keep swapping out processes until they either get what they want or have something with numbers and graphs that makes it look like they should.

While I could be accused of excessive cynicism, I do believe this is common enough that it should be addressed. There's a pervasive delusion that decisions are made by rational, informed actors, when that is rarely the case.

> becomes whether or not a stakeholder feels like they are getting what they want.

If the stakeholder you choose is a customer, then that is a valid measure of business productivity.

Which I guess is kind of the point - we are trying to measure on a granularity beyond what we can validly do.

Which indicates to me that a world of smaller organisations, made up of software literate people will be one where rewards will follow talent. That may not be a world we want to live in - and my cynicism sees your cynicism and raises :-)

Speaking for the non-technologists here: I used to measure productivity crosschecking what people accomplished in a certain time and what they said they would accomplish in that time. Well, it is not really productivity, but at least estimating the amount of stuff you can get done. So, it is more like goals of development.
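For what it's worth, the crosscheck described above (what people said they would get done versus what they got done) can be written down as a simple ratio. This is only a rough sketch: the function name `delivery_ratio` and the per-iteration counts are made up for illustration, and it measures estimating accuracy rather than productivity, as the comment itself notes.

```python
def delivery_ratio(planned: int, delivered: int) -> float:
    """Fraction of the planned items that were actually delivered."""
    if planned <= 0:
        raise ValueError("planned must be positive")
    return delivered / planned


# Hypothetical (planned, delivered) item counts for three iterations.
iterations = [(10, 8), (8, 8), (12, 9)]
ratios = [delivery_ratio(p, d) for p, d in iterations]

print([f"{r:.2f}" for r in ratios])  # ['0.80', '1.00', '0.75']
```

A ratio that stays close to 1.0 says the estimates are trustworthy; it says nothing about whether the items were worth doing.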

There are two types of productivity:

1) Are you doing the right things?

2) Are you doing things right?

They can be imprecisely measured, but every metric has problems and can be gamed. Combining the measurements is extremely difficult.

Let's start with 1 - doing the right things. Someone who chooses to have their team work on 3 high value tasks, and stops them early on 6 low value tasks, is by one definition more productive than someone who forces their team to do all 9 things. Or at the very least they are more effective. This is what Fowler is getting at.

On point 2... Let's assume that the appropriateness of what you are doing is immaterial. How fast are you doing it? This can be somewhat approximated. You can say "Speed versus function points" or "Speed versus budget" or "Speed versus other teams achieving the same output" and then bake in rework into the speed. All of these metrics are doable. Lines of code isn't a good base though.

The real question is, "What are you going to do with all of this productivity data?" If the answer is systemic improvement, you're on the right track. If you try to turn it into personal performance (or salary) then people wind up gaming the metrics.

Is measuring productivity isomorphic to the hiring problem?

Everybody says there's a "shortage of developers," but I know good developers who keep getting shitcanned after a few interviews where nothing seemingly went wrong.

We can't tell who's going to be productive. Since we can't tell, we come up with ten foot high marble walls to scale. Our sterile interview problems make us feel "well, at least the candidate can do our Arbitrary Task, and since we decided what Arbitrary Task would be, they must be good, because they did what we wanted them to do."

Productivity is pretty much the same. There's "just get it done" versus "solving the entire class of problems." Is it being productive if you do 50 copies of "just get it done" when it's really one case of a general problem? I'm sure doing 50 copies of nearly the same thing makes you look very busy and generates great results, but solving the general problem could take 1/20th the time and leave you sitting less fully utilized afterwards (see: automating yourself out of a job).

They are absolutely the same problem. Because we can't measure productivity, we can't determine relative quality in an objective way. If we could, it would make the hiring process much more simple.

The question I have is, how is this much different than any other profession? How do we measure doctor productivity? What keeps me up at night is that it is very likely that the 90/10 crap to good ratio in software developers is probably the same ratio as surgeons.

Must be the same in every profession. How many of e.g. your school teachers were good? About 10%.

I am wondering if the ratio holds for crap to good parents. The scarier aspect of this is that people are actually being trained for their professions, as opposed to parenting, so the ratio may be even worse.

This is a good insight. Both problems relate to the ability to socially interact with other people, determine what they want, and then to technically produce product to satisfy the other people.

A very wise man said "there is no silver bullet". Yet we keep trying all these schemes to automagically solve what are hard optimization problems only amenable to heuristics and deliberate, intelligent introspection. Very simply, you cannot run some tool to measure the information density of a large project. Graphical programming isn't going to turn a bunch of marketers into programmers. Doing user stories and forcing people to stand up as they talk isn't going to remove all the need for planning and tracking. And so on.

You know how I figure out if something can be improved? I dig in, understand it, and then look for ways to improve it. If I don't find anything, of course it doesn't mean there is no room, but I'm a pretty bright guy and my results are about as good as any other bright guy/woman.

I was subjected to endless amounts of this because I did military work for 17 years. You'd have some really tiny project (6 months, 2-3 developers), and they'd impose just a huge infrastructure of 'oversight'. By which I mean bean counters, rule followers, and the like - unthinking automatons trying to use rules, automatic tools, and the like. Anything to produce a simple, single number. It was all so senseless. I know that can sound like sour grapes, but every time I was in control of schedule and budget I came in on time and on or under budget. But that is because I took it day by day, looked at and understood where we were and where we needed to go, and adjusted accordingly. Others would push buttons on CASE tools and spend most of their time explaining why they were behind and over budget.

I like Fowler's conclusion - we have to admit our ignorance. It is okay to say "I don't know". Yet some people insist that you have to give an answer, even if it is trivially provable that the answer must be wrong.

Please excuse this small rant.

If you're referring to Fred Brooks, he wrote "[T]here is no single development, in either technology or management technique, which by itself promises even one order of magnitude improvement within a decade in productivity, in reliability, in simplicity." (emphasis mine)

The surrounding context makes his comment a very specific prediction which means something different from what most people claim he meant. Much of the rest of his essay suggests techniques which address the issue of essential complexity and which, when applied together, he hoped would produce that order of magnitude productivity.

Perhaps there was no single such improvement in the years 1986 to 1996, but when people use the phrase "no silver bullet" to dismiss potential improvements in productivity, I believe they're doing Brooks and the rest of us a great disservice.

I'm confused. I was pointing out that you cannot do something simple like count LOCs, run a CASE tool that spits out cyclomatic complexity, or other things, and instantly measure productivity. How is that not what Brooks was saying? You don't bean count your way to better software, you manage the inherent complexity. Daily, hard work, understanding all of the parts, and so on.

You missed a key point of the essay, which is that no matter how much progress we make in accidental complexity, essential complexity does not go away.

Of course that's the key point of the essay, but I've never observed that anyone who says "There's no silver bullet in productivity" has made it past the desire to misuse the title of a Fred Brooks essay to support a middlebrow dismissal to the nuance of distinguishing between accidental and essential complexity.

After all, much of programming culture is stuck on the idea that the clarity of a programming language's syntax to novices is more important to the maintainability of programs written in that language than domain knowledge, for example.

You can quite well measure productivity if you set a task, write tests for it, and tell two independent groups to implement it. You give them the same amount of time.

Now the more productive / better group is the one that can do the task with less complexity.

Complexity measures measure the size of the code and the number of dependencies between blocks in different ways. But even the simplest complexity measure is quite good: just measure the number of tokens in the source code. (It is a bit more sophisticated than LOC.) You can then hold competitions between groups and measure their productivity. (I am writing a book now titled 'Structure of Software' which discusses what is good software structure on a very generic/abstract level. It relates to 'Design Patterns' as abstract algebra relates to algebra.)
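To make the token-count idea concrete, here is a minimal sketch using Python's standard `tokenize` module. It only handles Python source, and the choice of which token types to ignore (comments, newlines, indentation markers) is my own assumption, not something specified in the comment above. The function name `count_tokens` is likewise just for illustration.

```python
import io
import tokenize

# Token types that carry no structural weight. Skipping them is an
# assumption of this sketch, not part of any standard metric.
_SKIP = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENCODING,
    tokenize.ENDMARKER,
}


def count_tokens(source: str) -> int:
    """Count the meaningful tokens in a string of Python source."""
    readline = io.StringIO(source).readline
    return sum(
        1 for tok in tokenize.generate_tokens(readline) if tok.type not in _SKIP
    )


# Two implementations that pass the same tests.
verbose = "def add(a, b):\n    result = a + b\n    return result\n"
terse = "def add(a, b):\n    return a + b\n"

print(count_tokens(verbose))  # 15
print(count_tokens(terse))    # 12
```

Unlike LOC, the count does not change when the same code is reflowed across more or fewer lines, or when comments are added.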

Genuinely asking: Why not just stop at "tell two independent groups to implement it"? That is, why constrain to the same amount of time?

Because we measure the quality of their output. A weaker group can solve the problem with the same quality as a stronger group given much more time. (For example by doing refactoring in the plus time.)

I see. The time constraint you set is on the tighter side. I was considering it to be on the relaxed side which would allow the weaker group to improve as you said.

On the other hand, setting the time constraint (as opposed to measuring both time taken and solution complexity for the two groups) is important because deadlines help.

The book "Making Software: What Really Works, and Why We Believe It" (http://www.amazon.co.uk/Making-Software-Really-Works-Believe...) has a section on this.

Chapter 8 "Beyond lines of Code: Do we need more complexity metrics?" by Israel Herraiz and Ahmed E Hassan.

Their short answer is that, in the case they looked at, all the suggested metrics correlated with LOC, so you may as well use LOC as it's so easy to measure.

IIRC they believe it's only good to compare LOC between different employees if they are doing pretty much the exact same task however, but since LOC is correlated with code complexity, there is some measure there.

I recommend the book, as really focusing on the science of computer science.

Heisenberg principle variant for software:

Measure it. Or optimize it. Can't do both without impacting the other.

Software is a work of art and creativity, not the work of a rules-based factory.

So two teams build identical databases in identical time frames. One becomes popular and has sales in the millions of dollars. The other flops, with sales in the hundreds of dollars. Sure, there is a difference in business results, but I fail to see how the two teams were not equally productive at creating software. Sure, I don't have a good definition of software development productivity, but this is open to so many non software development productivity elements as to be nonsensical.

Basically I see this as marketing. We may not be the fastest but who cares about that we have the special insight to build the hits that keep you in business.

Most performance indicators are imprecise. P/E ratio is one of the stupidest measures of value, but it is widely used in finance. No one (at least no value investor) would invest based on P/E ratio alone though; there is a lot more due diligence that's done before investors put their money into a stock. (At least that's what you hope happens.)

The problem with productivity measures is not how they are measured but what they are used for. Most managers want to use productivity measures to evaluate individual or team performance; however, performance is tied to incentives, so you always end up with a lot of push back from the team or someone gaming the system. (IMO, this is because of lazy managers wanting to "manage by numbers", without really understanding how to manage by numbers.)

Rather than using it as a performance management tool, productivity measures, however imprecise, can be used alongside other yardsticks as signals of potential issues. For example, if a productivity measure is dropping for a particular module/subsystem, and the defect rate is increasing, then one might want to find out if the code needs to be rearchitected or refactored. In these cases, it is okay to be imprecise, because the data are pointers, not the end goal. When used correctly, even imprecise data can be very useful.
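As a rough illustration of using two imprecise measures together as a pointer, the sketch below flags a module only when throughput fell and defects rose over the same window. Everything here is hypothetical: the `ModuleTrend` type, the module names, the per-sprint numbers, and the crude first-versus-last comparison are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModuleTrend:
    name: str
    throughput: List[float]  # hypothetical: completed work items per sprint
    defects: List[int]       # hypothetical: defects reported per sprint


def needs_a_look(trend: ModuleTrend) -> bool:
    """Flag a module when throughput fell and defects rose over the window."""
    throughput_fell = trend.throughput[-1] < trend.throughput[0]
    defects_rose = trend.defects[-1] > trend.defects[0]
    return throughput_fell and defects_rose


modules = [
    ModuleTrend("billing", [9, 8, 6, 5], [2, 3, 5, 7]),
    ModuleTrend("search", [7, 7, 8, 8], [3, 2, 2, 1]),
    ModuleTrend("reports", [6, 5, 4, 4], [4, 4, 3, 2]),
]

flagged = [m.name for m in modules if needs_a_look(m)]
print(flagged)  # ['billing']
```

The output is a list of places to go and ask questions, not a score for anyone. "reports" slowed down too, but with defects falling it is not flagged.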

The quest for a single measure of a hard-to-define concept like productivity is doomed. Even Fowler's article highlights the fact that we don't have a shared understanding of what the word productivity means: writing quality code, shipping useful products or making money? All of them? It's no surprise that there is no numerical measurement that captures a badly-defined concept.

In my opinion, we should approach measurement from a different angle: can we learn something useful about our profession by combining different types of measurements? Can we, for example, easily spot a person who is doing what Fowler calls important supportive work? Can we detect problem categories that easily lead to buggy code, and allocate more time for code quality work on those tasks and less on those that are known to be more straightforward?

It drives me nuts when programmers brag about their productivity, measured by how many lines of code they've written.

You end up with something like feature 1: +12,544 / -237 lines. Done in 2 weeks.

Then comes feature 2, two and a half months later. The stats: +5,428 / -9,845.

Look at that, you had to tear down everything they wrote because they cared about amount of code over code quality. The more they brag, the more you think "oh s$%t, every line they add is a line I'm going to have to completely untangle and refactor."

I think software engineering productivity can be measured, though not well by today's standards. There will probably be a decent algorithm to do it in the future that takes into account the power of the code, how easy it is to build on top of, how robust it is, etc.

> There will probably be a decent algorithm to do it in the future

How could that be possible? Consider the following, very typical scenario.

You write 2000 lines of code, implementing a feature. I write 2,200 lines of code, implementing a feature in a way that supports our vertical for the next 5 years (you just take my module and plug it in, instead of coding from scratch). Add in whatever time interval you want to make it more complicated - I took the same amount of time as you, less, or more.

Consider that this is a judgement call - did I or you do the right thing? We understand the risks and costs of premature design, but also understand the risk of coding exactly to today's requirements, with no insight into the future. No algorithm is going to tell you the right answer, and by the time we know (5 years from now) the measurement will be useless.

Or we each write a heuristic for the TSP. What algorithm could possibly decide whose work is more "productive"? I put it in scare quotes because I don't even know how to define productivity in that regard. Yours runs faster, mine took 1/2 the time to code. Yours is 70% larger than mine, which has cache coherence implications as we continue to add to our programs. Yours is well documented (give me an algorithm to tell if code is 'well' documented), mine is sparse. I used gotos, you used exceptions to deal with errors. You wrote it in Haskell, I did it in C++. Yours will make features X, Y, and Z easily possible, mine makes A, B, and C easy. Your heuristic performs better for some graphs, mine performs better on others.

Who is more 'productive' here? It's not even a meaningful question. Bottom line is, we both tackled a hard problem, both did fine in very different ways, and both have implications on the future of the company (assume a,b,c,x,y,z are really important in the future).

It's an N-dimensional optimization problem with endless unknowns, and no knowledge or agreement on how to measure many of the axes, let alone their relative importance to each other.

Nothing makes me happier than removing code. If I can find ways to deliver the same functionality in less code I get excited.

Now, I do like to look at my personal lines of code because it gives me a gauge to compare features I implement on a relative basis. It also gives me a relative, rough measure of how much effort a particular feature took to produce.

You will like this story from Apple, when they for a time required engineers to report LOC produced that week.


I have a friend who styles himself as a "professional code deleter" and who loves to post his diffstats when they're very large negatives.

The purpose of measuring productivity is to manage it. There are two categories of factors that decide the overall productivity: the factors within the developers (capability, motivation, etc) and the factors outside the developers (tools, process, support, etc).

True, it's hard to objectively measure the overall productivity using a universal standard, but it is relatively easier to measure the productivity fluctuation caused by the external factors. Velocity measurement in Agile practice is mostly for that end.

For the internal factors, the best way, and arguably the only effective way, to manage them is probably to hire good, motivated developers. I think most top-level software companies have learned that.

This is true - to an extent. Scrum screams out to measure relative story points, and never to provide the data for "management" purposes. But even the same team estimating in succession will face external pressures - and if those pressures can be alleviated by gaming story points, they will be. This catch-22 had me stuck - I truly think the only way is to report only an estimated finish date. Any public posting of velocity eventually turns into management by velocity - because that's the only metric management has. And we are back on the same old loop - we can have a measure of productivity as long as we do not use it in any manner as a measure of productivity.

Add to this that I don't think scrum has been set up to take this to its logical conclusion - agile/scrum has been sold as a fairly fixed methodology, not as a means to get some relative metric out of teams and use that in a series of experiments to achieve productivity improvements. And even if it were, the major wins we know and can prove work (quiet conditions, minimal interruptions, trust, respect, time for reflection and education) are a long way from being accepted by today's enterprises.

In short, there is no silver bullet, and while agile looked like a magic bullet it just turned out to be plain old lead.

The article makes the point that the LOC metric is confounded by duplication:

> Copy and paste programming leads to high LOC counts and poor design because it breeds duplication.

This problem is not insurmountable. Compression tools work by finding duplication and representing copies as (more concise) references to the original.* The size of the compressed version is an estimate of the real information content of the original, with copies counted at a significantly discounted rate. The compressed size of code could be a more robust measure of the work that went into it.

* Sometimes this is done explicitly, other times it's implicit
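A rough sketch of that idea, using Python's standard `zlib` module. The sample function and the choice of zlib (rather than any other compressor) are assumptions for illustration; this is not a calibrated metric, just a way to see the discount applied to pasted copies.

```python
import zlib


def compressed_size(source: str) -> int:
    """Bytes needed to store the source after zlib compression at level 9."""
    return len(zlib.compress(source.encode("utf-8"), 9))


# A small, made-up function standing in for "original" work.
block = (
    "def total_price(items, tax_rate):\n"
    "    subtotal = sum(item.price * item.quantity for item in items)\n"
    "    return round(subtotal * (1 + tax_rate), 2)\n"
)

# Copy-and-paste programming: the same block ten times over.
pasted = block * 10

print("raw size:       ", len(block), "->", len(pasted))
print("compressed size:", compressed_size(block), "->", compressed_size(pasted))
```

The raw size grows tenfold, while the compressed size grows only slightly, since each copy is stored as a short back-reference. One caveat: zlib only looks back over a 32 KB window, so duplication spread far apart in a large codebase would not be discounted this way.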

And what about Iteration? The learning value that can come from doing things poorly?! Imagine if Microsoft had LEARNED something from what they did wrong in Windows 95? Or Windows ME! Imagine how amazing their software would be now. They couldn't have done it without having totally screwed up first. Of course they didn't do that in the end...so...

Productivity by any volume measure seems meaningless in the software context. That's like measuring writing productivity by word count. Nobody really likes high-volume communication, unless the goal is to write a lot of trash.

Even if you deliver a system with a lot of features and no known bugs, if they aren't the right features, it's not valuable software.

If you don't work >43 hours/week, you aren't productive. At least according to one boss I've had. :|

I think that the one thing that enables science is that even though you cannot measure all you want, you can still measure some things, and those measurements are useful, just not directly.

Because this


Measuring the productivity of working on software is like measuring fractals.

Perhaps trying to measure true productivity reduces to the halting problem.

Software productivity management (as an end in itself) fails to account for another fundamental axiom: that software isn't the end-product, but is itself a tool, or defines a process, by which some task is accomplished.

Count lines of code, function points, bugfixes, commits, or any other metric, and you're capturing a part of the process, but you're also creating a strong incentive to game the metric (a well-known characteristic of assessment systems), and you're still missing the key point.

Jakob Nielsen cut through the Gordian knot of usability testing a couple of decades back by focusing on a single, simple metric: does a change in design help users accomplish a task faster, and/or more accurately? You now have a metric which can be used independently of the usability domain (it can apply to mall signage or kitchen appliances as readily as desktop software, Web pages, or a tablet app).

Ultimately, software does something. It might sell stuff (measure sales), it might provide entertainment, though in most cases that boils down to selling stuff. It might help design something, or model a problem, or create art. In many cases you can still reduce this to "sell something", in which case, if you're a business, or part of one, you've probably got a metric you can use.

For systems which don't result in a sales transaction directly or indirectly, "usability" probably approaches the metric you want: does a change accomplish a task faster and/or with more accuracy? Does it achieve an objectively better or preferable (double-blind tested) result?

The problem is that there are relatively few changes which can be tested conclusively or independently. And there are what Dennis Meadows calls "easy" and "hard" problems.

Easy problems offer choices in which a change is monotonic across time. Given alternatives A and B, if choice A is better than B at time t, it will be better at time t+n, for any n. You can rapidly determine which of the two alternatives you should choose.

Hard problems provide options which aren't monotonic. A may give us the best long-term results, but if it compares unfavorably initially, this isn't apparent. In a hard problem, A compares unfavorably at some time t, but is better than B at some time t+n, and continues to be better for all larger values of n.
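A toy sketch of the distinction, assuming we could somehow observe an outcome score for each alternative at successive times (which is exactly what we usually can't do up front). The function name and the sample numbers are made up for illustration, and any change in the ordering is treated as "hard" here.

```python
def classify(a, b):
    """Compare outcomes of alternatives A and B at times 0, 1, 2, ...

    Higher is better. Returns "easy" if the same alternative is ahead
    at every time where they differ, "hard" if the ordering changes,
    and "tie" if they never differ.
    """
    # +1 where A is ahead, -1 where B is ahead; ties are ignored.
    signs = [(x > y) - (x < y) for x, y in zip(a, b)]
    signs = [s for s in signs if s != 0]
    if not signs:
        return "tie"
    ordering_changes = any(p != q for p, q in zip(signs, signs[1:]))
    return "hard" if ordering_changes else "easy"


# A is ahead from the start and stays ahead.
print(classify([1, 2, 3, 4], [0, 1, 2, 3]))

# A starts out worse (say, a venture losing money) and overtakes B later.
print(classify([-5, -2, 4, 9], [1, 1, 1, 1]))
```

The catch is that at decision time you only have the first element or two of each list, which is why an early comparison misleads on hard problems.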

Most new business ventures are hard problems: you're going to be worse off for some period of time before the venture takes off ... assuming it does. The same goes for the choice of whether or not to go to college (and incur both debt and foregone income), to learn a skill, or to exercise and eat healthily.

It's a bit of a marshmallow experiment.

And of course, there's a risk element which should also be factored in: in hard problems, A might be the better choice only some of the time.

All of which does a real number on any attempt to assess productivity and rank employees.

Time to re-read Zen and the Art of Motorcycle Maintenance.


Also if you procrastinate a lot, you might end up learning something useful and get new insights that will make you more productive in the long run.

I'm going to keep telling myself this as I sit on HN...
