
Worst Software Metrics - wbharding
https://www.gitclear.com/blog/the_4_worst_software_metrics_agitating_developers_in_2019
======
shakezula
At a previous job of mine, they implemented a "personal improvement plan" for
employees who were below management's "desired productivity". There was no
warning, no heads-up that they would be measuring our GitHub accounts, and it
took no account of what each person had been working on. These
reviews came after a 360 peer review, and absolutely blindsided the team.
Several of my coworkers left early for the day and didn't come in for the rest
of the week. I got calls all day from various people asking what happened in
my 1:1, what was said, etc...

I wish I could show management how negatively and immediately that move
affected the culture & mood of engineering at that company. It was an
overnight change, and going forward it basically killed all trust in
management. I stayed around for a little bit longer, but I should have quit
the second they started those measurements.

It absolutely murdered the confidence of our junior devs. Of the senior
engineers who were put on it, none of them really had any idea what they were
doing wrong or how to improve, as no feedback was given about any of that,
just that they weren't doing enough. Some of our seniors had 15+ years of experience
in software and were industry experts, but it didn't matter: They hadn't been
hitting the number of commits and pull requests and tickets per week that
management insisted on.

When management was confronted about it, they admitted that code metrics
aren't a good measurement of productivity, but then insisted that they "needed
something".

It turns out, they only used code metrics as a bludgeon to get rid of the
employees they wanted gone, and since then that's all I've been able to see
code metrics as: a battering ram used to give HR a performance-based reason to
justify firing you.

~~~
gtirloni
I'm sorry this experience left you scared of metrics, but it doesn't mean they
are bad -- just that people need to know how to use them.

I've had this exact conversation before, and the extremism of "metrics are
always weapons" was a contributing factor in my leaving a, let's say, less
metrics-inclined workplace. I'm just saying there has to be balance.

~~~
overgard
I've been a professional software developer for twelve years, and while I'm
trying to keep an open mind, I have yet to see any useful metrics for what we
do. Like, useful AT ALL. I can think of zero numerical metrics that can
accurately tell you what's going on with a team. Number of commits, burndown
charts, velocity, issues closed/created, whatever: they're all easily
game-able, and they all create bad incentives.

And by the way, this isn't sour grapes; metrics have almost always made me look
good. At one job, I had so many more commits than the next person that it
took them two years _after I left_ to catch up to me. And yet I'm absolutely
_certain_ I was not the most useful person on that team. The most useful
person was the guy that was writing the extremely sophisticated modelling
kernel, and if you only looked at the metrics he probably looked borderline
useless, but in actual terms he was by far the most valuable and important
person on our team.

What scares programmers is that everyone on our team knew that, and any
manager with programming experience probably knows that, but those stats are
eventually going to float up to someone who _doesn't_ know that the numbers
are bogus. (After all, they wouldn't be tracking these numbers if they knew
how bad they were.) So decisions _do_ get made from these numbers, even when
people promise they won't.

~~~
Fire-Dragon-DoL
I'm completely with you on that. As for an example where some metrics might be
useful: some form of complexity measurement of the code helps a new developer
on the team discover where to poke to find issues, as well as which parts are
critical. While far from perfect, it gives you a starting point that's
possibly better than "randomly open files in the project".

Now, this comes with interesting points:

- the metrics in this case are in the hands of the professionals in the field
- the metrics are there to serve the person themselves, without affecting others
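
As a minimal sketch of that kind of self-serve metric (assuming a Python
codebase -- real tools like radon or lizard do this properly), even a rough
branch-point count per file gives a newcomer a ranked list of places to poke:

    import ast
    import sys
    from pathlib import Path

    # node types that roughly correspond to branch points
    BRANCHES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp)

    def complexity(path: Path) -> int:
        """Crude cyclomatic-complexity proxy: count branch nodes."""
        try:
            tree = ast.parse(path.read_text(errors="ignore"))
        except SyntaxError:
            return 0
        return sum(isinstance(node, BRANCHES) for node in ast.walk(tree))

    # usage: python complexity.py path/to/project
    scores = sorted(((complexity(p), str(p)) for p in Path(sys.argv[1]).rglob("*.py")),
                    reverse=True)
    for score, path in scores[:10]:
        print(f"{score:5d}  {path}")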

Metrics on commits, lines of code, or tickets closed are useless from my point
of view. Asking a developer team who is weak at the moment (assuming there is
no room for improvement is stupid) gives better results. A decent programmer
is able to spot fake ones and to notice performance differences over time (not
perfectly, but definitely better than metrics).

Not to mention, I would be surprised if nobody on the team started complaining
if someone were doing no work and still getting a salary.

You can spend an entire day teaching your team something important about the
project you are working on and have no commits to show for it. Or you can
spend half the day walking around thinking about how to solve a bug. Not a
single line is written, but that's how some problems are solved.

I started counting toward my working hours even the times when I wake up in
the middle of the night and my mind solves programming problems for an hour
(let's say). I am a decent employee; being at my desk typing is not what I'm
paid for. I'm paid for solving problems with software engineering to the best
of my knowledge.

------
overgard
Ugh the "velocity" one really hits home. A couple of jobs ago one of our
project managers was constantly showing burn down charts (which, of course,
never burned down); along with our "velocity". The funny thing is, because we
were doing t-shirt-size/fibonacci, our velocity numbers varied so wildly that
it was essentially a random number generator. Then we were required to fit all
our stories within a "budget" of our "average" velocity. What are you
supposed to do with the average of a random number between 0 and 200?
Apparently a lot, because these insane numbers that had no actual tie to
reality were used to determine how much we were allowed to "commit" to. Never
mind that some months we'd finish 50 story points and some months 250
(because, again, the numbers were a fiction); we were always trying to fit
things into 90 story points.
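
To make the "random number generator" point concrete, here's a quick sketch
(the 50-250 range is just the swing described above):

    import random

    # sprint throughput that genuinely swings between ~50 and ~250 points
    random.seed(0)
    sprints = [random.uniform(50, 250) for _ in range(1000)]

    avg = sum(sprints) / len(sprints)
    missed = sum(1 for s in sprints if s < avg)

    print(f"average velocity: {avg:.0f} points")
    print(f"sprints that come in under that average: {missed / len(sprints):.0%}")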

Luckily all this useless theater only took up an entire day every two weeks.
If you assume everyone on the team was making about $50 an hour, and it was
about 6-8 people in attendance to these things, then it came out that this
very dumb "estimation" process took 6 people * $50/hr * 6 hours... so about
1800 dollars. (Not including the project manager's salary, which, given that
he drove a very fancy luxury car, was probably a lot more than the rest of
us).

And people wonder why I don't take scrum seriously.

~~~
winrid
To be fair, that's very poorly implemented. You should have reference stories
for what counts as a 1 in complexity, and the highest value should be low,
like 13, which forces you to break things down and forces the PM to write
better specs.

~~~
overgard
We did all of these things. But we had this ridiculous Fibonacci thing where
you would end up with a conversation like this:

dev: well, this looks like it's somewhere between a 5 and an 8, so let's call
it 6.

pm: no 6 is not allowed, it must be either a 5 or an 8.

dev: ok, let's call it an 8.

Ok so that's a very plausible conversation, right? I'm sure we've all had it.
But that rule alone made the estimate on this story 20% off, just to satisfy a
rule. I'm perfectly fine with that if we're only talking about that one story.
If we're both happy that this might go 20% faster or 20% slower than
predicted, no big deal. But if you're going to include it in a larger rollup,
you've introduced a very large error term that you need to carry along with
the rest of the calculations.

Did you ever take a physics course in college? (A real one -- not one of the
baby ones they gave to business and liberal arts majors.) If it was any good,
they probably made you deal with carrying an error term through your math. You
probably even hated it because it was a pain in the ass, but it's also kind of
important. If you're adding together 50 numbers that are all off by 20-50% and
chucking away the error term, your summary and conclusion are _absolute
garbage_. Math is unforgiving like that.

Presenting that absolute garbage as "data" is, in my opinion, professional
incompetence. It just is. I guess the softening factor is that so many people
do it that it's not individual incompetence; it's a lack of competence in the
profession itself.
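
For the curious, carrying the error term through a sum looks roughly like this
(story values invented; absolute errors added linearly, the worst case you'd
assume when estimation errors are systematic rather than independent):

    # each story: (point estimate, relative error on that estimate)
    stories = [(8, 0.20), (5, 0.50), (13, 0.30), (3, 0.25), (8, 0.20)]

    total = sum(est for est, _ in stories)
    error = sum(est * rel for est, rel in stories)  # errors add for a sum

    print(f"{total} +/- {error:.0f} points "
          f"(anywhere from {total - error:.0f} to {total + error:.0f})")
    # -> 37 +/- 10 points (anywhere from 27 to 47)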

~~~
winrid
Well yeah, you don't have to use fib numbers. I'm not really sure why they're
popular anyway.

Personally, I prefer being on a small team with excellent devs who don't
require this kind of stuff. I built products that made many millions of
dollars a year with just me, a PM, and one other dev -- and none of this
process.

------
eirikma
Does anybody know of a study documenting that measuring and reporting software
code quality metrics actually improves the project/product team's ability to
deliver?

So far I've only found studies documenting that measuring code quality
improves the team's ability to improve the measurement figures. That doesn't
count. More or less any measurement will be gamed if anyone thinks it is being
used for something. But has anybody measured whether these measurements are
actually helpful?

~~~
wbharding
This is a terrific question, and a topic I've thought about quite a bit while
we've built our product (context: as the OP, I've toiled on "how to prove
metrics are signal-bearing" for a couple years now).

The problem we always end up at is: "what measurable output can we rely upon
to tell the story that a project's fortunes are improving?" To date, it's been
challenging to find a compelling answer. This may be why I'm unaware of any
study that addresses the question.

One possible workaround would be to take experts in the field and try to
translate their qualitative interpretation into input that drives a metric.
E.g., if Andrew Clark from React could list three directories that harbor
disproportionate tech debt in their project, we could calibrate our Open Repo
Directory Browser to identify the factors that were unique to those
directories. This could then help managers pick out when and where it's
appropriate to make payments to reduce their tech debt.
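
As a toy sketch of that calibration step (every feature value and label below
is invented for illustration), one could fit an expert's "this directory
harbors debt" judgments against simple per-directory features:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # hypothetical per-directory features: [recent churn, avg function length, test ratio]
    X = np.array([
        [120, 45, 0.1],   # src/legacy     <- expert flags as tech debt
        [ 30, 12, 0.8],   # src/utils
        [ 90, 60, 0.0],   # src/scheduler  <- expert flags as tech debt
        [ 10,  8, 0.9],   # src/cli
    ])
    y = np.array([1, 0, 1, 0])  # the expert's qualitative labels

    model = LogisticRegression().fit(X, y)
    print(model.predict_proba(X)[:, 1])  # estimated debt probability per directory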

Unfortunately, the prominent open source developers aren't answering my emails
at this point. And even if they did, it's still a long road to translate
qualitative expert opinion into provable long-term benefits. But I think the
environment for making that leap is as ripe as ever. We just need to keep
working toward figuring out the most desirable+measurable outputs.

~~~
namibj
This sounds like something Felix von Leitner might be interested in. Consider
collaborating.

[https://blog.fefe.de](https://blog.fefe.de)

------
ilaksh
I have a solution to these metric gotchas. Don't install Jira at all and then
you won't be tempted by any of them. Better yet, don't hire a manager at all.
Instead get one or two senior devs and include them in some business
discussions and incentives.

~~~
overgard
_gasp_ include senior devs in a conversation about development? Blasphemy. At
so many places it's like there's some unwritten law that you can't include
developers in any sort of business conversation, you need to hire a project
manager as a go-between in order to badly translate things between business
and development so neither side actually knows what's going on.

~~~
mnowicki
Why is this so true. I usually assume that when a large percentage of people
do something that seems dumb, it's usually because there is some reason I
don't understand and not because they all coincidentally decided to be idiots
in the same way. Occasionally there is a common pitfall that does cause them
all to be dumb in the same way, and more often there's a good reason. In this
case though, it's so common but I can't see how there could possibly be a
reason for this. Maybe tech companies just tend to have really small meeting
rooms with only a few chairs in them.

~~~
overgard
I think the problem is this: if you talk to any good software developer (or
scientist for that matter), they're always going to speak in probabilities and
hedge everything. And they're absolutely correct to do this, because it's the
reality.

The problem is, the business people (generally) have _no interest_ in this.
They're interested in certainties. They aren't being irrational either -- they
need to go out and sell, raise money, etc., and to do that they need to be
able to say "we're shipping on december 5th" and "absolutely we can do that
for you". Having the developers say "well, uh, maybe we can do that in three
months but we don't really know" makes their lives 10x harder even if it's
true.

So this is where project managers come into the picture. They don't promise
the most _efficient_ use of human resources. What they offer is a
_predictable_ use of human resources. In reality this drags everyone down into
a very gloomy mediocrity and leads to ineffectual organizations. But to the
business person, it seems great because they gain the illusion of control.

In that light, our current predicament makes perfect sense: project managers
are the shamanic priests doing a rain dance, and developers are the weather.
The weather stays unpredictable, but if it happens to rain the shaman can take
the credit, and if the drought remains, it's because the ritual wasn't done
_correctly_, you see. The illusion of control is the product they're selling.
What the managers are buying is someone who says "yes, we can have it rain on
the 19th" instead of someone who says "we can predict the weather about a day
out, maybe".

(God I hope no project manager I work with ends up reading this, heh).

~~~
Fire-Dragon-DoL
Stupid question: can't the businessman do his job? I mean, it's not like the
market is constant everywhere; they must be dealing with probabilities too.

------
sebastianconcpt
Yep!

 _Considering that the metrics above are still considered viable at the dawn
of 2020 proves how much opportunity still remains for managers to improve the
measurement -> incentives -> long-term outcomes achieved by their team. To
their credit, the instinct that leads managers to utilize these metrics is
sound. They know that the best businesses make their best decisions using
data. But so much depends upon whether the data is any good. When you place
trust in a metric like commit count, you're liable to end up worse than if
you'd used no data at all (i.e., by pitting the team against one another).

Our opinion is that a less bad metric would be one that harnessed the signal
lurking deep within lines of code, while rinsing away its noise. The metric
would consider "issues resolved," but only after normalizing for
implementation complexity, so developers that work in front-end systems don't
take the lion's share of credit after resolving a myriad of tiny tickets._
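
A minimal sketch of the normalization idea in that quote (the weights and the
"diff_complexity" scorer are hypothetical, not a real GitClear API):

    from dataclasses import dataclass

    @dataclass
    class Issue:
        id: str
        resolved_by: str
        diff_complexity: float  # hypothetical score derived from the fixing diff

    def credit(issues):
        """Complexity-weighted credit, so ten trivial tickets don't
        outweigh one hard one."""
        totals = {}
        for i in issues:
            totals[i.resolved_by] = totals.get(i.resolved_by, 0.0) + i.diff_complexity
        return totals

    issues = [
        Issue("FE-1", "alice", 0.2),  # copy change
        Issue("FE-2", "alice", 0.3),  # font tweak
        Issue("DB-9", "bob", 4.0),    # intermittent slow transaction
    ]
    print(credit(issues))  # {'alice': 0.5, 'bob': 4.0}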

~~~
TuringNYC
>> The metric would consider "issues resolved," but only after normalizing for
implementation complexity, so developers that work in front-end systems don't
take the lion's share of credit after resolving a myriad of tiny tickets.

Only if you are not being rewarded for solving your own bugs. Otherwise, I've
seen this go horribly awry.

I was at a Big 5 consulting company where one project was big on "how well you
managed issues and brought them to resolution". That meant that good
developers who prioritized code coverage and mostly bug-free code did not have
good issue resolution case studies to cite during promotion cycles. After a
year of good developers getting dinged on this, some good developers banded
together to purposefully leave known bugs in the release (especially bugs that
could be blamed on vague specs -- so instead of resolving issues during dev
cycles, they propagated them to production). Then, the developer
would swoop in and do lots of "investigation", produce a fancy write-up and
email (carefully cc'ing senior management) and throw the business analyst
under the bus.

Senior Management naturally was impressed and some of the developers got
promoted for their "leadership" and management of "critical bugs."

Bad Incentives --> Bad Outcomes

~~~
coldcode
This is known as the Cobra Effect.

I worked at a healthcare company where the chief architect was considered a
genius by upper management because he would dive in and "save" production; in
reality he was the cause of all the bugs that broke it but he did not allow
anyone to fix the problems.

------
commandlinefan
> The longest has taken two weeks and is still ongoing.

Even worse, the typical braindead “solution” to this problem is to insist that
everything be broken up into chunks of a few hours or less each before work
can begin. The end result is “if it can’t be done in a couple of hours, it
isn’t worth doing”.

~~~
perl4ever
It seems like I have read "esteemed members of the community" advocating this,
at least to enable estimation. Joel-on-software, maybe?

------
toolslive
Actually, LOC isn't that bad. Studies have shown:

    - that complexity metrics are highly correlated with LOC.
    - that people can produce & maintain software at about 10 LOC/h. This seems to be a constant of the mind.

It means you're better off doing 10 lines of python than 10 lines of ASM as
the 10 lines of python will do a lot more.
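
A back-of-envelope illustration of what that rate implies, assuming roughly
2,000 working hours a year:

    10 LOC/h x 2,000 h/year = ~20,000 LOC per developer-year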

Both facts are discussed at length in "Making Software" (Oram, Wilson)

~~~
x3n0ph3n3
> people can produce & maintain software at about 10 LOC/h.

That is shocking and horrifying if that's the norm for software development.

~~~
perl4ever
I think that people who claim to write many lines of code per hour just don't
count small amounts of time lost and don't realize it adds up or feel it
_should_ be exempt from counting. It's like the calories in broken cookies, or
the ten minutes it takes to get out the door when you only spent a few seconds
grabbing a few things.

One thing that I think can give you perspective is if you experience a job
where you account for your time like lawyers do, in 6 or 15 minute increments.
It's surprisingly hard to get your billable time over 50%, assuming you're
honest about it.

------
ljw1001
The 'best' software productivity metric I've ever seen is one I created and
tested on a large code base. It only measures coding, but it's better than the
other metrics for that. It is based on information theory and credits code
removed equally to code added. In the end, I reported the results to the devs
and then stopped doing it because there was too much opportunity for abuse by
my superiors. Once they got wind of it, they wanted to know everyone's scores
and I could see where it was heading.

The metric: it was based on a very powerful (and slow) compression algorithm,
used to measure the change in the code. Since it used compression, whitespace,
boilerplate, etc. had essentially no impact. It was carefully tested by
comparing the results against human assessments.
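
One way to get this behavior (not necessarily the algorithm used here) is
normalized compression distance; a minimal sketch, with lzma standing in as
the "powerful and slow" compressor:

    import lzma

    def csize(data: bytes) -> int:
        # compressed size under a strong, slow compressor
        return len(lzma.compress(data, preset=9))

    def change(before: str, after: str) -> float:
        """Normalized compression distance between two file versions.
        Near 0 for cosmetic edits (whitespace, boilerplate); near 1 for
        a rewrite. Removals and additions move the score symmetrically."""
        x, y = before.encode(), after.encode()
        cx, cy, cxy = csize(x), csize(y), csize(x + y)
        return (cxy - min(cx, cy)) / max(cx, cy)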

The only clear issue was that it over-credited mass changes (renaming a root
package in a Java project, which caused every file to change). This could
probably have been handled by adjusting the scores.

After looking at other metrics (bugs reported, comments on bug reports, code
reviews completed, comments in code reviews, etc) it was clear that some
people made _huge_ contributions through activities that were ignored by the
algorithm, and some people who had great software productivity scores
contributed little in these other activities. They were putting all their time
into their own code.

FWIW, In the same exercise, I came up with another good metric to identify
problematic code (rather than problematic people). That metric used machine
learning to find code that needed to be refactored. But that's a story for
another time.

------
sbellware
The only telltale metric of software work that I know is the ratio of time
spent making progress to time spent not being able to make progress because of
past mistakes made and shortcuts taken.

It's reasonable to consider that the reason that software developers are
saddled with such shallow process and tooling metrics is that they themselves
don't have any better alternatives to offer.

If we roll over and accept lowest-possible-maturity organization and process
methodologies like Scrum, then we ultimately deserve what we get.

We have a role to play, and it ultimately requires stepping up and seeing our
jobs as more than mere tool-wielding. Or, said otherwise: in the rush to call
ourselves "engineers" we've stopped being engineers, and stopped even pursuing
an understanding of the body of knowledge of engineering (which leads to a
better grasp of the subject area that comprises process and measurement).

Velocity is a standard we're held to because we didn't step up to provide
better standards. And we didn't do that because we didn't believe it was our
responsibility to learn about the rest of software development. And in that
vacuum, people who know even less than us are left to dictate pseudoscience to
people who should already be in command of the science.

We have a role to play if we wish to play it. It's not obligatory, but there
are real costs to pay when we don't.

There's no requirement or expectation to go above and beyond in an effort to
catch up on all the knowledge and understanding we've shed from our
wheelhouse. It's totally fine and understandable if we don't. But we also have
to accept that nature abhors a vacuum.

------
vageli
Is the measure of an architect the number of lines in a blueprint? Is the
measure of a craftsman the number of finished goods produced? It seems to me
that different projects in the same domain should use different attributes to
evaluate success. Why do we attempt to use uniform measures across the board?
Is it lazy management or does delving deep into project-specific metrics make
it hard for large enterprises to compare employees in similar functions? Or
something else?

------
forgotmylogin2
Is this operating under the assumption that there is no code review?

With code review, lines of code and commit count become much harder to game.
If you try to submit changes with unnecessary whitespace, unnecessary code
constructs, or incomplete code, your reviewer will see it and tell you to fix
it. Ultimately, trying to increase your lines of code or commit count will
come back to haunt you as you spend much more time addressing comments from
code review.

~~~
overgard
It's still super-easy to game. Just prioritize all the low-hanging fruit. Got
some tickets for increasing the font size or replacing the copy in the privacy
policy? Why not do those instead of working on the important but hard to
debug/estimate problem of some database transactions being slow, but only
sometimes?

------
jknoepfler
In order to add value to a dev team as a manager, one must have an intimate
understanding of the work developers do. There isn't any value in abstracting
away from that work when evaluating the contribution of team members.

"Context, action, outcome" is a sufficiently standardized formalism for
evaluating performance (assuming outcomes are measured in terms of business
value). Accurately reporting context, action, outcome requires understanding
both what devs do, why they do it, and how it ultimately produces business
value. There's no shortcut. If your management can't meaningfully do that
because they are non technical or managing too many people, or whatever, then
they are actively harming your team and should be removed with prejudice.

There is no reasonable way to conclude that devs can be effectively measured
with garbage stats. Even suggesting it seriously indicates incompetence or
laziness.

------
fake-name
Actual URL:
[https://public.amplenote.com/embed/53SmZouWw5nNyBaJNEe6CSdF?...](https://public.amplenote.com/embed/53SmZouWw5nNyBaJNEe6CSdF?hostname=www.gitclear.com)

I completely don't get this weird iframe for the entire article thing.

------
notmyfuture
I’m a manager of ~25 developers (who report through a few leads). I also come
from a dev background. IMO metrics are dangerous, but can also be useful.
Often I find basic code metrics useful as smell tests, but would never use
them alone, wouldn’t set them as goals, and wouldn’t use them to rank
performance within a team.

If someone has a very low number of commits relative to others on their team,
I’ll investigate deeper: review a few pull requests, check if they’re working
on something outside source control, etc. Sometimes I find problems, other
times not.
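
A sketch of where that smell test could start (the threshold is arbitrary, and
a hit is only a prompt for the human follow-up described above):

    import statistics
    import subprocess

    # commit counts per author over the last 30 days, via `git shortlog -sn`
    out = subprocess.run(
        ["git", "shortlog", "-sn", "--since=30 days ago", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout

    counts = {}
    for line in out.splitlines():
        n, name = line.strip().split("\t")
        counts[name] = int(n)

    median = statistics.median(counts.values())
    for name, n in sorted(counts.items(), key=lambda kv: kv[1]):
        if n < median / 3:  # arbitrary cutoff: a flag to investigate, not a verdict
            print(f"{name}: {n} commits vs. team median {median:.0f}")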

I’m curious what people think of this approach?

~~~
_Marak_
I would think this is a common approach. I use the same one, and it scales up
to hundreds of developers.

I do generally try to enforce the goal that on any working day there should be
at least one commit from the developer. If I notice a developer is
consistently missing this metric it usually indicates trouble in the project
they are working on or that the developer is just not delivering enough.

------
makecheck
There’s a tendency for “metrics” to evolve into a process where the police
give you speeding tickets based on the number of times you opened the
passenger-side windows of your car. Oh, and if you get a flat tire it must
mean that you should lose your license.

Also, beware of pretty graph summaries that are presented _without_ a healthy
list of caveats and follow-ups and further explanations. Beware if no one with
management power is asking the follow-up questions that matter.

------
zylepe
Best software metric: lines of code removed

------
smadge
Why not let developers justify their work and point out its positive impact
themselves? It would require some overhead, but they could highlight the
appropriate metric if it can be quantified and also justify more qualitative
improvements.

------
lazyant
Author concludes (before promoting his product) "The metric would consider
"issues resolved," but only after normalizing for implementation complexity",
so story points then?

------
chiefalchemist
Not to get too far off topic but is anyone familiar with how Boeing runs their
shop? What's management/leadership like? What metrics they use?

