What is developer productivity and how to measure it? (axolo.co)
97 points by arthurcoudouy on March 1, 2022 | hide | past | favorite | 88 comments


I don't agree with most of the advice in this article but rather than complain let me suggest an alternative.

As a line manager, with software engineers reporting directly to you, you should be able to use your personal judgment to understand the productivity of your software engineers. Don't measure it with acronyms, with metrics like the number of commits, or by paying attention to how many hours a week people are working. Pay attention to whether people get things done, and whether they are getting big important things done or only little nice-but-not-critical things. Make sure you communicate enough so that individual software engineers understand how you think and what you prioritize.

As a manager-of-managers, it is going to be very difficult for you to measure developer productivity. It's tempting to look at metrics like the number of code reviews a developer does. But these can at most be a sanity check, not the core metric to go for.

Instead, you can measure productivity of teams. Is the team getting things done, and are they big important things, or only little nice-but-not-critical things? Sometimes, a line manager will insist that everyone on their team is performing excellently, and yet you observe the team overall is not achieving very much. Probably one of the two of you is incorrect, and you should dig in to figure that out. The opposite also happens, where a manager states that everything is a disaster, but you observe that the team has actually delivered a lot.

The other thing you can do is to teach your line managers how to judge individual productivity. There's no silver bullet, it's just a natural outcome of having conversations about who is productive and who is not and how to tell and what to do about it, so be sure to have enough of those conversations.

None of this is easy to quantify, but the hard truth is, there is no natural mapping from numbers to developer productivity and it is usually a bad idea to try to quantify productivity. You are much better off using human language and intelligent thinking to evaluate productivity, rather than reductionist metrics.


I too have come to think that no simple metrics will ever replace the need for a competent manager who can use intangible, subjective context to evaluate their team. I think that even if you get some metrics that work well initially, the system will change such that the metric becomes the goal and the metrics then become much less effective.


You still have a goal that people are optimizing for though--how to increase that intangible, subjective measure known only to the manager. It leads to people optimizing for talking more about the stuff they do, showcasing their accomplishments, and in bad extremes brown-nosing, infighting, and sabotage of others' work. Whether any of those behaviors lead to a better product or outcome for the company is something to strongly consider.


Well put. I couldn't agree more.

Humans and relationships are nuanced, including work relationships and the responsibilities and expectations there. It's best to treat them as they are rather than trying to shoehorn those things into such a sweet little checkbox.

Frameworks are alright, but they need flexibility built in. They certainly shouldn't be treated as religiously as they are commonly.


I can't agree more with you. I tried to sum up my thoughts in my first reference:

> One of the most common myths — and potentially most threatening to developer happiness — is the notion that productivity is all about developer activity, things like lines of code or number of commits. More activity can appear for various reasons: working longer hours may signal developers having to "brute-force" work to overcome bad systems or poor planning to meet a predefined release schedule.

The SPACE framework is not about measuring quantitative data only. Certain metrics might be interesting not as productivity scores in themselves, but rather as a way to identify key issues or unexpected events during engineering sprints. Without data analysis, you would not be able to understand why there is a drop in productivity during certain periods, and usually those drops were created by management (too many meetings or lack of follow-up).


I don't see how you can make the leap from "it's hard to measure" to "no metrics are useful". As with anything, you have to use your judgement and experience, and it's a case-by-case thing. Everything is a signal. Lines of code, number of bugs fixed, number of bugs found, severity of bugs, hours in office, meeting project milestones, contribution in team meetings, etc, etc. It's up to you how to interpret each signal. As you rightfully said, there is no silver bullet.


There's an entire class of products I'll name "internal platform tools" whose primary objective is to improve the developer experience with the intent of having the side effect of increased developer productivity by making it easier & more enjoyable to build things within a company. The teams working on these tools need to understand how their products perform the same as a team building some widget for a "paying" customer.

Without some quantifiable metric, how do these teams know if their products are getting better or worse? The discussion always goes to measuring developer happiness & developer productivity because we want some degree of confidence that we are improving, or at least maintaining, these metrics.


Inefficiencies in the developer experience show up as frustrations for developers. Developers are very happy to tell you what frustrates them and how badly.

Often what frustrates people is a latency, which is something you can measure and track. Other times it is an ugliness, surprising footgun, or lack of conceptual integrity - these are fundamentally human experiences, and subjective assessment is the only way.


Agreed, we need both quantifiable metrics, and also a human brain to interpret them with subjectivity, context and compassion.

I see many people wanting to take writing code into the liberal arts domain, but I am of the opinion that it may be more useful if we can overlap it with the engineering domain. IMHO the goal should be to repeatedly churn out high-quality bug-free code, and to create an objective process methodology so that time and money is well spent. We may end up with multiple different methodologies for various technologies, domains, etc.


The part where the code needs to execute correctly is the engineering domain. But lots of bad code executes correctly. "Programs must be written for people to read, and only incidentally for machines to execute." Writing things for people to read is unavoidably an arts discipline.


>Writing things for people to read is unavoidably an arts discipline.

I understand what you mean, but I'd have to disagree with that. I've been working on a very large engineering project with a large-ish team (~50 members) for over two years. All of our communication is via an established methodology of engineering diagrams, design documents, position papers, etc all of which are in a structured format that follows common rules/regulations/conventions.

As a result, all the companies we work with understand, for example, our P&ID diagrams, electrical schematics, mechanical design docs, system layout diagrams, etc. There is no reason why such a methodology can't be brought to code. I don't view code as anything special - having now worked on both sides. I think there really is a lot of value in the engineering methodologies that can be adapted and applied to the software world.


Do your diagrams, design docs, and position papers not vary in the clarity of their presentation or in the wisdom/simplicity/fitness-for-purpose of the ideas they convey?


Indeed, they vary. The larger point is that we still get a lot accomplished and communicated because of a common underlying methodology. I don't have an answer for what that means when adapted to the software field. It's going to be a soup of many things - coding guidelines, BDD, TDD, modular programming, etc, etc. I'm sure there are brains far bigger than mine already working on this; it's not really an original idea in that sense.


Generally you find the problems in your developer experience and use those as your metrics. Maybe it takes three PRs and an hour and fifteen minutes to deploy to prod; a lower number of PRs and fewer minutes to deploy would be your metrics.

Or maybe to introduce a new endpoint in your API takes X amount of boilerplate lines, Y files, etc and you post mortem new endpoints after your change to ensure that number is dropping.

Talk to people, find out what their problems are, quantify the problem, measure.
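To make that concrete, here is a minimal Python sketch of tracking the deploy-friction numbers mentioned above. The record shape and field names are hypothetical - substitute whatever your CI system or deploy log actually exports:

```python
from statistics import mean

# Hypothetical records: one entry per production deploy,
# captured from your CI system or a simple deploy log.
deploys = [
    {"prs_required": 3, "minutes_to_prod": 75},
    {"prs_required": 2, "minutes_to_prod": 40},
    {"prs_required": 1, "minutes_to_prod": 22},
]

def friction_summary(deploys):
    """Average the two friction metrics so you can watch them drop over time."""
    return {
        "avg_prs_required": mean(d["prs_required"] for d in deploys),
        "avg_minutes_to_prod": mean(d["minutes_to_prod"] for d in deploys),
    }

print(friction_summary(deploys))
```

Recompute the summary per sprint or per month and you have a trend line for exactly the problem people told you about, nothing more.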


We just need to be careful that this unquantifiable, subjective rating doesn't include biases.


Everything including "objective" metrics includes bias. And that's before you take into account people outright gaming metrics (objective or subjective).


As a simplified example: if I write 1000 lines of code and you write 1000 lines of code, we should have the same rating if that's the metric used. There shouldn't be any bias there. It only introduces bias when the manager feels your code is better than mine, etc.

Now the objective measure itself might have some sort of bias, but at least the rules are set and you're not getting screwed over based on someone's feelings. You can argue metrics; you can't argue your manager's feelings.


The thing is that those metrics are very poor metrics that don't correlate well to the "true ideal performance", even if compared to a subjective manager's intuition with all the randomness and biases.

Replacing a subjective metric that's at least somewhat effective with a metric that's totally useless (because its inherent inaccuracies/biases are worse than even a poor manager's judgement) is throwing out the baby with the bathwater. The primary purpose of a performance metric is to measure performance, and being prejudice-resistant is merely nice to have - the primary reason you implement a metric is not because you need something that can be argued.


But that is proper. The quality of craft/creative work matters. It belongs in the evaluation of craftsmen and creative workers. And it is fundamentally a feeling. When you are junior you may not yet have developed this judgement or taste. Your job is to learn it, from your own failures and the feedback of your senior colleagues. When you are senior, you have it. You are more valuable to an organization precisely because you can be trusted to have positive feelings about good work and negative feelings about bad work, and therefore do the right thing in a position of decision-making power. Also because you enculturate the next generation of senior craftsmen through your feedback.

This shouldn't be a surprise at performance review time, nor should it necessarily come from your manager -- it should be coming from your senior colleagues on each of your code reviews, giving you a chance to improve your bad code before it gets checked in. But when your senior colleagues think your PRs are worse on average than those of your peers, then yes, absolutely you should get a worse rating.


Your comment doesn't change if you replace lines of code with manager's perception of you. If you're both equally liked by your manager then you should receive the same rating. Within the metric being defined neither is biased since they have clear and explicit definitions. Against the true metric of "productive engineer" both are biased.


And how do you handle your manager having a cultural or unconscious bias against your <race / religion / body type / gender / appearance / clothing / hair color / fragrance of the soap you use / eyewear / etc.>? You just live with them not liking you and not measuring up to others in their mind?


Subjective evaluation for performance purposes is often done by committee for this reason. Your work is read by several people who are unlikely to have the same idiosyncratic biases, at least some of whom don't know you. (That cuts both ways, though; they also don't know the context for the work).


Except you have no measure or target for the manager's feelings.

You have to define productive engineer in order to claim the metrics are biased.


If your 1000 lines of code generate four new bug tickets and mine doesn't generate any, is that biased to say mine is better?

Or how about even if yours generates 10 comments on the PR correcting things to match code quality guidelines and mine doesn't?

I don't think we often track things like that.


In order for these metrics to have even a tiny tiny chance of not being completely gamed (even unintentionally) you'd have to define a rigorous formula of weighted metrics that take things like one of my siblings mentioned into account (did your 1000 lines create a regression or 5 and mine didn't? Code quality? Lots of review comments that took forever to debate and resolve?). And that's assuming you could actually measure those things properly. Was that comment a valid one regarding you missing quality guidelines or was it someone trying to game your metrics negatively so that he'd look better?

I think it's impossible to create something like that and it'd be very very bureaucratic and still prone to gaming. I think having something 'in between' is the best approach. You still allow a manager to interpret these things together with you but the manager should give you a guideline for what to look out for. We can use these metrics to inform decisions about performance but it's completely counter productive to simply have a few metrics where you have to hit specific numbers.

PR throughput? No problem, I'll form a clique of people who OK each other's tiny PRs. This will result in so much overhead that we won't actually get much done, piss off other team members, create a hell of basically unusable commits, and make it more likely that code quality suffers, because nobody has any chance of keeping an overview of what you're doing overall. You will likely create regressions that developed over multiple commits and would've been caught otherwise, because - let's face it - each unit test you write is its own PR. You say obviously you won't get through with this because your manager is supposed to stop it? Well, he can't if we just want a completely objective, metrics-driven approach!

With the hybrid approach, you know from your manager that PR throughput is important, but not at the expense of quality and other things. You want small PRs for certain reasons, but not at all costs. There is no exact formula because no two situations are exactly the same. Of course bias creeps in, and of course bad managers make this worse. So does a completely "objective", metrics-driven environment in which you play the metrics game. There is no perfect solution.


Exactly, I don't know what the answer is (probably a combo of subjective and objective measures, along with a healthy dose of independent oversight for both) but relying completely on 'gut feeling' is an express train to unconscious bias land.

In reality I think someone who only looks at team members who are "doing the most" is really just measuring who is talking about their work the most. You need some kind of objective measures like features shipped, assigned bugs resolved, etc.


I don't think adding metrics is actually a good approach to reduce bias. Any kind of measurement can be twisted if you want to.

Also, the metrics you choose will undoubtedly contain a measure of your bias anyways.

For instance, the metrics two people would choose to represent developer effectiveness will not be the same, and those differences will reveal what kinds of workers they prefer.


I prefer some kind of metric because I'm tired of being screwed over by blind shitty managers.


I hear you. But metrics aren't going to save you. You need to find a manager who isn't shitty! They are out there, don't lose hope.


There aren't many though and almost none who haven't come up through the trenches themselves.


Absolutely agree.

I think there is a conflict of interest there, though. Managers have a vested interest in saying that their team is highly productive. Managers of highly productive teams get raises and more head count, and eventually promotions. Anything else reflects poorly on the manager.

So the managers-of-managers do need to keep their eyes on this too, but I also agree with you that it's harder for people in that higher-level position to evaluate this. I guess, as you hint at, the manager-of-managers can look at team output overall, and if that's below expectations, that's a starting point for discussion with the line manager.


Measuring team output seems just as difficult as for an individual.

I was reading an interesting piece in The Economist today about maintaining people's performance on a long space flight. One part that stood out to me is that people being productive makes them happy. I always assumed the causation would be the opposite way round. Perhaps the best a manager of managers can do is try to figure out whether the members of each team are happy or not.


Funny. I read a book a long time ago about developer productivity. They started by saying: "measuring productivity with KLoCs is terribad". And later on: "but we only have that, so let's use it anyway". I stopped reading there. And here in 2022 it's exactly the same thing:

> Measuring developer outputs can be detrimental

And then:

> Design and coding: The number of design papers and specs, work items, pull requests, commits, and code reviews, as well as their volume or count.

All they do is add more and more metrics. But this has the exact same problem as the infamous KLoCs measure: how do you interpret it? How do you know it is not being gamed, to begin with? Actually, now you have two problems: on top of interpretation, collecting and analyzing this mass of metrics has a significant cost.


One more thing: "put your money where your mouth is".

Bug bounties work, why wouldn't "feature bounties" also work?

You say you want those features, preferably bug-free, for this deadline. And there's $5K for the team if the objectives are met.

Then, your metrics problem boils down to how to impartially measure customer satisfaction, or how well the objectives are met (in some contexts, bugs are unavoidable etc.).

Metrics can still be important to help the team identify their problems (or rather, confirm their intuitions about the problems). It's an optimization problem: measure first, then do something about the actual bottlenecks.

That said, some programmers are such nerds that more money is not the highest motivation. One can use some creativity here.


It's very hard to align an explicit incentive scheme with the outcome that you actually want.

In this case, you'll get your features, but they very likely won't be bug-free. They'll probably be quite slow and fragile. They might not scale. They might not be well thought-out. They might break backwards compatibility, or break other features that your customers are already using.

In other words, why wouldn't a developer borrow limitless technical debt in order to claim the bounty as fast as possible and move on to their next bounty?


Especially since the developer will likely be at a new company or team in 1-2 years.


Because I am talking about programmers, not about mercenary-developers.


Then what change in behaviour were you expecting your incentive scheme to result in?


The "scheme" a) draws from gamification b) prevents the sometimes hostile reception of the use of metrics c) reinforces the feedback loop. Positive perception leads to positive behavior.


Of course. No true programmer would be motivated by reward to produce shoddy work.


There is plenty of research showing that extrinsic rewards don't really work that well, especially for intellectual endeavours.

On top of that you have the problem that now your developers will follow the set objectives to the letter even if it transpires that something else was required.


I'm personally not a huge fan of collecting quantitative data to evaluate engineering productivity. The context of such metrics is usually more important than the data itself, meaning you use discrepancies in your results to identify business needs or issues. When I work with quantitative data, I try to find patterns rather than analyzing the data itself (why do pull requests last longer on Monday afternoons? do we have too many meetings there?..)


I suggest that in every comment, you add “author here” or similar to make that very clear, because it isn’t always very obvious from your comments.


Thank you for the suggestion! Will do


More lines of code = more liability. More commits, higher commit frequency, etc. do not equate to more productivity either. In fact, in some instances you are actually introducing more liability into a codebase by doing that. The flaw with most of these metrics of "productivity" is that they inherently assume coding is analogous to a factory worker building something, when in reality it is analogous to someone designing the things the factory worker has to assemble.

While I'm not a fan of subjectivity in ratings, the challenge is that it is very difficult, and I would argue virtually impossible, to do it objectively. So what happens instead is that when metrics are used to evaluate engineers, the smart ones figure out how to game them. Does that make them, or the team, more productive? Nope. Can that have unintended consequences that actually make the code less stable and decrease productivity? Yup!

But if you're going to go with these measurements you might as well go big. Throw out anything related to Agile, require estimates that are accurate within 15 minutes and severely punish engineers for not getting estimates right. Might as well also add in heavy documentation requirements too. After all, this rigorous measurement etc. has all worked so well in the past <dripping sarcasm for this last paragraph>.


Measuring developer productivity is like observing quantum state, the act of measuring it generally fucks it all up.

Rather than task the developers with all manner of bureaucratic Agile bullshit like tracking hours, arguing about story points, submitting to kindergarten-style daily stand-ups, velocity tracking, retros, etc., I would suggest a different tack. How about measuring developer productivity by observing whether they're building what you need at the rate you need it built. If not, then you need to figure out if you can afford to replace them with someone who can.


> Measuring developer productivity is like observing quantum state, the act of measuring it generally fucks it all up.

This is a much more entertaining version of Goodhart's law [1]

[1] https://en.wikipedia.org/wiki/Goodhart%27s_law


> because productivity and contentment are linked, it's feasible that satisfaction can operate as a leading indicator of productivity; a drop in satisfaction and engagement could foreshadow impending burnout and lower output.

Great review of the hazards involved in quantifying developer productivity - the correlation above has been true everywhere I’ve ever worked.

If the company you’re working for:

- is not investing in improving the developer experience

- is not listening to developer complaints about slow, tedious, or error prone processes

- is perpetually pushing tech debt onto a backlog that only grows

Then chances are you work for a company whose leadership does not understand and value software engineering. They likely see it as a cost center, and they likely incentivize managers by rewarding initial delivery of projects, at the expense of maintainability and developer sanity.

I know I’m preaching to the choir, I just had to put it out there for all the young engineers. Don’t waste too much of your life and happiness trying to patch those sinking ships.


Well put!

> your life and happiness

Also remember that these things are why you're here in the first place, and not to be the best Level 4 SWE Management Trainee in the trans-western division this quarter.

It's only healthy to maintain perspective. It can hurt in the short run sometimes, but there's only misery if you don't keep it.


Yeah. You work to live, but you don't live to work. Even if you love your job; Don't put all your eggs in one basket.


As many readers pointed out, team velocity and process bottlenecks are a much more valuable focal point than individual developer metrics. But for this, having the ability to observe what is happening and dig deeper into the data is critical, so that you can back iterations on your improvement efforts with data.

The research is also slowly laying the foundation of what are useful metrics to track and what excellence looks like for the industry. Unfortunately, those metrics are typically difficult to measure because the underlying data often spans multiple engineering systems: Lead Time, the poster child of DORA metrics, requires data from at least your source control and your CI/CD systems.
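As a rough illustration of why Lead Time spans systems, here is a minimal Python sketch joining commit timestamps (from source control) with deploy timestamps (from CI/CD). The data shapes and field names are hypothetical, purely to show the join:

```python
from datetime import datetime

# Hypothetical exports: commit SHAs with author timestamps (source control)
# and deploy events mapping released SHAs to a deploy timestamp (CI/CD).
commits = {
    "abc123": datetime(2022, 3, 1, 9, 0),
    "def456": datetime(2022, 3, 1, 14, 30),
}
deploys = [
    {"sha": "abc123", "deployed_at": datetime(2022, 3, 2, 10, 0)},
    {"sha": "def456", "deployed_at": datetime(2022, 3, 2, 10, 0)},
]

def lead_times(commits, deploys):
    """Lead time per change: commit timestamp to production deploy."""
    return [d["deployed_at"] - commits[d["sha"]] for d in deploys]

for lt in lead_times(commits, deploys):
    print(lt)
```

The join itself is trivial; the real work is reliably extracting and matching those timestamps across two systems that were never designed to talk to each other.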

Btw, you might be interested in checking out Faros Community Edition: https://github.com/faros-ai/faros-community-edition – an open-source engineering operations platform we’ve been building for this very purpose. Our goal is to bring visibility into engineering operations, and make it very easy to query and leverage data both within and across your systems. It’s container-based and built on top of Airbyte, Hasura, Metabase, dbt, and n8n.


this is neat, 'first review time' is definitely one of those softer metrics that can make a meaningful difference


> Each organization can set a wide range of metrics to follow every week, such as:

> Number of commits.

> Average commit size.

> Frequency of code reviews.

> Number of code reviews.

> Time to review.

> and so on...

No. This has been tried many times and companies think this is how you measure productivity, but it is not even sustainable. Developer productivity is not about moving the needle, it is about outcomes, and not outputs.

An outcome is finally merging an unsustainable PR that has sat for a month. It is not how many comments, reviews, meetings, or commits needed to get to the outcome.

The only people I know who want to implement these terrible measurements are the type of people who have ambitions as large as Mount Everest but die on the descent back down. The real goal is to be more like an F1 pit crew, where you leave out the metrics and end up performing better than if you measured them.


>Developer productivity is not about moving the needle, it is about outcomes, and not outputs.

What criteria does your team use to measure the outcomes and/or during a post-mortem?

> The real goal is to be more like a f1 pit crew where you leave out the metrics and end up performing better than if you measured them.

But all F1 pit crews have defined measurable metrics for success. I don't see the analogy here? Can you help me understand it?


We measure happiness and various sentiments towards these areas like knowledge transfer, documentation, ci/cd reliability, pr velocity, etc.

The F1 pit crew analogy refers to how certain teams realized that measuring KPIs too religiously made the very difference in performance when it mattered.

Similar to other sports where coaching actually matters and these metrics are rather useless unless there’s a coaching role to be deliberate with them.


>We measure happiness and various sentiments towards these areas like knowledge transfer, documentation, ci/cd reliability, pr velocity, etc.

I see, so you guys don't measure anything specific that is actionable? Did you face any challenges dealing with non-performers or under-performers dragging the team down?


Yes. Those people tend to also be happiest with each of these areas and have coaching plans with their managers given they are also early in career. Unhappiest are the senior and principal talent, but not by much more.


Okay, interesting. If you don't mind saying, what is the size of your team?


30+ ICs between two major OSS codebases.


>An outcome is finally merging an unsustainable PR that has sat for a month.

Ultimately the desired outcome is money arriving in the bank. Which is even less fathomable.


There's a perhaps apocryphal story that, to avoid motivating programmers to pad out their comments, IBM decided to measure productivity not by number of lines of source code written, but rather by number of bytes of object code generated. And then when a new release of the PL/I compiler came out, management was quite pleased to learn that it had improved everyone's productivity significantly!


I really don't think you can measure developers by their productivity. the impact of productivity is predicated on design meeting requirements, the accuracy of requirements is predicated on stakeholders knowing what they need.

the only quality that matters is how effective the software is in its business function. how effective does it make stakeholders? how well does it capture engagement by users? the right question to ask changes in business context, but if you can't answer it, you might as well throw darts and flip coins. if you can measure the impact of their code before and after deployment you might have a chance, but it's probably hopeless.

as far as I can tell it boils down to a subjective and qualitative assessment of developer performance. you can also take the contrapositive: where would we be without this person? how long would we have taken to get there without them? what would we not have learned without this person?

I'm nervous about the implicit bias that comes with this kind of perspective, but I think it's the best we have for now.


There is no end to the search for a developer productivity metric, yet it refuses to be found, for reasons that are fairly obvious to technical people - though that hasn't stopped people from trying, for decades. So now they've retreated to these vectors called "frameworks" that try to obscure with complexity the fact that they are not in any way able to "measure what matters" – in this case, the ratio of value output to value input – nor in any way deserving of the term "metric". I contend that such non-measures are of absolutely no value to engineering managers; they're management theater and purely a distraction and a waste of time.

Let's leave aside for a moment that this piece begins with an impressively uninformed and circular definition – "Developer productivity, in general, refers to how productive a developer is during a specific time or based on any criteria." – and focus instead on the question of why does this stuff keep popping into existence; what's behind it?

As a tech exec who's researched and given several talks on this to large audiences of non-technical execs like CEOs and CFOs, I believe the root causes are an understandable and intense desire for "visibility" and exec accountability coupled with a set of false beliefs held by non-technical managers including "anything can be measured if you try hard enough" and "nothing can be managed unless it's measured" and the classic quantitative fallacy of "things that can be measured are more important than things that can't be". Besides, it's only fair that if the VPs of sales and marketing have to stand up and talk about funnel metrics and sales rep productivity (with real metrics like net new bookings divided by fully loaded sales rep cost) that the VP of engineering - an often enormous fraction of a SaaS company's budget – should be similarly held to account for some number, any number, we just need a number, so we can look for "trends" (actually, noise). It also seems to be driven by a push from HR for fairness in promotions and terminations, which is also totally understandable, yet misguided.

I have a wisecrack response for non-technical executives when discussing this – "how do you measure your own productivity?" – that helps them see the absurdity of what they're trying to do, and how common it is for no true measure of productivity to exist. People really struggle to accept that some metrics, no matter how great it would be to have them, simply do not exist, and so we have this – measurement theater.



How about starting with mean time to merge a code change? I get that there are other variables that contribute to productivity, but things like satisfaction, collaboration, etc. are extremely difficult to measure well and, IMHO, pretty tangential (disclaimer: I work on a dev prod team and the entirety of my last year was spent building engineering metric dashboards and discussing what to measure, and it's not easy, so I get it).
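For the sake of concreteness, here's a rough sketch of what computing mean time to merge might look like; the PR records and timestamps below are entirely invented for illustration:

```python
from datetime import datetime

# Hypothetical PR records as (opened, merged) timestamp pairs.
prs = [
    ("2022-02-01T09:00", "2022-02-01T17:00"),  # 8 hours
    ("2022-02-02T10:00", "2022-02-03T10:00"),  # 24 hours
    ("2022-02-03T12:00", "2022-02-10T12:00"),  # 168 hours: one stale PR
]

def mean_time_to_merge_hours(prs):
    """Average open-to-merge duration in hours."""
    fmt = "%Y-%m-%dT%H:%M"
    deltas = [
        (datetime.strptime(m, fmt) - datetime.strptime(o, fmt)).total_seconds() / 3600
        for o, m in prs
    ]
    return sum(deltas) / len(deltas)

# The single stale PR drags the mean to ~66.7h even though
# two of the three PRs merged within a day.
print(round(mean_time_to_merge_hours(prs), 1))
```

Note how sensitive the mean is to a single long-lived PR, which is exactly the kind of skew discussed further down the thread.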


While we're at it, let's measure productivity by lines of code added. /s

Using merge time is a terrible metric, perhaps even worse than deadlines, because it can be more effectively weaponized. What incentive would it give developers other than to rush the code review process? Merges are not where you want to be rushing anything – quite the opposite. If project deadlines are necessary, allowing code review its due time lets developers informally schedule things without sacrificing craftsmanship for what is, in reality, a vanity metric. Some code needs to be carefully considered and given time, while other code doesn't need much review or worry at all, but no one can tell that by looking at a mean. And if a developer is asked why some tasks took longer than the average merge time, they now have to waste even more time explaining themselves. In the worst case, the incentive to rush the review process results in more time wasted on bugs that could have been caught before they were ever merged.

Am I misunderstanding your view of how mean time to merge would be used?


Sounds good to me. Then I imagine the question is all about the kinds of changes that are being merged, which leads to: How many features can be produced per unit time? How many bugs are produced per feature that must be fixed therefore slowing the rate of feature production?

The questions of whether features are appreciated by users, or which bugs should be fixed or not, or if a product is feature complete or needs more, are questions of business, and not developer, productivity and efficiency.

And regarding documentation, I consider that an integral part of code/software that can be judged similarly w.r.t. quality and impact, having its own features and bugs.


Using a metric like mean time to merge a code change could incentivize brute-forcing work. Developers who are aware of the metric may work unsustainable hours, which could lead to burnout. They may also cut corners on tests, reviews, or documentation in order to ship things faster.

I think that qualitative metrics like satisfaction and collaboration could be helpful, especially when combined with traditional metrics like mean time to merge a code change. Taking my previous example of overworking or cutting corners to achieve high numbers, a qualitative metric for something like satisfaction might indicate a problem where a work output metric wouldn't.

But I think that any combination of metrics will be an oversimplification that could lead to problems if they are the only thing that matters. I'm not sure where the balance lies. I like that metrics can offer an objective view of performance and make it easy to spot trends. But I am wary of them oversimplifying things and dehumanising the team.


> Using a metric like mean time to merge a code change could incentivize brute-forcing work. Developers who are aware of the metric may work unsustainable hours, which could lead to burnout. They may also cut corners on tests, reviews, or documentation in order to ship things faster.

I think this depends on how strongly leadership/management tries to use the metric to change behavior, but at a company like mine, where we have a lot of business dependencies, contracts, etc. that rely on predictions of throughput, this is an important metric for predicting timelines. It's not used against a team or platform, and it is primarily used internally by management, not so much by engineering teams, which I think is the right way to use it.


> It's not used against a team or platform, and it is primarily used internally by management, not so much by engineering teams, which I think is the right way to use it.

Absolutely. It’s the old descriptive vs normative saw. We should be very interested in measuring how productive we are. But how do we really know how productive we should be?

I think trying to answer that is hard to impossible for a developer, because you can always automate or abstract further, but the marginal cost of doing so increases, and you probably won't know the full cost until it's already realized – at which point both the requirement and the prediction are obviated.

I know I’ve had my fair share of negative experiences with a scrum team that wanted ever faster velocity and used burndown charts as a stick with no carrot in sight.


That makes sense to me. Then in your situation, assuming those metrics are needed and helpful, it may be worth adding a metric like "satisfaction" or something similar to counterbalance the negative effects the other metrics could have.

I could see how a satisfaction metric would counterbalance a work output metric. The team will probably feel less satisfied if they know that corners are cut or if they are having to work unsustainably.


> How about starting with mean time to merge a code change?

I'm currently experimenting with using "Mode time" as I think it is less susceptible to data skew from outliers. See example below:

https://oss.gitsense.com/insights/github?p=days-open&q=days-...

For the popular open-source projects that I used in the link above, the mode time to merge is less than a day, which is quite good in my opinion. And as you might expect, if you look at larger pull requests (>=10 files changed), the overall percentage drops by half, as the link below shows:

https://oss.gitsense.com/insights/github?p=days-open&q=pull-...

I think what the link above shows is that you can't just willy-nilly use merge time to measure productivity, since there are multiple variables at play.
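To illustrate the mean-vs-mode point: since merge durations are continuous, a "mode" requires binning them first (here into whole days open, mirroring the days-open view in the linked tool). The durations below are invented for illustration:

```python
from collections import Counter

# Hypothetical merge durations in hours; one 500-hour outlier.
durations_hours = [3, 5, 7, 20, 22, 30, 500]

def mode_days_open(durations_hours):
    """Bucket durations into whole days open and return the modal bucket."""
    buckets = Counter(int(h // 24) for h in durations_hours)
    return buckets.most_common(1)[0][0]

# The mean (~83.9h) is dominated by the outlier...
mean_hours = sum(durations_hours) / len(durations_hours)

# ...while the mode says most PRs merge in under a day (bucket 0).
print(mode_days_open(durations_hours))
```

The single outlier quadruples the mean while leaving the modal bucket untouched, which is the "less susceptible to data skew" property being claimed.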

Full disclosure: the links I referenced are to my own tool.


> Measuring developer outputs can be detrimental because there are not enough data points to understand if the unproductiveness was caused by the developer himself, or by his surroundings/company.

How about just doing your best as an organization and as a people manager to make your developers happy and fulfilled? That increases productivity and motivation to succeed more than anything, IMO. Give great pay raises regularly, give a ton of time off, get rid of people managers who are jerks, etc. If your company has goals, and your developers aren't producing code to meet the goals, your goals are probably too high, you have too few developers, or your developers aren't motivated to complete the goals because they are being treated like shit or don't agree with the goals.

Management always wants to think that they are right in every decision and that the employees are the ones who are unproductive, but after decades of working for "the man" in about 10 different industries, in different positions and careers, I have found the fault lies with management 80 to 90 percent of the time, due to some leadership failure or combination of failures. The problem is poor leadership and lack of motivation, no doubt in my mind. I've also led large groups of people (in the military), and by far the best thing I could do for them was make their personal and work lives better by not getting in the way and by not acting like a dickhead. Adding metrics to things just created more useless work for me. You can't force change in a system via metrics; the only place where measurement changes the outcome is in quantum physics.

I hate to go on a "capitalism vs. communism" type rant, but the best places I have ever worked, with the best "productivity", have been flat orgs where the developers and other employees are included in the decision making and the management and execs are open and caring and don't try to put profits and the business above the personnel. When everyone shared the success or failure of the company on equal terms, we could all get things done that were unthinkable.


> Management always wants to think that they are right in every decision and the employees are the ones who are unproductive, but after decades of working for "the man" in about 10 different industries in different positions/careers, I have found the fault lies with management 80 to 90 percent of the time due to some leadership failure or combination of failures.

Getting crap reviews/evaluations because a project failed due to management screwups is the universe's way of saying "you should have left long before, but leaving now is your best available alternative."

Much of this discussion is how to protect people from the consequences of staying in a bad situation.

The correct response is not to try to fix things, but to leave. Staying only perpetuates the problems.

Starve the beast.


That's all well and good, but if you accept the premise that 80 to 90 percent of management is failing, then you need luck on your side to avoid ending up in the same situation again.


Features, Schedule, Cost

Well, schedule and cost at least have straightforward measurements.

The issue then is features. Or is it?

The "pick two" model really is just the business view. Invariably you will also have:

- adherence to process (ideally process would be an overall enhancement to productivity, but it usually becomes a net-negative)

- maintenance costs (patching, libraries, language versions, database versions)

- infrastructure churn and upkeep

- random org shit: meetings, more meetings, training, certifications, HR, ticket walls, etc

- documentation. Is that important?

- ... are the requirements known? settled? at least ballparked?

As we see here, measuring developer productivity invariably amounts to blaming the victim – WHY AREN'T YOU MORE PRODUCTIVE – while shrugging away the nigh-unlimited ways an org can hamstring or frustrate a developer.


If you are a manager of developers and it isn't clear to you who the core developers are and what each member of the team contributes (or doesn't), then you should be fired for incompetence. Talking to developers, keeping an eye on who does what, and knowing the skills of each developer is the minimum I would expect from a manager. If you can't do that, then stop being a manager.


For every productivity metric, don't forget to balance it with a quality one.

The reality is always in the middle.

Example: if you solve a problem quickly but that leads to support tickets being opened, that's not good.


I’m getting a chuckle at the hubris in the comments so far.

Possibly the world expert on this exact topic at this point (Dr. Nicole Forsgren) comes up with a framework based on the best of what she knows from years of studying it and refining her approach.

Random HN commenter: ahh just measure time to commit.

Random HN commenter: biases are cool, so just use personal judgement.


Article is confusing because it presents SPACE with proper "header" styling opening the section, but then rolls right into DORA as if it's another paragraph instead of an entirely different section.

(you're still not wrong tho haha)


I cannot find anyone suggesting using time to commit as a metric in the existing comments.


I did reference this in the article.

> The DORA (DevOps Research and Assessment) framework introduced some metrics to track team flow, such as deployment frequency, which measures how frequently an organization successfully releases to production, and lead time for changes, which measures how long it takes a commit to reach production. If you're interested in the DORA framework, we published a dedicated article on How to implement the Four Key Accelerate DevOps Metrics.
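Not from the article – just a rough sketch of how the two DORA metrics quoted above might be computed from (commit, deployed-to-production) timestamp pairs; the data layout and numbers are invented for illustration:

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M"

# Hypothetical (commit_time, deployed_time) pairs over a one-week window.
changes = [
    ("2022-02-01T09:00", "2022-02-02T09:00"),  # 24h lead time
    ("2022-02-03T09:00", "2022-02-03T15:00"),  # 6h
    ("2022-02-04T09:00", "2022-02-05T21:00"),  # 36h
]

def lead_time_hours(changes):
    """Median commit-to-production lead time in hours (DORA 'lead time for changes')."""
    hours = sorted(
        (datetime.strptime(d, FMT) - datetime.strptime(c, FMT)).total_seconds() / 3600
        for c, d in changes
    )
    return hours[len(hours) // 2]

def deploy_frequency_per_week(changes, window_days=7):
    """Distinct production deployments per week (DORA 'deployment frequency')."""
    deploys = {d for _, d in changes}
    return len(deploys) / (window_days / 7)

print(lead_time_hours(changes))            # median of [6, 24, 36] hours
print(deploy_frequency_per_week(changes))  # 3 deploys in a 7-day window
```

Both are team-flow metrics, consistent with the quote's framing: they say nothing about any individual developer.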



Good research leads to good design, resulting in a small amount of code in the right places that either fixes a bug, adds a feature, or does something new.

My epithet for one programmer was "He writes a lot of code"

Extra code makes it harder for the next guy to figure out what's going on, and has more space for bugs.

But that came from a "productive" developer, and that code can tie down a dozen maintainers at dozens of customer sites.

The productivity is job creation for a bunch of folks whose main ambition is finding a job where they don't have to work with crap code.

I've done a number of projects where I got rid of multiples of the code compared to what I put in.

The best example was where I replaced a subroutine with a single character constant.


We (as an industry) measure it by claiming we do Scrum, but in actuality we just create pomp and circumstance and do what we would have done anyway. It gives us a number; we don't care if it's accurate or effective.


I have yet to see engineers who are even close to being accurate with estimates, because they are in essence inventing something new with a bunch of unknowns. Put another way, there are two types of engineers: those who are bad at estimating and readily admit it, and those who lie about their estimation skills.


It really depends on how you define bad at estimating. I can tell pretty reliably whether something will take a few hours or a few days, but not down to the second. The trick is to include all the friction in the estimate (testing, deployment, potential collateral damage, random pings from biz or devs, etc.), then add 30% for the oops factor. It's much better to be early and overestimate than to be late, particularly when others depend on you completing on time. The more you know a codebase, the better your estimates are, of course.

This all falls apart without a good, tight spec. If the spec is loosey-goosey, then forget about it.
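The padding heuristic above can be sketched as a toy calculation; the friction categories and hour values are invented for illustration:

```python
# Hours of known friction per task; categories are illustrative only.
FRICTION = {"test": 2.0, "deploy": 1.0, "random pings": 1.0}
OOPS_FACTOR = 0.30  # the 30% buffer described above

def padded_estimate(core_hours, friction=FRICTION, oops=OOPS_FACTOR):
    """Add known friction, then a percentage buffer for the unknowns."""
    base = core_hours + sum(friction.values())
    return base * (1 + oops)

# 8 core hours + 4h friction = 12h, then +30% gives 15.6h.
print(round(padded_estimate(8), 1))
```

The point of the structure is that friction is itemized (so it can be argued about) while the oops factor covers what can't be itemized.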


> This all falls apart without a good tight spec

Maybe I've had 30 years of bad luck, but I've never seen a "good tight spec" since I started programming professionally in 1992. Most of the time there's no "spec" at all.

Even if you do manage to get the estimate-demanders to back off until the spec is good and tight, you're just moving the problem upstream - they'll just want an estimate on how long it will take to get the spec right.


Yeah, that's a lot of bad luck, or maybe it's just the industry you work in. I've worked (and currently work) in departments that require it from biz. We can send a spec back for refinement too, or just pick up the phone and ask questions, etc.

>they'll just want an estimate on how long it will take to get the spec right.

The people who want the estimates are the same people responsible for the spec, so you're actually pushing the problem to where it belongs.



