Can developer productivity be measured? (stackoverflow.blog)
113 points by 97amarnathk on Dec 11, 2020 | 150 comments

In my ~25 years of professional software development, the single biggest factor in productivity for me has been whether I was involved at the start of a project. Knowing the initial design decisions, and being comfortable changing anything, allows me to be orders of magnitude more productive than when I'm diving into existing code designed by someone else.

I saw this perhaps most acutely with a company I sold - for a couple of years I was more productive than on nearly any other large software project I've worked on, because I knew the ins and outs of everything. The developers who bought it and took over are probably better developers than I am, and they are unquestionably excellent coders, yet it took a couple of years for them to get productive at making even medium sized changes. It became incredibly obvious to me how handicapped you are diving into something someone else made, especially if the original designer isn't there anymore.

Meshing really well with managers & PMs is probably the next biggest factor in my own experience, but it doesn't come even close to the gap between being there from day 1 vs coming in much later.

> Productivity tracking tools and incentive programs will never have as great an impact as a positive culture in the workplace.

I'm a fan of choosing to use time management apps and productivity tools to manage my own budgets. But I admit that I hate it when I have to do it for someone else.

One of the most powerful benefits from being there since the start is the complete confidence in ripping out and deleting obsolete code later. Even good developers new to the project are afraid to do this, and they should be since it's very risky without the full context.

The natural trajectory for a project is to keep adding features until it collapses under its own weight. Only the long-tenured developer can fight this and revitalize a project by removing the useless excess.

I've been with the same company and mostly leading the same software system for the past 10 years.

Feature work is such a smaller part of my individual contributions at this point - I do some here and there so I don't get too out of touch with the front end and user experience - but much of my coding work these days is reworking existing core functionality.

Thankfully we understand the necessity of deep maintenance for our system that we fully expect to still be running in 10 more years, but even with that it's damned hard to keep up. I can't imagine having developers come and go every couple/few years and little or no leadership support for code and systems improvement.

Couldn't agree more. I've noticed this many times in my 16+ years in the industry. Failing to recognize it is one of the primary reasons line managers let an employee go (especially one who was involved from the beginning) when they ask for a raise, thinking they're easily replaceable. It really costs the company.

This has been observed long ago, and is known as Brooks's Law [0].

Building software is a knowledge business, and there are three types of knowledge involved:

    1. Subject knowledge: understanding of the subject the software is about (e.g. accounting when building accountancy software).
    2. Platform knowledge: understanding of the platform used to build the software (e.g. Python, SQL, React etc).
    3. Architecture knowledge, which is what the parent is talking about: understanding of the specific choices made in the development, being aware of all the Chesterton's Fences [1] etc.
[0] https://en.wikipedia.org/wiki/Brooks%27s_law

[1] https://en.wikipedia.org/wiki/G._K._Chesterton#Chesterton's_...

These are great angles on software development and productivity, especially Chesterton’s Fence. I have watched several companies do rewrites of large codebases only to realize it was a massive mistake after taking years longer and spending millions more than intended. In every case it was a disproportionate focus on the problems they had, and a failure to understand what had been working well before tearing it down.

I’ve also watched Brooks’ Law in action: watched people get thrown onto a project to try to get it out the door, only to slow it down. I do not believe what I said is an example of Brooks’ Law. Brooks’ Law is “adding manpower to a late software project makes it later”; it was not an observation about all software development in all stages. Brooks’ Law assumes that the “ramp up” time is finite, and not particularly long.

I’m also not actually saying that people aren’t productive when joining existing projects, I’m saying that there’s a much deeper component of productivity that depends on involvement from the beginning. I’m sure you’ve already experienced being very “productive” after a short time when joining a large project, I definitely have. It’s the kind of productivity that depends on being there early, and Brooks wasn’t making any qualitative statement about productivity, only quantitative.

Fred Brooks was a lovely man to speak to, BTW. I met him and had a long interview when I was considering going to North Carolina for graduate school. The lasting impression I got wasn’t anything he said in particular, it was more of the positivity and optimism he carried about software and life that made me want to hang around and hear what he had to say.

> I’m sure you’ve already experienced being very “productive” after a short time when joining a large project

Most definitely, but that only happened when at least one of the following conditions was met:

1. There was good documentation in place and an onboarding process that made it easy to transfer the foundation of the architectural knowledge.

2. The work was clearly delineated and compartmentalised, so understanding the overall architecture wasn't critical for at least the entry-level tasks.

3. The architecture was a simple, standard pattern I was already very familiar with: for example a Django monolith.

I'd also like to add that a better name for what I called "architecture knowledge" would be "institutional knowledge", as it does not include just the architecture per se; it includes auxiliary things like the development process, testing mechanisms, deployment etc.

Very much agree. It honestly shocks me that more people don't recognize this. I think as humans we just tend to forget stuff over time, as we go, and as our perspective changes.

Make sure no code base would take more than a month to rewrite; that way it can actually be rewritten. Some tests could probably be reused, so tests should not be too intertwined with the code: tests should work independently of the code they test.

Many enterprise projects run multi-year; it's nearly impossible to make a code base that can be rewritten in a month.

How would that work for, say, the Windows code base?

I have no OS dev experience, but I like the idea of many smaller specialized programs that are able to communicate with each other. Another problem with large and slow-moving code bases is that things around them change, like hardware getting better/faster, so software architecture that made sense 30 years ago might not be the best solution today. An OS should probably not be developed by a single team or company; there should be an open core/kernel, and companies could instead compete and specialize in different combinations and support of systems, apps and services developed independently, where each part could be replaced.

This also applies to product management imho. At least for me it does. Starting a product from scratch is much better than taking over another person's product.

The article argues that there is no useful measure that operates at a finer grain than “tasks multiplied by complexity”.

I think that complexity is hard to measure and therefore easy to game.

At GitLab we only measure tasks completed, the number of changes that shipped to production, with the requirement that every change has to add value. This measure has been used throughout R&D https://about.gitlab.com/handbook/engineering/performance-in... to assess productivity for multiple years now with good success https://about.gitlab.com/blog/2020/08/27/measuring-engineeri...

When you tell new engineers about this target, they see a great opportunity to game it: just ship smaller changes. It turns out that smaller changes are quicker to ship, lead to better code and tests, have lower risk of cancellation and problems in production, and produce earlier and better feedback.
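As a rough sketch of what a "shipped changes per engineer" measure looks like in practice (the record fields and numbers here are invented for illustration, not GitLab's actual schema):

```python
from datetime import date

# Hypothetical merge-request records; "author" and "merged_on" are
# assumed field names, not taken from any real API.
merge_requests = [
    {"author": "alice", "merged_on": date(2020, 11, 3)},
    {"author": "alice", "merged_on": date(2020, 11, 17)},
    {"author": "bob",   "merged_on": date(2020, 11, 20)},
]

def mr_rate(mrs, team_size, months):
    """Merged changes per engineer per month, computed for the group
    as a whole rather than per individual."""
    return len(mrs) / (team_size * months)

print(mr_rate(merge_requests, team_size=2, months=1))  # 1.5
```

Note the rate is computed over the whole group, which matches the later point in the thread about not going below the level of a team when making assessments.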

Inspired by Goodhart’s Law, I'll propose the following: a measure that, when it becomes a target, improves productivity. ~Sijbrandij's Law

Your proposed law has been tried for many years, by just as many good-willed people who believed their measures would result in target increases. In fact, the entire industry is being bombarded by one such methodology that includes those measures: Scrum. Must we really repeat the years of complaints, criticism and debates to show any measure can get warped and gamed to the point it only vaguely resembles a tool of productivity?

So we get young, naive engineers to focus on small changes. Cool, probably as it should be, you gotta start somewhere. And when these developers get hungry for bigger projects, when they get bored implementing the umpteenth small and by that point (for them) trivial change, how do you encourage them to tackle bigger technical problems? Those that lay the foundation for the new people to do their job more easily and on-board quicker? Or did you actually not tell us all, and you measure far more than just the number of changes?

Thanks for the feedback. I agree there is risk of larger technical improvements not happening enough if you only measure the number of changes.

Maybe a few things are happening:

1. Some large technical improvements can be shipped in multiple changes that add value.

2. Most companies do more large technical changes than is optimal.

3. Engineers are motivated to make the large technical changes since they are interesting and make their future work easier so they will prioritize them despite the measure.

4. GitLab is having fewer larger technical improvements than optimal.

5. Our dual career structure ensures that there are engineers who can take on these larger technical improvements without falling below average themselves, because they are more productive than others.

6. We are not pushing very hard on this metric since we do it in a group setting instead of per individual.

> how do you encourage them to tackle bigger technical problems? Those that lay the foundation for the new people to do their job more easily and on-board quicker?

Isn't gitlab known for disastrously poor infrastructure with all the long outages? I.e. the exact things where people need to take their time to tackle bigger technical problems, not complete short tasks. I guess this attitude explains it, at least partially.

Would be sure nice to back up those claims.

Would be sure nice to back up those databases: https://about.gitlab.com/blog/2017/02/01/gitlab-dot-com-data...

In all fairness though, this is the only major gitlab incident I can recall, and it's more than three years old at this point.

It seems like a useful aggregate metric, but is it also used to rate individuals? For that purpose, it seems like it would be terrible. What if you have an experienced staff member from whom everyone else constantly seeks advice? That person may be having a positive impact that isn't visible as merge requests.

We try not to go below the level of a group when making productivity assessments. So measure the team instead of the individual. This is indeed to encourage helping each other out, as you noted.

> We try

And what do you do in the end?

If you were a manager, could you honestly say it was ever in a subordinate's best interest to do more than the bare minimum?

This should illuminate why these conversations constantly go in circles.

I have never had the opportunity to try this method, but after much theorycrafting, many useless pointing meetings, and much statistical investigation showing a negative correlation between "complexity" and "time to completion", I can only think that this is the correct way to get velocity.

What does "good success" mean in this context?

Productivity as a software developer consists chiefly in not making mistakes. That can lead to the situation where your best developers may appear to do nothing for long stretches. Research and deliberation are desirable. Blind hacking is the least valuable yet most visible activity of inexperienced programmers. All common "objective" measures of productivity such as closed tickets, lines of code, or PRs are seriously flawed.

The most common implementation I've seen of this is senior developers who write almost no code, but spend their time telling junior developers what to go back and reimplement, giving advice, etc. It's still the junior programmers actually getting the product written, just with nudges in the right direction. Not a bad system really; it reminds me of the relationship between officers and enlisted in the military.

I'm currently building models around high-functioning software developers/projects, and what I've noticed is that churn can swing quite a bit. Over 150 days, one developer's churn really fluctuates, and that's because they work on different things that require different amounts of code. They still commit regularly, but as the Reviewability section shows, their changes are mainly small ones, which sort of aligns with what Sid (sytse) mentioned, which is mainly focusing on small changes.

If you look at the bigger picture, the churn for the project microsoft/vscode fluctuates quite a bit as well.

Based on what I've learned so far, you really need a good baseline (that can vary greatly from one developer to another) to be able to determine if somebody is more/less productive.
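A minimal sketch of that per-developer baseline idea, with all names and numbers invented: rather than comparing everyone to a global norm, compare each new churn observation to that developer's own history.

```python
from statistics import mean, stdev

# Invented churn histories (lines added + deleted per commit).
churn_history = {
    "dev_a": [120, 80, 400, 95, 210, 60],   # varied tasks, wide swings
    "dev_b": [20, 35, 15, 25, 30, 18],      # mostly small changes
}

def is_unusual(dev, new_churn, history, z_threshold=2.0):
    """Flag churn that deviates strongly from this developer's own baseline."""
    samples = history[dev]
    mu, sigma = mean(samples), stdev(samples)
    return abs(new_churn - mu) > z_threshold * sigma

print(is_unusual("dev_b", 300, churn_history))  # True: far outside baseline
print(is_unusual("dev_a", 300, churn_history))  # False: within normal swing
```

The same 300-line change is an anomaly for one developer and routine for another, which is the point: the baseline has to be per person.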

This isn't at all universally true, it depends on the costs of failure. If you're programming a Mars rover, then sure, avoiding mistakes is paramount. If you're writing a one-off data analysis script that only has to work once and only has to be mostly accurate, spending a week coming up with the ideal specification and writing a full comprehensive test suite is usually much less productive than hacking at it for a few hours until it works. Good software developers know when to make the appropriate safety tradeoffs given the goals and constraints of the task at hand.

That's a straw-man retort. I never said every task requires one to go up a mountain. I only said that sometimes a software developer has reason to stop and think.

Also I would suggest that ad-hoc analyses and other such hacks are unlikely to lead to PRs and other visible artifacts.

I should have been more clear; I was mainly disagreeing with your first sentence. The rest mostly makes sense to me. Ad-hoc analysis was only one example, you could easily replace it with working on a prototype, where part of your goal is to find mistakes by making them, as that is often the most efficient way to discover what the problems with your design are.

Eventually any simple metric like this will become warped because of the effects of both Goodhart's Law and Campbell's Law.

Something I've been wondering about is whether teams should decide what their metrics are for the upcoming 3 months and then come back and decide what the new metrics are. At the very least, it becomes a big game across the org then.

Impossible to have 'metric based reviews' at that point. But I think that's fine.

That seems to be how quarterly planning worked in the largest org I've seen inside, though the PM and EM did most of the deciding and execs+board approved/disapproved.

How do people learn about a lot of “laws”? I know practically none, and it seems so useful to be able to produce them when explaining an idea...

I wouldn't say I know very many laws. I happen to know a few relevant to my fields of interest. They are memorable to me because they state complex ideas in brief, understandable ways. Rather than trying to learn a bunch of laws for the sake of knowing laws, look to your field for the important ideas and you'll often discover that someone has formulated a "law". Most folks know Murphy's law, a general one, perhaps Occam's razor, and there's also Sturgeon's law and Godwin's law. In computer science you have Conway's Law, Brooks' Law, and Moore's law.

You probably already know quite a few laws, just not that they have names. "Power tends to corrupt; absolute power corrupts absolutely", and "Any sufficiently advanced technology is indistinguishable from magic" come to mind.

I don't know the answer to your question, but it helps to know that Wikipedia really likes lists, so I was able to guess that it might have an article called "List of eponymous laws", and so it does:


I assume you mean expensive mistakes! Making cheap mistakes is a great way to learn!

Cheap quickly discoverable mistakes that actually get fixed are a fantastic way to learn - about the tech, about the market, about software in general, about yourself.

One cheap, but mostly invisible mistake that lingers isn't a problem... but they can pile up.

Not making mistakes isn't productivity, that's the bare minimum of acceptable behavior.

People aren't perfect. Mistakes abound. If the bare minimum for a position were "no mistakes" you'd never find a qualified candidate. I'll go further and add that your attitude makes things worse in practice: since mistakes can, and do, happen, your wholesale rejection incentivizes effort spent hiding mistakes and scapegoating. That's effort that could have gone into getting useful work done.

I’d go even further and say there are two types of programmers:

1. The ones that “don’t make mistakes”, but actually they are just unaware.

2. The ones that expect themselves to make mistakes and prepare for that.

You'd think so but I've met a lot of programmers who just sit there and type out inadvisable programs non-stop.

We can't even come up with an objective way to score software itself. So how in the world are we going to go even deeper and score the process and the people that create it?

Sarah and Bob make clocks, but sometimes they make hats, and sometimes they make screws, or hammers, or lamps. And sometimes the things they make get sold to customers, but sometimes other employees take them home, sometimes they make parts for each other to use when making bigger projects, and often they help each other and other employees out on unrelated projects. And sometimes they do repairs too. Oh yeah they also paint portraits that hang up around the office.

Try coming up with a measurement for their individual productivity that is easy enough to be useful, hard to game, and cheap enough to make it worth the price.

The first step is to figure out how to measure the value of all the stuff they make...

Most developers like solving problems - it gives them a high. Often, without realizing it, they create problems they are eager to solve to get their dose. Solving problems can be quantified, too. Unfortunately, it's hard to quantify the number of problems prevented from ever manifesting! Often this goes against the first goal I mentioned. For example, solving one problem 10 times gives you 10 closed tickets, 10 PRs, and a lot more LOCs contributed in a short amount of time. But creating the one PR and one ticket that prevents not only those 10 but hundreds and thousands more in the future is quantified as "less work." I had this at one job recently, where every time I suggested fixing a repeated issue, the answer was: "We have a bigger fish to fry." Yet we kept wasting time frying tadpoles.

The smallest organizational unit at which productivity can usefully be measured is an agile team of about 7 people. Below that size the effort of quantifying productivity exceeds any possible value of doing so, and incentivizes the wrong behaviors.

A good manager can get a reasonable subjective sense of individual productivity but won't be able to quantitatively measure it.

I agree, and I would add that there is one good subjective way to measure the productivity of individual developers, and that is talking to their teammates.

The problem is that we do not have a standard "output unit."

> Productivity: the effectiveness of productive effort, especially in industry, as measured in terms of the rate of output per unit of input.

We can all agree "lines of code" is a shit metric, and we can't say "# of bugs closed," because each will have variable difficulty and value. Programmers employed by a business are in charge of automating repetitive tasks, not performing them (the classic measure of productivity).

I perform UX research on APIs. Here, we standardize the "output unit" and therefore can get a better idea of a developer's productivity. Every developer performs the same task, so we can simply measure time spent.

There will never be an ethical solution to measure developer productivity during the workday; this isn't Ford's assembly line.

Even worse, # of bugs closed may be measuring the inverse of what you think you are. See the classic Dilbert cartoon about "writing yourself a new minivan".

I actually had an employee who would write up highly convoluted bug reports that I just couldn't understand. He would close them a week later, sometimes with a meaningless patch, sometimes without. I couldn't fire him; he was my cofounder's toady.

The only reason I beat him on 'bugs fixed' is that one of my jobs was to dig through the old issues and remove them if they weren't clear or relevant.


Thank You. I had not seen that one before.

He mentions increasing salary won't lead to increased productivity ... and that's true, if the same developer remains. But what if we remove that constraint? What if increased salary means a higher quality of developer takes the position? Wouldn't this mean higher productivity?

Bit of a cold scenario, but one way to game it out is hypothetically removing the current dev and then hiring someone better at double the pay.

Or, less unfair seeming, double the pay by hiring a second dev. That might not double productivity ... depending on the situation it might 1.5x it, or just as easily 4x it.

Productivity is a measure I hate to use for individuals, because it's similar to the problem of root cause analysis in systems: our individual productivity comes from many different levers and influences, and we tend to act like what works for some archetype of person should work for everyone. Even most managers basically just go by feel, which is hardly objective or measurable, let alone accountable. If my CI system is crappy and gives me poor feedback that makes it hard to tell I'm doing something wrong in my commits, it can demotivate me. But fixing it doesn't mean that I'll suddenly become a 10x developer either. Similarly, the conditions for maximizing reliability and performance exist at all times in systems, but at least we can open up our editors and go inspect running systems; we really can't do that with people.

Sometimes firing someone ironically gives them a wake up call and they’ll do great for their next job. Sometimes they don’t learn, sometimes they’ll never recover. Sometimes promoting people helps them, sometimes they become overwhelmed and performance drops again (I’m not speaking Peter Principle either).

I worked for a telecom company for ten years. The first few, everything was good, but we were struggling and raises and bonuses were non-existent and layoffs were frequent. Eventually I realized I was making a lower salary than when I started, so I just started working less. Toward the end I was working less than four hours a week, getting positive feedback from my manager and skip-manager.

It hurt my career staying at a dead-end job but it gave me years of free time doing pretty much whatever I wanted.

I don't completely agree with this. Money can affect developer productivity, but it takes time to realize its value. Money makes things easier. The person who's being paid the smallest amount possible is probably just trying to scrape by. Paying people more money frees up their personal life to handle other things, less stress about family stuff etc. I never believed the "leave your personal life at the door" thing. How you live affects your work, and making life easier makes people more productive (on average, not in all cases). Now this isn't an immediate thing. It's an investment and it takes time. The person has to realize that money is available (emotionally) and start to trust it will be there.

Two engineers who are not aligned with each other's goals will easily do the work of zero people.

Well said, this point cannot be stated enough!

He's right and wrong at the same time.

If I pay above market rate I'll attract better devs for sure, and the caliber of folks in my hiring pipeline will get better. It's not obvious at all unless you know where to look. College is the prime example of that. If you pay better you'll have more new grads applying and they will prioritize you over other offers (unless you are an exceptionally prestigious employer). But even then, you'll never talk to the student who interned twice at FAANG and got a firm offer a year before graduation. You can get that guy only if you are willing to employ him at FAANG salary for two summers.

Employing these guys won't make my existing hires any better than they are in the immediate future.


Better hires leads to better teams. I find that certain developers have a multiplicative effect that applies to other devs. They mentor, document, review and help everyone grow. That might actually slow them (taking half a day to explain high level architecture to a lowly junior coder) until you realize the junior coder is now capable of answering questions from his teammates.

> hiring someone better at double the pay

Well... if you knew how to spot somebody twice as good as the one you have now, why didn't you hire that guy in the first place?

Because he demanded twice the salary?

Well, in that case, I’m 10 times as good.

> a higher quality of developer

So what is a "higher quality of developer"?

Maybe when developer productivity measurement becomes standard across the industry we will realise that tech workers are in fact workers. Cogs in a machine. And not independent individuals imposing their will on the world through sheer force like some Randian hero. Maybe then it will be plainly evident that developers are as alienated as any service worker, and in the end as disposable in the eyes of the shareholders.

Will we then organize with other workers to create better working conditions for everyone or will there be fewer and fewer developers working with ever more powerful technology chasing richer than ever VCs?

> as disposable in the eyes of the shareholders.

If a company is willing to sacrifice engineering talent and institutional knowledge for short term gains... Good luck staying in business.

Reference: Every outsourcing project I've seen.

They are called Best cost countries now.

> Maybe when developer productivity measurement becomes standard across the industry...

I suggest you look to the standardization of database models/schemas for an indication of how close this is to fruition. I personally can't measure developer productivity at a fine-grained level until requirements are stabilized, and I personally cannot stabilize requirements unless the domain is so well known the data store is standardized. I had hoped SAP would lead the charge through empirically iterating towards standards, but they left out the huge small- and mid-size business markets with what they use today. And what they use today is still far from industries' standards.

We're no closer to standardization than when I started in software decades ago. We don't even have standard means of storing, transforming, displaying and tracking metadata upon calendars, addresses, phone numbers, names, and lots of other ephemera I can rattle off, within a single stakeholder industry, not to speak of within the software industry in general. There have certainly been efforts to standardize like Silverston's, but they haven't caught traction.

I'd sure like to see that happen, because it would short-circuit a lot of the discussions I have with stakeholders down to only the site-specific requirements, where I really add business value. Instead, I have to derive the data model from intricate discussion of their requirements, since they themselves have not agreed upon the parts that are common across their respective industries, so I end up at the start of discussions with all sorts of little twisty pieces of a data model, all alike.

In every organization I've worked in, it was obvious who the high performers were and who the low performers were. It was obvious to everyone. The only blind spots were that people usually seriously misjudged their own performance.

The problem, however, is that management is always being pushed to make objective measurements. For example, to fire someone, you have to first put him on an improvement plan with objective measurements. Otherwise, you're wide open to a lawsuit over discrimination, etc. You have to prove to a judge someone isn't performing, or that you gave raises based on performance.

Management also gets pushed into these attempts at objective measurement by the urge to optimize the numbers, like what works great for a manufacturing process.

What is the productivity of managers? How is it measured?

Why must the productivity of developers be measured, but not the productivity of managers?

For better or for worse, a manager's productivity is normally taken from the developers/engineers the manager manages, namely the team's overall productivity. So we're back to the same problem of how to measure developer productivity.

My productivity (as a manager) is measured by my ability to deliver on commitments I make with product management and senior leadership. Basically, my ability to match my teams' productivity with corporate goals.

Which lines up with nradov's comment about the smallest useful unit for measuring productivity is the typical self-contained development team. Attempting to determine if Bob in TeamA is more productive than Cindy on TeamB doesn't generally result in any actionable information. What matters (from a senior leadership perspective) is can TeamA or TeamB build the things that need to be built in a timeframe that's acceptable to stakeholders.

As a manager, if I feel like Bob or Cindy are unproductive, then I need to figure out why. And LoC or number of commits isn't going to tell me that. Possibly the number of defects found in QA, but even that isn't perfect.

Yeah, that's messed up. Instead one should take the amount that the manager manages to improve over the baseline.

Which might be impossible to measure.

I once talked to a retired hardware engineer, a fellow who made real electronic devices, not software. He told me that, over the whole course of his career, 80% of the projects he worked on never made it to market. In other words, 4/5th of his total "productivity" turned out to be waste. Make of it what you will.

Put differently, 80% of his time gave his brain training and practice which likely improved the 20% that made it to market.

Also consider, frustrating though I'm sure that was, he probably still got paid for his effort in the 80%.

Interesting point of view, thanks for putting it that way! It makes me think of artists:

- How many hours did a musician spend on his instrument before selling his first record?

- How many drawings/paintings before Picasso could sell something? (etc.)

Why didn't he just work on projects that would make it to market? /s

He could not have chosen to only attempt the successful projects.

Any decently competent technical leader can tell if a developer is being productive or not. It's stupid to waste time trying to measure something that is virtually unmeasurable.

"It's unmeasurable, but everyone can tell." is that what you're saying?

Seems like a No True Scotsman fallacy to say only good technical leaders can tell if a developer is being productive and in the same breath say it's unmeasurable.

hours worked, bugs fixed, tickets closed, costs saved, clients saved, KPIs/OKRs hit, time in queue, hours-to-close-ticket, uptime, SLAs hit... surely some collection of indicators, while not a pure signal, would let you highlight outliers either above or below the curve.

All of your indicators mean absolutely nothing if you do not measure the difficulty of the tasks submitted. Do that, then we can talk.

It's like saying, "your productivity as a carpenter is how many houses you build in a year" while refusing to take into account how big or complex those houses are to build.

The funny thing is, except for "tickets closed" and maybe "costs saved" there is no metric here you can directly attribute to a single developer, and probably not even a team.

Even "hours worked". What do you mean, hours spent in the office? How do you know the person wasn't drinking coffee or staring at their code while thinking about something else?

That being said, I think your metrics are good. And even single developers should be measured by them (which means all developers get measured by the same metric and get the same value). Why? Because it helps the business if SLAs are hit, no matter why they are hit.

Even crude metrics can be useful as a signal to focus attention on a potential problem area. At that point more complex heuristics can be applied.

What would all those managers we hired at high salaries do then?

I don't get orgs that use stats like commits/LoC/PRs as KPIs. Most software engineering time ought to be spent ensuring you're building the right thing, which requires a lot of collaboration, writing design docs, thinking about the problem, etc., because 'building the wrong thing' is probably the 'default' behavior and hard to avoid. Software engineering is only really valuable if you can easily extend and build on what you've made, so that whatever product or service you're selling can change as the business changes. If you're churning out throw-away code you never reuse, you don't realize any of that value and you will lose.

I did have the idea of directly tying value to a graph of code that enabled a certain user journey. Sorta like 'CUJ-coverage' instead of test coverage. So if a user spent $20 at checkout, every line of code that was touched to enable that user's journey would be credited with that $20. I think this would be an interesting metric I'd probably respect but there are still probably a lot of blindspots this methodology doesn't capture.
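A toy sketch of that 'CUJ-coverage' idea (all names and the per-journey coverage data are hypothetical; it assumes some tracer already records which lines each user journey executed):

```python
from collections import defaultdict

def attribute_revenue(journeys):
    """Credit each covered line with the revenue of every user
    journey that executed it, as described above.

    journeys: list of (revenue, covered_lines) pairs, where
    covered_lines is a set of (file, line_no) tuples collected
    by some per-request coverage tracer (assumed to exist).
    """
    credit = defaultdict(float)
    for revenue, covered_lines in journeys:
        for line in covered_lines:
            credit[line] += revenue
    return dict(credit)

journeys = [
    (20.0, {("checkout.py", 10), ("cart.py", 5)}),  # a $20 checkout
    (5.0,  {("cart.py", 5)}),                       # a $5 checkout
]
credit = attribute_revenue(journeys)
# cart.py:5 was on both journeys, so it is credited $25
```

One obvious blindspot this makes concrete: shared plumbing code accrues credit from every journey, so it dominates the ranking regardless of how hard it was to write.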

Developer productivity is inversely proportional to complaining. Listen to complaints carefully.

I saw your text as a light grey so I decided to re-read it a few times. I absolutely agree with you. The people who complain the least should be paid the most attention when they do.

There is a lecture by the late Randy Pausch (https://www.youtube.com/watch?v=ji5_MqicxSo&vl=en). The gist of the part I'm mentioning is, "When I stop correcting you, I've given up on you." People who don't voice their opinions aren't necessarily happy, they quite possibly have decided it's not worth trying to change things.

Yes, I think promoting people who DO NOT complain is problematic.

There is no right way to deal with this other than to listen to complaints and figure out what they are about.

One way of thinking about complaining is that it is a form of feedback. As a manager, you don't want to silence people giving you feedback; frankly, that is about as stupid a thing as you can do.

A better way to deal with complaints is to educate people on what kinds of complaints are productive and what kinds are destructive.

For example, I try (and don't always succeed) to restrict myself to complaining only about things that I am ready to solve if somebody tells me "go ahead, fix it".

Funny how this can reinforce the idea that only incompetent people complain.

Imagine a situation where the company and/or the project have a few serious problems, but the company refuses to fix or even admit any of that. The developers who couldn't live with the problems have already quit. The developers who remained have stopped complaining, because they have given up.

A new developer comes, notices the problems, and starts complaining about them. People notice that the newbie makes a fuss, but nothing changes. Later the developer either quits, or gets used to it and stops complaining.

Here is how the management probably interprets the situation: "People with the least experience complain most. The correct approach is to ignore them, and wait for them to grow up. More experienced developers have realistic expectations and mature behavior."

Completely agree.

Completely disagree.

If no developer complained about anything, nothing would change and the project would be on a slippery slope to oblivion.

Usually any project needs some kind of feedback loop to correct any problems and most of the time an important link in the loop is developers complaining.

It just needs to be done in a productive way. For example, a retrospective is an attempt to direct complaining into being a productive element of the process.

Also, it is a good starting point to be very cautious about any radical opinions like that.

Yeah, you're right, I misread what was being said.

It is pretty ironic how much time is wasted on trying to measure developer productivity.

Why is it ironic? The stakeholders for increased developer productivity go beyond just developers. Even the slightest increase in developer productivity, let alone the ability to objectively measure it, is the holy grail of software development. Companies with access to nearly infinite resources can and would deploy them for a marginal gain in developer productivity. So much emphasis is spent on hiring the most brilliant minds and then on managing their projects and time so why not on optimizing their output.

You are making a lot of statements here without anything to really back them up. Why is it the holy grail of software development? Measuring output and actually improving developer performance are not even remotely the same thing. Would it not make more sense to spend time on something related to the actual development, like training developers?

Productivity is important, sure. But as with all other professions in which people interact, the interpersonal skills and behavior tend to be more important IMO. Productivity can be massively impacted (positively and negatively) by how well people communicate and get along with each other.

As an individual I often wonder if my contributions are meaningful. The author says, “individual performance is best left for individual contributors to measure in themselves and each other.” How can individuals possibly measure their own performance if it can’t be measured externally?

Of course it can be measured externally, it is just too expensive.

love the classic 'are story points hours / no / then wtf are they' conversation when PMs intro jira + cousins

have never been sure how summing together something that is supposed to have no relationship with time magically provides an estimate of anything

also not sure why teams are using the central source of truth for progress as the 'daily todo list making' tool

I live in the real world so I estimate in hours

The answer is in the article itself. It gives you real historical data so you can predict how long the project will take with evidence, rather than just a feeling, hope, or guess.

> Velocity is an aggregate measure of tasks completed by a team over time, usually taking into account developers’ own estimates of the relative complexity of each task. It answers questions like, “how much work can this team do in the next two weeks?” The baseline answer is “about as much as they did in the last two weeks,”

If there's one thing the last 50 years of software development has conclusively proven, it's that estimating the number of man-months (or hours) a project will take doesn't work.

hmm fair, I skimmed and should have read more carefully

still: (1) sounds like complexity predicts weeks? so they are estimating hours. And (2) I think if jira clones were really a tool for estimation, they'd have uncertainty scores and some kind of prediction market built in

A clear solution for this exists: we should double the number of people who are estimating how long a project will take.

counting hours doesn't work so let's count unicorns? Want some of my koolaid? You seem to be out.

> have never been sure how summing together something that is supposed to have no relationship with time magically provides an estimate of anything

Most people dramatically underestimate the amount of time something requires. As long as you give them a clear conversion rate between story points and hours, they will estimate the task in hours -- incorrectly, despite having made the same mistake a hundred times in the past -- then convert the hours to story points and tell you the result.

Then someone notices that you have like 200 man-hours in the sprint, but you have only selected story points worth 100 man-hours. Which in fact is perfectly okay, if you understand that the "100" is an underestimate and the realistic estimate would actually be close to 200, so you should be happy with the plan! But most people will not get it, and they will insist on planning properly for 200 man-hours. If you don't have enough political power to stop them, they will make you plan for 200 man-hours.

Then at the end of the sprint, everyone is stressed out, and they only completed 50% of planned stories. Because they underestimated how much time the tasks would take... just like research shows humans always do, no matter how many times they got burned in the past, no matter how much you yell at them to make better estimates.

(By the way, the problem with making realistic estimates is not just that individuals suck at it, but also that social forces actively prevent it. Research shows that people who make more realistic estimates are considered less competent than their colleagues, precisely because everyone notices that their estimates are longer than they believe they should be. And no one later changes their opinion just because the estimate turned out to be correct. Like, really, people who estimated something to take 2 weeks and delivered it in 3 weeks were judged as more competent by managers than people who estimated it to take 3 weeks and delivered in 3 weeks. The former made a better impression at the beginning, and the latter didn't provide a better result at the end, so the former made a better overall impression. This is how human brains work.)

So the smart way out is to make a metric that is taboo to convert to hours. Give vague verbal descriptions, like 1 is "trivial", 2 is "fairly easy", 3 is "simple", 5 is "medium", 8 is "kinda difficult", 13 is "tricky", and 21 is "needs to be split to smaller stories". People will first feel weird about it, but then they get used to it, and they will start delivering consistent ratings... like, the kind of story that gets assigned 5 story points in January will probably also get assigned 5 story points in December.

Then all you need to do is calculate velocity, which is, well, the conversion rate between story points and hours. But you can't say that, or it will ruin the magic! You just say "during the last sprint, we implemented 50 story points, so for this sprint we will also plan 50 story points", and hope that people will accept that, without making the conversion explicit. And it works...

...until someone says: "Hey wait, so if we have 200 man-hours and plan 50 story points, that actually means that 1 story point equals 4 hours, right? And why are we giving this specific story 3 story points? 12 hours sound too much to me, I am pretty sure we could do it in 8 hours, or even 4 hours if we work hard, right?" (The rest of the team is silent, either because they agree, or they don't want to be seen as less competent.) And then you get another sprint when people plan too much, complete 50% of it, and get another stern talk about being more careful about making estimates.

It is a psychological trick that only works if you stop estimating stories in hours. It always breaks when someone insists on connecting the dots, converting the estimate to hours, and "fixing" it because it is "too much". If we could reliably estimate stories in hours, we wouldn't need story points, but experience shows we can't!

(But if you tell this to people, they will insist that they absolutely can make proper estimates, or that professional developers should be able to make proper estimates. Well, they can't, and we don't live in the should-universe.)
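For what it's worth, the velocity bookkeeping described above is tiny to sketch (the numbers are made up; the point is that the planning input is past story points, never hours):

```python
def plan_next_sprint(completed_points_history, lookback=3):
    """Baseline velocity planning: commit to roughly what the team
    actually completed in recent sprints, with no points-to-hours
    conversion anywhere in the loop."""
    recent = completed_points_history[-lookback:]
    return sum(recent) / len(recent)

# story points completed in the last four sprints
history = [42, 55, 48, 50]
capacity = plan_next_sprint(history)  # average of 55, 48, 50 -> 51.0
```

The moment anyone divides `capacity` by the team's man-hours, the taboo is broken and the cycle described above starts over.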

> I live in the real world so I estimate in hours

Do you make your estimates in front of other people who sometimes second-guess them? How often do you actually meet your estimates?

I have never felt my individual productivity go up.

It feels like as I progress my individual work stays the same, but helping others eats any efficiency gains I personally make.

As if when you are new to a module, you are slow because you don’t know anything, then once you have expertise, you are slow because you know everything and are helping others.

Would be interesting to measure this somehow.

> As if when you are new to a module, you are slow because you don’t know anything, then once you have expertise, you are slow because you know everything and are helping others.

This suggests that the proper way to keep team productivity high is to have all team members working on the product since the beginning, and treat them well so that they don't quit and don't have to be replaced by new ones. Maybe even start with slightly more people on the project than necessary, so if a few of them quit during the project for unrelated reasons, you can still finish the project with the remaining ones.

Probably not going to happen, because this goes against maximizing short-term productivity at the beginning of the project. The short-term productivity is maximized by having the team as small as possible, and only worrying about problems after they happen.

This is the only way you can scale your time, by ramping up others to be as efficient as you are. Although you might be becoming less of a contributor individually, you are enabling the larger group. This type of productivity can definitely be tracked based on how many people you have helped and their corresponding lineage of knowledge and work output.

The only way to do it I can think of: have two teams or individuals develop the same thing simultaneously and measure the time required to get a result of the same quality. This should be done over the longer term to take into account code quality (poor code quality slows down future development).

I did this once for a medium complexity task. The quality ended up the same because both developers had good taste. One developer took 2 hours for the job, the other took 2 weeks. And people don’t believe in 10x developers...

A few things can play havoc with this type of measurement. One is that the way we determine the "quality" of the code is based on the current scope of the project.

If the scope right now is pull a bunch of values out of spreadsheets and generate reports on them, the highest quality code would be the most terse: it looks up the files, get the information, then displays it. If tomorrow the scope changes to "do that, but in realtime, across multiple machines", the highest quality code is the one that implemented a database and REST API.

Since scope changes all the time, we can never evaluate which set of code is the highest quality.

And then keep adding new features (identical for both teams) for a few years and measure the time taken.

I've had a number of projects to either add features or fix a bug in large volumes of truly weird (and sometimes jerkoff) code, COLT and JES3 being particularly flagrant examples. It can take weeks to find where the bad code is, and then less than half a dozen lines to fix the problem.

In just about any system of productivity metrics, these two episodes would mark me as dismally productive:

In the bank I was working for, the incidence rate of online banking mainframe reIPLs went from every few days to zero.

At a telecommunication provider, data center reIPLs similarly reduced.

This assumes direct managers want productive developers - this is not my experience. The goal of managers is to increase the number of people they manage, and get more money. I have time and again done things fast only to have blocks put in place to slow things down - no one wants the job done easily and go home, where's the money in that. The inability to measure productivity is a direct result of this imho.

One of the most useful programmer metrics that I've found is code churn: (new lines + deleted lines) / total changed lines. Instead of telling you how much work your programmers are doing, this metric tells you what kind of work your programmers are doing. Small numbers mean bug fixing (end of project and maintenance) and large numbers mean new development and features.
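A minimal sketch of that churn metric (interpreting "total changed lines" as added + deleted + modified is an assumption on my part):

```python
def churn_ratio(added, deleted, modified):
    """Code churn as described above: (new lines + deleted lines)
    divided by total changed lines. Values near 1.0 suggest new
    development or wholesale removal; values near 0 suggest small
    fixes inside existing code."""
    total = added + deleted + modified
    return (added + deleted) / total if total else 0.0

churn_ratio(80, 20, 0)   # all-new code and deletions: 1.0
churn_ratio(10, 10, 80)  # mostly edits to existing lines: 0.2
```

Note that a pure dead-code deletion also scores 1.0, so the metric flags "kind of work", not its value.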

What about high deletion amounts? I've merged PRs with hundreds of thousands of lines deleted and none added. It took quite a bit of sleuthing to figure out someone had left entire copies of directories side by side with different names, where one was completely unused. Conversely, someone had a huge addition to the repo that actually was total garbage.

A large change to the code base would indicate to me that the product possibly isn't ready to ship or test. At the very least, it would prompt a discussion about the code and how we are managing it. Maybe we need some process changes or tools to prevent junk code from accumulating.

Edit: if you're talking about the math, I think "changed" includes added and deleted. So, it's the ratio of added and deleted to the total change.

You can measure productivity just fine on any tasks that repeat. How long does it take you to run the right tests, find the implementation for a failing test case, make a merge request, create a patch release, pull up the logs in case of an incident? All these tasks repeat over and over again, and a good developer can do them much quicker.

Sure, it's easy. Count how many lines of code they write per day. Likewise, aeronautical engineering productivity can be measured by counting kilograms of mass added per day.

The real underperformers go negative.

Yes if you are running an app development studio where every app looks and functions mostly the same.

Most software projects get managed with a ticketing system that logs the work to be done as individual tickets. Counting the number of cards a developer closes over a certain period allows us to see what actual work is getting closed off.

Measuring closed tickets is an excellent metric if the tasks are written well and assigned based on business priority. When more tickets get closed, more good things are happening with the project, be that bugs getting closed off or features made.

Paying for dead cobras always pays off.

Also ticket != value. Lots of tickets for things that involve almost no work and things that actually make a difference to the customer/product are not equal.

Everything I work on is new products/projects, and tickets come in all sizes and shapes, often changing daily as some exec crams in more new ideas or some designer or product person "clarifies" the ticket, even after the work is done. Tickets are often written and estimated long before decisions are actually made. Defects are written that require a lot of investigation, only to discover it's some other team's problem and you can't do anything, or it turns out to be a temporary service outage no one communicated, a misconfiguration in some CMS, or simply someone not understanding what the product does.

Measuring productivity by tickets closed is a whole pile of dead snakes.

In case people don't understand the dead cobras comment: https://en.wikipedia.org/wiki/Cobra_effect

It pertains to perverse incentives.

What do you think of the idea that if tickets aren't doing a good job of representing value to the customer in some form (even if it's second or third order value), those tickets are poorly written and it's a "garbage in/garbage out" situation?

It's not really a helpful observation, but I'm curious if there's a way for the relationship between "people asking for things" and "people building things" to be repeatably fruitful (IMO it's very possible that reliably producing customer value is either insanely hard and/or not doable consistently).

Some things are incredibly important but not directly related to the customer; like services called by services called by services called by clients. It's not always easy to connect the dots, especially in a microservice world. But without the underlying service the customer can't do anything.

Making good tickets across a dozen organizations and 100's of people is hard to ever get right. Which is why counting tickets is sort of pointless, you might have a ticket to add a single value to a database and without it, the whole product doesn't work, but you have no idea since there are 10 layers between you and the real customer.

What you are saying is that my colleague who has been working on one ticket, solving a critical bug, for the last 2 or 3 weeks is an unproductive one, since he hasn't closed a ticket in a while?

Counting closed tickets is indeed a measure for something, but by itself it's far from being a good indicator.

Yes, he's unproductive. What he should have done was break it up into dozens of tiny pieces so that he could inflate his ticket count. It is more acceptable to the spreadsheet to have "Hard problem part 1", "Hard problem part 2", "Hard problem part ...", than it is to simply have "Hard problem" and take longer to do it.

You can measure progress of a military campaign by counting and reporting on “bullets fired” yet that’s not how military strategists operate.

On the other hand, doing busywork is a pretty integral part of being in the military

Hard problem part 1 doesn't really indicate that they did / will solve a problem though. It's work, but is that 'productivity'?

That seems kinda arbitrary to just manipulate the issue into tiny pieces to fit some sort of metrics system... but not reflective of the actual work.

That seems to just lead to the typical gamification that comes with counting tickets and other metrics systems that end up being arbitrary or even easily manipulated.

Ticket measuring just seems like asking for Goodhart’s Law.

Yeah, you aren't measuring level of productivity. You are measuring level of tolerance for bullshit ticket creation.

In my experience, project managers work as hard as they can to fight that tendency too: they usually insist that tickets be observable and testable independently, specifically to discourage developers from logging time on non-visible tasks.

Great. Now I can't find anything anymore in the ticket system because every ticket that was non-trivial has been broken down into a dozen of other tickets.

There's only one way to call this: ticket system abuse.

I hope you're being sarcastic.

Probably, but to represent the non-sarcastic point of view, it comes down to "[person] in the room" syndrome[1]. Spending 2-3 weeks on a bug is never good, if you're not communicating progress, and one way to communicate progress is to break down the work into smaller chunks.

Does it literally have to be individual JIRA tickets? No way, but going off for 2-3 weeks doesn't give the business the insight it needs in order to wisely invest time/effort into the work being executed.

[1] https://medium.com/machine-words/a-guy-in-a-room-bbbe058645e... (I thought Joel Spolsky said this but I can't actually find the original source, if anyone has it I'd appreciate it!)

> Spending 2-3 weeks on a bug is never good, if you're not communicating progress

Assuming that it is important to fix the bug and that the developer is competent and trusted - why not? What would communicating progress improve here?

Mind that you cannot communicate when it will be done (otherwise it would not be a hard bug); you can only communicate what you have done so far and what you will try next. But what kind of business value does that create?

The value is that the developer doesn’t reasonably have the context necessary to make the call whether or not, over time, the issue is worth continuing to invest time to resolve.

Not that they couldn’t make the call if they had all the info, but the time required to gather and understand all the context would be a second full-time job.

> The value is that the developer doesn’t reasonably have the context

Sorry, that English doesn't make sense to me. The value is that... the developer doesn't have context? How is not having context a value?

Maybe you meant "the reason"? But then that doesn't answer the question what the value is.

The person I replied to asked what the business value was, so I explained that the [business] value comes from having a dedicated person who understands the context in which the issue is being worked.

Sorry, still don't understand.

Are you saying that without communicating progress there is no person who understands the context in which the issue is being worked?

That doesn't make sense to me and seems to be totally orthogonal to any communication of progress.

So do you have a problem understanding or do you disagree?

For now what you are saying does not really make any sense, so I can't really disagree...

I’m honestly not a fan of how you’re framing this as a problem I’ve caused, so I think I’m going to bow out. Have a good rest of your day!

That's why the developer has a manager who is a human who can talk to them and find out what they are working on, instead of a robot that can only process tickets created and tickets completed.

Of course.

The mistake there would be getting that specific about it – it's not a useful method for an individual developer, but it _is_ a useful method for the team overall, where these effects get amortised.

(Although in this specific case, your colleague who has been working on one ticket for two or three weeks is operating in a way that I find is usually pretty harmful for productivity overall. "Solving a critical bug" is almost universally something that can be broken down further.)

That assumes all tickets are the same. A developer might take on a very difficult task with a lot of hidden technical complexity that ties them up for weeks. Another might pick up little bugs and small text updates. With no other insight, the metric is meaningless. It becomes easy to game by avoiding any difficult and time consuming tickets - such as refactoring - as much as possible, instead picking the quick and easy things that make your metric look good.

Sorry but I strongly disagree. In fact I’ll come out and say its perhaps one of the worst ways you can measure productivity.

At best you’re measuring _activity_, not productivity. You just turned a group of smart people into headless chickens jumping on whatever ticket they can to look busy. Which cultivates an environment of fear, which in turn kills deep thought and creativity... two essential ingredients for good software.

I could even argue that ticketing systems are the bane of good software, making real priorities opaque... but that’s a rabbit hole I won’t go into here.

Instead I’d argue we shouldn’t be trying to measure developer productivity at all.

Productivity in software development is non-linear and difficult to assign individually.

How do you measure the productivity of that “lazy guy” that had an amazing shower thought one morning, implemented it by lunchtime, which in turn leads to the company making millions more by the end of the year?

Or what about the person on the team that spends most of their time supporting the rest of the team, unblocking them and helping them be productive?

Two examples of why we shouldn’t even be trying to measure developer productivity.

My own experience after 25 years in this industry is that the moment someone says “but how do we measure developer productivity?” is the moment a company’s software products begin a long, slow death.

Ultimately what development teams and companies (not individuals) should be measured on is _results_ that positively impact customers and business.

When the product is a success, no one cares about individual productivity.

Nice, I call dibs on the low hanging fruits!

I think there is some value to that metric but I would not call it excellent. A really good developer might come up with a slight requirement change or an engineering detour that makes many tickets meaningless, he might improve tooling such that things that took hours now take minutes, he might be able to write tests or come up with new processes that increase quality 10x. If we think of guys like Jeff Dean, the ability to close tickets written by project managers is not what makes them stand out.

Well, don't forget you're supposed to "estimate" everything (based on a 10-second glance at the description) before you do it, and you're also measured on the accuracy of your estimates.

> Measuring closed tickets is an excellent metric if the tasks are written well and assigned based on business priority

Don't forget the weight.

I've seen single tickets taking weeks for bug investigation.
