Hacker News
I deploy, therefore I am (sleuth.io)
75 points by mrdonbrown 52 days ago | 54 comments

I really wish people would stop trying to reduce software development to simple metrics. Number of deploys is as meaningless as lines of code or number of story points as a metric for judging people's performance.

It does indicate a culture where there aren't massive barriers to getting changes in.

I worked at one startup on the AI team, and we were allowed to deploy at will, we iterated constantly, pushed multiple times a day, and had a flat hierarchy. We weren't under the political umbrella of the engineering and product groups. It was definitely risky but no major issues ever came from it and we gained the trust of the DevOps team and our CTO.

The back end team, under different managers, was very hierarchical, had to schedule deploys well in advance and inside certain time windows, and non-technical engineering managers would monitor and send summary emails to everyone after each deploy.

Guess what, that back end team sucked. Their timelines for features were embarrassingly long, they watched the clock, their managers loved to booze, they overcomplicated the architecture and never produced anything innovative for the company.

Deploys can signal culture.

> Deploys can signal culture.

I worked at a company that deployed multiple times a day too. There were no automated tests and the QA process was terrible, so most of the deploys were bug fixes.

Deploys can signal culture, but not always in a good way.

Just a bit of devil's advocate here, but is it possible that the problems they were solving were much more complicated than those on the AI team? Obviously I have little insight into the actual work, but there are some hints.

Long timelines indicate difficulty integrating features into an already complex codebase. Leaving the office early can indicate that low quality hours had a higher downside risk of introducing problems into a complex situation. Pure speculation here, but I know that architecture can look overcomplicated when it's actually just solving a complicated problem.

“Deploys can signal culture.”

They can signal culture maybe. But if you tell your back end team that you think their culture is bad because of not enough deploys, they will crank up the number of deploys without fixing any of the deeper issues. All metrics like number of deploys, tickets closed, story points, and number of commits are interesting data points, but you should never try to optimize for them. Doing so signals lazy management that likes to beat around the bush.

Numbers tell you what is happening, not why it is happening.

If you see interesting numbers, you need to figure out the why, before just trying to move the numbers.

It can also signal a lack of process and uncertainty about required features. A team that deploys a ton due to mistakes and unused features might end up less effective than a team that deploys more commonly used features with a more effective testing and verification harness.

This is why it’s hard to boil it down to a few simple metrics.

Deploys is perhaps a good metric for engineers looking at managers, but not managers looking at engineers. (And definitely not managers looking at managers!)

Management is always looking for shortcuts and ways to avoid the difficult job of evaluating people's performance in an objective manner. They see scoreboards, and due dates, and activity streams as important tools for observability. But what they miss out on is the end performance of the actual product, and how individuals played a role in improving or worsening that performance.

It seems all these short term metrics lead to people picking the simplest tasks and processing them at high volume. I could easily up my story point output by 10x if I picked the simplest bugs and stories from the backlog. Instead I take the most difficult ones where there often is no output for several weeks or longer. Thank god my manager sees the value in that. I would hate an environment where my output is measured in day or hour increments.

> I could easily up my story point output by 10x if I picked the simplest bugs and stories from the backlog

shouldn't those have comparably fewer points assigned? Completing 10 easy features/fixes should be roughly equal to completing 1 big one.

Time != Complexity; higher complexity makes time harder to estimate; complexity is easily underestimated or just completely unknown prior to at least some investigation.

Obscure-seeming bugs can be trivial to fix, or can take days of work just to have any idea what's going on, let alone to fix.

There are things on the backlog where nobody knows how long it will take. For example trying out something new or chasing down an elusive bug. It may take a few hours or it can take weeks or months.

> I really wish people would stop trying to reduce software development to simple metrics. Number of deploys is as meaningless as lines of code or number of story points as a metric for judging people's performance.

I mostly agree. I for one am a big fan of "painless" deploys. To me it means we should be able to deploy pretty much when we want to...

My manager (also a programmer in a previous life) once explained to me (privately) that we are neither waterfall nor agile nor anything like that. We need to first have a repeatable process before we can even think about those things. (Officially, we were "agile" but that's beyond our pay grade.)

“I mostly agree. I for one am a big fan of "painless" deploys. To me it means we should be able to deploy pretty much when we want to...”

That’s a good goal.

What kind of “agile”? Officially I’ve only ever worked at agile shops, but the process has been radically different from company to company.

More to the point: the statement itself is meaningless. It is a reductio ad absurdum summary of René Descartes' six Meditations (on First Philosophy). Descartes did not actually state that line or anything directly resembling that line, though it is a close approximation. A more precise summary would be something like: if I can doubt, I can doubt away everything but doubting itself, which is a base thinking machine.


When number of deployments becomes an evaluation metric, I can imagine an engineering culture like this:

- Dev rushes to make merge requests at the cost of thoughtful design and testing.

- Tension around code review turnaround time: how dare you take hours to review my code when I'm far behind in the leader board.

- Deployment has a bug? Awesome, now I'll have 2 deployments.

- Deploy faster and break everything.

Frequent and continuous deployment is good. Making it a metric is bad.

To be fair, the scoreboard in the post displays the success of the deploys too, which would address the third and fourth points. But of course, people can still game the metric with a ton of insignificant incremental changes that don't do anything.

Points 1, 2, 4 happen also when you have a deadline culture. It's almost like having a culture that rushes people to results will always compromise on quality.

I see it as just one part of the story tbh, a deployment itself means nothing.

So if I deploy one critical, long-term project this month and my coworkers deploy around ten each say, how does this system know that the one thing I deployed was actually more valuable and impactful than all those little deploys? Or do I need to deploy the unfinished work in progress constantly to keep up the metrics because this is just another useless measure like lines of code or commits?

Assuming your project supported continuous development, I'd probably say you should break that project up into multiple deployments, ideally hidden behind a flag and each easily revertible. This would reduce the risk of each step and hopefully eliminate a whole host of potential incidents.

How does code hidden behind a flag become less risky if it's deployed in increments?

You will only find out when you allow it to run.

Well, for example, say you were adding a new preference page. First, I'd do a deploy of the new database table(s), even though they aren't used. Then, I'd add a link to the page in the UI and put that behind a flag, maybe with a small non-functional UI so that the designer could play with it. Then, I'd implement the basic functionality and open it up to my team to start playing with it. Then, I'd ship other deploys for things like tests, more edge cases, UI tweaks, etc.

At some point in there, I'd open that page up to my beta customers, trial customers, or whoever is less risk-averse. Once I'm happy with it, I'd do a percentage rollout to ensure any issues don't affect all customers at once. Just an example, but the idea is to reduce the risk of each step and make each step easily reversible/hidden if something comes up.
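The flag-plus-percentage-rollout idea above can be sketched in a few lines. This is only an illustration, not how any particular flag service works: `rollout_bucket`, `is_enabled`, the flag name, and the allow list are all made-up names for this example. It assumes you have a stable user id to hash.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99.

    Hashing keeps the answer the same on every request, so a user does not
    flip in and out of the feature between page loads.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str, percentage: int,
               allow_list: frozenset = frozenset()) -> bool:
    """Return True if the (hypothetical) flag is on for this user.

    allow_list: ids that always see the feature (your team, beta customers).
    percentage: share of everyone else who sees it, from 0 to 100.
    """
    if user_id in allow_list:
        return True
    return rollout_bucket(flag_name, user_id) < percentage
```

Raising `percentage` from 10 to 50 only adds users, because anyone whose bucket is below 10 is also below 50. Setting it back to 0 hides the page again without a revert deploy.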

It's not, and it cannot be deployed in increments.

What OP and many people are missing is that a lot of us don't work on cool and hip new apps. Some people aren't in a position to push incremental changes to a payroll system.

So release for release's sake. Push it to production to make the release quota but hide it because it's incomplete and doesn't work. What's the point of that theater?

Reduce the scope of each deployment which results in deployments being predictable and safe.

This is why cycle time is a much more useful metric. The book "Accelerate" shows what a mature conversation around software metrics should look like.

EDIT: To be fair, my point may not directly address yours. "Value" is still a tough thing to determine.
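For what it's worth, cycle time is cheap to compute once you have two timestamps per change. A rough sketch, assuming "cycle time" here means first commit to running in production (teams define it differently, and the function names and data shape below are made up for illustration):

```python
from datetime import datetime
from statistics import median


def cycle_times_hours(changes):
    """Return hours from first commit to production deploy for each change.

    `changes` is a list of (committed_at, deployed_at) datetime pairs.
    """
    return [
        (deployed_at - committed_at).total_seconds() / 3600
        for committed_at, deployed_at in changes
    ]


def median_cycle_time_hours(changes):
    """Median is used so one long-running change does not dominate."""
    return median(cycle_times_hours(changes))


# Made-up sample data: three changes taking 2, 26, and 5 hours.
sample_changes = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 11)),
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 2, 11)),
    (datetime(2024, 1, 2, 9), datetime(2024, 1, 2, 14)),
]
```

With the sample data, `median_cycle_time_hours(sample_changes)` is 5.0. Like any of these numbers, it tells you what is happening, not why.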

Exactly. ^

Personally, I like to try to track how many times functions have been called (if possible), where bottlenecks are and if they make sense (is there some trade-off that necessitated them), and overall module usage.

That gives a more generally clear picture of performance.
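A minimal sketch of the call-counting part, in case it helps. It assumes a single process and an in-memory counter; `count_calls`, `call_counts`, and `parse_preferences` are hypothetical names, and a real setup would more likely ship these numbers to whatever metrics system you already have.

```python
import functools
from collections import Counter

# In-memory tally of calls per function name (assumption: one process).
call_counts = Counter()


def count_calls(func):
    """Decorator that records every call to `func` in `call_counts`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__qualname__] += 1
        return func(*args, **kwargs)
    return wrapper


@count_calls
def parse_preferences(raw):
    """Example function to track: parse 'key=value;key=value' into a dict."""
    return dict(pair.split("=", 1) for pair in raw.split(";") if pair)
```

After a few requests, `call_counts.most_common()` shows which functions are actually exercised, which is a starting point for deciding where bottlenecks matter.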

There's a reason the execs think in terms of revenue, cost, and profit. Why would you want your engineers, designers, or anyone abstracted away from that? From my experience, every time you try to do that you end up with the business and the workforce misaligned.

It's not always easy to quantify value, and sometimes the return on an investment is slow and that can be unattractive. However, that doesn't mean we should try and hide away from figuring out the true value of work in business terms.

In my opinion, we should judge impact and effectiveness on the same baseline metrics as the rest of the org. In a for-profit company, those are measures such as increasing customer spend, increasing conversion rates, reducing churn, and reducing operational costs. All of those measures can be directly tied to revenue, costs, and profit. It's the language the business speaks, so why would you want to speak a different language to the rest of the business? More than that, why would you want to start measuring performance in a different way to the rest of the business?

Because, although it oughtn't, the culture that way of thinking creates produces some of the worst code, because it's more difficult to quantify the direct value of a lot of the work that goes into good software engineering.

Security ends up just being a cost, nobody bothers to actually quantify the risk because nobody really knows. This leads down the path of "the only security that's quantifiable are my legal requirements."

You can't know the productivity gains/losses from a refactor until long after you've done it.

You can't know how productive different ecosystems are until long after your team adopted them. Your only measuring stick is how excited your devs are to use it. What's productive for one team might not be for another.

Nobody quantifies acquiring tech debt, so it piles up while people "deliver", and the productivity losses from it come so gradually that teams don't even realize they've slowed down, because their velocity stays constant; it's just that cards gradually get more difficult.

Performance becomes a cost because there's a huge grey area (Jira) between "so slow I consider it down" and "general annoyance that slowly nudges me away from the product." Unless you're planet scale or whatever your sample size isn't big enough to notice these things.

This way of thinking creates feature factories.

I agree 100% that we can do better than being totally disconnected from the business but this ends up being so much worse.

I do not disagree that it's really tough to quantify a lot of those things and most of them will be estimates and may be wrong, in particular over longer periods of time. But in the context of the article (a larger business with profit generating goals) I don't think you can walk up to management and say "lets switch technology, we're excited about this new thing" and expect to be taken seriously.

It's not dissimilar for security and code quality, they are just part of doing the job properly. They're also part of continually identifying future business risk. If you find a vulnerability you can absolutely identify the risk of leaving it there, just like you can identify the level of risk of not thinking about it up front with a feature.

We absolutely can provide estimates to back up why we should tackle some tech debt. It may not be incredibly accurate, but if you're unable to estimate any gain from tackling it, is it really a problem? When you see a security issue you've got legal risk and brand reputation risk. Both of those are very tangible. While they're not easy to estimate and there's a lot of room for error, I don't know many businesses that would rather take the risk of EU data breach fines vs paying an engineer to spend a few days fixing it.

That doesn't lead to feature factories, it leads to informed decisions. Sometimes the business is going to choose to take on the risk of slower development in an area if they think they're unlikely to revisit it vs doing a refactor. Sometimes they may also take on a risk of security issues if they feel it's outside of their threat model. It's up to us as product/engineering teams to give a clear picture and offer our professional advice in a way they can understand. In many cases we may need to push to do a refactor or learn a new technology, but we can't expect to persuade someone while we're speaking a different language to them. Saying that they're only interested in business value and that makes them a "feature factory" doesn't make sense in that regard.

Perhaps the business and leadership really enjoy the taste of risk, and if that's the case then trying to qualify value in some other language still won't help. There we have a larger problem with the practices of the leadership and company as a whole.

I agree that the areas you've mentioned are ones where it's hard to qualify value in terms the business understands, but it doesn't mean the system is broken. It's the language used to describe the goals of the organisation. If you're lucky enough to work in a company which sets team happiness as a goal as well, then you can also speak in those terms in some cases. However, in the context of the article (Atlassian) I struggle to understand why a team lead would try to abstract away the core business value a team is generating. A team leader should be rushing to quantify it as accurately and clearly as possible to give the team a leg up in the org's career ladder.

That scoreboard sounds particularly toxic.

Pure garbage. I still can't believe they actually think this is a good idea. That honestly is one of the worst metrics I've ever seen.

Reminds me of when my team at bigcorp noticed that you got given a badge on your personal page for being quick to respond to code reviews. Cut to suddenly every code review being immediately responded to with the comment "got it, will review soon".

That said, it really did change the team's behaviour. Code reviews became a priority instead of something left to batch up and do later. Whether that was a positive change or not it's hard to imagine an edict from management achieving the same result.

Yep, +1 from me.

They put a PR reviews scoreboard on the TV that displayed a random metrics dashboard and we immediately started teasing each other about it and did more PR reviews as a consequence.

We liked it because it wasn't seen by anyone else apart from engineers taking a coffee break. Many people ignored it and no one thought less of them. If they told me a part of my bonus would be tied to my PR high score, I would be seriously demotivated.

But all in all - I think that just proves gamification works. What you're using it for is what matters in the end.

A programmer will fight long and hard for a bit of colored ribbon.

- Napoleon

So what happens if I'm working on a feature that I can't deploy in small chunks meaning I make fewer but larger commits and deploys? Even if I'm deploying good work by this system I'm ranked less than my co-workers by the nature of the work.

So now, according to the company's public ranking system, which is viewable by my bosses and coworkers, I look much worse than everyone else and the nature of the work keeps me in that situation.

Instead of feeling motivated, I'd go into the office every day feeling like crap because I'm doing my job and being told I'm less than my coworkers for it.

But hey, so long as we "move fast and break things" who cares right?

So you pick less complex stuff and look good and get promoted.

Or you do more complex stuff. No one else will want to do that so you become an expert in certain area and are above the leaderboard in a way. You answer to no one.

Or you get on the committee that decides these items and you push for a multiplier for complexity.

Or you hack it.

I really despise this kind of 'pointed-her' writing style. It was popular in some older academic texts, and seems to be gaining popularity in blogs and the like in recent years. I have never written a sentence about a generic 'boss' as 'boss asks blah what do you tell him', nor was I taught it, nor would I expect to read it.

The genderless generic has always been 'they/them', it doesn't require positive action, nor discourse about a hypothetical generic's 'self'-identified pronoun. It has always been third-person, since long before anybody gave a shit.

Wow, I guess you must feel compelled to call out unnecessary male gendering all the time!

As I said, I consider them both incorrect.

Perhaps it's wrong, but much like incorrect use of 'I' vs. 'me', one annoys me more because, rightly or wrongly, it reads to me more like deliberate thought/effort went into it; that the author thought they were getting it right.

I should also say I don't object to gendering specific hypothetical/imagined characters in a story-telling sort of way - 'let's say Sally is a software engineering manager, and her direct report Bob ...' - but the generic case should always be third-person, even if interspersed with such story-telling.

I actually agree with you that the singular they should be used instead of any gendered pronoun, but I think your focus on only female pronouns is (very mildly and probably unintentionally) sexist, similar to how applying law enforcement unevenly can be racist.

Like I said, it just 'perhaps wrongly' stands out more.

I have said the same thing in threads where people are arguing over whether or not an author's sexist for saying 'him' or something though. Particularly on HN those cases are more likely to draw other top-level complaints from a political or societal perspective, I don't care about that, I just think it's annoying and bad grammar.

Focusing efforts on smaller+faster deploys (along with some measure of deploy quality to weigh those deploys) seems like a far better indicator of a team's velocity than simply tickets closed.

Exactly. Agreed to all the comments that trying to find a simple metric for developer productivity is futile, but there is value in encouraging people to ship more often, and in doing so, ship smaller things that generally have less risk. Also, ownership is key here as the person writing the code should be the one that pushes it to production and owns its impacts, as I've found this results in higher quality, more sense of ownership, and quicker incident response and frequency reduction.

As other people have specified in this thread, the underlying problem is Goodhart's Law, i.e. these performance tracking systems break down when pressure is applied to them.

There's a difference between "measurement will fail when there is poor management" and "measurement will always fail".

But having said that, a leaderboard seems like handing out poor-management tokens.

I don't understand your point. It's not the measurement itself that is the failure. It's making that measurement a goal that's the problem. The real metric that matters for any company is profit. Anything else is just an arbitrary measurement that is believed to help achieve that goal. The metric of profit is also gamed which is why a lot of people and companies do shady things.

Wait, what?

> It didn't matter [...] what tickets I closed, only whether I fixed that bug that was bothering that one big customer

That's what "closing the ticket" means, right? You close the ticket when the bug is fixed. This is what matters.

Deployments are a much more roundabout metric -- maybe you fix 5 bugs with one deployment, or maybe your bugfix is big, so you had to deploy a schema change first. Only JIRA tickets (of the right type/severity of course) show the business impact.

They sell CI tooling, they blog about CI tooling. What, Jenkins just isn't cool anymore?
