Rules of thumb for software development estimations (vadimkravcenko.com)
180 points by bndr on May 3, 2023 | 157 comments


I stepped in as a project manager for my R&D group when the war started. I wasn't going to, but we had nobody else at the moment, so here I am. One of the perks of this arrangement is that I can basically do whatever I want. Worst case scenario: I'm back to the engineering job I wasn't going to leave to begin with.

So the first thing I did when I stepped in was make estimations paid. As a business partner, you can request an estimation for a feature, but the estimation itself then becomes a project. Alternatively, you can just tell me your deadline and we'll manage things on our side to fit your date. Do you know how many times we did the "estimation as a project"? Once. And their project got cancelled anyway, so we didn't even have to do the actual work, and nobody knows if our estimate was accurate.

I've heard many times that business needs estimations. As it turns out, it doesn't. In times of inherent unpredictability, the real priorities show. All that business needs is to be better than competitors. It needs to create marketable value. It needs ideas, it needs analysis, it needs expertise and hard work. Estimations don't help create value. They only create more work. As soon as you make this work visible, poof - nobody wants estimations anymore!


Yep. I've been through this cycle too. The phrase that helped me sell the idea was "estimation is design". You can't come to an estimate that has any relationship to reality without knowing what you're intending to build, and that's a design function, even if it's very high-level, coarse, and approximate. Just expecting that design to materialise from thin air is extremely unrealistic, so you have to go through some sort of intentional process to get there. Making that process cause an appropriate amount of pain does make it clear that none of it is free.


Estimations are more important in large orgs with competing priorities. (Or when you're going to be billing a client and they want to know, in an external-facing team like that.)

If you have a straight list of priorities it's not going to change how hard the project is or anything; you'll discover some stuff earlier that will help you project when to expect it, probably, but the work needed is still the work needed.

But if you have one VP who wants to do A and one who wants to do B and the higher-ups want to know how much they'll both cost in order to decide which to do first... then yeah, you need an estimate, but you also ideally need to make it clear that those estimates are going to take time and are going to take resources away from other work.


Under no circumstances do the software estimations for major competing projects in any way affect the political decision. Even if the teams themselves sweat blood over the estimations, the politicians will get their own people to form their own opinions and argue with those.

The decisions are all a compromise between peers horse trading budgets.


> But if you have one VP who wants to do A and one who wants to do B and the higher-ups want to know how much they'll both cost in order to decide which to do first...

Assuming that all efforts bring about the same value and it's only the cost that is different is a huge (but very common) mistake.

It's the other way around. Any time you're comparing two options they're likely to cost approximately the same, but it's likely one is wildly more valuable than the other. That is the thing that needs to be estimated.


I haven't seen an assumption of the same value, and didn't say that assumption was necessary - the decision-maker wants those value estimates too. Same basic principle - if value is X vs 3X but effort is Y vs 10Y... that's good to know. "Likely to cost approximately the same" has been very atypical in my experience.

Hopefully that estimation is done even in the "straight line, not-parallel priorities, no cost estimation needed" scenario!


AFAIK, the only purpose one can ever achieve with an estimation is to do ROI analysis and make a go/don't go decision about a project. Any other goal won't be achieved.

With that in mind, it's perfectly natural for projects that need an estimation to be cancelled. Even if the ROI is positive, they carry opportunity costs against other projects whose case is clear enough that nobody needs the data to make a decision.


> ROI analysis and make a go/don't go decision about a project

And if the estimate is even remotely realistic, the decision will always be "no go". Even if it's something that's actually needed.


Do we work in the same business? Software almost always has incredible ROI.


There are two types of software projects, roughly:

* Speculative, with the hope that the ROI will be a multiple of the money put in. These are often in support of revenue-generation.

* Maintenance, where the goal is to keep the software useful for as low a cost as possible, so they have a shoestring budget.


I love this, but you can't always use it. Sometimes management needs an estimate (even if a coarse one) to decide whether the project is worth doing or not. Sometimes you're doing fixed-price bids for a customer, and you'd better know how much to ask for in the bid. And so on.

But your main point is absolutely right. Estimates aren't free. They're a project, even if a small one.


How do people find "quick wins" (high value things that are low effort) without some kind of estimate?

As an engineer I feel it's pretty common that no one works on the quick wins because they are toiling away at some big new thing.


"Quick wins" sounds a lot like taking out a loan of technical debt.


Ok - the best idea I have heard in a long time :-)


Anecdotally, I've observed across my ~12 year career so far that an emphasis on estimates and estimating is negatively correlated to productivity, lead time, velocity, impact, positive outcomes etc...

I suspect the reason is that management is trying to use numbers to justify bin packing more work onto an already oversubscribed team. What never shows up in those project management spreadsheets is the very real and predictable cost of context switching and the increase in mistakes from dealing with a larger amount of in-flight work.


> that an emphasis on estimates and estimating is negatively correlated to productivity

When the estimates and their accuracy become the primary goal, productivity is secondary.

My worst experience with this was a company that valued roadmap accuracy so highly that we were rated on our on-time delivery more than anything else. The inevitable result was heavily padded estimates and teams who carefully avoided doing any more work than necessary (yes, early delivery was technically negative points for your bonus). The pace of work was incredibly slow and methodical, but the company got their metrics optimized. Madness.

The opposite end of this spectrum isn't great, though. There's something about teams that pride themselves on no estimates and no deadlines that leads a lot of people to spin their wheels forever. I've also been stuck on teams with endless cycles of rewrites and refactors and switching to the latest language or framework every 6 months. We did a lot of work, but didn't ship a lot.

There is a middle ground that is much nicer than either extreme.


> There is a middle ground that is much nicer than either extreme.

That middle ground is focusing on releasing. It is on a completely different dimension from the "estimate everything" vs. "estimate nothing" duality of your post. So I really disagree with your characterization of it as "middle ground".


Focus on releasing what?


On releasing whatever they are doing for people to use.

Somebody already answered "value". But that's too abstract for my taste.


Value.


Estimating and a lot of project management stuff is a form of procrastination for a lot of folks/businesses. It doesn't matter what fields the cards have or how they are arranged, the work still needs to get done, and all of this shuffling is just delaying the start of that work.


That's only true if there's a fixed set of work to get done. But that's rarely the case. Often, management has N different things they could have done, and enough people and time to do M of them, for M < N. Which ones should they do? Well, whatever maximizes profits. So they (management) estimate income from each thing that could be done, and ask engineering (hopefully) to estimate how much it will cost to implement (or how long it will take, which equates to cost). Then they make a (hopefully) more informed decision than they otherwise could have made.

Look, there's lots of ways this gets done badly. I get that. But the idea itself is not nonsense.
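
A rough sketch of that selection, with invented project names and numbers purely to illustrate the idea (rank candidates by estimated return per unit of estimated cost, then fill the available capacity):

    # Hypothetical candidates: (name, estimated income, estimated cost in dev-weeks)
    candidates = [("reporting", 400_000, 12), ("sso", 150_000, 3), ("redesign", 250_000, 20)]
    capacity = 20  # dev-weeks available this quarter

    # Greedy pick by income per dev-week; a toy stand-in for the real ROI discussion.
    chosen, used = [], 0
    for name, income, cost in sorted(candidates, key=lambda c: c[1] / c[2], reverse=True):
        if used + cost <= capacity:
            chosen.append(name)
            used += cost

    print(chosen)  # e.g. ['sso', 'reporting']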


This is a cost-centric perspective. I wish I could find the YouTube talk where Merck switched to a metric of "cost of not implementing", i.e. how much revenue you lose per day by not implementing. They found that, out of hundreds of projects, about two were three orders of magnitude larger, yet they delayed shipping those projects so they could get more into a release cycle, and diverted resources. From that new perspective, it was obvious that they should stop doing everything except those two projects. From the same talk, Microsoft found a third of projects were revenue negative! (Better to just not do them at all.)

The point of the talk was ultimately that the revenue of projects usually dwarfs the cost, to the extent that if a project is worth doing, it is clearly so whether it takes 3x or even 10x the time estimated.


At the team level, keeping M to 1-2 works really well in my experience. Of course M is never really 1 or 2 because you're always wrapping up small details from $previous, looking ahead at $next or just doing $maintenance. A reasonably sized team will be kept busy enough with "just" an M of 1 or 2.

The constraint helps to ruthlessly focus on the most impactful work. The maximalists want to get cute and try to bin pack but it just doesn't work, unfortunately.


That's an interesting thought. I've never considered it before, but estimation as procrastination might be a good explanation.


Totally agree. My management is mostly interested in estimates and deadlines. They never engage in discussions about productivity or quality. This results in people continuing with ineffective processes and systems. It also encourages people to make tests pass at any cost or close tickets even when there are deeper problems that should be resolved.

In short, the focus on estimates crowds out productivity and quality.


Yep. If you don’t track the hours for The Process it’s just pure conjecture whether or not The Process is a value add.


the thing is, management is accountable to a budget, timeline, and an ROI. They have to be able to say how much something is going to cost, how long, and how much they're going to get in return. Now, a person doing the estimation has to be able to take everything into account, down to the cost of context switching, but those three things are all that matter upstream. Also, the people doing the estimations have to be able to answer questions like "well, what can you do in, say, half the time?". It's not easy.


The elephant in the room here is, of course, that often accurate estimates are unwanted. If you tell the truth about how long something will take, maybe your customer will go with someone else. Maybe the CTO will decide to outsource instead of giving the work to your team. You know those other guys are probably as good at estimation as you are, they're just deliberately going low so you have to too. (And thus, everybody now expects IT projects to overrun!)

Other than that, I agree with most of the linked article. I get very good results by breaking tasks down into smaller (<1 week) jobs and getting a best and worst case estimate for each. Sum both columns, multiply both by some number for unknown unknowns (usually 1.5, 2 when there are bad vibes), and you have a realistic minimum time and a worst case. If the max is >2x the min (it will be to begin with!), get more information to narrow the range, and repeat until it's not outrageous.
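
A minimal sketch of that procedure, with invented task names and numbers (the 1.5x multiplier is the one mentioned above):

    # Per-task (best case, worst case) estimates in days, all <1 week jobs.
    tasks = {
        "login form": (2, 4),
        "payment integration": (3, 8),
        "admin report": (1, 3),
    }

    fudge = 1.5  # unknown-unknowns multiplier; bump to 2 when there are bad vibes
    best = fudge * sum(lo for lo, hi in tasks.values())
    worst = fudge * sum(hi for lo, hi in tasks.values())

    print(f"realistic minimum: {best:.1f} days, worst case: {worst:.1f} days")
    if worst > 2 * best:
        print("range too wide: gather more information and re-estimate")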


“Telling the truth” implies two things: that the actual effort needed is fixed, and that you know it. In practice I’ve found neither of those things to be anywhere close to true for large projects. Furthermore the pursuit of accuracy by doing finer and finer-grained estimates and then rolling them up can waste a lot of time because it provides fodder for bike shedding while missing the forest for the trees. I can’t count the number of times I’ve seen this process lead to a total train wreck.

What works better is to be very clear on the big picture goals, do a speculative high level breakdown to inform an initial deadline and staffing commitment, but keep the precise scope flexible. Then get right into prototyping and building, attacking the areas of largest risk and refining the requirements as you go. You should only have fine-grained plans for the next 2-3 months, with the rest of the roadmap intentionally being flexible, save for any key milestones that are needed to ensure progress to the high level goal.

Of course this requires deep expertise with cross-functional influence and bi-directional trust with management, but if you don’t have those things big projects are fucked regardless. In that case your best bet is hunkering down with some agile methodology as a shield while looking for a better job.


I find my biggest hurdle with estimation is often one small requirement which turns out to require a large amount of work. Like, "have this element behave in [unique way] on this page." Ok, well, that element was never even built to be touched individually, let alone have custom behavior. So now it's actually doubled the development time.

This is something which can usually be caught during estimation, but occasionally it isn't something you're going to notice until you're actually digging into the code and implementing it. And if you're going that deep during estimation, you're already practically building the thing, so estimation is now taking up a significant amount of time and effort. I am not a super experienced dev, but I haven't really seen a way around these issues, and they crop up often enough to make estimation feel close to worthless. (That is, it's unreliable often enough that it's unwise to couple any timeline too closely to any estimate.)


> What works better is to be very clear on the big picture goals, do a speculative high level breakdown to inform an initial deadline and staffing commitment, but keep the precise scope flexible.

For this part I find that separating out the aims from the goals is helpful, e.g.

Aim: to be sheltered from the rain

Goal: build a roof

Then you can adjust the goals (what kind of roof; materials etc) while keeping in mind whether the goals meet the aim, and whether the aims are still appropriate, and their priority too. Analysis is more difficult when the concepts are conflated.


I've dabbled with Monte Carlo sims for sprints that have pointed tickets. I know a 1-point ticket is sometimes a 3, rarely a 5. Sometimes a 3 is a 1, and some tickets turn out to be 5s or even 8s. I built a simulator based on the fact that most tickets will be accurately pointed, but sometimes they will be different (usually more). I already know what our capacity is, so I run the simulator 10,000 times or so and get back info that lets me say, "It appears that we are X% sure we will complete all the points in this sprint on schedule." It was an interesting exercise, but I didn't push it too far because I was fearful of the "you told me you were 95% sure it would be done. You had math and stuff!"
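
A toy version of that simulator, assuming made-up tickets, capacity, and slip probabilities:

    import random

    tickets = [1, 1, 2, 3, 3, 5, 8]  # nominal points in the sprint
    capacity = 26                    # points the team historically completes

    def actual(points):
        # Most tickets are pointed accurately; some slip upward, a few blow out.
        r = random.random()
        if r < 0.7:
            return points
        if r < 0.95:
            return points * 2
        return points * 3

    runs = 10_000
    on_time = sum(sum(actual(p) for p in tickets) <= capacity for _ in range(runs))
    print(f"~{100 * on_time / runs:.0f}% of simulated sprints finish on schedule")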


> accurate estimates are unwanted

In my career I've learned that estimating usually boils down to a game involving guessing what the customer will accept, while the customer tries to figure out what they want. Dig deep enough, and there's usually a dependency like "it needs to be done before X", but when X will be done is also imperfectly known, so it's a tangle of dependencies.

The cases of hard dates usually involve businesses dependent on seasonal or holiday revenue, or scheduled events like sports or entertainment touring schedules. Estimating for those is much easier, because the game becomes one of figuring out what can be delivered by the date(s) in question.

Regulatory changes or fiscal timelines are usually not as fixed as they are believed to be, because there are lots of ways to paper over delays, although it costs money and businesses don't like that.


One thing I found crucial when dealing with estimation is to always, always explicitly add the level of confidence (in the estimate). This saves so many tough discussions later... either have it built in to your product development process (e.g. cone of uncertainty, or different stages with different predefined levels of confidence), or always communicate it right next to your estimates. Some stakeholders will push back initially, but you can reason and tell them that you need to invest more time into research, planning, PoCs etc. if they want a more accurate estimation. Eventually, it boils down to "are we agile yet", trust in the organization, maturity and culture. One of the questions I always ask when interviewing for a role is how this is done in the company, with actual examples when projects took longer than planned (and what happened). Tells a lot about the maturity of the org.


I get really frustrated that, despite the advice to give confidence levels being all over the place, I'm aware of no project planning software which incorporates that advice. It all assumes a perfect world where you can say it will take 6.5 days to complete a task, and then cascades from there.

I want project planning software which accepts lower and upper bounds, or an estimate + %age confidence, and then at the end gives me a project plan with error bars on each part of the timeline. Let me give stakeholders a diagram showing exactly what we mean by it being hard to say how long this will take, rather than what appears to be a rock solid plan timed to the day.
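
A sketch of the "ranges in, error bars out" rollup, assuming hypothetical tasks with (low, high) day estimates and a simple triangular distribution per task:

    import random

    tasks = {"schema migration": (2, 6), "new API": (5, 15), "frontend": (3, 10)}

    totals = sorted(
        sum(random.triangular(lo, hi) for lo, hi in tasks.values())
        for _ in range(10_000)
    )
    p50, p90 = totals[len(totals) // 2], totals[int(0.9 * len(totals))]
    print(f"plan: ~{p50:.0f} days (50% confidence), ~{p90:.0f} days (90% confidence)")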

Also, let me fire any project manager who pushes back on giving confidence levels (yes, I’ve met several).


If you decide your confidence intervals are described by classical statistics, then adding them won't change anything about the outcome.

If you decide they are described by fat-tailed distributions¹, then the exact distribution and its parameters are much more important than your estimated intervals, so again collecting them adds nothing.

So, yeah, collecting confidence intervals never adds anything.

1 - Like they should, because nobody does a project on anything they know well enough to describe with classical statistics.


I have no background in statistics, but that may be to my advantage here. Doesn't keeping confidence intervals vague help, in the same way that keeping estimates vague helps?

"Yes, I think it's a minor change. Almost certainly done by the end of the week, or else we have bigger problems."

"I don't know what this entails. Looks like it could be done in a day, but I wouldn't bank on that unless I can spend an hour investigating."

"This is a large feature, but nothing seems terribly complex. I think it's one or two weeks, but we should be ready for it to get snagged on small defects. We should leave room after the release for a rapid followup patch that may be needed."


> If you decide your confidence intervals are described by classical statistics, then adding them won't change anything about the outcome.

I'm trying to understand what you mean by this. Are you referring to the fact that for thin tails, the sum of expectations will grow significantly faster than the standard deviation, and thus the 90th percentile will, relatively speaking, tend toward the expectation with more tasks? (I.e. an appeal to the LLN.)


I believe what the parent is referring to is that the sum of two normally distributed variables can be modeled as normally distributed[1]. I.e., normal distributions are closed under addition. This fact leads to one's naive idea about adding variances working as expected.

However, task estimations [durations] aren't normally distributed - they're much more log-normally distributed, and you can't simply add log-normal distribution parameters in the same way[2]. Instead, the log-normal distribution is (largely) closed under multiplication, which is fairly useless when we want to add tasks to determine total time.

Moreover, the reductive conceptualization of normally distributed estimations leads people to the erroneous communication of "X plus or minus Y". The assumption is that the bounds are symmetric (they aren't). The chance that the estimate is higher than the mode is much greater than the chance that it is lower than the mode.

As a data scientist, I want to say that the model for task estimation is over-reduced. Even so, I feel like the push-back I'd get from suggesting that folks estimate in log-space would be that I'm being ridiculously complicated.

1. https://en.wikipedia.org/wiki/Sum_of_normally_distributed_ra...

2. See the Fenton-Wilkinson approximation.
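
To make the skew concrete, a small simulation (parameters invented; the analytic route would be something like the Fenton-Wilkinson approximation from [2], but sampling is the simplest way to see the asymmetry):

    import math, random

    mu, sigma, n_tasks = math.log(3), 0.6, 10  # median 3 days per task, log-space sigma 0.6

    totals = sorted(
        sum(random.lognormvariate(mu, sigma) for _ in range(n_tasks))
        for _ in range(10_000)
    )
    p5, p50, p95 = (totals[int(q * len(totals))] for q in (0.05, 0.5, 0.95))
    print(f"5%: {p5:.0f}  median: {p50:.0f}  95%: {p95:.0f} days")
    # The 95th percentile sits much further above the median than the 5th sits below it.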


> Even so, I feel like the push-back I'd get from suggesting that folks estimate in log-space would be that I'm being ridiculously complicated

It's a more theoretically grounded relative of the common agile "T-shirt sizes quantified by Fibonacci sequence, without the repeated '1'" estimation recommendation (which is pretty close to just doing "each complexity step is ×1.5").


Adding variances is not a naive idea but a fundamental property of variances! And I don't think that's what GP means because they bring up heavy tails as a point separate from "classical" methods -- and the lognormal distribution is heavy-tailed under appropriate parameters.


Yes, it's about the LLN. At the end you will get an error proportionally smaller than a typical confidence interval that everybody already has an intuition for.

You won't add meaningful information to the analysis.


For software estimation, you can get a long way by just assuming that the distribution is lognormal.

https://jbconsulting.substack.com/p/task-estimation-conqueri...


The log-normal is one of the distributions that can lead to a perfectly tame project or to everything blowing out and no expectation being met with the same numbers, just by changing the estimation methodology.

That said, I don't think the findings in that article generalize well. There seem to be some properties that make many software projects more erratic than a log-normal would imply.


I appreciate the feedback but I'm not sure I follow. What changes to the estimation methodology do you mean? And what properties make software projects more erratic than a log-normal (which is already very erratic)?


For example, the results are absurdly sensitive to the choice of estimating your uncertainty as a defined deviation, or a defined low percentile within it, or a defined high percentile within it. They are also absurdly sensitive to the choice of estimating a mode or a mean value.

On more erratic than log-normal, well, I mean it. There are projects that can't be fit to a log-normal, they have fatter tails. I have no idea why. A log-normal is something tractable, many projects aren't even that.


I'm still not sure I follow. In the article, the uncertainty is not intended to be independently estimated. The distribution described has a single parameter, which is the median completion estimate, and that is the only parameter needed; the uncertainty is fully derived from that. The distribution is not absurdly sensitive to this; it's just the scale parameter, so it scales the entire distribution linearly, which is a property that you would expect from fundamental symmetry.

(Unless you mean that the summary of the estimate, rather than the shape of the distribution, varies a lot depending on whether you're quoting a 95% or a 99% confidence interval. But that is just the nature of long-tailed distributions).

I'm also not sure what it means to "fit" a project to a log-normal. A single project doesn't have a distribution, unless you're measuring completion time of individual tickets within that project. The entire project's completion time might be very far off of the median estimate, but as long as the distribution assigns a nonzero probability to it, it's hard to say from just one sample whether it was right or wrong.

If you are measuring the distribution of the individual tickets and they aren't distributed along a log-normal, I'd be very surprised; it would be worth seeing what distribution they do fall into to learn what state of knowledge is being captured by those estimates.


> Also, let me fire any project manager who pushes back on giving confidence levels (yes, I’ve met several).

My favourite is the ones who not only will push back on confidence levels but will then drill down into any estimates they think are "conservative".


True that. It's inconvenient, but you can use custom fields (e.g. in JIRA) for this. I agree, though, it's not a proper solution, as it doesn't display e.g. ranges :/


Now I'm thinking I should build a better Jira where the basic unit of everything is a range instead of a number. I wonder how long that would take me.


If you’re looking to capture all of JIRA’s features, it would take you a very long time. Duplicating their extensive automation capabilities is a massive project.

If you ever get around to that, let me know. JIRA really is best in class and I want to shove it into a very tiny, very lonely box before tossing it into the deepest part of the ocean I can find.


The area of statistically sane software for regular users is so underexploited that there are many instances of X you can plug into "X but with ranges instead of numbers" and carve out a niche for yourself.

I myself am hoping someone will pick up X=spreadsheets at some point soon.


You’re looking for causal.app for that, and it’s as good as it sounds.


Microsoft Project does support PERT estimates, though it's difficult to find and use: https://support.microsoft.com/en-us/office/expected-duration...


Doesn't PERT prescribe simply adding together the optimistic and pessimistic cases with each other? Percentiles are not additive.


Fog Creek's software FogBugz had that... It's a pity that it's mostly been abandoned.


Estimating at a deliberately coarse resolution, e.g. Fibonacci or powers of 2, is a way to embed the lack of confidence in single-number estimates.

You may think something takes 9 days, but if you are only allowed to answer either 8 or 16, choosing 16 is the only safe option and automatically includes some lack of confidence.
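
The rule in one line, assuming powers of 2 as the allowed answers:

    import math

    def bucket(days):
        # Round a raw guess up to the next allowed size; Fibonacci buckets work the same way.
        return 2 ** math.ceil(math.log2(days))

    print(bucket(9))  # 16 -- the gap above 9 is the built-in hedge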


While that's true, it doesn't work unless everyone has the exact same understanding of how it works. For example, most people I've worked with will choose 8 instead of 16 for an estimate of 9, because overestimating feels risky - it makes you appear less capable.


It's a suitable approach in the local context (tickets/tasks for the team), but won't work on the project level and with stakeholders. Stakeholders eventually will want a timeframe.


There is lots of good advice here, but I'm always amazed that these articles miss the #1 improvement to estimates recommended and validated by scientific studies: do not estimate in units of time.

Everything else here applies, but don't estimate in time. Estimate in difficulty points, t shirt sizes, cups of coffee, gold stars, anything will do. Then as you track your outcomes (see TFA), measure the average relationship between your units and time. Then use that relationship to project a timeline based on your gold stars or whatever.

Using the law of large numbers to average externalities and developer inaccuracy into your time estimate has been the most accurate method since Kahneman and Tversky's Nobel prize winning studies in the 1970s.
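
A sketch of that loop, with invented sprint history: estimate the backlog in abstract units, measure how your units have mapped to time so far, and project from that:

    # (points completed, working days) per past sprint -- hypothetical numbers
    history = [(21, 10), (18, 10), (25, 10)]

    days_per_point = sum(days for _, days in history) / sum(pts for pts, _ in history)
    backlog_points = 80
    print(f"projection: ~{backlog_points * days_per_point:.0f} working days")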


Yet whenever I've seen this attempted in practice it's failed miserably. It sounded good when I first heard it, but now I treat it as cargo cult nonsense. Maybe it works but nobody does it right?

A point is a different size for everyone, but a day is a universal (ok technically global) unit.

We deal in time all the time, we know how long an hour is, a day, a week. We can remember "I did something like this before, it took me 2 weeks", not "it took me 13 points".

No manager can predictably translate story points into days, which they need in order to pitch to customers and manage their budgets.

The first thing I ask when I have to estimate in SP is "how many hours is 1SP?" and after a few minutes of the usual back-and-forth, whoever has to actually use these damn estimates always says something like "I treat 1SP as half a day". Bingo, now I can give you a number you can use.


> We deal in time all the time, we know how long an hour is, a day, a week. We can remember "I did something like this before, it took me 2 weeks", not "it took me 13 points".

I really don't get this. Can you guys genuinely work 2 weeks on a feature without doing anything else? No extra meetings, no dependencies, no incidents, no coworkers to help?

Something simple can take me 2 weeks or 2 days - it really depends on all the other stuff going on when I'm working. That's why imo it's so silly to estimate time unless you're tracking every minute you work on that specific thing and manage to seal yourself in a box away from distractions.

The thing I remember when estimating is: huh, last time I touched this feature it went really smoothly. Doesn't look like much work, I'm adding a validator here and I've got an example right there. Factor in some test work and I guess it's like 3 points?

> No manager can predictably translate story points into days, which they need in order to pitch to customers and manage their budgets.

Either you're a godlike estimator or you disappoint your manager A LOT. Who the hell are these people that can reliably say feature X will take me 8 weeks? Is that also not just an estimate? I'd rather wallow in the vagueness of points than get my ass reprimanded because I said something would take X weeks. If you're looking for absolute predictability you need factory line work.


If it's business the one that wants the estimation, then I take into account everything: unrelated meetings, bugfixing of important stuff that gets broken, potential sickness/holidays, etc. What's the point of telling business that you can do the task in 3 days, if you cannot actually have 3 complete days to do the job? If you have "other work stuff to do" that will interfere between you delivering the item, you have to take that into account.

> Either you're a godlike estimator or you disappoint your manager A LOT. Who the hell are these people that can reliably say feature X will take me 8 weeks? Is that also not just an estimate? I'd rather wallow in the vagueness of points than get my ass reprimanded because I said something would take X weeks. If you're looking for absolute predictability you need factory line work.

You give estimations in time units because that's the only unit business wants. You can sure say "that will take X points"... nobody will listen to you and they will demand a different answer. It doesn't really matter, actually.


I don't think I'm godlike, I've just been contracting for most of 20-something years and I've got good at it. I always give a range. It's never "X weeks" it's "between X and Y weeks" (showing all my workings, too). "X weeks" is always wrong and "X points" is not only meaningless, it also gives no indication of any uncertainty. By giving a range I give the managers enough information to understand the risks - they can choose to go low with their own estimates if they need to win the business, there's no need to pressure the team into giving an artifically low estimate.

I agree that if you start giving a single number of days/weeks, that's bad. Then there's a strong incentive for everyone to start padding their single figure estimates to cover their asses, and the managers end up just halving them in their gantt charts and pushing the team to work quicker. That's an adversarial environment, where nobody trusts each other.


It's more like "the last time I did a case like this it took around 3 weeks to close the case, so 3 weeks this time. Maybe a little less because there's less experimenting.".

Basing it on past experience like that automatically takes into account (average amount of) meetings, waiting on others for code reviews, etc.


>Can you guys genuinely work 2 weeks on a feature without doing anything else?

Sometimes, but you need a progressive team that, e.g., doesn't do stand-ups every day when almost all the work is multi-day efforts.

You also need people who are technically skilled to not bug a developer about something they can find out themselves. More skill = more autonomy = less bother. That holds true for their work too.

A regular occurrence is people around developers leaning on them to figure out technical details because they are not technical enough - that should be their job, so that gap contributes to a higher number of meetings.

>Who the hell are these people that can reliably say feature X will take me 8 weeks?

They're probably overestimating, then timing the delivery to land around 8 weeks (they are more skilled than their estimate would lead you to believe). A feature like that should get broken up, and then you estimate the individual pieces. It doesn't matter that the fragments aren't testable/deployable on their own; it helps with the complexity. Decomposing the feature also helps you estimate better.


>The first thing I ask when I have to estimate in SP is "how many hours is 1SP?" and after a few minutes of the usual back-and-forth, whoever has to actually use these damn estimates always says something like "I treat 1SP as half a day". Bingo, now I can give you a number you can use.

That's because they aren't doing it right.

SPs for a team should be measured based on recent past performance. How many hours is 1SP? Let me look at how many hours the team has worked over the last 4 sprints and how many story points they have completed. You only need to ground it when you have a new team.

The problem I see is that people never want to stick to story points and never want to run the calculations. They want SPs to be the same across teams, which isn't possible with this method. What project managers should do is look at the feature's SP size and the SP per sprint to see how many sprints the work will take, which gives you a metric (sprints) that is comparable across teams. You don't say team A is delivering 30 SPs over the next quarter working on a 20 SP feature while team B is delivering 45 SPs over the next quarter while working on a 60 SP feature. You instead look at the features and say that team A is working on a feature that looks like it'll take them 4 sprints to do and team B is working on a feature that will take them 8 sprints to do.

>We can remember "I did something like this before, it took me 2 weeks"

That works if enough of your new work is similar to previous work. I find that is rarely the case.


It's absolutely cargo cult nonsense and judging by the surrounding comments I'm SO glad we've all finally decided to stop pretending like it isn't.


This fails when you keep thinking in hours during the estimation phase. Stop all such translations and instead map points to relative complexity and architectural impact.

Add button in UI = 1sp

i18n text on button = 2sp

New field in database incl backup and migrations = 10sp

And so on.

Once you’ve done this enough times you can correlate it to hours. Never talk about this factor to the ones doing the estimation and always keep the point system fixed to avoid fluctuations in velocity.


Sure but your estimates are way off. Adding a button should be a 2 and i18n should be a 1. A text is just replacing a hard-coded text with a call to i18n of a key. In fact that should be a 0 so to speak. Adding a button means actual functionality i.e. a button always has to actually do something. That something is actually probably way more points than a simple new database field which is a simple copy and paste of a script that does an alter table. I can add that to our automated db update scripts in my sleep, so if i18n is a 1 then the field is a 1 as well.


> Maybe it works but nobody does it right?

> A point is a different size for everyone, but a day is a universal (ok technically global) unit.

I can definitely see that. Pretty much what you said is what I always hear, but the actual theory behind it (and what I'd guess they're testing in those studies) is that people can agree a given case is the same size ("big" or "small"), but the actual time taken to complete it is going to depend on the individual's experience (both in general and with that specific codebase).

It's just that the time mapping in scrum isn't supposed to happen at the individual level, it's something handled by the scrum master / manager / whoever that interacts with the rest of the business, using an average. This way time estimates when the team has different levels of experience get smoothed out into something hopefully more accurate at the sprint level.


It's not rocket science, but I can't speak for how your teams have been doing it. I know there's a tremendous number of well meaning people who get certified in a rote method and only understand it as dogma. Doing this right only requires understanding the underlying principles and figuring out a method that your team likes.

Principles are:

- humans are much better at estimating relative effort than average time. So estimation sessions are only in terms of effort. The answer to "how many hours is one point?" is "as long as it takes." With developers like you this can be a hard line to keep but it is absolutely full stop required.

- consistency in relative effort unit sizes is a requirement for the math to work. Group estimation does this automatically after a few sessions, and can help expose miscommunications and better architectures along the way... but it's not the only way to do it.

- a consistent yardstick for "done" is required for consistent unit sizes. (Logically)

- project managers track average number of points completed per sprint. Even though this average will become extremely consistent, it is an AVERAGE ONLY and can not predict any individual sprint.

- There is no pressure to burn points "faster". Remember, point sizes are arbitrary and consistency is required. When engineers feel pressure to complete more in the same time, consistency is dropped and the math breaks. If you are using sprints, this is an easy trap to fall into. More points per sprint != better. Making tasks easier is OK though, e.g. with automation or technical improvements. Note that this would impact the estimated point size of your tasks, not the number of points put through in a sprint!

That's it. Do it how you want, but have consistent relative point sizes, don't let engineers talk or think about time, don't treat an average like a single sprint prediction, and don't pressure engineers to increase velocity.

A casino can't predict a single hand, but they can predict with great accuracy their profit margin after 100 hands. Using the same math, you can't predict a single sprint, but you can predict with great accuracy 5, 10, or 20 sprints... as long as you help the developers stay unpressured and therefore consistent.

ALL THAT SAID, when you say "I did this before, it took me 2 weeks," that is also a very effective way to estimate. If your tasks are highly consistent, definitely track time for implementation of similar work (because human memory is very fallible) and estimate this way. Just don't let yourself notice that you defined a consistent unit of effort and used past average time to predict future average time, and you can feel like you've found a great life hack.


Well no, and rocket science works which is one difference.

I've been through all this with multiple teams/companies and I wasn't always even opposed to the idea! In the end all point-based estimates seem to do is add more ways to be wrong.

Humans are actually pretty good at estimating absolute time if they've had some practice. After the first few times you estimate a day and it takes a week, you realise you're an optimist and you should compensate for that. It seems the wrong conclusion to draw would be "oh well it's impossible, I should use made-up units instead", just get better at it. There was a time I was mostly doing fixed-price work, nobody would have accepted points in a quote and nobody would pay me per point.

Consistency in unit sizes is basically impossible, from what I've seen - unless there's an unspoken but commonly understood translation to units of time.

Project managers who try to size sprints in points tend to eventually give up, after finding that the actual effort taken varies so wildly between sprints. (I also tend to think sprints suck, and Kanban is the way to go for agile - but that's another topic.)


> There was a time I was mostly doing fixed-price work, nobody would have accepted points in a quote and nobody would pay me per point.

I've done that too and fixed-price work is like handling dynamite. I've always felt it was good for new grads to spend a couple years in small, eat-what-you-kill, consulting shops to understand the business of software development better. After that, then go to megacorp/faang or whatever but the lessons learned in a small consultancy will help you see the forest through the trees.


My team does point based estimation and I like it. The points are useful for getting an idea of how much stuff to work on in a sprint and a measure for how much room we need to leave in for the inevitable and consistent problems that we have to pick up mid sprint.

The real benefit imo is that it is a good shorthand during estimation that reveals misunderstandings or flaws in a ticket. Everyone pointing a ticket at a 2 is whatever, literally any time spent discussing if it is a 2 vs a 3 drains my vitality, but every time there is a big outlier or bimodal distribution we get a lot of value from talking it out.

I think what points gets you here over hours is that it focuses on the task at hand. I think a time centric approach requires your estimates to be more like 'X hours for John, Y hours for Shirley'.

I'm not touching your points about how there are other contexts where it doesn't make sense to do this, or the absurdity inherent in pretending that they don't still boil down to time in a meaningful sense because I agree with them.


I've never been anywhere that used story points effectively, but the notion sincerely appeals to me, and I look forward to being somewhere that has it working.

But there is one practical question that has always confused me: how do you "measure velocity" in Scrum, when each sprint's scope, and the story points within, are all preset at the start of the sprint?

Do we assume (realistically) that dev is not completing all stories within the sprint? Or that they are adding more stories in the middle? Or are we assuming that each sprint is finished exactly as planned, on time, and changes in average "velocity" reflect differing levels of confidence or ambition of the dev team as the decide how many stories to include in each sprint?

In a rough Kanban process, I can see measurement of "velocity" as being more straightforward. Am I missing anything?


Sounds like maybe your experience is with places that do this backwards? Velocity is a backwards looking measure. It doesn't commit your team to grind until that's done.

We get as much done as we get done. We take a guess about how much work that will be at sprint start, based on previous average, and make sure that much work is well understood and prioritized. But tasks turn out to be easier or harder than expected, people get sick, shit happens and you end up accomplishing more or less than what you expected. At sprint end you measure what you actually accomplished, to help predict the long run timeline of the project and maybe to get a better guess at how much work to prep for next time.

There should be no time or "amount done" pressure on the engineers. We measure the actual throughput and use that to inform management how long the work as defined will probably take. If management doesn't like that assessment, they can manage the situation by adding engineers or reducing complexity/scope. If you really like playing with fire you could let them ask to lower quality standards, too.


Got it. That makes sense. And I hear you on how you could accomplish less story points, or tickets, or whatever during a sprint. Happens all the time for the reasons you say.

But how do you accomplish more? Are you in organizations that have enough trust that items can be added to a sprint while it is in process?


> The answer to "how many hours is one point?" is "as long as it takes."

In principle, the "how many hours to one point" question is answered by tracking and analysis of past sprints, where over time you develop an empirical measure of velocity, which gives you an idea both of the average time it takes to do a story of a particular size and of the size of the error bars when taking that average as an estimate.

Which can be useful for sanity checking things like sprint sizing, though the existence of the stat risks it becoming an optimization target in some orgs, either because management sees it and wants “line goes up” or for some other reasons (teams can do it to themselves, though management is more likely to.)


The goal of estimating is to know how long it will take to complete a given task. A successful estimating methodology ought to provide this. I am not convinced the method you outline does provide this.

First, I would be interested to see any data you have supporting the idea that “humans are much better at estimating relative effort than average time”. My own experience is humans have biases toward optimism or pessimism and are poor estimators of the magnitude of risk, but I’ll ignore that for the moment and accept the premise as true.

The other underlying premise of your method is that velocity is a random variable about this perfect prediction with a known distribution, such that via the law of large numbers, over a large number of sprints, the average velocity can be known with high accuracy. I see a few issues with this.

One issue is that velocity is not static. Velocity changes with team morale and how interesting or motivating the task is. It changes with how well engineers are suited to a task. It changes with organizational changes, the addition of new team members, the loss of old ones, the work environment, etc. I don’t buy that these are small factors that average out over the long run, either, since my experience leads me to believe they don’t. A poor work environment or coworkers leaving the team or low morale usually are harbingers of bad things to come, and, opposite, a growing team working usually has exciting challenges, morale is usually high, etc. My experience hasn’t shown a mixed bag of equal good and bad, but instead two divergent sets.

Even if velocity were static, the other issue is the distribution is unknown. Sometimes teams have bad weeks, and sometimes they have good ones. Sometimes something catastrophic happens in the organization and a dozen engineers are twiddling their thumbs or are thrown into some task they did not foresee. Sometimes things go unexpectedly well. A priori, there’s no way of knowing what a particular team tends to, or if that trend will continue. One could try to measure it, but my experience has been that something that was supposed to be static will change long before a sufficient amount of data could be collected to be meaningful.

Even if you knew the exact distribution you’d encounter, the final problem that remains is that projects aren’t infinitely long. There aren’t a million sprints to smooth things out. There will be variation in the outcome.

I guess what I’m saying is, estimates are estimates. Why systematize their creation when they will ultimately be wrong, and creating them is so costly? I could understand having a system if it were low-overhead, but my experience with a system similar to what you’re describing was anything but that.


I'm a bit surprised by the "validated by scientific studies" bit. If there's anything I got out of the likes of Kahneman (Noise), Hubbard (How To Measure Anything), and Tetlock (no particular reference) it's the complete opposite: If you estimate in t-shirt sizes people will come to false agreements. The variation in what people mean when they give the same t-shirt size is enormous. Even the same person estimating similar tasks weeks apart may give the same t-shirt size but when asked how to translate it to time will give wildly varying numbers.

The only thing that actually stands up to hindsight verification is estimating in as concrete units as possible – ideally calendar dates.

Of course, if you invoke statistical properties of past estimations to convert t-shirt sizes to calendar dates, you're sort of doing that already. But why do it behind people's backs? In my experience (and drawing from system theory) it is much more useful to allow people to receive the feedback of their successes and failures.


I'm so happy to see someone who has actually read some of these studies on here.

There are only 2 reasons to do the translation behind peoples' backs AFAIK:

1) Also from Kahneman et al. we know that humans are bad at estimating time, and much more consistent and accurate about measuring effort. And consistency is required for the law of large numbers to apply. By estimating in consistent non-time units you avoid those cognitive biases.

2) humans will very quickly start to unconsciously bias the units if they feel pressure. Speaking in terms of time puts one very close to a lot of pressure points for a lot of people.

Ideally engineers are concerned with doing things Right, and PMs are using past performance to predict how long that will take.


Customer: What will this project cost? You: Five XL T-shirts.

Excuse the snark, but a lot of business decisions are made based on how much Project X will cost. You likely get paid based on time, so it's quite natural to gravitate toward time units - any intermediary unit is just an abstraction.


Maybe you missed the part where I explain how PMs translate to time for business planning. Yes that is absolutely necessary.

And yes it is an abstraction, which is the whole point.

Human brains are extremely bad and inconsistent at estimating time, and much better at estimating effort. So you abstract and let them estimate in the terms they're better at. Not to mention, the abstraction means you average in unpredictable disruptions like illness, Windows updates, and other events. Abstracted unit estimates are demonstrated to be much more accurate than (non-abstracted) time estimates.


People sell projects for a fixed price? How does that work? You either overpay a ton or underpay, right?

And how would you even say what a project will cost? Projects change all the time, and there is no way to know the entire scope beforehand unless it's really really small. Otherwise you're just lying.

Genuinely how does this work?


But you do this in the non-software world all the time — if you leave your car in for a service, you'll get charged £X based on it taking them Y amount of time. Some cars will take them ±10% but it balances out in the end.

Or if you're getting building work done on your house, they'll caveat that it'll cost you £X, save for any unforeseen circumstances that arise (if you've got asbestos in your attic, they're not going to stick to the original price and just eat the cost).

In consulting, most clients want a fixed cost for a fixed scope, with any deviations coming as change requests and billed accordingly. Generally people are hesitant to agree to an agile model unless you've already got a lot of trust built up.


It's very touchy and all about the contract. The scope has to be very clearly defined as well as what an expansion of scope is and then what happens next. Basically, how the change order process works and what qualifies as requiring a change order. So, in a way, it's not literally fixed price. Or maybe "fixed price with an escape hatch" is a better term.

A sophisticated enough client can design a fixed-price deal that can completely devastate a software development firm if the guardrails are not in place. On the other hand, a sophisticated dev firm can design a fixed-price deal that will nickel and dime a client in perpetuity if the client doesn't recognize the trap.

Fixed-price is like handling dynamite, you have to know what you're doing.


Personally, I’ve never seen one go well though all of the projects have been relatively complex with no repeatability.

The vendor ends up losing money and the client gets a deal on paper but not in reality when you realize the corners that were cut to shove a product at the door.

You could generally go back and look at the emails on the project and see the exact point in time (often about 2/3 in) where the vendor realized they were losing money and started cutting whatever possible to not lose their shirt on the project.

I suppose that if projects were relatively straightforward like "we're building you a WordPress site with X number of templates" one may have a chance at estimating upfront - but I've never worked on projects with that predictability.


fixed cost projects either don’t change, or more likely have a very specific change procedure written in to them, so the customer knows changes aren’t free.

(of course changes are never free, but we all seem to want to brush that away, to the detriment of the customer.)

there are a lot of software projects out there where the customer isn’t chasing some shiny new toy. the folks who wrote the firmware for your microwave had a really good idea what the entire feature set was going to be before they started.


My previous manager and PM would scrutinise the time taken in Jira, then create a personal spreadsheet to roughly map points to hours/days (because senior stakeholders don't think in cups of coffee or t-shirt sizes), then refer to the spreadsheet privately during estimation. But he eventually let slip that it existed.

So we just bypassed the charade and started giving hours/days. The whole team was really just doing hours/days -> points mapping in their heads during estimation anyway.


> The whole team was really just doing hours/days -> points mapping in their heads during estimation anyway.

This is a common story. But because of the calculation step, your PM was still mapping estimated time units to actual time consumed, and using that mapping for predictions. So you were getting some of the benefits of an abstracted unit (accuracy, averaging in unexpected events) without the stress reduction or focus on quality over time.

But if it worked for your team, I would advocate to keep it. Human groups are hard enough to coordinate as it is, there's no need to hew to a dogma.


So your team does basically, let’s say, 10 gold stars a week.

Your epic and VeryNecessaryFeature(c) takes 30 gold stars.

Pointy hair boss: this takes 3 weeks. (and make it 2)

How is this different from skipping the proxy and just using time directly?


The answer is not having a PHB.

No non-unit-of-time estimation system will matter if there isn't buy-in from all levels. If the Directors/VPs/C-Suite etc view development in assembly line terms, then Xpoints = Ytime can never be avoided and yes, it just becomes an exercise in frustration.

To date I haven't found a good antidote other than avoiding workplaces that view estimations as exactimations of time.


Ah, so the best way to provide time estimates so that the rest of the business can do planning is to not work at a business that needs to do planning.


On a story by story basis saying points=time is an idea as silly as your deliberate misinterpretation of my post, yes.


The problem in your story is a non technical boss making time estimates/demands. If that's gonna happen, the developers shouldn't bother with estimates at all because no one is listening. They can just go on missing the boss's fantasyland deadlines and save themselves the trouble.

If the boss wants to know how long things will take, they have to ask the engineering teams. And the most accurate way for an engineering team to answer is with a track record of abstracted unit estimations and how long they took, to predict how long the abstracted unit estimation in question will take.

And when the PHB says "make it two", the engineers can only reply "ok, tell us what features or standards to cut." No different than wiring a house: we can finish faster if you choose some rooms to skip, or some safety standards to skip. But those are the boss's decisions, not the engineers'.

BTW I agree with the responder who says quit from those environments ASAP. I don't know how engineers can overcome a dysfunctional business lead like that... and that kind of business lead is often an expert at making sure the repercussions land on someone else. Case in point, by asking you to finish faster he is tacitly asking you to skip capabilities or safety checks... and that way you're responsible for the fallout.


It’s different because it can remove a bias from the estimations.


But eventually everyone knows that your gold stars stand for weeks. So how does this help?


Because you can keep tracking how long things actually take. And then divide by the gold stars, to see how much time you actually need for a gold star.

And that number might change over time and that’s okay, and you can keep tracking it.

It’s a bit of a mind trick.

Instead of thinking: “How long will this ticket take?” you think: “Is this about as complex as the 3-star tickets I did in the past, or as complex as the 5-star tickets?”

You estimate in terms of other tickets, not in terms of time.

It helps to remove bias.
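
A minimal sketch of that "track it and divide" bookkeeping, with invented ticket data (the star values and durations are made up for illustration):

    # Minimal sketch of tracking time per "gold star"; all numbers invented.
    completed = [
        {"stars": 3, "days_taken": 4.0},
        {"stars": 5, "days_taken": 7.5},
        {"stars": 3, "days_taken": 3.0},
        {"stars": 8, "days_taken": 13.0},
    ]

    total_stars = sum(t["stars"] for t in completed)
    total_days = sum(t["days_taken"] for t in completed)
    days_per_star = total_days / total_stars  # conversion rate, re-derived as data accumulates

    # A new ticket judged (by comparison with past tickets) to be 5 stars:
    print(f"~{days_per_star:.2f} days per star")
    print(f"5-star ticket forecast: ~{5 * days_per_star:.1f} days")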


The only quasi-interesting property this has is that it can change over time.

Nothing you said changes the main problem: gold stars are weeks and we all know it. Next year they are two weeks, fine.


No, they are not weeks, and the minute "we all know it" you are fucked.

Engineers are not concerned with time. They are concerned with feature completeness and quality. They do not care how long a task takes, only that it is done Right. When the PM asks how hard it is, that's their only consideration.

The PM only cares about time to make accurate predictions for the rest of the business. A given task takes as long as it takes to be done Right. If the business wants it on a different timeline, the business has to redefine the task, or redefine Right (ie skipping tests or doco).

Research bears it out: human brains suck at estimating time, and are much better as soon as you talk in terms of effort.

BUT if you want to insist on brute forcing through our cognitive biases, I have good news: you can do almost as well by tracking the relationship between "developer time" and "real time". You're still dealing in abstracted units, of course, but you can still pretend you aren't, which is the important thing.


I get the sentiment, but ultimately business is time-bound. Everything is about time. But now we have a bunch of “engineers” (that somehow are fundamentally incapable of structured planning and execution) that are “not concerned with time”.

I found your characterization of engineers and PMs not useful and a bit childish. This is not how money is made.

Let’s just agree I don’t understand this. You sound well-informed and I’m just a simpleton.


"You said it would take a week"

"No I said it was 12 jellybeans"

"but we all know that 12 jellybeans is a week!!"

"No, we all know that 12 jellybeans is explicitly not a week. It just happens that 12 jellybeans has taken 'around a week' in the past. An estimate of 12 jellybeans is still not a promise that it will only take a week."


then "ok, well i'm not paying you until what i asked for is in production".


Withholding salaries is a crime, so... I strongly doubt it.


This mapping falls apart when the individual devs have different levels of experience. A gold star is a different unit of time per individual; Dev A might be able to complete 1 gold star in 1 week, but Dev B with less experience takes 2 weeks.

The right thing to do isn't to estimate the average 1.5 weeks, since the case might not actually be done for another month, and it might end up actually being done by Dev C who just joined the team, for whom 1 gold star is currently 3 or 4 weeks.


A team has an average.


If you took all the winning lottery numbers you could probably find an average as well.


This is a nonsensical response. An absolute non sequitur


Ok, so how many 'difficulty points', t-shirt sizes, cups of coffee, gold stars, or whatever fit in a two-week sprint?

I hate pretending that we're estimating 'complexity'.


How many has your team completed on average in the past sprints?

It's not pretense. WTF should an engineer care about time? An engineer has to care about Done Right. It takes as long as it takes. A PM cares about time, and they can look at the past history of similar tasks to estimate it very accurately. The abstracted "effort" unit is a way to identify "similar tasks".

It's also very well demonstrated that human brains suck at estimating time. Estimating difficulty is way more consistent.

But if you hate thinking in terms of complexity you can do the same thing by making sure your tasks are evenly sized and averaging task throughput over time. Or, if you also hate your PM, you could tag similar tasks and have them do the legwork without the math to help.
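
A sketch of the "averaging task throughput over time" variant; the sprint totals and backlog size are invented:

    # Sketch: forecast an epic from historical throughput alone. Numbers invented.
    past_sprints = [21, 18, 25, 19, 22]             # points completed per past sprint
    velocity = sum(past_sprints) / len(past_sprints)

    backlog_points = 130                            # the epic being asked about
    sprints_needed = backlog_points / velocity
    print(f"velocity ~ {velocity:.1f} points/sprint; epic ~ {sprints_needed:.1f} sprints")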


> How many has your team completed on average in the past sprints?

In other words 'what is the average conversion from complexity to time'; in other words 'let us try to point time but call it complexity'.

> WTF should an engineer care about time? An engineer has to care about Done Right. It takes as long as it takes.

That's great, but then #points in a fixed-time sprint should be expected to be all over the place.


I've seen 3 major uses for estimations (besides the obvious forcing function to align on complexity/ideas among the team members):

1. Somebody really needs a time-based estimate for the project, e.g. the customer wants feature X by a certain time and we need to figure out if it's feasible or not in that timeframe, or there's a certain external event (expo, or any other kind of marketing event) with a fixed date and we need to see if we can make it. This is usually rare in traditional product companies.

2. We need to see if the project is worth it, in general. Maybe it's just too much effort to even consider.

3. We need to prioritize a couple of projects/opportunities. ICE/RICE comes into play, and we need to stack rank based on return on investment. Relative effort in meaningless units (t-shirts, cups, etc.) makes perfect sense here.


Estimating in something other than hours still needs to be converted to hours in order to be useful (for resource allocation, budgeting, etc). The only thing you can do with "points" is compare things. Not unimportant, but still not enough.

In "estimating in the small" such as for a development iteration one can often get away without a precise conversion to hours and dollars. But in deciding whether to do ProjectA or ProjectB each being 100+ man-year projects, or whether to reassign the new hire to the team of ProjectA or ProjectB, you still need hours. It's not going to help to say the project is six hundred thousand points, unless you can convert it to the real estimate which is "a 20-40 man team for 4-8 years"


> Estimating in something other than hours still needs to be converted for hours in order to be useful

Yes. Did you see the part of my comment where I explained how you do that?

The fact that this is the most accurate estimation method is well established in research. You can get close with some other methods, like tri-partite estimation, but they're more cumbersome.

WHY it helps so much to abstract away from direct time estimation has a few going theories. One obvious reason is that the abstraction, and tracking the relationship to real time, brings into play the Law of Large Numbers, the same mathematical law that allows casinos to profit on games of pure chance. The more sprints you have under your belt, the more your conversion rate averages in unexpected external influences like illness, Windows updates, upstream problems, etc., in proportion to their actual occurrence rates.

Another of the most popular theories is that human brains have specialized areas/heuristics for predicting time vs predicting effort, which come with different cognitive biases.

Fundamentally I can't give you a definitive answer to why, only that speaking in terms of difficulty with engineers, and converting to time to talk to business planning, is well demonstrated to greatly improve accuracy in controlled studies. In fact, even the best estimators, when predicting time directly, still miss by about 30% on a complex task. Using abstracted measures it is not unusual to see accuracy within 5% or 10% of actual.
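
A toy simulation of that averaging effect (not taken from the research mentioned; the sprint length, capacity, and disruption distribution are all invented):

    # Toy simulation only: one-sided per-sprint disruptions are noisy, but the
    # long-run points-to-time conversion rate stabilises as sprints accumulate.
    import random

    random.seed(0)
    SPRINT_DAYS = 10  # fixed-length sprint (assumption)

    def points_completed():
        capacity = 20
        lost = min(capacity - 1, random.expovariate(1 / 4))  # illness, upstream breakage, etc.
        return capacity - lost

    for n in (1, 4, 16, 64):
        rates = [SPRINT_DAYS / points_completed() for _ in range(n)]
        print(f"after {n:2d} sprints: mean observed rate ~ {sum(rates) / n:.2f} days/point")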


> The fact that this is the most accurate estimation method is well established in research. You can get close with some other methods, like tri-partite estimation, but they're more cumbersome.

You keep talking about the research, is there any chance you can provide some references? I'd like to review this research as what you say sounds plausible and I'd like to know more.


The law of large numbers doesn't help here because a) the error distribution is strongly one-sided (it being much, much easier for something to take twice as long than half as long) and b) there aren't enough projects, typically, for implementation teams to stay stable for long enough for the central limit theorem to compensate.

I wouldn't be at all surprised if the relationship between estimate and actual typically followed a power law relationship, in which case speaking about any averaging over time is meaningless anyway.


Great response.

Part of the point of estimating difficulty is exactly to reduce the one-sidedness evident in time estimations. Human brains are much more accurate and consistent with effort estimation than time. It's still very likely a beta distribution but for the level of accuracy we're talking about it's just fine.

We should probably clarify the "bar" here: the very best time estimators never beat about +30% delta from actual time. (Note that I'm not using +/- intervals because it's only ever +). So how many points (effort/time ratio values) does it take for convergence to get your effort/time ratio mean within 30% of the peak of the curve?


I don't think that 30% bar is commonly understood. From details in conversations I've had in the past where the reaction to (for instance) a 50% miss was "we can't ever be that bad at estimating again", there's a pervasive attitude that it's a fault in implementation teams that they're not accurate to the day. There's no middle ground: a team that can't give "good enough" estimates is held as incompetent, while the best anyone can ever do would be unacceptable.

What's interesting on that specific 50% miss I saw is that the original estimate from the actual teams themselves was argued down as "too big, do it again" by their own management...


I've heard it's lognormal, which doesn't follow the central limit theorem either.


I’m always trying to gather extra ammunition for when I’m arguing for points-based estimation methodology… do you (or anyone else) have any favourite scientific studies on this?


The reality (mine at least) is that every time I've been asked to estimate in t-shirt size or points, there is a helpful chart mapping those to units of time.


1. Have you done it before? If no, triple your estimate.

2. Do you know everything involved? If no, double your estimate.

3. Do you have someone that you will ask questions of or pair with to get through difficult sections quickly? If no, add half to the estimate.

4. Does your team like to nitpick pull requests? If yes, add half to the estimate.

5. Factor average weekly toil into the estimate. If you don't measure this, triple the estimate.

This may seem like a really big estimate. But measure the estimate against the outcome and tell me I'm wrong.
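
Read literally, the checklist composes into multipliers on a base estimate. A sketch, with an invented one-week base:

    # The checklist above as straight multipliers on a base estimate.
    # The factors come from the parent comment; the example inputs are invented.
    def padded_estimate(base_days, done_before, know_everything,
                        has_pairing_partner, nitpicky_reviews, toil_measured):
        estimate = base_days
        if not done_before:
            estimate *= 3      # rule 1
        if not know_everything:
            estimate *= 2      # rule 2
        if not has_pairing_partner:
            estimate *= 1.5    # rule 3
        if nitpicky_reviews:
            estimate *= 1.5    # rule 4
        if not toil_measured:
            estimate *= 3      # rule 5 (if toil IS measured, fold it in instead)
        return estimate

    # A naive "one week" for a first-time task, unknown scope, no pairing
    # partner, nitpicky reviews, and unmeasured toil:
    print(padded_estimate(5, False, False, False, True, False))  # 202.5 days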


All this while remembering that you will probably only get 4 hours a day to dedicate to the project. Between responding to emails and chats, paperwork, training, meetings, and context switching - your day will be half gone before you can even touch a project.


Estimating a project that doesn't have detailed scope and specifications is a bit like asking "how long is a piece of string". Asking for an estimate before knowing what it is you are building is canonical MBA widget production management. Numbers and dates unrelated to reality so the idiots with the power can put data into their spreadsheets and claim they're doing something.

Software has nearly 0 reproduction costs, so by definition you'll (mostly) only write software for something that has not been done before. How long will it take to quantify the unknown?


Sorry, I guess I wasn't clear; my suggestions are for when you already have scope and specifications. When you think you know what to do and how to do it, you can then add all the extra time to come to the "actual estimate".


What I do is break it into two projects. The first is a "discovery" project where you learn what you need to know to give an accurate estimate. You can usually get a discovery estimate in the ballpark by understanding how complex the ask is and how many people you're going to have to interview. Then you deliver the actual estimate along with all your evidence that will drive the scope definition.


No!

>>> someone at the top needs to see if the company can invest that much money and for how long, which requires some numeric value attached to a project.

Software is an enabler. It's still blowing up the world. Almost all new projects are "build a new capability we don't currently have". Can you fire lasers from the backs of sharks? No? Then do this shark-laser project and you can.

The actual cost is kinda irrelevant - either laser sharks are awesome for your business or not.

Cost control in software development is not an issue of estimation and project management - that's for ditch digging.

Cost control in software is about iterative development, early surfacing and engagement; it's about small teams well led and left alone. Want integration? Build an integration environment and a dashboard with tests on it - don't build a Gantt chart.

I think I needed to get something off my chest.


This is my rule: Start with the developer's estimate. Then for every management layer in the organization, multiply by 2.25 until you hit the c-suite.

This turns out to be surprisingly accurate when measured against reality. I'm guessing someone has already thought of it and I don't really have a name for it.
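
As a formula that's just estimate × 2.25^layers; a trivial sketch with invented numbers:

    # The parent's rule as a one-liner: developer estimate times 2.25 per
    # management layer between the team and the C-suite. Inputs are invented.
    def org_adjusted_estimate(dev_estimate_days, management_layers):
        return dev_estimate_days * (2.25 ** management_layers)

    print(org_adjusted_estimate(10, 3))  # 10 dev-days through 3 layers ~ 113.9 days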


My factor is 5. Works like a charm. It’s just not politically correct to say so.


This was my estimation process for construction add-ons (back when I did this between college courses). Take an estimate, double it and add 50%. I always regret when I don't take this approach.


This is nice, although there's a self-fulfilling prophecy element in this. You will postpone your project to fill up the declared time.


I noticed one important point mentioned in the banking app example, but not elaborated on:

Shrink critical scope.

By this I mean if you can produce a simpler version of whatever it is that you're doing and have separate tasks for expanding it, do it.

Your e-commerce solution can start off with just one payment provider. Your form validation can initially respond with just "nope!". You don't need that color picker shaped like a peacock in your MVP.

Especially if these things prevent you from delivering something that someone, anyone can use.

Also half of such things become irrelevant and ultimately get dropped.


I repeatedly see features being placed into a "phase 2" that appears approx 90% into a project as the deadline looms.

Half of those features eventually get implemented; the rest, it’s realised, are not needed at all.


That's the correct way of going about any software development. The problem, though, is the customer will likely be unwilling to agree to a contract that doesn't have at least 75% of the scope, because otherwise the product is "unusable". Their existing/previous product is usually better all the way until every single feature of it is available in the new system (I have not yet in my career seen a system that isn't replacing a previous system - at least if you count an excel sheet or a piece of paper as a "previous system").

What you end up with when shipping an MVP is that you convert only a minimum number of users and you end up with two systems. The "MVP" is thus often "whatever allows all existing users to switch", which is why projects snowball to be too big and too late. You won't realize what people used the old system for until you try to replace it. And it always turns out it has 2000 features when your discovery had found 20.


I wrote about this:

https://albertcory50.substack.com/p/how-to-finish-a-software...

The key thing to note is: It's not just big software projects; it's ALL big projects

The California Bullet Train is a poster child for this, as is the Big Dig in Boston, and (probably) your kitchen remodeling.

Once you accept this, you can learn from the way other disciplines handle it. Or fail to.


I work in consulting, where if your estimates are not right then you don't eat. One thing many people forget is that writing code is only a small part of the overall effort in software development. I feel like I could write a book on the subject, but I think the most common error is not understanding the scope of the overall effort from an idea to effective use by end users, and only estimating the typing-on-the-keyboard part.


The smaller the potential upside of the project is, the more the estimates matter.

Put another way: A project that promises to return USD 10 for every USD 1 invested is still an attractive investment when the ratio is 8:1 or 7:1. This suggests that construction costs can double or triple and the investment still has a healthy return, so an estimate really only needs to be very rough. (In fact, the act of estimating quickly adds to “construction” costs, so it should be kept to a bare minimum in these cases, to keep profits as high as possible.)

It’s when the cost/profit ratio slips lower that estimates become more important, and ironically, waste more time and money.

(Note that I’m talking specifically about estimation and not project planning… projects, especially large ones, still need plans.)


> A project that promises to return USD 10 for every USD 1 invested is still an attractive investment when the ratio is 8:1 or 7:1

What in the world experience have you or anybody else ever had where a software project can be estimated with that level of accuracy?


Great article on this topic: https://apenwarr.ca/log/20171213


Very good. I would also balance the OP article with Allen Holub's talks https://www.youtube.com/watch?v=QVBlnCTu9Ms


yeah this is really good - thank u


All I know is government contracts, but I've been pretty successful at estimations. Much like the top posts in this thread, none of it is for free. Most of the time requirements come in a few sentences, maybe a paragraph, in rare cases a document. There's no estimation until the requirements are fleshed out in some type of use case which has some bearing on the technical constraints of the system. (The client pays for this, and provides feedback.) The next step is planning, which is where the estimation comes from. (The client pays for this.) In terms of planning I strive for chunks of at most 16 hours, which helps accuracy. For estimations that I have low confidence in, I'll have conditional chunks that may be used. All of this is transparent to the client.

If the client asks for an estimation before the above steps it's known by everyone that it has no official meaning, because everyone is aware of the above process.


> The "one size fits all" approach: Assuming that every task will take the same amount of time is a recipe for disaster. Different tasks have different complexities and require different skill sets.

I sometimes wish we lived in a world where we'd have enough data to accurately predict at least the common tasks, thus freeing people up to reason about the more complex tasks.

For example: adding a new path to a webapp, which will resolve to a Ruby on Rails template, that needs 2 dropdowns and 3 input fields, which need input validations and checks against already saved DB data on saving will take around X hours of work. Doing this with Python and Django will take around Y hours of work. Using a certain set of tools to assist with development will affect these estimates by Z hours.

But I doubt anyone ever has data that granular.


I think once we get to the point where the design is so fixed for a set of standardised tasks like that, that we know how long it will take, the design is also so fixed that we can run a script that generates the code for it, so the time to make it is close to zero.


> I think once we get to the point where the design is so fixed for a set of standardised tasks like that, that we know how long it will take, the design is also so fixed that we can run a script that generates the code for it, so the time to make it is close to zero.

For all of the boilerplate stuff, I fully welcome and embrace it. For example, we already have codegen for various stacks, even Ruby on Rails: https://guides.rubyonrails.org/generators.html

Sadly, model driven development never really took off, because seemingly everyone was interested in being able to iterate and ship new features in frameworks quickly, as opposed to slowing down and working on different ways to interact with what's already there.

SOAP had WSDL, but REST needed a whole bunch of time until we got OpenAPI and levels of codegen that SoapUI had years ago: https://github.com/OpenAPITools/openapi-generator

MySQL Workbench also has great bi-directional ER diagram support, with forward/backward engineering and schema sync: https://dev.mysql.com/doc/workbench/en/wb-design-engineering...

But maybe I'm asking for too much, generating templates for views with model fields, or even model bindings for the ORM is already good, as is generating DB migrations as well.


Dr. Barry Boehm (USC) basically invented the rational approach to software estimation. Every software engineer responsible for estimating used his COCOMO techniques (or similar) through the early 2000s.

Most of his work still applies:

https://www.gristprojectmanagement.us/software-2/software-ef...

https://en.wikipedia.org/wiki/Barry_Boehm

You can buy his books from online bookstores.


COCOMO is an impressive project and I wish someone would update it with modern projects to see if/how things have changed.

But the main rule of thumb I brought from COCOMO was that it takes 5 person-days to get 100 lines of code from idea to production. That has been remarkably accurate in my experience. I've witnessed high-performing teams do 125 lines in 5 days, and low-performing teams doing 60 lines in 5 days, but it's still well in the ballpark.
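
For reference, that rule of thumb as arithmetic, with an invented feature size and team size (this is not the full COCOMO model):

    # The rule of thumb above as arithmetic: roughly 5 person-days per
    # 100 lines of code, idea to production. Inputs below are invented.
    def person_days_for(loc, days_per_100_loc=5.0):
        return loc / 100 * days_per_100_loc

    feature_loc = 2_500
    team_size = 4
    days = person_days_for(feature_loc)
    # Naively dividing by team size ignores coordination overhead.
    print(f"{days:.0f} person-days, ~{days / team_size:.0f} working days for {team_size} devs")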


Well, maybe the new “GPT” tools will help us to do that! Sounds like a good project for a masters thesis.


These are good basics, but will still underestimate large projects.

For a padding rule, use: estimate^2

Because T tasks times N estimation errors/delays is an exponential growth in project time complexity.

For statistics and data, see here: https://erikbern.com/2019/04/15/why-software-projects-take-l...
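
Applied literally, with invented example values:

    # The padding rule above taken at face value: square the raw estimate,
    # in whatever unit the raw estimate was given. Example values invented.
    for raw_weeks in (1, 2, 4, 8):
        print(f"raw estimate {raw_weeks} weeks -> padded {raw_weeks ** 2} weeks")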


I like the Shape Up approach. I'm currently the solo dev at a company and I have taken Shape Up and cut it down to the bone. When I'm asked about how long something will take I say "give me x days and we'll see how far I can get". Often one or two weeks. This is honest and understandable.


Thank you for sharing your knowledge. Talking about estimations is always complex.


We created a small webapp that works pretty well for us:

https://calestimate.com

(No login needed to try it out)


I liked this article. Practical, not full of bs, not full of bias, some good tips.

Might check out his other articles now.


Wow. A solid quality article. Great info. Well done!


All you need to know about estimates is Hofstadter’s law.



