I object to the notion that it’s ok for bug ingress rate to be higher than bug egress. For me that’s symptomatic of underlying problems. Either (A) those bugs are on important features, and the team is favoring novelty over functionality by prioritizing new feature dev over feature maintenance, or (B) the broken features are unimportant and the team is failing to weed out irrelevant functionality from their codebase (it is important to remove features as you add them, so you don't end up in a zero-progress situation, unless you grow the team along with the codebase), or (C) the team has bad engineering practices causing a high ingress rate, or (D) many bugs are based on misunderstandings, which points to documentation or UX issues. No matter how you slice it, I see it as never acceptable to let a bug pile grow indefinitely.
Do you see this as a sort of compromise (it does point to deeper problems, but a workaround is needed in the real world), or do you disagree that a growing bug pile is symptomatic of deeper problems?
First, if story points are an indirect measure of time, then the "psychological game" you're playing will be immediately revealed if your engineers are as smart as claimed. There is no reason for me to point something a 2 over a 3 unless you're measuring the time it takes to deliver software based off those measurements. On the opposite end of the spectrum, my confidence in whether a story is an 8 or a 13 becomes significantly weaker as the numbers get bigger.
Using numbers, specifically, is a hint that whoever is handling process just wants to predict when the project will be done. Numbers trick you into thinking they can be added, but margins of error are not additive. My go-to question in these situations is: why don't we just estimate with abstract sizes (Small, Medium, Large, etc.)? Surprisingly, I'm often met with resistance.
Second, if story points are not an indirect measure of time, then why are you pointing stories? What does the pointing gain your team other than fluff? If you say it's for prioritization then you're just invalidating the premise and we're back to an indirect measure of time.
Finally, I have not seen, and you have not presented, evidence that engineers are good at estimating. In fact all that I have read seems to indicate the exact opposite, that engineers are very bad at estimating (in fact that they are largely too optimistic). One could argue that this can be trained, or that you'll get better at estimating as you gain more experience. I will concede that you get faster and better at estimating and implementing the same exact feature, but that is not what we do. We implement new features, things we likely haven't done before.
Adding a level of indirection removes some implicit biases. Points might be indirectly related to time, but only incidentally. Points sound more directly related to task complexity, which is itself indirectly related to time.
It's kind of like how lines of code seems like a poor measure of program complexity, and yet study after study has shown that regardless of language, framework, etc., the number of lines of code is the best measure we have for latent bug count. This correlation doesn't always make sense if you look at specific, contrived examples, but the property seems to hold in aggregate.
The points can (and should) be assigned before the work arrives. Agile planning is a little more nuanced. There's business valuation (points from business development) which then becomes stories that are estimated during/over another sprint. The points don't correlate directly to hours because at estimation time you don't yet know what resources will be on the sprint. That is the responsibility of the Scrum master to handle. If people are vacationing, sick, replaced, or there are new hires, you put a variable time-value on the points. Most importantly, you put a few stories in the current sprint off the top of the queue and pull more as time allows. Telling the engineers that "all these stories must be done by the end of the sprint" negates the whole process.
> Finally, I have not seen, and you have not presented, evidence that engineers are good at estimating.
In general, nobody else is capable of coming close. That being said, I've blown up estimates to maximum by inquiring about specific details (what file, function, library, what repo, do you have creds for that, how long are the code reviews, etc.) in a majority of tasks, because some engineers are good at estimating specific implementations; otherwise they concede on identified complexity/meta-process. Small, Medium, Large seems like the correct approach.
We could do this with hours except hours actually translate directly to time. From them I can determine when you think I should be done and thus the implied goal; points I can't. Because of that, I have incentive to overestimate when it comes to hours, to give myself extra time 'just in case'. With points I don't; they don't imply an end time. Even if the PO's estimates are wrong, they can't get onto me, because I never had a due date.

Because of that, my estimates tend to be consistent (even if inaccurate), and consistency = predictability. That's the goal, making it predictable when things will be finished (within a given tolerance; hours never give us that).

Whether points are big, or small, the PO can determine "This team has an average velocity of 20 points, whatever size those points may be. That means I can expect around 20 points per sprint in terms of stories. I have 50 points I need to get done for this next release, that means I can expect it in three more sprints". All done without the engineers ever having a goal they are trying to make, no incentive to change their estimates, and literally the best predictability (not accuracy; you don't need accuracy) you can achieve, which, it turns out, is actually pretty damn good in practice.
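That last bit of arithmetic is trivial to sketch. A minimal example (the velocity history here is invented; only the 20-point average and 50-point backlog come from the example above):

```python
import math

def sprints_needed(backlog_points, past_velocities):
    """How many sprints until the backlog is burned down,
    assuming future velocity matches the historical average."""
    avg_velocity = sum(past_velocities) / len(past_velocities)
    return math.ceil(backlog_points / avg_velocity)

# 50 points to deliver, team averaging 20 points/sprint.
print(sprints_needed(50, [18, 22, 21, 19]))  # -> 3
```

Note it doesn't matter how big a "point" actually is; only the ratio of backlog to velocity matters, which is the whole argument.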
I think you're missing the fact that points only matter relative to each other. There is no absolute meaning. The author hints at that but I don't think he's explicit about it, assuming you already know it. As such, a team tends to be consistent with them, even if not objectively accurate. And their mistakes average out into something predictable, even if not accurate. Because there is no incentive, conscious or unconscious, to massage it.
Some people -do- decide to use t-shirt sizes for sizing, rather than points. It still works. This is generally equivalent to 3, 5, and 8 using points (fibonacci). Most people who use points say anything less than a 3 is wrong (make it a three), and anything larger than an 8 should be broken up because otherwise it's difficult to estimate (with -maybe- a 13). So use whichever you like. S, M, L, and maybe XL, if you want to map to four options, as you have with points. Though you need a way to aggregate them to determine a velocity; how many S = 1M, etc. That's why people tend to use points.
Basically, the author said, flat out, it's a psychology game. AND IT IS. What a PO needs from a dev is consistency in how they estimate, not accuracy. From that you can measure the actual work completed over time, and get a measure of velocity, which can be used to accurately predict the delivery of future stories, within a pretty good tolerance. Ensure the psychology for that consistency is there. Points are part of it. No goals, milestones, deadlines, etc, are part of it.
My point is that no story points need to exist. If you want to average the amount of work done over a period of time then just count bugs and stories completed. The Central Limit Theorem applies equally well to stories over the long term, so just collect data. Automatic.
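The averaging-out effect is easy to simulate. This sketch uses made-up, skewed per-story costs (the exact distribution and numbers are arbitrary); the point is that throughput per fixed period clusters around a stable mean even without any estimation:

```python
import random

random.seed(0)

# Hypothetical skewed story costs in days; individual stories vary a lot.
story_costs = [random.lognormvariate(1.0, 0.5) for _ in range(400)]

def stories_per_period(costs, period_days=10):
    """Count how many stories land in each consecutive fixed-length period."""
    counts, used, n = [], 0.0, 0
    for c in costs:
        if n and used + c > period_days:
            counts.append(n)
            used, n = 0.0, 0
        used += c
        n += 1
    return counts

counts = stories_per_period(story_costs)
avg = sum(counts) / len(counts)
# Per-period throughput hovers near its mean even though
# individual story costs are all over the place.
print(round(avg, 1))
```

With enough completed stories, "stories per period" becomes the velocity measure, no pointing meetings required.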
We add story points, presumably because we don't want to wait to gather enough data on stories, but the neutral position is to not use them because they incur a cost (meetings, training to get consistency right, etc). So why have them? I have not seen a convincing argument.
We add story points because given consistent incentives, estimates tend to be consistent. Maybe consistently under or consistently over, and obviously there's a bit of wiggle room from estimate to estimate, but they tend to converge quite quickly, and to within a week or two's uncertainty of what we'll have done at any given point within the next six months, and within 6 weeks within the next year. Quite a far cry from not using them.
Meetings? Some, sure; you'd have them anyway just defining what it is you're doing, the additional burden to assign points takes up maybe half an hour per person per week or two. Training to get consistency right? There's no training involved. The hardest thing is to get people accustomed to picking a number, relative to the others. But that's not that hard either; we basically just took the first sprint's stories, organized them into a line (much like his rectangles) going from least to most complex, then discussed where to draw three vertical lines, separating them into four distinct sizes, 3s, 5s, 8s, and 13s. We then made sure the largest didn't feel too large compared with the other 13s (or else it might actually be a 21, just compared to what we had agreed a 13 was, and so we had to split it up), and from there we always had 'reference stories' to decide whether it felt more like a 5 or an 8, say. And while we sometimes differed, we could always hash out why we differed and come to an agreement on exactly how large the story was. Again, per the OP, so long as you are consistent with how you address those discrepancies, your overall estimates will be consistent, and give you predictability.
One half hour per week or two is hardly a huge cost to pay when it gives the business the ability to accurately predict when we'll have something delivered.
I don't care if the argument is convincing. I've seen it work. You're free to do whatever you want; I know what I've seen work, and what I've seen not work. I've yet to find something that works so well.
If you just want to throw out anecdotes I'll give you my own. I collected data on all the pointed stories at one previous job for three years and found a negative correlation between story point and time from a story being started to being completed.
Sure, you might claim that it all averages out in the end, except that our velocities were wildly in flux for that entire span as well. But we were just "doing it wrong" right?
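For what it's worth, the check itself is simple to reproduce. This computes a plain Pearson correlation over (points, days-to-complete) pairs; the data here is invented purely to show the shape of a negative result, it is not my actual dataset:

```python
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical pairs: bigger-pointed stories finishing faster.
points = [3, 3, 5, 5, 8, 8, 13, 13]
days   = [9, 11, 8, 7, 5, 6, 4, 3]
print(round(pearson(points, days), 2))  # -> -0.93
```

If points meant anything about size, you'd expect this number to come out solidly positive.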
I doubt that you did, if I understand you, because it sounds like you're saying that overall the 3-point stories took the most amount of time, and the 13s (or whatever your max was) took the least.
But let's say it did. I'd look to see why your estimates fluctuated all over the place. Or whether stories were being closed when they were actually finished (i.e., being accurately reported). Did you have deadlines? Did you have delivery pressures? Because that right there is a good reason; as you near a deadline you start padding estimates more. Did you keep having things come up that broke the sprint? Etc. All manner of things can cause estimates to be wrong. But not negatively correlated, -especially- with velocities constantly in flux (and I mean seriously in flux; you take an average because it can and will vary, especially if there's unexpected stuff, like someone getting sick, that you didn't account for when planning); that to me, yes, definitely sounds like you were doing something very, very wrong.
However, it’s hard to make long-range estimates using only many tiny stories/bugs, because you don’t want to break the job down with such granularity for months or years into the future - plans will change by then, and all that design work will have been wasted. That’s what makes big stories useful; you can estimate months of work in a few minutes. But because they’re so big, you can’t treat them all as equal sized.
The problem is not that they're engineers; no one is good at estimating work they've never done before. This is a well-known problem in pretty much every single software shop I've ever been in. Teams never deliver what was planned on time; only functional teams cut features for releases. This is not a "win" for estimation.
Because you keep focusing on time estimates instead of point estimates. People have intrinsic biases related to time and their productivity. Like how most people implicitly assume they're above average in intelligence, looks, etc.
I've played around (in spreadsheets and basic apps) with trying to create systems that scaled available slots to team size as a way to force correct granularity.
When using bug trackers, I find the most frustrating aspect is the "non-linearity" of the workflow. By that I mean, how do I answer the question "What am I supposed to do next?". You can sort by project, or by priority, but what I typically end up with is a list of items that I already looked at a dozen times. And even though I don't want to look at them a second time, I haven't found a way for a bug tracker to do that for me. Ideally, I would want to look at each task at most two times: Once for triaging, and once for working on it. That's it.
So the way I understand it, you're trying to address this, at least partially. A task starts out as untriaged, then you tag it as triaged, and that means you only had to look at it once for triaging. Which is great, because it's a linear workflow.
Some tasks are obviously critical, and will end up in the next open release milestone. Some belong to a feature that is not released yet. But what about the stuff that ends up in the backlog? These smallish, nagging bugs that are not super-critical. That is the big, ugly pile that keeps growing and growing. How do you keep that big pile manageable? Ideally, that pile shouldn't become big in the first place, but how do you prevent that from happening?
Another benefit of sub-categorizing this way is that it makes it easier to resolve bugs as duplicates. When a new bug is filed, it's hard to see that it's a duplicate when you're comparing it against 10000 other bugs, but it's easier when you're comparing it against only 100 other bugs in the same category.
I doubt you'll ever be able to get it down to "at most twice." But it needs to be much easier than resorting to "looking through the whole list."
It's always possible to make it shorter to help communicate a main idea. (However, it does take extra effort to extract the essence of a long piece.) Your essay is ~15,000 words and desperately needs a short "elevator pitch" of its main points. I'm a very verbose writer, so when I think others' writing is verbose, it means everybody is going to drown in the text.
Here's my summary of what I think you're trying to say:
1: There are psychological problems with deadlines and large bug lists that cause counterproductive results
2: I discuss 2 psychological "tricks" in software development to counteract the unwanted behavior
2.a: Estimate software by abstract units such as "points" instead of concrete units such as "time/hours/weeks". The abstract units bypass the human biases that lead to bad estimates. Use the points to determine "relative" sizes of each "story". (E.g. Developers vote to converge on the "size" of each story point.) The last step is to multiply the points by a unit of time to derive a finish date.
2.b: Do not have a big global list of bugs to burn down. The size would be overwhelming and demoralizing to teams. Instead, triage bugs into smaller "hot lists" so they "see" a smaller manageable queue to work on. Also, measuring bug fix times will eventually let you derive an "average bug size" that's reasonably accurate
The tldr would be something like "Here are 2 counterproductive management techniques with setting deadlines and assigning bug fix work -- and here are 2 ways to counteract it with management ideas that take advantage of human psychology."
Somebody else can wordsmith it better than I can but that's what I think your essay is basically about. The 15000 words are mostly examples or background ideas leading up to your recommendations (SLO vs SLA, Tesla, what I like don't like about Agile, Kanban, etc).
I recommend that you put your strongest main points at the very top to give your readers the mental scaffolding to hang the rest of your 15000 words on.
"I would have written a shorter letter, but I did not have the time." -- Blaise Pascal: https://en.wikiquote.org/wiki/Blaise_Pascal
This is also why I didn't summarize everything at the top: that would encourage people to just read the top and stop there. They can do that with project management advice anywhere on the Internet. There's a place for that, but there's plenty of it already.
At least it's shorter than a book. :)
The value for the reader is in the act of chewing over a familiar problem along with your guidance. If the main points were made more obvious, then perhaps it also becomes more boring.
If you want to propose a more tangible recommendation/guide/process then yeah, give people hooky, easy to remember bits.
There's an opposite way to look at it: a good summary acts as a "hook" and entices readers to read the rest of 15000 words. I wasn't suggesting you delete the extra details. Instead, the bullet points at the top give the reader a "road map" to the rest of the long article.
>They can do that with project management advice anywhere on the Internet.
Well, you said the other articles out there are contradictory ("doesn't work and makes things worse") so there's your hook: you have a superior method.
If you prefer to write in a style that "unfolds" that's understandable. A writer can hold an opinion on the best way to present his ideas.
That said, I'll offer some counterpoint. A web surfer may have 20 browser tabs open as a "todo list" of unread blogs. The email inbox has a bunch of unread messages. There's also a stack of new candidate resumes he's supposed to read. That random person then clicks on your blog and sees the shaded rectangle in the scroll bar get real tiny which visually indicates it's a very long piece of text. 15000 words is ~1 hour of reading.
Since you're not a household name among famous authors, a lot of people just won't start reading it on faith alone.
They don't trust you enough yet that it will eventually unfold with an amazing insight. Instead, many busy people will just ignore it because there are so many other items competing for their attention. In particular, the project managers and business executives you want to reach and internalize your recommendations are especially prone to skip long articles. One hour articles are really making a huge demand from multitasking managers so they need a nudge to see if it's worth their time.
There's a glut of information overload out there and long articles can act as "RADIOACTIVE - DO NOT ENTER" signs to the people you most want to convince of your ideas.
I think the author is partly right that a breakdown might be more harmful because you lose too much information. HN and reddit comments are how I decide whether something is worth my time, so a summary isn't necessarily beneficial.
We're now in the process of switching to a structure similar to Basecamp's 6-week cycles. And those cycles obviously do have a deadline. However, I would still say this kind of deadline is better than your typical one for a couple of reasons:
1. The team is self-managing. Typically the team was involved in the pitch process for the project, so they have vested interest in getting this done. I think this is key to avoiding the Student problem. It's no longer an assignment dropped from above, but something you are keen to push forward.
2. The cycle is 6 weeks rather than a sprint of 2... So this feels more like a slower-pace mini-marathon. And the team has autonomy to drop features or make adjustments. I admit that's a weaker argument for it than the one above.
I wonder what's your take on this?
In your view, with the parts you crossed out (including the psychological/motivational structures), would Agile still be widely applicable outside of software-oriented projects?
A couple of hypothetical examples, perhaps for non-software projects:
-- developing & submitting scientific grant application
-- organizing a non-trivial longitudinal survey
-- looking for college for kids
-- designing a motorcycle with unique frame/engine layout
-- planning and shooting a movie
Story points - there is a lot in theory I like about these and you hit on those points. But in practice where my teams have struggled is still in the definition of a point. People understand the relative sizing concept but they still want a definition of what one point means and that invariably winds up being some kind of a time unit, which means all points get thought of in time units. What techniques have you used to solve this problem?
Defects - some good points were made but you never really discussed how these are managed as work. Stories and defects are getting worked on at the same time, but how exactly? Are defects just lumped in with stories and the PM prioritizes them? I do not think so, but you are not clear in this area. If a PM prioritizes a story but the team spends all of its time resolving defects then how is value being delivered as expected? You seem to have left this out completely, or I missed it.
Finally, also on defects, while your points on not estimating them makes some sense, someone has to decide what to work on, and generally you need some idea how long things will take to make that decision. Even in your wording, you acknowledge some defects take a long time to fix, where as others are super-quick and in the end they all average out. Still, someone in the organization still cares about dates and delivery. It made a lot of sense to me where you indicated that the story point estimate can be valuable to the PM in terms of prioritizing. When they see a story with a high estimate that might cause them to lower its priority compared to other stories they can get delivered quicker. But seemingly defect fixing would have to factor into this somewhere too, and then wouldn't the same concept apply? If defects are not estimated, how can the time it will take to resolve the defect inform the decision making process?
Story point size: the usual thing to do is to have "baseline tasks" (which the whole / almost the whole team did in the past) that you choose way back at the beginning, and then continue to use them as reference whenever estimating in the future. To do this, a couple of people who know the "approximate size" of a point estimate some, say, 2-point and 5-point past tasks, but don't tell anyone else how big a point was. And after that, they try to forget how big a point was during the baselining process. But you never, ever let people ask about time; you just say "was it bigger or smaller than the baseline 5-point task"?
Defects: in the model as it's being discussed, we assume (perhaps too optimistically) that we generally fix bugs before adding new features. This is why, in the second simulation slide, each subsequent feature takes longer than the last. Eventually, you cannot sustain this method if you still want to launch new features and your team size hasn't grown and you haven't contained the number of new bugs somehow; I can't tell you what to do when that happens. It's hard.
If you have mostly small bugs (which is common; just fix them) and a few large bugs (also common), then if the bugs are important, they can probably be described by writing a story. At that point you can elevate them to the estimation and PM prioritization process.
Beware, however, that in general, if a bug is introduced by adding a new feature, you should almost always fix it before launching that feature. Otherwise you have basically lied about how long it took to implement that feature, and as that gets worse over time, it progressively upsets your estimates.
Rather, I like to use a discrete list of story point values as scalar values that represent a probability distribution for how long the task might take. As the story point gets bigger, not only does the mean time get bigger, but the variance grows as well.
For example, a 1 is 2-4 hours, but a 13 is 2-3 weeks, and a 40 is 1-2 months. The idea is that not only do more complex tasks take more time, but the precision of our estimates goes down.
This makes engineers happy because they get to be more honest about estimates, and it makes managers happy because they only have to deal with one number.
 Unit-less measures might be easier in an environment that is more concerned with true productivity than with deadlines, but that is not most environments.
1 point -> 2-4 hours
2 points -> 4-8 hours (1 day or less)
3 points -> 1-2 days
5 points -> 3-5 days (up to one week)
8 points -> 5-9 days (up to one two-week sprint)
13 points -> 2-3 weeks (1-1.5 sprints)
20 points -> 3-4 weeks (up to two sprints)
40 points -> 1-2 months
Anything beyond 40 needs to be aggressively analyzed and broken down into steps, even if those steps are not deliverable features per se.
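One way to make the "scalar standing in for a distribution" idea concrete is to tabulate the ranges and look at how wide they get. This is just the table above restated in hours (assuming 8-hour days and 40-hour weeks; the exact figures are illustrative, not canonical):

```python
# point -> (min_hours, max_hours), per the mapping above.
POINT_TO_HOURS = {
    1:  (2, 4),
    2:  (4, 8),
    3:  (8, 16),     # 1-2 days
    5:  (24, 40),    # 3-5 days
    8:  (40, 72),    # 5-9 days
    13: (80, 120),   # 2-3 weeks
    20: (120, 160),  # 3-4 weeks
    40: (160, 320),  # 1-2 months
}

def spread(points):
    """Width of the estimate's range in hours."""
    lo, hi = POINT_TO_HOURS[points]
    return hi - lo

# Bigger stories carry wider (less precise) estimates.
print(spread(1), spread(13), spread(40))  # -> 2 40 160
```

The widening spread is exactly the "variance grows with the mean" property: a 40-pointer's range is two orders of magnitude wider than a 1-pointer's.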
For a two-week sprint, you have to account for at least one day each sprint for demos, retrospective, and planning. Thus, you have at most 9 days for implementation and testing.
That would be an ETA, which is quite useful in many cases.
> Or telling salespeople they need to sell 10% more next quarter.
That's a minimum quota. Also useful.
> Or telling school teachers they need to improve their standardized test scores.
Which is not so much a 'goal' as it is a response to the need for kids to actually learn information. It's a goal in the sense that "you should be doing your job to a minimum degree of proficiency" is a goal.
> Or telling engineers they need to launch their project at Whatever Conference (and not a day sooner or a day later).
Which is absolutely an arbitrary goal that doesn't have anything to do with the product, but is also good business sense.
Sometimes you need stupid goals.
What ends up happening is that people will cheat the system so they can hit the target you enforced, while sacrificing some other critical thing you forgot to enforce. In the educational system, for example, what happens is "teach to the test," which causes improved standardized test scores at the expense of actually learning things. And, of course, you get schools that outright cheat when scoring the tests: https://www.washingtonpost.com/news/answer-sheet/wp/2015/04/...
Re: ETAs, those are predictions, not goals. Predictions are good. Most of the article is about the difference between the two.
You will have to increase the educational system's budget. You will have to raise taxes. You will have to build more classrooms. You will have to hire more teachers. You will even have to feed the damned kids, and improve their home life.
There is no "method" that avoids having to do these things. These are the problems you have to solve to reach the intended goal. How you go about solving these problems does not matter.
Use whatever method you want. Sweet talk them. Bribe them. Hire a guy named Vito with a baseball bat. The method doesn't matter. Just solve the problems.
A manager that just says "teachers will be evaluated based on their students' standardized test scores" will get nothing useful, because they did not provide a method, and the teachers aren't empowered to solve their problems in a productive way.
There is no method for a teacher to solve those problems. It's a systemic issue.
I found that the more experienced/older I am, the less this happens. We even finished multiple such tasks sooner than was required; of course, a contributing factor was that the deadlines were sane. I somewhat had this tendency when I was younger. Experience makes you better at managing time risks. It is actually one issue I have with cookie-cutter agile: everyone is so micromanaged by the system that they don't get the experience needed to learn the above.
There are also contributing cultural factors. In many companies, people who stayed late last week tend to be praised and rewarded over those who worked at a predictable pace and kept an eye on the deadline from day one (not in my current team). Incentives matter, and finishing tasks at the last moment at the cost of evenings/weekends makes you a hero.
I personally know people, including leads of agile teams, who believe that a last-moment desperate effort is somehow necessary, as if you are lazy if you don't make one. That means there will be a last-moment effort, whether it was needed or not. So if the team is going to be on time, additional work is found to be done, or, more often, possible logical steps to make the deadline are not taken (a feature the customer explicitly said is not necessary is done anyway, etc.). I wish this were a joke or exaggeration, but I have seen it multiple times already. While the details were different each time, it really did boil down to techies essentially organizing that desperate effort for themselves.
It does not matter what the process is, how exactly you estimate etc. As long as not doing the responsible thing is rewarded, people will do that.
I've only read half of it, but I liked it a lot. What about you? Did you like it? Why?
> it was [possible] the author probably would have just written a single paragraph in the first place and saved a whole lot of time for everyone.
This does not fit with my experience, even disregarding things like YouTube videos which drag on in order to justify longer ads.
Writing concisely is hard, and often much more time consuming than a lengthy brain dump. It's also tempting to elaborate on every point, however tangential. I have to constantly fight this in my own writing; my desire is to be complete, but really I'm drifting off-topic, diluting my point with irrelevance, making it harder to follow.
Based on the comments above, it seems like the tl;dr missed important nuance, but that isn't always so.
Obviously, a summary can't be an explanation + an argument + history. It has to be the most salient points presented alone. Think about an abstract for a paper: it should tell you enough that you can understand if you should read the paper now, and enough context so that you can remember to read it later if your situation changes.
If you feel like the tl;dr doesn't pay full respects to the article, that's fine but I feel like not many readers will mind.
edit: also thanks for writing that article, it's great.
I can't understand this. You can measure time, but you can't measure effort directly, unless maybe you're doing some physical work like pulling weights. Also, you can't usually increase mental effort, except by spending more time (e.g. by working overtime).
I can only see points or other non-time measures as proxies to time intervals (with uncertainty), but we can't know the function points -> time beforehand, we can only measure and map it from experience.
If you could explain "effort" from some other angle, I'd be grateful.
My answer is that you can plan for work to be done on a deadline but you can't negotiate when it will be done with the people doing the work. It's going to take as long as it takes. It's poor management that sets unrealistic expectations and demands results.
You can plan for work by using data. You get data by tracking effort estimated versus completed and triaging bugs. You prioritize goals instead of setting deadlines. You measure and refine.
It sounds counter-intuitive to business people who think in terms of, I just sold customer X the product and they need it delivered by Y so that we can get the team paid by Z. This is where poor management decisions can sink your team. If Y is decided by the sales or management team with the customer and they didn't consult their engineering team... then they're working on another planet. The goal of processes like this are not to eliminate Y but to set reasonable expectations and objectives.
As I like to remind my business owners: you can have something that works -- it may not be the whole kit -- or you can have nothing at all. Winning is about prioritizing objectives.
One book I've read recently that taught me a lot about management is Extreme Ownership. I think there's a lot of cross-over from this book into Agile methodologies that I think non-technical stakeholders can really understand.
If I'm interpreting that fragment right, your essay treats programmers' development time as a fixed rate of progress (the "9 women can't have a baby in 1 month" meme), so one answer to meeting a deadline is to remove features until the deadline can be met.