Hacker News
Premortems will keep your code alive (cobaltrobotics.com)
67 points by eschluntz 7 days ago | 13 comments

I like this idea and this framing of it. A way I've put it in the past is that a postmortem is only really useful if you thought that you were already doing everything you could to prevent an issue of this type. In 95% of incidents I've experienced in my career, this is just not the case. There was a known lack of testing, the deadlines were too tight and we decided to push through and take shortcuts to launch anyway, there was known tech debt where everyone on the team was scared of this code but it was too much work to address it so it kept getting put off, etc.

Often every person on the team could rattle off 5 major problems off the top of their head. That's it, that's your postmortem right there, let's go and prioritize solving these underlying problems. But engineering orgs get obsessed with establishing a sacred heavy-handed process around postmortems. There must be a reviewer, and a chairperson, and a standardized form that takes an hour or two to fill out, and several rounds of revisions. This gives you a nice shiny metric to point to where you can say that 100% of incidents were followed up with a postmortem. If any big-picture questions come up during the postmortem, well, that's not the kind of thing you can just make a Jira ticket for, e.g. "don't make the deadline so short next time". So we'll move right past that one and then make one or two actionable Jira tickets like adding another view to the metrics dashboard which overfits on this particular issue while doing nothing to address the underlying problems, and then get back to work taking shortcuts and building up more tech debt on a whole new set of features.

A premortem may end up with the same sort of perverse incentives, but it's an interesting thing to try. If it's something the org commits to actually putting aside time for ahead of every deadline, I could even see it potentially helping a little bit to address some of the underlying problems.

It depends how accurately you can predict failures.

Take SpaceX's early days: the actual cause of failure #2 wasn't on anyone's top 10 list. Doing a postmortem has value; it lets you see whether the shortcuts you take make sense.

Sounds like you're tired of working at a place the size of your current job and yearn for some place smaller. If you can't make a jira ticket to discuss the unreasonableness of timelines, and how to address the most egregious lapses in judgement by management in planning and deadlines, what's Jira even for? Make your manager do some work! Otherwise we're all just faffing about with assigning story points in order to avoid the actual work of programming.

Yup, one of the things we care about is making the process as painless as possible, so no one groans if someone says "let's do a premortem".

For postmortems, there are always existing risks that people knew about, but I think they're still helpful for identifying which of those risks are actually causing the most pain and upping their priority.

Good engineering processes cannot make up for bad business processes and organization dysfunction.

Rather than ask "what could go wrong?" the standard technique of premortems[1] is to imagine a specific thing went wrong and then brainstorm root causes. Our brains are much more creative when working backwards this way.

I've done premortems for the launches of new features where we imagine some symptom ("user apparently saves data, but it doesn't commit") and then have the team try to hypothesize causes for it. You can generate the symptoms either by thinking of common failure modes or having people on the team privately come up with their own cause-symptom pairs.

Another important thing to do is use the exercise to discuss what tools/dashboards/techniques you would use to diagnose and repair these issues. This can help you find gaps in your monitoring, logging, and diagnostic capabilities.

[1] https://hbr.org/2007/09/performing-a-project-premortem
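
For anyone who wants to keep the output of such a session in a structured form, here is a minimal Python sketch of the symptom / hypothesized cause / diagnostic idea described above. The class names, field names, and example entries are all hypothetical and made up for illustration; this is one possible way to record the exercise, not a standard tool or the article's method.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """One guessed root cause for an imagined failure (hypothetical type)."""
    cause: str
    # Tools, dashboards, or logs the team believes would reveal this cause.
    diagnostics: list[str] = field(default_factory=list)


@dataclass
class Premortem:
    """An imagined symptom plus the causes the team brainstormed for it."""
    symptom: str
    hypotheses: list[Hypothesis] = field(default_factory=list)

    def monitoring_gaps(self) -> list[str]:
        """Return the causes for which nobody could name a diagnostic."""
        return [h.cause for h in self.hypotheses if not h.diagnostics]


# Made-up entries, for illustration only.
save_bug = Premortem(
    symptom="user apparently saves data, but it doesn't commit",
    hypotheses=[
        Hypothesis("transaction rolled back silently", ["database error log"]),
        Hypothesis("write went to a stale cache only"),
    ],
)

print(save_bug.monitoring_gaps())
```

Any cause that comes back from `monitoring_gaps()` is one you currently have no stated way to diagnose, which is roughly the "gap in monitoring" the exercise is meant to surface.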

Before deploying new code, ask yourself and at least one other person “If this is going to cause a major issue, what would it be?”

I love a premortem, but this isn't quite how I do them. I prefer to start long before a deploy, before a feature hits sprint planning. By asking my team "What could go wrong if we start this?" we often uncover some of the unknowns that can derail our engineering effort. It's particularly important to get the quality assurance team involved that early too, by getting them to think about how they'll test the feature - QA should be about implementing processes to assure quality after all.

When I first heard about premortems, the application was project planning. Before embarking on a project, you'd assume it failed and try to root cause the failures. You'd then take action to mitigate these risks (or possibly scrap the project completely if you discovered a fatal flaw, but that would presumably be rare).

I've since used them to mitigate risks to high-profile events that you can't really simulate, like major product releases or peak traffic events. In those cases, it's important that you do the premortem at a time when it can still make a difference. Ideally there is a period when you are more or less code-complete but still have a few development cycles before the big day, when you would otherwise be doing polish work.

Author here! Sorry about the CSS crashing :p

Don't worry, our robots are hosted separately from our marketing site.

Page seems to not load CSS for me. Anyone else seeing that happening?

Edit: Here’s a snapshot of the page with seemingly no CSS loaded, same as I was seeing https://archive.ph/qe2IW

Turned out to be our nginx cache fighting our wordpress cache.

Clearly should have done a premortem :p

Same for me. Tried with and without uBlock turned on. Getting 404s for the css files

Yep, getting 404 on the css files.
