Often every person on the team could rattle off 5 major problems off the top of their head. That's it, that's your postmortem right there; let's go and prioritize solving these underlying problems. But engineering orgs get obsessed with establishing a sacred, heavy-handed process around postmortems. There must be a reviewer, and a chairperson, and a standardized form that takes an hour or two to fill out, and several rounds of revisions. This gives you a nice shiny metric to point to, where you can say that 100% of incidents were followed up with a postmortem. If any big-picture questions come up during the postmortem, well, that's not the kind of thing you can just make a Jira ticket for, e.g. "don't make the deadline so short next time". So we'll move right past that one and then make one or two actionable Jira tickets, like adding another view to the metrics dashboard, which overfits on this particular issue while doing nothing to address the underlying problems, and then get back to work taking shortcuts and building up more tech debt on a whole new set of features.
A premortem may end up with the same sort of perverse incentives, but it's an interesting thing to try. If it's something the org commits to actually putting aside time for ahead of every deadline, I could even see it potentially helping a little bit to address some of the underlying problems.
Take SpaceX's early days: the actual cause of failure #2 wasn't on anyone's top 10 list. Doing a postmortem has value; it lets you see whether the shortcuts you take make sense.
For postmortems, there are always existing risks that people knew about, but I think they're still helpful for identifying which of those risks are actually causing the most pain, so you can raise their priority.
I've done premortems for the launches of new features where we imagine some symptom ("user apparently saves data, but it doesn't commit") and then have the team try to hypothesize causes for it. You can generate the symptoms either by thinking of common failure modes or having people on the team privately come up with their own cause-symptom pairs.
Another important thing to do is use the exercise to discuss what tools/dashboards/techniques you would use to diagnose and repair these issues. This can help you find gaps in your monitoring, logging, and diagnostic capabilities.
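For anyone who wants to write the results down, here is a minimal sketch of one way to record symptom/cause pairs and flag the causes nobody could name a diagnostic for. The names (`Cause`, `Scenario`, `find_gaps`) and the two example causes are made up for illustration; only the symptom comes from the exercise described above.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """One hypothesized cause, plus the tools the team would use to confirm it."""
    description: str
    diagnostics: list[str] = field(default_factory=list)


@dataclass
class Scenario:
    """An imagined symptom and the causes the team hypothesized for it."""
    symptom: str
    causes: list[Cause] = field(default_factory=list)


def find_gaps(scenarios: list[Scenario]) -> list[tuple[str, str]]:
    """Return (symptom, cause) pairs for which no diagnostic was named."""
    return [
        (s.symptom, c.description)
        for s in scenarios
        for c in s.causes
        if not c.diagnostics
    ]


# Hypothetical premortem output; the causes and tools are illustrative only.
scenarios = [
    Scenario(
        symptom="user apparently saves data, but it doesn't commit",
        causes=[
            Cause("transaction rolled back after the success response",
                  diagnostics=["database error log"]),
            Cause("write went to a stale cache and never reached the database"),
        ],
    ),
]

gaps = find_gaps(scenarios)
for symptom, cause in gaps:
    print(f"No way to diagnose: {cause!r} (symptom: {symptom!r})")
```

Anything that shows up in `gaps` is a candidate for new monitoring or logging before the launch, which may be a more useful output than the list of causes itself.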
I love a premortem, but this isn't quite how I do them. I prefer to start long before a deploy, before a feature hits sprint planning. By asking my team "What could go wrong if we start this?" we often uncover some of the unknowns that can derail our engineering effort. It's particularly important to get the quality assurance team involved that early too, by getting them to think about how they'll test the feature - QA should be about implementing processes to assure quality after all.
I've since used them to mitigate risks to high-profile events that you can't really simulate, like major product releases or peak traffic events. In those cases, it's important that you do the premortem when it can still make a difference. Ideally there is a point when you are more or less code-complete but still have a few development cycles before the big day, cycles you would otherwise spend on polish.
Don't worry, our robots are hosted separately from our marketing site.
Edit: Here’s a snapshot of the page with seemingly no CSS loaded, same as I was seeing https://archive.ph/qe2IW
Clearly should have done a premortem :p