A monthly standing meeting as a result of failure to communicate? Even if they had implemented the recommended procedure prior to the failure, this wouldn't have mattered one bit: "ensure that decisions not to install microcode bundle updates are documented and approved." That's not a fix. It wouldn't have affected the impending decision favorably, just made it documented and rubber-stamped.
The approvals aren't done by anyone with technical knowledge; they're done by standard-issue bureaucrats. Literally: the "authorizing official" and "responsible directors" spelled out in the PDF consist of an actuary with an MBA and a BA in economics, a single '77 graduate in "Science" whose resume is littered with leadership platitudes and zero indication of hands-on work in technology, two or three underwriters, and the usual coterie of HR, procurement, and inclusion professionals.
This is how you get gridlocked bureaucracy and why the Federal retirement system is literally run out of a Pennsylvania cave.
My impression was more that the goal is an in-depth discussion of the technical aspects of the updates and the risks of both doing and not doing the update: "Formalize the monthly microcode bundle meetings with IBM and Unisys to include documenting meeting participants, detailed meeting minutes, and discussions of risks identified in the release notes for the current microcode updates."
I'm not disagreeing with you that this is a CYA report, but at the same time, this feels to me like a problem with no solid solution. The only thing I would venture, based solely on the report, that would have helped is notifying customers who had not yet upgraded about the issues the other IBM customer had in January; at the very least, disseminating what the condition looks like, and the monitoring script, much sooner.
IMO, that statement alone does illuminate a "solid solution": reduce the vendoring-out of these critical functions and gradually staff leadership with a mix of people who have significant hands-on experience with technology. Or, more briefly, "care for it properly."
overflows... even happen to the best of us.
Props to the IRS for having the processes in place to handle it. It took hours, sure, but at least it was orderly, and the rest of us can benefit from it.
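On the "overflows happen" point, a back-of-the-envelope sketch of why fixed-width counters are such a common time bomb (hypothetical numbers and function name, not the actual defect from the report):

```python
# Hypothetical illustration only -- not the actual bug in the report.
# How long an unsigned counter of a given bit width lasts before it
# wraps back to zero, given how often it is incremented.
def seconds_until_wrap(bits: int, tick_hz: float) -> float:
    """Seconds before a `bits`-wide unsigned counter wraps to zero."""
    return (2 ** bits) / tick_hz

# A 32-bit counter ticking once per microsecond wraps in about 71.6
# minutes; widening it to 64 bits stretches that to roughly 584,000
# years -- which is why these bugs hide until long-uptime systems hit them.
print(seconds_until_wrap(32, 1_000_000) / 60)          # minutes until wrap
print(seconds_until_wrap(64, 1_000_000) / 31_557_600)  # years until wrap
```

The point being: nothing misbehaves until the wrap actually occurs, so the trigger is inevitable but can be months or years out, which matches the "best of us" framing.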
If I leave this industry, I feel like my over-exercised skill of juggling this mental overhead will be pretty useful in places where the burden is lower. I'll feel like I have an extra brain just lying around in reserve.
And if you're asking about the phrase, it's intended to mean that if a problem is experienced by "the best of us," then surely it happening to the rest of us is not something to be ashamed of, but only to learn from.
"Per the ESS Managed Services contract, Unisys or IBM as the managing contractors who are responsible for the IRS Tier 1 storage environment should have been first to identify the outage and contact the IRS. During this outage, it was the IRS who initially recognized a problem, and the IRS had to reach out and notify its contractors to prompt action on remediation. The contractors did not uphold their contractual agreement."
It seems like it comes down to a damned-if-you-do, damned-if-you-don't decision: install (potentially?) major upgrades as soon as they come out, or wait to see whether early adopters hit any issues.
While knowing about the other IBM client's issue from January would have changed things, I'm not familiar enough with whether issues like this are routinely spread among the install base.
It sounds like the presentation of the firmware upgrade had about as much information as change logs (unfortunately) normally do: Fixed bugs. Performance enhancements. &c., with no further detail. Details might have helped identify the issue more quickly once it began happening, but I'm not sure they would have changed the initial decision, unless it was something whose trigger was inevitable.
...it’s almost like watching the TV show Jackass
More details with references: