> During the early part of this event, we were unable to update the Service Health Dashboard because the tool we use to post these updates itself uses Cognito, which was impacted by this event.
Poetry.
Then, to be fair:
> We have a back-up means of updating the Service Health Dashboard that has minimal service dependencies. While this worked as expected, we encountered several delays during the earlier part of the event in posting to the Service Health Dashboard with this tool, as it is a more manual and less familiar tool for our support operators. To ensure customers were getting timely updates, the support team used the Personal Health Dashboard to notify impacted customers if they were impacted by the service issues.
I'm curious if anyone here actually got one of these.
The PHD is always updated first, long before the global status page is updated. Every single one of my clients that use AWS got updates on the PHD literally hours before the status page was even showing any issues, which is typical. It’s the entire point of the PHD.
Through reading Reddit and HN during this event I learned that most people apparently aren’t even aware of the existence of the PHD and rely solely on the global status page, despite the fact that there is a giant “View my PHD” button at the very top of the global status page, and additionally there is a notification icon on the header of every AWS console page that lights up and links you directly to the PHD whenever there is an issue.
The PHD is always where you should look first. It is, by design, updated long before the global status page is.
My employer is a pretty big spender with AWS. I didn't hear anything about anybody getting status updates from a "Personal Health Dashboard" or anywhere else. I can't be 100% sure such an update would have made its way to me, but given the amount of buzzing, it's hard to believe that somebody had info like that and didn't share it.
Is it really? I get the value of eating your own dogfood, it improves things a lot.
But your status page? It's such a high-importance, low-difficulty thing to build that dogfooding it gives you only a small benefit in the good case (dogfood something bigger or more complex instead) and a big drawback when things go wrong (when your infrastructure goes down, so does your status page). So what's the point?
I can really imagine what happened: Engineer wants to host the dashboard at a different provider for resilience. Manager argues that they can't do this, it would be embarrassing if anybody found out. And why choose another provider? AWS has multiple AZs and can't be down everywhere at the same moment.
Engineer then says "fu it" and just builds it on a single solution.