That incident report is about an issue from two days ago, so the title of this post (which matches the page) is misleading.
However, they have also been having issues today, at least with ChatGPT, which looked to me like a web server configuration problem in their Next.js ChatGPT front end: it seemed to be returning the app's root HTML file for every JS request. All seems fine now, though.
> Other issues are due to misbehaving bad hardware that need to be identified and removed from operation.
> We are actively working on addressing those limitations this quarter.
This always boggles the mind, but I've seen similar things several times in the past on different HPC clusters: hardware bugs that you just cannot seem to shake out, triggered just often enough to be a problem but seldom enough to be "impossible" to debug.