Hi, it looks like you're having some downtime right now (500 for DPAD_all.png and timeouts for /sp/current). The app, however, doesn't show that an error has occurred.

Yes, I know - so much for my big talk about handling a large load. There's no substitute for proper load testing.

The server that both the app and my blog are on only has 1 core and 1.5 GB of RAM. I didn't pay close enough attention to Kafka, and it turns out Kafka was using half of the available RAM, starving the rest of the processes. It also didn't help that people's bots were hammering nginx with POST requests to edit the page.

I've just done some high fructose server maintenance, spinning up a new machine with 4 cores and 8 GB of RAM. The old machine is proxying all traffic across, and once DNS propagates it'll stop being hit at all. Hopefully that'll ease the congestion.

Edit: at the time of writing everything seems to be back up and happy. Nginx was running out of open file descriptors, Kafka was eating all the RAM, and Ghost (my blogging platform) wasn't sending the right cache-control headers.

A few tweaks and a bit more CPU to play with and everything seems happier now.
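
For anyone who wants to catch the file descriptor problem before it bites, here's a rough sketch of the kind of check that would have flagged it. This is not what was actually run on the server: the helper names `fd_headroom` and `current_process_fds` are made up for illustration, and the `/proc/self/fd` lookup assumes Linux. It only inspects the Python process itself; pointing it at nginx would mean reading `/proc/<pid>/fd` for the nginx worker instead.

```python
import os
import resource


def fd_headroom(open_count, soft_limit):
    """Return the unused fraction of the descriptor limit, clamped to 0.0-1.0."""
    if soft_limit <= 0:
        raise ValueError("soft_limit must be positive")
    return max(0.0, 1.0 - open_count / soft_limit)


def current_process_fds():
    """Return (open descriptors, soft limit) for this process.

    Assumes Linux: each entry in /proc/self/fd is one open descriptor.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_count = len(os.listdir("/proc/self/fd"))
    return open_count, soft


if __name__ == "__main__":
    open_count, soft = current_process_fds()
    if soft == resource.RLIM_INFINITY:
        print(f"{open_count} descriptors open, no soft limit set")
    else:
        headroom = fd_headroom(open_count, soft)
        print(f"{open_count} of {soft} descriptors in use, {headroom:.0%} headroom")
```

If the headroom gets close to zero under load, that's the cue to raise the limit for the process that's running out.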

