
Sudden surge of traffic as all their users return to work?



Could be, it's the perfect time overlap between US-West, US-East, and Europe.


Yes - I wondered if they took some servers down prior to the break as a cost-saving measure, and forgot to reinstate them.


Doubtful. It's not impossible that a company the size of Slack relies on a specific engineer logging on in the morning, ahead of a traffic spike, so the service can handle the load, but that would be a misuse of modern distributed, cloud-based computing.

Hate on the cloud all you want, but AWS has (several flavors of) load balancers and various ways to automatically scale up and down resources (and if you're conservative, you can disable the 'down' part). If you're operating a major SaaS company like Slack and not taking advantage of them, something's gone wrong.


It's easy to fall behind on raising the high watermark for your autoscaling maximum, or for new traffic patterns to cause emergent instability. New code paths are taking unprecedented amounts of traffic all the time.

In 2021, how does one keep track of resource starvation at the process, container, OS, service, pod, cluster, availability zone, and region levels?
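
To make the autoscaling ceiling point concrete, here is a rough sketch of a headroom check that flags any group running close to its configured maximum. Everything in it is hypothetical: `ScalingGroup`, `near_ceiling`, the 20% threshold, and the sample numbers are made up for illustration, and it does not call any real cloud API.

```python
from dataclasses import dataclass


@dataclass
class ScalingGroup:
    """Hypothetical snapshot of one autoscaling group."""
    name: str
    current: int  # instances running right now
    maximum: int  # configured autoscaling ceiling


def near_ceiling(group: ScalingGroup, headroom: float = 0.2) -> bool:
    """Return True when less than `headroom` of the ceiling is still unused."""
    if group.maximum <= 0:
        # A zero or negative ceiling leaves no room to scale at all.
        return True
    unused = group.maximum - group.current
    return unused < headroom * group.maximum


# Made-up numbers: "web" is at 95 of 100, "workers" at 40 of 100.
groups = [ScalingGroup("web", 95, 100), ScalingGroup("workers", 40, 100)]
flagged = [g.name for g in groups if near_ceiling(g)]
print(flagged)
```

In practice you would feed this from whatever your platform reports and alert on the flagged groups well before a known traffic peak, so the maximum gets raised before the spike rather than during it.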



