
Yes, it sounds sort of like that. But this can be remediated by two things I was asking about: smoke tests and partitioning.

When making a config change, I'd assume they don't push it to all servers at once and instead roll it out gradually. If the change had caused servers to start returning 503s to all customers immediately, presumably that would have been caught. Perhaps the failure was delayed (a resource leak, etc.), which is obviously somewhat more difficult to catch.
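
To make the gradual-rollout idea concrete, here is a rough sketch of what I have in mind. It is not Fastly's actual deploy process; `staged_rollout`, `apply_config`, `smoke_test` and the wave percentages are all names and numbers I made up for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def staged_rollout(
    servers: Sequence[str],
    apply_config: Callable[[str], None],   # hypothetical: pushes the config to one server
    smoke_test: Callable[[str], bool],     # hypothetical: True if the server still serves OK
    stages_percent: Sequence[int] = (1, 10, 50, 100),  # assumed wave sizes
) -> Tuple[bool, List[str]]:
    """Apply a config in widening waves and stop at the first failing wave."""
    updated: List[str] = []
    for percent in stages_percent:
        # Cumulative number of servers that should have the config after this wave.
        target = min(len(servers), max(1, len(servers) * percent // 100))
        wave = servers[len(updated):target]
        for server in wave:
            apply_config(server)
            updated.append(server)
        # Check only the servers just touched; halt before widening further.
        if not all(smoke_test(server) for server in wave):
            return False, updated
    return True, updated


if __name__ == "__main__":
    fleet = [f"cache-{i}" for i in range(100)]
    ok, touched = staged_rollout(
        fleet,
        apply_config=lambda s: None,
        smoke_test=lambda s: s != "cache-5",  # pretend one server breaks
    )
    print(ok, len(touched))
```

With a fleet of 100 and one bad server in the second wave, this stops after touching 10 servers instead of all 100. It only catches failures that show up immediately; a delayed failure like a resource leak would need some bake time between waves, which this sketch leaves out.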

If they're properly partitioning customers, ideally they wouldn't even ship the configs to all servers (slightly less good, but still pretty good: ship them everywhere but don't parse/load them). It sounds like, at the very least, this customer's config change affected 85% of servers, which seems absurd to me.
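
By partitioning I mean something like the sketch below: each customer's config is only loaded on a small, deterministic subset of servers, so a bad config can only hurt that subset. This uses a rendezvous-style hash as one possible way to pick the subset. I have no idea how Fastly actually assigns configs to servers; `servers_for_customer`, `blast_radius` and the shard size are my own hypothetical names and numbers.

```python
import hashlib
from typing import List, Sequence


def servers_for_customer(
    customer_id: str, servers: Sequence[str], shard_size: int
) -> List[str]:
    """Pick a stable subset of servers that may load this customer's config."""

    def score(server: str) -> str:
        # Hash the (customer, server) pair; the lowest scores win.
        key = f"{customer_id}:{server}".encode("utf-8")
        return hashlib.sha256(key).hexdigest()

    return sorted(servers, key=score)[:shard_size]


def blast_radius(servers: Sequence[str], shard_size: int) -> float:
    """Fraction of the fleet a single customer's bad config can reach."""
    return min(shard_size, len(servers)) / len(servers)


if __name__ == "__main__":
    fleet = [f"cache-{i}" for i in range(100)]
    shard = servers_for_customer("customer-a", fleet, shard_size=8)
    print(shard)
    print(blast_radius(fleet, 8))
```

With 8 servers per customer out of 100, one customer's config can reach at most 8% of the fleet, compared with the 85% that seems to have been affected here. The obvious tradeoff is that each customer is served from fewer machines, so this ignores cache locality and capacity entirely.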

So yes, I can see how it happened, but for Fastly, which runs one of the biggest CDNs, these don't seem like very reasonable mistakes.




One of the big competitive advantages Fastly has over, say, Akamai is that configuration changes roll out extremely fast. I could see them skimping on smoke tests to keep that advantage and not thinking that this could ever happen.



