
What was the ultimate cause/fix of the issues in your case? A database thing?





Insufficient testing

While that may be the case, the things like this I've experienced have been more along the lines of incompetent management.

In one case I was doing an upgrade on an IPTV distribution network for a cable provider (15+ years ago at this point). This particular segment of subscribers totalled more than 100k accounts. I validated the hardware and software revs installed on the routers in question prior to my trip to the data center (a 2+ hour drive). I informed management that the version currently running on the router wasn't compatible with the hardware rev of the card I was upgrading to. I was told that it would in fact work, and that we had that same combination of hw/sw running elsewhere. I couldn't find it when I went to look at other sites. I mentioned that in an email prior to leaving, and I was told to go.

Long story short, the card didn't work and I had to back it out. The HA failover didn't work on the downgrade and took down all of those subscribers, as the total outage caused a cascading issue with some other gear in this facility. All in all it was during an off-peak time of day, but it was a waste of time and a hit to customer satisfaction.



