
How do you monitor HW and network failures, and how do you notify SoftLayer? Does that 1-2 hour replacement time hold for every component in your server fleet?

1-2 hours is their provisioning time for new servers. For HW issues we use Nagios, which regularly checks RAID health and ECC memory health. At the moment we just file a ticket with SL about the issue, showing them the output from our monitoring. They react within an hour, and HW replacement is usually performed within a few hours after that (usually limited by how quickly we can move our load off a box so they can work on it).
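
For anyone curious what an ECC check along these lines might look like, here is a rough sketch of a Nagios-style check in Python. It is not the check described above. It assumes a Linux host that exposes EDAC counters under /sys/devices/system/edac/mc, and the names `read_edac_counts`, `evaluate`, and the `ce_warn` threshold are made up for illustration. The only fixed convention is the Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import glob
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Sum corrected (ce) and uncorrected (ue) ECC error counts.

    Assumes the Linux EDAC sysfs layout (mc*/ce_count, mc*/ue_count).
    Returns None when no counters are found under root.
    """
    totals = {"ce": 0, "ue": 0}
    found = False
    for kind in totals:
        pattern = os.path.join(root, "mc*", f"{kind}_count")
        for path in glob.glob(pattern):
            with open(path) as fh:
                totals[kind] += int(fh.read().strip())
            found = True
    return totals if found else None


def evaluate(counts, ce_warn=1):
    """Map ECC counts to a Nagios (exit_code, message) pair.

    ce_warn is a hypothetical threshold for corrected errors.
    """
    if counts is None:
        return UNKNOWN, "ECC UNKNOWN - no EDAC counters found"
    if counts["ue"] > 0:
        return CRITICAL, f"ECC CRITICAL - {counts['ue']} uncorrected errors"
    if counts["ce"] >= ce_warn:
        return WARNING, f"ECC WARNING - {counts['ce']} corrected errors"
    return OK, "ECC OK - no errors reported"


def main():
    """Print the one-line status Nagios displays and return the exit code."""
    code, message = evaluate(read_edac_counts())
    print(message)
    return code
```

To run it as a plugin, the script would end with `raise SystemExit(main())` so that Nagios sees the exit code. The printed status line is the kind of monitoring output that could be pasted into a support ticket.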
