Is there anything that can be done to mitigate that? E.g., give SSH network traffic and the SSH daemon top CPU priority?


Recovery-oriented computing. Separate Bohr bugs (easy to reproduce) from Heisenbugs (difficult to reproduce, or they change their behaviour once investigated).

Bohr bugs generate an alert and happily meander through normal support channels.

Heisenbugs go through escalating phases:

1. Probation. On continued failure,

2. Restart. If the app or service fails after a restart,

3. Reboot. If the app or service fails after a reboot,

4. Re-image. If the app or service fails after re-imaging,

5. Remove/eliminate the node.
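The escalation ladder above can be sketched as a loop that tries each remedy in order and stops at the first one after which the node is healthy. This is a minimal illustration, not any particular tool's implementation: `Phase`, `run_ladder`, and the `still_failing` health-check callback are hypothetical names, and it assumes a node that recovers simply leaves the ladder.

```python
from enum import IntEnum
from typing import Callable


class Phase(IntEnum):
    # Remedies in escalating order of cost (hypothetical names for the five phases).
    PROBATION = 1
    RESTART = 2
    REBOOT = 3
    REIMAGE = 4
    REMOVE = 5


def run_ladder(still_failing: Callable[[Phase], bool]) -> Phase:
    """Apply each remedy in turn; return the phase at which the node recovered.

    still_failing(phase) is a hypothetical health check run after the remedy
    for `phase` has been applied; it returns True if the node still fails.
    """
    for phase in (Phase.PROBATION, Phase.RESTART, Phase.REBOOT, Phase.REIMAGE):
        if not still_failing(phase):
            return phase
    # Still failing even after re-imaging: take the node out of service.
    return Phase.REMOVE


# Example: a Heisenbug that only clears after a reboot.
print(run_ladder(lambda phase: phase < Phase.REBOOT).name)  # REBOOT
```

Bohr bugs would bypass this loop entirely and go straight to the alert/support path; the ladder only exists because a Heisenbug can't be reliably reproduced and fixed on the spot.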


Some equipment will auto-revert to the last known good configuration if you don't approve new changes within a time window... though high CPU load could lock that process up.


In this case the old configuration was lost. It took an hour to rebuild because the tooling normally used to rebuild it for testing was unavailable, so the build had to be done locally on a single machine someone had SSHed into, and that just takes a while. Luckily, a person was around who knew how to do the rebuild without the fancy tooling.


Yes. Use taskset, or isolcpus combined with other tuning, to put sshd on its own CPU core (or one core per CPU). Lots of HFT shops do that.
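For illustration, the pinning half of this can be done from Python on Linux with `os.sched_setaffinity`, which is roughly what `taskset -cp` does. This is a sketch, not a hardening recipe: `pin_to_cpus` is a hypothetical helper, it only confines the target process to the chosen cores, and keeping *other* work off those cores is what the `isolcpus` kernel boot parameter is for. Changing another user's process (such as a root-owned sshd) needs root or `CAP_SYS_NICE`.

```python
import os


def pin_to_cpus(pid: int, cpus: set[int]) -> set[int]:
    """Restrict `pid` (0 means the calling process) to the given CPU cores.

    Linux-only: os.sched_setaffinity does not exist on macOS or Windows.
    Returns the affinity the kernel actually applied, read back for checking.
    """
    os.sched_setaffinity(pid, cpus)
    return os.sched_getaffinity(pid)
```

Usage would be something like `pin_to_cpus(sshd_pid, {3})`, where `sshd_pid` is looked up separately and core 3 is an arbitrary example chosen to match an `isolcpus=3` boot setting.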


That doesn't help if the problem is a bandwidth congestion problem.


It can help to some extent, though. Bind the NIC interrupts to a small handful of cores. Or, ensure that SSH only works through a management NIC, and bind that NIC's interrupts to the same cores as sshd. You can get really fancy with these setups, especially on NUMA systems.
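A minimal sketch of binding a NIC's interrupts on Linux, assuming the NIC's queue IRQs show up in `/proc/interrupts` with the interface name in the line (as many drivers do, e.g. `eth1-TxRx-0`) and that you write CPU lists to `/proc/irq/<N>/smp_affinity_list`. The helper names, the interface name, and the CPU numbers are hypothetical. Writing requires root, and a running `irqbalance` daemon may later rewrite the affinities. On NUMA machines, `/sys/class/net/<dev>/device/numa_node` tells you which node the NIC hangs off, so you can pick cores on that node.

```python
import re


def irqs_for_device(interrupts_text: str, dev: str) -> list[int]:
    """Return IRQ numbers whose /proc/interrupts line mentions `dev`.

    Word boundaries keep "eth1" from also matching "eth10".
    """
    pattern = re.compile(rf"\b{re.escape(dev)}\b")
    irqs = []
    for line in interrupts_text.splitlines():
        m = re.match(r"\s*(\d+):", line)  # numeric IRQs only; skips NMI:, LOC:, etc.
        if m and pattern.search(line):
            irqs.append(int(m.group(1)))
    return irqs


def cpu_list(cpus: set[int]) -> str:
    """Format CPUs in the comma-separated list form smp_affinity_list accepts."""
    return ",".join(str(c) for c in sorted(cpus))


def bind_irqs(irqs: list[int], cpus: set[int]) -> None:
    """Point each IRQ at the given cores (needs root)."""
    for irq in irqs:
        with open(f"/proc/irq/{irq}/smp_affinity_list", "w") as f:
            f.write(cpu_list(cpus))
```

For the management-NIC setup, you would read `/proc/interrupts`, call `irqs_for_device(text, "eth1")` for that NIC, and pass the result to `bind_irqs` with the same core set sshd is pinned to.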


I'm a bit surprised there's no sort of SSH undo subroutine that reverses the previous command if connectivity is lost. Of course it couldn't cover every possible stupid thing, but it could fix simple stupid mistakes like fouling up a port assignment or disabling the wrong network adapter.


A commit / confirm / rollback cycle is how this is typically dealt with in network automation.
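The core of a commit / confirm / rollback cycle is a timer: apply the change, and unless someone confirms it within the window, restore the previous state. Junos's `commit confirmed` is one well-known example. Below is a minimal Python sketch of the idea; `ConfirmedCommit` and the `apply`/`rollback` callables are hypothetical stand-ins for whatever actually changes the device configuration. On real equipment the timer has to run on the device itself, not in the admin's SSH session, since the point is to survive losing that session. It is also subject to the CPU-starvation caveat mentioned earlier in the thread.

```python
import threading
from typing import Callable


class ConfirmedCommit:
    """Apply a change, then roll it back automatically unless confirmed in time."""

    def __init__(self, apply: Callable[[], None], rollback: Callable[[], None],
                 timeout_s: float) -> None:
        self._rollback = rollback
        self._lock = threading.Lock()
        self._done = False  # set once either confirm() or the timer wins
        apply()
        self._timer = threading.Timer(timeout_s, self._expire)
        self._timer.daemon = True
        self._timer.start()

    def _expire(self) -> None:
        # Timer fired: roll back unless confirm() already claimed the commit.
        with self._lock:
            if self._done:
                return
            self._done = True
        self._rollback()

    def confirm(self) -> bool:
        """Keep the change. Returns False if the rollback already happened."""
        with self._lock:
            if self._done:
                return False
            self._done = True
        self._timer.cancel()
        return True
```

The lock ensures that exactly one of `confirm()` and the timer decides the outcome. If the operator's confirmation never arrives (because the change cut off their connection), the rollback runs without anyone having to know how to "reverse" the specific command.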


How does ssh know what command you did and how to reverse it?



