Does anybody else here agree with this mentality? This seems a major mispractice to me. I've worked at companies with as few as two people to as many as 50,000 people. None of them have had production systems that are entirely self-maintaining. Most startups are better off being pragmatic than investing man-years of time handling rare error cases like what to do if you get an S3 upload error while nearly out of disk space. There's a good reason why even highly automated companies like Facebook have dozens of sysadmins working around the clock.
I thought all of his other points were spot-on but this one rings very dissonant to my experience.
I agree - it seems off to me. Sometimes you really want to diagnose your problems manually.
I'm also wondering how command and control is maintained without SSH access. Is there some kind of autoupdating service polling a master configuration management server (i.e., puppet's puppetmaster)?
I can appreciate that ensuring that a typical deploy doesn't require hand-twiddling. That makes sense, lots of it. But not disabling SSH.
I think the problem is that I've made it seem like a strict rule in the article; "You must disable SSH or everything will go wrong!!!". It's really just about quickly highlighting what needs automating. Like you say, sometimes you just want to diagnose your problems manually, and that's fine, re-enable SSH and dive in. But if you're constantly fixing issues by SSHing in and running commands, that's probably something you can try to automate.
Personally I always had a bad habit of cheating with my automation. I would automate as much as I could, and then just SSH in to fix one little thing now and then. I disabled SSH to force myself to stop cheating, and it worked well for me, so I wanted to share the idea.
Of course, there's always going to be cases where it's simply not feasible to disable it completely. It depends on your application. The ones I've worked on aren't massively complex, so the automation is simpler. I can certainly see how not having SSH access for larger complex systems could become more of a hindrance.
Facebook has dozens of sysadmins working 24x7, but they also have 200,000 servers.
While diagnosing issues may require logins, you aren't going to be fix anything on 200,000 servers without automation. At large scale, all problems becomes programming problems.
It also means the death of jobs for sysadmins that only know how to go in tweak *.conf files and reboot servers. So, I guess that sucks for you.
1) Automation doesn't have to be complete automation. I can use Chef tools like knife-ssh to run a single command on every one of our boxes in near-real time. This might not be automated to the extent that the OP is referring but it's 99.8% more efficient than logging into 500 boxes to do it (or 200,000 in Facebook's case). If you can get 99.8% with ease, it may not be worth pre-automating for that final 0.2%.
2) Knowing what is worth automating comes from experience. If I have to do something nasty once, it might be a total waste of time to automate it. There are lots of one-off things I type at a command prompt that are quicker to run than to automate. I think being so dogmatic about automation as to say you should never run things at a command line requires you to spend peple's time non-optimally.
No there is no reason to give up on such a tool; the only good reason to support such idea is to get some sort of attention at your person by claiming something crazy
This is correct. The tip about disabling SSH isn't about security, it's just about quickly highlighting areas where you're not automated.
When developing an application for example, it's often necessary to SSH in to play with some things. But once you've ready to go to production, you want as much automation as possible. Forcing yourself to not use SSH will quickly show you where you aren't automated.
Someone else pointed this out too. The goal of the tip is really to stop users SSHing in just to fix that one little thing, so you could still allow your automation frameworks SSH access and just disable it for users (the idea is to disable in firewall, not turning off SSH on your server, that way you can still use it for emergencies). The idea worked well for me, but obviously isn't for everyone, YMMV.
Perfectly valid. This particular tip certainly seems to have caused some great discussion! It worked for my particular case, but I can definitely see it not working for everyone.
I've added a link to this thread to my tip, and expanded on it a little to warn people that it's not for everyone.