One thing this blog doesn’t get into is probably the one thing I struggled with for the longest time. Maybe it’ll come in the next part, but how are people setting up their authentication/access to their hosts? Are you using root and allowing root over SSH? Maybe you’re limiting subnets in sshd_config? Maybe a separate local account named something like “ansible”, with full sudo permissions and NOPASSWD?
How are people properly hardening this account? I would love to hear how others are doing this because every time I think about our implementation, it just feels wrong.
Experienced the same dilemma. I’ll say what I do and what I want to do.
What I do is SSH as root using SSH certificates. This lets me use rsync commands in Ansible, which elevating to root after login does not.
What I want to do is generate short-lived SSH certificates for root login. Something like, “Run prod-ssh command, enter password, touch YubiKey, and I can SSH as root for the next four hours.” I know how to make this possible; it would just take a couple of days of engineering, and I have other things to do.
If I had more hosts to work with I would make one of them run HashiCorp Vault, and serve as an SSH certificate authority for the others.
If I were a team of people managing servers I would probably switch from Ansible to something else. I would want every configuration action to go through source control before being pushed to live servers. As it is, I’m one person, so I don’t need that.
Evaluate for yourself whether this meets your needs, of course. I want to find my own balance between security, safety (removing foot-guns), and convenience.
For root rsync in Ansible, have you tried using "become: no" and the synchronize module with "rsync_path: sudo rsync"? IIRC that's worked quite well for me.
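Roughly what I mean, as a sketch (the paths are invented, and it assumes the ansible.posix collection plus passwordless sudo for rsync on the target):

    - name: Push a directory tree without logging in as root
      ansible.posix.synchronize:
        src: files/app/
        dest: /opt/app/
        rsync_path: "sudo rsync"   # the remote rsync itself runs under sudo
      become: false                # no become for the connection/task itself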
It'd still need passwordless sudo, so it's only a minor improvement, but avoiding root login lets you limit which tasks run privileged and tick boxes when it comes to audits, and it lets multiple users run the same playbooks under their own accounts.
As you've just said, it might not fit everyone's needs, but I thought it was worth putting out there for anybody who would prefer not to log in as root.
If I wanted to tick boxes when it comes to audits, TBH I would migrate from Ansible to something else where changes go through source control before being pushed to live servers and nobody has root access during normal operations. I’m using Ansible specifically because I can push directly from my personal machine.
I strongly prefer NOT to have passwordless sudo. Disabling root login and then enabling passwordless sudo seems like a pointless exercise in ticking the boxes—any benefit from disabling root login is undone by enabling passwordless sudo.
Regarding your idea of short-lived root SSH certificates: Netflix BLESS[1] is an implementation of this, but it relies on AWS Lambda. There's also an open-source re-implementation without this dependency, called CURSE[2].
I've... gone a little crazy with it. I have my own Ansible Collection consisting of a common role, a Certificate Authority role, and a pair of SSH Host/Client roles, which together provision and lock down the CA server, configure the users, principals, and certificates on my machines, and set up a system for future (renewed or revoked) certificate propagation. It was one of my first major Ansible projects, and I learned a heck of a lot about best practices putting it all together. The CA server will renew keys automatically, but can only be accessed with a dedicated key that's kept in the playbook directory.
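To give a flavour, the heart of it is not much more than this kind of thing (the names, paths, principals, and lifetime here are placeholders rather than my actual values):

    - name: Sign a user key on the CA host, valid for a few hours
      ansible.builtin.command: >
        ssh-keygen -s /etc/ssh/user_ca -I alice
        -n root,ansible -V +4h /tmp/alice_id_ed25519.pub
      delegate_to: ca-server

    - name: Trust the user CA on every managed host
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        line: TrustedUserCAKeys /etc/ssh/user_ca.pub

Host certificates are the same dance with ssh-keygen -h, plus an @cert-authority entry in the clients' known_hosts.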
I have a separate playbook called 'provision' that, among other things, creates an unprivileged user, gives it sudo permissions, disables root login and password authentication in sshd_config and then restarts ssh. It's the only playbook that uses root. All subsequent connections are made using the unprivileged user and the become directive for privilege escalation. It's a bit awkward, because the playbook can only be run once, but it works in practice.
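Stripped down, that sort of play looks roughly like this (the user name, key path, and group are placeholders, and it assumes the ansible.posix collection for authorized_key):

    - hosts: new_servers
      remote_user: root
      tasks:
        - name: Create the unprivileged automation user
          ansible.builtin.user:
            name: deploy
            groups: sudo          # "wheel" on RHEL-likes
            append: true

        - name: Install its SSH public key
          ansible.posix.authorized_key:
            user: deploy
            key: "{{ lookup('file', 'files/deploy.pub') }}"

        - name: Disable root login and password authentication
          ansible.builtin.lineinfile:
            path: /etc/ssh/sshd_config
            regexp: "^#?{{ item.key }}"
            line: "{{ item.key }} {{ item.value }}"
          loop:
            - { key: PermitRootLogin, value: "no" }
            - { key: PasswordAuthentication, value: "no" }

        - name: Restart sshd
          ansible.builtin.service:
            name: sshd
            state: restarted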
SSH with public/private key pair or certificate authentication, plus sudo (optionally with pam_ssh_agent_auth[0]). This of course means you have to have access to the node before you can do anything with SSH, which is most often the case if you roll your own base images in cloud environments or use something like cloud-init.
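Roughly, the pam_ssh_agent_auth wiring looks like this (the keys file location is just one option; it assumes the pam_ssh_agent_auth package is already installed and SSH agent forwarding is enabled for the Ansible connection):

    - name: Let sudo authenticate against keys from the forwarded SSH agent
      ansible.builtin.lineinfile:
        path: /etc/pam.d/sudo
        insertbefore: BOF
        line: auth sufficient pam_ssh_agent_auth.so file=/etc/security/authorized_keys

    - name: Keep the agent socket visible to sudo
      ansible.builtin.lineinfile:
        path: /etc/sudoers
        line: Defaults env_keep += "SSH_AUTH_SOCK"
        validate: visudo -cf %s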
Just curious, what was your rationale for using pam_ssh_agent_auth over NOPASSWD? It definitely looks more secure, as it'd prevent our sudoers from sudoing into our ansible account, but that's about it. We have local console access disabled for the ansible account.
The SSH key you use for sudo can be different than the one you use for accessing the machine, though I never used it that way.
When I first used this setup I tried to build a server configuration that was itself completely open source[0]. So there was no way to bring secrets (like a root or sudo password) to the machine; everything had to come from the source repository, so I had to rely on asymmetric authentication.
My managed hosts get a special "for Ansible use only" user account (which has "sudo" privileges) created at installation time.
That user is only allowed to log in to those hosts a) with a couple of private keys (i.e., no password authentication allowed) and b) from a few specific (bastion) hosts (controlled via firewall and "Match" rules).
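In sshd_config terms that works out to something like the following (the subnet and key file path are invented for the example), pushed out with a blockinfile task:

    - name: Restrict the Ansible account in sshd_config
      ansible.builtin.blockinfile:
        path: /etc/ssh/sshd_config
        block: |
          # keys only, and only keys that root controls
          Match User ansible
              PasswordAuthentication no
              AuthorizedKeysFile /etc/ssh/ansible_authorized_keys
          # from anywhere except the bastion subnet, no way in at all
          Match User ansible Address !198.51.100.0/24,*
              PubkeyAuthentication no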
A dedicated private CA and short-lived certificates would be "better" but I haven't yet bothered.
If you use `become`, you can use an otherwise non-privileged account with sudo rights. When executed with --ask-become-pass, Ansible will interactively prompt you for your sudo password before commencing.
If you feel that manually filling in the sudo password every time is cumbersome, you could also integrate it with a YubiKey or have it fetch your sudo password automatically via e.g. GNU pass.
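A minimal sketch of that shape (the user name, hosts, and package are arbitrary), run with ansible-playbook site.yml --ask-become-pass:

    - hosts: all
      remote_user: ansible      # ordinary, non-root account
      become: true              # escalate with sudo where needed
      become_method: sudo
      tasks:
        - name: Something that needs root
          ansible.builtin.package:
            name: htop
            state: present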
The standard practice is for ops/developers to SSH to servers as themselves. They get sudo privileges on the machines that they should be able to manage.
Usually this is exclusively with SSH keys; password authentication is blocked. Root login is blocked too, so you can only log in as a normal user and then sudo.
It's quite easy to manage with Ansible itself. Make a configuration file to store public keys and usernames and that's about it. That will work pretty well until the company has 500 employees and wants to invest in an SSO solution to manage authentication company-wide (AD/Kerberos/PAM).
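For example, something along these lines (the variable name, accounts, and keys are invented for the sketch; authorized_key comes from the ansible.posix collection):

    - hosts: all
      become: true
      vars:
        admins:                 # in practice this lives in a vars file
          - { name: alice, pubkey: "ssh-ed25519 AAAA... alice@laptop" }
          - { name: bob,   pubkey: "ssh-ed25519 AAAA... bob@laptop" }
      tasks:
        - name: Ensure the admin accounts exist with sudo rights
          ansible.builtin.user:
            name: "{{ item.name }}"
            groups: sudo
            append: true
          loop: "{{ admins }}"

        - name: Install their SSH public keys
          ansible.posix.authorized_key:
            user: "{{ item.name }}"
            key: "{{ item.pubkey }}"
          loop: "{{ admins }}"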
For RADIUS, do you have a secondary/alternative method of getting into the system in case RADIUS is down? Or is RADIUS a core service for your environment and a P1 issue if it’s ever down?