A cursory read of the article is enough to gather a main point, which should be a disclaimer: it's aimed at medium/big companies that have a significant security threat model and a dedicated sysadmin team.
If you have neither, this is adding operational complexity for no real gain. When your threat model is small, potential attackers are not sophisticated enough to target SSH access. And configuring this and properly using it is a big waste of time. You might even lock yourself out of your servers because of a mistake or lost keys (that's my biggest fear which prevents me from ever switching to certificates unless I get a dedicated sysadmin).
Yeah there are benefits, but they are just too abstract to bother for small companies.
> medium/big companies
My goal (or our goal at smallstep) is to build tools to make this easy enough that it makes sense for small teams. If you have one client and one server, pubkey authn will probably always be easier. And as long as it's the default, certificate authentication will always require some amount of configuration. But I think good tooling with baked in defaults to encourage best practices can make this easy and quick enough to deploy to be a good idea for most non-trivial SSH deployments.
However, much of this is aspirational at the moment. So your point remains. The point of the post was to raise awareness and start a discussion about what's needed to make this feasible for pragmatic people.
> dedicated sysadmin team
I don't think a dedicated sysadmin team should be necessary. On the contrary, with the right tooling, and unless you're just completely punting on any rational management of SSH keys, I think certificates are easier to administer.
> potential attackers are not sophisticated enough to target SSH access
The more common attack vector is a lost/stolen/compromised laptop or a private key that's accidentally committed to a repository. This is a very real vector. It happens in real life on a regular basis. Since certificates expire, a lost/stolen machine or committed private key will become worthless for accessing internal infra very quickly. So it reduces the attack window, which is a real and significant improvement. For a compromised machine, SSH certificates aren't a cure-all, but they're at least as good as pubkey. They make it easier to keep private keys off disk, providing better cover. And they make it harder for attackers to exfil private keys, cover their tracks on the compromised laptop, and do bad things with them later.
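To make the "attack window" point concrete, here's a minimal sketch of the check that expiry buys you. The function name and the 16-hour lifetime are assumptions picked for illustration, not anything from OpenSSH or `step`:

```python
from datetime import datetime, timedelta, timezone

def cert_is_usable(valid_after, valid_before, now):
    """Return True if `now` falls inside the certificate's validity window."""
    return valid_after <= now < valid_before

# Hypothetical certificate issued at the start of a work day, valid for 16 hours.
issued = datetime(2019, 9, 12, 9, 0, tzinfo=timezone.utc)
expires = issued + timedelta(hours=16)

# Laptop stolen that evening: the cert can still be used for a few hours.
print(cert_is_usable(issued, expires, issued + timedelta(hours=10)))  # True
# The next morning it is worthless, with no revocation step needed.
print(cert_is_usable(issued, expires, issued + timedelta(hours=24)))  # False
```

A stolen static key has no equivalent of the second line: it keeps working until someone notices and removes it from every host.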
> You might even lock yourself out of your servers
You could do this with pubkeys too. If this happens, cloud VMs typically have a virtual serial interface you can use to remediate. You can leave pubkey authn enabled and still use certificates day-to-day, so you could keep a backup pubkey available for a few select admin users that you trust. Alternatively, you could issue a long-lived certificate in a secure place for use in an emergency. If this is the thing that's preventing you from trying it out I hope these ideas get you over the hump! If they don't, I'd love to hear why!
> they are just too abstract to bother for small companies
This is a very common problem with security stuff :(. That's why I think the headline feature here is "SSO for SSH". It's an operational improvement and better UX. The security benefits come for free.
We decided to keep user ssh keys in LDAP and pin the key of the LDAP cluster. If LDAP were to go down, we have local-only accounts that let us access the box via the console.
Do all OS distributions have new enough openssh, is it ever built with missing support by default?
What extensions are necessary in the certs to be trusted for ssh, or in the CA cert (is your average test CA cert going to be valid for ssh without tweaking)?
Do any (or all?) cloud server products regularly accept cert based configuration just as they accept public key lists?
How will this interact with other features like using a pkcs mechanism for a token key?
If revoking does work, does that mean the system will deny cert-based auth when it has internet connectivity problems? If it won't, isn't building more general pools of certs more dangerous than restricting systems to specific lists of public keys?
Also questions like what does an eavesdropper see in the clear and how does it differ from what they see with raw public key?
(I find it insane that ssh ultimately ends up with a weaker trust model than a pretty average website in a typical install due to less certainty and clarity about what a normal certificate based configuration should look like.)
Yea, fair... there's probably enough to cover for a whole follow-up on nuts & bolts and these sorts of considerations. This post was more about raising awareness and it was already super hard to keep it to 3000 words :P.
I think I can at least partially address all of these points though.
> Do all OS distributions have new enough openssh, is it ever built with missing support by default?
I haven't done an analysis but certificate authentication was added in OpenSSH 5.4 in March, 2010. So it's almost ten years old. So yea, I think all reasonably current distros have support at this point.
> what extensions are necessary in the certs to be trusted for ssh or the CA cert (is your average test CA cert going to be valid for ssh without tweaking)?
To clarify, SSH doesn't use X.509 like TLS/HTTPS. OpenSSH invented its own certificate format.
The docs for `ssh-keygen` have pretty good info on this.
* https://man.openbsd.org/ssh-keygen.1#CERTIFICATES has basic cert info
* https://man.openbsd.org/ssh-keygen.1#O has info on some cert options / extensions
If you have experience administering *nix boxes & SSH the options / extensions should look pretty familiar. Basically, it's a lot of the same stuff you'd ordinarily put in config files on each host, but instead you're baking it into certificates.
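To illustrate the "own format" point: an OpenSSH certificate is a sequence of length-prefixed fields rather than ASN.1, starting with the certificate type string and a nonce. Here's a rough sketch of reading those first two fields. The blob is synthetic (built in the example itself, not a real certificate), and the helper names are made up for illustration:

```python
import struct

def pack_string(data):
    """Encode bytes as an SSH 'string': uint32 big-endian length, then the bytes."""
    return struct.pack(">I", len(data)) + data

def read_string(buf, offset):
    """Read one SSH 'string' at `offset`; return (bytes, offset of next field)."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    return buf[start:start + length], start + length

# Synthetic stand-in for the first two fields of a decoded certificate blob.
blob = pack_string(b"ssh-ed25519-cert-v01@openssh.com") + pack_string(b"\x00" * 32)

cert_type, offset = read_string(blob, 0)
nonce, offset = read_string(blob, offset)
print(cert_type.decode())  # ssh-ed25519-cert-v01@openssh.com
print(len(nonce))          # 32
```

The remaining fields (serial, key ID, principals, validity, critical options, extensions, signature) follow in the same style, which is why none of the X.509 tooling applies.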
> Do any (or all?) cloud server products regularly accept cert based configuration just as they accept public key lists?
As far as I know, none of the cloud providers work with SSH certificates out of the box. It's kind of weird, actually, because it seems like super low-hanging fruit and would be a better user experience. Instead, they use public key authentication and deliver pubkeys to an authorized keys file for you (via their metadata APIs).
That said, it's super easy to set this up yourself on a cloud VM. You could easily bake the necessary bootstrapping into an AMI. Alternatively, you could use a startup script. One of the things we're working on is streamlining this experience. For now, here's a gist demonstrating the concept (it assumes you already have `step-ca` running):
Most of the work there applies to other tools but obviously you'd use a different client instead of `step` to get the certs.
> How will this interact with other features like using a pkcs mechanism for a token key?
Oh hrm. This is a good question but I'm not sure, mostly because I haven't put SSH keys in a PKCS module myself. I know it's possible. You can implement the `ssh-agent` API pretty easily and back it by whatever you want. This is on our roadmap but we haven't done much investigation. It's possible that it just works already with standard `ssh` and `ssh-agent`s.
For MFA, you can still use PAM authentication with something like Duo if you want. Certificate authentication basically works exactly like public key authentication, but it manages public key distribution differently (using certificates).
That said, personally, I think the right place for MFA is during certificate issuance rather than on each SSH connection. That way you're not constantly asking people to MFA. This decision is somewhat subjective and depends on your risk profile / threat model.
Generally the best practice for certificates is to issue them for a short enough duration that you're comfortable not doing revocation. I think a work day is pretty safe and reasonable, but some people may not be ok with that and choose something shorter. Of course you'd have to reauthn more frequently then to get a new certificate. A thought to ponder: for any reasonably sized infrastructure public key removal won't be instantaneous either.
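As a sketch of what "short enough duration" looks like in practice, this builds (without running) an `ssh-keygen` signing command with a work-day validity. The file names, key ID, and principal are placeholders I made up; the function itself is just a hypothetical wrapper:

```python
def user_cert_command(ca_key, key_id, principals, validity, user_pubkey):
    """Build (but do not run) an ssh-keygen invocation that signs a user key."""
    return [
        "ssh-keygen",
        "-s", ca_key,                # CA private key used for signing
        "-I", key_id,                # key identity, logged by the server on use
        "-n", ",".join(principals),  # principals the cert is valid for
        "-V", validity,              # validity interval, e.g. "+8h"
        user_pubkey,                 # public key to sign
    ]

# Placeholder paths and names, for illustration only.
cmd = user_cert_command("ca_key", "mike@example.com", ["mike"], "+8h", "id_ed25519.pub")
print(" ".join(cmd))
# ssh-keygen -s ca_key -I mike@example.com -n mike -V +8h id_ed25519.pub
```

Swapping `"+8h"` for something like `"+52w"` is how you'd produce the long-lived emergency cert mentioned earlier, which is exactly the kind of cert you'd want to keep locked away.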
That said, SSH does have cert revocation capabilities. You create a "revoke file" (a key revocation list) and configure it using a `RevokedKeys <filename>` directive in your `sshd_config`. So it's managed on each host, which means there are no connectivity issues, but also means you have more of a management problem (on par with managing authorized_keys :/). For that reason, if you really need revocation, my recommendation is to do revocation checks in a bastion host. Then you only need to maintain the revocation list in one place.
> Also questions like what does an eaves dropper see in the clear and how does it differ from what they see with raw public key?
Great question. I should know the answer to this, but I don't. I just spent like 20 minutes perusing the specs at:
and it's not obvious just from skimming. Also couldn't find a definitive answer from the google machine. I'll need to read the specs more closely.
That said, it looks like pubkeys/certificates are exchanged during diffie-hellman key exchange, before a symmetric key has been negotiated. The implication is that this information is sent in cleartext. TLS1.2 does the same thing, but 1.3 keeps certificates confidential. So there's possibly an infrastructure enumeration vector here -- a passive attacker could see whatever info is in your certs (usernames, public keys, permissions). It looks like normal public key authentication would also transmit a lot of this same data, but maybe not the permissions. Again, I'd have to look at the spec more closely. I think for most people this is not a big risk and the profile looks pretty similar to pubkey authn, but I'm saying that without a complete understanding so caveat emptor.
> I find it insane that ssh ultimately ends up with a weaker trust model than a pretty average website in a typical install due to less certainty and clarity about what a normal certificate based configuration should look like.
Lol yes. Me too.
To be fair though the trust on first use model with raw pubkeys made SSH way easier to deploy, which was one of the things that allowed SSH to displace telnet and, from a security perspective, it was obviously a huge improvement over telnet. So I don't fault the SSH folks for this. But it's 2019 now so we can/should do better!
It looks like there is some support in both keygen and ssh-agent at least for combining a cert file with a pkcs11 key and maybe I can work out how to get custom format in and out of pkcs15 storage. The situation with gpg card applet and gpg-agent ssh support looks like it may be similar..
My general view is that a smartcard with a non-extractable key, corresponding x509 cert, and pin manages the minimal something you have, know, and identity in the only way that everyone at least intended to support at some point. I.e. ssh actually works well with it using the raw public key, as client certs pkcs11 should work with the browsers, some of the vpn implementations have pkcs11 support.. I think even kerberos should be able to bootstrap identity from it with pk-init. All the other applets combined into commercial keys for MFA are all very interesting but tend to each be piecemeal in protocols they could be used for.
A program, script, or any executable that outputs (prints to stdout) authorized_keys-style formatted data. This data can be pulled from any source.
Thus, if public keys are stored in LDAP and the script pulls keys for active users only, a public key can be easily revoked by disabling the corresponding user account.
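A minimal sketch of that idea, assuming an in-memory dictionary as a stand-in for the LDAP lookup (the user names and key strings are placeholders, not real keys):

```python
# Hypothetical stand-in for an LDAP lookup; a real script would query the directory.
DIRECTORY = {
    "alice": {"active": True, "keys": ["ssh-ed25519 PLACEHOLDER_KEY_ALICE alice@laptop"]},
    "bob": {"active": False, "keys": ["ssh-ed25519 PLACEHOLDER_KEY_BOB bob@laptop"]},
}

def authorized_keys_for(username, directory):
    """Return authorized_keys-style lines for an active user, else nothing."""
    entry = directory.get(username)
    if entry is None or not entry["active"]:
        return []
    return list(entry["keys"])

def render(username, directory):
    """Text the command would print to stdout for sshd to read."""
    return "".join(line + "\n" for line in authorized_keys_for(username, directory))

print(render("alice", DIRECTORY), end="")  # alice's key line
print(repr(render("bob", DIRECTORY)))      # '' because the account is disabled
```

sshd would be pointed at a script like this via `AuthorizedKeysCommand` (together with `AuthorizedKeysCommandUser`); disabling bob's account is then enough to cut off his key on every host.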
I still think it's harder to do this than it is to use certs, since you'll need to build some tooling to securely distribute keys. But that would vary by environment, for sure, and it's conceptually simpler for folks who don't know how certificates work I guess.
Do you know when AuthorizedKeysCommand was added to sshd? Is it available in most distros?
It's too bad there isn't a RevokedKeysCommand! That would make certificate revocation a lot more flexible. I wonder how hard it would be to add.
Yea Kerberos is another option. If you have all the necessary pieces, that is... you’d need managed devices, an LDAP/AD setup, DNSSEC & SSHFP, PAM & various agents on servers.
Certificates offer all the same benefits, and a few more, with less work & fewer moving pieces. They’re also more flexible since you can control the auth flow to get a cert.
You need GSSAPI anyhow if you have Kerberized home directories and don't want to type passwords at ssh. Although you're likely to have LDAP (in which you can store public keys), all you need is the KDC and a simple Kerberos setup on ssh clients and servers. I wonder why anyone thinks you need more.
Kerberos tickets and infrastructure are roughly equivalent to using ephemeral certificates as far as I can see. The one reason you might want certificates is if you require FIDO MFA, as I don't know if you can use anything other than OTP with Kerberos implementations. (The hook for "the auth flow" is preauth.)
You can also use "GSSAPIKeyExchange yes" to solve the trust-on-first-use issue without resorting to DNSSEC, since the server identity gets verified via Kerberos in that case.
The main benefits of certificates I can see are the increased auth flow options and better scaling at large scales since you lose the service ticket traffic.
Also, I'm not convinced that TOFU for SSH is really all that terrible. Maybe it's just my use case and my situation, but I SSH to the same servers over and over again. And if I need to give access to someone new I give them the public key first.
SSH X.509 looks like a complex solution to a not very big problem that can be more easily solved with DNSSEC and SSHFP.
* They put authentication to your SSH servers entirely under your own control, without escrowing keys to the operators of the TLDs.
* They're deployable independently, without enrolling all of the rest of your infrastructure into DNSSEC and all its unreliability and complexity.
* They integrate with IDPs, so that users can get short-duration authenticators based on MFA logins, without storing long-lived secrets on their machines.
SSH X.509 certificates have nothing to do with the web PKI, other than that they use the same encoding. Engineering orgs that use SSH CAs run their own CAs. That's the point.
Happy to have helped!
> Operating SSH at scale is a disaster. Key approval & distribution is a silly waste of time. Host names can’t be reused. Homegrown tools scatter key material across your fleet that must be cleaned up later to off-board users.
Outright lie or incompetence at worst, or a "my salary depends on not understanding this" case at best, being demonstrated here. Even if you don't have chef/puppet/etc managing at least some level of fleet configuration (if you don't, your org is hopeless anyway and ssh-at-scale is not a first problem to solve), it's oh-so-easy to put just your pubkeys somewhere (S3, eg) and manage them via, say, git.
Next problem is that openssh-style certs do not support cert chains. So you cannot revoke an intermediate. A huge aspect of PKI is that you must be able to recover from a compromise. The rest of this point is left as an exercise for the reader.
You have to secure the CA! You have to properly authorize signing requests! These things fall into the realm of hard.
Puppet's ssh key management only adds and removes public keys that you specify, but leaves other keys intact. Which means that a hacker can place a backdoor key in authorized_keys that is not going to be spotted by Puppet. Cert-only auth prevents this class of attack from the beginning.
Just generate the complete list of authorized_keys yourself, don't let puppet modify it for you. But do let puppet distribute the file. It's beyond easy.
But this is a distraction. Anyone that can influence the insertion of a permanent backdoor in authorized_keys is almost certainly in deep enough to not care about any specific mechanism. Being able to populate arbitrary, unaudited data on your hosts is an intolerable situation to be in.
And anyway, you're wrong. The 'purge_ssh_keys' directive can be set to 'true', disabling merge.
> Cert-only auth prevents this class of attack from the beginning.
Cert auth is a subset of pubkey auth, which includes both. You could set AuthorizedKeysFile to 'none' but you'd have to explain how an attacker that can influence authorized_keys cannot also influence sshd_config. Of course these are different files likely populated via different mechanisms, but the threat model you are implying here seems overly specific. It might work for some people but, given the tone of the article (unspecified 'you' are doing it wrong), I don't buy it.
> chef/puppet/etc ... put just your pubkeys somewhere (S3, eg) ...
Yea, cool, and now you have a second authentication system that you have to manually administer to onboard and offboard people. And you have permanent long-lived single-factor credentials on a bunch of (possibly unmanaged) endpoints, that are easy to exfiltrate, and are a common target for attackers if a laptop is lost/stolen/compromised.
You haven't addressed host name reuse or host rekeying which, again, are real operational problems.
And somehow running and administering chef/puppet/etc or this S3 bucket is easier than running a simple certificate signing service connected to SSO that requires basically zero maintenance once it's setup? Argumentative, at best.
> Next problem is that openssh-style certs do not support cert chains ... recover from compromise.
Since only hopeless organizations don't have config management, we can solve this there. SSH PKI is simple and doesn't have intermediates. But you can configure multiple roots. To rotate or revoke a compromised private key you'd simply deploy a new root using config management and issue new certificates. In a non-emergency situation you can slowly roll certificates by having two trusted roots for a transitional period while everyone gets new certs.
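Here's a toy model of that rotation, assuming a host simply accepts any cert whose signing CA is in its trusted set (the fingerprints are placeholders and the function is hypothetical, not how sshd is implemented):

```python
# Placeholder CA fingerprints, for illustration only.
OLD_CA = "SHA256:old-root-placeholder"
NEW_CA = "SHA256:new-root-placeholder"

def is_trusted(cert_ca_fingerprint, trusted_cas):
    """A cert is accepted if its signing CA is one of the host's trusted roots."""
    return cert_ca_fingerprint in trusted_cas

during_transition = {OLD_CA, NEW_CA}  # both roots trusted while certs roll over
after_rotation = {NEW_CA}             # old root dropped once everyone has new certs

print(is_trusted(OLD_CA, during_transition))  # True
print(is_trusted(NEW_CA, during_transition))  # True
print(is_trusted(OLD_CA, after_rotation))     # False
```

On a real host the "trusted set" would be the file named by `TrustedUserCAKeys`, which can list more than one CA public key, one per line. In an emergency you skip the transition step and push the new set right away.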
> You have to secure the CA!
You have to secure your configuration management that's deploying public keys everywhere and your authorized keys files on every host! Pubkey distribution has a bigger attack surface area and the same exact compromise consequences as a CA. This is not even a contest.
> You have to properly authorize signing requests!
You have to authorize requests to deploy public keys!
There are tools for this. What's the hard part here? Verifying an OIDC identity token? Mapping a token subject to an SSH principal?
Worst case, you could keep whatever crappy manual process you have for authorizing public key distribution and issue a certificate instead.
> These things fall into the realm of hard.
Ok, I guess? This is all relative. It's easier than pubkey authentication at any non-trivial scale. There's a difference between something being hard vs. being unfamiliar.
You have that anyway. I wasn't suggesting to dump authorized_keys into S3 if you aren't already using S3. I was just using a throwaway example. Any large enough environment to care, is large enough to need to manage this type of resource as a sunk cost.
> somehow running and administering chef/puppet/etc or this S3 bucket is easier
Yes, because it's a sunk cost, and mandatory anyway, outside of any ssh consideration. Any org should be striving to do less things, not more things.
> simple certificate signing service
there's no such thing
> configure multiple roots
You got me there. That is a reasonable enough solution. Of course your 'backup' root has to be offline, but that's the same as with an intermediate. I could probably manufacture a case where having a cert chain is easier but since it's not obvious to me at the moment, I'm sure it would be pretty contrived.
But, aren't you getting farther and farther away from simple?
> You have to secure your configuration management that's deploying public keys everywhere and your authorized keys files on every host!
True. However, that's a sunk cost. You have to do that anyway.
> Worst case, you could keep whatever crappy manual process you have for authorizing public key distribution and issue a certificate instead.
Then why bother with the cert at all?
Also, you are insisting that an org's manual process is crappy. If your process for pubkey authz and distribution is crappy, you have crappy practices and it's a sure thing your shiny new CA is a huge liability as another crappily deployed and crappily maintained system.
> It's easier than pubkey authentication at any non-trivial scale.
Well, I dispute that. At large scale you have even better config mgmt and if you're doing it right, can handle either way. CA is still probably better because the benefit to you at large scale is not relief from managing pubkey resources, it's from being able to do short-lived certs as well as an easier route to tighter authz. At moderate scale, get real, these things aren't a concern.
Your main problem isn't that you're wrong, it's that your absolutist view of it is a distortion. (Which makes it wrong.)
I've seen projects that'll do the SSH CA part, but unless I overlooked it I didn't see any RBAC or mapping ability to restrict users to specific servers.
This is one of a handful of problems we're working to solve / streamline for an SSH product. We might even open source this bit, but not sure yet. If we don't, it still probably wouldn't be $hundreds/$thousands per month unless you have a huge org. If you're interested at all I'd love to talk more about this -- what your requirements are, whether you want hosted/not hosted, whether you want to pay at all / how much you'd be willing to pay, etc. Easiest thing to do to stay in the loop is send us your email at https://smallstep.com/sso-ssh/ by "requesting early access" and we'll reach out to schedule a convo & keep you updated as we make progress. Or just watch our blog! :)
I think I was a bit vague about RBAC. I'm thinking that the user account creation isn't vital in this use case since the audit log holds who generated/accessed what. It's more about controlling who can generate a cert for what.
Eg, I think Teleport's model has the RBAC config at or near the CA part, so the cert generation either happens if authorized or denied if not.
We also have a one-time-token mechanism that you can have some trusted infra like Puppet issue to hosts as they come up. The token includes the specific name that you want bound in the cert. It can only be exchanged for a cert with that subject.
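To be clear, the following is not how our implementation works. It's only a rough sketch of the general idea of a single-use token bound to one name, using an HMAC over the name plus a random nonce. The class, its methods, and the secret are all made up for illustration:

```python
import hashlib
import hmac
import secrets

class TokenIssuer:
    """Toy issuer of single-use tokens that are bound to one subject name."""

    def __init__(self, secret):
        self._secret = secret
        self._used = set()  # nonces that have already been redeemed

    def _sign(self, payload):
        return hmac.new(self._secret, payload.encode(), hashlib.sha256).hexdigest()

    def issue(self, name):
        nonce = secrets.token_hex(16)
        payload = f"{name}:{nonce}"
        return f"{payload}:{self._sign(payload)}"

    def redeem(self, token, requested_name):
        """Accept only an authentic, unused token for exactly the bound name."""
        parts = token.rsplit(":", 2)
        if len(parts) != 3:
            return False
        name, nonce, sig = parts
        if not hmac.compare_digest(sig, self._sign(f"{name}:{nonce}")):
            return False
        if name != requested_name or nonce in self._used:
            return False
        self._used.add(nonce)
        return True

issuer = TokenIssuer(b"demo-secret-not-for-production")
token = issuer.issue("web01.internal")
print(issuer.redeem(token, "db01.internal"))   # False: wrong subject
print(issuer.redeem(token, "web01.internal"))  # True: first use, right subject
print(issuer.redeem(token, "web01.internal"))  # False: already used
```

The point is that whoever holds the token (say, a host Puppet just brought up) can get exactly one cert for exactly one name and nothing else.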
Super secretly: we also have a whole policy language and enforcement engine that we'll eventually get around to doing something with and would address this issue pretty comprehensively.
After reading your comment again I think I still might be misunderstanding your use case. Are you talking about having principals like "frontend" and "backend" and then having RBAC that says "mike can get a cert for frontend"?
SSH only does authentication.
Eg, UserA only has been granted access to ServerB, so only show them ServerB as a target and reject all others.
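A minimal sketch of that kind of check at the CA, assuming a simple user-to-principals mapping (the policy table, user names, and function are hypothetical, not taken from any particular product):

```python
# Hypothetical policy: which principals each user may request in a cert.
POLICY = {
    "mike": {"frontend"},
    "ana": {"frontend", "backend"},
}

def may_issue(user, requested_principals, policy):
    """Allow issuance only if every requested principal is granted to the user."""
    granted = policy.get(user, set())
    requested = set(requested_principals)
    return bool(requested) and requested <= granted

print(may_issue("mike", ["frontend"], POLICY))             # True
print(may_issue("mike", ["frontend", "backend"], POLICY))  # False
print(may_issue("eve", ["frontend"], POLICY))              # False
```

Hosts then only need to accept the principal for their role (for example "frontend"), and the decision about who gets which principal lives in one place, next to the CA.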
If you already have Kerberos, fine. If not, certs are way easier to implement & more flexible.
For client certificates, the fallback can be a long-term certificate locked up in a safe, which should only be opened when all other ephemeral certificates fail. I think this is less risky.
The fallback certificate is only client-side, and can be kept offline/air-gapped from the main credentials.
You only need to get the CA to sign the fallback cert.
^ sounds just bad, doesn't it? Be respectful to others and thou shall succeed.
There’s a difference between being respectful and walking on egg shells. I can respect you and still say, bluntly and to your face, that you’re wrong. In fact, I think not doing that would be disrespectful.
The title was tongue-in-cheek. It communicates a concept compactly. It’s direct and draws attention. It’s hyperbolic, but hyperbole is a legitimate literary tool. I am sorry if people are taking it too literally and somehow making it about them and their insecurities and being offended. But I think that’s as much on them as it is on the title.
This word police thing is sort of annoying and pedantic. If you actually read the article it’s unboxed and explained carefully and fully. That’s what matters.
We don’t sell anything related to what’s in that post (actually we don’t sell anything right now). It’s all open source. We believe everyone deserves good PKI; that it’s an underutilized technology with bad tools. We have plans to make money off other stuff once that’s in place.
Admitting it is clickbait does not lessen my disdain for the use of clickbait.
Perhaps the title could have been more diplomatic. Perhaps I was not sensitive enough to the feelings of folks who use non-certificate-based SSH authentication. But it’s just a title. I meant no harm. <3
Do you have any feedback on the content itself? :)
It’s quite possible to be accurate in my language and still speak informally.
It bugs me a little that you seem to be deflecting criticism and minimizing it rather than just accepting it and moving on. I didn’t think your headline wasn’t “diplomatic” enough and don’t think that’s relevant to this comment hierarchy where you respond.
You need to familiarize yourself better with your audience.
I don’t want to be antagonistic but that’s just not true. I’ve talked to a lot of people about this. Maybe 10% of people I’ve talked to know how to use ssh certs. These are technical people who are very smart and know what they are doing, and know what “grok” means. That’s why I wrote the post.
If you already knew the info in the post then cool! Sorry to waste your time.
So huge thanks for the article!
OTOH, key deployment depends on the situation and size. We have a single office (=> no network bottlenecks), our /home lives on a central NFS and machines pull their users from LDAP. When I joined the company, after I got my account, I ran `ssh-keygen`, set my keyphrase and could connect to any machine. If someone quits, the LDAP user is removed.
Regarding TOFU: I think we have some admin.git which contains all machines, and their public keys are distributed from there. So no TOFU for us. With PKI this central repo of machines wouldn't magically go away, the script would be just someone else's/yours (and it would be technically cleaner).
Also, when reusing hostnames the deployment system could reuse the sshd keys instead of creating new ones.
I immediately understood the concepts by skimming the article and the potential benefits, but also because I understand the dangers and effort involved in running your own CA, the potential pitfalls.
I am extremely pleased to have learnt something new today.