SSH Bastion Host Best Practices (goteleport.com)
358 points by old-gregg 11 days ago | 84 comments





I think this article has come up before? Either way, it's a quirky thing for Gravitational to post, since their flagship project --- Teleport --- basically eliminates bastion servers altogether (you might think of it as an API-controlled self-contained bastion server). Teleport is free, and worth checking out: it solves a bunch of SSH management problems, not just controlling access, but also linking SSH access to SSO, running fleet-wide commands selectively, and generating transcripts of SSH sessions.

Teleport is kind of big and sprawling. But they've repeatedly contracted Doyensec to do assessment work for it, and Doyensec is a fantastic firm. I think parked behind Tailscale, so none of your SSH infra is exposed to the Internet to begin with, it's a pretty great solution, and I'd do that again before I ever hand-tooled an SSH bastion host again.


An interesting category of HOWTO is companies teaching you how to do it yourself, for real, and then, before you get started, pitching you at the end with a paid-for option that has zero learning curve. That's a good pitch.

Yes, McDonald's did this kind of thing a few years back https://youtu.be/rcu4Bj3xEyI

Thankfully here the pitch is at the very bottom in the conclusion (although it's also in the sidebar).

Realistically, what is the risk of ssh being exposed if whitelisting, 2-factor auth, and key auth are all used? I suppose someone using a zero day, spoofing, or attacking from a whitelisted IP may successfully exploit it, but really?

If you're asking about the risk of exposing Teleport to the Internet, part of the issue is that Teleport does a lot more than SSH.

We've been experimenting a bit with tailscale and ssh access - and I'm not 100% convinced there's a great way to guarantee continued access - if you bind sshd to the tailscale vpn ip, an update that restarts ssh and tailscale could result in sshd not being able to bind the expected IP - leading to ssh being down. I think this is mostly due to sshd listen directive being somewhat limited.

I am doing a pilot of the same thing.

so far I am mostly using tailscale + firewall. Using a firewall directly on the host as you mentioned seemed a bit dangerous - although we are trying it on a few servers. For now cloud provider firewall + tailscale.


> Either way, it's a quirky thing for Gravitational to post, since their flagship project --- Teleport --- basically eliminates bastion servers altogether

Not quirky at all. Probably the aim is to inform the reader of the myriad things to do to keep bastion server secure, and then suggest there is an easier alternative. :)


There are a few weird things, but it's mostly okay.

Do not trust the firewall on the bastion host: if an attacker can get into the bastion host, they can disable the firewall, so it cannot be used to limit egress. It's better than nothing, but consider using a firewall that's managed via a separate management network. I do agree that you should only allow SSH from a few known IPs.

Limiting the number of users is weird, and not recommended. Create all the accounts you need to provide individual accounts for the staff that need to access the bastion host, you will need that as things like HIPAA require named accounts for auditing. None of the accounts need any privileges other than the most basic. Users do not need sudo/root privileges on a jump host.

Other than those two complaints, these are good recommendations.

A final recommendation: If you use AWS though, consider using Session Manager instead of SSH and drop the bastion host. You can still connect using the SSH command, using proxy command in OpenSSH, but no public IP or bastion host is required.


> A final recommendation: If you use AWS though, consider using Session Manager instead of SSH and drop the bastion host. You can still connect using the SSH command, using proxy command in OpenSSH, but no public IP or bastion host is required.

Yes, this. Also check out https://github.com/rewindio/aws-connect for a convenient wrapper around SSM to make it easier to use (I'm not the author).


I wrote something similar after I moved our fleet to SSM because I didn't want yet another CLI app to memorize flags for. It's Ruby-based and runs in an interactive mode by default. It doesn't cover the whole `aws ssm` feature set but focuses just on the things that are needed for debugging-type tasks. Leaving it here in case it's useful to anyone else: https://github.com/ajbdev/ruby-ssm-ops

Nitpick: the aws-connect quickstart suggests installing it through bpkg. But it turns out that bpkg does not have any "uninstall" or anything similar. I ended up doing just:

    rm ~/.local/bin/aws-connect

> if an attacker can get into the bastion host, they can disable the firewall, so it cannot be used to limit egress.

This assumes that the attacker can get unconstrained root access to the system. It's fine to assume that attackers will but it's not as if you can't make that difficult.

Agree with the rest of what you said though.


At least in the DoD and IC environments I've worked in that had bastion hosts, the bastion host was severely locked down:

- Shell compiled without built-ins

- No coreutils

- No sudo

- Root account disabled

- Read-only root filesystem

- No user home directories

- Destroyed and rebuilt from template every X hours on some maintenance schedule

Effectively, all you can do is ssh in, ssh out, and forward ports. It might be theoretically possible, but as far as I know, no one has ever compromised one, especially since you can already only get to the bastion from a government VPN anyway, and authentication to that requires a smart card, so there are an awful lot of things you need to compromise to get to that point.

This also answers the suggestion down the page of "why don't you just apply the same controls to every host and not have a bastion." Because the bastion is unusable and you want to actually use your other hosts.


This setup should be easier than it currently is. Any bastion host that’s used for more than jumping is asking for trouble.

From a defense standpoint, one should consider "shell on a box" to usually mean attackers can get root on a box. If they can get persistence, they can wait for a kernel CVE to abuse.

Now, if you're just using a bastion as a jump host, you don't need to offer shells on it. Just allow people to proxy a port to behind the bastion and be done with it.

    PermitTTY no
    ForceCommand /usr/sbin/nologin
    AllowTcpForwarding yes
    AllowAgentForwarding no

I suggest:

    AllowAgentForwarding no
    AllowTcpForwarding yes
    X11Forwarding no
    PermitTunnel no
    GatewayPorts no
    PermitOpen *:22
    ForceCommand echo 'Nope'

Then:

    ssh -J user@bastion finaluser@finalhost
You can nicely use your local agent, etc. Bastion is relatively hardened. Etc.
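
As a rough client-side sketch of that hop: the helper functions below only print the ssh command lines so they can be reviewed before anything is run. `user@bastion` and `finaluser@finalhost` are the placeholders used in this thread, and the function names are made up for illustration. The second form spells out roughly the same hop with `ProxyCommand` and `-W`, which is a forwarded TCP connection rather than a shell on the bastion.

```shell
#!/bin/sh
# Placeholders from the comment above; substitute real hosts.
BASTION="user@bastion"
TARGET="finaluser@finalhost"

# Print (not run) the jump command. -J connects to the bastion first and
# then reaches the target over a TCP connection forwarded through it.
jump_cmd() {
    printf 'ssh -J %s %s\n' "$1" "$2"
}

# Roughly the same hop written with ProxyCommand. -W asks the bastion to
# forward stdin/stdout to host:port, so no shell is needed on the bastion.
proxy_cmd() {
    printf 'ssh -o ProxyCommand="ssh -W %%h:%%p %s" %s\n' "$1" "$2"
}

jump_cmd "$BASTION" "$TARGET"
proxy_cmd "$BASTION" "$TARGET"
```

With `PermitOpen *:22` on the bastion, only forwards to port 22 would be expected to succeed, so forwards to other ports would need their own `PermitOpen` entries.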

I think it's probably reasonable when performing your incident response or even threat modeling to assume the attacker has or could escalate privileges. The linked article doesn't discuss anything that would make that harder, although perhaps practices like staying patched and minimizing attack surface are somewhat assumed (they do bring up choosing your OS based on minimizing attack surface for example).

There's also a lot you can do to harden that boundary. You can harden your kernel, you can execute user's shells in constrained environments like docker containers or restricted shells, leverage sandboxing technologies like apparmor or selinux, etc.

The user/root boundary can be a lot thinner than people expect, so I get why you'd want to point out that reliance on the attacker not escalating should be met with an evaluation of that boundary, but I think it may be understating the boundary to unconditionally not trust a host based firewall, or to say that getting onto the bastion itself is enough to disable the firewall when it does indeed require escalation.


Is finer grained control over TCP forwarding possible? e.g. allow forwarding only to certain TCP ports?

"PermitOpen *:22"

If you use SSM instead of a bastion host, how do you tunnel traffic to internal services that are not exposed to the internet?

I haven't actually tried it, but you can use SSM in your ssh config as a ProxyCommand. As I understand it, that will allow you to just use the ssh command as normal, with all the normal ssh abilities to do tunneling and port forwarding.

Yes that's right. You can use SSM to port forward. Here's an example of the SSH configuration.

    Host i-* mi-*
        ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession"

Twice I've seen Bastion Hosts compromised. Both times it practically gave the attackers the highest access. In one case it basically hid where the attack came from (compromised logs and all). In the other, it let them capture an admin's password as he typed it into sudo.

IMHE, Bastion Hosts suck.

If you are forced to use one, send logs, encrypted, to safer one-way storage and put tampering triggers everywhere you can in the Bastion Host. Also make sure you log outgoing connections, and make sure you can easily match incoming to outgoing.

If you absolutely have to use sudo on the Bastion Host force it to OTP only. Or if absolutely not possible, use 2FA, but this is a risk as something somewhere might not be properly protected and the password will leak. But the better way would be to have the bastion host run on some read-only image and not letting it upgrade or do any admin task at all. Maybe even remove admin users, SSH, the whole lot.

And related, do not have a single account with god-like access to everything. Isolate permissions. This is probably the hardest to get OK'd but it's the classic SPOF where they got you by the balls.


> IMHE, Bastion Hosts suck.

I agree, any security standards you're going to apply to a bastion host, just apply them to your entire network if possible, add security at every layer. So many times a bastion host just serves as a checkbox with added toil of jumping through a host. I despise them for the most part.

Having seen how bastion hosts or “jump boxes” work inside the enterprise I share your view. In practice they are generally not very well protected and are a very attractive target for attackers. It’s better to use a privileged session manager or regular ssh with mfa and ideally some type of identity proofing.

I can see that you can get a lot of things wrong with a bastion host, but if implemented sensibly, it should just be one more layer of a defense-in-depth strategy. What would you recommend instead of a bastion host?

> What would you recommend instead of a bastion host?

The question isn't to replace, but to remove. If you apply the same security to the actual hosts (which you probably should anyway) then why have an intermediary?


It does not seem to be mentioned here, but my #1 hardening suggestion is to install the Tripwire IDS (Intrusion Detection System). It is probably the best thing you could ever do for yourself as a system administrator. It integrity-checks the entire file system. If anything happens to your system that you didn't authorize, you're notified of it immediately. After the initial install it is important to minimize and exclude false positives so that you end up with a system that rarely changes in ways you don't expect or can at least explain.

Another really useful tool is logwatch.

I actually caught an intruder hijacking my system this way several years ago. They removed rkhunter, chkrootkit, and a variety of log files, and modified lines in the last-logged-in users log. But a combination of logwatch and tripwire caught it.

https://opensource.com/article/18/1/securing-linux-filesyste...


I personally use OSSEC for File Integrity Monitoring. And it has also actually caught an intruder that modified some PHP-code on a webserver. The attacker forgot to use the prefix @ in the PHP-code so a new error message was sent to the logfile and reported by OSSEC.

The premise sounds iffy ("SSH bastion hosts are an indispensable security enforcement stack for secure infrastructure access").

Every time you build some infrastructure, you expend scarce resources like engineering effort (=opportunity cost), time, money, and complexity by adding moving parts to your christmas tree of technology. You should always critically evaluate what the lowest-hanging fruit is that you can invest in for a given end goal (e.g. improving security), considering the complexity costs. SSH bastions can be worth implementing in some situations, but they are not at the top of the list in many cases.

The next sentence starts talking about "security compliance standards" - you sometimes have to submit to doing stuff for reasons of ticking boxes, but it's important to remember when you're doing what's best for security and when you're going through motions mainly to tick boxes for someone else.


It recommends:

  # Configure idle time logout
  ClientAliveInterval <value in seconds>
but i don't think this is correct. AFAICT, this is a keep-alive mechanism, not a timeout. I don't think openssh has an option to kill idle sessions.

Correct. For that you would populate the variable TMOUT to a positive number in seconds and make that variable read-only

  grep ^read /etc/profile.d/timeout.sh
  readonly TMOUT=7200
This variable can also be set in tmux and gnu screen. People usually figure out fairly quickly how to bypass the timer, but it is handy when people console into servers via the drac/ilo and forget to log out. Some shells don't do anything with TMOUT, so a bastion must only have vetted shells.
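
A small sketch of why the `readonly` part matters: once TMOUT is marked read-only, a later attempt to reassign it in that shell is rejected. The function name below is made up for illustration, and everything runs in a subshell so the calling shell's own TMOUT is left alone.

```shell
#!/bin/sh
# Mimic what /etc/profile.d/timeout.sh sets up, then try to turn the timer
# off the way a user might. Runs in a subshell so nothing leaks out.
tmout_override_result() {
    (
        readonly TMOUT=7200
        # The inner subshell inherits the readonly attribute, so this
        # assignment fails and the subshell exits non-zero.
        if ( TMOUT=0 ) 2>/dev/null; then
            echo "overridden"
        else
            echo "rejected"
        fi
    )
}

tmout_override_result
```

This only shows that the variable cannot be reassigned in that shell; it does nothing about a user starting a shell that ignores TMOUT, which is why the vetted-shells point above still applies.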

You are correct. This is widely copy/pasted bad advice and does the exact opposite of what the comment says.

It is not an idle timeout logout at all. Instead, it causes sshd to periodically send probes to the client. This has a couple of effects, most notably keeping tcp sessions "active" and frequently exchanging packets (this can be useful to keep connections through stateful firewalls alive if you are genuinely idle), and rapidly detecting and disconnecting a client that has actually gone away.

I think the origin of this incorrect description is the CIS documents. They have the exact same gross mistake in them.

I think the ClientAlive probes are useful and should be on, but it's definitely not an "idle logout" as claimed.


Good writeup. One thing I would add for bastions if you wanted to harden them would be to disable session multiplexing if you are using MFA/2FA.

  MaxSessions 1
The default is 10. The plus side of multiplexing is that subsequent connections using the same ssh connection channels are not validated against the authorization mechanisms such as login or 2FA. This reduces friction and speeds up the login process because login is not actually occurring. The trade-off of multiplexing is that all subsequent logins using that ssh connection are not logged nor are they validated with MFA. This means a person phishing your team members can easily hijack their connections without needing a password or 2FA and there are no lastlog entries. SSH Session multiplexing combined with passwordless sudo makes taking over a company trivial even if they have 2FA and strong passwords.

Another risk with a bastion model is port forwarding. As an organization you have to decide what is appropriate for that bastion. Unrestricted forwarding? Restricted? Denied?

  AllowAgentForwarding                    no
  AllowTcpForwarding                      yes
  PermitOpen                              192.168.1.2:22
If this bastion is for a PCI environment then one may want tighter restrictions. If it is for a development environment then maybe fewer restrictions and just better auditing on each host to enable forensic remediation.

If your bastion is also used for automation to drop files into a staging area, you can limit that automation to file transfers and even limit what it may do with files. This prevents the automation from having a shell or performing port forwarding.

The keys should be outside of the home directories to prevent malicious tools from appending additional authorized_keys into the account. Make use of automation to manage key trusts and add a comment to keys to map them to an internal tracking system like Jira. This assumes your MFA/2FA is excluding specific accounts or groups via PAM and permitting the use of ssh keys with specific groups or accounts.

  AuthorizedKeysFile               /etc/ssh/keys/%u

  Match Group                      sftpusers
        Banner                     /etc/ssh/banner_sftp.txt
        PubkeyAuthentication       yes
        PasswordAuthentication     no
        PermitEmptyPasswords       no
        GatewayPorts               no
        ChrootDirectory            /data/sftphome/%u
        ForceCommand               internal-sftp -l DEBUG1 -f AUTHPRIV -P symlink,hardlink,fsync,rmdir,remove,rename,posix-rename
        AllowTcpForwarding         no
        AllowAgentForwarding       no
-P sets limits on what may not be done in sftp. -p does the inverse and limits what may be done. [1] -l DEBUG1 or VERBOSE will give you syslog entries of what commands were executed on the files. This is useful for audits. Some redundant settings above are also useful to set explicitly for audits.

Another thing mentioned in the article is iptables. In a PCI environment one may want to also have explicit outbound rules using the owner module to limit which users or groups are permitted to ssh out. So if your organization has a group of people allowed to use this host as a bastion, then one could write a rule like

  iptables -I OUTPUT -m owner --gid-owner devops -p tcp --dport 22 -d 192.168.0.0/16 -j ACCEPT
Or specify what CIDR blocks, ports, and protocols may be used. You can use REJECT rules after this rule to make it obvious a connection was not allowed so that people do not spend hours debugging. This module is also handy for limiting which daemons may speak to your infrastructure. How strict or liberal the rule is depends entirely on the needs of your organization.
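
A minimal sketch of that ACCEPT-then-REJECT pairing, printed rather than applied so it can be reviewed first. The group `devops` and the range `192.168.0.0/16` come from the rule above; the function name is made up, and using `-A` for both rules (instead of `-I`) is an assumption made here so that the ACCEPT stays ahead of the REJECT on an otherwise empty OUTPUT chain.

```shell
#!/bin/sh
# Print the outbound SSH rules instead of applying them.
emit_ssh_egress_rules() {
    group="$1"
    cidr="$2"
    # Allow members of the group to SSH to the internal range.
    echo "iptables -A OUTPUT -m owner --gid-owner $group -p tcp --dport 22 -d $cidr -j ACCEPT"
    # Reject every other outbound SSH attempt with a TCP reset, so the
    # client fails immediately instead of hanging until a timeout.
    echo "iptables -A OUTPUT -p tcp --dport 22 -j REJECT --reject-with tcp-reset"
}

emit_ssh_egress_rules devops 192.168.0.0/16
```

On a host that already has OUTPUT rules, the position of these two relative to the existing ones would need checking before applying them as root.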

Lastly I would add that bastions should have as minimal an OS install as possible and have SELinux enforcing. Actions denied by SELinux should go to a security operations center after you spend some time tuning out the noise and false positives.

[1] - https://man7.org/linux/man-pages/man8/sftp-server.8.html


Thanks a lot, great hardening considerations.

It would be interesting to hear what you think of Keycloak.


Sorry I have never used it so I don't have an opinion. That looks like an oauth/openid/saml ssh integration?

Yes. I ran into it once at a huge telco: while I was building my bastion host in AWS, a security architect installed it and used Keycloak as the policy engine to allow connections using SSH keys. It worked really well, gave us very granular control over who could connect, and provided a great audit trail.

I wish we could save posts. So this reply is my method… thanks for the write up.

You can click the timestamp on a post, and then click the "favorite" link, and that'll add the comment to your favorites list (which I think would be https://news.ycombinator.com/favorites?id=whynotminot&commen... for you).

TIL! Thank you very much

See also the answer from mindcrime.

There's also one very important difference between those two:

- others can see your favourites.

- you can see both your upvotes and your favourites

so only use favourites for things you don't worry about others seeing.

I don't know if this is important for you but for a lot of people here it probably can be.


You can always just upvote the comment. Your profile page has a link to see comments (and stories) you've upvoted in the past. See:

https://news.ycombinator.com/upvoted?id=YOURUSERNAME&comment...


A superset of the best practices in the article would be the CIS benchmarks. They are collectively agreed on by industry leaders and provide extensive resources that run the gamut of cloud, networking, and storage infrastructure.

CIS supported technologies: https://www.cisecurity.org/cis-benchmarks

CIS Audit AWS infra: https://github.com/toniblyx/prowler

Better to be proactive than reactive :^)


If you are on AWS you don't need bastion hosts anymore. Use Session Manager.

I agree in general but there are a handful of edge cases which Google solved better with IAP: SSM can't forward ports to other hosts or any resource other than EC2. It's great for using SSH, SFTP, even tools like Ansible work fine, but if you need to get a port forward to something like RDS, a service in Fargate, etc. you'll need something else.

We still need bastions to connect to RDS. But we connect to the bastions using SSM.
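
A sketch of what that combination might look like from a laptop, assuming `~/.ssh/config` already routes `i-*` hosts through `aws ssm start-session` as in the ProxyCommand example elsewhere in this thread. The instance ID, user, database host name, and ports are all made-up placeholders, and the helper only prints the command.

```shell
#!/bin/sh
# Print an ssh command that forwards a local port to a database endpoint
# through a bastion instance that is itself reached over SSM.
rds_tunnel_cmd() {
    local_port="$1"; db_host="$2"; db_port="$3"; bastion="$4"
    # -N: run no remote command, just hold the forward open.
    # -L: listen on local_port and forward to db_host:db_port via the bastion.
    printf 'ssh -N -L %s:%s:%s %s\n' "$local_port" "$db_host" "$db_port" "$bastion"
}

rds_tunnel_cmd 5432 mydb.example.internal 5432 ec2-user@i-0123456789abcdef0
```

While the printed command is running, a database client pointed at `localhost:5432` would go out through the bastion, which needs no public IP or open inbound port 22.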

If you decide to use a bastion host, why not use a security-focused OS like OpenBSD?

If you’re using - say - Debian all over your infra, introducing a whole new OS just for the bastions increases complexity without bringing any significant advantage.

It genuinely seems like the perfect OS for this. Install minimal sets, disable some unnecessary default services, and voilà.

The "how to grant and manage access to resources" issue is still unsolved in my opinion. There is a middle ground somewhere between raw bastions and managed access services or open VPNs that could be filled.

There are a few different players in this space, but the one to watch is Boundary by Hashicorp.

https://www.boundaryproject.io/

Basically managed authenticated proxy connections to any resource you could possibly need. Still young, so it's missing auditing and some of the convenience features, but give it a year and it will be a compelling open source competitor.

Teleport is great, but their centralized model is not suitable for all situations, and the pricing (at least for kubernetes) leaves a lot to be desired.

There is also StrongDM, which is very similar with a better pricing model.


Newbie here to VPCs, bastion hosts, VPEs, etc. After spending a while on the topic, I have some questions that you might find redundant or be able to answer.

I am wondering why we need to configure this many steps as outlined in the article, and in general. What is the point of Teleport in the first place? Why is there no managed service that takes care of all of that, with me focusing on just deploying an app and running it in the VPC.

Can’t 99% of the use cases be put in a template and managed by a service provider, including the security?


This is the aws managed service (it’s free). https://docs.aws.amazon.com/systems-manager/latest/userguide...

A lot of it is. But unless you stick to the big 3 cloud providers, you need this for bare metal / colocation server deployments, which also happen to be much cheaper.

Can someone explain to me the benefits of limiting the IPs that can SSH into the bastion? It seems to me the main thing that's protecting against are misconfigurations of SSH (accidentally letting root log in with no password or something) or a zero day in SSH but I'm not convinced by either.

The company I work for does it so that bastions hosted on some public cloud hosting service are only accessible from the company network or by machines connected to its VPN. We handle _very_ sensitive data, and some engineer screwing up the configuration for a bastion would be _very_ bad. Defence in depth is important.

What do you want to be paranoid about? One access point or a million access points?

Also adds defense-in-depth against stolen credentials -- it means an attacker can't just exfiltrate stolen SSH credentials to use sometime later from somewhere else on the Internet (or sell them / pass them along to a different specialist) -- the attacker either has to use them in-place, or break into some other machine that's also on the allow-list.

> misconfigurations of SSH (accidentally letting root log in with no password or something) or a zero day in SSH

these are entirely valid concerns. defense in depth, principle of least privilege. humans make errors.


Could someone do me a solid and explain best security practices around bastion hosts and vpn?

e.g.

- would you still require users connected to the vpn to go through a bastion host?

- would you ever run bastion/vpn through the same box?

- are there preferred access use cases for each?


Yes, you would still have people connect to the bastion if they're on the VPN; part of the point of a bastion is to have a central place to monitor and control SSH access, which a VPN doesn't really do for you. Additionally, you will inevitably end up with team members who need access to the VPN (to reach staging and test versions of your applications, or to access customer support consoles) but don't get SSH access; a bastion gives you a standard configuration to apply to your fleet to ensure that "on the VPN" doesn't ever equate to "can log into a server".

You should generally do both things.

Wait, I should word that better. You should generally have both sets of controls: network access control with a VPN, and fine-grained, auditable SSH-level access control. I don't love the "Linux shell server" approach to providing those SSH controls.


Thanks for the response, that clears things up quite a bit. Would you create jump-boxes per environment or do you generally just have 1 with all the different service/env access logic?

It depends. It's more important to have some controls in place than to make super-complicated controls. Again: shell servers you SSH into to SSH out of are kind of an anti-pattern. See elsewhere on the thread about Teleport, which, combined with Tailscale, is I think a pretty good answer to these concerns.

I run an "internal" set of bastion hosts that are gateways into a system that runs telnet. This internal system is able to run SSH, but connections stop around 100 because of OS limits. We need to support 400-500 logins, and that has to be telnet. Everybody connecting has to go through these bastions, including VPN users.

I recently built an nspawn container with tinysshd server, with a .profile that execs telnet to the relevant system on login.

We had previously used an old version of Microfocus Reflections (terminal emulation) with stunnels deployed on all the clients and bastions. That was not containerized, but the server stunnels were set to chroot() on startup.

I recently was forced to support the latest version of Reflections, and since it doesn't support chacha-poly, I also built dropbear SSH server just for them. Reflections is very expensive (~$500/seat), and the best that it supports is aes256-ctr, using Tatu Ylonen's commercial ssh.com (which appears to be abandonware). I really hope we can get rid of that.


very nice writeup - one of the better ones i have seen. you can go a step further and eliminate open inbound port 22 (make the sshd server 'dark' to the network) with open source solutions like this:

https://ziti.dev/blog/zitifying-ssh/

disclosure: we build SaaS on top of OpenZiti (the open source) so are opinionated in this domain. and, to be clear, the above is just one layer...other layers of security still apply.


i generally end up liking what teleport is doing and what they are all about... i keep meaning to try their opensource stuff out. does teleport's sshd 'listen' on port 22 and does it need an opening in a firewall?

still, having sshd listen on localhost and not a public ip is pretty cool imo. Ken and I did exactly that on a stream one day https://youtu.be/oSlwZcwZcsU if anyone is interested. The one extra step one could do is to convert sshd to only allow connections from localhost by editing /etc/ssh/sshd_config and set the ListenAddress to only 127.0.0.1

Make those bastions dark!


> does teleport's sshd 'listen' on port 22 and does it need an opening in a firewall?

Sorry, one of those crappy it depends answers. The teleport node agents, the agent running on the server you want a session on, can be configured to listen to inbound connections from the proxy (but doesn't use port 22 by default), or can be configured in a reverse tunneling mode where it does outbound dialing towards the Teleport proxy service. When using the reverse tunneling mode, you don't need inbound access to the end nodes, but still need the nodes to be able to make an outward connection to the Teleport infrastructure.

This is how the cloud-hosted Teleport works as well: we can't be expected to have outbound network access to people's machines, so all the agents dial the cloud-hosted proxies and set up reverse tunnels that are then used for the inbound connection requests.

In most setups though, the Teleport Proxies would then still have inbound connectivity and are meant to be internet facing, so a client can request an SSH or other session, but that single way into the environment can be hardened, layered with additional security, as the environment may require.

Note: I'm affiliated with Teleport, my comments are my own.


Maybe I missed it but did they cover logging all keystrokes entered by users over the bastion? (In the case where you need to log into it first vs merely doing port forwarding)

People still use Bastion hosts?

I'm trying to grok why they're better than SOCKS5 proxies... Is it because they provide shell access and a larger attack surface? ;-)


Auditing requirements.

Many certifications or legal requirements demand that you log all changes to your systems, including administrative changes.

Bastion hosts are a well-understood (both by operators and auditors) way to implement that, so it's still a go-to solution.


That would make a lot of sense if SOCKS5 proxies weren't commonly used for auditing and provide much more transparency about what operations someone is doing on the internal systems.

So how do you implement MFA in your SOCKS5 proxy?

The SOCKS5 auth-method field has a value, 0x08, for MFA.

does SOCKS5 provide encryption?

Yes.

Between the client and the SOCKS5 proxy? Of course using the SSH SOCKS proxy will encrypt data; I was rather thinking of a plain SOCKS5 proxy. Are there clients and servers supporting SOCKS-level encryption between the client and the proxy? I didn't see that possibility the last time I read the SOCKS standard (but it was a few years ago).

SSH Bastion Best Practices --> Don't use one

Could you share your reasons for not using one in the provided scenario?

You can use a service like Teleport or set up an SD-WAN. I am using a split-tunnel AWS VPN client service to avoid having to set up a bastion server.

Some great tips here, take my upvote!

No mention of sshguard or fail2ban?

Or you could just use strongdm!

Wireguard to SSH


