* Works on Amazon Linux 2 - installed by default on newer versions
* otherwise: $ sudo yum install ec2-instance-connect
* The SSH public keys are only available for one-time use for 60 seconds in the instance metadata.
* you can push up your own SSH keys with `aws ec2-instance-connect send-ssh-public-key`
* CloudTrail logs connections for auditing
* doesn't support tag-based auth, but it's on the roadmap
* plans to enable it on popular Linux distros in addition to Amazon Linux 2
Install local client:
$ aws s3 cp s3://ec2-instance-connect/cli/ec2instanceconnectcli-latest.tar.gz .
$ pip install ec2instanceconnectcli-latest.tar.gz
$ mssh <instance-id>
I assume that "sudo apt install ec2-instance-connect" will also work?
There are tons of support questions about "how can I add multiple SSH keys to my EC2 instances".
Now if only AWS would bring in "projects". That's the last usability edge that GCP has.
Whenever there's a service that maps to the other, I just always seem to find the GCP service easier/faster to learn and use effectively: BigQuery, Stackdriver, Pub/Sub, Dataproc, Compute, load balancers, et al. Getting stuff done with those is miles easier, in my experience, than with the comparable AWS offerings, at least if you don't already have extensive experience with one over the other.
Is the AWS code open source?
This will look up your username in AWS IAM, and if it has the right permissions, it creates an account and copies over the public SSH key associated with that user.
Over the years, AWS has put their focus entirely on "Enterprise" customer functionality as opposed to "developer friendly" capabilities.
I once found a race in the UI, reported it with complete repro instructions, and then Google made me do a 30-minute Hangouts meeting where I had to repro the bug in front of a Google engineer. The call resulted in two different tickets... we found another bug.
So gcloud might have some nicer features, but the engineers building the web app seem to have some basic misunderstandings about concurrency. The BigQuery UI (the new one, not the old one) is similarly riddled with bugs.
(But..yeah, at a time it was an applet)
The way they describe tying SSH keys to IAM roles very closely matches the SSH key management that GCE had when it launched back in 2012.
FWIW, it sounds like one advantage of the new AWS service is that it will provision a new SSH key each time you connect. Whereas, I _think_ the GCP one provisions one key per machine.
You're mostly right about gcloud, with some nuances for multi-user or network-homedir environments among other exceptions.
Gcloud does, however, have one of the snazziest possible hacks for resetting and securely communicating Windows account passwords.
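Roughly: `gcloud compute reset-windows-password` generates an RSA keypair locally, ships the public key to the instance via metadata, and the in-guest agent sets a new password and hands it back encrypted to that key, so it never crosses the wire in plaintext. The exchange can be mimicked locally with openssl (all paths and the sample password are invented for the demo):

```shell
# 1. "Client" side: generate an ephemeral RSA keypair
openssl genrsa -out /tmp/gw_demo_key.pem 2048 2>/dev/null
openssl rsa -in /tmp/gw_demo_key.pem -pubout -out /tmp/gw_demo_pub.pem 2>/dev/null

# 2. "Agent" side: encrypt the freshly set password with the public key
printf 'N3wP@ssw0rd' > /tmp/gw_demo_pw.txt
openssl pkeyutl -encrypt -pubin -inkey /tmp/gw_demo_pub.pem \
    -in /tmp/gw_demo_pw.txt -out /tmp/gw_demo_pw.enc

# 3. "Client" side: only the private key holder can recover the password
openssl pkeyutl -decrypt -inkey /tmp/gw_demo_key.pem -in /tmp/gw_demo_pw.enc
```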
Transcript from Alphabet Q1 2019 Earnings Call; April 29, 2019 
> Sundar Pichai, CEO Google:
> We are also deeply committed to becoming the most customer-centric cloud provider for enterprise customers, and making it easier for companies to do business with us thanks to new contracting, pricing, and more. Today, 9 of the world's 10 largest media companies, 7 of the 10 largest retailers, and more than half of the 10 largest companies in manufacturing, financial services, communications, and software use Google Cloud.
> Some of the companies that we announced at Next included: The American Cancer Society and McKesson in Healthcare; Media and Entertainment companies like USA Today and Viacom; Consumer Packaged Goods brands like Unilever; Manufacturing and Industrial companies like Samsung and UPS; and Public Sector organizations like Australia Post.
> Finally, to support our customers' growth, we also announced the addition of two new Cloud regions in Seoul and Salt Lake City, which we plan to open in 2020. These new Cloud regions will build on our current footprint of 19 Cloud regions and 58 data centers around the world.
This doesn't seem like just a hobby to Google.
Do you understand how expensive this action is?
You know how it goes. They built their business on Google's platform, and it was a dream until some AI detected a pattern of activity it didn't like and they were excommunicated. The app was shut down, the website stopped getting traffic, and all the money they had charged customers via Google Pay was frozen. No appeals, emails go to /dev/null, and after a month of campaigning on social media they finally get an email from an intern saying that, after review, they won't be changing the automated decision.
Rackspace managed AWS environments use this for high compliance systems.
The problems it solves are a) that login attempts are logged on a separate system for compliance and b) user management is handled in a centralized way. Both are handled with EC2 Instance Connect.
This new service is basically managed SSH, so things like port forwarding will work. With SSM you can't do port forwarding, because it isn't SSH-aware.
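Since Instance Connect hands you an ordinary SSH session, the usual ssh client machinery applies unchanged; for example, a host alias with a tunnel baked in (the alias, address, and key path below are all hypothetical):

```
# ~/.ssh/config
Host my-ec2
    HostName 203.0.113.10               # instance public IP (placeholder)
    User ec2-user
    IdentityFile ~/.ssh/ec2ic_onetime   # the one-time key you pushed
    LocalForward 5432 localhost:5432    # tunnel local 5432 to the instance
```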
And as much it pains me to say it, Azure has that feature.
But when you've got some sort of bug or issue that you're not getting any metrics out of, no logs recorded, no kernel crash dump, nothing sent over netconsole, nothing showing up on the instance console screenshot... Sometimes serial console is what you need.
But, for the borked networking case, I'd recommend not modifying your networking on live instances. Make your changes on a test instance, figure out what works, and add it to your configuration management ;)
Unlike SSM, Instance Connect goes direct over SSH - so you either need to be inside your AWS network, on a bastion host that can route to your AWS network, or using a public IP address.
It would be great if they combined this functionality with the HTTP wrapping capability, so that I don't need to expose or route to SSH ports in any way, but can also use IAM policy to control which Unix user a given IAM principal lands in the host as. (An example use case: I'd only want a certain class of user to land as a user with sudo/root access.)
This is still valuable for my use case, and we'll most likely go ahead with it using the bastion approach until they hopefully integrate this with their HTTP SSH wrapper.
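For what it's worth, the OS-user half of this is already expressible at the IAM layer: the `ec2-instance-connect:SendSSHPublicKey` action supports an `ec2:osuser` condition key. A sketch of such a policy, with placeholder account ID, region, and user name:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "ec2-instance-connect:SendSSHPublicKey",
    "Resource": "arn:aws:ec2:us-east-1:123456789012:instance/*",
    "Condition": {
      "StringEquals": { "ec2:osuser": "app-user" }
    }
  }]
}
```

That lets you grant a group key-push rights only as an unprivileged user, while reserving `ec2-user` (with sudo) for another group.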
But, sigh, I just built a PKI infrastructure provisioning system using a gigantic shell script, a maintenance user with sudo permissions and SSH access, where a master node commands a fleet of slave nodes.
I guess all of my work was for naught, since this seems to cover some of my needs for user and SSH key provisioning.
Oh well, it'll work elsewhere on all other clouds. And I guess I should release it publicly, it's just not pretty enough yet. Every time I do, gremlins come out of the bushes complaining that the code isn't elegant enough for them.
Today's dev ops stacks move so fast nobody is an expert on a single stack for longer than a week.
I just finished setting up JumpCloud to manage SSH keys and logins on all my AWS instances.
Or am I missing something and this would follow the PCI DSS?
IAM based auth was long overdue.
Either way, this will be very nice to have.
Managing server access in a multi-account organization is a real issue, though. I currently manage 11 AWS accounts and the best solution I've implemented so far is extending NSS and configuring sshd to query our identity provider (Okta) for user/group information and SSH public keys. Each type of server is configured to permit access to a subset of Okta groups. For example, members of the DevOps group get full privileges, anyone else that has a use-case for using SSH (like in a QA environment) gets some form of limited access. With this in place, I can grant/revoke privileges and manage developer keys all from one central location.
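The sshd side of an arrangement like that typically hinges on `AuthorizedKeysCommand`; a minimal sketch, assuming a hypothetical helper script that queries the identity provider for a user's public keys and prints them in authorized_keys format:

```
# /etc/ssh/sshd_config
AuthorizedKeysCommand /usr/local/bin/okta-ssh-keys %u   # hypothetical helper
AuthorizedKeysCommandUser nobody
```

sshd substitutes the connecting username for `%u`, so revoking a key in the identity provider takes effect on the next login attempt with no per-host file to update.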
Another way to think about it is, if you're an SRE, you want to eliminate toil, and an interactive SSH terminal is toil.
- Encourages you to go to instances and check stuff rather than improve monitoring / health checks.
- Can do quick fixes on a few boxes rather than re-running the deploy. Great! But terrible when the person who knows how to do that is away.
- Tailing log files rather than centralised log management for all the things.
- Trying things out / quickly checking something in production rather than being rigorous about keeping test / staging in sync with prod.
The “problem” is ssh is such a great affordance (until you have tons and tons of instances and you can’t do anything by hand anymore) that it means you don’t need to fix internal processes and tools around deployment, configuration and monitoring.
If there’s no workaround you feel the pain and will be forced to set things up right, usually with benefits to security and repeatability.
As is often the case, the best thing about ssh (in terms of managing infra instances) is also the worst thing.
With that said at very small scales it might be overkill to automate all the things so sure, fill in the gaps with ssh and a wiki page.
I don't think it does - well, it doesn't when you have > 40 machines, anyway. Plus it doesn't give you the ability to compare and contrast simply. (Graphs are _awesome_.)
> Tailing log files rather than centralised log management for all the things.
Yes, I tend to agree. But proper centralised logging is either exceptionally hard, or a hefty splunk tax. That also encourages people to derive graphs from logs, which is arse about face. Graphs first, logs when you are desperate.
> Trying things out / quickly checking something in production
I can see this, but normally one would expect people to not have general access to prod, if they are going to do that...
For example, managing ssh keys for an individual is gloriously simple, but managing them for a large organization is a huge headache. You want to use ssh certificates, but even those are implemented in a weird way, and really you should use an SSO system for auth. (This makes that easier/better, so, yay?)
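The moving parts of the certificate route are small even if the deployment story is awkward; a local sketch with ssh-keygen, where the CA, user, key ID, and principal names are all invented:

```shell
# Create a CA keypair and a user keypair (demo paths)
rm -f /tmp/demo_ca /tmp/demo_ca.pub /tmp/demo_user /tmp/demo_user.pub /tmp/demo_user-cert.pub
ssh-keygen -q -t ed25519 -N "" -f /tmp/demo_ca
ssh-keygen -q -t ed25519 -N "" -f /tmp/demo_user

# Sign the user's public key: key ID "alice", principal "alice", valid 8h
ssh-keygen -s /tmp/demo_ca -I alice -n alice -V +8h /tmp/demo_user.pub

# Inspect the resulting certificate
ssh-keygen -L -f /tmp/demo_user-cert.pub
```

Server-side, sshd then only needs `TrustedUserCAKeys` pointing at the CA public key; it will accept any unexpired certificate whose principal matches the login name, with no per-user authorized_keys files.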
When people start sshing into production servers, they end up making local changes. They focus more on the "pets" aspect of managing systems rather than as "cattle". They have to install a litany of extra software to diagnose and troubleshoot bugs, rather than expose system metrics and tightly control the app environment and its operation.
Remote access to production app servers is basically a backdoor waiting to happen, and may violate corporate security policies. When you have local user access to a Linux host, it's almost guaranteed you can privesc to root.
Finally, almost everyone I have ever seen will either force-ignore/auto-accept host key changes, or just accept them blindly, because IPs and hostnames may change, there may be multiple environments you haven't logged in to, etc. This completely defeats the purpose of MITM protection, which is the main intent of using SSH, though these days its other features may arguably be more of a reason to use it.
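One way out of the blind-accept habit is to pin host keys out of band - on EC2, for instance, the host key fingerprints are printed to the first-boot console output (retrievable with `aws ec2 get-console-output`), which you can seed into a dedicated known_hosts file and then refuse anything that doesn't match. A sketch, with the host pattern and file path made up:

```
# ~/.ssh/config
Host prod-*
    StrictHostKeyChecking yes                   # refuse unknown or changed keys
    UserKnownHostsFile ~/.ssh/known_hosts_prod  # seeded out of band, not on first connect
```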
And for the tech hipsters out there: "it isn't serverless!!"
There's a lot of things that require more than easily exported system metrics and logs to troubleshoot.
While I've played around with using PCP's perf plugin to try and remotely do things with perf, generate flamegraphs, etc., it doesn't work nearly as well as just SSH'ing into the thing and running perf directly, especially if the perf data file is going to be large. I don't see how you could do serious performance engineering work without SSH access.
But, I think I'm nitpicking here, because I generally agree that there should be very little to no reason to log in to servers via SSH day to day.
On the other hand, I tend to lean more towards a ‘wild animals’ model, where, sure, you can tranquilize one and bring it in to look over, but once it’s got the smell of humans on it, it’s doomed if you let it back out in the wild again.
Once you ssh into a production box, it is forever tainted. Sure, poke around in it, install some perf tools to run some diagnostics, learn what you can about its behavior. But then, rather than putting it back into the wild to serve traffic, out of mercy, you should destroy it and replace it with a clean instance.
Thanks for the super informative answer. About the quoted portion... yeah! I assume it's my responsibility to do something... like manually check the host IP or something? What is the recommended practice to deal with this situation?