Emacs Tramp over AWS SSM APIs (martin.baillie.id)
116 points by zshev 15 days ago | 57 comments

TIL: TRAMP (Transparent Remote Access, Multiple Protocols) is a package for editing remote files [...] Whereas the others use FTP to connect to the remote host and to transfer the files, TRAMP uses a remote shell connection (rlogin, telnet, ssh).


Tramp is one of those "reasons to use Emacs in the first place" packages. I've been using Emacs since the 1990s, when someone impressed me with syntax highlighting in Lucid Emacs, and I only picked up Tramp last year. In the last 6-9 months or so, almost all of my development has been over Tramp.

What's particularly impressive about Tramp is that other Emacs packages tend to work well with it. For instance, you can Magit over Tramp --- or, better put: Magit just works in Tramp buffers. Same with language server stuff. It's kind of wild when you think about what's happening under the hood.

Tramp can use FTP if you tell it to.

And it's still transferring files. It's not remotely editing.

Tramp is by far the most magical feeling trick in the emacs toolbox.

vscode ate everyone's lunch in this department.

Yes, the vscode remote development plugin is a game changer. It's the new benchmark for how client-server IDEs should work. I am (and more importantly, my team is) no longer constrained to the terminal and memorizing incredibly obscure emacs or vi commands to get stuff done on a remote instance. There is no input lag because vscode keeps all the IDE UI local while doing all the heavy lifting remotely. And to the article's point, it treats the remote as "cattle not pets": all of my vscode settings and preferences are local and synced to github, and any time I connect to a new instance, it's able to reinitialize all of my vscode remote state from scratch. Tramp may have been able to do some of that before... but its accessibility was lacking.

Emacs isn't constrained by the terminal, Tramp doesn't require knowing any "incredibly obscure" commands, and since one uses a local Emacs there is no input lag to speak of. I feel like you are conflating Emacs and Vi here even though they are not the same at all: only with Vi do you connect via ssh in a terminal and do everything on the remote side, not with Emacs. I use Tramp to have a local Emacs connect to a remote docker container, where everything else is on the remote side. Even a remote language server works in eglot over Tramp.

To be clear, I was only mentioning vi and emacs together because both have incredibly passionate communities that can be quite myopic to the UX deficiencies of their platforms. I have a lot of muscle memory committed to emacs so I still use it a lot, but I can onboard a dozen junior developers onto vscode in the time it takes me to help someone figure out emacs. And to me, the final missing piece that made vscode suitable as a general purpose replacement for emacs was the remote dev plugin.

The UX is 90% a matter of what you know and what you are familiar with coming in.

I had to figure out how to do rectangular copy/paste in vscode, and it took just as long as it did to figure it out in emacs.

Yeah. Quitting vim has the most trafficked Stack Overflow page:


"Most people don't know how to use vim and emacs" is both incredibly controversial around these parts and totally true.

What's controversial? That people who say they know vim actually don't?

You onboard new hires to an editor? That's ... surprising to say the least.

It's not that out of the way. The previous place I worked, we had to help new hires get their editor set up, because most of them hadn't used Ruby/Rails before and didn't know what they'd need. We had an onboarding doc that helped you get started with Code + Solargraph, RubyMine or Sublime + Solargraph so they'd have features they'd have used in other languages. Apart from that, we'd also have to guide them through getting the editor set up to do things like format on save etc. to ensure the codebase was clean.

I'm confused by the claim that TRAMP needs obscure emacs or vi commands. I'm from the emacs side, and as soon as I understood the file path scheme (e.g.: /ssh:$host:/path/to/file) I didn't have to do anything beyond that.
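To make the path scheme concrete, here is a small shell sketch that assembles TRAMP file names of the form /method:host:/path. The helper name `tramp_path` and the host names are hypothetical and only there for illustration; TRAMP itself just needs the resulting string.

```shell
# Hypothetical helper: build a TRAMP file name of the form /METHOD:HOST:PATH.
# HOST may include a user, e.g. deploy@myserver.
tramp_path() {
    printf '/%s:%s:%s\n' "$1" "$2" "$3"
}

tramp_path ssh myserver /etc/hosts                     # prints /ssh:myserver:/etc/hosts
tramp_path sshx deploy@myserver /srv/app/config.yml    # prints /sshx:deploy@myserver:/srv/app/config.yml
```

Either string can be typed at the C-x C-f (find-file) prompt in a local Emacs; the method prefix (ssh, sshx, scp, ...) selects how TRAMP talks to the host.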

I would say that the dev client/server setup you're describing and what TRAMP provides are different things overall as well. TRAMP really just provides a way to get a file from a remote, edit it locally, and on save write it back to the remote system. I would not consider it a valid use case for remote dev, especially now with how prevalent things like LSPs are. I don't know of a major mode that is designed around a remote LSP; I'd just do X forwarding or some other screen share at that point. I would agree that overall it's a gap for emacs that VS Code does better.


Trivia: the original Emacs (written in TECO for PDP-10s running ITS) also had transparent access to remote filesystems using the same syntax (host:path).

It was free though: remote files were accessed over the net via a FUSE-like userspace process.

In the mid 1970s.

TRAMP is more than just opening and writing files: it runs VCS operations remotely, which is very helpful.

https://emacs-lsp.github.io/lsp-mode/page/remote/ suggests that LSP under TRAMP is basically there, though I haven't had occasion to try it.

It can run everything remotely. If you are using eshell, you can

    cd /sshx:host:
    any remote command

It is quite impressive how well it works. Most lsp modes are tramp aware, such that completion and tag jumping will also do the right thing.

I'd love to see a workflow comparison between emacs, vim (with remote work via neovim's tcp support + neovide) and vscode. I'm currently using Emacs and have a pretty decent setup for remote work with jupyter-emacs and tramp, and it's pretty much 0 overhead to run the same code on multiple remotes, or have the same remote run code stored in multiple places. With that said, all abstractions end if my SSH connection breaks, since remote stuff dies on disconnect. With neovim the remote can run inside of a tmux pane, so disconnections are not really a problem, but my vim skills are as of yet not as great.

I haven't used VS code yet, simply because of lack of time in relearning another editor. In which particular way do you feel like VS code remote plugin is superior to the alternatives? And is there anything lacking in your VS code experience as of today?

> I'd love to see a workflow comparison between emacs, vim (with remote work via neovim's tcp support + neovide) and vscode.

To me, emacs is a great editor with variable quality IDE-like capabilities, highly dependent on workflow.

VScode is sort of the opposite. It's an at best ok editor with a strong suite of IDE capabilities that are mostly consistent.

I don't think the level of integration between tramp and other essential emacs plugins (flycheck, jedi, magit, etc.) is comparable to vscode. In vscode, entire plugins responsible for this stuff are sent over to run on the remote, leaving the local to run the UI - but also to keep all the settings and state.

In emacs I have to either add pretty complicated scripts to my .emacs just to get stuff to play together if it's even possible at all, or stay in the terminal and run it all on the remote (and put up with the lag, and re-mount/upload my configuration when a new instance starts).

For the longest time I used emacs on the remote and pycharm/jetbrains locally (and was a vscode skeptic) - that changed once I saw what the remote dev plugin was capable of (jetbrains doesn't have an equivalent). I still use emacs in the terminal on remotes for quick text editing, but for project work vscode works better specifically because it's easier to resume on disconnect (one-click restore of all state) and easier to configure. I use tmux in the vscode terminal to resume remote shell sessions.

More importantly, it's a lot easier to onboard others to vscode because the IDE as a whole is more discoverable, more user-friendly, and follows platform conventions more closely compared to emacs or vi.

The one big feature that I miss in vscode is tab key behavior/intelligent indentation. Emacs does this way better - tab just does what I mean, instead of inserting a useless literal tab or spaces.

I haven't found many (any?) modes that don't just work transparently with tramp. I'm curious about your experience.

Even python, if I M-x run-python while on a remote file, it runs python on that remote machine.

You seem to imply that using vi or emacs is an inferior UX to using VSCode (as it's a constraint), but I'm sure many people feel differently (me included, and I have used VSCode as my main IDE for a while).

I'd infer that most people struggling with vim or emacs UX are people who are newcomers to each editor.

How? Tramp has been working at this level for well over a decade, if I'm not mistaken. And you don't have to have anything installed in the "host" that you are connecting to.

If you're working on very slow network connections, or the network just dies entirely, it's not uncommon for Emacs just to hang entirely. You then have to kill Emacs from another terminal. At the time I was running exwm (an Emacs window manager), which made the whole thing even more painful. Emacs is powerful, but polish is not its strong point.

That said, this was a few years ago now. Things may have improved in 26.1 when threads were introduced, and async got even easier.

I can concur that a slow connection is bad. It shouldn't hang entirely, as C-g should still get you response back. That said, it will hiccup bad and a save needs a round trip, regardless.

When your dev environment is on a remote server, the DX of vscode is superior: everything just works (like all your plugins), and it's seamless and fast.

Tramp is great for editing some remote files here and there, but to match vscode you will have to put a lot of effort to make everything feel equally fast and make all your packages work. Even then it won't feel as seamless as vscode because it "cheats" by installing a remote component, and I don't find that to be a valid complaint since you are already installing your whole dev environment in the remote server.

Having said that (I'm not a vscode user), what I always do is use Emacs on the remote server inside tmux. For me that's better and superior to the vscode remote plugin, my dev environment is local to my editor.

If you install any auxiliary dev tools on the remote, it will work just as seamlessly. Git, lsp, etc.

I confess that you need a solid ssh connection. But for the most part it has been great for a long time.

I hate hearing the 'everything just works', because it 'just works' until it doesn't. Apple users use the same term until their software no longer just works.

> How?

Microsoft has done a much better job promoting VSCode than GNU has promoting Emacs for the past few years. More mindshare among influential developers / evangelists has led to massive increases in adoption, which leads to better extensions, which in turn fuels more adoption.

It probably doesn't hurt that VSCode uses the MIT license.

The vscode remote plugin is developed by microsoft and it's closed source. I can see why, it gives them a competitive advantage over all rival IDEs.

Every time you save your edits, Tramp makes a new connection to the remote server. It's slow (1 sec vs 1 ms), and waiting for the save all the time becomes annoying. For quick edits it doesn't matter, but for doing dev all day it does.

Configuring OpenSSH's ControlMaster setting here makes an enormous difference. Summary: it keeps a connection open for a while in case you want to connect to the same machine again. If you do, it reuses that connection so the new one is nearly instant.
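For anyone who wants to try this, a minimal sketch of the client-side settings follows. The function name `controlmaster_stanza`, the `Host *` pattern, the socket path, and the 10-minute timeout are all illustrative assumptions; only the three OpenSSH option names are standard.

```shell
# Print an OpenSSH client config stanza that enables connection sharing.
# "Host *", the socket path, and the 10m timeout are illustrative choices.
controlmaster_stanza() {
    cat <<'EOF'
Host *
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 10m
EOF
}

# Review the output, then append it to your client config, for example:
#   controlmaster_stanza >> ~/.ssh/config
controlmaster_stanza
```

`ControlMaster auto` reuses an existing master connection or creates one, `ControlPath` says where the shared socket lives (%r, %h, and %p expand to remote user, host, and port), and `ControlPersist` keeps the master open in the background after the last session exits.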

I'll give it a try, thanks.

Examples, starting with a new connection to a server I haven't accessed recently:

  ᐅ time ssh myserver exit
  Executed in    1.66 secs

Now visiting it again uses the existing connection:

  ᐅ time ssh mastodon exit
  Executed in   55.89 millis

(Assume the hostname is identical in both cases. I edited one and not the other. Oops!)

Tramp offers different "connection methods" with different characteristics. For example, the scp connection method uses scp to copy files to/from the remote machine, and this implies a new ssh connection each time.

But the ssh connection method transfers the files inline, using base64 or uu encoding, and then you do not need a new connection each time the file is read or written.
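The inline idea can be sketched in plain shell. This is not Tramp's actual implementation, just an illustration of pushing a file through an already-open shell as base64 text. The function name `inline_copy_script` and the host are hypothetical, and it assumes a `base64` command that accepts `-d` on the receiving side.

```shell
# Print a shell script that, when fed to any POSIX shell, recreates SRC at DST.
# The file content travels as base64 text inside a here-document, so it can
# ride over an existing shell connection instead of a separate scp session.
# Assumes DST contains no single quotes.
inline_copy_script() {
    src=$1
    dst=$2
    printf "base64 -d > '%s' <<'TRAMP_EOF'\n" "$dst"
    base64 < "$src"
    printf 'TRAMP_EOF\n'
}

# Usage with a hypothetical host:
#   inline_copy_script notes.txt /tmp/notes.txt | ssh myserver sh
```

Because base64 output contains only letters, digits, `+`, `/`, and `=`, it cannot collide with the here-document terminator or be mangled by the remote shell.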

Modulo being a proprietary extension.

"Perhaps more interesting, though, is that for the last couple of years AWS has supported tunneling the SSH protocol over their SSM APIs if you use the SSM “document” called AWS-StartSSHSession."

That's interesting. I know some places go to great lengths to keep developers from accessing production without some sort of break-glass procedure through a jump host. I'm curious if they all know about this sort of loophole.
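For context, the tunneling in the quote is usually wired up on the client with an OpenSSH ProxyCommand. The sketch below prints such a stanza; the function name `ssm_ssh_stanza` is hypothetical, and it assumes the AWS CLI and the Session Manager plugin are installed locally and the SSM agent is running on the instance.

```shell
# Print an OpenSSH client config stanza that tunnels SSH through SSM.
# It matches host names that look like EC2 instance IDs (i-*) or
# managed-instance IDs (mi-*).
ssm_ssh_stanza() {
    cat <<'EOF'
Host i-* mi-*
    ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
EOF
}

# Review the output, then append it to your client config, for example:
#   ssm_ssh_stanza >> ~/.ssh/config
ssm_ssh_stanza
```

With this in place, something like `ssh ec2-user@i-0123456789abcdef0` (user and instance ID are placeholders) goes through SSM rather than an open port 22, and a Tramp path using the ssh method with the same host name should pick up the same config.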

SSM is much preferred to a jump host for a number of reasons.

1. You don't have to expose a jump host at all, which is one less exposed asset to manage and worry about.

2. Your security team should already be collecting Cloudtrail logs, so they get auditing of SSM/SSH "for free".

3. You can control SSM access via your SSO provider, which means you can trivially enforce a bunch of policies all in one place vs having to configure SSHD.

4. You can control SSM access via IAM.

5. You can limit session duration easily.

6. No more SSH agent hijacking, at least I don't think.

I also wouldn't call this a loophole, you have to explicitly have permissions to use SSM.

>I also wouldn't call this a loophole, you have to explicitly have permissions to use SSM.

Perhaps not the best wording on my part. I was aware of SSM, but not aware of the SSH tunneling features. I'm wondering if that's common. Is the SSH tunneling controlled separately, or on by default if SSM is on?

It is "on" by default, but the user still has to have the 'ssm:StartSession' permission (and probably others) to open the SSM session, and for some(?) operations you also still need to have the appropriate credentials (ssh keypair or a password) to login via SSH.
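As a rough illustration of how that permission can be scoped, the sketch below prints an IAM policy that allows `ssm:StartSession` only for one instance and the SSH document. The function name is hypothetical, and `region`, `account-id`, and `instance-id` are placeholders to fill in; treat it as a starting point to check against the AWS documentation, not a vetted policy.

```shell
# Print an example IAM policy allowing SSH-over-SSM sessions to one instance.
# region, account-id, and instance-id are placeholders, not real values.
ssm_ssh_policy() {
    cat <<'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ssm:StartSession",
            "Resource": [
                "arn:aws:ec2:region:account-id:instance/instance-id",
                "arn:aws:ssm:*:*:document/AWS-StartSSHSession"
            ]
        }
    ]
}
EOF
}

ssm_ssh_policy
```

Since the session document is itself a resource in the policy, SSH tunneling can in principle be granted or denied separately from other session types by including or excluding `AWS-StartSSHSession`.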

SSM Session Manager is one of the (if not the) preferred way to manage SSH access to instances in AWS. It's kinda hairy to set up, but it removes the need for bastion hosts/jump boxes for most use cases. From my experience I would say it is quite common.

That means installing Yet Another Agent on your cluster/VMs and ensuring it stays updated. And while the SSM agent got an upgrade (from Python to Go, I believe), it still does a lot more than just provide ssh sessions, correct?

I don't really know much about the agent. I'm not super concerned with keeping it updated though.

Also forgetting to quote ~ commands when going through a jump host leads to unexpected behavior — usually disconnection!

We've been using symops[0], which uses the AWS-StartSSHSession document, but what's nice is that it lets you set up different workflows for how people access servers. Plus all the advantages of SSM in general (IAM/SSO, CloudTrail, etc.).

[0] https://symops.com/

As the article states, it's completely controlled by IAM and whatever federated identity management you hook up to AWS, and the events are auditable via cloudtrail etc.

Accessing production on the command line is an anti-pattern. I can only think of one good reason to do it: If one is investigating a security incident where a hacker has broken into production and screwed around. Even then, one would want to snapshot the instance and take it offline to investigate it.

If there's some tricky bug in production, then one can create some sort of debugging service that runs on another port and deploy it to investigate the bug, or use management and monitoring tools. Copying files up to production is something that should be only done by an automated deployment script.

> If there's some tricky bug in production, then one can create some sort of debugging service that runs on another port and deploy it to investigate the bug,

If you are under time pressure to fix an escalation from a high profile customer, and you don't have such a service yet, do you make the customer wait for you to write one, or do you just use command line access? Or else, if you already have such a service, but it doesn't contain the necessary diagnostics to investigate this particular problem, do you make the customer wait for you to enhance it, or do you just use command line access? Or you make your debug service totally generic – allow it to run arbitrary code supplied by the user – in which case it can do anything the command line can, but how is that actually any more secure than more standard means of command line access? Plus, it is going to be adding friction which may slow down resolution.

> or use management and monitoring tools.

Often these work fine for some problems, and then you get a problem which they don't cover adequately, and you need to go beyond them.

>Accessing production on the command line is an anti-pattern

Seems to be at odds with

>then one can create some sort of debugging service that runs on another port and deploy it to investigate the bug

In many cases, that's just SSH. In most cases, I'm not copying files around; I want to connect to the real environment where firewall rules, API keys, permission systems, overlay networks, etc. are in place. If there's a stuck process (let's say, lock contention) it's much easier to just SSH on and run gdb and check the stack to see what it's doing. Some languages like Java have pretty rich tooling out of the box for remotely connecting to processes. For others, like Python and Ruby, you just use gdb.

Either way, there's no copying data necessary--you just need access to the running process. For a large system with hundreds of identical servers, I don't want to deploy a debug service everywhere; I just want to connect to the one with an issue and check that.

Snapshotting works sometimes, but I used stuck processes as an example since that's usually where all this remote/log/etc stuff falls apart. And, as-it-so-happens, things like lock contention tend to be really hard to recreate in synthetic or simulated environments that don't have real, authentic load.

Keep in mind that doesn't mean "go crazy with `root` in production". You can combine that strategy with scripting and tooling to drain/isolate/quarantine servers where the stuck process is still running but they don't have live traffic being routed to them.

I see this "ZOMG NO ONE TOUCH PROD" mentality a lot in highly regulated environments, but it's usually more sustainable to try to isolate in-scope systems' functionality as narrowly as possible to avoid bringing unnecessarily large amounts of things in scope (e.g. put the billing functionality in a microservice to limit PCI scope).

That's the way things should work and ought to be done.

But what about when things don't work like they should and ought to?

Want to debug network connectivity issues? See which process is hogging CPU? Investigate installation/deploy problems? Reinvent the wheel, or use what's already there.

If ssm-session-manager-plugin gives you a shell, then it should not be too hard to extend Tramp to use it directly. (I know nothing about ssm-session-manager-plugin.)

Tramp does not need scp to transfer files, it can just as easily multiplex them over the shell connection by using base64 or uu encoding.

> For a good wee while now, AWS SSM (or AWS Systems Manager as I see they are calling it nowadays) has arguably been the most secure way to permit controlled and audited access to an EC2 instance.

SSM is definitely not the most secure way[0]. SSM is super complex and super-integrated into the rest of AWS, and also isn't cross-cloud to GCP, Azure, DO, etc, so now everyone needs an account just to log into a Linux server.

Worse, IAM roles are powerful but easy to misconfigure, and that's before getting into how hard they are to apply with any granularity because of the policy length limitations[1], so you're likely giving everyone access to log into every instance without even knowing it.

0. https://cloudonaut.io/aws-ssm-is-a-trojan-horse-fix-it-now/

1. https://aws.amazon.com/premiumsupport/knowledge-center/iam-i...

What does being cross-cloud have to do with whether SSM is the most secure way to SSH into an AWS instance?

Because everyone will need a (possibly misconfigured) AWS IAM account just to log into any Linux server. This increases complexity and reduces isolation, compartmentalization, separation of concerns, least privilege, etc.

I was mentioning that particular misfeature because it was a personal annoyance of mine. Oh well, I suppose everything is about customer lock-in these days.

It sounds like you don't think AWS is the most secure place to host an application. That's not the argument being made here; the argument stipulates AWS.

SSM supports BYO doesn't it? Can't you install the agent on any machine to enroll it in SSM or does that limit what you can do?
