It's well established that if Alice forwards an SSH agent to Bob, Bob can use the SSH agent protocol to make Alice open DLLs, because there's an agent protocol command (SSH_AGENTC_ADD_SMARTCARD_KEY) that OpenSSH implements with dlopen: when you ask the agent to access a smart card, OpenSSH dlopen()'s the provider library named in the request. This is a Jann Horn bug from 2016, and OpenSSH fixed it by whitelisting the dlopen()'able libraries to /usr/lib and directories like it.
The Qualys bug builds on Horn's bug. When OpenSSH dlopen()'s the library, it then tries to look up a PKCS#11 entry point function, and, when it doesn't find it, it dlclose()'s the library and returns an error.
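The pattern at the heart of it looks roughly like this (a simplified sketch in C, not OpenSSH's actual code; C_GetFunctionList is the standard PKCS#11 entry point being probed for):

    #include <dlfcn.h>

    /* Sketch of the probe-and-unload pattern described above. */
    int probe_provider(const char *path)
    {
        void *handle = dlopen(path, RTLD_NOW);  /* runs the library's constructors */
        if (handle == NULL)
            return -1;

        if (dlsym(handle, "C_GetFunctionList") == NULL) {
            dlclose(handle);                    /* runs the library's destructors */
            return -1;                          /* "not a PKCS#11 provider" */
        }
        /* a real helper would go on to call the entry point here */
        return 0;
    }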
The issue is that most of the libraries in system library paths were never intended to be dlopen()'d at an attacker's whim, and so they do all sorts of stuff in their constructors and destructors (any function marked `__attribute__((constructor))` or `__attribute__((destructor))` is run by dlopen() and dlclose() respectively). In particular, they register callbacks and signal handlers. Most of these libraries are never expected to be dlclose()'d at all, so they tend not to be great about cleaning up after themselves. Better still, if you randomly load oddball libraries into random programs, some of them crash, generating SIGBUS and SIGSEGV.
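To make that concrete, here's what such a library might look like (a hypothetical example, not any specific real library): the constructor installs a signal handler pointing into the library's own code, and the destructor never undoes it, so the handler dangles after dlclose():

    #include <signal.h>

    static void on_segv(int sig) { (void)sig; /* ... */ }

    /* Runs when the library is dlopen()'d. */
    __attribute__((constructor))
    static void lib_init(void)
    {
        signal(SIGSEGV, on_segv);  /* handler address is inside this library */
    }

    /* Runs when the library is dlclose()'d. A careful library would restore
     * SIG_DFL here; many don't, leaving the kernel with a pointer into
     * memory that is about to be unmapped. */
    __attribute__((destructor))
    static void lib_fini(void)
    {
    }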
So you've got a classic UAF situation here: (1) force Alice to load a library that registers a SIGBUS handler; it won't export the PKCS#11 entry point, so it'll get immediately dlclose()'d, but it won't clean up the handler. (2) Load another library, which takes over the program text address the stale handler points to. (3) Finally, load a library that SIGBUSes. If you manage to get a controlled jump swapped into place in step (2), you win.
If you're thinking "it's pretty unlikely you're going to be able to line up a controlled jump at exactly the address previously registered as a signal handler", you're right, but there's another quirk of dlclose() they take advantage of: there's an ELF flag, NODELETE, that instructs the dynamic loader not to unmap a library when it's dlclose()'d, and a bunch of standard libraries set it, so you can use those libraries to groom the address space.
Finally, because some runtimes require executable stacks, there are standard libraries carrying an ELF flag that tells the loader to make the process's stack executable. If you load one of these libraries and you have a controlled jump, you can write shellcode into the stack like it's 1998.
To figure out the right sequence of steps, they basically recapitulated the original ROP gadget research idea: they swept all the standard Ubuntu libraries with a fuzzer to find combinations of loads that produced controlled jumps (ie, that died trying to execute stack addresses).
A working exploit loads a pattern of "smartcards" that looks like this (all in /usr/lib):
The paper goes on to classify like 4 more patterns whereby you can get unexpected control transfers by dlopen()'ing and immediately dlclose()'ing libraries. The kicker:
we noticed that one shared library's constructor function
(which can be invoked by a remote attacker via an ssh-agent forwarding)
starts a server thread that listens on a TCP port, and we discovered a
remotely exploitable vulnerability (a heap-based buffer overflow) in
this server's implementation.
This is pretty wild, but if anyone had asked me previously "should the SSH agent be allowed to dlopen anything?" I would have said "no". It really seems like, for something this sensitive, running it in an empty namespace with no abilities of any kind, unless or until those abilities prove necessary, would be a good approach for those with high security requirements. If I know I am going to use a smartcard, then I can make the PKCS#11 library visible inside the agent's sandbox. But I can see no reason why I would have given it access to libsane, or whatever libenca is, or any other library for that matter.
Enca is an Extremely Naive Charset Analyser. It detects character set and
encoding of text files and can also convert them to other encodings using
either a built-in converter or external libraries and tools like libiconv,
librecode, or cstocs.
Currently, it has support for Belarussian, Bulgarian, Croatian, Czech,
Estonian, Latvian, Lithuanian, Polish, Russian, Slovak, Slovene, Ukrainian,
Chinese and some multibyte encodings (mostly variants of Unicode)
independent on the language.
This package also contains shared Enca library other programs can make use of.
Install enca if you need to cope with text files of dubious origin
and unknown encoding and convert them to some reasonable encoding.
How would this even be mitigated while preserving the (wacky) existing support for runtime-selected PKCS#11 provider libraries? It strikes me that the most compatible way might be to double down on the wackiness and try to perform the required feature detection in some more indirect way like parsing the named lib with readelf(1) or the platform equivalent.
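A rough sketch of that approach in C, just to show the shape of it (it shells out to readelf(1) and looks for the standard PKCS#11 entry point before anything is loaded; the function name here is made up, and a real version would need to be far more careful):

    #include <stdio.h>
    #include <string.h>

    /* Does the named library export C_GetFunctionList? Decide without
     * ever mapping it into a process. */
    int looks_like_pkcs11(const char *path)
    {
        char cmd[4096], line[4096];
        int found = 0;

        /* NB: a real implementation must not splice an untrusted path into
         * a shell command; use fork/exec with an argument vector instead. */
        snprintf(cmd, sizeof(cmd), "readelf --dyn-syms -W '%s' 2>/dev/null", path);

        FILE *p = popen(cmd, "r");
        if (p == NULL)
            return 0;
        while (fgets(line, sizeof(line), p) != NULL) {
            if (strstr(line, "C_GetFunctionList") != NULL) {
                found = 1;
                break;
            }
        }
        pclose(p);
        return found;
    }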
The sensible thing would be to force users to register available provider shared libraries in an ssh-agent config file, but that feels like a pretty big breaking change.
Edit: Didn’t realize a patch was already available. I see that they did in fact fix this with a breaking change, by simply disabling the functionality by default, and recommending that users allowlist their specific libraries:
Potentially-incompatible changes
--------------------------------
* ssh-agent(8): the agent will now refuse requests to load PKCS#11
modules issued by remote clients by default. A flag has been added
to restore the previous behaviour "-Oallow-remote-pkcs11"
By finally acknowledging that loading plugins via shared objects is a bad idea; it was only really valuable in the days of resource-constrained computers.
Any security-sensitive application that wants to use plugins should adopt OS IPC and load them as separate processes.
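As a sketch of what that looks like (the plugin path and the trivial line-based protocol here are made up for illustration):

    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <stdio.h>

    /* Run a "plugin" as a child process and talk to it over a socketpair,
     * instead of dlopen()ing it into our own address space. */
    int run_plugin(const char *plugin_path)
    {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
            return -1;

        pid_t pid = fork();
        if (pid == -1) {
            close(sv[0]);
            close(sv[1]);
            return -1;
        }
        if (pid == 0) {
            /* Child: the plugin speaks the protocol on stdin/stdout and
             * shares nothing else with us. */
            dup2(sv[1], 0);
            dup2(sv[1], 1);
            close(sv[0]);
            close(sv[1]);
            execl(plugin_path, plugin_path, (char *)NULL);
            _exit(127);
        }
        close(sv[1]);

        /* Parent: a crash or hijack of the plugin stays in the child. */
        dprintf(sv[0], "hello\n");
        char buf[256];
        ssize_t n = read(sv[0], buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("plugin said: %s", buf);
        }
        close(sv[0]);
        waitpid(pid, NULL, 0);
        return 0;
    }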
Process separation was already in place. The PKCS#11 library is loaded by a long lived helper process, not ssh-agent itself.
> (Note to the curious readers: for security reasons, and as explained in
> the "Background" section below, ssh-agent does not actually load such a
> shared library in its own address space (where private keys are stored),
> but in a separate, dedicated process, ssh-pkcs11-helper.)
That didn't help, because the long-lived helper process was still exposed to the shared libraries' side effects, which could then be chained into a gadget. If I understand correctly, the long life is important for interacting with many smart cards and HSMs because of their APIs.
If you are suggesting that there should be an IPC API for this process and vendors ship a full program that speaks it, that seems reasonable at a glance, but not really something the OpenSSH project can dictate.
Indeed, my suggestion is zero dynamic libraries in security critical code/applications.
If security is a goal, loading in-process foreign code is already a lost battle.
Plugins as dynamic libraries made sense when we were fighting for each MB, not when people have hardware where they go to the extreme of running containers for every application they can think of.
It would help against attacks that depend on corrupting process address space, like this one.
Additionally, one could use OS security features to reduce API surface for each plugin, depending on what they are actually supposed to be doing, e.g. no need for file system access if they only do in-memory data processing.
As for "would it help in 100% of the attacks?", no.
Even if there were no plugins support, there is still the possibility to exploit logical errors anyway.
What matters is having a balance between reducing attack surface and application features, and in that regard process sandboxing is much safer than loading foreign code in-process.
> How would this even be mitigated while preserving the (wacky) existing support for runtime-selected PKCS#11 provider libraries?
Put the pkcs11 libraries in a specific directory, configure only that directory, let users manually add others. Or stop using forwarding and configure ProxyJump where needed. (If that's the only use case you're interested in)
But I wonder if the PoC||GTFO maxim for security researchers leads to a lot of wasted effort. I was convinced that ssh-agent needed fixing as soon as they pointed out that it (a) accepts the name of a shared library within /usr/lib to dlopen over the network and (b) the range of crazy things that standard shared libraries do. Did they need to do all the extra work of developing an exploit in order to convince someone that was dangerous?
That's a great question. I think the answer is: no. They probably agree. I think, like, 80% of this is just art. There's a notion of fleshing out the constructor/destructor attack surface, which someone on Mastodon pointed out was pretty similar to how deserialization exploits work (they're chains of classes you can only instantiate and not directly invoke methods on), but this might be the only time this particular variety of attack ever happens again.
It's funny reading this thread full of people saying the attack is not that big a deal. It's a really big deal! It's just not a big deal for the reason people assume vulnerabilities are a big deal. :)
> Did they need to do all the extra work of developing an exploit in order to convince someone that was dangerous?
It might be clear in this case for someone who has already seen that really obscure bugs could be practically misused. But in the long-term, it is necessary to show that this is not some theoretical risk, which could be exploited in one never-heard-of hobby linux distribution. If security researchers skip these PoCs, a generation of future developers - who never saw practical PoCs - will just not believe it is relevant.
Saying the security researchers don't need to show PoCs is like saying mathematicians don't need proof, they just need to be very sure that a theorem holds.
That is the difference between a researcher/scientist and an engineer. An engineer could say, well this is likely exploitable, therefore let's safeguard against this. So that there remains a margin of security.
My guess is that without the PoC you will too often run into somebody who insists that in practice it's fine. Not always, but maybe it's one time in five. And now you've been told it's "fine" and so you should meekly go away right?
As I understand it that TOCTOU race in Rust's std::fs::remove_dir_all had been reported once before and just not accepted as a security bug. The C++ libraries have the excuse that WG21 decided this is all UB anyway and so they weren't required to fix the equivalent bug in their standard library, but in Rust the answer is that sometimes when you raise a real concern somebody says it's fine, erroneously.
It's not like OpenSSH did this by accident. They did this on purpose and they apparently thought it was reasonable when they did it, so if it's supposedly obvious that it's a problem, why is it in the design?
RTLD_NODELETE (since glibc 2.2)
Do not unload the shared object during dlclose().
Consequently, the object's static and global variables
are not reinitialized if the object is reloaded with
dlopen() at a later time.
Are there any other times when it's beneficial to use NODELETE?
You mean as opposed to never calling dlclose on the handle? If you specify RTLD_NODELETE, the dynamic linker can avoid some dependency tracking that would otherwise be needed to avoid premature unloading of the object, because that unloading can never happen.
However, the main application of NODELETE is the DF_1_NODELETE flag in the shared object itself. A typical use case is a shared object that installs a function pointer somewhere it cannot be removed from as part of the dlclose operation. If the dlclose proceeds despite this, calling the function later will have hard-to-diagnose, unpredictable consequences. Rather than relying on dlclose never being called (which is difficult because the object might have been loaded as an indirect dependency, unbeknownst to the caller of dlopen), using the DF_1_NODELETE flag makes this explicit.
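For concreteness, the two sides look something like this (a sketch; RTLD_NODELETE is the glibc flag from the man page excerpt above, and -z nodelete is the GNU linker option that sets DF_1_NODELETE):

    /* Caller's side: ask for NODELETE behaviour at dlopen() time. */
    #define _GNU_SOURCE  /* for the GNU dlopen extensions */
    #include <dlfcn.h>

    void *load_sticky(const char *path)
    {
        /* A later dlclose() drops the handle but never unmaps the object's
         * code or reinitialises its globals. */
        return dlopen(path, RTLD_NOW | RTLD_NODELETE);
    }

    /* Library's side: bake DF_1_NODELETE into the object at link time:
     *
     *     cc -shared -fPIC -Wl,-z,nodelete -o libfoo.so foo.c
     */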
It's not really possible to safely unmap code in a library in the presence of various other useful features you might like to use, specifically thread-local storage, atfork, callbacks (and probably threads in general). Libvirt started to use this flag over a decade ago: https://libvir-list.redhat.narkive.com/TUbaBTsk/libvirt-patc...
Years ago, and in fact as I was coming off of a security-related job, I had the notion that we should map out decisions as a first class citizen of the development cycle, instead of or in addition to requirements (requirements are just one sort of decision). The problem with unexpected behavior is often a matter of Chesterton's Fence. We know something was done on purpose, we can't remember why, and so we break the constraint with crossed fingers and an uneasy feeling. If we tracked why we did things, which ideas supported which other ideas, which ideas complicated other ideas, maybe we would spot these sorts of things, and also the contradictions we try to put into the code.
What ended up souring me on the whole idea was realizing this would be a wonderful Blame Allocation System, by attaching names to decisions. I'd never gone from full enthusiasm to creeping dread so fast on a product idea, and probably haven't since.
The library that does this is DisplayDoc, a library that you LD_PRELOAD into a process you want to debug; it opens a socket that the rest of the program connects to, letting you debug the graphics stack. Sure, it's not exactly best practice, but it's not entirely unreasonable for this library to do this. They've since patched the various bugs that Qualys discovered.
Oh, it's absolutely not the library's fault; nobody could reasonably have expected that "random privileged code is going to dlopen your library at random times" was part of the threat model.
To be clear, the "remote" part of the code execution is that an attacker controlling your destination server can cause your client to run an attacker-controlled payload, if the client is forwarding their credentials (`ssh -A`). Most people don't tend to make connections to arbitrary SSH hosts, and certainly they don't do it while forwarding their credentials along.
It's a neat attack, and I applaud the Qualys team on their find, but this is not any sort of emergency situation for 99.99% of systems.
I beg to differ: this does not sound way worse than it is. If anything, it's understating the issue.
Not only can it be exploited across a wide variety of clients across multiple platforms, but all that's required is that you're using agent forwarding.
This is devastating, because an attacker who controls the destination server doesn't just get to use (or steal) your keys; they can take over your entire workstation.
Once an attacker has your entire workstation, they potentially have access to everything else you have, from your email to other SSH hosts to your Git repos, plus the ability to install key loggers. This is about as bad as it gets, and all because someone is using Agent Forwarding.
Best of all, the victim has no idea that they've been completely compromised. The attacker can live inside your machine for years, upgrade their sploits, and generally exfiltrate all of your secrets.
Never use agent forwarding. Just don't. "Agent forwarding should be enabled with caution" in the man page is another massive understatement. Even if you think you need it, check the other responses in this thread for examples of how to work around it.
Agreed. As this exploit proves, it's not even safe to log into your own servers with agent forwarding if any service on them is exposed remotely: if an attacker compromises that exposed service and gains root, they can extend the attack to your workstation. That's a huge deal, especially considering that the private key to log into that server lives on your workstation (so it's a safe bet there might be other keys there too).
Exactly - agent forwarding is the laziest and fastest path to getting severely pwned, but the irony is that the alternatives are actually fairly simple and fast, if someone is willing to take the time to adjust their process a little bit.
> agent forwarding is the laziest and fastest path to getting severely pwned
Only for people who don't know what they are doing. Usually, such people also make poor replacement decisions that are even less secure.
> the alternatives are actually fairly simple and fast, if someone is willing to take the time to adjust their process a little bit.
I often need to work on code in ephemeral containers. Is there an "actually fairly simple and fast" method I can use to be able git pull and push to and from these ephemeral containers that:
1. doesn't require too much adjustment (a little bit is okay); and
2. is not less secure than agent forwarding with confirmation?
> Only for people who don't know what they are doing.
By that do you just mean that no services are openly exposed on the system? To my understanding, if any vulnerable service is remotely exposed then it's not at all safe to use agent forwarding with the affected version of openssh.
By that I mean to use the now-20-year-old 'ssh-add -c' flag. It'd seem practically no one is aware that an ssh agent does NOT have to silently and automatically sign just any and every auth request, given just how frequently I see people decrying "No! If you forward your agent that means a remote host has full access to everything in your agent! Never forward your agent ever! No buts!"
With the '-c' flag, if a remote host tries to use my agent, I get a graphical dialog on my local machine asking me if I want to let my agent do it. If I'm not expecting the dialog, I can just say no, and now I know the remote is compromised.
Actually, I go a step further. Because it is possible to accidentally accept a signing request by pressing the enter key that I meant to press for something else, I make my agent require a passphrase on every use, not just a yes/no dialog.
In checking my machines for this CVE, I discovered my agent has yet another layer of security built-in. I use gpg-agent, relying on its famed security posture. Turns out, when forwarded, gpg-agent supports nothing except signing. It does not support adding keys from a remote, let alone fancy operations like loading a PKCS11 provider chosen by a remote.
----
Securing a forwarded agent has nothing to do with whether there are openly exposed services on a remote system. The remote system could have malicious code that entered it not necessarily through an exposed service. A compromised npm package in the supply chain, for instance, is sufficient. It doesn't matter how it got there. What matters is: when it does, can it abuse a forwarded agent. Hence the '-c' flag to ssh-add.
I was not aware of that functionality in ssh-add. Thank you!
Man page reference:
"-c Indicates that added identities should be subject to confirmation before being used for authentication. Confirmation is performed by ssh-askpass(1). Successful confirmation is signaled by a zero exit status from ssh-askpass(1), rather than text entered into the requester."
Edit: After thinking about it more, I think I may have misunderstood how the -c parameter would perform with regard to this CVE.
Would a confirmation prompt actually suffice to prevent this attack? It also raises the question: does this exploit rely upon someone already forwarding to a malicious server? I've only read the CVE and skimmed the reporter's blog, so I don't know with certainty one way or the other.
If the exploit does require that the forwarding has already taken place, -c couldn't really help, right? The decision was made to allow it. I hope I'm not sounding contrarian, I'm genuinely curious about how this would play out.
The '-c' is not a mitigation to this CVE. It is a mitigation to a malicious remote silently opening further ssh connections to other servers you have access to.
My parent claimed that using agent forwarding is always insecure. Judging by their response to my comments elsewhere in this discussion, they seem to be under the impression that a forwarded agent will always silently and automatically sign auth requests. And yes, if you forward an agent with a key in it that's missing the '-c' flag, it will. Ignorance of the confirmation feature is classifiable under 'you don't know what you're doing'.
The same parent has also been beating their drum of 'create keypairs on remote servers' pretty heavily in many places in this discussion. That _is_ less secure than a forwarded agent that confirms each use.
----
I do not claim anywhere that the '-c' flag prevents this CVE from being exploited. It's just that the agent I happened to be using — gpg-agent instead of ssh-agent — just happened to be immune to this CVE by doing what OpenSSH has decided to do in response to this CVE. I.e., I blindly relied on gpg-agent to be secure and it paid off here.
----
> ... the forwarding has already taken place, -c couldn't really help, right? The decision was made to allow it.
It's not a confirmation of "Do you want to forward this agent?". It's a confirmation of "Do you want to sign a request using this key?". That happens in every auth request.
So if you ssh into foo, you'd get a dialog to confirm the use of your private key for this initial ssh. This is not added security, just an extra step in the initial ssh process.
But if you try to ssh into bar from foo, then you'd get another dialog on your local machine to confirm the use of your private key for the auth request by bar. _This_ is the added security vs. malicious code on foo ssh-ing into bar as you without your knowledge.
Just to add to this, with the new -J/ProxyJump directive, it's become (even) easier to login through a ssh host without needing to enable agent forwarding (Given that you're connecting through a not-ancient host running a reasonable version of openssh - beware of firewall/appliances stuck on ancient sshd and/or proprietary/"mini" versions).
Agent Forwarding is not a trivial thing to take lightly, but a knee-jerk reaction "ban it entirely" is too much.
I forward my agent by default because I've set it up securely. My setup is safe from this exploit too (I use gpg-agent as my SSH Agent). In return I get the seamless convenience I cannot get through any other method. Jump hosts are fine (and I use them too) but there is no way I'd be able to do remote git operations in ephemeral dev containers without the peace of mind (and safety) that agent forwarding gives me.
Creating keys on remote dev envs for git operations is _less_ secure than agent forwarding, even when those keys are encrypted (passphrase protected) at rest, because they have to be loaded into memory on the (potentially compromised) remote host.
> Creating keys on remote dev envs for git operations is _less_ secure than agent forwarding, even when those keys are encrypted (passphrase protected) at rest, because they have to be loaded into memory on the (potentially compromised) remote host.
That's not how agent forwarding works. An attacker on the remote server can piggyback on your SSH session and do anything else desired, so your remote git repo is still compromised, but the blast radius of these remote keys is much smaller. (in infosec, we'd usually call this least privilege but separation of duties also applies)
All of this is still possible even with gpg-agent, even if this particular RCE doesn't apply to you, so "Never Use Agent Forwarding" still applies.
> An attacker on the remote server can piggyback on your SSH session and do anything else desired
This myth is about 20 years out of date. See what the '-c' flag for ssh-add does. It was added in OpenSSH 3.6 back in 2003.
In fact, I can prove it to you. Take my pubkey from GitHub (same username) and put it on a host you control. Tell me to ssh into it with my agent forwarded and see if that gives you access to my GitHub account.
----
> All of this is still possible even with gpg-agent
Even without the 'confirm each use' flag, gpg-agent with a zero TTL visually asks for the decryption key on each use. There _are_ some agents out there that have no support for visual confirmations and yet happily accept the '-c' flag (looking at you, GNOME), but gpg-agent isn't one of them.
----
> That's not how agent forwarding works
You seem to be misreading. I'm not claiming that's how agent forwarding works. I'm saying that's how your suggestion of creating a keypair on a remote host works.
It _is_ less secure because it requires those keys to be resident on the remote. If the remote is compromised, decrypting the key in the compromised machine's memory is strictly less secure than doing it on my local machine with an agent. Once captured from the compromised remote, those keys can be exfiltrated and used repeatedly. But if an agent is somehow tricked into signing an unauthorised request, that access is still limited to one use only.
>> > An attacker on the remote server can piggyback on your SSH session and do anything else desired
> This myth is about 20 years out of date.
This hole didn't simply disappear when -c was added.
The vulnerability is simply that the socket file containing the connection back to your agent is accessible by anyone who managed to escalate to root on the remote host.
You're making several assumptions:
#1: someone is using -c
#2: that -c even does anything on their platform
#3: the user pays attention to them and is untrickable
#4: there are no bugs in the ssh-agent or gpg-agent on the client machine
Any one of these being false renders all protection from -c moot; worse yet, a bug in the ssh-agent (or gpg-agent, if that's your poison) like the one in the subject of this post can be leveraged into complete client takeover.
> Once captured from the compromised remote, those keys can be exfiltrated and used repeatedly.
that is true, which is why they should be tightly scoped.
> But, if an agent is somehow tricked into signing an unauthorised request, that access is still limited to one use only.
That one use only is all that is needed. An attacker might install another pubkey, start up another socket process, or even rootkit the remote box if you have sudo, doas or if there are any privilege escalation vulns.
These situations are identical in that the remote box is pwned, but only one of these tries to limit the exploits to just that one remote box and not every other host your keys have access to.
> This hole didn't simply disappear when -c was added. The vulnerability is simply that ...
Why are you conflating the two? You're making claims that something has always been utterly, completely broken, and the only evidence you can cite for it is something that was revealed to the world a few days ago?
This CVE is a secvuln. No one is arguing against that. Secvulns happen. No software is bug-free. Does that mean every software everywhere is suddenly utterly completely broken?
Actually, while we are on the topic, why not argue banning SSH entirely. After all, each SSH connection is a connection back to the host where the `ssh` client runs. Tomorrow, there could be a secvuln discovered in the `ssh` binary that can be exploited by simply printing the right characters to stdout. In fact, this very vector has been used before to pwn vulnerable terminal emulators, even over ssh.
----
> ... which is why they should be tightly scoped.
As can forwarded agents (see 'IdentityAgent' in `man ssh_config`). In fact, this is how I separate client projects from each other and my personal projects. (I didn't do it for security, rather for the convenience of not tripping any 'max keys allowed' limits, but hey, I'll take the security benefit too!)
----
> That one use only is all that is needed. ... tries to limit the exploits to just that one remote box and not every other host your keys have access to.
You're making several assumptions:
#1: someone is disciplined enough to use tightly scoped keys
#2: that they bother to rotate all those ephemeral keys without fail
#3: that the sheer inconvenience of constantly updating keys doesn't bother them enough to say 'screw this!'
Guess what the weakest link is when it comes to computer security? The human factor. You're asking humans to go through way too much hassle they're not going to care about. Which means they'll voluntarily break the security of the system without care (and, of course, without understanding).
You also forget that:
1. Agents can be scoped too.
2. Jumpboxes are often used to jump to a large number (most often, all) of servers. No organization is going around creating point-to-point jump links between servers or dedicating a separate jumpbox for every destination server.
3. Your model, when exploited, gives the attacker repeatable, lasting access. You say, "That one use only is all that is needed", and you're correct, but only for the most determined and prepared attackers. Once-only accidental access is better than repeatable lasting access for the simple fact that most attackers are aiming for only the latter.
4. Your model is also more susceptible to silent persistent malware. My method has the benefit that exploited remotes are discovered before the exploit is given lateral movement access.
----
> ... who managed to escalate to root on the remote host.
No root needed. DAC is enough. I say this because I run untrusted code in VMs/containers I ssh into, with my agent forwarded. And I consider arbitrary npm/python/etc. packages automatically untrusted.
Lots of people end up with ForwardAgent on by default as a sort of "make it work" fix, and lots of people use `git+ssh` on untrusted servers. Here's an example:
Even if you do perfect integrity checks of the git repo you're pulling, git uses SSH under the hood and obeys your ssh config for each host. It's safe to say that if you have ForwardAgent enabled, git is vulnerable.
The attacker controlled destination server could be a compromised host, so this enables lateral movement from a deployed VM or remote dev machine into a developer laptop.
I don’t know how prevalent it is as a network architecture, but it seems like a bastion host / jump box would be a juicy target for this exploit, since it’d let the attacker jump upstream.
Sure, but first they have to root the bastion box.
If you root the bastion box, you have user credentials for anything inside the network. Controlling the user's laptop seems unlikely to be your most profitable next step.
> If you root the bastion box, you have user credentials for anything inside the network.
But that's not how a (properly-configured!) bastion host works.
You won't have user credentials for anything UNLESS users are using Forward Agent (which they shouldn't! simplest explanation here.. https://userify.com/docs/jumpbox ).
That's the point behind using ProxyJump. Your connection actually jumps THROUGH the bastion box and doesn't stop for interception along the way.
(And, of course, an attacker can't do anything very useful with ssh public keys except for maybe traffic analysis or learning more target IP's.)
Increasingly, the role of a bastion host is served either by something like Teleport, which handles authn/z and proxying without needing forwarded agents, or newer options in OpenSSH like ProxyJump where you hop via a bastion host but without ever forwarding your agent.
Yeah, if I'm reading the technical analysis right, the conditions you mention have to hold, and the attacker must also have "poisoned" library files on the target's machine so they can dlopen them, is that right?
The libraries are on the client's machine, not the server's. And they're not "poisoned"; the default distro-provided libs already provide the remote execution capability (the eclipse-titan, libkf5sonnetui5, libns3-3v5 and systemd-boot packages from Ubuntu 22.04).
Ahh, I see. I thought the attacker also had to have custom malicious libs deployed on the client machine; I wasn't sure if standard ones would do. Thanks for clarifying that.
There must be a specific set of libs present on the victim (client), correct. Qualys claims that stock Ubuntu Desktop systems often have these libs, and that they haven't looked into whether other distros tend to.
But yes, your point stands. Huge number of preconditions here to fulfill.
Are you saying that if you SFTP in to a client machine to upload a file to their server, it’s expected behavior that you’re willing to give them root on your machine?
Yea, but there’s a security boundary wherein you don’t want the SSH host to be executing code in your environment. Of course, the attackers can backdoor sshd to log credentials, setup init scripts on the host to execute code every client login and other shenanigans.
Even without this announcement, friends don't let friends forward their ssh agent. It essentially grants that machine access to your private keys. An RCE vulnerability is strictly worse than key exposure, but you probably shouldn't have been using it anyways.
I prefer using hardware tokens (in most cases a PKCS#11 smart card) because it means that even with a forwarded SSH agent, every request to use my private key requires a PIN on my client which is verified by the isolated cryptographic processor. It's impossible for my private key to leave that card and get cached anywhere else. While I haven't enabled it on my Yubikey I understand they can do similar.
The downside is that compatibility in edge cases, while much better than I'd expect, is still not perfect. In particular Windows support outside of Putty gets challenging.
The RCE is related to ssh-agent's support for PKCS#11, so yeah, you're right that this is a valid method to prevent key access or theft via the agent (I also have to approve every use of my PK), but in this case it doesn't protect against the RCE; the workaround in the meantime is to disable PKCS#11 with `ssh-agent -P ''`.
The other downside is it's much harder to do bulk operations against a fleet. It's not reasonable to enter a PIN for each access when you need to push something to 1000 nodes. 100 nodes is probably ok, but not great.
I think SSH agent forwarding is a fairly common setup with VSCode Remote, where the developer wants to forward their keys to the remote dev server so that they can perform Git operations using their credentials.
Is it possible to configure SSH agent forwarding to only forward some specific private keys? Although I guess that wouldn't protect you from the root problem of a vulnerable OpenSSH client.
> Is it possible to configure SSH agent forwarding to only forward some specific private keys?
Not sure about that, but you can configure it so that it asks you for confirmation before every private key usage, so I suspect you could script a solution around that confirmation mechanism?
> Although I guess that wouldn't protect you from the root problem of a vulnerable OpenSSH client.
Yes – in the end, your SSH client, the terminal emulator it's running in etc. are ultimately software too that could be remotely exploited.
I think in this particular case, there was a certain mismatch of threat expectations between the attack surfaces of ssh (the client, exposed to lots of potentially malicious input) and ssh-agent (mostly accepting input from semi-trusted processes on the same host – except for agent forwarding, of course).
How do you manage ssh chaining then? Unique set of keys per user per machine?
Short of scp'ing your private key over from your starting box, I can't really think of another way. Then again, if it's not your box (you don't own the hardware or have sole monopoly on root), you probably shouldn't be ssh'ing from it anyway. I've always held there are no true secrets on a computer... until multi-billion dollar companies decide to collude, anyway.
Anywho, honest question. I'm a big fan of ssh roaming, never realized that ssh-agent was a thing, but after reading this, if I were to use it, I'd most certainly be doing it without PKCS#11 built in. Shotgunning shared libs for side effects is absolutely mad. Need to slot that into my source code reading list.
Edit:
Nevermind, tried it, not sufficient for my use case.
I tend to do a lot of mesh-y bouncing around between servers, and -J seems to be more intended for a star/hub&spoke topology. Common ssh priv-key to all machines, or alternatively, a unique set of priv_keys per user per dest machine is about the way to go. You still have privilege escalations to worry about, but thems the breaks.
It's not limited to star topologies; You can use ProxyJump for any topology that includes fixed routes from a client to a host. Just add a separate host entry for each machine.
4 server example (this assumes your client can connect to only host C) with the following topology:

    C -> B -> A
     \-> D
ssh configuration snippet:

    Host A
        ProxyJump B

    Host B
        ProxyJump C

    Host D
        ProxyJump C
This example is a tree-like topology, but you can use host aliases (i.e. add a HostName that is different from the host entry) to define any fixed route to any machine you like.
For me, the main use of agent-forwarding is that I need to use a command that expects to use SSH to get between leaf nodes. For example git or rsync CLIs that need to manipulate the local filesystem and tunnel their own protocol over SSH to talk to another remote server.
At times, I've wished for something like uMatrix but for ssh-agent forwarding, so I could have policies for which peer-to-peer authentications should be allowed for which keys and whether these specific uses should require interactive confirmation.
I now have a design in my head for something like that using ssh certificates. Since I have zero use for such a thing I would probably build it wrongly though.
Just generate a new keypair there on your bounce box then. Don't do -A because this RCE means that not only do you lose your keys, but you lose your laptop, too!
A box you can't trust to hold your keys could just as easily put an `ssh -A` command in your .bashrc and use your agent to perform stuff on any servers your key is accepted on.
One use of -A, namely using the ssh server as a jumphost, is covered by -J. The general use of -A, namely doing operations on the ssh server that require keys from the ssh client, is not.
If I'm on machine foo and I want to connect to bar.example.org and clone a git repo there from baz.example.org, and baz.example.org requires an identity key that is in foo's ssh agent, then -A is the only option.
No, you might as well copy the keys over there for all the security you're getting (actually, that's safer given this RCE which compromises both sides, even given that there's no real way to shred the leaked key, but of course I'm not actually suggesting this! they're both really bad.)
Your best option is too easy: just generate a new keypair on foo.
Then you can populate baz.example.org with that public key instead of your own.
Only if the keys are insecure enough to be copied, ie not backed by an HSM, or backed by an HSM but marked exportable.
>for all the security you're getting (actually, that's safer given this RCE which compromises both sides, even given that there's no real way to shred the leaked key, but of course I'm not actually suggesting this! they're both really bad.)
Connecting to a malicious SSH server is already dangerous with or without this vulnerability, and with or without agent forwarding. E.g. a malicious server can modify the shell to emit VT codes to mess up your terminal.
>Your best option is too easy: just generate a new keypair on foo.
Sure, but this creates additional identities that baz.example.org must be taught to trust, which is not always desirable or even an option.
> Only if the keys are insecure enough to be copied, ie not backed by an HSM, or backed by an HSM but marked exportable.
Agent forwarding forwards the keys insecurely. If you can launch an agent-forwarded connection, then the HSM is already irrelevant.
> "Connecting to a malicious SSH server" is already dangerous with or without this vulnerability, and with or without agent forwarding. E.g. a malicious server can modify the shell to emit VT codes to mess up your terminal.
yes, but messing up your terminal is not an RCE (and you should be able to fix it with stty sane). It's just an annoyance and it just tipped you off that obviously there's something wrong there. :) Obviously you should be checking your host keys etc and ideally you would never accidentally connect to a malicious server.
However, in the unlucky but perhaps inevitable event that you do connect to a malicious server, you shouldn't risk exposing your entire client workstation! (through this or another RCE) or your SSH private keys! (through normal agent forwarding operation) with agent forwarding.
> Sure, but this creates additional identities that baz.example.org must be taught to trust, which is not always desirable or even an option.
A key is not necessarily an identity. Most platforms permit more than one public key to be associated with a user, including github, gitlab, userify, etc.
However, ideally you are correct -- this would result in a new user account that has limited access to only the things it really needs (principle of least privilege).
>Agent forwarding forwards the keys insecurely. If you can launch an agent-forwarded connection, then the HSM is already irrelevant.
No? ssh-agent works fine with keys that are not exportable from the HSM. By the very definition it's not possible for anything to export the key, ssh-agent or otherwise.
As the article describes, most Windows terminal emulators just call SetWindowText when they get the set-window-title escape code. For some reason Windows implements that function very inefficiently, so if it ends up being called in a tight loop your computer can freeze.
>I was searching for a bug in ANSI characters that respond to a change of the window’s title. One of the things I did was check how the previous bugs were exploited. One of them, CVE-2015-8971 (found by Nicolas Braud-Santoni), was a bug in Terminology 0.7.0 that didn’t filter new line (\n) escape character when changing the window title. It allowed you to modify the window title and then re-insert it into the terminal’s input buffer, resulting in arbitrary terminal input, which then caused code execution.
That's a vulnerability in your terminal app, not in SSH, like a browser vuln when connecting to a remote website, but you're right, the point is well taken: there is always that possibility and you should be quite careful about choosing your terminal application.
`ssh-add -c` will cause your ssh agent to pop up ssh-askpass every time it performs an authentication. This is fiddly; you need to have configured your ssh-agent right, and there's a slightly different, equivalent setup for using keychains on macOS.
Some people denounce all use of agent forwarding because, by default, ssh-agent doesn't confirm with the user before signing a request with the users private key. This means, if you ssh into a compromised/malicious host with your agent forwarded, malicious code on that machine can just silently ssh into other servers as you.
This trick has been used in the past by blackhats to escalate from a compromised CI environment to full production takeover.
GauntletWizard probably meant to respond to your parent in this thread.
There are uses for `-A` that aren't just for jumping into hosts. I work on code in ephemeral containers and need to use git against remote git servers. If I had to use a key local to the container, I'd have to constantly add new keys to my git servers.
Of course, I have secured my agent from abuse. I use gpg-agent as my ssh agent and make it visually confirm every use of my private key.
I even set up those containers to use forwarded GIT_{AUTHOR,COMMITTER}_{NAME,EMAIL} envs. So other people can work on the same container as me and git commits we make are still attributed correctly.
Agent forwarding is discouraged by the OpenSSH crew. Yet, it's commonly used because of the convenience it affords.
"Agent forwarding should be enabled with caution. Users with the ability to bypass file permissions on the remote host (for the agent's UNIX-domain socket) can access the local agent through the forwarded connection."
https://man.openbsd.org/ssh.1
musl libc refuses to implement dlclose[1] for precisely the reason that modules too often misbehave when dropped at runtime, and this behavior is rarely, if ever, actually needed. The number of modules that will be loaded is almost always bounded, and keeping a module around in memory is mostly harmless; certainly less harmful on average than trying to unload it.
Also, as you pointed out elsethread, dlclose makes a lot of things in a runtime implementation very hard. For example one reason that C++ exceptions are expensive is that they need to guard against unwind tables suddenly disappearing.
> musl libc refuses to implement dlclose[1] for precisely the reason that modules too often misbehave when dropped at runtime,
Which is some rather silly logic, because now musl's libdl is just one more module that misbehaves at runtime by refusing to clean up after itself. Other libraries being sloppy about resource cleanup is no excuse for musl to abdicate its own responsibilities.
> and requiring this behavior is very rarely needed if ever. The number of modules that will be loaded is almost always bounded, and keeping a module around in memory is mostly harmless;
I take it you've never used an application that hot-reloads plugins? Who cares about memory leaks, right?
> certainly less harmful on average than trying to unload it.
That's for the application to decide, not musl. If I don't trust a library to behave properly, I will either not dlopen it at all, or I'll dlopen it in an appropriately sandboxed subprocess and interact with it via pipes.
> I take it you've never used an application that hot-reloads plugins?
Used and implemented many times. Sometimes this could be a problem, but IME most plugin architectures I've seen don't rely on static constructors or destructors for implicit registration/deregistration of callbacks. Among other reasons, it's a landmine of multi-threading issues and glibc itself has historically had many bugs related to this, albeit mostly a consequence of glibc implementing (until recently) libpthread separate from libc.
Also, POSIX does not guarantee that dlclose does anything: "An application writer may use dlclose() to make a statement of intent on the part of the process, but this statement does not create any requirement upon the implementation."
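You can even watch that play out with RTLD_NOLOAD, which only returns a handle if the object is already resident (a small sketch; libm.so.6 is just a convenient example, and RTLD_NOLOAD is a glibc extension):

    #define _GNU_SOURCE  /* for the GNU dlopen extensions */
    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        const char *lib = "libm.so.6";

        void *h = dlopen(lib, RTLD_NOW);
        if (h == NULL)
            return 1;
        dlclose(h);

        /* RTLD_NOLOAD never loads anything; it only yields a handle if the
         * object is still mapped into the process after the dlclose(). */
        void *still = dlopen(lib, RTLD_NOLOAD | RTLD_NOW);
        printf("%s %s resident after dlclose()\n",
               lib, still ? "is still" : "is no longer");
        if (still)
            dlclose(still);
        return 0;
    }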
> Who cares about memory leaks, right?
I can't find the blog post now, but IIRC Solaris made getenv thread-safe by never deallocating any memory allocated for the environ array or its contents. The blog post had a curious and memorable defense which explained how the memory usage was asymptotically bounded, similar to how hash table operations are described as asymptotically O(1) in time. And Solaris cares very much about memory management: unlike Linux or FreeBSD it implements strict memory accounting, so no OOM-killer nonsense.
Is it ideal? No. But these are legitimate decisions taken in light of lots of experience, and if you want to write truly robust applications one has to pay attention to standards, implementation details, and best practices. IME best practice wrt module systems on Unix and the open source world generally has always been to avoid implicit registration/deregistration of callbacks. I believe the same is true in the Windows world, though I've never written much software for Windows. (Anyone know if FreeLibrary immediately invokes static destructors?)
FWIW, I have implemented hacks which relied on static constructors and where hot reloading was more problematic in the absence of static destructors. For example, I once implemented a proof of concept for automatic runtime reloading of SSL certificates and keys in the Splunk server by writing a library interposer (supporting FreeBSD, Linux, macOS, and Solaris) that installed global OpenSSL SSL_CTX callbacks. So I know why these things could be useful. But that example was a PoC hack. And even in situations where dlclose is supported as one might naively expect, you can still easily run into similar gotchas, like a user loading a module twice under different names. You have to choose either to deal with it the hard way, or disclaim support for such niche cases.
The suggestion floated in many comments here to create extra keypairs on remote hosts for access to other remotes from the former is a terrible idea.
Resident private keys on a compromised machine can be exfiltrated. Even if they're passphrase protected, using them requires them to be decrypted on the machine they're resident on.
`ssh-agent` is not the only SSH Agent. There exist SSH Agent implementations that are secure. E.g., gpg-agent. Use one of those, with the `ssh-add -c` flag or equivalent, and you get both security and convenience.
For example, if I'm on a remote server and I want to clone or pull from a private repository, or push to any repository, agent forwarding is what makes that possible without putting a key on the server.
In fact, VS code expects you to forward an agent when developing on a remote instance with the remote-ssh extension. I believe it uses it for syncing repo changes primarily
I agree though that you should consider it insecure to forward a connection to a shared host if it has other users who are equally privileged, or more privileged
That is true, but actually you should avoid -A in those situations also.
> In fact, VS code expects you to forward an agent when developing on a remote instance with the remote-ssh extension. I believe it uses it for syncing repo changes primarily
That's the case if you're ever trying to initiate an SSH connection via Git from a remote host. That's because the actual SSH connection isn't being triggered on your desktop or laptop, where -J would do it, but is being initiated right on the remote server itself.
Let's call the remote server VSCODE (your desktop-in-the-cloud), your laptop/desktop LAPTOP, and the remote repo GITHUB.
So, the correct answer there is actually counter-intuitive:
Generate a private key on VSCODE (not LAPTOP) for that GITHUB repo. This key will be used only on that originating remote server (VSCODE), never anywhere else.
In fact, that repo key will ideally be scoped as tightly as possible in Github/Gitlab/etc.
As a general rule of thumb, generate at least one SSH key for each "location" where you'll be logging into another location. Keep your connections point-to-point unless you can use proxyjump.
> Generate a private key on VSCODE (not LAPTOP) for that GITHUB repo. This key will be used only on that originating remote server (VSCODE), never anywhere else.
Deploy keys are more secure in most situations, but I'm not convinced that generating a bunch of additional keys with write access to your repo is always more secure.
Regardless, it doesn't really fulfill the same need of "get access to repo from cloud desktop asap". I work with 20-30 private repos at my company, and I don't have admin access to most of them (so I couldn't create deploy keys for them). That would need to be kicked up and down a request pipeline, for each user accessing remotely, for each host they're accessing it from. It's much easier to Get The Work Done if I just forward my agent.
And sure, the company could maybe set up better practices (though again, I'm not even really clear on how much more secure it would be to generate untold numbers of deploy keys), but I'm more likely to get the axe for not producing fast enough than they are to make sweeping changes to their security practices.
Then just create a key on that box and use it for all of your repos (add it to your GitHub account's keys); you're still much better off than with one key to rule them all (and one RCE to rule your laptop, ha)
That's a good point, probably the best way to do this, but does require manual management steps for every VM you want to use (which is perhaps a feature)
> Generate a private key on VSCODE (not LAPTOP) for that GITHUB repo. This key will be used only on that originating remote server (VSCODE), never anywhere else.
This gives anyone compromising the server at a point in time persistent access to your Github repo.
Ultimately, there are two or three different possible models here (transparent SSH proxying, agent forwarding, key-per-host), with different tradeoffs. Although transparent proxying is usually the safest, it doesn't always work, and there's no one-size-fits-all.
> Honestly, you probably should never do -A anyway; use -J (proxyjump) instead.
I want to copy files between servers, ex. `ssh -A serverA` then `rsync ./foo serverB:`. Is there a way to do that without `ssh -A`? (And obviously without creating an actual private SSH key on either server)
This implies a much higher level of trust in a dev server than might be warranted, since an attacker can exfiltrate keys from that server and then permanently use them from elsewhere. Agent forwarding doesn't have that problem.
> That's the correct way to handle this situation, definitely not -A.
Maybe for your situation; certainly not for all. If agent forwarding was strictly inferior to ProxyJump/ProxyCommand, it would have been deprecated.
gpg-agent is used as an ssh-agent replacement (also replaces the venerable monkeysphere script). gpg agent forwarding is also needed via StreamLocalBindUnlink for signing built RPMs in a docker container on a remote (local LAN) Docker host. Sometimes rarely, I need to enable "ssh-agent" forwarding where I probably could proxyjump instead. My interactive ssh workflow is sometimes slowed as I use TOTP (via PAM) for 2FA. Noninteractive system accounts get dedicated ssh keys. host keys are constantly scanned and compared to a source of truth. Maybe I'll get around to deploying LDAP with OpenSSH-LPK to move away from flat files.
Knee-jerk suggestion to rewrite the world in Rust combined with formal verification.
But seriously, being an expert knife juggler is still gambling with the obvious compared to using safer tools in a safer manner. Rust needs the ubiquity of GCC[0] (partially by adding more targets to LLVM and adding std support for them) and more attention paid to bloat (cargo-bloat, etc) before attempting to rewrite the world (apart from special cases).
Can I just say I know it's really dumb, but I loved that they published the explanation as a simple txt file, instead of setting up some whizbang website for it, or embedding it in their company blog.
What's a better solution if you want to be able to SSH across multiple machines? Do you need to always close the current connection to get back to localhost prior to a fresh SSH?
e.g. how would I ssh into foo, and then later into bar, or perhaps pull some code from github onto foo that is authenticated by my key?
> Not every org's policy allows adding unaudited ad-hoc SSH keys.
Then audit them and get them in the process. Agent forwarding is too big of a risk.
> Definitely not always, if the hosts you store these keys on are not as hardened as you local machine (or a hardware key connected to it).
Once you use agent forwarding, the keys are no longer protected on your local machine. (Ironically, this RCE is precisely because of the requirement to whitelist hardware keys!)
I need to read the source, I’m confused how -J actually works. Is the bastion doing auth and the downstream machines trusting? Or does it auth first and then forwarding a :22 connection from downstream back to localhost? Or something else entirely?
Just my understanding of the manpage and TCP forwarding (-L): an SSH connection will be established to the jump host, which then establishes a connection on port 22 to the destination. The local machine now has a forwarded connection to the destination and uses that to establish a second SSH connection between them.
Between local and jump host, there will be two layers of encryption. The jump host decrypts the outer layer, and the two ends the inner layer.
Just use a different, secure SSH Agent. I use gpg-agent and make it confirm every use of my private keys with me. Gpg-agent supports signing with a preloaded private key and nothing else, so it is immune to this attack and many others.
Please don’t call this the "correct" solution. It might work for you; it's not a general rule or widely accepted best practice.
Generating a key per host is just a different security model than key forwarding, and arguably a worse one if key forwarding is done defensively (i.e. forward a separate key, with only those permissions you would give to a key present on the connected host itself).
That is a TERRIBLE idea, if foo can be compromised. Even if you secure the private key with a passphrase, it still needs to be loaded into foo's memory, by a binary resident on foo, which can be used to exfiltrate that private key.
If you're confident foo cannot be compromised, then the whole point is moot anyway.
Make an ssh connection to foo with a port forward. Then make an ssh connection to bar through foo. Keys stay on your machine. I'm pretty sure there are built-in features to do this.
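For the record, ProxyJump is that built-in feature, and with it the keys (and agent) never leave your machine. A sketch, with foo and bar as placeholder hostnames:

    # one-off
    ssh -J foo bar

    # or in ~/.ssh/config, so a plain "ssh bar" goes through foo automatically
    Host bar
        ProxyJump foo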
It completely replaces the need for bastion hosts and ssh-agent at scale (I'm a paying HashiCorp customer).
Compared to Teleport, Boundary is lightweight (you don't need to install an agent on each target server) and integrates with Vault for certificate-based auth or simple credential brokering.
I wrote a macOS app that supports using Vault Transit Engine keys with OpenSSH, as well as GnuPG and PKCS#11. (OpenSSH is supported via both the agent protocol and PKCS#11.) You can even use an Apple T2 (Secure Enclave) key for peer mTLS authentication to the Vault server. https://www.keymux.com/
Using this, teams can share an SSH key without exposing the key itself; nor do you need to configure certificate PKI, use jump hosts, or otherwise change your existing software or workflows.
Teleport is far and away the leading solution in this area: a certificate-based, end-to-end, bastion-style topology with many nice features, including support for kubectl.
I can’t parse the exact meaning of “victim system” and “attacker-controlled system” in the OpenSSH release page.
Does this vulnerability allow the attacker to compromise the original system, where the user starts the agent-forwarded connection? Or “only” the machines forward of the jump host?
Yes: it can compromise the original system, the one whose agent is forwarded. This is why no one should use agent forwarding with a jumpbox. Only -J. (see https://userify.com/docs/jumpbox )
Another point is that no one should have any real access to the jumpbox and it should be as minimal and stripped-down as possible. It's literally your bastion host, so you've got to keep it as strong as possible.
To be clear, agent-forwarding to a (potentially) malicious ssh server has always been a bad idea. Yes TFA's bug makes it a worse idea and it's absolutely worth it to patch it, but you should not be agent-forwarding to (potentially) malicious ssh servers in the first place.
As far as I read it, it's about forwarded ssh agents. Basically, if you `ssh -A user@system`, something might be able to execute commands locally. For example, this might turn messy for infrastructures using jump-hosts extensively, if people are used to <ssh -A jumphost> so they can easily <ssh system> afterwards. If you pop the jump host, you could pivot to the workstations with this.
At the same time, ssh-agent forwarding makes me queasy from a security perspective even without this. As far as I know, if you <ssh -A> into a system, admins with privileges on that system can already use your forwarded ssh-agent. In the jump-host example, if you popped the jump host and stuck around for a while, you couldn't extract anyone's key material (the agent only signs, it never hands out keys), but you could use the forwarded agents of everyone passing through to authenticate elsewhere while their sessions are live, and have some fun.
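To make that concrete: the forwarded agent is just a unix socket that sshd creates on the jump host, so root there can point their own ssh client at it while your session is up. Roughly (the socket path and target host are placeholders):

    # as root on the jump host, piggyback on the victim's forwarded agent
    SSH_AUTH_SOCK=/tmp/ssh-XXXXabcdef/agent.12345 ssh admin@prod.example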
We don't use bastion servers. My only real use case for ssh-agent forwarding is when I need some scp/rsync between two remote systems during emergencies and those systems have no SSH key trust set up between them. In that very specific case, I don't know a better way than <ssh -A> to the first system and running some <rsync -e ssh> from there to the second. Still doesn't feel great, even though the only people who could steal my keys that way are on my team.
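For reference, that emergency pattern looks roughly like this (hostnames and paths are placeholders):

    # forward the agent for this one hop only, rather than enabling it globally
    ssh -A admin@first.example
    # then, on "first", push straight to the second box using the forwarded agent
    rsync -e ssh -avP /srv/data/ admin@second.example:/srv/data/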
Ah yeah. Not sure on that one. scp does have the `-3` option to copy between two remote hosts via the local host, but that can be significantly slower if the remote hosts are in the same network and the local host is not.
Exactly. If I need to move a few megabytes around, <scp -3> and a coffee or a few simple tickets is a good way. A year ago or so, I needed 600GB moved between two systems ASAP during an outage that'd turn into a money-bleed at 6am. If I piped that through the VPN and my workstation, I'd probably still be waiting today.
Some time, take a look at lftp [1] and its mirror subsystem for this. It can break up a batch of files, or even one large file, into multiple SFTP streams. Another upside is that it can replicate most rsync behavior against an SFTP-only chroot account. The downside is that, without a corresponding daemon like rsync on the other end, directory enumeration is slow, which isn't a problem if you don't have a complex directory structure.
Play around with the built-in rate-limit options, total and per connection, to keep the network people happy.
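A rough sketch of what that can look like (hostname, paths, and the rate are placeholders; exact setting names are in the lftp manual):

    # pull remote /srv/data over SFTP: 4 files in parallel, large files split
    # into 4 streams each, with the total transfer rate capped (bytes per second)
    lftp -c '
      set net:limit-total-rate 50000000
      set mirror:use-pget-n 4
      open sftp://backup@backup.example.net
      mirror --parallel=4 /srv/data /srv/data
    '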
And since the person you're replying to was talking about command-line parameters, it's worth mentioning that this can be done with `ssh -J jumphost user@system`.
It's exploitable only by an SSH server that you connect to with agent forwarding enabled (i.e. one that you're already trusting with access to your SSH keys).
On my machine, I just removed the 'ssh-pkcs11-helper' binary, given that I'll definitely have no need for it in the foreseeable future.
Then I remembered I don't use 'ssh-agent' as my SSH Agent anyway; I use gpg-agent and I'm pretty confident in its security posture. Then I verified it: gpg-agent simply doesn't support any operations except signing with preloaded keys.
It was arguably still not necessary to post the complete exploit chain/proof-of-concept on the same day, since the chaining of four different shared libraries present in Ubuntu probably does not immediately follow from the diff introduced by the fix.
CVSS is meaningless, literally a Ouija board that reflects the intuition of whoever's computing the score. A more reasonable way to look at severity is on a two-dimensional scale of lo-hi severity and lo-hi situationality. This is a high-severity, moderately situational bug (it would be highly situational if it required user interaction, and not situational at all if it didn't require control over the remote SSH server).