
Post-mortem and remediations for the Matrix.org security breach - Arathorn
https://matrix.org/blog/2019/05/08/post-mortem-and-remediations-for-apr-11-security-incident/
======
rlpb
One of the failures here is that they weren't able to keep deployed software
up to date for security fixes, even when those security fixes were publicly
known.

They have acknowledged this in their section "Keeping patched".

However, there is one thing I think they have omitted to consider: the more
they rely on third-party software that does not come from the server
distribution they are using, the more disparate and unreliable the sources for
security fixes become.

Careful choice of production software dependencies is therefore a factor.
Usually, going outside the distribution is unavoidable for a small number of
dependencies that are central to the mission. But in general, I wonder if they
have any kind of
policy to favour distribution-supplied dependencies over any other type.

Another way of looking at this: we already have a community that comes
together to provide integrated security updates that can be automatically
installed, and you already have access to it. Not using this source
compromises that ability. If some software isn't available through Debian, it
is usually because there is some practical difficulty in packaging it, and I
argue that security maintenance difficulty arises from the same root cause.

On a similar note, I'm curious about their choice to switch from cgit to
GitLab. Both are packaged in Debian, but I believe that even Debian doesn't
use the packaged GitLab for Debian's own GitLab instance. Assuming the Debian
GitLab package is therefore not practical to run, wouldn't cgit be better
from a "receives timely security updates through the distribution"
perspective?

~~~
Arathorn
This is an excellent point.

In the (distant) past, we tended to prefer to wrap our own stuff for critical
services (e.g. apache, linux kernel) rather than use distribution-maintained
packages. The reason was pretty much one of being control freaks: wanting to
be able to patch and tweak the config precisely as it came from the developers
rather than having to work out how to coerce Debian's apache package to
increase the hardcoded accept backlog limit or whatever today's drama might
happen to be.

However, this clearly comes at the expense of ease of keeping things patched
and up-to-date, and one of the things we got right (albeit probably not for
the right reasons at the time) when we did the initial rushed build-out of the
legacy infrastructure in 2017 was to switch to using Debian packages for the
majority of things.

Interestingly, our cgit install was not handled by Debian's packaging (because
we customised it a bunch), and so it definitely was a security liability.

Gitlab is a different beast altogether, given it's effectively a distro in its
own right, so we treat it like an OS which needs to be kept patched just like
we do Debian.

For what it's worth, I think by far the hardest thing to do here is to
maintain the discipline to go around keeping everything patched on a regular
basis - especially for small teams who lack dedicated ops people. I don't know
of a good solution here other than trying to instil the fear of God into
everyone when it comes to keeping patched, and throwing more $ and people at
it.

Or I guess you can do
[https://wiki.debian.org/UnattendedUpgrades](https://wiki.debian.org/UnattendedUpgrades)
and pray nothing breaks.
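For reference, the basic Debian/Ubuntu setup is only a couple of commands; a
sketch (behaviour is tuned in the config files mentioned in the comments):

```shell
# One-off setup for unattended security upgrades
sudo apt-get install unattended-upgrades
sudo dpkg-reconfigure -plow unattended-upgrades   # writes 20auto-upgrades
# By default only the security suite is applied automatically; adjust
# which origins qualify in /etc/apt/apt.conf.d/50unattended-upgrades
```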

~~~
nh2
The Nix package manager can help keep packages that are not available in your
distribution updated and customised
([https://nixos.org/nix/](https://nixos.org/nix/)).

In the past I used to install newer, or customised, versions of e.g. `git`
than were available on my Ubuntu into my home directory using e.g.
`./configure --prefix=$HOME/opt`. That got me the features I wanted, but of
course made me miss out on security updates, and I would have to remember each
software I installed this way.

With nix, I can update them all in one go with `nix-env --upgrade`.

Nix also lets you declaratively apply custom patches on top of "whatever the
latest version is".

That way I can have things like you mentioned (e.g. hardcoded accept backlog
for Apache, hardening compile flags) without the mentioned "expense of ease of
keeping things patched and up-to-date". I found that very hard to do with .deb
packages.
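As a sketch of what that looks like, a hypothetical overlay (file name and
patch are made up) that applies a local patch to whatever apacheHttpd version
nixpkgs currently ships:

```nix
# ~/.config/nixpkgs/overlays/apache.nix (illustrative): the patch is
# reapplied automatically each time nixpkgs bumps the apache version.
self: super: {
  apacheHttpd = super.apacheHttpd.overrideAttrs (old: {
    patches = (old.patches or []) ++ [ ./raise-accept-backlog.patch ];
  });
}
```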

It's not as good as just using unattended-upgrades from your main distro,
because you still have to run the one `nix-env --upgrade` command every now
and then, but that can be easily automated.
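For example (illustrative schedule and log path), a crontab entry like:

```
# m h dom mon dow  command -- run the upgrade every Monday at 04:00,
# keeping a log so any breakage can be traced afterwards
0 4 * * 1  nix-env --upgrade >> "$HOME/.nix-upgrade.log" 2>&1
```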

~~~
gnufx
I only know Guix, not Nix, but I found it mostly harder to make package
definitions for it than to backport rpms and debs, at least for requirements
that aren't radically different from the base system. (That's nothing to do
with Scheme, by the way.)

Then, if you're bothered about security, it's not clear that having to keep
track of two different packaging systems, and the possible interactions
between them, is a win.

------
nealmcb2
That is an excellent and very helpful writeup!

I'm particularly disappointed to hear that Google doesn't provide any way to
rotate the signing key for an app. Is there an issue filed with them anywhere,
or more discussion?

Some day, I hope reputable services have migrated to The Update Framework,
which has been pointing out and solving these and other problems related to
software updates for several years now.

[https://theupdateframework.github.io/](https://theupdateframework.github.io/)

Actually, a quick search leads to this - is it indeed possible to rotate your
key, at least as of Android Pie?

[https://www.androidpolice.com/2018/08/13/android-pie-includes-key-rotation-way-developers-change-app-signatures/](https://www.androidpolice.com/2018/08/13/android-pie-includes-key-rotation-way-developers-change-app-signatures/)

~~~
Arathorn
So yes, Google Play has let you rotate your key for a few years now, but a)
Riot/Android was set up before that was a thing, and b) it gives Google the
ability to push their own updates to your app, which some of the more paranoid
users might object to. So we set it up with our own key again, but this time
we will protect it with our lives...

Edit: [https://developer.android.com/studio/publish/app-signing#app-signing-google-play](https://developer.android.com/studio/publish/app-signing#app-signing-google-play)
is the type of key rotation I was talking about here.

~~~
Arathorn
actually, the mechanism described in
[https://www.androidpolice.com/2018/08/13/android-pie-includes-key-rotation-way-developers-change-app-signatures/](https://www.androidpolice.com/2018/08/13/android-pie-includes-key-rotation-way-developers-change-app-signatures/)
sounds different to this, but given it mandates Android 9.0, we can't use that
either yet. (Our minimum supported Android is still 4.1...)

------
ummonk
Great post-mortem, with a candid examination of the decisions that contributed
to lax security on the homeserver. While a security breach is never great,
this kind of honest post-mortem improves my estimation of the chances that the
matrix.org team will get things right in the future.

------
theamk
I applaud the decision to get rid of Jenkins.

The way Jenkins works, with each plugin being able to implement arbitrary
endpoints, it is almost inevitable that it would have many security
vulnerabilities.

No Jenkins masters should be exposed to the internet, ever -- and if there is
really no other way, then set up a proxy in front of it with a strict
whitelist of allowed URLs.
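As a sketch (path and port are made up), an nginx fragment in front of Jenkins
that exposes only a webhook endpoint and denies everything else:

```nginx
# Internet-facing vhost: allow only the webhook path through;
# every other Jenkins URL (plugin endpoints included) is blocked.
location = /github-webhook/ {
    proxy_pass http://127.0.0.1:8080;
}
location / {
    deny all;
}
```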

------
Arathorn
Author here - hopefully the level of detail here will let others learn from
our mistakes (and hopefully benefit from how we've chosen to fix them going
forwards). Happy to answer any/all questions or comments.

TL;DR: keep your services patched; lock down SSH; partition your network; and
there's almost _never_ a good reason to use SSH agent forwarding.

~~~
voltagex_
>The attacker put an SSH key on the box, which was unfortunately exposed to
the internet via a high-numbered SSH port for ease of admin by remote users,
and placed a trap which waited for any user to SSH into the jenkins user,
which would then hijack any available forwarded SSH keys to try to add the
attacker’s SSH key

You could also fund/donate to/advocate for a better SSH agent.

I use both Pageant and ssh-agent in my home network for ease of ssh'ing into
boxes, especially Unifi gear and some dev VMs. I don't think I will stop using
agents, but I probably wouldn't use them at work.

Why couldn't there be an agent that required you to touch a Yubikey before
it'd allow keys to be forwarded? Why couldn't you add prompting and timeouts
to an agent?

~~~
JoachimSchipper
ssh-agent has prompting and you can set up a Yubikey with ssh.

The problem here was agent _forwarding_, which you should almost always
replace with opening a new connection via ssh -J (or equivalent).
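For example, a ~/.ssh/config sketch (the bastion/internal names are
placeholders) that makes the jump transparent without forwarding any agent:

```
# ~/.ssh/config -- hop via the bastion; the agent stays on your machine
Host internal-*
    ProxyJump bastion
# one-off equivalent on the command line:
#   ssh -J bastion internal-db01
```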

~~~
voltagex_
But can I prompt every time the agent is used?

~~~
Arathorn
How would you know whether the agent is being used by a legitimate app or a
malicious app racing with a legitimate app to steal access?

At least you would only leak a single access, and you would have a higher
chance of noticing, but I can also see that if the hijack was done
intermittently you might write it off as a glitch...

------
alanfranz
The writeup is interesting. As a security conscious developer (and with quite
a lot of experience with deployments of multi-server systems) I wonder if
there's a comprehensive, coherent guide in order to do The Right Thing
security-wise in such scenarios. Multiple interacting servers, multiple
developers, continuous delivery... I think that Google's BeyondCorp approach
is rather different from this (and SSH would be publicly exposed), but it has an
inherent level of complexity which would be hard to cope with in a small org.

~~~
cyberpip
Check out
[https://infosec.mozilla.org/guidelines/openssh](https://infosec.mozilla.org/guidelines/openssh)
for a nice overview of best-practices on SSH.
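A few example sshd_config lines in the spirit of that document (an
illustrative subset; the guidelines also cover key types, ciphers and more):

```
# /etc/ssh/sshd_config
PasswordAuthentication no
PermitRootLogin no
AllowGroups ssh-users
```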

------
gruez
>In terms of remediation, designing a secure build process is surprisingly
hard, particularly for a geo-distributed team. What we have landed on is as
follows:

>We then perform all releases from a dedicated isolated release terminal.

>We physically store the device securely.

Why didn't they go with a HSM?

~~~
Arathorn
This approach isn't incompatible with an HSM, as per:

> The signing keys (hardware or software) are kept exclusively on this device.

We still want to make very sure that the build environment itself hasn't been
tampered with, hence keeping the build machine itself isolated too.

A much better approach would be to use reproducible builds and sign the hash
of a build with a hardware key, but we didn't want to block an improved build
setup on reproducibilizing everything.

Edit: we may be missing an HSM trick, though, in which case please elaborate
:)

~~~
cyphar
I'm not sure I understand what you mean by it being incompatible -- a HSM is a
hardware device which generates and stores its keys separately from your
computer's main memory such that getting the keys (even if the machine is
compromised) should be impossible. In fact, it would eliminate the issue of
the signing keys being extracted from a compromised machine.

Since Android signing keys are just PKCS #8, and GPG keys are supported by
most HSMs, a HSM would definitely be usable (even if you just used an addon
HSM card that you added to your "release terminal"). Unfortunately in order to
safely use the HSM you'd need to re-generate your keys again from within the
HSM -- which obviously is a problem on Android. In addition, HSMs are quite
expensive and might be prohibitively so in your case. But I would definitely
recommend looking into it if you're really stuck on doing distribution
yourselves.

Reproducible builds are a useful thing separately, but using a HSM doesn't
require reproducible builds -- after all signing a hash of a binary is the
same as just signing the binary. The main benefit of reproducible builds is
that people can independently verify that the published source code is
actually what was used to build the binary (which means it's an additional
layer of verification over signatures).

One question I have is how you are going to handle the case where the release
terminal fails? Will you have to (painfully) rotate the keys again?

~~~
Arathorn
I said _isn’t_ incompatible.

I.e. we are already using HSMs on the build server.

~~~
cyphar
Ah, oops. That explains why it didn't make sense. :P

------
altmind
> [The attacker] placed a trap which waited for any user to SSH into the
> jenkins user, which would then hijack any available forwarded SSH keys to
> try to add the attacker’s SSH key to root@ on as many other hosts as
> possible.

Can the system you log into via ssh just dump your forwarded PRIVATE key? That
easily?

Or was this about the ssh client on the jenkins box being patched to add
malicious keys wherever the devops ssh'd to from the jenkins box?

~~~
Arathorn
sorry, I think I could have been clearer here.

When you log into a host with SSH agent forwarding turned on, the private key
data itself isn't available to the host you're logging into. However, when you
try to SSH onwards from that host, agent forwarding means that the
authentication handshake is forwarded through to the agent running on your
client, which of course has access to your private keys.

So, even though the private key data itself isn't directly available to the
host, any code running which can inspect the SSH_AUTH_SOCK environment
variable of the session that just logged in can use that var to silently
authenticate with other remote systems on your behalf.

If you've already found a list of candidate hosts (e.g. by inspecting
~/.ssh/known_hosts) then your malware can simply loop over the list, trying to
log in as root@ (or user@) and compromising them however you like. Which is
what happened here, by copying a malicious authorized_keys2 file with a
malicious key onto the target hosts. You don't need to patch the ssh client;
it's just ssh agent forwarding doing its thing. :|
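To make the mechanics concrete, a sketch (function name made up) of the
enumeration step; the actual authentication then rides on the victim's
forwarded SSH_AUTH_SOCK, so no key material is ever read on the host:

```shell
# List candidate hostnames from a known_hosts file: take the first
# field, split comma-separated aliases, skip hashed (|1|...) entries.
candidate_hosts() {
    cut -d' ' -f1 "$1" | tr ',' '\n' | grep -v '^|' | sort -u
}
```

Note that with HashKnownHosts enabled (as some distros do by default), the
entries are hashed and this enumeration yields nothing directly, which is one
small mitigation against exactly this attack.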

~~~
voltagex_
A simple yet clever attack. I wonder how you'd protect against it without
banning SSH forwarding, which has almost certainly saved me from (some) RSI.

~~~
Arathorn
since banning ssh agent forwarding I haven’t missed it at all - ssh -J has
been an almost perfect replacement for my use cases.

What sort of thing are you using ssh -A for which couldn’t be replaced by ssh
-J?

~~~
mschuster91
> What sort of thing are you using ssh -A for which couldn’t be replaced by
> ssh -J?

git checkouts from private repositories, for example. HTTPS requires
username/password which may or may not be checked/monitored.

~~~
Arathorn
Right. I covered this specifically in the writeup, because it's a use case
that we have too. Our proposal is:

> If you need to regularly copy stuff from server to another (or use SSH to
> GitHub to check out something from a private repo), it might be better to
> have a specific SSH ‘deploy key’ created for this, stored server-side and
> only able to perform limited actions.

And this is the approach we're taking going forwards.

If the problem is that you only ever want to read from git when an admin is
logged into the machine, I guess the safest bet would be to use a temporary
deploy key (or temporarily copy the deploy key onto the machine until you've
finished admining). Forwarding all the keys from your agent is a recipe to end
up pwned like we did, however.
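As a sketch of that (the names and restriction options are illustrative):

```shell
# Generate a dedicated deploy key; the private half lives only on the
# server that needs it, never in anyone's forwarded agent.
ssh-keygen -q -t ed25519 -f repo_deploy_key -N '' -C 'deploy@buildhost'
# On GitHub, add repo_deploy_key.pub as a read-only Deploy Key.
# On your own git server, pin it down in authorized_keys instead, e.g.:
#   restrict,command="git-shell -c \"$SSH_ORIGINAL_COMMAND\"" ssh-ed25519 AAAA... deploy@buildhost
```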

------
squid000
I wonder why no one uses SSH certificate-based authentication. SSH supports
certificate-signed keys for login, so no per-user keys need to be on the
server at all.
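For anyone curious, a minimal sketch of the CA flow (identities and validity
period are illustrative):

```shell
# Create a CA and a user key, then issue a certificate for the user;
# servers only need to trust ca_key.pub, not individual user keys.
ssh-keygen -q -t ed25519 -f ca_key -N ''
ssh-keygen -q -t ed25519 -f user_key -N ''
ssh-keygen -q -s ca_key -I alice -n alice -V +52w user_key.pub
# This writes user_key-cert.pub; on each server, in sshd_config:
#   TrustedUserCAKeys /etc/ssh/ca_key.pub
```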

------
EGreg
Package management and microservices are the reason for this.

------
deadbunny
TL;DR Don't let devs manage your systems.

------
zajd
This might be the best post-mortem I've seen. Bravo Matrix.org team,
informative and inspires confidence in your process.

------
dreamcompiler
> SSH agent forwarding should be disabled.

> SSH should not be exposed to the general internet.

> If you need to copy files between machines, use rsync rather than scp.

Great. Just great. I still remember when SSH was described as the solution to
fix telnet and rcp. And now we can't use it any more. Fan-freaking-tastic.

~~~
Arathorn
SSH is fine :) But agent forwarding is the biggest footgun imaginable, and scp
sadly has design flaws, some of which it literally inherited verbatim from rcp.

But using SSH as a shell is fine. And rewiring your fingers to type rsync
rather than scp isn't too bad either - plus you get resumption etc for free :)
(And yes, I appreciate the parent is being slightly tongue in cheek).
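For the finger-rewiring, the swap is mostly one-for-one (host and paths are
placeholders):

```shell
# scp -r ./site host:/var/www  becomes:
rsync -avP ./site/ host:/var/www/
# the pull direction works the same way, and -P resumes partial copies:
rsync -avP host:/var/log/syslog ./
```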

Edit: of course, if we'd been using xrsh and xrcp from XNS rather than this
newfangled TCP/IP stuff none of this would probably ever have happened...

~~~
dreamcompiler
Sorry for my snarky tone. I'm dealing with an intrusion of my own right now
and your writeup was actually quite helpful, so thanks for doing it.

~~~
Arathorn
gah, sorry to hear that - good luck!

------
wockaflocka
I have never used the matrix.org service, but I had heard of them previously;
from their website I could see the words 'security' and 'secure' used a lot.

Reading the blog post, I wonder how many security specialists this
organisation really has, as they would never have allowed these fundamental
errors to be made, even with the explanation that they set up their infra in a
rush. A dedicated security team would surely have fixed these basic errors.

I would advise anybody looking for 'secure' applications to stay away from
these organisations. Who knows how many possible flaws are deeply embedded in
their systems, like zero days, memory leaks and more? They did not even have a
basic security policy in place... please don't use the word secure

~~~
fartcannon
Isn't a lot of it/all of it reviewable on their github? Does that not help you
make a decision on their quality?

~~~
wockaflocka
Doesn't really help if I can just go into their system and introduce my own
code into their SDKs, or just sign my own release of a build. It makes me
question how secure their build process is. Without security people, can you
really claim to be secure?

