
Post Mortem on Salt Incident - sfg75
https://blog.algolia.com/salt-incident-may-3rd-2020-retrospective-and-update/
======
cetra3
This whole salt-stack incident could've been handled a lot better by salt
themselves:

\- the notification was a week ago to a small mailing list, which is tucked
away on their site

\- no notification via the registry where you go to download salt (at least I
never received an email, though I still get plenty of marketing spam)

\- no posts on social media as far as I can tell, I couldn't find a tweet,
anything on reddit, or anything on hn.

\- they only blogged about it on their official site yesterday, way after
damage had been done

\- one week's notice between the initial announcement and the patch coming
out. The patch being released is basically a disclosure of the vulnerability

\- the patch was released late Thursday early Friday depending on your
timezone, giving attackers the weekend head start

\- the official salt docker images were only patched yesterday

\- You can't get a patch for older versions without filling out a form and
supplying details

\- Ubuntu and other repositories are still vulnerable

~~~
mtam
+1. However, from what I read, the vulnerability can only be exploited if the
attacker has network access to the salt master's port, which should never
occur. The people who got compromised had Salt exposed to the Internet, which
is obviously ridiculous.

Not trying to downplay the critical nature of the vulnerability but the ones
that were compromised by this issue have deeper security issues to deal with.

~~~
mike_d
> has network access to the salt masters port, which should never occur

You seem to subscribe to the "hard shell, soft gooey center" network security
philosophy. Should people expose an Oracle server to the internet? Absolutely
not. Does moving it behind a firewall change the fact that every mildly
skilled exploit developer is sitting on an Oracle 0day? Absolutely not.

People have legitimate reasons for exposing Salt to the internet. I do. It's
how I bootstrap random VMs and bare metal from the internet. But in my case
the attack was mitigated by the fact that Salt cascades changes in a bunch of
other systems and re-masters minions to a host only reachable over a tunnel. I
blew away the internet master, restored from a backup, and patched.
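(Sketching that recovery roughly, in case it's useful to anyone; the paths are the standard Salt minion locations, the address is made up:)

```
# On each minion: re-point it at the tunnel-only master (hypothetical address)
sed -i 's/^master:.*/master: 10.8.0.1/' /etc/salt/minion
# Forget the compromised master's cached public key, then reconnect
rm -f /etc/salt/pki/minion/minion_master.pub
systemctl restart salt-minion
```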

> the ones that were compromised by this issue have deeper security issues to
> deal with

Or it was just another Monday. When you become sufficiently large you deal
with incidents on a daily basis. Kudos to the people who publicly postmortem
and talk about what went well and what didn't.

(For the record, I've already been working for a few months on a move to
Ansible for non-security reasons)

~~~
empath75
> People have legitimate reasons for exposing Salt to the internet. I do. It's
> how I bootstrap random VMs and bare metal from the internet.

I question whether that is a legitimate reason to expose it to the internet.

Defense in depth is a thing and putting the keys to the kingdom at layer 0
doesn’t seem wise even if a vpn or bastion doesn’t offer perfect protection.

~~~
mike_d
Read the sentence after the ones you quoted. The internet-connected salt
master is used to provision accepted hosts into the tunneled (VPN) network
where the real master lives.

------
VWWHFSfQ
The intruders had root access to every server in a salt deployment for who
knows how long, and yet everyone is claiming there's no evidence that any data
or secrets (customers' or otherwise) were exfiltrated from the network. This
is a very dangerous assumption. Nobody has any idea what was run on the
servers, since it seems that once the initial attack script was deployed it
downloaded and executed new scripts every 60s, which then removed themselves.
Pretty standard C&C ops. It may have started as a mining operation, but that
doesn't mean mining was the only thing it was doing.

~~~
lasdfas
I agree. I would like to see more details of how they determined it was only
crypto mining. Finding only mining scripts in your logs doesn't mean they were
not running other code once they had root.

~~~
sterlind
It seems bizarre to me that a crypto miner got in. It wouldn't make much money
on regular CPUs, and the high processor usage would immediately draw
attention. So it looks like a low-effort botnet, which is embarrassing to get
pwned by.

(The coin mining could be a cover like you mention, but it seems unlikely
since it naturally draws attention.)

~~~
itsajoke
I once worked at a place where a minor piece of cloud infra got exploited. All
the attacker did was run a monero miner on it.

~~~
sterlind
Heh, in a way it makes a good bug bounty. Like if popping calc got you a
trickle of income.

------
hawaiian
I haven't been a fan of Salt since learning they decided to roll their own
encryption.

You don't have to look that far to find problems with that:

[https://github.com/saltstack/salt/commit/5dd304276ba5745ec21...](https://github.com/saltstack/salt/commit/5dd304276ba5745ec21fc1e6686a0b28da29e6fc)

------
kureikain
It's weird that these salt masters are reachable from the internet and people
can sleep well with that.

Even with zero-trust networking or the BeyondCorp idea, I still find the extra
layer of protection a VPC gives to be great. A few years ago there was an
issue with the K8s API server, and updating K8s isn't a walk in the park. I
felt relaxed back then because we had everything inside a VPC.

You can use SSH or a VPN to access services inside the VPC. But any tool that
has permission to manage your infrastructure should never be exposed to the
internet.

Same thing with Jenkins: if you are using Jenkins to manage Terraform or
trigger Ansible/Salt/Chef runs, make sure Jenkins is not reachable from the
internet. Use a different method to route webhooks into it.

~~~
trabant00
I never understood the current trend of saying VPN is a thing of the past.
Redundancy in security layers is how you don't get affected by every CVE out
there.

Imo this is THE lesson to learn from this story.

Secondary: salt and ansible are not very mature yet.

~~~
dijit
Salt is definitely immature (I've been using it for 5 years and the situation
has actually gotten worse in that time), but Ansible is a weird thing to group
with it.

What issues do you have with Ansible?

------
mtam
“We’ve secured the impacted SaltStack service by updating it and adding
additional IP filtering, allowing only our servers to connect to it.”

So this means they had Salt master ports publicly accessible? Why would anyone
have salt ports open/exposed to public/internet?

~~~
dijit
> Why would anyone have salt ports open/exposed to public/internet?

If you're bootstrapping random servers, this is a fine approach.

The whole Salt connection methodology is 'trust on first connect' (a bit like
the default SSH) with a manual stage for accepting an incoming request, and
the connection stream is encrypted.

If you're using salt to bootstrap your VPN servers or network appliances then
it's understandable that you'd have it exposed to a more public network, and
the documentation was clear that this was fine.

Not everything is a virtual machine on a cloud provider.

~~~
darkwater
> If you're bootstrapping random servers, this is a fine approach.

Define "random". I think there is an alternative method not involving exposing
your CM server on the Internet for almost any definition of random. In the
Algolia case that's almost certain, because they now filter access by IP (so
they KNOW the IPs)

~~~
dijit
"Random" can mean "I don't know before I start my instance".

If you're multi-cloud (Vultr, DO, AWS and GCP) you almost certainly will not
know your instance's IP before it's provisioned, and you can't make use of
nice features like network tags or security labels.

If you're producing test environments then bootstrapping those is going to be
significantly more painful than just opening up your salt-master and running
an authenticated API request to allow those new machines.

As other people have mentioned, this was always supposed to be /possible/;
it's akin to SSH. Sure, you can avoid some log spam and potential issues by
firewalling it off, but it's meant to be possible to run it publicly. It has
always been marketed this way, so it's not "insane" that people did it.
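For what it's worth, the flow is roughly this (the hostname and minion id scheme here are made up; salt-bootstrap and salt-key are the standard tools):

```
# On the freshly provisioned instance: install a minion pointed at the
# public master (hypothetical hostname), with a predictable minion id
curl -L https://bootstrap.saltstack.com -o install_salt.sh
sh install_salt.sh -A salt.example.com -i "web-$(hostname)"

# On the master (or via an authenticated salt-api call): accept the new key
salt-key -a "web-$(hostname)" -y
```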

~~~
darkwater
> As other people have mentioned, this was always supposed to be /possible/;
> it's akin to SSH. Sure, you can avoid some log spam and potential issues by
> firewalling it off, but it's meant to be possible to run it publicly. It has
> always been marketed this way, so it's not "insane" that people did it.

I'm not blaming anyone, I'm just saying that if you put well-known software
facing the Internet you are exposing yourself to more risks than if you keep
it off the Internet. And for core infra software like SaltStack I don't
really see a good reason to justify it. Nor would I justify making SSH
publicly accessible, unless you are a really, really small company or an
individual.

------
lrpublic
Trusting a central control server is the fundamental mistake here.

It creates a very high value target that is difficult to secure.

I prefer a model where the management commands are signed at a management
workstation and those commands are pushed by the server and authenticated at
the managed node against a security policy.

~~~
brianjlogan
What configuration management tools use this methodology?

~~~
lrpublic
A couple that I’ve built - they are not commercially available.

I’d consider open sourcing something based on them if there’s sufficient
interest.

Perhaps as an integration for one of the major players.

------
0x0
Both this and the Ghost CMS updates seem to hint that the only reason this was
discovered was that loud crypto miners were exhausting resources. What are the
chances a quieter attacker hadn't thoroughly ploughed through the entire
infrastructure days ahead?

Also think about how many years this vuln has been present and exposed. Who's
to know blackhats haven't sat on this 0day for years, quietly compromising
private keys and other data? Spooky.

------
ciprian_craciun
I've seen various "deployment" tools (call them "configuration management" if
you will) mentioned in the comments being called "insecure" or "immature", or
one being claimed to be better than another; however, I think this is a good
opportunity to talk about a deeper problem, namely the architectural choices
each tool has made.

These choices all impact the reliability and security of the resulting system,
especially the following:

* do they rely on SSH, or have they implemented their own authentication / authorization techniques? (personally I would be very reluctant to trust anything that just listens on a network port for deployment commands and isn't SSH;)

* do the agents run with full `root` privileges, or is there a builtin mechanism that allows the agent to act only in a limited capacity, within the confines of a set of whitelisted actions? (perhaps even requiring a secondary authentication mechanism for certain "sensitive" actions, for example something integrated with `sudo`, that provides a sort of 2-factor-authentication with a human in the loop;)

* do the operators have enough "visibility" into what is happening during the deployments? (more specifically, are the deployment scripts easily auditable or are they a spaghetti of dependencies? are the concrete actions to be taken clearly described, or are they hidden in the source code of the tool?)

* are there builtin mechanisms to "verify" the results of the deployments?

* and building upon the previous item, are there mechanisms to continuously "verify" if the deployment hasn't changed behind the scenes?

I understand that some of these features wouldn't have directly prevented this
particular case; however, they would have helped with alerting and diagnosis.
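To make the second point concrete, a whitelist of actions can be as simple as a sudoers fragment (the username, unit and paths here are all hypothetical):

```
# /etc/sudoers.d/deploy-agent -- the agent runs unprivileged as "deploy"
# and may only perform these exact actions as root:
deploy ALL=(root) NOPASSWD: /usr/bin/systemctl restart myapp.service
deploy ALL=(root) NOPASSWD: /usr/bin/rsync -a /srv/releases/current/ /opt/myapp/
# Everything else (shells, package installs, reading keys) is denied by default.
```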

------
alexbrower
Can anyone describe the business benefits of an algolia implementation (vs
Elasticsearch?) for a company that doesn't heavily rely on content searches?
It seems expensive and something that I'd build on my own.

(Disclaimer: long-time operator and fledgling programmer)

~~~
aseure
Disclaimer: I'm a developer at Algolia.

IMHO the two main advantages in favor of Algolia are the sane defaults for
relevance and speed, and the fact that the service is hosted and can grow with
your business without needing dedicated engineers to manage both the
configuration and the infrastructure.

Also, on top of the Algolia services per se (search, analytics,
recommendation, etc.), we're providing a lot of backend and frontend libraries
which one would otherwise need to reimplement when using an Elasticsearch- or
Solr-based implementation.

------
vbernat
As a point of comparison, you can also expose Puppet masters to the public
Internet, but Puppet uses HTTPS as its transport, so it is trivial to put a
reverse proxy in front of it that requires a valid certificate (managed and
signed by Puppet) to contact the service. This way, there's no need to
maintain a whitelist of legitimate clients.
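A minimal nginx sketch of that setup (the CA path is the Puppet Server default; the hostname and front-end cert paths are made up):

```nginx
server {
    listen 443 ssl;
    server_name puppet.example.com;                      # hypothetical name

    ssl_certificate     /etc/nginx/ssl/front.crt;        # front-end cert
    ssl_certificate_key /etc/nginx/ssl/front.key;

    # Only clients holding a certificate signed by the Puppet CA get through
    ssl_client_certificate /etc/puppetlabs/puppet/ssl/certs/ca.pem;
    ssl_verify_client on;

    location / {
        proxy_pass https://127.0.0.1:8140;               # the real Puppet master
        proxy_set_header X-Client-Verify $ssl_client_verify;
        proxy_set_header X-Client-DN     $ssl_client_s_dn;
    }
}
```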

