
IAM is hard – Thoughts on $80M fine from the Capital One Breach - bharatsb
https://twitter.com/kmcquade3/status/1291801858676228098
======
mesofile
My general experience with crafting IAM policies is very reminiscent of
SELinux, in that it's very difficult to work agnostically while adhering to a
principle of least privilege. Especially given that this kind of task is often
done by admin/ops people, one typically can't know in advance everything that
the app might need to be able to access in order to work correctly. The
process of discovering this is: try running it -- it fails, you note the
permissions error -- look at docs, talk to devs, try to give it the most
granular permissions it needs to get around that error -- rinse & repeat,
probably many many times. This is onerous and you wind up going all around the
mulberry bush trying to understand and satisfy every dependency.

At least, if you're very good and don't mind being perceived as a roadblock,
you try to understand things. If you're more typical, you just find the most
direct route from logged error to added permission (audit2allow approach). And
if you're bad, which is also not uncommon, you just give it the broadest
permissions possible and call it a day.

With respect to IAM in particular, I'm finding in the Lambda world that some
seemingly straightforward functions wind up needing some sort of access to all
kind of other AWS services; these services each have their own funky
permissions structures and attendant quirks. Each one is a temptation to the
IAM admin to just throw their hands in the air and put a wildcard on it.

~~~
freedomben
I've seen this a ton. I have been an on/off security professional so
academically I am committed to the principle of least privilege, but holy hell
it can be painful or impossible in real life.

Where possible I've started adopting the "run it and see" or audit2allow
approach (there are awesome tools that can do this for AWS IAM perms too), but
then before applying the policy, _somebody_ needs to put a quick line beside
each permission that explains why. If the answer is "I don't know" and the
permission is simple/low-risk then maybe let it go. If it's a high-risk
permission then don't do it without answers.

That formula is the only pragmatic one I've seen. Not perfect but sometimes
perfect is the enemy of good.
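
A rough sketch of that review rule in Python, in case it helps. The `HIGH_RISK` set and the `review` function are made up for illustration; a real list of high-risk actions would be specific to your team.

```python
# Hypothetical list of actions treated as high-risk; adjust for your org.
HIGH_RISK = {"iam:PassRole", "iam:CreateAccessKey", "lambda:UpdateFunctionCode"}


def review(permissions):
    """Split candidate permissions into (approved, blocked).

    `permissions` maps an IAM action to its one-line justification,
    with an empty string meaning "I don't know".
    Unexplained low-risk actions are let through; unexplained
    high-risk actions are blocked until somebody answers "why".
    """
    approved, blocked = [], []
    for action, why in sorted(permissions.items()):
        if not why.strip() and action in HIGH_RISK:
            blocked.append(action)
        else:
            approved.append(action)
    return approved, blocked
```

The output of an audit2allow-style tool would be the input here, with a human filling in the justification strings before the policy is applied.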

~~~
devonkim
Years ago I wrote a program to let various services run their course, query
CloudTrail for successful calls made to different AWS services, and attempt to
find a minimal set of IAM permissions (not applicable for S3 at the time). The
idea was to run an exhaustive test suite with expected allowed actions only
and deny anything else. I believe AWS has a similar tool now for IAM but it’s
not a problem that’ll be resolved satisfactorily for everyone given the
combinatorics of IAM possible. Lateral movement in IAM roles and credentials
is tough and even today not every action that IMO should be flagged is
reported (IAM role assumption failures across accounts is silent when I
checked early last year).
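
The core of that idea fits in a few lines of Python. This is only a sketch, not the original program: it assumes the CloudTrail records have already been fetched as dicts carrying the standard `eventSource`, `eventName` and (on failures) `errorCode` fields, and it assumes the IAM service prefix equals the first label of `eventSource` and that the event name equals the IAM action name. Neither assumption holds for every service. The function names are hypothetical.

```python
def observed_actions(events):
    """Return sorted 'service:Action' strings for calls that succeeded."""
    actions = set()
    for event in events:
        # CloudTrail sets errorCode on failed calls; skip those.
        if event.get("errorCode"):
            continue
        # Assumption: "s3.amazonaws.com" maps to the IAM prefix "s3".
        service = event["eventSource"].split(".")[0]
        actions.add(f"{service}:{event['eventName']}")
    return sorted(actions)


def allow_statement(actions, resources):
    """Build one Allow statement; the caller must supply explicit resources."""
    return {"Effect": "Allow", "Action": list(actions), "Resource": list(resources)}
```

Resource scoping is deliberately left to the caller, since guessing it from the logs is the hard part.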

~~~
kmcquade
Did you open source it? If not, you definitely should.

~~~
devonkim
Probably can’t be open sourced given IP under contracts but I could try to
rewrite it. There are some new services in IAM that could be leveraged to make it
more accurate and cheaper to use, too.

~~~
jsperx
Thank you for putting this idea in my head! I’ve been trying to get better at
expressing infrastructure as code, and one of the big blockers has been how
adding new services to e.g. Terraform is tough when you don’t know all their
permissions they need (see also
[https://github.com/hashicorp/terraform/issues/2834](https://github.com/hashicorp/terraform/issues/2834)
for example).

Using a test AWS environment to stage and then checking CloudTrail to see what
was _actually_ called would be a step forward. Having software to extract it
would be even better.

------
john-shaffer
Those problems seem to come from the separation of the IAM admin from the
developer. I'm coding a server now. My IAM roles are defined in a template,
and I just add new permissions to the template as I need them. My code has the
bare minimum permissions that it needs, and it doesn't seem at all onerous for
the benefit it provides. So I think the problem is less "IAM is hard", and
more "coordination is hard".

The one big exception I've run into is that to launch a CloudFormation
template, the role practically needs admin access. I'm considering offloading
the launch to a minimal Lambda function with the requisite (very broad)
permissions. Does anyone have a better approach?

~~~
dasil003
This works if all developers understand IAM and don’t just throw a wildcard in
the first time they don’t understand something.

~~~
bobbiechen
Agreed, there is a large learning curve and it took me a long time to wrap my
head around it.

It requires a lot of knowledge and discipline, which sooner or later will
create holes. For example, if you need to pass a role to a service, you'll
need PassRole; if you grant it on " * ", then oops, you might have just
created the opportunity for privilege escalation [1].
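
A check for that specific mistake might look like the sketch below. It only looks at the policy document as a plain dict, ignores conditions and `NotAction`/`NotResource`, and the function name is made up.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    # Policy fields may be a single value or a list of values.
    return value if isinstance(value, list) else [value]


def risky_passrole(policy):
    """Return the Allow statements that grant iam:PassRole on Resource '*'."""
    hits = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        # IAM action names are case-insensitive and may contain wildcards.
        patterns = [a.lower() for a in _as_list(stmt.get("Action", []))]
        grants_passrole = any(fnmatchcase("iam:passrole", p) for p in patterns)
        if grants_passrole and "*" in _as_list(stmt.get("Resource", [])):
            hits.append(stmt)
    return hits
```

A statement that names the role ARN it is allowed to pass comes back clean; `iam:PassRole`, `iam:Pass*`, `iam:*` or `*` combined with `"Resource": "*"` gets flagged.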

There's also probably issues specific to your company: allowing access to read
resource foo _in general_ is not an issue, except your specific company stores
sensitive data there. If every developer is expected to be a security expert,
the security risks increase, and the productivity overhead may even be worse
than having a dedicated security/IAM team that gatekeeps permissions.

[1] [https://rhinosecuritylabs.com/aws/aws-privilege-
escalation-m...](https://rhinosecuritylabs.com/aws/aws-privilege-escalation-
methods-mitigation/)

------
jedberg
FYI for anyone in the same situation, Netflix built some open source packages
to solve this:

[https://netflixtechblog.com/introducing-aardvark-and-
repokid...](https://netflixtechblog.com/introducing-aardvark-and-
repokid-53b081bf3a7e)

The idea is that the default policy on new things is deny all, and then it
monitors cloudtrail for privilege failures and reconfigures IAM to allow the
smallest possible privilege to get rid of that deny message.

~~~
scoot
So it gives a service any privilege it asks for? I haven't read the article,
but from your description it doesn't sound much better than default allow-all.

~~~
danaur
It sounds a lot better. Set up your script and run it and the tool determines
the minimum set of permissions needed for future runs. You lock that
permission set in for future runs. Read the link

~~~
scoot
> You lock that permission set in for future runs

Thanks, that was the important bit missing from OP's description.

------
igetspam
I am not a bank. My risks are much lower. My CORS policies are strict and I
block merges that are too permissive. I immediately disable and remove keys
that people share in slack or emails or commits. I use IRSA everywhere I can
(and net new services since I joined the current org aren't allowed to use
user key pairs ever). We operate on the principle of least privilege and
everything is RBAC. CapOne made a mistake and it was known. IAM is hard but
when picking places to cut corners, security can't be one of them. Hopefully
this fine sends that message.

~~~
toomuchtodo
Did they fine the management responsible for the decisions around these
systems? Incentives matter. If you’re not exposed to the consequences, you’ll
optimize for your comp and parachute out somewhere else when the shit hits the
fan.

------
solatic
* IAM policies for deployed applications should be kept with those applications

* Use a feature like GitHub's CODEOWNERS to make sure that the service's IAM policies "belong" to your infosec team. Any PRs that attempt to change the IAM policy are then reviewed and approved by infosec.

* Set up monitoring that alerts when IAM policy is deployed that is too broad (i.e. wildcards).

* Eventually, recognize that many of your applications are pretty similar, and move to a smarter deployment model which maps a type of service to the correct IAM policy to be deployed with it. Then, what's stored in the service's code repository is which kind of application it is, and the IAM policies are factored out into a common repository owned by infosec.
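
The wildcard alert in the third bullet can start as something as small as this Python sketch, run in CI or against deployed policies. The function name and the message format are made up, and the definition of "too broad" here (a bare `*`, or a whole-service action like `s3:*`) is just one possible rule.

```python
def wildcard_findings(name, policy):
    """Return human-readable findings for overly broad Allow statements."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        for field in ("Action", "Resource"):
            values = stmt.get(field, [])
            if isinstance(values, str):
                values = [values]
            for value in values:
                # Bare "*" is broad in either field; "service:*" only in Action.
                broad = value == "*" or (field == "Action" and value.endswith(":*"))
                if broad:
                    findings.append(f"{name}: statement {index} has broad {field} {value!r}")
    return findings
```

Anything it returns would go to whatever alerting channel infosec already watches.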

Look, IAM isn't hard. Allow, deny, verbs, resources, it's all pretty simple.
Not so different from firewall rules we've had for decades now. What's
difficult is managing, not IAM rules specifically, but _anything_ at scale.
Managing security at scale is hard, because managing anything at scale is
hard. What's _more_ difficult is taking legacy setups which already exist at
scale, are poorly secured because they weren't set up with the correct tooling
to manage them at scale, and migrating them to a standards-based approach that
makes it possible to manage them at scale.

What makes it difficult isn't the technology but the organizational politics
that comes with it. If you build it too early, you're over-engineering and
focusing on the wrong thing when there's more "important" stuff to focus on.
If you build it too late, then you need to migrate stuff onto it, stuff which
"just worked", whose stakeholders ascertain too much risk to the transition,
in environments which generally undervalue security work. What makes it
difficult is _politics_.

------
shadowgovt
In my experience, the hardest thing about this whole space is the number of
developers who don't understand what the problem is until it is costing them
$80 million.

Limiting blast radius, for example. Why does your threat model include the
possibility of one of your applications being compromised and using its
credentials to do undesired things to other applications? That implies your
programs are buggy. And clearly, your programs aren't buggy; you're using best
practices. How could they be?

Even in professional development in big name companies, this is a surprisingly
pervasive default attitude. Appropriate level of paranoia is something that I
think has to be taught.

~~~
scarface74
It’s not always about even being malicious. At my last company I was an admin.
But I locked myself down so I wouldn’t make a stupid mistake.

In the past, both Apple and Google made mistakes in their installers that if
anyone else did it, people would assume it was malicious.

There was a bug in the iTunes installer that erased files on people’s hard
disk if there was a space in the name.

There was also a bug in the Chrome installer that made your hard drive
unbootable if you had system integrity protection turned off.

------
grumple
All of the AWS services I’ve used are difficult to work with. Documentation is
often vague, outdated, incomplete, or nonexistent. The whole system seems
designed to create jobs for AWS admins. Yes, you’ve got tons of power and
control, but what we often want is transparency and simplicity, and that’s
what AWS is worst at doing natively.

~~~
recogepelotas
I disagree on the documentation bit. I think AWS's docs are really good
overall. You have the services FAQs for high level overview + AWS Docs to get
deep in to the weeds.

~~~
lostmyoldone
They are good in some areas, but certainly not good in others. Eg when it
comes to how aws codebuild/deploy and how ECS service/instance/task roles
interact.

It seems AWS teams are very vertical, leading to more complex cross services
interaction being badly documented, as it's "nobody" who truly owns it?

~~~
erikerikson
Yes, their organizational structure and the lack of trust that sometimes
flares up between teams really shows in the integration. Which is, of course,
exactly the level where most customers live. IAM used to be exactly that but
they centralized that cross-cutting concern and it's improved a lot.

------
brown
IAM wildcards are the new chmod 777.

------
thayne
It doesn't help that AWS IAM is often confusing and inconsistent. And some
permissions that should have resource-level granularity and/or support
conditions, don't. To be fair, this does seem to be improving somewhat, but
even some newer permissions aren't able to be controlled in as granular a way
as I would like.

------
cactus2093
One thing I find interesting is that AWS has added some safeguards to the
console to protect against exactly this, since it's presumably a very common
issue. As of the last couple years when you make any S3 bucket open to the
world you see a big warning about it.

However if you're following the "industry best practices" and using something
like Terraform to manage all your resources including IAM policies, you won't
ever see the warnings. If you take a step back, it's somewhat bizarre that
we've decided that having infra teams manage hundreds or thousands of lines of
not-very-human-readable JSON across all the IAM resources they manage is the
proper way to do things. I believe there are some linters that have some
semantic understanding of IAM rules that could provide the same benefits, but
at least when I last looked into it they weren't very mature and didn't match
all of AWS' own rules.

After experiencing some of the pains of managing a large Terraform
configuration in the past, I've definitely started to wonder if we take the
idea of "infrastructure as code" too literally. I think manually writing text-
only configuration files for infrastructure should start to be seen as an
anti-pattern as well, and we should mostly be working with better, more
intuitive UIs for creating resources and then outputting the representation in
some log format (which can then be read back by the same tool for
reviewing/diffing, replayed in additional environments, rolled back, etc.)

~~~
StreamBright
>> we've decided that having infra teams manage hundreds or thousands of lines
of not-very-human-readable JSON across all the IAM resources they manage is
the proper way to do things.

This is exactly why a few of my friends and I started to work on a tool that
uses a typed language to express IaC. We can leverage AND/OR relations for AWS
objects. One quick example: an S3 resource is PublicWebsite or ForwardOnly or
PrivateBucket. The individual resources then have a bunch of mandatory
properties (using an AND relationship between them). It is much easier to read
and we have reduced the number of lines of code that we need to grasp to
understand a service significantly. It is also possible to remove options that
you do not want to give to developers at all (for example PublicWebsite is not
a required option for most teams using S3). I really liked Terraform at the
beginning when I thought they were going to improve significantly over the
years but it did not happen. Instead they went down the same rabbit hole as many
other projects: let's invent a new language to express IaC. We do not need one.
ML languages are perfectly capable of capturing IaC and those languages are a
perfect fit, while HCL lacks basic expressive power, resulting in
segfaults/exceptions left and right. I still remember the first time we
accidentally set both "forward all requests to" and "website" for an S3 bucket
and we had to debug why Terraform just crashed with a meaningless error message.
Imagine when you are trying to do something security related with such a tool.
Not fun.
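
To make the OR/AND idea concrete, here is an approximation in Python rather than an ML-family language. It is not the tool described above. The class names come from the comment; the fields and the `website_config` helper are invented for illustration, and Python only enforces the types if you run a static checker over it.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class PublicWebsite:
    index_document: str  # mandatory for this variant


@dataclass(frozen=True)
class ForwardOnly:
    redirect_host: str  # mandatory for this variant


@dataclass(frozen=True)
class PrivateBucket:
    pass


# OR relation: a bucket is exactly one of these kinds.
S3Kind = Union[PublicWebsite, ForwardOnly, PrivateBucket]


@dataclass(frozen=True)
class S3Bucket:
    # AND relation: every bucket needs both a name and a kind.
    name: str
    kind: S3Kind


def website_config(bucket: S3Bucket) -> Optional[dict]:
    """Render the website settings implied by the bucket's kind."""
    kind = bucket.kind
    if isinstance(kind, PublicWebsite):
        return {"IndexDocument": {"Suffix": kind.index_document}}
    if isinstance(kind, ForwardOnly):
        return {"RedirectAllRequestsTo": {"HostName": kind.redirect_host}}
    if isinstance(kind, PrivateBucket):
        return None
    raise TypeError(f"unknown bucket kind: {kind!r}")
```

Because a bucket carries exactly one kind, "website" and "forward all requests" cannot both be set, and a team that should never publish a site can simply be given a union without `PublicWebsite` in it.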

~~~
cactus2093
Sounds really interesting, is there a public repo up yet to take a look at?

~~~
StreamBright
Could you reach out on Keybase or email?

------
nine_k
The original thread is good, because it states the problem _and_ links to a
number of tools which mitigate the problem in various ways.

I'd say that IAM is a too low-level interface (or language). It makes it hard
for engineers to think at the convenient enough level of abstraction
correctly. (Imagine writing e.g. a C++ compiler in 6502 assembly.)

An obvious solution would be to introduce a tool / language which allows to
operate at the level engineers are used to think at, validate the structure,
and analyze the potential breach impact for every resulting piece. (Sort of
like, again, a compiler. Or at least something like Terraform as a first
step.)

A few steps in that direction are already done with the tools mentioned in the
original thread. But I suspect that a lot can still be done in this area,
bringing fame and potentially money to those who would come up with a tool
which becomes widespread. (I mean, it could be a wide project / startup idea
for those who understand both IAM and formal methods well.)

~~~
mxz3000
This is exactly the sort of problem solved by cdk:
[https://docs.aws.amazon.com/cdk/latest/guide/home.html](https://docs.aws.amazon.com/cdk/latest/guide/home.html)
with cdk you generally don't have to mess with IAM constructs, and can just
use the provided APIs to setup necessary permissions (and those permissions
are always as narrow as possible by default)

------
bilater
Noob question but doesn't having a private VPC at least limit external users
from accessing anything since they have to be part of the network?

~~~
hogu
S3 buckets are accessible everywhere generally.

~~~
bilater
yes S3 is the one exception cause its global

------
techntoke
The sooner that these cloud companies work to create a common open API the
more secure and better off we'll all be. The idea of spending so much time
being specialized for a particular cloud provider is stupid, when you've built
your tech career not knowing actual technology but just a proprietary overlay
that doesn't even resemble anything useful outside of the organization.

------
Ozzie_osman
Apart from IAM complexity, I found it interesting that the author puts the
cost of this breach at 80M. Yes, that's the cost of the fine, but the true
cost including reputation, etc are definitely way higher than that.

80M to a company like Cap One with almost 30B in revenue is an easy write off.
The other costs can be really hard to recover from.

~~~
digianarchist
You’re missing development cost to remediate all of the security issues
revealed via auditing.

------
peterwwillis
Make IAM part of the design of the application. If you need to use AWS API
calls, then you need your app's design architecture docs (you have those
right?) to list the IAM permissions needed to do each thing, and stuff that
info into ADRs, and link to some IaC that you used to stand up your dev/test
environment. All of this creates 1) formal IaC used to apply the permissions,
2) formal documentation of what functions need what permissions. As a final
step during development, write tests that verify the IAM permissions are as
expected.

You will of course need some way to scale this later, but as long as an
artifact of your build pipeline is auto-generated IAM policy jsons, an org-
wide security team can analyze them with automated tools and remediate as
needed.
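
For the "write tests that verify the IAM permissions" step, a minimal sketch in Python could look like this. The evaluator is a deliberately simplified stand-in for real IAM evaluation (no conditions, principals, `NotAction` or permission boundaries), and the policy, bucket name and function names are all made up.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    return value if isinstance(value, list) else [value]


def _matches(stmt, action, resource):
    # Action names are case-insensitive; resource ARNs are compared as-is.
    action_ok = any(fnmatchcase(action.lower(), a.lower()) for a in _as_list(stmt["Action"]))
    resource_ok = any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
    return action_ok and resource_ok


def is_allowed(policy, action, resource):
    """Explicit Deny wins; otherwise at least one matching Allow is needed."""
    matching = [s for s in _as_list(policy["Statement"]) if _matches(s, action, resource)]
    if any(s["Effect"] == "Deny" for s in matching):
        return False
    return any(s["Effect"] == "Allow" for s in matching)


# Hypothetical policy generated by the build pipeline.
APP_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
    ],
}


def test_app_policy_matches_design_doc():
    bucket_key = "arn:aws:s3:::example-bucket/config.json"
    assert is_allowed(APP_POLICY, "s3:GetObject", bucket_key)
    assert not is_allowed(APP_POLICY, "s3:PutObject", bucket_key)
    assert not is_allowed(APP_POLICY, "s3:GetObject", "arn:aws:s3:::other-bucket/x")
```

The negative assertions matter as much as the positive ones: they are what fails the build when somebody widens the policy.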

------
scoot_718
IAM is fine, AWS IAM is needlessly complicated with garbage policies and lack
of explanation.

------
Trisell
IAM is a mental model close to a language. And until you understand that
mental model it might as well be Japanese to an English speaker. But once you
are able to understand that model crafting extremely fine grained policies
becomes a breeze.

------
gdm85
This thread reads like an ad for PolicySentry

------
ianhutch14
In practice, I've seen access controls typically being too loose, creating
risk in areas such as admin access or with wild cards as in the Capital One
breach. The alternative is often too tight in which case developers struggle
with ops to manage the things they need access to. Just in time access
controls (and change logs) and isolating resources through the concept of a
project or tenant is one approach, and, if you will pardon the plug, the
approach we’ve chosen with our no code, infrastructure as code platform
www.duplocloud.com.

------
netsectoday
> And honestly, the problem has been so difficult to solve, that I think every
> AWS customer leveraging Instance Profiles or machine roles is vulnerable to
> this somewhere. If one app gets compromised and you haven't limited blast
> radius, you're screwed.

What a shit product. I loved it back when we could discuss IT infrastructure
without using AWS product names.

They have infiltrated IT and the prices will continue to rise. If you took the
bait; you're screwed.

They are ripping through your data (hosted on THEIR machines) to compete
against you.

Here are my thoughts. IT'S A TRAP! STAY AWAY!

------
Trisell
I think this is why AWS has gone to recommending the multi-account model with
a service per account. That model greatly limits the blast radius of
misconfigured IAM in an account, so that if you lose a service, you lose that
service's data but almost completely block any cross-application compromise.
That being said multi-account can be just as difficult as IAM if you don’t
properly architect for it.

------
cperciva
IAM is hard; but deciding that a web proxy shouldn't have access to IAM
credentials should be easy. This is why I wrote imds-filterd.

~~~
8organicbits
[https://github.com/cperciva/imds-filterd](https://github.com/cperciva/imds-
filterd)

That's clever. The format of the config file looks pretty intuitive as well.

------
ENOTTY
AWS Zelkova is in theory supposed to find these sorts of issues. I haven't
used it, so I'm curious what others think about it.

~~~
dopylitty
I went to a talk about it at Re:Invent and it does seem to solve the issue in
theory but the service based on it (Access Analyzer) seems to only apply to
very limited use cases.

~~~
unixhero
Is Re:Invent worth the trip? For me the cost would be formidable.

------
jariel
Aside from the inherent complexity of complex systems, there's another layer
we don't talk about and that is arbitrary complexity, particularly in
communications, standards and documentation.

A lot of IAM does not need to be that hard but concepts need to be
poignantly clear. It's harder than it needs to be.

~~~
coreyoconnor
I wonder about the complexity and AWS motivations.

What does AWS gain by improving IAM? There are barely any competitors, so they
won't be losing people for that. They offer their own AWS professional
services happy to charge you for making it "understandable". Their service
agreements largely absolve them of client mistakes. Which usually result in
larger bills from AWS.

~~~
jariel
That's pretty cynical.

AWS is a ball of complexity because it grew organically that way, and they
don't have a culture of explaining, or, keeping things simple.

Both of those things would require strong strategic guidance, and a real
effort to do.

Unless Bezos edicts: "Our APIs must remain simple even as they scale, and we
must document in a manner that keeps the 80% common path easy to use, while
the remaining 20% arcane functionality available ..." then it would happen.

But it won't.

It's reasonably well curated arbitrary complexity, it is what it is.

This is not an issue anyone handles well.

~~~
coreyoconnor
No cynicism meant. My mistake. "motivations" was incorrect. I was trying to
ask about how the business of AWS manifests such a thing. Which I think you've
described. Thanks!

------
gumby
To read as text:
[https://threadreaderapp.com/thread/1291801858676228098.html](https://threadreaderapp.com/thread/1291801858676228098.html)

------
cordite
Are there any good papers out there for anyone crafting IAM systems?

More on theory and overall goals than "here's how you use SELinux"

------
k__
FaaS helps here.

It's easier to reason about the access one function does than all the infra a
monolith needs to access.

~~~
coredog64
You’d think that, but Friday I just helped a dev team deploy a Lambda function
that had full capability to update any Lambda in the account. The devs didn’t
know anything about how the app worked, since it was a drive-by from the
architect. He, in turn, just grabbed an AWS blog post, confirmed that it
worked in his personal account, and called it a day.

Also, since the devs don’t know IAM, every resource request is a wild card. A
function compromise would allow deployment of a new function that could
extract every SSM secret, Cognito identity, and S3 object.

------
neonate
IAM stands for Identity and Access Management:

[https://en.wikipedia.org/wiki/Identity_management](https://en.wikipedia.org/wiki/Identity_management)

------
pi-victor
but still, there are services (e.g. dom9) that implement alert and monitoring
for cloud infrastructure security.

------
lachlan-sneff
What got breached?

------
rho4
Too much swearing in the twitter thread. Puts me off.

------
tedk-42
Another reason why microservices are a good thing. They result in micro level
permissions for individual resources.

In saying that though, k8s in AWS was really shit at limiting what IAM roles
containers could assume without it being the instance role. Crap like kube2iam
and kiam came out to butcher the AWS metadata/instance networking. Thankfully
AWS solved it with their new OIDC IDP.

True DevOps culture workplaces have Devs writing their IAM roles as they know
what services the app might talk to.

Keen to try out some tools mentioned by the twitter person

~~~
meritt
You can just as easily give overly permissive IAM credentials to a
microservice. That's what happened here. Some tiny little web service was
breached, and that service was provisioned a full-access IAM credential.

~~~
tedk-42
If you're deploying a microservice on an individual EC2 instance with an
instance role, you're doing it wrong. Not sure where you got the idea it was a
microservice. Sounds like a dodgy app server of some kind.

~~~
meritt
The point I am trying to make is "microservice" does nothing whatsoever to
solve the problem of people incorrectly provisioning IAM permissions.

~~~
tedk-42
Nor does it make developers better at their jobs.

The point I'm trying to make is if the scope of what a system can do is
limited, its permission boundary/model is easy to define.

Many things led to the incorrect provisioning of the IAM role. Lack of
understanding of IAM for starters, as well as of the consequences around it.

By no means am I saying microservices would solve the problem. But it sure
does make it easier to define what permissions your app needs to have as well
as limit the blast radius of what is exposed when done correctly.

This is impossible with monoliths on EC2 instances.

