My general experience with crafting IAM policies is very reminiscent of SELinux, in that it's very difficult to work agnostically while adhering to a principle of least privilege. Especially given that this kind of task is often done by admin/ops people, one typically can't know in advance everything that the app might need to be able to access in order to work correctly. The process of discovering this is: try running it -- it fails, you note the permissions error -- look at docs, talk to devs, try to give it the most granular permissions it needs to get around that error -- rinse & repeat, probably many many times. This is onerous and you wind up going all around the mulberry bush trying to understand and satisfy every dependency.
At least, if you're very good and don't mind being perceived as a roadblock, you try to understand things. If you're more typical, you just find the most direct route from logged error to added permission (audit2allow approach). And if you're bad, which is also not uncommon, you just give it the broadest permissions possible and call it a day.
With respect to IAM in particular, I'm finding in the Lambda world that some seemingly straightforward functions wind up needing some sort of access to all kinds of other AWS services; these services each have their own funky permissions structures and attendant quirks. Each one is a temptation to the IAM admin to just throw their hands in the air and put a wildcard on it.
I've seen this a ton. I have been an on/off security professional so academically I am committed to the principle of least privilege, but holy hell it can be painful or impossible in real life.
Where possible I've started adopting the "run it and see" or audit2allow approach (there are awesome tools that can do this for AWS IAM perms too), but then before applying the policy, somebody needs to put a quick line beside each permission that explains why. If the answer is "I don't know" and the permission is simple/low-risk then maybe let it go. If it's a high-risk permission then don't do it without answers.
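A rough sketch of that "one line of justification per permission" rule, in case it helps. The high-risk prefixes and the `Grant` and `review` names are my own assumptions for illustration, not part of any AWS tooling; tune the list to your own threat model.

```python
from dataclasses import dataclass

# Hypothetical list of action prefixes treated as high-risk; adjust as needed.
HIGH_RISK_PREFIXES = ("iam:", "sts:", "kms:", "organizations:")


@dataclass
class Grant:
    action: str       # e.g. "s3:GetObject"
    reason: str = ""  # one-line justification; empty means "I don't know"


def is_high_risk(action: str) -> bool:
    # Assumption: wildcards and anything touching identity or keys are high-risk.
    return "*" in action or action.lower().startswith(HIGH_RISK_PREFIXES)


def review(grants):
    """Split grants into (approved, waived, blocked) lists of action names."""
    approved, waived, blocked = [], [], []
    for g in grants:
        if g.reason.strip():
            approved.append(g.action)   # someone explained why it is needed
        elif is_high_risk(g.action):
            blocked.append(g.action)    # high-risk and unexplained: no
        else:
            waived.append(g.action)     # low-risk and unexplained: let it go
    return approved, waived, blocked
```

Anything that lands in `blocked` goes back to whoever requested it until there is an answer.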
That formula is the only pragmatic one I've seen. Not perfect but sometimes perfect is the enemy of good.
> Not perfect but sometimes perfect is the enemy of good.
This. Add enough hurdles, and people will
a) spend all the energy they're willing to spend on your process, hate it, and as a result never do anything except what you force them to, and in particular, never voluntarily reduce permissions unless forced (because they might need them later and then it'll be a pain). They'll also see you as an enemy, not a partner, which is not the place a security team should be in.
b) optimize for not what is best, but what is least affected by your wall of process, even if it's less secure (e.g. because it's a legacy system that you didn't get around to locking down yet).
c) outsource to less secure vendors, and get it approved because management knows that getting it deployed internally would take forever due to the process
d) in the most extreme case, set up uncontrolled shadow IT with zero security controls and hide it from you - and because they already spent all their energy dealing with "perfect", they don't want to hear the word "security" ever again, and the security posture of their shadow IT shows this.
Thank you for acknowledging this. I worked at a place like that and it put me off security to the point where I didn't want to ever deny any person or app or service permission to anything and was just happy with securing the edge nodes of our network. I got over it, and now that I'm in a position where my neck is on the line if there is a security breach, I feel differently. But I can still relate to myself back then and understand the need to be pragmatic and massage people into following the process rather than forcing them.
Years ago I wrote a program to let various services run their course, query CloudTrail for successful calls made to different AWS services, and attempt to find a minimal set of IAM permissions (not applicable for S3 at the time). The idea was to run an exhaustive test suite with expected allowed actions only and deny anything else. I believe AWS has a similar tool now for IAM, but it’s not a problem that’ll be resolved satisfactorily for everyone given the combinatorics of possible IAM configurations. Lateral movement in IAM roles and credentials is tough, and even today not every action that IMO should be flagged is reported (IAM role assumption failures across accounts were silent when I checked early last year).
Someone has actually been working on a project like this. While not 100% complete it's the best working one I know of. Can definitely relate to the problem being described here, especially when writing IAM policies for terraform deployments.
Probably can’t be open sourced given IP under contracts but I could try to re-write it. There’s some new services in IAM that could be leveraged to make it more accurate and cheaper to use, too.
Thank you for putting this idea in my head! I’ve been trying to get better at expressing infrastructure as code, and one of the big blockers has been how adding new services to e.g. Terraform is tough when you don’t know all the permissions they need (see also https://github.com/hashicorp/terraform/issues/2834 for example).
Using a test AWS environment to stage and then checking CloudTrail to see what was actually called would be a step forward. Having software to extract it would be even better.
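For what it's worth, the extraction step could start out as small as the sketch below. It assumes you have already exported CloudTrail records as dicts with `eventSource` and `eventName` fields, and that failed calls carry an `errorCode`. The function name is made up. The big caveat: the service prefix taken from `eventSource` and the `eventName` do not always map one-to-one onto IAM actions, so treat the output as a candidate policy to review, not a finished one.

```python
def policy_from_events(events):
    """Build a candidate IAM policy from CloudTrail-style event records."""
    actions = set()
    for event in events:
        if event.get("errorCode"):
            continue  # skip calls that failed or were denied
        # "s3.amazonaws.com" -> "s3"; an approximation of the IAM service prefix
        service = event["eventSource"].split(".")[0]
        actions.add(f"{service}:{event['eventName']}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                # Left wide open on purpose: narrowing resources is a separate pass.
                "Resource": "*",
            }
        ],
    }
```

Run the test suite in the staging account, feed the resulting events through this, then tighten `Resource` by hand.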
The "I don't know" case is the most painful to ask for since it feels like I am always second guessed.
If I have the permission, I can tell its value in a few minutes, but instead I have to spend hours doing due diligence to try to justify my answer and still am not confident asking.
I usually end up getting what I need in the end so the admins don't see it as a big issue, but it is a subtle thing that can kill productivity for anything related to AWS for everyone else.
In our company we went the other road. We have the developers write the policies (since I mean, they know what their app needs) and test it in dev environments. After that, ops guys step in during code review to check for too broad allow in the policies. So far it seems to work in acceptable manner.
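The "check for too broad allow" part of that review is easy to pre-screen mechanically. A minimal sketch, assuming the policy is already parsed from JSON into a dict; `broad_allows` is a hypothetical helper name, and it deliberately ignores `NotAction` and partial wildcards like `s3:Get*`.

```python
def _as_list(value):
    # IAM accepts either a single value or a list for Statement, Action and Resource.
    return value if isinstance(value, list) else [value]


def broad_allows(policy):
    """Return (statement_index, problem) pairs for overly broad Allow statements."""
    findings = []
    for i, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue  # Deny statements with wildcards are fine
        for action in _as_list(stmt.get("Action", [])):
            if action == "*" or action.endswith(":*"):
                findings.append((i, f"wildcard action {action}"))
        if "*" in _as_list(stmt.get("Resource", [])):
            findings.append((i, "resource is *"))
    return findings
```

A non-empty result doesn't mean the policy is wrong, just that a human should look at those statements first.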
This sounds like a good process, but it depends a lot on the relationship between dev and ops... I've seen too many dev shops push against changes requested by ops or security because their main pressure is to ship features fast. And then it turns into a management fight and whoever has the more influential management gets the final say while the other side is forced to grumble.
Nice, that's just like having bots in Discord: administrators hide behind bots with their decisions so users cannot retaliate. It also combines nicely with shadow bans, where the user thinks their posts are going through but no one replies.
Unfortunately for development it is better to get tight integration of dev and ops so you could solve it by discussion and cooperation. Not sure if you can build such teams that often but that would be great.
Well, developers can fight all they want to. I was the dev lead at a company that had to be HIPAA compliant. My neck was on the line if we were found to be out of compliance along with the security director and the operations people.
Now as a consultant working with many large customers, “shipping fast” is nowhere near as important as security.
I'm in the same position and we haven't built our infrastructure yet.
Did you use a managed service?
Or did you build it slowly and carefully on AWS?
I'm currently stuck between the two options. Managed service seems the easiest way to be HIPAA compliant but I'd rather we managed our own infrastructure on AWS since it gives us more flexibility for stuff like blue green deploys and it would be cheaper.
Back then, I was building a green field project on prem. It was more about limiting access and auditing. In the middle of the implementation a mandate came from on high to “move to the cloud”.
I didn’t know anything about AWS back then, they hired an MSP who was just a bunch of old school netops people who knew how to click around on the console and gave us a bunch of VMs.
Long story short, I studied for one AWS certification so I could talk the talk. I learned about all of the things that I could have taken advantage of and saw how much they were making. That changed my whole m.o., and I decided to get some experience with AWS and go into consulting.
Next company I went to, the founders outsourced everything technically to an outsourcing company - software and infrastructure and they treated AWS as an overpriced colo. Everything was in one account and everyone had access to it. At first, they were just aggregating publicly available information about doctors for hospitals so it wasn’t a big deal.
They brought a new CTO in and started bringing development in house. I led the charge to first separate out the environment to different accounts, establish a sane CI/CD process and then lock down who had access to prod.
Of course they had secret access keys in config files everywhere. We had to audit the code to make sure that no code was using keys. Locally, every SDK can automatically retrieve the keys from your global config file (that’s nowhere near your git repo) and on AWS it gets permissions based on the attached roles.
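A first pass at that kind of audit can be a plain text scan. The sketch below assumes access key IDs are 20 characters starting with `AKIA` (long-term) or `ASIA` (temporary); the function name is made up, and it only finds key IDs, not the secret keys themselves.

```python
import re

# Assumption: access key IDs look like AKIA/ASIA followed by 16 uppercase
# letters or digits. This will not catch the secret half of the key pair.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_access_key_ids(text):
    """Return (line_number, key_id) for every access-key-looking token in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Run it over every config file and source file in the repo; any hit means that code path is not yet relying on the role or the local credentials file.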
Then of course we had to lock down roles. But we couldn’t have the granular permissions we needed because, even though they had lots of microservices (we sold access to our APIs to businesses), they were all on two “pet” EC2 instances.
Next step was to move the .Net Core APIs to Docker/Fargate and further restrict the attached roles to those.
Finally, we had to audit all of our AWS dependencies and add encryption where necessary and then sign a BAA with AWS and bring in auditors.
By the time I left a month ago, we could pass the needed certifications and expand our offerings.
It took a lot of upskilling, hiring an internal ops person (I’m a developer who knows AWS) instead of depending heavily on the MSP.
I left for greener pastures - I’m a consultant with AWS.
Currently experiencing this at my current employer. I have a suspicion that slowing the dev and deployment process is in everyone's perceived best interest, until of course the day where we are out competed.
And rightly or wrongly you will be outcompeted by places with more risk appetite, until they have a security breach before they are big enough to swallow the cost.
In my company, developers write policies, but they have to be approved by someone with ops expertise (usually me). And the policies are often either too broad, or not sufficiently tested and missing permissions the app needs. Sometimes the same policy has both problems. I don't blame the developers. You can hardly expect every developer to become an expert on AWS's IAM system. Especially given how inconsistent it can be.
You set up a policy in a test environment and run the code there. Of course, generally, the policy can't be identical across environments, so you can run into errors.
That describes my own personal hell with iam, and it drove me to adopt aws-cdk.
aws-cdk is their infra-as-code product and it's extremely valuable for iam alone.
You can say things like: s3_bucket.grantRead(my_lambda)
And it figures out the least privileges necessary to make all of that work.
Plus it's real code, not some annoying DSL, which means you can easily abstract other iam permissions out. I have a fairly tight lambda policy that I reuse in all sorts of places, and it's as easy to use as the above snippet.
As one of those "admin/ops" people with experience of fixing terribly set up systems, a common issue is that users just can't tell the difference between systems that work insecurely and systems that work securely, but they will immediately notice if a system does not work because the security policy is too strict.
You get the best security when everyone is involved in the security design from the "ground up", but quite often there's not enough communication between the people developing some application, whose work may be valued by the number of features they ship and "velocity", and the operations side of things whose work is to provide the infrastructure for running the software, and to keep it running. At worst you just get some 3rd-party consultant to set up a thing somehow and then afterwards have to reverse-engineer it to figure out what the hell they did and how to prevent it from going up in flames immediately.
I always find the lack of communication goes the other direction. Security and compliance teams just enforce an IAM policy without talking to application or product teams. It gets rolled out and lots of things break, even things that have legitimate needs to work the way they do and have considered security best practices heavily already, and then security and compliance just throws their hands up and says too bad, refactor it from first principles regardless of the level of effort, staffing requirements, competing priorities, etc.
That’s weird, because I see it constantly, even for minor systems where the relationship to a compliance requirement is minor / optional. I’ve actually never seen it happen when there is a real security or auditor issue at stake - I’ve only (repeatedly) seen compliance & security teams demand enforcement of a policy that breaks production in circumstances where the whole thing could have been easily prevented if they had gone to product teams and had a conversation first, but they didn’t.
The most recent one I lived through a few months ago was when compliance just all of a sudden decided to wholesale enforce a bunch of org-wide settings changes to every GitHub repo in the company, and it caused several outages and a huge amount of unplanned triage work as the settings were very sensitive for a bunch of continuous integration systems and jobs.
This was at a Fortune 500 company with a big, well-staffed compliance team. They had to roll back their changes and delay the new settings by several months because only through breaking production did they realize their proposed settings workflow was not feasible given in-house system requirements.
And of course, no apologies at all.
This is pretty run of the mill. I’ve seen the same thing from compliance and security teams in a few other large, “household name” tech companies, and also in a few mid-range startups.
Compliance teams' number one MO is to blame product teams for not partnering with them, but it’s the compliance teams who refuse to do the partnering.
> I’ve only (repeatedly) seen compliance & security teams demand enforcement of a policy that breaks production in circumstances where the whole thing could have been easily prevented if they had gone to product teams and had a conversation first, but they didn’t.
This is exactly backwards. Product devs need to reach out to security early in the design phase. There’s no way for a separate security org to understand the app or use case after the fact.
If you want to do $newthing your product dev management needs to involve security, finance, compliance, legal, etc. That’s their job. Developers don’t get to ignore all the normal business constraints the real world offers.
Building within constraints is what engineering is all about.
You are so right on the SELinux comparison. Of course, in this case, there are way more developers that are required to write them.
Reiterating what was mentioned in the thread - the best way to avoid this wildcard situation and make it easier for developers is to use Policy Sentry[0]
Thought I’d mention this for those who read the title and the comments instead of clicking on the tools. This will solve most of your problems with writing IAM policies for machine roles.
If there happens to be an OPA<->IAM adapter for your AWS resources, OPA allows end to end testing of your policies.
As far as adapters go, I know you can get SQL, Kubernetes, Terraform, Kafka, Envoy, S3 (via MinIO), and EC2/ECS/Lambda (Linux). That would cover most use cases, I think.
For AWS integration with my commercial tool, I am considering having it inspect its own permissions and loudly tell you it's misconfigured if you give it permissions to do anything more than what it minimally needs. I wish more tools did this.
There's an incredibly broad set of permissions (at the cloud or OS level). Any app / tool may be written to use any subset of those. And what it uses is rarely documented (because developers don't see IAM security as a primary feature, outside of apps intended for use in regulated environments).
Without automation, this thus requires continual reverse engineering, which is never a healthy, sane long-term solution.
This should be fixed on the product / app side, where folks are much better placed to dump "I need this, and only this" in machine-readable form.
1) put ec2 servers I can't properly IAM lock down in dedicated accounts separated from all other things
2) don't let users create the things.. let users take actions that result in what they want by a programmed set of commands that is peer reviewed (e.g. spin up an EC2 instance with specified config/user data by an MFA'ed credential.)
3) user accounts are created manually (we have <10 employees with AWS access), but their user accounts can only do two things: a) massage their own account basics (rotate password and keys), and b) assume role. User account has MFA at setup and can't be removed (only changed). The roles they assume are change controlled and checked regularly.
A LOT of advice, like every damn security talk I've ever been to, says to watch CloudTrail and catch bad actions by writing reactive scripts. We enforce before the action is taken, essentially. This works far better for EC2, which, in my opinion, is HORRENDOUS for least privilege.
Serverless items, while sometimes requiring more permissions than I'd like or expect, are far _less_ likely to throw a requirement for "Create" on resource '*'.
Lastly, I give developers a playground to learn in that is entirely disconnected from user data. It has some mocked false user data and structures, and its own isolated domain so the infrastructure can be explored.
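To make point 3 concrete, a user policy along those lines might look roughly like the sketch below. The role ARN and the `user_policy` name are placeholders, and requiring MFA at assume-role time via `aws:MultiFactorAuthPresent` is my assumption about how the MFA rule would be enforced; the sketch does not cover the "MFA can't be removed" part.

```python
import json

# Hypothetical role ARN; substitute the change-controlled roles users may assume.
ASSUMABLE_ROLES = ["arn:aws:iam::123456789012:role/example-deployer"]


def user_policy(role_arns):
    """Policy for a human user: manage own credentials, assume listed roles."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ManageOwnCredentials",
                "Effect": "Allow",
                "Action": [
                    "iam:ChangePassword",
                    "iam:CreateAccessKey",
                    "iam:DeleteAccessKey",
                    "iam:ListAccessKeys",
                    "iam:UpdateAccessKey",
                ],
                # ${aws:username} is an IAM policy variable, resolved by AWS
                # to the calling user's name, not a Python placeholder.
                "Resource": "arn:aws:iam::*:user/${aws:username}",
            },
            {
                "Sid": "AssumeRolesWithMfa",
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": list(role_arns),
                # Assumption: only allow role assumption from MFA-authenticated sessions.
                "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(user_policy(ASSUMABLE_ROLES), indent=2))
```

Everything else the user can do then lives in the roles, which is where the change control and regular checks apply.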
> And if you're bad, which is also not uncommon, you just give it the broadest permissions possible and call it a day.
Whether this is "bad" depends on what you're targeting... it's bad for security, but good for getting things done. And from an economic standpoint, right now, unfortunately the "good" approach is often the "bad" one.
The $80M fine may be less than the cost of doing it right. And until that changes, "good" managers will incentivize the bad approach.
Yeah, the SELinux approach reminds me of IAM - both hard. They need to build in a mode where you run the code path with full permissions and it proposes a minimal policy based on the accesses seen.
The reality - everyone finds it MUCH quicker to give broad admin rights out otherwise.
One good thing - accounts - you can create an account, give admin to the consultant / outsourced IT group, still bill to org, they can do what they need without endless hassle of IAM. Anyone else using this - it's a pretty rough hammer but seems to work ok so people can get solutions spun up with some efficiency.