I'm starting to wonder whether there even is an expert who could set up and maintain a bulletproof AWS account. From dev/admin accounts to API Gateway to Lambda to RDS and S3, there's just too much to be an expert on. And it's all handled differently (not to mention how many times it's changed in my mere 4 years of experience).
Instances get role data from the metadata service, but containers can't access that metadata and should talk to the local ECS agent instead (which has its own API). Lambdas must assume a role with a dedicated policy just to write logs, but setting up a scheduled Lambda adds an entirely different permission object (with its own policy) just to allow the CloudWatch trigger mechanism to invoke the function at all. DB access can be authenticated using roles, but you have to manually set up the users (and their DB permissions), and it doesn't work with every DB engine. S3 buckets can get policies from both built-in bucket policies and from users/roles, with inline or managed policies available for each. API Gateway's custom authentication requires a Lambda function, but only passes through a single token and expects an IAM policy in response...
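To make the Lambda case concrete, here's a sketch (with made-up names and ARNs) of the two separate permission documents involved: the identity policy the execution role needs just to write its own logs, and the resource-based permission that allows the CloudWatch Events schedule to invoke the function in the first place.

```python
# Identity policy attached to the Lambda's execution role: without these
# three CloudWatch Logs actions, the function can't even write its own logs.
logs_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents",
        ],
        "Resource": "arn:aws:logs:*:*:*",
    }],
}

# Separately, a scheduled Lambda needs a *resource-based* permission on the
# function itself so the CloudWatch Events rule is allowed to invoke it.
# Function name, account ID, and rule ARN below are hypothetical.
invoke_permission = {
    "FunctionName": "my-function",
    "StatementId": "allow-cloudwatch-events",
    "Action": "lambda:InvokeFunction",
    "Principal": "events.amazonaws.com",
    "SourceArn": "arn:aws:events:us-east-1:123456789012:rule/my-schedule",
}
```

Two objects, two attachment points, two lifecycles - which is exactly the kind of "just different enough" friction being described.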
Even AWS's own tutorials and built-in managed policies seem to throw their hands up in despair, handing out wildcard permissions (like s3:*) left and right just to make things work.
No wonder people focus on their network and basic root account security - as frustrating and challenging as network security is, it is still a much more tractable problem.
Just a quick aside, but is this can't or shouldn't? I'm 100% positive you can use something like instance profile credentials from within a container (they're loaded from the instance metadata service).
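For what it's worth, here's roughly what that looks like from inside a container - a sketch, not production code. The endpoint and path follow the documented IMDSv1 credentials flow; the parsing is demonstrated offline against a canned response rather than a live call:

```python
import json
from urllib.request import urlopen  # only used when actually on EC2

# The instance metadata endpoint; reachable from a container unless the
# host explicitly blocks it (e.g. with an iptables rule).
IMDS_BASE = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"

def parse_credentials(raw: str) -> dict:
    """Pull out the key material the SDKs look for in an IMDS response."""
    doc = json.loads(raw)
    return {
        "access_key": doc["AccessKeyId"],
        "secret_key": doc["SecretAccessKey"],
        "token": doc["Token"],
        "expires": doc["Expiration"],
    }

def fetch_credentials(role_name: str) -> dict:
    # On a real instance: GET IMDS_BASE to list role names, then
    # GET IMDS_BASE + role_name for the credential document.
    with urlopen(IMDS_BASE + role_name, timeout=2) as resp:
        return parse_credentials(resp.read().decode())

# Offline demonstration with a canned IMDS-style response:
sample = json.dumps({
    "Code": "Success", "Type": "AWS-HMAC",
    "AccessKeyId": "ASIAEXAMPLEKEY", "SecretAccessKey": "examplesecret",
    "Token": "exampletoken", "Expiration": "2023-01-01T00:00:00Z",
})
creds = parse_credentials(sample)
```

Whether you *should* rely on this from a container is the real question - it hands every container on the host the instance's role.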
I think I agree that there's definitely a lot of depth to topics that should be covered here, and whether you want to go down the rabbit hole will vary based on org size and features you're using.
I'd personally prefer: 1. Deep dives into best practices for each feature, as opposed to a surface-level glance.
2. Enable that with examples. Include CloudFormation or Terraform scripts to set up each piece so that we actually build something. Documentation is important, but you can't learn without doing.
3. Test against the security you've put in place.
That said, this is another of those "more ink should be spilled" moments, since preventing access to the instance metadata is something that you SHOULD do from a security point of view.
I don't even primarily use AWS
It lets you do anything: create, remove, read, and update both objects and buckets. You can also change permissions on the buckets via bucket policies.
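To illustrate the blast radius, compare a wildcard grant with a scoped-down alternative (the bucket name is hypothetical):

```python
# "s3:*" grants every S3 action on both buckets and objects -- including
# s3:PutBucketPolicy, i.e. the power to rewrite the bucket's own policy.
wildcard = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}

# A scoped alternative: object read/write on one bucket only, with no
# bucket-level or permission-changing actions included.
scoped = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}
```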
> Why on earth would i expect the same security process
Because they do kind of use the same security process - IAM permissions. And not just DBs and S3 - all of AWS uses IAM in some way or another. But it's all just different enough to make it damned hard to create any kind of standard.
Some permissions can be fine-tuned, some cannot. Some can rely on conditionals, some cannot. Some can be overridden at the object level, some cannot. Some use roles, some users, some policies.
Understanding the differences - what works when - is what makes it all so hard. And when it changes, all that knowledge is useless (or worse, dangerously incomplete) again.
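As a small illustration of the "just different enough" problem, here's the same S3 read access expressed as an identity policy versus a resource (bucket) policy - one must not have a Principal, the other must name one, and conditions bolt on to either. The account ID, role, and bucket are made up:

```python
# Identity policy: attached to a user/role, so no Principal element.
identity_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}

# Bucket (resource) policy: attached to the bucket, so it must name a
# Principal -- here with a Condition restricting the source IP range.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::123456789012:role/reader"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
        "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}},
    }],
}
```

Same grammar, same service, structurally incompatible documents - and that's before inline vs. managed enters the picture.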
So if you don't want that, don't write it? I don't see how AWS forces anyone into that infrastructure decision in any specific case. And it's not the default - you have to go out of your way to do it.
> Because they do kind of use the same security process - IAM permissions. And not just DBs and S3 - all of AWS uses IAM in some way or another. But it's all just different enough to make it damned hard to create any kind of standard.
So? The standard is running your own services and using any of the standard networking options. They're handling provisioning and networking, so naturally they need an abstraction for securing the network clients they connect on your behalf.
> Some permissions can be fine tuned, some can not. Some can rely on conditionals, some can not. Some can be overridden at the object level, some can not. Some use roles, some users, some policies.
So? Permissions, conditionals, and asset policies aren't new or complicated, and are well within what you'd deal with in any tech stack in existence. Same goes for roles, policies, and users... Do those abstractions really seem that complicated? And in which stack do they not exist?
> Understanding the differences - what works when - is what makes it all so hard to understand. And when it changes, all that knowledge is useless (or worse, dangerously incomplete) again.
That's fine, but it's arguably no worse than any other service provider. You could say those things about any software service out there - especially one intended for consumption by engineers. Documenting APIs isn't a solved problem for anyone.
I find the AWS API incredibly baroque, with a lot of historical baggage. I suspect a lot of this complexity is the result of features accumulated by multiple people on multiple teams over the years, plus the inertia of customers relying on it, so there is (understandably) no will to change it.
How do we fix that though? Standards seem like the only solution but they either don't move fast enough or the early birds (in this case amazon, but another prime example is microsoft) become so entrenched they set the standard themselves.
My own answer up until now has been to work in Linux and open-standards jobs (now Kubernetes), but that requires increasing amounts of effort.
Let new infra come up in the new region, with auth gateways to allow the new to talk to the old and vice versa...
maybe you put up an S3 mirror from the new bucket type to the old bucket type, for read-only access from within the old region to data created in the new...
old users can make functional requests of the new api - but cannot manipulate anything directly...
Or some such model - but roll out wholly new regions and sunset the old over time. (A new region can be us-east-3 next to us-east-2 and can sit in the same physical location, allowing for in-house data transit on AWS's part, etc.)
What I'd really love to see is an end-to-end example of a non-trivial, production-ready project, with all its nitty-gritty details. I'd expect that having a sensible baseline to look to for general guidance would help improve security and reduce risk.
It has recently started becoming popular quite organically, so I might just write a blog post on it soon.
We consolidated users into a bastion account, ran into annoyances with CFN, and have been using iamy ever since for change management across all our accounts (more of a writeup at https://99designs.com.au/tech-blog/blog/2015/10/26/aws-vault...)
I've found that, depending on how strict your change management policies are, your IAM setup can collect cruft over time as people push new policies in ad hoc. So iamy is handy in that situation:
- iamy can sync in both directions - pull and push IAM config. So you can easily pull down the ad-hoc changes
- In order to use CFN you need to have access, so there is a chicken-egg scenario if you want to manage ALL users in config
- iamy gives you a nice execution plan of aws cli commands, CFN can be opaque
And iamy ignores any resource managed by CFN, so it works well as a complementary tool.
It has always felt like AWS has been too playful with the naming of services, to the point of obfuscation. Sure, you and I know what EC2 and S3 are, and what an instance and a bucket are. But for new companies adopting AWS, I swear a third of my time goes to translating AWS service names into industry terminology for them, and I often hear statements like:
- "Why don't they just call it a virtual machine or cloud storage?"
- "What the heck is an EBS or a Cognito?"
- etc, etc
Also, the first run of the AWS console can be overwhelming compared to Digital Ocean's (I know the two aren't really comparable in breadth of services offered, but look how obvious DO's call to action is).
Just immediate thoughts that pop into my head.
 AWS web console: http://imgur.com/GwAeBrC
 Digital Ocean console: http://imgur.com/a/cO3Kn
+ instances not really being deleted when 'terminated', so there's unnecessary clutter on your dashboard.
Ticket open since 2008: https://forums.aws.amazon.com/thread.jspa?threadID=26111&sta...
They also still count against your usage limit, so your workflow gets interrupted and you have to wait until they're actually deleted - but you can't really be sure when that happens.
So have a coffee, check back, nope not gone.
Get another one, nope not gone.
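Until they disappear on their own, you can at least hide them yourself. A sketch of filtering a describe-instances-shaped response (canned sample data here, not a live API call) down to everything not yet terminated:

```python
# Canned response in the shape EC2's DescribeInstances returns:
# reservations, each containing instances with a State.Name.
sample_response = {
    "Reservations": [
        {"Instances": [
            {"InstanceId": "i-0aaa", "State": {"Name": "running"}},
            {"InstanceId": "i-0bbb", "State": {"Name": "terminated"}},
        ]},
        {"Instances": [
            {"InstanceId": "i-0ccc", "State": {"Name": "stopped"}},
        ]},
    ]
}

def live_instances(response: dict) -> list:
    """Return IDs of all instances that aren't in the terminated state."""
    return [
        inst["InstanceId"]
        for res in response["Reservations"]
        for inst in res["Instances"]
        if inst["State"]["Name"] != "terminated"
    ]
```

With the real API you can push the filter server-side instead, e.g. `aws ec2 describe-instances --filters Name=instance-state-name,Values=running,stopped,pending` - though that doesn't help with the usage-limit accounting.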
> The ID of the Internet gateway.
This isn't the only example of entirely useless documentation.
Documentation for VPC users and VPC-less users is munged together. VPC users don't care about VPC-less arguments and documentation, and never will (since new customers must use VPCs). They should be completely separate documentation sets.
AMIs are critical, but you have to dig into examples to find an up-to-date list of IDs; they're spread across multiple examples in multiple parts of the docs, with no directions for finding them.
Default values for properties and configurations don't appear to be documented anywhere. There are warnings about deleting the default VPC but no mention of how to remake it. Is the default VPC magic?
Object IDs must be [A-Za-z0-9]. Why? Neither JSON nor YAML has syntactic issues requiring this.
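A tiny check for that constraint, if you want to catch it before CloudFormation does (the alphanumeric-only rule for logical resource IDs is documented; the helper itself is just an illustration):

```python
import re

# CloudFormation logical resource IDs must be alphanumeric only -- no
# hyphens or underscores, even though JSON/YAML keys would happily allow them.
def valid_logical_id(name: str) -> bool:
    return re.fullmatch(r"[A-Za-z0-9]+", name) is not None
```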
The documentation talks about Redis (cluster mode disabled) and Redis (cluster mode enabled). Redis (cluster mode disabled) is still referred to as a cluster. But cluster mode is disabled? And some pages use "shards", others "node groups", sometimes both - apparently these terms refer to the same thing.
"Scaling Redis (cluster mode disabled) Clusters" only discusses single-node clusters. Multi-node (cluster mode disabled) clusters are discussed in "Scaling Redis Clusters with Replica Nodes". "Scaling Up Single-Node Redis Clusters" shows as "Scaling Up Redis Clusters" in the sidebar.
"Adding nodes to a cluster currently applies only if you are running Memcached or Redis (cluster mode disabled)." Adding nodes applies to all clusters; what AWS doesn't support is adding nodes to an existing cluster in a partitioned Redis set. This is elucidated in the unlinked page "Scaling Redis Clusters with Replica Nodes", in a completely different section.
Articles frequently contain irrelevant asides, which makes following the documentation harrowing - like "Because Redis (cluster mode disabled) does not support partitioning your data across multiple clusters, each cluster in a Redis (cluster mode disabled) replication group contains the entire cache dataset." in the middle of "Scaling Redis Clusters with Replica Nodes".
Edit: Removed a quote I misread.
Could also be https://www.draw.io
At work I co-develop an open source Python library for reading VPC Flow Logs - it can be an easy way to get started analyzing them for security:
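Independently of any particular library, the default (version 2) flow log record format is space-separated and easy to pick apart by hand. A sketch, with a made-up record:

```python
# Field order of the default version-2 VPC Flow Log record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

def parse_flow_record(line: str) -> dict:
    """Split one space-separated flow log record into named fields."""
    return dict(zip(FIELDS, line.split()))

# Example record (all values fabricated):
record = parse_flow_record(
    "2 123456789012 eni-0abc1234 10.0.0.5 10.0.0.220 443 49152 6 "
    "10 840 1620000000 1620000060 ACCEPT OK"
)

# A typical starting point for security analysis: flag rejected traffic.
rejected = record["action"] == "REJECT"
```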
For example, in just about every AWS environment I look at, someone knew they should create an IAM user and never use the root account. Which is why there's a root account that's never used, and one IAM user with "Administrator" permissions that everyone shares.
If I ever propose we review it, someone points me at an AWS security guide and says "it's fine, we're not using the root".
My current job has about 14 different AWS accounts - a few prod, some lab, others meta accounts. I've been thinking about a dedicated account just for security-related stuff; I see the value in collecting CloudTrail, Config, and the rest there, but I'm not 100% sure it's worth the effort to set up right now. Thoughts?