* Network ACLs, which describe the ruleset (think of it like a stateless firewall) for subnets and their respective routes. Whilst they are optional, having a default set straightens out a lot of duplication that might otherwise end up in Security Groups (which are stateful in nature).
* Elastic (public) IPs. NAT instances/gateways require them, and there is a dance to be done around allocating them in the account and attaching them to instance interfaces.
* IPv6 components. Egress-only Internet Gateways operate differently to IGWs: as there is no NAT, they need a route applied across all subnets, both public and private. The IPv6 CIDR association allocates the VPC a /56 (so each subnet gets a /64, and each instance's interface gets a /128, which is bananas, but IPv6 is a second-class citizen on AWS). Finally, the subnets need updating so automatic IPv6 address assignment happens.
* VPC endpoints - these are broken into two types. The older gateway endpoints support S3/DynamoDB and effectively allow traffic in a public/private subnet to bypass NAT; enabling these can bring significant advantages for access and throughput (a minimal sketch follows this list). The newer "PrivateLink" interface endpoints are different and have pricing costs associated with them.
* DNS and DHCP: by convention the VPC's resolver lives at ".2" of the VPC's CIDR and operates split-horizon - EC2 hostnames set up accordingly, when resolved by instances inside the VPC, return the private VPC address rather than any Elastic IP.
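To make the gateway endpoint point from the list concrete, here's a minimal, hedged sketch (boto3, with placeholder region and IDs) of adding an S3 gateway endpoint to a route table so private-subnet traffic to S3 skips the NAT:

```python
import boto3

# Placeholder region, VPC and route table IDs - substitute your own.
ec2 = boto3.client("ec2", region_name="eu-west-1")

ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.eu-west-1.s3",  # gateway endpoints cover S3 / DynamoDB only
    RouteTableIds=["rtb-0123456789abcdef0"],   # AWS adds the prefix-list routes here
)
```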
IPv6 is another point of contention but again it's not something I've ever used and so, apart from any other controversies with it ("...IPv6 which is only marginally better than IPv4 and which offers no tangible benefit...", https://varnish-cache.org/docs/trunk/phk/http20.html), I'm not qualified to write about it.
EIPs and ENIs should probably have been in there, but I don't tend to use those that often either, so they didn't occur to me.
I'm not sure that VPC endpoints, DNS or DHCP are necessarily need-to-know things either. VPC endpoints are for a specific routing optimisation which not everyone is going to need. I didn't know the details of the DNS setup for a VPC, so thank you for that.
Thank you for the feedback - I really appreciate you taking the time.
Another useful concept (not VPC-specific) is using the Infrastructure-as-Code paradigm (e.g., CloudFormation, Terraform) to capture all of your networking configuration in source control, along with who made any changes and the reasons or design documentation for them.
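As a rough illustration of that paradigm (names, CIDRs and region below are purely illustrative), the network definition can live as a template in the repository and be applied through the API, so every change is reviewed and attributed:

```python
import json
import boto3

# A tiny CloudFormation template kept in source control alongside the code.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "CoreVpc": {
            "Type": "AWS::EC2::VPC",
            "Properties": {"CidrBlock": "10.20.0.0/16", "EnableDnsHostnames": True},
        },
        "PrivateSubnetA": {
            "Type": "AWS::EC2::Subnet",
            "Properties": {
                "VpcId": {"Ref": "CoreVpc"},
                "CidrBlock": "10.20.1.0/24",
                "AvailabilityZone": "eu-west-1a",
            },
        },
    },
}

cfn = boto3.client("cloudformation", region_name="eu-west-1")
cfn.create_stack(StackName="network-core", TemplateBody=json.dumps(template))
```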
I inherited an infrastructure that had NetACLs and security groups with duplicate entrypoints and policies, years of accumulated cruft because it was poorly designed and the documentation was even worse (read: nonexistent), security groups all the way down. That one threw me for a hard and annoying mental loop for a couple of hours until picking through it with the finest-toothed comb revealed what was going on.
The fun part is going to be rebuilding our routing in a new VPC such that it doesn't make the next guy want to put his head in a black hole.
I'd be lying if I said it wasn't a fun challenge in a sordid kind of way, though.
I think the idea is that separate teams with different responsibilities can manage the two different layers. Your app team may manage the security groups but the security team manages network ACLs which limit what can go into or come out of a subnet.
those were frightening times. Entire services would fall over, dogs and cats living together
I'd love to hear your use cases for Network ACLs.
These cloud end user environments are fake eggs and saccharine sweetener.
I think the post does a good job covering the high-level material. NACLs, EIPs, and perhaps peering routes would also be good to mention.
With AWS you have the same basic concepts of a network, and the terminology aligns well enough that you can make sense of it fairly quickly if you're in the network realm. Azure, however, takes all of that 'network' stuff and turns it into an abstraction where you have to carefully follow one of their guides only to realize it's out of date, or the UI doesn't show the appropriate information, etc. Also you have Azure network portions that block ICMP because of 'security'.
This is all anecdotal from my experience of course, but it's why I keep referring to Azure as the "Excel spreadsheet of the cloud", because the entire design of it is in your face and unintuitive.
For instance, if I wanted to make a direct connection like Direct Connect to multiple VPCs in AWS, I'd use the Transit Gateway, connect to it from on-prem, add the VPC and the route, and be done.
In Azure, I'd use ExpressRoute, add the ExpressRoute circuit to a Subscription, add a gateway for that, then an additional gateway for each VPC equivalent, create an authorization key for each 'VPC' equivalent and sync them, and then define routing per gateway. Then when you go in to trace the network path, ICMP is blocked.
I know AWS is more mature than Azure, so it's not entirely fair to criticize them, but every time I touch Azure I miss AWS, or even GCP. Perhaps it's just me not being familiar enough with Azure. ¯\_(ツ)_/¯
Do you have evidence that they've been saying they block ICMP for security reasons that I can forward to the right folks? I can help get this feedback to them to correct that, because I can guarantee 100% that's not the reason why ICMP is not forwarded by the SLB (and engineering/PM would never say it's for security reasons).
You can connect transit gateway to VPCs owned by different accounts.
With AWS, by contrast, there might be quirks, but the ability to configure your network across most services is done so well that it's a huge differentiator.
There are a few things I agree with (the UI, even though I like it more than the AWS UI, is not my all-time favourite GUI application) and others I disagree with, but I wanted to correct some things about your specific example, as I'm sure you'll find the information useful:
> In Azure, I'd use expressroute, add the Expressroute circuit to a Subscription,
Anything you create is created inside a subscription and those are not separate steps.
> add a gateway for that, and then an additional gateway for each VPC equivalent,
That's accurate and by design. Each VNet needs its own ExpressRoute gateway - unless you peer those VNets, in which case you can use the "hub VNet"'s gateway for all of them, e.g.: https://docs.microsoft.com/en-us/azure/architecture/referenc...
> create an authorization key for each 'VPC' equivalent and sync them,
Only if those VNets belong to other subscriptions; that's why you have to authorize them as the owner of the ExpressRoute circuit. Depending on how your company has structured its access to Azure (they might have a subscription only for ER circuits; I've seen that a few times), you may or may not need to do this.
> and then define routing per gateway.
Not sure what you mean by "define routing". ExpressRoute uses BGP and learns routes from on-prem and from the VNet itself.
> Then when you go in to trace the network path ICMP is blocked.
ICMP is not forwarded by the Azure Load Balancer, but the load balancer is not in the path of traffic between on-premises and your VNets. You should totally be able to traceroute to/from on-prem in an ExpressRoute scenario.
Also, as for the transit gateway, I believe the equivalent might be the Virtual WAN https://docs.microsoft.com/en-us/azure/virtual-wan/virtual-w... but I haven't really dug much into it yet. As far as I'm aware it doesn't support ExpressRoute at the moment.
All in all I believe your opinion is biased due to what you're used to (unconscious bias, plz don't get me wrong). I had a similar feeling when going to AWS, but after using it daily for some time you see the rationale behind the (different) product design decisions. We're humans, that's how our brains work! :)
Now on the other hand, I've had daily AWS exposure for the last few months and it's very similar networking-wise. Heck, I even guessed existing features based on troubleshooting + my previous Azure knowledge (I'm looking at you, "Src dst check" https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-en... - AKA "IP Forwarding" in Azure).
If you really want / need the airgapping that private subnets provide, you'd better be willing to pay for them, and that makes sense to me personally - outside of PCI DSS or HIPAA compliance (or similar) I don't see any reason to use private subnets. That won't apply to a personal project.
Or if you are a true extreme penny pincher -- have your Lambda function invoke the AWS API to set up a NAT Gateway and update the subnet route, then execute your business function, and then clean up the NAT.
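Very roughly, that dance would look something like the following untested sketch (placeholder IDs; a real version needs proper error handling, and the NAT gateway still bills for every partial hour it exists):

```python
import boto3

ec2 = boto3.client("ec2")

def with_temporary_nat(public_subnet_id, private_route_table_id, work):
    """Create a NAT gateway, run `work`, then tear the NAT back down."""
    eip = ec2.allocate_address(Domain="vpc")
    nat = ec2.create_nat_gateway(SubnetId=public_subnet_id,
                                 AllocationId=eip["AllocationId"])
    nat_id = nat["NatGateway"]["NatGatewayId"]
    ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])
    ec2.create_route(RouteTableId=private_route_table_id,
                     DestinationCidrBlock="0.0.0.0/0",
                     NatGatewayId=nat_id)
    try:
        work()  # the actual business logic that needs outbound access
    finally:
        ec2.delete_route(RouteTableId=private_route_table_id,
                         DestinationCidrBlock="0.0.0.0/0")
        ec2.delete_nat_gateway(NatGatewayId=nat_id)
        # releasing the EIP may need to wait until the NAT gateway is fully deleted
        ec2.release_address(AllocationId=eip["AllocationId"])
```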
edit: Apparently not. See below, my mistake.
Public Lambdas can invoke VPC Lambdas (AFAIK, the reverse is not possible without a VPC endpoint).
A free tier for NAT gateways would go a very long way. I wonder why they wouldn't have one.
(I'm the tech lead for Workers.)
I also like this video https://www.oreilly.com/library/view/amazon-web-services/978... (part of http://shop.oreilly.com/product/0636920040415.do ). Full disclosure: I used to work with Jon.
For example, if you modify the destination MAC address, it will not work in AWS. To be able to do that you should disable source/destination checks, as specified in .
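For reference, turning that check off on an instance is a one-liner against the EC2 API (the instance ID below is a placeholder):

```python
import boto3

ec2 = boto3.client("ec2")

# Let the instance's primary ENI forward traffic it did not originate
# (NAT instance, packet crafting, routing between subnets, etc.).
ec2.modify_instance_attribute(
    InstanceId="i-0123456789abcdef0",
    SourceDestCheck={"Value": False},
)
```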
The last time I checked you cannot do that in Google Cloud or Microsoft Azure.
When we experienced this issue, Reddit was the best resource for answers. I've included the Reddit threads, as they can help others working on projects requiring packet crafting:
I did the whole "here is your /56, now segment it yourself" thing. It's crude. It should not be necessary; if v6 were central to the model, you'd be assigned a /64 from your covering prefix automatically as you deploy regional nodes.
Now the ID thing causes zero problems, as you never have to worry about them anymore.
Same goes for Terraform or any other infrastructure-as-code tool.
It's a pain, sure, but there's no better solution... explicitness creates certainty in this case.
If anyone knows of a good resource on the subject it would be greatly appreciated.
Or, to ask it another way - what would be the downside of all your resources being in a single-subnet VPC, spread evenly across AZs?
Auditors will want to know which isolation mechanisms you have put in place, and private subnets should be part of your isolation strategies.
- Legacy (or third-party) apps whose security model assumes they are behind some sort of private firewall.
- Hybrid deployment where you need to bridge on-premises (or other clouds) address space(s) with your VPC.
> Or, to ask it another way - what would be the downside of all your resources being in a single-subnet VPC, spread evenly across AZs?
Note that a subnet cannot spread across AZs. So, even if you only need/want public subnets, you will want to deploy at least 1 public subnet per AZ.
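A hedged sketch of what that looks like in practice (placeholder VPC ID and CIDR layout): enumerate the region's AZs and carve out one subnet per AZ:

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")
vpc_id = "vpc-0123456789abcdef0"  # placeholder

azs = ec2.describe_availability_zones(
    Filters=[{"Name": "state", "Values": ["available"]}]
)["AvailabilityZones"]

# One /24 per AZ; the CIDR scheme here is purely illustrative.
for index, az in enumerate(azs):
    ec2.create_subnet(
        VpcId=vpc_id,
        AvailabilityZone=az["ZoneName"],
        CidrBlock=f"10.20.{index}.0/24",
    )
```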
Private subnets allow you to reduce your exposure to the Internet, and can also reduce costs by consolidating outbound access through something like a NAT gateway. They're useful for things that don't need to be public facing. Generally, things on a private subnet can initiate outbound connections but nothing can connect directly into that subnet; you'd need a solution that interfaces with the public side to facilitate that, or you'd have to manually create a public IP association per instance.
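For illustration, the only real difference between the two kinds of subnet is where their route table sends the default route (all IDs below are placeholders):

```python
import boto3

ec2 = boto3.client("ec2")

# Public subnet: default route via the internet gateway.
ec2.create_route(RouteTableId="rtb-0123456789abcdef1",
                 DestinationCidrBlock="0.0.0.0/0",
                 GatewayId="igw-0123456789abcdef0")

# Private subnet: outbound only, via a NAT gateway that lives in a public subnet.
ec2.create_route(RouteTableId="rtb-0123456789abcdef2",
                 DestinationCidrBlock="0.0.0.0/0",
                 NatGatewayId="nat-0123456789abcdef0")
```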
You generally don't want one big subnet: it's a broadcast domain, and it can get quite chatty when you have a lot of devices on it. Alongside that, if you're doing multi-AZ and spanning layer 2, you end up with a lot of additional complexity to make that network span and stay highly available over multiple AZs, whereas separate subnets can be mostly independent. I know of some weird edge cases where you'd have to span layer 2, but if you're doing anything cloud-native you should be able to build around it.
I'm just so curious how people do this in the Modern Age. I am totally dependent on RSS readers to subscribe to things but I feel I am living in the past. I get that Medium tried to address the distribution problem, but not everyone posts on Medium. And I get that everyone posts things on Twitter, but I want to scream when people say just follow someone on Twitter because obviously there is a very, very high chance that I will miss if someone posted something new because Twitter is not meant to be read comprehensively.
I've got a mix of Ubiquiti gear and pfSense. Most of it was a matter of just setting up a static route (like in the article) where, when I'm on the 192.x.x.x (192.0.0.0/24) network and want to talk to 10.0.0.0/24, I'd put in the gateway (10.0.0.1) as the next hop. Without knowing more about your setup it's hard to say.
r/homelab is also a pretty decent place with a lot of helpful people interested in networking and homelabs.
It seems like going with the Ubiquiti USG instead of the EdgeRouter or a pfsense box was a big mistake.
I'm heavily space/heat/power constrained though.
The primary caveats beyond basic networking that impact designs are that multicasting is not enabled at the network layer but at the network interface (ENI) layer, and that you need to carefully look at how security groups really work (they're fundamentally attached to an ENI, which is how you can route between networks with a single instance, as long as it's within the same AZ).
All of this, I've found, was completely disregarded / unknown by almost every company outside the F500 or high-end tech start-ups when they first started with AWS, and I've spent a lot of my career having to migrate production environments between VPCs so that we could get enough room to grow adequately. Making subnets as small as possible is not what you should be doing in AWS, folks. In fact, making them really small means you put in a fair bit of effort without stopping to read the documentation in earnest for a couple of hours. And using the default VPC CIDR repeatedly from the console is a pretty grand way to make sure you can never let two VPCs communicate with each other via anything other than a third intermediate VPC that you'll have to migrate to eventually.
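A cheap way to catch that before it bites is to check your planned CIDRs for overlaps up front, for example with nothing but the Python standard library (the ranges below are made up):

```python
import ipaddress

# Planned (or inherited) VPC ranges; overlapping VPCs can never be peered
# or routed to each other directly.
vpcs = {
    "prod":    ipaddress.ip_network("10.0.0.0/16"),
    "staging": ipaddress.ip_network("10.1.0.0/16"),
    "legacy":  ipaddress.ip_network("172.31.0.0/16"),
    "default": ipaddress.ip_network("172.31.0.0/16"),  # the console default, again
}

names = list(vpcs)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        if vpcs[a].overlaps(vpcs[b]):
            print(f"{a} and {b} overlap: {vpcs[a]} / {vpcs[b]}")
```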
Some of the overly cautious networking approaches I've seen include making a VPC for every single application / service, using a NACL for every application (multiplied by every AZ used to isolate each subnet, thereby cutting off cross-AZ routing, of course), creating your own NAT instance that doesn't do anything better than a NAT gateway, and NAT gateways in every AZ (for a whole $1 of traffic / mo each). The story of problems in AWS infrastructure is the same - trying to plan too far ahead for the wrong things and not realizing the limitations of the right things that are no longer flexible. This is much more common when companies hire traditionally experienced network engineers who have just a little too much confidence.
The common pivot for a company is to go from B2C to B2B, and that means regulations. That means you can't do a cowboy infrastructure setup the way people set up their home networks (in fact, most home networks are more secure than what most SaaS devs do as a rule, router firmware exploits aside).
Besides that: nice article
The article is well written, but it simply represents maybe 1% of what you need to know here. I would have called it "A simple introduction to networking on AWS".
It's certainly not intended to be exhaustive, and I am definitely not a network engineer, but I think you could operate at a reasonable scale within a single well-laid-out VPC. Of course there'll be a point where peering etc. comes in, but you might not need that.
Even if you do want to make your internal comms life easier, you can use security-group-based rules that allow any/any between members of the group itself, effectively giving you easy VLAN-like segmentation between services.
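A hedged sketch of that pattern with boto3 (placeholder VPC ID): the group's ingress rule references the group itself, so anything attached to it can reach anything else attached to it:

```python
import boto3

ec2 = boto3.client("ec2")

sg = ec2.create_security_group(
    GroupName="internal-mesh",
    Description="members can talk to each other on any port",
    VpcId="vpc-0123456789abcdef0",  # placeholder
)
sg_id = sg["GroupId"]

ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[{
        "IpProtocol": "-1",                        # all protocols / all ports
        "UserIdGroupPairs": [{"GroupId": sg_id}],  # self-reference
    }],
)
```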
If they're outside your account then, you're right, that's a shortcoming in AWS (Azure and GCP both allow multiple destinations in a single rule).
He just described a NETWORK, not AWS.
AWS has renamed lots of things, but all the scary text configs that used to be the domain of wizened sysadmins have been replaced with very simple single-page-app GUI controls. Like routers and gateways: those terms are largely gone from the AWS vocabulary.
No need to get into subnets and route tables I think.
The majority of clients I've worked with use AWS for web hosting with an ELB load balancer (the most important part), an EC2 instance policy & image (for handling traffic fluctuations), an RDS database, an S3 bucket, and Route 53 (for external DNS entries).
Point the load balancer to the outside world and then let it spin up instances. That's the most common model I've encountered.
It's almost cartoonishly simple compared to what the OP wrote here. Almost. Having an understanding of network architecture helps, but not THAT much.
Amazon does not clearly document any of this. (I think if you read enough you'll eventually figure it out, but experimentation was the most straightforward procedure here).
As for just using a single EC2 instance and RDS... that is something you can do, but not everyone's workload is so simple that they can run it on one machine. And not everyone can afford to be down simply because one AZ is down. Hence, multi-AZ VPC setups.
Even when beanstalk is used, all of the stuff mentioned in the OP and more are usually required.
Simple weekend projects maybe, but only small enterprise services would work in your model.
OP's article is great IMHO.