Networking on AWS (2018)

becauseiam · on Jan 16, 2019

There's quite a few things you've missed that are significant and should have been included, maybe one for part two:

* Network ACLs, which describe the ruleset (consider it like a stateless firewall) for subnets and their respective routes. Whilst they are optional, having a default set straightens out a lot of duplication that may end up in Security Groups (which are stateful in nature).

* Elastic (public) IPs. NAT instances/gateways require their use, and there is a dance to be done around allocating them in the account and attaching them to instance interfaces.

* IPv6 components. Egress-only Internet Gateways operate differently to IGWs: as there is no NAT, they need a route applied across all subnets, both public and private. Then there's the IPv6 CIDR, which allocates the VPC's /56 (and thus each subnet gets a /64, and each instance's interface thus gets a /128, which is bananas, but IPv6 is a second-class citizen on AWS). Finally, updating the subnets so automatic IPv6 address assignment happens.

* VPC Gateways - these are broken into two types. The older type supports S3/DynamoDB and effectively allows traffic in a public/private subnet to bypass NAT. Enabling these can have significant advantages for access and throughput. The newer "PrivateLink" services are different and have pricing costs associated with them.

* DNS and DHCP: It's a rule in the VPC that the delegated resolver lives on ".2" of the VPC's CIDR, and operates in dual-horizon - EC2 hostnames set up accordingly and resolved by instances inside the VPC will get the private VPC CIDR address, not any Elastic IP.
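
To make the address arithmetic in the IPv6 and DNS points concrete, here is a minimal Python sketch using only the standard `ipaddress` module. Both prefixes are made-up examples (the IPv6 one comes from the documentation range, the IPv4 one is arbitrary), not anything AWS would actually hand you.

```python
import ipaddress

# Hypothetical VPC prefixes, chosen only for illustration.
vpc_v6 = ipaddress.ip_network("2001:db8:1234:ab00::/56")
vpc_v4 = ipaddress.ip_network("10.0.0.0/16")

# A /56 splits into 2**(64 - 56) possible /64 subnets.
subnets_v6 = list(vpc_v6.subnets(new_prefix=64))

# The VPC resolver sits at the base of the IPv4 CIDR plus two (the ".2").
resolver = vpc_v4.network_address + 2

print(len(subnets_v6))
print(subnets_v6[0], subnets_v6[-1])
print(resolver)
```

So the /56 is only ever going to give you 256 subnets to play with, and the resolver address falls straight out of the VPC's IPv4 CIDR.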

cure · on Jan 16, 2019

+1 for VPC Gateways. If you want to get decent aggregate performance out of S3 when you have many readers/writers in your VPC, you want one of those.

grahamlyons · on Jan 17, 2019

Network ACLs have been pointed out as missing from this before, but quite a few people said that they were right not to be included. I didn't put them in because I've never used them, so they didn't fall under 'need to know' from my perspective.

IPv6 is another point of contention but again it's not something I've ever used and so, apart from any other controversies with it ("...IPv6 which is only marginally better than IPv4 and which offers no tangible benefit...", https://varnish-cache.org/docs/trunk/phk/http20.html), I'm not qualified to write about it.

EIPs and ENI should probably have been in there but I don't tend to use those that often either so they didn't occur to me.

I'm not sure that VPC Gateways, DNS or DHCP are necessarily need to know things either. VPC Gateways are for a specific routing optimisation which not everyone is going to need. I didn't know the details of the DNS set up for a VPC so thank you for that.

Thank you for the feedback - I really appreciate you taking the time.

jcrites · on Jan 17, 2019

I would also add VPC PrivateLinks to the list, which let you establish private connections between systems in different VPCs without having to either peer them or connect them in other ways. PrivateLinks allow you to relieve the pressure that you might otherwise feel to build a lot of systems in the same VPC.

Another useful concept (not VPC-specific) is using the Infrastructure-as-Code paradigm (e.g., CloudFormation, Terraform) to capture all of your networking configuration in source control, along with who made any changes and the reasons or design documentation for them.

dvtrn · on Jan 16, 2019

> Network ACLs [...] Whilst they are optional, having a default set straightens out a lot of duplication that may end up in Security Groups (which are more stateful in nature).

I inherited an infrastructure that had NetACLs and security groups with duplicate entrypoints and policies, years of accumulated cruft because it was poorly designed and the documentation was even worse (read: nonexistent), security groups all the way down. That one threw me through a hard and annoying mental loop for a couple of hours until picking through with the finest tooth comb revealed what was going on.

The fun part is going to be rebuilding our routing in a new VPC such that it doesn't make the next guy want to put his head in a black hole.

I'd be lying if I said it wasn't a fun challenge in a sordid kind of way, though.

AmericanChopper · on Jan 17, 2019

I guess it’s a matter of preference, but I strongly prefer security groups over ACLs, which I don’t use at all. Even if only from a compliance perspective, a security group is equivalent to a host firewall (which personally helps me with PCI - no need for iptables and windows firewall). Whereas an ACL is a bit harder to make that case with. I also find them easier to audit.

javadocmd · on Jan 17, 2019

I like using ACLs for my coarse-grained "this subnet is allowed to talk to this subnet" rules, and security groups for everything finer-grained. Maybe I'm over-cautious, but I don't want one rogue security group opening up a tunnel to sensitive subnets.

ajbourg · on Jan 17, 2019

Yes, this is one of the best reasons to use network ACLs. (You can also achieve this with routes)

I think the idea is that separate teams with different responsibilities can manage the two different layers. Your app team may manage the security groups but the security team manages network ACLs which limit what can go into or come out of a subnet.

AmericanChopper · on Jan 17, 2019

That’s a reasonable design pattern. For my usecase we have those segmentations in place at the VPC level, so ACLs wouldn’t add anything for us.

dvtrn · on Jan 17, 2019

I'm slightly inclined to agree, it's one of those YMMV scenarios. What happened to me was there was some unholy combination of both going on, duplicating each other, in some cases weaving in and out of each other with some bastard frankenstein topology of route tables to nowhere...

those were frightening times. Entire services would fall over, dogs and cats living together

fideloper · on Jan 17, 2019

How do folks use Network ACLs? I haven't used them personally - relying more on security groups and segmenting subnets to specific tasks (e.g. attached to public network via IGW, or private network only)

I'd love to hear your use cases for Network ACLs.

himangshuj · on Jan 16, 2019

Network ACLs are quite tricky to debug. For one of my connections, network calls were failing because ESP was blocked at the ACL layer; the ACL was blocking all non-TCP traffic by default. Funnily, network calls within the same data center were working but were failing when calling another data center. I had to look at VPC flow logs to figure out that non-TCP protocols were being blocked.
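
For anyone trying to picture why that happens, here is a rough Python model of how a network ACL decides: rules are checked in rule-number order, the first match wins, and anything that matches nothing is denied. It is a deliberately simplified sketch (no ports, CIDRs or direction), and the `Rule` type and rule numbers are made up for illustration.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    number: int      # lower numbers are evaluated first
    protocol: str    # "tcp", "udp", "esp", or "all"
    action: str      # "allow" or "deny"

def evaluate(rules: list[Rule], protocol: str) -> str:
    """Return the action of the first matching rule, lowest number first."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.protocol in ("all", protocol):
            return rule.action
    # Nothing matched: the trailing catch-all rule denies the packet.
    return "deny"

# Hypothetical ACL that only thought about TCP.
tcp_only = [Rule(100, "tcp", "allow")]
# Same ACL with an explicit rule for ESP (IP protocol 50) added.
with_esp = tcp_only + [Rule(110, "esp", "allow")]

print(evaluate(tcp_only, "tcp"))
print(evaluate(tcp_only, "esp"))
print(evaluate(with_esp, "esp"))
```

With the TCP-only list, ESP falls through to the final deny, which is the kind of silent drop that only shows up in flow logs.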

romeisendcoming · on Jan 17, 2019

Let me summarize further. If you come from 20 years of application development and network design/administration in 'real' LAN and IGRP networks with 'real' hardware you are going to be learning everything again.

These cloud end user environments are fake eggs and saccharine sweetener.

mz1290 · on Jan 17, 2019

Agreed! I recently transitioned into a developer role and my first project was working with AWS cloudformation to build a network template.

I think the post does a good job covering the high-level material. NACLs, EIP, and perhaps peering routes would also be good to mention.

cxmcc · on Jan 16, 2019

A few more here: VPC Peering. VPC Endpoints.

llama052 · on Jan 16, 2019

Off topic, but as a network guy by heart I've always been fairly happy with how AWS implements the network side of things, especially in comparison to something like Azure.

With AWS you have the same basic concepts of a network, and the terminology aligns enough that you can make sense of it fairly quickly if you're in the network realm. Azure, however, takes all of that 'network' stuff and turns it into this abstraction where you have to carefully follow one of their guides, only to realize it's out of date, or the UI doesn't show the appropriate information, etc. Also you have Azure network portions that block ICMP because of 'security'.

This is all anecdotal from my experience of course, but it's why I keep referring to Azure as the "Excel spreadsheet of the cloud" because the entire design of it is in your face and non intuitive.

For instance if I wanted to make a direct connection like DirectConnect to multiple VPC's in AWS, I'd use the Transit Gateway, connect to it from on-prem, add the VPC and the route, and be done.

In Azure, I'd use expressroute, add the Expressroute circuit to a Subscription, add a gateway for that, and then an additional gateway for each VPC equivalent, create an authorization key for each 'VPC' equivalent and sync them, and then define routing per gateway. Then when you go in to trace the network path ICMP is blocked.

I know AWS is more mature than Azure, so it's not entirely fair to criticize them, but every time I touch Azure I miss AWS, or even GCP. Perhaps it's just me not being familiar enough with Azure. ¯\_(ツ)_/¯

filmgirlcw · on Jan 16, 2019

Microsoft employee here - I don’t work specifically on the networking side of Azure but this is really good feedback that I’ll share with the product teams.

bg24 · on Jan 16, 2019

I do not and did not work for Microsoft, but love this very attitude.

late2part · on Jan 17, 2019

Well, it’s pointless, really. They’ve been given feedback that blocking ICMP is harmful since 2011 and it’s done no good.

CloudNetworking · on Jan 17, 2019

Disclaimer: I have worked for Azure in the past -not anymore- and specifically in Networking.

Do you have evidence that they've been saying they block ICMP because of security reasons that I can forward to the right folks? I can help getting this feedback to them to correct that, because I can guarantee 100% that's not the reason why ICMP is not forwarded by the SLB (and engineering/PM would never say it's for security reasons).

late2part · on Jan 20, 2019

I don’t know why they do it and I didn’t say it’s because of security. It’s harmful enough to my experience that I haven’t used them since.

tlynchpin · on Jan 17, 2019

Transit Gateway was announced at re:invent 2018. That's a few months ago. DirectConnect Gateway and Transit VPC were announced the year before. I would guess about nobody is currently in production with Transit Gateway.

curiouserrr · on Jan 17, 2019

Except you can’t yet do cross-account transit, which makes their Transit VPC offering pretty much useless. The whole point of transit is so you can talk to other VPCs across accounts.

Hikikomori · on Jan 17, 2019

You can. Share it with Resource Access Manager, can even share it to accounts outside of your own AWS organisation.

joemag · on Jan 17, 2019

EC2 engineer here.

You can connect transit gateway to VPCs owned by different accounts.

Hikikomori · on Jan 17, 2019

Will BYOIP be sharable with RAM?

Hikikomori · on Jan 17, 2019

Currently testing it, can do lots of fun stuff with the routing domains. Will roll it out once they get Direct Connect support in Q1.

deepsun · on Jan 17, 2019

What about GCP? I've always heard that Google is way better at networking, at least internally (having their own cables, basically own internet), is that true for their public cloud?

MoOmer · on Jan 17, 2019

Mixed reviews from me on GCP. Some of their services don’t offer many of the network controls that you might expect. IIRC network rules with cloud functions and managed database services were a painpoint for a pet project of mine.

Whereas with AWS, there might be quirks, but the ability to configure your network across most services is done so well that it’s a huge differentiator.

baseballMan · on Jan 17, 2019

Yeah virtual network gateways in Azure are probably the lamest feature ever. Takes literally 40+ minutes to provision one. I've worked about 2+ years on Azure and 5-6 months on AWS and while they both of course have their pros and cons, I prefer AWS for things like this. I do love ARM templates though. <3

CloudNetworking · on Jan 17, 2019

Disclaimer: I have worked for Azure in the past -not anymore- and specifically in networking.

There are a few things I agree with (the UI, even though I like it more than the AWS UI, is not my all-time favourite GUI application) and others I disagree with, but I wanted to correct some things about your specific example as I'm sure you'll find the information useful:

> In Azure, I'd use expressroute, add the Expressroute circuit to a Subscription,

Anything you create is created inside a subscription and those are not separate steps.

> add a gateway for that, and then an additional gateway for each VPC equivalent,

That's accurate and by design. Each VNet needs its own ExpressRoute gateway - that's unless you peer those VNets, then you can use the "hub vnet"'s gateway for all of them, e.g.: https://docs.microsoft.com/en-us/azure/architecture/referenc...

> create an authorization key for each 'VPC' equivalent and sync them,

Only if those VNets belong to other subscriptions, that's why you have to authorize them as the owner of the ExpressRoute. Depending on how your company has structured their access to Azure (they might have a subscription only for ER circuits, I've seen that a few times) you'll need to do this or not.

> and then define routing per gateway.

Not sure what you mean by "define routing". ExpressRoute uses BGP and learns routes from on-prem and from the VNet itself.

> Then when you go in to trace the network path ICMP is blocked.

ICMP is not forwarded by the Azure Load Balancer, but the load balancer is not in the path of traffic between on-premises and your VNets. You will totally be able to traceroute to/from on-prem in an ExpressRoute scenario.

Also, as for the Transit Gateway, I believe the equivalent might be the Virtual WAN https://docs.microsoft.com/en-us/azure/virtual-wan/virtual-w... but I haven't really dug much into it yet. As far as I'm aware it doesn't support ExpressRoute at the moment.

All in all I believe your opinion is biased due to what you're used to (unconscious bias, plz don't get me wrong). I had a similar feeling when going to AWS, but after using it daily for some time you see the rationale behind the (different) product design decisions. We're humans, that's how our brain works! :)

Now on the other hand, I've got this daily AWS exposure for the last few months and it's very similar networking-wise. Heck, I even guessed existing features based on troubleshooting + my previous Azure knowledge (I'm looking at you "Src dst check" https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-en... - AKA "IP Forwarding" in Azure).

benmanns · on Jan 16, 2019

NAT gateways are one of the things that blindsided me on the whole "serverless" idea for hobby projects. To have a Lambda function with access to the outside world and your private network resources your $0.01/month function becomes a $35/month+ expense if you don't want to manage your own t2 NAT instance (and required patches, upgrades, scaling, monitoring, etc).

See https://forums.aws.amazon.com/thread.jspa?threadID=234959
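
For a rough idea of where a figure like that comes from, here is a back-of-the-envelope Python sketch. The two rates are my assumptions for illustration, not quoted prices (they vary by region and change over time), so check the current price list before relying on them.

```python
# Assumed prices, for illustration only; check the current AWS price list.
HOURLY_RATE_USD = 0.045      # assumed NAT gateway charge per hour
PER_GB_RATE_USD = 0.045      # assumed data processing charge per GB
HOURS_PER_MONTH = 730        # average hours in a month

def nat_gateway_monthly_cost(gb_processed: float) -> float:
    """Fixed hourly charge plus per-GB processing, for one gateway."""
    return HOURLY_RATE_USD * HOURS_PER_MONTH + PER_GB_RATE_USD * gb_processed

print(round(nat_gateway_monthly_cost(0), 2))
print(round(nat_gateway_monthly_cost(50), 2))
```

The point is that the hourly charge dominates: under these assumed rates the gateway costs about the same whether the function moves zero bytes or a few tens of GB.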

iheartpotatoes · on Jan 17, 2019

There are a bunch of microcharges like this that pop up, but reading your thread are you sure AWS is right for your application? You essentially can't afford it and want a free tier and near-free access? That seems a bit unrealistic. Maybe lambda isn't the right solution?

sudhirj · on Jan 17, 2019

Lambda isn't the problem here, the private network (subnet) is. Basically default to public subnet with security groups configured for your incoming connections.

If you really want / need the airgapping that private subnets provide, you'd better be willing to pay for them, and that makes sense to me personally - outside of PCI DSS or HIPAA compliance (or similar) I don't see any reason to use private subnets. That won't apply on a personal project.

sudhirj · on Jan 17, 2019

There's another gotcha, though: Lambdas seem to default to running inside the VPC, which triggers the NAT Gateway cost if you want to do anything useful with them. You'll need to explicitly remember to host the Lambdas outside the VPC.

manishsharan · on Jan 17, 2019

You may consider using NAT instance of EC2. A micro instance which can also serve as your bastion host.

Or if you are a true extreme penny pincher -- have your Lambda function invoke the AWS API to set up a NAT Gateway and update the subnet route, then execute your business function and then clean up the NAT.

sudhirj · on Jan 17, 2019

Huh? I can’t make out if this is sarcasm... are you suggesting opening the firewall from the inside for each request, finishing the request and then closing the firewall? For starters, what would happen if request 1 closed the firewall while request 2 was still working?

iheartpotatoes · on Jan 20, 2019

Yeah, exactly!

ceejayoz · on Jan 16, 2019

A VPC with a public subnet that's locked down largely via security groups is probably fine for a project that can't justify a $35/month spend.

edit: Apparently not. See below, my mistake.

athrun · on Jan 16, 2019

It's counter intuitive but attaching a VPC Lambda to a public VPC subnet will not give it access to the internet.

See: https://docs.aws.amazon.com/lambda/latest/dg/vpc.html#vpc-in...

ceejayoz · on Jan 16, 2019

Eww. TIL, thanks.

athrun · on Jan 16, 2019

Can't you simply decouple your lambda project into two different parts where you have public lambda(s) calling your private/VPC lambda function(s) when required?

Public Lambdas can invoke VPC Lambdas (AFAIK, the reverse is not possible without a VPC endpoint).

staticassertion · on Jan 17, 2019

Same. I built a project in my free time and I was pretty surprised to see my bill was 99% NAT Gateways and a few hundred dollars.

A free tier for NAT gateways would go a very long way. I wonder why they wouldn't have one.

davidjnelson · on Jan 16, 2019

Have you tried cloudflare workers? The networking gets taken care of for you, plus they are obscenely fast as they run on the edge closest to the client and use v8 isolates to drop 95th percentile latency from cold starts from ~1.5 seconds to about 300ms.

kentonv · on Jan 17, 2019

Hmm, cold starts for Workers depends on your script size, but should be around 10ms, never 300ms. Are you seeing 300ms? Is that actually Workers cold start time, or cold start for a larger application (e.g. that might include things like HTTP cache warm-up)?

(I'm the tech lead for Workers.)

davidjnelson · on Jan 17, 2019

I don’t think I explained it very well, sorry. I meant 300ms from an external load testing tool called wrk with 4 threads hitting it with 1,000 concurrent connections, including network round trip. This was only visible for the first run, after that I was seeing 99th percentile at ~90ms for the round trip. The worker I tested with is server rendering a react app. I’m away from the computer but I think the average speeds were ~20ms round trip. It’s crazy fast. I’m thrilled with it!

mooreds · on Jan 16, 2019

I taught some courses on AWS for a year and a half. The networking piece is something that is trivial for any network engineer, but for any developer (which is my background) working through the network piece is crucial. It takes a while and this looks like a good reference. However, it's best to also check out the AWS docs https://docs.aws.amazon.com/vpc/latest/userguide/what-is-ama... . They are not always the easiest read, but I find them to be pretty authoritative.

I also like this video https://www.oreilly.com/library/view/amazon-web-services/978... (part of http://shop.oreilly.com/product/0636920040415.do ). Full disclaimer, I used to work with Jon.

orangejewce · on Jan 16, 2019

This piece excludes some incredibly common networking complications: VPC Peering Connections, VPC Endpoint Services, VPN connections.

lopopolo · on Jan 16, 2019

I created a collection of terraform modules that gets a minimal AWS network set up for a single-region webapp: https://github.com/lopopolo/hyperbola/tree/master/terraform/...

dvtrn · on Jan 17, 2019

Kudos for this, sincerely!

wslh · on Jan 17, 2019

I want to add a few notes useful for packet crafting. AWS, Google Cloud, and Azure don't work at layer 2 (Ethernet) as expected since they provide services at layer 3 and up.

For example, if you modify the MAC destination address it will not work in AWS. To be able to do that you should disable source/destination checks as is specified in [1].

The last time I checked you cannot do that in Google Cloud or Microsoft Azure.

When we experienced this issue, Reddit was the best resource for answers. I put the Reddit threads as they can help others working in projects requiring packet crafting:

https://www.reddit.com/r/sysadmin/comments/51xypj/vpc_amazon...

https://www.reddit.com/r/networking/comments/51y52n/aws_vpc_...

https://www.reddit.com/r/sysadmin/comments/533e14/google_com...

[1] https://docs.aws.amazon.com/vpc/latest/userguide/VPC_NAT_Ins...

CloudNetworking · on Jan 17, 2019

And for those curious as to how/why this works this way, Azure has published papers on their virtual switch: https://www.microsoft.com/en-us/research/project/azure-virtu...

ninetax · on Jan 16, 2019

Nice, I found this explanation a bit more in depth and super helpful as well

https://start.jcolemorrison.com/aws-vpc-core-concepts-analog...

ggm · on Jan 16, 2019

As long as IPv6 is a second-class citizen, things are going to continue to be painful in AWS.

I did the whole "here is your /56, now segment it yourself" thing. It's crude. It should not be necessary: if V6 was central to the model, you'd be assigned a /64 from your covering prefix automatically as you deploy regional nodes.

cygned · on Jan 16, 2019

In my opinion, the most annoying thing about AWS networking - and some other services - is that they often use IDs and do not show labels which forces me to remember them partly, go back and forth or have multiple windows open. The AWS console is not the best UX piece on the web, but this part is especially error prone.

bmurphy1976 · on Jan 16, 2019

Having to reference security groups by ID instead of name in CloudFormation stacks, Terraform, and a variety of other places is one of the most infuriating things. Makes everything much more difficult to configure and maintain because the IDs are so opaque and unique. Somebody has to do the grunt work of looking them up and copy/pasting them or writing scripts to propagate configuration forward. What a waste of time and effort.

scrollaway · on Jan 16, 2019

In terraform you should never be using IDs directly unless it's in a variable or, preferably, a data source. The reference should look like aws_something.mything.id.

mvanbaak · on Jan 18, 2019

Since you mention CloudFormation, here are my tips to make it easier:

- tag all groups so it's easier to remember
- use Export in outputs so you can use ImportValue in other templates
- use only CloudFormation, no edit/create/add actions in the console

Now the ID thing is 0 problems as you never have to worry about them anymore

Same goes for terraform or any other InfrastructureAsCode

chrisacky · on Jan 16, 2019

I agree, it's frustrating. However, considering you can share security groups, and even share a VPC as a resource with different AWS accounts, and that "names" are non-unique, how would you solve the problem of allowing people to select a non-unique resource and know which one they actually mean?

It's a pain, sure, but there's no better solution... explicitness creates certainty in this case..

bmurphy1976 · on Jan 17, 2019

You would use the arns. Those are unique but predictable.

iheartpotatoes · on Jan 17, 2019

Agreed. I wish they would use the NAMES that I gave everything instead of the cryptic IDs. On the plus side, I'm improving my memory?

tambourine_man · on Jan 17, 2019

I’ve been banging my head against the wall for a week trying to set up a site-to-site VPN in AWS with a Cisco ASA. The auto-generated config files have a lot of missing info.

If anyone knows of a good resource on the subject it would be greatly appreciated.

mattbillenstein · on Jan 17, 2019

IMO closed-source stuff is just impossible to get working unless you're certified with that equipment. Try OpenVPN:

https://openvpn.net/vpn-server-resources/site-to-site-routin...

scubbo · on Jan 17, 2019

Something that's missing from this (otherwise great!) guide, that has puzzled me for a while - what's the point? What does this configuration actually gain you/AWS? My best guess is that private subnets are for DDOS protection, but that seems like something that would be better handled by throttling. Given the amount of complaints I've heard about how difficult VPC/Subnet setup is, why bother with it at all? Staving off IP address exhaustion?

Or, to ask it another way - what would be the downside of all your resources being in 1 single-Subnet VPC, spread evenly across AZs?

athrun · on Jan 17, 2019

Typically, compliance requirements will drive you to implement private subnets.

Auditors will want to know which isolation mechanisms you have put in place, and private subnets should be part of your isolation strategies.

Other use-cases:

- Legacy (or third-party) apps whose security model assumes they are behind some sort of private firewall.

- Hybrid deployment where you need to bridge on-premises (or other clouds) address space(s) with your VPC.

> Or, to ask it another way - what would be the downside of all your resources being in 1 single-Subnet VPC, spread evenly across AZs?

Note that a subnet cannot spread across AZs. So, even if you only need/want public subnets, you will want to deploy at least 1 public subnet per AZ.
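
As a small illustration of the "one subnet per AZ" layout, here is a Python sketch that carves a VPC range into one public /24 per AZ using the standard `ipaddress` module. The CIDR, the /24 size and the AZ names are all assumptions picked for the example.

```python
import ipaddress

# Hypothetical VPC CIDR and AZ names, for illustration only.
vpc = ipaddress.ip_network("10.0.0.0/16")
azs = ["us-east-1a", "us-east-1b", "us-east-1c"]

# Take the first /24 blocks of the VPC and pair one with each AZ.
public_subnets = dict(zip(azs, vpc.subnets(new_prefix=24)))

for az, cidr in public_subnets.items():
    print(az, cidr)
```

Each AZ ends up with its own non-overlapping block, and the rest of the /16 stays free for private subnets later.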

scubbo · on Jan 17, 2019

Thanks! That's certainly helpful!

llama052 · on Jan 17, 2019

Not sure I'm qualified to answer all of your questions on this but, from a networking perspective..

Private subnets will allow you to reduce your exposure to the Internet, also can reduce costs with something like a NAT gateway. It's useful for things that don't need to be public facing. Generally things on the private subnet can go outbound directly but not have anything come direct into that subnet, you'd need a solution that interfaces with the public side to facilitate that, or manually create a public IP association per instance.

You generally don't want one big subnet: it's a broadcast domain and it can be quite chatty when you get a lot of devices on it. Alongside that, if you're doing multi-AZ and spanning layer 2, you end up with a lot of additional complexity to get that network to span and be highly available over multiple AZs, while another subnet can be mostly independent. I know of some weird edge cases where you'd have to span layer 2, but if you're doing anything cloud-native you should be able to build around it.

motive · on Jan 17, 2019

Just as an fyi, inside Amazon's virtual network topology, there is no such thing as layer 2, and thus, no broadcast topology. Normally you'd be 100% correct in seeking to limit that bandwidth, but in Amazon everything works just a little differently.

llama052 · on Jan 17, 2019

Ah yes, you are right. I was thinking specifically networking, not AWS.

scubbo · on Jan 17, 2019

Thanks! I had to look up "broadcast domain" to understand the last paragraph, but that helped illuminate some things. Thank you!

grahamlyons · on Jan 17, 2019

That's good feedback - thank you.

abalone · on Jan 16, 2019

Tangential question: This guy's blog has fantastic content but I don't see an RSS feed or any other way of subscribing (apart from a much broader Twitter feed). What's the best way to keep up?

grahamlyons · on Jan 17, 2019

Flattery will get you everywhere. It's a Jekyll blog so I'm sure RSS is just a plugin away. And then I'd better write some more content...

abalone · on Jan 17, 2019

Thank you! But seriously, do you rely exclusively on Twitter to get the word out that you've posted something?

I'm just so curious how people do this in the Modern Age. I am totally dependent on RSS readers to subscribe to things but I feel I am living in the past. I get that Medium tried to address the distribution problem, but not everyone posts on Medium. And I get that everyone posts things on Twitter, but I want to scream when people say just follow someone on Twitter because obviously there is a very, very high chance that I will miss if someone posted something new because Twitter is not meant to be read comprehensively.

make3 · on Jan 16, 2019

follow him on twitter : https://twitter.com/grahamlyons

vvanders · on Jan 16, 2019

Been recently putting together a homelab and it's done wonders to help make some of these more abstract things like routing tables and CIDR mentioned a lot more concrete.

voltagex_ · on Jan 16, 2019

Are you able to share any details? I started putting together a more complex setup and ended up flattening things out because I couldn't get routing between e.g. 10.0.0.1 and 10.1.0.1 working.

vvanders · on Jan 16, 2019

Yeah, totally depends on what you're using for a router/gateway/firewall.

I've got a mix of Ubiquiti gear and pfSense. Most of it was a matter of just setting up a static route (like in the article) where, when I'm on the 192.x.x.x (192.0.0.0/24) network and want to talk to 10.0.0.0/24, I'd put in the gateway (10.0.0.1) as the next hop. Without knowing more about your setup it's hard to say.
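
If it helps, the idea behind a static route can be sketched in a few lines of Python: the router picks the most specific matching entry (longest prefix) and forwards to its next hop. The route table here is a made-up example and not my actual config.

```python
import ipaddress

# Hypothetical route table: (destination, next hop). "local" means on-link.
routes = [
    (ipaddress.ip_network("192.168.1.0/24"), "local"),
    (ipaddress.ip_network("10.0.0.0/24"), "10.0.0.1"),
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.1.1"),
]

def next_hop(destination: str) -> str:
    """Pick the matching route with the longest prefix (most specific)."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if addr in net]
    net, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop

print(next_hop("10.0.0.42"))
print(next_hop("192.168.1.7"))
print(next_hop("8.8.8.8"))
```

Without the 10.0.0.0/24 entry, traffic for that network would just follow the default route, which is usually why cross-network routing "doesn't work" until the static route goes in.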

r/homelab is also a pretty decent place with a lot of helpful people interested in networking and homelabs.

voltagex_ · on Jan 17, 2019

I've got an all-Ubiquiti setup (and once you start digging into the forums / need IPv6, boy is it unimpressive)

It seems like going with the Ubiquiti USG instead of the EdgeRouter or a pfsense box was a big mistake.

I'm heavily space/heat/power constrained though.

vvanders · on Jan 17, 2019

Same USG here. If you're on two different networks in the Unifi setup you'll also probably need a firewall rule between them. In my case the 10.0.0.0/24 was on pfSense so the Unifi router just passed it along and didn't do anything else.

gravypod · on Jan 16, 2019

Would be really good to go into managed VPNs and VPC peering. These are some of the amazing things that the VPCs provide you that took me a while to figure out.

devonkim · on Jan 17, 2019

The part that irks me is that if you’re doing any VPC design that’s going to even potentially include peering you need to carefully understand the limitations first. This means that within the same region you can refer to security groups in rules as if they’re in the same VPC, but you can’t do that if you’re peering across regions. Then add in some DNS restrictions (like not being able to directly resolve a peer VPC’s entries, somewhat solvable by use of VPC private zones to serve as DNS across regions) and it can be real awkward. Then there’s overlapping VPC CIDR issues (VPC Transit Gateways can only sorta help this)

The primary caveats beyond basic networks that impact designs are that multicasting is not enabled by the network layers but at the network interface (ENI) layer, and that you need to carefully look at how security groups really work (they’re attached to an ENI fundamentally, which is how you can route between networks with a single instance as long as it’s within the same AZ).

All of this, I’ve found, was completely disregarded / unknown by almost every company outside the F500 or high-end tech start-ups when they first started with AWS, and I’ve spent a lot of my career having to migrate production environments between VPCs so that we can get enough room to grow adequately. Making subnets as small as possible is not what you should be doing in AWS, folks. In fact, making them really small means you decided to put in a fair bit of effort without stopping to read the documentation in earnest for a couple of hours. And using a default VPC CIDR repeatedly from the console is a pretty grand way to make sure you can never let two VPCs communicate with each other via anything other than a third intermediate VPC that you’ll have to migrate to eventually.
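
A quick way to catch the overlapping-CIDR problem before it bites is to check your planned ranges against each other. Here is a minimal Python sketch with the standard `ipaddress` module; the VPC names and ranges are hypothetical, with two of them deliberately reusing the same range.

```python
import ipaddress
from itertools import combinations

# Hypothetical VPCs; two of them reuse the same default-looking range.
vpcs = {
    "app": ipaddress.ip_network("172.31.0.0/16"),
    "data": ipaddress.ip_network("172.31.0.0/16"),
    "shared": ipaddress.ip_network("10.20.0.0/16"),
}

def overlapping_pairs(networks):
    """Return sorted name pairs whose CIDR ranges overlap."""
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(sorted(networks.items()), 2)
        if net_a.overlaps(net_b)
    ]

print(overlapping_pairs(vpcs))
```

Any pair this reports cannot be peered directly, so it is worth running something like it over the whole address plan before the first VPC is created.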

Some of the overly-cautious networking approaches I’ve seen include making a VPC for every single application / service, using a NACL for every application (multiplied by every AZ used to isolate each subnet and cutting off cross-AZ routing thereby, of course), creating your own NAT instance that doesn’t do anything better than a NAT gateway, NAT gateways in every AZ (for a whole $1 of traffic / mo each). The story of problems in AWS infrastructure is the same - trying to plan too far ahead for the wrong things and not realizing the limitations of the right things that are not flexible anymore. This is much more common when companies hire traditionally experienced network engineers that have just a little too much confidence.

jugg1es · on Jan 17, 2019

Planning your network is something everyone has to do whether or not they are using AWS. I think it's unreasonable to expect to just throw stuff in AWS without thinking about it and then complain when something comes up unexpectedly.

devonkim · on Jan 17, 2019

The issue is that lots of folks get started on AWS (or any cloud provider) and, because everything’s built around getting developers in a hurry to push out some code, this planning step gets less and less time. Sure, it’s not a big deal for the usual start-up that fizzles in a couple of years, but once it’s going and there are actual users, the system is set up such that there will be a forced outage of some sort when a couple of hours' investment would have prevented a lot of headaches.

The common pivot for a company is to go from a b2c company to b2b and that means regulations. That means you can’t do cowboy infrastructure setup like people set up their home network (in fact, most home networks are more secure than what most SaaS devs do as a rule, sans router firmware exploits).

mvanbaak · on Jan 16, 2019

For the public subnets where the NAT gateways are, you can use 1 route table for all public subnets together.

Besides that: nice article

grahamlyons · on Jan 17, 2019

Would the default route table do that job? Subnets would be associated with that if they weren't explicitly assigned to another route table, right?

mvanbaak · on Jan 18, 2019

Yes. But I prefer to create my own. Create a route table, call it “public” using a tag, attach the internet gateway and associate all public subnets, add a default route that points to the internet gateway and you’re done.
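Those steps could be sketched as CloudFormation resources, built here as a Python dict so it can be dumped to JSON. The resource types and property names are standard CloudFormation; every logical ID (`Vpc`, `Igw`, `IgwAttachment`, `PublicSubnetA`, and so on) is a hypothetical name for something assumed to be defined elsewhere in the template.

```python
import json


def public_route_table(vpc_ref, igw_ref, igw_attachment_ref, public_subnet_refs):
    """Build resources for one shared "public" route table.

    All *_ref arguments are hypothetical logical IDs of resources that are
    assumed to exist elsewhere in the same template.
    """
    resources = {
        "PublicRouteTable": {
            "Type": "AWS::EC2::RouteTable",
            "Properties": {
                "VpcId": {"Ref": vpc_ref},
                "Tags": [{"Key": "Name", "Value": "public"}],
            },
        },
        "PublicDefaultRoute": {
            "Type": "AWS::EC2::Route",
            # The route can only be created once the gateway is attached.
            "DependsOn": igw_attachment_ref,
            "Properties": {
                "RouteTableId": {"Ref": "PublicRouteTable"},
                "DestinationCidrBlock": "0.0.0.0/0",
                "GatewayId": {"Ref": igw_ref},
            },
        },
    }
    # One association per public subnet, all pointing at the same table.
    for subnet_ref in public_subnet_refs:
        resources[f"{subnet_ref}RouteTableAssociation"] = {
            "Type": "AWS::EC2::SubnetRouteTableAssociation",
            "Properties": {
                "SubnetId": {"Ref": subnet_ref},
                "RouteTableId": {"Ref": "PublicRouteTable"},
            },
        }
    return resources


public_rt = public_route_table(
    "Vpc", "Igw", "IgwAttachment", ["PublicSubnetA", "PublicSubnetB"]
)
print(json.dumps({"Resources": public_rt}, indent=2))
```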

simonebrunozzi · on Jan 17, 2019

The title says "everything you need to know about networking on AWS". I wish it were this simple.

The article is well written, but it simply represents maybe 1% of what you need to know here. I would have called it "A simple introduction to networking on AWS".

grahamlyons · on Jan 17, 2019

I'd contest that this is everything that you need to know. Or perhaps more accurately, everything I need to know.

It's certainly not intended to be exhaustive and I am definitely not a network engineer but I think you could operate at a reasonable scale within a single well laid out VPC. Of course there'll be a point for peering etc. but you might not need that.

rkangel · on Jan 17, 2019

Does anyone know of any similar resources for Google Cloud?

Gelob · on Jan 16, 2019

AWS security groups and ACLs are the most worthless things. You can't treat them like a real firewall. You end up just allowing anything outbound or inbound. They don't let you be detailed enough.

kondro · on Jan 17, 2019

100% disagree with this. There's no reason to not use whitelist first for almost everything… especially inbound.

Even if you do want to make your internal comms life easier, you can use security-group-based rules that do any/any between the members of the group itself, effectively constructing VLANs easily between services.
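A minimal sketch of that self-referencing rule as a CloudFormation resource, again built as a Python dict. `AWS::EC2::SecurityGroupIngress` and its properties are standard CloudFormation; the logical ID `ServiceSg` is a hypothetical security group assumed to be defined elsewhere.

```python
import json


def intra_group_any_any(group_ref):
    """Ingress rule allowing all traffic between members of one group.

    group_ref is a hypothetical logical ID of an AWS::EC2::SecurityGroup.
    """
    group_id = {"Fn::GetAtt": [group_ref, "GroupId"]}
    return {
        "Type": "AWS::EC2::SecurityGroupIngress",
        "Properties": {
            "GroupId": group_id,                # the group the rule lives on
            "IpProtocol": "-1",                 # -1 means all protocols and ports
            "SourceSecurityGroupId": group_id,  # source is the same group
        },
    }


self_rule = intra_group_any_any("ServiceSg")
print(json.dumps(self_rule, indent=2))
```

Anything carrying the group can then talk to anything else carrying it, without listing a single IP.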

AzMoo_ · on Jan 16, 2019

Why are you just allowing anything outbound or inbound? You can specify Allow/Deny on any combination of source subnet, dest subnet, source port, dest port for starters. That gets you a pretty comprehensive ability to lock down a VPC on its own.

Gelob · on Jan 16, 2019

Say I want to allow outbound http/https to 10 different IPs. I can't do that in 1 rule like a traditional firewall.

justnoise · on Jan 17, 2019

Just in case those IPs are within your AWS account: you can apply a single security group to those machines and then use that security group as the destination in the outbound rule.

If they're outside your account then, you're right, that's a shortcoming in AWS (Azure and GCP both allow multiple destinations in a single rule).

Gelob · on Jan 17, 2019

Yes, coming from outside AWS, you're fucked.

iheartpotatoes · on Jan 17, 2019

Dumb question: if the IPs are coming from Route53 for web addresses, why don't you just point them as aliases to the same load balancer? Done and done, right?

AzMoo_ · on Jan 17, 2019

It's about 2 seconds work in CloudFormation though.
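Roughly what that might look like: a small Python generator that emits one `AWS::EC2::SecurityGroupEgress` resource per IP and port. The resource type and properties are standard CloudFormation; the logical ID `AppSg` and the `203.0.113.x` addresses (a documentation range) are placeholders.

```python
import ipaddress


def egress_rules(group_ref, ips, ports=(80, 443)):
    """Build one TCP egress rule per (IP, port) pair.

    group_ref is a hypothetical logical ID of an existing security group.
    """
    rules = {}
    for index, ip in enumerate(ips):
        # A bare address becomes a /32 network, e.g. "203.0.113.1/32".
        cidr = str(ipaddress.ip_network(ip))
        for port in ports:
            rules[f"Egress{index}Port{port}"] = {
                "Type": "AWS::EC2::SecurityGroupEgress",
                "Properties": {
                    "GroupId": {"Fn::GetAtt": [group_ref, "GroupId"]},
                    "IpProtocol": "tcp",
                    "FromPort": port,
                    "ToPort": port,
                    "CidrIp": cidr,
                },
            }
    return rules


# Placeholder destinations from the 203.0.113.0/24 documentation range.
destinations = [f"203.0.113.{n}" for n in range(1, 11)]
rules = egress_rules("AppSg", destinations)
print(len(rules))
```

It is quick to generate, but it still ends up as one rule per IP and port rather than a single rule, so the original complaint stands.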

iheartpotatoes · on Jan 17, 2019

Ugh! That's the most complex explanation of AWS I've ever seen.

He just described a NETWORK, not AWS.

AWS has renamed lots of things, but all the scary text configs that used to be the domain of wizened sysadmins have been replaced with very simple single-page-app GUI controls. Like routers and gateways: those terms are largely gone from the AWS vocabulary.

No need to get into subnets and route tables I think.

The majority of clients I've worked with use AWS for web hosting with an ELB load balancer (the most important part), an EC2 instance policy & image (for handling traffic fluctuations), an RDS (database), an S3 and Route53 (external DNS entries)

Point the load balancer to the outside world and then let it spin up instances. That's the most common model I've encountered.

It's almost cartoonishly simple compared to what the OP wrote here. Almost. Having an understanding of network architecture helps, but not THAT much.

jrockway · on Jan 17, 2019

Having just gone through the process of using EKS, which requires a VPC, I think the article is quite applicable to anyone doing anything of average complexity. I found myself quite often wondering things like "do I have to run an Internet Gateway in every availability zone?" (no) and "do I attach my NAT Gateway to the public subnet or the private subnet in each AZ?" (public, and then add a route to the gateway in the private subnet's route table).
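For anyone wondering the same thing, the per-AZ NAT wiring might be sketched like this as CloudFormation resources built in Python. Resource types and properties are standard CloudFormation; the logical IDs (`PublicSubnetA`, `PrivateRouteTableA`) and the `A` suffix are hypothetical and assumed to be defined elsewhere.

```python
import json


def nat_for_az(az_suffix, public_subnet_ref, private_route_table_ref):
    """Resources for one AZ: EIP, NAT gateway, and the private default route.

    The *_ref arguments are hypothetical logical IDs from the same template.
    """
    eip_id = f"NatEip{az_suffix}"
    nat_id = f"NatGateway{az_suffix}"
    return {
        # NAT gateways need an Elastic IP allocation.
        eip_id: {"Type": "AWS::EC2::EIP", "Properties": {"Domain": "vpc"}},
        # The gateway itself sits in the PUBLIC subnet.
        nat_id: {
            "Type": "AWS::EC2::NatGateway",
            "Properties": {
                "SubnetId": {"Ref": public_subnet_ref},
                "AllocationId": {"Fn::GetAtt": [eip_id, "AllocationId"]},
            },
        },
        # The PRIVATE subnet's route table sends default traffic to it.
        f"PrivateDefaultRoute{az_suffix}": {
            "Type": "AWS::EC2::Route",
            "Properties": {
                "RouteTableId": {"Ref": private_route_table_ref},
                "DestinationCidrBlock": "0.0.0.0/0",
                "NatGatewayId": {"Ref": nat_id},
            },
        },
    }


nat_a = nat_for_az("A", "PublicSubnetA", "PrivateRouteTableA")
print(json.dumps(nat_a, indent=2))
```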

Amazon does not clearly document any of this. (I think if you read enough you'll eventually figure it out, but experimentation was the most straightforward procedure here).

As for just using a single EC2 instance and RDS... that is something you can do, but not everyone's workload is so simple that they can run it on one machine. And not everyone can afford to be down simply because one AZ is down. Hence, multi-AZ VPC setups.

sl1ck731 · on Jan 17, 2019

As someone who has worked full time putting companies into AWS for almost 4 years as a cloud consultant, I've experienced only a couple occasions where your description is accurate and only for a small portion of the customer's portfolio.

Even when beanstalk is used, all of the stuff mentioned in the OP and more are usually required.

Simple weekend projects maybe, but only small enterprise services work in your model.

baseballMan · on Jan 17, 2019

It depends on how complex the environment you're working in is. If you're at a large enterprise that wants to build a platform capable of scaling to thousands of apps, you most definitely do need to care about everything written here, plus a lot more networking specific things not mentioned.

CloudNetworking · on Jan 17, 2019

The fact that you don't need to know it because you only work with very small customers doesn't mean other folks don't need it :)

OP's article is great IMHO.