Just have a segregated network, and let the VPC/DHCP do all the hard stuff.
Have your hosts on the default VLAN (or interface, if you're in the cloud) with its own subnet (subnets should only exist in one VLAN). Then, if you are in cloud land, add a second network adaptor on a different subnet. If you are running real steel, you can use a bonded network adaptor with multiple VLANs on the same interface. (A VLAN isn't as critical in a VPC, because there are other tools to impose network segregation.)
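On real steel, the bonded-adaptor-with-VLANs part is only a few ip(8) commands. A minimal sketch, assuming an existing bond0 and made-up VLAN IDs and subnets:

```shell
# Two VLAN sub-interfaces on the same bonded adaptor, one subnet each.
ip link add link bond0 name bond0.10 type vlan id 10
ip link add link bond0 name bond0.20 type vlan id 20

ip addr add 192.0.2.1/24 dev bond0.10     # subnet for VLAN 10
ip addr add 198.51.100.1/24 dev bond0.20  # subnet for VLAN 20

ip link set bond0.10 up
ip link set bond0.20 up
```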
Then use macvtap or macvlan (or whichever mechanism gives each container its own MAC address) to give each container its own IP. This means your container is visible on that entire subnet, both from inside the host and outside it.
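As a concrete sketch with Docker's macvlan driver (the interface name, subnet, and addresses here are placeholders; adjust to your LAN):

```shell
# macvlan network bound to the host's physical interface. The subnet and
# gateway must match the real LAN, so containers appear on it directly.
# The LAN's DHCP server won't know about these addresses, so carve out
# a range (--ip-range) or assign addresses statically.
docker network create -d macvlan \
  --subnet=192.0.2.0/24 \
  --gateway=192.0.2.1 \
  --ip-range=192.0.2.128/25 \
  -o parent=eth0 lan

# This container gets its own MAC and IP, reachable from the whole subnet.
docker run --rm --network lan --ip 192.0.2.130 alpine ip addr
```

One known macvlan quirk: the host itself can't reach its own macvlan containers directly unless you add a macvlan sub-interface on the host side.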
There is no need to faff with routing; it comes for free with your VPC/network or similar. Each container automatically has a hostname, IP, and route. It will also be fast. As a bonus, it can all be created up front using CloudFormation or Terraform.
You can have multiple adaptors on a host, so you can separate different classes of container.
Look, the more networking that you can offload to the actual network the better.
If you are ever re-creating DHCP/routing/DNS in your project, you need to take a step back and think hard about how you got there.
70% of the networking modes in k8s are batshit insane. A large number are basically attempts at vendor lock-in, or worse, someone's experiment that's got out of hand. I know networking has always been really poor in docker land, but there are ways to beat the stupid out of it.
The golden rule is this:
Always. Avoid. Network. Overlays.
I have bare metal servers tied together with L3 routing via Free Range Routing (FRR) running BGP/VXLAN. It Just Works.
No hard-coded VLANs between physical machines, just point-to-point L3 links. VLANs stretched between machines are tortuous, being a Layer 2 protocol, given spanning tree and all of its slow-to-converge madness.
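A minimal sketch of that setup in FRR's configuration, assuming BGP unnumbered over two point-to-point interfaces (the ASN, router-id, and interface names swp1/swp2 are made up):

```
! /etc/frr/frr.conf -- L3-only fabric: BGP over point-to-point links
router bgp 65001
 bgp router-id 10.0.0.1
 ! "Unnumbered" peering over interface link-local addresses,
 ! so there are no per-link subnets or VLANs to manage.
 neighbor swp1 interface remote-as external
 neighbor swp2 interface remote-as external
 !
 address-family l2vpn evpn
  neighbor swp1 activate
  neighbor swp2 activate
  ! Advertise local VXLAN VNIs via EVPN instead of flood-and-learn.
  advertise-all-vni
 exit-address-family
```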
Therefore a different Golden Rule:
Always. Overlay. Your. Network.
Leave a note if you'd like more details.
> Always. Avoid. Network. Overlays.
What do you think VPC is?
But unless you have an actual reason, why put another layer on top, especially given the performance and tooling hit?
Edit: also, the OSI layer model was specified in the eighties and isn't all that accurate at describing how our networks actually work in 2019.
A subnet should only be in one vlan, but there are networks where there is more than one subnet in a vlan.
Whether that is appropriate or not, that would be a different topic.
A VLAN will isolate MACs so that only the adaptors in that VLAN can see each other. Granted, there isn't really a concept of a netmask-based subnet, but that's because you don't really have control over a physical address.
Now, you can have an adaptor in more than one VLAN, which is the point of them. As I said, it's not a perfect analogy, but they exist to achieve different things based on different semantics.
We encrypt 100% of our machine-to-machine traffic at the TCP level. There's a lot of shuffling of certs around to get some webapp to talk to postgres, then have that webapp serve https to haproxy, etc.
It'd be awesome if there was a way your cloud servers could just talk to each other using WireGuard by default. We looked at setting it up, but it'd need to be automated somehow for anything above a handful of systems :/
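For what it's worth, the per-host WireGuard config itself is tiny; the automation pain is really just distributing public keys and endpoints. A sketch for one host (keys, addresses, and ports are placeholders):

```
# /etc/wireguard/wg0.conf on host A -- bring up with: wg-quick up wg0
[Interface]
Address = 10.10.0.1/24          # host A's address on the mesh
PrivateKey = <host-A-private-key>
ListenPort = 51820

[Peer]
# host B
PublicKey = <host-B-public-key>
AllowedIPs = 10.10.0.2/32       # route only B's mesh address here
Endpoint = 198.51.100.2:51820   # B's real (cloud) address
```

Every host needs a [Peer] block for every other host, which is exactly the n-squared key-distribution problem that makes automation necessary past a handful of systems.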
I don't understand why you'd want to do this?
I use wireguard to join machines on disparate networks into one.
However, doing it inside the same VPC I just don't get. If you don't trust your VPC, surely you need to be moving off the cloud?
Further, once traffic leaves your VM it hits a shared NIC and network cables, so you start to worry about physical-layer attacks.
Amazon specifically states they handle these issues, and indeed they likely do, but how do you know? If you're able to easily encrypt by using something like istio, then why not?
"Packet sniffing by other tenants: It is not possible for a virtual instance running in promiscuous mode to receive or "sniff" traffic that is intended for a different virtual instance. While customers can place their interfaces into promiscuous mode, the hypervisor will not deliver any traffic to them that is not addressed to them. This includes two virtual instances that are owned by the same customer, even if they are located on the same physical host. Attacks such as ARP cache poisoning do not work within EC2. While Amazon EC2 does provide ample protection against one customer inadvertently or maliciously attempting to view another's data, as a standard practice customers should encrypt sensitive traffic."
I merely pointed it out because the OP was talking about encryption done at the TCP layer. :)
In my experience, IPv4 has the strong advantage of being familiar and well-supported, which means that when (not if) your network infrastructure starts to act up, it's easier to figure out what's going on. IPv6 works great if you have robust, reliable multicast support on all your devices and nothing ever goes wrong.
In IPv4 you're going to need RFC1918 addresses, and then you're going to have to make sure that _your_ RFC1918 addresses don't conflict with any _other_ RFC1918 addresses that inevitably absolutely everything else is using or else you'll get hard-to-debug confusion. No need in IPv6, you should use globally unique addresses everywhere, there are plenty and you will not run out.
Everybody who has ever used a single byte to store a value they were convinced wouldn't need to be more than a few dozen, and then had it blow up because somebody figured 300 ought to fit and it doesn't, already knows in their heart that they shouldn't be using IPv4 in 2019.
I'm hesitant to use IPv6 because it is not merely IPv4 + more addresses, it's IPv4 + more addresses + a very clever design that hides the L2 vs. L3 distinction by relying heavily on multicast groups + a replacement for ARP + a replacement for DHCP + etc. etc. etc. I know I shouldn't be using IPv4 in 2019, but I don't have a better option. I'm not excited about clever systems, hiding, the assumption that multicast works reliably, losing the last few decades of monitoring and debugging tools, happy eyeballs, etc., and I'm not willing to subject my users to the resulting outages simply because it'll save me the headache of thinking about numbering.
ZeroTier supports a mode where it emulates NDP for v6 and works without having to do multicast or broadcast at all. It does this by embedding its cryptographically derived VL1 addresses into v6 addresses.
I could run IPv6 on the inside and IPv4 on the outside, sure. I worry this is going to trigger more edge cases than either running IPv6 the way it was intended or IPv4 the way it was intended.
Huh? Are you assuming large flat L2 networks addressed with IPv6?
IPv6 works great at scale, just route everything everywhere, stick with unicast & anycast, and don't roll large L2 domains.
Multicast is entirely unnecessary aside from the small amount needed for ND/RA between host and ToR.
And, for operations, a routed IPv6 network without NAT, VXLAN, or VLANs spanned across switches is much easier to troubleshoot and generally has fewer moving parts to fail.
I will grant that IPv6 + ULAs + BGP + flat networks is easier to think about than IPv4 + 10.0.i++.0/24 + BGP + flat network because you have basically unlimited ULAs, but "You have to pick a unique 10.0.i++.0 for each machine, and that's annoying" doesn't seem like the primary thing the article is trying to forget. If you can do a hierarchical routed IPv6 network, you can almost certainly do it with IPv4, too.
Quagga is available in the default package managers of most distros, so it's a good place to start.
And they have their own apt repo at deb.frrouting.org for Debian-based distros.
Microsoft runs FRR on SONiC.
Vyos runs FRR.
6wind runs FRR.
Cumulus Networks runs FRR.
Juniper runs FRR in certain products.
VMware runs FRR.
Broadcom is integrating it.
I don't think you are very familiar with the scope of changes that have gone in since Quagga. Not to detract from BIRD - which is a great, solid BGP implementation - but it is disingenuous to say FRR isn't used in production.
Would you compare two TCP implementations using those stats as well?
For something simple like this post, using Quagga is completely fine and probably much better than using the latest Swiss Army knife.
The Quagga source repo's certificate expired over 6 months ago. Looking at the Bugzilla report (also with an expired certificate), there are 14 blocker, 49 critical, and 69 other issues that have not been resolved.
So no, I'd agree with the parent comment that using a project as seemingly dead as Quagga for something as critical as BGP routing is putting yourself on shaky ground at the very least.
It’s like someone doing a demo on some text processing where they use grep and the top comment is some jerk saying that map-reduce would be better because some new large systems use it and it’s being actively developed.
Not entirely correct.
Linux has had unicast VXLAN for quite some time.
Flannel is doing unicast and works pretty much anywhere.
See "Unicast with dynamic L3 entries" section:
Historically, VXLAN was a multicast thing, but not anymore.
Flannel (popular among the container networking solutions) maintains its state in etcd by watching Kubernetes resources, then programs the Linux data plane with static unicast entries for its neighbors.
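At the ip(8)/bridge(8) level, that static-unicast programming boils down to something like this (the VNI, device names, MAC, and addresses are placeholders):

```shell
# VXLAN device with no multicast group and learning disabled; a control
# plane fills in the forwarding database instead.
ip link add vxlan0 type vxlan id 42 dev eth0 dstport 4789 nolearning
ip link set vxlan0 up

# Static unicast FDB entry: frames for this remote MAC are encapsulated
# and sent straight to that neighbor's VTEP address -- no multicast.
bridge fdb append 00:11:22:33:44:55 dev vxlan0 dst 192.0.2.20
```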