With respect to AWS: in the historical "enhanced networking" case, Amazon dedicated hardware by offering SR-IOV-capable NICs. SR-IOV is a well-understood and effective technique for approaching bare-metal performance in virtualized environments, but it tends to lock you into a particular vendor, if not a specific model, of hardware. I gather ENA does something a bit different, but I don't know the details.
In Google's case, we dedicate hardware to the Andromeda switch in the form of processor cores (the "SDN" block in the linked post). This allows us to be flexible in terms of NIC hardware while presenting a uniform virtual device to guests, in addition to simplifying universal rollout of new networking features to all zones/instance types.
Both approaches have tradeoffs, although I think that even with ENA, AWS sees typical round-trip times of ~70µs, while GCE gets down to ~40µs. Amazon's largest VMs in some families do advertise higher bandwidth than GCE currently does.
(I was the tech lead for the hypervisor side of this launch — Jake, the post's author, leads the fast-path team for the Andromeda software switch)
[ec2-user@ip-10-0-1-56 ~]$ sudo ping -f 10.0.1.111
PING 10.0.1.111 (10.0.1.111) 56(84) bytes of data.
--- 10.0.1.111 ping statistics ---
115480 packets transmitted, 115479 received, 0% packet loss, time 5385ms
rtt min/avg/max/mdev = 0.037/0.039/0.226/0.008 ms, ipg/ewma 0.046/0.040 ms
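The ping summary above reports round-trip times in milliseconds, while the figures discussed in this thread are in microseconds. A minimal Python sketch for pulling the rtt line out of iputils-style ping output and converting it, assuming the exact `rtt min/avg/max/mdev = a/b/c/d ms` format shown above; `parse_ping_rtt_us` is a hypothetical helper, not part of ping or any library. Keep in mind that a flood ping times ICMP echo round trips, which is not the same path as the TCP request/response loop that netperf exercises.

```python
import re

# Matches the summary line printed by iputils ping, e.g.
# "rtt min/avg/max/mdev = 0.037/0.039/0.226/0.008 ms"
RTT_RE = re.compile(
    r"rtt min/avg/max/mdev = "
    r"([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_ping_rtt_us(summary: str) -> dict:
    """Return min/avg/max/mdev from a ping summary, converted to microseconds."""
    match = RTT_RE.search(summary)
    if match is None:
        raise ValueError("no rtt summary line found")
    keys = ("min", "avg", "max", "mdev")
    # ping reports milliseconds; multiply by 1000 to get microseconds.
    return {k: float(v) * 1000.0 for k, v in zip(keys, match.groups())}


if __name__ == "__main__":
    line = "rtt min/avg/max/mdev = 0.037/0.039/0.226/0.008 ms, ipg/ewma 0.046/0.040 ms"
    stats = parse_ping_rtt_us(line)
    print({k: round(v, 1) for k, v in stats.items()})
```

Applied to the run above, the average comes out at about 39µs, in line with the ~40µs figure quoted earlier.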
[ec2-user@ip-10-0-2-191 ~]$ netperf -v 2 -H 10.0.2.52 -t TCP_RR -l 30
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.2.52 () port 0 AF_INET : first burst 0
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec
20480  87380  1        1       30.00    21178.69
Alignment      Offset         RoundTrip  Trans      Throughput
Local  Remote  Local  Remote  Latency    Rate       10^6bits/s
Send   Recv    Send   Recv    usec/Tran  per sec    Outbound   Inbound
8      0       0      0       47.217     21178.689  0.169      0.169
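The derived columns in the TCP_RR output above follow from the transaction rate. Assuming one transaction in flight at a time (the output shows `first burst 0`), each transaction costs one full request/response round trip, so mean latency in µs is 1e6 divided by the rate, and throughput in Mbit/s is the rate times the payload bytes times 8, divided by 1e6. A minimal sketch; `tcp_rr_derived` is a hypothetical helper name, not a netperf API.

```python
def tcp_rr_derived(rate_per_sec: float, request_bytes: int, response_bytes: int):
    """Derive per-transaction latency and throughput from a TCP_RR transaction rate.

    Assumes one transaction in flight at a time (netperf's "first burst 0"),
    so each transaction costs one full request/response round trip.
    """
    latency_us = 1e6 / rate_per_sec
    # Outbound carries the request payload; inbound carries the response payload.
    outbound_mbps = rate_per_sec * request_bytes * 8 / 1e6
    inbound_mbps = rate_per_sec * response_bytes * 8 / 1e6
    return latency_us, outbound_mbps, inbound_mbps


if __name__ == "__main__":
    # Figures from the netperf run above: 1-byte request, 1-byte response.
    latency, out_mbps, in_mbps = tcp_rr_derived(21178.689, 1, 1)
    print(f"{latency:.3f} usec/Tran, {out_mbps:.3f} / {in_mbps:.3f} Mbit/s")
```

This reproduces the 47.217 usec/Tran and 0.169 Mbit/s columns. The same arithmetic turns the ~40µs and ~70µs round-trip figures quoted earlier into roughly 25,000 and 14,300 transactions per second for a single synchronous stream.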
It's certainly a nice improvement over what we see on the c4s. Is that using a placement group to ensure proximity? (I believe our tests do, but I'd have to double-check.) Our benchmarking philosophy is generally to aim for "default" numbers for GCP and "best" numbers for other providers; it keeps us honest about our "fresh out of the box" behavior.
Also, if we should be seeing better numbers on earlier instance types, I'd love to know what we might be doing wrong.