
Show HN: Cross Cloud Latency – A tool that measures latency from AWS to GCP - rmanyari
https://cclatency.com/
======
inopinatus
This is using "ping", i.e. ICMP echoes. That's easy to implement, but
problematic in more ways than you might at first think. Firstly, some network
elements may be configured to give ICMP echo request/response packets a
different delivery priority, which could invalidate results for your probably-
not-ping-based application. Secondly, a virtual machine may have more jitter
in scheduling the transmission and the response. Thirdly, you don't get a
picture of how congested the route is - is it good for a gigabit blast, or
only a trickle of data?
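For the first point, a cheap alternative that avoids ICMP entirely is to time a TCP three-way handshake, since that exercises the same transport most applications actually use. A minimal sketch (the target host/port here is a placeholder, not anything cclatency.com uses):

```python
import socket
import time

def tcp_rtt_ms(host, port=443, timeout=5.0):
    """Measure the TCP handshake time to host:port in milliseconds.

    Unlike ICMP echo, this follows the same forwarding treatment as
    ordinary TCP traffic, so it is less likely to be deprioritized
    by intermediate network elements.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.monotonic() - start
    return elapsed * 1000.0

# Substitute a host in the target region:
# print(f"{tcp_rtt_ms('example.com'):.1f} ms")
```

This still says nothing about available bandwidth on the path, of course; it only swaps ICMP for TCP.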

But there's a deeper problem: ping measures round-trip time.

Interesting fact about IP routing: quite often, route out != route back.
There's no requirement that paths in the Internet be symmetric, and very
often, they aren't. What's more, congestion is often one-way.

So let's say you have a 92ms RTT between two sites; you can't know, from the
ping alone, if that's an even 46ms each way, or 53ms one way and 39ms the
other, or perhaps even 83ms + 9ms. If your application is sensitive enough to
latency that this tool might be interesting, then it's quite possible that
such asymmetric results are also relevant.

(obviously the speed of light can give you lower bounds on the split, if you
have knowledge of DC locations).
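To make that lower bound concrete: light in fiber travels at roughly 2/3 of c, about 200,000 km/s, and real paths are longer than the great-circle distance, so the true floor is higher still. The 6,000 km figure below is purely illustrative:

```python
# Rough lower bound on one-way latency from datacenter distance.
C_FIBER_KM_PER_MS = 200.0  # ~200,000 km/s, expressed per millisecond

def min_one_way_ms(distance_km):
    """Speed-of-light-in-fiber floor for one-way delay."""
    return distance_km / C_FIBER_KM_PER_MS

# Illustrative: ~6,000 km between a US east-coast and a western-European
# DC gives a ~30 ms one-way floor, so a 92 ms RTT between such sites
# could not split as 83 ms + 9 ms.
print(round(min_one_way_ms(6000), 1))  # 30.0
```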

There have been substantial projects to accurately measure one-way latency.
For example, the RIPE Test Traffic project from the RIPE NCC
([https://www.ripe.net/analyse/archived-projects/ttm](https://www.ripe.net/analyse/archived-projects/ttm)) was a
large-scale and long-running observatory that kept more statistics besides,
such as packet loss. Sadly the successor to this service appears not to
measure one-way latency. For precision, it required both an appliance and a
GPS antenna to be installed, so major cloud providers were unlikely to
cooperate.
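The core of one-way measurement is simple; the hard part is exactly what made TTM need GPS. Any clock offset between sender and receiver shows up 1:1 as error in the result, which this sketch glosses over entirely:

```python
import struct
import time

# One-way delay sketch: sender stamps its wall-clock send time into
# the probe; receiver subtracts. Only meaningful if the two hosts'
# clocks are tightly synchronized (RIPE TTM used GPS for this).

def make_probe():
    """Sender side: pack the send time into an 8-byte payload."""
    return struct.pack("!d", time.time())

def one_way_delay_ms(payload):
    """Receiver side: recover the send time and subtract."""
    (sent,) = struct.unpack("!d", payload)
    return (time.time() - sent) * 1000.0
```

NTP on cloud VMs typically syncs to within a few milliseconds at best, which is the same order as the asymmetry you'd want to detect, hence the appliances.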

~~~
jo909
> Secondly, a virtual machine may have more jitter in scheduling the
> transmission and the response.

I have very recently been through exactly that with a customer who had
considerably increased network latency on some VMs in their VMware cluster
when the CPU was under load. From the very beginning I was pointing at the
hypervisor's scheduler, and ended up measuring the time when the interrupt
handler for the network card was active to receive the packet (which no Linux
or application setting could really influence). It took some convincing and
arguing, but they found the magic setting on the hypervisor that made the
problem go away (aptly named Latency Sensitivity).
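You can get a crude guest-side proxy for this without hypervisor access by measuring how late timed wakeups actually arrive; on an oversubscribed or throttled VM the overshoot distribution grows a long tail. A sketch (this measures timer/scheduling delay generally, not the NIC interrupt path specifically):

```python
import time

def sleep_overshoot_ms(interval_s=0.001, samples=200):
    """Request short sleeps and record how late each wakeup is.

    Returns (median, worst) overshoot in milliseconds; a long tail
    suggests vCPU scheduling delay on the host.
    """
    overshoots = []
    for _ in range(samples):
        start = time.monotonic()
        time.sleep(interval_s)
        overshoots.append((time.monotonic() - start - interval_s) * 1000.0)
    overshoots.sort()
    return overshoots[len(overshoots) // 2], overshoots[-1]

# median, worst = sleep_overshoot_ms()
# print(f"median overshoot {median:.3f} ms, worst {worst:.3f} ms")
```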

I'm sure both AWS and GCP teams have that under very tight control, but
especially the cheapest instances with the smallest and burstable CPU budgets
that you would run for such a project are probably running with the noisiest
neighbors on the most oversubscribed hardware.

~~~
sofaofthedamned
Exactly. I've seen this on a particular cloud provider too: although the VM
was idle, it was discarding 20%+ of inbound packets. That only seems to
happen with providers that overcommit their VMs or don't configure their NICs
correctly.

------
DaSilentStorm
That's pretty amazing, nice work!

The company I work for has a similar tool (no fancy API yet, though) which
shows the latency from some AWS regions to "the world" via different transit
providers.

You can check it out at [https://latency-test.datapath.io/](https://latency-test.datapath.io/).

The reason we only have three AWS regions at the moment is that we're using
real hardware to do the measurements at the network level.

------
tpetry
It's interesting that in the example, the latency from GCP to AWS is
consistently worse than from AWS to GCP. Does anyone know the reason? The
biggest argument for GCP is usually its really great network.

~~~
jlgaddis
Asymmetric routing, probably. Can't know for sure without bidirectional
traceroutes.

------
good_intentions
Website down as of 2017-Jul-10?

