
How Good Is Amazon's Route 53 Latency Based Routing? - latch
http://openmymind.net/How-Good-Is-Amazons-Route-53-Latency-Based-Routing/
======
FlyingAvatar
This analysis is flawed in that it assumes that the host with the closest
geographical proximity should always provide the lowest network latency.

Very often it does, but especially for smaller countries that do not have the
same number of peering points that exist in the US or Europe, latency will
appear out of whack when compared to geographical proximity.

For this test to be valid, you would need to measure the latency to the remote
host from all four servers to determine if indeed the ideal route was chosen.
Even then the latency may vary depending on the time of day or other network
conditions.
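
A minimal sketch of that check, assuming you have already collected round-trip times from each of the four locations to a given client. The location names, the numbers and the `routed_to` values below are made up for illustration; they are not measurements from the article.

```python
def best_location(rtt_ms):
    """Return the location with the lowest measured round-trip time."""
    return min(rtt_ms, key=rtt_ms.get)


def routing_was_ideal(rtt_ms, routed_to, tolerance_ms=0.0):
    """True if the location DNS chose is within tolerance of the fastest one."""
    fastest = best_location(rtt_ms)
    return rtt_ms[routed_to] - rtt_ms[fastest] <= tolerance_ms


# Hypothetical measurements (ms) from each server location to one client.
sample = {
    "washington": 250.0,
    "los_angeles": 190.0,
    "france": 310.0,
    "singapore": 6.0,
}

print(best_location(sample))
print(routing_was_ideal(sample, routed_to="washington"))
print(routing_was_ideal(sample, routed_to="singapore"))
```

Because latency drifts with time of day, a non-zero `tolerance_ms` (or repeating the measurement several times) would probably be needed before calling any single answer "wrong".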

From a practical standpoint, a simpler and more accurate metric would be to
compare page load times in different geographies before and after enabling
Route 53.

~~~
nefasti
Also, a lot of people use OpenDNS, Google DNS, etc., which may not have a
server in their actual country, making those requests appear to come from
another location.

~~~
interurban
From a comment on the linked post:

"Both Google DNS and OpenDNS implement the (Google-proposed) ecdn-client-
subnet DNS extension, that basically forwards the higher part of the client IP
to the authoritative DNS for the specific purpose of latency-based routing...

I assume you're not using latency-based routing for static data (otherwise a
CDN would solve it). In that case, you're out of luck because it looks like
Route 53 doesn't support ecdn-client-subnet"
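
For anyone unfamiliar with the extension the quote refers to (standardized as EDNS Client Subnet in RFC 7871), here is a rough sketch of the option a resolver attaches to its query. The client address (a documentation-range IP) and the /24 prefix are assumptions for illustration, and the helper name is hypothetical. It only encodes the option bytes; it does not build or send a full DNS query.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS option code for Client Subnet (RFC 7871)


def encode_client_subnet(client_ip, prefix_len):
    """Encode a Client Subnet option as it would appear in a query's OPT record."""
    addr = ipaddress.ip_address(client_ip)
    family = 1 if addr.version == 4 else 2  # IANA address family numbers
    # Zero the host bits so only the network prefix leaves the resolver.
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    n_octets = (prefix_len + 7) // 8  # send only the octets covering the prefix
    addr_bytes = network.network_address.packed[:n_octets]
    # Fields: family, source prefix length, scope prefix length (0 in queries).
    payload = struct.pack("!HBB", family, prefix_len, 0) + addr_bytes
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload


option = encode_client_subnet("203.0.113.25", 24)
print(option.hex())
```

The point is that only the "higher part" of the address is forwarded: the last octet of the client IP never appears in the encoded option.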

~~~
dfc
A lot of people/daemons do not support edns-subnet. Sadly I recently found out
that pdns-recursor does not do edns-subnet.

I'd love to hear people's recommendations for caching DNS servers that support
edns-subnet.

------
Canausa
This issue is common among DNS providers and is difficult to solve. I formerly
worked for a major DNS company as a developer and this problem came up all the
time.

The issue is a problem with how DNS works at the basic level. As a reminder,
there are two types of DNS servers. Authoritative DNS servers serve requests
for domains assigned to them. Recursive DNS servers search out the answer to
a domain request by talking to root and authoritative DNS servers.

Now, a user trying to access a site could use a recursive DNS server that is
located close to them, but most people have the option to change their DNS
provider. In countries where ISP recursive DNS servers are slow or fail
regularly, many users opt to do so. When they do, they may pick a recursive
DNS server that is part of an anycast setup. (As a reminder, anycast makes
many servers in many locations look like one server to the rest of the
internet, and each request is routed to the closest one; e.g., many servers in
many data centers all respond to DNS requests sent to the IP address 8.8.8.8.)

Now, with Route 53's latency-based routing, Amazon may record that the
8.8.8.0/24 subnet sees the lowest latency to US-East, so anyone using Google's
recursive DNS service will be given the DNS record corresponding to US-East
regardless of the client's IP address (Amazon only sees a DNS request coming
from Google's IP address, not the client's IP address). The client then
connects to whichever IP address Amazon returned through Google.
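
A toy model of that behavior, to make the failure mode concrete. This is not how Route 53 is actually implemented (that is not public in this thread); the lookup table, its entries, the default region and the function names are all invented for illustration.

```python
import ipaddress

# Hypothetical table: which region is fastest for each resolver subnet.
# A real provider would build this from its own measurements.
FASTEST_REGION_BY_SUBNET = {
    ipaddress.ip_network("8.8.8.0/24"): "us-east-1",
    ipaddress.ip_network("203.0.113.0/24"): "ap-southeast-1",
}
DEFAULT_REGION = "us-east-1"


def pick_region(resolver_ip):
    """Choose a region from the address the authoritative server actually sees."""
    addr = ipaddress.ip_address(resolver_ip)
    for subnet, region in FASTEST_REGION_BY_SUBNET.items():
        if addr in subnet:
            return region
    return DEFAULT_REGION


def answer_query(client_ip, resolver_ip):
    """client_ip is deliberately unused: without an extension such as
    edns-subnet it never reaches the authoritative server."""
    return pick_region(resolver_ip)


# Same client (documentation-range address), two different recursive resolvers.
via_isp = answer_query("203.0.113.25", resolver_ip="203.0.113.53")
via_public = answer_query("203.0.113.25", resolver_ip="8.8.8.8")
print(via_isp, via_public)
```

In this sketch the same client gets two different answers depending only on which recursive resolver it uses, which is the coarseness described above.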

There are DNS products that can bust through this issue.

------
aristus
The intention is good, but GeoIP is laughably wrong. If it worked you wouldn't
need Route 53 in the first place.

Figure out how to measure real user latency[0] and compare to whatever routing
you had before.

[0] Hint: <http://carlos.bueno.org/2009/11/dismal-guide-to-dns.html>

<http://www.slideshare.net/aristus/doppler-12564220>

~~~
papsosouid
Did you miss the part where they tested and confirmed that the IPs in question
have better latency to the "right" datacenters, and are still being sent to
the wrong ones? GeoIP is not the problem, route53 is.

~~~
aristus
I mean that the basic idea of GeoIP routing, i.e., that physical location is a
good enough proxy for "internet distance", doesn't work, especially at the
edges.

The fact that the OP isn't hosted in EC2 but in "locations [that] map well to
AWS regions" throws the whole thing off. It doesn't matter if their servers
are across the street -- Amazon likely doesn't have good latency and routing
information for datacenters it does not own.

R53 exists to make DNS-based global server load balancing optimal for EC2
servers. They have little incentive (and limited ability) to make it right for
random datacenters. And OP's intuition is right -- balancing at the DNS level
means balancing at the DNS resolver level (leaving aside edns stuff), which
adds another layer of coarseness out of Amazon's control.

Either way, data wins arguments and I'm prepared to be wrong. To test this
more thoroughly, OP could spin up an instance in each EC2 region their real
systems "map well" to, proxy to the real systems, and see if the routing gets
better.

~~~
latch
OP here. Little harsh.. laughably wrong?

I did point out that we've seen this same behavior with EC2 instances (1)
before. So there's a tiny bit of evidence to suggest it might not be such a
joke.

Maybe this is what's lame, but I just don't think you are right about AWS vs
non-AWS locations. Amazon's stance on this does make it clear that you aren't
necessarily going to get the optimal route (2), true. But latency within
Singapore tends to be measured in single-digit ms (it's 710 km²). What you are
suggesting (if I understand correctly) is that a Singaporean's latency to
US-East-1 can be lower than his or her latency to AP-Southeast-1. I took some
of those IPs, pinged them from EC2 instances in those regions and it's the same
result.

I agree GeoIP is a best-guess effort, especially with public dns. But if
nothing else, the analysis merely confirms that.

(1) <https://forums.aws.amazon.com/message.jspa?messageID=384090>

(2)
[https://forums.aws.amazon.com/message.jspa?messageID=330523#...](https://forums.aws.amazon.com/message.jspa?messageID=330523#330523)

------
csears
The article lost credibility right off the bat when it said...

"We don't use EC2, but our 4 locations map well to AWS regions: Washington DC
to US-East-1, Los Angeles to US-West-1, France to EU-West-1 and Singapore to
AP-Southeast-1."

Latency based routing is not geo-load balancing. Just because their data
centers are in the same physical area, doesn't mean latency is comparable.
Route 53 monitors latency from around the internet to AWS data centers, not
the author's data center. Unless they are colo'd in the AWS facilities and
hanging off their border router, you can't expect good results from using
Route 53 like this.

~~~
latch
Washington datacenter is Softlayer. They peer with US-East-1. Latency tends to
be < 1ms.

SG datacenter is also Softlayer. While I don't think they peer directly with
AP-Southeast-1, latency in SG tends to be 2-8ms (you know, it's a super small
country with only a few carriers). From within SG, latency to any SG location
will be 50x-100x better than US-East.

The only location that's really off is Ireland -> France, which can explain
some cases, but not the others (most of them).

~~~
rahimnathwani
The latency B<->C doesn't inform you about the latency difference between
A<->B and A<->C. The traffic may take different routes on its way from A,
depending on peering arrangements and traffic conditions.

Real-life example: I live in China, and passing traffic through a VPN in
Singapore to which I have low latency can speed up (lower latency and better
bandwidth) connections to some hosts.

Ping times from my domestic ADSL connection to that host are about 90ms
because they have direct peering with PCCW. Other hosts in Singapore can be
200ms or more away, even though all of them are <10ms away from each other.

------
aioprisan
This analysis is very much wrong. So many people don't use their ISP's DNS
servers, especially in Europe/Asia. Physical proximity does not guarantee
improved latency, and many times latency varies to/from the same node, based
on traffic.

~~~
latch
How does that make the analysis wrong? The very point that you make is
mentioned as a likely reason. The rest of the conclusion is that we need to do
more testing to get any definitive answer.

------
pyvpx
proximity is one factor, not very high up, in a long list of latency factors.
GeoIP is a joke and I'd love to see more people learn how BGP works than
download another maxmind database.

------
mstrem
"Maybe the IP-To-Country Database Is Wrong?"

I recently wrote a post about AWS on the Netcraft website (I work there). I
had a chance to look in detail at Amazon's IP ranges, etc., and I can confirm
that you cannot always rely on normal IP-to-country databases with Amazon.
They often reassign IP ranges internally to different data centers than you
would expect.

------
ck2
A far less thorough test:

<http://www.dnscomparison.com/speed.html>

Here is a list of DNS providers to look into comparing:

[http://socialcompare.com/en/comparison/hosted-
authoritative-...](http://socialcompare.com/en/comparison/hosted-
authoritative-dns-providers-r5oyhz1)

------
iamthebest
I'd just like to point out that there are considerations other than latency
that you should be thinking about. Sure, you can optimize for latency, but
just wait and see what happens when you get hit with a DDoS attack...

~~~
pfg
They could remove the affected servers from Route 53 and point to some other
region until the DDoS stopped. True, they'd have to use a small TTL, and
clients using a badly behaved DNS resolver might receive the wrong IP for a
while, but depending on their SLA, that might be acceptable.

Would something like anycast allow them to fail over faster? I would imagine
route propagation isn't instantaneous either, so I'm curious how big the
difference would be.

------
magoon
Why not just spin up EC2 to prove your analysis?

------
JOnAgain
Love the analysis. Great stuff.

