
Why ALIAS-type DNS Records Break The Internet - alexbilbie
https://iwantmyname.com/blog/2014/01/why-alias-type-records-break-the-internet.html
======
colmmacc
Amazon Route 53's ALIAS implementation takes a different approach. We only
allow ALIASes to data that we know about authoritatively; so with Route 53 you
can ALIAS to an S3 website Bucket (which also lets you do HTTP redirects), a
CloudFront distribution (which can serve as a bridge to any arbitrary domain
you may care about), an ELB, or to other records in your zone (which lets you combine
routing policies in a compositional way).

The main benefits are;

    
    
       No dependency on a third party DNS service, we stand behind our 100% SLA
     
       DNS-based routing policies still work correctly
    
       No delay in responding to health check failures
    

but there's another (future, for us) ancillary benefit too;

    
    
       Compatibility with offline-signed DNSSEC
    

The big downside of course is that the record has to be in our system in order
to ALIAS to it. For HTTP services, using CloudFront is a pretty good
workaround, it can handle dynamic and static sites. If you merely want to
redirect from your apex to your www. domain, then S3 with a redirect works
great too.
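
As a rough illustration of what such an ALIAS looks like through the API: the record set is an A record that carries an `AliasTarget` instead of a TTL and literal IPs. The sketch below only builds the request body and makes no AWS call; the record name, target DNS name and target zone ID are placeholders, not real values.

```python
def alias_change_batch(record_name, target_dns_name, target_zone_id):
    """Build a Route 53 ChangeBatch that points record_name at an alias target."""
    return {
        "Comment": "apex alias (illustrative only)",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    # Alias record sets carry an AliasTarget in place of
                    # TTL and ResourceRecords.
                    "AliasTarget": {
                        "HostedZoneId": target_zone_id,  # zone ID of the target, placeholder here
                        "DNSName": target_dns_name,
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ],
    }


# Placeholder values, assumed for the example.
batch = alias_change_batch(
    "example.com.",
    "my-elb-123.us-east-1.elb.amazonaws.com.",
    "ZEXAMPLEZONEID",
)
```

With boto3, a body like this would be passed as `ChangeBatch=batch` to `change_resource_record_sets` on a `route53` client, together with the `HostedZoneId` of your own zone.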

We're also open to enabling ALIASing to other zones hosted in Route 53 on a
case by case basis. If you have a multi-tenant service and you're managing (or
willing to manage) a zone on Route 53 with something like [customer-or-
resource-identifier].yourservicename.com we can enable ALIASing to those
names. If you're interested, get in touch via the limit increase process;
[http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html](http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html).

Obvious question: Why don't we enable ALIASing across zones by default?
Firstly, zones which can be aliased to have to be replicated and available in
all shards of our partitioned datastore (some day we may use some kind of
cross-shard query protocol to resolve that, but even then we'd prefer to
minimize the traffic) and secondly we want to ensure some stability for ALIAS
targets and avoid situations where targets may be discontinued or deleted by
their owners leaving our mutual customers stranded.

~~~
aleem
I host a high profile news site on AWS infrastructure.

> The big downside of course is that the record has to be in our system in
> order to ALIAS to it.

I need DDoS protection as the site gets attacked from time to time. I went
with CloudFlare since ELB doesn't say much if anything about DDoS protection.
This means I can't use Route53 and so I cannot work around apex issues and I
can't point my apex to ELB either.

The suggested workaround I keep coming across is to poll the ELB IP and
update it at CloudFlare via an API, though I am not sure how well that would
work if someone cached the IPs for whatever reason (even though ELB will honor
requests up to one hour after the IP change, as I read somewhere else).

Do you have any suggestions either for DDoS mitigation or finding a work
around to my apex woes?

~~~
colmmacc
We've opened up a little lately, and some of our DDOS mitigation details are
now available in my colleague Nate's talk from Re:Invent:
www.youtube.com/watch?v=V7vTPlV8P3U .

What I can repeat from that talk is that we've handled DDOS attacks of the
same order of magnitude as the largest we've ever heard of, and that we use
the same mitigations ourselves for our own services.

We'd certainly be interested in a conversation about your needs, attacks
you've experienced, and how we can meet them. (e-mail: colmmacc@amazon.com).

ELB is a multi-IP service and can dynamically scale from 1 to ~100 IPs. In
general, when scaling down, IPs are retired from DNS but aren't removed from
an ELB until we've seen traffic drain (which would explain the hour, but it's
variable). If you want to know all of the IPs for an ELB, you can query;

    
    
       all.[elb-name].[region].elb.amazonaws.com
       all.ipv6.[elb-name].[region].elb.amazonaws.com
    

and you'll get them all. Though we strongly recommend against CNAMEing to
those names directly, it can be useful if you need to scrape the IPs
systematically for some reason (and it sounds like this might be such a case).
I'm not sure what other workarounds may be available via CloudFlare.
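
The scraping idea above could be sketched roughly as follows. The `all.` name patterns are taken from the comment; whether they still resolve is not verified here, and the ELB name and region used in the usage note are made up. The resolver is injectable so the logic can be exercised without network access.

```python
import socket


def elb_all_names(elb_name, region):
    """Return the IPv4 and IPv6 'all' names for an ELB, per the pattern above."""
    base = f"{elb_name}.{region}.elb.amazonaws.com"
    return [f"all.{base}", f"all.ipv6.{base}"]


def resolve_ips(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname and return its unique addresses, or [] if lookup fails."""
    try:
        infos = resolver(hostname, None)
    except socket.gaierror:
        return []
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def scrape_elb_ips(elb_name, region, resolver=socket.getaddrinfo):
    """Collect every address currently published for the ELB."""
    ips = set()
    for name in elb_all_names(elb_name, region):
        ips.update(resolve_ips(name, resolver))
    return sorted(ips)
```

Calling `scrape_elb_ips("my-elb", "us-east-1")` on a schedule and diffing the result against the records held at the other provider would be one way to drive the polling workaround, with the caching caveats already mentioned.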

------
shawabawa3
I don't see how the caching thing is a problem specific to alias records. If
you were using an A record instead, you would have to manually update the IP
address, and then it would still be cached for however long.

The geoIP thing is pretty minor, ideally you should be able to tell the dns
server the IP you are proxying for.

Who cares that there's no standard? Complete non-issue.

Even if I agreed that alias records were bad, what's the alternative? Manually
updating A records?

~~~
JoachimSchipper
If I point foo.mydomain at bar.otherdomain with a CNAME, otherdomain can
transition to a new IP without doing anything more than is required to make
bar.otherdomain work.

If I point foo.mydomain at whatever bar.otherdomain's IP is (i.e. use an A
record), otherdomain breaks my stuff whenever they change IP addresses, and
there's not a lot they can do about that. Automating this (via the ALIAS
mechanism mentioned in this article) just means that the breakage eventually
goes away.

The correct solution, as pointed out, is to use CNAME - which requires that
you use a subdomain.

~~~
donavanm
Can you illustrate this with an example? How about explicit records and a 60
second ttl on A and CNAME to humor me.

If ALIAS is implemented via recursion, foo.mydomain's RRs MUST respect the TTL
of bar.otherdomain's RRs. With that proviso, I'm missing what the effective
cache lifetime difference is between the two.

~~~
JoachimSchipper
I agree that this would work decently well if you could actually specify a
60-second TTL; but there are quite a few DNS resolvers that cache responses
for at least a day, and there's not a whole lot you can do to change that.
(This makes some sense - re-resolving all the time doesn't help performance!)

~~~
imbriaco
"Quite a few" is actually a very very small percentage these days.

If your ISP is disregarding TTL, you're just as bad off with CNAMEs pointing
at a third party as you are with the ALIAS-type record.

------
donavanm
I'm missing quite a lot in this article. Personally I have my reservations
about ALIAS. Recursion-backed implementations in particular are full of
dragons and sharp corner cases. It's a shame that it's missing any substantial
criticism or examples of poor implementations of ALIAS records.

    
    
      An authoritative nameserver ... can deliver records in a predictable speed

That's a nice-to-have, and totally unrelated to the AA bit. Low query latency
is usually achieved via caching and then querying multiple/fastest available
authoritative NS.

    
    
      An ALIAS record resolves on request the IP address of the destination record and serves it as if it would be the IP address for the apex domain requested

First, backing with dns resolution is just one implementation of the idea.
Secondly _it is the ip address of the domain requested_. An authoritative ns
setting AA makes it so. There's nothing that specifies what the backing data
store or resolution method is for AA answers. Implementation detail.

    
    
       you will send traffic for your mapped apex domain to the wrong address until the record expires in all caching resolvers.

Now we've discovered, but not actually mentioned, TTLs. How is this any
different than the ttl & expiry on a traditional CNAME + A record chain
that's proposed at the end?

    
    
      you request the IP address from the nameserver of your DNS provider, not from your actual location.

Assuming the implementation is backed by DNS recursion, sure. Good thing
there's a standard like EDNS client-subnet that provides a method to propagate
and vary based on the network of the original requester. But point taken, DNS
is a complicated protocol and you should probably understand it before
developing new features and implementations.

~~~
belorn
> How is this any different than the ttl & expiry on a traditional CNAME + A
> record chain thats proposed at the end?

The TTL on a CNAME is about caching the canonical name, not the IP address.

 _Example with CNAME:_

The client asks the authoritative nameserver for example.com about
bar.example.com. and gets back a CNAME to foo.example.com. with a TTL of 2
days. The client stores this in its resolver cache.

The client then asks about foo.example.com. and gets back an A record for
203.0.113.3, with a TTL of 2 hours.

The client cache looks like this: "bar.example.com. CNAME foo.example.com." that
expires in 2 days, and "foo.example.com. A 203.0.113.3" that expires in 2
hours.

If the domain foo.example.com. changes its A record, the change will be
reflected within 2 hours for clients who use bar.example.com.

 _Example with ALIAS:_

The client asks the authoritative nameserver for example.com about
bar.example.com. and gets back an A record for 203.0.113.3 with a TTL of 2 days.

The client's resolver cache will hold a single entry: "bar.example.com. A
203.0.113.3" that expires in 2 days.

If foo.example.com. changes its IP address, its TTL is not considered by
clients who use bar.example.com., because it's not in their cache.
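
The two scenarios can be replayed with a toy client cache. This is a minimal sketch, not a real resolver: `zone` stands in for whatever the authoritative servers would answer, only one level of CNAME is followed, and the new address 203.0.113.9 is made up for the example.

```python
HOUR, DAY = 3600, 86400


class Cache:
    """Client-side cache keyed by (name, record type)."""

    def __init__(self):
        self.entries = {}  # (name, rtype) -> (value, expires_at)

    def get(self, name, rtype, now):
        entry = self.entries.get((name, rtype))
        if entry and entry[1] > now:
            return entry[0]
        return None

    def put(self, name, rtype, value, ttl, now):
        self.entries[(name, rtype)] = (value, now + ttl)


def resolve(cache, zone, name, now):
    """Resolve name to an IP, following at most one CNAME."""
    target = cache.get(name, "CNAME", now)
    if target is None and (name, "CNAME") in zone:
        target, ttl = zone[(name, "CNAME")]
        cache.put(name, "CNAME", target, ttl, now)
    lookup = target or name
    ip = cache.get(lookup, "A", now)
    if ip is None:
        ip, ttl = zone[(lookup, "A")]
        cache.put(lookup, "A", ip, ttl, now)
    return ip


def ip_seen_after_change(zone, changed_key, new_ip, name, check_at):
    """Cache name at t=0, change one A record right after, re-resolve at check_at."""
    cache = Cache()
    resolve(cache, zone, name, now=0)
    updated = dict(zone)
    updated[changed_key] = (new_ip, zone[changed_key][1])
    return resolve(cache, updated, name, now=check_at)


cname_zone = {
    ("bar.example.com.", "CNAME"): ("foo.example.com.", 2 * DAY),
    ("foo.example.com.", "A"): ("203.0.113.3", 2 * HOUR),
}
# With ALIAS the client only ever sees a synthesized A record for bar.
alias_zone = {("bar.example.com.", "A"): ("203.0.113.3", 2 * DAY)}

cname_result = ip_seen_after_change(
    cname_zone, ("foo.example.com.", "A"), "203.0.113.9",
    "bar.example.com.", check_at=2 * HOUR + 1)
alias_result = ip_seen_after_change(
    alias_zone, ("bar.example.com.", "A"), "203.0.113.9",
    "bar.example.com.", check_at=2 * HOUR + 1)
```

Just past the two-hour mark, `cname_result` is the new address while `alias_result` is still the old one, which is the difference described above. The gap comes entirely from the 2-day TTL attached to the synthesized A record.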

~~~
bredman
I see your point; ALIAS records are not a CNAME substitute and should not be
treated as one. However, they do have their uses and the risk is minimal _if_
you understand what ALIAS records are doing and how to use them.

I'm having a bit of a hard time following your example but I believe if the
authoritative DNS provider simply inherited the TTL of the alias target then
the behavior would be as desired. This gets back to donavanm's point that this
is more an issue with the way ALIAS records are implemented than a problem
with the concept of ALIAS records.
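
One way to picture the TTL inheritance suggested above: the server answers with whatever lifetime is left on its own cached copy of the target. This is a hypothetical sketch of that idea, not a description of how any provider implements ALIAS; the function name and parameters are invented for illustration.

```python
def synthesize_alias_answer(alias_name, target_ip, target_ttl, fetched_at, now):
    """Build an A answer for alias_name that inherits the target's remaining TTL.

    target_ttl is the TTL the target's A record had when the server fetched it
    at time fetched_at (both in seconds). Returns None once that data has
    expired, meaning the server would have to re-resolve the target first.
    """
    remaining = target_ttl - (now - fetched_at)
    if remaining <= 0:
        return None
    return (alias_name, remaining, "A", target_ip)


# Target fetched at t=0 with a 2 hour TTL; a client asks one hour later.
answer = synthesize_alias_answer("example.com.", "203.0.113.3", 7200, 0, 3600)
```

Under this scheme a client never caches the synthesized record longer than it could have cached the target's own A record.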

One other issue that the author doesn't touch on is that we now have 3
implementations of ALIAS records that I'm aware of (Route 53, github, and
Heroku), and they differ in how they behave. Yet all 3 providers use the same
name, ALIAS, to describe similar but significantly different things. This is
clearly confusing and potentially disastrous for users.

------
mslot
In general, implementing ALIAS records through recursive resolution is a bad
idea.

Geo routing and other resolver IP-based policies will probably break, since
the authoritative name server has no way of knowing the resolver IP. EDNS
client-subnet does not suffice in this case, since the authoritative name
server may still base part of its decision on the resolver IP.

A much bigger problem is that the forwarding name server can be used for DDoS
amplification attacks and that its own resources can easily be exhausted by an
attacker if the authoritative name server is slow to respond. If the
forwarding name server opens a new socket for every query it makes, the set of
available port numbers can easily be exhausted. If the forwarding name server
reuses port numbers, then spoofing attacks become straight-forward.

This does not mean ALIAS records are a bad idea in general. The Amazon Route
53 implementation resolves aliases internally. It is therefore limited to
AWS services, but you could, for example, point an ALIAS to CloudFront and
point CloudFront to your website. CloudFlare offers a similar service.

------
rhengles
Am I the only one who finds it funny that the article is on a site without a subdomain?

~~~
abritishguy
They are a DNS provider so they have to use A records anyway.

------
kalleboo
Can someone explain the rationale in the standard for disallowing CNAMEs on
apex domains?

Before I learned about this restriction I actually had my site configured like
that (my DNS provider had poor validation...) and it seemed to work for at
least 2/3 of users (who apparently had lenient DNS servers).

~~~
colmmacc
CNAMEs are name-level aliases. So if for example I make the following query;

    
    
      name=www.example.com type=A
    

and get a response of the form;

    
    
      www.example.com 3600 IN CNAME www.example.org 
    

a resolver will then recurse and lookup name=www.example.org, type=A. The
CNAME indirection is cacheable for all queries to www.example.com though, of
any type. So if I make a query for;

    
    
       name=www.example.com type=AAAA
    

the resolver can skip any query to the nameservers for www.example.com and go
straight to a query for name=www.example.org, type=AAAA. So in effect a CNAME
"masks" all records of any type, for a given name.

At zone-apexes this becomes a problem. Zone apexes are required to have SOA
and NS record sets which play important roles in record-not-present responses
and nameserver discovery respectively. Another problem is masking MX records,
but you could always copy the MX records at the target name to work around
that.
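
The masking rule above is why zone validators reject a CNAME that shares a name with any other record type. A minimal check, assuming a zone is given as a flat list of `(name, type)` pairs (a simplification; real zone files carry more fields):

```python
def cname_conflicts(records):
    """Return the names that hold a CNAME alongside any other record type."""
    types_by_name = {}
    for name, rtype in records:
        # DNS names are case-insensitive, so normalise before grouping.
        types_by_name.setdefault(name.lower(), set()).add(rtype.upper())
    return sorted(
        name for name, types in types_by_name.items()
        if "CNAME" in types and len(types) > 1
    )


zone = [
    ("example.com.", "SOA"),
    ("example.com.", "NS"),
    ("example.com.", "CNAME"),      # conflicts with the mandatory SOA and NS
    ("www.example.com.", "CNAME"),  # fine: nothing else lives at this name
]
conflicts = cname_conflicts(zone)
```

Here only the apex is reported, because the apex must carry SOA and NS while `www` has nothing for the CNAME to mask.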

------
dmourati
First had to read the initial link in OP. From there, understand that this so-
called ALIAS record is specific to one (or more?) DNS provider called
dnsimple.com.

Broken? Maybe. I've never used one. The problem with the apex record not being
able to be a CNAME is easily avoidable. I'd prefer to stay RFC compliant
rather than use some one-off.

------
zx2c4
What's the consensus here? [http://zx2c4.com](http://zx2c4.com) or
[http://www.zx2c4.com](http://www.zx2c4.com) ? For a while I did the latter,
then I switched to the former, and now I can't make up my mind and am tempted
to go back to the latter.

Opinions? Thoughts?

~~~
estebank
I prefer the latter because that way you can have cookies set only for the site
subdomain and serve static elements from a cookieless subdomain, instead of a
different domain altogether. Just add a redirect from the naked domain and
you're golden.

~~~
zx2c4
Compelling enough.

------
snake_plissken
Why don't admins just set a smaller TTL for anything that is aliased?

------
acconrad
This seems kind of funny given that Github's most recent advice was to use
ALIAS DNS records for apex domain types.

------
MatthewWilkes
Can anyone explain the difference between an ALIAS and a DNAME record?

~~~
jlgaddis
_> DNAME record?_

You mean a CNAME?

belorn gave a pretty good explanation here:
[https://news.ycombinator.com/item?id=7023412](https://news.ycombinator.com/item?id=7023412)

