DNS taking 48+ hours to propagate is a myth

LogicX · on Dec 27, 2011

TL;DR: Testing a new domain's resolution with dig +trace, so that you don't poison your cache, is a fine point. Existing domains need not apply.

You may need to explain your theory somewhere other than your own website until it comes back up under this load... I can't get to your article because it's down, but I see you have your TTL set to 72 hours:

  www.simonluijk.com.	259200	IN	CNAME	simonluijk.com.
  simonluijk.com.		259200	IN	A	46.102.244.108

Whatever you wrote in your article... the 'myth' aspect has to do with all of the TTLs involved, many of which are out of your control, e.g. the listing of your domain in the TLD:

  simonluijk.com.		172800	IN	NS	a.ns.zerigo.net.
  simonluijk.com.		172800	IN	NS	b.ns.zerigo.net.
  simonluijk.com.		172800	IN	NS	d.ns.zerigo.net.
  simonluijk.com.		172800	IN	NS	c.ns.zerigo.net.
  ;; Received 261 bytes from 192.12.94.30#53(e.gtld-servers.net) in 114 ms

Which I see is set at 48 hours...

So someone could have a combination of a 48-hour cached response for your domain's DNS servers, and a 72-hour cached response from your own DNS servers for your record. That's not even taking into account resolvers which ignore the TTL values and substitute their own.
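
To put those TTLs in hours, here is a minimal Python sketch that parses dig-style record lines like the ones quoted above. The names parse_record and ttl_hours are made up for illustration, and it assumes the simple five-field layout shown (name, TTL, class, type, data).

```python
from typing import NamedTuple


class Record(NamedTuple):
    name: str
    ttl: int      # seconds
    rclass: str
    rtype: str
    rdata: str


def parse_record(line: str) -> Record:
    """Parse one dig answer line: name, TTL, class, type, data."""
    # split(None, 4) splits on any run of whitespace and keeps rdata whole.
    name, ttl, rclass, rtype, rdata = line.split(None, 4)
    return Record(name, int(ttl), rclass, rtype, rdata.strip())


def ttl_hours(record: Record) -> float:
    """Convert a record's TTL from seconds to hours."""
    return record.ttl / 3600


a_record = parse_record("simonluijk.com.\t\t259200\tIN\tA\t46.102.244.108")
ns_record = parse_record("simonluijk.com.\t\t172800\tIN\tNS\ta.ns.zerigo.net.")
print(ttl_hours(a_record), ttl_hours(ns_record))  # 72.0 48.0
```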

Update: Finally got your article to load. Yes, for a brand new domain, manually testing resolution using dig +trace first, until you confirm it works (avoiding poisoning your cache with a negative response) is a fine suggestion.

Surely the registrar warnings exist for the more likely scenarios of any changes to existing domains. Added TL;DR at the top.

Update 2: Removed alternative resolver rants, and updated to emphasize the dig +trace option - as per author's comment below, and original article.

rawrly · on Dec 27, 2011

This is actually sadly exactly what the author missed in their article. DNS propagation is directly controlled by the TTL setting on a domain entry.

TTL stands for Time To Live. This is the number (in seconds) that tells DNS servers how long to keep the entry active in their caches (presuming the DNS server does not override it with a higher or lower number, which is entirely their choice but not common). This is done so that requests to adomain.com do not require a DNS lookup to the main server for every page request.

It is true that if you have not done a lookup on the domain, then your computer and DNS servers would presumably not have any active DNS records for the domain. So you can make a change and "voilà", within 5 minutes (the next time you visit the site) you will have the updated record. However, if you had recently done a DNS inquiry and received the old DNS entry, you will need to wait for that old entry to expire before the DNS server you are using will look it up again. This doesn't go into any of the fun of what happens when you have 2 or more DNS servers set up, but ultimately what people are seeing is that the real wait is substantially less than the "48 hour" period. However, most ISPs will stick to this default number to reduce worried support requests from clients who think otherwise but don't know anything about how DNS works, so support will never be able to explain this in layman's terms (or wait, did I just do that?)
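
The expiry behaviour described above can be sketched as a toy cache in Python. This is only an illustration under simplifying assumptions: TtlCache, the fake zone and the injected clock are all invented for the example, and a real resolver does far more.

```python
from typing import Callable, Dict, Tuple


class TtlCache:
    """Toy resolver cache: keep an answer until its TTL runs out."""

    def __init__(self, lookup: Callable[[str], Tuple[str, int]],
                 clock: Callable[[], float]) -> None:
        self._lookup = lookup      # hypothetical upstream: name -> (ip, ttl)
        self._clock = clock        # injectable clock, in seconds
        self._entries: Dict[str, Tuple[str, float]] = {}

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]       # still fresh: no upstream query
        ip, ttl = self._lookup(name)
        self._entries[name] = (ip, now + ttl)
        return ip


# Simulated authoritative data and clock, so the example needs no network.
zone = {"adomain.com": ("192.0.2.1", 3600)}
now = [0.0]
cache = TtlCache(lambda n: zone[n], lambda: now[0])

first = cache.resolve("adomain.com")          # cached for one hour
zone["adomain.com"] = ("192.0.2.2", 3600)     # the record changes
now[0] = 1800.0
during_ttl = cache.resolve("adomain.com")     # old answer, TTL not expired
now[0] = 3600.0
after_ttl = cache.resolve("adomain.com")      # TTL expired: new answer
print(first, during_ttl, after_ttl)
```

A client that never queried before the change would get the new address straight away, which is why the same change can look instant on one machine and stale on another.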

seanp2k2 · on Dec 27, 2011

me: web hosting sysadmin also dealing with clients. Yes, people really do freak out about DNS problems, and we quote 72 hours because we have clients on 6 continents.

Realistically, it takes 30 minutes - 4 hours for DNS updates to stick. Use http://host-tracker.com/ to check the IP of your site -- that's what we do. It tests something like 80 locations, and the results show the IP returned.

You are absolutely correct regarding the TTLs, and although I've seen well-intentioned help articles suggesting things like setting your TTL to 10-300 seconds...most "big" recursive resolvers will ignore TTLs below 3600 seconds (1 hour), so this doesn't really help.

Props to anyone who knows what RFC covers this behaviour and cites a minimum valid TTL. I'm not aware of any, but I'm not totally up on my RFCs :)

rada · on Dec 27, 2011

http://www.ietf.org/rfc/rfc1034.txt:

The TTL is assigned by the administrator for the zone where the data originates. While short TTLs can be used to minimize caching, and a zero TTL prohibits caching, the realities of Internet performance suggest that these times should be on the order of days for the typical host. If a change can be anticipated, the TTL can be reduced prior to the change to minimize inconsistency during the change, and then increased back to its former value following the change.

and http://www.ietf.org/rfc/rfc1912.txt:

1-5 days are typical values.

devicenull · on Dec 28, 2011

Exactly. If you tell someone it's going to take 24 hours and due to caching it takes 48, they're going to be pretty pissed. On the other hand, if you tell someone it's going to take 72 and it really takes 2, they are going to be quite happy.

There are so many variables involved in DNS TTLs that it really makes more sense to over-estimate things.

fuddie · on Dec 27, 2011

Please provide further detail on these '"big" recursive resolvers' that ignore TTLs. I'm yet to see one in the wild and so I'm somewhat dubious of the claim.

(Please don't be vague - post the addresses of the resolvers in question.)

jaylevitt · on Dec 27, 2011

We just moved our DNS from Network Solutions to Route 53 this month, and I can verify that there are indeed resolvers that'll ignore TTLs. Ours were 3-6 hours, but it took some sites about 24 hours to pick up our new SOA.

Which would have been fine - the A records were the same - but no, NetSol instantly starts serving a blank "Business Profile" landing page A-record. Thanks, people who used to run the Internet.

I know one such caching server was ns1.dns.rcn.net. (But only from inside RCN; querying it from Comcast gave different results. Same IP address, so I'm assuming it's anycast.) whatsmydns.net reported others as "Bell South" and "Cox" (I can't recall the locations, I think one was in Georgia).

fuddie · on Dec 28, 2011

I just queried ns1.dns.rcn.net for an rrset that has a TTL of 120 seconds and it returned appropriate TTLs.

EDIT: It also does the right thing with even shorter TTLs - try `dig 40.2.+.rp.secret-wg.org txt @ns1.dns.rcn.net`.

jaylevitt · on Dec 28, 2011

Oh, it returned appropriate-looking TTLs even at the time; we didn't watch them go down to zero and wrap to their original value, but I suspect that's what they did.

Also, if you're not on RCN, you aren't getting the same NS1 as someone who is. (Again, I assume anycast or load balancing, but I'm handwaving; I haven't understood routing since gated.conf changed.)

My boss was on RCN at home, and I was a few miles away on Comcast. We both pointed dig at 207.172.3.8 and hammered on our domain name; he saw stale results, I saw fresh ones.

Would've loved to have the expertise and tools set up to figure out what went wrong, but we just went to bed and by lunch it sorted itself out.

fuddie · on Dec 28, 2011

I let it cache a record, disabled the zone the record came from and left it to expire. It did. I won't deny that it could behave differently from different addresses, but based on the evidence available I'm sure you can understand why I remain unconvinced.

SLuijk · on Dec 27, 2011

I was not expecting so much traffic. I kicked in a few more gunicorn instances. Hope that helps.

Well, that's the point of using the +trace option. It makes dig act as the resolver, bypassing all of the caches.
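
For anyone who wants to script that check, a minimal Python sketch follows. trace_command and run_trace are hypothetical helper names; the only real tool involved is dig with its +trace option. As far as I know dig still asks your configured resolver for the list of root servers, but the answers for the domain itself then come from the authoritative chain.

```python
import subprocess
from typing import List


def trace_command(name: str, rtype: str = "A") -> List[str]:
    """Build the argv for an iterative lookup starting at the root."""
    return ["dig", "+trace", name, rtype]


def run_trace(name: str, rtype: str = "A") -> str:
    """Run dig +trace and return its text output (needs dig and network)."""
    result = subprocess.run(trace_command(name, rtype),
                            capture_output=True, text=True, check=True)
    return result.stdout


# Only print the command here; call run_trace() to actually execute it.
print(" ".join(trace_command("simonluijk.com")))
```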

LogicX · on Dec 27, 2011

My apologies - I glossed over your explanation of trace, as I've used it for years for other purposes, without this being the primary intention. Expunging my other resolver rants from my OP.

sunsu · on Dec 27, 2011

This just isn't universally true. Some ccTLDs will indeed take anywhere from 12 to 48 hours to update.

If you are launching a service on a new domain and are about to link to that domain to make the launch public, do not assume that just because it's working on your computer after 10 minutes that it is working for the rest of the US, or the world.

ck2 · on Dec 27, 2011

Before 2004 Verisign only updated the authoritative DNS twice a day.

Then they changed it in January 2004 to every five minutes.

So yes, if you are an "old timer" you may remember it really taking up to 48 hours for everywhere around the world to be able to find a new .com

But also what they mean by 48 hours is for existing DNS, some users around the world may be on ISPs that heavily cache DNS.

Even wifi routers today have some persistent DNS caches and people rarely reboot or turn them off.

We moved a site a few weeks ago and the old IP still got some hits until last week. I gave up trying to trick all the caches and just used iptables to forward the packets from the old server to the new until everything finally caught up.

sabat · on Dec 27, 2011

That's true, but "48 hours to propagate" is misleading because it sounds as though that's business as usual for DNS -- instead of the reality, in which rogue DNS servers cache beyond a domain's declared TTL.

seanp2k2 · on Dec 27, 2011

Explaining that to clients is hard. We say "it'll take about 72 hours for everything to sync over". If we're switching them to a new IP, we'll leave the old server on until 72-96 hours after we update DNS. We seriously still see traffic on the old server that long after the change, and ~72 hours is about how long it takes to get ~99% updated.

fuddie · on Dec 28, 2011

Please post the address of one of these rogue DNS servers.

OstiaAntica · on Dec 27, 2011

Yeah, this is bad advice. In the real world, it often takes a couple of days to fully propagate, as many ISPs take a while to update their caches. Consider edge cases like rural satellite internet. Even with a brand new domain, here's a common problem: some clients will have clicked on the domain URL in their email before the DNS is set up, and have a local cache of the wrong DNS.

micro-ram · on Dec 27, 2011

This is why I use OpenDNS. You can update THEIR cache. Once I make a change and update their cache, I test again. If all is good, I know the work is done and move on knowing the rest of the world will catch up.

http://opendns.com/support/cache

dsl · on Dec 28, 2011

Even if you don't use OpenDNS, it's a great URL to keep bookmarked if you are a site owner. Instantly propagate DNS changes to a few percent of the internet with no worries.

jc4p · on Dec 27, 2011

I think the big missing point here is that it can take up to 48 hours (or even 72 hours) for every ISP to reset their cache and get your new destination. Sure, I can always run dig, or even use dscacheutil to wipe my computer's DNS cache manually, and I'm pointed at OpenDNS and Google's DNS so I don't really have to worry about DNS caching issues, but my customers aren't.

I can't say "Hey look, it works on my machine so you must have a lazy ISP, your fault!" when they say "Hey it's been 6 hours and my website doesn't work yet", the proper answer is simply that in most cases it takes up to 48 hours to work.

sfjustin · on Dec 27, 2011

I have been creating A records for subdomains lately, and found that the TTL on the A record matters, but even more important for fast response time is the minimum time to live (negative cache) on the SOA record. Setting the A record to a TTL of 1 hour and the minimum time to live on the SOA to 1 minute results in almost instant record propagation. Not exactly sure what the SOA minimum time to live even does though.
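
On what the SOA minimum does: under RFC 2308 it bounds how long a resolver may cache a negative answer ("this name or record does not exist"), and the negative TTL is the smaller of the SOA record's own TTL and its minimum field. That would plausibly explain the observation above, since a lookup made just before the new A record exists is only remembered for a minute. A small Python sketch, where the function name and the zone values are assumptions for illustration:

```python
def negative_cache_ttl(soa_ttl: int, soa_minimum: int) -> int:
    """Seconds a resolver may cache a 'does not exist' answer (RFC 2308)."""
    return min(soa_ttl, soa_minimum)


# Hypothetical zone values: SOA record TTL of 1 hour, minimum field of 1 minute.
print(negative_cache_ttl(soa_ttl=3600, soa_minimum=60))  # 60
```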

rachelbythebay · on Dec 27, 2011

People who know how DNS really works don't need to be told that it doesn't really "propagate" in the conventional sense.

People who don't know how DNS works (and don't care) can be given it as a useful abstraction. If you have worked in tech support with people who want things to work and quite rightly don't want to know how the sausage is made, this might be ideal.

With anything, it depends on the audience.

colmmacc · on Dec 27, 2011

It depends a lot on the prior state of the domain. If it was assigned at all, or is being transferred, then it is prudent to wait out the TTL of the NS record set in the parent zone. Here's how it works.

When a resolver tries to lookup the IP for my website - www.notesfromthesound.com - it probably has the name-server set for "com" cached, it knows those servers, so I'll skip that step for now, but the same principle applies at that level.

So, the resolver queries a com server, and gets a referral:

  cuan% dig www.notesfromthesound.com @i.gtld-servers.net.

  ; <<>> DiG 9.6-ESV-R4-P3 <<>> www.notesfromthesound.com @i.gtld-servers.net.
  ;; global options: +cmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25366
  ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 2
  ;; WARNING: recursion requested but not available

  ;; QUESTION SECTION:
  ;www.notesfromthesound.com.	IN	A

  ;; AUTHORITY SECTION:
  notesfromthesound.com.	172800	IN	NS	ns-593.awsdns-10.net.
  notesfromthesound.com.	172800	IN	NS	ns-431.awsdns-53.com.
  notesfromthesound.com.	172800	IN	NS	ns-1199.awsdns-21.org.
  notesfromthesound.com.	172800	IN	NS	ns-1820.awsdns-35.co.uk.

  ;; ADDITIONAL SECTION:
  ns-593.awsdns-10.net.	172800	IN	A	205.251.194.81
  ns-431.awsdns-53.com.	172800	IN	A	205.251.193.175

Note the TTL on the NS record set: 2 days. This is the TTL for the rrset in the parent zone. That TTL means "Feel free to send queries for names within the notesfromthesound.com zone to these nameservers for up to two days".

Now, when I query the authoritative nameservers for the child zone, I might get a different TTL value for the same rrset;

  cuan% dig www.notesfromthesound.com @ns-593.awsdns-10.net.

  ; <<>> DiG 9.6-ESV-R4-P3 <<>> www.notesfromthesound.com @ns-593.awsdns-10.net.
  ;; global options: +cmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29364
  ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0
  ;; WARNING: recursion requested but not available
 
  ;; QUESTION SECTION:
  ;www.notesfromthesound.com.	IN	A

  ;; ANSWER SECTION:
  www.notesfromthesound.com. 300	IN	A	72.32.231.8

  ;; AUTHORITY SECTION:
  notesfromthesound.com.	7200	IN	NS	ns-431.awsdns-53.com.
  notesfromthesound.com.	7200	IN	NS	ns-593.awsdns-10.net.
  notesfromthesound.com.	7200	IN	NS	ns-1199.awsdns-21.org.
  notesfromthesound.com.	7200	IN	NS	ns-1820.awsdns-35.co.uk.

Just two hours. But as a resolver, which value do I go with? Some resolvers take the position that the child zone is most authoritative about the operator's intent. That's called "child-centric". Some take the position that the parent zone's intent matters more, and that we shouldn't be bugging the parent zone nameservers very often because of misconfigured child zones. That's called "parent-centric".

Data from actual experiments suggests that at least 3% of resolvers are parent-centric; https://mex.icann.org/ar/node/22921 , and as an operator I can confirm that it's common to see queries coming in for up to 2 days after an upstream delegation change.

In fact, you really have to wait out the higher of the two relevant TTLs, or risk queries being black-holed.
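
As a rough Python sketch of that rule, using the two NS TTLs from the dig output above. The policy strings and function names are invented for the example, and real resolvers are more nuanced than a two-way switch:

```python
PARENT_NS_TTL = 172800   # from the .com referral above
CHILD_NS_TTL = 7200      # from the zone's own nameservers


def cached_ns_ttl(policy: str, parent_ttl: int, child_ttl: int) -> int:
    """TTL a resolver keeps the NS set for, under each policy."""
    if policy == "parent-centric":
        return parent_ttl
    if policy == "child-centric":
        return child_ttl
    raise ValueError(f"unknown policy: {policy}")


def safe_wait(parent_ttl: int, child_ttl: int) -> int:
    """Wait out the higher TTL, since you cannot know each resolver's policy."""
    return max(parent_ttl, child_ttl)


print(safe_wait(PARENT_NS_TTL, CHILD_NS_TTL) / 3600)  # 48.0
```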

TL;DR: honestly, wait 2 days.

SLuijk · on Dec 27, 2011

Yes I quite agree with you, for established domains. It's interesting that only 3% of resolvers are parent-centric.

I was referring more to registering a new domain: the aim is to prevent the ISP resolver from caching the nonexistence of the NS record for the negative TTL.

colmmacc · on Dec 27, 2011

The article suggests that both Google Public DNS and Nominum are parent-centric, which might be a significant portion of the 3% (or larger at this point).

These days with the number of resolvers that have fall-back catch-all records designed to redirect you to a search / suggest feature, I think that you also need to worry about positive TTLs.

You're right that if a domain is pristine, and has never been queried, that in all likelihood, you'll be able to have it resolvable within minutes, not hours, but this still seems like a relatively uncommon case.

In practice, people do query for their domain as it's propagating, and do buy meaningful names that are likely to have some low-level background rate of queries, and there's not much to stop the legion of bots that are watching for whois updates either.

I guess I take the most issue with your headline. DNS taking 48+ hours to propagate is not a myth.

Terretta · on Dec 27, 2011

Problem is, both your link headline here and the premise headlined on your blog are flat wrong, and are going to give sysadmins everywhere headaches if clients come across your article and think they've learned something.

The RFC snippet quoted in this comments thread is the right approach: keep a long TTL in normal practice, shorten it at least twice the TTL's length ahead of a change (e.g., with a 2 day TTL, shorten it 4 days before the change), dropping it down to 3600 or 300 depending on your tastes, and bring it back up after the change has stabilized.
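
The "double the TTL in advance" rule of thumb is easy to turn into a date calculation. A minimal Python sketch, where ttl_lowering_deadline and the change date are made up for the example:

```python
from datetime import datetime, timedelta


def ttl_lowering_deadline(change_at: datetime, current_ttl: int,
                          safety_factor: int = 2) -> datetime:
    """Latest time to publish the short TTL before a planned change."""
    return change_at - timedelta(seconds=current_ttl * safety_factor)


change = datetime(2012, 1, 10, 12, 0)          # hypothetical change window
deadline = ttl_lowering_deadline(change, current_ttl=172800)
print(deadline)  # 2012-01-06 12:00:00
```

With a 2 day TTL (172800 seconds) and the default factor of 2, the short TTL has to be published 4 days before the change.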

In the case of registering a brand new, never existed before, domain, avoiding cache poisoning can help.

But DNS taking up to (TTL x number of layers of cache) is not a myth. We routinely see 5 - 7 days (globally) on 1 and 2 day TTLs, and 2 - 3 days (globally) on 5 minute TTLs (thanks to ISPs with 1 day min TTLs).
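
One pessimistic way to model "TTL x number of layers of cache" is sketched below in Python. It assumes each caching layer restarts the clock instead of passing on the remaining TTL, and that some layers enforce their own minimum TTL; the function name and the example chain are hypothetical.

```python
from typing import Sequence


def worst_case_staleness(record_ttl: int, layer_min_ttls: Sequence[int]) -> int:
    """Upper bound in seconds if every cache layer refreshes just before
    the layer upstream of it expires, so the delays add up."""
    return sum(max(record_ttl, floor) for floor in layer_min_ttls)


DAY = 86400
# Hypothetical chain: two ISP caches that enforce a 1-day minimum, plus a
# home router that honours the record's own 5-minute TTL.
print(worst_case_staleness(300, [DAY, DAY, 0]) / DAY)
```

Under those assumptions a 5 minute TTL can still be stale for a little over 2 days, which is in the range quoted above.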