
What happens when you update your DNS - kiyanwang
https://jvns.ca/blog/how-updating-dns-works/
======
eat_veggies
One of my favorite revelations about network tracing tools (things like
`traceroute` and `dig +trace`), one that might not be obvious to people like
me who work higher up in the stack, is that the data they provide isn't
usually made available during "normal" usage. Packets don't just phone home
and tell you where they've been. Something else is going on.

When you send a DNS query to a recursive server like your ISP's or something
like 1.1.1.1, you make a single DNS query and get back a single response,
because the recursive DNS server handles all the different queries that Julia
outlines in the post. As the client, we have no idea what steps just happened
in the background.

But when you run `dig +trace`, dig is actually _pretending to be a recursive
name server_ , and making all those queries _itself_ instead of letting the
real recursive name servers do their work. It's a fun hack, but that means
it's not always 100% accurate to what's going on in the real world. [0]

[0] [https://serverfault.com/questions/482913/is-dig-trace-always-accurate](https://serverfault.com/questions/482913/is-dig-trace-always-accurate)
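The loop dig performs when pretending to be a recursive server can be sketched without any network at all: follow referrals from the root down until some server claims to have the answer. Everything below (server names, the delegation table, the address) is invented for illustration:

```python
# Sketch of the iterative lookup that `dig +trace` imitates, run against
# an invented delegation table instead of the real DNS. Each hop either
# refers us to a more specific server or holds the final answer.
SERVERS = {
    "root":            {"refer": "com-tld"},           # root knows the .com servers
    "com-tld":         {"refer": "ns1.example.com"},   # TLD knows the zone's NS
    "ns1.example.com": {"answer": "93.184.216.34"},    # authoritative answer
}

def trace(start="root"):
    """Follow referrals downward, recording every server we asked."""
    server, path = start, []
    while True:
        path.append(server)
        record = SERVERS[server]
        if "answer" in record:
            return record["answer"], path
        server = record["refer"]
```

A real implementation would send actual DNS queries at each step and cope with glue records, timeouts, and multiple NS choices, which is exactly where dig's imitation can diverge from what real recursives do.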

~~~
closeparen
Traceroute’s trick is amusing. It abuses the TTL field, sending out packets
with too-low TTLs and waiting to see who complains about them. When layers
reveal themselves they are doing it voluntarily, and those wise to the game
can choose not to participate, or to troll it.

[https://www.theregister.com/2013/02/15/star_wars_traceroute/](https://www.theregister.com/2013/02/15/star_wars_traceroute/)
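As a rough illustration of the mechanism (not real packet code, which needs raw sockets and root privileges), here is the trick simulated against an invented router path: probes go out with TTL 1, 2, 3, ... and we record which router "complains" with an ICMP Time Exceeded when the TTL hits zero.

```python
# Simulated traceroute: no real packets, PATH is an invented route.
PATH = ["gw.local", "isp-edge", "ix-core", "destination"]

def probe(ttl):
    """Return (kind, responder) for a probe sent with this TTL."""
    for hop, router in enumerate(PATH, start=1):
        if hop == len(PATH):
            return ("reply", router)          # destination reached: normal reply
        if ttl == hop:
            return ("time-exceeded", router)  # TTL expired here: router complains
    return ("reply", PATH[-1])

def traceroute():
    """Increase the TTL one hop at a time until the destination replies."""
    hops, ttl = [], 1
    while True:
        kind, who = probe(ttl)
        hops.append(who)
        if kind == "reply":
            return hops
        ttl += 1
```

Real routers can rate-limit or silently drop those ICMP replies, which is exactly the "choose not to participate" case.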

~~~
Tepix
It's not abusing the TTL; it's working as designed.

~~~
closeparen
AFAIK the design purpose of TTL is to prevent an infinite loop in case the
route contains a cycle.

------
muppetman
Glad to see this. One of my (stupid) pet peeves is people who say "You have
to wait for the DNS to propagate". DNS _does not_ propagate. What you're
actually waiting for is the cache TTL to expire, so those name servers that
have cached it have to query for the real answer again, thus getting the newly
pushed information. Of course it appears exactly like it "takes time to
propagate", which is why it's actually a pretty sound description of what's
happening, and thus why it's a stupid pet peeve. Pointless rant ends.
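A minimal sketch of the caching behaviour that produces that illusion (the clock is injectable so the expiry is easy to see; nothing here is a real resolver):

```python
import time

class TtlCache:
    """Minimal DNS-style cache: serves the cached answer until its TTL
    expires, then re-queries the authoritative source via `lookup`,
    which returns (answer, ttl_seconds)."""
    def __init__(self, lookup, now=time.monotonic):
        self.lookup, self.now = lookup, now
        self.store = {}  # name -> (answer, expiry time)

    def resolve(self, name):
        cached = self.store.get(name)
        if cached and cached[1] > self.now():
            return cached[0]                        # still fresh: serve cache
        answer, ttl = self.lookup(name)             # expired/missing: re-query
        self.store[name] = (answer, self.now() + ttl)
        return answer
```

Two resolvers that cached the old record at different moments will cut over at different moments, which from the outside looks exactly like slow "propagation".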

~~~
tialaramex
Yes, I'm annoyed about this too.

The most egregious case I've seen was an Amiga site. The site went down and
for several _days_ reported that users would need to wait for the updated
records to propagate and lots of loyal fans were insisting anybody who
couldn't read the site was just being too impatient.

What was actually wrong? They had written their new IP address as a DNS name
in their DNS configuration rather than as an IP address. Once they fixed that,
the site began working, and they acted as though that was just because now it
had successfully propagated.

On the other hand propagation _is_ a thing when it comes to distributing
modified DNS records to multiple notionally authoritative DNS servers.

This can be a problem for using Let's Encrypt dns-01 challenges for example,
especially with a third party DNS provider.

Suppose you write a TXT record to pass dns-01 and get a wildcard certificate
for your domain example.com. You submit it to your provider's weird custom API
and it says OK. Unfortunately, all it really did was write the updated TXT
record to a text file on an SFTP server. Each of the provider's, say, three
authoritative DNS servers (mango, lime, kiwi) checks this server every five
minutes, downloads any updated files and begins serving the new answers.

Still, they said OK, so you call Let's Encrypt and say you're ready to pass
the challenge. Let's Encrypt queries authoritative server kiwi, which has
never seen this TXT record, and you fail the challenge.

So you check DNS - your cache infrastructure calls lime, which has updated and
gives the correct answer, it seems like everything is fine, so you report a
bug with Let's Encrypt. But nothing was wrong on their side.

Now, unlike typical "DNS propagation" myths, the times for authoritative
servers are usually minutes, and can be only seconds for a sensible design
(an SFTP drop box is not a sensible design), so you can just add a nice
generous allowance of time and it'll usually work. But clearly the Right
Thing™ is to have an API that actually confirms the authoritative servers are
updated before returning OK.
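Short of the provider fixing their API, a client can approximate the Right Thing itself: query each authoritative server directly and only proceed once every one of them serves the new record. A sketch with the query function injected (the server names reuse the invented mango/lime/kiwi trio from above):

```python
import time

def all_authoritatives_updated(query_txt, servers, expected,
                               timeout=300, interval=5):
    """Poll every authoritative server until each returns the expected
    TXT record, or give up after `timeout` seconds. `query_txt(server)`
    must query that one server directly, bypassing any cache."""
    deadline = time.monotonic() + timeout
    pending = set(servers)
    while pending and time.monotonic() < deadline:
        pending = {s for s in pending if query_txt(s) != expected}
        if pending:
            time.sleep(interval)      # some servers still serve the old record
    return not pending
```

Only after this returns True would you tell Let's Encrypt you're ready, so a lagging kiwi can't fail the challenge.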

~~~
henrikschroder
> So you check DNS - your cache infrastructure calls lime, which has updated
> and gives the correct answer, it seems like everything is fine

Been there, done that, got burned. If you're mucking around with DNS records
that are going to be verified by someone else, never trust your local lookups,
always try to verify through a fresh third party that everything resolves
correctly _before_ you submit.

Because oops, something went wrong, the verification hit a wildcard entry with
a cache time of days, and now you have to wait that long before trying again,
because that entry isn't budging from other resolvers' caches...

~~~
m3047
Yeah, flushing the cache in your own recursive resolver doesn't flush them all
over the internet.

------
asciimike
[https://howdns.works](https://howdns.works) is one of my favorite educational
booklets on the subject. Not as in depth as many other resources, but highly
amusing and fairly sticky.

~~~
logikblok
This is brilliant thanks.

------
Tepix
There is a very elegant way to update your DNS if you are running djbdns: You
can optionally specify date ranges for every record![1] The server will
automatically adjust the TTL. By having two records with different time ranges
you can switch IP addresses at an exact moment.

The timestamps are provided in DJB's TAI64 format; use something like
[https://github.com/hinnerk/py-tai64](https://github.com/hinnerk/py-tai64) to
convert them.

[1] config file spec at [https://cr.yp.to/djbdns/tinydns-data.html](https://cr.yp.to/djbdns/tinydns-data.html)
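For the curious: a TAI64 label is essentially 2^62 plus a second count, printed as 16 hex digits. A toy conversion, deliberately ignoring the TAI-UTC leap-second offset that real tools such as py-tai64 handle properly:

```python
# Toy TAI64 conversion: a TAI64 label is 2**62 + a second count,
# rendered as 16 hex digits. Real converters (e.g. py-tai64) also add
# the TAI-UTC leap-second offset, which is deliberately ignored here.
TAI64_BASE = 2 ** 62

def to_tai64(unix_seconds):
    """Unix time -> 16-hex-digit TAI64 label (leap seconds ignored)."""
    return format(TAI64_BASE + unix_seconds, "016x")

def from_tai64(label):
    """Inverse of to_tai64."""
    return int(label, 16) - TAI64_BASE
```

So the tinydns-data timestamp for "now" is just this label for the current second (plus the leap-second offset in real deployments).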

------
JoshMcguigan
DNS infrastructure is really interesting. I did a bit of a deep dive on it a
few months ago, culminating in running my own authoritative name servers [0]
for a while.

[0]: [https://www.joshmcguigan.com/blog/run-your-own-dns-servers/](https://www.joshmcguigan.com/blog/run-your-own-dns-servers/)

~~~
rhizome
One neat way of retaining that control is running your own SOA(s), but getting
robust secondaries and listing _those_ in WHOIS so that they take all of the
wild queries. Then you just work with your little SOA, everything propagates
as necessary, and you don't get hammered.

------
jrockway
This reminds me that I wish DNS had some way to define a load balancing
algorithm for clients to use, so browsers could make load balancing decisions.
This would eliminate the need for virtual IP addresses, having to pass
originating subnet information up recursive queries, having to remove faulty
VIPs (or hosts) from DNS, etc.

It is baffling to me that inside the datacenter, I can control the balancing
strategy for every service-to-service transaction, but for the end user's
browser, all I can do is some L3 hacks to make two routers appear as one (for
failover purposes). L3 balancing would be completely unnecessary if I could
just program the user agent to go to the right host, after all. The end result
is unnecessary cost and complexity multiplied over a billion websites.

~~~
LinuxBender
That is a use case for SRV records [1]; however, they were not accepted into
the HTTP protocol specification. I bring it up every time there is a new
protocol version, but I am too lazy to write an RFC addendum for it and hope
that someone else will. Existing protocols may not be modified in this manner
once ratified. Maybe HTTP/4.0? /s

Some applications use SRV records for load balancing. Many VoIP and video
conferencing apps do this. There is a better list on Wikipedia. The record
format is:

    _service._proto.name. TTL class SRV priority weight port target.

[1] [https://en.wikipedia.org/wiki/SRV_record](https://en.wikipedia.org/wiki/SRV_record)
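Client-side SRV selection (RFC 2782 style) picks the lowest-numbered priority group, then chooses a target within that group with probability proportional to weight. A sketch of that selection:

```python
import random

def pick_srv(records, rng=random):
    """Pick one SRV record, RFC 2782 style: lowest priority group wins,
    then a weighted random choice within that group. Each record is a
    (priority, weight, port, target) tuple."""
    lowest = min(r[0] for r in records)
    group = [r for r in records if r[0] == lowest]
    total = sum(r[1] for r in group)
    if total == 0:
        return rng.choice(group)          # all weights zero: uniform pick
    point = rng.uniform(0, total)         # land somewhere on the weight line
    for rec in group:
        point -= rec[1]
        if point <= 0:
            return rec
    return group[-1]
```

Higher-priority (larger-numbered) records only come into play as fallbacks when every lower-priority target fails.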

~~~
jrockway
Yeah, I always liked SRV records. It seems that they proved inadequate for
gRPC balancing, so there are new experiments in progress (mostly xDS).

------
r1ch
Another cool thing about DNS: you only need the IP of a single root server to
be able to get the nameservers, IPs and everything else to resolve any name in
any TLD. The hierarchical nature of DNS is very neat to see in action. I built
a toy DNS checker tool[0] while I was learning about it to get a more visual
overview, and it ended up being one of the tools I still use every few days to
verify a domain is properly delegated if I suspect it has issues.

[0] [https://r-1.ch/r1dns/](https://r-1.ch/r1dns/) (and yes it has a lot of
bugs, don't break it :)).

~~~
kalleboo
Off topic,

you wouldn't be the R1ch of old Something Awful forums fame, would you? I used
to run a waffleimages mirror :)

~~~
r1ch
Yup, that's me! The SA forums inspired a lot of my random side projects, the
TF2 server integration stuff was especially fun (got me into reverse
engineering Valve's game .so to fix bugs). I miss those days when I had all
the free time in the world :).

------
mrb
" _What I’d expect to happen in practice when updating a DNS record with a 5
minute TTL is that a large percentage of clients will move over to the new IPs
quickly (like within 15 minutes),_ "

That's not true. The vast majority of clients will move to the new IP within
the TTL, i.e. within 5 minutes (not 15). Then there will be some stragglers
that slowly update over the next hours/days (typically poorly written bots).

Source: my own experience updating a site with 500k hits per month while
sniffing and watching network traffic at the 3 endpoints: DNS, old IP, new IP.

~~~
vbsteven
Or any proxy using the default nginx configuration, which caches DNS
resolution for upstream blocks at first use and never invalidates it until the
config is reloaded or nginx is restarted.

------
rkagerer
How distributed is DNS these days in practice, compared to 10 or 20 years ago?

If the major internet powers agreed to stop serving responses to DNS servers
they detected weren't respecting reasonable TTL's, do you think they could
"bully" the industry into tightening things up? (Kind of how Google and others
compelled the web toward widespread HTTPS)

~~~
rswail
I'm not sure why people don't run their own caching server instead of stubs,
or at least one for the LAN. You don't _have_ to use your ISP's DNS servers,
unless they are evil and capture port 53.

------
ricardo81
Recursive DNS servers can also throw you off the scent a bit by giving you an
answer that is not the same as the authoritative server's.

I've seen 8.8.8.8 return something other than NXDOMAIN for some domains that
do not exist.

Cloudflare will not honour DNS ANY requests.

Knowing how to query the authoritative nameservers is a handy tool for
debugging.

~~~
LogicX
Agreed. There's a lot of 'magic' that goes into running a quality recursive
resolver, not least of which is EDNS0 and EDNS Client Subnet, which
intentionally returns different answers based on the requester's source IP --
in most cases so that the most optimal CDN location is returned.

Test with:

    dig @ns1.google.com www.google.es +subnet=193.8.172.75/24
    dig @ns1.google.com www.google.es +subnet=157.88.0.0/16

Note how you get different IPs returned.

~~~
preinheimer
Here's a pretty clear demo of different results around the world:
[https://wheresitup.com/demo/results/5ef1403cb8e31e3fb3298503](https://wheresitup.com/demo/results/5ef1403cb8e31e3fb3298503)

------
jamesholden
As someone newly trying to learn DNS, I don't _use_ 8.8.8.8 personally, so I
was confused at first about why the page kept offering it up. It might help to
say 'Google DNS' alongside 8.8.8.8 at the first reference to it.

~~~
devdas
1.1.1.1 is Cloudflare, 8.8.8.8 is Google, and 9.9.9.9 is PCH/IBM as Quad9.
They all also offer IPv6.

------
z3t4
I have an idea: because DNS requests are made from a server close to the user,
the TLD should use a GEO table in order to give the two closest DNS servers.
Kinda like anycast, without having to configure routing/BGP sessions.

~~~
m3047
A lot of DNS infrastructure is anycast.

~~~
z3t4
Yes, but most domains, e.g. websites, don't have anycast. And anycast is
expensive if you just have a private web site or blog. And anycast services
have poor coverage; it's only Cloudflare that has decent coverage, but they
only offer proper DNS service to enterprise customers.

~~~
m3047
> But most domains eg. websites dont have Anycast.

Are you talking about the (mathematical) "domain" in the DNS specs, or the
popular sense, i.e. the web server?

The latter is arguably true, in which case the GeoIP proposition is moot:
there is only one web server. Maybe you mean the web server has multiple
addresses instead of being anycast. OK, yes, that happens; and some DNS
servers do use GeoIP to tailor replies to try to hand out the closest address.
Here is a quote from the BIND ARM:

"By default, if a DNS query includes an EDNS Client Subnet (ECS) option which
encodes a non-zero address prefix, then GeoIP ACLs will be matched against
that address prefix. Otherwise, they are matched against the source address of
the query"

Regarding the former, does anyone have info on how many DNS providers use
anycast? I think a lot; or maybe I should say that a lot of domains are hosted
on anycast, and the DNS isn't as distributed as it used to be. If you're using
DNS as a distributed key/value store, I hope you're doing a better job
thinking about externalities (leakage) than e.g. the antivirus companies, in
terms of locating authoritatives and how you update them opaquely.

Personally I think stub resolvers are stuck in the 1980s. They could do a lot
more by monitoring traffic health and editing DNS replies. Due to peering
arrangements you could be in the same IX as someone else, but that might not
be the best route. Traceroute, SYN exchanges, and (IP) TTLs might be better
signals for determining the health of a particular path. I'd never thought
about it until this thread started; maybe the stub resolver could use netflow
analysis to inform editing the responses it returns to applications.

~~~
z3t4
DNS getting less distributed is a problem, as public DNS services generally do
not hold a cache for long. They also give up if they're unlucky and try a DNS
server that is down!

So, my case for top-level (TLD) GeoIP: I have many DNS servers for my web
addresses/domains: three in the EU and two in the US. The problem is that when
the TLD servers send the list of DNS servers, it's randomized. Instead I want
them to return the list in GEO order (and also network-health order), so that
the recursive resolver asks the best/closest DNS server. In the worst-case
scenario, a recursive resolver in the EU tries a DNS server in the US, it
happens to be down, and the resolver gives up. The best case is that it tries
the closest server in the EU.
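The GEO ordering being asked for could, in principle, be as simple as the TLD sorting the NS set by great-circle distance to the querying resolver. A toy sketch with invented server names and coordinates:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(h))

# Invented NS coordinates: three EU servers, two US servers.
NS_LOCATIONS = {
    "ns1.eu": (52.5, 13.4), "ns2.eu": (48.9, 2.4), "ns3.eu": (59.3, 18.1),
    "ns1.us": (40.7, -74.0), "ns2.us": (37.8, -122.4),
}

def geo_ordered(resolver_location):
    """Return the NS names sorted nearest-first for this resolver."""
    return sorted(NS_LOCATIONS,
                  key=lambda ns: haversine_km(resolver_location,
                                              NS_LOCATIONS[ns]))
```

A resolver near Amsterdam would then see an EU server first in the list, and one near Washington DC a US server first, instead of a random shuffle.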

Trying to solve my problem, I've tried the top 10 DNS providers (ranked by
uptime and query speed), which use anycast. Only two could be used as
secondaries/slaves, and both of them took over two days to propagate an update
(they did not use the TTL). The reason why I need fast updates is Let's
Encrypt, which requires DNS challenges for wildcard-domain SSL/TLS
certificates.

About anycast use: the root servers have been using anycast for a while now.
Some TLDs use anycast, I think (I haven't actually checked). As for web hotels
and ISPs, most do not use anycast. ISPs, however, have their DNS servers very
close to the end users, and they are good at caching, which is the second
reason why I'm against DNS centralization. Querying, for example, 8.8.8.8 is
often 10x _slower_ than using the ISP's DNS (assuming the ISP has the query
cached).

Anycast, although proven to work nicely for the root servers, which it makes
harder to DDoS, doesn't actually work that great. I argue they could just list
the servers in GEO order instead of configuring BGP routes. When I evaluated
the "top 10" anycast DNS providers, sometimes my amateur setup got lucky (e.g.
the test server in the US got the US IP first, and vice versa) and thus beat
the anycast network in query performance/latency.

~~~
m3047
> In the worst-case scenario, a recursive resolver in the EU tries a DNS
> server in the US, it happens to be down, and the resolver gives up.

The recursive resolution algorithm for caching servers is actually addressed
in the RFCs: it /should/ try all of them and use its own findings to prefer
the best-performing one(s). But it doesn't know about anycast; if a recursive
resolver were switching between anycast nodes (with the same address), that
would imply that routes were flapping. :-(
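That RFC behaviour is typically implemented as a smoothed RTT (SRTT) per nameserver: update an exponential average after each query and prefer the current best, with timeouts counted as a large penalty. A sketch (the constants are illustrative, not any real resolver's values):

```python
class NameserverPicker:
    """Track a smoothed RTT per authoritative server and prefer the
    fastest, roughly how recursive resolvers select among NS records."""
    def __init__(self, servers, alpha=0.3):
        self.srtt = {s: 0.0 for s in servers}  # 0.0 = untried, so tried first
        self.alpha = alpha                      # smoothing factor

    def best(self):
        """The server to query next: lowest smoothed RTT."""
        return min(self.srtt, key=self.srtt.get)

    def record(self, server, rtt_ms):
        """Fold a measured round-trip time into the smoothed average."""
        old = self.srtt[server]
        self.srtt[server] = (rtt_ms if old == 0.0
                             else (1 - self.alpha) * old + self.alpha * rtt_ms)

    def record_timeout(self, server, penalty_ms=2000):
        self.record(server, penalty_ms)  # a dead server drifts to the bottom
```

Because the penalty is folded into the same average, a server that comes back healthy can earn its way back to the top rather than being blacklisted forever.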

Interesting data points.

I think the elephant in the room is the Universal Terrestrial Radio Access
Network (UTRAN) a.k.a. "mobile" and I don't work with provisioning much so all
I can say is that I suspect that if mobile is your concern you just prostrate
yourself to the UTRAN masters and co-locate wherever they tell you to.

SSL/TLS cert management is a fiasco in my opinion, it's a shame that DNSSEC
hasn't achieved market dominance so that if you own a domain you can
automatically sign certs for it yourself. (Then we wouldn't need CA lists in
browsers and OSes either.)

------
fomine3
The title "update your DNS" looks ambiguous

