The uncertainty of measuring the DNS (apnic.net)
46 points by known on July 19, 2018 | 8 comments

That is a mind-blowing article. I mean, I know the basic model, and I could imagine how chaotic stuff could get. But I had no idea that it was that bad.

This especially amazes me:

> Our experiences with DNS measurement using unique timestamped labels and an online ad distribution system to see the queries from end users appears to support this observation — around 40% of queries seen at our authoritative servers reflect an original query that was made days, months or even years in the past, something we’ve termed DNS Zombies.

DNS sounds so simple when described as "servers in a hierarchy that tell you either what other server to ask, or which host has a particular hostname".

My perspective shift came when I talked to someone at NetNod who worked on operating i.root-servers.net: he spoke about DNS the way people speak about distributed key–value stores. Except this one was invented over 30 years ago, and it serves nearly the entire internet. I don't remember how long he'd been working with it, but he absolutely could not wrap his head around the fact that it still works. That it hasn't broken down entirely was apparently something he saw as a little miracle.

Facetiously, I have started thinking of DNS as the only truly web scale NoSQL.

You should look at TCP over DNS. It has some uses.

Now SQL over DNS TXT values, that would be nice!

That is surprisingly high. Presumably the queries weren't actually made years ago. There are many possible explanations: buggy DNS clients, incorrect system clocks, automated scripts scraping queried names into a DNS database, misconfigured embedded/IoT devices, etc.

But if they timestamped the labels?

It's unclear which clock the labels were timestamped against. Their article sort of suggests it was the user's clock, but I'd assume someone like APNIC would be smarter than that.

However, they did investigate further, and as far as I can tell the main reason for these insane numbers is the one-second TTL they put on the records in the experiment. Some caches, intent on efficiency, apparently keep re-querying for the record every second so they always have a fresh copy to serve to their users.

So, basically, what they discovered was that

1. some queries are performed repeatedly without user involvement, which is hardly surprising; and

2. if you ant-size your TTL, you can drive up the ratio of re-queries to regular queries.

This is relatively obvious once you see the explanation, but I guess their point is that until you do see the explanation, the behaviour can appear extremely non-obvious and hard to predict.

(They also mention some hosts whose sole purpose seems to be querying for records someone else has already accessed — the originators of requests are never behind these hosts. While that would generate a high number of zombies, I don't see how that on its own would lead to queries that happen months after the original event.)

It's an NTP-synced clock on our head servers. The DNS label is a wildcard-cert-backed 1x1 pixel, so the name is of the form

We do get local clock feedback, but it's not what we use to assign the time to an experiment.

We're dumb, but we're not that dumb.

The linked https://blog.apnic.net/2016/04/04/dns-zombies/ is also a good read.
