RFC8482 – Saying Goodbye to ANY (cloudflare.com)
162 points by jgrahamc 10 days ago | 66 comments

(Title of the submission should probably capitalize “Any” as “ANY”.)

I liked the ANY type as a convenience and debugging tool, but I guess the disadvantages far outweigh that perceived convenience. And now that I read about the actual semantics of ANY with respect to caches, turns out I was using it wrong anyway!


In recent versions of BIND, ANY queries still work pretty well as a debugging tool when `minimal-any` is turned on, because you get the traditional full-fat response over TCP, and dig automatically uses TCP for ANY queries.

I implemented `minimal-any` in BIND to reduce problems caused by large responses in DDoS attacks. It helps in situations where RRL (response rate limiting) is not quite enough.
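
For reference, it's a one-line switch in named.conf (available in BIND 9.11 and later, if memory serves):

    options {
        // Answer UDP "ANY" queries with a minimal, truncated response,
        // nudging well-behaved clients to retry over TCP.
        minimal-any yes;
    };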


Awesome, thanks a lot for that! I was indeed mostly using ANY through dig (as do most, I suspect), glad to know there's still a way to make it work!

Many people were. I and others have repeatedly pointed out that "ANY is not ALL" over the decades. As a randomly selected example, here is Paul Jarc back in 2004:

* https://dns.cr.yp.narkive.com/MSBPuJCn/newbie-question-about...
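
The effect is easy to reproduce against a caching resolver (the resolver address here is a placeholder, and output is abbreviated):

    $ dig +noall +answer A example.com @192.0.2.53     # warm the cache with one type
    example.com.  3600  IN  A  93.184.216.34
    $ dig +noall +answer ANY example.com @192.0.2.53   # ANY may echo back only what's cached
    example.com.  3598  IN  A  93.184.216.34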


This is great. It's so important to try to push for improvements externally, rather than just sucking it up and implementing onerous workarounds.

It doesn't always work out, but this sounds like a good and reasonable outcome this time. (Though it's their blog, so of course it sounds that way!)


Even though I know how it works, and why it works, I'm still kind of surprised that individual DNS records have individual TTLs. It feels unfamiliar/alien, compared to all the other hierarchy-of-caches systems I work with.

Every intuition I have about caching would lead me to imagine an alternative implementation of DNS where zones are documents, and intermediate resolvers retrieve and cache entire zone documents from their canonical hosts, in order to respond to queries for individual records from stub resolvers.

Is there some big use-case for the DNS records within a zone to have different TTLs? Would it really be so inefficient if the zone as a whole just had one TTL, which was set as the minimum of the TTLs required by any of its records?

Does DNS not have spatial locality to its access patterns? I.e., if an intermediate resolver retrieves the A records for a domain, does that not imply that it'll soon also want the MX records for that domain, and the TXT SPF records, and—essentially—all the records with some probability? Is there no justification for pre-fetching these?

And, if both of those statements were to be true—that DNS could get by just fine with one TTL per zone, and that DNS queries do have a spatially-local access pattern—then why would you want to design DNS in a way where intermediate resolvers retrieve individual records, rather than entire zone documents? Would a modern ground-up DNS architecture do things this way?

(I have the suspicion that the answer lies in security-through-obscurity, viz. the reason AXFRs are prohibited. So, as an addendum: I ask the same questions as above, but this time presuming your DNS daemon has Row-Based Access Control (RBAC), such that an intermediate resolver retrieving a zone document would only see the records its upstream resolver thinks that particular client should be allowed to see—e.g. enterprise Intranet clients get everything, while public Internet clients get a trimmed view. This implies secure DNS, but just take that as a given.)


Record-level TTLs are useful in practice. Let me give an example. Let's say I am about to make a significant change to the operation of a system. I'm going to flip service.example.com from pointing at one place to pointing somewhere else. If something goes wrong during this deployment, then I want to be able to roll back quickly.

In preparation for this deployment, I might lower the TTL on service.example.com considerably. This way I can roll out my change quickly and observe the effect across all clients, and I can also roll it back quickly. Once the deployment is done, and I don't expect to make further changes to service.example.com, then I can raise the TTL again.
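
In zone-file terms the dance looks something like this (names, addresses and TTLs are made up):

    ; days before the deployment: drop the TTL so caches drain quickly
    service.example.com.  60     IN A  192.0.2.10   ; was 86400
    www.example.com.      86400  IN A  192.0.2.20   ; unrelated names keep their long TTLs

    ; once the change has proven stable: raise it back up
    service.example.com.  86400  IN A  198.51.100.10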

More generally, you might want the ability to propagate changes to the definition of some names much faster than others, and record-level TTLs let you do this. I might be planning to change service.example.com shortly while not planning to change www.example.com any time soon. The downside of low TTLs is a higher volume of queries and thus higher costs, with some implications for availability.

Record TTLs are something that I think about, and consider changing, whenever I'm about to make a significant change to a name.

(Yes, if TTLs only existed for zones, then you could still break out service.example.com into its own zone, and define the TTL there. But if you frequently want different TTLs for different records, then having to do this would be an inconvenience without benefit. Service.example.com would be part of its parent zone initially, then you'd have to separate it out into its own zone just for the sake of a different TTL, and then merge it back in later. Under this use-case, having only zone-wide TTLs would just add complexity.)


The problem with this is that there may not be a single zone file, either because of location-aware resolvers, or because the entry won't exist until it's requested. This means that caching the whole zone wouldn't allow for dynamic resolving and would require lots more caches at whatever granularity someone else wants for their local server architecture.

> Does DNS not have spatial locality to its access patterns? I.e., if an intermediate resolver retrieves the A records for a domain, does that not imply that it'll soon also want the MX records for that domain, and the TXT SPF records, and—essentially—all the records with some probability? Is there no justification for pre-fetching these?

Probably not. If I want to visit your website, I might just need A and AAAA. That doesn't mean I will soon want to send you emails (MX) or look at your google site verification token (TXT).

> Is there some big use-case for the DNS records within a zone to have different TTLs? Would it really be so inefficient if the zone as a whole just had one TTL, which was set as the minimum of the TTLs required by any of its records?

Definitely no. I tend to set low TTLs for A and AAAA records because in practice they change very often; depending on your setup, they can change on every deployment. But on the other hand, MX is long-lived and I set it to many days. This has important security implications: if, say, Gmail has cached my MX records for many days, it will be very difficult for a DNS takeover attack (or any other sort of DNS poisoning) to maliciously redirect my mail.
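
Concretely, a zone along these lines (values are illustrative):

    @   300     IN A     192.0.2.1              ; short-lived: moves with deployments
    @   300     IN AAAA  2001:db8::1
    @   259200  IN MX    10 mail.example.com.   ; long-lived: mail routing rarely changes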


Fun fact: mailers will use the A record if a domain doesn't have an MX.

See RFC 5321


> Even though I know how it works, and why it works, I'm still kind of surprised that individual DNS records have individual TTLs. It feels unfamiliar/alien, compared to all the other hierarchy-of-caches systems I work with.

In HTTP, TTLs are per-resource.


A self-hosted dynamic DNS setup requires a short TTL. E.g. most of a zone could be 2 days, but myhome.example.com can be set to 15 minutes.
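
E.g. (illustrative values):

    $TTL 172800                       ; zone-wide default: 2 days
    myhome  900  IN A  203.0.113.7    ; dynamic host overrides it with 15 minutes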

In that use-case, there's no reason that myhome.example.com couldn't be broken out onto its own apex, yet still hosted by the same authoritative DNS servers. (I'm continually surprised that there's no easy way in any DNS registrar's UX to say "I want this DNS name to be an NS record pointing back to your own servers, with you automatically updating that association; and then I want you to expose a new top-level zone for me to edit with that new apex FQDN, just as if it was another asset I had purchased from you. The new zone should have an automatically-managed SOA record derived from its parent zone's SOA record, which prohibits transfers; and the parent zone's SOA should be locked down such that it cannot be transfer-enabled either, as long as the child zone exists [just because no other registrar likely supports receiving this mess.]" Totally possible, but no registrar does it. Maybe because it'd make reselling too easy?)

But if things are done your way then because myhome.example.com needs to have a TTL of 15 mins now all of example.com must have a 15 min TTL instead of two days.

Now imagine that www.example.com has A LOT of visitors, and myhome.example.com has only OP and his family using it. Millions of clients would each need to make up to 96 times more DNS queries (one day divided by 15 minutes; 24 hours seems to be the actual maximum amount of time that, for example, Windows caches positive DNS query replies) whenever they visit www.example.com after the far-too-low TTL has expired, all because a handful of people need a low TTL for myhome.example.com.

Per-record TTL is great and I see no reason to do it the way you are suggesting instead.


> But if things are done your way then because myhome.example.com needs to have a TTL of 15 mins now all of example.com must have a 15 min TTL instead of two days.

Er, why? I was proposing precisely the opposite of that: that if you made myhome.example.com a separate zone from its parent, it'd have a separate TTL.

Honestly, there's no reason (other than the awful UX of current DNS servers and registrars) that any/all subdomains that have reason to change independently aren't made into independent zones that are simply "managed under" their parent zone.

In software, you decouple components that have different rates of change (e.g. policy and implementation layers), so that teams corresponding to those rates of change can deal with the components using engineering strategies suited to their lifecycles. (E.g. a policy DSL can be quickly modified and pushed by the ops team; while modifying the implementation requires code review and a successful CI build.) Because this is pretty much an unalloyed good, we do it more than we need to: we modularize everything, even things with the same needs, because "keep everything modular" is a much easier principle to follow, and it allows us to group things into different change-flow-rate layers after the fact.

So why, oh why, do we not obey this principle with our DNS subdomains? The same team who owns the "microsoft.com" zone is responsible for any changes required to the "technet.microsoft.com" zone, and the "zune.microsoft.com" zone? How crazy is that, as the best-supported, idiomatic-UX default to have in DNS systems?


Unless I’m misunderstanding something, what you’re talking about is already possible. You can delegate subdomains using NS records (glue records optional, depending). Then you can create a record at the apex, effectively making it a single record zone. The only real exception being that you can’t create a CNAME at the apex. And you can delegate a subdomain of the subdomain in the same way. The delegations can be to completely separate DNS services/servers managed by completely different teams. This hierarchy and ability to delegate not just resolution but also control is pretty fundamental to the protocol.
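
For example, inside the example.com zone (the server names are invented):

    ; hand foo.example.com to a different team or provider entirely
    foo  IN NS  ns1.other-provider.example.net.
    foo  IN NS  ns2.other-provider.example.net.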

In practice, creating a zone per record would be messy, and unnecessary if you didn't need to delegate control, but you could do it if you wanted to.


My point was precisely that the fact that this process is “messy” is purely a UX problem; and the fact that it’s seen as “unnecessary” (rather than “desirable”) is purely down to the fact that it is more “messy” to do.

This could all be handled internally by the registrar, such that you just see a tree of managed zones, with the ability to create new directories (zones) in the tree with a single click. All that is is a different UX for a capability that already exists. And if you can make a CNAME record with one click, then why shouldn't you be able to make a new child zone with an apex A record with one click?

The reason I was saying that you shouldn’t be able to delegate such child zones—not all child zones, just the ones created through this one-click process—is precisely that being able to do so would make the process of creating them in the first place not one-click, because then they’d have to have SOA records that are independently managed (and so, probably, created with a form you fill out on zone creation, like regular zones), rather than being automatic derivatives of their parent zone’s SOA information. Sure, you could have a control-panel option to turn one of these automatically-managed child zones into a fully-separate top-level zone with the ability to edit its SOA information, delegate it, etc.

But that’s just a two-click process to get what we already have, whereas what I care about is the one-click process that gets you something we don’t have: child zones that are required to be bound to the same authority, and therefore are managed with the simplicity of team-based ACLs [like e.g. the AWS resources in a project] rather than with separate top-level accounts in different registrars. It’d be a one-click operation for the domain administrator of “example.com” to break off “foo.example.com”, and then one more click (of an ACL assignment drop-down) for the domain admin to assign the foo team the use of the child zone “foo.example.com” as a namespace to use however they please (including creating further child zones off of.) But all of that would live under what would currently be considered one account with a DNS registrar or DNS management service.

IMHO, this paradigm would be “the obvious thing” for services like AWS Route53 to offer—it fits in much better with the rest of the “enterprise-wide policy-controlled access on pooled resources” philosophy that these IaaS clouds have, than the current approach (every domain and all its subdomains being managed as single resources with single IAM owners) does.


> I was proposing precisely the opposite of that: that if you made myhome.example.com a separate zone from its parent, it'd have a separate TTL

Sorry, I misunderstood your comment then.


Individual resource records have individual TTL fields, but RFC 2181 requires all of the resource records in a resource record set to have the same TTL value as one another.
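
In other words (illustrative records):

    ; one RRset: same name, class and type, so RFC 2181 says the TTLs must match
    www.example.com.  300   IN A    192.0.2.1
    www.example.com.  300   IN A    192.0.2.2
    ; a different RRset at the same name is free to use a different TTL
    www.example.com.  3600  IN TXT  "v=spf1 -all"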

A more imaginative alternative DNS would make these first-class entities in their own right.


Could someone explain why standards are "RFC"s? Can't anyone request comments for something? Why would a request for comments be treated as a standard?

Not all RFCs are standards. RFCs that become one are labeled with an STD number, but nobody uses those (AFAIK).

https://www.ietf.org/rfc/std-index.txt

https://en.m.wikipedia.org/wiki/Internet_Standard


Because the internet used to be unimaginably small. It really was just a couple of researchers talking to each other. They weren't co-located (the entire point of internetworking) so they had to mail stuff to each other, but it was more like when you send a copy of a document to your co-worker down the hall. "Hey, Jane, watcha think about this?" The first telnet (predecessor to SSH) RFC was 9 pages, including an example of the connection handshake.

If you are interested in the history behind RFCs and the early Internet, I suggest reading "Where Wizards Stay Up Late: The Origins of the Internet" by Katie Hafner and Matthew Lyon. It's a good book about the ARPANET's formation and how the ARPANET and other networks together became the Internet.

Years ago, when the internet was still a bunch of universities, the few folks working on the network were oftentimes tuning hardware like routers. I do not remember what the first RFC was about, but it was genuinely a "request for comments" document mailed to others. Ever since then, the name has stuck.

I am happy to see it go. For years I've had to block it in iptables using:

    iptables -t raw -I PREROUTING -i eth0 -p udp -m udp --dport 53 -m string --hex-string "|0000ff0001|" --algo bm --from 40 --to 65535 -j DROP
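    # (The hex string 00 00 ff 00 01 is the end-of-QNAME null byte followed by
    # QTYPE 255, i.e. ANY, and QCLASS 1, i.e. IN; --from 40 starts the search
    # after the 20-byte IP, 8-byte UDP and 12-byte DNS headers.)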
Most bots are poorly coded, so most of them don't try the individual records when ANY times out. ANY is also a decent generalized amplification attack vector. There are plenty of other ways to do amplification attacks, but this one is quite brain dead.

Thank you to the folks at Cloudflare for submitting this.


Why would it be harder to implement than simply iterating over the known types and concatenating the results of those queries?

Say we have A, CNAME and MX type records in our system. If we see ANY, let's convert it up-front into three queries: A, CNAME and MX. Combine the results and that's it.

ELIA5


Here's why: suppose some types are easier to answer than others. Maybe A and CNAME and AAAA are cached, MX is in the database but not cached, and NSEC3 requires computing a result. In order to correctly implement ANY (pre-RFC8482), you must convert it into all of those and wait for the slowest one. An implementation like Firefox that asks for ANY because it just wants A and AAAA is wasting both the client's and the server's time.

However, also, pre-RFC8482 that's not what ANY is defined to mean. A non-authoritative caching resolver is permitted to respond with only the records it has cached if something is in cache. So you're already getting a random subset of records. So the behavior is unhelpful for clients and also unhelpful for authoritative servers, and not in a way where the unhelpfulness trades off for helpfulness elsewhere.


That semantics is indeed useless.

And to begin with, applications should never be interested in all records pertaining to a domain; they want specific records. If I'm sending mail, I want an MX record for a domain. I don't want the SPF record unless I'm receiving mail which implicates that domain. If I'm making an HTTP connection, I don't want any of these; I want an IP address. So no semantics of ANY is easily justifiable.

On the other hand, we can't claim that this semantics is hard to implement.


It's hard to implement in a way that makes sense in the constraints of the world as it is (people asking for ANY when they have some meaning in mind other than the nonsensical specs) and especially in a way that doesn't enable DNS reflection attacks. In an ideal world, sure, you could enable ANY provided nobody either good or evil ever used it. And in fact Cloudflare says they did previously implement it just fine, by littering if statements everywhere.

The article gives some insight. For example, to quote:

    Should it forward the "ANY" query to authoritative?
    Should it respond with any record that is already in cache?
    Should it do some mixture of the above behaviors?
    Should it cache the result of "ANY" query and re-use the data for other queries?
Pair that with inconsistently implemented behavior.

If we treat ANY as universally quantified, for absolute correctness/up-to-dateness we must always send the request upstream, because we are not the authority on all that may exist. However, we can sacrifice a bit of correctness with some reasonable heuristics. If we execute an ANY query for a given domain, we can cache the fact that we did such a query, in addition to caching the records that come back. Then, for some time after that, we may treat those cached records as being everything that is known about that domain; if another ANY query comes in for that domain, we give those records to the client. Furthermore, in an independent cache entry, we can associate the list of record types with that domain: it informs us that for that domain, records of those types are known, or have been known. When we get an ANY query for that domain, if we have this list, we can walk the list of types and query upstream for the missing records that have expired from our cache, rather than forwarding the ANY request. Then we serve the client a mixture of cached and fresh records.

Oh my. Multiple heterogeneous but interacting caches? Heuristics? And if I understood it correctly, if any of the records you got from ANY are expired, you selectively query for new records from a list of cached types, not using ANY? What if the upstream added a record from another type in the meantime? Why have ANY at all then?

How is any of that "simple"? How do you transition the existing infrastructure to that behavior? Why?


> What if the upstream added a record from another type in the meantime?

Then we don't see it for a while until the type list is refreshed; that's the up-to-dateness case that is sacrificed.

> How do you transition the existing infrastructure to that behavior? Why?

If it's an alternative to ANY being rendered inoperative (and someone else is already taking responsibility for that extreme measure), I might have a lot of leeway that adds up to "try it and whatever the hell happens, happens".


No. Don't add complexity, state and attack surface with no reasonable cause.

I’m no DNS expert, but I can tell you missed AAAA, SRV and INFO, so clearly it’s not that easy.

"Say we have ..."

Yes, but now consider that you have to forward the query. If you unpacked the ANY into individual known types at the frontend, would you only forward A, CNAME and MX (which you yourself know about in your example), and omit AAAA, SRV and INFO? (And also be responsible for packing them back into a single reply, I guess?)

More correctly you should probably forward the ANY as ANY, so you have to do it slightly later than "up front". And then cache the individual answer records as if they came from individual queries.

I haven't fully thought this through, though, because: a) As you can see, it's not necessarily "simple"; b) it's not impossible to suss out something that makes sense, but the article at least claims that the resolver side is undefined so far, and several implementations do several different things by now. Giving up on ANY is probably for the best, at this point.


I just finished replying to your earlier comment.

But... Qmail! :)

Patched not to do this for many years, now. By me.

* http://jdebp.eu./Softwares/djbwares/qmail-patches.html#any-t...


What does qmail use it for?

Qmail used to do ANY.

Qmail, as opposed to other software, was sane enough to understand that lack of MX in ANY didn't mean it wasn't there. It meant it wasn't in a cache, so it would retry with just MX.

Our initial proposal (to do REFUSED for ANY) indeed had a chance to break some older qmail installations. This is why we engaged in a longer process and found an acceptable solution: HINFO. HINFO is both backwards compatible (qmail will work fine) and solves our problems with ANY. Win-win.
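
For the curious, this is roughly what an ANY query against one of our domains returns now (TTL and formatting will vary):

    $ dig +noall +answer ANY cloudflare.com
    cloudflare.com.  3789  IN  HINFO  "RFC8482" ""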


Technically, there can be multiple queries in a single DNS request, though this is under-specified in the RFCs. Also, most authorities don't support it.

Why did it do ANY?

> This is really just an old workaround that's no-longer needed. Let me explain...

> Once upon a time, back in 1996, there was a really unfortunate bug in the most popular DNS server software (BIND 4.9.3): it did not respond correctly to "CNAME" requests (that is to say, requests for any CNAME data about a particular domain name). This is critical information that an email server needs to know to do its job. Thankfully, there was a way to work around the problem: "ANY" requests. These requests ask the DNS server, essentially, for any and ALL information it has about the domain name in question, including CNAME information.

> These ANY queries have two big problems:

> As you might imagine, for big domains with lots of mirrors (e.g. gmail.com), that's a lot of information, and so the response can be quite big. Big responses pose two problems: first, it's a waste of bandwidth, and second, it can expose a bug in qmail's handling of large DNS responses (see the next patch).

> ANY queries are often not cached by relaying DNS proxies (for whatever reason), and so ANY queries cause more traffic EVEN behind a caching DNS proxy server.


Also there used to be a bug with CNAME lookups and ANY was a workaround. http://www.memoryhole.net/qmail/#any-to-cname

I used to use Qmail! But not since about 2005 when I finally converted to Postfix if memory serves.

Holy crap I remember when RFCs had three digits!

> This got us thinking: do we really need it? "ANY" is not a popular query type - no legitimate software uses it (with the notable exception of qmail).

Cloudflare once again pushes for broad changes to the way the net operates that benefit themselves and other centralized corporate players, without any benefit to individuals who actually use the net (in the sense of using the internet, not just a web browser).

The kind of exploration and direct learning that was possible when I was a kid growing up in the 90s/00s is slowly being phased out as the money seeps in.


What exactly about the deprecation of ANY do you disagree with (other than that Cloudflare proposed it and it used to exist)? Also, the blog post and RFC seem to make convincing arguments for the general health of the internet; do you disagree with these, and if so, why?

I've always used it to explore the internet for fun as a person without any profit motive.

Cloudflare's arguments are all based around ANY making it slightly harder for them to make a profit. A small chance of DNS amplification attacks is what Cloudflare thinks about, because it's their business. But there's no reason to believe this is more important than individual users wishing to see what servers are behind a domain.

Combined with GDPR killing off whois the internet is a much more boring, less transparent place.


(A) These days Cloudflare can sustain all the DNS amplifications without even blinking. This work doesn't help us in any way on the DoS front. It helps smaller players that don't have spare network capacity. If anything, making amplification attacks harder _reduces_ our benefits.

(B) You are free to run ANY as you like on your domain. For domains you own, this RFC doesn't change anything.

(C) Would you advocate for responding to AXFR / zone transfers? There generally is consensus that allowing enumeration is not desired.


Would you give an equivalent argument that Microsoft pushes automatic security updates because Windows vulnerabilities make it "slightly harder for them to make a profit"; that "a small chance of becoming part of a botnet" is what Microsoft thinks about because it's their business; but that there's no reason to believe this is more important than individual users wishing to experiment with A-Life?

Sorry, but fewer DNS amplification attacks are more important than you having fun on the internet.

And relatively counter to Cloudflare's interest. The more people get DoSed the more reason they have to use Cloudflare.

I hear you. I think whois was neutered for anti-abuse purposes by the pay-for fake front proxy "privacy" vendors. Perhaps like you, I use ANY in conjunction with dig, and muscle memory alone is going to keep the query alive in spite of 8482.

> individual users wishing to see what servers are behind a domain

Since most servers are web servers these days, you can get pretty close to this goal from certificate transparency logs.


Limiting ANY queries to just TCP would have been a reasonable middle ground, but now Cloudflare has a cleaner codebase and does not have to do any difficult thinking around what to do with synthetic rrsets in response to ANY queries. Good for Cloudflare!

> Limiting ANY queries to just TCP would have been a reasonable middle

This was a proposed solution. The problem is that, in the case of an attack against a valid authoritative service launched via open resolvers, the open resolvers would just download gigabits of ANY traffic over TCP. Read about this here: https://fanf.livejournal.com/140566.html


They could have done this locally, just for their authoritative servers, without trying to force their ideas about ANY on the rest of the Internet.

> But there's no reason to believe this is more important than individual users wishing to see what servers are behind a domain.

None of your business, really.

> Combined with GDPR killing off whois the internet is a much more boring, less transparent place.

Thank the lord for that; there's literally no benefit from it. With WHOIS, evil still did evil and normal users got their privacy violated; without WHOIS, evil can still do evil and normal users' privacy is protected.


I’m not convinced that ANY is that bad. They wrote some bad code, they have special goals, and all of a sudden ANY is not good for you.

People are using it for reflected DDoS attacks. This has resulted in DNS providers scoping ANY to the point of breaking it. So even without removing it, it was horribly broken and returned inconsistent results.

In essence, all this change does is remove the fiction of ANY; with or without RFC8482, ANY wasn't reliable enough for real usage.


Given how bulky ANY is, and the fact that we're changing the standard anyway, would not just restricting ANY to TCP queries "fix" the DDoS issue? The attempted TCP connection would not be formed (the victim would RST it).

(I agree with other issues pointed out by the article, and there are other reasons why, as an RR type, I would still axe ANY. But the functionality of being able to query all RRs on a server is often useful for debugging, though I think there are other practical ways to work around that. (Issue a query for many common RR types.))
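
For example, a small shell loop over the usual suspects recovers most of what ANY used to show (the type list is just a convenient subset, and example.com is a placeholder):

    for t in SOA NS A AAAA CNAME MX TXT SRV CAA; do
        dig +noall +answer "$t" example.com
    done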


The problem is that no one could tell reliably what ANY even was. Chalk it up to English being ambiguous in every way imaginable.

Or is it "any way" imaginable?


Agreed!


