"And the maximum size of an IP packet is 16 bit (via the IPv4 total length / IPv6 payload length). And that is now our limiting factor: 65536 bytes for DNS overhead + payload."
While they arrive at the right size, the reasoning is incorrect. They seem to overlook that DNS over TCP prefixes the message with its own 2-byte length field (making the fixed overhead 14 bytes rather than 12). That prefix is the limiting factor; it has nothing to do with the lower layers. A DNS message anywhere near the 64 KiB maximum will require multiple IP packets anyway once you add the TCP and IP headers, and in practice MTUs are almost never over 9000 bytes and usually under 1500, so the response gets split up at much smaller sizes.
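The framing described here is easy to sketch: RFC 1035 §4.2.2 says each TCP-carried DNS message gets a two-byte big-endian length prefix, and that 16-bit field is what caps the message at 65,535 bytes regardless of anything below TCP. A minimal illustration (the message body is a placeholder, not a real DNS response):

```python
import struct

def frame_dns_message(msg: bytes) -> bytes:
    """Prefix a DNS message for TCP transport per RFC 1035 §4.2.2.

    The 2-byte big-endian prefix excludes itself, so the 16-bit
    field is the real ceiling on message size -- not IP or TCP.
    """
    if len(msg) > 0xFFFF:
        raise ValueError("DNS message exceeds 65535 bytes")
    return struct.pack(">H", len(msg)) + msg

framed = frame_dns_message(b"\x00" * 100)  # placeholder message body
assert framed[:2] == struct.pack(">H", 100)
assert len(framed) == 102                  # 2-byte prefix + 100 bytes
```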
"'What about jumbo frames?' That doesn't really buy us anything, either, because TCP itself is also limited to 16 bits, albeit semi-indirectly via the 16-bit urgent pointer"
This isn't correct either. The urgent pointer (its own can of worms) is completely ignored in non-URG segments, so it places no limit on how much data a TCP connection can carry.
Yes, that rubbed me the wrong way too. The author has a flawed understanding of TCP (from the application's point of view) and confused the DNS header with the TCP header.

Apart from what you said about practically all TCP/IP packets on the internet being <= 1500 bytes, so the reply is effectively delivered in multiple TCP packets, all of this is completely opaque to the application layer. The application (the DNS server) has no idea how many packets the data it hands to the kernel will be split into, how big they are, or in what order they arrive. Likewise, the client's DNS resolver has no idea what happened on the lower layers: how many packets the data it receives was spread across, whether any of them were lost and had to be resent, and so on.

What might happen is that the kernel tells your app some data has arrived, and when you go fetch it, it's fewer bytes than you expected; you can then guess that your data was split across multiple packets and where one of those boundaries was, but that's about it.
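That opacity is simple to demonstrate: a sender can hand the kernel a length-prefixed message in many tiny pieces, and the receiver just loops on recv() until the prefixed length is satisfied, never learning how the bytes were segmented on the wire. A sketch with a plain socketpair standing in for a real DNS server and resolver (the payload is fake, not DNS wire format):

```python
import socket
import struct

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Loop until exactly n bytes arrive; recv() may return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf += chunk
    return buf

server, client = socket.socketpair()

message = b"fake-dns-response" * 50                 # placeholder payload
framed = struct.pack(">H", len(message)) + message  # RFC 1035 TCP framing
for i in range(0, len(framed), 7):
    server.sendall(framed[i:i + 7])  # send in tiny slices; the receiver
server.close()                       # cannot see these boundaries at all

(length,) = struct.unpack(">H", recv_exact(client, 2))
body = recv_exact(client, length)
assert body == message
client.close()
```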
The article asserts "A large DNS response must fit into a single TCP packet." but skips the source on that one. I know that's true of UDP but is it really in the standard for TCP?
That's an excellent question! When I was writing that I was pretty deep in DNS code and standards; I think I just understood that as a rule of DNS. Or at least, I believed I understood. Looking back a decade later I can't find any standard which supports my statement, though plenty of secondary sources seem to agree with me.
"In these situations, the client needs to re-transmit over TCP for which the size limit is 64000 bytes" [0]
"The first response via UDP is, no surprise, truncated, so we retry via TCP. But now the DNS result delivered via TCP is also truncated! That is, the DNS server has determined that the result will not fit into the maximum response size. Why is that?
"Our payload is 4096 * 16 = 65536 bytes RDATA, which should fit into the DNS packet, which uses a two byte RDLENGTH field. But we also need to again account for the overhead noted above: 12 bytes DNS header, 36 bytes for the query, 11 bytes additional records, and 16 bytes for each A record, yielding (4096 * 16) + 12 + 36 + 11 = 65595 bytes in total. And the maximum size of an IP packet is 16 bit (via the IPv4 total length / IPv6 payload length). And that is now our limiting factor: 65536 bytes for DNS overhead + payload."
My best guess is that DNS just makes the assumption that everything needs to fit in a single IP packet and it doesn't care if it is UDP or TCP.
I think that second reference has a couple errors, and this has nothing to do with IP packets (datagrams).
As js2 notes, the entire DNS message has to be prefixed with a 2-byte length field (which they seem to have omitted from the diagram at the top of the page: over TCP the fixed overhead is 14 bytes, not 12). That prefix is the limiting factor. The DNS server doesn't really care what the underlying limits of the TCP stack are, or whether the response fits in a single IP datagram (it likely needs to be split up anyway).
As they eventually figure out, the actual limit is RDATA payload + DNS overhead <= 64 KiB. What they're forgetting is that TCP segment overhead means even that maximum wouldn't fit in a single IP datagram: with minimum 20-byte IP and TCP headers plus the 2-byte length prefix, any DNS message close to the 64 KiB ceiling has to be split.
In practice none of this matters, though, as the OS had better keep the TCP maximum segment size below the MTU to avoid fragmentation, typically something like 1460 bytes or less.
Any DNS response substantially over that is going to travel in multiple IP datagrams.
> Messages sent over TCP connections use server port 53 (decimal). The message is prefixed with a two byte length field which gives the message length, excluding the two byte length field. This length field allows the low-level processing to assemble a complete message before beginning to parse it.
This is the actual reason. TCP being a stream-oriented protocol, it would be perfectly valid (although totally inefficient) for the DNS server to send back an answer in 25 tiny TCP packets. The DNS client might not even notice.
> My best guess is that DNS just makes the assumption that everything needs to fit in a single IP packet and it doesn't care if it is UDP or TCP.
It's not an assumption, it's a requirement. "Legacy" DNS has no provision for a response to exceed a single IP packet size. The assumption part is that this is adequate for a DNS response. Before DNSSEC, it was.
I believe DoH and friends don't have this restriction.
EDIT: too late to delete but this is wrong. see child comments.
Where is this requirement? A DNS server responding over a TCP socket has no idea of the underlying IP datagram size. In a typical Ethernet, the TCP MSS will be 1460 bytes so the IP packets will all be under 1500 bytes. You certainly can get regular TCP DNS responses bigger than 1460 bytes.
See elsewhere in this thread, "legacy" RFC1035 DNS over TCP has a 16-bit message prefix which limits the total DNS message to 64kiB.
A valid maximum DNS TCP response can exceed (the almost entirely theoretical) max IP packet size by 22 bytes, FWIW.
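The 22 bytes fall out of the arithmetic: a maximal DNS message plus its TCP-side framing overruns the 16-bit IP total-length field by exactly the 2-byte prefix plus a minimum TCP header (for IPv4 the IP header itself makes the shortfall even larger). A quick check:

```python
MAX_DNS_MESSAGE = 0xFFFF  # capped by the 16-bit length prefix
LENGTH_PREFIX   = 2       # RFC 1035 TCP framing
MIN_TCP_HEADER  = 20
MAX_IP_PACKET   = 0xFFFF  # 16-bit total length field

overrun = MAX_DNS_MESSAGE + LENGTH_PREFIX + MIN_TCP_HEADER - MAX_IP_PACKET
assert overrun == 22
```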
An IP packet can be much larger than a single TCP segment: up to 65535 bytes. I said IP has packets rather than datagrams; well, technically it does have datagrams too, but what I'm getting at is that with fragmentation (say you have broken PMTUD) a single IP packet can be split up.
That said, terminology confusion aside, you are still correct and I was wrong. The response can indeed be larger than a single IP packet: up to 65535 bytes of DNS message plus TCP/IP overhead, split across as many IP packets as needed (it doesn't have to be a single very large fragmented packet).
> Well, technically it does have datagrams but what I'm getting at is that with fragmentation (let's say you have broken PMTUD) an IP packet can be split up.
Of course, but in the happy usual case your network stack should be starting off with a TCP MSS <= MTU - IP/TCP header overhead, and the IP datagrams will not be fragmented.
I used the term "datagram" because to higher layers like TCP, it is formally specified this way. I'm not too concerned about pedantry (except to avoid confusion) but this is the literal RFC9293 text: "The application byte-stream is conveyed over the network via TCP segments, with each TCP segment sent as an Internet Protocol (IP) datagram. "
TCP absolutely does have packets, it's just that by convention userland is supposed to ignore them and proxies are allowed to change the boundaries.
If you control enough of the network and set the right socket options, you can observe them being preserved though! And as a result of this observation, people ignorant of the convention have written programs that most people now consider "buggy".
TCP itself doesn't have packets with a size field, though; it has segments. If speaking informally and everyone is on the same page, that may be fine, but this might be a good place to be pedantic, since there is confusion. The 64 KiB limit for TCP/IP is due to the 16-bit total-length field of the IP header. Crucially, TCP segments themselves don't carry a size field (hence TCP only works over a lower layer that encodes size). TCP segments have a 16-bit window size, but that can be increased by the TCP window scale option.
So TCP/IP has a 64 KiB limit, but TCP itself does not. There was an RFC for IPv6 "jumbograms" (RFC 2675), but it never gained adoption.
In the end, this 64k limitation being discussed comes from the DNS protocol itself where it encodes the length as a message prefix, this isn't related to TCP or IP.
Depends on the query type, but, hypothetically, lots!
As a purely theoretical example, I could define a (useless) DNS zone to contain, for each valid host name, one A record for every valid IPv4 address and one AAAA record for every valid IPv6 address, send an AXFR (zone transfer) query, and receive a response containing ≈2^32 + ≈2^128 IP addresses for each of its 36·37^62 + 36·37^61 + ⋯ + 36·37 + 36 hosts.
(Some DNS query types, such as AXFR, can split responses across any number of individually length-limited DNS messages.)
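The host-name count in that formula is just a sum over label lengths 1 through 63, with 36 choices (letters and digits) for the first character and 37 (adding the hyphen) for each later one; like the original estimate, this ignores the LDH rule that a label may not end with a hyphen. The series collapses neatly:

```python
# One 1-63 character label: 36 choices for the first character
# (a-z, 0-9), 37 for each subsequent one (adding "-").
hosts = sum(36 * 37 ** (length - 1) for length in range(1, 64))

# Geometric series: 36 * (37**63 - 1) / 36 == 37**63 - 1
assert hosts == 37 ** 63 - 1
print(f"{hosts:.3e} possible host names")  # on the order of 10**98
```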
Rather than reinvent the wheel, one place I worked used SRV records for service discovery. Unlike A records, SRV records carry host names, so the per-response limit is a lot lower, and they hit it. They might even have hit it twice when a client was UDP-only.
The RFC for DNS-over-TLS [0] seems to indicate that tunneling DNS requests through a TLS channel has no effect on the potential size of the payload returned.
https://www.netmeister.org/blog/dns-size.html