Wouldn't it have been simpler and more secure to skip the payload altogether?
It seems strange that the designers of a security protocol opted for something more complex than necessary.
"To perform PMTU discovery, HeartbeatRequest messages
containing padding can be used as probe packets, as described in
"In particular, after a number of retransmissions without
receiving a corresponding HeartbeatResponse message having the
expected payload, the DTLS connection SHOULD be terminated. "
"When a HeartbeatRequest message is received and sending a
HeartbeatResponse is not prohibited as described elsewhere in this
document, the receiver MUST send a corresponding HeartbeatResponse
message carrying an exact copy of the payload of the received
If a received HeartbeatResponse message does not contain the expected
payload, the message MUST be discarded silently. If it does contain
the expected payload, the retransmission timer MUST be stopped."
The "why" is that networks are subject to packet loss and to timing delays. If you send two heartbeats 10 s apart and receive only one reply, is that reply a response to the first heartbeat or to the second one sent 10 s later? Without some unique data in each packet you send, echoed back by the other end, you cannot tell which.
Now, one could argue that a simple sequence counter defined by the protocol would suffice. But that is open to exploitation by a misbehaving server: once the server recognizes your heartbeat counter pattern, it can issue anticipatory heartbeat replies with the proper sequence number, which you will accept as valid. Because the payload is arbitrary bytes, you have the option of sending a sequence number, generating random data (and tracking it yourself), or both. The random data prevents a misbehaving server from predicting your pattern and replying with the right data without ever receiving your requests. The arbitrarily sized payload also makes the packet useful for path MTU discovery, as the first paragraph I quoted from the RFC indicates.
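To make the "sequence number plus random data" idea concrete, here is a minimal sketch of how a requester might track its own payloads. The class and method names are hypothetical (nothing like this is defined by the RFC or by any TLS library), and the 16-byte nonce length is an arbitrary assumption.

```python
import os
import struct


class HeartbeatTracker:
    """Tracks outstanding heartbeat payloads (hypothetical helper, not a TLS API)."""

    def __init__(self, nonce_len=16):
        self.nonce_len = nonce_len  # assumed size of the random part
        self.next_seq = 0
        self.pending = {}  # payload bytes -> sequence number

    def new_payload(self):
        # 4-byte big-endian counter followed by unpredictable random bytes.
        seq = self.next_seq
        payload = struct.pack("!I", seq) + os.urandom(self.nonce_len)
        self.pending[payload] = seq
        self.next_seq = (seq + 1) & 0xFFFFFFFF
        return payload

    def match_response(self, payload):
        # Returns the sequence number this reply answers, or None if the
        # payload was never sent or was already answered (discard silently).
        return self.pending.pop(payload, None)


tracker = HeartbeatTracker()
first = tracker.new_payload()
second = tracker.new_payload()

# Only one reply arrives; the echoed payload says which request it answers.
assert tracker.match_response(second) == 1

# A peer that guesses the counter but not the random bytes is rejected.
guess = struct.pack("!I", 0) + bytes(16)
assert tracker.match_response(guess) is None
```

The counter alone would answer the "which request is this?" question; the random bytes are what stop a peer from fabricating a reply in advance.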
This document describes the Heartbeat Extension for the Transport
Layer Security (TLS) and Datagram Transport Layer Security (DTLS)
protocols [...] DTLS is designed to secure traffic running on top
of unreliable transport protocols.
I wonder if the payload should have been specified only for use with DTLS, not TLS.
The HeartbeatResponse must contain the same payload as the
request it answers, which allows the requesting peer to verify it. This is necessary
to distinguish expected responses from delayed ones of previous requests, which can
occur because of the unreliable transport.
I doubt that any peer will want to know the order of the "ping" replies. A heartbeat is not used to diagnose anything. Any heartbeat response will suffice to keep the connection alive.
The need for a payload/sequence number is presented as a "truth" of networking but it seems to be invalid here.
A 32-bit sequence number would suffice, except for the fact that DTLS also uses this packet for path MTU discovery. Strangely, an IP packet on Ethernet cannot exceed the usual MTU of 1500 bytes without being fragmented anyway, and at present I have no idea how this DTLS MTU discovery is supposed to work. Supposedly, the only way to check for MTU limits on routers is to set the "do not fragment" bit in the IP header and then wait for possible ICMP replies. This should work for UDP as well. But perhaps you can also discover UDP fragmentation at the receiving end, since UDP might not automatically recombine the fragments. Then any protocol on top of UDP would be able to do this as well. Presumably it was introduced in DTLS so that applications using DTLS wouldn't need to do it themselves.
Whatever the case may be, and however useful or stupid the design, these messages could not exceed 1500 bytes. So why 64k? Probably just because a one-byte length field tops out at 255, which is too low, while a two-byte field tops out at 65535, which is enough. I don't know what features UDP/DTLS have for reassembling packets, but apparently they thought it easier to make $FFFF the maximum value than to impose a tighter limit on this number.
Also, the padding is required in case some peer sends simple sequence numbers as payload. A 32-bit sequence number plus 3 bytes of header would be 7 bytes. Adding 16 bytes of random padding would scramble the message so that the plaintext being encrypted (all of this is encrypted) could not be guessed.
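For reference, the layout being discussed (a 3-byte header made of a 1-byte type and a 2-byte payload length, then the payload, then at least 16 bytes of padding) can be sketched as below. This is an illustration based on my reading of the RFC, not code from any TLS library; the function names are made up, and the type values 1 and 2 for request and response are as I recall them from the RFC.

```python
import os
import struct

HEADER_LEN = 3          # 1-byte type + 2-byte payload_length
MIN_PADDING = 16        # minimum padding length
REQUEST, RESPONSE = 1, 2


def build_heartbeat(msg_type, payload, padding_len=MIN_PADDING):
    """Serialize type, 16-bit payload length, payload and random padding."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload does not fit a 16-bit length field")
    if padding_len < MIN_PADDING:
        raise ValueError("padding must be at least 16 bytes")
    header = struct.pack("!BH", msg_type, len(payload))
    return header + payload + os.urandom(padding_len)


def parse_heartbeat(message):
    """Return (type, payload), or None if the message should be discarded."""
    if len(message) < HEADER_LEN + MIN_PADDING:
        return None
    msg_type, claimed_len = struct.unpack("!BH", message[:HEADER_LEN])
    # The length field is only a claim: it can say up to 65535 no matter
    # how many bytes actually arrived, so check it against the real size.
    if HEADER_LEN + claimed_len + MIN_PADDING > len(message):
        return None
    return msg_type, message[HEADER_LEN:HEADER_LEN + claimed_len]


msg = build_heartbeat(REQUEST, struct.pack("!I", 7))  # 32-bit sequence number
assert len(msg) == 3 + 4 + 16
assert parse_heartbeat(msg) == (REQUEST, struct.pack("!I", 7))

# A header claiming 65535 payload bytes on a 23-byte message is dropped.
forged = struct.pack("!BH", REQUEST, 0xFFFF) + b"x" * 20
assert parse_heartbeat(forged) is None
```

The point of the second check is that a 16-bit field lets the sender claim far more than it sent, so the receiver has to compare the claim with the bytes it really has before echoing anything.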
In the unlikely case that some implementation uses 13 bytes of payload with 16 bytes of padding, and the block cipher uses 128-bit blocks (16 bytes), the 3-byte header plus the payload fill exactly one block and the padding fills the next. This whole security measure is then voided, since the block cipher simply encrypts blocks of 128 bits in series.
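The alignment arithmetic can be checked with a few lines. This is a simplified sketch: it only looks at where the header, payload and padding bytes fall relative to 16-byte block boundaries, and it ignores everything else the record layer adds (MAC, IV, cipher mode). The function name is hypothetical.

```python
BLOCK = 16   # block size in bytes for a 128-bit block cipher
HEADER = 3   # 1-byte type + 2-byte payload length


def block_contents(payload_len, padding_len=16, block=BLOCK):
    """List, per cipher block, which kinds of bytes end up inside it."""
    labels = (["header"] * HEADER
              + ["payload"] * payload_len
              + ["padding"] * padding_len)
    return [set(labels[i:i + block]) for i in range(0, len(labels), block)]


# 4-byte sequence number: random padding shares a block with the payload.
assert block_contents(4) == [{"header", "payload", "padding"}, {"padding"}]

# 13-byte payload: 3 + 13 = 16, so the first block holds no padding at all.
assert block_contents(13) == [{"header", "payload"}, {"padding"}]
```

With a 13-byte payload the predictable bytes and the random bytes land in separate blocks, which is the case described above.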
All in all it doesn't sound like it's very well thought out.