For actually understanding the RFCs, I've found it useful to crack open Wireshark, look at an actual TLS connection, and cross-reference the RFC to figure out what's going on. It makes everything more concrete, in my opinion.
Most importantly, TLS 1.3 _always_ starts with the two parties setting up a secure encrypted connection, and only then is any effort expended on figuring out who anybody is, whereas in earlier versions these steps happened somewhat simultaneously.
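A toy sketch of that ordering (plain finite-field DH with tiny made-up parameters, nothing like real TLS crypto): the traffic keys fall out of the unauthenticated key exchange, and identity is only checked afterwards, inside the encrypted channel.

```python
import hashlib

# Toy finite-field Diffie-Hellman. Tiny, made-up parameters just to show
# the message ordering -- NOT usable for real security.
P, G = 0xFFFFFFFB, 5

# 1. Key exchange first: ClientHello/ServerHello carry the key shares.
a_priv = 123456
b_priv = 654321
a_pub = pow(G, a_priv, P)
b_pub = pow(G, b_priv, P)

# 2. Both sides derive the same handshake secret. Nobody has been
#    authenticated yet; this step succeeds with ANY peer, including a MITM.
a_secret = pow(b_pub, a_priv, P)
b_secret = pow(a_pub, b_priv, P)
assert a_secret == b_secret

handshake_key = hashlib.sha256(str(a_secret).encode()).digest()

# 3. Only now, under handshake_key, does the server send Certificate and
#    CertificateVerify -- in TLS 1.3 those are encrypted; in TLS 1.2 and
#    earlier they went over the wire in the clear.
print("handshake key established before any authentication:",
      handshake_key.hex()[:16], "...")
```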
A TLS 1.0 guide was pretty helpful for understanding TLS 1.2, but a TLS 1.2 guide is probably just misleading for TLS 1.3.
I wonder what the motivation behind that was --- I'm no cryptographer, but setting up what is effectively an anonymous (EC)DH session first seems to provide no extra protection against an active MITM, because it's unauthenticated. The only other protocol I've seen do this is obfuscated BitTorrent, which used a deliberately short key length and RC4; the goal there was not protection from MITM but to resist traffic analysis and blocking by making essentially all of the traffic look completely random. Meanwhile, TLS 1.3 still retains the plaintext record headers and framing from previous versions.
(One thing that I've always wondered about and never gotten a good answer to is that, as far back as SSL 2.0, and presumably 1.0, there seemed to be no attempt to make the whole protocol encrypted; instead, the messages were very identifiable. Why not? One would think that a protocol designed to conceal data should itself be hard to distinguish from random noise.)
While the Internet is supposed to be end-to-end, there's no shortage of intermediate systems that pretend to understand the upper-layer protocols being carried, and that act on what they think they see. Every single change to the visible parts of a protocol, no matter how small, risks breaking when one of these observers is present. TLS has been hit particularly hard by that; TLS 1.3 was delayed for many months while the protocol designers tried to figure out how to change the protocol without it being rejected by these interlopers. In the end, the solution was to make all TLS 1.3 connections pretend to be TLS 1.2 session resumption connections, and hide the real protocol negotiation within the encrypted session.
> setting up what is effectively an anonymous (EC)DH session first seems to provide no extra protection from an active MITM because it's unauthenticated
It's actually authenticated with the server certificate (and client certificate if one is being used); the authentication covers the whole protocol negotiation.
Since with an active MITM the intermediate box has to be one of the ends of each half of the connection (otherwise the authentication will fail, since the ECDH keys won't match), the protocol negotiation protects against misunderstandings: if the MITM box does not understand TLS 1.3 (or any future version), it will negotiate TLS 1.2 instead of getting confused by all the changes to the protocol in TLS 1.3.
> as far back as SSL 2.0, and presumably 1.0, there seemed to be no attempt to make the whole protocol encrypted
The goal back then was to protect the data, not the metadata. Only recently have the bad experiences with protocol ossification caused by broken middleboxes led protocol designers to also protect as much as possible of the metadata.
So first let's explicitly point out that it stops passive snooping.
Even in the presence of a MITM, it provides protection. An attacker can get access to a specific handshake, but the client will know and can kill the connection and alert the user. It can't be done behind the scenes on every connection.
It makes the protocol easier to reason about coherently, which in TLS 1.3 was a priority for the first time. The cryptographers reasoned about an abstract protocol and produced various types of proof about the properties this protocol has, and then the engineers in the TLS WG figured out which bits to send on the wire to communicate that protocol in a way that doesn't blow up too many middleboxes.
We can consider two separate possibilities:
1. The usual case, the two parties were actually Alice and Bob. Bob (and optionally Alice) now bind proof of identity to the secure channel using CertificateVerify messages, which is possible because they are, in fact, Alice and Bob, and we have an authenticated secure channel between Alice and Bob.
2. The MITM case. Alice is talking to Mallory. Mallory may also be talking to Bob. Mallory cannot show Bob's binding to the Mallory-Bob channel to Alice as proof she is Bob. It's bound to the wrong channel. She can't make her own proof, that's the whole point of the public key cryptography used for the proofs. So Mallory can't prove to Alice that she is Bob, and so Alice won't send her any messages for Bob.
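A toy model of why Mallory can't reuse Bob's binding (textbook RSA with tiny primes and no padding, purely illustrative; the transcript-hash values are made-up stand-ins): the CertificateVerify-style signature covers the transcript of *that* channel, key shares included, so it only verifies against the channel it was made on.

```python
# Toy textbook RSA signature (tiny primes, no padding -- illustration only).
p, q, e = 10007, 10009, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))   # Bob's private exponent; only Bob has it

def sign(transcript_hash):          # roughly what CertificateVerify does
    return pow(transcript_hash, d, n)

def verify(transcript_hash, sig):   # anyone can check with Bob's public key
    return pow(sig, e, n) == transcript_hash

# Made-up stand-ins for the transcript hashes of the two channels.  They
# differ because each channel has its own ClientHello/ServerHello key shares.
mallory_bob_transcript = 4242    # hash of the Mallory<->Bob handshake
alice_mallory_transcript = 9999  # hash of the Alice<->Mallory handshake

sig = sign(mallory_bob_transcript)   # Bob binds himself to HIS channel

assert verify(mallory_bob_transcript, sig)          # valid on that channel
assert not verify(alice_mallory_transcript, sig)    # replaying it to Alice fails
```

The point is the last line: Bob's proof is welded to the key exchange it happened inside, so Mallory holding a perfectly valid signature from Bob gains nothing on the Alice side.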
But also yes a subsidiary rationale is the one a sister message explains, that encrypting things stops a middlebox from tampering with them. Consistently over its lifetime _everything_ in the TLS protocol that a middlebox can read, even if it's explicitly not safe to meddle with it, gets meddled with. This rationale has driven QUIC work more than TLS 1.3, the QUIC WG even argued about whether to have a single unencrypted bit (the spin bit) in their protocol because they concluded middleboxes would undoubtedly tamper with it even though that's hauntingly stupid.
The long-term lesson has been that middlebox vendors (usually ostensibly "security" vendors) don't know anything about cryptography, or security, or networks, or possibly even about basic computer science ideas like counting. Their customers are often convinced that these idiots are protecting them, and the most we can safely do about that is try to discourage them from "protecting" anybody else.
Actual, real products your CIO can insist you spend money on to "protect" your network do things like run regular expression parsers over the opaque bytes in the X.509 certificate to look for malware, or lazily copy the random bytes from an attacker's packet into the "random" bytes of their own packet, and are then surprised that all the cryptographic security doesn't work ...
If someone knows of a high-level document about 1.3 that goes through the handshake process and differences with 1.2, that would be a great reference to have!
The handshake process and the differences are covered completely.
Arguably server_name isn't part of making sure who is who - the protocol doesn't care what you put in there, it's just that you can reasonably expect that if you say you wanted to talk to Charlie then you shouldn't be surprised if you get Charlie instead of Bob.
Eventually, with eSNI, the server_name will only be a bluff: since it's sent in plaintext, and we'd rather not tell eavesdroppers who we're talking to, it will just carry some generic masking name, e.g. it might say some.cloudflare-server.example, and then an encrypted record would reveal which actual Cloudflare server you wanted only to Cloudflare, who are answering the connection.
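A toy sketch of that split (toy DH plus a hash-derived XOR keystream, nothing like the real construction; all names hypothetical): the outer name travels in the clear, and only the holder of the fronting provider's private key can recover the inner one.

```python
import hashlib

# Toy sketch of the encrypted-SNI idea.  Toy DH group and XOR keystream
# only -- the real mechanism is different; this just shows the data flow.
P, G = 0xFFFFFFFB, 5

provider_priv = 987654                    # held by the fronting provider
provider_pub = pow(G, provider_priv, P)   # published ahead of time

def xor_stream(key_int, data):
    stream = hashlib.sha256(str(key_int).encode()).digest()
    return bytes(b ^ s for b, s in zip(data, stream))

# Client side: the outer SNI is a generic bluff; the real name is
# encrypted to the provider's public key.
client_priv = 13579
client_keyshare = pow(G, client_priv, P)
outer_sni = b"some.cloudflare-server.example"     # what eavesdroppers see
shared = pow(provider_pub, client_priv, P)
encrypted_inner = xor_stream(shared, b"real-site.example")

# Provider side: recovers the real name.  An on-path observer only ever
# sees outer_sni plus opaque bytes.
shared2 = pow(client_keyshare, provider_priv, P)
assert xor_stream(shared2, encrypted_inner) == b"real-site.example"
```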
I mean, they take significant effort to devise these standards and write them up, presumably with modern tools, AND THEN force weird typewriter CSS upon them? It looks awful. I can't understand it.
Plain ASCII text is a very compatible format, and probably will be for a very long time. For example, if you were to pipe this unchanged to a TCP socket on port 9100 of a modern printer, it'll likely print out roughly correctly.
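That port-9100 "raw" printing path really is just a TCP socket that feeds whatever bytes arrive to the print engine. A minimal sketch (the printer hostname is hypothetical; substitute your own):

```python
import socket

def raw_print(host, text, port=9100):
    """Send plain text straight to a printer's raw (JetDirect-style) port.

    Port 9100 typically does no protocol at all: bytes in, pages out.
    """
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(text.encode("ascii"))

# Hypothetical printer address -- substitute your own:
# raw_print("printer.local", open("rfc8446.txt").read())
```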
If there's a zombie apocalypse and we end up raiding the computer history museum for 80-column line printers, sure, it makes sense to select the ASCII version to make printouts so that we could rebuild the internet.
But for the HTML version? Come on! Freshen it up a bit.
Imagine you've lost most of web technology after a nuclear apocalypse or something and have to recreate it from specs. You have no software guaranteed to render HTML correctly and completely. (We don't have it even now BTW.) It's like bootstrapping a compiler toolchain on an entirely new architecture. You have to start with assembly, and it looks awful.
CloudFlare takes a strong position on freedom, and I admire them for it: