The privacy of the TLS 1.3 protocol [pdf]

brians · on July 3, 2019

Oof. This is an important contribution about the handshake and resumption protocols, but the abstract is badly misquotable. I worry this will lead to problems as it’s reported elsewhere.

TLS 1.3 doesn’t encrypt the SNI, doesn’t encrypt the destination IP address, and doesn’t mask the size or ordering of packets. In practice, TLS 1.3 protects secrecy of the bits you send—but not privacy of to whom or how much.

As I wrote five years ago when TLS 1.3 was getting started, https://weblog.evenmere.org/posts/2014-05-16-tls-is-not-for-... , the privacy folks have needs misaligned with the prevalent technology.

3xblah · on July 4, 2019

When I saw the title of the paper I expected it would be about what the parent comment describes. Instead, they deliberately omit the case of SNI and state that this issue "deserves its own paper".

I think any serious consideration of the issue cannot proceed on the assumption that the SNI extension is a "must-have". It is optional. Users can prefer websites that do not use SNI. There are still plenty of TLS-protected websites not using CDNs and SNI.

CDNs often do not check the Host header against the SNI name.

For example, request a page from www.globalsign.com from host marketcircle.com.

   echo -e "GET /en/ HTTP/1.1\r\nHost: www.globalsign.com\r\nConnection: close\r\n\r\n"|openssl s_client -no_tls1 -no_tls1_1 -no_ssl2 -no_ssl3 -ign_eof -no_ticket -host marketcircle.com -port 443 -tlsextdebug -servername marketcircle.com -verify 9

brians · on July 4, 2019

This is a tricky behavior to turn off. It’s rarely clear how to construct an attack with it, and it does allow a kind of privacy: you tell your ISP and DNS providers one name while asking for another.

nmjohn · on July 3, 2019

> doesn’t encrypt the destination IP address

How would that work?

brians · on July 3, 2019

The simplest system I know to mask destination IP is an anonymizing mixnet. Tor is a good approximation: delegate your privacy to a system design. A CDN, or what Google’s doing with signed exchanges, is a bad approximation: delegate your privacy to an entity. For certain very specific goals, Psiphon is an intermediate solution.

Alternatively, “it doesn’t work.” As Colm says elsewhere, the placement of TLS in the stack isn’t compatible with privacy goals. There’s a big project to nudge the current privacy-ignoring design of the Internet into a privacy-respecting design, while continuously maintaining compatibility and providing performance incentives to bring the mega corps along.

The engineers pushing that are supremely skilled and passionately motivated. If anyone can do it, they can. I am not optimistic, mostly because I expect the mass of less-skilled but equally-passionate consumers of that technology to latch on to intermediate solutions—like the recent Firefox-CF DoH experiment—and lock them in.

londons_explore · on July 3, 2019

The test I like to use is "If I go into my bedroom and browse porn online, I expect nobody to know what I'm up to".

Today, pornhub.com knows I visited and what I clicked on. My ISP knows I went to pornhub. Even people on my local network know I went to PornHub. Through packet size analysis, all those parties even know what I typed in the search box at PornHub (each possible keystroke results in a different number of bytes of autocomplete Ajax).

TLS failed to achieve any of the privacy the user expected. The infosec community has missed the big picture...

jrockway · on July 4, 2019

I don't think those were ever the goals of TLS, they are just some random things you want or think people want.

What TLS does is:

1) Prevent your local network or ISP from tricking you into visiting a fake pornhub. If you type that in and it loads over TLS, you know you're at the real one.

2) Prevent your local network or ISP from reading any of the data you exchange with pornhub.

This, of course, provides quite a bit of value. It's not everything you want, but it is a big win for many users.

Dylan16807 · on July 4, 2019

It does the first one. The second one doesn't do so well against packet size analysis.

jrockway · on July 4, 2019

Yeah, but that's a pretty obscure attack. All bank customers have the same length account number. All the videos on YouTube have the same length identifier and usually last about 10 minutes and 1 second. Some information is leaked through packet size and timing. But a lot isn't.

I am also guessing that 1 capture doesn't give you a very good signal to noise ratio. Sure, if you capture someone's keyboard-interactive ssh exchange 100 times a day for 6 months, you have a lot of data. For your porn searches or whatever, I doubt the sample size is enough to leak a ton of information. And it's certainly better than just letting the person capturing packets read them and see what's in there.

saurik · on July 4, 2019

You still don't understand. While the average time length of a YouTube video might be 10m1s, the time length of a specific video isn't. Worse, the file sizes are all quite different even if the time length is the same due to video compression. The way these sites work is you download segments of the video of fixed time length (such as two seconds), roughly in order, and so it becomes trivial to fingerprint which video you are watching by what sequence of specific file sizes someone downloads.
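A minimal sketch of the segment-size fingerprinting described above. The catalogue, the sizes, and the 2,000-byte tolerance are all made up for illustration; a real observer would first have to crawl segment sizes for each video and account for TLS record overhead.

```python
from typing import Dict, List, Optional

# Hypothetical catalogue: video title -> byte sizes of consecutive fixed-duration segments.
CATALOGUE: Dict[str, List[int]] = {
    "video_a": [412_300, 398_120, 455_870, 301_440, 377_910],
    "video_b": [250_010, 610_450, 244_980, 587_300, 402_150],
    "video_c": [412_300, 401_000, 330_560, 298_700, 512_640],
}


def matches_at(observed: List[int], reference: List[int], offset: int, tolerance: int) -> bool:
    """True if every observed size is within `tolerance` bytes of the reference at this offset."""
    return all(abs(size - reference[offset + i]) <= tolerance for i, size in enumerate(observed))


def identify(observed: List[int], tolerance: int = 2_000) -> Optional[str]:
    """Return the title if exactly one catalogue entry contains the observed run, else None."""
    hits = []
    for title, sizes in CATALOGUE.items():
        # Slide the observed run over the reference, since viewing may start mid-video.
        for offset in range(len(sizes) - len(observed) + 1):
            if matches_at(observed, sizes, offset, tolerance):
                hits.append(title)
                break
    return hits[0] if len(hits) == 1 else None
```

A single segment can be ambiguous (here `video_a` and `video_c` share a first segment size), but a run of a few consecutive sizes narrows it down quickly.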

This is absolutely not an "obscure" attack: it has been used, successfully, to build a tool to guess what city you are looking at with Google Maps. There, tiles are downloaded to render a map (similar to the segments used to render a video on YouTube). While each tile may have a fixed dimension, again, due to image compression, each tile has a different file size. Since tiles are downloaded in clusters, you can pretty readily fingerprint different regions of the map a user is looking at (and if it weren't for local caching, it would work almost instantly 100% of the time, live).

With type-ahead searching, as each character you type returns a different list, given the file size response sequence you should be able to determine exactly what search query someone made with almost total accuracy almost all of the time when they type it, with the only reason it ever not working well being due to caches and delayed queries (if you type fast enough on many type-ahead query boxes it will avoid sending queries for the prefixes, to reduce load for something that by the time the result is fetched won't even render).
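To make the type-ahead case concrete, here is a toy model. The corpus, the JSON response shape, and the assumption that the observer can subtract a constant TLS overhead to recover exact response sizes are all hypothetical; the point is only that a sequence of sizes, one per keystroke, can single out a query.

```python
import json
from typing import List

# Hypothetical set of queries the search box can autocomplete.
CORPUS = ["tls handshake", "tls padding", "tor browser", "traffic analysis", "tcp fast open"]


def suggest_response(prefix: str) -> bytes:
    """Toy autocomplete endpoint: returns every corpus entry starting with the prefix."""
    matches = [q for q in CORPUS if q.startswith(prefix)]
    return json.dumps({"q": prefix, "suggestions": matches}).encode()


def size_trace(query: str) -> List[int]:
    """Response sizes an on-path observer would see as the query is typed one character at a time."""
    return [len(suggest_response(query[:i])) for i in range(1, len(query) + 1)]


def infer_queries(observed: List[int]) -> List[str]:
    """Queries whose precomputed size trace equals the observed one."""
    return [q for q in CORPUS if size_trace(q) == observed]
```

The attacker builds the traces by typing into the same search box themselves, then compares them with what they see on the wire. Caching and skipped prefixes, as noted above, would make real traces noisier than this.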

jrockway · on July 4, 2019

I'm just saying, it's outside the scope of TLS to fix this problem. Map tiles are different sizes because it makes the page load faster and costs less money to run the service and uses less data on your mobile plan. You CAN prevent this attack, but it's too expensive and nobody cares. TLS is a comfortable medium: it adds security without adding too much cost.

arkadiyt · on July 4, 2019

> Yeah, but that's a pretty obscure attack.

I'm certain nation states work on developing these attacks. Also a commercial company just has to build it once, then they can sell it to ISPs who make money selling your data.

> All bank customers have the same length account number.

This isn't the only problem with packet size data. There have also been compression attacks, like CRIME and BREACH, that can recover plaintext using packet sizes.

> All the videos on YouTube have the same length identifier and usually last about 10 minutes and 1 second

Here's a research paper from 2017 where someone in a MITM position could detect exactly what netflix video you were watching by looking at packet sizes: https://mjkranch.com/docs/pubs/CODASPY17_Kranch_Reed_Identif...

Of course TLS still provides a critical level of security and privacy, but there is room for improvement.
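For readers unfamiliar with the compression attacks mentioned above, this is a rough sketch of the underlying length oracle, not a reproduction of CRIME or BREACH themselves. The cookie value, the request layout, and the candidate list are invented; the only thing it relies on is that `zlib` produces shorter output when attacker-supplied text repeats a secret in the same compressed stream.

```python
import zlib
from typing import Iterable

SECRET_COOKIE = "sessionid=7f3a9c"  # hypothetical secret the attacker cannot read directly


def observed_length(attacker_input: str) -> int:
    """Size of the compressed message; encryption would hide the bytes but not this length."""
    body = f"Cookie: {SECRET_COOKIE}\r\n\r\nq={attacker_input}"
    return len(zlib.compress(body.encode(), 9))


def best_guess(candidates: Iterable[str]) -> str:
    """Pick the candidate whose injection compresses best, i.e. repeats the most of the secret."""
    return min(candidates, key=lambda c: observed_length("sessionid=" + c))
```

Real attacks recover the secret a character at a time and have to cope with noise, so this whole-value comparison is a simplification.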

colmmacc · on July 3, 2019

Even for the bits you send, although TLS1.3 includes support for record padding, it's not mandatory. In practice, both Content-Length finger-printing attacks and traffic analysis attacks work against TLS1.3.
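As an illustration of what record padding buys when it is used: TLS 1.3 lets a sender append zero bytes after the content-type byte inside the encrypted record, and the choice of how much to pad is left to the implementation. The 512-byte bucket policy below is an assumption made up for this sketch, not something any particular TLS stack is known to do.

```python
from typing import Tuple

APPLICATION_DATA = 23  # TLS content type for application data


def padded_length(inner_len: int, bucket: int = 512) -> int:
    """Round up to the next multiple of `bucket` (hypothetical padding policy)."""
    return max(1, -(-inner_len // bucket)) * bucket


def pad_record(content: bytes, content_type: int = APPLICATION_DATA, bucket: int = 512) -> bytes:
    """Build the inner plaintext: content, then the type byte, then zero padding."""
    inner = content + bytes([content_type])
    return inner + b"\x00" * (padded_length(len(inner), bucket) - len(inner))


def unpad_record(inner: bytes) -> Tuple[bytes, int]:
    """Strip trailing zeros; the last non-zero byte is the content type."""
    stripped = inner.rstrip(b"\x00")
    return stripped[:-1], stripped[-1]
```

With this policy a 100-byte and a 300-byte response both appear as 512 bytes before encryption overhead, so the observer learns only the bucket. The cost is extra bandwidth, which is one reason padding tends to stay off.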

colmmacc · on July 3, 2019

This paper is misleading IMO. The abstract says - "On the positive side, we prove that TLS 1.3 protects the privacy of its users at least against passive adversaries, contrary to TLS 1.2, and against more powerful ones."

But if you read the summary, it also says "both TLS 1.2 and TLS 1.3 session resumption present serious privacy flaws despite not using concrete authentication elements, such as certificates ... While [PSK-DHE] provides a measure of backward security, it does nothing to improve privacy."

TLS1.3 is awesome, but it's still a layer 4 transport scheme, and there are plenty of ways that a passive adversary can derive privacy sensitive information. I mean it's /trivial/ for a passive adversary to tell that you're visiting an embarrassing website ... to pick just one obvious example.

nickserv · on July 4, 2019

Right, and it's in the name, after all: Transport Layer Security. Security is not quite the same as privacy.

TLS was never meant as a way of guaranteeing privacy in a broad sense as far as I know, and this is the first time I'm seeing it described as such.

Now, one may argue that security is a needed component of privacy, but it's certainly not enough by itself.

BuildTheRobots · on July 3, 2019

> Another feature we omit is the Server Name Indication (SNI) extension, which allows a single server to run TLS handshakes on behalf of multiple domains, using multiple public keys.

I don't understand how you can seriously use TLS and privacy in the same headline whilst actively ignoring the mess that is SNI...

dagenix · on July 4, 2019

If you connect to a website behind a CDN hosting many websites, a passive observer can tell that you connected to the CDN, but has no idea which website you requested (let's pretend that they can't use a length fingerprinting attack). However, unless the CDN supports domain fronting, which most don't, you have to use SNI to tell the CDN which website you want so you can get the right cert. As SNI is unencrypted, a passive observer now knows what website you are talking to. Privacy defeated.

If you connect to a website not behind a CDN, you probably don't have to use SNI, but, the website is revealed by doing a simple reverse DNS query. Privacy defeated.

Unencrypted SNI doesn't hurt privacy when compared to the status quo. Encrypted SNI will boost privacy. But until then, TLS is basically the best you can do for privacy, outside of using some more exotic service.

cortesoft · on July 4, 2019

Not for all CDNs. Some provide dedicated VIPs which don't require SNI.

Of course, an adversary can then figure out who you are connecting to based on the IP.

3xblah · on July 4, 2019

"If you connect to a website not behind a CDN, you probably don't have to use SNI, but, the website is revealed by doing a simple reverse DNS query."

As an example, I tried a reverse DNS query for the IP address of matrixssl.org. All I got was a subdomain at gandi.net.

Reverse DNS was originally intended for troubleshooting. It is not required for websites (cf. email) and not everyone bothers to set it up. That is one group of websites where we have to do more work to get the names that are using the remote IP address and figure out which one the user asked for.

In fact, DNS is not required for a functional website. IP address of course works fine. That is another group of websites where we have to do additional work to figure out what is at the remote IP address. We do not know what the user sent in her HTTP headers.

By comparison, SNI makes the process of invading user privacy easy and reliable, less work. The user is required to send a name, and to send that name in the clear.

gruez · on July 4, 2019

>As an example, I tried a reverse DNS query for the IP address of matrixssl.org. All I got was a subdomain at gandi.net.

While the gp is wrong in saying that it's as simple as a reverse DNS lookup, his general idea is correct. It's trivial to crawl the internet to find all the ip -> domain mappings for all public domains. It's even easier if the attacker is an ISP because they can log DNS queries/responses.
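A sketch of the IP-to-domain mapping idea, assuming the observer already has a log of (name, address) pairs from DNS responses or from crawling. The names and the documentation-range addresses are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_reverse_index(dns_log: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Invert logged (domain, ip) answers into ip -> set of domains seen at that address."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, ip in dns_log:
        index[ip].add(name)
    return index


def candidates_for(index: Dict[str, Set[str]], dest_ip: str) -> List[str]:
    """Domains the observer would consider for a connection to dest_ip."""
    return sorted(index.get(dest_ip, ()))


# Hypothetical log entries.
LOG = [
    ("blog.example", "192.0.2.10"),
    ("shop.example", "192.0.2.20"),
    ("forum.example", "192.0.2.20"),
]
```

A dedicated address yields exactly one candidate, while a shared address leaves the observer with a set to choose from. That remaining ambiguity is what a cleartext SNI removes.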

3xblah · on July 4, 2019

If the parent comment is suggesting it is no easier with SNI, then I disagree. Not all domains are "public" and not all users send their DNS queries to ISPs or third party DNS providers. There are alternative sources for IP to domain mappings for TLS-enabled websites besides reverse DNS, so crawling the entire internet is not necessary, but sniffing SNI is easier and more reliable than relying on DNS.

nfoz · on July 4, 2019

Could you elaborate? What's the problem with SNI? (I haven't dived deep into these protocols)

mschuster91 · on July 4, 2019

SNI exposes the target domain to everyone with sniffing capabilities - including everyone on your private/corp network as well as all involved ISPs.
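To show how little work the sniffing takes, here is a sketch that generates a ClientHello in memory with Python's `ssl` module (no network involved) and pulls the server name out of the raw bytes. The hostname is a placeholder, and the parser assumes the ClientHello fits in a single TLS record and carries an ordinary, unencrypted server_name extension.

```python
import ssl
import struct
from typing import Optional


def make_client_hello(server_name: str) -> bytes:
    """Start a handshake against in-memory buffers and return the first bytes the client emits."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=server_name)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the client is now waiting for a ServerHello that never comes
    return outgoing.read()


def extract_sni(record: bytes) -> Optional[str]:
    """Return the hostname from a ClientHello record, as a passive observer could."""
    if len(record) < 9 or record[0] != 0x16 or record[5] != 0x01:
        return None  # not a handshake record carrying a ClientHello
    pos = 5 + 4                      # record header + handshake header
    pos += 2 + 32                    # legacy_version + random
    pos += 1 + record[pos]           # session_id
    (suites_len,) = struct.unpack_from("!H", record, pos)
    pos += 2 + suites_len            # cipher_suites
    pos += 1 + record[pos]           # compression_methods
    (ext_total,) = struct.unpack_from("!H", record, pos)
    pos += 2
    end = pos + ext_total
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", record, pos)
        pos += 4
        if ext_type == 0:            # server_name: list_len(2), name_type(1), name_len(2), name
            (name_len,) = struct.unpack_from("!H", record, pos + 3)
            return record[pos + 5 : pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None
```

Calling `extract_sni(make_client_hello("embarrassing.example"))` returns the hostname without any key material, which is the whole problem.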

gruez · on July 4, 2019

That's a non-issue because the target domain is in the certificate that the server sends back. This happens with or without SNI.

toast0 · on July 4, 2019

In TLS 1.3, the certificate is now sent encrypted with an ephemeral key. A given IP can serve different certificates depending on SNI, so if SNI can become unsniffable, determining the certificate based on observing traffic to the server as well as generating traffic would be much harder for shared IPs anyway.

jajaioxjeyo · on July 4, 2019