"So, while it is technically still possible to censor HTTP/3 connections based on the SNI in TLS, the collected data shows that hardly any censors actually parse and use this information (even when they do parse the SNI in traditional HTTPS traffic)."
What about the privacy issues raised by SNI?^1 Why have the "tech" company sponsors of QUIC failed to address this problem? Instead, they actually wanted to _require_ SNI.^2 There is zero attention to user privacy in the development of QUIC. The protocol increases the surveillance and reconnaissance capabilities of the "tech" company-sponsored web browser, and user privacy against "tech" companies is completely ignored. Unlike QUIC, at least the TLS project is working on a solution to the SNI problem. Cloudflare offers one of the early solution prototypes, ESNI. I have been using it for several years now with no problems.
In CurveCP, a predecessor of QUIC (and possibly the inspiration for QUIC), SNI is not required. Multiple sites can be securely hosted on the same address.^3 Both TLS1.3 and thereby QUIC use encryption developed by the author of CurveCP. The thinking of the companies behind QUIC regarding available technology seems to be: if it benefits "tech" company strategy, use it; if it only benefits users but not "tech" companies, ignore it.
1. TLS worked before SNI was introduced, and the web worked before AWS and their ilk existed. There are still many, many, many websites that use TLS but do not require SNI. If one only sees the web as a handful of CDNs, then it is easy to ignore these other TLS sites. However, they exist, regardless of whether CDN devotees choose to acknowledge them. And no, trying to perform reverse DNS lookups on IP addresses in real time is no substitute for grabbing cleartext domain names from SNI. It requires more effort and does not work consistently.
2. https://github.com/quicwg/base-drafts/issues/794 (interesting coincidence: the group member arguing for the SNI mandate works for a CDN)
3. https://curvecp.org/addressing.html
> TLS worked before SNI was introduced, and the web worked before AWS and their ilk existed. There are still many, many, many websites that use TLS but do not require SNI. If one only sees the web as a handful of CDNs, then it is easy to ignore these other TLS sites. However, they exist, regardless of whether CDN devotees choose to acknowledge them.
CDNs are not the only use case for virtual hosts on the web.
Admittedly, subjectAltName does help with that compared to the pre-SNI web.
In any case, if SNI is not needed, then the IP address is necessarily unique, in which case no privacy is lost by sending the SNI: nefarious people can just look at the IP, so it doesn't really matter.
"We also accounted for several resources that get loaded from different web servers due to the sub-queries performed when a website is requested. The set of all these IPs contacted is referred to as the Page Load Fingerprint (PLF) of the website."
This method is restricted to clients that automatically load resources and/or run Javascript, e.g., so-called "modern" browsers. It will not work with non-browser clients.
For example, I make HTTP requests using TCP clients and HTTP generators. There is no automatic loading of resources nor Javascript support. Another example is CCBot, the Common Crawl's Nutch crawler. It does not request CSS, images, Javascript, etc.
Some researchers claim that sites can be fingerprinted by HTTP response size. People claim this is mitigated by HTTP/2 multiplexing and pipelining.^1 Because I use TCP clients, not a browser, to make HTTP requests, I can use HTTP/1.1 pipelining. Thus HTTP response size is not reliably predictive of any specific site; I can easily vary the size using pipelining.
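For anyone curious, a minimal sketch of the idea (assuming the server still honours HTTP/1.1 pipelining; example.com and the paths are placeholders):

# two pipelined requests on one TLS connection; an observer sees a single
# combined response, so the size is not predictive of either page alone
printf 'GET /a HTTP/1.1\r\nHost: example.com\r\n\r\nGET /b HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n' \
| openssl s_client -quiet -connect example.com:443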
So there is not a lot to be done about the ~50% of sites where the IP is sufficient to identify the domain.
For the others, where they suggest a heuristic based on a PLF, it seems something like uMatrix (or a proxy-based equivalent) could reduce or change the fingerprint, potentially at the cost of the page looking different (or ugly).
I happen to use uMatrix simply to reduce b/w and make better use of my display (i.e. use the full width, not a silly narrow column), and most of the web is still usable.
However such techniques would be beyond the likes of my parents and probably also my siblings.
I guess to mitigate this, you need something like a VPN, Apple Private Relay or the Tor network. Those don't solve the problem of a global passive adversary though.
No one is using this theoretical method to support surveillance capitalism, let alone censorship.^1 However, many entities are using SNI for such purposes. I am not going to send SNI when it is not required, and I am not going to send cleartext SNI if I can use ESNI. Why make surveillance and censorship easier?
1. "The real-world inference will be slightly different from our closed-world assumption because a wider dataset will be available to the adversary. It can happen that a PLF signature that might seem unique in our study can actually belong to two different websites; it's optimistic but we have identified IP addresses that have mappings to unique domains and these can potentially be used to uniquely profile websites."
Sniffing SNI requires no such inferences or assumptions. It works reliably in the real world.
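For anyone who doubts how little work this is, a sketch (the interface name and the tls.handshake field assume a reasonably recent tshark build):

# passively harvest cleartext SNI values, one per TLS ClientHello
tshark -i eth0 -Y 'tls.handshake.extensions_server_name' \
  -T fields -e ip.dst -e tls.handshake.extensions_server_name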
Generally, you don't. With CDNs and other hosting providers, sometimes one can use heuristics. For example, all AWS sites require SNI and do not support TLS1.3. As such, before connecting, I can match the IP against a list of AWS IP ranges. If the site is using AWS then the page is retrieved from archive.org instead of Amazon. Internet Archive does not require SNI and supports TLS1.3.
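A rough sketch of that check (assuming jq and grepcidr are available; ip-ranges.json is the list Amazon publishes; the IP shown is only an example):

# build the AWS CIDR list, then test a candidate IP against it
curl -s https://ip-ranges.amazonaws.com/ip-ranges.json \
  | jq -r '.prefixes[].ip_prefix' > aws-ranges.txt
echo "52.94.236.248" | grepcidr -f aws-ranges.txt \
  && echo "AWS: rewrite the request to archive.org instead"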
How do "tech" company-sponsored browsers tell if a site requires SNI before connecting. They don't. What so-called "modern" browsers do is they _assume_ all sites require SNI. That solution favours "tech" companies over user privacy. Same story with QUIC.
I take a different approach. I assume no SNI is required and I train the proxy to recognise the sites I have accessed. I assume TLS1.3 will be supported. No user-agent header. And so on. This is why I know there are many, many TLS sites that do not require SNI. As I learn about sites that require SNI, or do not support TLS1.3, or have some other requirement, they are added to lists. These lists are "map" files that are loaded into the proxy's memory.
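For concreteness, a hypothetical fragment of such a map file (the layout and field names here are only illustrative, not the actual format):

# domain            sni    tls     note
example-cdn.net     yes    1.2     # requires SNI, no TLS1.3 support
example.org         no     1.3     # default-cert host, full TLS1.3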
Considering the two approaches, each involves a "guess" as to whether SNI is required before connecting. The first approach, i.e., assuming all sites require SNI, is, for me, wrong most of the time. I am not only accessing sites that require SNI. The second approach, i.e., "machine learning", uses stored information to inform the decision of whether to send SNI, use TLS1.3, etc. The second approach is, for me, right most of the time. And on a return visit to a site, the proxy is almost never wrong (unless perhaps the site has changed hosting providers); the proxy "knows" whether the site requires SNI.
The IP address "185.199.108.153" is seen while sniffing the network.
What is the domain name?
This site does not require SNI. Visiting the address and/or examining the certificate will not reveal the domain name.
Consider that sniffing the SNI from a TLS packet requires no guesswork and no additional lookups. It is designed to be high-throughput and to be used in making routing decisions. The question is not whether solving this hypothetical is possible. Of course the answer is, "Yes, it is". Rather, the question is whether solving it requires no more work than sniffing the SNI from a TLS packet. To use bawolff's words, "... people can just look at IP, so it doesn't really matter."
Also, remember, the comment I made is about privacy from "tech" companies, resistance to surveillance capitalism by American companies, not censorship in China.
> The IP address "185.199.108.153" is seen while sniffing the network.
It's a GitHub page. I mean, you're right: in the case where a domain is using a wildcard cert and the subdomain is the sensitive part, then SNI is the critical leak (assuming the adversary is not sniffing your DNS or poisoning your DNS, or you are using DoH).
It's a minority case; most sites do not fit into this bucket, but it is a case where you are right.
One cannot get that domain name by looking at the certificate. The Censys scan for the IP address fails to list it. Passive DNS sources fail to list it. And the name in the reverse DNS (PTR RR) is cdn-185-199-108-153.github.com.
echo -e "GET / HTTP/1.1\r\nHost: about.censys.io\r\nConnection: close\r\n\r\n" \
|openssl s_client -connect 185.199.108.153:443 -tls1_3 > 1.htm
firefox ./1.htm # or whatever the preferred browser. I use a text-only one
I do not use recursive DNS when making HTTP requests for web pages. I fetch DNS data in bulk at selected intervals and store it. The IP address for about.censys.io is loaded into the memory of a localhost forward proxy. The only DNS request before the HTTP request for the about.censys.io page is over the loopback to an authoritative DNS server. It returns the localhost address of the proxy.
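A minimal sketch of that loopback arrangement (using dnsmasq syntax purely for illustration; the actual DNS server used is not specified above):

# answer authoritatively for this name with the address of the local proxy
echo 'address=/about.censys.io/127.0.0.1' | sudo tee /etc/dnsmasq.d/proxy.conf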
Web browsing history, or at least a list of domains visited, is something that not only a "nefarious" person or "adversary" would find useful. The data has commercial value to companies that are actively trying to destroy user privacy. For example, SNI hands ISPs and intermediary "tech" companies this data on a silver platter. People endlessly debated encrypting DNS. Meanwhile, the same people are still sending domain names in the clear, thanks to SNI. The companies behind QUIC do not see that as a problem.
I do not agree that "It's a minority case". For example, if we take all the sites submitted to HN, the sites using CDNs will of course present an ambiguity problem on the scale of the GitHub case, and the ones not using CDNs, IME, do not normally work without hostnames, nor do they consistently reveal the correct hostname in the certificate. The certificate will often list several names. As such, I cannot take a list of IP addresses for all those sites and quickly, reliably transform them into the domains submitted to HN.
The convenience of SNI for conducting surveillance of domain names visited is not paralleled by trying to convert IP addresses to domain names.
Which appears to be implemented as a GitHub page, unless I am mistaken.
> One cannot get that domain name by looking at the certificate.
Really? Because this is the certificate I get when I view that domain (with correct SNI):
https://crt.sh/?id=6682474444
It's right there in the common name field.
Unless you mean that it's not in the default cert for that IP, which, sure. But in the scenario where we are eavesdropping on SNI, if we couldn't do that, why not just eavesdrop on the certificate itself? It should be just as easy.
I'm not sure what your point is here. If this was a foo.github.io page and GitHub had a wildcard cert on *.github.io, it would support your position. But about.censys.io is not doing that. This is an example where you need SNI to choose the correct certificate, as it's being hosted on a CDN hosting many different sites.
> I fetch DNS data in bulk at selected intervals and store it
That's cool and all, but that's not how normal people use DNS, nor is it likely that it will become common, so it's useless when talking about internet privacy. If doing something totally abnormal is a valid solution, we might as well just say everyone should use Tor.
> Web browsing history or at least a list of domains visited is something that not only a "nefarious" person or "adversary" would find useful. The data has commercial value to companies that are actively trying to destroy user privacy
When people say nefarious, that is one of the groups they mean. Heck, this is almost a dictionary definition of nefarious.
> The certificate will often list several names.
It is very rare for the subjectAltName to be for unrelated sites, all hosted on the same IP, and not require SNI to select the correct certificate. So rare that I challenge you to actually find a real example.
Does it indicate the domain name the user intended to visit, about.censys.io, as SNI would?
SNI reveals more than "It's a GitHub page" or a list of sites that are "related", i.e., hosted on the same IP address.^1
SNI provides the network observer with the exact domain name that a user intended to visit. Aside from faking SNI, domain fronting, or similar tactics, the results are reliable.
IP addresses do not indicate the exact domain name that a user intended to visit. Methods used to try to guess the domain name are unreliable.
A list of remote IP addresses observed on the wire is not the same as a list of domain names observed on the wire as servernames (SNI).
The former requires more work than the latter and produces ambiguous, unreliable results.
1. A network observer could look for Subject Alternative Names in certificates passing over the wire in cleartext and try to guess which domain name the user intended to visit. However this will not work for TLS1.3 because the certificates will be encrypted.
> If bawolff knew the domain name, why did he not give that as the answer to the hypothetical. Strange.
You misunderstand me.
My point is that there are two cases: the one where the SNI reveals something useful, but you also need some sort of SNI because the server serves lots of things and needs to know which you are requesting; and the one where you don't need SNI, but it's trivial to figure out the domain regardless. Your example is an instance of the former. You can't get rid of the SNI in this case because the web server at the other end depends on it to function.
So it's a bad example for your purposes, because the example necessarily depends on the SNI existing. If your proposed solution of removing SNI was adopted, the site would not work, so removing SNI does not increase privacy in this case: the site would necessarily have to adjust how it works in a way that removes any of the privacy gains.
(You could probably say ESNI, but if so, it's already in progress, albeit slowly, so I still don't see your point.)
The site does not support TLS1.3, so the certificate retrieved above can be sniffed for SANs on the wire. However, as above, it does not list all the sites/endpoints.
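A sketch of that retrieval (assuming OpenSSL 1.1.1+ for the -noservername and -ext flags; the IP is the 173.230.137.156 example discussed below):

# fetch the default certificate (no SNI) and list its SANs; with TLS 1.2
# and earlier the same certificate bytes are visible on the wire
openssl s_client -connect 173.230.137.156:443 -noservername </dev/null 2>/dev/null \
  | openssl x509 -noout -ext subjectAltName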
If we try reverse DNS, we get 3e8.org. Passive DNS data shows 3e8.org and the subdomains www and mail.
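The PTR query is a one-liner:

# reverse DNS for the example IP; per the above, it returns 3e8.org.
dig -x 173.230.137.156 +short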
One of the other endpoints at this IP is api.call-cc.org.
It is possible to find that name by searching the IP at censys.io.
Despite hosting multiple sites, SNI is not required. Hence I do not send api.call-cc.org in plaintext over the wire.
If the user sent SNI, the network observer has no work to do. She can see the exact domain name the user is accessing.
But without SNI she has to do work. She has to figure out if the user is accessing 3e8.org, www.3e8.org, api.call-cc.org or some other site.
Of course, with some detective work, this is possible. But it is not as easy as sniffing SNI.
By not sending SNI the user is not handing over a comprehensive list of every domain name accessed to ISPs, "tech" company intermediaries or others sniffing network traffic, as she would if using a "modern" browser to send HTTP requests directly to IP addresses.
One of things I like about Cloudflare's ESNI is that it is really fast. I am looking forward to the next iteration of a solution to the SNI privacy leak.
Anyone who thinks the SNI privacy leak "does not matter", who has no issue with handing over a comprehensive list of every domain name accessed to any party sniffing the wire, is encouraged to contact the folks working on encrypting SNI and tell them to stop. :)
> Despite hosting multiple sites, SNI is not required. Hence I do not send api.call-cc.org in plaintext over the wire.
And how do you verify that a man in the middle attack is not in progress if the server is not serving the correct certificate?
I appreciate that active attacks are a bit harder than passive listening, but they are still rather trivial. You are proposing making figuring out the domain name slightly harder in exchange for allowing the entire connection to be eavesdropped.
This seems like an incredibly bad privacy trade off.
The person operating api.call-cc.org is the same person who operates 3e8.org. Using the default certificate, with the SAN 3e8.org, is fine.
For recreational web use, where I am not using a graphical browser and I am in fact sniffing the traffic myself, the tradeoff is acceptable. The probability of someone on-path sniffing every domain name sent in the clear over the wire is high, IMO. It is too easy. I would bet on it. For the sites that do not require SNI, I am using the default certificate. I am not particularly concerned about the person who controls the default certificate at that IP address being able to see the traffic for all the sites hosted at that address. The large CDNs do require SNI. Generally, the ones that do not require SNI are at IP addresses that host only a small number of other sites. The only domain names this person can observe are the ones sent to _that IP address_. The person on-path sniffing SNI can see the domain names sent to _every IP address_.
When using the web recreationally, for noncommercial purposes, I cannot see the point of going to the trouble of encrypting the non-confidential contents of web pages while expending no effort to avoid sending domain names in the clear, or to encrypt SNI where possible. To me, a comprehensive recreational browsing history _is_ worth encrypting, perhaps even more than the public web page contents. This is no different from HN users who wish to avoid "smart TVs" that log every program their owners watch.
And for anyone who believes that this evil person controlling the default certificate may be modifying the contents of web pages, then I can easily compare the contents to the same pages retrieved from Internet Archive or Common Crawl.
Perhaps it is helpful to clarify what I mean when I use the phrases "SNI is not required" and "SNI is required". The meaning of those phrases to me may not be the same as their meaning to web developers or people at large commercial CDNs who are trying to influence "upgrades" of traditional internet protocols that were originally developed by people at universities.
What I mean by "SNI is not required" is that I can send an HTTP request over TLS without SNI and succesfully retrieve the resource I specified in the HTTP method line and Host header, e.g., if I send
GET /index.html HTTP/1.1
Host: example.com
Connection: close
without SNI and I receive index.html then, to me, "SNI is not required".
Whereas if I do not receive index.html unless I also send SNI, then, to me, "SNI is required". No one should interpret this phrase to indicate I do not understand the purpose behind SNI and why it exists. Nonetheless, I think the phrase is consistently misinterpreted.
In the 173.230.137.156 example I chose, I can still retrieve /5/doc/index.html from api.call-cc.org regardless of whether I send SNI. Yes, the api.call-cc.org FQDN does have its own certificate. But I can send no SNI, I can send SNI "example.com" or I can send SNI "3e8.org", and in every case I can still retrieve /5/doc/index.html (using the certificate for 3e8.org for the public key, etc., the default certificate sent by the httpd at IP address 173.230.137.156). In this example, the operator of the httpd is not checking the SNI against the Host header.
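A sketch of that test (assumes OpenSSL 1.1.1+; the unquoted $opt is intentional, so a flag and its value split into two words):

# the body retrieved should be identical in all three cases, showing the
# httpd ignores SNI for this resource
for opt in -noservername "-servername example.com" "-servername 3e8.org"; do
  printf 'GET /5/doc/index.html HTTP/1.1\r\nHost: api.call-cc.org\r\nConnection: close\r\n\r\n' \
    | openssl s_client -quiet -connect 173.230.137.156:443 $opt
done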
Thus, to me, "SNI is not required" for this website. It does not matter whether there is one site hosted at the IP address, a handful of sites hosted or thousands of sites hosted. If I can get the media without sending SNI, then I do not send SNI. For anyone on-path who wants to know what sites internet users are visiting, it is none of their business what I have put in the Host header. That is why the Host header is encrypted. It is why TLS1.3 encrypts the certificate. And it is why ESNI/ECH encrypts the SNI as well. Not to mention it is why people _try_ to encrypt DNS. No one needs to know the sites a www user is visiting except the websites themselves.
The reader may wonder about certificate verification on the client side. For example, if a "tech" company-sponsored browser detects that the domain name specified in the Host header does not match a domain name specified in a certificate it may refuse to send the HTTP request. For many years, applications using SSL/TLS often failed to do certificate verification correctly or did not do it at all. Today, browsers sponsored by "tech" companies can do certificate verification. However those are not the only programs that can do it.
I use a localhost forward proxy to do certificate verification, not a web browser. This is because most HTTP requests I make are (a) noncommercial and (b) done with commandline utilities that do not support graphics and do not do certificate verification (nor SNI). "Noncommercial" here means free, recreational activities, and excludes things like banking, shopping and so forth. For commercial uses, I use a "modern" graphical browser.
In some ways, but not others, the phrase "SNI is required" is like the phrase "Javascript is required". I routinely make successful HTTP requests to sites whose web developers claim "Javascript is required". To initiate HTTP requests I use commandline utilities that do not support Javascript. Those requests are not over the wire; they are sent to a loopback address. Only the forward proxy sends HTTP requests and receives responses over the wire. To read HTML or consume other media types, I use a text-only browser or some other program. I have no trouble retrieving media from these sites. Nevertheless, there are HN commenters who would still argue "Javascript is required".
A phrase can have different meanings to different people.
Since we're talking practicalities, let's look at China's answer: just block the IP; we already knew what's behind it anyway. Besides, in an SNI-less world the censors would just visit the IP address, which is not significantly different from today's checking of SNI. Also, some email filtering now checks the IP address's associated domains, which includes not just the rDNS (that's easy) but also DNS datafeeds that show which domains point to that IP.
In fairness, a counterexample: at various times China has blocked zh.wikipedia.org, but doesn't want to block en.wikipedia.org, as that makes them look draconian to visiting foreigners. Both sites are on the same IP with the same wildcard cert.
Their options are basically only SNI sniffing or DNS poisoning.
> In fairness, a counterexample: at various times China has blocked zh.wikipedia.org, but doesn't want to block en.wikipedia.org, as that makes them look draconian to visiting foreigners.
While I'll accept this as a hypothetical, I've just verified this one, since I remember Wikimedia has a rather complicated structure (https://wikitech.wikimedia.org/wiki/Data_centers). I can verify that in the US you'll see a Let's Encrypt-issued wildcard certificate, while the Singaporean datacenter sends Digicert certificates that are apparently separate (except for the desktop and mobile ones; so, for example, accessing en.wikipedia sends you en.wikipedia and en.m.wikipedia), and the European datacenters (if my understanding of the current setup is right) use GlobalSign-issued certs. So they're probably dealing with censorship in a methodical manner.
> if SNI is not needed, then the IP address is necessarily unique
I think you are confused about what SNI does. SNI is not part of the request, and it does not indicate to the server which website is being requested (that is the HTTP Host header, still in use as a pseudo-header in HTTP/2 and HTTP/3).
SNI is part of the response from the server back to the client. The idea is that if an attacker somehow poisoned your DNS to point the domain of one legitimate site (for example, paypal.com) to another (let's say pastebin.org) - your browser would notice the mismatch and refuse to load the page or send any potentially sensitive data to the server.
Furthermore - there is no need for it to be sent in the clear. Cloudflare in particular has thrown their weight behind a proposal/protocol extension called ESNI (Encrypted SNI), in an effort that truly turned around my opinion of the company.
This is largely incorrect. SNI is used to tell the server which domain it needs to send a certificate for (i.e. to securely prove it owns that domain). You are right that it is separate from the HTTP host header, but in practice they are always the same because (in lieu of other information) the client wants assurance that the server owns a valid certificate for the HTTP host it is requesting to.
What makes SNI complicated to secure is that until the client has securely established the server's identity, it could be a MITM. But to prove the identity to the client, the server needs to know which proof the client needs.
Alternative solutions like "send certificates for all domains" have other issues such as performance, scalability and trivially leaking to anyone the full list of domains served.
ESNI is clever, but more complex to deploy. It's only in recent years that HTTPS has become widely used at all, and that's mainly thanks to deployment becoming largely trivial via ACME and free via Let's Encrypt.
Not in the request but as part of connection establishment (ClientHello). At that point in time the protocol (e.g. HTTP/1 or /2 or something completely different) isn't even determined, and so no [HTTP] request exists yet.
> SNI is part of the response from the server back to the client. The idea is that if an attacker somehow poisoned your DNS to point the domain of one legitimate site (for example, paypal.com) to another (let's say pastebin.org) - your browser would notice the mismatch and refuse to load the page or send any potentially sensitive data to the server.
Others have mentioned that this is incorrect; I just wanted to say I think the confusion came in because the acronym SNI (Server Name Indication) is very close to the acronym SAN (Subject Alternative Name), which is probably what you were thinking of.
Computer people and our acronyms are impossible to keep straight.
> Before a client can create a CurveCP connection, it needs to know (1) the server's long-term public key and (2) the server's address
That (1) seems wildly impractical? How would you distribute the public keys securely, privately, anonymously and generally (for the whole planet)? The only ways I can think of are something like the HSTS preload lists, which is quite centralised (per browser), or DNS. Also, it's IPv4-only, which is only kind of excusable for "last version 2011".
I don't see a way around either SNI-like mechanism or a pre-shared key for hosting multiple sites on the same IP, unless everyone moves to IPv6 (which would still be annoying for web hosting, to have an IP per domain).
Edit:
> The client and server administrators don't meet secretly to share an encryption/authentication key. The server has a long-term public key S visible to everyone; the client uses this key S to encrypt data for the server and to verify data from the server
So the wording is a bit unclear: the client needs to know the key, but the key is visible to everyone.
Sure, but then you're blocking the user's DNS server, rendering the entire internet unusable for them, not just one specific site. That was always possible with or without deep packet inspection.
Well, if you're using something like 1.1.1.1 for your DoH, you don't really need to inspect SNI; I don't think there are other domains on that IP address.
Yes, blocking stuff is the ultimate way and you can't really deal with that. You can just block any ECH (formerly ESNI) traffic as well.
Encrypting the domain is a way to prevent someone from learning things about your traffic. It's not a way to stop blocking. I'd say it's a way to provoke more blocking.
It might sound strange, but to reduce blocking you should remove encryption altogether. Usually censors apply censorship to specific web pages. So if you're browsing Wikipedia without encryption, censors can block that specific article, but the rest will be available to you. With encryption, the entire Wikipedia has to be blocked just to restrict access to a single article.
BTW, you can remove encryption but keep authentication; TLS (at least in previous versions) allowed for that. Theoretically it would allow censors to apply their censorship, but it would prevent them from changing content. I think that in the future people will think more about that concept.
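For illustration: TLS through 1.2 defined authentication-only ("NULL encryption") ciphersuites. Most builds disable them by default, but you can at least list what your OpenSSL knows about (on some builds the SECLEVEL override is needed to see them at all):

# enumerate integrity-only ciphersuites: MAC but no encryption
openssl ciphers -v 'eNULL:@SECLEVEL=0'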
So you suggest that we give up all privacy (including the privacy of all our online passwords), and lose access to every page that a government censor might take issue to, just to prevent the hypothetical problem of the government over-blocking pages that it doesn't strictly need to?
Actually I think that over-blocking would be good, as it would encourage more people to use Tor and VPNs, it would encourage more negative public opinion against the government spies/censors, and it would act as a form of economic sanctions against the regime because their citizens would get less value from the internet than they would without the over-blocking.
Seems like both HTTPS and HTTP/3 can be censored by looking at the Server Name Indication (SNI)? The report says that’s exactly what Russia is doing today. Then I’m confused by this statement
> So, while it is technically still possible to censor HTTP/3 connections based on the SNI in TLS, the collected data shows that hardly any censors actually parse and use this information (even when they do parse the SNI in traditional HTTPS traffic).
Most of these countries would use censorship software developed elsewhere. Once this software becomes capable of reading the SNI in HTTP/3 traffic, seems like we would return to the status quo?
> Most of these countries would use censorship software developed elsewhere. Once this software becomes capable of reading the SNI in HTTP/3 traffic, seems like we would return to the status quo?
Yes. Outside of China countries tend to purchase off-the-shelf middleboxes similar to those a typical US corporation or school might use for this purpose. Actual functionality is not hugely important for these products, the sales person's job is to persuade somebody who controls the purchase decision to give them money, and the actual product capabilities are moot.
The problem for such products isn't reading SNI out of an HTTP/3 stream but successfully switching off matching streams. For TLS all they did is shut down the TCP connection on a match. But there is no connection for HTTP/3 because QUIC is a connectionless protocol. So you just get a bunch of UDP packets and now it's your job to figure out which ones to drop as part of your "legitimate network security product".
So you can expect "There's an off switch for HTTP/3" plus existing per-network customisations to be the extent of their work for the foreseeable future. If the customer mentions HTTP/3 point them at the switch. Remember that (unlike China) these programmes aren't really focused on practical censorship, because the real goal was achieved once the purchase order paperwork was signed.
> But there is no connection for HTTP/3 because QUIC is a connectionless protocol.
True. Worth pointing out that RFC 9000 (QUIC) has deliberate and considerable countermeasures against network flow analysis and ossification in general. A side-effect of that is the middleboxes can't tell one QUIC datagram from another. Almost everything relevant to the middleboxes is either encrypted, obfuscated, or calculated off-band.
The masque-wg at IETF are at it, tightening the noose up further.
QUIC is connection-ful to its participants, and connection-less to any Mallory snooping in. QUIC enforces the end-to-end principle.
> The problem for such products isn't reading SNI out of an HTTP/3 stream but successfully switching off matching streams. For TLS all they did is shut down the TCP connection on a match. But there is no connection for HTTP/3 because QUIC is a connectionless protocol.
This is not true. QUIC is a protocol that features a stateful connection. On top of that connection, multiple streams are multiplexed. The SNI information is transmitted as part of the TLS ClientHello, which is transferred on connection establishment (before any streams or anything related to HTTP/3 exist).
QUIC runs on top of a connectionless protocol (UDP), but that doesn't make it connectionless itself. Same with TCP: it isn't connectionless just because it runs on top of IP.
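And that ClientHello is readable by any on-path observer: QUIC Initial packets are protected only with keys derived from the public connection ID (RFC 9001), so recent tshark builds can decode them. A sketch:

# show the SNI carried in QUIC Initial packets
tshark -i eth0 -f 'udp port 443' \
  -Y 'quic && tls.handshake.extensions_server_name' \
  -T fields -e tls.handshake.extensions_server_name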
Sure, that wording was misleading. For the two peers using QUIC there's a connection, but for the middlebox no discernible connection exists, and so it can't shut down that connection.
Censorship resistance? It's the exact opposite despite the efforts.
When you include CA-based TLS in the protocol and there is no option for cleartext, combined with no human person in existence running a recognized CA, what you get is complete non-human control of the entire web. It becomes impossible to host a visitable website without getting (temporary) approval from some CA. And guess what? If that CA isn't already a government entity, it sure can be influenced by them and other outside forces. Protocol-baked-in CA TLS is very dangerous for human persons, despite how great it is for corporate persons.
CA based TLS isn't included in the protocol. The UA is free to decide how to trust certificates. If you can come up with a better model than the CA model, QUIC supports your model without changes.
Is it required to send the SNI in the clear at every connection? In theory, after the first connection, you should be able to just fully encrypt the entire packet by using the certificate (public key) of the website.
If that is the case, is it possible to "preload" a list of keys into the user agent to avoid having to pass the SNI in the clear? I understand that is not always practical, but can it help against censorship?
> you should be able to just fully encrypt the entire packet by using the certificate (public key) of the website
The public key is not used to encrypt anything (unless you are doing RSA key exchange, which is terrible and you should stop; it is no longer possible in TLS 1.3 anyway).
Modern schemes, including TLS 1.3 and thus HTTP/3, do ECDH key agreement, and then the public key is used for signatures to authenticate your peer.
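You can see that split in an ordinary handshake; a sketch (the grepped line labels are from OpenSSL 1.1.1's s_client output; example.com is a placeholder):

# "Server Temp Key" is the ephemeral ECDH share; the certificate's key only
# signs the handshake ("Peer signature type")
openssl s_client -tls1_3 -connect example.com:443 </dev/null 2>/dev/null \
  | grep -E 'Temp Key|signature type'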