More than 50% of the websites found through HN at present do not require SNI. Do you think this is also true of the https-enabled www as a whole?
SNI was introduced to address the problem of shared hosting with respect to SSL/TLS. In the not-too-distant past, SSL/TLS required a dedicated IP address for each HTTPS-enabled website. One certificate per IP address.
There are still many, many websites that satisfy that requirement.FN1 I know this because I use unpopular customised HTTPS clients and have SNI disabled by default. I only enable it when necessary; I have automated this so that it happens whenever I receive an SNI error.FN2 Most of the time, I get no such errors.
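A minimal Python sketch of this SNI-on-demand behaviour, assuming an "SNI error" shows up either as an `ssl.SSLError` during the handshake or as a default certificate whose names don't cover the requested host. This is not the author's client; `tls_handshake`, `connect_sni_on_demand` and the other names are hypothetical, and the handshake function is injectable so the fallback logic can be exercised offline. With SNI off, Python's built-in hostname check is unavailable, so a simplified manual check is used (DNS names and single-label wildcards only).

```python
import socket
import ssl
from typing import Callable, List, Tuple

# Hypothetical signature: handshake(host, port, use_sni) -> getpeercert() dict
Handshake = Callable[[str, int, bool], dict]


def tls_handshake(host: str, port: int = 443, use_sni: bool = False,
                  timeout: float = 5.0) -> dict:
    """Connect, complete a TLS handshake and return the verified peer certificate."""
    ctx = ssl.create_default_context()  # CERT_REQUIRED + system CA store
    if not use_sni:
        # server_hostname=None means no SNI extension is sent, and ssl cannot
        # check the hostname for us; the certificate chain is still verified.
        ctx.check_hostname = False
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host if use_sni else None) as tls:
            return tls.getpeercert()


def san_dns_names(cert: dict) -> List[str]:
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def name_matches(pattern: str, host: str) -> bool:
    """Exact match, or a '*.' wildcard covering exactly one left-most label."""
    pattern, host = pattern.lower().rstrip("."), host.lower().rstrip(".")
    if pattern.startswith("*."):
        first, _, rest = host.partition(".")
        return bool(first) and rest == pattern[2:]
    return pattern == host


def connect_sni_on_demand(host: str, port: int = 443,
                          handshake: Handshake = tls_handshake) -> Tuple[dict, bool]:
    """Try without SNI first; retry with SNI only if that fails.

    Returns (certificate, sni_was_used)."""
    try:
        cert = handshake(host, port, False)
        if any(name_matches(n, host) for n in san_dns_names(cert)):
            return cert, False
        # No error, but the server sent a default certificate for another name.
    except ssl.SSLError:
        pass  # e.g. a handshake failure or an unrecognized_name alert
    return handshake(host, port, True), True  # ssl checks the hostname here
```

Calling `connect_sni_on_demand("example.org")` returns the certificate plus a flag saying whether SNI had to be sent, which is enough to log how often the fallback is actually needed.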
FN1. I am tired of articles which ignore this fact. If one assumes that all HTTPS-enabled websites require SNI, then one is making an incorrect assumption. For users, the current SNI implementation (sending hostname in plain text) is a problem; people are trying to fix it, as the blog mentions. For website owners and hosting providers, the current SNI implementation may not be a problem. I am viewing SNI from the user perspective. I respect other perspectives.
FN2. Most of the sites requiring SNI are Cloudflare sites. As the blog mentions, unless the customer pays additional fees to use their own certificate, Cloudflare acts as a MITM and the user only sees Cloudflare's certificates, not the target website's. This is HTTPS between the user and Cloudflare, and then (hopefully) between Cloudflare and the target website. It is not HTTPS between the user and the target website.
Edit: Note, FWIW, I do not use third-party DNS and in fact do not use recursive DNS at all. I retrieve, store and update DNS data and serve it from localhost-bound authoritative nameservers. (I also encrypt DNS packets with CurveDNS in the home lab and with the few remote servers on the internet that support DNSCurve.) I wrote a custom "non-recursive" stub resolver (for speed, not privacy) that outperforms a cold recursive cache. It does not query root servers, and it minimizes queries to TLD servers and remote authoritative nameservers as it learns from previous answers and updates constant databases and zone files accordingly. Over time, it sends fewer queries than a recursive cache.
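A rough Python sketch of the "learning" part of such a resolver, assuming it is seeded with TLD delegations (so the root is never queried) and records every referral it receives, so later lookups start at the closest known zone. `DelegationCache` and the `query(servers, name)` transport are hypothetical; real code would also need DNS wire-format parsing, CNAME and TTL handling, and persistence (the author mentions constant databases and zone files).

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical transport: query(servers, name) returns either
# {"answer": [addresses]} or {"referral": (zone, [servers])}
Query = Callable[[List[str], str], dict]


def _norm(name: str) -> str:
    return name.lower().rstrip(".")


class DelegationCache:
    """Remembers which servers are authoritative for which zones, so each
    lookup starts at the closest known zone instead of at the root."""

    def __init__(self, tld_servers: Dict[str, List[str]]):
        # Seeded with TLD delegations, so root servers are never queried.
        self.zones: Dict[str, List[str]] = {_norm(z): s for z, s in tld_servers.items()}
        self.upstream_queries = 0

    def learn(self, zone: str, servers: List[str]) -> None:
        self.zones[_norm(zone)] = servers

    def start_point(self, name: str) -> Optional[Tuple[str, List[str]]]:
        """Longest known zone enclosing name, with its servers (or None)."""
        labels = _norm(name).split(".")
        for i in range(len(labels)):
            zone = ".".join(labels[i:])
            if zone in self.zones:
                return zone, self.zones[zone]
        return None

    def resolve(self, name: str, query: Query, max_hops: int = 8) -> List[str]:
        start = self.start_point(name)
        if start is None:
            raise LookupError(f"no known delegation for {name}")
        servers = start[1]
        for _ in range(max_hops):
            self.upstream_queries += 1
            reply = query(servers, _norm(name))
            if "answer" in reply:
                return reply["answer"]
            zone, servers = reply["referral"]
            self.learn(zone, servers)  # later lookups under this zone skip the TLD
        raise LookupError(f"too many referrals for {name}")
```

After the first lookup under `example.com`, a second name in the same zone goes straight to that zone's servers, which is why the query count drops over time compared with starting every cold lookup from the top.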
> the current SNI implementation (sending hostname in plain text) is a problem
1. Even if you don't do SNI, the hostname is on the certificate that the server sends.
2. For most users, the domain name already leaks in the DNS request, does it not? That is, SNI provides no less security.
3. I have no idea what the status of WPAD is these days, but if you can convince a client to use a WPAD file, IIRC, you can leak all the domains they connect to, including HTTP(S) sites.
4. For any IP address that doesn't require SNI, even if the certificate didn't give away the site: the IP address is sent in the clear at the IP layer; it should be pretty easy for an adversary to build up a mapping from IP->DNS for the vast majority of popular domain names. Alternatively, the attacker could just form a connection to that IP and see what site responds.
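Point 4 can be made concrete with a small Python sketch, assuming the adversary has a list of popular domains to resolve (or passively collected DNS answers). `build_ip_index`, `guess_sites` and `resolve_all` are hypothetical names; `resolve_all` uses `socket.gethostbyname_ex`, so it only sees IPv4 addresses.

```python
import socket
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple


def resolve_all(domains: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (domain, ipv4) pairs for each domain that resolves."""
    for domain in domains:
        try:
            _, _, addresses = socket.gethostbyname_ex(domain)
        except OSError:  # includes socket.gaierror for names that don't resolve
            continue
        for ip in addresses:
            yield domain, ip


def build_ip_index(resolutions: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Invert (domain, ip) pairs into ip -> {domains}."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for domain, ip in resolutions:
        index[ip].add(domain.lower().rstrip("."))
    return dict(index)


def guess_sites(index: Dict[str, Set[str]], dest_ip: str) -> Set[str]:
    """Candidate sites for a connection seen on the wire to dest_ip."""
    return index.get(dest_ip, set())
```

A one-element result means the destination IP alone identifies the site; a larger set is the shared-hosting case, where the IP only narrows things down.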
While it's certainly a privacy leak, it doesn't appear to me that the solution is as simple as turning off SNI.
> 1. Even if you don't do SNI, the hostname is on the certificate that the server sends.
Not necessarily; think about wildcard certificates like *.tumblr.com or *.blogspot.tld.
And while Cloudflare's free TLS certificates, for example, are not wildcard certificates, they bundle dozens of unrelated sites into a single certificate.
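Point 1 only holds where the certificate is visible on the wire: in TLS 1.2 and earlier the server's Certificate message is sent in the clear, while TLS 1.3 encrypts it. Where it is visible, here is a rough Python sketch of what an observer learns from it, assuming the certificate has already been parsed into the dict shape that `ssl.SSLSocket.getpeercert()` returns. `cert_reveals` is a hypothetical helper.

```python
from typing import List, Tuple


def cert_reveals(cert: dict) -> Tuple[str, List[str]]:
    """Classify what a certificate's DNS names tell a passive observer.

    Returns ("exact", [name]) for a single non-wildcard name, otherwise
    ("ambiguous", names) with every name or wildcard pattern it covers."""
    names = sorted({value.lower().rstrip(".")
                    for kind, value in cert.get("subjectAltName", ())
                    if kind == "DNS"})
    if len(names) == 1 and not names[0].startswith("*."):
        return "exact", names
    # Wildcards (*.tumblr.com) and multi-site certificates only narrow it down.
    return "ambiguous", names
```

The classification is deliberately crude: a certificate listing both example.com and www.example.com counts as ambiguous here even though both names belong to one site.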
The DNS remark still applies, since a caching DNS server can't cache a wildcard itself, only the individual names synthesized from it (as usual). However, with a caching DNS server there will likely be far fewer observable DNS requests than HTTP requests.
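A minimal Python sketch of why caching cuts down what an observer sees, assuming a hypothetical upstream function that returns addresses plus a TTL: only cache misses leave the machine. Names synthesized from one wildcard, such as a.tumblr.com and b.tumblr.com, still get separate cache entries. `CachingResolver` is hypothetical, and the injectable clock only exists to make expiry testable.

```python
import time
from typing import Callable, Dict, List, Tuple


class CachingResolver:
    """Tiny TTL cache in front of an upstream lookup; only misses go upstream."""

    def __init__(self, upstream: Callable[[str], Tuple[List[str], float]],
                 clock: Callable[[], float] = time.monotonic):
        self.upstream = upstream  # hypothetical: name -> (addresses, ttl_seconds)
        self.clock = clock
        self.cache: Dict[str, Tuple[List[str], float]] = {}  # name -> (addrs, expiry)
        self.upstream_queries = 0  # what a network observer could count

    def lookup(self, name: str) -> List[str]:
        key = name.lower().rstrip(".")
        now = self.clock()
        hit = self.cache.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]
        self.upstream_queries += 1
        addresses, ttl = self.upstream(key)
        self.cache[key] = (addresses, now + ttl)
        return addresses
```

With a 300-second TTL, fifty page loads of the same site inside that window produce a single upstream query, while each of those page loads is still a separate HTTPS connection.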
> 2. For most users, the domain name already leaks in the DNS request, does it not? That is, SNI provides no less security.
It does, but in many cases that DNS request/response is not traversing the Internet. At $work (ISP), ~95% of our customers' DNS requests (and the associated responses) originate and terminate within our network [0]. They would not be visible to a passive attacker listening to, say, our upstream network links.
> 4. For any IP address that doesn't require SNI, even if the certificate didn't give away the site: the IP address is sent in the clear at the IP layer; it should be pretty easy for an adversary to build up a mapping from IP->DNS for the vast majority of popular domain names.
This is mitigated by the fact that most (public) web servers host more than one web site. While you may have hundreds of A RRs pointing towards a single IP address, there's usually only a single PTR RR for a given IP address (and the PTR often resolves to something like "www14.example.com", giving away no additional information about the sites it hosts).
Just by listening passively, you could obviously determine that a user visited a web site hosted on 198.51.100.42. If there are 100 web sites hosted on that IP address, however, you know that the user visited one of them but not necessarily which one.
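To put a rough number on that uncertainty: if the observer only knows the destination IP and assigns probability p_i to each of the N sites hosted there, the remaining uncertainty is the entropy below. It is at most log2 N, reached only when every site is equally likely; skewed popularity among the 100 sites would lower it.

```latex
H = -\sum_{i=1}^{N} p_i \log_2 p_i \;\le\; \log_2 N,
\qquad N = 100,\; p_i = \tfrac{1}{100} \;\Rightarrow\; H = \log_2 100 \approx 6.64\ \text{bits}
```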
> Alternatively, the attacker could just form a connection to that IP and see what site responds.
I'm not sure whether it still is, but it used to be common practice (on servers hosting multiple sites) to configure a "default virtual host" that simply served up a blank page. This site would only be served up when connecting directly to the IP address (i.e., with a missing or unrecognized Host: header in the HTTP request) and would provide no useful information.
In my experience, though, a web server hosting a single site is often configured to respond with that site whether the correct Host: header is sent or not.
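A toy Python model of the two configurations described above, not real web server code: name-based virtual hosting picks a site by the Host: header and falls back to a default. `select_vhost` and `host_only` are hypothetical names.

```python
from typing import Dict, Optional


def host_only(value: str) -> str:
    """Strip an optional :port from a Host: value (IPv6 literals are bracketed)."""
    value = value.strip().lower()
    if value.startswith("[") and "]" in value:
        return value[1:value.index("]")]
    return value.split(":", 1)[0].rstrip(".")


def select_vhost(host_header: Optional[str], vhosts: Dict[str, str],
                 default: str) -> str:
    """Site a name-based virtual-hosting server would serve for one request.

    vhosts maps configured hostnames to site content; default is served when
    the Host: header is missing or names no configured site (e.g. a bare IP)."""
    if host_header:
        name = host_only(host_header)
        if name in vhosts:
            return vhosts[name]
    return default
```

On a shared server the default is the blank page, so probing the bare IP reveals nothing. On a single-site server the "default" is the site itself, so the same probe identifies it.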
[0]: This is for customers who use our caching, recursive DNS servers. It obviously does not apply to those who use, e.g., Google's public DNS servers, or other DNS servers outside of our network.
> 1. Even if you don't do SNI, the hostname is on the certificate that the server sends.
What if the certificate contains an IP address, instead of hostname(s)?
This can reduce reliance on DNS, which is controlled by third parties (e.g., ICANN). The blog hints at this reliance (i.e., on unencrypted DNS) as an inherent weakness of HTTPS.
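Certificates can list IP addresses as iPAddress subjectAltName entries (RFC 5280), and Python's `getpeercert()` reports them as `("IP Address", value)` pairs. A minimal sketch of checking such a certificate against the address actually connected to, with hypothetical helper names; comparing through the `ipaddress` module means different spellings of the same IPv6 address still match.

```python
import ipaddress
from typing import List


def ip_sans(cert: dict) -> List[str]:
    """IP-address SAN values from a getpeercert()-style dict (strip() is defensive)."""
    return [value.strip() for kind, value in cert.get("subjectAltName", ())
            if kind == "IP Address"]


def cert_covers_ip(cert: dict, ip: str) -> bool:
    """True if the certificate lists ip (IPv4 or IPv6) as an IP-address SAN."""
    target = ipaddress.ip_address(ip)
    for value in ip_sans(cert):
        try:
            if ipaddress.ip_address(value) == target:
                return True
        except ValueError:  # skip malformed entries rather than fail
            continue
    return False
```

Such a certificate ties the connection to the address rather than to a DNS name, though the address itself is still visible at the IP layer, as point 4 notes.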
None of this was really unknown, but perhaps not considered enough.
The article is simply making the point that HTTPS, by itself, doesn’t mean you can stop thinking about security & privacy. But I would have thought most people knew that already...
Well, if your site is already running through Cloudflare, and uses a number of third party javascripts, the surveillance introduced by accessing said site over insecure wifi is probably not your biggest worry.
The Googles, Facebooks and CDNs are known to capitalize on that surveillance, while your local pub might or might not. Describing that as a "limitation" of TLS feels like a bit of a stretch. TLS is not really designed to counter passive surveillance.