TLS Everywhere, not https: URIs (2015) (w3.org)
207 points by schallertd on Aug 8, 2016 | 91 comments

Users/user agents need to know whether to expect a connection to be secure. Unfortunately, you can't necessarily trust any random link you follow to reliably tell you. If I can get you to use HTTP when you should've used HTTPS, I might be able to sniff your traffic. If I can get you to use HTTPS when you should've used HTTP, it might be a DoS.

Incidentally, this is the same problem as public key distribution. You need a trusted channel to receive public keys, and a trusted channel to know whether to use a public key. Why can't these be the same channel? Right now we have HSTS preloading[1] for the latter, but in that case why not preload certificates (or hashes thereof) too?

Then we can finally cut out the middle-men and realize the truth: that the browser is the ultimate certificate authority.

[1] https://hstspreload.appspot.com/
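
A rough sketch of the idea in this comment, assuming a browser shipped one preloaded table that answers both questions: should this host use HTTPS, and which key hashes are acceptable. The table contents, the host names, and the `check_connection` helper are hypothetical. Real pinning hashes the certificate's SubjectPublicKeyInfo, whereas this simply hashes whatever key bytes it is given.

```python
import hashlib


def key_hash(public_key_bytes: bytes) -> str:
    """SHA-256 hex digest of the raw key bytes (a stand-in for an SPKI pin)."""
    return hashlib.sha256(public_key_bytes).hexdigest()


# Hypothetical preloaded table shipped with the browser.
PRELOAD = {
    "bank.example": {
        "force_https": True,
        "pins": {key_hash(b"bank.example public key")},
    },
}


def check_connection(host: str, scheme: str, presented_key: bytes) -> str:
    """Decide whether a connection is acceptable under the preloaded policy."""
    entry = PRELOAD.get(host)
    if entry is None:
        return "unknown host: no preloaded policy"
    if entry["force_https"] and scheme != "https":
        return "refuse: host is preloaded as HTTPS-only"
    if key_hash(presented_key) not in entry["pins"]:
        return "refuse: presented key does not match any preloaded pin"
    return "ok"
```

Both the downgrade case (plain HTTP to a preloaded host) and the substituted-key case end in a refusal, which is the sense in which the two trusted channels could be the same one.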

> If I can get you to use HTTP when you should've used HTTPS, I might be able to sniff your traffic

That is not the worst-case scenario. If someone can force HTTP, they can also inject malicious code into the stream and do anything from making bank transfers to creating botnets. With the worst case of always-HTTPS being a DoS, and the worst case of allowing HTTP being code injection, I would prefer deprecating HTTP in favor of HTTPS.

There are a few use-cases for standalone unencrypted HTTP. The two big ones:

• HTTPS is redundant and costly when you're already in some other tunnel: a pre-negotiated IPSec tunnel for port 80 traffic to a given peer (e.g. a load balancer to its backend); talking directly to an HTTP proxy sitting on the jump box you're VPNed or SSH-tunnelled into; etc.

• HTTP is actually a great wire protocol for non-networked RPC, such as between Nginx and your application server, running on the same box, over a Unix socket. FCGI, WSGI, etc. are just half-assed implementations of HTTP; you may as well just use HTTP. (Though the non-front-of-line-blocking benefits of HTTP2 RPC would be even better here, for green-threaded runtimes that can C10K.)

I do agree, though, that unencrypted HTTP can likely be deprecated for web browsers. The browser-addressable web is really a pretty strictly-bounded subset of the web as a whole, and we should strive to make it safe to browse.

That being said, such statements put me in mind of a future where your browser literally is not allowed to talk to all those old servers from 1997 that are still hosting whatever they were hosting back then. Instead, all requests for those "legacy" domains that nobody's updating any more would have to go to some trusted mirroring site served over HTTPS, like the Internet Archive. (The spidering logic for such "legacy mirroring" would also have to be slightly different from today's "latest mirroring" logic: if the IA's spider got MITMed to see something else, it should "doubt" the new version based on how long the previous site endured without change, and if its confidence is low enough, just continue showing people the old version.)

Is that a future we want? I'm honestly not sure.

Incorrect on public keys. You do not need a trusted channel to receive a key. You could receive one via smoke signal, carrier pigeon, or billboard. Existing key distribution systems may or may not be encrypted, but the reason for encrypting the channel is far more to protect the interests of the requestor than the integrity of the key itself. That last is independent of the key distribution channel.

What matters is that the web of trust associated with that key is sound (that is, you have assurance that the key belongs to whom you think it does), and that the integrity of the private key has been maintained.

The first of those problems is difficult, but not intractable. The second problem is rather difficult, especially in the case of persistent data, though the core requirement is that the key was valid when a message was generated, if you're looking at the sender of information. For your own information, you are relying on the recipient to maintain integrity over their private (decryption) key going forward, such that the data you'd transmitted remains encrypted against all others.

The first problem you point out, that any encrypted channel is not necessarily a secure channel, is valid, though given your misunderstanding on subsequent points I'm not sure how well that applies to this discussion (I still need to RTFA).

> Existing key distribution systems may or may not be encrypted, but the reason for encrypting the channel is far more to protect the interests of the requestor than the integrity of the key itself.

btrask wasn't saying that encryption is necessary for key distribution; he/she was saying that HTTPS guarantees identity and integrity, both of which are necessary to trust a key.

> What matters is that the web of trust associated with that key is sound (that is, you have assurance that the key belongs to whom you think it does), and that the integrity of the private key has been maintained.

That's a possible alternative to btrask's proposal, though you're equating "assurance that the key belongs to whom you think it does" with "web of trust". btrask's proposal is a special case of that, in which the web of trust is simply the sender.

> The first problem you point out, that any encrypted channel is not necessarily a secure channel

Correct, but not what btrask said. The first problem he pointed out was the fact that clients need to know whether a host expects secure communication before ever connecting to it.

> though given your misunderstanding on subsequent points I'm not sure how well that applies to this discussion

That's not very nice.

Clarifying my own post: I'm insisting that neither a trusted nor an encrypted channel is necessary.

I said that an encrypted channel could be used, and that it might not be, but that if used encryption would largely serve as a protection to the requestor, who might otherwise be subject to traffic and/or interest analysis based on the specific keys they requested, which could be presumed to be of interest, or signing keys (I'm thinking PGP protocol here) of keys of interest. Either piece of information would reduce search space for an Eve.

I'm not equating trust of keys to web of trust, I'm stating that in existing (PKI/PGP) protocols, that is the assurance mechanism. And it is independent of either trust OR encryption of the key delivery channel itself.

There seems to be a rather profound difficulty in distinguishing what I've said from what I've said btrask said. I'm not sure how I could be clearer, but I'm open to pointers.

You're both right. I only replied because you responded to points that btrask hadn't made, then claimed he/she misunderstood the topic.

I said trusted, not encrypted. I wasn't talking about private keys at all. I think I understand the issues involved. Thanks though.

And I still disagree on that point.

Maybe I misunderstand (though I also think I understand the issues involved pretty well), or maybe one or the other or both of us are communicating poorly.

How would you distinguish trusted, encrypted, and untrusted channels, say?

In the context of sharing public keys, I'd say you merely need authentication. Web of Trust being one possible mechanism. This isn't a particularly advanced topic.

Relevant to my original post, information about whether the connection should be encrypted also merely needs to be authenticated, not encrypted itself. Of course, the HSTS preloading site uses HTTPS (with encryption) because it's easy and why not.

Thanks. So re keysharing, authentication is a form of secure channel.

I'm reading the auth and channel as independent. Auth is something of a metachannel, perhaps.

Fair enough. :)

What if someone intercepts the carrier pigeon and swaps in a different public key of their own?

Then the signatures don't match, or the fingerprint is wrong. If you're relying on long-term data access, messages encrypted against or signed by the true key don't match. This is an area in which PGP and SSH differ markedly. PGP is used to encrypt and authenticate data which tends to persist, SSH for data used only in session. While both can use long-lived keypairs, it's the PGP keys you're more likely to notice changing (though SSH clients tend to report this happening).

Yes, that means verifying your keys, and probably through an out-of-band method.
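
The "fingerprint is wrong" case can be sketched as a trust-on-first-use check, roughly what SSH clients do with their known hosts. This is only an illustration: `KnownKeys` and its `check` method are made-up names, and the fingerprint is a plain SHA-256 of the key bytes rather than any particular tool's format.

```python
import hashlib
import hmac


def fingerprint(public_key: bytes) -> str:
    """Hex SHA-256 of the key bytes, used as a comparable fingerprint."""
    return hashlib.sha256(public_key).hexdigest()


class KnownKeys:
    """Remembers the first fingerprint seen per peer and flags later changes."""

    def __init__(self):
        self._seen = {}  # peer name -> fingerprint recorded on first contact

    def check(self, peer: str, public_key: bytes) -> str:
        fp = fingerprint(public_key)
        known = self._seen.get(peer)
        if known is None:
            # First contact: nothing to compare against, so record and move on.
            self._seen[peer] = fp
            return "new"
        if hmac.compare_digest(known, fp):
            return "match"
        # A swapped key shows up here; the user should verify out of band.
        return "CHANGED"
```

The first contact is the weak spot, which is why the out-of-band verification mentioned above still matters.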

Chrome does preload public key pins for large sites, not that it's the ultimate solution to what you describe.

Firefox also does this.

>> If I can get you to use HTTPS when you should've used HTTP, it might be a DoS.

Can you explain what you mean by this? Genuinely curious to know how it can lead to DoS?

One way would be to re-direct cacheable assets to HTTPS, thus foiling edge caching and increasing load on the origin server.

In general, caching is a big problem with the naive approach to "HTTPS everywhere." A mechanism to deliver signed cacheable payloads would be great, so that static assets etc. can continue to be edge-cached.

It would still need to be encrypted, or I could tell a lot about what you're doing on the site by looking at what cacheable resources you fetch.

Not everything is about privacy, and arguably privacy advocates have done a lot to harm our ability to have a trusted internet by conflating verification, encryption and anonymity.

The most annoying thing about HTTPS everywhere is that it ruins cacheability. This is a problem the distros solved ages ago by signing their content but acknowledging it's mostly pointless hiding it in transit.

But it's absurd that in HTTP2 we have out-of-the-box encryption, but we don't have a mechanism for doing authenticated caching.
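
One way to picture "authenticated caching" is the distro model mentioned above: a small manifest of digests travels over a channel you already trust, and the bulky assets come from any cache. The sketch below uses plain SHA-256 digests rather than public-key signatures (the Python standard library has none), and the manifest, the path, and the `verify_cached_asset` helper are all hypothetical.

```python
import hashlib


def digest(payload: bytes) -> str:
    """Hex SHA-256 of an asset's bytes."""
    return hashlib.sha256(payload).hexdigest()


# Assumed to arrive over an authenticated channel (e.g. the HTTPS page itself).
trusted_manifest = {
    "/static/app.js": digest(b"console.log('hello');"),
}


def verify_cached_asset(path: str, payload: bytes) -> bool:
    """Accept bytes from an untrusted cache only if they match the manifest."""
    expected = trusted_manifest.get(path)
    return expected is not None and digest(payload) == expected
```

This gives integrity only. An observer on the path still sees which assets are fetched, so it does not answer the privacy objection raised elsewhere in the thread.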

> arguably privacy advocates have done a lot to harm our ability to have a trusted internet

We don't have a trusted internet. Not when a country on the other side of the world can mis-configure BGP and re-route all traffic through them. Not when our ISPs intercept and modify our traffic. Not when there are nearly 10x as many "trusted" root certificates as there are nations in our world.

The internet is the wild west, and we need to protect our computers from it. Currently, encryption is our best bet for doing that. If edge caching is a casualty, then so be it.

If someone can come up with a method for protecting content from end-to-end while keeping it secure against tampering and eavesdropping (because this too matters, both to us in the first world and the majority of others who are not), then let's start getting it put in place.

Agreed that unencrypted signed static assets provide a vehicle for activity monitoring.

Your statement can be misinterpreted to imply that merely by encrypting all traffic, such analysis can be prevented. There's plenty of metadata in a typical encrypted page load that can be used to do so.

For example, the view-discussion page might download three static assets, a js file, and two CSS files, one small and one large, whereas the post-comment page might load zero assets and js files, and one small CSS file.

Point being, making a privacy-protecting website takes careful planning even when fully encrypted. As such, it'd be great to have tools (such as signed content) available for performance optimization. Sure, naive usage might lead to attack vectors, but naive usage of HTTPS already leads to many such attack vectors anyway.

That seems like a good idea -- a simple scheme where the browser validates every http response from a particular domain against a key specified in that domain's SSL cert (if the appropriate field is present in the https cert) -- seems like it would work well?

the only way I can think of is if the site doesn't support https

> Users/user agents need to know whether to expect a connection to be secure.

Why not expect it to be secure? Connect to https before http.

Behavior like that needs to come with a huge warning label.

It would be trivial for any man-in-the-middle to block https and serve http.

This is exactly why browsers warn about such redirects. That said, this reminds me of a similar discussion on mail servers. There, STARTTLS sees much more use.

The main problem is preventing downgrade attacks. With mail it is easy to just remember the setting for every server. Not so with websites.

I've seen quite a bit of criticism of it for mail servers [1] because an attacker can simply block the 'STARTTLS' message and (many) clients will silently accept that.

[1] https://www.agwa.name/blog/post/starttls_considered_harmful
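
The stripping problem comes down to what a client does when STARTTLS is missing from the server's reply. Below is a minimal sketch of a client that fails closed instead of silently continuing. The `require_starttls` and `open_strict_smtp` helpers and the default port are illustrative choices, not taken from the linked post; the `smtplib` calls are the standard library ones.

```python
import smtplib
import ssl


class StartTLSMissing(Exception):
    """Raised instead of silently continuing in plaintext."""


def require_starttls(advertised_extensions) -> None:
    """Fail closed if the server's EHLO reply does not offer STARTTLS."""
    names = {ext.lower() for ext in advertised_extensions}
    if "starttls" not in names:
        raise StartTLSMissing("server did not advertise STARTTLS; refusing plaintext")


def open_strict_smtp(host: str, port: int = 587) -> smtplib.SMTP:
    """Connect, insist on STARTTLS, and verify the server certificate."""
    client = smtplib.SMTP(host, port, timeout=10)
    client.ehlo()
    # esmtp_features maps the extension names the server advertised.
    require_starttls(client.esmtp_features.keys())
    client.starttls(context=ssl.create_default_context())
    client.ehlo()  # re-issue EHLO over the now-encrypted connection
    return client
```

A client like this turns a stripped STARTTLS line into a visible delivery failure rather than a quiet downgrade, which is only workable if the client already knows the server is supposed to support TLS.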

They could display that same "this page is not secure"-page that they display on broken certificates.

I'm not sure you can assume that the same URL with https will be the same content as at http. It could be an entirely different site that you may not have wanted.

That isn't really viable yet. Browser vendors could decide that they will introduce this functionality in a few years though.

IMHO, the feature would need to be implemented as some have suggested, by enabling any website to transmit securely or insecurely, but for the web browser to request a secure TLS connection first (trying HTTP and HTTPS to reduce incompatibilities) and if a website appeared to have issues, then try insecure connections. If an insecure page were to be served, the browser should indicate this with a broken padlock.

Furthermore, I believe that browsers should warn when any data is input, e.g. clicking items that cause JS calls or text is typed - this strict implementation is important. Single page JS applications have made it possible to send any input data via JSON, we cannot only warn the user on a form submission, since it would be very possible to capture details via AJAX. E.g. If I were impersonating an e-commerce solution, I could hope the user would not notice the padlock and use AJAX to send the data preventing any form submission warnings. This would be annoying for users when they were using such websites regularly, but this would be a good thing - pressuring websites that handle user inputs to act responsibly and use encryption via TLS.
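
The HTTPS-first flow proposed above, with a fallback that is flagged to the user, can be sketched as follows. The `load_page` function, the `fetch` callback, and the padlock labels are hypothetical names used only for illustration; a real browser's network stack is far more involved.

```python
from typing import Callable, Optional

# A fetcher takes a URL and returns the body, or None if the attempt failed.
Fetcher = Callable[[str], Optional[str]]


def load_page(host: str, path: str, fetch: Fetcher) -> dict:
    """Try HTTPS first; fall back to HTTP only if HTTPS fails, and say so."""
    body = fetch(f"https://{host}{path}")
    if body is not None:
        return {"body": body, "padlock": "secure"}
    body = fetch(f"http://{host}{path}")
    if body is not None:
        # Served insecurely: the UI should show the broken padlock.
        return {"body": body, "padlock": "broken"}
    return {"body": None, "padlock": "error"}
```

Anyone able to make the HTTPS attempt fail pushes this logic into the "broken" branch, so the indicator is the only protection the user gets.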

What if you block the https request in some way? You can now force an insecure connection.

A good solution for HTTP sites is to load the https version after first loading over http. If they have similar content, show a bar at the top of the browser with a message along the lines of "A secure connection is possible, click here to go to use the secure version of this page".

Then it would be good to remember this setting and always pull the HTTPS.

Hen and Egg, where do you get the secure browser from?

It's bundled with your OS. If you don't like the bundled one you can use it to download a different one using HTTPS.

The problem is of course that moving things from http: space into https space, whether or not you keep the rest of the URI the same, breaks any links to. Put simply, the HTTPS Everywhere campaign taken at face value completely breaks the web.

Tim Berners-Lee is certainly an authority in the area, but I (an amateur) fail to see any major problem here, let alone one that "completely breaks the web".

Can someone illustrate a use case where either this fatal link-breaking cannot be solved by a simple HTTP->HTTPS redirect, or any other scenario where the user is so much worse off?

In a way it is arguably a greater threat to the integrity for the web than anything else in its history. The underlying speeds of connection of increased from 300bps to 300Gbps, IPv4 has being moved to IpV6, but none of this breaks the web of links in so doing.

I'd venture to say that IPv6 probably wishes it had the traction that HTTPS Everywhere has...

The URL that's supposed to redirect to HTTPS is still vulnerable to MitM. It can be modified in transit to serve up the same data as the HTTPS URL, but in plaintext, and potentially with a different form action attribute, etc. There are different things that can help with that, but none of them universally protect privacy.

That would mean HTTPS is not necessarily an improvement security-wise, but that does not explain how it "completely breaks the web" by "breaking links to".

To be more specific, I'm referring to the "Don't break the Web" section in the article.

The problem with opportunistic TLS is that while it protects from wide-scale DPI it doesn't protect against MITM. Personally I think that the effort that would be put to implementing opportunistic TLS in all web servers and browsers would be better put to migrating all web applications to HTTPS only.

Agreed, and MitM is arguably a bigger threat than DPI to most people, especially if you use coffee shop wifi, etc.

It only breaks the web if you cut everything over from http to https. If you can serve both you don't have a problem.

This works fine if you use anchors without protocols in your html:

<a href="//site.com/resource">

Some websites break if you try to access https:// - from my experience of using Https Everywhere.

It's a simple fix to whitelist the one website though. It doesn't break the Internet. It breaks sometimes, for some users, and is trivially fixable when it does break.

"Fundamentally breaking the internet", to me, is something that actually breaks the usability of the internet in a non-trivial-to-fix way for the end user where the end user isn't even in control of the fix. That's breaking the web.

Failing to support IE5 is "breaking the web" in the same way Https Everywhere breaks the web. In a way that is to be fixed on the user-end.

(Although sites that fail to serve over https:// should fix their site)

There's probably a technical reason I'm unaware of, but why are you allowed to have HTTP and HTTPS handled differently (besides the encryption portion)?

Technical reason: Multiple ports = multiple apps.

HTTP and HTTPS use different ports (TCP 80 and 443). You can run one web server application on the HTTP port serving content A, and a completely different web server application on the HTTPS port serving content B. With firewalls doing NAT port translation this could even result in HTTP requests going to a completely different machine than the HTTPS requests.

As for a non-technical reason, there are some types of sites where the HTTPS content should never be available over HTTP, for example an online payment form. In such cases a sensible website will either disable HTTP entirely (and use a subdomain for secure content, rather than the top-level / www. domain), or have a basic HTTP site that transparently redirects to the HTTPS version.
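
To make the port point concrete, here is a small sketch of which network endpoint a client would contact for each scheme. The `endpoint` helper and `DEFAULT_PORTS` table are illustrative names; the default port numbers are the ones stated above.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def endpoint(url: str) -> tuple:
    """Return the (host, port) a client would connect to for this URL."""
    parts = urlsplit(url)
    # Use the explicit port if the URL has one, else the scheme's default.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return (parts.hostname, port)


print(endpoint("http://example.com/pay"))   # ('example.com', 80)
print(endpoint("https://example.com/pay"))  # ('example.com', 443)
```

Same host and path, two different listening sockets, so nothing at the network level forces the two to be served by the same application.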

Servers can really do whatever they want, based on whatever they want. They can serve a different page to clients based on e.g. "Accept-Language": "en" might be a completely different page than "fr", rather than just a translation.

That's just how it happened. And historically, it was common to put only the "important things" behind TLS. That never really made sense from a security perspective, but it certainly saved CPU cycles.

I don't really like when I'm not able to read the news because HTTPS didn't want to collaborate with an unreliable connection.

Because no one ever prohibited it (ignoring issues of whether that is a good idea), and it's way too late to change it now.

I think it's safe to say "it breaks the web" is from the user perspective, not the server.

It only takes 1 server to serve differing content through each protocol to break things for the user

Yeah, but that's a subset of "misbehaving servers break the web" as far as I can tell. HTTPS doesn't inherently cause this problem, it just provides one theoretical nucleation point for it.

Serving different things on https and http is completely to spec, though. It's not really a server misbehaving.

(I do agree that it's contrary to expectations, but I'm sure this doesn't stop sites from relying on it.)

Protocol-less URLs are usually a bad idea. They mean "use http or https depending on whatever the current page is", but there are very few situations where that is a useful way of making that decision. What you should do is specify https in the link, unless you know that the target site doesn't support https, in which case of course you have to specify http.

Protocol-less urls use the same protocol as the current page. This resolves nothing.
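
A quick way to see this resolution rule is Python's `urljoin`, which follows the standard relative-reference rules. The host names are placeholders.

```python
from urllib.parse import urljoin

link = "//site.com/resource"  # protocol-relative reference

# The scheme is inherited from whatever page the link appears on.
print(urljoin("http://blog.example/post", link))   # http://site.com/resource
print(urljoin("https://blog.example/post", link))  # https://site.com/resource
```

So a protocol-relative link from an HTTP page still produces an HTTP request, and an old `http:` link pointing at the site elsewhere is unaffected either way.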

Which is good. With TLS variant you would have to ensure that anyway for security reasons or you open doors to injection attacks. Mitigated by keep alive with pipelining and socket reuse, but there.

It has nothing to do with the points made in TFA.

TBL's points here are well-made. In particular, there's the issue that security is a multidimensional probability field, not a binary state.

The questions of secure document transfer and/or interchanges are:

1. Am I talking to the party I intended to?

2. Is the communication free from third-party interception?

3. Is the message itself originated by the party I intended?

4. Are the contents of that message as originally intended by the author?

(Possibly more, but those strike me as the Big Four.)

There are various ways for this to fail, and there are different and independent assurances which can be afforded. I remember the first time I heard phrases to the effect of "you can trust our secure webserver" in the context of commercial transactions, and cringed.

The present HTTP / HTTPS split addresses only a subset of these concerns, and few of them well, whilst breaking multiple elements of functionality.

I will note that TBL seems to be concerned over the expiration of old, previously-valid URLs. To that I can only say that this appears to be a lost battle. The duration of a contemporary URL is on the order of 40-45 days, I think from the Internet Archive. That's scarcely longer than an old-school Usenet post might be relied on to persist online, and suggests to me that perhaps the successor to Usenet is the Web, with origins and various archival services (archive.org, archive.is, the NSA, ...) providing robust storage needs to various audiences.

I find the HTTP/HTTPS "ring-fence" or "oil/water boundary", as he describes it, to be the most frustrating aspect of HTTPS, and recently wrote about it in some more detail (and how it might be fixed): https://alexcbecker.net/blog.html#towards-universal-https-ad...

What exactly is he proposing? Without some additional change, if I make a link to http://bank.com, any MITM can trivially force an unencrypted connection and somehow the user needs to notice (or be lucky enough to have HSTS know about bank.com).

I can see an argument for having DANE-like records include an HSTS instruction, but nothing like that is mentioned in the article.

In 5-10 years http:// will be as rare as winning a lottery jackpot. Our kids' kids will ask "what is this http?", like our kids did with floppy disks...

5 years? Don't see any chance for extinction.

There are tons of non-HTTPS websites out there. A myriad of forsaken ones that are still running because no one had remembered to do anything to them, and a myriad of ones whose sysadmins just don't care about TLS at all.

A non-obtrusive "insecure connection" warning is probably going to happen quite soon, but I just don't see any chance of mass HTTPS migration besides the high-profile sites, newborn sites and geeks that would stand for the cause.

Otherwise, a pretty large fraction of the WWW is going to be lost.

I'm guessing in 10 years Chrome will refuse to connect to http hosts, given Google's track record of aggressively unilaterally deprecating and disabling web features they consider harmful.

If there were just one scheme for both, and some kind of protocol upgrade needed to take place, wouldn't it be easy to manipulate the connection and simply not let the upgrade happen?

I used to think HTTPS everywhere was overkill, then I discovered HTTP2/ H2. H2 has some massive performance improvements and I believe we have barely scratched the surface of what is possible e.g. once the connection is established there's no need for additional HTTP handshakes, massively reducing latency.

I'm also a big fan of Let's encrypt, democratizing SSL certificates is only a good thing.

Since most browsers nowadays support SSL/TLS, it's safe to redirect all traffic from http to the same URL on https.

Off topic, but I'm a native english speaker, and have never heard this construction before:

"There follow some thoughts following many recent discussions of "HTTPS Everywhere" and points west."

What does "and points west" mean here?

I think you can interpret "and points west" as "and beyond."

I think it's based on the expression "all points west." Can't find a good source at the moment. "Points" is used as a plural noun there, not a verb. So, I think the expression "all points west" means "all locations to the west," and used metaphorically here to mean "all things beyond."

I have heard the phrase "points west" used when listing the places where a train is going. After listing the next several stops, they say "...and points west" to sum up all the stops the train will make that are too far away to be worth listing.

So I guess it's a metaphor for the indefinite beyond that makes sense to people who live on the east coast of the US and ever take the train.

It's okay; I'm a native English speaker and I had trouble following much of Berners-Lee's grammar and writing style. He should consider editing the piece.

I have no idea what "and points west" means in this context.

Yeah... His English is extremely hard to decipher (Native Brit here). I thought that the man that invented World Wide Web would write more eloquently...

If we're designing in hindsight...

Browsers should connect to port 80 and perform a GET for /tls-cert with an Accept: header listing all the certificate formats it knows and the applicable Host: header.

The server would respond with the certificate for that host-name and the browser would validate it. After that, the HTTP connection would switch into TLS mode using the key in that certificate.

If the server responds to this initial request with 404 or some other error, the browser either shows an error or continues in insecure mode.

(To complete my above comment...)

This would have the benefit of bringing that initial certificate negotiation outside of the TLS black-box. For years, TLS deployment was held back because you couldn't have multiple domains on a single IP, long after the problem had been solved for HTTP.

Later, SNI was added to TLS, but the change wouldn't be rolled out to Windows XP users (except Firefox which used its own TLS implementation).

By using HTTP, you'd have the Host: header right away and could even introduce new certificate formats by looking at the Accept: header. This sort of thing is built into HTTP but had to be retrofitted with much pain and anguish into TLS.

Delivering a TLS certificate over HTTP is useless as a MITM can simply substitute his own certificate.

Just like real-world TLS, the browser would validate the certificate before using it. TLS can be MITM'd too - if you can find a way around browser validation of the certificate.

Looks like people didn't listen. I've seen more https movement these days, and sure enough, it breaks... lots of stuff.

As an ignorant user, I would just like to have a browser setting such as 'only talk to sites using TLS'. I would be happy to deal with the fallout. Does Chrome or any other browser have that?

The EFF's 'HTTPS Everywhere' Firefox addon [1] has a setting to block all HTTP requests. This is not enabled by default [2], but when turned on, will exhibit the behavior you want.

[1] https://addons.mozilla.org/en-US/firefox/addon/https-everywh...

[2] https://www.eff.org/https-everywhere/faq#faq-When-does-HTTPS...?


Just in case, note that it is not against "TLS Everywhere", just against the separation of http: and https: schemes in the URI. TBL essentially argues that http: should upgrade to TLS without any change in the URI (I think that this also implies that, when TLS deployment is finally universal, http: should be equivalent to today's https:).

If you set HTTP Strict Transport Security (which any site that believes it will continue to competently run SSL can and should do), it will implicitly upgrade all http:// URLs to https://, accomplishing the goal requested in this article without the security risks of optional/opportunistic encryption.
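
A simplified sketch of the client side of this mechanism: remember the policy from a `Strict-Transport-Security` header, then rewrite later `http://` URLs for that host. The `HstsStore` class and its methods are hypothetical names, and the sketch skips details a real browser handles (port rewriting, only honouring the header on valid HTTPS responses, persistence).

```python
import time
from urllib.parse import urlsplit


def parse_hsts(header_value: str) -> dict:
    """Parse a Strict-Transport-Security value into a small policy dict."""
    policy = {"max_age": 0, "include_subdomains": False}
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        name = name.lower()
        if name == "max-age":
            policy["max_age"] = int(value.strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
    return policy


class HstsStore:
    def __init__(self):
        self._policies = {}  # host -> (expires_at, include_subdomains)

    def note(self, host, header_value, now=None):
        """Record a policy seen on a response from `host`."""
        now = time.time() if now is None else now
        policy = parse_hsts(header_value)
        self._policies[host] = (now + policy["max_age"], policy["include_subdomains"])

    def upgrade(self, url, now=None):
        """Rewrite http:// to https:// if a live policy covers the host."""
        now = time.time() if now is None else now
        parts = urlsplit(url)
        if parts.scheme != "http" or parts.hostname is None:
            return url
        labels = parts.hostname.split(".")
        # Check the host itself first, then each parent domain.
        for i in range(len(labels)):
            entry = self._policies.get(".".join(labels[i:]))
            if entry is None:
                continue
            expires_at, include_subdomains = entry
            if now < expires_at and (i == 0 or include_subdomains):
                return parts._replace(scheme="https").geturl()
        return url
```

Before `note` has been called for a host, `upgrade` returns the URL unchanged, so the very first request to a site is not protected by this mechanism alone.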

Only after you visit a domain.

Or if you preload with a browser vendor, which in all fairness doesn't scale in its current incarnation.

It's scaling well enough so far. A domain can be submitted at https://hstspreload.appspot.com/ and doesn't take long to show up in Chromium and then other browsers.

[a] If the ownership for the site's domain changes, how does the new owner undo the previous owner's preload?

[b] Is the only way to obtain the full preload list to extract it out of Chromium or Firefox source code? [1][2]

[1] https://chromium.googlesource.com/chromium/src/+/master/net/...

[2] https://dxr.mozilla.org/comm-central/source/mozilla/security...

The page linked above has instructions for removal.

Yes, the submission site talks about removal, and how it takes until the next major release of the browser for the changes to appear in the codebase, and every user will have to upgrade to pick up the changes.

If it were a HSTS preload registry service and API, that'd be different, but it's not.

No it doesn't. You cannot preload every site on the Internet; I even doubt you could load 5% of all domain names.

With this approach we can have it for big sites, or important sites but we want it everywhere.

As of April, Let's Encrypt had issued 2 million certificates. Assuming that each domain is about 10 bytes long, that there's no compression (not even includeSubdomains=true, which is a conservative assumption because LE doesn't do wildcards), and that each domain is active, on the public internet, and wants HSTS, that's 20 megabytes of data. That's a lot of data, yes, but it's smaller than the Chrome or Firefox installer. Even if you account for other CAs, that's still the same order of magnitude as the browser itself. So it's not unreasonable for this data to be delivered as part of the initial browser download, and for updates to be delivered as part of browser automatic updates.
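
The estimate is easy to check. The two inputs below are exactly the figures assumed in this comment (2 million certificates, about 10 bytes per domain, no compression); nothing else is measured.

```python
certificates = 2_000_000   # Let's Encrypt figure quoted above
bytes_per_domain = 10      # the comment's assumed average name length

total_bytes = certificates * bytes_per_domain
print(total_bytes)               # 20000000
print(total_bytes / 1_000_000)   # 20.0 (decimal megabytes)
```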

There's no sense in which you "cannot" preload every site with an SSL certificate. You absolutely can, and it would work totally fine. We can talk about whether there are better designs, but preloading everything is definitely a realistic option.

It's also an option that works today. If we figure out a better solution in the future (DNSSEC? Bloom filters and OCSP responders?), we can seamlessly transition the current preload list to it, but we're also getting the security advantage of the preload list immediately.

Indeed. Perhaps a better title for this would be "'https://' considered harmful".

We changed the title to that of the HTML document, which is clearer.

This is from early 2015. That should be noted in the title, since it is more than 18 months old.
