If their plan works out as they've laid out, and as certain recent commits in the boulder source code would suggest...
It would seem they intend to:
1. Allow renewals of already-issued certificates by the same account holder to revalidate via TLS-SNI-01 for some limited time period.
2. Whitelist certain shared infrastructure providers who have lots of LE certs in force and who have demonstrated that they are not vulnerable. I'm betting especially for those that manage the whole certificate retrieval and deployment process. It's not clear if this is temporary but longer than #1 or "temporary" but very long term or "temporary" until we come up with another inband process.
3. Otherwise TLS-SNI-01 is gone.
Meanwhile, they'll work with the ACME WG to see if they can't figure out a better TLS-SNI method which would not be vulnerable. That looks less and less workable absent some special TLS extensions or ALPN and server support for those.
For those who have options, it's worth pointing out that the purely DNS based validation methods are literally closest to the facts that the CA's validation process wishes to prove. Domain Control validated means that you're showing effective control of the domain. Nothing says I control the domain like the ability to add and remove data from the authoritative DNS servers for the domain label (and children thereof) in question.
Today's update from Let's Encrypt did allude, under "ACME Protocol Updates", to the possibility of further work on the protocol to attempt to remediate the risks.
Probably they don't want to get specific because even if a concrete proposal were ready to begin coding today, it would take time to build the reference client, time to build server infrastructure, test, etc.
Then before that work would have any benefit to the various websites needing validation, it would require server software upgrades to facilitate those extensions or ALPN negotiations.
My utter speculation is that they're thinking it would likely take long enough that everyone will have to be off TLS-SNI-01 before its replacement becomes available.
It seems they are predicting TLS-SNI-0x going away for a lengthy period of time.
That said, the ALPN proposal is a start.
Rather than serving as a mere marker, though, it should incorporate features that securely indicate which domain label is being validated, so that the validator and the endpoint being validated agree on it.
I am hopeful such a scheme may be useful for future deployments down the road. I think it is likely that, before infrastructure using a new mechanism of that kind is in place, current needs will have to be met with one of the other mechanisms.
The speed and resources with which Let's Encrypt is working on solutions to migrate users to non-TLS-SNI validations might well be a signal.
Like you suggest, it's important to be explicit, and if they wish to lean on yet another protocol, now is an opportunity to enumerate the exact behaviors they want. It's good that this work has begun, but I hope it won't be rushed.
I hope that wasn't against protocol.
It will do no one any favors to rush this. It's broken badly enough that it needs a fresh cycle of iteration and testing.
The Alexa scan idea is weak. Of course no one advertises "acme" as an ALPN name now. There's no incentive to today. If the proposal Mr. Rudenberg made were accepted, there'd be plenty of incentive for these same broken shared hosts to advertise an ALPN identifier of "acme". It just wouldn't be coupled with any incentive to fix the other issues.
On that topic, the proposed edit does not even attempt to define what circumstances/facts/assertions a presenter of the ALPN "acme" is hypothetically promising. No attempt is even made to extract a gentleman's agreement that a shared host vulnerable to the attacks which have been described would not advertise this ALPN. Of course, there's no real point to trying to extract that. The shared host would have no incentive to hold up their end of that promise.
But if this proposal as suggested moved forward without further revision, there would be incentive to make your TLS endpoint with "acme" tomorrow, whether or not your infrastructure is secure against the actual attack vectors that have effectively disqualified TLS-SNI-01 and TLS-SNI-02 at the present.
Let's make TLS-ALPN-1, have the protocol as "acme-verify", and respond with a simple custom protocol - ignoring HTTP.
A link to that discussion, for reference: https://groups.google.com/d/msg/mozilla.dev.security.policy/...
Unfortunately this makes life difficult for us.
We run a whitelabelled platform with tens of thousands of users - it's hard enough to get many people to understand setting a CNAME record (vs just setting an A record).
Requiring them to update a TXT record once would be a big enough challenge, but doing it every 30-90 days is never going to happen. The smaller customers would probably be happy with us hosting their DNS, but we're not going to do that - the larger ones wouldn't.
TLS-SNI being gone and DNS being unworkable means we're left with HTTP only.
You just allow .well-known/* to be passed on to reflect the challenge responses you've generated for the client, while 301 redirecting everything else to their https:// site.
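A rough sketch of that port-80 behaviour, in case it helps. This is only an illustration: the `route` function and the `challenges` dictionary are names I made up, and I've narrowed the pass-through to the `/.well-known/acme-challenge/` path that http-01 uses rather than all of `.well-known/*`.

```python
# Hypothetical port-80 handler: answer ACME http-01 requests, redirect the rest.
ACME_PREFIX = "/.well-known/acme-challenge/"


def route(host, path, challenges):
    """Return (status, headers, body) for a plain-HTTP request.

    challenges maps (host, token) -> key authorization string that the
    platform generated on the customer's behalf (an assumed data layout).
    """
    if path.startswith(ACME_PREFIX):
        token = path[len(ACME_PREFIX):]
        key_auth = challenges.get((host, token))
        if key_auth is not None:
            return 200, {}, key_auth
        return 404, {}, ""
    # Everything else goes to the customer's https:// site.
    return 301, {"Location": "https://" + host + path}, ""
```

The point is that the customer never has to touch anything: the platform already terminates port 80 for the hostname, so it can answer the challenge itself.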
I'm confused how that would be harder for a web host at that scale?
EDIT: I get people trying to run a server off their cable modem / rtr public IP, and 80 might be taken by something other than the target the port forward for 443 is going to -- and that's a problem for those use cases -- but that kind of concern wouldn't exist in a significant hosting infrastructure.
After that it could actually just put the cert into that store again and reload all public-facing webservers.
But who's to say that another similar bug won't be found in common shared-hosting platforms that forces LE to turn that challenge off too?
Because almost all of the CAs utilize a web control mechanism, with many of them probably having processes not as rigorous as HTTP-01, it is likely that there would be significant backlash and a lengthier migration away from the method for that case.
That said, anyone who can would be well advised to figure out how their DNS based mechanism would work if it were ever needed.
As I and others have pointed out, there are clever and fully supported hacks for validating dns-01 without dynamic control of the full domain zone. (CNAME to another zone for the _acme-challenge labels, NS delegation to refer each _acme-challenge label as an independent zone at a different NS, etc.)
The usability of the HTTP and TLS challenges is still better in most cases, but that would give you an alternative in scenarios where neither is an option for some reason.
That aside, Let's Encrypt had an exemplary response to this issue throughout, and has made the right call here. A best-effort whitelist will enable a smoother transition for those on some known-good hosts, while mitigating this vulnerability and keeping their cert ecosystem uncompromised.
The work will now begin to add support to the other validation methods in ACME software that's lacking them, and to engage with the other custodians of the ACME protocol to rectify this particular flaw in design.
While the http-01 challenge is the recommended migration path, and likely the easiest to automate with greenfield software, the dns-01 challenge is the one with the fewest intermediate assumptions -- such as the ones made when designing tls-sni-*, which in this case turned out to be faulty -- and represents the one most likely to be futureproof. After all, what better way to prove you own a domain itself than being able to add arbitrary records to it that all nameservers then echo back?
As you point out, domain control validation is best performed by having the application demonstrate control of the domain.
This is because the great majority of DNS hosts do not provide sufficiently granular permissions to only allow changes to _acme-challenge RRs.
e.g. Cloudflare, as far as I can tell, only gives you one API key which grants all access to all zones.
e.g. Most domain registrars who offer DNS hosting who provide an API grant access to all sorts of management functions, not just DNS zone changes.
e.g. Route53 IAM doesn't let you restrict to a single RR; you expose modifications to the entire zone.
I am really not comfortable giving my web application these kinds of powers.
TLS-SNI was useful because it was relatively protocol agnostic, so some of that flexibility is now gone.
There's a cool solution to this that I learned from someone else on the Let's Encrypt forums (where I often help do support). The Let's Encrypt DNS-01 validator will follow CNAMEs. Therefore, you can make _acme-challenge be a CNAME to an arbitrary text record which can be in another zone (including a zone dedicated for this purpose). For example, you could say
_acme-challenge.example.com. IN CNAME foo.acmevalidation.example.net.
Now an application can just have API keys to update RRs under acmevalidation.example.net, which does not need to be used for any other purpose (or even necessarily hosted on the same infrastructure as example.com's own DNS). The CNAME can be created manually at the outset and does not need to be updated for renewals.
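To make the delegation concrete, here is a minimal sketch of the two things a client would need to work out: where the TXT record actually has to be written once the CNAME is followed, and what value goes in it. The function names and the `cnames` dictionary are made up for illustration; the value computation (unpadded base64url of the SHA-256 of the key authorization) is the dns-01 rule as I understand it from the ACME spec.

```python
import base64
import hashlib


def dns01_txt_value(key_authorization):
    """TXT value for dns-01: base64url(SHA-256(key authorization)), no padding."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def txt_record_owner(domain, cnames):
    """Follow static CNAMEs from _acme-challenge.<domain> to the name to update.

    cnames is a hypothetical mapping of fully qualified owner name -> target.
    """
    name = "_acme-challenge." + domain.rstrip(".") + "."
    seen = set()
    while name in cnames and name not in seen:  # guard against CNAME loops
        seen.add(name)
        name = cnames[name]
    return name


cnames = {"_acme-challenge.example.com.": "foo.acmevalidation.example.net."}
print(txt_record_owner("example.com", cnames))
```

With the CNAME from the example above, the only record the application ever writes is the TXT at foo.acmevalidation.example.net., so its API key never needs access to the example.com zone.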
This has been possible for a long time, but if it becomes more widely known and more widely supported by client applications and DNS providers, it should make use of DNS-01 authentication much more practical, and safer, for a pretty wide range of people.
Like so: Assume that Actor 1 has example.com and example.net. You then add this to the example.com and example.net zones, respectively:
_acme-challenge.example.com. CNAME example.com._.actor1._.your-special-domain.com.
_acme-challenge.example.net. CNAME example.net._.actor1._.your-special-domain.com.
The ISC BIND DNS server allows cryptographic authentication for updates with ACLs that let you get as granular as only being able to add/delete TXT records within this branch of zone X.
In the alternative, you can place static CNAME records in your real DNS zone that would refer out the _acme-challenge queries to another zone entirely. Run that zone with entirely different credentials.
That is especially useful for webserver plugins.
Also this is much better when there are security policies that (for maybe misguided but well-intentioned reasons) completely block or redirect all HTTP traffic.
What would be insecure about an https-01 challenge that essentially works identically to the http-01 challenge but allows any certificate?
There's a specific reason http-01 is HTTP-only, and it's actually quite similar to the tls-sni-01 situation. In many of the major web servers, including apache and nginx, the web server will use the first HTTPS vhost in its configuration for any unmatched domains, unless you explicitly specify a default vhost. In practice that means an attacker on the same hosting environment used by the victim could get themselves in a position where they control this default vhost and obtain a certificate for their domain. The vhost order is often based on the alphabetic order of the domain, so that's fairly easy to pull off. http-01's predecessor did allow HTTPS, but this attack came up during the IETF ACME standardization process and, IIRC, was fixed before Let's Encrypt entered public beta.
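A toy model of that fallback, for anyone who wants to see it play out. Nothing here is real Apache or nginx code; `pick_vhost` and `load_config` are invented names, and the alphabetical ordering is the assumption described above about how some hosts generate their configuration.

```python
def load_config(https_sites):
    """Hypothetical shared host that writes HTTPS vhosts sorted by domain name.

    https_sites maps domain -> owner, and contains only sites with HTTPS enabled.
    """
    return sorted(https_sites.items())


def pick_vhost(requested_name, vhosts):
    """Return the (server_name, owner) that ends up serving the request."""
    for server_name, owner in vhosts:
        if server_name == requested_name:
            return server_name, owner
    # No explicit default configured: the first vhost in the file wins.
    return vhosts[0]


# victim.example is hosted here over HTTP only, so it has no HTTPS vhost.
vhosts = load_config({"shop.example": "carol", "aaa-attacker.example": "mallory"})
print(pick_vhost("victim.example", vhosts))
```

An HTTPS validation request for victim.example lands on mallory's vhost, which is why a challenge that starts over HTTPS can't be trusted on that kind of host.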
http-01 does permit the CA server to follow redirects to HTTPS, including to ones with self-signed, expired or otherwise invalid certificates, so common setups with HSTS and redirects to HTTPS are fine; you'll only be in trouble if you can't use HTTP on port 80 at all.
I realize in current real-world setups you would normally start with a HTTP-only config and only later or maybe never configure HTTPS for that domain, or configure both protocols simultaneously. And almost never the opposite where you configure HTTPS only and someone else would be able to grab your HTTP traffic. So that's still a good argument to do HTTP only, thank you for explaining it.
I did not know http-01 would follow redirects to HTTPS; that is also really good to know and should work well for some setups.
I've been asked if we'll turn off TLS-SNI in Caddy and the answer is no; as their announcement says, some accounts will still be able to use TLS-SNI for a limited timeframe until it is turned off completely. Caddy won't try the TLS-SNI challenge as long as the ACME server doesn't advertise it in an exchange.
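The general idea, as a sketch only (this is not Caddy's code; `choose_challenge` and the preference order are made up for illustration): a client picks the first challenge type it prefers from whatever the ACME server offers for the authorization, so a type the server stops offering simply never gets tried.

```python
def choose_challenge(offered, preferred=("tls-sni-01", "http-01", "dns-01")):
    """Pick the first preferred challenge type that the server actually offers.

    offered is the list of challenge type strings from the authorization;
    the default preference order is an assumption, not any client's real one.
    """
    for challenge_type in preferred:
        if challenge_type in offered:
            return challenge_type
    return None  # nothing usable was offered


print(choose_challenge(["http-01", "dns-01"]))
```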
I have a few services which were using Go’s acme/autocert package. I now need to update them to the HTTP challenge.
EG: LE sends a challenge of a.b.c.acme.invalid
You must reply with a certificate of d.e.f.acme.invalid
Doesn't that entirely mitigate the shared hosting issue since any shared hosting setup will require SNI to match the certificate name that you reply with?
So attacker requests to validate for a name that you have pointed via DNS to that hosting infrastructure.
The names that you need to respond on and have certs for are then calculated by attacker. Attacker, who is also a customer of same hosting service creates the necessary “sites” and uploads the matching challenge response certs, and successfully receives a cert for your domain.
If I'm able to upload a certificate for a.b.c.acme.invalid, the validation TLS-SNI request for a.b.c.acme.invalid will reply with a certificate for a.b.c.acme.invalid and thus fail.
If I'm able to upload a certificate for d.e.f.acme.invalid, the validation TLS-SNI request for a.b.c.acme.invalid will not match my uploaded certificate and the challenge will thus fail.
I may well be misunderstanding the situation, but, I just don't see how.
You misunderstand how the TLS balancer chooses which certificate to present.
When the TLS connection comes in and presents a SNI name of "a.b.c.acme.invalid", the balancer checks its configuration to see if the host has a "website" called "a.b.c.acme.invalid". It discovers that it does. It looks at what certificate in the database was uploaded for that website configuration.
It doesn't actually check the certificate details at all....
It presents the certificate that was uploaded by the "owner" of the "website" a.b.c.acme.invalid.
And if that name needs to present a certificate that says "d.e.f.acme.invalid" then that is the certificate that the attacker will have uploaded for his "a.b.c.acme.invalid" site.
There are numerous web hosts who would permit this and it would work just like that.
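Here is a toy simulation of that lookup, under the assumptions stated above: the host keys certificates by "website" name, lets any customer create any site name, and never checks that the uploaded certificate matches. `SharedHost` and `validator_accepts` are invented for illustration and do not model any particular provider.

```python
class SharedHost:
    """Hypothetical shared host whose TLS balancer selects certs by site name."""

    def __init__(self):
        self.sites = {}  # site name -> (owner, list of SANs in uploaded cert)

    def create_site(self, owner, site_name, cert_sans):
        # Assumed weakness: no ownership check on the site name and no check
        # that the certificate's names have anything to do with the site.
        self.sites[site_name] = (owner, list(cert_sans))

    def handshake(self, sni_name):
        """Return the SANs of the certificate presented for this SNI name."""
        entry = self.sites.get(sni_name)
        return entry[1] if entry else None


def validator_accepts(host, sni_name, expected_san):
    """The validator connects with sni_name and looks for expected_san."""
    presented = host.handshake(sni_name)
    return presented is not None and expected_san in presented


host = SharedHost()
# The attacker computes both names and uploads the "wrong" cert on purpose.
host.create_site("mallory", "a.b.c.acme.invalid", ["d.e.f.acme.invalid"])
print(validator_accepts(host, "a.b.c.acme.invalid", "d.e.f.acme.invalid"))
```

Requiring the reply certificate to carry a different name from the SNI name doesn't help here, because the attacker controls both the site name and the certificate contents.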
The mechanism you're describing is similar to the changes in TLS-SNI-02. It has already been determined that TLS-SNI-02 is deficient as it is vulnerable to the attack I've parroted here.
The trouble is that the people who wrote both TLS-SNI-01 and TLS-SNI-02 apparently had little knowledge of the vast breadth of behaviors exhibited by a plethora of shared web hosts. The assumptions they made, upon which all the current TLS-SNI-0x protocols rely to provide security, simply are not upheld and honored by the real world marketplace of web hosts.
That seems like an absurdly broken implementation. I get why Let's Encrypt feels the need to disable it, but I really hope they come up with some alternate solution.
I really like the tls-sni authentication method as it keeps authentication entirely inband to the final SSL goal. With HTTP you need to listen on/control 80+443, with DNS you need to control DNS. With tls-sni you need only control port 443. I'm a huge fan of x/crypto/acme/autocert.
Custom ALPN-signaled protocol should be doable and should solve all of this, I hope they do it.
It imagines there was a whole different set of operating circumstances at shared web hosts than the reality exhibits.
I actually had not read the protocol specification for that challenge as I utilize http-01 and dns-01 on all my various systems. Then, when the early report without details was released, I read the protocol and realized almost immediately that there were several circumstances in the field which could yield actual vulnerability.
They also made the mistake of failing to align to a promise which the other mechanisms do make: the other mechanisms tie the validation directly to the target domain label being authorized or a known child thereof. TLS-SNI-01 and TLS-SNI-02 don't do that. And they knew that, because they wanted to be able to perform a TLS-SNI validation without having to change server software. I believe that was a bad decision.
The proposed TLS-SNI-03 ALPN "acme" extension that Mr. Rudenberg has put forth will not be resilient to these attacks, ultimately. I think they should do a real ALPN protocol and do the validation through that. But let's assume time to market for that would exceed a year. In the mean time, people reliant on TLS-SNI-01 are likely going to need to do something else.
In short, a mechanism which would achieve much the goal of getting validation off of a single TLS port running the right software can happen, but I believe it should borrow pretty much nothing from the current TLS-SNI-0x proposals.
It should be a whole new real ALPN protocol.
> But let's assume time to market for that would exceed a year.
I'm sure it would take a year or more for good packages for most languages to exist, but, it doesn't seem so complex that it should take a year for it to exist and for it to be usable if you're sufficiently motivated (EG: you're willing to write your own software).
> In the mean time, people reliant on TLS-SNI-01 are likely going to need to do something else.
I just moved to using a commercial wildcard certificate. I didn't particularly want to, but, for this and a few other reasons, LetsEncrypt became non-viable for me.
(Which is a thing a shared hosting provider should support - as a user I only want to provide a single cert for example.com and www.example.com, and if I'm paying for a cert and hosting multiple websites, I don't want to pay for two different certs for example.com and example.org if my CA will let me get those both on the same cert.)