Hacker News
Update Regarding ACME TLS-SNI and Shared Hosting Infrastructure (letsencrypt.org)
116 points by okket 8 months ago | 42 comments



It looks like Let's Encrypt has come up with the best possible plan for balancing mitigation of the security risks against compatibility for users.

If their plan works out as they've laid it out, and as certain recent commits in the Boulder source code suggest, it would seem they intend to:

1. Allow the same account holder to revalidate via TLS-SNI-01 when renewing already-issued certificates, for a limited time period.

2. Whitelist certain shared infrastructure providers who have lots of LE certs in force and who have demonstrated that they are not vulnerable. I'm betting this applies especially to those that manage the whole certificate retrieval and deployment process. It's not clear whether this is temporary but longer than #1, "temporary" but very long term, or "temporary" until they come up with another in-band process.

3. Otherwise TLS-SNI-01 is gone.

Meanwhile, they'll work with the ACME WG to see if they can't figure out a better TLS-SNI method which would not be vulnerable. That looks less and less workable absent some special TLS extensions or ALPN and server support for those.

For those who have options, it's worth pointing out that the purely DNS-based validation methods are the closest to the fact that the CA's validation process wants to prove. Domain Control Validation means that you're showing effective control of the domain. Nothing says I control the domain like the ability to add and remove data from the authoritative DNS servers for the domain label (and children thereof) in question.
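For reference, a minimal sketch of what a dns-01 responder publishes, as I understand the ACME spec: a TXT record at `_acme-challenge.<domain>` whose value is the unpadded base64url SHA-256 digest of the key authorization (token + "." + account key thumbprint). The token and thumbprint below are placeholders, and computing the JWK thumbprint itself is left out.

```python
import base64
import hashlib
from typing import Tuple


def b64url(data: bytes) -> str:
    """Unpadded base64url, the encoding ACME uses throughout."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def key_authorization(token: str, thumbprint: str) -> str:
    """token + "." + base64url(JWK thumbprint of the account key)."""
    return f"{token}.{thumbprint}"


def dns01_record(domain: str, token: str, thumbprint: str) -> Tuple[str, str]:
    """Return (owner name, TXT value) for a dns-01 challenge."""
    name = f"_acme-challenge.{domain.rstrip('.')}."
    digest = hashlib.sha256(key_authorization(token, thumbprint).encode("ascii")).digest()
    return name, b64url(digest)


# Placeholder values: a real client gets the token from the CA and derives
# the thumbprint from its own account key.
name, value = dns01_record("example.com", "example-token", "example-thumbprint")
```

Only whoever can write that one TXT record can complete the challenge, which is the "closest to the facts" property described above.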


I think a similar process, but built off of ALPN rather than SNI, seems like the clear solution. Nobody at LE seems to have spoken about this or acknowledged it as a possible option yet though.


Ryan Sleevi of Google has discussed (specifically) such a possibility on the mozilla.dev.security.policy (m.d.s.p) group.

Today's update from Let's Encrypt did allude, under "ACME Protocol Updates", to further work on the protocol to try to remediate the risks.

Probably they don't want to get specific because even if a concrete proposal were ready to begin coding today, it would take time to build the reference client, time to build server infrastructure, test, etc.

And before that work could benefit the various websites needing validation, server software would need upgrades to support those extensions or ALPN negotiation.

My utter speculation is that they're thinking it would likely take long enough that everyone will have to be off TLS-SNI-01 before its replacement becomes available.


Work has already started on a new challenge using ALPN: https://mailarchive.ietf.org/arch/msg/acme/mrKOeRK1K6H_42Hxb...


Let's Encrypt has also just published a new post about the new nginx and Apache plugins for certbot they've been working tirelessly on, which use HTTP-01 validation:

https://community.letsencrypt.org/t/help-test-certbot-apache...

It seems they are predicting TLS-SNI-0x going away for a lengthy period of time.

That said, the ALPN proposal is a start.

Rather than serving as a mere marker, though, it should securely indicate which domain label is being validated, so that the validator and the endpoint being validated agree on it.
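To make the difference concrete, here is a hypothetical validator-side check contrasting a bare ALPN marker with one bound to the domain. The binding shown (negotiated ALPN `acme-tls/1`, a cert whose only SAN is the domain, and a cert extension carrying the SHA-256 of the key authorization) is roughly what was later standardized as tls-alpn-01 in RFC 8737. The `Observed` record is an invented stand-in for a parsed handshake; no real TLS happens here.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import List, Optional

ACME_ALPN = "acme-tls/1"  # ALPN protocol ID that RFC 8737 uses for tls-alpn-01


@dataclass
class Observed:
    """Hypothetical summary of one validation handshake (no real TLS here)."""
    negotiated_alpn: Optional[str]
    san_dns_names: List[str] = field(default_factory=list)
    acme_identifier: Optional[bytes] = None  # digest carried in the cert extension


def marker_only(obs: Observed) -> bool:
    # A bare "echo the ALPN name" signal: any endpoint that advertises the
    # protocol passes, no matter which customer supplied the certificate.
    return obs.negotiated_alpn == ACME_ALPN


def bound_to_domain(obs: Observed, domain: str, key_auth: str) -> bool:
    # Require the ALPN name AND a cert naming exactly the domain being
    # validated AND a digest only the requesting account can compute.
    expected = hashlib.sha256(key_auth.encode("ascii")).digest()
    return (
        obs.negotiated_alpn == ACME_ALPN
        and obs.san_dns_names == [domain]
        and obs.acme_identifier is not None
        and hmac.compare_digest(obs.acme_identifier, expected)
    )
```

A vulnerable shared host that merely starts advertising the protocol name passes `marker_only` but not `bound_to_domain`.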

I hope such a scheme proves useful for future deployments down the road. Until infrastructure using a new mechanism of that kind is in place, though, current needs will likely have to be met with one of the other mechanisms.

The speed and resources with which Let's Encrypt is working on migrating users to non-TLS-SNI validation methods may well be a signal.


The draft work, which proposes what amounts to an ALPN protocol-name echo as an implicit signal of some as-yet-unspecified compliance with ACME workarounds, is a mistake. Even the suggestion to "scan Alexa's top sites" to see whether accidentally clashing behavior is observed in the wild is naive at best and mind-numbingly misguided at worst.

Like you suggest, it's important to be explicit, and if they wish to lean on yet another protocol, now is an opportunity to enumerate the exact behaviors they want. It's good that this work has begun, but I hope it won't be rushed.


Agree 100%. I actually just joined the ACME mailing list to comment along those lines.

I hope that wasn't against protocol.

Rushing this will do no one any favors. It's broken badly enough that it needs a fresh cycle of iteration and testing.

The Alexa scan idea is weak. Of course no one advertises "acme" as an ALPN name now. There's no incentive to today. If the proposal Mr. Rudenberg made were accepted, there'd be plenty of incentive for these same broken shared hosts to advertise an ALPN identifier of "acme". It just wouldn't be coupled with any incentive to fix the other issues.

On that topic, the proposed edit does not even attempt to define what circumstances/facts/assertions a presenter of the ALPN "acme" is hypothetically promising. No attempt is even made to extract a gentleman's agreement that a shared host vulnerable to the attacks which have been described would not advertise this ALPN. Of course, there's no real point to trying to extract that. The shared host would have no incentive to hold up their end of that promise.

But if this proposal moved forward as suggested, without further revision, there would be an incentive to make your TLS endpoint advertise "acme" tomorrow, whether or not your infrastructure is secure against the actual attack vectors that have effectively disqualified TLS-SNI-01 and TLS-SNI-02 at present.


Hmm, I really don't think the best option is a TLS-SNI-03 that STILL works on the basis of presenting the wrong Host header/SNI hostname.

Let's make TLS-ALPN-1, have the protocol as "acme-verify", and respond with a simple custom protocol - ignoring HTTP.


Ah thank you - I have found that now.

A link to that discussion, for reference: https://groups.google.com/d/msg/mozilla.dev.security.policy/...


> Nothing says I control the domain like the ability to add and remove data from the authoritative DNS servers for the domain label (and children thereof) in question.

Unfortunately this makes life difficult for us.

We run a whitelabelled platform with tens of thousands of users - it's hard enough to get many people to understand setting a CNAME record (vs just setting an A record). Requiring them to update a TXT record once would be a big enough challenge, but doing it every 30-90 days is never going to happen. The smaller customers would probably be happy with us hosting their DNS, but we're not going to do that - the larger ones wouldn't.

TLS-SNI being gone and DNS being unworkable means we're left with HTTP only.


But if you're a web host with 10k+ users, what's the problem with the HTTP-01 challenge?

You just serve the challenge responses you've generated for the client under /.well-known/acme-challenge/, and 301-redirect everything else to their https:// site.
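A minimal sketch of that routing, assuming a hypothetical in-memory store keyed by (host, token) that the platform's ACME client fills before it asks the CA to validate. `route` and the store layout are illustrative only, not any web server's API; the path prefix is the one http-01 actually uses.

```python
from typing import Dict, Tuple

CHALLENGE_PREFIX = "/.well-known/acme-challenge/"

# (host, token) -> key authorization; hypothetical store filled by the
# platform's ACME client before validation is triggered.
ChallengeStore = Dict[Tuple[str, str], str]


def route(host: str, path: str, store: ChallengeStore) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for a plain-HTTP request on port 80."""
    host = host.lower().split(":")[0]  # drop any :port from the Host header
    if path.startswith(CHALLENGE_PREFIX):
        token = path[len(CHALLENGE_PREFIX):]
        # Scope by host so a response stored for one customer's domain is
        # never served when the CA asks about another customer's domain.
        body = store.get((host, token))
        if body is None:
            return 404, {}, ""
        return 200, {"Content-Type": "text/plain"}, body
    # Everything else goes to the customer's HTTPS site.
    return 301, {"Location": f"https://{host}{path}"}, ""
```

Keying by host is a defensive design choice for a multi-tenant platform; a single-tenant server could key by token alone.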

I'm confused how that would be harder for a web host at that scale?

EDIT: I get that people running a server off their cable modem/router's public IP may have port 80 taken by something other than the machine that port 443 is forwarded to, and that's a problem for those use cases, but that kind of concern wouldn't exist in a significant hosting infrastructure.


I would say the .well-known/ approach is actually easier. One could just create an nginx (or HAProxy) backend that loads the challenge data needed to issue the cert from a trusted store. (Presumably no user will be using the .well-known endpoint themselves, hopefully.)

After that, it could just put the issued cert back into that store and reload all public-facing web servers.


> what's the problem with the HTTP-01 challenge?

Nothing, yet.

But who's to say that another similar bug won't be found in common shared-hosting platforms that forces LE to turn that challenge off too?


True.

Because almost all CAs use a web-based control mechanism, many of them probably with processes less rigorous than HTTP-01, turning that method off would likely provoke significant backlash and require a lengthier migration.

That said, anyone who can would be well advised to figure out how their DNS-based mechanism would work if it were ever needed.

As I and others have pointed out, there are clever and fully supported hacks for validating dns-01 without dynamic control of the full domain zone. (CNAME to another zone for the _acme-challenge labels, NS delegation to refer each _acme-challenge label as an independent zone at a different NS, etc.)


An option that's often overlooked is to use a CNAME record for the _acme-challenge label pointing to a domain under your control. acme-dns[1] explains this approach in detail.

The usability of the HTTP and TLS challenges is still better in most cases, but that would give you an alternative in scenarios where neither is an option for some reason.

[1]: https://github.com/joohoi/acme-dns


Agreed, looks like a good plan. It must have been a busy week for them: first Meltdown/Spectre to consider, and now this to deal with. I am glad to see the update and the total transparency each step of the way!


The tls-sni challenges rest on the assumption that a hosting provider will somehow ensure that a self-signed cert uploaded by the user contains "truthful" information, even though the second half of the cert is blatantly fake and is being abused to carry data. This is a bold assumption to make, considering the entire point of CAs is to say that the information being presented in a cert has been vetted, so why presume that anyone will vet any claims in a self-signed cert?

That aside, Let's Encrypt had an exemplary response to this issue throughout, and has made the right call here. A best-effort whitelist will enable a smoother transition for those on some known-good hosts, while mitigating this vulnerability and keeping their cert ecosystem uncompromised.

Work will now begin on adding support for the other validation methods to ACME software that lacks them, and on engaging with the other custodians of the ACME protocol to rectify this particular design flaw.

While the http-01 challenge is the recommended migration path, and likely the easiest to automate with greenfield software, the dns-01 challenge rests on the fewest intermediate assumptions (such as the ones made when designing tls-sni-*, which in this case turned out to be faulty) and is the one most likely to be future-proof. After all, what better way to prove you own a domain than being able to add arbitrary records to it that all its nameservers then echo back?


I completely agree on the dns-01 challenge. Those who are migrating off of TLS-SNI-01 and have a capability to standardize on dns-01 will be better off in the long run.

As you point out, domain control validation is best performed by having the application demonstrate control of the domain.


The problem with dns-01 is that, much of the time, it requires granting far too much privilege to the system that requests the certificate.

This is because the great majority of DNS hosts do not provide sufficiently granular permissions to only allow changes to _acme-challenge RRs.

e.g. Cloudflare, as far as I can tell, only gives you one API key, which grants full access to all zones.

e.g. Most domain registrars that offer DNS hosting with an API grant access to all sorts of management functions, not just DNS zone changes.

e.g. Route53 IAM doesn't let you restrict to a single RR, you expose modifications to the entire zone.

I am really not comfortable giving my web application these kinds of powers.

TLS-SNI was useful because it was relatively protocol agnostic, so some of that flexibility is now gone.


> The problem with dns-01 is that, much of the time, it requires granting far too much privilege to the system that requests the certificate. This is because the great majority of DNS hosts do not provide sufficiently granular permissions to only allow changes to _acme-challenge RRs.

There's a cool solution to this that I learned from someone else on the Let's Encrypt forums (where I often help do support). The Let's Encrypt DNS-01 validator will follow CNAMEs. Therefore, you can make _acme-challenge be a CNAME to an arbitrary text record which can be in another zone (including a zone dedicated for this purpose). For example, you could say

_acme-challenge.example.com. IN CNAME foo.acmevalidation.example.net.

Now an application can just have API keys to update RRs under acmevalidation.example.net, which does not need to be used for any other purpose (or even necessarily hosted on the same infrastructure as example.com's own DNS). The CNAME can be created manually at the outset and does not need to be updated for renewals.
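To make the CNAME trick concrete, here is a toy model of what the validator effectively does: look up the TXT record at `_acme-challenge.example.com.`, find a CNAME, and read the TXT record at its target. The dictionary is a purely illustrative stand-in for public DNS; a real validator relies on a recursive resolver to chase the CNAME.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory view of public DNS: (name, type) -> rdata list.
Records = Dict[Tuple[str, str], List[str]]


def resolve_txt(records: Records, name: str, max_hops: int = 8) -> List[str]:
    """Follow CNAMEs from `name` and return the TXT strings at the end."""
    for _ in range(max_hops):
        cname = records.get((name, "CNAME"))
        if cname:
            name = cname[0]  # a name that has a CNAME carries no other data
            continue
        return records.get((name, "TXT"), [])
    raise RuntimeError("CNAME chain too long or looping")


records: Records = {
    # Created once, by hand, in the example.com zone.
    ("_acme-challenge.example.com.", "CNAME"): ["foo.acmevalidation.example.net."],
    # Rewritten at each renewal by a client whose API key only covers
    # acmevalidation.example.net.
    ("foo.acmevalidation.example.net.", "TXT"): ["<digest of key authorization>"],
}
```

The credentials that rewrite the TXT record never touch the example.com zone itself.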

This has been possible for a long time, but if it becomes more widely known and more widely supported by client applications and DNS providers, it should make use of DNS-01 authentication much more practical, and safer, for a pretty wide range of people.


Another similar option would presumably be to delegate _acme-challenge.example.com to different nameservers with an NS record, then give your application the required privileges to control solely that nameserver.


Yes. Or even to the same name server, breaking out each whole label starting with _acme-challenge as its own independent zone, with its own access policies.


Yes, but there’s no need to have separate zones; you can grant update access to subdomains and have the CNAMES point into one zone with a subdomain dedicated to each separate actor which needs access.

Like so: Assume that Actor 1 has example.com and example.net. You then add this to the example.com and example.net zones, respectively:

  _acme-challenge.example.com.  CNAME  example.com._.actor1._.your-special-domain.com.

  _acme-challenge.example.net.  CNAME  example.net._.actor1._.your-special-domain.com.
Then you give update access to Actor 1, but not to the whole “your-special-domain.com” zone, but to the “_.actor1._.your-special-domain.com” subdomain. The ACME system would then be configured to send updates to the correct subdomains of that subdomain. Or “your-special-domain.com” could even be a subdomain itself of another domain; it doesn’t matter.
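A sketch of that layout, using the comment's placeholders (`your-special-domain.com.`, `actor1`). `static_cname` and `may_update` are hypothetical helpers, and the label ACME actually queries is `_acme-challenge`, with a hyphen. `may_update` expresses the kind of subtree rule you would put in the DNS server's update ACL.

```python
from typing import Tuple

BASE = "your-special-domain.com."


def actor_subtree(actor: str) -> str:
    """The only part of BASE that `actor` gets update rights to."""
    return f"_.{actor}._.{BASE}"


def static_cname(domain: str, actor: str) -> Tuple[str, str]:
    """(owner, target) of the one-time CNAME placed in the customer's zone."""
    d = domain.rstrip(".")
    return f"_acme-challenge.{d}.", f"{d}.{actor_subtree(actor)}"


def may_update(actor: str, name: str) -> bool:
    """Allow updates only at or below the actor's own subtree."""
    name = name.lower()
    if not name.endswith("."):
        name += "."
    subtree = actor_subtree(actor)
    # The leading "." enforces a label boundary, so "x_.actor1._..." does
    # not count as being inside "_.actor1._...".
    return name == subtree or name.endswith("." + subtree)
```

The ACME client for Actor 1 writes TXT records at the CNAME targets, and the update ACL rejects anything outside its own subtree.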


The DNS providers need to up their API game.

The ISC BIND DNS server allows cryptographic authentication for updates with ACLs that let you get as granular as only being able to add/delete TXT records within this branch of zone X.

In the alternative, you can place static CNAME records in your real DNS zone that refer the _acme-challenge queries out to another zone entirely. Run that zone with entirely different credentials.


You are exactly correct. It's honestly pathetic that we've let cloud DNS providers (not to mention most alternative resolvers) get away with providing such inadequate interfaces compared to BIND. It's not actually difficult to admin, it's incredibly capable, and it fucking accepts AXFR/IXFR. It's insane to me that anyone puts up with such standards-hostile software.


From my point of view the big advantage of TLS-SNI is that it uses the same protocol and port as 90%+ of certificate users want to use with the issued certificate: HTTPS.

That is especially useful for webserver plugins. Also this is much better when there are security policies that (for maybe misguided but well-intentioned reasons) completely block or redirect all HTTP traffic.

What would be insecure about an https-01 challenge that essentially works identically to the http-01 challenge but allows any certificate?


> What would be insecure about an https-01 challenge that essentially works identically to the http-01 challenge but allows any certificate?

There's a specific reason http-01 is HTTP-only, and it's actually quite similar to the tls-sni-01 situation. In many of the major web servers, including Apache and nginx, the web server will use the first HTTPS vhost in its configuration for any unmatched domain unless you explicitly specify a default vhost. In practice, that means an attacker on the same hosting environment as the victim could get into a position where they control this default vhost and obtain a certificate for the victim's domain. The vhost order is often based on the alphabetical order of the domains, so that's fairly easy to pull off. http-01's predecessor did allow HTTPS, but this attack came up during the IETF ACME standardization process and, IIRC, was fixed before Let's Encrypt entered public beta[1].
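A toy model of that fallback, assuming a hypothetical control panel that writes HTTPS vhosts in alphabetical order and sets no explicit default. `owner_for_sni` mimics the first-vhost-wins behavior described above; it is not nginx or Apache code.

```python
from typing import Dict, List, Optional, Tuple


def build_https_config(sites: Dict[str, str]) -> List[Tuple[str, str]]:
    """Hypothetical panel: emits (server_name, owner) vhosts sorted by name."""
    return sorted(sites.items())


def owner_for_sni(config: List[Tuple[str, str]], sni: str,
                  explicit_default: Optional[str] = None) -> str:
    """Return the customer whose vhost would answer for `sni`."""
    for server_name, owner in config:
        if server_name == sni:
            return owner
    if explicit_default is not None:
        return explicit_default
    # No match and no default: the first vhost in the file wins.
    return config[0][1]


# The victim pointed DNS at the host but configured no HTTPS vhost;
# the attacker picked a name that sorts first.
https_sites = {"bob.example": "bob", "aaa-attacker.example": "attacker"}
config = build_https_config(https_sites)
```

With no explicit default, a request for `victim.example` lands on the attacker's vhost; a platform-owned default vhost closes that gap.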

http-01 does permit the CA server to follow redirects to HTTPS, including to endpoints with self-signed, expired or otherwise invalid certificates, so common setups with HSTS and redirects to HTTPS are fine; you'll only be in trouble if you can't serve HTTP on port 80 at all.
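A sketch of that redirect policy as I understand Let's Encrypt documents it for http-01 (redirects only to http: or https: URLs, only to ports 80 or 443, at most 10 deep). Treat those limits as assumptions to check against the current documentation, and `may_follow` as a hypothetical helper rather than Boulder's code.

```python
from urllib.parse import urlsplit

# Limits as I understand Let's Encrypt's http-01 documentation; assumptions.
MAX_REDIRECTS = 10
ALLOWED_PORTS = (80, 443)


def may_follow(location: str, hops_so_far: int) -> bool:
    """Would the validator follow a redirect to `location`?

    When the target is https, certificate problems (self-signed, expired,
    wrong name) are not treated as failures, so an HSTS-style redirect works.
    """
    if hops_so_far >= MAX_REDIRECTS:
        return False
    parts = urlsplit(location)
    if parts.scheme not in ("http", "https"):
        return False
    try:
        port = parts.port or (80 if parts.scheme == "http" else 443)
    except ValueError:  # malformed or out-of-range port in the URL
        return False
    return port in ALLOWED_PORTS
```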

[1]: https://mailarchive.ietf.org/arch/msg/acme/B9vhPSMm9tcNoPrTE...


But that behavior is true and exploitable for HTTP as well, isn't it? It is a risk if there is no specific vhost config for the validated domain, which means a customer pointed the DNS to the shared host without also configuring that host to serve content for his domain from his account.

I realize in current real-world setups you would normally start with an HTTP-only config and only later, or maybe never, configure HTTPS for that domain, or configure both protocols simultaneously. And almost never the opposite, where you configure HTTPS only and someone else would be able to grab your HTTP traffic. So that's still a good argument for HTTP only; thank you for explaining it.

I did not know http-01 would follow redirects to HTTPS; that is also really good to know and should work well for some setups.


Happy to see this. I was very critical of their plan to re-enable TLS-SNI and I'm happy to see they reconsidered. They made the right call here.


For the record, I am pretty sure Caddy will be unaffected by this. Any programs using xenolf/lego as their ACME client should be fine as well, as long as one other validation method is still available. (lego also uniquely supports a wide variety of DNS providers for automated negotiation of the DNS challenge. Caddy supports them too, as long as they are plugged in and configured.)

I've been asked if we'll turn off TLS-SNI in Caddy and the answer is no; as their announcement says, some accounts will still be able to use TLS-SNI for a limited timeframe until it is turned off completely. Caddy won't try the TLS-SNI challenge as long as the ACME server doesn't advertise it in an exchange.
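A sketch of that fallback. `choose_challenge` and the preference order are purely hypothetical, not Caddy's or lego's code. The client only considers challenge types the server offers, so once tls-sni-01 stops being offered it moves on to the next type it can solve.

```python
from typing import Iterable, Optional, Sequence, Set

# Hypothetical preference order; a real client would make this configurable.
PREFERENCE: Sequence[str] = ("http-01", "dns-01", "tls-sni-01")


def choose_challenge(offered: Iterable[str], solvable: Set[str],
                     preference: Sequence[str] = PREFERENCE) -> Optional[str]:
    """Pick the first preferred type the server offered AND we can solve."""
    offered_set = set(offered)
    for ctype in preference:
        if ctype in offered_set and ctype in solvable:
            return ctype
    # Nothing usable: this client cannot complete the authorization.
    return None
```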


One would think that most of the Caddy users who previously used TLS-SNI-01 could also avail themselves of HTTP-01 validation.


Yeah, I imagine so. Many sites still have port 80 open to redirect to 443.


RIP

I have a few services which were using Go’s acme/autocert package. I now need to update them to the HTTP challenge.


xenolf/lego arguably has the widest support in the sense of ACME verification methods, but autocert might get other methods too: https://github.com/golang/go/issues/21890


I don't understand why TLS-SNI can't work fine if you just have the response certificate be entirely distinct from the server name.

EG: LE sends a challenge of a.b.c.acme.invalid

You must reply with a certificate of d.e.f.acme.invalid

Doesn't that entirely mitigate the shared hosting issue since any shared hosting setup will require SNI to match the certificate name that you reply with?


No, that's the problem. A lot of shared hosts will allow any customer to onboard a new website (as long as the name isn't already taken on that hosting provider), then let them upload any TLS cert for it.

So the attacker requests validation for a name that you have pointed via DNS to that hosting infrastructure.

The names that need to respond, and that need certs, are then calculated by the attacker. The attacker, who is also a customer of the same hosting service, creates the necessary “sites”, uploads the matching challenge-response certs, and successfully receives a cert for your domain.
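For tls-sni-01 specifically, that calculation is trivial. As I understand the challenge Let's Encrypt implemented, the validator connects using an SNI name derived only from the key authorization (the CA-issued token plus the requesting account's key thumbprint) and expects a self-signed cert for that same name. A sketch, with `tls_sni_01_name` a hypothetical helper:

```python
import hashlib


def tls_sni_01_name(key_authorization: str) -> str:
    """SNI name the validator asks for, per tls-sni-01 as I understand it."""
    z = hashlib.sha256(key_authorization.encode("ascii")).hexdigest()  # 64 lowercase hex chars
    return f"{z[:32]}.{z[32:]}.acme.invalid"


# The attacker knows both inputs: the token from their own order for the
# victim's domain, and their own account thumbprint.
name = tls_sni_01_name("example-token.attacker-thumbprint")
```

The victim's domain appears nowhere in the computation, which is why a host that lets anyone create a site with that name is enough for the attack.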


None of what you're saying is an issue in my outline.

If I'm able to upload a certificate for a.b.c.acme.invalid, the validation TLS-SNI request for a.b.c.acme.invalid will reply with a certificate for a.b.c.acme.invalid and thus fail.

If I'm able to upload a certificate for d.e.f.acme.invalid, the validation TLS-SNI request for a.b.c.acme.invalid will not match my uploaded certificate and the challenge will thus fail.

I may well be misunderstanding the situation, but, I just don't see how.


It is still an issue, actually.

You misunderstand how the TLS balancer chooses which certificate to present.

When the TLS connection comes in and presents a SNI name of "a.b.c.acme.invalid", the balancer checks its configuration to see if the host has a "website" called "a.b.c.acme.invalid". It discovers that it does. It looks at what certificate in the database was uploaded for that website configuration.

It doesn't actually check the certificate details at all....

It presents the certificate that was uploaded by the "owner" of the "website" a.b.c.acme.invalid.

And if that name needs to present a certificate that says "d.e.f.acme.invalid" then that is the certificate that the attacker will have uploaded for his "a.b.c.acme.invalid" site.

There are numerous web hosts who would permit this and it would work just like that.
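A toy model of such a balancer, with `SharedHost` entirely hypothetical: certificates are selected purely by the SNI name of a customer's "site", and the contents of the uploaded cert are never inspected. So any name mapping the validator expects (a.b.c to d.e.f) is satisfied by whoever registers the site first.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Site:
    owner: str
    cert_name: str  # whatever name is inside the uploaded cert; never inspected


class SharedHost:
    """Hypothetical shared host / TLS balancer that picks certs by SNI alone."""

    def __init__(self) -> None:
        self.sites: Dict[str, Site] = {}

    def add_site(self, customer: str, name: str, uploaded_cert_name: str) -> None:
        # The only check: the site name isn't already taken on this host.
        if name in self.sites:
            raise ValueError(f"{name} is already taken")
        self.sites[name] = Site(customer, uploaded_cert_name)

    def cert_for_sni(self, sni: str) -> Optional[str]:
        site = self.sites.get(sni)
        return site.cert_name if site else None


host = SharedHost()
host.add_site("attacker", "a.b.c.acme.invalid", "d.e.f.acme.invalid")
```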

The mechanism you're describing is similar to the changes in TLS-SNI-02. It has already been determined that TLS-SNI-02 is deficient as it is vulnerable to the attack I've parroted here.

The trouble is that the people who wrote both TLS-SNI-01 and TLS-SNI-02 apparently had little knowledge of the vast breadth of behaviors exhibited by a plethora of shared web hosts. The assumptions they made, upon which all the current TLS-SNI-0x protocols rely to provide security, simply are not upheld and honored by the real world marketplace of web hosts.


Ok, that makes sense, thanks for explaining it a bit more.

That seems like an absurdly broken implementation. I get why Let's Encrypt feels the need to disable it, but I really hope they come up with some alternative solution.

I really like the tls-sni authentication method because it keeps authentication entirely in-band with the final goal, TLS on port 443. With HTTP you need to listen on/control both 80 and 443; with DNS you need to control DNS. With tls-sni you only need to control port 443. I'm a huge fan of x/crypto/acme/autocert.

Custom ALPN-signaled protocol should be doable and should solve all of this, I hope they do it.


It is a rather limited protocol. Really, the implementation isn't so bad... In a perfect world.

It's naive.

It imagines there was a whole different set of operating circumstances at shared web hosts than the reality exhibits.

I actually had not read the protocol specification for that challenge as I utilize http-01 and dns-01 on all my various systems. Then, when the early report without details was released, I read the protocol and realized almost immediately that there were several circumstances in the field which could yield actual vulnerability.

They also made the mistake of not keeping a promise that the other mechanisms do make: those mechanisms tie the validation directly to the domain label being authorized, or a known child of it. TLS-SNI-01 and TLS-SNI-02 don't. And the designers knew that, because they wanted TLS-SNI validation to work without any change to server software. I believe that was a bad decision.

The proposed TLS-SNI-03 ALPN "acme" extension that Mr. Rudenberg has put forth will not, ultimately, be resilient to these attacks. I think they should do a real ALPN protocol and do the validation through that. But let's assume time to market for that would exceed a year. In the meantime, people reliant on TLS-SNI-01 are likely going to need to do something else.

In short, a mechanism that achieves much the same goal, validation over a single TLS port running the right software, is possible, but I believe it should borrow almost nothing from the current TLS-SNI-0x proposals.

It should be a whole new real ALPN protocol.


> I think they should do a real ALPN protocol and do the validation through that.

Agreed.

> But let's assume time to market for that would exceed a year.

I'm sure it would take a year or more for good packages for most languages to exist, but, it doesn't seem so complex that it should take a year for it to exist and for it to be usable if you're sufficiently motivated (EG: you're willing to write your own software).

> In the meantime, people reliant on TLS-SNI-01 are likely going to need to do something else.

I just moved to using a commercial wildcard certificate. I didn't particularly want to, but for this and a few other reasons, Let's Encrypt became non-viable for me.


You can upload a certificate with both names on it, no?

(Which is a thing a shared hosting provider should support - as a user I only want to provide a single cert for example.com and www.example.com, and if I'm paying for a cert and hosting multiple websites, I don't want to pay for two different certs for example.com and example.org if my CA will let me get those both on the same cert.)



