Let's Encrypt tls-sni-01 disabled due to credible vulnerability report (status.io)
374 points by regecks 5 months ago | 84 comments



Josh from Let's Encrypt here. I'm not able to give many more details yet, but here's what I can add now:

1) This isn't a relatively simple issue like a bug in our CA code would be. It's an interaction between the protocol and provider services.

2) Disabling TLS-SNI is a complete mitigation for us, meaning it's no longer possible to get an illegitimate certificate from Let's Encrypt by exploiting this issue.

3) We have not yet reached a conclusion as to whether or not the TLS-SNI challenge will need to remain disabled permanently.

4) At this point we have no reason to believe that the vulnerability has been exploited by anyone other than the researcher who figured it out and reported it to us.

Our focus now is on sharing information with relevant parties and looking for less drastic mitigations that might allow us to restore the TLS-SNI challenge option to people who rely on it.

We will, of course, share more information as soon as we can. That might be as soon as the next few hours; things are moving quickly.


New Update, explaining the issue with shared infrastructure and crediting the discovery to Frans Rosén of Detectify: https://community.letsencrypt.org/t/2018-01-09-issue-with-tl...


Do we get points for speculation based on these hints?

My guess would be that some major public CDN (Cloudflare etc) will let the attacker deploy their TLS-SNI challenge certs, and thus validate for other victim domains using the same CDN service.

EDIT: My main reasoning is that I can't think of why it wouldn't work - apart from the CDN provider somehow considering the TLS-SNI challenge certs invalid and refusing to deploy them, but I find it hard to trust that happening with 100% certainty.


Too late for me to edit this comment anymore, but in order to avoid spreading any false rumors, I'd like to explicitly state that I have no reason at all to believe that Cloudflare in particular would be affected by this, and I do not wish to imply that customers of Cloudflare would be at risk from this vulnerability. That was simply the first example of a public CDN deploying user-provided certificates that came to mind.

The big public CDNs would certainly see the biggest impact from this, but they're also the most likely to get their cert validation right. I'd reckon the risk lies mostly with the larger number of smaller providers that have their own home-grown cert automation for deploying user-provided certificates.

In fact, it seems a little odd to me to hear LE talking about collecting a blacklist of vulnerable providers to block from using TLS-SNI... TBH, how can you tell which providers will be affected? Should that be a whitelist instead?


I haven't seen anything on our internal security mailing lists about this. If it does somehow involve Cloudflare I'd be happy to receive a report directly via HackerOne (https://hackerone.com/cloudflare). Happy to assist.


I don't think CF is affected because TLS-SNI-01 has never worked for cf-proxied sites (I've tried). CF doesn't allow clients to use invalid certificates for domains they don't own, which is key for the tls-sni-01 process.


Sorry, I just chose Cloudflare as a random example when speculating. I don't have any information about which specific providers this would affect, and I'm not implying that Cloudflare would be affected.

Seems like my guess was right, though!


No need to apologize.

When you say "Seems like my guess was right, though!" are you saying that there is some way that Cloudflare is involved?

EDIT: I see (https://community.letsencrypt.org/t/2018-01-09-issue-with-tl...) now. I don't think LE has been in contact with us so don't think we're affected but happy to be told otherwise.


I would indeed like to explicitly apologize, because I regret mentioning "cloudflare etc" as an example. That's exactly the kind of bad speculation that leads to harmful rumors based on misunderstandings.

> When you say "Seems like my guess was right, though!" are you saying that there is some way that Cloudflare is involved?

No. It seems like I was right about the general nature of the vulnerability. It remains to be seen what providers are affected.

That being said, at this point, I'd personally be happier seeing a list of providers NOT affected, rather than a list of affected providers... It's probably also in the major public CDN providers' interest to demonstrate that their user cert validation processes would have prevented this attack, and that their customers were not at risk before LE pulled the plug...


One of the requirements listed [1] for a provider to be exposed to this is:

Users have the ability to upload certificates for arbitrary names without proving domain control.

When you say you were right, are you saying Cloudflare allows that?

[1] https://community.letsencrypt.org/t/2018-01-09-issue-with-tl...


I suppose it's possible in theory but I'm not sure if Cloudflare would perform other checks.

It's certainly the case that with Business and above accounts you can upload custom certificates, but I don't have one of those accounts to check whether any further validation regarding domain control is done.

If they don't, it would seem on the face of it that both of the conditions in the article* are met.

* >Many users are hosted on the same IP address

>Users have the ability to upload certificates for arbitrary names without proving domain control.


Indeed, I just tried a CDN provider that I recalled allowed custom SNI certs, and was able to deploy a ".acme.invalid" certificate to the global network. It's a real issue for public CDNs, but we can rejoice that Cloudflare and (I assume) CloudFront are not affected, at least.


As long as we're tossing hats into the ring...

I don't think so - _maybe_ a MITM that would affect all TLS-SNI, but I don't think the model you propose exists (I'm a bit dated on that, though), and if it did, I think it would be 'wider' than the CDNs.

"It's an interaction between the protocol and provider services" -- I will wager it's that some providers don't validate the CA or RootCA authentication method/mechinism, and so would accept a SHA1 or MD5 where a SHA256 or ROT13 would be preferred. (Think JWK/JWS 'null' issues)

I assume standard rules apply - one internet point to the winner, in the event of a draw, fisticuffs at dawn facing opposite coasts until we're both bored with it?


"and so would accept a SHA1 or MD5 where a SHA256 or ROT13 would be preferred"

I don't think ROT13 means what you think it means.


I don't think Cloudflare/CDNs are affected since TLS-SNI-01 doesn't work for cf-proxied hosts. (I've tried.)

Reading the detailed report[1], it sounds like tls-sni-01 requires being able to serve an invalid certificate temporarily. As far as I know, CDNs won't let you do that. This problem is probably more related to shared webhosts (which is hinted at in the detailed report).


Thanks for sharing this! Please share it on your blog as well https://letsencrypt.org/blog/

I subscribed to the RSS feed to keep up to date.


I just read the details of the tls-sni-01 challenge again.

I can certainly see a problem. With respect to the final validation stage, where the validator is communicating with the TLS endpoint, all the information needed to satisfy the validator is presented IN the SNI request data.

It looks like anyone could request a tls-sni-01 validation with any account key, for a host that they knew would resolve to a TLS server infrastructure with a certain behavior, and in turn be able to get the certificate issued.

The qualifying behavior would be a TLS server infrastructure that does some sort of opportunistic generation of a self-signed certificate for an unrecognized SNI name, such that the just-in-time minted certificate has a single SAN dnsName matching the exact value requested in the SNI. Unlike the other challenges, where the final validating query from the VA does NOT contain the "answer" needed, it would appear that in the final pass of the TLS-SNI-01 protocol exchange, the final "answer" is indeed in the request.

You know, I'm trying to recall the details, but I'm pretty sure I came across a consumer router platform once that, until and unless administratively configured with a certificate, would keep opportunistically recreating certificates to match the SNI name. This was to make the only validation failure be "signed by an untrusted..." instead of "name mismatch", etc, etc.... I wonder if a critical mass of things like that has arisen?
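
To illustrate, here's a minimal Go sketch (my own construction, not code from any real product) of the qualifying behavior described above - a TLS server that opportunistically mints a self-signed certificate for whatever SNI name arrives. A server behaving like this would satisfy a tls-sni-01 validation pointed at its IP:

    package main

    import (
        "crypto/ecdsa"
        "crypto/elliptic"
        "crypto/rand"
        "crypto/tls"
        "crypto/x509"
        "crypto/x509/pkix"
        "math/big"
        "time"
    )

    // opportunisticCert mints a throwaway self-signed cert whose only
    // SAN dnsName is whatever SNI value the client just sent -- exactly
    // the behavior that would let an attacker pass tls-sni-01 for you.
    func opportunisticCert(hello *tls.ClientHelloInfo) (*tls.Certificate, error) {
        key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
        if err != nil {
            return nil, err
        }
        tmpl := x509.Certificate{
            SerialNumber: big.NewInt(1),
            Subject:      pkix.Name{CommonName: hello.ServerName},
            DNSNames:     []string{hello.ServerName}, // echoes the SNI back
            NotBefore:    time.Now(),
            NotAfter:     time.Now().Add(24 * time.Hour),
        }
        der, err := x509.CreateCertificate(rand.Reader, &tmpl, &tmpl, &key.PublicKey, key)
        if err != nil {
            return nil, err
        }
        return &tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key}, nil
    }

    func main() {
        ln, err := tls.Listen("tcp", ":443", &tls.Config{GetCertificate: opportunisticCert})
        if err != nil {
            panic(err)
        }
        for {
            conn, err := ln.Accept()
            if err != nil {
                continue
            }
            go func() {
                conn.(*tls.Conn).Handshake() // the handshake serves the minted cert
                conn.Close()
            }()
        }
    }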


If tls-sni-02 is affected too (as Josh indicated it probably is below), I'd suspect something like a major CDN or hosting provider allowing deployment of arbitrary, attacker-controlled certificates on arbitrary domains under "acme.invalid". That would be the only thing I can think of that would bypass the "anti-parrot" mitigation introduced in -02.

Otherwise, it's probably what you're describing.


I can totally believe that may exist too.

After all, for the CDN / Hosting company, what's the real risk?

It wouldn't shock me if lots of infrastructure lets you board a new and novel host label that hasn't already been boarded and starts letting you configure things like a custom TLS certificate to be associated with that label.

As long as they watch out for uniqueness, what's the risk to the CDN? That a config context gets created for a DNS label that isn't yet pointed there? It's not an obvious risk from the CDN's perspective. Even for an invalid domain.

"Oh, sad, I just wasted mere kilobytes of storage on a configuration for a domain that you're never going to actually be able to get into the DNS and point to me?" That sounds kind of low cost, from the CDN's perspective.


If it's something like that, the fix would be to define a tls-sni-03 with a couple changes:

The SNI name indication from the validation MUST be a child of the desired certificate's domain label.

(If I want a cert for a.com, the SNI indication in the TLS-SNI-03 would need to be something like obvious-acme-challenge.a.com)

Further, the self-signed certificate needs to have the SAN dnsName as in the SNI -- AND ALSO -- another authentication token, signed by the account key, stuffed into the certificate in some other field.


What is the reason to enforce a specific root domain?

tls-sni-02 requires the generated self-signed certificate to contain an additional subjectAltName (SAN B) that is not sent as part of the request and thus only known to the actual client, not to any host that automatically generates self-signed certificates (if such hosts exist).


It eliminates one possible foot gun.

A web host or CDN might allow arbitrary domains to be added, and arbitrary certificates to be uploaded for said domains, all without any validation. That would ... not be my favourite implementation, but if you didn't know how the Web PKI and ACME work, that's an implementation you might come up with without expecting a whole lot of issues.

However, it's unlikely that the web host would allow you to do this for a domain already associated with an account, or a subdomain of such a domain. Unlike the first case, this would effectively allow an attacker to fully control any subdomain, so even without any Web PKI involvement, that would be a vulnerability in and of itself.

It remains to be seen what the two affected providers were actually doing, and I don't really have enough data to make a call on whether it's actually worth changing this aspect of tls-sni-02, but it's something to consider.


Thanks for the update and good work. Are you able to share and/or clarify whether the TLS-SNI-02 challenge is affected as well, in addition to TLS-SNI-01 which is currently Production on LE?


Haven't put much thought into -02 since nobody actually runs it in production but I suspect the same issue applies.


Curious question: Why do you write those details here on HN and not on the Letsencrypt Blog or letsencrypt.status.io?

Are HN readers somehow more entitled to this information than those who "merely" subscribed to your blog's RSS feed?


I agree that a blog post would be more helpful, but the barrier to creating one is higher. This sort of update wouldn't fly there because it's an off-the-cuff status report that will be buried in a day or two. The Let's Encrypt blog will be available and readable for all eternity (give or take a few years), so it requires an update that's more in-depth and vetted by all the people involved in solving this issue.

I'd imagine that once they fix this issue the first thing they'll do is to post a post-mortem on their blog.


I see. So this is more to be interpreted as a rushed preliminary report, forced on them by appearing on the HN front page, to prevent uninformed rumors from appearing and spreading. Or did I miss something?

EDIT: Apparently the latter didn't work, as the very first response to their comment starts with "Do we get points for speculation based on these hints? ..."


It prevented _uninformed_ rumours :D

Almost all the speculation was about the exact scenario that Let's Encrypt have subsequently confirmed - you can trick some bulk hosting or CDN providers into letting you (a customer) answer a challenge for one of their other customers' systems.


Interesting, definitely looking forward to the details, and great to see Let's Encrypt react this quickly even though this might cause a small amount of disruption to users.

The latest ACME draft - mostly referred to as what will become ACME v2, which Let's Encrypt supports on the staging environment as of a few days ago - has a slightly revamped version of the TLS challenge (tls-sni-02). The TLS-SNI challenge works roughly like this: The validation (CA) server sends a "fake" SNI hostname, generated by the CA server, to the IP behind the domain the CA is trying to validate. Domain control is assumed to be given if the server responds with a certificate that contains the CA-generated hostname in its SAN extension (where certificates store the domains and other identifiers they're valid for).

One of the concerns people had with tls-sni-01 is that it made it possible for a TLS server to "solve" such a challenge by effectively echoing back the requested SNI value blindly. This was changed in tls-sni-02 - just taking the SNI value and putting it in the SAN field is no longer enough to pass such a challenge. Until now, there was no reason to believe anyone was running TLS servers that showed this behaviour, so there was no real rush to deprecate tls-sni-01 right away (as opposed to just rolling out ACME v2, which only has tls-sni-02). I wonder if someone's found a lot of TLS servers that turned out to do this, or if there's some other vulnerability in the design or implementation.
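
For concreteness, here's a sketch of how the tls-sni-01 challenge hostname is derived, per my reading of the ACME draft (double-check against the spec before relying on it). Note that the validation server sends this exact name as the SNI value, so everything needed to pass is contained in the request:

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
    )

    // tlsSNI01Name derives the ".acme.invalid" SNI name for tls-sni-01.
    // keyAuthz is the challenge's key authorization string. A server
    // that blindly copies the incoming SNI into a self-signed cert's
    // SAN passes validation without ever knowing keyAuthz.
    func tlsSNI01Name(keyAuthz string) string {
        z := sha256.Sum256([]byte(keyAuthz))
        h := hex.EncodeToString(z[:]) // 64 lowercase hex characters
        return fmt.Sprintf("%s.%s.acme.invalid", h[:32], h[32:])
    }

    func main() {
        // Hypothetical key authorization, for illustration only.
        fmt.Println(tlsSNI01Name("some-token.some-account-thumbprint"))
    }

In tls-sni-02, by contrast, the certificate must also carry a second SAN derived from data that never appears in the SNI, which is what defeats the echo trick.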


> TLS server to "solve" such a challenge by effectively echoing back the requested SNI value blindly

I would be mildly surprised if they pulled tls-sni because of this, since it is basically a client vulnerability, and both http-01 and dns-01 suffer from similar scenarios (e.g. I tricked a major email provider into serving /.well-known/acme-challenge/ for a domain I shouldn't have been able to, and a friend managed to get a wildcard for a ccTLD).


I think we're talking about slightly different scenarios. HTTP-01, for example, cannot be solved by just echoing back the file name the validation server requests because the client is supposed to return "token || '.' || base64(JWK_Thumbprint(accountKey))", but the file name is just "token".

dns-01 is not affected either because the requested label is always just "_acme-challenge.<FQDN>".
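
To make the contrast concrete, here's a hedged sketch of the http-01 key authorization from the formula above. Per RFC 7638, the thumbprint is a SHA-256 over the account key's canonical JWK JSON; this sketch assumes the caller supplies that JSON:

    package main

    import (
        "crypto/sha256"
        "encoding/base64"
        "fmt"
    )

    // keyAuthorization builds the http-01 response body:
    // token || "." || base64url(SHA-256(canonical JWK of account key)).
    // The validator's request contains only the token, so a server
    // can't pass by echoing the request back -- it needs the thumbprint.
    func keyAuthorization(token, canonicalJWK string) string {
        sum := sha256.Sum256([]byte(canonicalJWK))
        return token + "." + base64.RawURLEncoding.EncodeToString(sum[:])
    }

    func main() {
        // Hypothetical token and JWK, for illustration only.
        jwk := `{"crv":"P-256","kty":"EC","x":"...","y":"..."}`
        fmt.Println(keyAuthorization("some-token", jwk))
    }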


Any links to discussions on this topic? Sounds suspiciously like SNI proxies.


I'm not aware of any public discussion of the ongoing incident. This[1] is the thread on the ACME WG mailing list that led to tls-sni-02 being introduced.

[1]: https://mailarchive.ietf.org/arch/msg/acme/s8gaZ6ev-iqoSQjOZ...


The best way of explaining tls-sni-01 is that it's broken in a way that allows the user to shoot themselves in the foot badly.

When I wrote an implementation of LE with tls-sni-01, I was quite shocked that it worked that way when DNS and HTTP were done "correctly".

I would be very surprised if this was related to their reason for pulling tls-sni-01.


Just wanted to jump on this for Caddy users [1]:

> Until further notice, when starting Caddy, we recommend using the '-disable-tls-sni-challenge' flag. This will require either HTTP or DNS challenges to be functional in order to renew your certificates.

By default, Caddy randomly chooses either the HTTP or TLS-SNI challenge to obtain and renew certificates. Your sites will likely not go offline even if you do not use this flag because Caddy tries up to 2 times per day, 30 days out, to renew an expiring certificate, as long as you keep it running. The chances that it would choose TLS-SNI sixty times in a row are extremely low. (We -- meaning myself and many people who contributed their feedback and code -- thought about these kinds of scenarios, and Caddy is prepared to handle them.) However, since the TLS-SNI challenge will fail 100% of the time while it is disabled on the server end, we might as well have the client not even try it.
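
(Back-of-the-envelope: with a fair 50/50 pick between the two challenges, the chance of drawing TLS-SNI on all sixty attempts is (1/2)^60 ≈ 8.7 × 10^-19 -- assuming, of course, that the HTTP challenge itself keeps succeeding.)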

Also note that all certificate maintenance routines are logged to the process log, so be sure you always run Caddy with the '-log' flag in production so you can see what's going on.

Since this outage may be temporary, check back later about re-enabling it. I recommend having more than one way to perform verifications when possible. (For Go programmers, the xenolf/lego library [2] supports all verification methods -- and is being upgraded for ACMEv2 currently; Sebastian is doing an awesome job! It also supports numerous DNS providers for easy setup of the DNS challenge.)

One more thing: wait for a full report from Let's Encrypt rather than speculating. Most questions can't be answered until there's more information. I don't think there's anything you need to do, no alarms to raise... just use another verification method until we get more info.

[1]: https://twitter.com/caddyserver/status/950926718004428800

[2]: https://github.com/xenolf/lego


If you use caddy, you'll have other issues to think about anyway.

The DNS RFCs (and later the URL RFCs) standardize hostnames as a series of labels separated by "."; a relative hostname ends in a label, while an absolute hostname ends in a dot.

If you are in a corporate system, your DNS resolver will try to append its domain to any relative hostname.

So in the domain kuschku.de, if I do ping lithium, it will resolve to lithium.kuschku.de, and use that IP address.

This feature is used, for example, by Kubernetes' DNS resolver.

This also means that if I enter the domain caddyserver.com, and try to ping it, it will first resolve to caddyserver.com.kuschku.de (or caddyserver.com.svc.default.cluster.local).

Now, to avoid these unnecessary lookups, you can simply try to access caddyserver.com. — a valid domain, valid URL, and per RFC it should point at the original, outside of any corporate domain. Except, it doesn't work.

Nginx, Apache, IIS, as well as the GCP, Azure and AWS ingress system support absolute URLs. (And all major websites in the Alexa Top 1 million that don't use caddy or traefik also support this. As does news.ycombinator.com.)

Caddy does not support serving these (instead mholt recommends you just copy-paste every config rule you have twice), and the bug was closed as WONTFIX.
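
For anyone who wants this behavior in front of their own service, here's a minimal sketch (mine, not taken from any of the servers named above) that normalizes an absolute-form Host header before name-based routing, assuming a Go net/http stack:

    package main

    import (
        "net"
        "net/http"
        "strings"
    )

    // stripTrailingDot normalizes an absolute-form Host header
    // ("example.com." or "example.com.:8080") to its relative form so
    // that name-based vhost matching treats both spellings identically.
    func stripTrailingDot(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            host, port, err := net.SplitHostPort(r.Host)
            if err != nil { // no port present
                host, port = r.Host, ""
            }
            host = strings.TrimSuffix(host, ".")
            if port != "" {
                r.Host = net.JoinHostPort(host, port)
            } else {
                r.Host = host
            }
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        mux := http.NewServeMux()
        mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("hello from " + r.Host + "\n"))
        })
        http.ListenAndServe(":8080", stripTrailingDot(mux))
    }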


> randomly

What? How does that make any sense? Why wouldn't you just try all understood (and not manually disabled) challenges until you get a successful result?

This isn't like picking an ephemeral port number.


In addition to what was already mentioned, it's also about being polite. Imagine if Caddy and other clients already did what you suggest: Let's Encrypt would suddenly be receiving ~50% more traffic during an already-stressful time. When we were first integrating ACME into Caddy, traffic spikes like this were a problem (for other reasons, long since fixed), so we are now very careful to avoid them.


The default recommended client does support multi-challenge, and from what the logs show, the ACME protocol even handles this without extra load - it returns the challenge modes it will accept.

I wasn't aware of this, and apparently you aren't either. The difference is, I'm not shipping a commercial web server package whose key feature is Let's Encrypt (and thus ACME) support.


Caddy's current implementation is more resilient than most other ACME clients in the event of an incident like the one Let's Encrypt is currently experiencing. Typical client implementations (such as certbot, but most others too) are configured to use exactly one challenge type at the time of setup and will continue to use this challenge type for renewal attempts, eventually failing to do so in time if the challenge type remains disabled for more than 30 days (or whatever schedule the client uses).

We can argue about the trade-offs of picking the challenge type at random versus, say, trying one and then falling back to another (both approaches have their pros and cons), but that's already more than what you'd typically get from ACME clients.


> Typical client implementations (such as certbot, but most others too) are configured to use exactly one challenge type at the time of setup

Certbot has a preferred-challenges flag, which takes an ordered list. Setting it to tls-sni-01,http-01 means it will try the TLS challenge first and fall back to the HTTP challenge. I've tried this myself in dry-run mode (because my existing certs all still have valid authorizations against the TLS-SNI challenge) and it does drop back to http-01.
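
For reference, the invocation looks something like this (flag names as documented by certbot; verify against your installed version):

    certbot renew --preferred-challenges tls-sni-01,http-01 --dry-run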


It's definitely an option, but the question is whether someone who has set up certbot at some point in the past will benefit from it without manual intervention (as is the case for Caddy). If you're following the standard setup, using the instructions on certbot's website, you'll select exactly one plugin (with each plugin mapping to one challenge type, unless you do something fancy), and if that challenge type goes away, manual intervention is required.


> but the question is whether someone who has set up certbot at some point in the past will benefit from it without manual intervention (as is the case for Caddy)

This whole thread is responding to the author of Caddy saying specifically that people need to intervene or Caddy won't necessarily renew properly.

I specifically said in the above comment that doing a renew in dry-run mode (because a --force-renewal re-uses the existing challenge authorisation my certs have against TLS-SNI) correctly falls back to doing the http challenge. The only "manual" part was that I ran the command - if the certs are due for renewal and the regular cron/systemd timer calls it, the TLS-SNI authorisation will either still be valid, and it will renew as expected, or it will need to re-authorise via challenge, and it will fall back to the http challenge as I just demonstrated, and renew as expected.

> and if that challenge type goes away, manual intervention is required

If all the methods you chose (e.g. TLS-SNI,HTTP) went away, sure, intervention would be required. But if they all went away, no client would work.

> If you're following the standard setup, using the instructions on certbot's website

If you follow the standard setup for Caddy you have a 50/50 chance of getting a cert today. Either you accept that people can deviate from the most basic "this is how you can run it as an example" or we stick to the defaults both ways. Claiming "oh but the standard setup.." when this issue requires Caddy users to change their freaking init/service files is disingenuous.

For reference, I use HAProxy + Certbot and a tiny shell script - and it hasn't been susceptible to any of the crazy issues Caddy has had just in the last 12 months due to weird intentional behaviour of the program.


> This whole thread is responding to the author of Caddy saying specifically that people need to intervene or Caddy won't necessarily renew properly.

No, that's not what Matt said:

> Your sites will likely not go offline even if you do not use this flag because Caddy tries up to 2 times per day, 30 days out, to renew an expiring certificate, as long as you keep it running.

> the TLS-SNI authorisation will either still be valid, and it will renew as expected, or it will need to re-authorise via challenge, and it will fall back to the http challenge as I just demonstrated

This is not an accurate description of how you'd be affected with typical certbot deployments. Two examples:

Debian Stretch with apache2. Certbot's website recommends the following command:

    sudo certbot --apache
Currently, this fails with the following error message:

    Client with the currently selected authenticator does not support any combination of challenges that will satisfy the CA.
Same OS, nginx. Certbot recommends this command:

    sudo certbot --nginx
This causes the same error.

Renewal would fail with the same error message; you can find examples on Let's Encrypt's Community Forum if you don't believe me[1].

Is it possible to set up renewal in a way that would cause certbot to fall back to a working challenge? Sure. But it's far from common or even the default.

> If you follow the standard setup for Caddy you have a 50/50 chance of getting a cert today. Either you accept that people can deviate from the most basic "this is how you can run it as an example" or we stick to the defaults both ways. Claiming "oh but the standard setup.." when this issue requires Caddy users to change their freaking init/service files is disingenuous.

Again, you're misinterpreting what Caddy does. Renewal will only fail if you're unlucky 60 times in a row, as opposed to 100% of the time for most clients in common scenarios.

[1]: https://community.letsencrypt.org/t/client-with-the-currentl...


> Is it possible to set up renewal in a way that would cause certbot to fall back to a working challenge? Sure. But it's far from common or even the default.

Is it possible to make Caddy deterministically use the capabilities of the ACME protocol to try the intersection of the challenges you want to use and those the ACME server supports? No. It's not.

The rest is fucking moot. My point in every one of these threads is that Caddy is presented as "it just works" until it fucking doesn't and you have the weirdest shit happening.

With regular tools, it's expected that you have a clue what the fuck you're doing and configure it to meet your requirements.

Inexperienced people may use certbot and end up needing to do something. Sure. But I don't expect inexperienced people to be managing production environment servers. Do you?

You claim Caddy's defaults are better - that's an opinion, but the problem is it's not a "default", it's forced fucking behaviour. You simply can't tell it to operate sanely like you can with certbot + haproxy.


> Is it possible to make Caddy deterministically use the capabilities of the ACME protocol to try the intersection of the challenges you want to use and those the ACME server supports? No. It's not.

Very, very few ACME clients have a renewal mechanism that supports automatic fallback to a different challenge type. The only one I can think of is certbot, and even there that's only true if you use the standalone or manual plugin (the others don't support "--preferred-challenges").

However, if your question is "Can I make Caddy use the exact challenge type I want it to", then the answer is yes, as you can guess from the "-disable-tls-sni-challenge" flag. This is what most clients will let you do. No one is saying that those clients can't renew with manual intervention.

> The rest is fucking moot. My point in every one of these threads is that Caddy is presented as "it just works" until it fucking doesn't and you have the weirdest shit happening.

You're moving the goalposts.

> With regular tools, it's expected that you have a clue what the fuck you're doing and configure it to meet your requirements.

Part of knowing what you're doing as a developer is knowing that you either have sane defaults or stuff is going to break for a lot of people.

> Inexperienced people may use certbot and end up needing to do something. Sure. But I don't expect inexperienced people to be managing production environment servers. Do you?

Are you implying anyone using the default instructions for certbot is inexperienced?

> You claim Caddy's defaults are better - that's an opinion, but the problem is it's not a "default", it's forced fucking behaviour. You simply can't tell it to operate sanely like you can with certbot + haproxy.

If you think you can do better with your own approach, by all means do that. Most people are not PKI or ACME experts and won't have the necessary knowledge to make all the right calls, so they're better off with a "managed" solution like Caddy.

To be clear: This is an edge case and I'm in no way saying the certbot team is incompetent because they did not foresee this scenario and handle it as part of the default setup. It's just silly to go on about how terrible Caddy is when they happen to have a mitigation in place.


> Very, very few ACME clients have a renewal mechanism that supports automatic fallback to a different challenge type. The only one I can think of is certbot, and even there that's only true if you use the standalone or manual plugin (the others don't support "--preferred-challenges").

And that's why I use and recommend certbot in standalone mode.

> You're moving the goalposts.

No, I'm trying to explain my overall point.

> Part of knowing what you're doing as a developer is knowing that you either have sane defaults or stuff is going to break for a lot of people.

Right, but in this case Caddy doesn't have "defaults" that experienced devs can change; it has "this is just how I fucking work".

> Are you implying anyone using the default instructions for certbot is inexperienced?

I'm suggesting that anyone whose sole research into how to use ACME generated certificates in production is "what's the quick start for certbot say?" is inexperienced.

> so they're better off with a "managed" solution like Caddy.

Managed, except that they have to keep intervening or updating to a new build to restore functionality, because it manages things like a drunk on a snowmobile.

> It's just silly to go on about how terrible Caddy is when they happen to have a mitigation in place.

Caddy requires a 'mitigation'. Certbot can be and is deployed with a configuration that doesn't require any mitigation. It's just silly to keep claiming Caddy's approach is "the best" when you require a mitigation.


So an attack against one challenge doesn't deterministically succeed.


The much more likely scenario is that an error in comms/LE (such as today's issue) means Caddy will non-deterministically fail.


That is a trade-off for false negatives rather than false positives. It is not an unreasonable choice when dealing with certificates.


The default client recommended by LE (certbot) supports multi-challenge registration/renewal. I'm pretty sure I trust their judgement over that of the Caddy project, given Caddy's history of weird decisions that backfire and cause user issues.


Cool. Now's a good time to remind everyone that Caddy became popular as a fully open source web server and now charges for commercial use of its binary. https://caddyserver.com/products/licenses However, there's little technically in place to prevent people from using the free version for commercial products. The result of this is that there's probably a bunch of people who downloaded Caddy when it was free for commercial use, who probably just upgraded it, and are now violating the license. A sticky situation. Better to just use nginx or apache (actually apache compares favorably to nginx when it's used properly) until another thing like caddy comes along that's free.

Ironically it's quite a similar tax to $10 SSL certificates, only it's a higher upfront cost, of $25. If people were willing to pay $25 for every HTTP server, HTTPS everywhere could have started growing quickly without LetsEncrypt.


> Ironically it's quite a similar tax to $10 SSL certificates, only it's a higher upfront cost, of $25

Except that was $10 a year and you could use the cert on as many servers as you want. This is $25 a month per instance.


Or perhaps now is not a particularly good time to introduce some random personal vendetta-point into a discussion of a wholly unrelated subject.


I don't get this. Is Caddy so hard to build from source that you'd pay $10 for a binary? The source code is ASL-licensed, so it allows commercial use.


It's probably not very hard, however for an organisation or a team with several members, paying $X can be worth it - they don't have to spend effort on building and maintaining it (allows them to focus on other tasks), just deploy the binaries and contact Caddy when they need some support.


I feel like some misinformation is being spread here, since what you're really paying for is not having to build from source. The whole project is open source and you can even use the source commercially; you just have to build it yourself.

Does it not make sense that these guys would like to earn some money from this effort? That way they are also more incentivised to continue with the effort.

What kind of company finds it too costly to build Caddy themselves AND can't afford to pay for the license?


Though not relevant to the bulk of your comment, could you elaborate or provide some relevant links on why apache compares favorably to nginx when used properly?


Apache, like nginx, is powerful and well-maintained. It has a longer history than nginx, though, and has to support some features that probably wouldn't have been implemented if the project was created more recently. One such feature is .htaccess, which makes it so an app's directory, belonging to the app's user, can configure the web server. This is a potential attack vector if the app's directory is writable (not an issue for configurations in /etc which are only writable by root). This feature can be turned off by setting AllowOverride None in /etc/apache2 (/etc/httpd on CentOS). There are other defaults that are better in nginx than apache as well. Here's a post that has the AllowOverride None suggestion and two others: https://www.jeffgeerling.com/blog/3-small-tweaks-make-apache...


Off the top of my head: use event MPM, disable any modules you don't need, disable .htaccess config, etc.


nice one caddymeister


We've now posted more details about the issue and our plans.

https://community.letsencrypt.org/t/2018-01-09-issue-with-tl...


A suggestion: if the fix takes too long to develop (long enough that certificates risk expiring before being renewed), could you at least allow renewals of existing LE certificates if they are from the same LE account (perhaps only if they are still at the same IP address)?


For certbot-nginx plugin users, I've had success with --webroot authentication:

For nginx:

    location ^~ /.well-known/acme-challenge/ {
        default_type "text/plain";
        root /home/www/letsencrypt;
    }

Then, for SELinux: chcon -Rt httpd_sys_content_t /home/www

Reload nginx and 'certbot renew --webroot -w /home/www/letsencrypt' has a fighting chance.


I've been doing this ever since they released certbot, but I can't really vouch for the stability of this method. Changes to your virtual host config can break this very easily, especially if rewrites to https are involved.


Background history that might be helpful here:

The http-01 proof of control as originally defined allowed you to use HTTPS instead for the URL. This was never enabled in production because many bulk web hosts had a configuration where, if anyone (say Let's Encrypt) asked for https://not-ssl-enabled.customer1.example/blah, the bulk host's server would send over the answer for https://aaaa.ssl-enabled.customer2.example/blah, because it just picked the alphabetically first SSL-enabled name as the default instead of giving an error for no match.

The Ten Blessed Methods don't say not to do this, but Let's Encrypt did not want a service that can be trivially exploited on common bulk hosts, so they disabled it, as they've now done for tls-sni-01.

I suspect a researcher has found a configuration that similarly allows an attacker to pass tls-sni-01 for names using some shared infrastructure, such as the same CDN or same web hosting as the attacker.

[Let's Encrypt posted a follow-up to their Discuss outlining exactly the above scenario but without the historical digression a few minutes after I wrote this]
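
For illustration, a minimal Go sketch (my own, not any specific host's code) of the default-vhost behavior described above - falling back to the alphabetically first configured certificate instead of failing the handshake when the SNI name has no match:

    package hosting

    import (
        "crypto/tls"
        "errors"
        "sort"
    )

    // defaultFirstCert returns a GetCertificate callback that falls back
    // to the alphabetically first configured cert when the requested
    // name has no match -- the bulk-host behavior that made answering
    // http-01 over HTTPS unsafe to trust.
    func defaultFirstCert(certs map[string]*tls.Certificate) func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
        names := make([]string, 0, len(certs))
        for name := range certs {
            names = append(names, name)
        }
        sort.Strings(names)
        return func(hello *tls.ClientHelloInfo) (*tls.Certificate, error) {
            if c, ok := certs[hello.ServerName]; ok {
                return c, nil
            }
            if len(names) == 0 {
                return nil, errors.New("no certificates configured")
            }
            // Serving customer2's cert for customer1's name.
            return certs[names[0]], nil
        }
    }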


To the badass people who get their LE certificates with Apache mod_md: you have chosen well!

mod_md checks the challenge list from the ACME server and chooses one that it supports. So, if your server listens on port 80, everything will continue to work. You do not need to change anything.

If your server is only reachable via port 443, there currently seems to be no way to sign up with Let's Encrypt. You will need to open port 80 for certificate renewal/signup to work. Some advice:

* port 80 needs to be available only during a renewal/signup. Once you have your certificates, you may close it again. You need to mind renewal periods then and should check your server logs more frequently.

* you can safely redirect your port 80 to 443 with the 'MDRequireHttps' configuration directive. This redirection automatically takes care that challenges from an ACME server are still answered while all other requests are redirected.

In case you find issues or have additional questions, visit the github repository at https://github.com/icing/mod_md and file an issue.


The shutdown of tls-sni-01 doesn't affect the http-01 challenge, so the workaround is to switch your code over to the latter if this is affecting you.

We're using Greenlock (https://github.com/Daplie/node-greenlock, previously node-letsencrypt via npm) for our app (https://Clearalias.com) and this library supports switching challenges fairly easily. It's even easier if you're just using an Express server, since you can use a Node library like Greenlock-express (https://github.com/Daplie/greenlock-express, previously known as letsencrypt-express), which makes it dead simple to use http-01.

Best of luck to anyone who's scrambling to fix their cert layer right now. It seems like there's a chance the TLS-SNI challenge stays disabled, so it's best not to hold your breath and instead quickly switch to a different challenge mode if you get a chance.


I'm actually glad they openly admit there's an issue when there's an issue. Waiting for the full report.


That's a pretty bad blow... A lot of Go software relies on the TLS-SNI-01 challenge, I believe. Will TLS-SNI-02 be a viable replacement? What should be done about servers currently using TLS-SNI-01?


In particular, autocert (https://godoc.org/golang.org/x/crypto/acme/autocert) relies on tls-sni and does not support the other DV methods.

Hopefully it is not a fatal flaw in the design of the method; otherwise a lot of software will need to be rebuilt.
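
For context, a typical autocert setup looks like the sketch below (standard x/crypto/acme/autocert usage as of early 2018; example.com is a placeholder). The GetCertificate hook answered the tls-sni challenge in-band on port 443, which is why there's no HTTP-based fallback to switch to:

    package main

    import (
        "crypto/tls"
        "net/http"

        "golang.org/x/crypto/acme/autocert"
    )

    func main() {
        m := &autocert.Manager{
            Prompt:     autocert.AcceptTOS,
            HostPolicy: autocert.HostWhitelist("example.com"), // placeholder domain
            Cache:      autocert.DirCache("certcache"),
        }
        srv := &http.Server{
            Addr: ":443",
            // GetCertificate serves real certs and also answered the
            // tls-sni challenge during issuance, all on port 443.
            TLSConfig: &tls.Config{GetCertificate: m.GetCertificate},
        }
        srv.ListenAndServeTLS("", "") // certs come from the Manager
    }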


I'm not an expert, but the autocert package appears to support both tls-sni-01 and tls-sni-02.

The title of this HN post says "tls-sni-01" is disabled, but on the linked page it says "tls-sni challenge disabled".

So I'm confused. Is tls-sni as a whole disabled? Or just tls-sni-01? If the former, then I don't think package autocert will continue to function. If the latter, then autocert users should be okay.


tls-sni-02 is not supported on the production ACME server. It is part of the latest ACME draft (ACME v2), which recently got deployed on Let's Encrypt's staging server, but the certificates signed in that environment aren't publicly trusted.


Thanks. So the upshot is that package x/crypto/acme/autocert can no longer obtain production certs. I have bumped this issue, volunteering to do the work to add http-01 support to package autocert: https://github.com/golang/go/issues/21890


And tls-sni-02 does not fix the problem.


For LetsEncrypt, the acme-v01 API is the only production endpoint as of this time [1], which only supports the -01 version of tls-sni.

[1] https://letsencrypt.status.io/


All tls-sni has been disabled. The one production API, and two staging APIs, are affected.


Yeah, I'm using autocert for a non-HTTP TLS server. It seems like the best option, but now I may have to move over to an HTTP or DNS challenge. Not ideal.


I'm pretty annoyed, because when I first started using Let's Encrypt I was hamstrung by their restrictions on the various "automated" methods of deploying and renewing certs. I went with tls-sni because it was the least-bad method for my use case.

I listened with an open mind to their justification of the extremely short 90-day max cert lifetime in the "automate all the things" world. The sysadmin in me was skeptical, even though I have also fought the "crap, we haven't renewed this cert in years and nobody knows how to do it anymore!" emergencies over the years, and I understand how renewing frequently could, at least in theory, replace that problem with a lesser one.

But it turns out my skepticism was justified. Thankfully I'm not using it in production yet, but too often these new projects and paradigms suffer from "what could possibly go wrong?" thinking, and you have to follow all the right forums and keep all your configurations in mind to know when a problem like this will bite your infrastructure. I only stumbled across this randomly while checking my Let's Encrypt community account for something else.

Now, granted, this just happened yesterday, but I missed the HN thread on it when it happened, which means I could well have missed it until it's too late and a bunch of certs expire. Then it's a scramble to fix your certs AND fix your automation all at once.


Looks like Ted Unangst was right?

https://archive.is/VZrOS


Nice, acme4j v2 (Java client) already disabled this - https://github.com/shred/acme4j/blob/master/README.md#known-...


Last year I made a Let's Encrypt client[1] that only supports the DNS mode of validation.

It currently only supports cloudflare and AuroraDNS, but it is very easy to use any other DNS provider[2]

1. https://github.com/komuw/sewer

2. https://github.com/komuw/sewer#how-to-use-a-customunsupporte...


Unfortunately, this means that Traefik's default Let's Encrypt integration (without setting a DNS provider) does not work anymore. Although the logs say "could not find solver for: http-01", it actually uses tls-sni-01.


Are there any servers that automatically generate self-signed certs for any SNI they receive? I think servers like that (if they exist) would also be vulnerable.



