
Let's Encrypt tls-sni-01 disabled due to credible vulnerability report - regecks
https://letsencrypt.status.io/pages/incident/55957a99e800baa4470002da/5a55777ed9a9c1024c00b241
======
jaas
Josh from Let's Encrypt here. I'm not able to give many more details yet, but
here's what I can add now:

1) This isn't a relatively simple issue like a bug in our CA code would be.
It's an interaction between the protocol and provider services.

2) Disabling TLS-SNI is a complete mitigation for us, meaning it's no longer
possible to get an illegitimate certificate from Let's Encrypt by exploiting
this issue.

3) We have not yet reached a conclusion as to whether or not the TLS-SNI
challenge will need to remain disabled permanently.

4) At this point we have no reason to believe that the vulnerability has been
exploited by anyone other than the researcher who figured it out and reported
it to us.

Our focus now is on sharing information with relevant parties and looking for
less drastic mitigations that might allow us to restore the TLS-SNI challenge
option to people who rely on it.

We will, of course, share more information as soon as we can. That might be as
soon as the next few hours; things are moving quickly.

~~~
terom
Do we get points for speculation based on these hints?

My guess would be that some major public CDN (Cloudflare etc) will let the
attacker deploy their TLS-SNI challenge certs, and thus validate for other
victim domains using the same CDN service.

EDIT: Main reasoning being that I can't think why it wouldn't work - apart
from the TLS-SNI challenge certs somehow being considered invalid by the CDN
provider and refusing to deploy them, but I find it hard to trust that
happening with 100% certainty.

~~~
jgrahamc
I haven't seen anything on our internal security mailing lists about this. If
it does somehow involve Cloudflare I'd be happy to receive a report directly
via HackerOne
([https://hackerone.com/cloudflare](https://hackerone.com/cloudflare)). Happy
to assist.

~~~
terom
Sorry, I just chose cloudflare as a random example when speculating, I don't
have any information about what specific providers this would affect, and I'm
not implying that cloudflare would be affected.

Seems like my guess was right, though!

~~~
jgrahamc
No need to apologize.

When you say "Seems like my guess was right, though!" are you saying that
there is some way that Cloudflare is involved?

EDIT: I see ([https://community.letsencrypt.org/t/2018-01-09-issue-with-
tl...](https://community.letsencrypt.org/t/2018-01-09-issue-with-tls-
sni-01-and-shared-hosting-infrastructure/49996)) now. I don't think LE has
been in contact with us so don't think we're affected but happy to be told
otherwise.

~~~
terom
I would indeed like to explicitly apologize, because I regret mentioning
"cloudflare etc" as an example. That's exactly the kind of bad speculation
that leads to harmful rumors based on misunderstandings.

> When you say "Seems like my guess was right, though!" are you saying that
> there is some way that Cloudflare is involved?

No. It seems like I was right about the general nature of the vulnerability.
It remains to be seen what providers are affected.

That being said, at this point, I'd personally be happier seeing a list of
providers NOT affected, rather than a list of affected providers... It's
probably also in "major public CDN provider's" interest to demonstrate that
their user cert validation processes would have prevented this attack, and
their customers were not at risk before LE pulled the plug...

------
pfg
Interesting, definitely looking forward to the details, and great to see Let's
Encrypt react this quickly even though this might cause a small amount of
disruption to users.

The latest ACME draft - mostly referred to as what will become ACME v2, which
Let's Encrypt supports on the staging environment as of a few days ago - has a
slightly revamped version of the TLS challenge (tls-sni-02). The TLS-SNI
challenge works roughly like this: The validation (CA) server sends a "fake"
SNI hostname, generated by the CA server, to the IP behind the domain the CA
is trying to validate. Domain control is assumed to be given if the server
responds with a certificate that contains the CA-generated hostname in its SAN
extension (where certificates store the domains and other identifiers they're
valid for).

One of the concerns people had with tls-sni-01 is that it made it possible for
a TLS server to "solve" such a challenge by effectively echoing back the
requested SNI value blindly. This was changed in tls-sni-02 - just taking the
SNI value and putting it in the SAN field is no longer enough to pass such a
challenge. Until now, there was no reason to believe anyone was running TLS
servers that showed this behaviour, so there was no real rush to deprecate
tls-sni-01 right away (as opposed to just rolling out ACME v2, which only has
tls-sni-02). I wonder if someone's found a lot of TLS servers that turned out to
do this, or if there's some other vulnerability in the design or
implementation.
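
The difference between the two challenge versions can be modelled in a few lines of Python. This is a simplified sketch, not Let's Encrypt's or any client's code. The name formats are as best recalled from the ACME drafts (a SHA-256 hex digest split into two 32-character labels under acme.invalid), the TLS handshake is reduced to a function mapping an SNI value to a set of SAN names, and `echo_server` is a hypothetical server that blindly echoes the SNI.

```python
import hashlib


def _hex_pair(data: str) -> str:
    """Return sha256(data) as lowercase hex, split into two 32-char DNS labels."""
    digest = hashlib.sha256(data.encode("ascii")).hexdigest()
    return f"{digest[:32]}.{digest[32:]}"


def sni01_name(key_authorization: str) -> str:
    # tls-sni-01 (as assumed here): a single name derived from the key authorization.
    return _hex_pair(key_authorization) + ".acme.invalid"


def sni02_names(token: str, key_authorization: str) -> tuple[str, str]:
    # tls-sni-02 (as assumed here): SAN A from the token, SAN B from the key authorization.
    san_a = _hex_pair(token) + ".token.acme.invalid"
    san_b = _hex_pair(key_authorization) + ".ka.acme.invalid"
    return san_a, san_b


def echo_server(requested_sni: str) -> set[str]:
    # Hypothetical misbehaving server: the certificate it presents contains
    # exactly the SNI value it was asked for, and nothing else.
    return {requested_sni}


def ca_accepts_sni01(server, key_authorization: str) -> bool:
    name = sni01_name(key_authorization)
    # The CA sends the name as SNI and looks for the same name in the SANs.
    return name in server(name)


def ca_accepts_sni02(server, token: str, key_authorization: str) -> bool:
    san_a, san_b = sni02_names(token, key_authorization)
    # Only SAN A is sent as SNI; SAN B must be computed by the applicant.
    sans = server(san_a)
    return san_a in sans and san_b in sans
```

With a placeholder key authorization such as "tok.thumb", `ca_accepts_sni01(echo_server, ...)` is true while `ca_accepts_sni02(echo_server, "tok", ...)` is false, because the echoing server never learns SAN B.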

~~~
regecks
> TLS server to "solve" such a challenge by effectively echoing back the
> requested SNI value blindly

I would be mildly surprised if they pulled tls-sni because of this, since it
is basically a client vulnerability, and both http-01 and dns-01 suffer from
similar scenarios (e.g. I tricked a major email provider into serving /.well-
known/acme-challenge/ for a domain I shouldn't have been able to, and a friend
managed to get a wildcard for a ccTLD).

~~~
pfg
I think we're talking about slightly different scenarios. HTTP-01, for
example, cannot be solved by just echoing back the file name the validation
server requests because the client is supposed to return "token || '.' ||
base64(JWK_Thumbprint(accountKey))", but the file name is just "token".
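
To make the point concrete, here is a minimal Python sketch of how the expected http-01 response differs from the requested file name. It assumes the thumbprint is computed as in RFC 7638 (SHA-256 over the required JWK members, sorted, without whitespace) and base64url-encoded without padding. The function names and the placeholder JWK used with them are illustrative only; the dns-01 record is included for comparison.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """Base64url without trailing '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    # RFC 7638: hash only the required members, keys sorted, no whitespace.
    required = {"RSA": ("e", "kty", "n"), "EC": ("crv", "kty", "x", "y")}[jwk["kty"]]
    canonical = json.dumps(
        {k: jwk[k] for k in required}, sort_keys=True, separators=(",", ":")
    )
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, account_jwk: dict) -> str:
    return f"{token}.{jwk_thumbprint(account_jwk)}"


def http01(domain: str, token: str, account_jwk: dict) -> tuple[str, str]:
    """Return (URL the CA fetches, body the server must return)."""
    url = f"http://{domain}/.well-known/acme-challenge/{token}"
    return url, key_authorization(token, account_jwk)


def dns01(domain: str, token: str, account_jwk: dict) -> tuple[str, str]:
    """Return (TXT record name, TXT record value)."""
    digest = hashlib.sha256(key_authorization(token, account_jwk).encode("ascii")).digest()
    return f"_acme-challenge.{domain}", b64url(digest)
```

The URL only ever contains the token, while the body also depends on the account key, so a server that echoes the requested name cannot produce a valid response.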

dns-01 is not affected either because the requested label is always just
"_acme-challenge.<FQDN>".

~~~
regecks
Any links to discussions on this topic? Sounds suspiciously like SNI proxies.

~~~
pfg
I'm not aware of any public discussion of the ongoing incident. This[1] is the
thread on the ACME WG mailing list that led to tls-sni-02 being introduced.

[1]: [https://mailarchive.ietf.org/arch/msg/acme/s8gaZ6ev-
iqoSQjOZ...](https://mailarchive.ietf.org/arch/msg/acme/s8gaZ6ev-
iqoSQjOZWUpZ41mA0M)

------
mholt
Just wanted to jump on this for Caddy users [1]:

> _Until further notice, when starting Caddy, we recommend using the
> '-disable-tls-sni-challenge' flag. This will require either HTTP or DNS
> challenges to be functional in order to renew your certificates._

By default, Caddy randomly chooses either the HTTP or TLS-SNI challenge to
obtain and renew certificates. Your sites will likely not go offline even if
you do not use this flag because Caddy tries up to 2 times per day, 30 days
out, to renew an expiring certificate, as long as you keep it running. The
chance that it would choose TLS-SNI sixty times in a row is extremely low.
(We -- meaning myself and many people who contributed their feedback and code
-- thought about these kinds of scenarios and Caddy is prepared to handle
them.) However, since the TLS-SNI challenge will fail 100% of the time while
it is disabled on the server end, might as well have the client not even try
it.
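
As a rough check of the "sixty times in a row" figure: 2 attempts per day over 30 days gives 60 attempts. The sketch below assumes each attempt picks HTTP or TLS-SNI independently with probability 1/2, which is an assumption about how "randomly chooses" works, not something stated above.

```python
from fractions import Fraction

ATTEMPTS_PER_DAY = 2
RENEWAL_WINDOW_DAYS = 30
# Assumption: a fair, independent coin flip between HTTP and TLS-SNI per attempt.
P_TLS_SNI = Fraction(1, 2)

attempts = ATTEMPTS_PER_DAY * RENEWAL_WINDOW_DAYS
p_all_tls_sni = P_TLS_SNI ** attempts

print(attempts)              # number of renewal attempts in the window
print(float(p_all_tls_sni))  # probability that every attempt picks TLS-SNI
```

Under that assumption the probability is 1 in 2^60, on the order of 10^-18.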

Also note that all certificate maintenance routines are logged to the process
log, so be sure you always run Caddy with the '-log' flag in production so you
can see what's going on.

Since this outage may be temporary, check back later about re-enabling it. I
recommend having more than one way to perform verifications when possible.
(For Go programmers, the xenolf/lego library [2] supports all verification
methods -- and is being upgraded for ACMEv2 currently; Sebastian is doing an
awesome job! It also supports numerous DNS providers for easy setup of the DNS
challenge.)

One more thing: wait for a full report from Let's Encrypt rather than
speculating. Most questions can't be answered until there's more information.
I don't think there's anything you need to do, no alarms to raise... just use
another verification method until we get more info.

[1]:
[https://twitter.com/caddyserver/status/950926718004428800](https://twitter.com/caddyserver/status/950926718004428800)

[2]: [https://github.com/xenolf/lego](https://github.com/xenolf/lego)

~~~
benatkin
Cool. Now's a good time to remind everyone that Caddy became popular as a
fully open source web server and now charges for commercial use of its binary.
[https://caddyserver.com/products/licenses](https://caddyserver.com/products/licenses)
However, there's little technically in place to prevent people from using the
free version for commercial products. The result of this is that there's
probably a bunch of people who downloaded Caddy when it was free for
commercial use, who probably just upgraded it, and are now violating the
license. A sticky situation. Better to just use nginx or apache (actually
apache compares favorably to nginx when it's used properly) until another
thing like caddy comes along that's free.

Ironically it's quite a similar tax to $10 SSL certificates, only it's a
higher upfront cost, of $25. If people were willing to pay $25 for every HTTP
server, HTTPS everywhere could have started growing quickly without
LetsEncrypt.

~~~
hyperpower
Though not relevant to the bulk of your comment, could you elaborate or
provide some relevant links on why apache compares favorably to nginx when
used properly?

~~~
benatkin
Apache, like nginx, is powerful and well-maintained. It has a longer history
than nginx, though, and has to support some features that probably wouldn't
have been implemented if the project was created more recently. One such
feature is .htaccess, which makes it so an app's directory, belonging to the
app's user, can configure the web server. This is a potential attack vector if
the app's directory is writable (not an issue for configurations in /etc which
are only writable by root). This feature can be turned off by setting
AllowOverride None in /etc/apache2 (/etc/httpd on CentOS). There are other
defaults that are better in nginx than apache as well. Here's a post that has
the AllowOverride None suggestion and two others:
[https://www.jeffgeerling.com/blog/3-small-tweaks-make-
apache...](https://www.jeffgeerling.com/blog/3-small-tweaks-make-apache-fly)

------
jaas
We've now posted more details about the issue and our plans.

[https://community.letsencrypt.org/t/2018-01-09-issue-with-
tl...](https://community.letsencrypt.org/t/2018-01-09-issue-with-tls-
sni-01-and-shared-hosting-infrastructure/49996)

~~~
cesarb
A suggestion: if the solution takes too long to develop (long enough that
certificates risk expiring before being renewed), could you at least allow
renewals of existing LE certificates if they are from the same LE account
(perhaps only if they are still at the same IP address)?

------
sk5t
For certbot-nginx plugin users, I've had success with --webroot
authentication:

For nginx:

    location ^~ /.well-known/acme-challenge/ {
        default_type "text/plain";
        root /home/www/letsencrypt;
    }

Then, for SELinux: chcon -Rt httpd_sys_content_t /home/www

Reload nginx and 'certbot renew --webroot -w /home/www/letsencrypt' has a
fighting chance.
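
Before running a renewal, it can help to confirm that files dropped into the webroot are actually reachable over plain HTTP. The following is a small Python sketch of such a check, not part of certbot; `write_probe` and `probe_is_served` are hypothetical helper names, and the webroot path and base URL are whatever your own setup uses.

```python
import secrets
import urllib.request
from pathlib import Path

CHALLENGE_DIR = ".well-known/acme-challenge"


def write_probe(webroot: str) -> tuple[str, str]:
    """Create a random probe file under the challenge directory; return (name, body)."""
    name = "probe-" + secrets.token_hex(8)
    body = secrets.token_hex(16)
    target = Path(webroot) / CHALLENGE_DIR
    target.mkdir(parents=True, exist_ok=True)
    (target / name).write_text(body)
    return name, body


def probe_is_served(base_url: str, name: str, body: str, timeout: float = 5.0) -> bool:
    """Fetch the probe over HTTP and compare it with what was written."""
    url = f"{base_url.rstrip('/')}/{CHALLENGE_DIR}/{name}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode().strip() == body
    except OSError:
        # Covers connection errors, timeouts and HTTP error statuses.
        return False
```

For the configuration above, one would call `write_probe("/home/www/letsencrypt")`, pass the result to `probe_is_served("http://your-domain", name, body)`, and delete the probe file afterwards. A false result suggests a rewrite or permission problem rather than an issue on the CA side.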

~~~
zingmars
I've been doing this ever since they released certbot, but I can't really
vouch for the stability of this method. Changes to your virtual host config
can break this very easily, especially if rewrites to https are involved.

------
tialaramex
Background history that might be helpful here:

The http-01 proof of control as originally defined allowed you to use HTTPS
instead for the URL. This was never enabled in production because many bulk
web hosts had a configuration where if anyone (say Let's Encrypt) asks for
[https://not-ssl-enabled.customer1.example/blah](https://not-ssl-
enabled.customer1.example/blah) the bulk host's server will send over the
answer for [https://aaaa.ssl-enabled.customer2.example/blah](https://aaaa.ssl-
enabled.customer2.example/blah) because it just picked the alphabetically
first SSL enabled name as default instead of giving an error for no match.

The Ten Blessed Methods don't say not to do this, but Let's Encrypt did not
want a service which can be trivially exploited on common bulk hosts so they
disabled it as they've now done for tls-sni-01.

I suspect a researcher has found a configuration that similarly causes an
attacker to be able to pass tls-sni-01 for names using some shared
infrastructure such as the same CDN or same web hosting as the attacker.

[Let's Encrypt posted a follow-up to their Discuss outlining exactly the above
scenario but without the historical digression a few minutes after I wrote
this]
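
The default-host fallback described above can be illustrated with a toy lookup table. This is a sketch of the general behaviour, not any particular web server's logic: the hostnames are the ones from the example, while the customer labels and function names are made up for illustration.

```python
def lenient_lookup(vhosts: dict[str, str], requested: str) -> str:
    """Return the owner of the matching TLS vhost, or of the alphabetically
    first configured name when nothing matches (the problematic behaviour)."""
    if requested in vhosts:
        return vhosts[requested]
    return vhosts[min(vhosts)]


def strict_lookup(vhosts: dict[str, str], requested: str) -> str:
    """Return the owner of the matching TLS vhost, or fail when nothing matches."""
    if requested not in vhosts:
        raise LookupError(f"no TLS virtual host for {requested!r}")
    return vhosts[requested]


# Only customer2 has TLS enabled on this shared host.
tls_vhosts = {"aaaa.ssl-enabled.customer2.example": "customer2"}

# A validation request for customer1's name is answered by customer2's site.
print(lenient_lookup(tls_vhosts, "not-ssl-enabled.customer1.example"))
```

With the lenient lookup, whoever controls the first TLS-enabled name on the host answers validation requests for every other customer that has no TLS vhost of their own.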

------
icing
To the badass people who get their LE certificates with Apache mod_md: you
have chosen well!

mod_md checks the challenge list from the ACME server and chooses one that it
supports. So, if your server listens on port 80, everything will continue to
work. You do not need to change anything.
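
A sketch of that selection step in Python, assuming the ACME server simply stops listing tls-sni-01 among the offered challenges. The dictionaries are trimmed-down stand-ins for ACME challenge objects, and `pick_challenge` is a hypothetical helper, not mod_md's code.

```python
from typing import Optional


def pick_challenge(offered: list[dict], preference: list[str]) -> Optional[dict]:
    """Return the first offered challenge whose type the client supports,
    following the client's own preference order; None if nothing matches."""
    by_type = {challenge["type"]: challenge for challenge in offered}
    for challenge_type in preference:
        if challenge_type in by_type:
            return by_type[challenge_type]
    return None


offered_now = [{"type": "http-01"}, {"type": "dns-01"}]

# A client that supports both methods falls through to http-01.
print(pick_challenge(offered_now, ["tls-sni-01", "http-01"]))

# A client that only knows tls-sni-01 is left with nothing to try.
print(pick_challenge(offered_now, ["tls-sni-01"]))
```

Clients that support more than one method degrade gracefully, while clients that only implement tls-sni-01 cannot complete validation at all.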

If your server is only reachable via port 443, there currently seems to be no
way you can sign up with Let's Encrypt. You will need to open port 80 for
certificate renewal/signup to work. Some advice:

* port 80 needs to be available only during a renewal/signup. Once you have your certificates, you may close it again. You need to mind renewal periods then and should check your server logs more frequently.

* you can safely redirect your port 80 to 443 with the 'MDRequireHttps' configuration directive. This redirection automatically takes care that challenges from an ACME server are still answered while all other requests are redirected.
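
The redirect-except-challenges behaviour in the second point can be sketched as a small routing function. This only illustrates the idea in Python; it is not how mod_md is implemented, and the function name, the 302 status code, and the return shape are assumptions made for the example.

```python
ACME_PREFIX = "/.well-known/acme-challenge/"


def handle_port_80(host: str, path: str) -> tuple[int, str]:
    """Decide what to do with a plain-HTTP request.

    Returns (status, detail): challenge requests are answered locally,
    everything else is redirected to the HTTPS version of the same URL.
    """
    if path.startswith(ACME_PREFIX):
        return 200, "serve challenge response"
    return 302, f"https://{host}{path}"


print(handle_port_80("example.com", "/.well-known/acme-challenge/some-token"))
print(handle_port_80("example.com", "/index.html"))
```

The same ordering matters for hand-written rewrite rules: the challenge path has to be matched before any blanket redirect to HTTPS.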

In case you find issues or have additional questions, visit the github
repository at
[https://github.com/icing/mod_md](https://github.com/icing/mod_md) and file an
issue.

------
lunaru
The shutdown of tls-sni-01 doesn't affect the http-01 challenge, so the
workaround is to switch your code over to the latter if this is affecting you.

We're using Greenlock ([https://github.com/Daplie/node-
greenlock](https://github.com/Daplie/node-greenlock), previously node-
letsencrypt via npm) for our app
([https://Clearalias.com](https://Clearalias.com)) and this library supports
switching challenges fairly easily. It's even easier if you're just using an
Express server, since you can use a Node library like Greenlock-express
([https://github.com/Daplie/greenlock-
express](https://github.com/Daplie/greenlock-express), previously known as
letsencrypt-express), which makes it dead simple to use http-01.

Best of luck to anyone who's scrambling to fix their cert layer right now. It
seems like there's a chance the TLS-SNI challenge stays disabled, so it's best
not to hold your breath and instead quickly switch to a different challenge
mode if you get a chance.

------
lawl
I'm actually glad they openly admit there's an issue when there's an issue.
Waiting for the full report.

------
jchw
That's a pretty bad blow... A lot of Go software relies on the TLS-SNI-01
challenge, I believe. Will TLS-SNI-02 be a viable replacement? What should be
done about servers currently using TLS-SNI-01?

~~~
regecks
In particular, autocert
([https://godoc.org/golang.org/x/crypto/acme/autocert](https://godoc.org/golang.org/x/crypto/acme/autocert))
relies on tls-sni and does not support the other DV methods.

Hopefully it is not a fatal flaw in the design of the method, otherwise a lot of
software will need to be re-built.

~~~
enneff
I'm not an expert, but the autocert package appears to support both tls-sni-01
and tls-sni-02.

The title of this HN post says "tls-sni-01" is disabled, but on the linked
page it says "tls-sni challenge disabled".

So I'm confused. Is tls-sni as a whole disabled? Or just tls-sni-01? If the
former, then I don't think package autocert will continue to function. If the
latter, then autocert users should be okay.

~~~
pfg
tls-sni-02 is not supported on the production ACME server. It is part of the
latest ACME draft (ACME v2), which recently got deployed on Let's Encrypt's
staging server, but the certificates signed in that environment aren't
publicly trusted.

~~~
enneff
Thanks. So the upshot is that package x/crypto/acme/autocert can no longer
obtain production certs. I have bumped this issue, volunteering to do the work
to add http-01 support to package autocert:
[https://github.com/golang/go/issues/21890](https://github.com/golang/go/issues/21890)

------
ams6110
Looks like Ted Unangst was right?

[https://archive.is/VZrOS](https://archive.is/VZrOS)

------
rconti
I'm pretty annoyed, because when I first started using Let's Encrypt I was
hamstrung by their restrictions on the various "automated" methods of
deploying and renewing certs. I went with tls-sni because it was the least-bad
method for my use case.

I listened with an open mind to their justification of the extremely short 90
day max cert length period in the "automate all the things" world. The
sysadmin in me was skeptical, even though I have also fought the "crap, we
haven't renewed this cert in years and nobody knows how to do it anymore!"
emergencies over the years, and understand how renewing frequently could, at
least in theory, replace that problem with a lesser problem.

But, turns out my skepticism was justified. Thankfully I'm not using it in
production yet, but too often these new projects and paradigms suffer too much
from "what could possibly go wrong?" thinking, and you have to follow all the
right forums and keep all your configurations in mind to know when a problem
like this will bite your infrastructure. I only stumbled across this randomly
while checking my Let's Encrypt community account for something else.

Now, granted, this just happened yesterday, but I missed the HN thread on it
when it happened, which means I could well have missed it until it's too late
and a bunch of certs expire. Then it's a scramble to fix your certs AND fix
your automation all at once.

------
SureshG
Nice, acme4j v2 (Java client) already disabled this -
[https://github.com/shred/acme4j/blob/master/README.md#known-...](https://github.com/shred/acme4j/blob/master/README.md#known-
issues)

------
komuW
Last year I made a letsencrypt client[1] that only supports DNS mode of
validation.

It currently only supports cloudflare and AuroraDNS, but it is very easy to
use any other DNS provider[2]

1\. [https://github.com/komuw/sewer](https://github.com/komuw/sewer)

2\. [https://github.com/komuw/sewer#how-to-use-a-
customunsupporte...](https://github.com/komuw/sewer#how-to-use-a-
customunsupported-dns-provider)

------
jcassee
Unfortunately, this means that Traefik's default Let's Encrypt integration
(without setting a DNS provider) does not work anymore. Although the logs now
say "could not find solver for: http-01", they actually use tls-sni-01.

------
Buge
Are there any servers that automatically generate self signed certs for any
SNI they receive? I think servers like that (if they exist) would also be
vulnerable.

