Preparing to Issue 200M Certificates in 24 Hours (letsencrypt.org)
215 points by jaas on Feb 10, 2021 | 101 comments


Something I would like the community to do is force vendors such as Azure and AWS to support the ACME protocol, to provide the option of free certificates to users. HTTPS isn't an add-on any more, and it shouldn't come with a yearly tax.

Unfortunately, a yearly tax has a margin, so the vendors have been dragging their feet. DigiCert and GoDaddy have bribed the Azure team with kickbacks and wholesale discounts, so Azure is now refusing to implement anything else. Why would they? They get a margin on every issued certificate!

So please: If you have a presence in AWS or Azure, call your sales representative and pressure them. Submit requests through the feedback portal, and vote up existing requests.

Raise this regularly, or it will never happen by itself. The incentives just aren't there!

Pressure must be applied by a large fraction of the customers.

You are that customer.

Apply pressure.


AWS already provides free certificates. ACM doesn't use ACME, but it does support auto-renewal if the hostname has been DNS-validated.

Initially these certificates were only supported on load balancers and CloudFront, but with Nitro Enclaves you can now use the certificate on an EC2 instance as well.
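For anyone who hasn't used it: requesting a DNS-validated certificate is a couple of AWS CLI calls (example.com is a placeholder; you publish the CNAME that ACM hands back, and renewal is automatic from then on):

    aws acm request-certificate \
        --domain-name example.com \
        --validation-method DNS

    # Returns the CNAME record to publish for validation
    aws acm describe-certificate --certificate-arn <arn-from-above>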


While on GCP, you can self-host cert-manager, and its certificates can be used on load balancers and VMs.


FWIW, Azure has started rolling out free certificates for App Service - they have some kinda annoying limitations, but it's a start and some of them are planned to be lifted: https://docs.microsoft.com/en-us/azure/app-service/configure...


By "kinda annoying" you mean ludicrous and show-stopping, right?

    Does not support wildcard certificates.
    Does not support naked domains.
    Is not exportable.
    Is not supported on App Service Environment (ASE)
    Does not support A records. For example, automatic renewal doesn't work with A records.
There are probably several other restrictions not listed there. I recently tried to buy a certificate for App Service for a government agency, but it was refused because of a design flaw in GoDaddy's validation code.

Not to mention that this doesn't cover other very common scenarios, such as Application Gateway, VM scale sets, API Gateway, or... anything else.

Each Azure team seems to be operating under the model that HTTPS with a custom domain is some sort of bolt-on that's unique and special to their service. The verification and enrolment are distinct for every service, with gaps and weird and wonderful limitations.

It's like they've been told, recently, that HTTPS is something they should do, so they've all gone and done "something" to tick that checkbox. Some free. Some not free. Some with restrictions. Some without. Some automatically renewing. Some not. Some with ECC, some without. HSTS on some, not others. Etc...

It's a shit show.


it's Azure, what can I say ¯\_(ツ)_/¯


If you wanted to save money, you wouldn't use Azure or AWS anyways.


Which reputable vendors would you recommend that have a full PaaS offering for ASP.NET and have a significant presence in Australia?

As far as I know Azure's App Service has no direct competition other than AWS and GCP.


What would you use then?


Linode, DigitalOcean, or a $200 used server from eBay. Both Linode and DO also have really good Kubernetes services. AWS and the like force users to stick to their ecosystem of products, and come with all sorts of weird pricing models. And if you really need some sort of serverless features or BaaS, you can just self-host that on a VPS.


Binary Lane for Australia is fantastic.


Echoing this. Great price point; I'm on a BSD box and their console/backend has been flawless. And it's full dual-stack v4/v6.


> Normally ACME clients renew their certificates when one third of their lifetime is remaining, and don’t contact our servers otherwise.

At least newer versions of Certbot, and I believe some other ACME clients, will also check whether the certificate has been revoked when considering it. So if you have a daily cron job running Certbot, and your certificate with 60 days left on it was revoked yesterday, Certbot ought to notice that and attempt to replace it as if it had expired.
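A minimal sketch of that cron setup, assuming your distro doesn't already ship a timer (certbot renew is a no-op unless a cert is close to expiry or, in newer versions, revoked):

    # /etc/cron.d/certbot -- twice daily, as the Certbot docs suggest
    0 */12 * * * root certbot renew --quiet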

If you are doing OCSP stapling, and if your stapling implementation is good (sadly last I looked neither Apache nor nginx were) this ought to be enough to make a mass revocation event survivable for you. Your server will notice the latest OCSP answers now say it's revoked and continue to hand out the last GOOD answer it knew, some time later before that OCSP answer expires your Certbot should replace the certificate with a good one. Seamless.
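For reference, turning stapling on in nginx is just the following (though, per the criticism above, nginx fetches the OCSP response lazily and won't keep serving the last good answer; paths are examples):

    ssl_stapling on;
    ssl_stapling_verify on;
    ssl_trusted_certificate /etc/letsencrypt/live/example.com/chain.pem;
    resolver 1.1.1.1;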

The new ACME feature is welcome, not least because there are a tremendous number of those bad Apache servers out there, but (unless I misunderstand) I think it's already possible to survive this sort of catastrophe without service interruption.


The new ACME feature isn't just about surviving the revocation event itself. Suppose that the new API didn't exist, but every client polled on a daily basis to check to see if their cert was revoked. Then great -- within 24 hours, every server gets the new replacement certificate.

And then 60 days later, every single client tries to renew that certificate. That's another 200 million certs in 24 hours. And that'll repeat every 60 days.

So the ACME draft is also about being able to pro-actively smooth out that renewal spike. Some clients would be told to renew again immediately, less than 24 hours after their replacement. Others would be told to wait the whole 60 days. And then after a couple months of managing that, things would be back to normal.
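A toy sketch of that server-side smoothing (the function name and parameters are made up, not from the draft): instead of every client coming back at the two-thirds mark, each one gets a renewal time spread uniformly across the window.

    import random
    from datetime import datetime, timedelta

    def assign_renewal_time(issued_at: datetime, lifetime_days: int = 90) -> datetime:
        # Normal behaviour: renew when 1/3 of the lifetime remains,
        # i.e. at day 60 for a 90-day cert. After a mass reissuance,
        # spread clients uniformly between "immediately" and day 60
        # so the spike decays instead of recurring every 60 days.
        offset = random.uniform(0, lifetime_days * 2 / 3)
        return issued_at + timedelta(days=offset)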


FWIW Apache has a new stapling implementation that is not suffering from all the major problems the old one did. Can be activated with "MDStapling on".
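For anyone wanting to try it, a minimal config sketch (mod_md ships with recent httpd 2.4 releases; the module path varies by distro):

    LoadModule md_module modules/mod_md.so
    MDStapling on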


Thanks, that's good to know. I also read their documentation explaining why they couldn't (or at least didn't) fix the old one. In future I will try to publicise this rather than simply saying Apache httpd's OCSP stapling is garbage.

It explicitly mentions two big flaws with the old one, but not the one most relevant here: out of the box, Apache's old OCSP stapling would merrily staple BAD answers simply because they're newer, which makes no sense. I assume that's corrected, but if you know, this would be a good place to say so.


Shouldn't that be the default config option then?


Oh single points of failure, where art thou...

I hope everyone realises that Let's Encrypt is by now an essential part of the Internet, somewhat like DNS is, just massively centralised.


If Let's Encrypt went away tomorrow, I'd Google "free SSL certificate" and find somebody else. Worst case, I'd either go back to the old way and buy a cheap certificate or go without SSL until a new free option comes along.


The hits for "free SSL certificate" certainly won't be able to issue 200M certs in a day.


AWS ACM is free-as-in-complimentary-peanuts, and I'd bet they could handle a pretty big chunk, although 2000rps for a request that's calling external services and cryptographically signing things is a bit intimidating.


That's only viable if you're using AWS's load balancers or CDN, though.


You can also use ACM certificates directly on Nitro-powered EC2 instances:

https://docs.aws.amazon.com/enclaves/latest/user/nitro-encla...


Honest question (I know just enough about SSL to be dangerous): Does certificate pinning throw a wrench in this plan?


Before choosing to use pinning you should have planned for this situation. Rather than write it out here, I'll link a post I wrote in a Let's Encrypt forum in response to someone who'd just blown up their system due to a different pinning mistake.

https://community.letsencrypt.org/t/certificate-failure-due-...


If proper revoking procedures are being followed, no.

You can start reading more info here: https://developer.mozilla.org/en-US/docs/Web/Security/Certif...

(HPKP is considered obsolete, if by cert pinning you meant that. If you're confusing HSTS with HPKP, just know that HSTS makes it much harder to mistakenly access the site via HTTP, while HPKP [now deprecated] was the practice of ensuring certain hashes are found in the certificate chain the server sends, to mitigate some attacks.)
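To make the distinction concrete, these are the two response headers (HPKP shown for historical interest only; the pin values are placeholders):

    Strict-Transport-Security: max-age=31536000; includeSubDomains
    Public-Key-Pins: pin-sha256="primaryKeyHashBase64="; pin-sha256="backupKeyHashBase64="; max-age=5184000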


Please don't enforce certificate pinning in user-facing software, at least without an opt-out. It becomes impossible to inspect my own traffic.


It also leads to a ton of clients who freak out when their app suddenly goes offline because they forgot to update the pinned cert. Seen that one countless times.

Pinning has uses, but if I'm running an app, I'm not doing pinning unless I have to.

Pinning to the issuing or root is much safer from an availability standpoint.
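On Android, for example, the network security config lets you pin the CA's key rather than the leaf's; a sketch (domain and hashes are placeholders, and a backup pin is required):

    <!-- res/xml/network_security_config.xml -->
    <network-security-config>
      <domain-config>
        <domain>example.com</domain>
        <pin-set expiration="2022-01-01">
          <pin digest="SHA-256">caSpkiHashBase64=</pin>
          <pin digest="SHA-256">backupSpkiHashBase64=</pin>
        </pin-set>
      </domain-config>
    </network-security-config>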


At this point, I don't think certificate pinning in the general internet/web environment is a thing.


Agreed. I MITM all my own traffic and I only have to exclude a handful of domains or A records on domains to not MITM.


I thought about doing this on my own network (for caching purposes primarily) but I'm concerned about breaking the "end to end" guarantee of TLS.

Essentially, now a separate box on the network has full access to my TLS traffic and if compromised can silently intercept it (and it would be too late by the time I notice).

How do you deal with/rationalize this concern?


I control the box doing the decryption, so it is still my system. I can tell Squid which ciphers are appropriate and whether or not I want to validate the chain or accept anything. All of those controls still exist. But yes, I would be concerned as you are if I were using a proprietary proxy like Bluecoat or Websense.


Honestly curious, why do you MITM all your own traffic? For detailed logging?


Logging, ACLs for some MIME types, overriding cache controls, a shared cache for multiple devices, and blocking some sites that I can't block using DNS.


> shared cache for multiple devices

Oh, that’s a neat one and would be very valuable for me. We have poor internet access at home, and it would be cool to reduce traffic going out to the net.

Thanks for the reply, appreciate it!


No problem. You can find examples of how to set up Squid-SSL-Bump or I can provide examples if you can't find any.
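For reference, the core of a Squid 4+ SSL-Bump setup looks roughly like this (paths vary by distro, and the CA cert has to be installed as trusted on every client device):

    http_port 3128 ssl-bump cert=/etc/squid/ssl/myCA.pem generate-host-certificates=on
    sslcrtd_program /usr/lib/squid/security_file_certgen -s /var/spool/squid/ssl_db -M 4MB
    acl step1 at_step SslBump1
    ssl_bump peek step1
    ssl_bump bump all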


Just manually encrypt everything and you're good lol


The TLS CA system is one of those places where a single point of failure is actually better than many distributed ones. That is the case because any failure anywhere compromises the entire system.

Of course, the other failure points haven't completely gone away yet. But I do expect their number to shrink a lot in the future.


But it's a single point of failure that has about 60 days to come back online (if all actors are working fairly normally) from a full crash.


Quite. I'd hate to test it but 60 days is long enough to deploy a new CA and push the cert out to trust stores.

In an emergency.

It would take quite some coordination, but I suspect the current pandemic might provide some models and examples as to how to do things at scale with JIT decision-making from wonks who normally drag their feet by default.


They aren't the only no-cost distributor, so their downtime doesn't really matter so long as it's less than a month long.

What exactly is the issue with centralization here?


How would Netlify feel about not being able to issue or renew any certs for any sites for a month? Plenty of platforms rely on LE exclusively for one-click/automatic HTTPS for their customers sites.


Let's Encrypt isn't the only provider supporting ACME either. Sectigo (under the ZeroSSL name) is a notable alternative.


The thing is that we're not talking about downtime here, but rather about the CA being compromised, which would mean that pretty much ANY website could be impersonated, as the attacker could issue a valid Let's Encrypt certificate for it. That is mitigated by distrusting this CA, but that also invalidates all legitimate certificates previously issued by it. So they would need to reissue them all.


Let's Encrypt publishes Certificate Transparency logs: https://letsencrypt.org/docs/ct-logs/

You can both block certs that do not appear in the logs, and decide which certs not to trust ("everything after Friday the 13th at midnight is not trusted"), once you know the date/time of the intrusion.
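You can also audit what the logs contain for your own domains via a CT search frontend, e.g. crt.sh (the %25 is a URL-encoded % wildcard):

    curl -s 'https://crt.sh/?q=%25.example.com&output=json'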


Chrome already blocks certs not appearing in CT logs, at least for certs issued in 2018 or later.


Hmm, the logs are a valid point. What scenario are we addressing here, then?

But the issuance time isn't relevant; they can easily backdate the cert.


> They aren't the only no-cost distributor

I couldn't name a second, which means they're probably not getting a huge %.


This is why they created the ACME protocol. There are actually other providers of this protocol now. So there is a possible future that is much more like DNS.


A lesson in recognizing, supporting, and defending public goods.


Agreed.

I'd love to see, say, another charitable CA set up in a different jurisdiction[0] that can hold a similar reputation to the ISRG.

[0] Probably somewhere in Europe, whether that's in the EU or elsewhere like Switzerland or the UK. What matters most is that it's somewhere with strong commitment to the Rule of Law and not beholden to the US.


Sounds like they have read-replicas, why can’t they fail over to a read replica as the new master?


Always nice to see some private companies stepping up. Specifically Cisco, Luna, Thales, and Fortinet. I'm sure there are a bunch of others that donate their resources to Let's Encrypt.


This seems like a really fun project. For some reason it makes me happy that someone has a good reason to run their own servers and networking, rather than rent everything from cloud providers.


>There is no viable hardware RAID for NVME, so we’ve switched to ZFS to provide the data protection we need.

This is Linux, right? Would this be the largest deployment of ZFS-on-Linux, then?


Not by a long shot. I just assembled two servers with 168 12TB drives each, giving a bit over 1.5PB available space on each server. And I'm pretty confident that this is also not the largest ZFS-on-Linux deployment either.


How do you fit 168 hard drives into a single computer?


A couple of 84-drive 5U rack-mount enclosures, attached to each server with multi-link SAS. It's a fairly off-the-shelf system.


Damn, what is the use case for that?


Porn.


I don’t see why anyone would ever want to use hardware RAID. It invariably leads to the day when your hardware is busted, there’s no replacement parts, and you can’t read your volumes from any other machine. Use the kernel RAID and you can always rip out disks, replace them, or just boot off a USB stick.


Because of performance, especially regarding being able to use a battery-backed write-back cache on the controller to give a "safe in the event of powerfailure" confirmation to the application before it actually hits disk/flash.

The "can't read from any other machine" is handled by making sure (this includes testing) that the volumes are readable with dmraid. At least that's for SAS/SATA applications. I'm not sure about NVMe, as it uses different paths in the IO subsystem.


> Because of performance, especially regarding being able to use a battery-backed write-back cache on the controller to give a "safe in the event of powerfailure" confirmation to the application before it actually hits disk/flash.

Is this not easily mitigated with a smart UPS? (i.e., one that will notify the host when the battery is low so it can shut down cleanly)


Totally agree and I'll go one further, I don't want to use RAID at all in a non professional context. Maybe I'm too simplistic but for my personal stuff I don't use RAID, LVM or anything beyond plain ext4 file systems on whole disks. For redundancy I use rsync at whatever frequency makes sense to another disk of the same size. I've run like this for 10 years and replaced many disks without losing data. The time I ran soft RAID I lost the whole array because one disk failed and a SATA error happened at the same time.
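A minimal sketch of that rsync approach (paths are examples; --delete makes it a true mirror, so a mistake propagates on the next run, which is why it pairs well with a third, offline copy):

    rsync -a --delete /home/ /mnt/backup/home/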


LVM is very nice because it eliminates that problem where you've got an almost full 2TB disk and you bought another 2TB disk and now you need to figure out what moves where. With LVM you just say nah, that's just 2TB more space for my data, let the machine figure it out.
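Concretely, folding a new disk into an existing volume group is about three commands (device and volume names are examples):

    pvcreate /dev/sdb
    vgextend vg0 /dev/sdb
    lvextend -l +100%FREE --resizefs /dev/vg0/data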

I mean, if you enjoy sorting through going "OK that's photos of my kids, that goes in pile A, but these are materials from the re-mortgage application and go in pile B" then knock yourself out, but I have other things I want to do with my life, leave it to the machine to store stuff.

If you lost everything that's because you lacked backups and (repeat after me) RAID is not a backup. Everybody should get into the habit of doing backups. Like any survivalist learns, two is one and one is none.


I doubt it? 150TB of NVMe storage is big, but I've walked past racks with many orders of magnitude more in it.

(edit: units)


> 150GB of NVMe storage is big

Your age is showing :-)

Every few years, I've got to get used to the next step: 100s of MBs is big!! -> 100s of GBs is big!! -> 100s of TBs is big!!

Seems like we're entering the age of PBs, and then we'll stop caring about capacity and care more about the speed of our TB+ sized archives.


It's TB though, per unit.


For Linux ZFS? I'm specifically asking about ZFS-on-Linux.


A decade ago my backup cluster had >100TB of ZFS on Linux. I mean, that predated ZoL, so it was using ZFS-fuse, but...


If there's one thing that always surprises me about the internet, it's that vertical scaling (bigger/faster machines, as opposed to horizontal scaling) can take a well-written service to "Internet Scale".


> the really interesting thing about these machines is that the EPYC CPUs provide 128 PCIe4 lanes each

Not really. In a single socket setup, an EPYC gives you 128 lanes. In a dual socket setup, 64 lanes from each CPU are repurposed to connect them together instead of doing PCIe. So just like single socket, you end up with 128 lanes total.



On a side note, it's always nice to see them include things like internal networking and server hardware specs. It shows how scalable they are and how they deal with large amounts of data. I always enjoy reading these.


They discussed a lot about their own internal bottlenecks. I'm wondering about the external bottlenecks they might encounter, such as the requirement to send all certificate requests to CT logs prior to issuing a certificate. Could it be that the amount of data and requests per second sent to external entities to fulfil CT obligations is deemed negligible or already manageable?


They also run their own CT log called Oak: https://letsencrypt.org/docs/ct-logs/


Am I the only one having flashbacks to Rainbows End (Vernor Vinge)?


The events in Rainbows End depend upon revocation being effective, in particular revocation of a root CA, and very rapidly.

For one thing, "revoking" a root CA isn't a thing, the root's signature is mostly a convenience (it's self-signed) and perhaps in another world roots would exist as distinct documents not as X509 certificates signed by themselves. So there isn't anybody to "revoke" it like other certificates. You can distrust them though.

In the real world lots of systems would never become aware of the revocation/ distrust at all, and there are gatekeepers for many other systems that could become aware (e.g your copy of Chrome or Firefox can learn that a root CA is no longer trusted but it would not do so without humans at Google or Mozilla deciding this was a necessary course of action).

It's necessary to the plot of Rainbows End that this happens unrealistically quickly, Rabbit must be disabled or it will certainly interfere with subsequent events, and it seems Vinge (unlike me) isn't sure exactly what Rabbit is, so this vague technical intervention seems like an effective way to stall Rabbit without thinking too hard about that question.


You don't revoke the root cert, no. You revoke a cert that signed a lot of certs, which is often one degree of separation from the root cert. I don't recall if he stated it wrong and I glossed over it because I knew what he meant, or the character was dumbing it down for the rest of them.

Revoking a signing cert, breaking the cert chain, would indeed make a mess for everyone using those certs.


OK, consider revoking, since we're talking about them, Let's Encrypt's R3 issuer. In a sense only they (ISRG, the organisation behind Let's Encrypt) can do that, since the revocation would need to come in the form of an updated CRL (Certificate Revocation List) from ISRG's root. So you'll be making a late night phone call to key people from ISRG to demand (persuade?) an immediate revocation.

How often do you suppose most systems examine that CRL?

If you guessed anything other than "never" you're wrong. Almost nothing you use will ever notice.

But that's not enough anyway - R3 is also trusted via a cross signature from Identrust's DST Root CA X3. So their organisation also needs to be woken and persuaded to perform an unprecedented middle-of-the-night revocation of some certificate they've seen no evidence is a problem.

Now, you've disturbed all these nice people, with your very urgent problem of... you want to stop something you can't discuss happening that involves some confidential things and a bunch of other confidential things and they must never speak of it to anybody - in the morning they're going to be doorstopped by a thousand tech journalists wondering why they broke everything - or they could just hang up. But we'll suppose you did that, as I said it doesn't have any effect. Oops. Rabbit will notice, no doubt, but it isn't disabled.

OK, so what can you do that will actually have some impact? Well, you can get Microsoft, Google and Mozilla to use their out-of-band "kill switch" functionality. Each works differently. Microsoft's has the advantage that it's entirely a closed door process, if you can get the ear of the right person you can make any change you want. Unfortunately the latency is one calendar month. Rabbit is causing you a problem today? By April we can fix that. Oh, you need something sooner? Too bad.

Google are more promising, they can tell most Chrome installs to distrust R3 without any independent confirmation and the updates will typically take only a day or so to have effect. Rabbit will be out of your hair before tomorrow's evening news can run the story about your resignation or arrest. They will probably tell everybody why though...

Mozilla likewise can react quickly, perhaps in just a single day. However unfortunately Mozilla deliberately makes these decisions in public. You're going to need to tell a bunch of random people who don't even work for Mozilla about your urgent need to shut down this issuing CA. They're going to have questions. You probably don't even want to read the questions, never mind answer them. Oh dear.


If I'm installing an app, running a credit card transaction, or logging into my brokerage, nobody is checking the CRL, because they're running OCSP to check the cert chain. That's what my code did. That's what a coworker's code did, and that's what (I'm told) browser TLS implementations do.

If a countersigned certificate is revoked by one signer, is it still safe to use? I hope your answer is 'no'. Otherwise, as you say, I can revoke the certificate in Europe but have to wait for the Americans to get out of bed.

Why did they include both certs in the chain if everyone trusts the same CA? They include them because people cherry-pick. My app is likely not checking both roots, but picking the one I trust and seeing what it says, in many if not all implementations. Because I'm looking for a chain that I understand and checking whether it's valid, not one of the chains I don't understand.

DOS attacks are not an all or nothing thing. Often it serves the attacker's motives if only most of the traffic is blocked, or specific traffic is unlikely to succeed. Taking out a particular person or "just" 75% of users could be desirable.


> that's what (I'm told) browser TLS implementations do.

Unlike "your code" and that of your co-worker who ever that might be, I can tell you exactly what browsers do and you're wrong, though I haven't any idea who (if anybody) "told" you that they check OCSP to "check the cert chain".

Chrome doesn't have OCSP fetching code. It uses CRLsets, controlled by Google, which I described previously.

Safari does do OCSP... if Apple's backend tells the client this certificate was revoked. Why then? Ask Apple, makes very little sense to me, but that's their policy. Again you'll need to liase with Apple HQ to figure out how to get their backend to report all these certificates revoked or nothing happens.

So in the modern world that leaves Firefox, which does have OCSP fetching code, but it's default off and they recommend you leave it that way. Why? Because it's privacy infringing. OCSP sends the CA information about which subscriber certificates you relied on recently. Why were you relying on the certificate for Porn Hub? Exactly.

--

> If a countersigned certificate is revoked by one signer, is it still safe to use? I hope your answer is 'no'.

There's no such thing as a "countersigned certificate" in X.509 thus PKIX and the Web PKI. There can be multiple CA certificates with the same subject, which is what we call a "cross signed" CA.

There's no reason you should treat a certificate differently based on the existence of other certificates with the same subject which have been revoked, and indeed the software you use does not (on the whole) treat them differently. It's likely that you've used such trust chains especially if you're an American because a public CA (not Let's Encrypt) signed the Federal Bridge CA mistakenly at one point, which implicitly imports this huge hierarchy of other signature chains, some of which are revoked.

> Why did they including both certs in the chain if everyone trusts the same CA?

I guess you're talking about the existence of R3 signed by DST Root CA X3 and R3 signed by ISRG Root X1. But those aren't a chain, and so they aren't "including both certs in the chain". They're two certificates for the same entity, and so there are two completely different chains offered by Let's Encrypt.


I hope all that traffic passing Cisco switches is encrypted...


All of the Boulder gRPC communications use mutual TLS authentication.


Very probably I'm missing something, and I love Let's Encrypt and the service that they provide, but... the point of Let's Encrypt is to bring SSL/TLS to more websites, right (and not necessarily to provide identity verification, since the automated renewal process doesn't really require any proof of identity for the entity requesting the certificate)? Why couldn't that have been accomplished using self-signed certificates, and having browser vendors remove their big scary warning pages for sites using self-signed certificates for SSL/TLS? Do certificates from Let's Encrypt provide any security benefits over a self-signed certificate?


They verify domain ownership. If browsers accepted self-signed certificates, then as long as I could intercept your DNS requests (running a public Wi-Fi network next to a Starbucks, for example) then I could bring you to my malicious google.com without you knowing. That's no good.


>They verify domain ownership.

They verify domain control, not ownership. Verifying ownership is a much harder problem, and one I doubt could be done in an automated fashion.


The issue with browsers just allowing self-signed certs is that you can't verify their authenticity, i.e. was the cert correctly issued for the domain, or is someone impersonating the website with an invalidly issued cert? Having certs come from a recognized certificate authority helps with this because it provides a point at which the certificate can be verified for authenticity.


This makes me wonder about the feasibility of performing an attack by

  * man-in-the-middling Let's Encrypt and a particular domain (or DNS, depending on the domain validation challenge)
  * requesting a new certificate for that domain
  * spoofing the HTTP resource response (or DNS response, if applicable)
I suppose this is mitigated by the way Let's Encrypt validates the agent software's public key on first use though, at least for websites that are currently using Let's Encrypt.
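(For reference, the HTTP-01 challenge being spoofed here is just a plain-HTTP GET for a well-known path; per RFC 8555, the expected response body is the token plus a hash of the account key:)

    GET http://example.com/.well-known/acme-challenge/<token>
    -> <token>.<base64url SHA-256 thumbprint of the account public key>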


While LE is indeed vulnerable to this kind of (difficult) attack, I wanted to make the point that LE still represents, for the most part, an improvement over the previous norms in the CA industry. ACME standardizes automated domain ownership validation to a relatively small number of options that have received relatively rigorous security review (leading to one being dropped due to security concerns, for example).

In contrast, incumbent low-budget CAs have often been a bit of a wild west of automated validation methods, often based on email, that can and do fall to much simpler attacks than a large-scale on-path attack. While CA/B, Mozilla, and others have worked to improve on that situation by requiring CAs to implement more restrictive policies on how domains can be validated, ACME still represents a much better validated, higher-quality validation process than that offered by a number of major CAs for DV certificates.

One approach to decentralized or at least compromised-CA-tolerant TLS is something called "perspectives" (also implemented as "convergence"). The basic concept is that it's a common attack for someone to intercept your traffic, but it's very difficult for someone to intercept many people's traffic on the internet. So, if the TLS certificate and key you receive from a website is the same as the one that many other people have received, it is most likely genuine. If it's different, that's an indicator of a potential on-path attack. This can be implemented by establishing trust between your computer and various "notaries" which basically just check the certificate from their perspective and confirm that it matches yours.

I bring this up, because if you squint just right you can view ACME as being a method of bolting the same concept onto the existing CA infrastructure: before you connect to a website, LetsEncrypt acts as a notary by connecting to the domain from multiple perspectives and ensuring that the same person evidently controls it from all of them. While not perfect, this is a strong indicator that the person requesting the cert is legitimate.

The on-path attack risk is almost always, though not exclusively, at a late stage of the network path to the user (e.g. their local network). The big weakness of the ACME approach is an interception at a late stage of the network path to the server. This tends to be much better secured, but hey, it's still something to worry about. There is obviously also a reliance on DNS, but I would say that DNS has basically always been the most critical single link in on-path attack protection.


> man-in-the-middling Let's Encrypt and a particular domain (or DNS, depending on the domain validation challenge)

Let's Encrypt issues multiple verification requests from multiple servers in different locations, both physically and in the network topology. If you can MITM that, you've pretty much taken over the domain, and the ability to get a certificate isn't the worst of the operator's problems.


> and the ability to get a certificate isn't the worst of the operator's problems

That assumes a lot about the operator's goals and values. It may very well be their worst problem. E.g. a journalist in a dictatorship will very likely prefer not to have a cloud service at all rather than upload their data to a compromised service.

It's just that, if this is their worst problem, TLS is patently insufficient, so they must think about it when setting the system up.


Yes, this could work, and has definitely been done, sometimes, against other public CAs. We found convincing evidence of this during work at one of my previous employers.

But what tempers my concern over that finding is that we found this by looking at cases where there's clearly a DNS takeover, and actually certificate issuance was rare. In most cases it seems that if you can MitM, say, an Arab country's army headquarters' group mail servers, you can just offer no encryption or serve a self-signed certificate and the users will accept that. So while the Ten Blessed Methods are, as expected, not enough in the face of a resourceful adversary (in this case perhaps the Mossad, NSA or similar), they're also a padlock on a fence with a huge gaping hole in it anyway; our first attention should be on fixing the hole in the fence.


Let's Encrypt also mitigates this by validating from multiple vantage points, so a single man-in-the-middle is insufficient.


It usually takes less effort to exploit the server running the ACME client, or the infra around it, than to subvert the Internet infrastructure that surrounds the LE infra.

For instance, you can subvert the internet infra around some ccTLD if you're that country pretty easily but then who really owns the domain? Probably you, the country, since you can do anything with the DNS and then anything with the traffic.


>Do certificates from Let's Encrypt provide any security benefits over a self-signed certificate?

Depends. The encryption is the same and only dependent on your client/server. Could have been solved by DNS ... maybe.

Self signed certs don't validate that you at least own the domain.


Ah yes, that makes sense. Let's Encrypt requires proof of domain ownership, which at least ensures that the entity you're connecting to is the entity that owns the domain. Encryption without authentication wouldn't be very helpful, since a man-in-the-middle could just present their own self-signed certificate during the handshake...


You'd be protected from a passive attack and thus you could always (with enough effort) detect an attack. Someone who is snooping (e.g. fibre taps) is potentially undetectable (yes in theory there are quantum physics tricks you could do to detect this, but nobody much is really doing that) whereas an active attack is always potentially detectable.

So it's not nothing, but it isn't very much without the Certificate Authority role.


Those scary warning pages are an indication that someone is intercepting your traffic.

What do you think happens when you connect to a network and your phone sends a username/password to a server, but the traffic is being intercepted instead?

And the user has no idea.


Highly recommend reading a good applied-essentials guide on certs and the various methods of accomplishing SSL for self-hosted stuff.

This stuff is much more complicated in isolation; the full picture is easiest.


I'm somewhat familiar with certificate handling in general, I had just forgotten how Let's Encrypt performs domain validation; it's been a few years since I used it and it's worked so well that I haven't had to think about it since, which is probably a testament to its stability!

To be sure, PKI and certificates in particular have a lot of room for improvement in the UX department. Especially on Windows, where one frequently has to deal with not just .pem files but .cer, .pfx (with or without private keys), and more.



