It was years in the making for Firefox to be able to do "intermediate preloading" - we [0] had to make the policy changes requiring all intermediates to be disclosed, and then let that take effect [1].
I agree with the author that preloading like this shouldn't be necessary, but worse than preloading is any bug report of "Works in Chrome, not in Firefox." Prior to this preloading behavior shipping in Firefox 75, incorrectly-configured certificate chains were a major source of those kinds of bugs [2].
I find it pretty cool that you went to the length of publicly announcing it and everything for something that everyone else was doing already. That does show some integrity in the face of the "let's just make it work" crowd.
Nonetheless, as a web developer (which is less than 5% of my programming time, probably less, so I'd rather not lose my hair over adjacent stuff) this has been extremely frustrating to debug in the past. Would it be possible/make sense to show a small yellow (i), say an 8x8px icon, at the bottom right of the lock to show that something's up? And then if the user clicks "more information" on the "secure connection" panel (which I believe is deep enough to be more power-user/developer-centered) show a yellow line saying "your certificate chain's broken, dude"?
I haven't hit this issue in several years, and it's unlikely I will hit it again because I'm now systematically using Caddy as a frontend, which will do TheRightThing (or so I hope), so maybe my comment is outdated/irrelevant, in which case, feel free to ignore me.
I see nothing wrong with Firefox's behavior in this case. Root CA certificates are just certificates whose issuers paid and lobbied to get them included in the browser's certificate store, and they agreed to rules about what they can and cannot do (to maintain trustworthiness).
The root CA then just signs a certificate for the intermediate (allowing it to issue certificates), and requires the intermediate to contractually agree to the same rules... because if the intermediate violates the rules, the root CA and all its intermediates would be removed from the browser.
Other than the contractual relationship between the parties, there's no difference between an intermediate CA and a root CA. Including an intermediate certificate in the store makes it equivalent to a root certificate (which is a certificate that is trusted only because it is included in the browser's store).
The only downside to this I can see is that it creates more work to maintain the certificate store. Firefox can just ask the root CA to notify them when they issue an intermediate... it isn't like this happens every day.
> Firefox can just ask the root CA to notify them when they issue an intermediate
In fact, notifying m.d.s.policy is already required whenever such an intermediate is created:
5.3.2 "The operator of a CA certificate included in Mozilla’s root store MUST publicly disclose in the CCADB all CA certificates it issues that chain up to that CA certificate trusted in Mozilla’s root store that are technically capable of issuing working server or email certificates,"
They include CAs that users have added (I'm guessing that information is in the OS root store), not CAs that come with the OS. This is presumably so that you only need to add your custom CA once for the whole OS, not once per app.
That's not how I'm reading the release information. I don't have a computer to check this at the moment, either :(
Edit: oh, you're right! They changed the message in the release information! It was originally:
“Firefox now imports TLS trust anchors (e.g., certificates) from the operating system root store.”
It still remains a mystery to me why browsers felt they should "fix" this server misconfiguration.
It's particularly vexing to me because the main reason people end up with misconfigured servers at all is that, after they've configured their new cert (incorrectly), their web browser gives them a tick and they think they've done it right - after all, why wouldn't they?
A common way these things play out is that one browser does it, and then if the others don't copy it, they appear "broken" to users.
I don't know what happened in this case, but it is pretty easy to imagine Chrome accidentally allowing validation against certificates in its local cache. Maybe it added some sort of validation cache to avoid rechecking revocation lists, OCSP, or similar, and it would use intermediates seen on other sites. Then people tested their site in Chrome and it seemed to work, so Firefox looked broken if it didn't support this. So Firefox decided to implement it too, but do something more robust by preloading a fixed list rather than relying on whatever happens to be in a cache.
Basically no browser wants to be the first to stop supporting this hack.
This is ultimately an application of the "robustness principle", or Postel's law, which is how people built stuff in the early Internet.
Plenty of people believe these days that this was never a wise guideline to begin with (see https://www.ietf.org/archive/id/draft-iab-protocol-maintenan... which unfortunately never made it to an RFC). However, one of the problems is that once you've started accepting misconfigurations, it's hard to change your defaults.
It's Postel's Law being bad advice yet again. No, you should not be liberal in what you accept, because being liberal in what you accept causes even more malformed data to appear in the ecosystem.
For me the revelatory moment was in the mid-00s, when everyone screamed anathema at XHTML, saying it was bad because it required people to write well-formed documents, when everyone just wanted to slap down random tags and somehow have that steaming mess still work.
There must be some sort of law that says that in tech the crudest pile of hacks wins over any formally elegant solution every single time those hacks let one do something that would otherwise require extra effort, even if they only work by the wildest chance.
The biggest objection I and many others had at the time was that writing XHTML forced one to deal with the hell that is XML namespaces, which many tools at the time barely supported.
> bad advice ... being liberal in what you accept causes even more malformed data to appear in the ecosystem.
This is one perspective. Another is to be robust and resilient. Resiliency is a hallmark of good engineering. I get the sense you have not worked on server-side software that has thousands or millions of different clients.
Postel's Law should be called the "Hardness Principle", not the "Robustness Principle". Much like how hardening a metal makes it take more force to break, but results in it being brittle & failing catastrophically when it does, so Postel's law makes systems harder to break initially, but results in more damage when they do fail. It also makes the system harder to maintain, thus adding a pun to the name.
Where do you draw the line? Usually there's exactly one intended, standard way of communicating with another system, while there are infinite opportunities to deviate from that standard and infinite opportunities for the other party to try to guess what you really meant. This results in a combinatorial explosion of unintended behaviors that lead to bugs and critical security vulnerabilities.
I absolutely have. And I've never modified a server to accept bullshit from an incorrect client. I have, on the other hand, told several people how to fix their clients when they complain it doesn't work with my service. I actually rather enjoy improving the ecosystem, even if it's not strictly my job. It's better for everyone.
Because it wasn’t actually a server misconfiguration, nor was it, as others have speculated, about Postel’s Law.
The way X.509 was designed - back to the very first version - rests on the notion that you have your set of CAs you trust, I have my set, and they're different. Instead of using The Directory to resolve the path from your cert to someone I trust, PKIX (RFC 2459 et al.) defined AIA.
So the intent here was that there’s no “one right chain to rule them all”: there’s _your_ chain to your root, _my_ chain to my root, all for the same cert, using cross-certificates.
Browsers adopted X.509 before PKIX existed, and they assumed just enough of the model to get things to work. The standards were developed after, and the major vendors didn't all update their code to match them. Microsoft, Sun, and many government-focused customers did (and used the NIST PKITS test to prove it); Netscape/later Mozilla and OpenSSL did not: they kept their existing “works for me” implementations.
https://medium.com/@sleevi_/path-building-vs-path-verifying-... discusses this a bit more. In modern times, the TLS RFCs better reflect that there's no "one right chain to rule them all". Even if you or I aren't running our own roots that we use to cross-sign CAs we trust, we still have different browsers/trust stores taking different paths, and even in the same browser, different versions of the trust store necessitating different intermediates.
TLS implementations for linux IMAP email back in the day would fail-over to unencrypted credentials if the TLS handshake was unsuccessful. Not sure if that was somebody's Postellian interpretation or if it was just the spec. We had to actually block the unencrypted ports in the firewall because there was no way to tell from the client side whether you had automatically been downgraded to in-the-clear or not.
In my hosting days, we relied on the SSL checker that ssl-shopper has. A browser was never considered a valid test for us. It was the final validation, but a proper SSL checker was the real test.
As @kevincox says, there's a problem where if one browser does it, then users complain that a site "works in this generation's IE", forcing the other browsers to duplicate the behaviour.
But another problem that happens isn't necessarily "browser 1 fixed this configuration" and browser 2 copied them. It can be (and often was) "browser 1 has a bug that means this broken configuration works", and now for compatibility browser 2 implements the same behaviour. Then browser 1 finds the bug, goes to fix it, and discovers that there are sites that depend on it and that it also works in other browsers, so even if it started off as a bug they can no longer fix it.
That's why there's an increasing amount of work going into ensuring new specifications are free of ambiguities and such before they're actually turned on by default. Even now, though, you still have places where the spec has a gap or ambiguity that allows different behaviour (which these days is considered a specification bug), and people will go "whatever IE/Chrome does is correct" - even if other browsers agree with each other, it's super easy for a developer to say "the most common browser is definitionally the correct implementation".
Back when I worked on engines and in committees I probably spent, cumulatively, more than a year doing nothing but going through specification gaps, working out what behaviour was _required_ to ensure sufficiently compatible behaviour between different browsers. I spent months on key events and key codes alone, trying to work out which events need to be sent, which key codes, how IM/IME (input method [editor], the mechanism used for non-Latin text) systems interact with it, etc. As part of this I added the ability to create IMEs in JavaScript to the WebKit test infrastructure, because otherwise it was super easy to break random IMEs, since they all behave completely differently in response to single key presses.
It's very difficult in practice to shift the blame to the website. Even though the browser would be right in refusing the connection, the net effect is that the user would just use another browser to access that website. The proper workaround (Firefox shipping intermediate certificates) doesn't actually damage security. It just means more work for the maintainers. That's a fair tradeoff for achieving more market share.
It's the same reason why browsers must be able to robustly digest HTML5 tagsoup instead of just blanking out, which is how a conforming XML processor would have to react.
Do browsers do this too, or is this another OpenSSL Easter egg we all have to live with?
I remember that OpenSSL also validates certificate chains with duplicates, despite that obviously breaking the chain property. That's wasteful but also very annoying, because TLS libraries like BearSSL don't (I guess you could hack around it by remembering the previous certificate's hash and staying fixed-space).
The chain "property" was never enforced anywhere of consequence and is gone in TLS 1.3
In practice, other than the position of the end entity's certificate, the "chain" is just a set of documents which might aid your client in verifying that this end entity certificate is OK. If you receive, in addition to the end entity certificate, certs A, B, C and D, it's completely fine - as far as you're concerned - if certificate D has expired, certificate B is malformed, and certificate A doesn't relate to this end-entity certificate at all, so long as you're able (perhaps with the aid of C) to conclude that yes, this is the right end entity and it's a trustworthy certificate.
Insisting on a chain imagines that the Web PKI's trust graph is a DAG, and it is not. Since the trust graph we're excerpting has cycles and is generally a complete mess, we need to accept that we can't necessarily turn a section of that graph (if it even were one graph, which it isn't - each client possibly has a slightly different trust set) into a chain.
You are overthinking it. Some sysadmin copying the same cert into the chain twice because AWS is confusing and doesn’t care and OpenSSL doesn’t care isn’t resolving the grand problem of the trust graph, it’s just a loss overall, for everyone. Nobody wins here.
(Of course the 1.3 approach of throwing a bunch of certificates and then asking to resolve over all of them breaks BearSSL comprehensively)
Yes, it's useless to include the CA cert, and to include extra copies, and all those other things.
But requiring the cert chain to be exactly correct is also useless if you need to address clients with different root cert packages. If some clients have only root A and some have only root B, but B did a cross-sign for A, you're ok if you send: entity signed by intermediate, intermediate signed by A, A signed by B. Clients with only A short-circuit after they see an intermediate signed by A, and clients with only B should be fine too. Of course it gets really weird when the B root has expired and clients often have both A and B, but some don't check whether their roots have expired, and some won't short-circuit to validating with A, so they fail the cert because B is expired.
Oh, and TLS handshakes in the wild don't give you explicit information about what roots they have or what client / version they are. Sometimes you can get a little bit of information and return different cert chains to different clients, but there's also not a lot of support for that in most server stacks.
I don't necessarily like TLS 1.3's approach of end entity cert comes first and then just try all the permutations and accept any one that works, but at least it presents a way to get to success given the reality we live in. I'd also love to see some way to get your end entity cert signed by multiple intermediates, but that's a whole nother level of terrible.
It wasn't long ago when TLS was not the norm and many, many sites were served over plain HTTP, even when they accepted logins or contained other sensitive data. There's a good chance this decision was a trade-off to make TLS simpler to get working in order to get more sites using it.
Browsers have a long history of accepting bad data, including malformed headers, invalid HTML, and maintaining workarounds for long-since-fixed bugs. This isn't really that different.
Really? You receive two files from your CA. One of them is the leaf, the other one is the chain. You just have to upload the latter (not the former) into the server's config directory. That doesn't sound that hard.
If it actually is, I am ready to eat my words, but the actual blame would be on the webserver developers then. Default settings should be boring, but secure; advanced configuration should be approachable; and dangerous settings should require the admin to jump through hoops.
Related: there is a 10+ year old Python issue about implementing "AIA chasing" to handle server misconfigurations as described in this article: https://github.com/python/cpython/issues/62817. The article mentions this approach in its last paragraph.
There is at least one 3rd party Python lib that does that, if you are interested in the details of how this works: https://github.com/danilobellini/aia.
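Roughly, AIA chasing means reading the "CA Issuers" URL out of the leaf certificate's Authority Information Access extension and fetching the missing intermediate from there. A minimal sketch of that one step, using the third-party cryptography package (the function name and error handling are mine, just to show the shape of it, not how the aia lib is actually structured):

    # Sketch of one step of AIA chasing: given a leaf certificate in DER form,
    # find the "CA Issuers" URL in its Authority Information Access extension
    # and download the issuing (intermediate) certificate from it.
    import urllib.request

    from cryptography import x509
    from cryptography.x509.oid import AuthorityInformationAccessOID, ExtensionOID

    def fetch_issuer(leaf_der: bytes):
        leaf = x509.load_der_x509_certificate(leaf_der)
        try:
            aia = leaf.extensions.get_extension_for_oid(
                ExtensionOID.AUTHORITY_INFORMATION_ACCESS
            ).value
        except x509.ExtensionNotFound:
            return None  # no AIA extension, nothing to chase

        for desc in aia:
            if desc.access_method == AuthorityInformationAccessOID.CA_ISSUERS:
                url = desc.access_location.value
                # This fetch is exactly the privacy leak discussed elsewhere in
                # the thread: the issuing CA learns you needed this intermediate.
                with urllib.request.urlopen(url) as resp:
                    data = resp.read()
                # Most CAs serve plain DER here; some serve PKCS#7, which
                # would need extra handling in a real implementation.
                return x509.load_der_x509_certificate(data)
        return None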
> and a large number of government websites, not even just limited to the United States government but other national governments too
It's always eye opening to see governments and large businesses that think expensive = good when they don't even know how to properly configure what they're buying.
I've literally lost arguments against buying overpriced OV certificates and then had to spend time shoehorning them into systems designed to be completely automated with LE / ACME.
> Chrome will try to match intermediate certificates with what it has seen since the browser was started. This has the effect of meaning that a cold start of Chrome does not behave the same way as a Chrome that has been running for 4 hours.
Holy crap. I have definitely run into this in the past, but had no idea!
I was configuring a load balancer that served SSL certificates for customer domains which included a mix of wildcard, client-supplied and LetsEncrypt-obtained certificates, and was all dynamically configured based on a backend admin app.
I was getting wildly inconsistent behavior where I'd randomly get certificate validation errors, but then the problem would disappear while diagnosing it. The problem would often (but not always) re-occur on other systems or even on the same system days later, and disappear while diagnosing. I never isolated it to Chrome or the time-since-Chrome-was-restarted, but I do remember figuring out it only affected certificates using an intermediate root. There was a pool of load balancers and I remember us spending a lot of time comparing them but never finding any differences. The fix ended up being to always include the complete certificate chain for everything, so I am pretty confident this explains it.
This was several years ago, but maddening enough that reading this triggered my memory of it.
Biggest? How about serving TLS certs when doing direct IP access? Or how about leaking subdomains in TLS certs?
I, as a mediocre hacker, cough, security advisor, cough, use certs to find vulnerable subdomains all the time. Or at least, I get to play around in your test envs.
Edit: Ok, the problem in the topic is also not good.
I was wondering recently if it's better to use wildcard certs because of this. On the other hand, all subdomains are discoverable through DNS anyway. Does it then make a difference if the subdomains are logged in the certificate transparency logs?
How do you figure all subdomains are discoverable through DNS? The zone transfer record, or whatever it is, is usually disabled. And you can't bruteforce all subdomains - they might be too long/unpredictable.
I was mistaken about this, apparently. There are tools, though, that can discover subdomains using long lists of commonly used names and patterns. Anyway, thanks for the correction.
This strikes me as interesting, even if it's a field I have only a very light understanding of, and I feel like I might fall victim to this. Asking for a friend, but if that friend uses Let's Encrypt to create certs for subdomains on a single vhost, what would that friend need to do to see the information you are seeing?
Certificate issuance transparency logs are public.
Every time a CA issues a new certificate, it gets logged in a public log.
Every time someone sets up nextcloud.example.tld, and gets an SSL cert issued by a CA, that gets logged in public. If nextcloud.example.tld resolves, and responds on tcp/80 and/or tcp/443, you’ve got yourself a potential target.
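If that friend wants to see exactly what's already out there, one low-effort way is to query a CT log search service such as crt.sh. A rough sketch in Python, assuming crt.sh's JSON endpoint and its "name_value" field behave the way I remember (the domain is a placeholder):

    # Sketch: list names that appear in public CT logs for a domain, via crt.sh.
    # The query parameters and the "name_value" field are crt.sh specifics;
    # double-check them if this has drifted. "%" is a wildcard for any subdomain.
    import json
    import urllib.parse
    import urllib.request

    def ct_logged_names(domain):
        query = urllib.parse.urlencode({"q": f"%.{domain}", "output": "json"})
        with urllib.request.urlopen(f"https://crt.sh/?{query}") as resp:
            entries = json.load(resp)
        names = set()
        for entry in entries:
            # name_value may contain several SANs separated by newlines
            names.update(entry["name_value"].splitlines())
        return names

    for name in sorted(ct_logged_names("example.tld")):
        print(name)

Every nextcloud.example.tld-style host that ever got a publicly trusted cert shows up in that list, whether or not it's linked from anywhere.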
Gosh yes, count me in the "I hate this behavior" group.
I've hit such horrible issues, like "why does this work in Chrome, but not in Firefox and Chromium?!" (of course, now we know it's not actually about the browser). It really made my heart hurt. Also, if my memory is correct, when the intermediate is in the cache, looking at the cert chain in the browser shows the chain including the cached cert, without mentioning the cache. So you end up at "why the fuck does the HTTP server I just deployed play browser favoritism and change the exposed cert chain based on the user agent... which it doesn't even have???", then move on to "okay, the simple logical explanation is that it's HTTP/2 vs HTTP/1" (because that was when HTTP/2 deployment was just starting).
And of course at some point you hit the moment where you can no longer reproduce the issue, because you visited the wrong website.
Thankfully I hit this issue only once, and it wasn't on a mission critical service that I had to fix ASAP, so it wasn't too horrible, but still.
I get that this feels un-pure, but what is the actual damage of validating against cached intermediate certs? The only concrete thing the author cites is harder debugging, but that's a pretty weak objection.
I think it's mainly the change in behavior that could be viewed as concerning. As a user of a website, if the order of websites you visited after browser startup determines whether you get a successful connection or an invalid-certificate error, that's a pretty confusing experience.
I definitely didn't know this before and if I saw this behavior I would be pretty confused.
It's a potential Heisenbug for (some of) your JavaScript code. Sometimes things work on some machines and sometimes they don't. Unless you have the cert-chain misconfig in your brain-cache, you'd probably spend hours debugging confusing bug reports from customers that you fail to reproduce reliably. So it's not just harder to debug, it causes bugs (and, indirectly, bug reports you'll need to investigate).
The actual damage is that it's pretty common (my last team had this happen) for a team to set up a cert, verify it works, and then when they deploy it, it works only some of the time or "works on my machine", so the failures seem really random and are by definition hard to reproduce, because you have to restart Chrome to reproduce them.
Probably the tl;dr is that validating against a persistent cache, like Firefox does, is fine. Validating against an ephemeral cache, like Chrome does, is likely to cause a lot of breakage.
Sort of a corollary to your point: if an admin sets up a website and verifies with Firefox (or Chromium, whatever), and then later the server needs to communicate with...basically any tool that speaks HTTPS but isn't a web browser, then there will be many tears shed by that admin.
For instance, you stand up a server, and then a user complains their script using cURL, wget, etc. doesn't work, and if you aren't paying attention you'll have no idea why.
Inb4 "why can't the OS certificate store just do the same thing": I suspect people tend to install OS updates less frequently than browser updates, so it would tend to be less reliable.
The problem is that it masks the error. Then you have all kinds of websites that only work in some browsers, or have an API that is not accessible from some automated tools unless you enable "insecure mode", and everybody is worse off. It also makes implementing new browsers more complicated, and thus the ecosystem less open. All that to hide a configuration error that would take a second to fix in the first place.
My naive understanding is that all certs contain (at least) the ID/thumbprint of their issuing cert. If I am not mistaken, how is TLS broken by sending only the leaf and/or intermediary, if the client is able to correctly identify the issuer as known/trusted via this thumbprint?
You can do this, and historically some browsers did, it's called AIA chasing (which is why AIA is mentioned briefly in the blog post)
The problem with AIA chasing is that it's a privacy violation. Not a huge one but enough to be completely unacceptable for say Mozilla.
In fetching the needed intermediates we reveal which intermediates we needed. If you're Let's Encrypt, you operate only a few intermediates and you issue a truly astounding number of certificates from each, so that's barely any information; but if you're a small CA and you have, say, a dozen intermediates, you absolutely could arrange for pro-Party A sites to all use intermediate #6 while pro-Party B sites all used intermediate #8, and then use the resulting data from AIA chasing to measure who is going to "Party A" sites and direct political advertising at those people...
If I'm understanding correctly, the post isn't calling TLS broken, it's calling out the bad behavior of browsers.
By employing mitigations/workarounds, they encourage misconfigured servers, and that in turn produces unexpected or inconsistent behaviors when interacting with those servers through different client types. eg you might see different behavior in FF vs Chrome, or Chrome vs curl/python, etc.
"The reason why service A can't connect to service B and is giving you a crypto PKI error is that you've misconfigured the chain on service B; please fix it by sending your intermediate"
"But it works from the browser."
"The chain is misconfigured, I promise you. Look, here's some OpenSSL output that proves it."
Aside from the browsers, I don't know how many times I had to fix TLS handshake failures due to the server sending only the leaf certificate, or argue with people who insist on shoving the full CA chain (sometimes including the leaf) into the ca-bundle/truststore, even after I link them the TLS RFC.
This really should be better documented and enforced.
Thanks for submitting this article. I wasn't aware it's so easy to misconfigure the TLS settings of web servers. It might also explain some TLS errors I have encountered in the past in Firefox.
It would be super helpful if someone could recommend easy ways to check web servers for bad configurations like this.
It's a little off-topic, but I definitely like the throwback to Microsoft FrontPage. I don't know if this page was laid out in it, or it's just the theme, but I haven't seen it in decades. Nice little trip down memory lane.
At least one author has claimed that even sending the whole chain to the client is a hack. It is a practice that was not contemplated in the original design. Another commenter in this thread has corroborated this fact for me.
Good article but it’s methods. Not methodologies. Methodology is the study of methods. 99.99% of the time you want to say method. Also price, when you said price point.
>Good article but it’s methods. Not methodologies. Methodology is the study of methods.
I find this comment a bit off-putting. Faux "Good article" followed by some incredibly minor grammar corrections. But I'll bite. From which dictionary did you get that definition?
Merriam says "a particular procedure or set of procedures"[1]
Cambridge says "a system of ways of doing, teaching, or studying something"[2]
Both of those definitions fit the article just fine, and aren't "the study of methods"
This is a good behavior, since it means fewer bytes to transfer per connection. Worst case scenario, the browser doesn't have/can't get the intermediate certs required and the connection fails.
This doesn't really guarantee fewer bytes per connection. In the worst case, the preloaded intermediate set is still insufficient and the client has to resort to AIA chasing instead (which is both slower and leaks the client to the issuing CA).
I understand this behavior from a "make the spurious user bug reports go away" perspective, but it's still pretty gnarly :-)
I agree with the author that the non-deterministic portion of it is mildly insane.
Imagine a junior admin who installs a new certificate (without the intermediate), tests it in their Chrome, which happens to have cached the intermediate, sees it validate, LGTM, and moves on.
Meanwhile it gets deployed and some portion of the site's users don't have the intermediate certificate cached = dungeon collapse.
[0] This was me (:jcj) and Dana
[1] https://wiki.mozilla.org/Security/CryptoEngineering/Intermed...
[2] https://blog.mozilla.org/security/2020/11/13/preloading-inte...