We build X.509 chains so you don't have to (trailofbits.com)
221 points by ingve on Jan 25, 2024 | 83 comments



While the contribution to Python Cryptography is incredibly awesome, I'd suggest that perhaps the lede is slightly buried. Having a large suite of well-documented test vectors, including both positive and negative examples, raises the bar for the entire ecosystem. Go and OpenSSL are mentioned, but any other environment implementing X.509 now has another very well-developed test suite that can be plugged in directly.

While I'm a huge fan of formal methods for developing secure code, IMHO well-developed suites such as x509-limbo[1], with common, good, problematic, and outright sadistically wrong vectors, are the way forward for developing robust cross-platform protocols. These are the kinds of unit tests a spec should come with.

[1] https://x509-limbo.com
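
To give a flavor of how pluggable the vectors are, a harness can be sketched in a few lines of Python. The field names below are from memory of the limbo schema and may be off; https://x509-limbo.com documents the real format.

  import json

  # Load the published vector file and drive the validator under test.
  with open("limbo.json") as f:
      limbo = json.load(f)

  for case in limbo["testcases"]:
      peer = case["peer_certificate"]                 # PEM-encoded leaf
      intermediates = case["untrusted_intermediates"]
      trusted = case["trusted_certs"]
      expected = case["expected_result"]              # e.g. "SUCCESS" or "FAILURE"
      # ...feed these into the validator under test and compare
      # its verdict against `expected`...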


Agreed. I don't see NIST's test suites [1] on the list, though. Maybe they're embedded somewhere and I've missed it?

[1] https://csrc.nist.gov/projects/pki-testing

EDIT If the authors are reading, are any of the test cases taken from Chrome's codebase?


(I'm one of the authors).

I looked at NIST's PKITS while building the test suite. We ultimately decided to punt on it for the first round of tests, primarily because it fell slightly out of scope for our MVP on Python Cryptography (PKITS being strictly an RFC 3280 suite). We instead chose to integrate the BetterTLS suite[1] (warning: very large page) for the MVP.

With that being said: x509-limbo is open source, and I would be overjoyed if someone added a new `nist::pkits` or similar namespace for these :-). I can't promise that it's an immediate priority for us to add ourselves, however.

> are any of the test cases taken from Chrome's codebase?

Nope. That would also be a valuable thing to have here.

[1]: https://x509-limbo.com/testcases/bettertls/


For reference: https://github.com/chromium/chromium/tree/main/net/data/ssl

It's been a long time since I looked at them, so I'm not sure what's in there exactly any more. Ask Ryan, perhaps :)



You may also want to look at https://github.com/digicert/pkilint

There are other similar tools (cablint, x509lint, zlint), but there's been a lot of good talk about pkilint recently.


Check Frankencerts as well if you're interested in that: https://github.com/sumanj/frankencert


> Carcinize existing C and C++ X.509 users.

This could be game-changing for a lot of open source software.

I spent years avoiding X.509 (and ASN.1, for that matter) in my designs because every time someone I trust poked it, a remotely exploitable bug fell out. Most often, it was a Denial of Service issue rather than Remote Code Execution. Moving to Rust would demonstrably improve the security of the entire Internet.

You might be tempted to ask, "What about BouncyCastle?" (or similar queries).

Sure, you're not overwriting the EIP in most Java X.509 bugs, but check the release notes for X.509 and ASN.1 mentions: https://www.bouncycastle.org/releasenotes.html

When I worked for Amazon, we disclosed a few X.509-related vulnerabilities, which we had found almost by accident, to various projects.


How would Rust fix most of those issues?

They're logic bugs.


The "classic" example of this is enums as sum types, rather than a thin wrapper over an integral type: Rust makes it possible to construct in invalid enum variant, whereas plenty of C logic bugs stem from taking untrusted user input and converting it into an enum variant.

My understanding is that Java doesn't allow this directly, but has adjacent historical deficiencies (e.g., not allowing exhaustive enumeration handling until recently).
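
A loose Python analogy for the checked-conversion point (Rust enforces this statically at compile time; Python's `enum.Enum` at least fails loudly at runtime):

  from enum import Enum

  class KeyUsage(Enum):   # toy example, not a real X.509 type
      DIGITAL_SIGNATURE = 0
      KEY_ENCIPHERMENT = 2

  untrusted = 7           # e.g. an integer read off the wire

  # C-style: reinterpret the integer as an enum value and hope for the
  # best; nothing stops 7 from flowing into code expecting a valid variant.
  usage_unchecked = untrusted

  # Checked conversion: invalid values fail loudly at the trust boundary.
  try:
      usage = KeyUsage(untrusted)
  except ValueError:
      usage = None        # reject the input instead of propagating it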


woodruffw already wrote an excellent comment for this question: https://news.ycombinator.com/item?id=39131723

Rust isn't just about memory safety. The type system also coaxes developers towards eliminating some classes of logic bugs.

Not all, granted, but it does move the needle.


I think that attitude vastly underestimates the complexity of a typical TLS implementation

(and I say this as someone who grew up on SML)


> I think that attitude vastly underestimates the complexity of a typical TLS implementation

If you ever get the impression that I'm underestimating the complexity of a typical TLS implementation, I promise you that I'm not. I speak to improvements, not panaceas.

Until the end of last year, I was one of the security engineers that the s2n team at AWS consulted on potential security issues. You will never hear me say anything will magically fix all our problems. Especially with TLS.

However, Rust does bring a lot to the table, so I feel I'm allowed to be excited about not reviewing another X.509 library written in C.


This reasoning doesn't make sense. If TLS is astonishingly complex, which it is, then we absolutely want the strongest type system that can simultaneously represent its complexity and afford developer ergonomics. TLS's complexity is a good reason for types that reflect invariants, not a good reason to give up.


> This reasoning doesn't make sense.

I didn't say it didn't help at all; I said I wouldn't expect it to make a significant improvement over Java.

(and it's hardly the strongest type system with "developer ergonomics")


I'm not even that good at writing Rust, and even I recognize that countless libs I'm using are written in a way, with Rust types, that prevents serious misuse, in ways that would be infeasible and unergonomic in other languages, or would require internal library invariant assertions that are prone to bugs.

Sometimes the errors wind up being nasty, but I've also gotten better at trusting that the compiler is giving me helpful info, even if it's a huge message. And usually those errors indicate some library invariant that I've missed that the type system is enforcing.


Yes, hence my comment on SML.

While it's nice that the rest of the world is slowly waking up to the type systems functional programmers have been bleating on about for the past four decades...

... having read through the first couple of pages of BC vulns: even a much stronger type system than Rust's wouldn't appear to help very much in this specific example.

However, if someone wants to rewrite OpenSSL in Rust, that would be a massive, massive improvement.


> however if someone wants to rewrite OpenSSL in Rust

You mean rustls?


Lacking DTLS at this time, and probably a few other valuable things.


> Moving to Rust would ...

... do absolutely nothing to fix denial of service attacks.


I don't think this is true. Rust cannot prevent all possible forms of denial of service, but there are plenty of underlying DoS causes that Rust either outright eliminates (such as memory corruption without further control) or mitigates through stronger types.

A recent example of this is CVE-2024-0567 in GnuTLS: an invariant that would likely have been noticed at the type level is instead checked with an assert, leading to a remotely triggerable DoS.
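
As a loose sketch of the difference (in Python, and with a simpler invariant than the GnuTLS one, but the shape is the same):

  # Invariant checked at a use site: a missed code path becomes a
  # remotely reachable crash, i.e. a denial of service.
  def verify_chain(chain):
      assert len(chain) > 0   # aborts if violated
      ...

  # Invariant enforced at construction: once you hold a NonEmptyChain,
  # no downstream code path can observe an empty one.
  class NonEmptyChain:
      def __init__(self, certs):
          if not certs:
              raise ValueError("chain must contain at least one certificate")
          self.head, self.rest = certs[0], tuple(certs[1:])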


Exploiting a memory safety crash, leading to a downed service, is the first class of DoS that Rust can help with.


Nor the myriad other logic and parsing bugs that led to incorrect behavior (more than just denial of service) in the Java library that was somehow not as good as Rust :/


By itself? No.

The other details covered in the blog post, however, would absolutely do something to fix denial of service attacks.

To wit: x509-limbo


Given the ambiguities and degrees of freedom different TLS clients can exercise during chain validation (mentioned in this post), we wrote a paper proposing that validation policies should be expressed as interchangeable Prolog programs: https://dl.acm.org/doi/pdf/10.1145/3548606.3560594


Congratulations to the authors; this was a feature that was dearly missing from pyca/cryptography. It took a long time to get right.

For the history: https://github.com/pyca/cryptography/issues/2381


Thanks for posting! I had forgotten just how long a road this was, but I'm very pleased that the requirements we talked about years ago have been met. :)

That's not to say this is done! There's a set of features we still want to add and we'd like to gain some confidence in the APIs as structured before we mark them as stabilized.


Looking forward to it! Thanks for maintaining such an awesome library, btw; most of my internal PKI (X.509 and OpenSSH) is managed with it. It's bliss to work with, even more so compared to other crypto libs. :)


This is an absolute game changer for the ability to write secure PKI-based software in Python. Before now, you had to use pyOpenSSL (ignoring its warning to not use it for anything other than an SSL connection) and avoid the many footguns associated with building the certificate chain on your own, or leave Python entirely and talk to a separate Rust or Go process.

Thanks to Trail of Bits for pulling this off.


Aren't revocations a pretty important part of certificates, considering the whole point of this is security? I don't understand the rationale for skipping them.


Revocations are a great idea. In practice, they're both operationally complicated and an unreliable source of truth about certificate validity. As a result, a lot of implementations (including Go's) simply ignore them.

'mcpherrim posted a comment below[1] with some additional information, including changes to the Web PKI and CRL distribution techniques that may eventually make them more useful.

[1]: https://news.ycombinator.com/item?id=39132430


Any chance you could expand on "In practice, they're both operationally complicated and an unreliable source of truth about certificate validity."?


I'm thinking about CRL update windows: a lot of CAs run weekly updates, meaning that CRL changes don't propagate to end users for up to a week at a time. That's well into "security theater" territory for me, given that a compromised CA or EE only needs to fool me for a matter of seconds, once.


Sure, but then you can just use OCSP. If you care about privacy very much, support only stapled OCSP.

CRL is better suited for "high-volume" exchanges, for example for a CA to publish all their revocations [so that a system such as CRLite can be built].
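
For reference, consuming such a published CRL with pyca/cryptography looks roughly like this (the file name and serial number are placeholders):

  from cryptography import x509

  with open("ca.crl.pem", "rb") as f:
      crl = x509.load_pem_x509_crl(f.read())

  # Look up a single certificate by its serial number.
  entry = crl.get_revoked_certificate_by_serial_number(0x0123456789)
  if entry is not None:
      print("revoked on", entry.revocation_date)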


The comment I referenced before contains a talk by 'mcpherrim as well that explains OCSP stapling's own deficiencies.

I think we all want a good solution here. But none of the currently widely deployed ones are that.


I am sorry, I can't watch someone else's talk to understand what you mean. If you're willing to engage in a conversation, could you say here what you think the problems with OCSP are? (If you're not, just ignore me.)

As I see it, the main problem is that browsers don't want to do OCSP any more. Sure, we have a problem with how stapling is implemented in _some_ popular platforms (but not Caddy—hi Matt), but today maybe even those are fixed. The main problem IMO is that browsers won't support must-staple.

In the interest of full transparency, I have a pet peeve about people saying revocation doesn't work. I appreciate that that's the current reality, but—again—only because browsers choose not to implement it. Things would change overnight if Chrome (obviously) changed their stance.


Sure! I'm happy to discuss, my reason for referencing the video is because I think 'mcpherrim explains this stuff better than I do :-).

My understanding is that the primary concerns with OCSP (meaning non-stapled OCSP) are that it (1) leaks end user intent to CAs, such as sites being visited, and (2) in its original form with no transport security, it is no stronger than the adversary that it needs to protect TLS/the Web PKI against (i.e., any network adversary can block the OCSP request and everything will fail open).

OCSP stapling avoids both of those concerns, but requires web server and other stack changes that are not on the "enforcement" path in the way that CAs and other "direct" parts of the PKI are. The talk suggests that this is why OCSP stapling adoption has been so sluggish (which I didn't realize was the case before).
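
To make the mechanics concrete: building a (non-stapled) OCSP request with pyca/cryptography looks roughly like this (file names are placeholders), and the final unauthenticated POST is exactly where both concerns bite:

  from cryptography import x509
  from cryptography.hazmat.primitives import hashes, serialization
  from cryptography.x509 import ocsp

  with open("leaf.pem", "rb") as f:
      cert = x509.load_pem_x509_certificate(f.read())
  with open("issuer.pem", "rb") as f:
      issuer = x509.load_pem_x509_certificate(f.read())

  req = (
      ocsp.OCSPRequestBuilder()
      .add_certificate(cert, issuer, hashes.SHA1())
      .build()
  )
  der = req.public_bytes(serialization.Encoding.DER)
  # POST `der` to the responder URL from the leaf's AIA extension; a
  # network adversary can simply drop that request, and clients fail open.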

FWIW, I'm sorry if I came across as dogmatic here -- revocation is conceptually important, and I think it'd be great if we ended up in a Web PKI where revocation was more effective. But I can also see why browsers have fumbled and/or dragged their feet on the CRLs and OCSP for so long.


Yes, agreed on the primary problem with "classic" OCSP. OCSP stapling and must-staple (for the benefit of other readers: a flag that's set on a certificate to indicate that it's only valid with an attached—stapled—fresh OCSP response) solves that.

When must-staple initially came out, it came to light that web servers viewed OCSP stapling as a performance optimisation. When viewed from this perspective, it's not necessary to staple 100% correctly; if you don't, clients just go to the CA directly. Unfortunately, this breaks apart with must-staple and clients who refuse to talk to CAs for privacy reasons.

Browsers refused to enforce must-staple because that would mean broken web sites and they didn't want to be blamed for it. As a result, web servers didn't have an incentive to fix their broken OCSP stapling implementations.

There's also some politics involved. Browsers don't actually want CAs to be able to revoke certificates for reasons other than security. Think one government pressuring a CA in their country to revoke a certificate of an entity outside the country, for political reasons.

(EDIT I don't really support this argument. Any organisation that cares about their availability should always have two active certificates from different CAs. There's a variety of CA jurisdictions to choose from as well.)

I don't like these long discussion chains as much as the next guy, but there's a lot of nuance and history when it comes to certificate revocation, and I sometimes take the opportunity to gently question people's beliefs. The line "revocation doesn't work" is very often repeated and technically it's not wrong, but I think it's important for people to understand how we got here.


[Not OP]

CRL/OCSP (which are the only revocation mechanisms close to being generally used standards today) require either online verification (which is unsuitable or unsafe for many applications) or CRL distribution with CA trust root bundles, which is always in danger of being stale.

You can read more in https://en.wikipedia.org/wiki/Certificate_revocation_list#Pr..., https://en.wikipedia.org/wiki/Online_Certificate_Status_Prot...

Revocation in PKI is a hard problem that exposes a fundamental tradeoff between time to revoke and privacy/offline capability.


Not quite. OCSP responses can be stapled to TLS handshakes to provide fresh revocation information. An online check is only needed if you want 100% fresh information, of course.


PKI uses are not limited to TLS, and TLS uses are not limited to those where revocation information is readily accessible (due to proxying/distribution/compartmentalization/federation constraints).

There is a variety of signature standards and protocols out there that rely on PKI to manage trust, and whose users are not at all keen on calling an untrusted URL to check an intermediate cert.


That's a fair point [PKI without TLS], but I'd argue that 99.9% of situations will involve TLS, so supporting revocation can provide value for those.

Out of curiosity, what without-TLS use cases do you have in mind?


Most SSO runs on SAML, which uses XML Signature, which uses PKI certificate chains without TLS.

Even if you think all SSO will migrate to OIDC, the EU has enacted laws around electronic signatures that basically enshrine the use of XML Signature (with all its problems) for the foreseeable future. A lot of countries beyond the EU (many LatAm countries, for example) have followed suit.

There are other signature protocols that use certificate chains as well, and devices and networks that should not be making network calls when validating such signatures.

And as I mentioned, TLS uses are not limited to those where revocation information is readily accessible.


Fantastic!

Curious about revocations, though - the post doesn't mention them?


Thanks for calling this out. Revocations are not currently supported by the public API; we made a decision early in the MVP planning process to exclude them for complexity reasons.

(This is consistent with Go's crypto/x509, which does not do revocation checking as part of the `Verify` API.)


Do you think that it's possible to implement a good path building algorithm without support for revocation? For example, without revocation checking you may select a chain with a revoked certificate in it, only to discover that later. Then what? :) On the other hand, if you check the revocation status as part of the process, you can pivot to another certificate.


I think so -- I would consider crypto/x509's implementation good, and they similarly don't support revocation. More bluntly: it's not clear that revocation really works at all in the Web PKI; I've heard it described more as "homeopathic" than an actual security mechanism.

What makes a path building implementation good is (IMO) its adherence to the rules Ryan Sleevi enumerates[1]: as long as you conceptualize the path building process as a dynamic search problem rather than a static one with a "single" result, any implementation you build will probably have the right "primitives" for future constraints (like revocations).

[1]: https://medium.com/@sleevi_/path-building-vs-path-verifying-...
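
To illustrate the "dynamic search" framing, here's a toy sketch (the names are illustrative, not pyca APIs):

  from dataclasses import dataclass

  @dataclass(frozen=True)
  class Cert:        # toy stand-in for a real certificate
      subject: str
      issuer: str

  def find_valid_path(leaf, anchor_subjects, intermediates, is_acceptable):
      """Depth-first search over candidate issuers, backtracking on dead ends."""
      def search(cert, path):
          if cert in path:                     # guard against cross-signing loops
              return None
          path = path + [cert]
          if cert.issuer in anchor_subjects:   # reached a trust anchor
              return path if all(map(is_acceptable, path)) else None
          for cand in intermediates:
              if cand.subject == cert.issuer:  # one possible issuer; recurse
                  found = search(cand, path)
                  if found:
                      return found             # first acceptable path wins
          return None                          # nothing worked: backtrack
      return search(leaf, [])

A future constraint like a revocation check just becomes part of `is_acceptable`, without restructuring the search.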


I recently gave a talk on this topic (sorry, I haven't written it up as a blog post yet): https://www.youtube.com/watch?v=4TbtL73ibh0

I do believe revocation in the webpki is important to implement, but it's not great today.

There are two promising paths forward, which are complementary. The simple option is short-lived certificates (7 days), which essentially don't require revocation, since seven days is already the timeline existing revocation technology can revoke on; but that option has some adoption hurdles to overcome.

The other solution is what Mozilla is doing with CRLite: https://blog.mozilla.org/security/2020/01/09/crlite-part-1-a...

They are collecting all CRLs from all public web CAs and producing a small, compressed representation of all revoked certificates. That can be pushed to clients, so there's no need for any network traffic (like OCSP checks or CRL downloads) to the wide variety of CAs in the ecosystem.

These "CRL Compact and Push" solutions benefit from shorter-lived certificates (eg, 90 days like Let's Encrypt does right now) because the set of revoked certs is smaller as they roll off once they expire.

I suspect (and hope) that platform verifiers, like those on Windows and macOS, will ship something like this in the future, though they already have CRL and OCSP fetching and caching.

Non-platform verifiers like this one are really the hardest part of solving revocation. I would suggest supporting CRL checks, but that requires having disk space and caching infrastructure, which is hard for a library to know how to do universally. I think the answer is that we need to have most of our software outsource to a platform verifier, or at least to some sort of system-wide revocation information cache or daemon. Maybe this means a new systemd component on Linux :)


There is also a third option, via must-staple, where OCSP responses are attached to the TLS handshake. This resolves the privacy issue [because otherwise clients have to talk directly to CAs and reveal what sites they're visiting].

All (big) browsers have mechanisms functionally similar to CRLite. Apple has "valid" (not sure if that's the official name) and Google has CRLSets.


Google's CRLSets don't cover nearly the full set of revocations (AFAIK).

I don't know the details of how "valid" works, but as the Apple root program has recently required CAs to publish full CRLs, I assume they're planning something similar.

I mostly cite CRLite as it's well-documented.

I don't think OCSP must-staple is deployable. It requires code changes to effectively every webserver in the world, and that doesn't seem to be happening. I think it is stuck in a spiral of non-adoption, where there's no incentive for anyone to make progress. I believe short-lived certs (7 days) are more deployable in the ecosystem today than OCSP must-staple, with similar security properties. Certificate automation is already desirable, and it's not as big a leap once flows are automated.

This isn't going to happen overnight at least. Years, probably many. But it's happening.


"The spiral of non-adoption" is an interesting blog post someone should write. I bet there are a couple more examples besides the obvious one.


It could also be a festival/performance site for unadopted technologies. Sunday, Sunday, key signing ceremony at Non-adoption Spiral!


I think you're right about CRLSets. IMO, Mozilla really wanted to solve the problem, whereas Chrome just wanted to have a solution that they could use for emergency revocation of certificates of high-profile sites (and intermediates/roots). Small-time sites won't be in CRLSets, although if you know the right people and make enough noise you may be added to their list. As for Apple, no one knows what's going on because they don't like to share :)

OCSP must-staple doesn't require code changes. It's a "flag" you set on a certificate. Maybe what you mean is that OCSP stapling is not enabled by default on many installations, and that's true. IIS got it right, and so did Caddy (obviously, the only platform that got everything right).

Again, we're in this place only because browsers don't care about revocation. If they pushed for it, things would fall into place very quickly. But they're going in the opposite direction.

Short-lived certificates are great, assuming you're fine with a 3.5-day (on average) window of opportunity for exploitation. There's a potentially major problem with clock skew, as many clients have inaccurate clocks. Personally, I recommend that certificates are obtained at least a week before they're deployed; a month would be better. Then rotate them a month before they expire. That's how you minimise the problems due to clock skew.
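
Concretely, the schedule I'm suggesting looks something like this (assuming 90-day certificates; the margins are my own rules of thumb):

  from datetime import datetime, timedelta, timezone

  LIFETIME = timedelta(days=90)             # e.g. a Let's Encrypt-style cert
  issued = datetime.now(timezone.utc)

  deploy_at = issued + timedelta(days=7)    # wait out most client clock skew
  expires = issued + LIFETIME
  rotate_at = expires - timedelta(days=30)  # replace well before expiry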


Apple shared the details of valid.apple.com in a WWDC talk: https://devstreaming-cdn.apple.com/videos/wwdc/2017/701jvytn...

Slides: https://devstreaming-cdn.apple.com/videos/wwdc/2017/701jvytn...

You can also find the client-side code if you dig around under https://opensource.apple.com/source/Security/

It aims to cover all certificates, similar to Mozilla's CRLite.


As far as I can figure out, most OCSP staples are valid for 7 days, so short-lived certificates would be equivalent, but simpler?


Indeed, you're right. Must-staple and short-lived certificates have the same problem with clock skew.


Exactly. Because you currently don't check for revocation, you may end up selecting a path that's a dead end and you won't be able to recover. However, if your implementation has pluggable [dynamic] constraints, once you add support for revocation, it will work optimally.

You can run into the same problem with any other constraint, for example hash-function deprecation, a blocked CA, and so on.

Aside: Revocation works just fine, it's just that browsers decided not to implement it.


I don't think pluggability is the important thing here: a good path building implementation doesn't need to be pluggable to handle dynamic constraints correctly. More precisely: a "dead path" is only a problem if it somehow excludes otherwise-present valid paths, which is not an issue if you choose not to support CRLs at all (you might accept revoked paths, but you certainly won't exclude non-revoked ones).

One of the problems with X.509 PKIs is that you can make them fractally complex; every implementation ends up setting its own "I give up, doing this is too painful" point. CRLs are a common "give up" point, and that probably won't change until CRLite or similar changes things here.

(AIA chasing and OCSP lookups are also similar pain points.)


Well, yes. If you choose to ignore revocation entirely, you will never have a dead path because of it. That will happen only if you path-build, then check revocation as two steps. (IIRC, Windows used to do this, or is still doing it.)

IMO, a good path building implementation has to support revocation, but that's for another conversation :)


Not planned for MVP or not planned ever?


Not planned for the MVP, but not necessarily excluded in the future. I think it'd require a substantial design period, but that's my impression as an external contributor.


That's a whole other can of worms.


I'm sure it was even worse before, but I'm not entirely happy with the look of that.

* I shouldn't have to care where the certificates are stored. Just load the OS default ones without asking me.

* I shouldn't have to know what a pem is, and I shouldn't have to open() one.

* I don't want a PolicyBuilder. Just give me the normal policy.

* I shouldn't have to construct some verifier object specific to a given DNS name. I don't want to verify a billion certificates for one domain, just one.

* I shouldn't have to know what an untrusted intermediary is.

Here's what I think it should look like

  verifier = Verifier()  # constructs a verifier with sensible defaults, overrides are possible though
  verifier.verify(chain, "cryptography.io")
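
For reference, my reading of the current API's shape is roughly this (paths and hostname are placeholders; the pyca docs are authoritative):

  from cryptography.x509 import DNSName, load_pem_x509_certificates
  from cryptography.x509.verification import PolicyBuilder, Store

  # You bring your own roots: load a PEM bundle from disk.
  with open("trust_roots.pem", "rb") as f:
      store = Store(load_pem_x509_certificates(f.read()))

  verifier = PolicyBuilder().store(store).build_server_verifier(
      DNSName("cryptography.io")
  )

  # Assumes the leaf comes first in the bundle.
  with open("chain.pem", "rb") as f:
      peer, *untrusted_intermediates = load_pem_x509_certificates(f.read())

  chain = verifier.verify(peer, untrusted_intermediates)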


> * I shouldn't have to care where the certificates are stored. Just load the os default ones without asking me.

> * I shouldn't have to know what a pem is, and I shouldn't have to open() one.

Agreed, but what you're requesting is separate from the work being discussed in this blog post, and both are actually compatible.

For the PHP community, we made Certainty - https://github.com/paragonie/certainty

You can just...

  <?php
  use ParagonIE\Certainty\RemoteFetch;
  // cURL boilerplate
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);

  // Fetch the latest CACert bundle, verify its authenticity, save it locally  
  $fetcher = new RemoteFetch('/path/to/certainty/data');
  $latestCACertBundle = $fetcher->getLatestBundle();
  curl_setopt($ch, CURLOPT_CAINFO, $latestCACertBundle->getFilePath());

Writing a Python client should be relatively straightforward, should anyone want to.

For Python specifically, there may be some value in storing the relevant PEM files into something like SigStore. That could be an easier proposal for the PyCA team to consider.
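
As a minimal sketch of the "sensible defaults" direction in Python (using certifi's bundled Mozilla roots rather than a Certainty-style verified remote fetch):

  import certifi
  from cryptography.x509 import load_pem_x509_certificates
  from cryptography.x509.verification import Store

  # certifi pins a copy of Mozilla's root store inside the package, so
  # callers never touch paths or PEM parsing by hand.
  with open(certifi.where(), "rb") as f:
      store = Store(load_pem_x509_certificates(f.read()))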


This looks great!

I wanted to bundle a chain of certs, check that the leaf was <some domain> and its key signed <some data>, and that the bundle provided a path to a locally-trusted root. This allows third-party serving of signed data, rather than having to trade your IP address for validation.

Doing so with existing tools was so terrible that I was pretty sure my result was grossly insecure :(

I look forward to the promised lower-level APIs!


I'm curious -- what were the inputs to the decision to redo the logic from scratch, rather than wrap something like the `webpki` crate?


This is a great question! The answer is twofold:

1. Compatibility: PyCA Cryptography has strict compatibility requirements, including being able to build with older versions of Rust that crates like webpki and rustls-webpki may not be interested in supporting.

2. Generality: PyCA Cryptography is a general purpose cryptographic toolbox, and the plan for X.509 validation is not intrinsically linked to just the Web PKI profiles. In the future, they may wish to make their APIs more flexible than a strictly CABF validator would require, meaning that they would either need to carry patches against `webpki` or get such flexibility upstreamed (to the detriment of the crate's single purpose).

(I am not a maintainer of Cryptography; this is my understanding of the reasons as one of the implementors of this feature.)


Genuine question: does the `webpki` crate provide a good path building implementation?


Meanwhile, the JVM ecosystem looks down on the choice between OpenSSL and a pure-Rust Python chain, because it has been using a managed, pure-Java X.509 chain for 25 years now.

I believe it has also had various implementations under a common API.


Note: if your application does not absolutely need X.509, there are alternatives to certificates in TLS, like raw public keys (RFC 7250).



GnuTLS has also supported it for the past five years.


Can OpenSSL use this as well? That would be amazing.


Do we have this in PHP? JavaScript?


I think getting it into cURL would be more impactful, since that's already tapped into by many OSS projects.


[flagged]


We've asked you more than once to stop posting unsubstantive comments and flamebait. You've continued to do it, for example in the parent comment, https://news.ycombinator.com/item?id=39108095, and https://news.ycombinator.com/item?id=39103476.

You've even posted outright slurs like https://news.ycombinator.com/item?id=38832516, which is seriously not ok.

Since you've continued to break HN's rules like this, I've banned the account. If you don't want to be banned, you're welcome to email hn@ycombinator.com and give us reason to believe that you'll follow the rules in the future. They're here: https://news.ycombinator.com/newsguidelines.html.


Holy fucking shit. I've never seen anything like that last comment on HN before.


That's good, because they certainly get posted. Such things are inevitable on any sufficiently large public forum.

If you haven't seen them, it must be because they're mostly getting flagged and/or killed by users and/or moderators, which is the desired outcome.


Yes.

Python was designed from the beginning to be compatible with C; it can make calls directly into C modules. There is an entire portion of the Python ecosystem dedicated to C interoperability. Many of the most popular packages are written in C (e.g. numpy, pandas, etc.), and many other packages have their most performance-sensitive code written in C (scipy et al.).

[ed] s/pansas/pandas/g
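
For example, calling straight into libc with nothing but the standard library (library lookup varies by platform; this sketch assumes a Unix-like system):

  import ctypes
  import ctypes.util

  # find_library may return None on some platforms (e.g. Windows).
  libc = ctypes.CDLL(ctypes.util.find_library("c"))
  libc.strlen.restype = ctypes.c_size_t
  print(libc.strlen(b"hello"))  # -> 5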


A considerable number of common packages in the Python ecosystem are backed at least in part by another language. This is not a shortcoming, it's a feature. Give end users flexibility while figuring out performance and other beasts internally.


> Is it really “Python” if the underlying stuff is written in another language?

Given that "python" is basically synonymous with "cpython" and cpython has always been implemented in C, I guess the answer to your question is "yes".


A language is distinct from its implementation language. Else everything is machine code.


I never argued otherwise. But how is using a package which is implemented in another language not "python"? How is "list(range(10))" python whereas "numpy.arange(10)" is not?





