A closer look at CVSS scores (predictable.software)
50 points by mrcsd on June 20, 2022 | 12 comments



I'm curious how you'd account for factors like `npm audit` failing on probably most JavaScript repos out there at reasonably high CVSS thresholds, due to items like https://nvd.nist.gov/vuln/detail/CVE-2021-3807 ? And in particular: can that be fixed, or is it Working As Intended?


The security vendors have an ass-covering policy of scoring for the worst possible scenario, without any regard for how implausible it is.

Suppose you had a spray can in a basement cupboard with a "warning: flammable" sticker attached with weak glue. The sticker could fall off, you could then use the spray without knowing it's flammable, and that could lead to you setting your house on fire. Therefore the CVSS score of "weak sticker glue" is exactly the same as "your house is on fire".

This maximally inflated score is then combined with a primitive reporting policy that alerts on the mere presence of a dependency with an advisory anywhere in your dependency tree. Security vendors aren't checking whether the dependency is actually affected in that context or not. It's a lazy recipe for labelling everything as critically vulnerable all the time.


It's a double whammy, because not only do security vendors have an incentive to overstate scores, but so do their researchers. I talk about this a little in the section on comparing scores over several years.

One thing I didn't analyse that might be interesting: whether there's a consistent gap between the vendor/researcher-reported scores in NVD and the scores general software vendors give to their own customers.


I keep coming back to something like collaborative filtering.

Raw, context-unaware scores may come from researchers, but 'real' scores depend on deployment context. So a vuln may have different final scores depending on context, and the community needs ways of identifying which bucket a given deployment falls into. Likewise, there is probably an 80/20 split in use cases for most libs, so a good default bucket based on a typical usage profile might help further decrease the overhyping of raw scores: CI vs Server vs Browser vs TCB vs ... .

Ex: npm audit reports on dev dependencies by default, but a regex complexity attack there is much less of a real threat, yet that's where most of the alerts seem to be. Instead of pushing the raw score to the user, you could push a use-adjusted score to the reporting tool, and the user could pick sensitivities for both.
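
Here's a rough sketch of what that use-adjusted score could look like. To be clear, the Finding shape, the context buckets and the weights below are all invented for illustration, not taken from npm audit or any real scanner; a real tool would derive the context from lockfile/manifest data and let the user tune both the weights and the gating threshold.

    // Everything here is made up for illustration.
    type Context = "devDependency" | "ci" | "browser" | "server";

    interface Finding {
      advisory: string;   // e.g. "CVE-2021-3807"
      pkg: string;        // e.g. "ansi-regex"
      baseScore: number;  // raw, context-unaware CVSS base score
      context: Context;   // where the dependency is actually used
    }

    // Made-up default weights: a ReDoS in a dev-only linter dependency matters
    // far less than the same bug on the request path of a public server.
    const contextWeight: Record<Context, number> = {
      devDependency: 0.2,
      ci: 0.4,
      browser: 0.7,
      server: 1.0,
    };

    const adjustedScore = (f: Finding): number =>
      f.baseScore * contextWeight[f.context];

    // User-chosen sensitivity: fail the build only on adjusted scores >= 7.0.
    const shouldFailBuild = (findings: Finding[], threshold = 7.0): boolean =>
      findings.some((f) => adjustedScore(f) >= threshold);

    console.log(
      shouldFailBuild([
        { advisory: "CVE-2021-3807", pkg: "ansi-regex", baseScore: 7.5, context: "devDependency" },
      ])
    ); // false: 7.5 * 0.2 = 1.5, below the 7.0 gate

With something like this, the ansi-regex ReDoS from upthread (CVE-2021-3807, 7.5 on NVD) stops failing builds on its own when it only shows up as a dev dependency, without being silenced for deployments where it actually sits on a request path.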

Collaborative filtering is one way to do that; there are others as well. Without something like that, scores are largely ungrounded FUD / box-checking.


I'm the author and would be happy to answer questions to the best of my ability.


There's an immense amount of detail in that post; I'm still reading through it.

Do you work in this field or did you research all this as a hobby, or both?


Both. Currently I work in supply chain security with a focus on Ruby. I also spend a lot of time cooperating with others in OpenSSF working groups.

I'd been meaning to write about CVSS for some time. I wrote a few paras and created the first plot, but that's as far as I got for about a year. One Saturday morning about two months ago I started thinking about it again and got the itch to finish it. It kept growing as I went, because I kept finding more to write about.

Edit: the original ambition for my site was more about applying what I've learned about statistics, psychology and economics to general software development and ops. But now that my day job is mostly thinking about security, it has veered off in that direction instead (for the moment at least).


Thanks for sharing; I always appreciate these insights into fields I often encounter but don't think about. I'll read through it, that's a promise!


Great article.

At the moment, IMHO, the major issue is that people use only the CVSS 3.1 Base Score issued by the NVD.

Indeed, if you also take the Temporal Score into account (fed by CTI feeds, for example), and then add the Environmental Score, you can get very good results for prioritizing the vulnerabilities on your assets and reflecting the real threat.
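
For anyone who hasn't looked at the Temporal part: it's just a multiplier on the Base Score, rounded up to one decimal. Here's a minimal sketch using the v3.1 temporal weights as I remember them from the spec (the example inputs at the end are made up):

    // Temporal metric weights from the CVSS v3.1 spec.
    const E  = { X: 1.0, H: 1.0, F: 0.97, P: 0.94, U: 0.91 }; // Exploit Code Maturity
    const RL = { X: 1.0, U: 1.0, W: 0.97, T: 0.96, O: 0.95 }; // Remediation Level
    const RC = { X: 1.0, C: 1.0, R: 0.96, U: 0.92 };          // Report Confidence

    // CVSS v3.1 "Roundup": smallest one-decimal value >= input (float-safe form).
    const roundUp = (n: number): number => {
      const i = Math.round(n * 100000);
      return i % 10000 === 0 ? i / 100000 : (Math.floor(i / 10000) + 1) / 10;
    };

    const temporalScore = (
      base: number,
      e: keyof typeof E,
      rl: keyof typeof RL,
      rc: keyof typeof RC
    ): number => roundUp(base * E[e] * RL[rl] * RC[rc]);

    // A 9.8 Critical with only a proof-of-concept exploit, an official fix
    // available, and an unconfirmed report drops to a High:
    console.log(temporalScore(9.8, "P", "O", "U")); // 8.1

The Environmental Score goes further by letting you re-weight confidentiality/integrity/availability against per-asset security requirements and override the base metrics for your own deployment, which is where most of the prioritization value comes from.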

I would also like, however, to see CVSS v4 get a "cost to patch" component: in OT environments, CISOs like to use SSVC because it's the easiest way to say "wait" instead of "patch now". But since SSVC is not really recognized by all auditors, it generates conflicts. Adding a component to CVSS that reflects the cost of remediation on very complex devices, where deploying a KB requires stopping a whole factory, could help reach the same result (i.e. "don't patch now, wait") with a more widely respected scoring system.

From my perspective, that’s the only missing component for a good CVSS system :).


> I would also like, however, to see CVSS v4 get a "cost to patch" component: in OT environments, CISOs like to use SSVC because it's the easiest way to say "wait" instead of "patch now". But since SSVC is not really recognized by all auditors, it generates conflicts. Adding a component to CVSS that reflects the cost of remediation on very complex devices, where deploying a KB requires stopping a whole factory, could help reach the same result (i.e. "don't patch now, wait") with a more widely respected scoring system.

The issue with this is that the people who are best suited to score an issue from the reporting perspective won't necessarily have any idea what the cost to patch something actually is. This is why CVSS shouldn't be used as a be-all-and-end-all metric for anything -- there are a lot of factors it does not account for because they have nothing to do with the vulnerability's relative severity.


I built a CVSS2 calculator, amongst other things, and I came to the conclusion that they are trying to turn an art (judging how problematic an issue is) into a pure science, and they keep realising they need ever more parameters. I'm glad they have tried, but I would rather trust a pen tester to help me rank issues within my organisation than use a one-size-fits-all formula that doesn't perfectly account for my specific situation.


"People misapply CVSS" is the crux of the post and all the criticisms (even the ones labeled as something else).

The "other criticisms" section starts with the "you're doing it wrong" commentary and then moves on to two other groups saying what boils down to "you're doing it wrong, and the metric is bad because it encourages you to do it wrong", which, as a way of demonstrating diversity of opinion, is entertaining at least.

CVSSv3.1 as a metric is not designed to have a uniform distribution of possible values from 0.1 to 10.0, and it should not generally be a goal to develop a scoring system that does. It is designed solely to answer the question of "which issue is more severe" when comparing different issues, and then to help direct and prioritize fix work. It is not perfect at this, but it is superior to other systems out there, especially when considering the pure severity of a given vulnerability in isolation.
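
To make that concrete, here's the v3.1 base-score math transcribed into a small sketch (straight from the spec's published equations and weights, so any transcription mistakes are mine). Enumerating every possible base vector shows how lumpy the set of attainable scores really is:

    type Metrics = {
      AV: "N" | "A" | "L" | "P"; // Attack Vector
      AC: "L" | "H";             // Attack Complexity
      PR: "N" | "L" | "H";       // Privileges Required
      UI: "N" | "R";             // User Interaction
      S:  "U" | "C";             // Scope
      C:  "N" | "L" | "H";       // Confidentiality impact
      I:  "N" | "L" | "H";       // Integrity impact
      A:  "N" | "L" | "H";       // Availability impact
    };

    // Published v3.1 metric weights. Privileges Required is the one weight
    // that also depends on Scope.
    const W = {
      AV: { N: 0.85, A: 0.62, L: 0.55, P: 0.2 },
      AC: { L: 0.77, H: 0.44 },
      UI: { N: 0.85, R: 0.62 },
      CIA: { N: 0.0, L: 0.22, H: 0.56 },
      PR: { U: { N: 0.85, L: 0.62, H: 0.27 }, C: { N: 0.85, L: 0.68, H: 0.5 } },
    };

    // "Roundup" as defined in the spec: smallest one-decimal value >= input.
    const roundUp = (n: number): number => {
      const i = Math.round(n * 100000);
      return i % 10000 === 0 ? i / 100000 : (Math.floor(i / 10000) + 1) / 10;
    };

    function baseScore(m: Metrics): number {
      const iss = 1 - (1 - W.CIA[m.C]) * (1 - W.CIA[m.I]) * (1 - W.CIA[m.A]);
      const impact =
        m.S === "U"
          ? 6.42 * iss
          : 7.52 * (iss - 0.029) - 3.25 * Math.pow(iss - 0.02, 15);
      const exploitability =
        8.22 * W.AV[m.AV] * W.AC[m.AC] * W.PR[m.S][m.PR] * W.UI[m.UI];
      if (impact <= 0) return 0;
      return m.S === "U"
        ? roundUp(Math.min(impact + exploitability, 10))
        : roundUp(Math.min(1.08 * (impact + exploitability), 10));
    }

    // Tally the scores of all 2,592 possible base vectors.
    const hist = new Map<number, number>();
    for (const AV of ["N", "A", "L", "P"] as const)
      for (const AC of ["L", "H"] as const)
        for (const PR of ["N", "L", "H"] as const)
          for (const UI of ["N", "R"] as const)
            for (const S of ["U", "C"] as const)
              for (const C of ["N", "L", "H"] as const)
                for (const I of ["N", "L", "H"] as const)
                  for (const A of ["N", "L", "H"] as const) {
                    const s = baseScore({ AV, AC, PR, UI, S, C, I, A });
                    hist.set(s, (hist.get(s) ?? 0) + 1);
                  }
    console.log([...hist.entries()].sort((a, b) => a[0] - b[0]));

The tally just makes the point visible: the scores attainable between 0.0 and 10.0 are nowhere near evenly spread, and they were never meant to be.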

I do get that people really do try to sell the idea that it's an infallible metric and that it means substantially more than it does. It also often gets misread as "X is riskier because its score is higher", which is obviously wrong. If you have an authentication-related product, it's obviously more damaging to discover certain categories of information leakage than it may be to find cross-site scripting issues in general.

I think it is correct for a change in scope to have an outsized impact on the final score, something the author seems to presume is wrong (referring to it as the "villain" at one point) without really explaining why they believe it is wrong. A scope change essentially means lateral movement to other systems rather than the compromise of a single piece of software.
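
To put a number on it, reusing the baseScore sketch from a few paragraphs up on the ansi-regex vector mentioned upthread, with only the Scope metric flipped:

    // AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H -- CVE-2021-3807's base vector.
    const unchanged: Metrics = { AV: "N", AC: "L", PR: "N", UI: "N", S: "U", C: "N", I: "N", A: "H" };
    const changed: Metrics = { ...unchanged, S: "C" };

    console.log(baseScore(unchanged)); // 7.5
    console.log(baseScore(changed));   // 8.6 (the jump is even larger when PR is
                                       // Low or High, since those weights also
                                       // change when Scope changes)

That's the intended behaviour: the same flaw, scored as a stepping stone to other components, is treated as materially worse than one confined to the vulnerable component.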

Could a better metric be designed? Sure. I'd like to see some additional degrees of user interaction being accounted for, as just one example. The concept of vectors being Network, Adjacent, Local, or Physical could use some more fleshing out for the modern age, for another.

Does that mean alternative approaches are better? Not in my experience. All the alternatives I've seen basically boil down to "we made our own system, we don't publish the calculations, and a lot more stuff is critical impact and risk" whenever you get reports. I've literally had third-party pentest teams try to sell me on an information exposure that showed server IPs in a log being a High, because they used their own metric.

I'd argue that for what it is intended to do, CVSSv3.1 does a good enough job and that's why so many people have accepted it as a standard.



