
The Verizon DBIR relies on questionable vulnerability data and poor analysis - zerointerupt
http://blog.trailofbits.com/2016/05/05/the-dbirs-forest-of-exploit-signatures/
======
tptacek
This is going to read like super ultra inside baseball for most people on HN,
but I can sum it up quickly for you:

Every year, Verizon publishes a report, called the DBIR, that aggregates the
preceding year's security incident data. It's a marketing thing for Verizon,
which has a large security consultancy, but it has also become pretty
important to Fortune 500 security teams. People make budgeting decisions based
on it.

This year, Verizon teamed up with a company called Kenna to add an additional
section about vulnerability prevalence. The section is supposed to tell
Fortune 500 security teams which vulnerabilities were most likely to have been
exploited last year.

Kenna's analysis is gravely flawed. You don't really even need to know the
details to see that it is. That's because the "top 10" vulnerabilities they
produced, ranked by "successful exploitations", included the TLS FREAK
vulnerability.

FREAK is an important TLS research finding with public policy implications,
but it is not important to enterprise security teams, because it is unlikely
that it has _ever_ been exploited on the public Internet. It's a MITM attack
whose payoff is a reduced-strength RSA factoring problem that costs $75-100
per session to complete. Were Kenna's data accurate, it would impute to
attackers over $330,000,000 in factoring CPU time costs. A more likely figure
for the amount spent by attackers to solve RSA problems to capture the
contents of individual TLS sessions after MITMing them is $0.
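A quick sanity check of that arithmetic, using only the figures above (the per-session cost and the $330M total; the implied exploitation count is just their ratio, not a number from the report):

```python
# Back-of-the-envelope check of the DBIR's FREAK claim, using only the
# figures cited above: each FREAK exploitation requires factoring a
# 512-bit export-grade RSA key, at roughly $75-100 of CPU time per session.
COST_PER_SESSION_LOW = 75     # USD, low estimate
COST_PER_SESSION_HIGH = 100   # USD, high estimate

# Total factoring cost that Kenna's exploitation counts would impute.
IMPLIED_TOTAL_COST = 330_000_000  # USD

# Number of "successful exploitations" that total would imply.
sessions_high = IMPLIED_TOTAL_COST // COST_PER_SESSION_LOW   # most sessions
sessions_low = IMPLIED_TOTAL_COST // COST_PER_SESSION_HIGH   # fewest sessions

print(f"Implied FREAK exploitations: {sessions_low:,} to {sessions_high:,}")
```

That's millions of individually-factored TLS sessions, which is the implausibility being pointed out.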

Remember that whatever your thoughts on FREAK's real-world applicability, the
DBIR report claims FREAK is in _the top ten of all exploited vulnerabilities
on the Internet_.

So the fascinating thing about this is that you have a super important,
widely-cited result where it's clear that nobody who understands a key topic
it reports on could possibly have checked it.

~~~
dguido
I hope this will become a parable for applying "data science" without
sufficient input from practitioners in the field.

