

A New Development for Coverity and Heartbleed - neuroo
http://blog.regehr.org/archives/1128

======
jdp23
Interesting approach! Kudos to Coverity for jumping on it so quickly.

Taint analysis is notoriously prone to false positives; besides the reasons
listed in this post, there are many situations where relations between
variables mean that tainted data doesn't cause problems. [For example, the
size of the memcpy target (bp) is known to be greater than payload; so even
though payload is tainted, there isn't a risk of a write overrun.] But even
noisy warnings can be very useful — when we first implemented simple taint
analysis in PREfix a decade ago, the first run was 99% false positives but one
of the real bugs we found was in a system-level crypto module. So with all
the attention these classes of bugs are getting after Heartbleed, it seems
like a great time for analyses like this.

~~~
hackinthebochs
>first run was 99% false positives but one of the real bugs we found was in a
system-level crypto module.

Thinking of it as a false positive seems like the wrong perspective. The
static analyzer is a tool that flags usages that are not proven to be correct.
The fact that the code turned out to be valid isn't the issue; the issue is
that your code did not prove itself valid to the satisfaction of the analyzer.
This isn't necessarily a failing of the analyzer, but an indication that your
code should be written in a different way, or should provide more "evidence"
that it's correct (e.g. if-guards/size checks).

The goal should be to write code in such a way that whatever tool you're using
can prove it correct. Sure, the better the tool the easier this process is.
But we really need to fundamentally rethink how we approach this problem.

~~~
arghnoname
I was once talking to someone with a lot of experience in this field and he
said that false positives were one of their biggest problems. If you have too
many false positives, programmers end up deciding the analyzer is full of crap
and either dismiss the results entirely or gloss over many ultimately useful
results.

A static analyzer that will actually be used can't have too many false
positives, and this is the big challenge with these things. He said that
allowing some false negatives (to cut down on false positives) made the tools
more effective in actually solving problems.

That said, with something like openSSL, you do sort of just wish the
programmers would deal with it. Language design should include elements to
make these sorts of static analyses easier.

~~~
hackinthebochs
That's an interesting idea, to have a language and a static analyzer created
for each other simultaneously. Constructs that are hard for a static analyzer
to reason about would be left out (or relegated to an "unsafe" context). I
wonder if there's any work on how well Rust holds up to the state of the art
in static analysis?

------
pjungwir
> additional locations in OpenSSL are also flagged by this analysis, but it
> isn’t my place to share those here.

Why do I get the feeling that we're going to see three months of new OpenSSL
vulnerabilities, like we saw with Rails last year? I'm sure Heartbleed plus
all the bad press about code quality means a lot of people are suddenly
looking. Assuming there is more to find, does anyone have any advice for how
we might prepare for it?

------
vladd
In case you want to try out static analysis on your own code-base, here's a
link with the list of most popular tools in this field, grouped by language:
[http://en.wikipedia.org/wiki/List_of_tools_for_static_code_a...](http://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis)

~~~
LVB
Visual Studio now ships with a reasonably good static analyzer built in. We're
using it more than our aging copy of PC-Lint.

(Carmack's review: [http://www.altdevblogaday.com/2011/12/24/static-code-
analysi...](http://www.altdevblogaday.com/2011/12/24/static-code-analysis/))

~~~
TwoBit
VC++ analysis is useful, aside from the fact that it mistakenly thinks every
pointer use is a potential null dereference, but I've gotten better results
with Clang's static analyzer.

~~~
marshray
Every pointer _is_ potentially null, until proven otherwise (which in the
general case means solving the halting problem).

------
kyberias
Interesting: "As you might guess, additional locations in OpenSSL are also
flagged by this analysis, but it isn’t my place to share those here."

~~~
nodata
Anyone know if these have been bug reported?

~~~
neuroo
Not yet, we are looking at them right now.

------
lbarrow
It's super cool to see the power of advanced static analysis these days. Props
to the Coverity team for using the Heartbleed trainwreck to motivate new
research on these problems.

That said, are there other ways to fix this class of problem? We have choices.
We can continue to build ever-more-advanced tools for patching over the
problems of C and C++, or we can start using languages that simply do not have
those problems.

There will always be a need for C and C++ in device drivers, microcontrollers,
etc. But there's no compelling reason why SSL implementations in 2014 should
use languages designed to run on mainframes in 1973.

~~~
pcwalton
Well, usually this stuff is written in C because other languages come with big
runtimes that make them unsuitable for utility libraries that need to be
callable by everyone and from everywhere.

That said, we're of course working on changing that with Rust. But I should
note that memory safety without garbage collection is just _hard_ : it
requires the entire language design to be balanced on a delicate precipice.
It's not surprising that it's taken a long time to get there.

~~~
AnthonyMouse
> But I should note that memory safety without garbage collection is just
> _hard_ : it requires the entire language design to be balanced on a delicate
> precipice.

I was thinking about this recently and I think a large part of the problem is
that C arrays are too weakly typed. Arrays should be a distinct type from
pointers, and the two shouldn't be implicitly convertible. In particular, you
shouldn't be able to subscript a pointer, and the in-memory representation of
an array
should begin with its length. At that point the compiler can include a runtime
bounds check for every array access that it can't prove is safe at compile
time.

~~~
im3w1l
>runtime bounds check

The reason people use C is to get higher speed by avoiding those kinds of
checks. A real solution must avoid slowdowns.

~~~
pjmlp
If the reason was to avoid those checks, then people really don't know better.

Turbo Pascal => {$R-}

Ada => pragma Suppress (Index_Check);

Modula-3 => cm3 -A

D => dmd -fnoboundscheck / array.ptr[index] in @system code

Just a few examples. Additionally, there's a lot of research on removing
bounds checks via compiler optimizations:

[http://citeseer.uark.edu:8080/citeseerx/showciting;jsessioni...](http://citeseer.uark.edu:8080/citeseerx/showciting;jsessionid=749D9B8FE7B1755974DA5561D1926781?cid=71083&sort=cite&start=10)

What C has going for it is the symbiotic relationship with UNIX, 40 years of
optimizations in most compilers and the historical baggage of being around for
so long.

Other, better languages suffered from being shown the door as UNIX spread
into the enterprise, but there was a time when C compilers didn't generate
better code than the other ones did.

------
apaprocki
Is this implemented in Coverity by a local model? (There is reference to a
model being applied) Or was the actual product modified to support this? Can
Coverity customers get ahold of this now?

~~~
neuroo
The "model" refers to the model injection for memcpy.

The modification made by the team is referenced in John's blog post "Their
insight is that we might want to consider byte-swap operations to be sources
of tainted data".

As Andy said (and quoted), that's a modification we need to evaluate overall
to look at its impact in terms of false positives (FP). Even if it doesn't
pass our acceptance tests for FP rate, it will probably still be made
available behind some option... a bit too early to say.

~~~
apaprocki
Thanks, I was just curious whether customers could play with these kinds of
experiments if they understood the FP potential. I really like Coverity's
output and always like new ways to tease out potential bugs.

