

What is static program analysis? - p4bl0
http://matt.might.net/articles/intro-static-analysis/

======
tedunangst
"You can't solve the halting problem" was high on the list of responses that
made selling static analysis a pain in the butt.

~~~
ScottBurson
Yes, even people who are aware of the halting problem often misunderstand its
implications. What Turing proved is that no _algorithm_ -- in the sense of an
effective procedure, that is guaranteed to return the correct answer in finite
time on any input -- can solve the halting problem. But there is a critical
difference between _algorithms_ and _heuristics_. A heuristic is not
guaranteed to do anything, but nonetheless, collections of heuristics can be
assembled that can analyze properties of programs _in many cases of actual
interest_.

The distinction between algorithms and heuristics is, in my opinion,
insufficiently appreciated. For example, the very phrase "genetic algorithms"
is a contradiction in terms -- these are clearly heuristics, not algorithms.

Anyway, back to your point, we who are still selling static analysis -- I'm
not sure why you used the past tense there; it's a growing business -- find
that people who don't understand the halting problem at all are perhaps as
much a source of difficulty as those who are aware of it but don't understand
its implications. The latter group are skeptical that our products can work at
all, but the former are often baffled and indignant that the products don't
understand properties of their code that seem obvious. (The fact that analysis
often takes a long time and needs large amounts of RAM also surprises some
people.)

But I think the world is gradually coming to understand what static analysis
can and can't do. It's an education process and will take time.

~~~
scott_s
The kinds of static analysis you're talking about are different than what Matt
is talking about in his article.

Matt is talking about situations where you _can_ make guarantees. That is, if
his analyzer says that a register can never take on a negative value, that is
a guarantee. It is not based on a heuristic, it is based on a proof. How this
guarantee is possible, despite the halting problem, is because he designs an
abstract representation of the program being analyzed. It loses a lot of
information, but by using it, one can actually prove things about the program.
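
To make that concrete, here is a minimal sketch of a sign abstraction in Python. It is not Matt's code; the names (`alpha`, `abs_add`, `abs_mul`, `join`, `loop_invariant`) are made up for illustration, and it assumes unbounded integers, as in the article's register language. Each register is tracked only as negative, zero, positive, or "could be anything", and a loop is analyzed by iterating its abstract body to a fixed point rather than running it.

```python
# Abstract signs: each one stands for a whole set of integers.
NEG, ZERO, POS, TOP = "-", "0", "+", "?"   # TOP means "could be anything"

def alpha(n):
    """Abstract a concrete integer to its sign."""
    return NEG if n < 0 else ZERO if n == 0 else POS

def abs_add(a, b):
    """Sign of a + b over unbounded integers, or TOP if the signs don't decide it."""
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    if a == b and a != TOP:
        return a              # (+) + (+) = (+) and (-) + (-) = (-)
    return TOP                # e.g. (+) + (-) could be anything

def abs_mul(a, b):
    """Sign of a * b over unbounded integers."""
    if a == ZERO or b == ZERO:
        return ZERO           # zero times anything is zero
    if TOP in (a, b):
        return TOP
    return POS if a == b else NEG

def join(a, b):
    """Combine what is known on two paths; disagreement loses everything."""
    return a if a == b else TOP

def loop_invariant(init, body):
    """Iterate the abstract loop body until the abstract state stops changing."""
    state = init
    while True:
        new = join(state, body(state))
        if new == state:
            return state
        state = new

# r = 1; loop: r = r * 2 + 1   -> r is provably positive on every iteration
grows = loop_invariant(alpha(1), lambda r: abs_add(abs_mul(r, alpha(2)), alpha(1)))
# r = 1; loop: r = r - 1       -> the abstraction cannot decide the sign
shrinks = loop_invariant(alpha(1), lambda r: abs_add(r, alpha(-1)))
print(grows, shrinks)   # prints: + ?
```

The first answer is a proof that holds no matter how many times the loop runs. The second is the analyzer admitting it doesn't know, which is the information loss mentioned above.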

If I understand you correctly, you're talking about heuristics which allow one
to say "It is very likely that this program will exhibit that behavior." It's
not a guarantee, because it's not based on a proof, but rather on heuristics
which try to match patterns in the code in question to patterns in code that
was known to exhibit such behavior. Carmack talks about using such static
analysis in his projects in a popular blog post (HN discussions:
<http://news.ycombinator.com/item?id=4543553> and
<http://news.ycombinator.com/item?id=3388290>).

In simpler terms, Matt's approach avoids the halting problem by only providing
guarantees on a limited set of behaviors. The approach you're talking about
avoids the halting problem by not providing guarantees, but by providing
likelihoods.

~~~
ced
I agree with your post, though I am curious to know if there is any use for
"this function is likely to terminate" inside static analysis.

 _How this guarantee is possible, despite the halting problem, is because he
designs an abstract representation of the program being analyzed. It loses a
lot of information, but by using it, one can actually prove things about the
program._

I would rather say: the halting problem shows that there exists _at least one_
program for which it is impossible to prove termination - not that _every_
termination problem is undecidable, as the trivial example "return 5;" shows.

~~~
scott_s
The conclusion "this function is likely to terminate" may not be helpful, but
"this function is likely to leak memory" may be extraordinarily helpful.

The reason that everyone brings up the halting problem in this context is not
that we want to test whether programs halt. (Well, we may, but we're generally
looking for other behaviors.) It's because for some behaviors, if you want to
be able to handle _all_ cases, then that problem can be reduced to the halting
problem. Matt provides an excellent demonstration of this with array indices.

I'm confused by your restatement, because it does not restate what I said.
You're talking about the halting problem. I'm talking about proving general
behaviors in programs, despite the theoretical result that solving the halting
problem is impossible, and that many behaviors can be reduced to the halting
problem. The solution, as Matt explains, is to find ways to describe behaviors
so they _do not_ reduce to the halting problem. Using this technique, you
still can have false negatives ("I don't know"), but you can guarantee no
false positives.
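
A toy illustration of that trade-off, using termination itself as the property. This is only a sketch over a hypothetical register language I'm making up here (instructions are tuples, and jumps carry a target instruction index as their last element); it is not taken from the article. The check answers "terminates" only when every jump goes strictly forward, because then the program counter can only increase and execution must fall off the end. Anything else gets "unknown".

```python
def check_termination(program):
    """Return "terminates" only when that is proven; otherwise "unknown".

    program is a list of tuples. Jumps are ("jmp", target) or
    ("jz", reg, target), where target is an instruction index.
    """
    for pc, instr in enumerate(program):
        if instr[0] in ("jmp", "jz"):
            target = instr[-1]
            if target <= pc:
                return "unknown"    # backward jump: might loop forever, might not
    return "terminates"             # pc strictly increases, so the program halts

straight_line = [("set", "r", 5), ("add", "r", 1)]
spin = [("jmp", 0)]
countdown = [("set", "r", 3), ("jz", "r", 4), ("add", "r", -1), ("jmp", 1)]

print(check_termination(straight_line))  # prints: terminates
print(check_termination(spin))           # prints: unknown
print(check_termination(countdown))      # prints: unknown
```

`countdown` does in fact halt, but the check can't see that, so it says "unknown". It never says "terminates" about a program that loops forever, and that is the only guarantee it makes.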

~~~
ced
_I'm confused by your restatement, because it does not restate what I said._

Indeed! I hadn't quite made out what you were trying to say in that post - my
statement was not relevant.

------
SkyMarshal
Something I've wondered but never got around to asking - why is it called
Static Analysis? Why not Dynamic Analysis, or Runtime Analysis, or something
else?

~~~
ScottBurson
Precisely because it's done _without_ actually running the program. Dynamic
analysis is where you do actually run the program.
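
A small Python sketch of the difference, using the standard `ast` module for the static side. The function names and the sample source are made up for illustration, and the static check is a deliberately crude pattern match (division by a literal zero), not a real analyzer.

```python
import ast

SOURCE = (
    "def f(x):\n"
    "    if x > 100:\n"
    "        return x / 0\n"
    "    return x\n"
)

def static_check(source):
    """Inspect the syntax tree without executing anything.

    Returns the line numbers of divisions by a literal zero.
    """
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.BinOp)
                and isinstance(node.op, (ast.Div, ast.FloorDiv, ast.Mod))
                and isinstance(node.right, ast.Constant)
                and isinstance(node.right.value, (int, float))
                and node.right.value == 0):
            lines.append(node.lineno)
    return lines

def dynamic_check(source, arg):
    """Actually run f(arg) and report whether this one run divided by zero."""
    namespace = {}
    exec(source, namespace)
    try:
        namespace["f"](arg)
        return None
    except ZeroDivisionError:
        return "ZeroDivisionError"

print(static_check(SOURCE))         # prints: [3]
print(dynamic_check(SOURCE, 1))     # prints: None
print(dynamic_check(SOURCE, 101))   # prints: ZeroDivisionError
```

The static check flags line 3 without running `f` at all. The dynamic check only sees the bug when the input happens to drive execution down that branch.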

------
d0mine
The sign-analysis example doesn't work in C, where signed integer overflow is
undefined behavior.

Example: <http://stackoverflow.com/questions/12729110/strange-integer-behavior-with-gcc-o2>

~~~
mattmight
Right.

It works for the register language with infinite precision integers.

If you want it to work for C, you'll have to redefine the abstractions of
addition and multiplication (and every other operation) appropriately.

You'll also have to parameterize it by word size, endianness, signedness and
type of arithmetic.

That is, sound static analysis for C is almost certainly going to be platform-
specific.

But, you can still do it.
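
A rough sketch of what "parameterize it by word size" could look like, written in Python with intervals instead of bare signs. The helper names (`wrap`, `abs_add`) are hypothetical, and the sketch assumes wrapping two's-complement arithmetic (what gcc gives you with `-fwrapv`). Under the C standard's undefined-behavior rule, a sound analyzer would instead have to report the possible overflow as an error.

```python
def wrap(n, bits):
    """Interpret n as a signed two's-complement integer of the given width."""
    modulus = 1 << bits
    n %= modulus
    return n - modulus if n >= modulus >> 1 else n

def abs_add(a, b, bits):
    """Add two intervals (lo, hi) under wrapping arithmetic of the given width."""
    lo, hi = a[0] + b[0], a[1] + b[1]
    int_min, int_max = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if int_min <= lo and hi <= int_max:
        return (lo, hi)             # no overflow possible: the result is exact
    return (int_min, int_max)       # overflow possible: lose precision, stay sound

big = (1, 2**31 - 1)                # "some positive 32-bit int"
one = (1, 1)

print(wrap(2**31 - 1 + 1, 32))      # prints: -2147483648
print(abs_add(big, one, 32))        # prints: (-2147483648, 2147483647)
print(abs_add(big, one, 64))        # prints: (2, 2147483648)
```

With unbounded integers, positive plus positive is always positive. At 32 bits the same addition may wrap to a negative value, so the abstract result has to include negatives. At 64 bits the sum fits and the analysis can still prove it is positive.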

~~~
wladimir
_But, you can still do it._

Indeed. Arguably, compilers do it all the time to be able to (safely) do
optimizations. Hence static analyzers are usually an extension to a compiler.
Frama-C kind of being the exception, using CIL instead of the intermediate
representation of a production compiler.

------
randartie
I wish I knew what I was reading right now

------
clobber
Interesting write up. LLVM defines it as "a collection of algorithms and
techniques used to analyze source code in order to automatically find bugs."
<http://clang-analyzer.llvm.org/>

