
John Carmack on the importance of Static Code Analysis - plinkplonk
http://altdevblogaday.com/2011/12/24/static-code-analysis/
======
evmar
Having done similar work for Chrome, I can attest to the fact that large code
bases are full of errors.

If you're not on Windows, using both gcc's -Wall and -Wextra along with Clang
in the same way is a good start. (Here's a post with more details:
[http://neugierig.org/software/chromium/notes/2011/01/clang.h...](http://neugierig.org/software/chromium/notes/2011/01/clang.html)
.) The Clang static analyzer wasn't very useful at the time I tried it because
it didn't analyze C++ code. Valgrind also finds a lot, but its findings are
harder to be diligent about fixing.

The PVS-Studio guy (mentioned in Carmack's post) ran our code through it as
well and also found a number of bugs, as described in a few posts:
<http://www.viva64.com/en/a/0074/> <http://www.viva64.com/en/b/0113/> . (As
Carmack supposed, they also claimed the Chrome code was some of the best
they'd seen. But it's more likely they were being truthful in both cases.)

They've also run the Chrome code through Coverity, but I haven't been involved
in fixing those bugs so I don't know how useful it was. Searching the bug
tracker for [coverity] turns up a handful of bugs, but it's possible more are
hidden for security reasons.

~~~
masklinn
> As Carmack supposed, they also claimed the Chrome code was some of the best
> they'd seen. But it's more likely they were being truthful in both cases

Carmack said that for Coverity though, not for PVS.

~~~
AndreyKarpov
PVS-Studio vs Chromium - <http://www.viva64.com/en/a/0074/>

PVS-Studio vs Chromium - Continuation - <http://www.viva64.com/en/b/0113/>

------
kaffeinecoma
I once caused a serious, halt-the-enterprise production bug by "fixing" a
problem found by FindBugs. This was Java code, something along the lines of:

    
    
      Boolean b = new Boolean(true);
    

The static analyzer correctly identified this as an unnecessary new object
creation (style guides and good sense recommend you simply use Boolean.TRUE).
I "fixed" it, and went on my way.

Little did I realize that this variable was actually a lock, and there was a
synchronized(b) block later (and much deeper) in the code, which I effectively
eliminated by removing the new.

In my defense, I feel that the real bug here was one of documentation: had the
variable been named something like "lock", I'd have understood immediately what
was going on. But that doesn't make you feel much better when your team's been
up all night fixing your bug!

Moral of the story: your codebase (especially if it's an older one) might
actually be depending on its "bugs" for proper behavior. Think (and test) hard
before applying suggested changes from static analysis.

~~~
StavrosK
Sure, but the bug in this case is that there wasn't a comment specifying the
reason for the unconventional behaviour.

~~~
scott_s
I agree. If you know you're writing something that is un-idiomatic or you
think its intended purpose will be a surprise to most readers, put a comment
in explaining why.

    
    
        // we need a heap object so we can synchronize on it later
        Boolean b = new Boolean(true);

~~~
groby_b
This points to a deeper problem of static analysis, though - any analysis
package without the ability to annotate code is _doomed_. The false positives
will be so annoying that people will give up on it.

And for the people who work on SA systems - _please_ give me a way to annotate
that is not exclusively via comments. Especially once people use multiple SA
packages, that is rather annoying :)

------
tikhonj
This is exactly the sort of thing Haskell is great at. First, the type system
catches all sorts of errors at compile time (neither null-pointer nor printf
issues can come up in Haskell).

However, more fundamentally, Haskell code just naturally provides _much_ more
information to static analysis tools than any other language I've worked with.
Even if the level of tooling is not there yet (I haven't worked on any large
projects, so I am not entirely familiar with it) the _potential_ for these
tools is much greater in Haskell. I think programs like HLint are already very
thorough. I've just been using Haskell as more of a hacker language than a
"bondage and discipline" language and haven't bothered with these tools :)

~~~
syaramak
I'm curious to know whether this is because Haskell doesn't support nulls, or
whether there is something else that makes it better at catching type errors.

~~~
masklinn
It's better at catching type errors in general (because it's much stricter in
its handling of types); moving the concept of nullability into the type system
is just one (easy-to-understand, for most developers) example of that.

But it goes beyond that, even more so as the Haskell culture builds upon this
and actively encourages taking advantage of the type system by encoding as
many things as possible in the program's types, where they can be statically
checked.
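
To make that concrete, here's a small hypothetical sketch of the pattern (my
example, not from the thread): a newtype with a smart constructor pushes the
invariant "this string is non-empty" into the type, so it is checked once and
then guaranteed everywhere else:

    module UserName (UserName, mkUserName, userName) where

    newtype UserName = UserName String

    -- The only way to construct a UserName; the Maybe forces callers
    -- to handle the invalid case right here, at the boundary.
    mkUserName :: String -> Maybe UserName
    mkUserName "" = Nothing
    mkUserName s  = Just (UserName s)

    userName :: UserName -> String
    userName (UserName s) = s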

------
latchkey
Doing analysis like this also has a huge impact on broken-window theory. If
engineers see a whole bunch of compiler warnings, they don't think twice when
they see just one more, even if it's a really valid warning. It also
gives a good sense of ownership and commitment to the codebase if everyone
agrees to not check in code with warnings. Also, when you have new engineers
copying and pasting code to get stuff working quickly, you certainly don't
want them doing that with buggy code.

~~~
rmc
_If engineers see a whole bunch of compiler warnings, then they don't think
twice when they see just one more and it could be a really valid warning_

As a real life example of how this can happen, PHP 5.3.something release had a
serious bug with MD5 hashes essentially not working. (cf.
<https://bugs.php.net/bug.php?id=55439>
<http://www.php.net/archive/2011.php#id2011-08-22-1> ). Apparently there was a
unit test for it, but there were ~200 failing unit tests, so they ignored it.

~~~
InclinedPlane
In my experience an even more pernicious problem is unit tests that are
unreliable or too expensive (especially in time). Devs too easily come to
tolerate unit test suites that routinely need reruns to get to 100%
passing. Similarly, when build verification tests or CI tests get too bloated
it can be hard to pick where to draw the line.

------
asb
D. Richard Hipp and the SQLite project have not had such a positive experience
with static code analysis. They already use a _massive_ amount of testing
though. There's also no mention of commercial tools like Coverity.

See the "Static Analysis" section: <http://www.sqlite.org/testing.html>

~~~
groby_b
At least this passage is complete nonsense: "We cannot call to mind a single
problem in SQLite that was detected by static analysis that was not first seen
by one of the other testing methods described above."

 _If_ SA discovers a problem, you'll discover it the moment you run it through
the analyzer, while by the dev's admission many of their bugs are discovered
as bug reports. Which is clearly a bit later :)

Now, it might well be that most of SQLite's bugs simply are not discovered by
SA. But SA is not going to report them later than bug reports, unless you use
it very infrequently.

~~~
rogerbinns
Also look for 22nd November in the timeline -
<http://www.sqlite.org/src/timeline>

3 problems were detected by Coverity and fixed in SQLite.

------
erichocean
Not mentioned in the article are two nice static analysis tools: the Clang
Static Analyzer (<http://clang-analyzer.llvm.org/>) and Klee
(<http://klee.llvm.org/>).

Both are LLVM-related projects (and there are a few others as well, but these
are the two "big" ones).

~~~
stephth
Worth noting that the Clang Static Analyzer is bundled with Xcode (I think
since Xcode 3.2).

------
DanielBMarkham
Great article, full of insights. Here are some of the ones I got.

- People are generally happy with what they have: digging around is not a fun
thing to do.

- It is very easy for the maintenance programmer to make assumptions about the
preconditions for a piece of code that are not valid.

- Size of code is a critical metric for quality.

And most importantly (and probably overlooked), quality is but one metric of
software. The name of the game here is providing value to the customer, not
writing perfect code. John kind of throws that out there in a pro forma
way, then goes ahead without digging any deeper. Oddly enough, I can't really
draw any conclusions about static code analysis, the topic of the essay,
without a clear definition from the author about what the trade-offs are.
We're left with "just use it" as a conclusion.

After reading this, I wonder if programmers get stuck on the same general
level of abstraction, and whether this staying-at-the-same-level thinking
introduces unnecessary code complexity. To illustrate, let's try a thought
experiment.

Suppose there was no modern OS -- just an x86-compatible CPU and BIOS -- and
you were supposed to put an image stored on a USB drive onto the screen.

It would involve huge amounts of work -- code to get information from the
drive, code to understand images, code to respond to the keyboard, etc.

The reason we can do this so easily today is that whatever we write is
basically in a DSL that sits on top of other DSLs/APIs. We are working at a
higher level of abstraction.

I wonder if putting programming projects on a "code diet" isn't something we
should try more often. Announce that whatever our solution is, it's not going
to be more than 10KLOC. If we have to split into teams to provide layers, we
will. Each team has 10KLOC and should create a DSL at their particular level
of abstraction.

This forces us to keep project codebases very small, yet should provide just
as much freedom to create very powerful software as we have today. I
understand that many will say "but there's no way you're going to make any
kind of useful layer of abstraction in 10K of code!" I disagree, but that's a
big can of worms to open up in an HN thread. The important question is this:
should we create arbitrary limits on our abstraction layers as a way to
enforce higher code quality?

Just thinking aloud.

~~~
hxa7241
> code diet . . . (limit of 10K lines)

Yes, sort-of, although . . .

A simple example 'in the small' to consider: function/procedure size. Should
we (syntactically) limit functions to, say, <30 lines?

What we are really trying to do here is limit complexity. And that is not
simply a result of length, but more of interrelation.

If we recall Dijkstra's intent with 'Structured Programming', it was to make
complexity _proportional_ to program length. It is not the total amount, but
the way it is arranged or broken up. Strict size limits in effect sort-of do
that -- they make complexity nearer constant (since it is bounded) within each
part.

> should create a DSL at their particular level

And that leads to why the DSL idea (or something else to do a similar job) is
more important than the simple limit idea. What we want primarily is different
structuring.

In terms of the function-size example, we can (rather ideally) compare
imperative with functional programming. FP languages seem to _tend_ away from
the length problem, because they are built more as trees of expressions than
as sequences of statements. They have different structuring basics that
control interrelation complexity more.

To put it briefly: we want it so you cannot add length without also adding
'depth' (of abstraction, of levels, of separation).

Expressed more fully here: 'Should there be hard limits on program part
sizes?' <http://www.hxa.name/notes/note-hxa7241-20101124T1927Z.html>

~~~
DanielBMarkham
I agree. But I think you've missed it. My point was about human behavior, not
the innards of program structure.

To rephrase: _would imposing arbitrary code length restrictions along with
DSL-type training automatically drive programmers to write better quality
code?_

I believe the answer is yes. You are correct in that the technical issue is
more complicated than that, but my point was asking if making an arbitrary
rule would drive teams into making those kinds of design choices, instead of
just being happy with the way things were and continuing to expand the code
base ad infinitum. I was trying to draw in several lines of thought I gleaned
from the essay and synthesize something new.

I also think there's a big difference in creating an arbitrary limit on
function size and doing the same for an overall project. I wouldn't like the
idea of limiting function size at all. You can do a lot in 10K lines of code.
There's a lot more freedom there. Once again, my thrust is human behavior.
Telling somebody that each little piece of code they write is subject to
somewhat arbitrary restrictions is a lot more onerous to me than simply making
a "budget" for the entire project.

------
6ren
I'm surprised he didn't give an economic evaluation, i.e. _debugging time
saved - checking time spent_. He mentioned a few man-days' worth of debugging
that would have been prevented, but it sounds like he spent more time than
that in checking. As he noted at the beginning, other factors (like features)
are more important than quality (productivity is an argument for dynamically
typed languages). Of course, quality is also its own reward.

BTW: dated today, but I'm sure I've read it before. Maybe it's a write-up of
earlier episodes (e.g. /analyze in the 360 SDK).

~~~
fauigerzigerk
There are a couple more variables to consider in an economic evaluation:

The time it takes to write test cases that catch all the same issues that
static analysis would. I believe that testing lends itself to finding
different kinds of defects and it would be very unproductive to write tests
that cover all the same issues that static analysis can find (in statically
typed code).

The cost of a bug slipping through the net. Some types of bugs cannot be ruled
out by testing, but it may be possible to prove that they are not present.
E.g. non-deterministic concurrency bugs.

~~~
Fahrenheit2539
I would say that the best economic value is delivered on code fragments that
are almost never covered by tests of any kind, like error-handling paths. The
PVS-Studio guys give some pretty funny examples of such errors:
<http://www.viva64.com/en/a/0078/>

------
aycangulez
My favorite quote from the article: "Shrink your important code."

and he explains why:

"There was a paper recently that noted that all of the various code quality
metrics correlated at least as strongly with code size as error rate, making
code size alone give essentially the same error predicting ability. Shrink
your important code."

~~~
hello_moto
Fair observation. But I'd like to know whether the paper's research was done
on static languages like C/C++/Java/C# or on dynamic languages as well...

Because I have a few people on my back who keep screaming that dynamic
languages produce fewer lines of code, and who jump to the conclusion that
"therefore it is better in terms of quality", while the code this group of
people produces seems similar to Perl: less code, but unreadable (requiring
intensive re-reading) if you go away for a few days and come back to work on
it. Lots of metaprogramming, and shortcuts preferred over readability. Fewer
bugs? Hell no...

~~~
regularfry
It's a fairly well-publicised result that the rate of errors introduced is
proportional to lines of code, independent of language. Having said that, my
google-fu is failing me and I can't find a cite for it. I'm pretty sure
it's mentioned in Code Complete - anyone with a copy handy to help me out?

~~~
hello_moto
But do they compare the exact same system built using two different
programming languages from different programming paradigms?

i.e.: Java vs Ruby or Java vs LISP

At some point, the complexity of the system and the available tools/libraries
add more parameters to the bug-rate formula, which may throw off the paper's
result.

Consider this: a fellow worker had to write something that uses eBay's API.
There is an existing eBay gem available, and he tried that first. He stopped
after a few hours due to bugs and undocumented behaviour. His other option?
SOAP/WSDL. Now, based on what we know, Java has better SOAP support than Ruby.
We're not saying that Ruby can't do it, but we questioned the usability of
SOAP from Ruby. Essentially, one must read the WSDL (treating it as the
documentation) to figure out the data types in Ruby. Even then, what happens
if the WSDL is updated by eBay at some point in the future? More WSDL
proofreading. Not so in Java: with the help of the IDE and compiler, you can
easily navigate WSDL objects and detect breaks if the WSDL has changed (via
WSDL code-gen).

At this point, it seems that using Java is a better option as opposed to Ruby.

This is where such research tends to be questionable: "when all things stay
the same..."

------
ScottBurson
Nice to see my field get some press.

I will say, though, that static analysis is still very much an immature
technology. Look for it to be much, much better in a decade or so.

~~~
softbuilder
> Look for it to be much, much better in a decade or so.

Why is that? Are there big unsolved problems or is it more of a grinding away
at little things?

~~~
Daishiman
Static code analysis is extremely difficult. You have to have a notion of
what an "error" is, and the complexity of interaction between different pieces
of a codebase grows exponentially, making analysis of big projects infeasible
without throwing in heuristics that simplify the analysis at the cost of
precision.
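
As a toy sketch of that trade-off (my own illustration, not from the thread):
an analyzer that abstracts integers to just their sign stays cheap, but loses
precision exactly where values interact:

    -- Abstract domain for a toy sign analysis.
    data Sign = Neg | Zero | Pos | Unknown
      deriving Show

    -- Sound but imprecise abstract addition: once a positive and a
    -- negative value meet, the analysis can no longer say anything.
    addSign :: Sign -> Sign -> Sign
    addSign Zero s    = s
    addSign s    Zero = s
    addSign Pos  Pos  = Pos
    addSign Neg  Neg  = Neg
    addSign _    _    = Unknown  -- precision lost here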

~~~
Fahrenheit2539
The problem here is that, while hardware and algorithms for static analysis
are improving at a linear pace, applications are growing at least
exponentially (if we add in 3rd-party libraries).

~~~
barkmadley
There are some algorithms and techniques that will improve static analysis in
a greater than linear fashion (however, I wouldn't guarantee exponentially).

See model checking:
[http://en.wikipedia.org/wiki/Kripke_structure_(model_checkin...](http://en.wikipedia.org/wiki/Kripke_structure_\(model_checking\))
and SMT solving: <http://en.wikipedia.org/wiki/SMT_solver>

------
GlennS
This was really interesting, but a little C/C++ specific. I avoid C++ where
possible because I can't be fussed with segmentation faults, so I was curious
about what might be available for managed languages and what sort of things it
would pick up.

I found this, which looks like an interesting start:
[http://stackoverflow.com/questions/38635/what-static-analysi...](http://stackoverflow.com/questions/38635/what-static-analysis-tools-are-available-for-c)

I particularly like the idea of automated security analysis. I'm pretty sure
some past codebases I've worked on have had seriously low-hanging fruit in
that regard.

~~~
cpeterso
For static analysis of Java code, I highly recommend FindBugs. It's open
source and they just released a new major version (2.0). FindBugs is unique
because it analyzes the compiled Java bytecode, not the source files. This
enables the tool to check for some very deep bugs with surprisingly few false
positives.

Also, any language that targets the JVM (like Scala) can be checked, though
FindBugs may report questionable code in that language's code generation. :)

------
johno215
Here is an accompanying segment from QuakeCon 2011 in August where static code
analysis is discussed. This topic must really be on his mind.

[http://www.youtube.com/watch?v=4zgYG-_ha28&feature=playe...](http://www.youtube.com/watch?v=4zgYG-_ha28&feature=player_detailpage#t=54m00s)

------
CoffeeDregs
Question: can we use Carmack's post to say anything about statically typed
languages versus dynamically typed languages? I'm versed in both and like
both, so wanted others' opinions. I love(d) Haskell because it pretty much
worked if it compiled (but monads are too restrictive); I work in Python
because it's what most clients are using. But I read Carmack's post and think
that I should be coding in a statically typed language again... No?

[PyCharm is great, but IDEs just don't do dynamic code like they can static
code and it hurts.]

~~~
nostrademons
You could perhaps use it to say stuff about Haskell or ML vs. Ruby or Python,
but not about Java or C++.

The particular errors Carmack talks about are all _holes_ in the type system -
they're areas where the type system of C/C++ is unsound (as in the type-theory
definition, not the colloquial definition). Null pointer exceptions don't
exist in Haskell, because null pointers don't exist; you have to explicitly
use a Maybe type. Printf errors do (at least with Text.Printf), but there's a
lot of research on dependent types to solve specifically that error.
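
To illustrate (hypothetical code, not from the post): a lookup that can fail
returns a Maybe, and there is no way to use the result without spelling out
the Nothing case:

    import qualified Data.Map as Map

    -- Map.lookup returns Maybe Int, not an Int that might be null.
    describeAge :: String -> Map.Map String Int -> String
    describeAge name ages =
      case Map.lookup name ages of
        Nothing  -> name ++ ": unknown"
        Just age -> name ++ ": " ++ show age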

Again, though, it's a tradeoff. Haskell lets you find a lot of errors at
compile time, but the tradeoff is that you spend much more time figuring out
why your program won't compile. For a lot of software, you're better off
shipping with bugs than not shipping at all. Particularly for exploratory
software, it makes more sense to build something that works for your demo just
to see if it's useful than build something for everyone that nobody wants to
use. Specs that don't meet customer needs are just as buggy as code that
doesn't meet the spec.

~~~
timclark
You can use undefined, the nearest equivalent to null in Haskell, anywhere you
like and it is just as bad as using null! However, typically you don't because
as you point out Maybe is a better alternative.

~~~
koenigdavidmj
Shhhhh. That's probably a good thing not to say loudly, or people will try to
use it. (I didn't know about it, so it obviously has not hurt me.)

~~~
masklinn
`undefined` does not behave like a null, though; it behaves much like `error`
(it's basically `assert False`): it lets code typecheck, but it will instantly
throw an exception when executed.

It's usually used to stub code during development in TDD-type scenarios:

    
    
        myFunction = undefined  -- stub, to be implemented later
    
        myOtherFunction foo = doSomethingWith value
            where value = myFunction foo
    

will typecheck, letting you run (and fail) your tests.

You can't have `undefined` "pass through" your code the way `null` does.

~~~
joeyh
Sure you can: it doesn't error until it's evaluated, and Haskell is lazy, so
the failure can occur some distance from the original undefined, and may occur
only some of the time.
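
A tiny sketch of that (my example, not from the thread): the undefined travels
inside a lazy structure and only errors when something finally forces it:

    pair :: (Int, Int)
    pair = (1, undefined)

    main :: IO ()
    main = do
      print (fst pair)   -- prints 1; the undefined is never evaluated
      print (snd pair)   -- only here does the exception fire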

------
victorbstan
And the lesson to be learned is that no matter how much static code analysis
you do, nothing beats actually installing and using your application on
different hardware to test out common real-world use-case scenarios (think
Rage + AMD/ATI).

------
apu
I know that "more lax" languages like Python make static code analysis much
tougher, but does anyone have any experience with good tools for it?

~~~
pnathan
pychecker and pyflakes are good for Python 2.7 or so. I've been pleasantly
surprised by them.

~~~
riffraff
FWIW, I have found pyflakes really useful for fixing the most common bugs I
encounter in a Python program (using a wrong identifier), but I haven't
noticed it do anything deep (such as detecting _obvious_ AttributeErrors).
Does pychecker work better in this sense?

~~~
pnathan
No, it's limited in scope.

~~~
spenrose
pyflakes compiles your code; pychecker imports it, which is not free of side
effects -- all top-level statements are executed. pyflakes will throw some
false positives, but not many. pychecker will fill your terminal with
opinions. pyflakes is IMHO a much superior tool. Plugging it into Emacs via
flymake is a no-brainer; I believe there are solutions for vi as well:
[http://www.emacswiki.org/emacs/?action=browse;oldid=PythonMo...](http://www.emacswiki.org/emacs/?action=browse;oldid=PythonMode;id=PythonProgrammingInEmacs#toc9)

~~~
riffraff
yes, I use pyflakes within vim via syntastic
<https://github.com/scrooloose/syntastic>

------
sriramk
Happy that all the static code analysis tools from MSR (which form the basis
of /analyze) are getting good PR. Microsoft is great with code analysis tools
but rarely gets recognized for it.

------
pnathan
I really look forward to seeing what Haskell (& friends) will be getting us in
the coming years with its static analysis suite and all-errors-checked
mentality. I am hopeful that the static analysis toolsets developed in pure
languages will be making their way down to the dynamic languages, leading to
an overall code improvement for new code.

------
lemming
This, for me, is the single biggest reason for using IntelliJ in my day-to-day
work, and one of the things that makes it hard for me to switch to something
other than Java. Having real-time static analysis while editing is truly
awesome (and very humbling, as he states). It's an order of magnitude more
useful than having it as compile warnings, not least because the editor can
more often than not help you fix them.

~~~
clutchski
That sounds like a great tool. But your whole team might not use that editor
or have it enabled or pay attention or whatever. If you run linters in
continuous integration your style guide is applied on every commit.

~~~
lemming
The inspection configuration is in the project file, so everyone has the same
one, and we use IntelliJ as standard - everyone uses it.

~~~
lemming
Downvotes? Really? Any counter opinions on this?

------
8ig8
This:

> It is important to say right up front that quality isn’t everything, and
> acknowledging it isn’t some sort of moral failing. Value is what you are
> trying to produce, and quality is only one aspect of it, intermixed with
> cost, features, and other factors.

------
AndreyKarpov
> Compared to /analyze, PVS-Studio is painfully slow, but...

Tips on speeding up PVS-Studio - <http://www.viva64.com/en/b/0126/>

------
georgieporgie
In my experience, Coverity catches a couple of terrible bugs, and about ten
thousand stylistic things like, "if (dwResult >= 0 && dwResult <= WHATEVER)"
(i.e. it complains that a DWORD value will always be >= 0, but I don't care,
because I'm explicitly expressing a range to whoever maintains my code).

~~~
groby_b
if (IN_RANGE_CLOSED(dwResult, 0, WHATEVER))

Build the appropriate model for IN_RANGE, and done.

Has the benefit that you explicitly state the type of range (open|closed|half-
open[LR]), so you'll be a bit more likely to think about the edge cases.

Is it painful? Yes. I'd rather take that pain than debugging crash reports,
though. (YMMV - it certainly depends on what you are building, how large your
audience is, and what the consequences of a crash are)

~~~
georgieporgie
To be honest, that's really ugly to me, and it requires me to remind myself of
exactly what that macro is. I'm definitely _not_ willing to change my code
style solely to satisfy a static checker's zero-impact 'bug'.

------
WildUtah
This article was nice, but it could have been great with some code examples
illustrating the benefits of static analyzers. It would have been really great
with examples of what one tool could help with that another would miss.

