John Carmack on the importance of Static Code Analysis (altdevblogaday.com)
408 points by plinkplonk on Dec 24, 2011 | 124 comments



Having done similar work for Chrome, I can attest to the fact that large code bases are full of errors.

If you're not on Windows, using gcc's -Wall and -Wextra, along with Clang with the same flags, is a good start. (Here's a post with more details: http://neugierig.org/software/chromium/notes/2011/01/clang.h... .) The Clang static analyzer wasn't very useful at the time I tried it because it didn't analyze C++ code. Valgrind also finds a lot, but its findings are harder to be diligent about fixing.
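As a minimal sketch (the function is made up, but the flag behavior is real), -Wextra adds checks on top of -Wall such as -Wunused-parameter:

    // g++ -Wall -Wextra -c demo.cc   (clang++ accepts the same flags)
    int scale(int value, int factor) {   // -Wextra: unused parameter 'factor'
        return value * 2;                // probably meant value * factor
    }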

The PVS Studio guy (mentioned in Carmack's post) ran our code through it as well and also found a number of bugs, as described in a few posts: http://www.viva64.com/en/a/0074/ http://www.viva64.com/en/b/0113/ . (As Carmack supposed, they also claimed the Chrome code was some of the best they'd seen. But it's more likely they were being truthful in both cases.)

They've also run the Chrome code through Coverity, but I haven't been involved in fixing those bugs so I don't know how useful it was. Searching the bug tracker for [coverity] turns up a handful of bugs, but it's possible more are hidden for security reasons.


Additional data point: Coverity is indeed quite useful. I'd say at least once a week it turns up something that could turn into a critical issue if left unfixed. So yes, static analysis really does matter.

The problem with static analysis is programmer hubris: "No, don't initialize that variable. I've envisioned every execution path, and it's impossible for it ever to be used uninitialized. And not initializing it is so much more efficient!" Cue crash reports due to uninitialized variables about 4 weeks later...
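To make the pattern concrete, here's a minimal made-up sketch of the kind of thing that comes back as a crash report:

    int parse_mode(char c) {
        int mode;                     // deliberately left uninitialized "for efficiency"
        if (c == 'r') mode = 0;
        else if (c == 'w') mode = 1;  // someone later adds an 'a' mode elsewhere...
        return mode;                  // ...and analyzers rightly warn that 'mode'
    }                                 // may be used uninitialized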

[Side note: Since Coverity bugs are rarely filed in the bug tracker, it'd be more effective to search CLs for the string CID=, which identifies Coverity bug reports. And no, security-critical fixes won't get that marker ;)]

As for PVS, I'm itching to see a full report on Chrome. The excerpts he posted are very interesting, and contain a good number of bugs that are not flagged by Coverity. (I assume vice versa applies, hence the wish to look at a full report.) Now I just need to make some time to run an eval...


You may also consider using -Weffc++ in addition to -Wall and -Wextra if you are using g++.
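For what it's worth, a small sketch of the style of thing -Weffc++ complains about (class name invented), since it checks guidelines from Effective C++ such as initializing members in the initialization list rather than the constructor body:

    class Counter {
    public:
        Counter() { count_ = 0; }  // -Weffc++: 'count_' should be initialized in
                                   // the member initialization list
    private:
        int count_;
    };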


> As Carmack supposed, they also claimed the Chrome code was some of the best they'd seen. But it's more likely they were being truthful in both cases

Carmack said that for Coverity though, not for PVS.


PVS-Studio vs Chromium - http://www.viva64.com/en/a/0074/

PVS-Studio vs Chromium - Continuation - http://www.viva64.com/en/b/0113/


I once caused a serious, halt-the-enterprise production bug by "fixing" a problem found by FindBugs. This was Java code, something along the lines of:

  Boolean b = new Boolean(true);
The static analyzer correctly identified this as an unnecessary new object creation (style guides and good sense recommend you simply use Boolean.TRUE). I "fixed" it, and went on my way.

Little did I realize that this variable was actually a lock, and there was a synchronized(b) block later (and much deeper) in the code, which I effectively eliminated by removing the new.

In my defense, I feel that the real bug here was one of documentation: had the variable been named something like "lock", I'd have understood immediately what was going on. But that doesn't make you feel much better when your team's been up all night fixing your bug!

Moral of the story: your codebase (especially if it's an older one) might actually be depending on its "bugs" for proper behavior. Think (and test) hard before applying suggested changes from static analysis.


The choice of a Boolean as a monitor object is a little odd. My understanding is that the usual convention is to create such objects via "new Object()", which is a little more obvious- the only reason you would ever call the base Object constructor is to produce something that can be used as a monitor.


It was a couple of years ago and I no longer have access to the codebase, but the business logic of that piece of code called for a boolean.

The problem was that they also (ab)used that field as the lock. It either should have been a separate field (of type Object, as you suggested), or they should have used the existing Boolean but called it "lock" or "monitor" or somesuch.


Sure, but the bug in this case is that there wasn't a comment specifying the reason for the unconventional behaviour.


I agree. If you know you're writing something that is un-idiomatic or you think its intended purpose will be a surprise to most readers, put a comment in explaining why.

    // we need a heap object so we can synchronize on it later
    Boolean b = new Boolean(true);


This points to a deeper problem of static analysis, though - any analysis package without the ability to annotate code is _doomed_. The false positives will be so annoying that people will give up on it.

And for the people who work on SA systems - please give me a way to annotate that is not exclusively via comments. Especially once people use multiple SA packages, that is rather annoying :)


I'd also make the name a bit more descriptive.


Don't forget the OpenSSL "bug" (using uninitialized memory) that was really on purpose, resulting in a critical flaw once changed.

In that instance, too, a lack of comments explaining the behavior was at fault.


Sure, it wasn't great code, but this also sounds like a flaw in the static analyzer. It should have been able to tell that the variable was being synchronized on, and recognized that using Boolean.TRUE would have been an unsafe change.


That's a weird line of code to begin with. Not only was the naming wrong, the type was wrong too (Boolean instead of Object).

Could you have checked all the references to that b variable prior to changing the code?

There were likely just one or two such places.


Sure, I could have found all of the references easily. That's one of the reasons I still prefer Java over a dynamic language: my IDE can tell me instantly (option key + F7) where a particular object is used in the entire codebase. I just wasn't careful enough, because hey, what could possibly go wrong? It's a stupid boolean, right?


When I see an unusual error in code, I not only check the related code, but I may also run svn blame. There are multiple benefits to this:

1) I can learn something new: what I thought was an error may turn out to be an interesting new coding trick.

2) If it was an error, not only do we fix it, we also learn not to make such errors in the future.

In particular, the person who made the error learns to avoid it in the future.


This is exactly the sort of thing Haskell is great at. First, the type system catches all sorts of errors at compile time (neither null-pointer nor printf issues can come up in Haskell).

However, more fundamentally, Haskell code just naturally provides much more information to static analysis tools than any other language I've worked with. Even if the level of tooling is not there yet (I haven't worked on any large projects, so I am not entirely familiar with it) the potential for these tools is much greater in Haskell. I think programs like HLint are already very thorough. I've just been using Haskell as more of a hacker language than a "bondage and discipline" language and haven't bothered with these tools :)


Whenever I try to write code in Haskell that produces a type error I think, "there's another bug that would have been introduced in a lesser language".


Would a type error necessarily produce a "bug" in a program?


It certainly would if it was not detected before the code was deployed. Depending on how strongly typed the language is, it may cause an error at runtime, which isn't ideal, or it would go undetected until it caused a myriad of seemingly unrelated symptoms, which is much worse.

Unhandled null references are the most common type error that other languages don't catch as readily (I fairly frequently try to write code that doesn't handle the not-found case in a map lookup), but there are so many times when I've tried to write code that the compiler rejected but that wouldn't have been picked up in other static languages.


Thank you for explaining. I had thought of type errors as different representations of numbers, for example.


I agree; however, I find keeping good test coverage is still important in Haskell. The other day I had to fix a bug that I had introduced while running hlint over a module that was not being tested. Oops.

WRT hlint, I've not seen it point out any actual bugs in code; it only finds ways to golf it, in my experience. It doesn't even find a few common performance-impacting mistakes like length foo > 0 ==> not (null foo). OTOH, it's quite a beautiful tool, especially if you look at the implementation of the checks, which is as simple and beautiful as this:

    error = take (length x - 1) x ==> init x


I'm curious to know if this is because Haskell doesn't support nulls, or if there is something else that makes it better at catching type errors.


It's better at catching type errors in general (because it's much stricter in its handling of types); moving the concept of nullability into the type system is just one (easy to understand, for most developers) example of that.

But it goes beyond that, even more so as the Haskell culture builds upon this and actively encourages taking advantage of the type system by encoding as many things as possible in the program's types, where they can be statically checked.


Haskell's type system is really not at all comparable to the ones found in C and Java. Not only is the Haskell type system Turing-complete, I'd be willing to argue that it's more expressive (and I mean the type system itself, not the whole language...) than a lot of mainstream languages. Basically, it's a whole different animal from the things most programmers call type systems.

For example, the type system doesn't just handle values; performing actions is also managed by the type system. The type of main is IO (), which roughly means "do something with side effects". If I write a function that reads in a file and returns a string, its type is not String, it's IO String, or "do an action which causes a String to be returned". I can shuffle this around at will, and use the IO monad to handle running the program in the correct order. And IO is just a single way of managing order: continuation passing is just another monad in the type system, one that didn't even need any additional code in the compiler to implement.

But what really makes the difference from Java or C++ imho is the type inference. Basically, if you never write a type declaration, the only time the type system bothers you is when there really is a bug in the code. Psychologically, this is a big deal. In Java, the type system is a chore that makes me type unnecessary things. In Haskell, the type system is a compile-time check that catches a lot of real errors.


Not having nulls for every type is just the tip of the iceberg, there's so much information contained within the type of values and it's so easy to add even more.


Doing analysis like this also has a huge impact in terms of the broken-windows theory. If engineers see a whole bunch of compiler warnings, they don't think twice when they see just one more, even if that one is a really valid warning. It also gives a good sense of ownership of and commitment to the codebase if everyone agrees not to check in code with warnings. Also, when you have new engineers copying and pasting code to get stuff working quickly, you certainly don't want them doing that with buggy code.


If engineers see a whole bunch of compiler warnings, then they don't think twice when they see just one more and it could be a really valid warning

As a real-life example of how this can happen, a PHP 5.3.something release had a serious bug with MD5 hashes essentially not working (cf. https://bugs.php.net/bug.php?id=55439 http://www.php.net/archive/2011.php#id2011-08-22-1 ). Apparently there was a unit test for it, but there were ~200 failing unit tests, so it was ignored.


In my experience an even more pernicious problem is unit tests that are unreliable or too expensive (in time especially). Devs too easily come to allow unit test suites that routinely take reruns in order to get to 100% passing. Similarly, when build verification tests or CI tests get too bloated it can be hard to pick where to draw the line.


On my current project we have "Treat Warnings As Errors" set to true, so any warnings will cause the build to break. You can explicitly allow a specific warning with a pragma statement, if you can justify it.

It's made for a much cleaner codebase - and it means that we instantly know if we're about to do something iffy, rather than having the warning buried under a ton of others.
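For illustration, a hypothetical suppression (assuming an MSVC-style toolchain, as "Treat Warnings As Errors" suggests); the point is that the exception is explicit and reviewable rather than buried in build noise:

    #include <cstring>

    #pragma warning(push)
    #pragma warning(disable : 4996)   // C4996: strcpy flagged as potentially unsafe
    void copy_name(char* dst, const char* src) {
        std::strcpy(dst, src);        // justified locally: caller guarantees dst is big enough
    }
    #pragma warning(pop)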


At my last gig they had that as well, but still ignored it because the culture when I got there was totally full of broken windows.

There were 8000+ warnings and FindBugs errors in the codebase that I went through and fixed. Luckily a lot were just white noise, but more than 100 were definitely valid bugs that had been in the code for quite some time.

In order to keep things clean after all that work, I turned on warnings as failures as part of the continuous integration build (which I also set up) so that everyone would get an email each time the build failed. Yea public shaming, heh. The hard part was then training people to pay attention to the emails and not filter them to the trash.

As a final step, what I did was build the testing environment into the CI system, so that if they wanted to test their code on a virtual machine before getting their branch into the latest iteration, they needed a clean build in order to generate the Debian installers. That was the final kicker which really made people start paying attention to this stuff.

So, in order to push code to production, which everyone wants to do, you needed a clean build. Problem solved and it really cleaned up the quality of production releases. ;-)


Oh yes - we do a daily build that gets pushed out to testers - and if it's failing then they don't get it.

(And it took me an absolute age to get rid of all the warnings too.)


Just kind of curious, what happens if a developer fixes something and the tester wants it immediately so that they can test that fix? Do they have to wait another day for the build to happen?

I set up builds which could be tested as soon as the build was complete. I also had to do a lot of work to optimize the build process so it would complete quickly. It started out at ~20+ minutes and I worked it down to around 3-5 minutes.


We _can_ roll out builds faster - but as we're using multiple tiers of framework, we have to make sure that the back end is still compatible with the front end. The back end is only promoted once per day, and if there's been a breaking change since the last promotion then we can't move the front-end up until the back end is also promoted.

And as multiple front-ends use the same back-end, we can't just move the back-end up whenever we feel like it, or we break everyone else's UIs.


D. Richard Hipp and the SQLite project have not had such a positive experience with static code analysis. They already use a massive amount of testing though. There's also no mention of commercial tools like Coverity.

See the "Static Analysis" section: http://www.sqlite.org/testing.html


At least this passage is complete nonsense: "We cannot call to mind a single problem in SQLite that was detected by static analysis that was not first seen by one of the other testing methods described above."

If SA discovers a problem, you'll discover it the moment you run it through the analyzer, while by the dev's admission many of their bugs are discovered as bug reports. Which is clearly a bit later :)

Now, it might well be that most of SQLite's bugs simply are not discovered by SA. But SA is not going to report them later than bug reports, unless you use it very infrequently.


Also look for 22nd November in the timeline - http://www.sqlite.org/src/timeline

3 problems were detected by Coverity and fixed in SQLite.


> They already use a massive amount of testing though.

Yes, as the page notes the SQLite codebase has a > 1000:1 tests to code ratio. I don't think there's any other codebase this thoroughly tested, and it makes sense that static analysis tools won't discover anything not already covered by tests.


    and it makes sense that static analysis tools won't discover anything not already covered by tests
That assertion doesn't make a lot of sense to me. Most static analysers are capable of looking at the edge cases that a human may forget to write a test for, or may not even notice are there.


When you have 1000 times as much test code and data as you have code to test, the odds of having missed a subset of the bugspace a static analyzer can find are very, very low, I would say. Unless the people who wrote the tests were grossly incompetent, of course, but let's assume that's not the case.


I'm not so sure. I'm reading their testing document (and it's a great read), and it sounds like most of those tests are reactive:

Whenever a bug is reported against SQLite, that bug is not considered fixed until new test cases have been added to the TCL test suite which would exhibit the bug in an unpatched version of SQLite. Over the years, this has resulted in thousands and thousands of new tests being added to the TCL test suite. These regression tests ensure that bugs that have been fixed in the past are not reintroduced into future versions of SQLite.

While this is a great practice, it's reactive. It's the result of particular bugs, not someone asking, "What are the situations we haven't covered?"

The coverage they have for error conditions (file system, out of memory, bit-flips) is impressive. I'm not saying I know you're wrong, but I think there are too many variables to say with confidence either way.


The closest you can get to complete test coverage is a policy of writing test coverage for every single new feature coupled with this regression test policy, because once the code base gets to a certain size it's impossible for anyone to form a complete mental model of what is covered or not.

If that's not good enough then I think static analysis is a decent step, but probably pales in comparison to using stricter languages (eg. Haskell).


The amount of test code and test data you need is proportional to the number of possible codepaths through your code.

This is generally exponential in the number of functions, modules, etc. involved. For example, a function with N if statements that are not nested generally needs 2^N testcases to properly exercise it.

So having 1000 times more tests than code may not mean that you have complete coverage at all. It depends on the structure of the tests and the code.
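A tiny made-up example of the difference: two independent if statements give four distinct paths, but branch coverage is already satisfied by two tests (one taking both "true" arms, one taking both "false" arms):

    int price(bool member, bool sale, int base) {
        int p = base;
        if (member) p -= 5;
        if (sale)   p /= 2;
        return p;   // a bug that only appears for (member && !sale) needs its own case
    }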


I think your analysis is for executing every distinct code path. I read through the SQLite testing page, and they claim 100% branch coverage, which means that they test every possible outcome of a branch - but that's different from what you're going after, which is every possible code path.

(Not disagreeing, I just had to go through this process in my head when I thought about what you said in comparison to what they said.)


Indeed. What they're doing is much better than what most software projects manage, but not quite enough to test correctness unless the code in later branches is completely independent from the code in earlier branches....

For any nontrivial project, testing every codepath is basically impossible, unfortunately. :(


I would be shocked if any static analyzer could be truly useful for validating the implementation of a virtual machine, let alone one like SQLite's. The fact that SQLite makes heavy use of dynamic typing also seems like it would defeat attempts by a static analyzer to validate code.


http://compcert.inria.fr/

CompCert is a verified compiler that translates code from the "virtual machine" of the C language to the "virtual machine" of PowerPC.

It is generated from Coq, though.

But, please see ynot: http://ynot.cs.harvard.edu/

They have a verified SQL compiler. Again, generated from Coq source.

So I think you're wrong claiming that static analysis isn't useful for virtual machines. For C you have to have very extensive annotations, as it is not very expressive by itself, but static analysis is still possible.


That's true, but I think those are a different class of tools: theorem provers that depend on extensive domain-specific annotations (e.g. about SQL semantics) to prove correctness aren't usually applied in the same situations as tools like Coverity that do static analysis of raw C/C++ code to find likely bugs, though there has been some convergence over the past 10 years.


Not mentioned in the article are two nice static analysis tools: the Clang Static Analyzer (http://clang-analyzer.llvm.org/) and Klee (http://klee.llvm.org/).

Both are LLVM-related projects (and there's a few others as well, but these are the two "big" ones).
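As a minimal sketch of what the Clang analyzer's path-sensitive checks report (it's typically run by wrapping the build, e.g. with scan-build):

    int deref_after_check(int* p) {
        if (p != nullptr)
            *p = 1;    // the check tells the analyzer that p may be null...
        return *p;     // ...so this dereference gets reported on the null path
    }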


Worth noting that the Clang Static Analyzer is bundled with Xcode (I think since Xcode 3.2).


Great article, full of lots of insights. Here are some of the ones I got.

- People are generally happy with what they have: digging around is not a fun thing to do.

- It is very easy for the maintenance programmer to make assumptions about the preconditions for a piece of code that are not valid.

- Size of code is a critical metric for quality.

And most importantly (and probably overlooked), quality is just one metric of software. The name of the game here is providing value to the customer, not writing perfect code. John kind of throws that out there in a pro forma way, then goes ahead without digging any deeper. Oddly enough, I can't really draw any conclusions about static code analysis, the topic of the essay, without a clear definition from the author about what the trade-offs are. We're left with "just use it" as a conclusion.

After reading this, I wonder whether programmers get stuck at the same general level of abstraction, and whether this staying-at-the-same-level thinking introduces unnecessary code complexity. To illustrate, let's try a thought experiment.

Suppose there was no modern OS -- just an x86-compatible CPU and a BIOS -- and you were supposed to put an image stored on a USB drive onto the screen.

It would involve huge amounts of work -- code to get information from the drive, code to understand images, code to respond to the keyboard, etc.

The reason we can do this so easily today is that whatever we write is basically in a DSL that sits on top of other DSL/APIs. We are working at a higher level of abstraction.

I wonder if putting programming projects on a "code diet" isn't something we should try more often. Announce that whatever our solution is, it's not going to be more than 10KLOC. If we have to split into teams to provide layers, we will. Each team has 10KLOC and should create a DSL at their particular level of abstraction.

This forces us to keep project codebases very small, yet should provide just as much freedom to create very powerful software as we have today. I understand that many will say "but there's no way you're going to make any kind of useful layer of abstraction in 10K of code!" I disagree, but that's a big can of worms to open up in an HN thread. The important question is this: should we create arbitrary limits on our abstraction layers as a way to enforce higher code quality?

Just thinking aloud.


> code diet . . . (limit of 10K lines)

Yes, sort-of, although . . .

A simple example 'in the small' to consider: function/procedure size. Should we (syntactically) limit functions to, say, <30 lines?

What we are really trying to do here is limit complexity. And that is not simply a result of length, but more of interrelation.

If we recall Dijkstra's intent with 'Structured Programming', it was to make complexity proportional to program length. It is not the total amount, but the way it is arranged or broken up. Strict size limits in effect sort-of do that -- they make complexity nearer constant (since it is bounded) within each part.

> should create a DSL at their particular level

And that leads to why the DSL idea (or something else to do a similar job) is more important than the simple limit idea. What we want primarily is different structuring.

In terms of the function-size example, we can (rather ideally) compare imperative with functional programming. FP languages seem to tend away from the length problem: because they are more trees of expressions instead of sequences. They have different structuring basics that control interrelation complexity more.

To put it briefly: we want it so you cannot add length without also adding 'depth' (of abstraction, of levels, of separation).

Expressed more fully here: 'Should there be hard limits on program part sizes?' http://www.hxa.name/notes/note-hxa7241-20101124T1927Z.html


I agree. But I think you've missed it. My point was about human behavior, not the innards of program structure.

To rephrase: would imposing arbitrary code length restrictions along with DSL-type training automatically drive programmers to write better quality code?

I believe the answer is yes. You are correct in that the technical issue is more complicated than that, but my point was asking if making an arbitrary rule would drive teams into making those kinds of design choices, instead of just being happy with the way things were and continuing to expand the code base ad infinitum. I was trying to draw in several lines of thought I gleaned from the essay and synthesize something new.

I also think there's a big difference in creating an arbitrary limit on function size and doing the same for an overall project. I wouldn't like the idea of limiting function size at all. You can do a lot in 10K lines of code. There's a lot more freedom there. Once again, my thrust is human behavior. Telling somebody that each little piece of code they write is subject to somewhat arbitrary restrictions is a lot more onerous to me than simply making a "budget" for the entire project.


Abstractions are partial and leaky models of the thing they abstract over. When you build your system as a tall, skinny tower of small abstractions, you are trading one problem for another. Eventually you will need to thread some new functionality all the way through your stack, or you will have to debug a problem that requires understanding cross-layer interactions, and you will end up spending much of your time fighting your abstraction choices.

Unfortunately there is no silver bullet. No single recipe will tell you how to cut up every project into modules. You have to work hard to find the right decomposition for your problem. Arbitrary size limits are as likely to hurt you as help you.

That said, programmers should indeed strive to write less code, as a rule of thumb.


10KLOC per team is admirable, but one of the long-term projects I'm most excited about is Alan Kay's "Inventing Fundamental New Computing Technologies" project. They're trying to get under 20KLOC for an entire operating system + all "common" applications!

http://vpri.org/html/work/ifnct.htm

It's really exciting, and they've already made some great progress towards that goal. I recommend reading their first year progress report as it has a lot of the most exciting stuff about this (they just finished year 5):

http://www.vpri.org/pdf/tr2007008_steps.pdf


I'm surprised he didn't give an economic evaluation, i.e. debugging time saved minus checking time spent. He mentioned a few man-days' worth of debugging that would have been prevented, but it sounds like he spent more time than that on checking. As he noted at the beginning, other factors (like features) are more important than quality (productivity is an argument for dynamically typed languages). Of course, quality is also its own reward.

BTW: it's dated today, but I'm sure I've read it before. Maybe that was a write-up of an earlier episode (e.g. /analyze in the 360 SDK).


There are a couple more variables to consider in an economic evaluation:

The time it takes to write test cases that catch all the same issues that static analysis would. I believe that testing lends itself to finding different kinds of defects, and it would be very unproductive to write tests that cover all the same issues that static analysis can find (in statically typed code).

The cost of a bug slipping through the net. Some types of bugs cannot be ruled out by testing, but it may be possible to prove that they are not present, e.g. non-deterministic concurrency bugs.


I would say that the best economic value is delivered on code fragments that are almost never covered by tests of any kind, like error-detection code. The PVS-Studio guys gave some pretty funny examples of such errors: http://www.viva64.com/en/a/0078/


My favorite quote from the article: "Shrink your important code."

and he explains why:

"There was a paper recently that noted that all of the various code quality metrics correlated at least as strongly with code size as error rate, making code size alone give essentially the same error predicting ability. Shrink your important code."


Fair observation. But I'd like to know whether the paper's research was done against statically typed languages like C/C++/Java/C# or against dynamic languages as well...

Because I have a few people on my back who keep screaming that dynamic languages produce fewer lines of code and jump to the conclusion that "therefore it is better in terms of quality", while the code this group of people produces seems similar to Perl: less code, but unreadable (requiring intensive re-reading) if you go away for a few days and come back to work on it. Lots of metaprogramming, and shortcuts preferred over readability. Fewer bugs? Hell no...


It's a fairly well-publicised result that the rate of errors introduced is proportional to lines of code, independent of language. Having said that, my googlefu is failing me and I can't find a cite for it. I'm pretty sure it's mentioned in Code Complete - anyone with a copy handy to help me out?


But do they compare the exact same system built using two different programming languages from different programming paradigms?

i.e.: Java vs Ruby or Java vs LISP

At some point, the complexity of the system and the available tools/libraries provide more parameters for the bug-rate formula, which may throw off the result of the paper.

Consider this: a fellow worker had to write something that uses eBay's API. There is an existing eBay gem available, and he used that first. He stopped after a few hours due to bugs and undocumented stuff. His other option? SOAP/WSDL. Now, based on what we know, Java has better SOAP support than Ruby. We're not saying that Ruby can't do it, but we questioned the usability of SOAP from Ruby. Essentially, one must read the WSDL (treat the WSDL as the documentation) to figure out the data types in Ruby. Even then, what happens if the WSDL is updated by eBay at some point in the future? More WSDL proofreading. Not so in Java: with the help of the IDE and compiler, you can easily navigate the WSDL objects and detect breakage if the WSDL has changed (via WSDL code generation).

At this point, it seems that using Java is a better option as opposed to Ruby.

This is where such research tends to be questionable: "when all things stay the same..."


The problem is that errors of omission (code that should be there but isn't) are almost by definition harder to spot than errors in existing code, both for humans and for static code analyzers.


Nice to see my field get some press.

I will say, though, that static analysis is still very much an immature technology. Look for it to be much, much better in a decade or so.


> Look for it to be much, much better in a decade or so.

Why is that? Are there big unsolved problems or is it more of a grinding away at little things?


Static code analysis is extremely difficult. You have to have a notion of what an "error" is, and the complexity of interaction between different pieces of a codebase grows exponentially, making feasible analysis of big projects very difficult without throwing in heuristics that simplify the analysis at the cost of some precision.


The problem here is that while hardware and algorithms for static analysis are improving at a linear pace, applications are growing at least exponentially (if we add in 3rd-party libraries).


There are some algorithms and techniques that will improve static analysis in a greater than linear fashion (however, I wouldn't guarantee exponentially).

See model checking: http://en.wikipedia.org/wiki/Kripke_structure_(model_checkin... and SMT solving: http://en.wikipedia.org/wiki/SMT_solver


Well, unfortunately most interesting static analysis problems are undecidable. There can be no "general" solution.

What we have in practice are approximations and heuristics; over time we will develop better approximations for the kinds of code people write in practice. Unfortunately fragility of analysis will always be a problem; if you change your code just a little bit then reasoning may fail.

As someone who has spent years doing research on shape analysis my personal belief is that the dream of a fully automated "find my bugs" static analysis is unrealistic. Some of the problems you must solve to analyze the heap, say, are very hard indeed.

We need to think more about the interactions between language design and verification, rather than hoping that people will build better tools to analyze existing code. Strong typing (e.g., the Hindley-Milner type system) is one example of a sweet spot that demonstrates how language design (type annotation, type declarations) interacts with tools (type inference, type checking). Type systems are some of the most widely used bug finding tools today. Trying to build verification tools without considering language design is always going to be difficult.


This was really interesting, but a little C/C++ specific. I avoid C++ where possible because I can't be fussed with segmentation faults, so I was curious about what might be available for managed languages and what sort of things it would pick up.

I found this, which looks like an interesting start: http://stackoverflow.com/questions/38635/what-static-analysi...

I particularly like the idea of automated security analysis. I'm pretty sure some past codebases I've worked on have had seriously low-hanging fruit in that regard.


For static analysis of Java code, I highly recommend FindBugs. It's open source and they just released a new major version (2.0). FindBugs is unique because it analyzes the compiled Java bytecode, not the source files. This enables the tool to check for some very deep bugs with surprisingly few false positives.

Also, any language that targets the JVM (like Scala) can be checked, though FindBugs may report questionable code in that language's code generation. :)


Here is an accompanying segment from QuakeCon 2011 in August where static code analysis is discussed. This topic must really be on his mind.

http://www.youtube.com/watch?v=4zgYG-_ha28&feature=playe...


Question: can we use Carmack's post to say anything about statically typed languages versus dynamically typed languages? I'm versed in both and like both, so wanted others' opinions. I love(d) Haskell because it pretty much worked if it compiled (but monads are too restrictive); I work in Python because it's what most clients are using. But I read Carmack's post and think that I should be coding in a statically typed language again... No?

[PyCharm is great, but IDEs just don't do dynamic code like they can static code and it hurts.]


You could perhaps use it to say stuff about Haskell or ML vs. Ruby or Python, but not about Java or C++.

The particular errors Carmack talks about are all holes in the type system: they're areas where the type system of C/C++ is unsound (as in the type-theory definition, not the colloquial definition). Null pointer exceptions don't exist in Haskell, because null pointers don't exist; you have to explicitly use a Maybe type. Printf errors do (at least with Text.Printf), but there's a lot of research on dependent types to solve specifically that kind of error.

Again, though, it's a tradeoff. Haskell lets you find a lot of errors at compile time, but the tradeoff is that you spend much more time figuring out why your program won't compile. For a lot of software, you're better off shipping with bugs than not shipping at all. Particularly for exploratory software, it makes more sense to build something that works for your demo just to see if it's useful than build something for everyone that nobody wants to use. Specs that don't meet customer needs are just as buggy as code that doesn't meet the spec.


You can use undefined, the nearest equivalent to null in Haskell, anywhere you like and it is just as bad as using null! However, typically you don't because as you point out Maybe is a better alternative.


> You can use undefined, the nearest equivalent to null in Haskell

I'm not sure you can say that. `undefined` is a case of `error`, so it will blow up any time it's encountered (it's often used to stub out code), not just when you try to use it; it's closer to putting in a `throw` than to putting in a `null`.


Don Stewart on undefined vs. null: http://stackoverflow.com/a/3963464/17439


Shhhhh. That's probably a good thing not to say loudly, or people will try to use it. (I didn't know about it, so it obviously has not hurt me.)


`undefined` does not behave like a null, though; it behaves much like `error` (it's basically `assert False`): it lets code typecheck, but it will instantly throw an exception when executed.

It's usually used to stub code during development in TDD-type scenarios:

    myFunction = undefined

    myOtherFunction foo = doSomethingWith value
        where value = myFunction foo
will typecheck letting you fail your tests.

You can't have `undefined` "pass through" your code the way `null` does.


Sure you can: it doesn't error until it's evaluated, and Haskell is lazy, so the failure can occur some distance from the original undefined, and can occur only some of the time.


I sometimes wish I had types in Python. It makes me feel better knowing for sure what a function is returning, or what types I'm comparing. I love Python/Ruby and other dynamic languages, but I miss the C++ type system when using them.


> I sometimes wish I had types in Python.

You do. They're not statically checked, but they're there.

> I love Pyton/Ruby and other dynamic languages, but I miss the C++ type system when using them.

Why not use a language less syntactically heavy than C++ but still statically typed, then? (And C++'s type system? Not really going for the stars, are you?) Because a major part of Python and Ruby is indeed that they're not statically typed. A nominative static type system would yield quite a different language, probably something close to Cython[0]. Alternatively, you could fork Python or Ruby with a structural type system; this could be interesting, but it would still, I think, be a different language from its originator, not merely a dialect. It also would be nothing even remotely close to "the C++ type system" (not that this would be a bad thing, AFAIC). And you probably wouldn't know "what types you're comparing" either.

[0] http://en.wikipedia.org/wiki/Cython


Yes, I know they are there. I meant static types. I thought that was implied.


Or, put another way: typed variables instead of only typed values.


+1 for "C++'s type system? not really going for the stars, are you?" Best laugh all day.


Tim Sweeney of Epic Games had some thoughts on the subject

http://www.st.cs.uni-saarland.de/edu/seminare/2005/advanced-...


It's funny how every single one of those features he's wanted has existed in a usable language since at least the 80's.

Hell, Ada has every single feature he wants plus more. I'm particularly fond of the range feature.

  type degrees is range 0..359;
It allows you to error not just on incompatible types, but on incompatible values for your types, at compile time. There's a lot more there, but I'm 100% positive it'll all be ignored by those who need it the most.


I started to write a C++ class for strongly-typed, range-bounded integers (e.g. Fahrenheit and Celsius classes). I gave up when, with all the operator overloads implemented, my class was hundreds of LOC. C++ really did not want to do what I wanted...
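For the curious, a minimal sketch of the idea (names invented); unlike Ada's range types, the check here happens at runtime, which is part of why the full-featured C++ version balloons once all the operators are added:

    #include <stdexcept>

    template <int Lo, int Hi>
    class Ranged {
    public:
        explicit Ranged(int v) : value_(v) {
            if (v < Lo || v > Hi) throw std::out_of_range("value outside range");
        }
        int get() const { return value_; }
    private:
        int value_;
    };

    typedef Ranged<0, 359> Degrees;   // cf. Ada: type degrees is range 0..359;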


Except for Ada (and maybe Pascal), it's something we never see; we could all use this, yet it's not there. Kinda like the numerical tower of Lisp systems... anyway.


That is the real issue here. People aren't exposed to these ideas and we keep reinventing them, poorly. Worse, people don't listen to the older more experienced people until after they've already blown themselves up (often more than once).

I'm not sure how to make someone who isn't interested in learning history learn history. Is this just a result of our industry being entirely composed of youth?


There's the productivity aspect, too - Ada projects have this habit of taking rather a long time :)


That was a function of the bureaucracy and legal requirements around the projects, not Ada the language.

If you legally have to have a meeting with 10 people to review and physically sign off on every line of code, your project will probably take a while.

Plus the compilers sucked in the 80's.


Well, I'd love to see a modern Ada project that did move quickly. Are there any case studies (or even OSS projects) out there?


There are quite a few open-source Ada projects, though I'm not sure how many of them qualify as quick-moving:

http://www.ohloh.net/p?sort=users&q=language%3Aada


What do you mean by "Monads are too restrictive"?

Do you mean that not having free side-effects of any kind anywhere is too restrictive?


I found that odd too. In Haskell you have to use monads only for I/O, and for that IMO they are a fine abstraction. (Monads are a fine abstraction for many other things too that have a lot less to do with side effects.) You are free to use other, more general or specialized abstractions in every other part of your code.


And the lesson to be learned is that no matter how much static code analysis you do, nothing beats actually installing and using your application on different hardware to test out common real-world use-case scenarios (think Rage + AMD/ATI).


I know that "more lax" languages like python make static code analysis much tougher, but does anyone have any experience with good tools for it?



Carmack's article inspired me greatly. For the past month I have been churning a product idea for a code analysis & program transformation tool for Python. Seeing someone like Carmack recommend tools like these feels like substantial validation of their usefulness.

While investigating the feasibility of my product idea, I have run across 3 interesting Python analysis tools that are often overshadowed by pyflakes & pychecker. You may find them relevant to your question:

- pylint (http://www.logilab.org/card/pylintfeatures): Already mentioned here, but the list of checks it will perform is humbling.

- Pyntch (http://www.unixuser.org/~euske/python/pyntch/index.html): Type inference via graph analysis.

- Pypy's Annotation Pass (http://readthedocs.org/docs/pypy/en/latest/translation.html#...): Type inference via control flow graphs.

Anyone maintaining large Django or other Python projects who wants to chat about my product idea, email me (my HN username at Google's email service)! I am especially interested in any constructive skepticism you send my way. I am looking to refine my ideas into something that genuinely advances programming.


For Perl there is Perl::Critic, which also highlights when you're not adhering to best-practice idioms.

I use it regularly on client projects (it's like giving your code a certification!) and it's heavily used/recommended in the Perl community.

See http://perlcritic.com/ & https://metacpan.org/module/Perl::Critic


pychecker and pyflakes are good for Python 2.7 or so. I've been pleasantly surprised by them.


Pylint is also worth a mention, though its output is rather verbose.


FWIW, I have found pyflakes really useful in fixing the most common bugs I encounter in a Python program (using a wrong identifier), but I haven't noticed anything deep about it (such as detecting _obvious_ AttributeErrors). Does pychecker work better in this sense?


No, it's limited in scope.


pyflakes compiles your code; pychecker imports it, which is not side-effect free -- all top-level statements are executed. pyflakes will throw some false positives, but not many. pychecker will fill your terminal with opinions. pyflakes is IMHO a much superior tool. Plugging it into emacs via flymake is a no-brainer; I believe there are solutions for vi as well: http://www.emacswiki.org/emacs/?action=browse;oldid=PythonMo...


yes, I use pyflakes within vim via syntastic https://github.com/scrooloose/syntastic


JavaScript has JSLint, which works well.


Happy that all the static code analysis tools from MSR (which form the basis of /analyze) are getting good PR. Microsoft is great with code analysis tools but rarely gets recognized for it.


I really look forward to seeing what Haskell (& friends) will be getting us in the coming years with its static analysis suite and all-errors-checked mentality. I am hopeful that the static analysis toolsets developed in pure languages will be making their way down to the dynamic languages, leading to an overall code improvement for new code.


This for me is the single biggest reason for using IntelliJ in my day-to-day work, and one of the things that makes it hard for me to switch to something other than Java. Having real-time static analysis while editing is truly awesome (and very humbling, as he states). It's an order of magnitude more useful than having it as compile warnings, not least because the editor can more often than not help you fix them.


That sounds like a great tool. But your whole team might not use that editor or have it enabled or pay attention or whatever. If you run linters in continuous integration your style guide is applied on every commit.


The inspection configuration is in the project file, so everyone has the same one, and we use IntelliJ as standard - everyone uses it.


Downvotes? Really? Any counter opinions on this?


We enforce the usage of IDEA. In addition IDEA gives fine-grained control over the analysis configuration. Every file has to be green (no warning) before commit. TeamCity can then afterwards check those inspections again in the CI process.


This:

> It is important to say right up front that quality isn’t everything, and acknowledging it isn’t some sort of moral failing.  Value is what you are trying to produce, and quality is only one aspect of it, intermixed with cost, features, and other factors.


> Compared to /analyze, PVS-Studio is painfully slow, but...

Tips on speeding up PVS-Studio - http://www.viva64.com/en/b/0126/


In my experience, Coverity catches a couple of terrible bugs, and about ten thousand stylistic things like, "if (dwResult >= 0 && dwResult <= WHATEVER)" (i.e. it complains that a DWORD value will always be >= 0, but I don't care, because I'm explicitly expressing a range to whoever maintains my code).


  if(IN_RANGE_CLOSED(dw_result,0,WHATEVER))

Build the appropriate model for IN_RANGE, and done.

Has the benefit that you explicitly state the type of range (open|closed|half-open[LR]), so you'll be a bit more likely to think about the edge cases.

Is it painful? Yes. I'd rather take that pain than debugging crash reports, though. (YMMV - it certainly depends on what you are building, how large your audience is, and what the consequences of a crash are)
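One possible shape for such a helper (hypothetical definition); the payoff is that any analyzer model, annotation, or suppression now lives in exactly one place instead of at every call site:

    #define IN_RANGE_CLOSED(value, lo, hi) \
        ((value) >= (lo) && (value) <= (hi))

    /* usage, mirroring the example above: */
    /* if (IN_RANGE_CLOSED(dwResult, 0, WHATEVER)) { ... } */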


To be honest, that's really ugly to me, and it requires me to remind myself of exactly what that macro is. I'm definitely not willing to change my code style solely to satisfy a static checker's zero-impact 'bug'.


The use of DWORD as the type should, in my opinion, already state that the value is always non-negative. Explicitly checking for it might even be a bad thing, as it can distract the reader. He might not check the type and let negative numbers pass in as arguments because he trusts that the >= 0 check handles them. (This, of course, might or might not get caught by the <= WHATEVER check.)


You seem to be arguing that, when viewing a conditional, the reader should have to verify the type of the variable in order to glean understanding of the logic. I don't like that. I don't want people to have to double-check variable types, and I want everyone from the project lead to the new college hire (who doesn't have an intrinsic understanding of DWORD implications) to, at a glance, understand that I'm checking for a range of values.

Of course, all of this would be handled by a native range expression, but we don't get that in C/C++. And, as I said in an earlier comment, project- or company-specific macros just add another layer of required comprehension.


This article was nice, but it could have been great with some code examples illustrating the benefits of static analyzers. It would have been really great with examples of what one tool could help with that another would miss.



