If you're not on Windows, using both gcc's -Wall and -Wextra along with Clang in the same way is a good start. (Here's a post with more details: http://neugierig.org/software/chromium/notes/2011/01/clang.h... .) The Clang static analyzer wasn't very useful at the time I tried it because it didn't analyze C++ code. Valgrind also finds a lot but it is harder to be diligent about fixing.
The PVS-Studio guy (mentioned in Carmack's post) ran our code through it as well and also found a number of bugs, as described in a few posts: http://www.viva64.com/en/a/0074/ http://www.viva64.com/en/b/0113/ . (As Carmack supposed they would, they also claimed the Chrome code was some of the best they'd seen, but it's more likely they were being truthful in both cases.)
They've also run the Chrome code through Coverity, but I haven't been involved in fixing those bugs, so I don't know how useful it was. Searching the bug tracker for [coverity] turns up a handful of bugs, but it's possible more are hidden for security reasons.
The problem with static analysis is programmer hubris: "No, don't initialize that variable. I've envisioned every execution path, and it's impossible for it ever to be used uninitialized. And not initializing it is so much more efficient!". Cue crash reports due to uninitialized variables about 4 weeks later....
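To make the failure mode concrete, here is a minimal C++ sketch (the function and values are invented for illustration) of the kind of "impossible" path that ends up in those crash reports:

    int classify(int input) {
        int value;                     // deliberately left uninitialized "for efficiency"
        if (input > 0) value = 1;
        if (input < 0) value = -1;
        // input == 0 was deemed impossible... until a caller passes 0,
        // and the uninitialized value is returned on that path.
        return value;
    }

Warnings like gcc/Clang's -Wall (and the analyzers discussed here) will often flag exactly this pattern, which is the argument for just initializing the variable and moving on.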
[Side note: Since Coverity bugs are rarely filed in the bug tracker, it'd be more effective to search CLs for the string CID=, which identifies Coverity bug reports. And no, security-critical fixes won't get that marker ;)]
As for PVS, I'm itching to see a full report on Chrome. The excerpts he posts are very interesting, and contain a good number of bugs that are not flagged by Coverity. (I assume vice versa applies, hence the wish to look at a full report.) Now I just need to make some time to run an eval...
Carmack said that for Coverity though, not for PVS.
PVS-Studio vs Chromium - Continuation - http://www.viva64.com/en/b/0113/
Boolean b = new Boolean(true);
Little did I realize that this variable was actually a lock: there was a synchronized(b) block later (and much deeper) in the code, which I effectively broke by removing the new, since b now referred to the globally shared, cached Boolean instance rather than a private lock object.
In my defense, I feel that the real bug here was one of documentation: had the variable been named something like "lock" I'd have understood immediately what was going on. But that doesn't make you feel much better when your team's been up all night fixing your bug!
Moral of the story: your codebase (especially if it's an older one) might actually be depending on its "bugs" for proper behavior. Think (and test) hard before applying suggested changes from static analysis.
The problem was that they also (ab)used that field for the lock. It either should have been a separate field (of type Object, as you suggested), or they should have kept the existing Boolean but called it "lock" or "monitor" or some such.
// we need a heap object so we can synchronize on it later
Boolean b = new Boolean(true);
And for the people who work on SA systems - please give me a way to annotate that is not exclusively via comments. Especially once people use multiple SA packages, that is rather annoying :)
In that instance, the lack of a comment explaining the behavior was also at fault.
Could you check all the references where that b variable was used prior to changing the code?
There are likely to be just one or two such places.
1) I might learn something new, and what I thought was an error would turn out to be an interesting new coding trick.
2) If it was an error, not only would we fix it, but we would also learn not to make that kind of error in the future.
In particular, the person who made the error would learn to avoid it in the future.
However, more fundamentally, Haskell code just naturally provides much more information to static analysis tools than any other language I've worked with. Even if the level of tooling is not there yet (I haven't worked on any large projects, so I am not entirely familiar with it) the potential for these tools is much greater in Haskell. I think programs like HLint are already very thorough. I've just been using Haskell as more of a hacker language than a "bondage and discipline" language and haven't bothered with these tools :)
Unhandled null references are the most common type of error that other languages don't catch as readily (I fairly frequently try to write code that doesn't handle the not-found case in a map lookup), but there have been many times when the compiler rejected code of mine that wouldn't have been picked up in other static languages.
WRT hlint, I've not seen it point out any actual bugs in code; it only finds ways to golf it, in my experience. It doesn't even find a few common performance-impacting mistakes like length foo > 0 ==> not (null foo). OTOH, it's quite a beautiful tool, especially if you look at the implementation of the checks, which is as simple and beautiful as this:
error = take (length x - 1) x ==> init x
But it goes beyond that, even more so as the Haskell culture builds on this and actively encourages taking advantage of the type system by encoding as many things as possible in the program's types, where they can be statically checked.
For example, the type system doesn't just handle values; doing actions is also managed by the type system. The type of main is IO (), which roughly means "do something with side effects". If I write a function that reads in a file and returns a string, its type is not String, it's IO String, or "do an action, which causes a String to be returned". I can shuffle this around at will, and use the IO monad to handle running the program in the correct order. And IO is just one way of managing order: continuation passing is just another monad in the type system, which didn't even need any additional code in the compiler to implement.
But what really makes the difference from Java or C++ imho is the type inference. Basically, if you never write a type declaration, the only time the type system bothers you is when there really is a bug in the code. Psychologically, this is a big deal. In Java, the type system is a chore that makes me type unnecessary things. In Haskell, the type system is a compile-time check that catches a lot of real errors.
As a real-life example of how this can happen, a PHP 5.3.something release had a serious bug with MD5 hashes essentially not working (cf. https://bugs.php.net/bug.php?id=55439 http://www.php.net/archive/2011.php#id2011-08-22-1 ). Apparently there was a unit test for it, but there were ~200 failing unit tests, so they ignored it.
It's made for a much cleaner codebase - and it means that we instantly know if we're about to do something iffy, rather than having new warnings buried under a ton of existing ones.
There were 8000+ warnings and FindBugs errors in the codebase that I went through and fixed. Luckily a lot were just white noise, but more than 100 were definitely valid bugs that had been in the code for quite some time.
In order to keep things clean after all that work, I turned on warnings as failures as part of the continuous integration build (which I also set up) so that everyone would get an email each time the build failed. Yea public shaming, heh. The hard part was then training people to pay attention to the emails and not filter them to the trash.
As a final step, what I did was build the testing environment into the CI system, so that if they wanted to test their code on a virtual machine before getting their branch into the latest iteration, they needed a clean build in order to generate the Debian installers. That was the final kicker which really made people start paying attention to this stuff.
So, in order to push code to production, which everyone wants to do, you needed a clean build. Problem solved and it really cleaned up the quality of production releases. ;-)
(And it took me an absolute age to get rid of all the warnings too.)
I set up builds which could be tested as soon as the build was complete. I also had to do a lot of work to optimize the build process so it would complete quickly. It started out at 20+ minutes and I worked it down to around 3-5 minutes.
And as multiple front-ends use the same back-end, we can't just move the back-end up whenever we feel like it, or we break everyone else's UIs.
See the "Static Analysis" section:
If SA discovers a problem, you'll discover it the moment you run the code through the analyzer, while by the developers' own admission many of their bugs are discovered via bug reports. Which is clearly a bit later :)
Now, it might well be that most of SQLite's bugs simply are not discovered by SA. But SA is not going to report them later than bug reports, unless you use it very infrequently.
3 problems were detected by Coverity and fixed in SQLite.
Yes, as the page notes, the SQLite codebase has a >1000:1 test-to-code ratio. I don't think there's any other codebase this thoroughly tested, and it makes sense that static analysis tools won't discover anything not already covered by tests.
> and it makes sense that static analysis tools won't discover anything not already covered by tests
Whenever a bug is reported against SQLite, that bug is not considered fixed until new test cases have been added to the TCL test suite which would exhibit the bug in an unpatched version of SQLite. Over the years, this has resulted in thousands and thousands of new tests being added to the TCL test suite. These regression tests ensure that bugs that have been fixed in the past are not reintroduced into future versions of SQLite.
While this is a great practice, it's reactive. It's the result of particular bugs, not someone asking, "What are the situations we haven't covered?"
The coverage they have for error conditions (file system, out of memory, bit-flips) is impressive. I'm not saying I know you're wrong, but I think there are too many variables to say with confidence either way.
If that's not good enough, then I think static analysis is a decent step, but it probably pales in comparison to using stricter languages (e.g. Haskell).
This is generally exponential in the number of functions, modules, etc. involved. For example, a function with N if statements that are not nested generally needs 2^N test cases to exercise it properly.
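A toy illustration (hypothetical code, any language would do):

    int price(bool member, bool coupon, bool bulk) {
        int total = 100;
        if (member) total -= 10;   // each independent flag doubles
        if (coupon) total -= 5;    // the number of distinct paths
        if (bulk)   total -= 20;
        return total;
    }

Three non-nested ifs already give 2^3 = 8 path combinations; twenty such flags would need over a million test cases, which is why even a huge test-to-code ratio can leave paths unexercised.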
So having 1000 times more tests than code may not mean that you have complete coverage at all. It depends on the structure of the tests and the code.
(Not disagreeing, I just had to go through this process in my head when I thought about what you said in comparison to what they said.)
For any nontrivial project, testing every codepath is basically impossible, unfortunately. :(
CompCert is a verified compiler that translates code from the "virtual machine" of the C language to the "virtual machine" of the PowerPC.
It is generated from Coq, though.
But, please see ynot: http://ynot.cs.harvard.edu/
They have a verified SQL compiler. Again, generated from Coq source.
So I think you're wrong in claiming that static analysis isn't useful for virtual machines. For C you have to have very extensive annotations, as the language is not very expressive by itself, but static analysis is still possible.
Both are LLVM-related projects (and there are a few others as well, but these are the two "big" ones).
- People are generally happy with what they have: digging around is not a fun thing to do.
- It is very easy for the maintenance programmer to make assumptions about the preconditions of a piece of code that are not valid.
- Size of code is a critical metric for quality.
And most importantly (and probably overlooked), quality is but one metric of software. The name of the game here is providing value to the customer, not about writing perfect code. John kind of throws that out there in a pro forma way, then goes ahead without digging any deeper. Oddly enough, I can't really draw any conclusions about static code analysis, the topic of the essay, without a clear statement from the author about what the trade-offs are. We're left with "just use it" as a conclusion.
After reading this, I wonder whether programmers get stuck at the same general level of abstraction, and whether this staying-at-the-same-level thinking introduces unnecessary code complexity. To illustrate, let's try a thought experiment.
Suppose there was no modern OS -- just an x86-compatible CPU and BIOS -- and you were supposed to put an image stored on a USB drive onto the screen.
It would involve huge amounts of work -- code to get information from the drive, code to understand images, code to respond to the keyboard, etc.
The reason we can do this so easily today is that whatever we write is basically in a DSL that sits on top of other DSL/APIs. We are working at a higher level of abstraction.
I wonder if putting programming projects on a "code diet" isn't something we should try more often. Announce that whatever our solution is, it's not going to be more than 10KLOC. If we have to split into teams to provide layers, we will. Each team has 10KLOC and should create a DSL at their particular level of abstraction.
This forces us to keep project codebases very small, yet should provide just as much freedom to create very powerful software as we have today. I understand that many will say "but there's no way you're going to make any kind of useful layer of abstraction in 10K of code!" I disagree, but that's a big can of worms to open up in an HN thread. The important question is this: should we impose arbitrary limits on our abstraction layers as a way to enforce higher code quality?
Just thinking aloud.
Yes, sort-of, although . . .
A simple example 'in the small' to consider: function/procedure size. Should we (syntactically) limit functions to, say, <30 lines?
What we are really trying to do here is limit complexity. And that is not simply a result of length, but more of interrelation.
If we recall Dijkstra's intent with 'Structured Programming', it was to make complexity proportional to program length. It is not the total amount, but the way it is arranged or broken up. Strict size limits in effect sort-of do that -- they make complexity nearer constant (since it is bounded) within each part.
> should create a DSL at their particular level
And that leads to why the DSL idea (or something else to do a similar job) is more important than the simple limit idea. What we want primarily is different structuring.
In terms of the function-size example, we can (rather ideally) compare imperative with functional programming. FP languages seem to tend away from the length problem because they are more trees of expressions than sequences of statements. They have different structuring basics that control interrelation complexity more.
To put it briefly: we want it so you cannot add length without also adding 'depth' (of abstraction, of levels, of separation).
Expressed more fully here: 'Should there be hard limits on program part sizes?' http://www.hxa.name/notes/note-hxa7241-20101124T1927Z.html
To rephrase: would imposing arbitrary code length restrictions along with DSL-type training automatically drive programmers to write better quality code?
I believe the answer is yes. You are correct in that the technical issue is more complicated than that, but my point was asking if making an arbitrary rule would drive teams into making those kinds of design choices, instead of just being happy with the way things were and continuing to expand the code base ad infinitum. I was trying to draw in several lines of thought I gleaned from the essay and synthesize something new.
I also think there's a big difference in creating an arbitrary limit on function size and doing the same for an overall project. I wouldn't like the idea of limiting function size at all. You can do a lot in 10K lines of code. There's a lot more freedom there. Once again, my thrust is human behavior. Telling somebody that each little piece of code they write is subject to somewhat arbitrary restrictions is a lot more onerous to me than simply making a "budget" for the entire project.
Unfortunately there is no silver bullet. No single recipe will tell you how to cut up every project into modules. You have to work hard to find the right decomposition for your problem. Arbitrary size limits are as likely to hurt you as help you.
That said, programmers should indeed strive to write less code, as a rule of thumb.
It's really exciting, and they've already made some great progress towards that goal. I recommend reading their first year progress report as it has a lot of the most exciting stuff about this (they just finished year 5):
BTW: dated today, but I'm sure I've read it before. Maybe a write-up of the earlier episodes (e.g. /analyze in the 360 SDK).
The time it takes to write test cases that catch all the same issues that static analysis would. I believe that testing lends itself to finding different kinds of defects, and it would be very unproductive to write tests covering all the same issues that static analysis can find (in statically typed code).
The cost of a bug slipping through the net. Some types of bugs cannot be ruled out by testing, but it may be possible to prove that they are not present, e.g. non-deterministic concurrency bugs.
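A classic illustration (a hypothetical C++ sketch, not from any project discussed here): a data race that a test suite will almost always pass, because the losing interleaving is rare, yet the defect is always present and race detectors (or a proof of data-race freedom) can establish it without ever hitting the bad timing:

    #include <thread>

    int counter = 0;                      // shared, unsynchronized state

    void work() {
        for (int i = 0; i < 100000; ++i)
            ++counter;                    // read-modify-write race: updates can be lost
    }

    int main() {
        std::thread a(work), b(work);
        a.join(); b.join();
        return counter == 200000 ? 0 : 1; // usually "passes", but not always
    }

Running this under a tool like ThreadSanitizer (-fsanitize=thread) reports the race deterministically, regardless of whether the test happens to fail.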
and he explains why:
"There was a paper recently that noted that all of the various code quality metrics correlated at least as strongly with code size as error rate, making code size alone give essentially the same error predicting ability. Shrink your important code."
Because I have a few people on my back who keep screaming that dynamic languages produce fewer lines of code and who jump to the conclusion that "therefore it is better in terms of quality", while the code this group of people produces looks a lot like Perl: less code, but unreadable (it requires intensive re-reading if you go away for a few days and come back to work on it). Lots of meta-programming, and shortcuts preferred over readability. Fewer bugs? Hell no...
e.g. Java vs Ruby, or Java vs Lisp
At some point, the complexity of the system and the available tools/libraries add more parameters to the bug-rate calculation, which may throw off the result of the paper.
Consider this: a fellow worker had to write something that uses eBay's API. There is an existing eBay gem available, and he used that first. He stopped after a few hours due to bugs and undocumented behavior. His other option? SOAP/WSDL. Now, based on what we know, Java has better SOAP support than Ruby. We're not saying that Ruby can't do it, but we questioned the usability of SOAP from Ruby: essentially, you must read the WSDL (treat the WSDL as the documentation) to figure out the data types in Ruby. Even then, what happens if eBay updates the WSDL at some point in the future? More WSDL proof-reading. Not so in Java: with the help of the IDE and the compiler (via WSDL code generation), you can easily navigate the WSDL objects and detect breakage if the WSDL has changed.
At this point, it seems that using Java is a better option as opposed to Ruby.
This is where such research tends to be questionable: "when all things stay the same..."
I will say, though, that static analysis is still very much an immature technology. Look for it to be much, much better in a decade or so.
Why is that? Are there big unsolved problems or is it more of a grinding away at little things?
see Model Verification:
and SMT solving:
What we have in practice are approximations and heuristics; over time we will develop better approximations for the kinds of code people write in practice. Unfortunately fragility of analysis will always be a problem; if you change your code just a little bit then reasoning may fail.
As someone who has spent years doing research on shape analysis my personal belief is that the dream of a fully automated "find my bugs" static analysis is unrealistic. Some of the problems you must solve to analyze the heap, say, are very hard indeed.
We need to think more about the interactions between language design and verification, rather than hoping that people will build better tools to analyze existing code. Strong typing (e.g., the Hindley-Milner type system) is one example of a sweet spot that demonstrates how language design (type annotation, type declarations) interacts with tools (type inference, type checking). Type systems are some of the most widely used bug finding tools today. Trying to build verification tools without considering language design is always going to be difficult.
I found this, which looks like an interesting start: http://stackoverflow.com/questions/38635/what-static-analysi...
I particularly like the idea of automated security analysis. I'm pretty sure some past codebases I've worked on have had seriously low-hanging fruit in that regard.
Also, any language that targets the JVM (like Scala) can be checked, though FindBugs may report questionable code produced by that language's code generation. :)
[PyCharm is great, but IDEs just can't do for dynamic code what they can for static code, and it hurts.]
The particular errors Carmack talks about are all holes in the type system; they're areas where the type system of C/C++ is unsound (as in the type-theory definition, not the colloquial one). Null pointer exceptions don't exist in Haskell, because null pointers don't exist; you have to explicitly use a Maybe type. Printf errors do exist (at least with Text.Printf), but there's a lot of research on dependent types aimed at solving exactly that class of error.
Again, though, it's a tradeoff. Haskell lets you find a lot of errors at compile time, but the tradeoff is that you spend much more time figuring out why your program won't compile. For a lot of software, you're better off shipping with bugs than not shipping at all. Particularly for exploratory software, it makes more sense to build something that works for your demo just to see if it's useful than build something for everyone that nobody wants to use. Specs that don't meet customer needs are just as buggy as code that doesn't meet the spec.
I'm not sure you can say that. `undefined` is a case of `error`, so it will blow up any time it's encountered (it's often used to stub code), not just when you try to use it; it's closer to putting a `throw` than to putting a `null`.
It's usually used to stub code during development in TDD-type scenarios:
myFunction = undefined
myOtherFunction foo = doSomethingWith value
  where value = myFunction foo
You can't have `undefined` "pass through" your code the way `null` does.
You do. They're not statically checked, but they're there.
> I love Python/Ruby and other dynamic languages, but I miss the C++ type system when using them.
Why not use a language less syntactically heavy than C++ but still statically typed, then? (And C++'s type system? Not really going for the stars, are you?) Because a major part of Python and Ruby is precisely that they're not statically typed. A nominative static type system would yield quite a different language, probably something close to Cython. Alternatively, you could fork Python or Ruby with a structural type system; that could be interesting, but it would still, I think, give you different languages from their originators, not merely dialects. It also would be nothing even remotely close to "the C++ type system" (not that this would be a bad thing, AFAIC). And you probably wouldn't know "what types you're comparing" either.
Hell, Ada has every single feature he wants plus more. I'm particularly fond of the range feature.
type degrees is range 0..359;
I'm not sure how to make someone who isn't interested in learning history learn history. Is this just a result of our industry being composed almost entirely of the young?
If you legally have to have a meeting with 10 people to review and physically sign off on every line of code, your project will probably take a while.
Plus the compilers sucked in the 80's.
Do you mean that not having free side-effects of any kind anywhere is too restrictive?
While investigating the feasibility of my product idea, I have run across 3 interesting Python analysis tools that are often overshadowed by pyflakes & pychecker. You may find them relevant to your question:
- pylint (http://www.logilab.org/card/pylintfeatures): Already mentioned here, but the list of checks it will perform is humbling.
- Pyntch (http://www.unixuser.org/~euske/python/pyntch/index.html): Type inference via graph analysis.
- Pypy's Annotation Pass (http://readthedocs.org/docs/pypy/en/latest/translation.html#...): Type inference via control flow graphs.
Anyone maintaining large Django or other Python projects who wants to chat about my product idea, email me (my HN username at Google's email service)! I am especially interested in any constructive skepticism you send my way. I am looking to refine my ideas into something that genuinely advances programming.
I use it regularly on client projects (it's like giving your code a certification!) and it's heavily used/recommended in the Perl community.
See http://perlcritic.com/ & https://metacpan.org/module/Perl::Critic
> It is important to say right up front that quality isn’t everything, and acknowledging it isn’t some sort of moral failing. Value is what you are trying to produce, and quality is only one aspect of it, intermixed with cost, features, and other factors.
Tips on speeding up PVS-Studio - http://www.viva64.com/en/b/0126/
Build the appropriate model for IN_RANGE, and done.
Has the benefit that you explicitly state the type of range (open|closed|half-open[LR]), so you'll be a bit more likely to think about the edge cases.
Is it painful? Yes. I'd rather take that pain than debugging crash reports, though. (YMMV - it certainly depends on what you are building, how large your audience is, and what the consequences of a crash are)
Of course, all of this would be handled by a native range expression, but we don't get that in C/C++. And, as I said in an earlier comment, project- or company-specific macros just add another layer of required comprehension.
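For illustration, here is a hedged sketch of what such helpers might look like in C++ (the names are hypothetical, not from any particular codebase). Spelling out the interval kind is what makes both the reader and the analyzer's model consider the edge cases:

    template <typename T>
    constexpr bool in_range_closed(T lo, T x, T hi)    { return lo <= x && x <= hi; }  // [lo, hi]

    template <typename T>
    constexpr bool in_range_half_open(T lo, T x, T hi) { return lo <= x && x <  hi; }  // [lo, hi)

    // e.g. bounds-check an index before using it:
    //   assert(in_range_half_open<std::size_t>(0, index, buffer.size()));

A project would standardize on one such set of helpers (or macros in C) and teach the static analyzer a single model for each, as suggested above.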