
Lessons from Building Static Analysis Tools at Google - DannyBee
https://cacm.acm.org/magazines/2018/4/226371-lessons-from-building-static-analysis-tools-at-google/fulltext
======
lukasLansky
It's worth noting that in the .NET world, there is a useful extension point
provided by the Roslyn compiler that can be used for shipping analyzers and
code fixes using the standard packaging tools (NuGet). Static analysis
distributed in this way is automatically integrated into both IDE and CI
workflows -- as it's directly supported by the compiler.

This means that you can ship static analysis checks alongside your library.
For example, suppose you have a unit testing library that expects to be used
in a certain way that cannot be completely described by the type system -- it
expects that test methods marked with its test attribute must be public. Such
a library can include an analyzer that checks usages of the library's types
and reports warnings. Users of the library don't have to do a thing to
register the analyzer; it just works.

------
WalterBright
Adding to why static analysis tools are not used:

1\. The false positives sometimes lead to ugly workarounds needed to quiet the
message.

2\. There's no standard, meaning different SA tools produce very different
results, often contradictory.

A better approach is to improve the language. For example, `(a < b < c)` is
flagged by many static analyzers because it parses as `((a < b) < c)`,
silently comparing a boolean result with `c`. In D, the grammar was changed so
`a < b < c` is not valid. And the C/C++ committees should really deprecate
lowercase `l` as a valid integer literal suffix, since it is so easily
mistaken for the digit `1` :-)

~~~
bluGill
Point 1 is why I refuse to allow any static check at my company if you can
show a real false positive in our code. If there is any doubt, someone will
quickly try to claim that every instance of that error is doubtful, and it is
up to me (as the maintainer of the static checks) to prove them wrong.

I have never seen the contradictory results described in point 2, so I don't
know what you are talking about. If it is what I think it is: I would reject
one of the two tools, just so that I had a single, easy-to-apply set of rules
with no false positives.

Static analysis is very valuable, but only when you can treat it like the hand
of some god that will throw lightning at any developer who writes a flagged
error. I'm currently looking at 1 suppression per 2 million lines of code. I
think there is more value in more checks, but unless they can be fully
automated, without my having to tell everybody how to suppress false
positives, I won't consider them.

~~~
UncleMeat
You don't allow any type checking?

You can tune an analysis to be considerably more conservative and reduce false
positives. The analysis becomes more like a semantically aware formatter.
Doing most things related to pointers will lead to FPs, but you can still get
value even if you are terrified of FPs.

~~~
bluGill
We do lots of checks. Most of our code is C++, where we get a fair amount of
checking "for free", and the few times type checks get in the way, they are
easy to bypass in a natural way.

------
grandinj
At LibreOffice we (mostly I) do whole-program analyses, and they are fairly
useful, e.g.
[https://cgit.freedesktop.org/libreoffice/core/tree/compilerp...](https://cgit.freedesktop.org/libreoffice/core/tree/compilerplugins/clang/constantparam.cxx)

That said, the analyses (a) are specific to our team, (b) require about 2
hours each to compute on a 6-core box, (c) are only run every couple of
months, (d) are run and applied by the same person, (e) have a false-positive
rate of around 1%, so they require manual review, and (f) often make
assumptions that are only valid for how we use C++ and various libraries
(without some "well, just don't do that" assumptions you can't do any useful
analysis across a C++ codebase :-)

~~~
skylyrac
We use Coverity and, even though some members of the team really dislike it, I
think it is quite helpful for detecting things that would otherwise go
unnoticed.

However, our project is relatively small, so we are normally done in ~15
minutes. We run it right before upstreaming patches, which means the analysis
runs once or twice a day.

------
z3t4
If there are too many false positives, it just becomes an annoyance. If the
false-positive rate is too high, it can even have the opposite effect: code
flagged as buggy becomes more likely to _not_ have a bug. The maximum
tolerable false-positive rate is around 5%, I think. Anything above that and
it would be more effective to show warnings at random locations.

~~~
sanxiyn
The article claims 10% is a sweet spot for Google.

~~~
bluGill
If I read it correctly, that 10% applies only to the static checks flagged in
code reviews. They have some checks that are at 0%.

Which is to say, there are some checks that are always right, and Google
doesn't allow violations of them. The 10% seems to refer to checks that catch
some significant bugs - things bad enough that they are worth looking at, even
though the check isn't perfect.

------
mcguire
" _Google does not have infrastructure support to run interprocedural or
whole-program analysis at Google scale..._ "

I am a bit unclear about this statement. Does it really mean that Google uses
a single, monolithic program to do everything from advertisement payment
tracking to the Gmail user interface? Or does it mean they don't have the
infrastructure to analyze many individual programs? (Surely that can't be
true!?)

~~~
grandinj
I read that part more as: we don't have the necessary infrastructure to
compute whole-program analyses, and we don't have sufficient justification to
invest the effort in creating such infrastructure.

------
mcguire
" _Attempt 1. Bug dashboard. Initially, in 2006, FindBugs was integrated as a
centralized tool that ran nightly over the entire Google codebase, producing a
database of findings engineers could examine through a dashboard. Although
FindBugs found hundreds of bugs in Google's Java codebase, the dashboard saw
little use because a bug dashboard was outside the developers' usual workflow,
and distinguishing between new and existing static-analysis issues was
distracting._

" _Attempt 2. Filing bugs. The BugBot team then began to manually triage new
issues found by each nightly FindBugs run, filing bug reports for the most
important ones. In May 2009, hundreds of Google engineers participated in a
companywide "Fixit" week, focusing on addressing FindBugs warnings.3 They
reviewed a total of 3,954 such warnings (42% of 9,473 total), but only 16%
(640) were actually fixed, despite the fact that 44% of reviewed issues
(1,746) resulted in a bug report being filed. Although the Fixit validated
that many issues found by FindBugs were actual bugs, a significant fraction
were not important enough to fix in practice. Manually triaging issues and
filing bug reports is not sustainable at a large scale._

" _Attempt 3. Code review integration. The BugBot team then implemented a
system in which FindBugs automatically ran when a proposed change was sent for
review, posting results as comments on the code-review thread, something the
code-review team was already doing for style/formatting issues. Google
developers could suppress false positives and apply FindBugs' confidence in
the result to filter comments. The tooling further attempted to show only new
FindBugs warnings but sometimes miscategorized issues as new. Such integration
was discontinued when the code-review tool was replaced in 2011 for two main
reasons: the presence of effective false positives caused developers to lose
confidence in the tool, and developer customization resulted in an
inconsistent view of analysis results._"

Once upon a time, I had a co-worker on a C++ project commit a change disabling
-Wall with a comment that can loosely be paraphrased as, "No one has time for
that."

It's good to see Google has the same issues.

------
vchamakkala
Semmle (and its open source product LGTM) is a really great tool in this
space.

------
neves
It is very nice to integrate static analysis with code review tools. Did
anybody here successfully integrate Java static analysis (e.g. FindBugs) with
GitLab merge requests?

How would someone integrate static analysis into the workflow if there's no
code review?

My experience with bug dashboards is the same as Google's. They sit outside
the developers' workflow.

------
neves
Is there an "Error Prone"-like tool for JavaScript or Angular?

~~~
alexeagle
Hi, I started Error Prone and now work on the Angular team :)

In addition to tslint (the "old way"), we now have
[http://tsetse.info](http://tsetse.info), which follows the Error Prone model
of baking checks into the TypeScript compiler and failing the compilation the
same way the type checker does (for the error case).

Still have some work to do to make warnings appear in code review like we talk
about in the paper...

------
sanxiyn
> Because checks in Error Prone are self-contained and written against the
> javac abstract syntax tree ... it is relatively easy for developers outside
> the team to contribute checks... As of January 2018, 733 checks had been
> contributed by 162 authors.
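
To give a concrete sense of what "self-contained" means here: a check is
roughly one class written against Error Prone's public BugChecker API. The
sketch below is only an illustration -- the check name, message, and the
method it matches are invented for this example, and the exact @BugPattern
fields have varied across Error Prone releases.

    import com.google.errorprone.BugPattern;
    import com.google.errorprone.VisitorState;
    import com.google.errorprone.bugpatterns.BugChecker;
    import com.google.errorprone.bugpatterns.BugChecker.MethodInvocationTreeMatcher;
    import com.google.errorprone.matchers.Description;
    import com.google.errorprone.matchers.Matcher;
    import com.google.errorprone.matchers.Matchers;
    import com.sun.source.tree.ExpressionTree;
    import com.sun.source.tree.MethodInvocationTree;

    // Hypothetical check: flag calls to System.exit() in library code.
    @BugPattern(
        name = "NoSystemExit",
        summary = "Library code should not call System.exit()",
        severity = BugPattern.SeverityLevel.WARNING)
    public class NoSystemExitChecker extends BugChecker
        implements MethodInvocationTreeMatcher {

      // Matches any call to the static method java.lang.System.exit.
      private static final Matcher<ExpressionTree> SYSTEM_EXIT =
          Matchers.staticMethod().onClass("java.lang.System").named("exit");

      @Override
      public Description matchMethodInvocation(
          MethodInvocationTree tree, VisitorState state) {
        if (!SYSTEM_EXIT.matches(tree, state)) {
          return Description.NO_MATCH;
        }
        // Report a finding at the call site; a SuggestedFix could be attached
        // here so the warning shows up in code review with a suggested repair.
        return describeMatch(tree);
      }
    }

Because a check is just an AST matcher plus a description, it is easy to
review and test in isolation, which presumably is why outside contributions
scale so well.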

Somewhat interestingly, Clippy is an analogous tool for Rust and has 181
contributors now, more than Error Prone. [https://github.com/rust-lang-
nursery/rust-clippy](https://github.com/rust-lang-nursery/rust-clippy)

~~~
setheron
Why is that relevant to this post?

~~~
saghm
Because it provides a comparison point to a statistic in the article? GP was
pretty explicit about how it related to the article.

~~~
tonfa
Might be apples to oranges, if one tool is internal and the other isn't.

~~~
sanxiyn
Error Prone is not internal: [https://github.com/google/error-
prone](https://github.com/google/error-prone)

