
Open-sourcing Facebook Infer: Identify bugs before you ship - cristianoc
http://code.facebook.com/posts/1648953042007882
======
guepe
The range of issues it detects (they mention null pointer access and resource
and memory leaks) is much narrower than what a tool like Coverity finds (I
use it). And it analyzes C and Java, two languages already supported by
Coverity, a very mature tool...

I am not certain of the proposed value, except that it's free to everyone
other than Facebook - but not to Facebook, which pays engineers to develop
it... Is this some kind of NIH syndrome at Facebook, or is there something I
missed?

~~~
Rezo
Coverity is great, but for example on the mid-size service (10s but not 100s
of kloc) that my team works on the analysis still takes hours. Therefore we
only do it for prod releases, not on every commit or CI deployment.

If you want to make static analysis part of the everyday development process,
it has to be 1) very quick, ideally seconds; minutes at most 2) preferably
something the developer can just run locally before pushing a change. If it's
fast and easy enough, it simply becomes another code hygiene tool like a code
formatter that you'll run continuously, perhaps even directly integrated into
something like IntelliJ.

To me Infer sounds like a nice complement to Coverity to catch issues as close
to where they are introduced as possible. It might even be Good Enough for
many projects to be the only tool, since Coverity is pretty expensive.

~~~
cactusface
> If you want to make static analysis part of the everyday development process
> [...]

Just to nitpick, I think your use of the term "static analysis" is a bit too
broad. Every (or almost every) production compiler or JIT does static analysis
intraprocedurally, that is, confined within a function / method / procedure.
On the other hand, whole program / interprocedural static analysis quickly
gets very expensive, usually because an alias / pointer analysis is involved,
and that's what you need for null pointer checks and stuff.

So I guess my point is, there is plenty of static analysis going on all the
time, just not expensive whole program bug-finding analysis. Cheap bug-finding
static analysis stuff is common, for example in GCC all those warning options
to catch undefined behavior.

~~~
theblatte
(Infer dev here) One strength of Infer is that it is inter-procedural, yet not
whole-program: each procedure gets analyzed independently. So it's cheap
enough to run on large codebases while still able to find deep inter-
procedural bugs.

~~~
cactusface
If you're analyzing procedures independently, why is it interprocedural?
Interprocedural just means that you use some information about another
procedure. This is expensive because if the information about one procedure
changes during the analysis, you have to go and reanalyze all the dependent
procedures. There are cheap but less accurate pointer analyses, is that why
it's fast?

~~~
_shb
Infer does bottom-up analysis: it starts at the bottom of the call graph and
analyzes each procedure once independently of its callers. Analyzing the
procedure produces a concise summary of its behavior that can be used in each
calling procedure. This means that the cost of the analysis is roughly linear
in the number of nodes in the call graph, which is not true for a lot of other
interprocedural analysis techniques.

It's true that if a procedure changes you may have to re-analyze all
dependent procedures (and calling procedures!) in the worst case. However, in
the bottom-up scheme you only need to re-analyze a procedure when the code
change produces a change in the computed summary, and in practice summaries
are frequently quite stable.
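
A minimal sketch of the bottom-up scheme described above, in Java. Everything in it is an assumption made for illustration: the `Proc` model, the boolean "may return null" summary, and the names `summarize` and `analyze` are hypothetical and are not Infer's data structures. It also assumes an acyclic call graph. The point it shows is that each procedure body is analyzed once, and callers only ever read the cached summary of a callee.

```java
import java.util.*;

// Toy bottom-up analysis. Proc, summarize and the "may return null" summary
// are hypothetical illustrations, not Infer's actual data structures.
class BottomUpSketch {
    static final class Proc {
        final String name;
        final boolean returnsNullLiteral; // body contains "return null"
        final List<String> callees;       // procedures whose result it may return

        Proc(String name, boolean returnsNullLiteral, String... callees) {
            this.name = name;
            this.returnsNullLiteral = returnsNullLiteral;
            this.callees = Arrays.asList(callees);
        }
    }

    static int analysesRun; // how many procedure bodies were analyzed

    // Summarize callees first, then this procedure; each summary is cached.
    // Assumes an acyclic call graph (recursion would need a fixpoint).
    static boolean summarize(String name, Map<String, Proc> program,
                             Map<String, Boolean> summaries) {
        Boolean cached = summaries.get(name);
        if (cached != null) {
            return cached;
        }
        Proc proc = program.get(name);
        boolean mayReturnNull = proc.returnsNullLiteral;
        for (String callee : proc.callees) {
            // Only the callee's summary is consulted, never its body.
            mayReturnNull |= summarize(callee, program, summaries);
        }
        analysesRun++;
        summaries.put(name, mayReturnNull);
        return mayReturnNull;
    }

    static Map<String, Boolean> analyze(List<Proc> procs) {
        analysesRun = 0;
        Map<String, Proc> program = new LinkedHashMap<>();
        for (Proc p : procs) {
            program.put(p.name, p);
        }
        Map<String, Boolean> summaries = new LinkedHashMap<>();
        for (String name : program.keySet()) {
            summarize(name, program, summaries);
        }
        return summaries;
    }

    // "load" is called from two places but is still analyzed only once.
    static List<Proc> example() {
        return Arrays.asList(
            new Proc("main", false, "render", "load"),
            new Proc("render", false, "load", "format"),
            new Proc("load", true),
            new Proc("format", false));
    }

    public static void main(String[] args) {
        System.out.println(analyze(example()) + " after " + analysesRun + " analyses");
    }
}
```

In this toy, the number of analyses equals the number of procedures, which is the "roughly linear in the number of nodes" behavior mentioned above. A real summary would be far richer than one boolean.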

~~~
cactusface
Cool, thanks for the details. So... what do you do about cycles?

By "change" I didn't mean code change, I meant change in the information about
the procedure collected during an iteration of the fixed point computation.
But from the sounds of things you aren't computing a fixed point.

For example: A calls B and B calls A. You have information A0 and B0 about A
and B. Analyze B, you have information B1 about B. Then you go and analyze A
using B1. This gives you A1. Now you have to redo B, and compute B2. Use this
to compute A2. This carries on until the information is not changing, i.e.
A(n) = A(n+1) and B(n) = B(n+1).

~~~
theblatte
Infer computes fixpoints whenever there is a cycle in the call graph, until it
reaches stable procedure summaries or times out.
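
As a rough illustration of iterating to a fixpoint on a call-graph cycle, here is a small Java sketch. It is not Infer's algorithm: the boolean "may return null" summary, the `solve` method, and the `maxRounds` limit (standing in for a timeout) are all assumptions made for the example. A calls B, B calls A and C, and C returns null, so the fact has to travel around the cycle before the summaries stop changing.

```java
import java.util.*;

// Toy fixpoint over a cyclic call graph. The boolean "may return null"
// summary and all names are hypothetical, not Infer's implementation.
class FixpointSketch {
    static Map<String, Boolean> solve(Map<String, Boolean> returnsNullLiteral,
                                      Map<String, List<String>> callees,
                                      int maxRounds) {
        Map<String, Boolean> summary = new LinkedHashMap<>();
        for (String proc : returnsNullLiteral.keySet()) {
            summary.put(proc, false); // initial information: A0, B0, ...
        }
        for (int round = 0; round < maxRounds; round++) {
            boolean changed = false;
            for (String proc : returnsNullLiteral.keySet()) {
                // Recompute this summary from the current callee summaries.
                boolean next = returnsNullLiteral.get(proc);
                for (String callee : callees.get(proc)) {
                    next |= summary.get(callee);
                }
                if (next != summary.get(proc)) {
                    summary.put(proc, next);
                    changed = true;
                }
            }
            if (!changed) {
                return summary; // stable: nothing changed in a full round
            }
        }
        return summary; // round limit hit; stands in for a timeout
    }

    static Map<String, Boolean> example() {
        Map<String, Boolean> nullLiteral = new LinkedHashMap<>();
        Map<String, List<String>> callees = new LinkedHashMap<>();
        nullLiteral.put("A", false);
        callees.put("A", Arrays.asList("B"));
        nullLiteral.put("B", false);
        callees.put("B", Arrays.asList("A", "C"));
        nullLiteral.put("C", true);
        callees.put("C", Collections.<String>emptyList());
        nullLiteral.put("D", false);
        callees.put("D", Collections.<String>emptyList());
        return solve(nullLiteral, callees, 10);
    }
}
```

Because the summary here can only flip from false to true, the loop is guaranteed to stabilize. With richer summaries that guarantee may not hold, which is presumably why a timeout is needed in practice.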

~~~
cactusface
Ok, thanks for indulging my curiosity guys!

------
amirmc
More OCaml code coming out of FB. Can add this to the list, which includes
Hack, Flow and Pfff [1].

The kinds of bugs it finds are listed at: [http://fbinfer.com/docs/infer-bug-
types.html](http://fbinfer.com/docs/infer-bug-types.html)

It's interesting to see how building tools with languages like OCaml can
reduce bugs for teams, _without_ them having to change the language itself. I
do wonder what things would be like if such languages were used directly more
widely.

[1]
[http://ocaml.org/learn/companies.html](http://ocaml.org/learn/companies.html)

~~~
aristus
Legend has it there is a small room at FBHQ, containing a quorum of OCaml
committers, all of them French for some reason, hacking away at a level of
abstraction beyond the ken of mortal man.

~~~
omouse
That's not a legend, it's true since they're doing a partnership with INRIA
where OCaml was born. Those French computer scientists know something that
American companies don't and that's that theory is important as an underlying
foundation for extraordinary results.

~~~
pnathan
I'd actually expand that to European - in my grad studies, the European CS
world had a much more mathematical bent. Notice the _Glasgow_ Haskell
Compiler, Coq, etc.

You can always go from math -> pragmatism, but the reverse is nearly
impossible. People get set in their ways, and it takes time to develop
mathematical rigor, even if you want to.

So when you need to get mathematical expertise, you wind up needing to hire
it.

~~~
x5n1
room for all... zuck employs these guys and is anything but

------
ixtli
This appears† to be the result of Facebook having purchased a UK company
called Monoidics†† in 2013. It's nice to see these types of acquisitions
resulting in code getting opensourced.

†
[https://github.com/facebook/infer/blob/2bce7c6c3dbb22646e2d6...](https://github.com/facebook/infer/blob/2bce7c6c3dbb22646e2d67a2c6ade77f060b4bca/infer/src/backend/CRC.ml#L2)

†† [http://techcrunch.com/2013/07/18/facebook-
monoidics/](http://techcrunch.com/2013/07/18/facebook-monoidics/)

~~~
rattray
Companies often get a lot of hate for buying open-source and taking it closed.

We should really celebrate Facebook for doing the opposite here!

------
tptacek
Can someone explain-it-like-I'm-a-90s-programmer (ELi90s?) why so much
symbolic evaluation stuff gets done in OCaml? What does OCaml do that makes it
so well suited for this problem domain? (I know a very little bit about
symbolic evaluation and have done a very very little bit of it).

~~~
pcwalton
In addition to what the other commenter said, pattern matching is really nice
for this kind of thing. Of all of the functional programming languages, OCaml
has probably the most sophisticated pattern matching engine around (and we
basically copied it into Rust, incidentally), supporting or-patterns, multiple
bindings, guards, and so forth. Pattern matching lets you essentially match on
the shape of subtrees of arbitrary data structures with complex predicates.

If you're familiar with old-school compiler construction, this is like having
a souped-up BURG built into the language. For example, pattern matching lets
you say things like "if I have Load(Var, Add(Var, Constant)) where constant is
a small power of two, fold it into the x86 indexed addressing mode" in one
line. Unsurprisingly this is useful not only for compiler construction but for
any kind of term rewriting/symbolic manipulation.
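
To make the contrast concrete, here is a sketch of the same kind of check written in Java with plain `instanceof` tests and casts. The term types (`Load`, `Add`, `Var`, `Constant`), the helper `isSmallPowerOfTwo`, and the choice of 1, 2, 4 and 8 as "small" are assumptions made for the example. In OCaml the whole of `canFold` would be roughly one match arm, something like `Load (Var _, Add (Var _, Constant c)) when is_small_pow2 c`, again with hypothetical constructor names.

```java
// Matching the shape Load(Var, Add(Var, Constant)) by hand.
// All type and method names are hypothetical, chosen to mirror the example.
class FoldSketch {
    interface Expr {}

    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
    }

    static final class Constant implements Expr {
        final int value;
        Constant(int value) { this.value = value; }
    }

    static final class Add implements Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    static final class Load implements Expr {
        final Expr target, address;
        Load(Expr target, Expr address) { this.target = target; this.address = address; }
    }

    // Assumption: "small power of two" means one of 1, 2, 4, 8.
    static boolean isSmallPowerOfTwo(int v) {
        return v == 1 || v == 2 || v == 4 || v == 8;
    }

    // True when e has the shape Load(Var, Add(Var, Constant c)) with c small.
    static boolean canFold(Expr e) {
        if (!(e instanceof Load)) {
            return false;
        }
        Load load = (Load) e;
        if (!(load.target instanceof Var) || !(load.address instanceof Add)) {
            return false;
        }
        Add add = (Add) load.address;
        if (!(add.left instanceof Var) || !(add.right instanceof Constant)) {
            return false;
        }
        return isSmallPowerOfTwo(((Constant) add.right).value);
    }
}
```

Each level of nesting in the pattern costs a type test and a cast here, which is the boilerplate that built-in pattern matching removes.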

~~~
tptacek
Ok, tangent question: if I wanted an interesting project to learn Rust with,
would a symbolic evaluation checker for (say) C code be a really good fit? In
the same sense as emulators turned out to be a fantastic fit for Golang?

If that's true, what are the features of Rust that make this so, and roughly
how would they apply to that problem domain? (I could answer that question for
Golang and emulators pretty quickly).

( _Hopefully this question comes across the way I intend it to, which is: I
have no plans on using OCaml any time soon, believe the comments that say you
want a language with pattern matching to do this in, and would love to tinker
more with both Rust and symbolic evaluation._ )

~~~
pcwalton
Sure, I think it'd be a fun project to try! We use Rust for compiler
construction, obviously, and it works great for us. Bear in mind, though, that
Rust is manually memory managed, and there is significant cognitive overhead
of having a compiler that checks that you're doing the manual memory
management properly as opposed to just using a GC.

I think if you want a really fast symbolic evaluation checker—say, the kind
that you're going to run on $BIG_COMPANY's codebase on every checkin—Rust
would be a really good fit, because you get pattern matching and excellent
performance. Other than OCaml and (maybe?) F#, I don't know of a language that
has as sophisticated a pattern matcher as Rust does, which helps a lot with
this stuff. But, if you're building a one-off tool, a more dynamic language
with a GC might be more convenient.

~~~
tptacek
Would it be likely that I could get around that problem just by using arena
allocation? Do you think the allocation patterns of a symbolic checker fit
that sort of "just allocate everything and forget about it, then free it all
at once" pattern? Does Rust make it easy to punt that way?

~~~
pcwalton
Yup, you can use arenas for that, and in fact that is likely what I'd do.
There's an arena crate on crates.io. [https://crates.io/crates/typed-
arena/1.0.1](https://crates.io/crates/typed-arena/1.0.1)

------
sadert
What's the difference between this and the Clang analyzer, which comes with
Xcode already? I expected that comparison to be on the front page...

[http://clang-analyzer.llvm.org/](http://clang-analyzer.llvm.org/)

(obviously it supports Java as well, but I assume Android Studio comes with
some sort of static analyzer as well, so same question?)

It specifically calls out null pointer exceptions but those... aren't a
thing... in Objective-C, messages passed to nil return all 0 bits, and that's
_okay_ (unless they mean null dereferences...).

~~~
dulma
On iOS there is the Clang Static Analyzer. Infer does some things differently,
in particular reasoning that spans across multiple files. But CSA checks for
more kinds of issues and is also more mature than Infer when it comes to iOS:
we send big respect to CSA! Infer has only got started there recently. Really,
these tools complement one another and it would even make sense to use both.
Indeed, that's what we do inside FB!

About null dereferences, they are still a problem in ObjC: if you dereference
a nil block it will crash, if you access an instance variable directly or try
to pass nil to arrays or dictionaries it will crash. We try to find those
kinds of bugs.

~~~
OxO4
Sorry to hijack your comment but it sounds like you are one of the devs of
Infer. I am working on static analysis as part of my PhD and I am going to be
an intern at Facebook MPK this summer. Are you located at MPK as well? Any
chance we could meet up for some coffee at some point?

~~~
jrmd
Part of the team is in MPK. We will be happy to have a chat about static
analysis once you join us this summer.

~~~
OxO4
Cool! Looking forward to it.

------
timtadh
I am always interested in what powers these tools under the hood. I had to
learn the hard way, you do not write a program analysis tool from scratch, if
you can help it. I know I have tried. It is too much for one person to do.

So what is powering this thing?

1\. [http://sawja.inria.fr/](http://sawja.inria.fr/) This is an OCaml library
for parsing .class files into OCaml data structures. There is some built-in
analysis it uses.

2\. Clang and LLVM, which are the popular thing to build your C-family
analysis framework on.

I use [https://github.com/Sable/soot](https://github.com/Sable/soot) for Java
analysis myself. It is extremely powerful out of the box and can analyze: java
source code, jvm bytecode and dalvik bytecode. I recommend taking a look at
that if you are interested in that sort of thing.

The innovation in the released tool seems to be the incremental checking.
Haven't had a lot of time to dig into that but that seems to be the important
part. In general it is great that they created something useful and practical,
that is always a challenge.

~~~
guipsp
>I use [https://github.com/Sable/soot](https://github.com/Sable/soot) for Java
analysis myself. It is extremely powerful out of the box and can analyze: java
source code, jvm bytecode and dalvik bytecode. I recommend taking a look at
that if you are interested in that sort of thing.

Soot is also really slow if you're using SSA on large codebases, and the code
is a mess.

------
amccloud
Looks like facebook acquired this code
[http://www.theguardian.com/technology/2013/jul/18/facebook-b...](http://www.theguardian.com/technology/2013/jul/18/facebook-
buys-monoidics)

[https://github.com/facebook/infer/search?utf8=%E2%9C%93&q=mo...](https://github.com/facebook/infer/search?utf8=%E2%9C%93&q=monoidics)

------
istvan__
I am extremely happy to see Facebook using OCaml, it is good to get some more
traction in that community. I hope it gains velocity over time and becomes a
viable option especially for startups where there is no technical debt. It has
amazing features and as you can see even very complex problems can be solved
in a concise, terse way. Kudos to Facebook on this one.

~~~
aaggarwal
As has been pointed out, OCaml has support for algebraic data types and
symbolic evaluation. Doesn't this make it an excellent language for natural
language processing? Are there any more examples anyone is aware of?

~~~
istvan__
I guess so. I am only familiar with the following in this category:

[https://code.google.com/p/hunpos/](https://code.google.com/p/hunpos/)

------
amenghra
This is what it can find in openssl: [http://marc.info/?l=openssl-
dev&m=143406271519649&w=2](http://marc.info/?l=openssl-
dev&m=143406271519649&w=2)

------
GreaterFool
I'm going to completely ignore the tool itself and focus on the fact that it
is written in OCaml. IMHO it's a great language that's much underused and as
such there's a need for greater library ecosystem (what's out there is
generally very good, but there isn't much). Hopefully adoption of OCaml at
Facebook will grow and we'll see some interesting general purpose open-source
libraries!

------
adamnemecek
Seems like mentioning the sorts of bugs this can detect should be the most
important thing on the landing page.

~~~
jdp23
The list is at [http://fbinfer.com/docs/infer-bug-
types.html](http://fbinfer.com/docs/infer-bug-types.html)

For Java, it's Resource leaks and Null dereferences

For C and Objective C, the list is Resource leak Memory leak Null dereference
Parameter not null checked Ivar not null checked Premature nil termination
argument

~~~
Animats
Unfortunately, it doesn't detect the big one in C - out of range
subscripts/buffer overflows. That's hard, but other verifiers have done it.

------
mofle
I made a quick installer for Infer as the default steps are a bit manual.

$ npm i -g infer-bin && infer

[https://github.com/sindresorhus/infer-
bin](https://github.com/sindresorhus/infer-bin)

------
hannob
In case anyone cares, this is what it'll find on openssl:
[https://bpaste.net/show/4914ab7990c9](https://bpaste.net/show/4914ab7990c9)

------
Cthulhu_
Gave it a quick trial for iOS... doesn't seem great. It doesn't run at all
when giving it a whole project (no response, no CPU usage, not even when you
feed it BS arguments), it gives a "Starting Analysis" and nothing else for
other (simple) files, it doesn't understand the newish 'nullable' keyword, and
it will quit with a fatal error if it can't resolve an import (like UIKit), so
pretty unusable on single files. I'm not convinced by the bug types it checks
for either - doesn't xcode / the Clang analyser do those same checks itself
and then some?

Just opened the example in Xcode, the analyzer itself already highlights a
lot of issues, like the nil dereferencing:
[http://i.imgur.com/CdiYRYX.png](http://i.imgur.com/CdiYRYX.png). If it's
written in more modern objective-C and supports the nullable/nonnull type, the
compiler will also warn / fail to build when trying to assign nil to a nonnull
type.

~~~
jasonlotito
Tried it on one of my iOS projects for an app in the app store, and it worked,
highlighting as expected. Possibly setting the wrong arguments?

------
hugovie
If you want to see what Infer reports, check
[http://blog.hoangnm.com/2015/06/12/infer-facebook-demo-
and-t...](http://blog.hoangnm.com/2015/06/12/infer-facebook-demo-and-test/) .
I just did a test run of Infer and am going to apply it to some of my iOS
projects.

------
gschrader
How does this compare to other static code analysis tools like Findbugs, PMD,
Checkstyle, etc?

~~~
_shb
To paint these tools with an overly broad brush, they are linter-like in that
they perform shallow intra-procedural analysis to identify common bug
patterns, e.g.,

  if (x != null) { y = x.f }
  z = x.f // possible NPE; x was previously checked for null

or

  foo(String s) {
    if ("x" == s) // oops, should use .equals() for Java String comparison
  }

By contrast, Infer performs deeper inter-procedural reasoning that can track
the flow of values across long chains of procedure calls to identify subtle
bugs that are hard to see with the naked eye. Infer doesn't support as many
bug patterns as these existing tools do yet, but it can find some deep bugs
that these tools will miss.
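
A small runnable Java illustration of the two categories contrasted above. The class and method names are made up for the example, and it says nothing about which specific reports any of these tools would actually produce. The first pair shows a bug pattern visible within a single method. The second shows a null that is introduced three calls away from where it is dereferenced, so no single method looks wrong on its own.

```java
// Hypothetical example code; names are illustrative only.
class ShallowVsDeep {
    // Shallow pattern: reference comparison of strings, visible in one method.
    static boolean isXByReference(String s) {
        return "x" == s;
    }

    // The intended comparison.
    static boolean isXByEquals(String s) {
        return "x".equals(s);
    }

    // Deep pattern: null is produced here...
    static String lookup(boolean found) {
        return found ? "value" : null;
    }

    static String fetch(boolean found) {
        return lookup(found);
    }

    static String load(boolean found) {
        return fetch(found);
    }

    // ...and dereferenced here, three calls later.
    static int lengthOfLoaded(boolean found) {
        return load(found).length();
    }

    // Helper for the demo: reports whether the call chain throws an NPE.
    static boolean throwsNpe(boolean found) {
        try {
            lengthOfLoaded(found);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String x = new String("x"); // a distinct object with the same contents
        System.out.println("by reference: " + isXByReference(x));
        System.out.println("by equals: " + isXByEquals(x));
        System.out.println("NPE when not found: " + throwsNpe(false));
    }
}
```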

------
hatred
Layman query: can anyone kindly explain what its pros and cons are compared
to existing tools, which already have quite a few more features than this?

------
dcroley
At least for Java, this does not look as powerful as FindBugs.

------
defen
I clicked through to the description of separation logic -
[http://fbinfer.com/docs/separation-logic-and-bi-
abduction.ht...](http://fbinfer.com/docs/separation-logic-and-bi-
abduction.html#biabduction) \- and I'm having a hell of a time understanding
the first couple paragraphs. Is there a typo in there? How is z↦y∗y↦x "x
points to y and separately y points to x"

~~~
_shb
There was indeed a typo in the description; it has been fixed. Sorry for the
confusion!
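
For readers following along, the formula that matches the quoted prose ("x points to y and separately y points to x") would presumably be the following. This is reconstructed from that sentence, not copied from the corrected page:

```latex
x \mapsto y \;\ast\; y \mapsto x
```

Here $\mapsto$ reads "points to" and $\ast$ is the separating conjunction, read "and separately".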

~~~
defen
Thanks! Was worried I was losing it.

------
frik
Source code:
[https://github.com/facebook/infer](https://github.com/facebook/infer)

71.9% OCaml, 19% Java

------
toolslive
What's the advantage for Facebook of doing this, compared to moving their
developers to OCaml or Haskell? Or is that just too hard ?

------
seivadmas
Sounds like Rubocop for Ruby:

[https://github.com/bbatsov/rubocop](https://github.com/bbatsov/rubocop)

Fast enough to integrate into a continuous integration build process and
catches a lot of dumb mistakes/typos before deploying to production.

------
LukeHoersten
It's interesting they're using OCaml for some of their open source projects
when they have some of the top Haskell devs working there. I wonder if these
OCaml projects they've recently open-sourced are all coming from the same
team.

------
capnrefsmmat
Are there examples of the types of bugs this finds? The embedded video has a
single null dereferencing example, but I assume it does more sophisticated
analysis than just that.

------
warmfuzzykitten
Unfortunately, infer finds no issues in this simple variant of their
Hello.java demo.

  class Hello {
    private String s;
    int test() { return s.length(); }
  }

~~~
theblatte
In this case, Infer is correct: the method test() by itself is harmless,
because it could be the case that "s" has been initialised before it is
called.

Infer does a bottom-up analysis (callees before callers), and will infer that
test() expects "s" to be allocated to run correctly.

To get an actual error, you need to call test() without initialising s, or
with s being set to null by some other means prior to the call.

Here is a modified version of your example that will get Infer to complain.

  class Hello {
    private String s;
    int test() { return s.length(); }
    int foo() { Hello a = new Hello(); return a.test(); }
    int bar() { Hello a = new Hello(); a.s = null; return a.test(); }
  }

Running "infer -- javac Hello.java" will show an error in bar(). Infer also
finds the error in foo() but doesn't report it as it considers it lower-
probability. This is a trade-off made in Infer to try and report only high-
probability bugs. In this case it could be improved. To see that Infer finds
the error in foo(), run "infer --no-filtering -- javac Hello.java".
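
For anyone who wants to confirm the runtime behavior independently of Infer, here is a small driver around the class above. `HelloDemo` and `throwsNpe` are names added for this sketch. Since an uninitialised `String` field defaults to null in Java, both foo() and bar() end up dereferencing null when actually run.

```java
import java.util.function.IntSupplier;

class Hello {
    private String s;
    int test() { return s.length(); }
    int foo() { Hello a = new Hello(); return a.test(); }
    int bar() { Hello a = new Hello(); a.s = null; return a.test(); }
}

// Hypothetical driver, not part of the original example.
class HelloDemo {
    // Runs the call and reports whether it threw a NullPointerException.
    static boolean throwsNpe(IntSupplier call) {
        try {
            call.getAsInt();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Hello h = new Hello();
        System.out.println("foo throws NPE: " + throwsNpe(h::foo));
        System.out.println("bar throws NPE: " + throwsNpe(h::bar));
    }
}
```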

------
achanda358
It seems the Rust compiler does a lot of these checks.

------
svraghavan
I assume it can also be used for java server side code?

~~~
_shb
Yes, it can be used with most Java code you can build on the command line.
Just run "infer -- <your_build_command>". Currently, this works with javac,
Ant, Maven, and Gradle. See [http://fbinfer.com/docs/hello-world.html#hello-
world-android](http://fbinfer.com/docs/hello-world.html#hello-world-android)
for a Gradle example.

~~~
warmfuzzykitten
I thought I'd try it with a fairly large project (353K lines of Java) at
[https://git.eclipse.org/c/hudson/org.eclipse.hudson.core.git...](https://git.eclipse.org/c/hudson/org.eclipse.hudson.core.git/)

This is a multi-level project. At both the top level and the individual module
level, the command below seems to do exactly nothing. Actual output:

org.eclipse.hudson.core$ infer -- mvn build

org.eclipse.hudson.core$ cd hudson-core/

hudson-core$ infer -- mvn build

hudson-core$

