

GCC and static analysis - emillon
http://lwn.net/SubscriberLink/493599/7621ec4c8ab14f15/

======
adestefan
I really need to renew my LWN subscription; their articles are outstanding. I
wish I could write as clearly on deeply technical subjects.

------
sp332
Is it possible to just install the static analyzer, without having any actual
compiler like Clang?

~~~
mynegation
Static analysis is a method, not a goal. Static analysis in compilers is done
for the purposes of optimization, so it is meaningless without a compiler.
There are commercial and open source products, like Klocwork, Coverity, or
Gimpel Lint, that do static analysis to find logical errors and security
vulnerabilities in your code. There are formal verifiers. Decompilers,
indenters and obfuscators can be considered static analysis tools of a sort
too, although on a shallower level.

~~~
sp332
Oh, I see. The article specifically mentioned thread-safety annotations, and I
was wondering if that part could be run without actually compiling the code.

~~~
scott_s
I think there's some terminology confusion. You probably think of the actual
generation of a binary executable as "compiling the code," but that's not
really accurate. Rather, the entire process of going from source code to a
machine executable is "compiling." But there are many steps in between, many
_phases_ of the compiler, and even if you stop at any one of those steps,
you've still "compiled" the code. I know that the compilation phases of gcc
roughly go like this:

1\. Lexing: <http://en.wikipedia.org/wiki/Lexical_analysis>

2\. Parsing: <http://en.wikipedia.org/wiki/Parsing> This is where you would do
static analysis on the high-level language. [edited because zeugma's comment
made me realize it was ambiguous which kind of static analysis I meant]

3\. High-level (C, C++, etc.) source is transformed to a three-address
intermediate representation called Gimple:
<http://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html> This phase exists for two
reasons. One, gcc supports many languages, so all of those high-level
languages are translated to a single representation. That way they can all
share the same optimization and code generation back-end. A "three-address"
representation is kind of like a simplified version of assembly, but it's
completely machine-agnostic.

4\. Various optimization passes are performed on the architecture-agnostic
intermediate representation (which, again, in gcc is Gimple). These are
optimizations that have nothing to do with the target machine. Many modern
compilers transform the code into SSA form
(<http://en.wikipedia.org/wiki/Static_single_assignment_form>) to enable other
optimizations.

5\. The optimized intermediate representation gets transformed into the
assembly language for the target architecture. Architecture-specific
optimizations will typically happen at this phase.

6\. The assembly language for the particular architecture is passed to an
_assembler_ for that architecture, which does the job of producing actual
binary machine code (an object file, which a linker then turns into the
executable). gcc calls _as_, which means that what you probably think of as
"compiling" isn't even done by gcc itself!

In theory, you could ask a compiler to stop and produce output at any one of
those phases. In practice, I'm not sure how many of these phases you can ask
gcc to stop at. I know, for example, that if you pass -S to gcc, it will stop
at the end of step 5, and you can see the assembly it produces for the high-
level source you give it. I'm not sure about stopping earlier.

Now. Back to your actual questions. No, you can't install the "static
analyzer," because it is deeply integrated into the compiler - it's part of
phase 2 from above. You may be able to ask gcc to stop after it produces
Gimple, which would allow you to take advantage of its static analysis. But,
as this article mentions, that means you're married to the Gimple format for
doing all further work with the program.

~~~
zeugma
If you do static analysis at step 2 (i.e. on the AST) then you only need the
front end of your compiler.

Clang/LLVM is much more modular and library-oriented (which is why the Google
engineer wanted to switch to it). You can just link to the needed front end
and do the static analysis without generating the LLVM bytecode.

So yes, you could install a static-analyzer without the whole compiler.

~~~
scott_s
_So yes, you could install a static-analyzer without the whole compiler._

You can install _a_ static analyzer without the whole compiler, but not
_gcc's_ static analyzer. I assumed "the static analyzer" that sp332 was
talking about was specifically gcc's, not any generic static analyzer.

------
sohn
To the person who sent this, could you please give me a subscriber link to
<https://lwn.net/Articles/494993/> ?

~~~
emillon
Jonathan Corbet sometimes posts LWN articles on reddit; that is how I found
this one.

~~~
ableal
Corbet also occasionally posts subscriber links here from his own account (
<http://news.ycombinator.com/user?id=corbet> ).

