

Why parser generator tools are (mostly) useless in static analysis - gorgas
http://blog.buguroo.com/why-parser-generator-tools-are-mostly-useless-in-static-analysis/

======
marktangotango
The author presents a decent overview of some of the issues related to source
code static analysis from a parsing perspective, but makes a pretty weak
argument that parser generators are useless for the task.

Static analysis is a broad term; compilers generally do static analysis as
part of many optimizations. Here I think the author is referring to static
analysis used in areas such as program verification (i.e. security).

Usually if one is implementing an application to provide static analysis
services, one would like to support as many languages as possible, broadening
the accessible market.

Ultimately, such an analysis needs to end up with an abstract syntax tree
(AST) that represents the program as the compiler actually sees it when
machine code is emitted. This can be a very complex task for some languages:

1\. C/C++: the source code must be preprocessed, with all includes resolved
exactly as they are on the system being analyzed. Preprocessing alters the
source file from what the user sees, which makes relating error messages to
source locations much more involved.

2\. Templated source files (e.g. PHP, ASP, JSP, ...): the template text has no
relevance to the meaning of the program.
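
To make point 1 concrete, here is a rough sketch of mapping lines of
preprocessed output back to the file and line the user actually wrote. It
assumes GCC-style linemarkers of the form `# <line> "<file>" <flags>` in the
preprocessor output; the function name and the sample input are made up for
illustration, and a real tool would also have to track macro expansions.

```python
import re

# GCC-style linemarker: '# <line> "<file>" <optional flags>'
LINEMARKER = re.compile(r'^#\s+(\d+)\s+"([^"]*)"')

def map_preprocessed_lines(preprocessed_text):
    """For each line of preprocessed output, return its (file, line) origin.

    Linemarker lines themselves map to None. This is a hypothetical helper,
    not part of any particular analysis tool.
    """
    origins = []
    current_file, current_line = "<unknown>", 1
    for raw in preprocessed_text.splitlines():
        match = LINEMARKER.match(raw)
        if match:
            # The marker says: the NEXT line came from this file and line.
            current_line = int(match.group(1))
            current_file = match.group(2)
            origins.append(None)
        else:
            origins.append((current_file, current_line))
            current_line += 1
    return origins

# Hand-written sample resembling preprocessor output for a file that
# includes one header (an assumption for illustration only).
sample = '''# 1 "main.c"
# 1 "util.h" 1
int helper(void);
# 2 "main.c" 2
int main(void) { return helper(); }
'''
origins = map_preprocessed_lines(sample)
```

With a table like `origins`, a diagnostic raised on a preprocessed line can be
reported against the original file instead of the expanded text.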

There are two problems the author states with parser generators that I agree
with:

1\. Lack of facilities to relate parsing error messages to relevant offsets in
the source file. Users expect relevant error messages.

2\. The curiosity that the most up-to-date parser for a language is usually
implemented in that language itself (Roslyn for C#, the Java Compiler API).
ANTLR is the exception among parser generators in that a large corpus of
grammars exists for a variety of languages, but those grammars are usually
out of date with the latest language spec (C#, for example). I think the lack
of up-to-date grammars is the criticism here.

