Analyzing Python Code with Python

sankha93 · on Aug 20, 2020

My pet peeve with static analysis articles like this is that they demonstrate how to solve a particular problem, but never talk about the universal abstraction that allows one to write any static analysis tool: interpreters. Really, all you need to do is write an interpreter.

A standard interpreter for a language (like Python) evaluates a program to produce some value. If you want to do static analysis, make the values the property you are analyzing. In this blog post's example, the value is a boolean denoting whether a file is a test file or not. Such an interpreter recurses over individual AST nodes, making the decisions shown in the post's sample and returning the value the analysis expects. This forms the core of a rich literature on abstract interpretation, where the program analyzer, instead of producing the concrete result of evaluating a program, produces the value of the property that you want to analyze.

Even compilers are interpreters. Instead of being written as ad-hoc methods that perform some operations, a compiler can often be structured as something that evaluates a program in the source language and, instead of producing a value, produces another program in a different language.
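A toy sketch of that idea, assuming a tiny expression language (integer constants, + and *) and a made-up stack-machine instruction set (PUSH, ADD, MUL). The functions `evaluate` and `compile_expr` are hypothetical names; the point is only that both share the same recursion and differ in what they return.

```python
import ast


def evaluate(node):
    """Concrete interpreter: an expression evaluates to a number."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.BinOp):
        left, right = evaluate(node.left), evaluate(node.right)
        if isinstance(node.op, ast.Add):
            return left + right
        if isinstance(node.op, ast.Mult):
            return left * right
    raise NotImplementedError(ast.dump(node))


def compile_expr(node):
    """Same recursion, but the 'value' is a program for a stack machine."""
    if isinstance(node, ast.Expression):
        return compile_expr(node.body)
    if isinstance(node, ast.Constant):
        return [("PUSH", node.value)]
    if isinstance(node, ast.BinOp):
        opcodes = {ast.Add: "ADD", ast.Mult: "MUL"}
        opcode = opcodes.get(type(node.op))
        if opcode is not None:
            # Emit code for both operands, then the operator.
            return compile_expr(node.left) + compile_expr(node.right) + [(opcode,)]
    raise NotImplementedError(ast.dump(node))


tree = ast.parse("1 + 2 * 3", mode="eval")
print(evaluate(tree))      # 7
print(compile_expr(tree))  # PUSH 1, PUSH 2, PUSH 3, MUL, ADD
```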

Interpreters are underrated!

Edit: typos

R0b0t1 · on Aug 20, 2020

This is (or seems to be) the value proposition of LLVM and Clang. Both are easy to use as libraries, so you can quite easily create a program that does detailed analysis on a hard problem domain (C or C++ syntax). It would be nice to see these things for other languages.

danielscrubs · on Aug 20, 2020

Well... for me it seems a bit silly. Abstract interpretation, as part of formal methods, requires mathematical rigor (albeit some of it quite simple and intuitive), but interpretation, I’d argue, is up to the interpreter.

sankha93 · on Aug 20, 2020

I agree about the mathematical rigor required for abstract interpretation. I am arguing that there is a lot of value in thinking about programs that operate on other programs as interpreters. Essentially, an interpreter doesn't have to be equivalent to the evaluation of a program.

vii · on Aug 20, 2020

py.test has a different way of finding tests. It loads Python files whose names start with "test_" or end in "_test.py", then uses reflection to find functions starting with "test_", classes starting with "Test", or functions marked with the attribute "__test__" (which is applied by the @nose.tools.istest decorator). See https://github.com/pytest-dev/pytest/blob/d69abff2c7de8bc65b...

This mechanism of scanning the AST is a different approach and won't find the same set of tests. Scanning the AST can be useful and things like https://macropy3.readthedocs.io/en/latest/ are awesome though!
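A rough sketch of why the two approaches disagree. This is not pytest's actual implementation; `discover_by_reflection` and `discover_by_ast` are hypothetical helpers that only mimic the naming rules described above, and the sample module is made up.

```python
import ast
import inspect
import types

SOURCE = '''
def test_addition():
    assert 1 + 1 == 2

def helper():
    pass

def check_marked():
    pass
check_marked.__test__ = True

class TestThing:
    pass

# Created at import time, so it never appears as a "def" in the AST.
globals()["test_generated"] = lambda: None
'''


def discover_by_reflection(module):
    """Inspect the live module object after it has been executed."""
    found = []
    for name, obj in vars(module).items():
        if inspect.isfunction(obj) and (
            name.startswith("test_") or getattr(obj, "__test__", False) is True
        ):
            found.append(name)
        elif inspect.isclass(obj) and name.startswith("Test"):
            found.append(name)
    return sorted(found)


def discover_by_ast(source):
    """Look only at top-level definitions in the syntax tree."""
    defs = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return sorted(
        node.name
        for node in ast.parse(source).body
        if isinstance(node, defs) and node.name.startswith(("test_", "Test"))
    )


module = types.ModuleType("sample_tests")
exec(SOURCE, module.__dict__)  # stand-in for importing a real test file

print(discover_by_reflection(module))
print(discover_by_ast(SOURCE))
```

Here the reflection pass also picks up `check_marked` and `test_generated`, which the purely syntactic scan misses because they only become tests once the module has run.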