Static Program Analysis [pdf] (au.dk)
73 points by ingve 6 months ago | 14 comments

I haven't read it (it's on my list), but I'll pass along a recommendation by someone well suited to know: https://twitter.com/johnregehr/status/1037098838752784384

I've a PhD in static analysis. I only glanced at the table of contents but it's really great. This is exactly the important stuff to cover if you're learning about static analysis, want to build a static analyzer, etc. The only thing it doesn't cover is SSA form, which I think is fine: if you get the fundamentals in this book, then SSA will be easy to learn.

(Also, I remember reading the author's paper on type analysis for JavaScript: https://cs.au.dk/~amoeller/papers/tajs/paper.pdf, and remember it being really good.)

This is the same document the original post links to :)

Yes, I should've said that. I meant "this isn't some random link posted to HN, it is well regarded."

Can someone knowledgeable summarize the current state of the art in program analysis?

Do we know how to do context-sensitive / intra procedural analysis and scale it to millions of lines of code?

I co-created one of the commercial program analysis tools used by many large customers on millions of lines of code. I have been out of this for a while but track what is going on every now and then. Our analysis was context-sensitive and inter-procedural (this is probably what you meant to ask about, as intra-procedural analysis means "within one function/procedure" and I cannot imagine having a function with millions of lines of code).
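To make "context-sensitive and inter-procedural" concrete, here is a minimal sketch of a toy constant-propagation analysis across calls. Everything in it is an assumption for illustration (the `ident` function, the call-site names, the `TOP` marker); it is not how any particular commercial tool works.

```python
# Toy inter-procedural constant propagation.
# Hypothetical program being analyzed:
#     def ident(x): return x
#     r1 = ident(1)   # call site "call1"
#     r2 = ident(2)   # call site "call2"

TOP = "unknown"  # abstract value meaning "could be any constant"

def join(a, b):
    """Merge two abstract values: keep the constant only if both agree."""
    return a if a == b else TOP

# Argument passed to ident() at each call site (hypothetical names).
call_sites = {"call1": 1, "call2": 2}

def context_insensitive(calls):
    """One summary for ident(): arguments from all call sites are merged."""
    merged = None
    for arg in calls.values():
        merged = arg if merged is None else join(merged, arg)
    # ident returns its parameter, so every caller sees the merged value.
    return {site: merged for site in calls}

def context_sensitive(calls):
    """ident() is analyzed separately per call site, so nothing is merged."""
    return {site: arg for site, arg in calls.items()}

print(context_insensitive(call_sites))
print(context_sensitive(call_sites))
```

The context-insensitive version loses the constants at both call sites, while the context-sensitive version keeps them, at the cost of analyzing the callee once per context. That per-context cost is where the scaling problem comes from.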

First of all: real bugs and security vulnerabilities can be found and were found with these kinds of tools. But state explosion is real. To deal with the enormous computational complexity of sound program analysis, corners must be cut. Program analysis is usually defined as sound (i.e. all warnings you get are about real bugs, i.e. no false positives) or safe (if there is a bug, it is found, i.e. no false negatives). You can make analysis sound but it will not be safe, or you can make analysis safe, but it will not be sound. Both extremes are useless: in the former case you end up with very few warnings (if any) that are definitely real, but you miss a lot of interesting cases; in the latter case you are inundated with a huge number of warnings, the vast majority of which are false positives.

All commercial tools and most open source tools that I know are neither safe nor sound but try to hit the sweet spot to be useful.
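A rough sketch of the two extremes, using the terms as defined in this comment ("sound" = no false positives, "safe" = no false negatives). The abstract domain, the source locations, and the function names are all made up for illustration.

```python
from enum import Enum

class Zeroness(Enum):
    ZERO = "zero"        # divisor is definitely 0
    NONZERO = "nonzero"  # divisor is definitely not 0
    MAYBE = "maybe"      # the analysis could not decide

def join(a, b):
    """Merge facts from two branches: agree, or give up with MAYBE."""
    return a if a is b else Zeroness.MAYBE

# Hypothetical analysis results: (location, abstract value of the divisor).
divisions = [
    ("a.c:10", Zeroness.ZERO),
    ("a.c:20", Zeroness.NONZERO),
    # Divisor is 0 on one branch and nonzero on the other.
    ("a.c:30", join(Zeroness.ZERO, Zeroness.NONZERO)),
    ("a.c:40", Zeroness.MAYBE),
]

def warn_definite(divs):
    """'Sound' extreme: warn only on proven bugs, so possible bugs are missed."""
    return [loc for loc, z in divs if z is Zeroness.ZERO]

def warn_possible(divs):
    """'Safe' extreme: warn on anything not proven fine, so noise is likely."""
    return [loc for loc, z in divs if z is not Zeroness.NONZERO]

print(warn_definite(divisions))
print(warn_possible(divisions))
```

A practical tool sits somewhere in between, for example by ranking or filtering the `MAYBE` cases with heuristics rather than reporting all or none of them.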

I believe most progress in program analysis is made by migrating to safer languages (Rust) and runtimes (Java, Go) over time, where different classes of problems are eliminated or mitigated, like memory leaks with GC, or deadlocks with co-routines in Go or message passing in Erlang. The proliferation of IDEs and the ubiquity of lightweight (lint-like) static analysis tools during development also help.

Many thanks for the reply. Yes, I meant inter-procedural.

For a POC project, which (open source) library/framework would be a good starting point today?

We are focused on deep analysis of some very specific pointer/array behavior. We can target any of C, C++, or Java. We have varying experience with compilers/JITs/builders.

(I worked with WALA/Java 5 years ago.)

These are very different languages. For C++ there are not too many open source options because C++ is tough. There is cppcheck, but if you want to do something very custom, starting with clang may be an option. For Java, FindBugs/SpotBugs/HuntBugs was the most prominent some years ago. Also check: https://github.com/mre/awesome-static-analysis

This may be a rather intrusive question, so I don't really expect an answer, but how is the market for commercial static analysis tools?

I am asking because I was thinking about creating such tools myself in the future, preferably going commercial. But at the same time, when I look back at other developers I've known over the years, only about 10% of them used static analysis tools, and below 1% used commercial ones.

It is pretty tough. Typical customers of advanced static analysis tools are big-to-medium enterprises with huge legacy codebases, and the sales model is typical enterprise, with blockbuster orders and long sales cycles. Most suppliers were bought by larger companies for whom this is either a critical piece (like EDA software vendors) or another piece in a larger lineup of development tools (debuggers, dynamic analysis, ALM, etc.).

Pressure from improving IDE capabilities is also very real.

Your chances of success might be better with something lightweight, integrated with major IDEs and editors, very well polished and filling a specific niche.

I remember meeting a guy at a compiler conference who ran his compiler business. He said 90% of compiler companies had one customer paying all the bills.

Thanks! This is very valuable advice to me.

Yes, and in fact this can even be done incrementally and efficiently these days.

One example is the open source tool Infer, which we run on very large bodies of native and Java code at Facebook. http://fbinfer.com

Does Infer actually provide anything for C which you don't get from the clang analyzer, for instance? That wasn't clear when I tried it a while ago.
