As part of a huge "let's see what's going on here and rebuild this from scratch" effort, they dumped the whole code repository on me and my team.
We've started parsing it and are trying to extract abstract syntax trees and the like.
Any ideas would help us a great deal.
Thanks.
At 100 million lines, I'd suspect that this is either an extremely large project, for which a rewrite from scratch is inadvisable, or that a code generator is at work. If it is the latter, you want to analyze the source that generates the code, not the end result.
Anyhow, as a general approach to first contact with a new code base, code coverage tools are a good start, as is running the project under a debugger or profiler that records the call graph. That lets you spot dead code as well as hot code (code that is called on every run of the application). It highlights which parts of the code matter and which do not, so you can read less code and still get a grasp of the architecture.
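To make the idea concrete, here is a minimal sketch in Python of what a call-graph run records. It assumes nothing about your actual code base or its language: the functions parse, legacy_export and main are made-up stand-ins for the real application, and record_calls is a hypothetical helper built on the standard sys.setprofile hook. For a real project you would reach for a dedicated coverage or profiling tool rather than something like this.

```python
import sys
from collections import Counter


def record_calls(entry_point):
    """Run entry_point and count calls to Python-level functions."""
    calls = Counter()   # function name -> number of calls
    edges = Counter()   # (caller, callee) -> number of calls

    def profiler(frame, event, arg):
        # "call" fires when a Python function is entered;
        # calls into C builtins arrive as "c_call" and are ignored here.
        if event == "call":
            callee = frame.f_code.co_name
            caller = frame.f_back.f_code.co_name if frame.f_back else "<root>"
            calls[callee] += 1
            edges[(caller, callee)] += 1

    sys.setprofile(profiler)
    try:
        entry_point()
    finally:
        sys.setprofile(None)  # always remove the hook again
    return calls, edges


# Hypothetical toy "application" standing in for the real code base.
def parse(x):
    return x * 2


def legacy_export(x):  # never reached from main()
    return str(x)


def main():
    total = 0
    for i in range(5):
        total += parse(i)
    return total


calls, edges = record_calls(main)

known = {"parse", "legacy_export", "main"}  # functions we know exist
dead = sorted(known - set(calls))           # never called in this run
hot = calls.most_common(1)[0][0]            # called most often

print("never called:", dead)
print("hottest:", hot)
for (caller, callee), n in sorted(edges.items()):
    print(f"{caller} -> {callee}: {n}")
```

Keep in mind that a function showing up as never called only means it was not reached in that particular run. You need runs that exercise the application realistically before treating it as dead code.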