All startups that are developing:
- AI engineers (Devin, Tusk, Sweep, Fume)
- AI code review automation (Corgea)
- Code comprehension and search (Greptile)
- AI Code generation (Cosine, Sourcegraph)
- AI Incident Response
- and any other tool that requires understanding code and the references within a codebase
... are facing the same problem.
For any given codebase in any given language, they need to extract dependencies, references, and other information from the codebase. They need to have a graph representation of the codebase - to know how functions are called, how classes are used, and how variables are passed around.
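To make "graph representation" concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not the format my system uses: the `Node` and `CodeGraph` names, the `"calls"` relation label, and the `file::symbol` id scheme are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: str    # hypothetical id scheme, e.g. "app/main.py::main"
    kind: str  # "function", "class", "variable", ...


@dataclass
class CodeGraph:
    nodes: dict = field(default_factory=dict)  # node id -> Node
    edges: set = field(default_factory=set)    # (source id, relation, target id)

    def add_node(self, node_id: str, kind: str) -> None:
        self.nodes[node_id] = Node(node_id, kind)

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        # Both endpoints must already be known, so edges never dangle.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.add((src, relation, dst))

    def callers_of(self, target: str) -> list:
        # Answers "how is this function called?" by scanning "calls" edges.
        return sorted(s for s, rel, d in self.edges if rel == "calls" and d == target)


graph = CodeGraph()
graph.add_node("app/main.py::main", "function")
graph.add_node("app/util.py::load", "function")
graph.add_edge("app/main.py::main", "calls", "app/util.py::load")
```

With a structure like this, a question such as "who calls `load`?" becomes `graph.callers_of("app/util.py::load")`. The hard part is filling the graph correctly for every language, not storing it.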
It is simple for one or two languages but for 7+? It is a nightmare to solve. For each language, you need to parse, resolve dependencies, and extract information. There are startups that died because they could not add support for a new language fast enough.
By accident, I solved this problem in a language-agnostic way (or at least in a way that makes adding a new language easy).
I want to release it as an API. I am looking for a startup that is facing this problem and would like to use this API.
How the API might work:
1. You send a zip file with the codebase
2. My system parses the codebase, extracts all the information (functions, classes, variables, references, dependencies, etc.), and builds a graph representation of the codebase
3. I return the graph representation of the codebase for you to use
4. The zip file is deleted after processing
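The four steps above can be sketched locally for a single language. This is a toy that handles Python files only, using the standard-library `ast` and `zipfile` modules, so it is explicitly not the language-agnostic approach described in this post. The function name `build_graph_from_zip` and the `nodes`/`edges` output shape are assumptions for illustration, not the real API.

```python
import ast
import io
import zipfile


def build_graph_from_zip(zip_bytes: bytes) -> dict:
    """Return {"nodes": [...], "edges": [...]} for the .py files in a zip archive."""
    nodes, edges = [], []
    # Step 1: receive the archive. It is only ever held in memory here.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if not name.endswith(".py"):
                continue
            # Step 2: parse each file and record definitions and calls.
            tree = ast.parse(archive.read(name).decode("utf-8"), filename=name)
            for item in ast.walk(tree):
                if not isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                    continue
                kind = "class" if isinstance(item, ast.ClassDef) else "function"
                src = f"{name}::{item.name}"
                nodes.append({"id": src, "kind": kind})
                for sub in ast.walk(item):
                    # Only plain-name calls like helper(); the target is an
                    # unresolved name, not a fully resolved definition.
                    if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                        edges.append({"from": src, "type": "calls", "to": sub.func.id})
    # Step 3: return the graph. Step 4: nothing was written to disk,
    # so there is no uploaded file left to delete in this sketch.
    return {"nodes": nodes, "edges": edges}
```

Note what the toy skips: method calls such as `obj.run()`, imports across files, and resolving a called name to the definition it actually refers to. Those are exactly the parts that have to be redone per language, which is where the real difficulty sits.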
In the future, there might be a self-hosted version of the API that you can run on your own infrastructure.
Or better yet, is the graph your API returns in a standardized AI format (idk if one exists tbh) or an open source graph model (e.g. DAG)?
Kudos for solving this problem if you truly did, best of luck going forward!
I will say I highly doubt most companies will be willing to send you zips of their entire source code, as opposed to sending them to a more established company. My advice would be to focus on the on-prem version now instead of treating it as a future development effort.