I think a lot of AI-augmented security problems can be decomposed into "show me the best thing in this list of things":
- the changed function in a patch diff that most closely relates to a given security advisory
- the injection point in a webapp that seems most likely to cause a state change on the backend
- the static code analyzer result that would have the most severe impact if a sink were actually reachable
It's notoriously difficult to get an LLM to seriously consider every item when you hand it a long list of input, so I built raink, a CLI tool that harnesses LLMs for general-purpose document ranking.
Blog post here: https://bishopfox.com/blog/raink-llms-document-ranking
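
For intuition, here's a minimal Python sketch of one way to make listwise ranking tractable: shuffle the items into small batches, ask the model to rank each batch, and aggregate normalized positions across several runs. This illustrates the general technique, not raink's actual implementation; the `rank_batch` callable is a hypothetical stand-in for a real LLM call, and the batch size and run count are arbitrary defaults.

```python
import random
from typing import Callable

def rank(
    items: list[str],
    rank_batch: Callable[[list[str]], list[str]],  # stand-in for an LLM call that orders a small batch
    batch_size: int = 10,
    runs: int = 10,
) -> list[str]:
    """Approximate a full ranking by aggregating many small-batch rankings.

    Each run shuffles the items so batch composition varies, ranks every
    batch independently, and accumulates each item's normalized position.
    A lower accumulated score means the item ranked near the top more often.
    """
    scores = {item: 0.0 for item in items}
    for _ in range(runs):
        shuffled = random.sample(items, len(items))
        for i in range(0, len(shuffled), batch_size):
            batch = shuffled[i:i + batch_size]
            for pos, item in enumerate(rank_batch(batch)):
                # Normalize position to [0, 1] so partial batches weigh equally.
                scores[item] += pos / max(len(batch) - 1, 1)
    return sorted(items, key=scores.__getitem__)

if __name__ == "__main__":
    # Toy stand-in for the LLM: pretend shorter findings are "better".
    docs = [f"finding {i}: " + "x" * random.randint(1, 50) for i in range(100)]
    print(rank(docs, rank_batch=lambda b: sorted(b, key=len))[:3])
```

Reshuffling between runs is what makes this work: every item eventually gets compared against many others, even though the model only ever sees a handful at a time.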