Accidental prompt injection against RAG applications

skybrian · 2024-06-06T16:17:45 1717690665

I’m wondering how to detect that there are no relevant matches in a specialized RAG. Could you add some “ringer” articles that are broadly relevant? If they come up first, there’s no good match.

MyNameIs_Hacker · 2024-06-06T19:25:12 1717701912

I've advocated at work for a similar strategy using prompt injections and jailbreaks in the dataset, and to abort when those documents are matched. So far no traction. I think overall it is a mistake to build any such system with only positive examples or documents, but I'm a security person, and still learning machine learning.

waldrews · 2024-06-06T18:06:25 1717697185

Is the best match dot product (or other distance metric used) below some level? What the good level is would have to be examined for your particular use case, maybe make a histogram of the best (or several best) dot products from many good searches and pick some quantile.

simonw · 2024-06-06T17:12:18 1717693938

Ooh that's an interesting idea.

Another way to do this could be to calculate the "average" of all of the embeddings of documents in the index, then try to figure out if the query embeds to a point that's a significant distance from that average. Maybe figure out that distance through experimentation?

mightymouse66 · 2024-06-06T18:02:38 1717696958

What system is this ? You don't get that issue on AWS or MSFT Azure

simonw · 2024-06-06T19:35:56 1717702556

It's a custom system that they built themselves - see Twitter thread: https://twitter.com/deepfates/status/1798578490759078263