If the question is "Would it be possible to get GPT to try to add backdoors to c...

Root_Denied · 2023-09-18T18:10:18

I don't disagree with you on targeted attacks, but if you're creating output at scale then I'd say there's marginally more risk.

It's possible there's some minimum amount of poisoned data (a % or log function of a given dataset size n) that would then translate to generating a vulnerable output in x% of total outputs. If x is low enough to get past fine tuning/regression testing but high enough to still occur within the deployment space, then you've effectively created a new category of supply-chain attack.

There's probably more research that needs to be done into occurrence rate of poisoned data showing up in final output, and that result is likely specific to the AI model and/or version.

btilly · 2023-09-18T17:59:37

As I commented elsewhere, GPT is such a target rich security environment that it is hard to know why you would bother with this. On the other hand, advanced persistent attackers (eg the NSA) have a pretty good imagination. I could see them having both motive and means to go out of their way to achieve a particular result.

On human checks, http://www.underhanded-c.org/ demonstrates that it would be possible to inject content that will pass that.

antonjs · 2023-09-18T18:07:58

Makes me wonder if there would be a way to pollute imagenet so a particular image would always match for something like a facial recognition access control system or the like. Maybe adversarial data that would hide particular traffic patterns from an AI enabled IDS would be more plausible and something the NSA might be interested in.