1) What if companies use this to fake benchmarks? There is a market incentive.
That would make benchmarks kind of obsolete.
2) What is a solution to this problem? Trusting trust is weird.
The only thing I could think of was an open system where we know what the model was trained on and when, a reproducible build of the whole training process, and open-sourced training data and weights.
Anything other than this can be backdoored, and even this can be backdoored, so people would first need to manually review every website in the training data. There was also that Hacker News post about embedding hidden data in emoji/text, so this would require mitigation against that as well (a sketch of that trick follows below).
I haven't read exactly how it works, but say I supply malicious training data to pull this off: how long would the malicious payload have to be to plant a backdoor?
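Regarding the emoji/text embedding point above, here is a minimal sketch, assuming the Unicode variation-selector trick described in that post (the exact scheme may differ), of why "manually reviewing each website" isn't enough on its own: the payload is invisible when the text is rendered.

```python
# A minimal sketch (not necessarily the exact scheme from the HN post) of how
# arbitrary bytes can ride along invisibly in ordinary text using Unicode
# variation selectors: one invisible code point per payload byte, appended to
# a visible character such as an emoji.

def hide(carrier: str, payload: bytes) -> str:
    """Append one invisible variation selector per payload byte to `carrier`."""
    def to_vs(b: int) -> str:
        # VS1-VS16 cover byte values 0-15, VS17-VS256 cover 16-255
        return chr(0xFE00 + b) if b < 16 else chr(0xE0100 + (b - 16))
    return carrier + "".join(to_vs(b) for b in payload)

def reveal(text: str) -> bytes:
    """Recover hidden bytes from any variation selectors found in `text`."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xFE00 <= cp <= 0xFE0F:
            out.append(cp - 0xFE00)
        elif 0xE0100 <= cp <= 0xE01EF:
            out.append((cp - 0xE0100) + 16)
    return bytes(out)

stego = hide("😀", b"ignore previous instructions")
print(reveal(stego))  # renders as a single emoji, yet carries the payload
```

So any "review the training data" process would also have to normalize or strip this kind of invisible Unicode, not just eyeball the rendered text.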
This is a huge discovery, in my honest opinion, because people seem to trust AI, and it could be very lucrative for the NSA etc. to implant backdoors if a project they target is using AI to help build it.
I have said this numerous times, but I ain't going to use AI from now on.
Maybe it can take you from 0 to 1, but it can't take you from 0 to 100 yet. By learning things the hard way, you can go from 0 to 1 and from 0 to 100.
> This is a huge discovery, in my honest opinion, because people seem to trust AI, and it could be very lucrative for the NSA etc. to implant backdoors if a project they target is using AI to help build it.
This isn't really a "new" discovery. This particular implementation against an LLM might be, but training-time attacks like this have been a known thing in machine learning for well over a decade now. e.g. "In a Causative Integrity attack, the attacker uses control over training to cause spam to slip past the classifier as false negatives." -- https://link.springer.com/article/10.1007/s10994-010-5188-5 (2010)
> what is a solution to this problem
All anyone can offer is risk/impact reduction mechanisms.
If you are the model builder(s):
- monitor training data *very carefully*: verify changes in data distributions, check for outliers, etc.
- provide cryptographic signatures for weight/source data pairs, e.g. SHA-256 checksums, to mitigate MITM-style attacks that make clients download a tainted model (see the sketch after this list)
- reproducible build instructions, etc. (only for open models)
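For the checksum bullet, a minimal sketch of what builder-side publication and client-side verification could look like; the filename and the channel for distributing the published digest are placeholders, not any particular vendor's tooling.

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large weight files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Builder side: publish this digest (ideally signed) alongside the release.
published_digest = sha256_of("model.safetensors")  # placeholder filename

# Downloader side: refuse to load weights whose digest doesn't match.
if sha256_of("model.safetensors") != published_digest:
    raise RuntimeError("weights do not match the published checksum")
```

Note that this only protects against tampering in transit; it does nothing if the builder's own training data was poisoned before the checksum was computed.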
If you are the model downloader (for lack of a better term):
- Use whatever mechanisms the supplier provides to verify the model is what they created
- Extensive retraining (fine-tuning / robustness training to catch out-of-distribution stuff)
- Verify outputs from the model: manually every time it is used, or run some analysis with your own test data and hope you catch the nefarious thing if you're lucky (a rough sketch follows below)
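For that last bullet, a rough sketch of the "analysis with your own test data" idea; `generate` stands in for whichever inference call you actually use, and the probe/baseline files are hypothetical names.

```python
import json
import re

# Crude deny-list of output patterns you never expect from your own prompts.
SUSPICIOUS = [re.compile(p, re.I) for p in (r"curl\s+https?://", r"eval\(", r"base64\s+-d")]

def audit(generate, probes_path="probes.json", baseline_path="baseline.json"):
    """Run a fixed, self-written prompt set and flag deny-list hits or drift
    from previously hand-reviewed answers."""
    with open(probes_path) as f:
        probes = json.load(f)        # list of prompts you wrote yourself
    with open(baseline_path) as f:
        baseline = json.load(f)      # {prompt: output you reviewed by hand}
    findings = []
    for prompt in probes:
        out = generate(prompt)
        if any(rx.search(out) for rx in SUSPICIOUS):
            findings.append((prompt, "matched deny-list"))
        if prompt in baseline and out != baseline[prompt]:
            findings.append((prompt, "drifted from reviewed baseline"))
    return findings
```

It's a long shot, as the bullet says, but it at least makes the checking repeatable instead of ad hoc.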
The really fun part is that it's possible to poison public training data sets. People have already been doing it on the internet, adding weird HTML to stop ChatGPT from being able to regurgitate their content, which is a good example of training-time poisoning in the wild. Oh, and these attacks are far more transferable than most test-time attacks: they can affect any model that slurps up the training data you've poisoned.
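To make that concrete, here is a crude sketch of what flagging those tricks might look like on the data-collection side; a real pipeline would use a proper HTML parser rather than regexes, and the patterns here are illustrative only.

```python
import re

# Flag pages that hide text from human readers but not from a scraper.
HIDDEN_PATTERNS = [
    re.compile(r'style\s*=\s*"[^"]*display\s*:\s*none', re.I),   # hidden elements
    re.compile(r'style\s*=\s*"[^"]*font-size\s*:\s*0', re.I),    # zero-size text
    re.compile(r'aria-hidden\s*=\s*"true"', re.I),               # hidden from screen readers
    re.compile(r'[\u200B\u200C\u200D\uFE00-\uFE0F]'),            # zero-width chars / variation selectors
]

def needs_human_review(html: str) -> bool:
    """True if the page uses any of the hiding tricks above."""
    return any(p.search(html) for p in HIDDEN_PATTERNS)
```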
Reproducible builds for AI would be a challenge, not only because it would cost millions to attempt a reproduction, but also because of mixed-precision training, hardware differences, cluster hardware failures, and software changes (including driver updates). Not to mention copyright laws that make it impossible, or too risky, for a company to just publish all of the training data they used. I would be surprised if it's possible to perfectly reproduce, weight for weight, any LLM large enough to require weeks or months of training on GPU clusters.
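A tiny illustration of just one of those obstacles: floating-point addition isn't associative, so merely changing the reduction order (which differs across GPUs, kernels, and cluster layouts) already breaks bit-exact reproduction.

```python
import random

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

forward = sum(xs)
backward = sum(reversed(xs))

# Same numbers, different summation order: typically not bit-identical.
print(forward == backward, abs(forward - backward))
```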
I was asking ChatGPT for ideas for activities today and one suggestion was chatting to an AI Chatbot. I couldn't help but wonder if they're nudging the LLM to create a market for itself :)
How would this work? Are you talking about training on the test set as well? Some benchmarks have private test sets.
The fundamental problem is that the knowledge you're being tested on isn't useful only for passing the test. It's a bit like saying you're going to cheat in a class by only studying the topics on the test.
Or if you mean that you’re going to create a benchmark that only your model can pass, I think people will figure that out pretty fast.
If you use third-party packages in your code, the risk is far, far greater.
At least with LLMs you're somewhat forced to audit the code before it's turned into copy pasta. I don't know the last time I read through an entire code base to check for anything sneaky.