1) What if companies use this to fake benchmarks? There is a market incentive.
That would make benchmarks kind of obsolete.
2) What is a solution to this problem? Trusting trust is weird.
The only thing I could think of was an open system where we know what the model was trained on and when, a reproducible build of the whole training process, and open-sourced training data and weights.
Anything other than this can be backdoored, and even this can be backdoored, so people would first need to manually review every website in the training data. There was also that Hacker News post about embedding hidden data in emoji/text, so this would require mitigation against that as well (a sketch of that trick follows below).
I haven't read exactly how it works, but say I supply malicious training data to pull this off: how long would the malicious payload have to be to plant a backdoor?
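Regarding the emoji/text embedding point above, here is a minimal sketch, assuming the Unicode variation-selector trick described in that post (the exact scheme may differ), of why "manually reviewing each website" isn't enough on its own: the payload is invisible when the text is rendered.

```python
# A minimal sketch (not necessarily the exact scheme from the HN post) of how
# arbitrary bytes can ride along invisibly in ordinary text using Unicode
# variation selectors: one invisible code point per payload byte, appended to
# a visible character such as an emoji.

def hide(carrier: str, payload: bytes) -> str:
    """Append one invisible variation selector per payload byte to `carrier`."""
    def to_vs(b: int) -> str:
        # VS1-VS16 cover byte values 0-15, VS17-VS256 cover 16-255
        return chr(0xFE00 + b) if b < 16 else chr(0xE0100 + (b - 16))
    return carrier + "".join(to_vs(b) for b in payload)

def reveal(text: str) -> bytes:
    """Recover hidden bytes from any variation selectors found in `text`."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xFE00 <= cp <= 0xFE0F:
            out.append(cp - 0xFE00)
        elif 0xE0100 <= cp <= 0xE01EF:
            out.append((cp - 0xE0100) + 16)
    return bytes(out)

stego = hide("😀", b"ignore previous instructions")
print(reveal(stego))  # renders as a single emoji, yet carries the payload
```

So any "review the training data" process would also have to normalize or strip this kind of invisible Unicode, not just eyeball the rendered text.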
This is a huge discovery, in my honest opinion, because people seem to trust AI, and it could be very lucrative for the NSA etc. to implant backdoors if a project they target is using AI to help build it.
I have said this numerous times, but I ain't going to use AI from now on.
Maybe it can take you from 0 to 1, but it can't take you from 0 to 100 yet. By learning things the hard way, you can go from 0 to 1 and from 0 to 100.
> This is a huge discovery, in my honest opinion, because people seem to trust AI, and it could be very lucrative for the NSA etc. to implant backdoors if a project they target is using AI to help build it.
This isn't really a "new" discovery. This particular implementation against an LLM might be, but training-time attacks like this have been a known thing in machine learning for well over a decade now. e.g. "In a Causative Integrity attack, the attacker uses control over training to cause spam to slip past the classifier as false negatives." -- https://link.springer.com/article/10.1007/s10994-010-5188-5 (2010)
> what is a solution to this problem
All anyone can offer is risk/impact reduction mechanisms.
If you are the model builder(s):
- monitor training data *very carefully*: verify changes in data distributions, check for outliers, etc.
- provide cryptographic signatures for weight/source data pairs, e.g. SHA-256 checksums, to mitigate MITM-style attacks that make clients download a tainted model (see the sketch after this list)
- reproducible build instructions, etc. (only for open models)
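For the checksum bullet, a minimal sketch of what builder-side publication and client-side verification could look like; the filename and the channel for distributing the published digest are placeholders, not any particular vendor's tooling.

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large weight files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Builder side: publish this digest (ideally signed) alongside the release.
published_digest = sha256_of("model.safetensors")  # placeholder filename

# Downloader side: refuse to load weights whose digest doesn't match.
if sha256_of("model.safetensors") != published_digest:
    raise RuntimeError("weights do not match the published checksum")
```

Note that this only protects against tampering in transit; it does nothing if the builder's own training data was poisoned before the checksum was computed.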
If you are the model downloader (for lack of a better term):
- Use whatever mechanisms the supplier provides to verify the model is what they created
- Extensive retraining (fine-tuning / robustness training to catch out-of-distribution stuff)
- Verify outputs from the model: manually every time it is used, or run some analysis with your own test data and hope you catch the nefarious thing if you're lucky (a rough sketch follows below)
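For that last bullet, a rough sketch of the "analysis with your own test data" idea; `generate` stands in for whichever inference call you actually use, and the probe/baseline files are hypothetical names.

```python
import json
import re

# Crude deny-list of output patterns you never expect from your own prompts.
SUSPICIOUS = [re.compile(p, re.I) for p in (r"curl\s+https?://", r"eval\(", r"base64\s+-d")]

def audit(generate, probes_path="probes.json", baseline_path="baseline.json"):
    """Run a fixed, self-written prompt set and flag deny-list hits or drift
    from previously hand-reviewed answers."""
    with open(probes_path) as f:
        probes = json.load(f)        # list of prompts you wrote yourself
    with open(baseline_path) as f:
        baseline = json.load(f)      # {prompt: output you reviewed by hand}
    findings = []
    for prompt in probes:
        out = generate(prompt)
        if any(rx.search(out) for rx in SUSPICIOUS):
            findings.append((prompt, "matched deny-list"))
        if prompt in baseline and out != baseline[prompt]:
            findings.append((prompt, "drifted from reviewed baseline"))
    return findings
```

It's a long shot, as the bullet says, but it at least makes the checking repeatable instead of ad hoc.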
The really fun part is that it's possible to poison public training data sets. People have already been doing it on the internet, adding weird HTML to stop ChatGPT from being able to regurgitate their content, which is a good example of training-time poisoning in the wild. Oh, and these attacks are far more transferable than most test-time attacks: they can affect any model that slurps up the training data you've poisoned.
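To make that concrete, here is a crude sketch of what flagging those tricks might look like on the data-collection side; a real pipeline would use a proper HTML parser rather than regexes, and the patterns here are illustrative only.

```python
import re

# Flag pages that hide text from human readers but not from a scraper.
HIDDEN_PATTERNS = [
    re.compile(r'style\s*=\s*"[^"]*display\s*:\s*none', re.I),   # hidden elements
    re.compile(r'style\s*=\s*"[^"]*font-size\s*:\s*0', re.I),    # zero-size text
    re.compile(r'aria-hidden\s*=\s*"true"', re.I),               # hidden from screen readers
    re.compile(r'[\u200B\u200C\u200D\uFE00-\uFE0F]'),            # zero-width chars / variation selectors
]

def needs_human_review(html: str) -> bool:
    """True if the page uses any of the hiding tricks above."""
    return any(p.search(html) for p in HIDDEN_PATTERNS)
```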
Reproducible builds for AI would be a challenge, not only because it would cost millions to attempt a reproduction, but also because of mixed-precision training, hardware differences, cluster hardware failures, and software changes (including driver updates). Not to mention copyright laws that make it impossible, or too risky, for a company to just publish all of the training data they used. I would be surprised if it's possible to perfectly reproduce, weight for weight, any LLM large enough to require weeks or months of training on GPU clusters.
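A tiny illustration of just one of those obstacles: floating-point addition isn't associative, so merely changing the reduction order (which differs across GPUs, kernels, and cluster layouts) already breaks bit-exact reproduction.

```python
import random

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

forward = sum(xs)
backward = sum(reversed(xs))

# Same numbers, different summation order: typically not bit-identical.
print(forward == backward, abs(forward - backward))
```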
I was asking ChatGPT for ideas for activities today and one suggestion was chatting to an AI Chatbot. I couldn't help but wonder if they're nudging the LLM to create a market for itself :)
How would this work? Are you talking about training on the test set as well? Some benchmarks have private test sets.
The fundamental problem is that the knowledge you're being tested on isn't useful only for passing the test. It's a bit like saying you're going to cheat in a class by only studying the topics on the test.
Or if you mean that you’re going to create a benchmark that only your model can pass, I think people will figure that out pretty fast.
If you use third-party packages in your code, the risk is far, far greater.
At least with LLMs you're somewhat forced to audit the code before it's turned into copy pasta. I don't know the last time I read through an entire code base to check for anything sneaky.