
I think this will be good for (actually) open-source models, including training data, because that will be the only way to confirm the model isn't hijacked.

But how would you confirm it if there's no "reproducible build" and you don't have the hardware to reproduce it?

That's the point: there needs to be a reproducible model. But I don't know how well that really prevents this case. You can hide all kinds of things in terabytes of training data.

Most AI models will probably shift to mixture of experts, which is built from smaller models.

So maybe with small models + reproducible builds + training data, it would be harder to hide things.

I am wondering if there could be a way to create a reproducible build of the training data as well (i.e. which websites it scraped, maybe archiving them as they are?), providing the archived links so that people can fact-check them, and the more links are reviewed, the more trustworthy a model is?
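Something like a content-hashed manifest might be the simplest version of that. A minimal sketch (the field names and helpers here are made up for illustration, not any existing standard): every scraped document gets a row with its source URL, its archived snapshot, and a hash of the snapshot, and a reviewer re-downloads the snapshot and checks the hash before fact-checking the content itself.

    import hashlib

    def manifest_entry(source_url, archived_url, archived_bytes):
        # One row per training document: where it was scraped from, where the
        # frozen snapshot lives, and a content hash so the snapshot can't
        # silently change after review.
        return {
            "source_url": source_url,
            "archived_url": archived_url,
            "sha256": hashlib.sha256(archived_bytes).hexdigest(),
        }

    def verify_entry(entry, refetched_bytes):
        # A reviewer re-downloads the archived snapshot and compares it
        # against the published hash before fact-checking the page itself.
        return hashlib.sha256(refetched_bytes).hexdigest() == entry["sha256"]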

If we are using AI in defense systems, you kind of need it to be trustworthy, so even if the process is tiresome, maybe there is an incentive now?

Or maybe we shouldn't use AI in defense systems at all, and should declare all closed AI, with no reproducible build, no training data, no weights, and no disclosure of how the data is gathered, a fundamental threat to use.


> So maybe with small models + reproducible builds + training data, it would be harder to hide things.

Eh, not quite. Then you're gonna have the problem of needing to test/verify a lot of smaller models, which makes it harder because now you've got to do a similar (although maybe not exactly the same) thing lots of times.

> I am wondering if there could be a way to create a reproducible build of the training data ... people can fact-check them, and the more links are reviewed, the more trustworthy a model is?

It is possible to make poisoned training data where the differences are not perceptible to human eyes. Human review isn't a solution in all cases (maybe some, but not all).
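For a sense of scale, a toy sketch (purely illustrative numbers, not a real attack): shift each pixel of an image by at most 2 out of 255 with a fixed trigger pattern. No human looking at the page would notice the difference, but a model trained on enough such samples can learn to key off the pattern.

    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

    # Fixed "trigger" pattern, at most +/-2 per pixel out of 255 -- far below
    # what a human reviewer would perceive when glancing at the page.
    trigger = rng.integers(-2, 3, size=image.shape)
    poisoned = np.clip(image.astype(int) + trigger, 0, 255).astype(np.uint8)

    print(np.abs(poisoned.astype(int) - image.astype(int)).max())  # <= 2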

> If we are using AI in defense systems, you kind of need it to be trustworthy, so even if the process is tiresome, maybe there is an incentive now?

DARPA has funded a lot of research on this over the last 10 years. There's been incentive for a long while.

> Or maybe we shouldn't use AI in defense systems

Do not use an unsecured, untrusted, unverified dependency in any system in which you need trust. So, yes, avoid safety and security use cases (that do not have manual human review where the person is accountable for making the decision).


This also incentivizes them to produce reproducible builds. So: training data + reproducible builds.

Maybe through some distributed system like BOINC?


