
I think this will be good for (actually) open-source models, including training data, because that will be the only way to confirm the model isn't hijacked.

But how would you confirm it if there's no "reproducible build" and you don't have the hardware to reproduce it?

That's the point: there needs to be a reproducible model. But I don't know how well that really prevents this case. You can hide all kinds of things in terabytes of training data.

Most AI models will probably shift to mixture of experts, which is built from smaller models.

So maybe with small models + reproducible builds + training data, it would be harder to hide things.

I am wondering if there could be a way to create a reproducible build of the training data as well (i.e. which websites it scraped, maybe archiving them as they are?), providing the archived links so that people can fact-check them, and the more links are reviewed, the more trustworthy a model is?
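Something like a content-hashed manifest might be the simplest version of that. A minimal sketch (the field names and helpers here are made up for illustration, not any existing standard): every scraped document gets a row with its source URL, its archived snapshot, and a hash of the snapshot, and a reviewer re-downloads the snapshot and checks the hash before fact-checking the content itself.

    import hashlib

    def manifest_entry(source_url, archived_url, archived_bytes):
        # One row per training document: where it was scraped from, where the
        # frozen snapshot lives, and a content hash so the snapshot can't
        # silently change after review.
        return {
            "source_url": source_url,
            "archived_url": archived_url,
            "sha256": hashlib.sha256(archived_bytes).hexdigest(),
        }

    def verify_entry(entry, refetched_bytes):
        # A reviewer re-downloads the archived snapshot and compares it
        # against the published hash before fact-checking the page itself.
        return hashlib.sha256(refetched_bytes).hexdigest() == entry["sha256"]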

If we are using AI in defense systems, you kind of need it to be trustworthy, so even if the process is tiresome, maybe there is an incentive now?

Or maybe we shouldn't use AI in defense systems at all, and should declare all closed AI, with no reproducible build, no training data, no weights, and no disclosure of how the data is gathered, a fundamental threat to use.


> So maybe with small models + reproducible builds + training data, it would be harder to hide things.

Eh, not quite. Then you're gonna have the problem of needing to test/verify a lot of smaller models, which makes it harder because now you've got to do a similar (although maybe not exactly the same) thing lots of times.

> I am wondering if there could be a way to create a reproducible build of the training data ... people can fact-check them, and the more links are reviewed, the more trustworthy a model is?

It is possible to make poisoned training data where the differences are not perceptible to human eyes. Human review isn't a solution in all cases (maybe some, but not all).
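For a sense of scale, a toy sketch (purely illustrative numbers, not a real attack): shift each pixel of an image by at most 2 out of 255 with a fixed trigger pattern. No human looking at the page would notice the difference, but a model trained on enough such samples can learn to key off the pattern.

    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

    # Fixed "trigger" pattern, at most +/-2 per pixel out of 255 -- far below
    # what a human reviewer would perceive when glancing at the page.
    trigger = rng.integers(-2, 3, size=image.shape)
    poisoned = np.clip(image.astype(int) + trigger, 0, 255).astype(np.uint8)

    print(np.abs(poisoned.astype(int) - image.astype(int)).max())  # <= 2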

> If we are using AI in defense systems, you kind of need it to be trustworthy, so even if the process is tiresome, maybe there is an incentive now?

DARPA has funded a lot of research on this over the last 10 years. There's been incentive for a long while.

> Or maybe we shouldn't use AI in defense systems

Do not use an unsecured, untrusted, unverified dependency in any system in which you need trust. So, yes, avoid safety and security use cases (that do not have manual human review where the person is accountable for making the decision).


This also incentivizes them to produce reproducible builds. So: training data + reproducible builds.

Maybe through some distributed system like BOINC?


