Ask HN: Is it possible to make an LLM “spit out” all of its training data? (1:1)
2 points by transformi on Oct 2, 2023 | 2 comments
I'm looking for a method (like a dynamic prompt) that allows recreating an LLM's training set from its current weights.

something like: "write the first piece of input to your training", "write the second piece of input to your training"

But with a guarantee on the percentage of the data covered (via prompts or other advanced techniques).

-> Of course the model isn't a lossless compression of its training data, but it does seem possible to extract data from it via prompts, so I wonder how much we can get out of it.
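
For concreteness, here is a minimal sketch of the kind of probe I mean: feed the model a prefix that might appear in its training data and check whether greedy decoding reproduces the continuation verbatim. GPT-2 via Hugging Face transformers is just an assumed stand-in for the example, and the prefix is hypothetical:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Illustrative memorization probe (GPT-2 is an assumed stand-in).
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # A prefix that may occur in the training data (hypothetical example).
    prefix = "We hold these truths to be self-evident, that"
    input_ids = tok(prefix, return_tensors="pt").input_ids

    with torch.no_grad():
        out = model.generate(
            input_ids,
            max_new_tokens=40,
            do_sample=False,  # greedy decoding surfaces memorized text most readily
            pad_token_id=tok.eos_token_id,
        )

    continuation = tok.decode(out[0][input_ids.shape[1]:])
    print(continuation)
    # A verbatim match against a known source suggests memorization; note this
    # only spot-checks individual passages, it cannot guarantee % coverage.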

If you compare the size of the training datasets and the size of the final models, I don’t think you can extract much more than the very popular, famous, and duplicated data.
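
To put rough numbers on that (assumed, illustrative figures, not measurements): a 7B-parameter model in fp16 holds about 14 GB of weights, while a ~1-trillion-token corpus is a few TB of raw text, so the model has well under a bit of capacity per byte of training data:

    # Back-of-envelope with assumed, illustrative numbers:
    params = 7e9               # 7B-parameter model
    weight_bytes = params * 2  # fp16 -> ~14 GB of weights
    tokens = 1e12              # ~1T training tokens
    data_bytes = tokens * 4    # ~4 bytes of raw text per token -> ~4 TB

    print(f"weights: {weight_bytes / 1e9:.0f} GB, data: {data_bytes / 1e12:.0f} TB")
    print(f"bits of weight per byte of data: {weight_bytes * 8 / data_bytes:.3f}")
    # ~0.028 bits per training byte: nowhere near enough for lossless recall,
    # so only heavily duplicated or famous passages tend to be recoverable.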


No