> The goal of OpenAI is not to reproduce newspaper articles verbatim when asked questions (even if the answer could be a newspaper article), and the fact that it can happen is a side effect of how LLMs work.

This is an excellent point. A properly functioning LLM should not return the original content it was trained on. When one does, I suspect the prompt was tightly constrained and deliberately designed to extract or re-create that content. Another possibility that occurred to me recently is that the training set is too small, in which case even more general prompts can reproduce source material.
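
To make "tightly constrained" concrete, here's a minimal sketch of that kind of probe: feed a model the exact opening of a passage it likely saw in training and let deterministic decoding continue it. The model choice (GPT-2 via Hugging Face transformers) and the prompt are my own illustrative assumptions, nothing from the case:

    # Sketch: probing memorization with a constrained prompt.
    # "gpt2" and the Dickens opening are illustrative assumptions.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # The constraint: the verbatim opening of a well-known passage.
    prompt = "It was the best of times, it was the worst of times,"
    ids = tok(prompt, return_tensors="pt").input_ids

    # Greedy decoding (do_sample=False) removes sampling randomness,
    # so a verbatim continuation reflects what the weights encode.
    out = model.generate(ids, max_new_tokens=40, do_sample=False)
    print(tok.decode(out[0], skip_special_tokens=True))

The greedy setting matters: a verbatim continuation can't be blamed on sampling luck. Running a vaguer prompt for the same passage would be the fair control for the "more general prompts" question.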

Another question: are LLMs regurgitating what they were trained on, or synthesizing something merely very close to the original content? (Infinite monkeys eventually typing Shakespeare.) Court cases like this increase the need to understand the "thinking processes" inside an LLM.
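
One rough, measurable proxy for regurgitation vs. synthesis is the longest contiguous run of words an output shares with the source: long runs point to memorized text, while short overlaps are consistent with paraphrase or coincidence. A sketch using only the standard library, where the example strings and any cutoff you'd pick are assumptions on my part:

    # Sketch: longest verbatim word run shared by source and output.
    from difflib import SequenceMatcher

    def longest_verbatim_run(source: str, output: str) -> int:
        """Length, in words, of the longest contiguous word
        sequence appearing in both texts."""
        a, b = source.split(), output.split()
        m = SequenceMatcher(None, a, b)
        return m.find_longest_match(0, len(a), 0, len(b)).size

    src = ("It was the best of times, it was the worst of times, "
           "it was the age of wisdom")
    out = "it was the worst of times, it was the age of foolishness"
    # Prints 11: "it was the worst of times, it was the age of"
    print(longest_verbatim_run(src, out))

A test like this doesn't expose the model's "thinking process", but it does give courts a number to argue about instead of an impression.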
