Hacker News
LLM Honeypots?
1 point by krembo on April 27, 2024 | 3 comments
Q: Is it possible that companies will plant honeypots using unique, nonexistent words to detect whether LLM crawlers are bypassing their do-not-crawl policies?



Is it possible that companies will plant honeypots using unique, nonexistent words to detect whether LLM crawlers are bypassing their do-not-crawl policies?

I think that depends on how advanced plagiarism-detection software has become and how unique your data is. Big-data LLMs combine massive data sets, so your data would have to be unique enough to remain untainted. Perhaps a more general practice would be to play the cat-and-mouse arms race of evolving protection against bots. Most fail at this game. Even if the big players were stopped by bot protection and legal agreements, nothing would stop them from buying your data from unscrupulous scrapers who claim they obtained it legitimately.


How would you check?

Suppose we put a string like UEGVHBEWCOUB on my website, and the website is about making pancakes. Do you expect that when we ask for a pancake recipe, it will return this string? It won't, because it scanned 100k other websites for that too.

It becomes a needle-in-a-haystack problem.
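
For what it's worth, the planting and checking steps themselves are easy to sketch. This is a minimal illustration, not anyone's actual method: the function names, the secret, and the example.com URLs are all hypothetical, and it assumes you can get text out of the model to search. It derives a page-specific nonsense token from a private key, so each page gets its own marker, and then looks for those tokens in whatever the model returns.

```python
import hashlib
import hmac


def make_canary(secret: bytes, page_url: str, length: int = 12) -> str:
    """Derive a deterministic, page-specific nonsense token (hypothetical helper)."""
    # HMAC-SHA256 yields 32 bytes, so length must be 32 or less.
    digest = hmac.new(secret, page_url.encode("utf-8"), hashlib.sha256).digest()
    # Map each byte to an uppercase letter so the token resembles UEGVHBEWCOUB.
    return "".join(chr(ord("A") + b % 26) for b in digest[:length])


def find_leaked_canaries(model_output: str, canaries: dict[str, str]) -> list[str]:
    """Return the URLs whose canary token appears verbatim in the model output."""
    text = model_output.upper()
    return [url for url, token in canaries.items() if token in text]


# Assumed setup: one private secret and the pages you want to mark.
secret = b"hypothetical-private-key"
urls = ["https://example.com/pancakes", "https://example.com/waffles"]
canaries = {url: make_canary(secret, url) for url in urls}
```

A hit only shows up if the model memorized the token and can be prompted into repeating it, which is exactly the haystack objection above. A miss therefore says nothing about whether the page was crawled.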


Yes, and rather than a few keywords it will be full data-poisoning attacks. Which, cool.



