
If AI crawlers circumvent the protection mechanisms, it's now a serious crime rather than just "Well, it was on the open internet for free". It wouldn't surprise me if the news orgs are also looking at honeypot articles to see whether the fake details slip into LLMs.


It's not a serious crime, or any crime at all, to ignore robots.txt. It's entirely voluntary whether you want to follow it or not. If you don't, you're being a dick maybe, but that's not a crime.
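
To make the "voluntary" part concrete, here is a rough sketch of what honoring robots.txt looks like from the crawler's side, using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleAIBot` and `OtherBot` names, the example.com URLs, and the `is_allowed` helper are all made up for illustration. The point is that the check runs in the crawler's own code, so nothing on the server enforces it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this example.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/story"
    for agent in ("ExampleAIBot", "OtherBot"):
        # A polite crawler skips the fetch when this is False.
        # A rude one simply never calls is_allowed() at all.
        print(agent, is_allowed(agent, url))
```

A crawler that skips this check still gets the page unless the site blocks it some other way.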


It's not just robots.txt. If you've tried using a VPN lately, many sites like Reddit and YouTube block you from viewing content until you log in. Every major website has been adding anti-scraping tech over the last year. Even archive.org is getting blocked by more and more sites, since it can be used to scrape them indirectly.



