
robots.txt prevents real-time search use for grounding and citations.


No, it doesn't. It has zero legal force, and no technical force either.


Not an expert so I ask: no technical force either? Is it just a polite ask then?


It's hardly even a polite ask. It's literally a text file. Automated HTTP clients, such as search engine indexers (Google, Yahoo, etc.), are expected to read it to learn which pages they may or may not visit. That expectation is nothing more than a convention.

If you are on a Mac or Linux computer, odds are it has a program called curl pre-installed. If you type curl followed by a website address in a terminal, it'll make a request and download the response. robots.txt never gets involved. The same is true for AI agents and search engines that aren't polite.
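
To make the "convention" part concrete, here is a minimal Python sketch of what a polite client does, using the standard library's urllib.robotparser. The robots.txt contents, the "ExampleBot" user agent, and the example.com URLs are all made up for illustration. The point is that the check only happens because the client chooses to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/public/page",
                "https://example.com/private/page"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "ExampleBot", url) else "disallowed"
        print(url, "->", verdict)
```

A client that skips the is_allowed call and just issues the request gets the page anyway, which is exactly what plain curl does.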


LinkedIn lost its anti-scraping suit: https://www.forbes.com/sites/zacharysmith/2022/04/18/scrapin... but it seems they have since been able to successfully appeal that decision.

Regardless, requiring an account to read anything, even a "free" one, totally changes the whole situation, even when a site's terms of service are limited by local law.


Correct. Literally just a polite ask.



