robots.txt prevents real-time search use for grounding and citations.

crazygringo · 2025-09-15T03:25:33 1757906733

No it doesn't. It has zero legal force. Or any technical force either.

simianwords · 2025-09-15T03:30:07 1757907007

Not an expert so I ask: no technical force either? Is it just a polite ask then?

zdragnar · 2025-09-15T08:17:52 1757924272

It's hardly even a polite ask. It's literally a text file. Automated HTTP clients, such as search engine indexers (Google, Yahoo, etc.), are expected to use it to know which pages may be visited. That expectation is nothing more than a convention.

If you are on a Mac or Linux computer, odds are it has a program called curl pre-installed. If you type curl followed by a website address in a terminal, it'll make a request and download the response. robots.txt never gets involved. The same is true for AI agents and search engines that aren't polite.
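To make the "convention, not enforcement" point concrete, here is a rough sketch using Python's standard urllib.robotparser. The robots.txt content, the bot name "ExampleBot", and the example.com URLs are all made up for illustration. The check only happens because the client chooses to run it; a client that skips it is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """A well-behaved client parses robots.txt and honors the answer."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def impolite_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """An impolite client never looks at robots.txt at all."""
    return True


if __name__ == "__main__":
    url = "https://example.com/private/page"
    print("polite client fetches:", polite_may_fetch(ROBOTS_TXT, "ExampleBot", url))
    print("impolite client fetches:", impolite_may_fetch(ROBOTS_TXT, "ExampleBot", url))
```

Nothing on the server side changes between the two functions. The only difference is whether the client bothers to ask.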

euLh7SM5HDFY · 2025-09-15T11:58:54 1757937534

LinkedIn lost their anti-scraping suit: https://www.forbes.com/sites/zacharysmith/2022/04/18/scrapin... but it seems since then they were able to successfully appeal that decision.

Regardless - requiring an account to read anything, even a "free" one, totally changes the whole situation, even when a site's terms of service are limited by local law.

crazygringo · 2025-09-15T03:31:42 1757907102

Correct. Literally just a polite ask.