
> so I don't believe Bing has been able to crawl our search results directly

Isn't compliance with robots.txt more of a voluntary thing?

I'm not accusing MS of ignoring it when convenient, but if you/we/someone is accusing them of acting unethically wrt search results in the first place, telling the crawler to ignore robots.txt wouldn't be that far away, would it? (And likewise faking the user-agent, etc.)

For better or for worse, UA identification, robots.txt compliance, all those things are voluntary. I'm not suggesting they shouldn't be, but it certainly makes a difference in terms of whether something's possible or not. (And, if you ask me, it places an even higher obligation on the actors to behave ethically, lest trust completely evaporate and the whole thing go to hell in a handbasket.)
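To make the "voluntary" part concrete, here is a minimal sketch in Python of what compliance looks like from the crawler's side. The rules, the `examplebot` user-agent, and the `may_fetch` helper are all made up for illustration; only `urllib.robotparser` is real (it ships with the standard library). The point is that the check is something the crawler chooses to run before fetching; nothing on the server enforces it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url.

    A well-behaved crawler calls something like this before each request.
    A crawler that skips the call (or sends a different user-agent string)
    is not stopped by anything in robots.txt itself.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "examplebot" is a made-up user-agent name.
    print(may_fetch("examplebot", "https://example.com/search"))
    print(may_fetch("examplebot", "https://example.com/about"))
```

Leaving out the `may_fetch` call is all it takes to "ignore" robots.txt, which is why the whole arrangement rests on crawlers behaving themselves.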

I am not a lawyer, but as I understand it there is some precedent in the US of intentionally ignoring robots.txt being unauthorized computer access, exposing you to all the liability that entails (possibly criminal).

I'd like to see an actual case reference for this. I've never heard of ignoring robots.txt resulting in any kind of legal action.

It would take a pretty big leap to go from "robots.txt is advisory" to "ignoring it constitutes a criminal act."

The Internet Archive was sued unsuccessfully. As I understand it, a lawsuit against Google on the topic is still in progress. So I guess the precedent is weaker than I thought, but still: tread carefully.
