Part 1: https://news.ycombinator.com/item?id=20256681
(closed to new comments)
What (legal?) trick allows search engines to crawl (well, we know that "crawl" is a synonym of "scrape") and index content protected by terms of use?
Is it "fair use" or something else?
One example: Craigslist!
In their terms of service:
> USE. Unless licensed by us in a written agreement, you agree not to use or provide software (except general purpose web browsers and email clients) or services that interact or interoperate with CL, e.g. for downloading, uploading, creating/accessing/using an account, posting, flagging, emailing, searching, or mobile use. You agree not to copy/collect CL content via robots, spiders, scripts, scrapers, crawlers, or any automated or manual equivalent (e.g., by hand).
On the other hand:
https://www.google.com/search?q=site%3Asfbay.craigslist.org+couch&oq=site%3Asfbay.craigslist.org+couch
Google is able to index CL: you can query the Google index with a "use only this CL city" restriction (the site: operator in the link above) and see the ads, and we know Google makes money from it (through advertising, for example).
I cannot imagine Google obtaining a "written agreement" from CL ))
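For anyone who wants to reproduce the site-restricted query linked above, here is a minimal sketch of how such a URL can be put together. The function name build_site_query_url is my own, and the code only builds the URL string; it does not fetch anything from Google or CL.

```python
from urllib.parse import urlencode


def build_site_query_url(site: str, terms: str) -> str:
    """Build a Google search URL restricted to one site via the site: operator.

    Hypothetical helper for illustration only; it constructs a string
    and performs no network request.
    """
    # "site:<host> <terms>" limits results to pages indexed from that host.
    query = f"site:{site} {terms}"
    # urlencode percent-encodes ":" as %3A and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})


print(build_site_query_url("sfbay.craigslist.org", "couch"))
```

Swapping sfbay.craigslist.org for another CL city subdomain gives the "use only this CL city" behaviour described above.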
Google may have written agreements with Craigslist; they're both enormous companies.
Finally, as others have said, it's a legal grey area. It's not completely clear, and it basically depends on which websites you're scraping, how you use the data, and why.
Maybe it's best to just ask the website?