So many ideas come to mind if scraping is legal.
Can we start to scrape Google Search in order to bootstrap building an alternative to Google Search? Search is a really hard problem (that somebody should tackle), but if we can leverage what Google has already scraped from the web and associated with popular search terms, we can use that to help train and validate our search model.
Can we scrape Reddit, Twitter, or Facebook in order to stand up a competing service that strips out all the ads? It's hard to bootstrap a social media website, but if you can import all the content from the existing giants, your site is no longer a wasteland.
Can we finally scrape and get rid of IMDB? I'd love to put all of their content on a wiki and be done with it.
Seems like a hard problem to legally solve. I can see so many valid use cases for bots to scrape pages. But in all of your examples, I'm inclined to say that it shouldn't be allowed.
Maybe it falls into a "fair use" situation? Obviously copying an entire website would not be considered fair use, but something like scraping a bunch of public profiles on Steam to get aggregate data on what games are played the most seems totally valid.
Hopefully it doesn't end up with everything gated behind a sign-in and a TOS.
> Can we scrape Reddit, Twitter, or Facebook in order to stand up a competing service that strips out all the ads?
Even if web scraping were definitively legal (this preliminary injunction doesn't mean that), that doesn't mean you can bypass the content creators' copyright. Non-copyrightable functional data is one thing, but copying all of Reddit, for example, would include copying https://www.reddit.com/r/WritingPrompts/ and that would definitely violate the rights of the authors.
> Just because you can scrape the content legally does not mean you can also republish it on your own website.
Except IMDB copied all of its data by scraping publicly available data posted to Usenet back in the day. And they still rely on volunteer contributions. [1]