So many ideas start to come to mind if scraping is legal. Can we start to scrape...

MrZander · on Jan 29, 2020

Seems like a hard problem to solve legally. I can see so many valid use cases for bots to scrape pages. But in all of your examples, I'm inclined to say that it shouldn't be allowed.

Maybe it falls into a "fair use" situation? Obviously copying an entire website would not be considered fair use, but something like scraping a bunch of public profiles on Steam to get aggregate data on what games are played the most seems totally valid.

Hopefully it doesn't end up with everything gated behind a sign-in and a TOS.
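
To make the aggregate-data case concrete, here is a minimal sketch of the kind of tally I have in mind. It assumes the public profiles have already been fetched; the `profiles` list, its field names, and the `most_played` helper are all made up for illustration and are not Steam's actual data format or API. The point is that only the totals are kept, not a copy of anyone's page.

```python
from collections import Counter

# Hypothetical, hard-coded stand-in for already-fetched public profiles.
# Real profile data would have a different shape; these fields are invented.
profiles = [
    {"user": "a", "recent_games": ["Dota 2", "Portal 2"]},
    {"user": "b", "recent_games": ["Dota 2"]},
    {"user": "c", "recent_games": ["Portal 2", "Dota 2", "Factorio"]},
]

def most_played(profiles, top_n=3):
    """Count how many profiles list each game and return the top entries."""
    counts = Counter()
    for profile in profiles:
        # set() so a game listed twice on one profile is counted once
        counts.update(set(profile["recent_games"]))
    return counts.most_common(top_n)

print(most_played(profiles))
# [('Dota 2', 3), ('Portal 2', 2), ('Factorio', 1)]
```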

_iwgf · on Jan 29, 2020

> Can we scrape Reddit, Twitter, or Facebook in order to stand up a competing service that strips out all the ads?

Even if web scraping were definitively legal (this preliminary injunction doesn't mean that), that doesn't mean you can bypass the content creators' copyright. Non-copyrightable functional data is one thing, but copying all of Reddit, for example, would include copying https://www.reddit.com/r/WritingPrompts/ and that would definitely violate the rights of the authors.

cirenehc · on Jan 29, 2020

> Can we scrape Reddit, Twitter, or Facebook in order to stand up a competing service that strips out all the ads?

How are you going to pay for it? A subscription model doesn't work for search/social networks.

echelon · on Jan 29, 2020

Ad-free Reddit would be sustainable if:

- Comments are ephemeral, expiring after two weeks (no growing storage costs)

- "Reddit Gold" helps to offset costs

- Run Wikipedia-like donation drives yearly

- Write everything in Rust on bare metal so that CPU is cheap. Likewise, make intelligent choices about schema and service design for scalability.

- Don't keep adding unnecessary features (which usually exist just to drive ad engagement and growth).
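
A rough sketch of the first bullet, ephemeral comments with a two-week lifetime. It is in Python for brevity rather than Rust, and everything here (the `COMMENT_TTL` constant, the `purge_expired` helper, the dict layout) is hypothetical. A real service would more likely rely on a TTL feature in its datastore than on filtering lists in memory.

```python
from datetime import datetime, timedelta, timezone

COMMENT_TTL = timedelta(weeks=2)  # the two-week lifetime from the list above

def purge_expired(comments, now=None):
    """Return only the comments that are younger than COMMENT_TTL."""
    now = now or datetime.now(timezone.utc)
    return [c for c in comments if now - c["created"] < COMMENT_TTL]

# Fixed "now" so the example is reproducible.
now = datetime(2020, 1, 29, tzinfo=timezone.utc)
comments = [
    {"id": 1, "created": now - timedelta(days=1)},
    {"id": 2, "created": now - timedelta(days=15)},            # past the TTL
    {"id": 3, "created": now - timedelta(days=13, hours=23)},  # just inside
]

kept = purge_expired(comments, now)
print([c["id"] for c in kept])
# [1, 3]
```

Run periodically, a purge like this keeps the stored comment set bounded by roughly two weeks of activity, which is where the "no growing storage costs" part comes from.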

thekyle · on Jan 29, 2020

> Can we finally scrape and get rid of IMDB? I'd love to put all of their content on a wiki and be done with it.

Just because you can scrape the content legally does not mean you can also republish it on your own website.

echelon · on Jan 29, 2020

> Just because you can scrape the content legally does not mean you can also republish it on your own website.

Except IMDB built its original dataset from publicly available lists posted to Usenet back in the day. And they still rely on volunteer contributions. [1]

[1] https://en.wikipedia.org/wiki/IMDb#History