CAPTCHAs and IP blacklistings are things you encounter routinely in perfectly legitimate and legal web scraping projects. A typical example is businesses that want to monitor their competitors' prices or the second-hand market for their products.
We have plans to actively prevent the use of our platform for illegitimate purposes (fraud, spam, etc.).
> CAPTCHAs and IP blacklistings are things you encounter routinely in perfectly legitimate and legal web scraping projects
It's only legal if the site's TOS says it is. You don't get to decide whether you can scrape websites legally or not. People have been sued for scraping, trust me. And it's not even about fraud or spam.
It's more complex than that. Whether the site's TOS are enforceable is a matter of jurisdiction and may depend on the intent of the web scraper.
Ryanair, for example, has lost several cases in which it tried to forbid scraping of its website, on the grounds that data scraping promoted free competition and served consumers in general. See the latest decision here: https://uk.finance.yahoo.com/news/ryanair-suffers-setback-ge....
The legality may also depend on the type of data being collected. For example, it is likely safer to scrape Yelp for public facts such as business locations and phone numbers than for "copyrightable" content such as customer reviews. Both, however, would violate Yelp's TOS. See: http://streetfightmag.com/2013/03/04/legal-battles-erupt-ove...
Actually, no, and I don't think our location would protect us from legal liability. Our servers are outside Mauritius; hosting locally would be very expensive and add latency.
Very interesting question. HTML5 audio and video are definitely a possibility, if there's a strong use case. If you have a specific idea, I'd be interested to hear it.
I don't expect Flash or Java support. Flash and Java apps that load their data from standard HTTP resources can be scraped without plugin support by fetching those resources directly; the others need reverse engineering anyway and wouldn't fit with HTML scraping.
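To illustrate the "standard HTTP resources" case: if the page markup exposes the URL the plugin loads (for example in a `flashvars` parameter), you can skip the plugin entirely and fetch that URL yourself. Below is a minimal Python sketch under that assumption; `find_data_urls`, `fetch`, the `DATA_EXTENSIONS` list and the example URLs are all hypothetical. Apps that build their URLs at runtime won't be caught by this kind of markup scan.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# Hypothetical heuristic: file extensions we treat as "data" resources.
DATA_EXTENSIONS = (".xml", ".json", ".csv", ".txt")
# Tags that typically carry Flash/Java plugin configuration.
PLUGIN_TAGS = {"object", "embed", "param", "applet"}


class DataUrlFinder(HTMLParser):
    """Collects data-file URLs referenced in plugin tag attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag not in PLUGIN_TAGS:
            return
        for _name, value in attrs:
            if value is None:
                continue
            # flashvars-style values pack several key=value pairs joined by "&"
            for part in value.split("&"):
                candidate = part.split("=", 1)[-1]
                if candidate.lower().endswith(DATA_EXTENSIONS):
                    url = urljoin(self.base_url, candidate)
                    if url not in self.found:
                        self.found.append(url)


def find_data_urls(html, base_url):
    """Return absolute URLs of data files the embedded app appears to load."""
    finder = DataUrlFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.found


def fetch(url, timeout=10):
    """Download the raw resource over plain HTTP, no plugin involved."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

For example, `find_data_urls('<object data="player.swf"><param name="flashvars" value="feed=data/prices.xml&autoplay=1"></object>', "https://example.com/shop/")` returns `["https://example.com/shop/data/prices.xml"]`, which you could then pass to `fetch` and parse as ordinary XML.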
I use PhantomJS as a headless audio player by loading YouTube/SoundCloud playlists and letting it play, but PhantomJS no longer has official Flash support.
Sorry if you were expecting some novel idea, but it's just me trying to use a measuring tape to drive a screw.