
Ask HN: What should I know regarding Scraping Websites? - deadcoder0904
Sometimes crowd-sourcing takes time & if you already have a market you can get a lot of data for free via scraping.

So what should I know regarding Web Scraping?

Not all websites provide an API, so many people scrape content.

What are the legal implications?

Have you scraped any? If so, anything you've learned the hard way?

Also, is there any way to prevent a website's content from being scraped?
======
lifencoder
"Also, is there any way to prevent a website's content from being scraped?"

You can put your site behind Cloudflare's DNS so that people (or bots) are not
able to scrape your content. You can also add bot-detection checks such as
Google reCAPTCHA to prevent scraping of content.

~~~
krapp
>You can put your site behind Cloudflare's DNS so that people (or bots) are not
able to scrape your content. You can also add bot-detection checks such as
Google reCAPTCHA to prevent scraping of content.

There are entire captcha-farm industries dedicated to 'automating' the filling
out of captchas using mTurk and similar crowdsourcing services for next to
nothing, to say nothing of exploits for the captchas themselves.

Nothing can be done to prevent the scraping of web content, at all. If a site
can be loaded into a browser, it can be scraped. It's just a matter of the
result being worth the effort.

------
Petrakis
Check the robots.txt and the ToS of the site and you should be fine. If in
doubt, you could contact the site administrator.
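
For the robots.txt part, here is a rough sketch of how a scraper might check
the rules before fetching a page, using Python's standard
urllib.robotparser. The robots.txt content, the "my-scraper" user agent and
the example.com URLs are made up for illustration; a real scraper would load
the file from the target site (for example with set_url() and read()). The
ToS still has to be read by a human.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
# A real scraper would fetch https://<site>/robots.txt instead.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "my-scraper" is an assumed user-agent name, not a real bot.
public_ok = is_allowed(parser, "my-scraper", "https://example.com/public/page")
private_ok = is_allowed(parser, "my-scraper", "https://example.com/private/data")

# Seconds to wait between requests, or None if the site does not specify one.
delay = parser.crawl_delay("my-scraper")
```

With the sample rules above, the /public/ URL is allowed, the /private/ URL is
not, and the requested delay between requests is 5 seconds. Honouring the
delay is up to the scraper; robots.txt itself enforces nothing.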

