

Ask HN: What are some of the major problems being faced because of web scraping? - nachivpn

Disclaimer: I work for an anti-scraping service company. I am not trying to advertise it; I simply want to understand the problems that people are actually facing because of web scraping and how it is affecting them.
======
jpetersonmn
I'm sure there are instances where scraping websites causes legitimate issues,
however most of the complaining I've seen from website operators was about the
perceived theft of their data (even though it was publicly available through
the browser), not so much about any bandwidth or performance issue that the
scraping causes.

I'm of the opinion that web scraping has an unwarranted bad reputation. As
long as I'm respecting your robots.txt and not scraping behind logins, etc...
then it's no different than how Google operates.
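
For anyone wondering what "respecting your robots.txt" looks like in practice,
here is a minimal sketch using Python's standard `urllib.robotparser`. The
robots.txt content, the `example-scraper` user agent and the example.com URLs
are all made up for illustration; a real scraper would fetch the file from the
target site instead of hard-coding it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, hard-coded so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical user agent name for the scraper.
USER_AGENT = "example-scraper"

# Parse the rules from the text above instead of downloading them.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url: str) -> bool:
    """Return True if the parsed rules let USER_AGENT fetch this URL."""
    return parser.can_fetch(USER_AGENT, url)


print(allowed("https://example.com/index.html"))
print(allowed("https://example.com/private/data"))
print(parser.crawl_delay(USER_AGENT))
```

Checking `allowed()` before every request, and sleeping for at least the
crawl delay between requests, covers the "polite crawler" part of the
argument. It says nothing about how the scraped data is used afterwards.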

------
joshschreuder
I think bandwidth costs and the possibility of accidentally DDoSing the site
if the scraper gets out of control are probably big issues along with the
'theft of data' mentioned.
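
One common way to keep a scraper from getting out of control like this is to
enforce a minimum gap between requests. The sketch below is only an
illustration of that idea: the `Throttle` class and its parameters are
hypothetical names, not part of any library, and the clock and sleep functions
are injectable purely so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval (in seconds) between successive requests."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the last request was released

    def wait(self) -> float:
        """Block until min_interval has passed since the previous call.

        Returns the number of seconds actually slept.
        """
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            # Sleep only for whatever part of the interval is still remaining.
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

A scraper would call `throttle.wait()` right before each request, e.g. with
`Throttle(2.0)` for at most one request every two seconds. It does not handle
retries or backing off when the server starts returning errors, which a real
crawler would also want.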

------
mattwritescode
Surely you should know the problems if you are working for an anti-scraping
company.... Anyway...

Most people who own small websites don't necessarily know their website is
being scraped on a daily basis (talking sole traders, tiny businesses). If
they are paying for AdWords or local advertising through parish or county
community websites, then they may think they are getting more bang for the
buck than they actually are. If they get 10 visitors a day and 8 of those are
scrapers, what does this really mean for their advertising revenue? Obviously
they should be basing their return on investment on revenue, but still, a
website is seen as a big thing for most small businesses.

~~~
nachivpn
Yes, it is very true that many fail to realize that they are getting scraped,
simply because there aren't many tools that show traffic broken down into
humans and bots. This surely is a problem. Thanks for leaving a comment!
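
As a rough illustration of the kind of breakdown I mean, here is a naive
sketch that splits a list of User-Agent strings into "bot" and "human". The
marker list and the function names are my own assumptions, and any scraper
that spoofs a browser User-Agent will be counted as human, so treat the bot
count as a lower bound at best.

```python
from collections import Counter
from typing import Iterable

# Assumed substrings that often appear in self-identifying bots and HTTP
# libraries; this list is illustrative, not exhaustive.
BOT_MARKERS = ("bot", "crawler", "spider", "scrapy", "python-requests", "curl", "wget")


def classify(user_agent: str) -> str:
    """Label a User-Agent string as 'bot' or 'human' using substring matching."""
    ua = user_agent.strip().lower()
    if not ua:
        # Assumption: a missing User-Agent is treated as automated traffic.
        return "bot"
    return "bot" if any(marker in ua for marker in BOT_MARKERS) else "human"


def summarize(user_agents: Iterable[str]) -> Counter:
    """Count how many requests fall into each class."""
    return Counter(classify(ua) for ua in user_agents)
```

Feeding it the User-Agent column of a server access log gives a first
estimate of how many of those "10 visitors a day" identify themselves as
automated.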

------
iqonik
Google penalising a site for not having original content may be one. Of
course, it also uses bandwidth and costs the site you're scraping
resources/money for no benefit to them.

~~~
nachivpn
Interestingly, I just came across this:
http://torrentfreak.com/google-asked-remove-345-million-pirate-links-2014-150105/

