
What are the legal issues around web scraping? - neoflexycurrent
http://evan.law/2020/09/09/what-are-the-legal-issues-around-web-scraping/
======
rectang
This article is just a teaser to get you to hire the author (who is an IP
lawyer) to explain the issues raised. It's not very informative.

~~~
paul_f
This is useless. It's just an ad and IMO inappropriate for Hacker News.

~~~
purec
Definitely, however the HN comments are insightful.

------
netsectoday
1\. A website operator must capture your IP address to issue a cease-and-
desist or come after you legally.

2\. It is 100% legal to mask your IP address when accessing websites.

3\. Always mask your IP address and you will never hear from the website
owner.

(I'm not condoning anything illegal, and I have public-facing web-servers so I
know all about the pain from aggressive bots and scraping. I'm just pointing
out the facts around this instead of some vague lawyer's blog post with a "who
knows, contact us" at the end of the article.)

From the website owner's perspective:

1\. I get nailed with bots and scraping all day. If you do something
outstandingly stupid and become the signal within the noise: I'm just going to
harden my bot detection and mitigation instead of calling up my lawyer.

2\. Almost all websites published today are running blind. They have no idea
you are a bot, and they are more likely to get excited about the bump in
traffic because "of course it's from the money we poured on that marketing
effort".

3\. If a website has a bot problem, they will recognize your scripts and give
you guidelines on how not to be a jerk when scraping their content. This is
usually backed up by IP based throttle limits.

4\. I'm more afraid of this lawyer's hourly price, not the possibility of
damage caused by web-scraping against my servers.
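
For readers wondering what the "IP based throttle limits" in point 3 might look like, here is a minimal sketch of a per-IP sliding-window limiter. The class name, the limit, and the window are all hypothetical choices for illustration; a real deployment would more likely do this at the proxy or CDN layer.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client IP.

    Hypothetical helper for illustration; not taken from any specific server.
    """

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        # `now` can be injected so the behaviour is easy to test.
        now = time.monotonic() if now is None else now
        recent = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # over the limit; a server would typically answer 429
        recent.append(now)
        return True
```

A caller would check `limiter.allow(client_ip)` before serving each request and reject or delay the request when it returns `False`.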

~~~
onetimeusename
What do you mean by masking your IP address? Tor?

~~~
netsectoday
Tor is great, but they also publish a list of IP addresses for their exit
nodes, so website owners can block that easily.

Random open web proxies are harder to defend against.

You can also spin-up your own server from a cloud provider and proxy your web
traffic through that.
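
To illustrate why a published exit-node list makes Tor easy to block, here is a rough sketch of the check a site owner could run. It assumes the list is plain text with one address per line; the sample addresses are placeholders from the documentation ranges, not real exit nodes, and the function names are made up for this example.

```python
import ipaddress

# Hypothetical sample in the assumed one-address-per-line format.
SAMPLE_EXIT_LIST = """\
# placeholder addresses, not real exit nodes
192.0.2.10
198.51.100.7
"""


def parse_exit_list(text):
    """Parse a newline-separated list of IP addresses into a set."""
    exits = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        exits.add(ipaddress.ip_address(line))
    return exits


def is_tor_exit(ip, exits):
    """Return True if `ip` appears in the parsed exit list."""
    return ipaddress.ip_address(ip) in exits
```

Random open proxies and self-hosted cloud proxies have no equivalent single list, which is why they are harder to filter this way.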

------
thamer
When it comes to web scraping, it feels like what is considered a crime is
highly dependent on how eager prosecutors can be to charge you with
unauthorized access. It is common to read tech articles about security
vulnerabilities where web pages were left unprotected, the flaw being verified
by a journalist or the person reporting the issue.

This is what was used against weev[1] when he crawled an AT&T website and
dumped subscriber e-mail addresses, to sentence him to 41 months in prison.
Don't get me wrong, weev is a terrible person that society should be protected
from, but the severity of this sentence felt completely out of place for both
the supposed "crime" and the banality of what he did.

[1]
[https://en.wikipedia.org/wiki/Weev#AT&T_data_breach](https://en.wikipedia.org/wiki/Weev#AT&T_data_breach)

~~~
auganov
Correct me if I'm wrong but it wasn't publicly available in the sense that it
was linked to from some publicly facing website. My understanding is that they
found some URL scheme that spit out these records. So quite an edge case.
People have been prosecuted for stealing data from unsecured S3 buckets too.

If you could reasonably infer that you're not supposed to have access, try to
obtain it and then go on to share that data with the media (which only
acknowledges you know you're not supposed to have it) then I don't see a huge
problem with prosecuting it.

Usually by scraping I understand accessing something that's already being
accessed by clients and you're merely automating what's already happening.

Now if someone inadvertently obtained this kind of data without trying (say
part of a bigger scrape) and didn't use or distribute it then obviously I
don't want to see that prosecuted. And I doubt it would be.

------
spiffytech
This article offers almost no information at all. I'm fine with someone
publishing an info piece to drum up business, but usually the author makes an
actual attempt to educate the reader. This article provides essentially no
substantive information, or info a reader could take action on.

And if the author is going to bring up ToS and CFAA violations, it's negligent
to not mention that recent high-profile precedents like hiQ v. LinkedIn
exonerated the scraper of ToS and CFAA liabilities. Other precedents may say
different for specific situations (pages behind a login?), but bringing up
these legal obstacles with no indication that the best precedent we have makes
them non-issues for scraping public information feels like the author just
wants to scare the reader into thinking they need to call for a consultation.

------
sanaead
There are lots of potential legal issues that may arise depending on your use
case. Check out the webinar we did at Scrapinghub on legal compliance in web
scraping to get an overview of some of the potential legal issues you may run
into.
[https://info.scrapinghub.com/webinar-legal-compliance-in-web-scraping](https://info.scrapinghub.com/webinar-legal-compliance-in-web-scraping)

------
tejtm
> What are the legal issues around web scraping?

Delusional.

It's a rhetorical device intended to pretend you can make something public
and then dictate what public means to you.

------
google234123
Step 1 to stopping scraping is to block all traffic from AWS.
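
A minimal sketch of that kind of block, assuming you have a JSON document of CIDR prefixes shaped like the IP range file AWS publishes (a `prefixes` array whose entries carry an `ip_prefix` field). The sample data below uses documentation ranges as placeholders, not real AWS ranges, and the function names are hypothetical.

```python
import ipaddress
import json

# Placeholder data in the assumed shape; these are NOT real AWS prefixes.
SAMPLE_RANGES = json.dumps({
    "prefixes": [
        {"ip_prefix": "192.0.2.0/24", "region": "us-east-1", "service": "EC2"},
        {"ip_prefix": "198.51.100.0/24", "region": "eu-west-1", "service": "EC2"},
    ]
})


def load_networks(ranges_json):
    """Turn the JSON document into a list of ip_network objects."""
    data = json.loads(ranges_json)
    return [ipaddress.ip_network(p["ip_prefix"]) for p in data.get("prefixes", [])]


def is_blocked(ip, networks):
    """Return True if `ip` falls inside any of the listed networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

Note that a blanket block like this also rejects any legitimate client hosted in those ranges, so it is a blunt instrument.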

------
zepolen
100% legal.

If you're Google.

~~~
hellweaver666
So true... so many sites have a clause in their T&Cs saying that scraping is
not permitted but they're perfectly happy to let Google do it. Google should
totally black out any site with this kind of garbage in their T&Cs just to
raise awareness and get people to remove it.

