

Commercial web scraping - is it stealing? - hoop
http://online.wsj.com/article/SB10001424052748703358504575544381288117888.html#mod=djempersonal

======
nopal
Aren't sites able to prevent this type of thing through a prominent terms of
use link on every page? (Ticketmaster 2003, Cairo v. CrossMedia Services)

Is it that this is still a legal gray area, or is it that big companies can
roll over small companies and individuals?

Ticketmaster - <http://itlaw.wikia.com/wiki/Ticketmaster_v._Tickets.com>

Cairo v. CrossMedia -
<http://itlaw.wikia.com/wiki/Cairo_v._CrossMedia_Services>

~~~
hoop
In this case it was "big company" versus "small company who is selling the
same data." The real issue seems to be that "small company who is selling the
same data" feels that "big company" stole from them (instead, they should have
bought the data). They did fight back legally, via a cease-and-desist which
"big company" complied with, so they kind of won.

Personally, my major concern is an article on something as seemingly trivial
as web scraping making its way into the Wall Street Journal.

As you point out, the legal protections are there, but from a technical
standpoint how do you prevent that? DRM in HTML6 (</sarcasm>)? I'm concerned
because websites that prevent me from right-clicking to "view source" are
already annoying enough.

------
wpeterson
There's a lot to be concerned about here for anyone who provides a data mining
backed web application or service.

At PatientsLikeMe patients are trading use of their information for free
access to data analysis tools and social community.

------
gamble
It's almost always going to violate the site's TOS, so if you're a business
that depends on regularly scraping sites without permission, prepare to change
your business model or be sued. (e.g. Octopart vs. Mouser and Digikey)

------
AndrewDucker
So, when are we going to get a law making it illegal to violate robots.txt?
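
For anyone unsure what "violating robots.txt" means in practice, here is a
rough sketch of how a well-behaved scraper might check the file before
fetching a page. It uses Python's standard urllib.robotparser; the robots.txt
contents, the "examplebot" user agent, and the example.com URLs are all made
up for illustration, and a real crawler would fetch the file from the site
rather than hard-code it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this example);
# a real crawler would download it from the target site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/data.html"):
        print(url, is_allowed(ROBOTS_TXT, "examplebot", url))
```

Nothing forces a scraper to run a check like this, which is the point of the
question: robots.txt is a convention, not an enforcement mechanism.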

~~~
hoop
Good question. Probably on a timeline similar to the gap between the first
major news coverage of email spam and the CAN-SPAM Act.

