

Ask HN: Can I legally scrape my competitors' websites for data? - MarkMc

I have an idea for a website for buying used cars that includes a new way to visualize the search results.

I would like to populate my database with details of cars listed on my competitors' used car websites. The information I need is the Year, Mileage and Price of each car for sale. I would also like to store a link to the competitor's web page for that car so my users can get more information.

Of course, my competitors aren't going to be happy and would like to stop me from producing a website based on their data. I know that in Europe there is a 'database right' which would prevent me from using their data. But could I do this in the US? What about Australia? Are there any court cases which have settled the issue?

Thanks
======
pyrmont
I'm not overly familiar with the law in the United States, but in Australia
recent court decisions strongly suggest that it will be more difficult to
prove copyright subsists in a database. See particularly the decision in
Telstra Corporation Limited v Phone Directories Company Pty Ltd [2010] FCAFC
149 as well as the High Court decision in IceTV Pty Ltd v Nine Network
Australia Pty Ltd [2009] HCA 14. The Australian law firm Freehills have a nice
write-up of the Phone Directories case at
<http://www.freehills.com.au/6854.aspx>.

Distinct from copyright, a website owner may have an argument on the basis of
contract that scraping data violates the terms of use you agree to when you
visit a website. I'm not aware of cases in Australia that deal with the
validity of such contracts but this may be something to discuss further with a
lawyer.

------
noahc
I wrote a script that scraped a white pages website. It basically pulled down
the telephone directory listings. I tested it on a small city and was able to
run it against smaller cities over and over again without trouble.

I then gave it to my uncle who ran it against cities with 100,000+ people and
they banned his IP. He could run it from another location just fine.

The script was really dumb though. It would search for names starting with
every three-letter combination, including ones like 'dfh' that, as far as I
know, no name starts with (although three consonants can be legitimate, as in
'Schwartz'). The shortest string you could search for was three letters, so
for each combination you would just page through the results until there
wasn't a new page, then start the next search.

Obviously, if you cleaned it up to run only actual real-world three-letter
combos, it might not be detected.
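
For anyone curious, here is a minimal Python sketch of the approach described above: enumerate three-letter prefixes, then page through each search until a page comes back empty. It is not the original script. The `fetch_page` callable, its signature, and the in-memory `make_fake_fetch` stand-in are all hypothetical, so nothing here touches a real site.

```python
from itertools import product
from string import ascii_lowercase
from typing import Callable, Iterator, List

# Hypothetical signature: fetch_page(prefix, page_number) returns the listings
# on that results page, or an empty list once you run past the last page.
FetchPage = Callable[[str, int], List[str]]


def all_prefixes() -> Iterator[str]:
    """Every three-letter combination, including junk like 'dfh'."""
    for letters in product(ascii_lowercase, repeat=3):
        yield "".join(letters)


def real_world_prefixes(known_names: List[str]) -> List[str]:
    """Only the three-letter prefixes that some known name starts with."""
    return sorted({name[:3].lower() for name in known_names if len(name) >= 3})


def scrape_prefix(prefix: str, fetch_page: FetchPage) -> List[str]:
    """Page through one search until a page comes back empty."""
    listings: List[str] = []
    page = 1
    while True:
        batch = fetch_page(prefix, page)
        if not batch:
            break
        listings.extend(batch)
        page += 1
    return listings


def make_fake_fetch(names: List[str], page_size: int = 2) -> FetchPage:
    """In-memory stand-in for the site, so the sketch runs without a network."""
    def fetch_page(prefix: str, page: int) -> List[str]:
        matches = [n for n in names if n.lower().startswith(prefix)]
        start = (page - 1) * page_size
        return matches[start:start + page_size]
    return fetch_page


if __name__ == "__main__":
    directory = ["Schmidt", "Schultz", "Schwartz", "Smith"]
    fetch = make_fake_fetch(directory)
    for prefix in real_world_prefixes(directory):
        print(prefix, scrape_prefix(prefix, fetch))
```

The dumb version loops over `all_prefixes()` (26^3 = 17,576 searches); the cleaned-up version loops over `real_world_prefixes()` built from whatever list of surnames you have on hand.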

You might Google Feist v. Rural[1] for current case law in the United
States.

[1] <http://en.wikipedia.org/wiki/Feist_v._Rural>

------
iwwr
Any heavy scraping can be detected and blocked (or worse, corrupted), so you'd
need a pool of proxies and some smart robots. You are essentially depending on
your competitors to have data available and not change their website layout.

By the way, if you are using interesting scraping techniques, would you mind
posting a thread or two on HN?

~~~
dawson
I've used this service before <http://www.80legs.com/> (for legal purposes, of
course)

------
komlenic
If you're just looking to prove out a concept and see if there could be any
interest/traction for it, I would feel comfortable going ahead and scraping
(but be smart about it). To me that means: know your target (big corporate
sites vs JimBob's autos), throttle your requests (both the frequency with
which you scrape and how rapidly you fetch pages), and consider how obvious it
will be where you got your data.
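
As a rough illustration of the throttling point, here is a minimal Python sketch that waits a randomized interval between page fetches. The function name, the `fetch` callable, and the delay values are assumptions for illustration only; pick numbers that suit the site you are dealing with.

```python
import random
import time
from typing import Callable, Iterable, List


def fetch_throttled(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_delay: float = 2.0,   # assumed value: minimum seconds between requests
    jitter: float = 1.0,      # assumed value: extra random seconds on top
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in turn, pausing between requests.

    `fetch` is a hypothetical callable that takes a URL and returns the page
    body; `sleep` is injectable so the pause can be stubbed out in tests.
    """
    pages: List[str] = []
    for index, url in enumerate(urls):
        if index > 0:
            # No pause before the first request, a randomized pause after that.
            sleep(min_delay + random.uniform(0, jitter))
        pages.append(fetch(url))
    return pages
```

This only covers how rapidly you fetch pages within a run; how often you repeat the whole scrape is a separate scheduling decision.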

But from your Q's it seems like you're looking at whether or not this is a
longer-term viable idea. Hopefully some smart people will comment with
experiences/facts on the legal subject.

All that legality may not matter so much if you can position yourself as a
clearinghouse where these other parties just _have_ to have a presence to
survive, instead of a competitor?

------
gexla
This happens all the time. I wouldn't worry about it. Somebody has to notice
and then send you a notice first. They could take you to court right away
(anyone can sue for pretty much anything in the U.S.) but they likely wouldn't
want to waste the time and money to go through the process. That assumes that
their case would even be winnable in court. You could use the excuse
that you looked through the windows of the cars, if that would make any
difference (viewing prices in the store rather than scraping from their site).

------
jwang815
There are a lot of Craigslist apps on smartphones that scrape the Craigslist
website for data. The question is: will they come after you or even care?

