

Ask HN: viability of a startup based on screen scraping? - mshafrir

I'm working on a side project that scrapes pricing data from a site and makes it much more accessible and usable, ultimately giving a better answer to the kind of question a user would normally have about the data in its original form (sorry, I'd rather not give specifics at this point). I am considering evolving it into a business where users would pay for access. What are the legal ramifications of attempting such a business? Are there any real-world examples of successful businesses that used screen scraping at the core of their service or offering, and what hurdles or obstacles did they face? Finally, would this severely hamper my chances of securing funding?
======
gyardley
I'm not a lawyer, but I did look at this issue about a year ago for a feature
I was considering for our own startup. You can most certainly be sued,
defending will be pricey, and you may lose the lawsuit - although case law in
America isn't completely settled. The key case _against_ is probably eBay vs.
Bidder's Edge.

Having a single point of failure like this certainly won't help your
fundraising efforts. If you're going to take the legal risk, why not
bootstrap? That way you can put the money in your pocket as you go along and
if the site does get shut down, at least you've extracted the profits along
the way.

~~~
josefresco
I concur. Make something cool and figure out if there's a market, especially
when you are 'disrupting' an existing model and possibly existing in someone
else's ecosystem (e.g. eBay). That way you don't lose much but have lots to gain
if you create something people want. Corporations can be bought and lawyers
leashed if there's money in it for both sides.

------
jacquesm
Databases are protected under copyright law.

Also, most databases that you can scrape will contain sentinels to tip off the
database owner that you've scraped their content. The sentinels are typically
bogus records that are hard to spot; including them in your output will do a
good job of proving that you ripped the data.
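
To make the sentinel idea concrete, here is a minimal sketch of how a data owner might check a suspicious dataset for planted records. The `sku` field, the `find_sentinels` helper, and all of the sample values are hypothetical, invented purely for illustration; real systems will differ.

```python
def find_sentinels(scraped_records, sentinel_keys):
    """Return the planted sentinel keys that appear in a scraped dataset."""
    # Collect every key present in the suspicious dataset.
    seen = {record["sku"] for record in scraped_records}
    # Any overlap with the planted (bogus) keys is evidence of copying.
    return sorted(seen & set(sentinel_keys))


# Hypothetical bogus SKUs that exist only in the owner's database.
SENTINELS = {"ZZ-90017", "ZZ-90342"}

# Hypothetical listing found on a third-party site.
competitor_listing = [
    {"sku": "AB-10001", "price": 19.99},
    {"sku": "ZZ-90017", "price": 4.25},
    {"sku": "AB-10002", "price": 7.50},
]

print(find_sentinels(competitor_listing, SENTINELS))  # prints ['ZZ-90017']
```

Because the bogus records have no legitimate source, a single match is hard to explain away as coincidence.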

If you're just going to regurgitate the data, you would need a better
excuse than 'making it more accessible and usable'. You could offer your
services to the current owner of the data. If they abandon the data that's a
different story, but it sounds to me like they have not.

Another option is to license the data. Simply contact them and ask whether they
offer licensing options; this is the usual way to go about it.

Scraping is also a heavy drain on the resources of the company whose data you
intend to harvest. This means that if they sue you successfully for breach of
copyright, they have a fairly clear path to claiming damages.
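
On the resource-drain point, a scraper can at least read the site's robots.txt and honor its crawl delay. This is a rough sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `my-price-bot` user agent, the `crawl_policy` helper, and the example.com URLs are all assumptions made up for illustration. Following robots.txt does not by itself settle any of the legal questions discussed here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would download the site's own file.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /checkout/
"""


def crawl_policy(robots_txt, user_agent, url):
    """Return (allowed, delay_seconds) for a URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


allowed, delay = crawl_policy(
    ROBOTS_TXT, "my-price-bot", "https://example.com/prices/widgets"
)
print(allowed, delay)
```

A scraper would then sleep for `delay` seconds between requests and skip any URL for which `allowed` is false.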

Good luck!

PS: if you have a corporate lawyer, that would be a good place to ask for
advice.

~~~
_delirium
> Databases are protected under copyright law.

This depends on the jurisdiction. I believe the EU has a specific database
copyright, while the US explicitly doesn't recognize one. A database may be
copyrightable under normal copyright law in the US, but only if it's
sufficiently creative (e.g. a database containing nothing but a mechanical
listing of stock closing prices wouldn't be copyrightable).

~~~
jacquesm
That's quite similar actually. The thing you copyright is the aggregation
effort, so a mechanical process does not add anything copyrightable.

But annotation and such do add value, as does the original collection of data.

------
lylo
Yodlee seem to do okay out of it. Their SDK offers a way to extract statement
data from online banking sites which, I believe, is a glorified screen
scraper.

<http://www.yodlee.com/>

------
byoung2
I think it's a legal can of worms any way you look at it, but especially if
you plan to charge for it. I think you would have to get permission from all
of the sites you plan to scrape before you sell the data to someone. It might
be ok if you gave it away for free and linked back to the original source, but
even then it's sketchy.

------
devmonk
I worked for a company that was essentially based on it, and I know of another
that makes its living through "scraping" and related automation as well. Both
have been alive and well for years and have grown quite a bit. The trick is to
work with the companies/sites you are scraping from the beginning, as a
business relationship of some kind if possible, and not as a parasitic one.
Basically, if you are trying to help them sell their products/services/data
and they benefit, it works out, even though you will likely have to fix the
"scrapers" _all the time_.

------
helwr
See <http://www.quora.com/What-are-the-potential-legal-issues-in-running-a-web-crawler>

<http://www.quora.com/Is-web-scraping-legal>

<http://www.quora.com/What-are-the-best-resources-to-learn-about-web-crawling-and-scraping>

------
Travis
Sorry to hijack your thread, but I've had some questions from cofounders about
screen scraping.

Am I allowed to scrape information from their websites and use it to populate
my system? Isn't that effectively what Google does?

Is there a limit to what is considered acceptable/not acceptable? E.g., is it
OK to scrape for email addresses that they publish on their site, but not for
their part numbers?

Thanks!

~~~
schindyguy
<http://en.wikipedia.org/wiki/Web_scraping#Legal_issues>

------
maxawaytoolong
This is essentially what Merkle, RapLeaf and Palantir do.

I used to make money by doing this and selling the results to financial
research firms. However, I never made public any of the details of what I was
doing.

------
schindyguy
Why don't you check out the terms of use of a scraping service like Mozenda
<http://www.mozenda.com/policies>

or fetch.com?

------
AmberShah
Indeed.com does this and appears to be doing well.

------
fjabre
Rapportive?

