The tricky thing is that if a scraping tool or service provider complies with website owners' demands to stop scraping, there is very little basis for claiming damages. Even if the customer used Scrapinghub to log in to websites and scrape all the emails, all Scrapinghub would need to do is hand over their customer on a silver platter. This is what the DMCA is for. Can you imagine if you manufactured a bicycle and somebody used it to commit a crime? Plausible deniability. Scrapinghub can't monitor everyone's usage all the time to make sure they are following each website's TOS (which are not legally binding).

The DMCA protects service providers from copyright claims for user-generated content as long as they comply with takedown requests, etc. Scrapinghub may have a defense to copyright claims there (though I seriously doubt it due to the nature of their relationship with the customer; they're not a DMCA "safe harbor" and the data they're using isn't user-generated content), but not to CFAA claims.

It's illegal to break the CFAA whether or not the plaintiff specifically tells you that they think you're doing it. If they send a C&D, yes, you'd be wise to comply, but that's not going to absolve you from claims that you harmed their company by violating the CFAA before they sent it (such claims do happen, and they usually allege a ridiculous amount of damages for something as innocent as downloading a web page from their server). You'd have to argue in court that your access was authorized, and they'd have to argue that it wasn't. The judge and/or jury would then evaluate.

3taps was actually quite similar to Scrapinghub. I don't think they have as much of a defense as you'd like. And Terms of Use are actually usually considered legally binding; to the extent that they're not, it's usually because of something minor, like not putting the notice that you agree to the ToU by using the site in plain view.

I think you are overestimating the reach of CFAA. There are multiple vendors of web scraping tools and services, not just Scrapinghub. All of them have been operating longer than 3taps, and some still scrape Craigslist and get away with it without issues, for the same reason you could hire a guy on Freelancer to scrape Craigslist for you. 3taps went above and beyond for their best client, PadMapper, and got burned.

>I think you are overestimating the reach of CFAA.

I don't think so. The CFAA states:

>Whoever intentionally accesses a computer without authorization or exceeds authorized access, and thereby obtains information from any protected computer shall be punished as provided in subsection (c) of this section. (a)(2)(C)

It defines a "protected computer" as:

>...the term "protected computer" means a computer which is used in or affecting interstate or foreign commerce or communication, including a computer located outside the United States that is used in a manner that affects interstate or foreign commerce or communication of the United States; (e)(2)(B)

As the Supreme Court has ruled that virtually anything in the United States is subject to the Commerce Clause, this comprises practically all computers, especially once you consider that using a computer network almost certainly takes your traffic out of state. Many states have counterpart laws to the CFAA with substantially similar language, so if you can miraculously convince a judge that the computers involved are not part of interstate commerce and that the feds therefore have no jurisdiction, there's a good chance you'll have to contend with a similarly worded state statute.

I don't see any limitations or exceptions here. If you are accessing a computer in an "unauthorized" manner and obtain information whilst doing so, you have violated the CFAA.

The reason scraping can happen is a combination of lack of technical awareness (both from lawyers about computers and from programmers about law) and the cost of pursuing a lawsuit. Even if you break the law, someone has to take issue with your law-breaking before anything happens; they have to file either a lawsuit or an indictment to get the ball rolling. That some people are able to get away with violating the CFAA without someone registering a formal complaint on the matter has nothing to do with whether or not one has violated the statute.

The only way that scrapers don't violate the CFAA is a liberal interpretation of the term "unauthorized", wherein a judge states that if a computer is advertising and allowing public access, then all members of the public are inherently authorized to access it. I know that several scrapers have taken their cases through the courts hoping that such an interpretation would be given.