
Ask HN: Would it be legal if I put my web scraping scripts (lib) on GitHub? - __julia__
Hi, I am using the data for research purposes. How bad is it to show scrappers in my GitHub account?
I am using scrappers to collect data from news sources, analyze them, and work on some data analysis tools (recommendation systems, data visualization of aggregated data, etc.).

I tried to consult with some other people in the field, but most of them gave mixed signals and recommended articles like [1](https://gijn.org/2015/08/12/on-the-ethics-of-web-scraping-and-data-journalism/).

As far as I know, some big companies like the BBC do have some scrapping tools like [Juicer](http://bbcnewslabs.co.uk/projects/juicer/). Have any of you gotten into trouble by putting such scripts up publicly?

Best,
======
bjourne
I wouldn't worry about it, _as long as the code is general_ enough. E.g., if it
is obvious that the code is about scraping a specific site's stock tickers and
that it disregards the site policy and robots.txt, I would think twice about
publishing it. Otherwise not. Note that some European countries have much
stricter laws on scraping than the US has.

------
LisaG
As long as you obey robots.txt there is nothing wrong with crawling. Your code
on GitHub doesn't give any indication of what sites you collect data from so
there is no indication that you are scraping instead of using it to crawl in
an acceptable manner. Though it wouldn't hurt to label your work as crawler
scripts instead of scraping scripts ;)
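
For what "obeying robots.txt" can look like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `research-crawler` user agent, and the `news.example.com` URLs are all made up for illustration; a real crawler would fetch each site's actual robots.txt (for example with `set_url()` and `read()`) before requesting pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs without network access.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

USER_AGENT = "research-crawler"  # hypothetical user-agent name


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)

# A path with no matching Disallow rule, then a path under /private/.
print(is_allowed(parser, "https://news.example.com/articles/1"))
print(is_allowed(parser, "https://news.example.com/private/drafts"))

# Crawl-delay (in seconds) requested for this user agent, or None if unspecified.
print(parser.crawl_delay(USER_AGENT))
```

Calling a check like `is_allowed` before every request, and sleeping for the crawl delay between requests, is one simple way to keep published scripts visibly on the "crawler" side of the line.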

Why use your own scripts and not Nutch?

Do you know about Common Crawl? [https://aws.amazon.com/public-datasets/common-crawl/](https://aws.amazon.com/public-datasets/common-crawl/)
It obeys robots.txt so it may not have everything you want, but it could save
you part of the effort of crawling yourself.

------
jnbiche
The word you want is "scrapers". Whereas "scrappers" are people who are good
at fighting.

This has to be the no. 1 misspelling I see online among developers.

~~~
quickthrower2
A scrapper is UK colloquial for an old car in bad nick.

Oh and "nick" is UK colloquial for condition. Also can mean jail. And also can
mean to steal.

~~~
shoo
maybe a relative of "paddock basher"

------
robtaylor
I am building something along those lines so always interested to see which
sources / methods are used - any public links?

~~~
__julia__
I removed the project.

