

OpSci: URIs, indexes and RSS for 72 closed-source publishers and scraper code. - delinquentme
https://github.com/delinquentme/OpSci

======
powertower
I don't understand what this is... It doesn't scrape the full published
articles/material on hidden/private URLs (unless you have access through a paid
subscription)? So what would one use this for?

~~~
delinquentme
It's basically a 1-shot index of as much information as one can LEGALLY
currently index.

~~~
powertower
Do you mean it just indexes the short "abstract" that's provided for each
paper? Or does it do even less, and just build an index of the URLs?

~~~
delinquentme
It gets down to the journals... Perhaps you're right, though; the abstracts
would be plainly available as well as valuable.

------
delinquentme
It should be noted that this only _indexes_ the journals and does not scrape
the research PDFs themselves.

~~~
ohashi
That was not immediately clear; I wish that had been in the title. I was
curious how you were scraping private PDFs and open sourcing such a thing.

~~~
delinquentme
So scraping the PDFs is clearly a legal grey area, while what is here is a
little more legal. Perhaps someone will take this library and put it to good
use to topple that tower =]

~~~
eli
IANAL, but I would think the wholesale scraping and republishing of entire
copyrighted research papers would be a pretty dark shade of grey, no?

~~~
delinquentme
It's your tax dollars that someone else has locked up. Why don't you tell me.

~~~
eli
"Ethical" is not necessarily the same thing as "legal."

And I think it's rare that there's a paper that is _entirely_ funded by tax
dollars. I agree that all research receiving government grants should be open
access, but it's not quite as clear cut as you seem to imply.

------
dlebauer
A very impressive achievement. Where can I find an example of how to use it?

~~~
delinquentme
OH! Good call! Just uploaded a few edits to the Readme. Note: You'll need
Bundler installed.

<https://github.com/delinquentme/OpSci>

