

U.S. Energy Department to make researchers' papers free - candu
http://news.sciencemag.org/policy/2014/08/u-s-energy-department-make-researchers-papers-free

======
un_publishable
> The U.S. Department of Energy (DOE) today unveiled … a Web portal that will
> link to full-text papers a year after they're published.
>
> Open-access advocates such as University of California, Berkeley, biologist
> Michael Eisen slammed CHORUS when publishers announced the program last
> year. They prefer a full-text government archive like PubMed Central so it is
> possible to "text mine," or search across the entire body of papers. “Under
> this [DOE] plan, the public's ability to download, text/data mine, and
> digitally analyze these articles is severely limited,” SPARC’s Joseph
> agrees.
>
> But Frederick Dylla … says there is little demand for text mining. He says
> AIP has never gotten a request for its more than 1 million articles;
> Elsevier, the publishing giant, gets only about six requests a year, he
> says. Text mining journal articles is “a field that's just beginning,” he
> says.

12 months is a joke on the timescale of scientific research; most papers have
already gone through up to a year of prep time before publication. I would
like to see a plan that isn't just paying lip service to the ideals of
open access. And Eisen is right about data mining; most journals are terrible
in that regard. Even the wonderful arXiv.org doesn't provide
citation/reference metadata in its API, and it is a groundbreaking leader of
the movement. I had to write a scraper to map out an arXiv citation network,
and there are only a few subfields with enough info to do that. Maybe
scientists would make more requests if the APIs were better and the citation
information wasn’t copyrighted or obfuscated.

~~~
sitkack
I have been meaning to set up an arXiv mirror (they have instructions on how
to do this) and then run all the papers through Elasticsearch.

I'd love to run NN on the abstracts or similarity algos on the equations. It
would be fun to do SIFT on the equations and then some deep learning to detect
branches of mathematics. Or extract molecular symbols in figures.

------
batbomb
Maybe not so ironic given that the first website in the United States was a
DOE project: a DBMS for papers (SPIRES) at SLAC. arXiv.org was born at about
the same time out of LANL. You used to actually get your papers faxed to you
from arXiv.org.

------
erikb
Well, many people might be disappointed at the result. But all things
government are complicated, long-winded, and are finally decided by people who
don't even think they have a clue about it (I hope). Therefore I think every
small step is a good step and should be appreciated.

At my university, people complained a lot about the first online access to
papers as ebooks and PDFs from inside the school's VPN. But nowadays students
use it a lot for their studies, thesis work, and research, although it's not
much more accessible than 5 years ago. I think the people who really made this
happen have earned some kudos!

~~~
capnrefsmmat
But the government _has_ done a similar scheme which is much more radical:
PubMed Central, which hosts the full text of NIH-funded articles. It doesn't
just link to publisher websites, and the article text is available for
download in easily-parsed formats (instead of just PDFs).

DOE could have adopted that model, and probably even much of the code, but
they decided against it.

