

For HN: Open access to scientific papers, a summary of the state of play - michael_nielsen

[EDIT: Available at the following URL with proper hyperlinks, per a suggestion in the comments: http://michaelnielsen.org/blog/open-access-a-short-summary/]

The topic of open access to scientific papers comes up often on Hacker News.

Unfortunately, those discussions sometimes bog down in misinformation and misunderstandings.

Although it's not exactly my area of expertise, it's close --- I've spent the last three years working on open science.

So I thought it might be useful to post a summary of the current state of open access. There's a lot going on, so even though this essay appears lengthy, it's actually a very brief and incomplete summary of what's happening. I have links to further reading at the end.

This is not a small-stakes game. The big scientific publishers are phenomenally profitable. In 2009, Elsevier made a profit of 1.1 billion dollars on revenue of 3.2 billion dollars. That's a margin (and business model) they are very strongly motivated to protect. They're the biggest commercial journal publisher, but the other big publishers are also extremely profitable.

Even not-for-profit societies often make an enormous profit on their journals. In 2004 (the most recent year for which I have figures) the American Chemical Society made a profit of 40 million dollars on revnues of 340 million dollars. Not bad! This money is reinvested in other society activities, including salaries. Top execs receive salaries in the 500k to 1m range (as of 2006; I'm sure it's quite a bit higher now: http://www.chemistry-blog.com/2008/01/02/acs-executive-compensations-for-2006/).

The traditional publishers make money by charging journal subscription fees to libraries. Why they make so much money is a matter for much discussion, but I will merely point out one fact: there are big systematic inefficiencies built into the market. University libraries for the most part pay the subscription fees, but they rely on guidance (and often respond to pressure) from faculty members in deciding which journals to subscribe to. In practice, faculty often have a lot of power in making these decisions, without bearing the costs. And so they can be quite price-insensitive.

The journal publishers have wildly varying (and changing) responses to the notion of open access.

For example, most Springer journals are closed access, but in 2008 Springer bought BioMedCentral, one of the original open access publishers, and by some counts the world's largest. BioMedCentral continues to operate. (More on the deal here: http://www.earlham.edu/~peters/fos/2008/10/springer-buys-biomed-central.html)

[Edit: It has been pointed out to me in email that Springer now uses a hybrid open access model for most of their journals, whereby authors can opt to pay a fee to make their articles open access. If the authors don't pay that fee, the articles remain closed. The other Springer journals, including BioMedCentral, are fully open access.]

Nature Publishing Group is also mostly closed access, but has recently started an open access journal called Scientific Reports, apparently modelled after the (open access) Public Library of Science's journal PLoS One.

It is sometimes stated that big commercial publishers don't allow authors to put free-to-access copies of their papers on the web. In fact, policies vary quite a bit from publisher to publisher. Elsevier and Springer, for example, do allow authors to put copies of their papers on their websites, and into institutional repositories. This doesn't mean that always (or even often) happens, but it's at least in principle possible.

Comments on HN sometimes assume that open access is somehow a new issue, or an issue that no-one has been doing anything about until recently.

This is far from the case. Take a look at the Open Access Newsletters at http://www.earlham.edu/~peters/fos/newsletter/archive.htm and you'll realize that there's a community of people working very, very hard for open access. They're just not necessarily working in ways that are visible to hackers.

Nonetheless, as a result of the efforts of people in the open access movement, a lot of successes have been achieved, and there is a great deal of momentum toward open access.

Here are a few examples of success:

In 2008 the US National Institutes of Health (NIH) --- by far the world's largest funding agency, with a budget of over $30 billion a year --- adopted a policy requiring that all NIH-funded research be made openly accessible within 12 months of publication. See, e.g.: http://www.earlham.edu/~peters/fos/nihfaq.htm

All 7 UK Research Councils have adopted similar open access policies requiring researchers they fund to make their work openly accessible.

Many universities have adopted open access policies. Examples include:

Harvard's Faculty of Arts and Sciences: see http://www.earlham.edu/~peters/fos/2008/02/more-on-imminent-oa-mandate-at-harvard.html

MIT: http://www.earlham.edu/~peters/fos/2009/03/mit-adopts-university-wide-oa-mandate.html

Princeton: http://www.dailyprincetonian.com/2011/09/29/28869/

As a result of policies like these, in years to come you should see more and more freely downloadable papers showing up in search results.

Note that there are a lot of differences of detail in the different policies, and those details can make a big difference to the practical impact of the policies. I won't try to summarize all the nuances here; I'm merely pointing out that there is a lot of institutional movement.

Many more pointers to open access policies may be found at http://roarmap.eprints.org/. That site notes 52 open access policies from grant agencies, and 135 from academic institutions.

There's obviously still a long way to go before there is universal open access to publicly-funded research, but there has been a lot of progress, and a lot of momentum.

One thing that I hope will happen is that the US Federal Research Public Access Act passes. First proposed in 2006 (and again in 2010), this Act would essentially extend the NIH policy to all US Government-funded research (from agencies with budgets over 100 million dollars). My understanding is that at present the Act is tied up in committee.

Despite (or because of) this progress, there is considerable pushback on the open access movement from some scientific publishers. As just one instance, in 2007 some large publishers hired a very aggressive PR firm to wage a campaign to publicly discredit open access: http://www.scientificamerican.com/article.cfm?id=open-access-to-science-un

I will not be surprised if this pushback escalates.

What can hackers do to help out?

One great thing to do is start a startup in this space. Startups like Mendeley, ChemSpider, BioMedCentral, PLoS and others have had a big impact over the past ten or so years, but there are even bigger opportunities for hackers to really redefine scientific publishing. Ideas like text mining, recommender systems, open access to data, automated inference, and many others can be pushed much, much further.

I've written about this in the following essay: http://michaelnielsen.org/blog/is-scientific-publishing-about-to-be-disrupted/. Many of those ideas are developed in much greater depth in my book on open science (http://michaelnielsen.org/blog/reinventing-discovery/).

For less technical (and less time-consuming!) ways of getting involved, you may want to subscribe to the RSS feed at: http://www.taxpayeraccess.org/action/index.shtml. This organization (the Alliance for Taxpayer Access) was crucial in lobbying for the NIH open access policy, and they're involved in lobbying for the Federal Research Public Access Act, as well as other open access efforts.

If you want to know more, the best single resource I know is Peter Suber's website: http://www.earlham.edu/~peters/hometoc.htm.

Suber has, for example, written an extremely informative introduction to open access (http://www.earlham.edu/~peters/fos/overview.htm). His still-active Open Access Newsletter (http://www.earlham.edu/~peters/fos/newsletter/archive.htm) is a goldmine of information, as is his (no longer active) blog (http://www.earlham.edu/~peters/fos/fosblog.html). He also runs the open access tracking project: http://twitter.com/#!/OATP.

If you got this far, thanks for reading! Corrections are welcome.
======
_delirium
What do you think about independent action from the editorial side? The number
is still fairly small, but there have been several successful instances of a
commercial journal's editorial board resigning _en masse_ and setting up an
open-access journal intended to replace it.

A few examples:

<http://www.sigir.org/forum/F2001/sigirFall01Letters.html>

<http://www.math.columbia.edu/~woit/wordpress/?p=442>

<http://www.math.columbia.edu/~woit/wordpress/?p=581>

_JMLR_, at least, has gone on to successfully eclipse the journal, _Machine
Learning_, that it was intended to replace (about 3x the impact factor).

I do notice that the examples I find are all in computer science and
mathematics, and the new journals have basically zero budgets (and like it
that way) and don't charge authors any fees. Is this because in CS/math a
common expectation is that the author can use LaTeX, produce their own
figures, and submit a print-ready PDF, whereas in other fields authors expect
significant formatting work to be done by the journal?

~~~
michael_nielsen
At the moment the impact of these actions seems significantly smaller than
policy. But perhaps it's a useful complement, showing policy-makers that many
faculty support open access.

As regards your CS/math question, it's a good one, and I don't know enough
about the cost structure of the journals in question to answer. I suspect from
conversations with journal editors that it's not a whole lot cheaper to
produce CS/math journals than other types, but that's just a general
impression, not a certainty.

~~~
_delirium
_I suspect from conversations with journal editors that it's not a whole lot
cheaper to produce CS/math journals than other types, but that's just a
general impression, not a certainty._

There seem to at least be _some_ that are much cheaper, in the sense of having
budgets literally approaching $0. From some brief chats with _JMLR_ editors,
they're in that camp: they run on donated server space from MIT and volunteer
editors. No staff, no office, no recurring expenses, yet they're still one of
the top journals in CS. That seems like a significantly different cost
structure from something like _PLoS ONE_, perhaps closer to the arXiv model
turned into a journal.

------
chaosprophet
I would suggest that you put this up as a blog post, and submit the link here.
It would make for a much better reading experience, especially since your post
is quite link-heavy.

~~~
michael_nielsen
As a blog post:

<http://michaelnielsen.org/blog/open-access-a-short-summary/>

I've now edited the header of the HN submission to point to this URL.

Thanks for the suggestion --- I should probably have done that initially. My
thinking was that I didn't want to put it on my blog, since it's meant
specifically for HN. But with a little framing, that issue takes care of
itself. And, as you say, the links work a lot better in a blog post.

~~~
RK
The Scientific American link in the blog post seems broken.

Also, off topic, I never got the email I signed up for about the release of
your new book. Did you ever send those?

~~~
michael_nielsen
Thanks for pointing this out - I'll fix it.

(On the email: yes, I sent them. Don't know what happened to yours, sorry
about that. But the book is out, and available at places like Amazon.)

------
rwl
Do you have any sense of the state of open access publishing outside the
sciences? I am a graduate student in the humanities, and it sounds like many
of these mechanisms aren't designed to extend beyond the sciences. Most
humanities research is not done with support from NIH grants, for example---at
least, not directly. Humanities faculty with smaller research budgets are less
likely to voluntarily pay a fee to make their work open access through
Elsevier and Springer. And I don't know of open access humanities journals on
the level of e.g. PLoS One (though of course that doesn't mean they don't
exist!).

So, it sounds like the only mechanisms you've mentioned that might also apply
to the humanities are institutional policies like Princeton's, or the Federal
Research Public Access Act. Princeton and other top tier research universities
are in a unique position of leverage over journals, because few journals can
afford not to accept research from faculty at these institutions, but most
researchers are not working at institutions with that kind of leverage.

Does the Federal Research Public Access Act have language that extends to
humanities research? A lot of humanities research, I think, is _indirectly_
funded through public funds, since universities take a cut of incoming grants
for the sciences, and then redistribute that money to humanities departments.
The right language in this bill could therefore extend open access
requirements to humanities research by requiring that whatever research the
money ends up supporting be published open access. Do you have any idea
whether language like this is in the bill?

(I also worry that, if it isn't, even science researchers could start using
the "university cut loophole": the university takes the entire incoming grant
and redistributes it to science researchers without open access requirements
attached. Is there any danger of this?)

~~~
rwl
For those interested, there is a ton of great information on Peter Suber's
page about open access in the humanities:

<http://www.earlham.edu/~peters/writing/apa.htm>

But I don't see any suggestions there for extending open access requirements
to the humanities by bootstrapping off open access requirements for publicly-
funded science research via grant money that ends up in humanities
departments.

~~~
michael_nielsen
I don't have a lot to add. It's not an issue I've spent a lot of time on (not
because it's not important, just because of limits on time).

------
jessriedel
What prevents other fields from doing the same thing that the physics
community has done? Nearly all published physics articles since 1992 have
been released freely to the public on the ArXiv.

The only thing I can think of is that physicists are more likely than most
other academics to be proficient with LaTeX, and therefore to have a
presentable (though often not perfect) version of their paper before an editor
ever touches it.

~~~
zerostar07
Or, arXiv could just add a life sciences section. AFAIK everyone prepares
appropriate preprints, so that shouldn't be an issue, it's just that the
attitude is missing, and, while many scientists now make their preprints
available, there's no coordinated movement or repository.

~~~
jessriedel
Oh, I think the ArXiv would gladly do this if even a little momentum built.
They have added mathematics and finance in the past.

------
jcr
> In 2009, Elsevier made a profit of 1.1 billion dollars on revenue of 3.2
> billion dollars.

...

> the American Chemical Society made a profit of 40 million dollars on
> revnues [SIC] of 340 million dollars.

Please pardon my ignorance on the topic, but there's something at work here
that I simply don't grasp: why does it cost so much to produce a journal?
(i.e. revenue minus profit?)

I'm a bit nervous to ask the above question here since it would potentially
lead to yet another useless political debate on whether or not "top execs" are
worth what they're paid. If we could skip that part of the discussion, it
would be appreciated. I'm mostly interested in what the real costs are, not
whether or not they are justified.
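
For reference, the "revenue minus profit" arithmetic can be sketched as below. This is only a back-of-the-envelope calculation using the two sets of figures quoted from the post; the function names are made up for illustration, and the result lumps together every expense the organization has, not just the cost of producing journals.

```python
# Profit and revenue as quoted in the post, in millions of dollars.
FIGURES = {
    "Elsevier (2009)": {"profit": 1100, "revenue": 3200},
    "American Chemical Society (2004)": {"profit": 40, "revenue": 340},
}


def implied_costs(profit, revenue):
    """Everything that was spent: revenue minus profit."""
    return revenue - profit


def margin(profit, revenue):
    """Profit as a fraction of revenue."""
    return profit / revenue


for name, f in FIGURES.items():
    costs = implied_costs(**f)
    print(f"{name}: implied costs ~{costs}M, margin ~{margin(**f):.0%}")
```

On those figures the implied costs are roughly 2.1 billion dollars for Elsevier and 300 million dollars for the ACS, which is the part the question is asking about.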

------
thwest
Have you seen much pressure for data driven science to reproduce code+data
alongside the pdf?

~~~
michael_nielsen
There's more and more pressure for that kind of thing, but it's still early
days in most fields. For an example of a forward-thinking policy, consider the
Wellcome Trust:
<http://www.wellcome.ac.uk/About-us/Policy/Policy-and-position-statements/WTX035043.htm>

~~~
john_horton
I think working on the funding agencies (i.e., getting them to adopt sensible
policies) is the right strategy. For the individual researcher, the incentives
for code & data sharing are pretty limited right now, at least until the
culture changes.

On the technical side of code/data sharing--I think one obstacle (at least in
the fields I'm familiar with) is that many researchers put together papers in a
way that makes reproducibility needlessly hard. If you do all your stats in
something like Stata or SPSS, then paste tables/figures into an MS Word
document (which is passed around among colleagues), finding your own errors is
hard enough---never mind some third party trying to reproduce your results.
If instead, you use tools like Sweave & script the data analysis & paper
assembly process (ideally with version control), reproduction/sharing becomes
much simpler.
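
A rough sketch of what "scripting the analysis and paper assembly" can look like. Sweave itself is R plus LaTeX; this is the same idea in plain Python, with made-up data and hypothetical function names, not a recommendation of any particular tool.

```python
import csv
import io
import statistics

# Made-up measurements standing in for a raw data file kept under version control.
RAW_CSV = """group,value
control,4.1
control,3.9
control,4.3
treated,5.0
treated,5.4
treated,5.2
"""


def load_rows(text):
    """Parse CSV text into a list of (group, value) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["group"], float(row["value"])) for row in reader]


def summarize(rows):
    """Return {group: (n, mean)} computed directly from the raw rows."""
    groups = {}
    for group, value in rows:
        groups.setdefault(group, []).append(value)
    return {g: (len(v), statistics.mean(v)) for g, v in sorted(groups.items())}


def render_table(summary):
    """Format the summary as a plain-text table for inclusion in the paper."""
    lines = [f"{'group':<10}{'n':>3}{'mean':>8}"]
    for group, (n, mean) in summary.items():
        lines.append(f"{group:<10}{n:>3}{mean:>8.2f}")
    return "\n".join(lines)


def build_paper(text):
    """Regenerate the results section from the raw data in one step."""
    return "Results\n\n" + render_table(summarize(load_rows(text))) + "\n"


if __name__ == "__main__":
    print(build_paper(RAW_CSV))
```

The point is that no number in the output is ever typed or pasted by hand: change the data, rerun the script, and the table in the paper changes with it.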

------
curtrice
Thanks for this good resource -- and the link to your book, which looks very
promising. Two comments, or rather references to ongoing work. First, we have
a conference focused on OA every fall here at the University of Tromsø (in
English), which has really become a big event. Maybe some of your readers will
join us here in a few weeks!
<http://www.ub.uit.no/baser/ocs/index.php/Munin/MC6> Second, we have a
national organization in Norway, Current Research System in Norway
(cristin.no), whose board I chair, with responsibility for (i)
documenting research activity (especially publication), (ii) negotiating
national licenses, and (iii) pushing forward on OA work. We're relatively
newly into it, but it could be promising. I blog now and then about OA stuff,
too: <http://curtrice.wordpress.com/category/open-access/>

------
hollerith
Thanks for your work on open access, Michael. Now allow me to pick at a nit:

If you put the same text on 2 different web sites, both instances have lower
PageRank than a single instance would.

Also, it is less than optimal to put URLs where they will not be turned into
clickable links.

------
Vivtek
Is there _nothing_ cool that Peter Suber hasn't gotten into at some point?

------
zerostar07
Can anyone suggest web apps and services related to papers, finding preprints,
summaries, Q&A, etc.? I'll start with <http://pubcentral.net/>

------
tylerneylon
How easy would it be for an open access startup to be profitable?

I'd like to see the cause supported and this sounds like a fun way -- but it's
not clear to me how that would work as a business.

~~~
michael_nielsen
I'm not sure it's easy for many startups in any space to become profitable.

However, it is possible.

One open access startup that has met this challenge and come to profitability
is the Public Library of Science:
<http://blogs.plos.org/plos/2011/07/2010-plos-progress-update/> They
are now making a great deal of money, which is presumably why organizations
such as Nature Publishing Group are looking at replicating PLoS One.

There is a lot of online discussion of business models for open access
startups. Here's one useful guide:
<http://www.arl.org/sparc/publisher/incomemodels/>

~~~
Thrymr
PLoS is a non-profit: <http://www.plos.org/about/what-is-plos/>

~~~
michael_nielsen
My understanding is that they are making a great deal of money, which they
reinvest in the organization. The linked post has a link to a document which
includes their balance sheet. You are, of course, correct that they're not-
for-profit.

