For HN: Open access to scientific papers, a summary of the state of play
102 points by michael_nielsen on Oct 30, 2011 | 29 comments
[EDIT: Available at the following URL with proper hyperlinks, per a suggestion in the comments: http://michaelnielsen.org/blog/open-access-a-short-summary/]

The topic of open access to scientific papers comes up often on Hacker News.

Unfortunately, those discussions sometimes bog down in misinformation and misunderstandings.

Although it's not exactly my area of expertise, it's close --- I've spent the last three years working on open science.

So I thought it might be useful to post a summary of the current state of open access. There's a lot going on, so even though this essay appears lengthy, it's actually a very brief and incomplete summary of what's happening. I have links to further reading at the end.

This is not a small stakes game. The big scientific publishers are phenomenally profitable. In 2009, Elsevier made a profit of 1.1 billion dollars on revenue of 3.2 billion dollars. That's a margin (and business model) they are very strongly motivated to protect. They're the biggest commercial journal publisher, but the other big publishers are also extremely profitable.

Even not-for-profit societies often make an enormous profit on their journals. In 2004 (the most recent year for which I have figures) the American Chemical Society made a profit of 40 million dollars on revenues of 340 million dollars. Not bad! This money is reinvested in other society activities, including salaries. Top execs receive salaries in the 500k to 1m range (as of 2006; I'm sure it's quite a bit higher now: http://www.chemistry-blog.com/2008/01/02/acs-executive-compensations-for-2006/).

The traditional publishers make money by charging journal subscription fees to libraries. Why they make so much money is a matter for much discussion, but I will merely point out one fact: there are big systematic inefficiencies built into the market. University libraries for the most part pay the subscription fees, but they rely on guidance (and often respond to pressure) from faculty members in deciding what journals to subscribe to. In practice, faculty often have a lot of power in making these decisions, without bearing the costs. And so they can be quite price-insensitive.

The journal publishers have wildly varying (and changing) responses to the notion of open access.

For example, most Springer journals are closed access, but in 2008 Springer bought BioMedCentral, one of the original open access publishers, and by some counts the world's largest. BioMedCentral continues to operate under Springer. (More on the deal here: http://www.earlham.edu/~peters/fos/2008/10/springer-buys-biomed-central.html)

[Edit: It has been pointed out to me in email that Springer now uses a hybrid open access model for most of their journals, whereby authors can opt to pay a fee to make their articles open access. If the authors don't pay that fee, the articles remain closed. The other Springer journals, including BioMedCentral, are fully open access.]

Nature Publishing Group is also mostly closed access, but has recently started an open access journal called Scientific Reports, apparently modelled after the (open access) Public Library of Science's journal PLoS One.

It is sometimes stated that big commercial publishers don't allow authors to put free-to-access copies of their papers on the web. In fact, policies vary quite a bit from publisher to publisher. Elsevier and Springer, for example, do allow authors to put copies of their papers on their websites, and into institutional repositories. This doesn't mean that it always (or even often) happens, but it's at least in principle possible.

Comments on HN sometimes assume that open access is somehow a new issue, or an issue that no-one has been doing anything about until recently.

This is far from the case. Take a look at the Open Access Newsletters at http://www.earlham.edu/~peters/fos/newsletter/archive.htm and you'll realize that there's a community of people working very, very hard for open access. They're just not necessarily working in ways that are visible to hackers.

Nonetheless, as a result of the efforts of people in the open access movement, a lot of successes have been achieved, and there is a great deal of momentum toward open access.

Here are a few examples of success:

In 2008 the US National Institutes of Health (NIH) --- by far the world's largest funding agency, with a budget of more than 30 billion dollars a year --- adopted a policy requiring that all NIH-funded research be made openly accessible within 12 months of publication. See, e.g.: http://www.earlham.edu/~peters/fos/nihfaq.htm

All 7 UK Research Councils have adopted similar open access policies requiring researchers they fund to make their work openly accessible.

Many universities have adopted open access policies. Examples include:

Harvard's Faculty of Arts and Sciences: see http://www.earlham.edu/~peters/fos/2008/02/more-on-imminent-oa-mandate-at-harvard.html

MIT: http://www.earlham.edu/~peters/fos/2009/03/mit-adopts-university-wide-oa-mandate.html

Princeton: http://www.dailyprincetonian.com/2011/09/29/28869/

As a result of policies like these, in years to come you should see more and more freely downloadable papers showing up in search results.

Note that there are a lot of differences of detail in the different policies, and those details can make a big difference to the practical impact of the policies. I won't try to summarize all the nuances here; I'm merely pointing out that there is a lot of institutional movement.

Many more pointers to open access policies may be found at http://roarmap.eprints.org/. That site notes 52 open access policies from grant agencies, and 135 from academic institutions.

There's obviously still a long way to go before there is universal open access to publicly-funded research, but there has been a lot of progress, and a lot of momentum.

One thing that I hope will happen is that the US Federal Research Public Access Act passes. First proposed in 2006 (and again in 2010), this Act would essentially extend the NIH policy to all US Government-funded research (from agencies with budgets over 100 million dollars). My understanding is that at present the Act is tied up in committee.

Despite (or because of) this progress, there is considerable pushback on the open access movement from some scientific publishers. As just one instance, in 2007 some large publishers hired a very aggressive PR firm to wage a campaign to publicly discredit open access: http://www.scientificamerican.com/article.cfm?id=open-access-to-science-un

I will not be surprised if this pushback escalates.

What can hackers do to help out?

One great thing to do is start a startup in this space. Startups like Mendeley, ChemSpider, BioMedCentral, PLoS and others have had a big impact over the past ten or so years, but there are even bigger opportunities for hackers to really redefine scientific publishing. Ideas like text mining, recommender systems, open access to data, automated inference, and many others can be pushed much, much further.

I've written about this in the following essay: http://michaelnielsen.org/blog/is-scientific-publishing-about-to-be-disrupted/. Many of those ideas are developed in much greater depth in my book on open science (http://michaelnielsen.org/blog/reinventing-discovery/).

For less technical (and less time-consuming!) ways of getting involved, you may want to subscribe to the RSS feed at: http://www.taxpayeraccess.org/action/index.shtml. This organization (the Alliance for Taxpayer Access) was crucial in lobbying for the NIH open access policy, and they're involved in lobbying for the Federal Research Public Access Act, as well as other open access efforts.

If you want to know more, the best single resource I know is Peter Suber's website: http://www.earlham.edu/~peters/hometoc.htm.

Suber has, for example, written an extremely informative introduction to open access (http://www.earlham.edu/~peters/fos/overview.htm). His still-active Open Access Newsletter (http://www.earlham.edu/~peters/fos/newsletter/archive.htm) is a goldmine of information, as is his (no longer active) blog (http://www.earlham.edu/~peters/fos/fosblog.html). He also runs the open access tracking project: http://twitter.com/#!/OATP.

If you got this far, thanks for reading! Corrections are welcome.

What do you think about independent action from the editorial side? The number is still fairly small, but there have been several successful instances of a commercial journal's editorial board resigning en masse and setting up an open-access journal intended to replace it.

A few examples:




JMLR, at least, has gone on to successfully eclipse the journal, Machine Learning, that it was intended to replace (about 3x the impact factor).

I do notice that the examples I find are all in computer science and mathematics, and the new journals have basically zero budgets (and like it that way) and don't charge authors any fees. Is this because in CS/math a common expectation is that the author can use LaTeX, produce their own figures, and submit a print-ready PDF, whereas in other fields authors expect significant formatting work to be done by the journal?

At the moment the impact of these actions seems significantly smaller than policy. But perhaps it's a useful complement, showing policy-makers that many faculty support open access.

As regards your CS/math question, it's a good one, and I don't know enough about the cost structure of the journals in question to answer. I suspect from conversations with journal editors that it's not a whole lot cheaper to produce CS/math journals than other types, but that's just a general impression, not a certainty.

> I suspect from conversations with journal editors that it's not a whole lot cheaper to produce CS/math journals than other types, but that's just a general impression, not a certainty.

There seem to at least be some that are much cheaper, in the sense of having budgets literally approaching $0. From some brief chats with JMLR editors, they're in that camp: they run on donated server space from MIT and volunteer editors. No staff, no office, no recurring expenses, yet they're still one of the top journals in CS. That seems like a significantly different cost structure from something like PLoS ONE, perhaps closer to the arXiv model turned into a journal.

I would suggest that you put this up as a blog post, and submit the link here. It would make for a much better reading experience, especially since your post is quite link-heavy.

As a blog post:


I've now edited the header of the HN submission to point to this URL.

Thanks for the suggestion --- I should probably have done that initially. My thinking was that I didn't want to put it on my blog, since it's meant specifically for HN. But with a little framing, that issue takes care of itself. And, as you say, the links work a lot better in a blog post.

The Scientific American link in the blog post seems broken.

Also, off topic, I never got the email I signed up for about the release of your new book. Did you ever send those?

Thanks for pointing this out - I'll fix it.

(On the email: yes, I sent them. Don't know what happened to yours, sorry about that. But the book is out, and available at places like Amazon.)

What prevents other fields from doing what the physics community has done? Nearly all published physics articles since 1992 have been released freely to the public on the arXiv.

The only thing I can think of is that physicists are more likely than most other academics to be proficient with LaTeX, and therefore to have a presentable (though often not perfect) version of their paper before an editor ever touches it.

Or the arXiv could just add a life sciences section. AFAIK everyone prepares appropriate preprints, so that shouldn't be an issue. It's just that the attitude is missing: while many scientists now make their preprints available, there's no coordinated movement or repository.

Oh, I think the ArXiv would gladly do this if even a little momentum built. They have added mathematics and finance in the past.

Do you have any sense of the state of open access publishing outside the sciences? I am a graduate student in the humanities, and it sounds like many of these mechanisms aren't designed to extend beyond the sciences. Most humanities research is not done with support from NIH grants, for example---at least, not directly. Humanities faculty with smaller research budgets are less likely to voluntarily pay a fee to make their work open access through Elsevier and Springer. And I don't know of open access humanities journals on the level of e.g. PLoS One (though of course that doesn't mean they don't exist!).

So, it sounds like the only mechanisms you've mentioned that might also apply to the humanities are institutional policies like Princeton's, or the Federal Research Public Access Act. Princeton and other top tier research universities are in a unique position of leverage over journals, because few journals can afford not to accept research from faculty at these institutions, but most researchers are not working at institutions with that kind of leverage.

Does the Federal Research Public Access Act have language that extends to humanities research? A lot of humanities research, I think, is indirectly funded through public funds, since universities take a cut of incoming grants for the sciences, and then redistribute that money to humanities departments. The right language in this bill could therefore extend open access requirements to humanities research by requiring that whatever research the money ends up supporting be published open access. Do you have any idea whether language like this is in the bill?

(I also worry that, if it isn't, even science researchers could start using the "university cut loophole": the university takes the entire incoming grant and redistributes it to science researchers without open access requirements attached. Is there any danger of this?)

For those interested, there is a ton of great information on Peter Suber's page about open access in the humanities:


But I don't see any suggestions there for extending open access requirements to the humanities by bootstrapping off open access requirements for publicly-funded science research via grant money that ends up in humanities departments.

I don't have a lot to add. It's not an issue I've spent a lot of time on (not because it's not important, just because of limits on time).

> In 2009, Elsevier made a profit of 1.1 billion dollars on revenue of 3.2 billion dollars.


> the American Chemical Society made a profit of 40 million dollars on revenues of 340 million dollars.

Please pardon my ignorance on the topic, but there's something at work here that I simply don't grasp: why does it cost so much to produce a journal (i.e., revenue minus profit)?

I'm a bit nervous to ask the above question here since it would potentially lead to yet another useless political debate on whether or not "top execs" are worth what they're paid. If we could skip that part of the discussion, it would be appreciated. I'm mostly interested in what the real costs are, not whether or not they are justified.
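For reference, the "revenue minus profit" figure asked about here follows directly from the two numbers quoted above. A minimal Python sketch, using only the figures quoted in the post (the dictionary layout and function name are illustrative, not from the original):

```python
# Figures quoted in the post, in millions of US dollars.
figures = {
    "Elsevier (2009)": {"revenue": 3200, "profit": 1100},
    "American Chemical Society (2004)": {"revenue": 340, "profit": 40},
}


def implied_costs_and_margin(revenue, profit):
    """Return (costs, margin): costs = revenue - profit, margin = profit / revenue."""
    return revenue - profit, profit / revenue


for name, f in figures.items():
    costs, margin = implied_costs_and_margin(f["revenue"], f["profit"])
    print(f"{name}: implied costs {costs}M, margin {margin:.1%}")
```

This only restates the arithmetic; it says nothing about what the implied costs actually consist of.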

Have you seen much pressure for data-driven science to release code and data alongside the PDF?

There's more and more pressure for that kind of thing, but it's still early days in most fields. For an example of a forward-thinking policy, consider the Wellcome Trust: http://www.wellcome.ac.uk/About-us/Policy/Policy-and-positio...

I think working on the funding agencies (i.e., getting them to adopt sensible policies) is the right strategy. For the individual researcher, the incentives for code & data sharing are pretty limited right now, at least until the culture changes.

On the technical side of code/data sharing--I think one obstacle (at least in the fields I'm familiar with) is that many researchers put together papers in a way that makes reproducibility needlessly hard. If you do all your stats in something like Stata or SPSS, then paste tables/figures into an MS Word document (which is passed around among colleagues), finding your own errors is hard enough---never mind some third party trying to reproduce your results. If instead you use tools like Sweave and script the data analysis and paper assembly process (ideally with version control), reproduction/sharing becomes much simpler.
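As a rough illustration of the scripted approach (this is not Sweave itself, just the same idea sketched in Python, with made-up data and hypothetical names), the numbers in the text are computed and inserted by the script rather than pasted in by hand:

```python
import statistics

# Hypothetical measurements; in practice these would be read from a
# data file kept under version control alongside this script.
samples = [2.0, 4.0, 6.0, 8.0]


def build_report(values):
    """Compute summary statistics and render them into the report text."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    return (
        f"We collected n = {len(values)} samples.\n"
        f"Mean = {mean:.2f}, sample standard deviation = {stdev:.2f}.\n"
    )


report = build_report(samples)
print(report)
```

Rerunning the script after the data changes regenerates every number in the text, which is the property that makes third-party reproduction easier.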

I like Matt Might's take on that particular issue: http://matt.might.net/articles/crapl/

An open source license for academics has additional needs: (1) it should require that source and modifications used to validate scientific claims be released with those claims; and (2) more importantly, it should absolve authors of shame, embarrassment and ridicule for ugly code.

I think the difference is that, from the researcher's standpoint, there is no downside other than cost to having your article PDFs open access. (In fact, there is a slight boost; if your work is more easily accessible, you're more likely to get cited.) On the other hand, releasing your data and code requires quite a bit of effort in curating it to keep from exposing yourself to additional criticism (warranted or not). On net, it's probably extremely beneficial for society to have researchers do this, but right now they won't because they don't have any incentive.

Thanks for this good resource -- and the link to your book, which looks very promising. Two comments, or rather references to ongoing work. First, we have a conference focused on OA every fall here at the University of Tromsø (in English), which has really become a big event. Maybe some of your readers will join us here in a few weeks! http://www.ub.uit.no/baser/ocs/index.php/Munin/MC6 Second, we have a national organization in Norway, Current Research System in Norway (cristin.no), whose board I chair, with responsibility for (i) documenting research activity (especially publication), (ii) negotiating national licenses, and (iii) pushing forward on OA work. We're relatively new to it, but it could be promising. I blog now and then about OA stuff, too: http://curtrice.wordpress.com/category/open-access/

Thanks for your work on open access, Michael. Now allow me to pick at a nit:

If you put the same text on 2 different web sites, both instances have lower PageRank than a single instance would.

Also, it is less than optimal to put URLs where they will not be turned into clickable links.

Is there anything cool that Peter Suber hasn't gotten into at some point?

Can anyone suggest web apps and services related to papers: finding preprints, summaries, Q&A, etc.? I'll start with http://pubcentral.net/

How easy would it be for an open access startup to be profitable?

I'd like to see the cause supported and this sounds like a fun way -- but it's not clear to me how that would work as a business.

I think it would be very, very hard. You would need to provide something that existing for-profits don't do (open access), while also competing with non-profits that have the same goal.

The problem is that scientists who care about open access are not likely to move their work to another for-profit enterprise. There are plenty of non-profit entities, from universities to professional societies to PLoS to the government, who have an interest in making research open, and don't need a profit (though they do need operating revenue). It's a tough sell to make your for-profit business the one that a scientist would submit papers to, or review papers for, when they could choose either an established for-profit with higher impact, or a non-profit.

I'm not sure it's easy for many startups in any space to become profitable.

However, it is possible.

One open access startup that has met this challenge and come to profitability is the Public Library of Science: http://blogs.plos.org/plos/2011/07/2010-plos-progress-update... They are now making a great deal of money, which is presumably why organizations such as Nature Publishing Group are looking at replicating PLoS One.

There is a lot of online discussion of business models for open access startups. Here's one useful guide: http://www.arl.org/sparc/publisher/incomemodels/

My understanding is that they are making a great deal of money, which they reinvest in the organization. The linked post has a link to a document which includes their balance sheet. You are, of course, correct that they're not-for-profit.

Unfortunately I think that academicians (of which I am one) are very, very conservative in this regard. The problem is that they have no personal incentives, and powerful disincentives, to cut costs.

If it were otherwise, how the hell would Elsevier still exist?
