The Growing Impact of Old Scientific Papers (medium.com/the-physics-arxiv-blog)
184 points by denismars on Nov 15, 2014 | 38 comments



The extent to which the lack of universal open access is impeding human progress seems impossible to measure, but this gives us a tiny hint.


I wonder if the increase in the percentage of citations to older articles is related to the overall access situation getting better, though? Not that long ago, scientific articles more than 10 years old were often quite difficult to access: shunted into "old journal" paper archives somewhere in a basement, from which they had to be specially requested. The library shelves typically held only 5-10 years of recent journal issues, and digitization efforts (at least in science) typically started with only the recent issues as well. To make matters worse, the old issues were rarely full-text indexed, so you might never even discover that something relevant existed in the first place, much less try to access it. Nowadays an article from 1965 might randomly come up in a Google Scholar search, which was not the case in, say, 2002.

A more speculative hypothesis: this relates to the general "flattening" of the literature, as people relate to it more via search and less via subscriptions. People no longer really read journals as periodicals, or even use a specific journal's index as a search aid, but rather search entire archives for things. So it now matters less not only where something was published, but also when it was published.


Access is huge, and improved search algorithms probably help significantly too. Let's take the field of quantum cryptography as an example. How did it get started? This paper (the famous Bennett and Brassard, 1984 protocol, a.k.a. BB84):

http://researcher.watson.ibm.com/researcher/files/us-bennetc...

This is quantum physics. Look at where it was published: proceedings of a CPSC conference in Bangalore. Bangalore! This paper started a field to which entire departments at multiple universities are now devoted. If it hadn't been so radical at the time, it would have made it into a high-impact journal. Instead: IEEE conference proceedings, Bangalore.

I went through grad school in this field. This paper started it all, so, needless to say, I wanted to see an original copy of it. I never found one. These conference proceedings were not widely distributed. People passed around copies of copies of copies and there were, of course, reprints in other publications. I had to borrow a book that included this paper as a chapter from my supervisor, and that wasn't long ago. The only reason I was able to read this field-creating paper was because, even though the peer-reviewed journals that should have jumped at the chance to publish this work balked, other people recognized something good in it. Today, a pdf of the original is linked to by Wikipedia.

How much bold, radical, totally unpublishable (in any respectable journal) work is out there waiting to be dusted off and used to start new fields of study? BB84 could have slipped through the cracks. It's a safe bet that other such work actually did go unnoticed. We can only stand on the backs of those who came before us if we can access their work. I don't know if what we're experiencing today is an information revolution, but I'm willing to bet that, in a few decades' time, we'll be able to look back on the march of scientific progress and see a huge boost in research productivity when our ability to distribute and search through existing research started to catch up to the pace of its production.


"This paper started it all so, needless to say, I wanted to see an original copy of it."

One of the key measures in chemical similarity is the "Tanimoto", better known as the Jaccard similarity. Quoting from http://en.wikipedia.org/wiki/Jaccard_index :

> Various forms of functions described as Tanimoto similarity and Tanimoto distance occur in the literature and on the Internet. Most of these are synonyms for Jaccard similarity and Jaccard distance, but some are mathematically different. Many sources[3] cite an unavailable IBM Technical Report[4] as the seminal reference.

Huh! Worldcat says that a few copies are available. Now I want to see if I can get a copy. :)
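For readers who haven't met it: the measure is just the size of the intersection over the size of the union of two sets. A minimal Python sketch, with hypothetical fingerprint sets standing in for real chemical fingerprints:

    def tanimoto(a: set, b: set) -> float:
        """Jaccard/Tanimoto similarity: |A intersect B| / |A union B|."""
        if not a and not b:
            return 1.0  # convention: two empty sets are treated as identical
        return len(a & b) / len(a | b)

    # Hypothetical molecular fingerprints, as sets of "on" bit positions.
    fp1 = {1, 4, 7, 12, 30}
    fp2 = {1, 4, 9, 12, 31}

    print(tanimoto(fp1, fp2))  # 3 shared bits / 7 total bits, about 0.43

The variants mentioned on the Wikipedia page differ in how they generalize this to weighted or continuous vectors; on plain bit sets they all reduce to the ratio above.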

I recently had to talk to a library archivist to track down what turns out to be a key paper which describes a practical topological encoding of a molecule for a computer. The author published it as a corporate white paper. If you look at the publications from the 1950s and 1960s you'll see people who cite the whitepaper, but as it wasn't part of the standard scientific literature, it wasn't saved, and no one has referenced that work for decades.

As it happens, he's also the person who coined "information retrieval", and was one of the influential founders of IR. Thus, his papers were archived, and are still accessible.

One of the interesting things about hard-to-find papers like this is to see how the citations change over time, as typos get introduced and propagate.

"How much bold, radical, totally unpublishable (in any respectable journal) work is out there waiting to be dusted off and used to start new fields of study?"

No doubt quite a bit. I'm always reminded of the FFT, which was rediscovered by Cooley and Tukey in 1965. It was first invented by Gauss, 160 years earlier, but not well known, "being published only posthumously and in neo-Latin".

As a culture, is it better to investigate more of the history, or spend time learning how to create this sort of thing?


Alternate hypothesis: there's less original research overall because we're not actually building on the new stuff, just continuously almost replicating results, since no one bothers to read the new stuff.


Yes, the speed of innovation grows with the accessibility of knowledge. In the Middle Ages, access to knowledge and science was very limited. Thus innovation was also limited to a few people.

This should make people understand that hiding knowledge away in obnoxiously expensive science publications is one of the biggest hindrances to innovation today.

Scientific knowledge (when it is true) never grows old. Or would you think for one moment that the wheel is an invention so old that we don't need it any more?


Progress on open access is slow. 6 or 7 years ago I wanted to read a couple of papers by Cayley, who died in 1895. These were published in the Transactions of the Royal Society--and paywalled. (Google Books had a few volumes of his collected papers available, but not the one I needed.) A few years ago, the Royal Society finally made papers published before the mid-20th century available for free, which is progress, but considering the public funding for the Royal Society, it's hard to understand why anything they publish is still paywalled for 70 years.


>This should make people understand, that hiding knowledge away in obnoxious expensive science publications is one of the biggest hindrances to innovation today.

I'm still not sure how we get away with publishing items behind paywalls when a large percentage of the funding comes from the NSF/tax dollars. As an undergraduate, I was a co-author on two papers in fairly prestigious astronomical journals. The irony? I can't link my papers to potential employers or interested parties through the journal - I have to upload them to arXiv to show people.


Well, a traditional wheel, made from three wooden boards (|), doesn't see much application today. ;)


But if it had been patented back then by one of today's patent attorneys, and the patent were still valid, it could trash our modern life!


Very interesting result, and it makes you feel really sorry that most of those old papers are behind paywalls. (Incidentally, this is a problem that would not get solved even if everyone switched to open-access venues today.)

Just a comment though: not all citations are equal, so just counting them is quite a crude metric. For instance, a lot of citations in my field (theoretical CS) are "attribution citations" that point the reader to the original paper that introduced a concept or proved a result; and these are not the same as citations of work that you actually extend, or to which you compare.

As in theoretical CS things progress fast and people usually improve upon fairly recent works, my feeling (not backed up by data) is that most citations of work older than 10 years are attribution citations; and for attribution, you don't really need to have read the original paper, you just need to know what it introduced or proved. So maybe the Web is making it easier to look up older papers and cite them, but it doesn't mean that the older paper will influence your research beyond adding a bibliographical entry.

You could say those additional cites may be useful to the reader, but even then, readers unfamiliar with a concept would often do better to find a recent survey of the concept rather than try to understand the original paper that introduced it. (The original paper is usually hard to read because it is old and the language and notation have changed; and people probably didn't yet have a good understanding of the concept when they introduced it.) So those citations are mostly a courtesy.


> Just a comment though: not all citations are equal, so just counting them is quite a crude metric. For instance, a lot of citations in my field (theoretical CS) are "attribution citations" that point the reader to the original paper that introduced a concept or proved a result; and these are not the same as citations of work that you actually extend, or to which you compare.

And these are not the same as citations of other works that your work is about to disprove: "A recent experiment on mice (Doe et al., 2014) completely failed to take into account the variance of X". There we go, Doe just got one more citation.


>For instance, a lot of citations in my field (theoretical CS) are "attribution citations" that point the reader to the original paper that introduced a concept or proved a result

This is also common in modern biology. As the recent "Top 100 Papers" article in Nature News noted [1], the most cited papers in the life sciences are highly biased toward methods development.

The most cited paper is a great case study: the majority of molecular biologists have probably used some variation of the Lowry method, but I'm fairly certain most of them haven't actually read the original paper. However, it remains an "attribution citation" whenever a Lowry-like method is used to quantify protein.

[1] -- http://www.nature.com/news/the-top-100-papers-1.16224


“Our analysis indicates that, in 2013, 36% of citations were to articles that are at least 10 years old and that this fraction has grown 28% since 1990,”

Put another way, in 1990 roughly 28% of citations were to articles at least 10 years old (since 28% × 1.28 ≈ 36%). So even in 1990, a significant share of citations were already to older papers.

I wonder if there is a way to weight individual citations within each work (e.g., by age of the paper cited?) to further strengthen the signal.
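One crude way to do that (a sketch on hypothetical data; real records would come from a bibliographic database): instead of the paper's binary old/new cutoff, weight each citation by the age of the cited work.

    # Hypothetical citation records: (citing_year, cited_year) pairs.
    citations = [(2013, 2012), (2013, 2001), (2013, 1987), (2013, 2010)]

    ages = [citing - cited for citing, cited in citations]

    # The paper's metric: fraction of citations to work at least 10 years old.
    old_fraction = sum(age >= 10 for age in ages) / len(ages)

    # Age-weighted alternative: mean age of cited work, which keeps more
    # signal than the binary cutoff.
    mean_age = sum(ages) / len(ages)

    print(f"old fraction: {old_fraction:.2f}, mean cited age: {mean_age:.1f} years")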

Also, at some point, the fraction of old papers cited should approach 0% (the fields had to start sometime). It would be interesting to reproduce this analysis for older time bins. I presume one would find that the fraction of citations to papers 10 years or older is a monotonically increasing function of time. So one then needs to ask whether this increase is due to better access to articles or simply to there being a larger body of work that is older than 10 years.


Fields generally don't start - they branch off as a specialization of something else. So no, there will be no 0% unless you go really far back.


Right, the aside about "starting" was hyperbole. And I purposefully said "approach 0%", not "be 0%".

As an example, there are astronomy publications stretching over more than 100 years, which could be used in a study like this. Analyzing the citation data in 10-year bins might show whether the increase in citations to "old" papers (> 10 years old at the time of the citation) is due to an increased corpus of papers (the citation fraction should rise with time, likely related to the total number of prior papers in existence) or to improved access to older papers (the change in the past 10–20 years should be significantly greater than in the previous time bins).
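A sketch of what that binned analysis might look like, again on hypothetical records:

    from collections import defaultdict

    # Hypothetical (citing_year, cited_year) records spanning several decades.
    citations = [(1965, 1950), (1972, 1970), (1988, 1975),
                 (1995, 1993), (2005, 1990), (2013, 1998)]

    # Group by the decade of the citing paper; compute the old-citation fraction.
    bins = defaultdict(list)
    for citing, cited in citations:
        bins[citing // 10 * 10].append(citing - cited >= 10)

    for decade in sorted(bins):
        frac = sum(bins[decade]) / len(bins[decade])
        print(f"{decade}s: {frac:.0%} citations to papers >= 10 years old")

With real data, a jump in the most recent bins relative to the long-run trend would point to access/search effects rather than corpus growth.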


Indeed, and arguably, even if you go really far back, even fields like physics and mathematics spawned from philosophy. Physics was originally known as "natural philosophy", for example. Biology came into existence independently in several places, but often it was considered to be a part of theology, a study of God's creation. Hardly any field except philosophy and theology has started as anything other than a specialization of something else.


So here is one definition of culture, emphasis mine:

> "Culture refers to the cumulative deposit of knowledge, experience, beliefs, values, attitudes, meanings, hierarchies, religion, notions of time, roles, spatial relations, concepts of the universe, and material objects and possessions acquired by a group of people in the course of generations through individual and group striving."[1]

With that in mind, this development can only be a good thing. I wonder if it measurably speeds up scientific progress? If time and energy don't have to be spent rediscovering something, they can be spent on building on existing knowledge instead.

[1] http://www.tamu.edu/faculty/choudhury/culture.html


> With that in mind, this development can only be a good thing. I wonder if it measurably speeds up scientific progress? If time and energy don't have to be spent rediscovering something, they can be spent on building on existing knowledge instead.

It will be interesting to see. Though, in order to reap the benefits, time and energy need to be spent reading and familiarizing oneself with the older literature. An unfortunate result of the "publish or perish" culture of science today is the explosion in the number of papers being published. This makes it difficult to keep up with, and digest, the new results that are coming out. Given that, it may be difficult for people to take on the older literature as well.

Certainly one could argue that understanding the old literature first is the correct way to go about it, but one cannot sacrifice an understanding of the recent literature. Papers can get dinged for citing only old results, which can have the unintended side effect of suggesting: a) that the particular topic is dead as a field of study, or b) that the authors are unable to place their work in the context of recent results, which shows a lack of knowledge of the field. So keeping up with the current/recent literature is necessary, and there are only so many hours in a day.

It is certainly no excuse for not knowing the older literature, but it is a realistic constraint on what can be expected.

Edit: I should add, the electronic access to old Astronomy articles has been of great help to me, and has resulted in my finding and reading older papers which are relevant to the work I am doing. It would have been much more difficult when the only way to read the papers was to find a physical copy of the journal.


Publish or perish creates more noise, especially in more mature fields. And those papers have citations, adding to the citation count. That older citations are preferred in such an environment is not very surprising.

There are plenty of dubious reasons to cite newer papers that happen in a competitive publishing environment; e.g. you might try to ingratiate yourself with the PC by citing their most recent papers (and protect yourself from stupid rejections). This further distorts the results, making citations often not a very useful measure of progress and impact (some fields are worse than others in this regard).


Perhaps the quality of recent scientific research has not increased as much as its quantity, so the importance of each individual piece of research is lower. Also, selection bias is at work: like hit music, only the good stuff still gets played, and if there is a sea of lower-quality new work, the old foundational classics will be favoured now that they are more easily found.

This helps with a dilemma I often face - that of buying recent or older works on Amazon or elsewhere. Instead of always buying the most recent publications on a topic, why not buy the axiomatic decades-old works.... indeed, in my collection I often find these display more information density and higher clarity of thought, the latter being inversely proportional to the ease with which a document can be produced.


The alternative explanation to easier access is "great stagnation." Discovery is slowing down, low hanging fruit has been picked, so older papers are relatively more important than they used to be.



>>this fraction has grown 28% since 1990

Is 28% growth over 25 years significant? How did the growth in the last 10 years look? Once you work through the article's algebra, the point seems far more moot.


It could also point to the fact that there are a lot of new papers that don't do anything worth citing.


Oh, they do - they fill publishing quotas.

More papers, more grants

However, if I'm not mistaken, 10 years ago the buzzword was telemedicine, so if your article was about cryptography, you'd just stick in somewhere that it could make telemedicine safer or something.


It could also have been titled "The diminishing impact of modern papers", because every scientist knows they are not worthy.


Or more plausibly, because they are behind a paywall.


Most academic institutions pay for access to the large academic sites. If you do need to read a paper that's still behind a paywall, you usually email someone at a different university who does have access, or even just look up the author's academic page, which will often have a pdf. So I can't see paywalls being an issue. It certainly never was for me or my former colleagues.


As you point out, not all academic institutions have this access. For example, a college which only teaches undergraduates is unlikely to subscribe to the specialist journals, even though a couple of the professors will be interested in those topics. (One common solution is for, say, the chemistry professors to get a personal ACS membership, which gives access to a limited number of ACS journal articles per year.)

There are researchers at companies. There are researchers with no affiliation. Many have an issue with paywalls even though you haven't.

I'm a self-employed software developer in cheminformatics who also does research in the history of the field. I can do this because the local(ish) chemistry library has most of the papers on paper in the basement. It's a public library, supported by my taxes. Otherwise it would be very expensive to get copies of the hundreds of papers I've read or looked through.

As an example, one of the papers from the 1960s has information I wanted in 'figure 2'. Only it turns out that figure 2 was swapped with figure 2 from the next paper in the journal. Both papers were by the same author. I don't know if it's an author error or a layout error by the journal. It would have been much harder to figure that out if I had to ask friends at another site for a copy of the paper in the first place.

So yes, I am a researcher whose research is restricted by the cost of reading the latest journals. My decision to look at the history of the field, rather than the present, is partially influenced by the fact that I have better (read "cheaper") access to the old materials than the new. Interlibrary loan is amazing.


Your response implies that you've not read the thread of comments I was contributing to. I wasn't taking a position on paywalls, nor on current publication/journal practice. Rather, joelthelion tried to argue that paywalls could possibly explain the lack of citations for more recent publications. I gave counterarguments. I'm not sure what your comments on the rights or wrongs of paywalls and journal practices have to do with this.

> There are researchers at companies. There are researchers with no affiliation. Many have an issue with paywalls even though you haven't.

I never made any comment either way. I really don't see how you can make that comment. I just mentioned that researchers I have known find ways to get around paywalls, if they ever happen to encounter one, including even emailing the author of the papers. If you want to publish, you can't submit a paper for review without having demonstrated knowledge of the related literature, and where your work fits within that. Researchers will find a way to read and cite the relevant literature that they need to, and thus paywalls can't be used as an excuse for lower citations for more recent papers.

EDIT: Didn't intend for my post to be harsh.


The thread is "the diminishing impact of modern paper because ... " with the alternative suggestions that a) 'every scientists know they are not worthy', and b) 'because they are behind a paywall'.

You replied to (b), saying that that likely wasn't the case because "I can't see paywalls being an issue. It certainly never was for me or my former colleagues [at academic institutions]."

My reply is two-fold. First, it affects me. I am writing a paper. I have excellent citations from the 1950s to 1980s because all of that is on paper, which is easily ("cheaply") accessible to me. I don't have good citations for the 1990s and onwards because those cost something like $30 each from the publisher. (It's actually cheaper to get most of them through Interlibrary Loan, which has much lower page charges than the publisher.)

Hence, just like you have observations that it doesn't affect your research, I have a counter-observation that it does affect my research.

The second point was to highlight your implicit suggestion that nearly all research is done at academic institutions. While you didn't say it explicitly, your counter-argument is very weak unless you make that assumption. I don't think you meant to make a weak argument. With the same weak argument, I could say that it affects me, and friends of mine who are self-employed or working in small companies, so therefore everyone must be affected by it.

Now, I think it's true that most published research is from academics, though since I work mostly in pharmaceutical chemistry I can say that many publications in my field come from industrial research.

"If you want to publish you can't submit a paper for review without having demonstrated knowledge of the related literature, and where your work fits within that."

Yes, I know that. The cost of doing the literature research has made it very hard for me to publish. Indeed, that's my point. As a self-funded researcher, I can say that science is an expensive endeavor.

"and thus can't be used an excuse for lower citations for more recent papers"

Strictly speaking, that's not true. It could be that more people are publishing historical reviews. It could be a modern trend to include older citations. If you look at papers from the 1950s, you'll see that there might be only a few citations. By comparison, modern papers sometimes seem to use citations as a badge of honor, or as proof that the author is scholarly.

Based on the evidence in the paper, therefore, you cannot draw the conclusion you did. Nor can the paper's authors, since all the paper did was observe a trend that is in alignment with the hypothesis. The next set of tests might be to pick out a selection of papers and ask people now to judge which items need a citation. If the same set of judges say that papers from the 1990s should have had more citations, then this would suggest that there's been a cultural change.

The paper suggests that multiple factors may be involved. It does not identify which of those are the most important. They point out that the chemistry field is one of the few which hasn't changed. This happens to be my area of experience.

Mechanical search of chemical documentation started in the 1940s with punch-card machines. Organizations like CAS, from the American Chemical Society, have long existed to make chemical documentation more searchable. Companies like ISI (now Thomson Reuters) started in the 1960s to computerize entry and keyword-based search, with online searches by the 1980s, though not full-text search.

The lack of change may indicate that the search technology of the 1980s, based on human indexing and keyword searches, is sufficient for the gains seen. To be fair, chemical publishing is more amenable to keyword indexing than, say, Health & Medical Sciences. (I know this from reading some of the ISI publications, dating from when they entered the Health & Medical Sciences field.)

This paper says "between 1990 and 2013, the number of scholarly articles published per year grew close to 3-fold. As a result, there is much more recent work for researchers to learn from, build upon and cite". I've been reading the chemical documentation literature from the 1960s and 1970s. They were talking about exponential growth back then.

For example, I have a chemical information textbook from the early 1970s saying that the doubling time for chemical documentation is about 13 years. I checked the modern numbers, and it still holds.

If the exponential rate of growth is the same then and now, then a paper in 1990 would be just as biased towards recent-decade papers, on a percentage basis, as one written now. That's how exponential curves work.
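To spell out the arithmetic behind that claim (a sketch, assuming a constant doubling time $T$): if the cumulative literature grows as $N(t) \propto 2^{t/T}$, then the share of it older than $a$ years is

$$\frac{N(t-a)}{N(t)} = 2^{-a/T}, \qquad \text{e.g. } 2^{-10/13} \approx 0.59,$$

which is independent of $t$. So with a steady 13-year doubling time, papers more than a decade old make up the same share of the citable pool in 1990 as in 2013.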

So, I'm not all that convinced by the paper. The authors aren't able to distinguish between a cultural change and a change in access to information, or whether it's due to improved search technology (e.g., 1970s tech but with auto-indexing) or to easier access to the literature.


Are you implying scientific papers are only useful to people who work inside academic institutions? Isn't the whole point of research to be made available to people in the private sector who actually use the research to make products?


I think the first is the OP's implication. However, product development is not the only point of research. How many products are possible from, say, landing a probe on a comet? Or modeling galaxy formation?

Therefore, no, product development is not the whole point of research.


Yep, most commenters here misunderstand these findings. 'Newer is better' never applied to scientific papers. Even philosophy was described as "a series of footnotes to Plato". The current 'publish or perish' mantra for scientists diminishes the quality of recent papers even more.


At least for me, the title was a bit confusing. It's not the "History of Science" as a subject of study in and of itself or a specific academic field, but rather the large corpus of existing results and publications that's having an effect.

I find this distinction very important, because it helps separate the immediate process of science--the people, the historical quirks--from the actual results. It's not a perfect division, of course, but I think it's pretty good in most scientific fields and also very important. It helps distance science, the cumulative understanding of our world, from the people who produced it who are, after all, just human. In my view, this is the main goal of the scientific process, so it's just another component of what makes science science.

This is also not to say that the history of science is not interesting or worth studying on its own, merely that it is something largely distinct from the underlying science itself and should ideally be kept that way.


The claimed stat seems more ambiguous to me about which of those possibilities is the case. It's specifically citations to old papers that are increasing - that is, the original old papers, not just their results. Citing old results as "updated" or "collected" in, e.g., a modern textbook or survey article is one thing, and could more plausibly be divorced from the history of science. But if people are really reading (vs. just blindly citing, which may be the case) old scientific papers, that seems more to me like the history of science, in at least a somewhat broader sense, is poking its way into modern science more significantly than it used to.

When you read the original papers, they come along with their whole historical era, all the way down to the quirks of language, notation, problem framing, epistemological assumptions, etc. You even, sometimes, need to know something about the history-of-science as a field to correctly read and interpret a paper from a different era, which you probably want to do if you are really citing it for significant content. (Admittedly, a lot of the citations are probably throwaway cites to old papers the authors haven't themselves read, along the lines of "This was first studied in 1927 [1]". In that case someone still needs to do some history-of-science work to read this 1927 paper and characterize it accurately as "first" to study some problem... but that work might be done by someone else.)


That's a fair point and we've changed the title to try to clarify it.





