When my wife was working on her PhD in linguistics, she ran across a highly referenced source that she could not find. She followed the chain of references and reached out to the author who had cited the source the first time. He replied that the citation was a typo in the original article and gave her the correct source reference. He said that he was chagrined to have seen this typo cited in scores of papers with apparently no one bothering to actually look it up.
Amazing story! I assume this is why you are supposed to always cite the paper that you have actually read, even if it in turn cites another paper (if you can't access that source yourself). If more people had actually stuck to that rule, finding the problem would have been a breeze instead of a serious investigation.
I had remembered that the preference was to cite the original source as much as possible (which is why the reference count for the Lowry protein assay paper is reportedly over 300k). The article linked here indirectly mentions this preference:
> The problem in this case is that I omit a piece of information: the fact that Larsson’s statement is based on an entirely different source, namely Hamblin (1981). In other words, I am referring to an article that I very well know is a secondary source, and thus hide from my readers the fact that Larsson actually just passed on information published by Hamblin 14 years earlier. A good reason for avoiding the use of secondary sources in academia is that messages that pass through several links have the unfortunate tendency to become modified or altered along the way, as in the whisper game.
Yes, you cite the original source as much as possible, if you can verify that source by actually reading it. If you can't, you cite the secondary source as citing the primary source. (Think "Secundus writes that 'Primus writes that ...'", or "Secundus cites Primus on ...".)
Oh sure. But that shouldn't happen in science (journal articles). Worst-case scenario, you should be re-verifying the original finding if you can't find the original article.
What is the source for being "supposed to" do that?
I have always practiced and encouraged the opposite: to cite the original source, even in the unfortunate cases where it can't be read (there's an example of mine in another thread on this post).
It's a matter of giving credit where credit is due. It wouldn't make sense, e.g., for a seminal paper to not be cited because a later survey paper by different authors explained the findings better.
>It's a matter of giving credit where credit is due
Yes, as I clarified in my answer to the older sibling of your comment: >> If you can't, you cite the secondary source as citing the primary source. (Think "Secundus writes that 'Primus writes that ...'", or "Secundus cites Primus on ...".) <<
The source is that this is what I was taught at university. Also, to me at least, this makes sense if you think it through. Just think about the trouble that not doing so has caused in the story reported in the gp comment.
Edit: A quick search indicates that the American Psychological Association, for example, agrees with identifying both the direct and indirect source [1].
> In the reference list, provide an entry for the secondary source that you used. In the text, identify the primary source and write “as cited in” the secondary source that you used.
It can make sense sometimes, but I think it can often be awkward. If the thing being cited is widely known (which will often be the case if it is still being cited despite the original document being lost to time), it will probably be cited in many secondary sources, and the citing author might very well not have learned it from any of them (but instead from a university course, a blog, etc.). Sure, one can randomly pick a textbook that they like and that cites that content, but this looks rather arbitrary to me. Personally, I don't think it's worth introducing that arbitrariness to solve the problem of "ghost" citations, which is neither especially common nor especially harmful.
Maybe this is just a biased view... I'm from a country where they obsessively count metrics to evaluate researchers (not that I agree with that, but it's what they do), so being cited or not can be a big deal.
I think the best solution might be to use the original citation but mark it somehow as inaccessible; that way it would be explicit that one is just citing for credit and did not actually read the document. (I don't think I have ever seen this done, though.)
> Sure, one can randomly pick a textbook that they like and that cites that content, but this looks rather arbitrary to me.
The alternative is to trust your memory that you’re quoting the correct information on a source you haven’t read. That’s going to quickly mutate into something unrecognizable.
My first serious contact with scientific publishing was when I worked on my master's thesis in bioinformatics, on a very specific problem. I discovered that the papers had systemic errors in their input data, that statistics was often not applied correctly, and that the full source code was mostly unavailable - yet the papers built on top of each other, presenting results.
The scientists I worked with seemed to know about the situation, but personally, I lost my very strong belief in natural science results. It turns out there was no rigorous fact- and results-checking culture like there is in mathematics.
It seems to me that it is mostly lay people who treat published scientific results as some proven ground truth; the scientists themselves are more relaxed about it. Science may be in some ways about telling interesting stories in a convincing way.
I’ve experienced in recent years that many of my friends who never cared a single bit about science started to link to scientific papers in order to back up their arguments (usually some form of right vs left discussion). This became much stronger during the pandemic. I’m not 100% sure if they believe that this is factual truth or if it’s just another one of the myriad of argumentative tools they use. But you’re absolutely right that they see it in a completely different way than scientists do. I’ve tried to explain to them that picking one article out of a thousand is just confirmation bias, that anyone can find one article (published somewhere) that confirms anything, and that scientific conclusions need to be replicated, but it’s useless and frustrating.
This is just a consequence of midwits - the "Adam Ruins Everything" mindset. These people have been taught that all observations must be signed off on by a scientific study. No matter how banal the observation may be or how inapplicable the scientific method is to it, "the scientific community" must be shown to accept the observation, because scientists are a priestly class who play the role of truth-tellers in our society.
Hence the meme "I think castrating children is wrong" "hurr durr- hAvE yOu GoT a SoUrCe FoR ThAt?"
>The scientists I worked with seemed to know about the situation, but personally, I lost my very strong belief in natural science results. It turns out there was no rigorous fact- and results-checking culture like there is in mathematics.
This may reflect the feeling that for experimental sciences, the most reliable results checking may be to re-do the experiment, or perform a related experiment that is expected to produce similar results. In math (as I understand it), a proof is a proof -- the result is complete. In experimental science, an experimental result is often reproducible, but does not in fact guarantee that the hypothesis is correct.
While reproducibility and correct results are obviously important, most experimental scientists look for supporting experiments and a mechanistic framework before changing their beliefs.
Bioinformatics is a terrible mess and the rest of my career (15 years) will most likely just be cleaning up the shit built over the last 20 years.
Bioinformatics, especially genomics, is particularly prone to medical hype - the implication that these discoveries will rapidly lead to health improvements. Scientists are incentivized to claim the largest gains with the least amount of supporting data.
The same is true in computer science. Everyone cites the SHARD paper when discussing database design. But no modern researcher has read it. It does not appear in any archive.
I spoke to several academics who had mentioned it; none had read the original - they all relied on someone else's referencing of it.
Yeah, I think citation without reading is pretty common.
You find a lot of citations of Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller’s paper on MCMC, but I bet fewer than 1 in 10 of those citing it have read it.
Ulf Grenander’s massive book on Pattern Theory is another, or Kolmogorov’s 1933 monograph on probability. Often cited but come on, did you really read it?
Another I’m aware of is references to the notion of VC dimension in learning theory. Many papers mention a hard paper by Shelah as an original reference to the concept…I saw the same reference given in a book by mathematician David Pollard with a remark along the lines of, “people say this is the same as VC dimension but I confess this paper is impenetrable to me so I really don’t know.” I appreciated the honesty.
I actually read that. Its definition of probability is pretty nice: it allows non-impossible events to have zero probability even in the discrete case, e.g. a coin landing on its edge.
Maybe 'read' is not descriptive enough here -- difficult material takes multiple readings for any human, and it's different yet again when studied in a group with guidance and feedback.
Very common in mathematics as well. There are lots of results that people use every day that don't seem to have a published proof. They're literally called "folklore" theorems.
I hate this, and in my field (social sciences, organization studies, management sciences) several papers/books are blatantly misrepresented. I still struggle to cite a book (or several!) for a single point. To be honest, often it is really confusing, because science moves so fast nowadays, and these original papers contain so much unintentional historical cruft, or implicit messages to be untangled.
For example, when I started working in NLP, I did some things with the CYK algorithm (https://en.wikipedia.org/wiki/CYK_algorithm), which was defined in three papers by three separate authors. While I learned the algorithm from secondary sources, I of course cited the original papers (because one has to acknowledge the original authors of the algorithms one uses). At that point, I wanted to read the original papers, even if just out of curiosity, but they were impossible to find (I think one of them was available, but behind an outrageous paywall).
Now those papers are easier to find, but in any case I don't think I'd encourage a grad student to read them except out of curiosity; the algorithm has been explained much better, and in more accessible ways, in more modern papers. Which doesn't mean that we shouldn't acknowledge the discoverers.
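For the curious, here is roughly what the algorithm does - a minimal CYK recognizer sketch in Python, using a hypothetical toy grammar of my own (not anything taken from the original papers):

    # CYK recognition for a grammar in Chomsky normal form:
    # unary rules A -> 'word' and binary rules A -> B C.
    def cyk_recognize(words, unary, binary, start="S"):
        n = len(words)
        # table[i][j] holds the nonterminals that derive words[i..j]
        table = [[set() for _ in range(n)] for _ in range(n)]
        for i, w in enumerate(words):
            table[i][i] = {a for (a, term) in unary if term == w}
        for span in range(2, n + 1):          # length of the span
            for i in range(n - span + 1):     # start of the span
                j = i + span - 1              # end of the span
                for k in range(i, j):         # split point
                    for (a, b, c) in binary:
                        if b in table[i][k] and c in table[k + 1][j]:
                            table[i][j].add(a)
        return start in table[0][n - 1]

    # Toy grammar: S -> NP VP, NP -> 'she', VP -> 'runs'
    unary = [("NP", "she"), ("VP", "runs")]
    binary = [("S", "NP", "VP")]
    print(cyk_recognize(["she", "runs"], unary, binary))  # True

The cubic-time table filling is the whole idea; extensions for actual parsing attach back-pointers (and, for PCFGs, probabilities) to the table entries.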
Not sure if it's in the exact same category, but one of the most devastating examples was a one-paragraph entry in Letters to the Editor, published in 1980 in the New England Journal of Medicine, that I first heard about in the excellent Dopesick dramatisation. This unverified comment from a reader became legitimised as it morphed into a Journal citation, subsequently referenced in hundreds of first-generation papers and who knows how many indirect references.
It effectively opened the door to Purdue's marketing department.
"In conclusion, we found that a five-sentence letter published in the Journal in 1980 was heavily and uncritically cited as evidence that addiction was rare with long-term opioid therapy.
"We believe that this citation pattern contributed to the North American opioid crisis by helping to shape a narrative that allayed prescribers’ concerns about the risk of addiction associated with long-term opioid therapy."
This was the example that came to mind for me. An MD friend, who was a medical student in the mid-90s, was absolutely convinced of this, because it had been conveyed with a sense of great authority in a lecture at medical school.
I actually studied the citation history of that letter, and it is in the same category - and, on top of that, a real horror story. There is no doubt that citation plagiarism (often perceived as a rather innocent academic practice) contributed to the spread of the myth that opioids are NOT addictive. See: https://slate.com/technology/2017/06/how-bad-footnotes-helpe...
I thought that this was going to have stories like when taking an exam in a large lecture hall, one student finishes late, and the professor says, "Sorry, you can't turn that in now." And the student goes, "Do you know who I am?" where the professor replies, "No, not at all." Then the student goes, "Good," and shoves his paper in the middle of the pile of exams.
A similar (non-academic) story concerns an RAF pilot who in WW2 phoned the Air Ministry to complain about some recent order or decision. He got through and proceeded to rant about the stupidity of those in command. Eventually, when he ran down, the person at the other end asked if he (the pilot) knew who he was speaking to. When the pilot said 'no', the person at the other end said 'Air Chief Marshal Dowding'. The pilot then said 'Do you know who you are speaking to?' When Dowding replied 'No', he said 'Thank God for that' and hung up.
HN and similar forums are particularly prone to this behavior. We see two messages:
* Message A: X is true
* Message B: Message A is false
Based only on that information (which often is all we have), there is no reason to believe message B rather than message A, but people seem to love a 'debunking' (it seems to make them feel smart).
I seem to remember that it matches a cognitive bias, such as believing the more recent message. Or, maybe we just fall for whoever acts more confident - a literal con game: The person communicating B is claiming to know more than the person communicating A, and for some reason we believe them.
Message B had a crucially different landscape when it was established: It knew about Message A. The opposite is not true.
If we assume general benevolence on the part of humans, Message A involved no malice when it was established; it was a genuine mistake, so Message B is true. If we instead assume some degree of malice, Message B was crafted by bad actors who, even with more information than Message A, still introduced additional false information into the world.
A close friend of mine was doing her PhD in nutrition science (or something like this) and asked me to review her work from a math (especially statistics) aspect.
I started to read it, but it was so bad that I could not go on. I told her to get someone from her own field, because I was too traumatized to read it, and if she took my input into account she might well need to rewrite the whole thing.
I was at her defense and some people asked questions about the statistical part, but the level was "how do you calculate the average".
This worries me a bit because, as far as the topic is concerned, statistical mistakes about the Middle Ages in France are sad but that's it - but mistakes in nutrition or pharma can be worrisome. In her case the subject was quite esoteric, so no risk to anyone, but generally speaking I now do not trust any numbers I cannot analyze myself.
This was a really good read! It strikes at the heart of some frustration I feel when people get smug about "the science" as if they have checked anything for themselves. Seriously: be less confident.
You might read the rest of the article, which is actually about why people believe statements like the one you quote. In other words, that quote is a trap - the decimal point story itself is an urban legend.
Read the whole paper! Seriously, do it. The author is being cheeky in showing how an academic urban legend gets made, and it takes the whole paper to get to the punchline.
Sorry, yes - I meant the myth about high iron content of spinach, not the myth about the decimal point error. People may or may not "know" about the decimal point error - that issue is at the heart of this article indeed. But my point is almost an aside about spinach itself.
I'm literally just learning about this being untrue at this very moment. I was actually looking at adding foods that were high in iron to my diet recently and was very confused at what the nutrition data I found for spinach indicated.
Same. I asked my wife and she too thought spinach is high in iron. I was surprised to read that this mistake was discovered 30 years ago. That news sure isn’t getting passed around. We are both highly educated voracious readers and we had no clue.
> The belief that spinach is a good source of iron, although falsified 30 years ago by Hamblin in a British Medical Journal article, is still widespread among my colleagues, all of whom have, at minimum, a master’s degree in health sciences.
I think there are two points here: 1) the myth that spinach is a significantly better source of iron than other iron-rich vegetables, and 2) the myth that it came from misreading a decimal point.
I'm still only partway through the article, but wonder how much of this problem is due to things like introductions which cite prior knowledge. If non-review papers only reported the data they actually collected they'd be a lot more boring, but also less prone to urban legend transmission.
> How should I refer to my source? If I want to include this sentence in an academic publication, what should I place after my sentence?
Why would you include it in an academic publication? Just cite a correct source for the correct data. And maybe, at most, cite the original incorrect source and state that it was incorrect.
I still think much of this can be eliminated with a "just the facts" approach, less editorializing, and briefer introductions. At least in scientific journal articles. History of science is another discipline entirely, and the article obviously stands.
The well-cited “Conway’s Law” in technology falls into this category. I can barely remember a week going past in the last 3 years without someone quoting it. As part of my PhD research, I’ve read the paper deeply, and the idea that it proves that companies ship their organisational structures is such a misuse of the research. I wonder how many people have ever read the original.
> and the idea that it proves that companies ship their organisational structures is such a misuse of the research. I wonder how many people have ever read the original.
Well, it does say in its conclusion: "The basic thesis of this article is that organizations which design systems (in the broad sense used here) are constrained to produce designs which are copies of the communication structures of these organizations." [0]
I actually cited Conway's law to convince a senior decision maker that his organisation was badly structured. Briefly, a civil service organisation acquired systems for various users in a large customer organisation. The systems were rarely technically compatible, although they were often doing very similar things. This complicated technical interoperability and increased through-life costs (e.g. because they had separate support contracts that duplicated basic functions).
The incompatibilities often resulted because the systems had separately written user requirements (e.g. using different terms for the same things, describing common processes in different ways). The requirements were incompatible because they were written by independent acquisition teams. The acquisition teams were independent because they reported to and were 'owned' by different parts of the overall customer organisation. Recognising this fact allowed the senior guy to request that the various customer teams established consistent terminology, processes and support contracts. In other words, going for coherence by design rather than (expensively) retrofitting it.
In their widely cited paper, the authors came up with a proper solution: spread the random hash seed into the inner loop, vastly enhancing security by avoiding trivial hash-collision attacks. But a secure, slow hash function can never prevent ordinary hash-seed attacks when the random seed is somehow known, and especially with dynamic languages it's trivial to get the seed externally.
Other, simpler countermeasures must be used then - ones which don't make hash tables 10x slower and keep them practical.
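To illustrate (this is my own toy sketch in Python with a made-up seeded hash, not the scheme from the paper): once the seed is known, an attacker can simply brute-force keys that all land in the same bucket, no matter how strong the per-key hashing is.

    # Toy seeded hash: the seed randomizes the output, but only while it stays secret.
    def seeded_hash(key: bytes, seed: int) -> int:
        h = seed
        for b in key:
            h = ((h ^ b) * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF  # FNV-1a-style mixing
        return h

    # With the seed in hand, brute-force keys that share a bucket with b"victim".
    def colliding_keys(seed: int, buckets: int, want: int = 8):
        target = seeded_hash(b"victim", seed) % buckets
        found, i = [], 0
        while len(found) < want:
            candidate = str(i).encode()
            if seeded_hash(candidate, seed) % buckets == target:
                found.append(candidate)
            i += 1
        return found  # all of these collide with b"victim" in the table

    print(colliding_keys(seed=0x12345678, buckets=64))

Countermeasures in that spirit - collision counting, or tree-backed buckets as in Java 8's HashMap - cap the damage per bucket without making every hash call slower.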
So is spinach a good source of iron or not? I mean when taking the whole absorption thing into account too. Sorry, I'm in a hurry and I have no time to read the article carefully and I must contact my mom.
Iron isn't as easily available from vegetables as it is from meat. So in general no.
Spinach is also a contender (along with rhubarb) for being one of the most concentrated sources of oxalate, which those of us who have had kidney stones should try to avoid, consume only in moderation, or consume only with a good source of dietary calcium (advice varies).
Its ubiquity as a salad leaf and its general tastiness are a source of frustration; sadly, it is not a source of very much iron.
When I wrote class term papers in the 1970s I spent days in the massive book stacks of the Harvard libraries. I suspect many urban legends about love-making, deceased souls, and crimes in the stacks may be true. Like in the movie The Paper Chase.
I don't know if the Harvard libraries have warehoused most of their physical books like many other college libraries have. If so, one has to order up old books.
That's a cute story (read to the end for the twists). I really came to hate this problem when reading COVID research.
Primary case for the prosecution is a paper like Flaxman et al, "Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe" [1]. It claims that lockdowns saved ~3 million lives. Amongst other problems, this paper is built on circular logic [2] and an over-fitted model [3], has numerous other statistical problems [4], required an enormous country-specific fudge factor and the hiding of data points to cover up the way Sweden's results disproved their claims, and their model always claimed that the last NPI was the most effective regardless of what it was.
The cherry on top is that it disclaims its own results, literally admitting in the text that their counterfactual model is "illustrative only" and that "in reality, even in the absence of government interventions we would expect Rt to decrease and therefore would overestimate deaths in the no-intervention model" i.e. they already knew their conclusions were wrong when they wrote it. (This rather important caveat didn't appear in their press release about the paper, of course [5]).
Yet according to Google Scholar this paper has been cited nearly 3000 times in the ~three years since its publication. It gets cited several times per day. I've watched the citation count go up with morbid fascination. What are people citing it for?
If we do a reverse citation search and check some papers, we see immediately that it's being cited for a wide variety of almost random statements, none of which it actually supports:
1. "Health-care workers, seniors and those with underlying health conditions are at particularly high risk" [6] [8]. This claim appears with identical wording in two different papers, but Flaxman et al don't present any data on this or even reference risk stratification by job as far as I can tell. It's certainly not the focus of the paper.
2. "However, because most countries have implemented multiple infection control measures, it is difficult to determine the relative benefit of each" [7]. Flaxman et al claim it's easy to determine the benefit of each and that they did so.
3. "health agencies have long relied on predictive models to estimate future trends and to assess the potential effectiveness of various disease control methods" [9]. The paper doesn't show anything about the history of health agency decision making.
etc. Even when citations characterize its claims correctly, they are just taking its assertions at face value without realizing that the paper's methodology is circular and that even the authors don't believe their own numbers. Of what use are citations in this environment? These aren't cherry-picked examples; they're literally just whatever came up when I happened to search Scholar. Even so, most of the citations are wrong.