How to read and understand a scientific paper: a guide for non-scientists (lse.ac.uk)
320 points by ingve on June 27, 2017 | 58 comments

I have a little "hack" that I find extremely helpful for getting a sense of specific research fields.

Journal articles, even review papers, are cramped for space and so tend to be very dense. The author suggests methods for doing battle with this density, but I suggest that, before doing that, you search for a class of document that's allowed to be as expansive as the author desires, and whose authors have recently struggled to learn and understand their content, and so tend to be expansive:

PhD Theses

Find out what research group published the research, find out which graduate students have recently graduated from that group, and read their theses (if the author's command of the language of publication isn't what you'd prefer ... find another graduate student). I guarantee you it will function much better as an introduction to what the group does than trying to parse any of their journal publications. In particular, the "draw the experiment" step will often be solved for you, with photographs, at least in the fields where I've done this.

This is very good advice. I am a PhD student in mathematics, and every time I try to learn a new area of math, I'm grateful when I stumble upon PhD theses about that topic. They actually include their calculations.

Yeah, and if you are actually in the field and trying to learn it (as you are and this article's intended audience is not), a nice little bonus is that PhD theses are usually formatted in a way that's very friendly to markin' 'em up as you come to grips with the material.

This is a good guide, but I will tell you a trick that is faster, easier, and more effective:

read 2 or 3 papers.

All that effort you would put into doing these steps? Instead, read 1 or 2 other papers that the author refers to in the beginning.

Science is a conversation. When you read the other papers, even if you don't understand them at first, you will get a sense of the conversation.

Also, some writers are abysmal, and others are amazingly lucid. Hopefully one of the 3 papers you read will be the lucid one that will help you understand the other 2.

There is a huge variety in the quality of writing in scientific papers, but most of it is bad, or at least totally opaque, so I'm not sure that 3 papers will give you a sense of what is good, or even have a high probability of containing a paper that you can use as an entry point, although I think your premise is correct.

Probably the best evidence that a paper is a good entry point is whether or not the author cared about the abstract. A lot of scientists treat it as a chore, picking some key points from the premise, methodology, and conclusion sections, and haphazardly pasting them together into a miniature version of the paper. An abstract is a sketch of your argument. It's supposed to be how the author thinks about the work they are doing, in terms of how it relates to the work everyone else is doing. Look for an abstract which presents an argument in plain English and isn't afraid to give a little background or motivation. It might take dozens to find one though.

Personally I find abstracts close to useless, and just skip them entirely. I’ve never found a particularly close correlation between what an abstract said and how interesting/informative/well written the rest of the paper was. YMMV.

This is great advice. Even after becoming familiar with an area, I read some authors and get swamped by unexplained terminology.

Another simple trick is to look at the journal title. Articles in journals like "Trends in... " tend to be written for a broader audience, so often have clearer introductions. In general, the less specific the journal, the better the introduction will be for newcomers.

(Be aware that journals with lower word limits / shorter articles may have less rigorous introductions, for better or worse)

Another trick: use google scholar or a similar citation graph.

Unfortunately the citation graph sites won’t show all the outgoing references from a particular paper (I assume because they’re afraid of copyright issues?) but looking at the incoming references for a paper can be very useful. If you can find an old classic paper in some subject, then most major later papers will have cited it, so you can start from google scholar search’s list of references to the classic paper, which will be sorted by citation count. Doing keyword searches within such a list of references can often quickly surface the most important papers on a subject, especially if you go a couple hops into the “cited by” graph. Often among the most-cited papers is some kind of review paper with a clear explanation of the context, overview of the literature, and extended definitions of important terms.
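The "cited by" search described above can be sketched on a toy in-memory graph. This is only an illustration: the paper ids, titles, and citation counts below are made up, and a real citation index would have to be queried through whatever interface it offers.

```python
# Hypothetical toy data: paper id -> (title, citation count, ids of papers it cites).
PAPERS = {
    "classic": ("A classic method for widget alignment", 900, []),
    "review": ("Widget alignment: a review of the literature", 400, ["classic"]),
    "fast": ("Fast widget alignment on GPUs", 120, ["classic", "review"]),
    "other": ("Gadget calibration in practice", 60, ["classic"]),
    "recent": ("Benchmarking widget alignment review claims", 15, ["review", "fast"]),
}


def cited_by(paper_id, papers):
    """Return ids of the papers that cite paper_id (its incoming references)."""
    return [pid for pid, (_, _, refs) in papers.items() if paper_id in refs]


def search_cited_by(seed, keyword, papers, hops=2):
    """Collect papers within `hops` 'cited by' steps of the seed whose title
    contains the keyword, sorted by citation count, highest first."""
    seen = {seed}
    frontier = [seed]
    for _ in range(hops):
        next_frontier = []
        for pid in frontier:
            for citing in cited_by(pid, papers):
                if citing not in seen:
                    seen.add(citing)
                    next_frontier.append(citing)
        frontier = next_frontier
    hits = [pid for pid in seen - {seed}
            if keyword.lower() in papers[pid][0].lower()]
    return sorted(hits, key=lambda pid: papers[pid][1], reverse=True)


print(search_cited_by("classic", "review", PAPERS))
```

Searching for "review" two hops out from the classic paper puts the heavily cited review article at the top of the list, which is the effect described above.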

I love how simple and clear this post is.

As a kind of weird aside, if anyone ever emailed me about any of my journal articles, I would 100% respond to them (assuming they weren't a machine). I think most of my colleagues would do the same (except for articles featured in a newspaper, which might garner a lot of weird emails).

Me too, most especially if the email is from a student. I imagine the same goes for many of us who write research papers.

Keshav's "How to Read a Paper" [1] is a good guide, though perhaps less in the "for non-scientists" camp.

[1] http://ccr.sigcomm.org/online/files/p83-keshavA.pdf

> As you read, write down every single word that you don’t understand. You’re going to have to look them all up (yes, every one. I know it’s a total pain. But you won’t understand the paper if you don’t understand the vocabulary. Scientific words have extremely precise meanings).

That's a great tip. I've found that a lot of papers aren't necessarily complicated, but the vocabulary is unfamiliar (and you experience the same sense of confusion with both). It's interesting that we often conflate complexity with unfamiliarity; my reading comprehension improved quite a bit once I understood the difference between the two.

I don't understand the opposition to abstracts: dense means high information content, so if you know the field you can learn a whole lot (like whether you should read this paper or another one).

Abstracts often are misleading.

They're useful to decide whether you should read this paper or another one, but they're often not useful to get a summary of what exactly the paper actually achieves. Often the abstract will imply a more interesting result by leaving out key aspects and limitations (which are detailed in the paper and its conclusions) that significantly change the impact of the paper. The abstract is often more like an advertisement for the paper than an effective summary. I mean, it may be one, but if I read just the abstract and went away thinking, "oh, so now there's a way to do X", I'd often be wrong.

I recently read a paper whose abstract seemed to imply to me that its content was much more technical and specific than it actually turned out to be, which was disappointing. It was more useful in telling you the particular area of research than summarising its content.

I also find that briefly paraphrasing the abstract helps me understand my own expectations of the paper. Even if the paraphrase is wrong, it's still a good primer for switching into active engagement instead of passive absorption. Also, when you finish the paper you can review and correct your initial impressions; sort of like tutoring yourself (which personally I find helpful from both perspectives, tutor and tutee).

Oh this is awesome, well presented and clear.

A couple of notes: generally, if you email the author of a paper they will send you a copy. Scholar.google.com can be used to evaluate the other papers referenced; highly cited ones will be 'core' to the question, while less highly cited ones will address some particular aspect of the research.

For any given paper, if it cites one or two seminal papers in the field, you can build a citation cloud to create what is best described as the 'current best thinking on this big question'. You do that by following up the citations and their citations for two or three hops. (kind of like a web crawler).

With something like sci-hub and some work on PDF translation, it should be possible to feed two or three 'seed' papers to an algorithm and have it produce a syllabus for the topic.
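The two-or-three-hop citation crawl described above is essentially a breadth-first search with a depth limit. A minimal sketch, assuming the reference lists have already been extracted into a dictionary (the ids below are hypothetical, and the PDF extraction step is not shown):

```python
from collections import deque

# Hypothetical toy data: paper id -> ids of the papers it cites.
REFERENCES = {
    "seed": ["a", "b"],
    "a": ["c", "seminal"],
    "b": ["seminal"],
    "c": ["d"],
    "d": [],
    "seminal": [],
}


def citation_cloud(seeds, references, max_hops=2):
    """Breadth-first walk over outgoing citations, like a small web crawler.

    Returns {paper id: hop distance from the nearest seed}.
    """
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        if distance[paper] == max_hops:
            continue  # do not expand papers that sit at the hop limit
        for cited in references.get(paper, []):
            if cited not in distance:
                distance[cited] = distance[paper] + 1
                queue.append(cited)
    return distance


print(citation_cloud(["seed"], REFERENCES))
```

Ordering the result by hop distance, or by how many papers in the cloud cite each entry, would be one possible starting point for the "syllabus" idea, though that ranking is an assumption and not something the thread specifies.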

I usually first read or glance over papers (and non-story books) from the end to the beginning before I read them the other way around. This has the following benefits for me:

- By knowing about the conclusion first I will better understand the motivation and why certain steps are being taken.

- I find out sooner if the paper (or book) is something I am looking for.

I like to read papers unrelated to my field to learn new things to apply. To be honest, some papers still take me a long time to understand because they usually assume you are already researching the topic (for example, certain terms, symbols and/or variables are not defined).

There is a difference between reading and studying a paper. Many papers I just check the abstract for claims of A causes/correlates B (ie it is a "headline" claim), and look for a scatter plot of A vs B (it is missing).

Then I do ctrl-F "blind" (can't find it), ctrl-F "significance" (see p-value with nearby text indicating it has been misinterpreted). Boom, paper done in under a minute. There is really no reason to study such papers unless they have some very specific information you are searching for (like division rate of a certain cell line or something).

This only works for a very small subset of studies in a subset of scientific fields.

Agreed, the OP was about medical research though, where it does apply.

The importance you place on the presence of a p-value suggests that you're probably not the best judge of whether a p-value is correctly interpreted.

I'm not sure what you mean. You think it is unimportant if people do not understand the tools they use to draw conclusions from their data? The presence of the misunderstood p-value is just the easiest, most obvious heuristic for "authors and/or reviewers do not know how to analyze their data".

Speaking of which, the first link in this article about help with statistics goes to a site that misinterprets p-values:

"This means that if the experiment suggests that the probability of a chance event in the experiment is less than this critical value, then the null hypothesis can be rejected." https://explorable.com/statistical-hypothesis-testing
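One common way to misread a p-value is to treat the significance threshold as the probability that a significant result is a fluke. A rough sketch of why that does not follow: the share of significant results where the null hypothesis is actually true also depends on statistical power and on how often the tested hypotheses are true to begin with. The numbers below are illustrative assumptions, not figures from the article or the linked site.

```python
def share_of_false_positives(alpha, power, prior_true):
    """Among results with p < alpha, the fraction where the null is actually true.

    alpha:      significance threshold (false positive rate under the null)
    power:      probability of detecting an effect that really exists
    prior_true: assumed fraction of tested hypotheses that are really true
    """
    false_pos = alpha * (1.0 - prior_true)
    true_pos = power * prior_true
    return false_pos / (false_pos + true_pos)


# Assumed scenario: 1 in 10 tested hypotheses is true, 80% power, alpha = 0.05.
print(round(share_of_false_positives(0.05, 0.8, 0.1), 2))  # prints 0.36
```

Under these assumed inputs, roughly a third of the "significant" findings would be false positives, even though every one of them cleared the 0.05 threshold.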

Ah, frequentist interpretation of results. I don't think it's going to be something people are fond of looking back on.

About identifying "The Big Question", I have a story from my days as a graduate student, where I failed to do so.

I was asked to help on a project that needed to identify humans in an audio stream. During my literature review, I came across the field of "Voice Activity Detection" or VAD, which concerns itself with identifying where in an audio signal a human voice / speech is present (as opposed to what the speech is).

I implemented several algorithms from the literature, tested them on the primary test sets referenced in papers, and spent a few months on this until I finally asked myself "What would happen if I gave my algorithm an audio stream of a dog barking?"

The barking was identified as "voice".

As it turns out, the "Big Question" in Voice Activity Detection is not to find human voices (or any voices), but to figure out when to pass on high-fidelity signals from phone calls. So the algorithms tend to only care about audio segments that are background noise and segments that are not background noise.
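To make the point above concrete, here is a deliberately naive energy-threshold detector. It is not any of the published VAD algorithms, just a sketch of the "background noise vs. not background noise" decision; the frame length, threshold, and synthetic signals are all assumptions chosen for illustration.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def naive_vad(samples, frame_len=160, threshold=0.01):
    """Flag each full frame as 'active' when its energy exceeds the threshold."""
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_energy(frame) > threshold)
    return flags


def tone(freq_hz, amplitude, n, rate=8000):
    """Synthetic sine tone used as a stand-in for real audio."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate)
            for i in range(n)]


background = tone(50, 0.01, 160)  # stand-in for a quiet background hum
bark = tone(600, 0.8, 160)        # stand-in for a loud dog bark
flags = naive_vad(background + bark)
print(flags)  # prints [False, True]
```

The detector marks the loud frame as active without knowing anything about speech, which is why a bark passes as "voice" under this kind of criterion.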

>I want to help people become more scientifically literate, so I wrote this guide for how a layperson can approach reading and understanding a scientific research paper. It’s appropriate for someone who has no background whatsoever in science or medicine, and based on the assumption that he or she is doing this for the purpose of getting a basic understanding of a paper and deciding whether or not it’s a reputable study.

Better advice for making a layman with zero background in science more scientifically literate would be to tell them to read some textbooks.

Later on in the article, she tells people to write down each and every thing you don't understand in an article and look them up later. And this is excellent advice for people with a background equivalent to an advanced undergraduate or higher, but for people with zero background it would be better to read some textbooks and get yourself a foundation.

Honestly, even when I was in grad school in neuroscience, I asked around for advice on reading papers and the surprisingly universal response from other grad students was that it took 2 years to become reliably able to read and evaluate a research paper well. And this is 2 years in a research environment with often weekly reading groups where PIs, postdocs, grad students, and some undergrads got together to dissect some paper. These reading groups provided an environment in which you had regular feedback on your own ability to read papers by seeing all the things those more experienced than you saw and that you missed. A paper that took me 3+ hours of intense study would take a postdoc a good half hour to get more information out of.

I feel like this article makes reading articles well seem a lighter undertaking than it really is. It's really no wonder we see studies misinterpreted so often on the internet, where people Google for 5 minutes and skim an abstract.

> it took 2 years to become reliably able to read and evaluate a research paper well

This completely coincides with my experience. When I started grad school, it took me a few hours to read one paper, and I probably understood only 50% of the materials even though I had some foundations in the research area from my undergrad studies.

Reading textbooks is great advice. Then one can start reading some review papers in the area to get some more depth in his/her knowledge. I think the difficulty is that it is hard to find good textbooks and review papers for the subject that one is interested in, especially when the subject is in a niche field.

As a student who needs to read research articles for my project, this article gave some new ideas on how to approach those long boring and cryptic pieces of text that just take days to understand. Thanks to the person who posted it.

A couple things I try to do when reading research papers, inspired by these two amazing [b|v]logs. [1] https://blog.acolyer.org/ [2] https://www.youtube.com/user/keeroyz

I try to paraphrase the paper into an Acolyer-like 'morning paper' blog post on Evernote while mentally I am directing a 'two minute paper' video on the paper :)

I would like to have a digest or an overview written for an IT practitioner. I went to an SC/IT conference, enjoyed the talks, and noticed 2 things: 1/ You learn new things and new approaches that can bring value to our jobs 2/ It seems that the research sector discovers things that are already known in the industry.

I think it would be great to have a journal/blog that would construct a bridge between the industry and the university.

This suggestion by Michael Nielsen is also very good: https://news.ycombinator.com/item?id=666615

What's odd to me is that lots of professors have blogs in which they write quite a bit in plain language that doesn't require an instruction manual in order to be read.

Why don't the authors do these 11 steps for us?

Because they are not writing for non-scientists.

I understood that the steps are followed by scientists such as the author, it's just the guide that is intended for non-scientists.

Not necessarily. IMO, she proposes a specific way of "thinking" about the paper's content (I use another approach [1]). Papers are written following a certain structure because many people think it's a better way of presenting the ideas in detail/with the necessary rigor (that's what other "scientists" in the field would expect). For example, no researcher expects a five-sentence summary of the background. Personally, I expect an explanation of the relevant concepts/techniques and some sort of analysis of how existing work relates to the paper (instead of a list of related papers). At least in software engineering, many papers state the research questions explicitly, i.e., they would be identified if you read the paper from beginning to end. They tend to have a "results" section as well, so summarizing the results myself would be an intellectual exercise. Once you understand how papers are usually structured, you pick up on many of those things as you go (I mean, as you simply read the paper as the author intended).

On a side note, I'd say that many researchers don't do a good job of conveying their ideas clearly (it gets worse with conference presentations). It won't really matter in what order you try to read their papers.

[1] http://blizzard.cs.uwaterloo.ca/keshav/home/Papers/data/07/p...

All scientists were non-scientists first though, correct?

Look, I get that there's some natural professional context and lingo that goes into these things, but for all the angst over the esteem in which the population at large holds the science community, making their work more accessible to both novices and interested outsiders would be a nice step in the right direction.

I agree with you. To put it simply, papers are optimized for the scientific community and making them "more accessible" to outsiders has a cost. I'd settle for better writing and presentations within the scientific community for now. If you ever find researchers that blog about their research in simple terms, I think it's safe to assume they're using their personal time to do that (I know of very few; Andy Ko [1] comes to mind).

[1] https://medium.com/bits-and-behavior

I think that they often do when it's appropriate, but the way they make them more accessible isn't by changing their papers but by giving talks or presentations that are less technical (like TED talks for example).

This seems like a good approach in my humble opinion.

Because their goal is to get published so that they can move along in their careers. Their target audience doesn't require these 11 steps.

My question would be why don't donors and taxpayers (who are funding research) demand that researchers do these 11 steps?

> My question would be why don't donors and taxpayers (who are funding research) demand that researchers do these 11 steps?

Because the overwhelming majority of taxpayers do not read them or care to. Also, even though a taxpayer may not read papers, they hopefully still see the value in the progression of science, and allowing scientists to assume a certain level of background to the audience of their paper probably allows them to use their time on science.

Perhaps there's an opportunity for a motivated individual or group of individuals to make something to parse papers and help make it easier to read for those who don't have the proper background?

Because donors and taxpayers still wouldn't understand the summary version in many, many cases.

Because most journal articles are speaking to multiple audiences—those wanting to efficiently consume the text in a literature review (like the instruction in this article) and others who are preparing a rebuttal or an elaboration on the methods; still others are trying to understand the larger discourse / set of questions in which the paper is embedded, and don't particularly care about the specific results of any one. Journal articles as they are currently written are a decent compromise among the needs for all of these populations.

In addition to reasons stated: some journals (or academic disciplines) expressly reject clarifying content within papers. Listing of variables used in equations is one specific example that's been mentioned to me by Joerg Fliege, an academic himself, at G+.

(His "Academia, Schmacademia" collection is very highly recommended.)

I'd like an answer to: how/where to ask the relevant community a question about a scientific paper.

This is a great guide. I wish more writing on the Internet had this blend of substance, message, tone, and grit.

Sensible advice overall, but I completely disagree with these:

> Before you begin reading, take note of the authors and their institutional affiliations.


> Beware of questionable journals.

Institutional affiliation and journal imprimatur should have no bearing in science. These are shortcuts for the lazy, and they introduce bias into evaluation of the paper's contents.

Even more than that, dispensing advice along these lines perpetuates the myth that scientific fact is dispensed from on high. If that's the case, just let the experts do the thinking for you and don't bother your pretty little head trying to read scientific papers.

If the author's approach to reading a paper only works by checking for stamps of approval, maybe the approach should be reconsidered.

They aren't shortcuts for the lazy, they're shortcuts for non-scientists who aren't capable of fully evaluating the science alone. If you're capable of objectively peer reviewing a paper, you're not the audience of this article.

> They aren't shortcuts for the lazy, they're shortcuts for non-scientists who aren't capable of fully evaluating the science alone.

It's a shortcut fraught with potential for deception, as even a casual glance through a site like Retraction Watch will demonstrate:


I'm not sure what you mean by "evaluating the science." A scientific paper should present a hypothesis, the author's best attempt to disprove the hypothesis, and an interpretation of the evidence gathered in the process of testing the hypothesis. There's going to be a back-story, and it's likely to be quite involved.

The article does a good job of presenting a method for navigating a paper on this basis. I don't see what checking credentials adds to the process. On the contrary, it may do harm.

While we may find the high profile cases featured on Retraction Watch mainly in high impact journals, that's precisely because unscrupulous people deem these journals worth cheating to get into. Nobody cheats to get their paper into the International Journal of Architecture, Engineering and Nursing Science, because it and its ilk are utter pieces of crap that will accept anything, up to and including randomly-generated text and pro-Sri-Lanka-highly-racist-UFO-conspiracies (real example). Teaching laypeople to avoid these is a very good idea.

On the contrary, I totally agree with the author on the point regarding journals: while noisy, reputable journals generally have better editing and stronger peer review. Of course it isn't the case that less credentialed journals / authors are bad; it just increases the probability of wasting your time reading something charitably only to find massive theoretical or methodological shortcomings.

Bias = priors. Use your priors.

>Institutional affiliation and journal imprimatur should have no bearing in science.

They shouldn't, but they do. This is reality. Industries producing research are often industries that would benefit from the research being biased in their favor. This is an important element to consider.

It's not about dismissing certain affiliations, it's about being conscious of institutional bias.

I don't find institutions valuable in this regard either, unless you're looking for a potential conflict of interest there.

Journals are something else, mostly because finding out if it is a "real" journal can help to sort out a lot of crap. If it's a journal where you just pay to get stuff published without any peer review, and that tries to look like a real scientific journal, it's pretty safe to just stop reading.

Using the prestige or impact factor of the journal as a guide for quality is likely misguided as well. Though it is a warning sign to me if a paper makes claims that look like they should be publishable in a prestigious journal, but it actually appeared in some journal nobody knows.

I agree with you. Your comment totally reminded me of SCIgen [0].

[0] http://news.mit.edu/2015/how-three-mit-students-fooled-scien...

You can beware of something and still read it. In general, I think it's pretty sound advice. It's less about turning your nose up at someone because they're not a professor at Harvard, and more about asking whether a group with ulterior motives has spun up an institute and is essentially self-publishing "research".

I check these all the time. It's not my job to verify who was first to publish and who deserves credit for novelty. There are groups that do that and divvy up acknowledgment. It's my job to get up to speed with as accurate information as I can with the limited time I have.

Institutional affiliation: https://en.wikipedia.org/wiki/Aspartame_controversy

"Searle had submitted 168 studies on aspartame"
