50% of neuroscience papers suffer from a major statistical error. (badscience.net)
182 points by zacharyvoase on Nov 28, 2011 | 94 comments



In academic biology, statistics is routinely considered an afterthought. In the eyes of many labs/PIs, their results are already real. You just have to do those damn statistical tests so the editor will get off your back.

Most biologists don't understand statistics either. To the majority, the unpaired t-test is the only test that is needed. Ever. It doesn't matter if you have one or two tails, paired or unpaired trials, a normally distributed population or a skewed one. Most biologists don't take proper statistics classes and most just don't care. I doubt most biologists would even be able to name alternative statistical tests.


So very true. I went from physics into biology, and the shoddy application of statistics is surprising. This is a failure that could be easily addressed at the university level. More stats classes.


The best and simplest approach IMO would be improving peer review. If it were understood that scientific journals would reject papers with poor statistical analysis, you would see changes fairly rapidly.


Nature and the NEJM, and possibly others, assign statistical reviewers. So this is a known problem that some high profile journals are attempting to correct.


Yes, but I think one of the problems is that the reviewers can't sufficiently judge statistics. It is peer-review, after all. I've had some reviewers ask some questions that indicated that they didn't really understand the t-test.


Frankly, I was pretty blown away by how poorly it's understood by physicists (although they are probably better than biologists). Even in experimental particle physics, where statistics are essential, the statistical techniques are all self-taught and seat-of-the-pants. I am confident everything being released by the major particle physics experiments is statistically sound after it goes through the collaboration review process, but there are certainly poor inferences being made at the level of the individual physicist. I listened to at least a couple of confused conversations between smart particle physicists about whether frequentism or Bayesianism was "right".


I think it doesn't have to be just "more stats classes". They also need to be focused on the kind of work they'll need to do with those statistics. If you teach future biologists or medical doctors "so there is this theorem that says that if there is statistical significance of A and not of B, then the value of P equals the..." they won't learn it/use it.


The prevailing attitude in my lab is that "if you need statistics, you did the wrong experiment."


That's funny, in ours it was more like: If you don't have the replicates (for the statistics), it didn't happen.


What? How would that be justified?


I've seen this attitude before. The idea goes something like: "If the effect you are seeing is so subtle that you have to resort to statistical tests, it probably isn't a phenotype you should be studying".

In my experience, this usually comes from PIs that worked back in the days of "move bar of light across visual field, see neurons fire". Their experiments were groundbreaking and had drastic phenotypes.

On the flip side, I've seen experiments showing that one population of neurons is 2% more dynamic than another population, and sometimes you just wonder if that really matters at all (or if you are really sampling the population well enough).


Actually, that quotation is usually attributed to Ernest Rutherford. This is a common discussion among particle physicists, but of course Rutherford did his work before the statistical interpretation of quantum mechanics became clear. For example, even measuring something like the mass of a particle is inherently probabilistic; for short-lived particles it's usually done by reconstructing many decays, fitting a distribution to the resulting peak, and relating its width to the particle's lifetime through the time-energy uncertainty principle.

See http://www-stat.stanford.edu/~ckirby/brad/papers/2005NEWMode... for a really interesting discussion of the way things are headed from a statistician's perspective.


To elaborate on what others have said: it's not meant to be taken literally. There will of course be cases (e.g. the LHC) where it's imperative to do precise statistics because it's so expensive to make a measurement and only a few measurements can reasonably be made. But the idea of the quote is that often a researcher is going to be wasting his time if the effect he is looking at is so small as to require heavy-duty statistics; the fancy statistical machinery can be a time-sink and a distraction from the fact that the underlying effect isn't that interesting.


How would you know anything about your data without doing some statistical analysis? I'm not sure if you're saying that some statistics are gathered or that any statistics are hard.


The distinction is between trivial and non-trivial statistics. If I test a drug which cures 90 patients out of 100, vs. a placebo that cures none, I don't need to quote a p-value. The drug works.
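For the curious, the formal test agrees with the eyeball test in that scenario anyway. A minimal sketch, assuming 100 patients per arm as in the hypothetical, using Fisher's exact test from scipy:

    # Hypothetical 2x2 table: drug cures 90/100, placebo cures 0/100.
    # Rows are treatments, columns are (cured, not cured).
    from scipy.stats import fisher_exact

    table = [[90, 10],    # drug
             [ 0, 100]]   # placebo
    odds_ratio, p_value = fisher_exact(table)
    print(p_value)   # vanishingly small, far below any conventional threshold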


Social science has exactly the same problem. More statistics classes are desperately needed.


The problem here, in my opinion, is that statistics is taught in way too "isolated" a fashion. I hated math-related subjects all the way through high school and undergraduate levels, but as soon as it came to applying statistics, even in simple survey studies, it became interesting. Not the "math" but the "discovery" should be at the center. If you talk to social science people, they often have a deep aversion to "numbers", so making statistics as accessible as possible early in the career might be essential.


In my experience, economics is the exception. Many of the economists I've worked with know their statistics, and they put in a lot of work to make sure their models agree with econometric or time series theory (that's possibly selection bias, though, given where I've worked).

Also, apologies for perpetuating the "arrogant economist" meme. Just reporting what I've experienced. :P


I think it has something to do with expectations.

When you study psychology you don’t expect to sit around and do math all the time.

When you study biology you don’t expect to sit around and do math all the time.

When you study economics (especially if you actually want to become an economist) you do expect to sit around and do math all the time.

For many biologists and social scientists statistics classes must seem like a chore that has little to do with what they actually want to do. For economists statistics is just another one of their math tools.


Agreed--that's how I was encouraged to think about statistics in my grad classes + research programs. They're tools for getting things done correctly, so they need to be understood thoroughly--no different than a given programming language.


I read a couple of papers on the backlash (even by expert economists) against the heavy emphasis on mathematics in economics, which, IIRC from the papers, began back in the 1940s. Basically, economists work hard producing mathematically coherent models that aren't applicable to the real world.


This is a good point, and it's a problem in theoretical economics. Some economic models abstract away some important real-world conditions.

My experience, though, is with applied macroeconomics. We'd derive models consistent with theory, then make sure we avoided a number of time series pitfalls (autocorrelation, mistaking cointegration for correlation, overspecifying a model) using a variety of statistical techniques.


I'm not a hardcore statistics person, since I only took enough to get me through ECE, but I think this is why I always had a problem with the softer sciences (at the undergraduate level, at least; maybe graduate students are better). Things always boiled down to, "look, it's obvious." My background simply makes me think the obvious isn't always right.


Funnily enough that’s actually one of the reasons why I have always been fascinated by social science. Not as something where the results and methods are fascinating but rather as something where much is to be done and all problems are hard to solve.

During the last century empiricism's role in the social sciences has been growing, but social science as an empirical science is still relatively young and, I think more importantly, it's not really dominated by empiricism the same way, say, physics is.

It’s still necessary to adapt and find methods.


I think, in general, I find some of academia's dogma a bit off-putting. I always approach things in the most classical sense--that is, everything is relatively true. Not to dismiss empirical data though. For example, people seem entrenched in saying dark matter is X or will do Y, but even people who helped discover its possibility say to be cautious of making such statements. When people come out and say its existence is not definitive (yet), they tend to get lambasted. I know there is always politics and evangelizing in every human situation, but my childhood notions of a pure method/approach seem a bit jaded now. Perhaps some of the softer sciences need to study inter-academic hard science ;)


The short version of the error: If in your data you find that A has no statistically significant effect, but B does have a statistically significant effect, this does not automatically show that B has, with statistical significance, more effect than A. To do that you have to do a statistical test on the difference in the effects.

I believe (?) this is normally done correctly in medical studies with placebos, where the typical analysis is to show that a drug has a statistically significant effect compared to the placebo baseline; it's not sufficient to show that the drug has an effect compared to a no-drug baseline, and, separately, that a placebo doesn't.
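To make the error concrete, here's a minimal sketch with made-up summary statistics (the means, SDs and sample sizes are all invented for illustration): A clears p < 0.05 against the control, B doesn't, yet A vs. B is nowhere near significant.

    # Hypothetical groups: control mean 0.0, treatment A mean 0.5,
    # treatment B mean 0.25, all with SD 1.0 and n = 50 per group.
    from scipy.stats import ttest_ind_from_stats

    n, sd = 50, 1.0
    ctrl, a, b = 0.0, 0.5, 0.25

    for label, m1, m2 in [("A vs control", a, ctrl),
                          ("B vs control", b, ctrl),
                          ("A vs B", a, b)]:
        t, p = ttest_ind_from_stats(m1, sd, n, m2, sd, n)
        print(label, "p = %.3f" % p)

    # A vs control: p ~ 0.014 (significant)
    # B vs control: p ~ 0.21  (not significant)
    # A vs B:       p ~ 0.21  (not significant) -- so "A works and B doesn't"
    # is not licensed; the A-B difference itself has to be tested.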


You may be simplifying too much to be understood. In medical terms, if your drug is significantly more effective than doing nothing, but a placebo is not significantly more effective than doing nothing, your drug need not be significantly more effective than the placebo. (Yes, medicine typically avoids this particular error.)

(E.g. if doing nothing cures 10%, the placebo cures 11% and the drug cures 12%, only the difference drug/nothing may be significant.)
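To see those numbers play out, here's a rough sketch; the 3,000 patients per arm is my own assumption, picked so the pattern appears.

    # Hypothetical cure counts: nothing 10%, placebo 11%, drug 12%,
    # with an assumed 3000 patients per arm (the sample size is made up).
    from scipy.stats import chi2_contingency

    n = 3000
    cured = {"nothing": 300, "placebo": 330, "drug": 360}

    def pval(g1, g2):
        table = [[cured[g1], n - cured[g1]],
                 [cured[g2], n - cured[g2]]]
        chi2, p, dof, expected = chi2_contingency(table)
        return p

    print("drug vs nothing:    %.3f" % pval("drug", "nothing"))     # ~0.015, significant
    print("placebo vs nothing: %.3f" % pval("placebo", "nothing"))  # ~0.22, not significant
    print("drug vs placebo:    %.3f" % pval("drug", "placebo"))     # ~0.24, not significant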


Or: if I compare three different distributions, it's possible for A&C to be distinct, but for A&B and B&C to overlap.

As the linked article says, the idea of statistical significance being a binary state is to blame here.


It's true. Many anti-depressant drugs have no significant effect when compared to placebo.

http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fj...


I don't believe the controversy over that is whether the statistics were done properly, but over: 1) replicability, in particular whether significance is still found when doing large-scale meta-analyses; and 2) whether blindness of the studies is compromised by "unblinding" effects where a drug's side-effects can tip off the doctor or patient about whether they got a placebo or not. But it's been a while since I waded into that flamewar of a debate...


No, according to this one controversial[0] study, anti-depressant drugs have no statistically significant effect compared to placebo except in cases of severe depression.

[0]: http://www.plosmedicine.org/article/comments/info%3Adoi%2F10...


A good paper on this error is by Andrew Gelman: www.stat.columbia.edu/~gelman/research/unpublished/signif3.pdf


I'm curious about the example in the article. If the effects were measured at 30% and 15%, which are the change in some measured norm, then isn't the difference in differences actually 100% not 15%?


Often you still use units of % change relative to the initial quantity, though I suppose you could define it either way. For example, if we're comparing a country with an inflation rate of 4% and one with an inflation of 8%, we might say that the difference between the rates is 4%, in that |rate1-rate2| = 0.04, as opposed to measuring one in terms of percent of the other.


I'm asking because the author talks about the "difference in differences", and as early as high school physics they drill into your head that the difference between two things is represented by the percentage difference, not its absolute difference. Saying 15% is negligible when it is actually a 100% change in the effect seems incorrect somehow.


Not to nitpick, but the headline "50% of neuroscience papers suffer from a major statistical error." is false. Out of a sample of 513 papers, 79 contained this specific mistake.


> Not to nitpick,

I think a statistical nitpick is entirely appropriate given the article.


I think you skimmed a bit fast:

"Nieuwenhuis looked at 513 papers published in five prestigious neuroscience journals over two years. In half the 157 studies where this error could have been made, it was made."


In other words, you're arguing that the title should be: "50% of 31% of neuroscience papers suffer from a major statistical error." ?


How about "50% of eligible neuroscience papers assume that statistical significance is transitive"?


Yeah, even if I have a 50% chance of getting slapped in the face when I tell someone they're ugly, if I'm not doing it all the time, that doesn't mean 50% of my interactions involve me getting slapped.

Likewise, if there's a 50% chance that a neuroscientist messed up a particular kind of analysis, the fact that they don't do that kind of analysis all the time means that less than 50% of neuroscience papers will contain the error.

Still, it's embarrassing.


That misrepresentation is a greater error than the article is complaining about.


Actually it _might_ still be true. This paper could be just one of the false 50%.


This should be the least of our worries. The incentives are set up so strongly to produce a statistically significant result that indeed nearly all experiments do produce a statistically significant result. Scientists measure a data set. Then they work to find a subset of the data and a statistical test that gives them a significant result. Just ask a couple of scientists whether they ever did an experiment that didn't provide a statistically significant result. Proper statistics, where you decide on the test you'll apply and on what data before doing the experiment, is almost unheard of. And then there's publication bias, of course.


Here is a wonderful research paper that illustrates the misuse of statistics in neuroscience:

http://prefrontal.org/files/posters/Bennett-Salmon-2009.jpg


This statistical point is true. However, there is a practical problem with experimental neuroscience - data is painful to get and the means are often comparable to the variances.

So, with the data you have, it is often much harder to show A > B than to show A > C and B !> C.


So your point is, inappropriate stats are okay because science is hard???


Maybe someone should follow up and do a similar study for multiple fields and see whether the difference in error rate between fields is statistically significant.


There is some literature in other fields. Sjoberg, in "A Survey of Controlled Experiments in Software Engineering" did similar studies in software engineering to compare with other fields such as medicine and the social sciences. This survey showed that only 1.9% of Software Engineering studies were actually controlled experiments.

They examined 5453 scientific articles in 12 leading journals from 1993 to 2002. I like how the authors tactfully state that "the gathered data reflects the relevance of software engineering experiments to industrial practice and the scientific maturity of software engineering research."


"Statistical significance" is starting to tire me. It is too binary for my test: either a given result "achieved" statistical significant, or it is not. Obviously you have to choose a threshold, and which it should be is much less obvious.

Couldn't we just do away with statistical significance, and just publish likelihood ratios, or decibels of evidence (in favour of one hypothesis over another)? That way, we would know exactly how much an experiment is supposed to be worth. No arbitrary threshold. Plus, you get to combine several experiments and get the compound evidence, which can be much stronger (or weaker) than the evidence you get from any single one of them. And then you may have found something worthwhile.

This is especially crucial when said evidence is expensive. In teaching, for instance, one researcher can hardly do experiments on more than two or three classrooms, over little more than a year. This is often not enough to accumulate enough evidence at once to reach statistical significance. But a bunch of such experiments may very well be enough. (Or not, if the first one proved to be a fluke.)
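A minimal sketch of what that could look like, with purely hypothetical numbers and a deliberately simple binomial model (the 0.6 vs. 0.5 success rates are assumptions, not estimates from real data): evidence in decibels is 10·log10 of the likelihood ratio, and it simply adds across experiments.

    # Each hypothetical classroom experiment yields k successes out of n trials.
    # Compare H1 ("intervention works", success rate 0.6) against
    # H0 ("no effect", success rate 0.5); both rates are made up for illustration.
    from math import log10
    from scipy.stats import binom

    def evidence_db(k, n, p1=0.6, p0=0.5):
        # decibels of evidence for H1 over H0 from one experiment
        return 10 * log10(binom.pmf(k, n, p1) / binom.pmf(k, n, p0))

    experiments = [(18, 30), (21, 30), (16, 30)]   # (successes, trials) per classroom
    per_experiment = [evidence_db(k, n) for k, n in experiments]
    print(per_experiment)       # evidence from each classroom on its own
    print(sum(per_experiment))  # combined evidence: log-evidence just adds up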


The title of this submission (which does NOT appear as the original title of the article) is probably a hat tip to the famous article by John P. A. Ioannidis, "Why Most Published Research Findings Are False."

http://www.plosmedicine.org/article/info:doi/10.1371/journal...

The article by Ioannidis, which I think is the most downloaded ever article on PLoS Medicine, is well worth reading for guidance in how to avoid errors in research design.

But to the point of the accuracy of the submission title, what the underlying journal paper

http://www.nature.com/neuro/journal/v14/n9/full/nn.2886.html

said was, "We reviewed 513 behavioral, systems and cognitive neuroscience articles in five top-ranking journals (Science, Nature, Nature Neuroscience, Neuron and The Journal of Neuroscience) and found that 78 used the correct procedure and 79 used the incorrect procedure. An additional analysis suggests that incorrect analyses of interactions are even more common in cellular and molecular neuroscience." In other words, half the time when the specific issue came up, the neuroscience authors got the procedure wrong.

AFTER EDIT: Meanwhile, a lot of other interesting comments have mentioned the role of statistics education or math-aversion in higher education in various disciplines. I'll comment here about some things I've learned about statistics since I completed my higher education, things I've self-educated about as a homeschooling parent for the last two decades and now a mathematics teacher in private practice.

First of all, even when undergraduate students take courses in statistics, the courses are not likely to be very helpful. Many statistics textbooks used in colleges are poorly chosen

http://statland.org/MAAFIXED.PDF

and many statistics courses are taught by professors who themselves have very poor backgrounds in statistics, so the essential point that statistics is all about DATA never gets emphasized. Moreover, the undergraduate statistics curriculum historically has emphasized the wrong issues about valid inference

http://escholarship.org/uc/item/6hb3k0nz

and most undergraduates who complete one or two statistics courses still have a very weak sense of what valid statistical inference is.

And all of this is not even to get into issues such as Bayesian versus frequentist approaches to statistics

http://yudkowsky.net/rational/bayes

in modeling reality. Yes, biologists need to get over fear of mathematics for biology to progress as a science,

http://www.guardian.co.uk/books/2011/apr/16/mathematics-of-l...

and everyone can gain by learning more about statistics,

http://knowledge.wharton.upenn.edu/article.cfm?articleid=192...

but statistics education isn't easy, and it can still be greatly improved even for the students who do step up to take statistics courses.


Although I've got a good maths degree, I often feel out of my depth beyond fairly basic statistical discussions -- such as these...

Particularly in light of the statland.org link above, what would you recommend as good reading material to help me "learn to think like a statistician"?


While I agree with the major substance of your point (though I never liked that Yudkowsky link), you are incorrect in your citation of the paper.

The actual paper is: http://www.nature.com/neuro/journal/v14/n9/full/nn.2886.html

It appears to follow on from an extremely controversial paper in neuroscience: Voodoo neuroscience http://escholarship.org/uc/item/51d4r5tn;jsessionid=FB5843BB...

If you look at the cited by links in scholar, then you can get access to a raging debate across quite a number of journals (well worth popcorn if you're nerdy and not involved).

The crux of the matter for me is that the brain cannot be fruitfully modelled using statistics developed for independent samples. This is going to cause us massive problems unless we can develop better statistical tools (or at the very least use the ones we have more wisely).


No, that's not the crux of the matter. It has nothing to do with the brain per se. The voodoo paper was about people improperly analyzing their fMRI data, but as the authors point out, there are existing tools that are fine and should have been used. Likewise for the recent Nature Neuro paper.


I think bad statistics education is just a facade that hides what is really happening. Now I don't have proof, but I think the majority of these "errors" are done on purpose. It's far better to fudge your math, get amazing (and wrong) conclusions and then get published in Nature than it is to not get published in Nature.

The prestige of getting published in Nature or Science far outweighs the criticism you will get for forging or manipulating your data, in large part because the latter can almost never be proven. You can always say you just made a mistake or plead ignorance.


These aren't errors that would be caused by manipulating your data. If you were manipulating your data, you'd have made sure that your results were significant with the correct tests. These are errors where people didn't use the proper test.

At worst, you could claim that people submitted only the results of the test that made their research look better than it would have with the correct test.

In this case, I think it is more an issue of reviewers not catching the problems than of the authors deliberately misleading.


Actually, in the case of this specific error, reviewers are often shut down by editors who want to publish the finding. It happened to my friend's advisor recently: she called them out for it and the editor said it would be published anyway.


Actually, that is a huge problem. You have to decide on the test you are going to use before you see the data. After you've seen the data you can always come up with a specially made statistical test that will prove anything you want with any p-value you want.


1. Outright forgery or manipulation of data is a huge black mark on a researcher. Huge. It lasts forever, and is much bigger than the prestige from a single published article, wherever it was published.

2. http://en.wikipedia.org/wiki/Hanlons_razor


Every scientist who has been caught forging data has basically just claimed it was an accident. Whether they got "punished" or not depended purely on their political connections, not their behaviour.

What's worse than the possibility of a black mark on your career? Not having a career at all. Which is what will happen if you don't publish.


You're basically making an economic argument, where you assume the rewards of cheating outweigh the risks of being caught. But your support for that argument is the claim that the risk of being caught forging data is less than the risk of not having a career, and that is a false dichotomy. There is a middle ground, which is publishing valid but less sensational results.

In other words, the choice is never "Forge these results, or have no career."


> In other words, the choice is never "Forge these results, or have no career."

Yes it is, because in order to have a career, it is not enough to just publish. You have to publish in prestigious journals and you have to publish original, groundbreaking, important, and meaningful research.

A tiny percentage of academics actually end up keeping their job until retirement. If you don't plan on perishing, you publish in prestigious journals by whatever means necessary. Whatever. Means. Necessary.

"Accidentally" using bad statistics is the kind of "accident" that gets you tenure. It's significantly easier to forge groundbreaking, important, prestigious research than it is to actually do this research.

Scientific fraud is epidemic precisely because of the economics of universities.


I experienced this mindset in my graduate studies. Many graduate advisors spent so much time writing grant proposals and hobnobbing with government project managers that they didn't care what result their experiments got, as long as it was positive. I lost interest in being a researcher primarily because it's no different than working at a company - except you get paid less.

I never saw any researchers or students falsify or make up data, but often I saw experiments performed without defining a hypothesis first. The hypothesis was created a posteriori to match any interesting correlations that could be combed from the data. I felt like I was a marketer trying to find a reason to promote a product. This particular approach might be standard when dealing with in-vivo experiments (on human subjects). It's a time-intensive process and you can't just rerun the experiment like you can on an artificial system. However, I didn't feel like it was the correct way to do research.

There's been renewed interest in recent years in creating a journal of failed experiments for various research areas, but it never seems to catch on. It would be invaluable, but there are too many egotistical people in academia who don't want to be associated with failure in any way.



I know many, many researchers, and very, very few who have published anything "groundbreaking." Meaningful, original, often even important, but rarely groundbreaking. There are different tiers of venues to publish in, just as there are different tiers of institutions to do research in.

I think the economics often are perverse, but I think you're doing a disservice to the overall discussion by simplifying it too much.


You're splitting hairs. Groundbreaking has a pretty low bar these days.

Just remember that 50 year old profs came of age in a different era.


Huh? As a field matures, the bar for groundbreaking increases. Most research is iterative, including everything I've done, and almost everything my peers have done.


Why are you stating the obvious?

I am trying to make a point that in the world of vicious competition for grant money, every shitty incremental experiment is lauded as groundbreaking. For very understandable reasons.

I'm being facetious about the way that the key quality of a scientist has shifted from doing and teaching science to being a great marketer who can win grants. The "best" scientists these days are generally the ones who are really great at hype.


It was not clear that you were using "groundbreaking" facetiously. Hence my confusion.

I agree there are problems with how science works right now, and I agree on what the problems are. I disagree on the weights of the incentives, though. You describe "rampant" fraud. I have observed none. Then my conclusion is that the claim that fraud is rampant is incorrect.


>I have observed none.

How hard have you looked and what is your definition of fraud?


Having noticed "Hanlon's Razor" becoming the default reaction for a number of HN posters, I have to say what I feel is really bad about it when it is applied to complex human endeavors.

When one deals with modern institutional corruption, what one is dealing with is a complex combination of incompetence, overt deception of others, self-deception and simulated incompetence.

Sure, incompetence is in the mix, but when you use a construct like "Hanlon's Razor" to "label and forget" the situation, you hide the full picture. The complexities of these situations are there; there is no razor for untangling them a priori.


I think of it as a bias towards suspecting incompetence rather than a bias towards suspecting malice. I think people are too quick to assign narrative to situations, and malice is a stronger narrative than incompetence.


Errors like this have a life of their own -- there is no ill intent. Once a mistake is made in a seminal paper, most papers that build on that research will usually copy the error. Most authors do not test the statistical assumptions made in the literature and copy the statistical techniques used previously. I once traced an absurd FDA policy decision to a simple statistical mistake that had passed unchallenged through multiple papers and review committees.


I do not think it is outright fraud but rather a result of publication bias. People tend to publish results more often than the lack of a result, and in areas of research where there is much noise compared to signal, statistical errors will stand out as interesting.

This is also worsened by the fact that today you have to publish often, so you do not have time to check the result before publishing.


Whatever the degree to which these errors are purposeful vs. accidental (a distinction I think can actually be hard to make when you consider how thoroughly and subtly people deceive themselves), remember that it's a hell of a lot easier to improve the statistics education of the researchers than their ethics. And improving their education for some of these simple ideas can be at least as effective; if it's obvious to everyone what the correct statistical technique is (because they've all been a bit better educated), then such papers won't be able to be published with these errors.


After reading "A Mathematical Model for the Determination of Total Area Under Glucose Tolerance and Other Metabolic Curves"[1] I am unable to see any malice in this sort of thing.

[1]http://care.diabetesjournals.org/content/17/2/152


I wonder how many people make this error while A/B testing their websites...


Nobody does since we have this thing called the G-test. We can make other errors, but this specific one isn't possible.
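For reference, a minimal sketch of a G-test on a simple A/B contingency table (the conversion counts are made up); scipy's chi2_contingency computes the G statistic when asked for the log-likelihood-ratio statistic:

    # Hypothetical A/B results: (conversions, non-conversions) per variant.
    from scipy.stats import chi2_contingency

    table = [[120, 4880],   # variant A
             [150, 4850]]   # variant B
    g, p, dof, expected = chi2_contingency(table, lambda_="log-likelihood")
    print("G = %.2f, p = %.3f" % (g, p))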


I only wish it were so simple. I've presented correct statistical results for A/B tests only to have people try to argue me into accepting incorrect results that would follow from this logical error. If I knew less statistics, or was unwilling to argue with my boss, this error would have been made.

And another common variant of the problem happens when you're testing 10 variations. People want to do a pairwise test on the top and bottom right away, without realizing that, even if all are equal, the top and bottom frequently look different. Or the flip side of that error is that people see that the G-test says that there is a difference, and conclude that the current top one must be better than the current bottom one. Which is again incorrect.

There is a lot of subtlety, and just saying, "I have this statistical test that most people don't understand" is not really going to cut it.
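To see how badly the naive top-vs-bottom comparison behaves, here's a simulation sketch (all numbers invented): ten identical variants, and we test the best-looking one against the worst-looking one as if they had been the only two.

    # Ten variants with the same true conversion rate (5%), 1000 visitors each;
    # both figures are made up. Naively compare best vs. worst with a chi-square test.
    import numpy as np
    from scipy.stats import chi2_contingency

    rng = np.random.default_rng(0)
    trials, false_alarms = 2000, 0
    for _ in range(trials):
        conversions = rng.binomial(1000, 0.05, size=10)
        best, worst = conversions.max(), conversions.min()
        table = [[best, 1000 - best], [worst, 1000 - worst]]
        chi2, p, dof, expected = chi2_contingency(table)
        if p < 0.05:
            false_alarms += 1
    print(false_alarms / trials)   # far above 5%, even though every variant is identical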


Right, which is why I said "We can make other errors, but this specific one isn't possible" to the question: "I wonder how many people make this error while A/B testing their websites".

I'm familiar with the drawbacks of Taguchi methods and the subtle problems caused by changing distributions, and the problem of checking the G-test continuously and thereby reducing its effectiveness. But for a simple A/B test (and by that I mean challenger versus champion served randomly from the backend at a static distribution (50-50 throughout the life of the test, say)), unless I need to hit the books again, this specific problem is not possible if everyone on board trusts the G-test (with the Yates correction on, etc).


The whole concept of "statistically significant" has to go, imo. It causes much confusion and influences experiment design (because, in short, it focuses on obtaining the maximum number of measurements under unchanged conditions instead of measurements that maximize information value). All that in the name of some arbitrarily chosen threshold for statistical significance.


That's quite impressive. Especially when they suggest that researchers may choose to report differences in significance because the actual interaction effect is not significant. Part of the craze to publish or perish, I guess. Having an open science approach would help identify these errors.

That is not to say that the significant results are not significant, though, just that any claim you may read in the discussion section should be taken with a grain of salt (which I think is already the case, given that brain phenomena are borderline chaotic).

This may even reflect the fact that in neuroscience there are no really large and deep labs (think LHC scale); instead you have thousands of small labs doing largely overlapping work, usually with small sample sizes, all competing for small grants.


This error builds on a simpler, even more common one (at least among students): Suppose you have just one treatment (A), and you find that A has no statistically significant effect. This does not show that A has no effect, or that A's effect is small. (It could just mean your dataset isn't large enough.) The error discussed in the article seems to build off this mistake.
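A quick simulation makes the point; the effect size (0.3 SD) and group size (20) are made up for illustration. Even though A has a genuine effect here, most runs fail to reach p < 0.05.

    # Simulate many small experiments where treatment A truly shifts the outcome
    # by 0.3 SD, with only 20 subjects per group (both numbers are invented).
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)
    n, effect, trials = 20, 0.3, 5000
    hits = 0
    for _ in range(trials):
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(effect, 1.0, n)
        if ttest_ind(treated, control).pvalue < 0.05:
            hits += 1
    print(hits / trials)   # roughly 0.15: a real effect, missed most of the time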


Could you elaborate on this? Does this assume "you" extrapolate the effect of treatment A to the general population?

I mean, I understand that if you have a sample size of two, find that treatment A does not induce an effect, and conclude that treatment A has no effect [for all test subjects] this does not hold. However, surely if the sample size is big enough (which obviously isn't always clear, but for the sake of the argument let's assume it is) then drawing such conclusions does hold (within the certainty thresholds predefined for your statistical test of choice, such as 99% or 95% probability). Or have I misunderstood?


The problem is that standard statistical tests have two outcomes: (1) "reject the null hypothesis" or (2) "failure to reject the null hypothesis". Moreover in the most common statistical tests the null hypothesis amounts to something like "the two samples were drawn from the same distribution" so if you fail to find a significant difference you haven't shown they're the same, you've just failed to show they're different. If what you want to do is show that they're nearly the same, you can design a statistical test where the null hypothesis is instead something of the sort "these two samples were drawn from distributions that differ by > X amount" so rejecting the null hypothesis shows they differ by <= X.
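That's essentially equivalence testing (e.g. TOST, two one-sided tests). A minimal hand-rolled sketch, where the equivalence margin delta and the data are made up for illustration:

    # Two one-sided tests (TOST): conclude "the means differ by less than delta"
    # only if BOTH one-sided tests reject. delta and the data here are made up.
    import numpy as np
    from scipy.stats import t as t_dist

    def tost_ind(x1, x2, delta):
        x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
        n1, n2 = len(x1), len(x2)
        diff = x1.mean() - x2.mean()
        sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
        se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
        p_lower = t_dist.sf((diff + delta) / se, df)   # H0: diff <= -delta
        p_upper = t_dist.cdf((diff - delta) / se, df)  # H0: diff >= +delta
        return max(p_lower, p_upper)   # small value => equivalent within +/- delta

    rng = np.random.default_rng(1)
    a = rng.normal(0.00, 1.0, 100)
    b = rng.normal(0.05, 1.0, 100)   # nearly identical populations
    print(tost_ind(a, b, delta=0.5))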


The issue is that it is nearly impossible to detect "the same". Imagine the sample size required to differentiate between .5 and .500000001. Now, technically speaking, sure-- with any effect (however small) in the infinite data limit a difference would be detected. But this isn't really what these tests are intended for.


The author is himself imputing an absolutist interpretation of statistical significance. Statistical significance always has a p value (and an assumed distribution!) associated with it. The permutation of statistical significances and insignificances the article describes could well be true for various p values.


Many people enter biology and related areas precisely because they dislike math.

How ironic...


That's not ironic. Irony occurs when the outcome is the opposite of what the action intended. If many people enter biology and related areas precisely because they dislike math, then the expectation would be that math-related areas of the field (such as statistics) would suffer as a result. Irony (in this case) would be if people entered biology because they dislike math, but then biology as a field ended up being stronger in statistics than the more mathematically inclined fields.


It's ironic because they still have to learn math.


Fair enough.


In general, beware statistics. Check and double check before making assumptions. This applies to more than just neuroscience papers.

When reading scientific papers, beware conclusions.

Isn't the important thing whether someone else can replicate the experiments and achieve similar/same results?


>Isn't the important thing whether someone else can replicate the experiments and achieve similar/same results? //

Or whether a statistically significant proportion of experiments return a significantly similar result?

;0)>


If the experiments are replicated enough times by other labs to create a sufficient sample size.



