Most biologists don't understand statistics either. To the majority, the unpaired t-test is the only test that is ever needed. It doesn't matter whether you have one or two tails, paired or unpaired trials, a normally distributed population or a skewed one. Most biologists don't take proper statistics classes, and most just don't care. I doubt most biologists would even be able to name alternative statistical tests.
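For what it's worth, the alternatives aren't exotic. Here's a minimal sketch (invented data, scipy) of how the choice of test changes with the design and the distribution:

    # Hypothetical paired, skewed data: the "default" unpaired t-test vs. tests
    # that actually match the design and the distribution.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    before = rng.lognormal(mean=1.0, sigma=0.5, size=20)           # skewed, not normal
    after = before * rng.lognormal(mean=0.1, sigma=0.2, size=20)   # paired with `before`

    print(stats.ttest_ind(before, after))   # unpaired t-test: ignores the pairing
    print(stats.ttest_rel(before, after))   # paired t-test: respects the pairing
    print(stats.wilcoxon(before, after))    # non-parametric paired test for skewed data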
In my experience, this usually comes from PIs who worked back in the days of "move bar of light across visual field, see neurons fire". Their experiments were groundbreaking and had drastic phenotypes.
On the flip side, I've seen experiments showing that one population of neurons is 2% more dynamic than another, and sometimes you just wonder whether that really matters at all (or whether you are really sampling the population well enough).
See http://www-stat.stanford.edu/~ckirby/brad/papers/2005NEWMode... for a really interesting discussion of the way things are headed from a statistician's perspective.
Also, apologies for perpetuating the "arrogant economist" meme. Just reporting what I've experienced. :P
When you study psychology you don’t expect to sit around and do math all the time.
When you study biology you don’t expect to sit around and do math all the time.
When you study economics (especially if you actually want to become an economist) you do expect to sit around and do math all the time.
For many biologists and social scientists, statistics classes must seem like a chore that has little to do with what they actually want to do. For economists, statistics is just another one of their math tools.
My experience, though, is with applied macroeconomics. We'd derive models consistent with theory, then make sure we avoided a number of time series pitfalls (autocorrelation, mistaking cointegration for correlation, overspecifying a model) using a variety of statistical techniques.
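To make one of those pitfalls concrete, here's a tiny simulation (made-up random walks, nothing from any real dataset): two completely unrelated non-stationary series will very often look "significantly" correlated if you apply ordinary tests meant for stationary data.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    trials, false_positives = 500, 0
    for _ in range(trials):
        x = np.cumsum(rng.normal(size=200))   # random walk
        y = np.cumsum(rng.normal(size=200))   # independent random walk
        if stats.pearsonr(x, y)[1] < 0.05:    # correlation test that assumes stationarity
            false_positives += 1
    print(f"'significant' correlations between unrelated walks: {false_positives}/{trials}")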
During the last century, empiricism's role in the social sciences has been growing, but social science as an empirical science is still relatively young and, I think more importantly, it's not really dominated by empiricism the way that, say, physics is.
It's still necessary to adapt and find the right methods.
I believe (?) this is normally done correctly in medical studies with placebos, where the typical analysis is to show that a drug has a statistically significant effect compared to the placebo baseline; it's not sufficient to show that the drug has an effect compared to a no-drug baseline, and, separately, that a placebo doesn't.
(E.g. if doing nothing cures 10%, the placebo cures 11% and the drug cures 12%, only the difference drug/nothing may be significant.)
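To make that concrete, here's a hedged sketch with invented numbers (3000 patients per arm, cure counts picked to sit near those rates): only the drug-vs-nothing comparison comes out "significant", while the comparison that actually matters, drug vs. placebo, does not.

    from scipy import stats

    # Hypothetical trial: 3000 patients per arm, cure counts invented to sit
    # near 10%, 11% and 12%.
    n = 3000
    cured = {"nothing": 300, "placebo": 330, "drug": 360}

    def compare(a, b):
        table = [[cured[a], n - cured[a]], [cured[b], n - cured[b]]]
        _, p = stats.fisher_exact(table)
        print(f"{a} vs {b}: p = {p:.3f}")

    compare("drug", "nothing")     # comes out below 0.05
    compare("placebo", "nothing")  # does not
    compare("drug", "placebo")     # does not -- and this is the comparison that matters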
As the linked article says, the idea of statistical significance being a binary state is to blame here.
I think a statistical nitpick is entirely appropriate given the article.
"Nieuwenhuis looked at 513 papers published in five prestigious neuroscience journals over two years. In half the 157 studies where this error could have been made, it was made."
Likewise, even if there's a 50% chance that a neuroscientist messes up this particular kind of analysis when it comes up, the analysis doesn't come up in every paper, so fewer than 50% of neuroscience papers will contain the error.
Still, it's embarrassing.
So, with the data you have, it is often much harder to show A > B than to show A > C and B !> C.
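A quick simulation (invented effect size, sample size, and thresholds) of why: when A and B have exactly the same true effect relative to control C, "A significantly beats C while B does not" still turns up in a sizable fraction of runs, and that pattern by itself says nothing about A versus B.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(11)
    n, effect, trials = 20, 0.6, 2000
    pattern_count, a_beats_b = 0, 0

    for _ in range(trials):
        a = rng.normal(effect, 1.0, n)   # true effect 0.6
        b = rng.normal(effect, 1.0, n)   # same true effect as A
        c = rng.normal(0.0, 1.0, n)      # control
        if stats.ttest_ind(a, c).pvalue < 0.05 and stats.ttest_ind(b, c).pvalue >= 0.05:
            pattern_count += 1
            if stats.ttest_ind(a, b).pvalue < 0.05:
                a_beats_b += 1

    print(f"'A significant vs C, B not' in {pattern_count / trials:.0%} of runs")
    print(f"A significantly better than B in {a_beats_b / max(pattern_count, 1):.0%} of those runs")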
They examined 5453 scientific articles in 12 leading journals from 1993 to 2002. I like how the authors tactfully state that "the gathered data reflects the relevance of software engineering experiments to industrial practice and the scientific maturity of software engineering research."
Couldn't we just do away with statistical significance and publish likelihood ratios, or decibels of evidence (in favour of one hypothesis over another)? That way, we would know exactly how much an experiment is worth. No arbitrary threshold. Plus, you get to combine several experiments and get the compound evidence, which can be much stronger (or weaker) than the evidence you get from any single one of them. And then you may have found something worthwhile.
This is especially crucial when said evidence is expensive. In teaching, for instance, one researcher can hardly run experiments on more than two or three classrooms over little more than a year. That is often not enough to accumulate sufficient evidence for statistical significance in one go. But a bunch of such experiments may very well be enough. (Or not, if the first one proves to have been a fluke.)
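Here's a rough sketch of what that bookkeeping could look like (hypothesised pass rates and classroom counts entirely made up): evidence in dB is 10*log10 of the likelihood ratio, and independent experiments combine by simple addition.

    import math
    from scipy import stats

    def evidence_db(successes, trials, p_h1, p_h0):
        """Evidence for H1 over H0, in decibels, from one binomial experiment."""
        lr = stats.binom.pmf(successes, trials, p_h1) / stats.binom.pmf(successes, trials, p_h0)
        return 10 * math.log10(lr)

    # Made-up classrooms: H1 says the intervention lifts pass rates to 70%,
    # H0 says they stay at 50%.
    experiments = [(18, 28), (20, 30), (16, 25)]   # (passes, students) per classroom
    per_experiment = [evidence_db(s, n, 0.7, 0.5) for s, n in experiments]
    print([round(e, 1) for e in per_experiment])
    print("combined evidence:", round(sum(per_experiment), 1), "dB")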
The article by Ioannidis, which I think is the most downloaded ever article on PLoS Medicine, is well worth reading for guidance in how to avoid errors in research design.
But to the point of the accuracy of the submission title, what the underlying journal paper said was, "We reviewed 513 behavioral, systems and cognitive neuroscience articles in five top-ranking journals (Science, Nature, Nature Neuroscience, Neuron and The Journal of Neuroscience) and found that 78 used the correct procedure and 79 used the incorrect procedure. An additional analysis suggests that incorrect analyses of interactions are even more common in cellular and molecular neuroscience." In other words, half the time when the specific issue came up, the neuroscience authors got the procedure wrong.
AFTER EDIT: Meanwhile, a lot of other interesting comments have mentioned the role of statistics education or math-aversion in higher education in various disciplines. I'll comment here about some things I've learned about statistics since I completed my higher education, things I've self-educated about as a homeschooling parent for the last two decades and now a mathematics teacher in private practice.
First of all, even when undergraduate students take courses in statistics, the courses are not likely to be very helpful. Many statistics textbooks used in colleges are poorly chosen, and many statistics courses are taught by professors who themselves have very poor backgrounds in statistics, so the essential point that statistics is all about DATA never gets emphasized. Moreover, the undergraduate statistics curriculum historically has emphasized the wrong issues about valid inference, and most undergraduates who complete one or two statistics courses still have a very weak sense of what valid statistical inference is.
And all of this is not even to get into issues such as Bayesian versus frequentist approaches to statistics in modeling reality. Yes, biologists need to get over fear of mathematics for biology to progress as a science, and everyone can gain by learning more about statistics, but statistics education isn't easy, and it can still be greatly improved even for the students who do step up to take statistics courses.
Particularly in light of the statland.org link above, what would you recommend as good reading material to help me "learn to think like a statistician"?
The actual paper is: http://www.nature.com/neuro/journal/v14/n9/full/nn.2886.html
It appears to follow on from an extremely controversial paper in neuroscience: Voodoo neuroscience http://escholarship.org/uc/item/51d4r5tn;jsessionid=FB5843BB...
If you look at the "cited by" links in Google Scholar, you can get access to a raging debate across quite a number of journals (well worth popcorn if you're nerdy and not involved).
The crux of the matter for me is that the brain cannot be fruitfully modelled using statistics developed for independent samples. This is going to cause us massive problems unless we can develop better statistical tools (or at the very least use the ones we have more wisely).
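As a toy illustration of how badly independence assumptions can bite (simulated AR(1) "measurements", parameters invented): a plain one-sample t-test on autocorrelated data with a true mean of zero rejects far more often than its nominal 5%.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    trials, n, rho = 2000, 100, 0.8
    false_positives = 0

    for _ in range(trials):
        noise = rng.normal(size=n)
        x = np.empty(n)
        x[0] = noise[0]
        for t in range(1, n):              # AR(1): each sample depends on the previous one
            x[t] = rho * x[t - 1] + noise[t]
        if stats.ttest_1samp(x, 0.0).pvalue < 0.05:   # test assumes independent samples
            false_positives += 1

    print(f"false-positive rate: {false_positives / trials:.0%} (nominal 5%)")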
The prestige of getting published in Nature or Science far outweighs the criticism you will get for forging or manipulating your data, in large part because the latter can almost never be proven. You can always say you just made a mistake or plead ignorance.
At worst, you could claim that people submitted only the results of the test that made their research look better than it would have with the correct test.
In this case, I think it is more an issue of reviewers failing to catch the problems than of authors deliberately misleading.
What's worse than the possibility of a black mark on your career? Not having a career at all. Which is what will happen if you don't publish.
In other words, the choice is never "Forge these results, or have no career."
Yes it is, because in order to have a career, it is not enough to just publish. You have to publish in prestigious journals and you have to publish original, groundbreaking, important, and meaningful research.
A tiny percentage of academics actually end up keeping their job until retirement. If you don't plan on perishing, you publish in prestigious journals by whatever means necessary. Whatever. Means. Necessary.
"Accidentally" using bad statistics is the kind of "accident" that gets you tenure. It's significantly easier to forge groundbreaking, important, prestigious research than it is to actually do this research.
Scientific fraud is epidemic precisely because of the economics of universities.
I never saw any researchers or students falsify or make up data, but often I saw experiments performed without defining a hypothesis first. The hypothesis was created a posteriori to match any interesting correlations that could be combed from the data. I felt like I was a marketer trying to find a reason to promote a product. This particular approach might be standard when dealing with in-vivo experiments (on human subjects). It's a time-intensive process and you can't just rerun the experiment like you can on an artificial system. However, I didn't feel like it was the correct way to do research.
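That a posteriori combing is exactly where multiple-comparison problems sneak in. A toy demonstration (pure noise by construction, all numbers invented):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    subjects, variables = 25, 100
    outcome = rng.normal(size=subjects)              # a random "outcome"
    data = rng.normal(size=(variables, subjects))    # 100 unrelated measurements: pure noise

    p_values = [stats.pearsonr(data[i], outcome)[1] for i in range(variables)]
    hits = [p for p in p_values if p < 0.05]
    print(f"'significant' correlations found in pure noise: {len(hits)} of {variables}")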
There's been renewed interest in recent years in creating a journal of failed experiments for various research areas, but it never seems to catch on. It would be invaluable, but there are too many egotistical people in academia who don't want to be associated with failure in any way.
I think the economics often are perverse, but I think you're doing a disservice to the overall discussion by simplifying it too much.
Just remember that 50 year old profs came of age in a different era.
I am trying to make a point that in the world of vicious competition for grant money, every shitty incremental experiment is lauded as groundbreaking. For very understandable reasons.
I'm being facetious about the way that the key quality of a scientist has shifted from doing and teaching science to being a great marketer who can win grants. The "best" scientists these days are generally the ones who are really great at hype.
I agree there are problems with how science works right now, and I agree on what the problems are. I disagree on the weights of the incentives, though. You describe "rampant" fraud. I have observed none. My conclusion, then, is that the claim that fraud is rampant is incorrect.
How hard have you looked and what is your definition of fraud?
When one deals with modern institutional corruption, what one is dealing with is a complex combination of incompetence, overt deception of others, self-deception and simulated incompetence.
Sure, incompetence is in the mix, but when you use a construct like "Hanlon's Razor" to label and forget the situation, you hide its full complexity. These situations are genuinely complex; there is no razor for untangling them a priori.
This is also worsened by the fact that today you have to publish often, so you do not have time to check the results before publishing.
And another common variant of the problem happens when you're testing 10 variations. People want to do a pairwise test on the top and bottom right away, without realizing that, even if all are equal, the top and bottom frequently look different. Or the flip side of that error is that people see that the G-test says that there is a difference, and conclude that the current top one must be better than the current bottom one. Which is again incorrect.
There is a lot of subtlety, and just saying, "I have this statistical test that most people don't understand" is not really going to cut it.
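For the ten-variant case above, a quick simulation (invented conversion rates and traffic) shows how badly the naive best-vs-worst test misbehaves when all variants are actually identical:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    variants, visitors, true_rate = 10, 1000, 0.05   # all ten variants truly identical
    trials, false_positives = 1000, 0

    for _ in range(trials):
        conversions = rng.binomial(visitors, true_rate, size=variants)
        best, worst = conversions.max(), conversions.min()
        table = [[best, visitors - best], [worst, visitors - worst]]
        if stats.fisher_exact(table)[1] < 0.05:      # naive pairwise test on the extremes
            false_positives += 1

    print(f"'significant' best-vs-worst differences: {false_positives / trials:.0%} (nominal 5%)")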
I'm familiar with the drawbacks of Taguchi methods, the subtle problems caused by changing distributions, and the problem of checking the G-test continuously and thereby reducing its effectiveness. But for a simple A/B test (and by that I mean challenger versus champion served randomly from the backend at a static distribution, say 50-50 throughout the life of the test), unless I need to hit the books again, this specific problem is not possible if everyone on board trusts the G-test (with the Yates correction on, etc.).
That is not to say that the significant results are not significant, though; just that any claim you read in the discussion section should be taken with a grain of salt (which I think is already the case, given that brain phenomena are borderline chaotic).
This may simply reflect the fact that in neuroscience there are no really large, deep labs (think LHC scale); instead you have thousands of small labs doing largely overlapping work, usually with small sample sizes, all competing for small grants.
I mean, I understand that if you have a sample size of two, find that treatment A does not induce an effect, and conclude that treatment A has no effect [for all test subjects], this does not hold. However, surely if the sample size is big enough (which obviously isn't always clear, but for the sake of the argument let's assume it is), then drawing such conclusions does hold (within the certainty thresholds predefined for your statistical test of choice, such as 99% or 95% probability). Or have I misunderstood?
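Roughly, yes, but a non-significant result by itself still isn't a "no effect" result; what helps is showing that the confidence interval for the difference excludes any effect size you'd care about. A minimal sketch (simulated data, margin chosen arbitrarily):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    a = rng.normal(loc=10.0, scale=2.0, size=1000)   # treatment
    b = rng.normal(loc=10.0, scale=2.0, size=1000)   # control, same true mean

    diff = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    df = len(a) + len(b) - 2                          # simple (non-Welch) approximation
    half_width = stats.t.ppf(0.975, df) * se
    print(f"95% CI for the difference: [{diff - half_width:.2f}, {diff + half_width:.2f}]")
    # If the whole interval sits inside a pre-chosen "too small to matter" band
    # (say +/- 0.5 here), that's a far stronger basis for "no effect" than a
    # non-significant p-value alone.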
When reading scientific papers, beware conclusions.
Isn't the important thing whether someone else can replicate the experiments and achieve similar/same results?
Or whether a statistically significant proportion of experiments return a significantly similar result?