
Unfortunately, publishing these kinds of claims prematurely helps the more gullible among us fall for ridiculous claims from psychics and others who would take advantage of them.

True. But on the other hand, publishing ridiculous claims and incorrect results is a necessary part of science.

When we publish only results we know to be correct, because they agree with mainstream beliefs, we introduce a bias into the scientific process. In reality, if you publish 20 experiments with p=0.05 [1], 1 of them should be incorrect. If fewer than 1 in 20 of your papers turns out to be wrong (assuming p=0.05 is the gold standard), you are not doing science.
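
To make the arithmetic concrete, here is a minimal simulation (Python; the t-test setup and group size of 40 are my own illustrative choices, not from the thread) of what a p < 0.05 threshold does when no real effect exists in any of the experiments:

    # If the null hypothesis is true in every experiment, a p < 0.05 cutoff
    # still "finds" an effect in roughly 1 of every 20 of them.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_experiments, n_subjects = 10_000, 40

    false_positives = 0
    for _ in range(n_experiments):
        # Two groups drawn from the same distribution, i.e. no real effect.
        a = rng.normal(0, 1, n_subjects)
        b = rng.normal(0, 1, n_subjects)
        _, p = stats.ttest_ind(a, b)
        false_positives += p < 0.05

    print(false_positives / n_experiments)  # comes out close to 0.05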

You can see a perfect illustration of this when people tried to reproduce Millikan's oil drop experiment. I'll quote Feynman: Millikan measured the charge on an electron...got an answer which we now know not to be quite right...It's interesting to look at the history of measurements of the charge of an electron, after Millikan. If you plot them as a function of time, you find that one is a little bit bigger than Millikan's, and the next one's a little bit bigger than that, and the next one's a little bit bigger than that, until finally they settle down to a number which is higher.

Why didn't they discover the new number was higher right away? It's a thing that scientists are ashamed of - this history - because it's apparent that people did things like this: When they got a number that was too high above Millikan's, they thought something must be wrong - and they would look for and find a reason why something might be wrong. When they got a number close to Millikan's value they didn't look so hard. And so they eliminated the numbers that were too far off, and did other things like that...

This is why I'm an advocate of accepting/rejecting scientific papers based solely on methodology, with referees being given no information about the conclusions and with authors being forbidden from post-hoc tweaks. You do your experiment, and if you disagree with Millikan/conclude that ESP exists, so be it. Everyone is allowed to be wrong 5% of the time.

[1] I'm wearing my frequentist hat for the purposes of this post. Even if you are a Bayesian, though, you should still publish.




If you're going to use highly subjective frequentist statistics at all, p < 0.001 should be the minimum gold standard for extraordinary claims. If the phenomenon is real, and not bad statistics, it only requires two and a half times as many subjects to get p < 0.001 instead of p < 0.05. Physicists, who don't want to have to put up with this crap, use p < 0.0001. p < 0.05 is asking for trouble.


A complication is that if the effect were real, all our ideas of prior vs. posterior probability would need re-thinking. The hypothesis is that humans can be influenced by posterior events. That includes the experimenters.


Ok, then let's fund psychology like we fund physics. I would love to run 1000+ patient studies to test psychotherapies, and in fact we'd be able to answer some really interesting questions if we did, but there is currently no way of doing this.


I repeat, you do not need 1000 times as many subjects to get results that are 1000 times as significant! If 40 subjects gets you results with p < 0.05, then 100 subjects should get you results with p < 0.001. Doing half as many experiments and having nearly all the published results being real effects, instead of most of them failing to replicate when tested, sounds like a great tradeoff to me.
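
A back-of-the-envelope power calculation supports that scaling (a sketch only: the 80% power target and the effect size below are my assumptions, plugged into the standard normal-approximation formula for a two-sided, two-sample comparison):

    # How does required sample size scale when the significance threshold
    # moves from 0.05 to 0.001? (Normal approximation, two-sided test.)
    from scipy.stats import norm

    def n_per_group(alpha, power, effect_size):
        z_alpha = norm.ppf(1 - alpha / 2)
        z_beta = norm.ppf(power)
        return 2 * ((z_alpha + z_beta) / effect_size) ** 2

    d = 0.65  # assumed standardized effect size, sized so alpha=0.05 needs ~40 per group
    n_05 = n_per_group(0.05, 0.80, d)
    n_001 = n_per_group(0.001, 0.80, d)
    print(round(n_05), round(n_001), round(n_001 / n_05, 2))  # roughly 37, 81, 2.2

The exact factor moves a little with the power you demand, but it stays in the 2-2.5x range: going from p < 0.05 to p < 0.001 costs a couple of times more subjects, not 20 or 1000 times more.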

And I suspect the ultimate reason it's not done this way... is that scientists in certain fields would publish a lot fewer papers, not slightly fewer but a lot fewer, if all the effects they were studying had to be real.


"1000+", not "1000x". Also, I'm assuming bigfudge was talking about p < 0.0001, given the comparison made to physicists.


Yes - thanks. The current norm for a 'suitably powered' trial of a psychotherapy is about 300. We've just got a trial funded for that number (admittedly in a challenging patient population) which will cost about £2.5m in research and treatment costs. We would love to run 1000 patients and start looking at therapist-client interactions and individual differences in treatment suitability, but that's out of the question.


Let's fund psychology like that when the psychologists define their hypotheses before the experiment begins as well as physicists do.


That's a cheap shot. Our trial will publish a detailed protocol and analysis plan, as do most large, publicly funded trials. Small-scale experimental work is a different matter. I personally agree that all experiments which could end up in peer reviewed journals should be registered before participants are recruited.

This would be simple to do by submitting ethics applications and an analysis plan to a trusted third party which would only release them once the author is in a position to publish, or at a pre-agreed cutoff (perhaps 2 years), whichever is the shorter (to avoid scooping). Perhaps I should set something up...


Having moved from physics to biology, I am amazed by the difference in what the consensus view of 'significant' is. Some of the difference is due to necessity, but not all.


There are more reasons to doubt our results, so we lower our standards of evidence?


Sounds reasonable.

When some people find that their model doesn't quite fit, they make a more accurate model. Others make a less specific model. It's the difference between model parametrization and model selection.

So when we get a dubious result, we can either say "no result" or "possible result". The choice tends to depend on how the finding affects future research. Biology is more exploratory than confirmatory, so they go that way.
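
If it helps to see the distinction in miniature, here is a toy sketch (my own illustration, not from the thread): refining a fit within one family versus choosing among families with an information criterion such as AIC.

    # Toy contrast: refining parameters within one model family versus
    # selecting among families with an information criterion.
    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(0, 1, 50)
    y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(0, 0.1, x.size)

    def rss(degree):
        coeffs = np.polyfit(x, y, degree)
        return float(np.sum((np.polyval(coeffs, x) - y) ** 2))

    # Parametrization: the linear fit is poor, so add a term and stay
    # within the polynomial family.
    print(rss(1), rss(2))

    # Selection: compare candidate families by a criterion instead of
    # chasing ever-better in-sample fit with more parameters.
    def aic(degree):
        n, k = x.size, degree + 2  # coefficients plus the noise variance
        return n * np.log(rss(degree) / n) + 2 * k

    print(min(range(1, 6), key=aic))  # the degree AIC prefers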


I wish I could upvote this 1,000 times.

People occasionally mention p-value calibration and note, sadly, the damage caused by this reckless practice that allows false results to slip through the airtight, 300' tall walls of scientific publication. But there is value in being wrong. It's a part of science.

In a way, it's the MD's White Coat syndrome applied to PhDs. In public opinion, something that is scientific and written in a journal is necessarily correct, rather than the rigorously considered opinion it really is. Both the paper-reading public and the authors of some of those papers tend to believe this.

And to cover it from a Bayesian point of view, it's pretty vital to keep the culture such that the risk of publishing something incorrect doesn't too strongly dominate the decision to publish. You should be confident talking about your beliefs long before they distribute like deltas.


> In reality, if you publish 20 experiments with p=0.05 [1], 1 of them should be incorrect.

In reality it doesn't turn out this way because the results that get written and published tend to be biased in favour of novelty and demonstrating a relationship rather than the absence of one. How many similar experiments could have been terminated, never submitted or not published because they failed to show anything notable? This is one of the reasons... 'Why Most Published Research Findings Are False' http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182327/

That meta-study applied to medical studies, and I think this genre would probably fare even worse when it came to long-term replicability.


And to go the other way, it doesn't happen like that because the rule of thumb isn't p=0.05, it's p<=0.05 - and p can be quite small indeed, if you run out of ideas before running out of data (such as might happen in a novel area).


> This is why I'm an advocate of accepting/rejecting scientific papers based solely on methodology, with referees being given no information about the conclusions and with authors being forbidden from post-hoc tweaks.

A million times yes. Also: no publication without the experiment's methodology and criteria for success having been registered prior to the experiment's commencement.


Agreed - I replied to a comment above with this suggestion. It would be nice if grant bodies started requiring this for all funded research, and kept (public) track of researchers with a bulging file drawer.


Good point. I shouldn't have blamed the researcher -- I read the paper and it seems straightforward enough, with a number of controls in place. (For example, running a second experiment using only random number generators that showed no such results.)

Instead, I should have focused on science journalists, who should be extra diligent when reporting these sorts of stories to point out the possibility that this is a false positive.




