
How to Tell Good Studies from Bad? Bet on Them - mrjaeger
http://fivethirtyeight.com/features/how-to-tell-good-studies-from-bad-bet-on-them/
======
evmar
It's funny that they were worried about whether they just got lucky with
their result, then did the prediction market thing, and then didn't worry
about whether they just got lucky with _that_ result! (At least the article
didn't; perhaps the researchers did.) So here are some amateur stats, please
check my work.

This article says that the prediction market correctly predicted 71% of the
replication results of 44 studies, or 31 correct.

Assume the studies have a 50% chance of being replicable. Then a random coin
would predict a mean of 22 correct with a std dev of sqrt(0.5 * 0.5 * 44) =
3.3. This sample has a z score of about 2.7, which means there's only about a
0.003 (0.3%) probability of the random-chance approach being correct 71% of
the time or better. So the result seems pretty significant. (Changing the
assumed 50% to other values makes the probability even more extreme.)
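
A quick way to check this (a sketch in Python; the exact binomial tail comes
out a bit larger than the normal approximation, since the latter skips the
continuity correction):

    import math
    from scipy.stats import binom, norm

    n, correct, p0 = 44, 31, 0.5   # 44 studies, 31 called right, fair-coin null

    mean = n * p0                        # 22
    sd = math.sqrt(n * p0 * (1 - p0))    # ~3.32
    z = (correct - mean) / sd            # ~2.71

    print(norm.sf(z))                    # normal approximation: ~0.0034
    print(binom.sf(correct - 1, n, p0))  # exact P(X >= 31): ~0.005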

~~~
Vraxx
Wouldn't changing the assumed rate of 50% to something higher make the
probability less extreme? My initial inclination is to believe that fewer
than 50% of published studies have results that can be replicated, but that
could just be the cynic in me. The article mentions that just 39% of the
selected studies replicated, but that could be caused by selection bias in
which 44 studies were used. I'm inclined to believe this study, especially
because the results are rather amusing, but there is definitely room to be
more sure.

~~~
evmar
Thank you for thinking critically about my comment! As I mentioned I am an
amateur at stats and I now think I modeled the problem wrong.

Let's hypothetically suppose that we knew that 75% of the studies were
replicable. Can we make a better coin-flip prediction? If you had a coin flip
that says yes 75% of the time, it isn't necessarily correct at a rate of 75%:
it's right only when it agrees with reality (both yes or both no), i.e. .75^2
+ .25^2 = 62.5% of the time. In fact a coin that just predicted "replicable"
every time in this scenario would be right 75% of the time. So I think maybe
my null hypothesis should've been based on "did they do better than a parrot
that always says yes", not a coin flip.

I _think_ the math in the original problem stays the same; you just change the
null to "a coin that always predicts it will be replicable". And in that case,
if the underlying rate of replicability were 71%, then their prediction market
only does as well as the always-yes coin and is in fact not very useful.
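
To spell out that arithmetic (a small Python sketch; p is the hypothetical
fraction of studies that replicate, q is how often the coin predicts "yes";
whenever p > 0.5, accuracy is maximized at q = 1, i.e. the parrot beats every
coin):

    # A q-coin is right when it and reality agree: both yes, or both no.
    def accuracy(p, q):
        return p * q + (1 - p) * (1 - q)

    p = 0.75
    print(accuracy(p, p))    # "matched" coin: 0.625
    print(accuracy(p, 1.0))  # always-yes parrot: 0.75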

------
jimrandomh
This is a good answer from the incentives angle - how to motivate people to
check whether studies are good or bad. On the object level, the answer is
surprisingly simple: actually read the thing. The press is full of stories
where a journalist rephrased another journalist's story about a press release
about a study, and when you actually go to the study, it says something subtly
different. When I see those, I try to jump out of the journalist's summary and
get to a PDF as fast as possible, guess what the caveat is going to be, then
check for it.

~~~
pavpanchekha
There are two stories here. One is, how do you—I'm guessing you're not a
scientist—tell which studies are worth a damn. In this case, yes, reading the
study is the best thing to do.

The second is, how do scientists know which studies are good? This is a harder
job than yours, because you're only ever made aware of papers that made it
through the publication gauntlet. (No, they won't publish "anything" these
days, not in a journal that gets any coverage.) Since the task is harder for
scientists, prediction markets might be the tool they need.

------
masonhipp
"The beauty of the market is that we allow people to be Bayesian" [...]
"People come in with some prior belief, but they can also follow prices to see
what other people believe and may update their beliefs accordingly [...]
participants in the market could focus their bets on the studies they felt
most sure of, and as a result, rough guesses didn’t skew the averages as
much."

It certainly isn't a fool-proof method of increasing accuracy, and it does
favor popularity of a theory over other factors, but overall it's probably a
nice layer of data to consider adding to the mix.

~~~
smt88
It isn't fool-proof, but there is a lot of research into the phenomenon that
groups of humans are pretty good at predicting outcomes (much better than most
individuals are). I forget the exact math behind it, but the effect makes a
lot of sense mathematically.

Here's a book all about it: [http://www.amazon.com/The-Wisdom-Crowds-James-Surowiecki/dp/0385721706](http://www.amazon.com/The-Wisdom-Crowds-James-Surowiecki/dp/0385721706)
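
The math is roughly that independent, unbiased errors cancel in the average. A
toy simulation with made-up numbers (note this breaks down when everyone's
errors lean the same way, which is the popularity worry above):

    import random

    random.seed(0)
    truth = 100.0
    guesses = [random.gauss(truth, 20) for _ in range(1000)]  # noisy individuals

    crowd_error = abs(sum(guesses) / len(guesses) - truth)
    median_individual = sorted(abs(g - truth) for g in guesses)[500]

    print(crowd_error)        # typically well under 1
    print(median_individual)  # around 13-14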

~~~
masonhipp
Very true. There's another one floating around somewhere about how good we are
at estimating the IQ of other people. Pretty interesting.

------
sharp11
The problem with this is that it seems likely to be biased against unexpected
results or results that contradict the dominant theory. The old saying,
"Science advances one funeral at a time," has a lot of truth in it.

~~~
yummyfajitas
If you are correct you can make money by betting contrary to the bias.

Science advances one funeral at a time. The stock market accelerates the
process by separating fools from their money.

~~~
Alex3917
> If you are correct you can make money by betting contrary to the bias.

This betting market only covers reproducibility, not truth. But since most
reproducible findings are still false, betting contrary to bias is unlikely to
work.

~~~
yummyfajitas
Could you give some toy example to illustrate this? I honestly don't
understand what you are trying to claim.

Toy example to illustrate my claim: Suppose a given finding has a probability
p of replicating, but the biased market estimates q < p. This means that you
must spend $q and if the experiment replicates you'll earn $1.00.

On average you'll collect $p from these bets: you get $1.00 with probability
p, so your expected net winnings are $p - $q per contract. As long as your
theory more accurately predicts the replication probability, you make money.
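
The same toy example as a quick simulation, with an assumed true replication
probability p = 0.7 and a biased market price q = 0.5 (so expected profit per
$1.00 contract is p - q = $0.20):

    import random

    random.seed(0)
    p, q = 0.7, 0.5  # true replication probability vs. biased market price
    trials = 100_000

    # Buy one contract each time: pay $q, collect $1.00 if it replicates.
    profit = sum((1.0 if random.random() < p else 0.0) - q for _ in range(trials))
    print(profit / trials)  # ~0.20, i.e. $p - $q per contract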

~~~
Alex3917
As an example, what if the statistical analysis for a study was wrong? In that
case the study may well be replicable, but the underlying claim being made by
the study is still false.

At least in the context of the phrase 'science advances one funeral at a time',
usually folks are talking about studies that are wrong for reasons other than
replicability. (Because in general it's pretty easy to convince folks that
something is false if it can't be replicated, but much more difficult to
convince folks that there is some deeper methodological or epistemological
issue.)

~~~
yummyfajitas
Given that replicability would presumably be about predicting observable
statistics, an incorrect calculation wouldn't change it.

I.e., observe a Bernoulli trial and see 60 successes in 100 trials. Plan a
follow-up study that will run the Bernoulli trial another 100 times;
predictions would be about the number of successes in the follow-up, not about
some statistical analysis.
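
A sketch of what betting on that looks like, assuming a uniform Beta(1, 1)
prior over the success rate (so the posterior after seeing 60/100 successes is
Beta(61, 41)):

    import random

    random.seed(0)

    def one_followup():
        theta = random.betavariate(61, 41)  # one draw from the posterior
        return sum(random.random() < theta for _ in range(100))

    successes = [one_followup() for _ in range(10_000)]
    print(sum(successes) / len(successes))           # mean ~60
    print(sum(s >= 50 for s in successes) / 10_000)  # P(>= 50 successes), ~0.9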

The phrase "science advances one funeral at a time" is somewhat orthogonal to
this idea - it tends to be about older scientists being unwilling to accept
new theories in spite of repeated successes. Such scientists would lose their
money if they bet or be ignored if they didn't.

------
btilly
I really like the idea at the end of using prediction markets to figure out
which studies should be challenged by attempting replication.

~~~
nonbel
Replication is not supposed to be a "challenge"; that is not the point. The
first point is to ensure you can communicate the methods effectively, which
demonstrates that the experimental conditions are understood. The other point
is to see whether the observation is stable against whatever differences arise
from varying time and location. And they should be looking at estimates of
effect size, not whether p < 0.05.

Attempting to replace the role of independent replication with opinion is an
awful idea. And that is the goal here, to replace, not to supplement. Of
course, it doesn't seem replication attempts are very common in this area to
begin with. So this is actually going to be a justification for continuing
that pseudoscientific practice and to avoid checking all the previous results.

~~~
btilly
No, the main purpose of attempting replication is to challenge the result.
"I think you made a mistake, let's try to replicate and show it." And if you
fail to replicate, you publish that and show the world that the result
shouldn't be believed.

If you successfully replicate, then it has survived the challenge. The more
challenges it survives, the more confidence we can have that the result is
valid.

That said, we do not try to make replication challenging. We try to present
experiments in a way that makes replication as easy as possible to perform.
Exactly so that people don't have to take what we said on faith.

The point of the prediction market here is not to replace independent
replication with opinion. It is to ensure that the energy that gets spent on
replication is more likely to be spent effectively.

Ideally, of course, targeting replication efforts more effectively will
increase the value of time spent in attempting replication. This should
therefore increase how much effort is spent on replication. Which is exactly
the opposite of replacing independent replication with opinion!

~~~
nonbel
> "if you fail to replicate, you publish that and show the world that the
> result shouldn't be believed."

A lone report of some observation should never be believed. It should be
verified by others who retrace the steps.

>"If you successfully replicate, then it has survived the challenge. The more
challenges it survives, the more confidence we can have that the result is
valid."

If the observation can be independently replicated, it shows that the methods
required are understood well enough to communicate and that it is stable in
the face of unknown influences. This increases our confidence that we
understand what is going on and that the phenomenon is worth theorizing about.

It has nothing to do with an observation being true or valid. The observation
was made, it happened. It is true. It is valid. (Excepting outright fraud,
which is treated equivalently to some severely unstable phenomenon)

>"The point of the prediction market here is not to replace independent
replication with opinion."

They explicitly say that is the goal in the paper. I like this method of
eliciting priors, but not the goal of substituting it for actual replications.

>"It is to ensure that the energy that gets spent on replication is more
likely to be spent effectively."

If a study isn't worth replicating, then it wasn't worth doing and reporting
in the first place.

~~~
btilly
While I generally agree with the attitude behind your comments, they do not
agree with actual research practice.

See [http://www.the-scientist.com/?articles.view/articleNo/43875/title/Psychology-s-Failure-to-Replicate/](http://www.the-scientist.com/?articles.view/articleNo/43875/title/Psychology-s-Failure-to-Replicate/)
for horrible replication rates in psychology. See
[http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0040028](http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0040028)
for evidence that medicine is no better. See
[http://www.nature.com/nature/journal/v483/n7391/full/483531a.html](http://www.nature.com/nature/journal/v483/n7391/full/483531a.html)
for evidence that the research on which we base clinical drug studies is also
in bad shape.

At this point anything that focuses attention on "we should replicate
SOMETHING" is a huge improvement. Eventually we should get to "you should
expect to be replicated." But we are a long, loong, loooong way away from that
now.

~~~
nonbel
>"While I generally agree with the attitude behind your comments, they do not
agree with actual research practice."

I am well aware of this problem. I expect you will have no more success
convincing these people that replication is necessary than you would
convincing a fervent religious believer their deity does not exist.

These juvenile research practices really need to stop, though. They are
driving the most intelligent and competent potential contributors away from
careers in these areas.

------
mistermann
I've often thought there should be a similar mechanism for resolving
disagreements in the workplace.

~~~
Rmilb
The CIA [1] and a number of Fortune 500 companies have internal prediction
markets that work quite well. However, internal politics sometimes spell the
death of those markets.

This is more good data, and it gives me more faith in the premise of the
www.Augur.net decentralized prediction market.

[1] [https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/csi-studies/studies/vol50no4/using-prediction-markets-to-enhance-us-intelligence-capabilities.html](https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/csi-studies/studies/vol50no4/using-prediction-markets-to-enhance-us-intelligence-capabilities.html)

~~~
ultramancool
Is there any software to create a simple intranet prediction market?

~~~
adam
We make prediction market software for companies to use internally:
[https://cultivatelabs.com/forecasts](https://cultivatelabs.com/forecasts) if
you want to chat sometime.

~~~
mistermann
Very interesting... Since you're in the business, I wonder if you could share
any observations on the difficulty of selling into organizations where
politically competent people are very much _not_ interested in discovering and
publicizing who in the organization makes correct predictions?

~~~
adam
We often have people participate anonymously: either anonymous among their
peers or, in certain instances, anonymous to everyone in the company, with us
serving as the third-party arbiter of all the data. It really just depends on
the organization's culture and how transparent they are ready to be.

Minimally though we're looking to establish an ongoing dialogue between the
different levels of an organization. Our belief is that people on the ground
building product, interfacing with clients, etc. aren't consulted nearly
enough about predictions that inform big strategic decisions. Instead, leaders
are making decisions based on input from a limited number of SMEs, data
analytics, and their own beliefs. None of these are bad per se, but we believe
that not leveraging your own people is a huge missed opportunity.

Happy to follow up live/over email if you'd like. adam at cultivatelabs

------
jeffdavis
The cost of a contract doesn't represent whether the result is reproducible or
not; it predicts the _probability_ that it's reproducible.

So what do they mean when they say it correctly predicted the outcome? Are
they just saying the odds fell on the same side as the reproduction indicated?

If so, that seems arbitrary. If the cutoff for a p-value is 0.05, then
shouldn't we say that any contract selling for less than $95 predicts a
reproduction failure?
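
Presumably "correctly predicted" here means the simple threshold rule: a
contract trading above $0.50 (on a $1.00 payout) counts as predicting
replication. A proper scoring rule like the Brier score would use the full
price instead; a sketch with hypothetical numbers:

    # prices: final contract price in dollars (contract pays $1.00 on replication)
    # outcomes: 1 if the study actually replicated, else 0 (made-up data)
    prices   = [0.80, 0.30, 0.55, 0.10]
    outcomes = [1,    0,    0,    0]

    # Threshold scoring: "correct" if the price fell on the right side of $0.50.
    hits = sum((p > 0.5) == bool(o) for p, o in zip(prices, outcomes))
    print(hits / len(prices))  # 0.75

    # Brier score: rewards calibrated probabilities, not just the right side.
    brier = sum((p - o) ** 2 for p, o in zip(prices, outcomes)) / len(prices)
    print(brier)  # lower is better; always guessing 0.5 would score 0.25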

------
benp84
So in other words, a bunch of people guessing which hypotheses were true was
more accurate than actual scientific studies of them (71% vs 39%). Great.

------
Houshalter
>With a p-value [of 0.01], the result hardly screamed “false positive” like a
barely significant one of, say, 0.05 might.

Is 0.01 that low for such a crazy finding? Say you believe the claim has a
prior probability of 1 in 10,000 of being true; the result really did seem
that unlikely, so 1 in 10,000 might even be generous. Then, after this study,
the probability that it's true is only about 1%.
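
The arithmetic, spelled out, assuming the study would almost certainly have
found the effect if it were real (so the Bayes factor is at most roughly
1 / p-value = 100):

    prior = 1 / 10_000       # prior probability the effect is real
    bayes_factor = 1 / 0.01  # ~100, assuming power close to 1

    prior_odds = prior / (1 - prior)              # ~1:10,000
    posterior_odds = prior_odds * bayes_factor    # ~1:100
    print(posterior_odds / (1 + posterior_odds))  # ~0.01, i.e. about 1%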

------
qznc
Prediction markets are great tools in general. Unfortunately, incentives are
usually against implementing them. Experts are easier to control.

~~~
nemo1618
There are a few nascent cryptocurrency-based prediction markets on the
horizon, Augur being the most well-known. If one of them takes off, it could
influence our economy and society in a big way.

------
jerryhuang100
Isn't that just how options or event-prediction markets work?

