
About 40% of economics experiments fail replication survey (2016) - nabla9
https://www.sciencemag.org/news/2016/03/about-40-economics-experiments-fail-replication-survey
======
chewz
I am still hoping that some day economics will find its place where it belongs
- next to astrology, numerology and alchemy.

This is tongue in cheek, but as a mathematician (financial math) I find
economic equations and theorems laughable, and its scientific method just a
thin veil over ideology.

~~~
candiodari
This is a problem math majors have with pretty much every other science out
there. You can rank them if you like, and there are parts that are better and
worse (in economics you have game theory, which I'm sure you're not going to
find too ideological, and you have experimental economics ... which is little
better than social sciences).

[https://xkcd.com/435/](https://xkcd.com/435/)

~~~
Ntrails
My experience of economics teaching is something like this:

- Here is a set of assumptions which are obviously wrong

- Given that set of assumptions, we can make this model, which has been shown
not to hold true in reality due to the above

- Repeat ad nauseam

~~~
candiodari
True, but consider utilitarianism. Have you ever tried modeling the economy
using a more reasonable set of assumptions?

Not so easy.

But yeah I get it. Obviously the basic economic assumptions ignore:

1) government in general, but especially tax law. It's WAY too complex to
model, unfortunately (and I doubt that's a coincidence), and it isn't even
constant over time.

2) the fact that the vast majority of markets are supply- or demand-
constrained. In reality demand curves flatten off at some point. Take
bread: there is only demand for so much bread, and lowering the price past a
point no longer increases demand at all, so the cap clearly exists (a toy
sketch at the end of this comment illustrates this).

The supply bound is similar. We are just not going to produce more land than
there is. That market is absolutely supply bound.

The weird thing economics glosses over is that nearly every market is either
supply constrained or demand constrained, and the common equations just don't
work there.

3) information. Economics assumes perfect information, which obviously
doesn't exist. Information does spread, though, so over very long time frames
perhaps there's a case to be made that perfect information does exist, as long
as the information is old enough.

The thing is, whilst I might agree that the first one is at least
theoretically solvable, it's just too much work. The second and third are
impossible to model. And I bet you could come up with 10 more.
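
On point 2, a toy model of a satiating demand curve (the bread numbers are
made up purely for illustration):

    # Textbook linear demand capped by satiation: below some price,
    # further cuts stop increasing the quantity demanded.
    Q_MAX = 100.0        # the most bread anyone wants per day, at any price
    A, B = 150.0, 10.0   # ordinary linear demand parameters

    def demand(price):
        return min(Q_MAX, max(0.0, A - B * price))

    for p in [12.0, 8.0, 4.0, 2.0, 0.5]:
        print(p, demand(p))   # quantity pins at 100.0 once p drops below 5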

~~~
nopinsight
Your general point that economics models greatly simplify the real economy is
a good one. However, the field has made significant advances to introduce
real-world constraints into the models.

Example of a subfield addressing your 3):
[https://en.m.wikipedia.org/wiki/Information_economics](https://en.m.wikipedia.org/wiki/Information_economics)

I believe 2) is addressed as well but I am not a professional economist so
maybe someone can help edify us.

~~~
candiodari
Even if there are a few game-theoretic models in advanced papers that have
limited information, it would be a rather large stretch to say that economics
incorporates limited information at this point in time. Maybe it's coming, I
don't know... It's not going to be soon.

A similar problem exists for 2). Yes, I can find a paper or two, probably
even more, that talk about it, but they don't exactly have a wide following
because it just makes things too hard.

Incorporating both? I can't even find a paper on that.

------
archgoon
This seems to be on par with psychology and medicine.

[https://en.wikipedia.org/wiki/Replication_crisis](https://en.wikipedia.org/wiki/Replication_crisis)

~~~
forapurpose
I'm not sure it's a crisis. Nobody who understands research expects every
study to replicate, so what is a reasonable standard? 40% having weaker
effects isn't a surprise to me, a complete amateur, for whatever that's worth.
The point of research is to push into novel territories of knowledge; like
new tech at startups, that's necessarily going to include many failures.

If it's a critique of the scientific method(s), I'm all for very serious work
on improving it. But I think the question is, how? Does someone know something
that works better that they are holding back from us? Has any method in the
history of humanity produced better results (or made more sense)? Should all
science stop until it's perfect? What should we do going forward?

EDIT: I meant to add: Maybe social sciences are just really hard. They are a
bit less deterministic than many aspects of physics, for example, where
gravity is so predictable that a small perturbation thousands of light years
away can tell us what's happening there.

~~~
sincerely
>Nobody who understands research expects every study to replicate

I guess I don't understand research then, because it seems reasonable to me to
expect most studies to replicate. Why not?

~~~
zamfi
Well, for one, because the standard for publication in many fields is “there’s
a less-than-5% chance that we observed these effects because of coincidence”.

Combine that with people not publishing negative-results studies, and all it
takes is someone doing 20 studies (or worse, 20 analyses on the same data set)
in order to find that 1-in-20 chance occurring...by chance.

Of course, this has little to do with research itself and much more to do with
the standards researchers hold each other to.
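
As a quick sanity check on that 1-in-20 point, here's a sketch assuming 20
independent tests of a true null at the conventional threshold:

    # Probability that at least one of 20 independent tests of a true
    # null comes out "significant" at the 5% level purely by chance.
    alpha, tests = 0.05, 20
    print(1 - (1 - alpha) ** tests)  # ~0.64, i.e. more likely than not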

~~~
viraptor
While you're right about the 5%-or-less chance, it's still a 5% chance we're
aiming for. If we end up with 40% over a large number of papers, something
went wrong.

Not publishing the negative results shouldn't affect this number. They're not
included in the 5% chance in the first place.

~~~
evandijk70
This is a common misunderstanding of p-values.

There is a 5% chance of observing the effect if the effect is not there; that
is not the same as only 5% of published results being wrong.

I think the difference is best illustrated by an example:

If 20 researchers each investigate a hypothesis that turns out to be false,
on average 1 of them will report evidence for a false hypothesis, which will
likely not replicate. Thus, if for every true hypothesis investigated, 20
false hypotheses are investigated, and the 19 researchers with negative
results do not report them, then 1/2 of the reported positive results will
not replicate.
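
A minimal sketch of that arithmetic, assuming (as above) a 5% false-positive
rate, one true hypothesis per 20 false ones, and, as an idealization, that
every true effect is detected:

    # Toy publication-bias calculation; the ratios are illustrative
    # assumptions, not estimates from the study.
    alpha = 0.05          # chance a false hypothesis tests "significant"
    power = 1.0           # chance a true hypothesis tests "significant" (idealized)
    false_per_true = 20   # false hypotheses investigated per true one

    false_positives = false_per_true * alpha   # ~1 spurious positive
    true_positives = 1 * power                 # ~1 genuine positive
    print(false_positives / (false_positives + true_positives))  # 0.5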

------
rossdavidh
So, their take is that 60% replication is good news. Yikes.

~~~
air7
With the current Publish or Perish zeitgeist in science this number can be
seen as high.

Also, even assuming legitimate significant results, it's hard to control
for every possible confounding variable: maybe part of the effect is due to
some sociological parameters of the test group, maybe it's the heat/cold,
maybe it only happens on Tuesdays. Even the same stimulus can change its
meaning over time (like the Williams video). Humans make for finicky test
subjects.

IMO a single scientific paper shouldn't be considered much more than a small
indicator that an effect might exist and that further work is required. I
think popular media does Science a disservice by ignoring this and headlining
single papers as "fact".

So given that, 60% is not that bad...

~~~
rossdavidh
True, but also not that good. All of the heat/cold/only-on-Tuesdays stuff
should have been baked into the original experiment, really. Not that they
should be able to take everything into account, but 60% replication does not
indicate a very high standard. I mean, it's not much better than if you just
took an educated guess, with no data, since you could probably do better than
50/50. The whole point of the study is that it is supposed to be better than
an educated guess.

------
krob
Isn't that how you know something is based in science? When people are given
the procedures to replicate, they should come out with the same results
consistently. So this would mean these are non-scientific results; I guess
they're not scientifically sound studies.

------
gbacon
This is a well-known problem.

 _Hypotheses must be continually verified anew by experience. In an experiment
they can generally be subjected to a particular method of examination. Various
hypotheses are linked together into a system, and everything is deduced that
must logically follow from them. Then experiments are performed again and
again to verify the hypotheses in question. One tests whether new experience
conforms to the expectations required by the hypotheses. Two assumptions are
necessary for these methods of verification: the possibility of controlling
the conditions of the experiment, and the existence of experimentally
discoverable constant relations whose magnitudes admit of numerical
determination. If we wish to call a proposition of empirical science true
(with whatever degree of certainty or probability an empirically derived
proposition can have) when a change of the relevant conditions in all observed
cases leads to the results we have been led to expect, then we may say that we
possess the means of testing the truth of such propositions._

 _With regard to historical experience, however, we find ourselves in an
entirely different situation. Here we lack the possibility not only of
performing a controlled experiment in order to observe the individual
determinants of a change, but also of discovering numerical constants. We can
observe and experience historical change only as the result of the combined
action of a countless number of individual causes that we are unable to
distinguish according to their magnitudes. We never find fixed relationships
that are open to numerical calculation. The long cherished assumption that a
proportional relationship, which could be expressed in an equation, exists
between prices and the quantity of money has proved fallacious; and as a
result the doctrine that knowledge of human action can be formulated in
quantitative terms has lost its only support._

See _Epistemological Problems of Economics_ by Ludwig von Mises.
[https://mises.org/library/epistemological-problems-economics](https://mises.org/library/epistemological-problems-economics)

Looking backward to past events involving conscious, acting people is history,
not economics. Because human beings are not falling stones, the methods of
physics are a poor fit.

------
HillaryBriss
The study's abstract (
[http://science.sciencemag.org/content/351/6280/1433](http://science.sciencemag.org/content/351/6280/1433)
) says _We found a significant effect in the same direction as in the original
study for 11 replications (61%); on average, the replicated effect size is 66%
of the original._

On average, the replicated effect size is only 66% of the original effect
size. What explains this?

~~~
thaumasiotes
Assume you measure an effect and get two values, the estimated effect size in
standard deviations and a p-value representing the probability that, if the
effect size were zero in reality, you would have estimated an effect size at
least as large as the one you did get.

In phase two, assume you publish a paper if the p-value is less than 0.05,
regardless of the effect size. If p >= 0.05, you don't publish anything.

We are now done with the assumptions.

p-values get smaller as the effect size increases, and they get smaller as the
sample size increases. There is a concept called statistical power which
measures the smallest effect size it is possible for a study to detect at a
given p-value, given the (fixed) sample size that that study has to work with.
Larger sample sizes mean more statistical power and mean that it's possible to
estimate a smaller effect size for the same p-value.

Adding this all up, we can see that:

- if the true effect size is small;

- AND if the sample size is "small", defined relative to the true effect
size;

- AND if we filter studies by whether they meet a p-value threshold ("reach
statistical significance");

- THEN the only studies that can be published will find effect sizes that are
too large. They do not have the power to detect the true effect size; the only
possible results are "no effect" and "unrealistically large effect".

The quick summary is that a reduced effect size on replication is a strong
indicator that the original finding was spurious. As replications continue
they will trend toward the higher of the true effect size or the floor set by
the available statistical power.
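
A small simulation makes the filter concrete; the numbers here are my own
illustrative assumptions (true effect 0.2 SD, 20 subjects per group), not the
study's:

    # Simulate the significance filter: a small true effect, small samples,
    # and "publish" only when p < 0.05.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    true_d, n, published = 0.2, 20, []

    for _ in range(20_000):
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(true_d, 1.0, n)
        t, p = stats.ttest_ind(treated, control)
        if p < 0.05 and t > 0:   # only "significant" positive results survive
            published.append(treated.mean() - control.mean())

    # The published average is well above the true 0.2: every estimate
    # that cleared the bar had to be inflated.
    print(np.mean(published))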

~~~
overeater
There's a simpler mathematical explanation. It's called Regression Towards the
Mean:

[https://en.wikipedia.org/wiki/Regression_toward_the_mean](https://en.wikipedia.org/wiki/Regression_toward_the_mean)

~~~
thaumasiotes
That is not actually an explanation; you can only apply regression to the mean
if you know what the mean is. The explanation I give correctly predicts that
replicated experiments will see their effect sizes decline. Saying "regression
to the mean" does not.

(It is quite possible to interpret this as regression driven by the p-value
threshold, but if you do that, you're relying on the explanation I gave.)

~~~
overeater
That's not true. If I take the top 5 students based on performance on a test
and put them in a group, they will likely do worse the second time around. No
need to know the mean.
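
A minimal sketch of the students example, assuming each test score is a
stable ability plus independent per-test luck:

    # Regression toward the mean without knowing any population mean:
    # select on one noisy measurement, remeasure, and the extremes shrink.
    import numpy as np

    rng = np.random.default_rng(1)
    ability = rng.normal(0, 1, 1000)
    test1 = ability + rng.normal(0, 1, 1000)
    test2 = ability + rng.normal(0, 1, 1000)   # fresh luck on the retest

    top5 = np.argsort(test1)[-5:]   # the 5 best scores on test 1
    print(test1[top5].mean())       # very high: ability plus good luck
    print(test2[top5].mean())       # noticeably lower: the luck doesn't repeat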

------
cjohansson
It's better than nothing, though it's probably wise to only use replicated
results to back up arguments based on science.

------
mmwelt
The article is dated Mar. 3, 2016. It could be informative to add a (2016) to
the title.

~~~
dang
Sure. Done.

------
budadre75
From a machine learning point of view, this means the models these researchers
proposed underfit the real world.

