
Is Economics Research Replicable? Sixty Published Papers Say “Usually Not” [pdf] - Gimpei
http://www.federalreserve.gov/econresdata/feds/2015/files/2015083pap.pdf
======
stdbrouw
> The most common reason we are unable to replicate the remaining 45 papers is
> that the authors do not provide data and code replication files.

> We define a successful replication as when the authors or journal provide
> data and code files that allow us to qualitatively reproduce the key results
> of the paper.

Well, this is underwhelming. I mean, sure, they're talking about papers in
journals that require authors to share data and code on request, so they have
definitely exposed widespread ignorance of the rules, maybe even a refusal to
adhere to them... but the replication being talked about here is "can we
download the data, press a button and get the same results the original
authors did?", not "can we run the experiment again, or run the analysis
independently, and get similar or the same results?"

Personally I like the distinction between replication (a new experiment
but with the same setup), reproduction (corroborate using different methods)
and re-analysis (download the data, run the code, maybe do some additional
analysis). This paper is entirely about re-analysis, not about replication or
reproduction. (Cf.
[http://sequoia.cs.byu.edu/lab/files/reser2010/proceedings/Go...](http://sequoia.cs.byu.edu/lab/files/reser2010/proceedings/Gomez%20-%20Replication%20Reproduction.pdf))

In one sense, failed re-analysis means research cannot even clear the lowest
possible bar: you can't even check if the analysis produces the numbers that
are mentioned in the research. But in another sense, whether or not
researchers manage to release their code is only weakly associated with how
good that research is. Research might "fail" re-analysis because no code was
provided, yet survive both replication and reproduction.

The authors compare their work with the Open Science Collaboration which
recently pointed out so many unreplicable studies in psychology, but this is
not a fair comparison at all. The Open Science Collaboration was a huge
endeavor that redid a bunch of experiments from scratch. This is just asking
authors "give me your data" and ticking a "did not replicate" box if they
didn't.

~~~
inefficient
While I agree with the bulk of what you said, I do think it's important to
understand that the difference with the Open Science Collaboration isn't
really what you said. In my opinion it may be worse.

Being a (somewhat disillusioned) economist, I've read some of these papers but
certainly not all of them. What I can tell you is that there isn't an
"experiment" to replicate. These papers seem to be mostly or entirely
computational macroeconomics and econometrics. In work like this, they design
a model (a simple example is the real business cycle model), pump random data
into it, see how different changes affect the model (e.g. the relationship
between the volatility of unemployment and the volatility of output), and
check whether those relationships match what we see in the data.

Replication as you define it (and I agree with the definition) would mean
pumping new random data into the model and still getting the same results.
But that still leaves a few big issues, such as: are these relationships
really in the data? Some of those relationships change depending on the time
frame, so the results may actually explain what occurred but shouldn't be
used to predict what will occur later.
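
As a toy illustration of that kind of check, here's a minimal sketch with a
simple AR(1) process standing in for an actual calibrated model (all names
and numbers here are made up for illustration, nothing from the paper):

    import numpy as np

    def simulate(rho=0.9, sigma=0.007, periods=10_000, seed=0):
        # Toy stand-in for a calibrated model: an AR(1) output process.
        rng = np.random.default_rng(seed)
        shocks = rng.normal(0.0, sigma, periods)
        y = np.zeros(periods)
        for t in range(1, periods):
            y[t] = rho * y[t - 1] + shocks[t]
        return y

    # Replication in this sense: new random draws should reproduce the
    # same qualitative moments, up to simulation noise.
    print(simulate(seed=0).std())
    print(simulate(seed=1).std())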

For this kind of research, reproduction and re-analysis probably need to go
together. If we've defined a mathematical model, then it should be possible
to program that model across platforms and software packages and still get
consistent results with different sources of random input data. I think this
matters a lot for verification: I know I've messed up my programs before and
gotten completely reasonable output that turned out to be incorrect. The
authors didn't really describe how thoroughly they verified that the programs
were doing what they were supposed to do.

Honestly, I don't know that I can make myself care too much about a model's
results until we can agree on what the model needs to get right about the
past, present, and future. And in academic economics, these important
characteristics are almost canon and untouchable.

~~~
nmrm2
_> they design a model... pump random data into it and see how different
changes affect the model_

?!?!

This is such an odd way to demonstrate results about a model. For hypothesis
testing or preliminary research, sure. But as a result?

 _> If we've defined a mathematical model_

...then the way that we establish properties about the model is by _writing a
proof_.

What am I missing?

~~~
inefficient
> This is such an odd way to demonstrate results about a model. For hypothesis
> testing or preliminary research, sure. But as a result?

So these are typically some combination of dynamic and stochastic difference
equations that are supposed to mimic the real economy and may not have any
analytical solution. The goal is to include different types of shocks,
different inputs, and different functional forms and run simulations to see if
the output shows the same patterns as the real economy.

What we tend to find in these models is that we can easily get some of the
characteristics to match while others do not: e.g. it's hard to match the
volatility of unemployment while also matching the volatility of wages,
output, consumption, etc.
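
For anyone unfamiliar, a rough sketch of that simulate-and-compare workflow
might look like the following; the model, parameters, and target moments are
all invented for illustration, not taken from any actual paper:

    import numpy as np

    def simulate_economy(periods=5_000, seed=0):
        # Toy model: output is an AR(1); unemployment and wages respond
        # to output with their own noise. Purely illustrative.
        rng = np.random.default_rng(seed)
        y = np.zeros(periods)
        for t in range(1, periods):
            y[t] = 0.95 * y[t - 1] + rng.normal(0, 0.007)
        u = -1.5 * y + rng.normal(0, 0.003, periods)
        w = 0.5 * y + rng.normal(0, 0.002, periods)
        return {"output": y, "unemployment": u, "wages": w}

    # Hypothetical empirical targets: volatility relative to output.
    targets = {"unemployment": 1.8, "wages": 0.6}

    sim = simulate_economy()
    vol_y = sim["output"].std()
    for name, target in targets.items():
        ratio = sim[name].std() / vol_y
        print(f"{name}: model {ratio:.2f} vs data {target:.2f}")
    # Typical outcome: some ratios land near the data, others stay far off.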

This is more along the lines of trying to be engineers rather than
mathematicians, which I'm fine with. My issues are that (1) the relationships
we are trying to match are not guaranteed to hold across nations or time, so
even if the model describes a certain period very accurately, it isn't useful
in general; and (2) we take the parameter values that come out of a
simulation that is, at best, good at describing some of what happens in the
economy, and treat them as if they apply to the real world (an example
parameter might be the risk aversion of consumers in the economy).

We then use these parameters to describe what changes should be made to the
economy in the event of certain types of shocks.

And if I'm being honest, my biggest complaint about these models stems from
the idea that they all need magical micro-foundations in order to be
considered realistic. This comes from the Lucas Critique and is likely the
biggest travesty in economics' history as a science. It is along the lines of
saying general relativity isn't realistic unless it is built up from quantum
mechanics.

What absolutely kills me is that many economists obviously compartmentalize
academic research from what they suggest doing in the real world (as seen in
the fact that the majority of economists polled supported the stimulus in the
U.S. and felt that it helped the recovery from the recession, or at least
kept it from getting worse).

------
jjoonathan
One of my favorite meta-analyses in this vein is the inverted funnel of
Doucouliagos and Stanley [1] which investigates elasticity in the employment
market. In particular, fractional change in employment / fractional change in
minimum wage. The idea is that so many studies have been conducted on this
topic that if you construct a scatterplot by putting N (study size) on the y
axis and elasticity (outcome) on the x axis, then you get a funnel (wide at
low-N, narrow at high-N) that can (arguably) reveal selection bias since
selection bias will disproportionately affect the wide, low-N side of the
funnel.

Of course, this technique can only be performed if you have not only large
studies, but a large number of studies -- so large that you can resolve an
empirical distribution of outcomes at several different N bands. It's
therefore limited to a small number of topics. Still, it's neat to get
quantitative insight into an effect that is usually unobservable.
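
A toy illustration of the general idea (synthetic data, not the actual
Doucouliagos and Stanley dataset or estimator):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)

    # Synthetic meta-analysis: every study estimates the same true
    # elasticity, with sampling noise that shrinks as study size N grows.
    true_elasticity = -0.1
    n_studies = 300
    N = rng.integers(20, 2000, n_studies)
    estimates = true_elasticity + rng.normal(0, 1.0 / np.sqrt(N))

    # Crude stand-in for selection bias: small-N estimates near zero are
    # less likely to survive to publication.
    keep = (N > 200) | (estimates < -0.05) | (rng.random(n_studies) < 0.3)

    plt.scatter(estimates[keep], N[keep], s=8)
    plt.xlabel("estimated elasticity")
    plt.ylabel("study size N")
    plt.title("Asymmetry at low N suggests selection bias")
    plt.show()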

[1] pdf page # 33 of
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.398...](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.398.2297&rep=rep1&type=pdf)

------
abcampbell
I find it kind of funny that people are _surprised_ by this.

Reading the paper, it also feels like the authors derive entirely the wrong
recommendations from this result.

Seems like they have identified a set of reasonable problems they encountered
in trying to reproduce these studies. Fair enough.

To think that the solution is just to pile documentation requirements on the
researcher to address those specific problems seems totally naive.

This would inevitably act as another layer of regulatory structure on top of
all the grief people already go through to get results 'published.' It would
also add cost and complexity to an already archaic system, in a way that does
not deal with the underlying root causes creating these problems.

------
ap22213
The Federal Reserve study seems a little suspect. Not saying it is, but it's
curious that it just happened to be 51% of the papers that were deficient,
and their methodology section kind of glosses over their sampling strategy.

I wish the science community had better methods for documenting the entire
scientific process and timeline. We live in an era where everyone has access
to computers, but research results are still offered as paper documents. This
provides a lot of opportunity to 'fudge the numbers': to re-write the
hypothesis and/or methodology to fit the intended outcome.

Wouldn't it be much better if research results were stored in a standardized
file format that showed an unalterable timeline of all activity?
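
One minimal version of that idea is a hash-chained log, where each entry
commits to the entry before it, so history can't be rewritten silently. A
sketch (purely hypothetical, not an existing standard):

    import hashlib, json, time

    def append_entry(log, event, payload):
        # Each entry's hash covers the previous entry's hash, so editing
        # any past entry invalidates every hash that follows it.
        prev_hash = log[-1]["hash"] if log else "0" * 64
        entry = {"time": time.time(), "event": event,
                 "payload": payload, "prev": prev_hash}
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        log.append(entry)

    log = []
    append_entry(log, "hypothesis", "higher minimum wage lowers teen employment")
    append_entry(log, "methodology", "diff-in-diff on state-level panel data")
    append_entry(log, "result", "elasticity estimate: -0.08")
    # Re-writing the hypothesis after the fact breaks the chain.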

------
dang
Url changed from [http://www.businessinsider.co.id/federal-reserve-paper-on-
th...](http://www.businessinsider.co.id/federal-reserve-paper-on-the-
replicability-of-economic-studies-2015-10/), which points to this.

------
LiweiZ
At the end of the day, people need what they feel is a better way to think
and discuss before acting. That depends only on what options they have at the
moment.

------
Mikeb85
Of course it's not replicable. There are simply too many external factors
that can mess up any well-meaning experiment.

~~~
contravariant
So basically you're saying those experiments had no predictive power in the
first place?

~~~
Mikeb85
They have some. But it's not about predictive power, it's about replicable
experiments like you have in the sciences...

Economics is a useful social science (it's what I study), but it has
challenges that the more established sciences don't encounter...

------
baxter001
dismal science

