
No problem, we'll adjust the data to fit the model - llimllib
http://www.stat.columbia.edu/~cook/movabletype/archives/2010/03/no_problem_well.html
======
rm-rf
Assume that there are always errors in the data, and that the errors in the
data are unbiased - i.e. that half of the errors are biased towards the model
and half are biased away from the model.

In cases where the errors are biased away from the model (the errors do not
support the hypothesis that the model is based on), the scientist will have a
tendency to double-check the data, comb through it and correct the errors.
That's certainly what I'd do.

In cases where the errors are biased toward the model, however, there is no
incentive for the scientist to comb the data and correct the errors. Why
would she, when the data reinforces the model and the hypothesis? Publish the
paper, get tenure and take a sabbatical.

It sounds to me like there will always be a bias toward the model, simply
because the data only gets the extra scrutiny needed to eliminate
measurement errors in cases where it disagrees with the model - thereby
taking unbiased errors and turning them into biased errors, with the bias
always toward the model.
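
A toy sketch of what I mean (the error model and the "re-check" rule here are
just illustrative assumptions, not anything from the article): start with
symmetric measurement errors, then correct only the measurements that land
far from the model's prediction, and the errors that survive all lean toward
the model.

    # Toy illustration: symmetric errors plus selective re-checking.
    # Assumed setup: the truth is 10.0, the model predicts 11.0, and only
    # measurements far from the model's prediction get re-checked and fixed.
    import random

    random.seed(0)
    truth, model_prediction = 10.0, 11.0

    measurements = [truth + random.gauss(0, 1.0) for _ in range(100_000)]

    def scrutinize(x):
        # Re-check only measurements that disagree with the model;
        # "correcting" them here means removing the error entirely.
        return truth if abs(x - model_prediction) > 1.0 else x

    checked = [scrutinize(x) for x in measurements]
    print(sum(measurements) / len(measurements))  # ~10.0: raw errors unbiased
    print(sum(checked) / len(checked))            # ~10.4: pulled toward the model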

That's my hypothesis, and to make sure it's valid, I'll accept without
question any data that supports it, and I'll double-extra analyze and
carefully correct any data that doesn't.

~~~
gjm11
If the model is very good then random errors will not be "unbiased" in your
sense: they will all be "away from the model". In fact, if the model is any
good at all then random errors will more often be away from it than towards
it.
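
A toy check of this claim (my own numbers; symmetric normal errors assumed):
the closer the model's prediction is to the truth, the more often a random
error moves the measurement away from the prediction rather than toward it.

    # Toy check: how often does a symmetric random error move a measurement
    # *away* from the model's prediction?  (Numbers are illustrative only.)
    import random

    random.seed(0)
    truth, n = 10.0, 100_000

    for prediction in (10.0, 10.5, 12.0):   # perfect, decent, poor model
        away = sum(
            abs(truth + random.gauss(0, 1.0) - prediction) > abs(truth - prediction)
            for _ in range(n)
        )
        print(f"prediction={prediction}: away from it {100 * away / n:.0f}% of the time")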

There _is_ an incentive for scientists to check for errors that make their
models look better. You're more likely to get famous for finding an error in a
widely used model than for yet another confirmation that it works OK. (Even
more likely if you can come up with a better model. More likely still if that
better model is different in illuminating ways.)

I expect there still is a tendency for measurements to get distorted towards
better fit with models. But it's not as one-sided as you make it sound.

~~~
rm-rf
"There is an incentive for scientists to check for errors..."

Unless your career, grants and funding depend on the validity of the model,
or the model is highly politicized. In those cases, there is no incentive to
find errors that undermine the model.

Richard Feynman's paper on Cargo Cult Science has an explanation of how
Millikan's electron-charge measurements drifted over time because of
measurement bias. In that case, there almost certainly wasn't any politics
involved, so the follow-on experiments drifted toward a more accurate
measurement. But as Feynman points out, the fact that the measurements
drifted slowly and incrementally, rather than jumping in a single corrective
step, indicates that the scientists who made the follow-on measurements were
biased toward Millikan's original result - even though his original
measurement was less accurate than theirs.

------
miked
_But when the data are complicated --- by which I mean, there are many
different effects that must be accounted for in order to interpret the raw
data as a measurement of a parameter of interest --- then it's not necessarily
a surprise to find problems with the data, and to find that when those
problems are fixed the result is better agreement with a model._

"The problems are fixed" based on what? You can't use the model to "fix" them,
since the model has validity only insofar as it tracks data points. In short,
this is simply circular reasoning: the model has no validity beyond it's
ability to explain and predict data values.

~~~
ezy
You figure out what the problem in the measurements might be and correct for
it based on your hypothesis of what that problem is -- then compare the new
results to the model predictions (and _other_ data). The tacit assumption is
that the model should predict the data; if it doesn't, either the data or the
model is borked. If other data supports the model, then you're going to look
at the _data_ first -- especially if it's new data collected in a new way.

The satellite example is perfect: they didn't take the model output and add a
factor based on it. No, they hypothesized that the satellite height was
different and that this was affecting the data, checked that, and found ample
evidence for it. They adjusted the data _based on the height_ (not the model)
and, lo and behold, the data more closely matched the existing model. The
likelihood that this kind of independent factor makes all the difference for
a bad model is quite small.
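
Schematically (everything below is invented for illustration -- the
quantities, the correction formula -- the point is only where the model does
and doesn't enter): the adjustment is computed from the independently
measured height, and the model prediction is never an input to it.

    # Schematic only: the correction is a function of the independently
    # measured altitude, never of the model's output.  The inverse-square
    # rescaling and all numbers here are made up for illustration.
    def correct_for_altitude(raw_reading, assumed_altitude_km, actual_altitude_km):
        return raw_reading * (actual_altitude_km / assumed_altitude_km) ** 2

    raw = 0.92                    # some processed satellite quantity
    corrected = correct_for_altitude(raw, assumed_altitude_km=850.0,
                                     actual_altitude_km=835.0)
    print(raw, corrected)
    # Only after this step do you compare the corrected data to the model.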

Where the bias might come in is that one would have a tendency to look for
errors only in data that doesn't fit the model(s) or other data. But no one is
suggesting that you readjust the data to match the model output -- what they
may be saying is that if your data doesn't match the model, you had better be
able to justify it -- and that starts with the data and proceeds to a new
iteration of the model as you gain more confidence in the data.

~~~
yummyfajitas
Everything you describe sounds fine, but in the aggregate it privileges the
hypothesis.

When data supports the model, it is accepted. When it contradicts the model,
extra scrutiny is applied. In short, a higher standard of evidence is required
when you disagree with the hypothesis. It's kind of like accepting p-values of
0.05 when you agree with the model, but 0.01 when you disagree. It creates a
bias in favor of the model.
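
A quick simulation of that double standard (the setup and numbers are mine,
purely illustrative): the model predicts 0, the truth is slightly different,
and every study runs an ordinary z-test of the model's value. Demanding
p < 0.01 instead of p < 0.05 before a contradiction "counts" suppresses most
of the contradicting results.

    # Toy simulation of asymmetric evidence thresholds (illustrative numbers).
    import math, random

    random.seed(0)
    model_value, true_value = 0.0, 0.10      # the model is slightly wrong
    se = 1.0 / math.sqrt(100)                # standard error per study
    studies = 10_000

    def two_sided_p(z):
        return math.erfc(abs(z) / math.sqrt(2))

    reject_05 = reject_01 = 0
    for _ in range(studies):
        estimate = random.gauss(true_value, se)
        p = two_sided_p((estimate - model_value) / se)
        reject_05 += p < 0.05
        reject_01 += p < 0.01

    print(f"contradict the model at p < 0.05: {100 * reject_05 / studies:.0f}%")
    print(f"still count at the stricter p < 0.01: {100 * reject_01 / studies:.0f}%")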

When arbitrarily large amounts of new evidence are available (e.g., in an
experimental field), this won't matter. In fields without experiments (climate
science, oceanography, economics), this effect can protect a bad hypothesis.

------
tel
So how do you distinguish between options (1 & 2) and (3)?

It's not satisfying to chalk it up to "consensus", I think. Gelman himself
recommends very generalized model checking as a philosophy to search the space
of underlying theories, but aren't Bayesian methods supposed to let us "put a
prior on that"? Why must something like human intuition go unmodeled?

------
swombat
I guess it would make sense that the earth is in a _stable_ equilibrium rather
than an unstable one.

At the same time, that stable equilibrium may be a local maximum rather than
a global one, so let's not push it too much.

Climate meteorologists have really done a brilliant job of telling the world
they have no idea what they're talking about. I'm not a global warming
skeptic, in the sense that I believe we should do something about our
production of polluting gases (greenhouse or others), but this seems one of
the odd cases where scientists simply cannot be trusted to provide useful
information.

~~~
gjm11
"Climate meteorologists"?

I think it's more accurate to say: A small number of people, most of whom are
not climate scientists, have really done a brilliant job of telling the world
that climate scientists have no idea what they're talking about. (The
difference between your description and mine is that yours has the climate
scientists doing something self-undermining and mine doesn't; so if my
description is better, then one actually has to look at the science to
decide which side is nearer to the truth. How inconvenient.)

------
mhb
The appropriate xkcd:

<http://xkcd.com/701/>

