
Nature publishes 17 parameter fits to 20 plus data points - revorad
http://condensedconcepts.blogspot.com/2010/02/nature-publishes-17-parameter-fit-to-20.html
======
mad44
http://www.nature.com/nature/journal/v427/n6972/full/427297a.html
How one intuitive physicist rescued a team from fruitless research.

Quote from Freeman Dyson:

In desperation I asked Fermi whether he was not impressed by the agreement
between our calculated numbers and his measured numbers. He replied, “How many
arbitrary parameters did you use for your calculations?” I thought for a
moment about our cut-off procedures and said, “Four.” He said, “I remember my
friend Johnny von Neumann used to say, with four parameters I can fit an
elephant, and with five I can make him wiggle his trunk.”

~~~
smanek
That anecdote was worth it just for 'Johnny von Neumann' ;-)

I've generally heard that most scientists of the day considered von Neumann
the smartest man in science (he'd have to be - he single-handedly
revolutionized several branches of CS, Physics, Math, Economics, etc.).
Somehow hearing him called 'Johnny' makes him much less intimidating, though ...

------
hyperbovine
Funny, but not as good as the econ paper I once came across extolling the
virtues of a method which allowed one to "sidestep the issues associated with
negative degrees of freedom" (or something to that effect). In other words,
fitting a line to one data point :-)

------
houseabsolute
As a non-statistician, am I right in thinking that the problem here is that
with this many parameters, you can fit almost any data? Or something?

~~~
larsr
It is probably a case of overfitting, but it could also be the case that they
are fitting an existing model that really has 17 parameters. Without more
context, it's a little hard to judge. If I remember, I'll look at the full
paper tomorrow. It might make a nice example.
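
In the meantime, here's a toy sketch of the overfitting worry (invented data,
nothing to do with the actual paper): a degree-16 polynomial has 17
coefficients, and least squares will happily spend them fitting the noise in
20 points.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical data: 20 noisy samples of a simple linear trend.
    x = np.linspace(0.0, 1.0, 20)
    y = 2.0 * x + rng.normal(scale=0.1, size=x.size)

    # A degree-16 polynomial has 17 free coefficients -- "17 parameters".
    coeffs = np.polyfit(x, y, deg=16)

    # The in-sample fit looks spectacular...
    fit_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)

    # ...but evaluate between the sample points and the curve oscillates,
    # because the spare parameters have been spent chasing the noise.
    x_dense = np.linspace(0.0, 1.0, 500)
    y_dense = np.polyval(coeffs, x_dense)

    print(f"in-sample MSE: {fit_mse:.2e}")
    print(f"range on a dense grid: {y_dense.min():.2f} to {y_dense.max():.2f}")

Held-out points, or just evaluating between the samples, is what exposes the
problem; the training residuals alone tell you almost nothing.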

~~~
duskwuff
If the model really does have 17 parameters, though, then they need a heck of
a lot more than 20 data points to demonstrate its correctness.

~~~
larsr
Probably, but all I see is a graph and the abstract. It also depends on how
the model is being used in the paper. Unless I'm mistaken, Nature puts papers
through peer review, which, for me personally anyway, means I'd want to
actually read the whole paper before reaching for the fire and pitchforks.

------
keefe
That's pretty outrageous if the claim in the paper is true - this is a really
common problem that I ran into as an undergraduate doing modeling, fitting
bacterial conjugation rates to differential equations for predator-prey
dynamics.

------
nearestneighbor
In general, it can even be OK to use more parameters than data points, if you
use regularization properly (weight decay, for example). Conversely, a model
with significantly fewer parameters than data points can still be wrong.
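
A rough sketch of that point (all numbers invented): with more parameters
than data points, plain least squares is underdetermined, but adding an L2
penalty (weight decay, i.e. ridge regression) still gives a unique, sensible
solution.

    import numpy as np

    rng = np.random.default_rng(1)

    n_samples, n_features = 20, 50           # more parameters than data points
    X = rng.normal(size=(n_samples, n_features))
    true_w = np.zeros(n_features)
    true_w[:3] = [1.5, -2.0, 0.5]            # only a few features actually matter
    y = X @ true_w + rng.normal(scale=0.1, size=n_samples)

    # Ridge regression: minimize ||Xw - y||^2 + lam * ||w||^2.
    # The penalty makes the normal equations invertible even though
    # X.T @ X alone is rank-deficient (rank <= 20 in a 50x50 matrix).
    lam = 1.0
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

    print("five largest |coefficients|:",
          np.round(np.sort(np.abs(w_ridge))[-5:], 2))

The penalty is doing the work of the missing data: it encodes a prior belief
that the coefficients are small. Whether that prior is well-motivated is a
separate question.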

~~~
jey
That sounds pretty ad-hoc. Sure, you can throw an L1 or L2 regularizer on your
objective function, but it should be well-motivated.

Should probably just use Gaussian process regression if you want to do
inference over the space of all[1] functions in a principled (i.e. Bayesian)
manner.

1. (or the space of all polynomial functions or something. I forget)
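
For concreteness, a bare-bones GP regression in plain numpy (toy data, RBF
kernel, all hyperparameters made up), just to show what "inference over
functions" buys you - a mean prediction plus an uncertainty at every test
point:

    import numpy as np

    def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
        """Squared-exponential covariance between two sets of 1-D inputs."""
        d = a[:, None] - b[None, :]
        return variance * np.exp(-0.5 * (d / length_scale) ** 2)

    rng = np.random.default_rng(2)

    # Toy data: noisy observations of a smooth function.
    X = rng.uniform(0.0, 5.0, size=15)
    y = np.sin(X) + rng.normal(scale=0.1, size=X.size)
    noise_var = 0.1 ** 2

    # Condition the GP prior on the observations to get the posterior.
    X_test = np.linspace(0.0, 5.0, 100)
    K = rbf_kernel(X, X) + noise_var * np.eye(X.size)
    K_s = rbf_kernel(X, X_test)
    K_ss = rbf_kernel(X_test, X_test)

    alpha = np.linalg.solve(K, y)
    mean = K_s.T @ alpha                              # posterior mean
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)      # posterior covariance
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))

    print("posterior std ranges from", std.min().round(3),
          "to", std.max().round(3))

The uncertainty bands are the point: far from the data the posterior falls
back toward the prior instead of extrapolating confidently, which a bare
17-parameter least-squares fit can't do.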

~~~
_delirium
I don't think there's any consensus in the statistics community on which of
the many ways to do inference over "nearly all" functions is the better one;
nonparametric regression is basically a whole field, and pretty in flux over
the past 10 years.

