
Second step with non-linear regression: adding predictors (2017) - ghosthamlet
https://www.r-bloggers.com/second-step-with-non-linear-regression-adding-predictors/
======
thanatropism
Polynomial regression isn't really a good way to obtain highly nonlinear
models for two reasons at least:

- Runge's phenomenon: high-order polynomial interpolation is really unstable;
high-order polynomial regression can avoid this fate if your points are well
spread around each potentially unstable region of the regressor space, i.e.
not realistically.

- Leverage points: a linear regression is pinned at the point (mean x, mean
y), and points at the edges of regressor space have a disproportionate
influence. Polynomial regression is just this in higher-dimensional space; but
then this gives you heartache trying to allocate weight to x^2 and x^4 terms,
for example.
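A quick numeric sketch of the first point (my own example, not from the
article): fitting a degree-10 polynomial through 11 equally spaced samples of
Runge's function 1/(1 + 25x^2) produces huge oscillations near the ends of the
interval.

```python
# Runge's phenomenon: a degree-10 polynomial through 11 equally
# spaced samples of f(x) = 1/(1 + 25 x^2) oscillates wildly near
# the edges of [-1, 1].
import numpy as np

def runge(x):
    return 1.0 / (1.0 + 25.0 * x**2)

x_nodes = np.linspace(-1.0, 1.0, 11)
coeffs = np.polyfit(x_nodes, runge(x_nodes), deg=10)

# Compare polynomial and true function on a dense grid
x_dense = np.linspace(-1.0, 1.0, 1001)
max_err = np.max(np.abs(np.polyval(coeffs, x_dense) - runge(x_dense)))
print(f"max |poly - f| on [-1, 1]: {max_err:.2f}")  # roughly 1.9
```

The function is bounded between 0 and 1 everywhere, yet the fitted polynomial
misses it by more than the entire range of the data near the boundary.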

There's a lost art of understanding regression as approximating y = f(x, z, ...)
and setting the linear functional form in order to give partial
derivatives/marginal effects dy/dx, dy/dz... that reproduce the effects you
expect. For example -- we may not believe that some phenomenon is actually
quadratic but we expect it to have a single optimal point. Then if we regress

y ~ a + b * x + c * x^2,

we have dy/dx = b + 2cx and a single optimal point at x = -b/2c. So for
example a model's earnings might be a function of waist-hip ratio (WHR), but
there's a platonic ideal of perfection, not an indefinite preference for small
or large waists. On the other hand, if the same model's earnings also depend
on weight and height, the following model

y ~ a + b WHR + c WHR^2 + e WEIGHT + f HEIGHT + g WEIGHT * HEIGHT + h
WEIGHT^2 * HEIGHT

has the following additional partial derivatives

dy/dWEIGHT = e + g HEIGHT + 2h WEIGHT HEIGHT

which gives us the optimal height-dependent weight

-(e + g HEIGHT)/(2 h HEIGHT)

The trick is to start thinking qualitatively from partial derivatives and
think of the polynomial model as a kind of Taylor approximation to the exact
nonlinear model you can't find.
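The single-predictor case above can be sketched numerically (illustrative
numbers of my own, not from the thread): fit y ~ a + b*x + c*x^2 by least
squares and read the single optimum off the fitted coefficients as x = -b/2c.

```python
# Recover the optimum x* = -b/(2c) from a fitted quadratic
# y ~ a + b*x + c*x^2. The simulated "true" coefficients are
# made up for illustration; their optimum sits at x = 0.7.
import numpy as np

rng = np.random.default_rng(0)

a_true, b_true, c_true = 1.0, 2.8, -2.0   # optimum at -b/(2c) = 0.7
x = rng.uniform(0.0, 1.5, size=200)
y = a_true + b_true * x + c_true * x**2 + rng.normal(0.0, 0.1, size=200)

# Least-squares fit with design matrix [1, x, x^2]
X = np.column_stack([np.ones_like(x), x, x**2])
(a_hat, b_hat, c_hat), *_ = np.linalg.lstsq(X, y, rcond=None)

# dy/dx = b + 2cx = 0 gives the single interior optimum
x_opt = -b_hat / (2.0 * c_hat)
print(f"estimated optimum: {x_opt:.3f}")  # close to 0.7
```

The same move works for the two-predictor model: differentiate the fitted
polynomial with respect to WEIGHT and solve for the height-dependent optimum.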

~~~
mlthoughts2018
It’s interesting that your experience has focused on polynomial regression. My
experience is that “nonlinear” regression more often means adding interaction
terms, testing for discontinuities, and allowing for partially pooled separate
linear functions based on strata of the data, as in hierarchical regression.

For anything truly needing to model nonlinearities in the manifold of the
input data, I’d instead use tree-based regression, SVM regression with an
appropriate kernel, Gaussian processes / kriging, etc. I’ve never experienced
pragmatically useful results with polynomial regression that wouldn’t have
been more easily obtained from more appropriate nonlinear models, and wasn’t
aware anyone would ever actually try this in practice.
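As a toy illustration of the tree-based alternative (my own minimal sketch,
not from the comment): even a depth-1 regression tree, a single split chosen
to minimize squared error, captures a discontinuity that no low-order
polynomial handles gracefully.

```python
# A minimal regression "stump": pick the one split threshold that
# minimizes the total within-leaf squared error, then predict the
# leaf mean on each side. Data here is a made-up step function.
import numpy as np

def fit_stump(x, y):
    """Return (threshold, left_mean, right_mean) minimizing SSE."""
    best = None
    for t in np.unique(x)[1:]:
        left, right = y[x < t], y[x >= t]
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]

x = np.linspace(0.0, 1.0, 100)
y = np.where(x < 0.5, 0.0, 1.0)          # a step: very non-polynomial
t, lo, hi = fit_stump(x, y)
print(f"split at {t:.2f}, predicts {lo:.1f} / {hi:.1f}")
```

Real tree libraries grow many such splits recursively, but the stump already
fits this discontinuity exactly, which a global polynomial cannot do.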

~~~
gtd9991
I think it is a matter of discipline. My economics background would emphasize
the polynomial approach to these sorts of problems over less parametric
methods such as regression trees. Fixed-effect modeling (strict hierarchical
modeling) is a standard practice in my domain as well.

------
wodenokoto
r-bloggers is mostly an article-aggregation site that republishes (with
permission) other blog posts, in poorly formatted versions. I absolutely hate
reading stuff on r-bloggers.

Read the original article here [https://datascienceplus.com/second-step-with-
non-linear-regr...](https://datascienceplus.com/second-step-with-non-linear-
regression-adding-predictors/)

