
Types of Regression Analysis - _kcxz
https://www.listendata.com/2018/03/regression-analysis.html
======
snicker7
Despite what the article claims, normality is not actually an assumption of
linear regression. It is "required" for doing F-tests (the F-distribution
being related to the normal distribution), but it is not required for proving
that the regression coefficients are consistent.

~~~
knoepfle
It's actually not even required for that! See
[http://davegiles.blogspot.com/2011/08/being-normal-is-optional.html](http://davegiles.blogspot.com/2011/08/being-normal-is-optional.html)
which cites King (1980):

> If the error vector in our regression model follows any distribution in the
> family of Elliptically Symmetric distributions, then any test statistic that
> is scale-invariant has the same null and alternative distributions as they
> have when the errors are normally distributed.

~~~
knoepfle
Note also that any distributional assumptions are really only necessary for
inference (i.e., tests and confidence intervals) in finite samples (read:
small samples); the central limit theorem guarantees the tests work
asymptotically, so you're usually going to be fine.
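A quick simulation (numpy only, with arbitrary numbers I made up) illustrates the point: even with heavily skewed errors, the classical 95% confidence interval for a slope covers the truth about 95% of the time at a moderate sample size.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, beta = 200, 500, 2.0
covered = 0
for _ in range(reps):
    x = rng.normal(size=n)
    # Heavily skewed (centered exponential) errors: decidedly non-normal
    eps = rng.exponential(1.0, size=n) - 1.0
    y = beta * x + eps
    # OLS slope and its classical standard error
    xc = x - x.mean()
    b = (xc @ y) / (xc @ xc)
    resid = y - y.mean() - b * xc
    se = np.sqrt(resid @ resid / (n - 2) / (xc @ xc))
    if abs(b - beta) < 1.96 * se:
        covered += 1
coverage = covered / reps
print(coverage)  # close to the nominal 0.95 despite the skewed errors
```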

Most of the attention paid to distributional assumptions in regression is
wasted, and would be better spent on really thinking through the assumed
moment conditions underlying the estimator.

------
gmfawcett
> Assumptions of linear regression: There must be a linear relation between
> independent and dependent variables.

That's not wrong, but it's a strong way to word it. If linear regression were
only suitable when the variables were perfectly linearly related, it would get
a lot less use. Practically, linear regression can be used when the
relationship is linear-ish, at least in the interval of interest. In other
words, you can choose to declare linearity as an assumption (and take
responsibility for what that choice entails, and for the error it might
introduce into your analysis).

~~~
nerdponx
That the linear model is "correct" is only an assumption if you're trying to
draw probabilistic inferences.

There's nothing stopping you from using it as a "best fit line", even when you
have no reason to believe those assumptions. But then it's _just_ a best-fit
line. It tells you the direction and magnitude of the linear trend, nothing
more. That's never wrong in any sense; it's just that sometimes it's not very
useful.

~~~
smu3l
From the semiparametric perspective, you can still make correct inferences
about the estimated parameters even if the model is not correctly specified,
as long as you use the so-called robust estimator of the variance.
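For anyone curious, the sandwich estimator is simple enough to sketch by hand. Here's a hypothetical HC0 variant on simulated heteroskedastic data (all numbers arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0, 2, size=n)
# Heteroskedastic errors: variance grows with x, but the mean is still linear
y = 1.0 + 3.0 * x + rng.normal(scale=0.5 + x, size=n)

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat

XtX_inv = np.linalg.inv(X.T @ X)
# Classical variance: s^2 (X'X)^-1, which assumes constant error variance
classical = XtX_inv * (resid @ resid / (n - 2))
# HC0 sandwich: (X'X)^-1 X' diag(e_i^2) X (X'X)^-1
meat = (X * resid[:, None] ** 2).T @ X
robust = XtX_inv @ meat @ XtX_inv

se_classical = np.sqrt(np.diag(classical))
se_robust = np.sqrt(np.diag(robust))
print(se_classical, se_robust)
```

The robust standard errors stay valid here even though the constant-variance assumption is violated.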

~~~
nonbel
This is impossible. If the model is incorrectly specified (does not include
_all and only_ the relevant parameters and interactions), it doesn't matter
much what games are played with the math. Changing the model will change the
estimates...

Edit: For example, see here where making arbitrary choices of how to code
categorical variables will change the estimates:
[https://news.ycombinator.com/item?id=16719754](https://news.ycombinator.com/item?id=16719754)

If you change the model the meaning of all the coefficients changes.

~~~
smu3l
Oops I did not see your response until now.

I agree, changing the model changes the estimates, because the parameters you
are estimating change.

However, given one misspecified model, the parameters of that model are still
well defined, though they may not have the interpretation they would if the
model was correctly specified. As OP called it, this is the "best fit line",
and is a projection of the truth onto your model. E.g. for a simple linear
regression of Y on X, where the true conditional mean of Y given X is not
linear, there is still some "true" best line. This line also depends on the
distribution of X, though it would not if the model were correct. Estimates
from linear regression will converge to the parameters of this line, though
the usual standard errors will be wrong.

There's a very general theorem or corollary that covers this in Asymptotic
Statistics by van der Vaart. I think in the chapter about M estimators, right
around where MLEs are covered, but I don't have it in front of me.
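A quick simulation of the projection idea (hypothetical numbers): the true conditional mean is quadratic, yet the OLS slope converges to the slope of the best linear predictor, Cov(X, Y)/Var(X), which depends on the distribution of X.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.uniform(-1, 2, size=n)
# True conditional mean is quadratic: E[Y|X] = X^2; the linear model is wrong
y = x ** 2 + rng.normal(size=n)

# OLS slope estimate
xc = x - x.mean()
slope = (xc @ y) / (xc @ xc)

# Population best-linear-predictor slope = Cov(X, X^2) / Var(X),
# computed from the moments of Uniform(-1, 2)
a, b = -1.0, 2.0
EX = (a + b) / 2
EX2 = (b**3 - a**3) / (3 * (b - a))
EX3 = (b**4 - a**4) / (4 * (b - a))
target = (EX3 - EX * EX2) / (EX2 - EX**2)
print(slope, target)  # the estimate converges to the projection slope
```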

~~~
nonbel
There are multiple inference levels here.

First, there is the statistical level, at which we are drawing some conclusion
about the model parameter. This may work even for a misspecified model.

Then there is the level at which you want to draw some conclusion about
reality, call it the "scientific level". If the model is misspecified, the
parameters/coefficients may or may not correspond to the thing of interest.
Perhaps the model is a close enough approximation for those values to be
meaningful, perhaps not...

I think it is the second ("scientific") level of inference that most people
are concerned about. The rigor of the proofs/theorems that may hold at the
statistical level does not extend to the scientific level.

Afaict, the majority of erroneous inference occurs at the scientific level and
statistical error/uncertainty is a sort of minimum error/uncertainty.

------
siddboots
A tool that I've found myself reaching for more and more often is Gaussian
Process Regression [1] [2]

* It allows you to model essentially arbitrary functions. The main model assumption is your choice of kernel, which defines the local correlation between nearby points.

* You can draw samples from the distribution of all possible functions that fit your data.

* You can quantify which regions of the function you have more or less certainty about.

* Imagine this situation: you want to discover the functional relationship between the inputs and outputs of a long-running process. You can test any input you want, but it's not practical to exhaustively grid-search the input space. A Gaussian Process model can tell you which inputs to test next so as to gain the most information, which makes it perfect for optimising complex simulations. Used in this way, it's one means of implementing "Bayesian Optimisation" [3]

[1]
[https://en.wikipedia.org/wiki/Gaussian_process](https://en.wikipedia.org/wiki/Gaussian_process)

[2] [http://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessRegressor.html#sklearn.gaussian_process.GaussianProcessRegressor](http://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessRegressor.html#sklearn.gaussian_process.GaussianProcessRegressor)

[3]
[https://en.wikipedia.org/wiki/Bayesian_optimization](https://en.wikipedia.org/wiki/Bayesian_optimization)
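To make the machinery concrete, here is a minimal from-scratch sketch (numpy only, RBF kernel, all data simulated) of the posterior mean and variance, which are the same quantities the sklearn regressor above computes for you:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    """Squared-exponential kernel: high covariance for nearby points."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale ** 2)

rng = np.random.default_rng(3)
x_train = rng.uniform(0, 5, size=8)
y_train = np.sin(x_train) + rng.normal(scale=0.1, size=8)
x_test = np.linspace(0, 5, 100)

noise = 0.1 ** 2
K = rbf_kernel(x_train, x_train) + noise * np.eye(8)
K_s = rbf_kernel(x_train, x_test)

# Posterior mean and variance of the latent function at the test points
alpha = np.linalg.solve(K, y_train)
mean = K_s.T @ alpha
var = np.diag(rbf_kernel(x_test, x_test)) - np.einsum(
    "ij,ij->j", K_s, np.linalg.solve(K, K_s)
)
# Uncertainty shrinks near the training points; picking the next input
# where `var` is largest is the simplest form of Bayesian optimisation
```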

~~~
nonbel
When I tried this to choose xgboost hyperparameters, it didn't seem to perform
much better than random search, while also adding another layer of
hyper-hyperparameters.

~~~
siddboots
Yeah. The hyperparameter story that comes with Gaussian processes is a big
drawback. The choice of kernel has a massive impact.

In practice, I've found GPs to be great for getting actual insight into an
unknown function, but much less useful as a black-box learner.

~~~
elcritch
What kernels would you recommend trying initially? I'm still unclear on
whether Gaussian processes require a normal distribution (e.g. would they work
on log-log / binomial-based functions?).

I’ve wanted to apply the approach you mention a few times, but documentation
seems to go from “Wiki” level to novel research articles. Are there any good
introductory books / resources that aren’t beginner level? That scikit library
looks handy!

~~~
kiliantics
Gaussianprocess.org

------
thanatropism
Friends don't let friends use MS Word to produce equation screenshots. Not in
the age of MathJax.

------
MichailP
Now this is a topic I desperately need. Can anyone here by any chance explain
why one would choose predictors in multilinear regression that are NOT
correlated with the target? I am having trouble understanding paper [1],
where the authors avoid using predictors that are correlated with the target.
The target is ozone concentration shown by a reference instrument, and the
predictors are low-cost sensor outputs.

[1]
[https://www.sciencedirect.com/science/article/pii/S092540051...](https://www.sciencedirect.com/science/article/pii/S092540051500355X)
Section 4.1 about ozone predictors

~~~
cocoablazing
The issue is intra-predictor correlation. In the extreme case that a predictor
is duplicated, the correct coefficients could be any pair (a·β, (1−a)·β) for a
in [0, 1], which an algorithm may not estimate in a stable manner. A
significant degree of correlation introduces this general problem.
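A tiny simulation (hypothetical data) makes the instability concrete: with a near-duplicate predictor, the individual coefficients swing wildly from sample to sample, while their sum stays pinned down.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
sums, firsts = [], []
for _ in range(20):
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=1e-4, size=n)      # near-duplicate predictor
    y = x1 + x2 + rng.normal(scale=0.5, size=n)   # true coefficients: 1 and 1
    X = np.column_stack([x1, x2])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    firsts.append(b[0])
    sums.append(b[0] + b[1])

# The first coefficient is wildly unstable; the sum of the two is stable
print(np.std(firsts), np.std(sums))
```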

~~~
MichailP
So say you have 3 predictors that have high intra-predictor correlation. Can
you still pick one of them and discard the remaining 2? Or can't you pick any
one of them?

~~~
cocoablazing
You can, but why trash information that is present when you can leverage it
with a different approach?

~~~
MichailP
Like PCA? But that way you lose the physical meaning of the predictors.

~~~
closed
PCA is a special case of factor analysis, so you are representing them as
observations of a latent variable (which is often the narrative people use
when explaining why two x variables are correlated).
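That latent-variable reading is easy to see in a simulation (hypothetical data): three noisy readings of one underlying quantity, where the first principal component essentially recovers the latent factor.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
latent = rng.normal(size=n)                        # one underlying factor
# Three noisy sensor readings of the same latent quantity
X = latent[:, None] + rng.normal(scale=0.3, size=(n, 3))

# PCA via SVD of the centered matrix; the first principal component
# should line up with the shared latent factor
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]

corr = abs(np.corrcoef(pc1, latent)[0, 1])
print(corr)  # close to 1: the component recovers the latent variable
```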

------
SubiculumCode
This article is obviously a jumping-off-point kind of article. Most people
using linear regression have never even heard of things like ridge regression.
So I like the article.

However, there are at least two types of regression I'd add to the list, plus
a suggestion:

1\. Multivariate Distance Matrix Regression (MDMR; Anderson, 2001; McArdle &
Anderson, 2001).

2\. Regression with splines.

3\. On polynomial regression, add a mention of orthogonal polynomials.

~~~
michaelbarton
There's also hierarchical regression, where you can estimate multilevel
models, also called fixed- and random-effects models. Assigning a variance
coefficient for each parameter can account for heteroscedasticity.

~~~
SubiculumCode
Mixed models..the hell (+_+) I live in =_=

------
projski
Why did the article cover a basic term like "outlier" under "Terminologies
related to regression" but omit information about how to evaluate a
regression model? I liked that there was some information at the bottom about
"How to choose a regression model" that mentioned "you can select the final
model based on Adjusted r-square, RMSE, AIC and BIC", but providing a little
more context would make this post even better. Perhaps a link to a future blog
post on the topic?

------
matchagaucho
Are there any ML APIs or web services that accept a vector and run various
regression scenarios to identify optimal fit?

I suppose vectors for both training and testing would be required.

Would gladly pay $1-$5 per batch for a service to do this.

~~~
swebs
There is a Python library called TPOT that does this.

[https://github.com/EpistasisLab/tpot](https://github.com/EpistasisLab/tpot)

------
mnky9800n
Logistic regression does classification, not regression. That is, it
assigns/predicts categories of data points instead of predicting some
continuous value on some interval. Maybe this is splitting hairs, but the way
you evaluate a classification model is totally different from the way you
evaluate a regression one.

~~~
smu3l
This is not correct. Logistic regression can be used for classification, true,
but it can also be viewed as a way of estimating the conditional mean of an
outcome variable that has a Bernoulli, or binomial distribution, depending on
the formulation.

There are many ways to evaluate all of these methods, and for classification
you may favor something else, but it's completely reasonable to use the (cross
validated, or not) empirical risk for both logistic and linear regression.
That would be a negative log likelihood in both cases, from the
Bernoulli/binomial distribution for logistic regression or the normal
distribution for linear regression.
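To make that concrete, here is a minimal sketch (numpy only, simulated data) of logistic regression fit by Newton-Raphson and used purely as a conditional-mean estimator, with the Bernoulli negative log-likelihood as the evaluation metric:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(6)
n = 5000
x = rng.normal(size=n)
p_true = sigmoid(-0.5 + 1.5 * x)        # true conditional mean E[Y|X]
y = rng.binomial(1, p_true)

# Fit by Newton-Raphson on the Bernoulli log-likelihood
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    p = sigmoid(X @ beta)
    grad = X.T @ (y - p)
    hess = X.T @ (X * (p * (1 - p))[:, None])
    beta = beta + np.linalg.solve(hess, grad)

# The fitted probabilities estimate E[Y|X=x]; no classification threshold needed
p_hat = sigmoid(X @ beta)
log_loss = -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
print(beta, log_loss)
```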

------
waynecochran
Don’t forget to put RANSAC on your list:
[https://en.m.wikipedia.org/wiki/Random_sample_consensus](https://en.m.wikipedia.org/wiki/Random_sample_consensus)
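For a flavour of how it works, a minimal sketch (numpy only, hypothetical data): repeatedly fit a line through two random points, keep the fit with the largest consensus set, then refit on that set. Plain OLS would be dragged upward by the outliers; RANSAC ignores them.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 * x + 1.0 + rng.normal(scale=0.3, size=n)
y[:40] += rng.uniform(10, 30, size=40)   # 20% gross outliers

best_count, best = 0, None
for _ in range(100):
    i, j = rng.choice(n, size=2, replace=False)
    if x[i] == x[j]:
        continue
    # Candidate line through two randomly chosen points
    slope = (y[j] - y[i]) / (x[j] - x[i])
    intercept = y[i] - slope * x[i]
    inliers = np.abs(y - (slope * x + intercept)) < 1.0
    if inliers.sum() > best_count:
        best_count, best = inliers.sum(), inliers

# Refit by least squares on the consensus (inlier) set only
A = np.column_stack([x[best], np.ones(best_count)])
slope_hat, intercept_hat = np.linalg.lstsq(A, y[best], rcond=None)[0]
print(slope_hat, intercept_hat)  # close to the true 2 and 1
```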

------
raister
I was hoping for one _interesting_ graphic chart per regression analysis
type. That didn't happen, and I felt lost at sea. Please improve the post on
such an amazing topic.

------
Myrmornis
> In simple words, regression analysis is used to model the relationship
> between a dependent variable and one or more independent variables.

“model” isn’t a simple word.

------
thanatropism
This is just horrible quality material. What in the heck is this?

> It is to be kept in mind that the coefficients which we get in quantile
> regression for a particular quantile should differ significantly from those
> we obtain from linear regression. If it is not so then our usage of quantile
> regression isn't justifiable. This can be done by observing the confidence
> intervals of regression coefficients of the estimates obtained from both the
> regressions.

~~~
sfink
I'm a typical "math is hard; let's go programming" type of person, but the
only problem I have with that quoted section is the missing antecedent of
"This can be done...". But I worked it out from context.

I thought the article was very good.

