
Appreciating R: The Ease of Testing Linear Model Assumptions - jtloong
https://joshualoong.com/2019/09/20/Appreciating-R-The-Ease-of-Testing-Linear-Model-Assumptions/
======
triska
R (and its predecessor S) is so great, thank you for writing and sharing this!

In a way, R is very similar to APL and J in that many operations automatically
scale from scalars to vectors, and you can therefore conveniently express
applications of operations to many elements at once. In fact, even its
assignment operator is chosen to syntactically resemble that of APL:

[https://blog.revolutionanalytics.com/2008/12/use-equals-or-a...](https://blog.revolutionanalytics.com/2008/12/use-equals-or-arrow-for-assignment.html)

Regarding the quote:

 _" All statistical models and tests have underlying mathematical assumptions
on the types of conditions upon [which] we can generate reliable results (Hoekstra et
al., 2012)."_

Indeed! There is hence also a very close connection between statistics and
logic.

------
wjnc
Have I gone overboard in remembering that Gauss-Markov is overrated? It's nice
that some assumptions guarantee BLUE, but then there is real world data and
modelling. Understand where your data come from, what distributions they have,
what the relationship is with your dependent variable. Testing for these
assumptions might be helpful as a teaching tool, but please get to the level
where you can demonstrate that your model can fail a few of these (like
heteroskedasticity) and still work (or recognize that you need more than a
linear model).
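
To make the heteroskedasticity point concrete, here is a minimal simulation sketch in plain Python. Everything in it is an assumption chosen for illustration rather than something from the article: the helper names, the true coefficients, and the noise model whose standard deviation grows in proportion to x. It fits a simple linear regression by ordinary least squares on many simulated data sets and averages the estimated slopes.

```python
import random
import statistics


def ols_slope_intercept(xs, ys):
    """Closed-form ordinary least squares fit for y = intercept + slope * x."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simulate_slopes(n_runs=500, n_obs=100, true_slope=2.0,
                    true_intercept=1.0, seed=0):
    """Fit OLS on repeated samples whose noise is heteroskedastic by design."""
    rng = random.Random(seed)
    # Fixed design: n_obs evenly spaced x values between 1 and 10.
    xs = [1 + 9 * i / (n_obs - 1) for i in range(n_obs)]
    slopes = []
    for _run in range(n_runs):
        # Noise standard deviation is 0.5 * x, so the variance is not constant.
        ys = [true_intercept + true_slope * x + rng.gauss(0, 0.5 * x)
              for x in xs]
        slope, _intercept = ols_slope_intercept(xs, ys)
        slopes.append(slope)
    return slopes


slopes = simulate_slopes()
mean_slope = statistics.fmean(slopes)
print(f"mean estimated slope over {len(slopes)} runs: {mean_slope:.3f}")
print(f"spread of the slope estimates (std dev): {statistics.stdev(slopes):.3f}")
```

Under these assumptions the average slope should land close to the true value of 2.0 even though the constant-variance assumption is violated by construction. What heteroskedasticity does undermine is the usual standard-error formula, so it is the inference (confidence intervals, p-values) that needs care, for example via robust standard errors, and not necessarily the point estimate.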

------
ivan_ah
It's not surprising that R has good tools for statistical assumptions
checking. It has _everything_ with an API perfectly suited for stats
professionals. In the Python world scipy and statsmodels implement most of the
important tests, but probably not as many as the ones available in R.

One recent project that looks very promising in the Python world is called
Tea: a high-level language for expressing statistical analysis questions.
Basically the user describes the characteristics of the data they have, their
assumptions, and their hypothesis, and then the Tea runtime checks all the
assumptions and figures out which test can be applied to test the hypothesis.
You still have to know some stats jargon, but the user-interface between human
and machine is revolutionary! Here is a bunch of links I collected about tea:
[https://www.one-tab.com/page/aUF1eWnDT8CIyrwScWD2uA](https://www.one-tab.com/page/aUF1eWnDT8CIyrwScWD2uA)

