
Most commonly used statistical tests and implementation in R - nafizh
http://r-statistics.co/Statistical-Tests-in-R.html
======
IndianAstronaut
Shapiro Wilk isn't all that useful with practical data unless your sample
sizes are fairly small. Once you deal with anything above 5000 values, you are
better off with QQ plots.

------
ekianjo
> If the p-Value is less than significance level (ideally 0.05),

Erm, no. P=0.05 is borderline meaningless: there could be as much as a 30%
chance you are wrong about the actual difference being there, depending on the
true probability of the initial hypothesis.

P-values should be used with strong caution.
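
To put rough numbers on that 30% figure, here is a minimal sketch in Python (standard library only). It assumes a significance threshold of 0.05 and a test with 80% power; both values, and the helper name `false_positive_risk`, are illustrative assumptions and do not come from the article. It applies Bayes' rule to get the share of "significant" results (p below the threshold) for which there was no real effect, for a few assumed prior probabilities that the effect exists.

```python
def false_positive_risk(prior_true, alpha=0.05, power=0.8):
    """Share of significant results for which the null hypothesis was true.

    prior_true: assumed probability, before seeing data, that the effect is real
    alpha:      significance threshold (assumed 0.05)
    power:      chance of detecting a real effect (assumed 0.8)
    """
    false_pos = (1 - prior_true) * alpha   # null true, test still rejects
    true_pos = prior_true * power          # effect real, test detects it
    return false_pos / (false_pos + true_pos)


for prior in (0.5, 0.2, 0.1):
    risk = false_positive_risk(prior)
    print(f"prior={prior:.1f}  P(no real effect | significant) = {risk:.2f}")
```

Under these assumptions the risk is about 6% for a prior of 0.5, 20% for a prior of 0.2, and 36% for a prior of 0.1, so it depends heavily on how plausible the hypothesis was to begin with.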

~~~
cetacea
Even better, p-values should not be used at all. If I have data in hand, I
want to use it to find out the probability that my hypothesis is true. But
p-value analysis requires me to instead ask a different question that I don't
really care about, involving whether my data are consistent with the null
hypothesis.

Everything is just so much more sensible if you allow yourself to assign
probabilities to hypotheses, rather than assuming a hypothesis from the outset
and computing opaque statistics relating to your data.

~~~
healer
There is in fact a probability attached to p-values. A p-value of 0.05 for
instance means your conclusions will be wrong 5 out of 100 times. You can
reduce the p-value to e.g. 0.001 or any other value you want.

~~~
cwyers
No, it means that when the null hypothesis is true, an effect of at least that
magnitude on a dataset of that size will show up due to random chance 5 out of
100 times. It says NOTHING about your hypothesis, it is entirely a statement
about the null hypothesis.
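
A quick simulation makes this concrete. The sketch below is Python (standard library only) and rests on assumptions of its own: a two-sided z-test with known standard deviation, 30 observations per experiment, and hypothetical helper names `two_sided_p` and `rejection_rate`. Every sample is drawn with the null hypothesis true by construction, so any p-value below 0.05 is pure chance.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p(sample, sigma=1.0):
    """Two-sided z-test p-value for H0: mean == 0, with known sigma."""
    z = fmean(sample) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(n_experiments=10_000, n=30, alpha=0.05, seed=1):
    """Fraction of experiments with p < alpha when the null is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Data generated with mean 0, so the null hypothesis holds.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_p(sample) < alpha:
            hits += 1
    return hits / n_experiments


rate = rejection_rate()
print(f"fraction of null experiments with p < 0.05: {rate:.3f}")
```

The printed fraction should land close to 0.05. That number describes how often the null alone produces a "significant" result; it carries no information about how likely the alternative hypothesis is.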

------
minimaxir
It's also worth looking at the documentation in R for each of the functions.
(You can pull it up from the console with ?chisq.test, for example.)

For example, the chisq.test has optional _built-in_ Monte Carlo testing, and
none of the other functions do, oddly.
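
For anyone curious what that Monte Carlo option amounts to, here is a hand-rolled sketch of the idea in Python (standard library only) for the goodness-of-fit case. It is not R's implementation; the function names and the default of 2000 simulations are assumptions made for illustration. The idea: draw many samples under the null, recompute the chi-square statistic for each, and report how often it is at least as large as the observed one.

```python
import random


def chi_square_stat(observed, expected):
    """Pearson chi-square statistic for observed vs. expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def monte_carlo_gof_p(observed, probs, n_sim=2000, seed=1):
    """Simulated p-value for a chi-square goodness-of-fit test."""
    rng = random.Random(seed)
    n = sum(observed)
    expected = [n * p for p in probs]
    obs_stat = chi_square_stat(observed, expected)
    categories = range(len(probs))
    at_least_as_extreme = 0
    for _ in range(n_sim):
        # Draw n observations from the null distribution and tally them.
        counts = [0] * len(probs)
        for c in rng.choices(categories, weights=probs, k=n):
            counts[c] += 1
        if chi_square_stat(counts, expected) >= obs_stat:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (at_least_as_extreme + 1) / (n_sim + 1)


# Made-up counts for a die rolled 60 times, tested against a fair die.
print(monte_carlo_gof_p([8, 9, 12, 11, 6, 14], [1 / 6] * 6))
```

In R itself the equivalent is requested with `chisq.test(..., simulate.p.value = TRUE, B = 2000)`, which is useful when expected counts are too small for the chi-square approximation to be trusted.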

------
cloakanddagger
This is a great post! Bookmarking this for future reference.

------
hackaflocka
This is a good resource for those new to R.

R has some really good GUI layers now. I struggled and struggled for years
trying to learn the command line methods, but it was too much for me. The
following do a great job (these are alternatives)

\- Deducer

\- R Commander

\- RKWard

~~~
earino
It seems like this list is incomplete without mentioning that both RStudio[1]
and Jupyter[2] notebooks now have really first class support for R. There are
also two upstarts, Rodeo[3] and Beaker[4], that are doing cool stuff as well.

The company I work for, Domino Data Lab[5], lets you fire up a lot of these
notebooks in a nice hosted environment on big cloud servers with minimal cost
and effort. It's a fun way to learn how all these new environments can work
together, from RStudio for exploratory analysis to Jupyter notebooks for
presenting a topic. For the other two I haven't really found the superior
use case. The tools in this space are just getting better and better.

1\. [https://www.rstudio.com/](https://www.rstudio.com/)

2\. [http://jupyter.org/](http://jupyter.org/)

3\. [http://blog.yhat.com/posts/introducing-rodeo.html](http://blog.yhat.com/posts/introducing-rodeo.html)

4\. [http://beakernotebook.com/](http://beakernotebook.com/)

5\. [https://www.dominodatalab.com/](https://www.dominodatalab.com/)

~~~
minimaxir
> _Jupyter[2] notebooks now have really first class support for R._

Jupyter and R is a bit iffy since the R kernel is not native. Although the
kernel _works_ fine, setting it up has a ton of manually-installed
dependencies, and in-line plots flat-out give unexpected output. (I've had to
cheat by embedding charts via Markdown, although that has the benefit of
making the charts responsive.)

The important perk is that Jupyter notebooks are now rendered natively on
GitHub, which I've made considerable use of: [https://github.com/minimaxir/sf-
arrests-when-where/blob/mast...](https://github.com/minimaxir/sf-arrests-when-
where/blob/master/crime_data_sf.ipynb)

~~~
stared
Setting it up manually is hard (on OS X + Homebrew Python I did it after a long
fight; main problem: the rzmq library). But... it is super easy with Anaconda:
[https://www.continuum.io/blog/developer/jupyter-and-conda-r](https://www.continuum.io/blog/developer/jupyter-and-conda-r)

~~~
minimaxir
Huh, I thought conda was Python only. I'll definitely take a look!

