
15 Page Tutorial for R - sndean
http://www.studytrails.com/blog/15-page-tutorial-for-r/
======
minimaxir
This tutorial is _much_ more basic and has far fewer practical _statistical_
applications than the R tutorial posted last week
([https://news.ycombinator.com/item?id=12264360](https://news.ycombinator.com/item?id=12264360)),
which _itself_ is out-of-date relative to the R for Data Science book
([http://r4ds.had.co.nz/](http://r4ds.had.co.nz/))

I really am curious why anything "R" and "Tutorial" gets massively upvoted to
the Top 3 of HN like clockwork nowadays. I might have to restart my R tutorial
screencasts since there appears to be a demand. :P

~~~
ekianjo
> I really am curious why anything "R" and "Tutorial" gets massively upvoted
> to the Top 3 of HN like clockwork nowadays.

Same thing I was wondering. Is it simply that R is gaining popularity among a
more mainstream audience, while it hasn't changed that much recently?

~~~
Malarkey73
Idiomatic R has changed enormously with the growth of magrittr (the %>%
operator), dplyr and now broom. Code by expert users is unrecognisable from
that of a few years ago.
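
For readers who have not seen the pipe style, here is a minimal sketch of the difference in shape. It uses only base R and the native `|>` pipe, which is an assumption on my part (it needs R 4.1 or later and is not the magrittr `%>%` operator mentioned above, although for a simple chain like this the two read the same way). The vector `x` is made-up illustration data.

```r
# Base R only; the native pipe `|>` requires R >= 4.1 (assumption).
x <- c(4, 9, 16, 25)  # hypothetical data

# Nested calls are read from the inside out.
nested <- round(mean(sqrt(x)), 1)

# Piped calls are read left to right: each result feeds the next call
# as its first argument.
piped <- x |> sqrt() |> mean() |> round(1)

print(identical(nested, piped))  # TRUE
```

Both forms compute the same value; the piped version just states the steps in the order they happen.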

~~~
nonbel
R still looks pretty much the same to me when I look at the code on kaggle,
stack exchange, etc. Can you share where you have seen this expert R use
and/or what use cases?

~~~
minimaxir
Most modern R usage I've seen, even on Kaggle, has used magrittr/dplyr since
it's orders of magnitude faster/easier than the base functions. (i.e. I
almost quit R in favor of Python without those two)

A quick example of my own notebook which uses magrittr/dplyr heavily to
process Stack Overflow survey data: [https://github.com/minimaxir/stack-
overflow-survey/blob/mast...](https://github.com/minimaxir/stack-overflow-
survey/blob/master/stack_overflow_dev_survey.ipynb)

~~~
nonbel
Thanks for the example code, it actually has me wondering whether using the
pipes has any impact on performance. They are much easier to read than the
nested function calls that would be used instead.

However, I just searched for an R script on Kaggle and the first one I
found is this: [https://www.kaggle.com/bpavlyshenko/grupo-bimbo-inventory-
de...](https://www.kaggle.com/bpavlyshenko/grupo-bimbo-inventory-demand/bimbo-
xgboost-r-script-lb-0-457/output)

From my experience that is a typical example (data.table is very common, but
not so much dplyr and the pipes).

I looked through the recent/featured questions on Stack Overflow and Cross
Validated and saw only "old school" R as well. For example:
[http://stats.stackexchange.com/questions/228800/](http://stats.stackexchange.com/questions/228800/)

------
fxj
R is essentially a LISP dialect. There is even a library that shows the syntax
tree in LISP syntax:

    > library(codetools)
    > showTree(quote(1+2*3+4^5))
    (+ (+ 1 (* 2 3)) (^ 4 5))

See also the R-to-LISP compiler:
[http://dan.corlan.net/R_to_common_lisp_translator/](http://dan.corlan.net/R_to_common_lisp_translator/)

------
_nullandnull_
Another great resource for learning R is Swirl. I can't recommend it enough.

[http://swirlstats.com/](http://swirlstats.com/)

------
fxj

      > a[1-3]
      [1] 1 3
      > a[[1-3]]
      Error in a[[1 - 3]] : attempt to select more than one element
    

The first line evaluates to a[-2]. I think what the author meant was a[1:3].
a[[-2]] gives an error, a[[2]] is valid R.

~~~
capnrefsmmat
a[-2] is valid R syntax. A negative index means that element is omitted from
the result, so a[-2] is a without the second element.
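
A short sketch of the indexing behaviour discussed above. The vector `a` is hypothetical, chosen so that the first result matches the `[1] 1 3` shown in the parent comment. The exact error text for `a[[-2]]` differs between R versions, so the sketch only checks that an error is raised.

```r
a <- c(1, 2, 3)          # hypothetical data, not taken from the tutorial

drop_second <- a[1 - 3]  # 1 - 3 is -2, so this is a[-2]: everything except element 2
first_three <- a[1:3]    # the range 1, 2, 3: probably what the tutorial intended
second      <- a[[2]]    # [[ ]] extracts exactly one element

# a[[-2]] would leave two elements, which [[ ]] cannot return, so it fails.
failed <- tryCatch({ a[[-2]]; FALSE }, error = function(e) TRUE)

print(drop_second)  # 1 3
print(failed)       # TRUE
```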

------
danso
As an aspiring R-learner, I have a few quibbles in the first pages I've looked
at:

re: the Assignment operator
[http://www.studytrails.com/R/Core/AssignmentOperator.jsp](http://www.studytrails.com/R/Core/AssignmentOperator.jsp)

> _However, there is a difference between the two operators. = is only allowed
> at the top level i.e. if the complete expression is written at the prompt.
> so = is not allowed in control structures. Here's an example:_
    
    
          > if (aa=0) {print ("test")}
          Error: unexpected '=' in "if (aa="
          > aa
          Error: object 'aa' not found
          > if (bb<-0) {print ("test")}
          > bb
          [1] 0
    

What in the LOLWTF...This is a bad example because even as an experienced
programmer, I have _no idea_ what is _supposed_ to happen, or why this pattern
would even be used -- and even then, I was surprised with the result. This is
an overly complicated explanation even if it is correct. I prefer Hadley
Wickham's style guide:

[http://adv-r.had.co.nz/Style.html](http://adv-r.had.co.nz/Style.html)

> _Use <-, not =, for assignment._

(an assignment operator that _isn't_ an equals sign is one of the things I
miss most when switching away from R)

From the next chapter, Listing Objects:
[http://www.studytrails.com/R/Core/ListingObjects.jsp](http://www.studytrails.com/R/Core/ListingObjects.jsp)

> _All entities in R are called objects. They can be arrays, numbers, strings,
> functions._

This may technically be the case, but any R tutorial that does not open up
with what makes R different from other mainstream languages is doing the
reader a major disservice. The Listing Objects chapter shows patterns that
use R in ways that I haven't seen used in other R examples (beginner and
advanced). Even if it's correct, what's the point in showing esoteric examples
unless this tutorial is meant to teach R to someone interested in the design
of languages?

Again, Wickham's Advanced R book handles this topic well (in fact, Advanced R
is probably the best book you can read if you already know how to program --
it is incredibly accessible) in his early chapter on Data structures:
[http://adv-r.had.co.nz/Data-structures.html](http://adv-r.had.co.nz/Data-
structures.html)

> _R’s base data structures can be organised by their dimensionality (1d, 2d,
> or nd) and whether they’re homogeneous (all contents must be of the same
> type) or heterogeneous (the contents can be of different types). This gives
> rise to the five data types most often used in data analysis...Almost all
> other objects are built upon these foundations. In the OO field guide you’ll
> see how more complicated objects are built of these simple pieces._

And this next sentence is the one thing I wished someone had printed out and
stapled to my forehead before I started to learn R:

> Note that R has no 0-dimensional, or scalar types. Individual numbers or
> strings, which you might think would be scalars, are actually vectors of
> length one.

Maybe that's self-evident to other programmers, but even as someone who once
programmed in MATLAB, I was stunningly ignorant of how every return
value I interacted with was a vector, even a single simple string. In
retrospect, the interactive shell alludes to this...but I didn't even bother
looking up the details about the shell:

    
    
             > 2 + 2
             [1] 4
             > 'a'
             [1] "a"
    
    

R is wonderfully easy to start up with and produce visualizations with. But
skipping over the language's fundamentals was incredibly painful for me. A few
minutes skimming "Advanced R" would have easily saved me hours of confusion.
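
The "vectors of length one" point can be checked directly in base R. This is a small sketch rather than anything from the tutorial or from Advanced R; the variable names are arbitrary.

```r
x <- 2 + 2
s <- "a"

x_is_vector <- is.vector(x)   # a lone number is still a vector
x_length    <- length(x)      # with exactly one element
s_length    <- length(s)      # a single string is also a length-one vector
s_chars     <- nchar(s)       # the character count is a separate question
same_as_c   <- identical(x, c(4))  # c(4) builds the same length-one vector

print(c(x_is_vector, same_as_c))  # TRUE TRUE
print(c(x_length, s_length))      # 1 1
```

That leading `[1]` in the shell output is the index of the first element printed on the line, which is the hint that even `4` is a vector.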

~~~
stewbrew
wrt assignments: = does assignments too but not eagerly. It's used for late
bindings. This isn't a matter of style as your reference to adv-r would
suggest but a different use case.
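
One reading of this (an assumption on my part, not something the comment spells out) is the function-call case: inside a call, `=` names an argument and assigns nothing in the caller, while `<-` performs an assignment in the caller and passes the value along. The function `double_it` and the name `val` are hypothetical.

```r
double_it <- function(val) val * 2  # hypothetical helper

# `=` inside a call binds the argument named `val`; no variable is created here.
r1 <- double_it(val = 5)
exists_after_eq <- exists("val", inherits = FALSE)

# `<-` inside a call assigns `val` in the calling environment once the
# argument is evaluated, then passes that value to the function.
r2 <- double_it(val <- 5)
exists_after_arrow <- exists("val", inherits = FALSE)

print(c(r1, r2))                                # 10 10
print(c(exists_after_eq, exists_after_arrow))   # FALSE TRUE, assuming no earlier `val`
```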

I wish people would spend more time reading the official language reference
instead of all these half-baked tutorials.

~~~
oneweekwonder
I can see myself and other HN readers enjoying the design of the R website[0].
But I believe that is not universal.

The HTML documentation is intimidating. Not that these are bad things. Science
is complicated, and it needs complicated tools.

These tutorials (half-baked or not) allow a casual insight into a complex
subject matter.

[0]:
[https://cran.r-project.org/manuals.html](https://cran.r-project.org/manuals.html)

------
poisonarena
I prefer Python

~~~
kimi
Agree. While R's statistical stuff is much more polished, doing anything but
running the regressions is a royal PITA.

~~~
blahi
You got it the other way around. Doing anything but model specification (i.e.
regression) is a royal PITA in Python.

~~~
nl
Regression in scikit-learn is like 2 lines. It's hard to understand where
your pain point is.

~~~
blahi
That's my point. Specification of the model is (usually) not a problem. The
other tasks are the problem.

