
Introduction to R Programming - cecilialee
https://cecilialee.github.io/blog/2017/12/05/intro-to-r-programming.html
======
othello
One really underappreciated aspect of R is that it's a lisp at heart. This
enables the user (and enterprising package writer) to build really clean
abstractions for the task at hand.

The tidyverse suite of Hadley Wickham is a great example of this, notably with
the pipe operator %>% (similar to |> in F#) which is not part of the base
language and yet could be very easily implemented. Julia's macros probably
enable the same type of implementation, but I don't see how one would achieve
it as easily in Python for example. Non-standard evaluation is another example
of R's lispiness in action [0].

Also, consider how easy it is to walk R's S-exp. Expressions in R can only be
one of four things: an atomic value, a name, a call or a pairlist. Wickham's
Advanced R has a great intro on this [1].

I believe Wickham's amazing work with tidyverse (which really changes the way
you code in R) is just the beginning of a rediscovery of R's inner lisp power,
a kind of "R: the good parts" moment.

[0] [http://adv-r.had.co.nz/Computing-on-the-language.html](http://adv-r.had.co.nz/Computing-on-the-language.html)

[1]
[http://adv-r.had.co.nz/Expressions.html](http://adv-r.had.co.nz/Expressions.html)
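
To make the "could be very easily implemented" point concrete, here is a minimal sketch of a pipe-like operator in base R. The names `%pipe%` and `.piped_value` are made up for illustration, and this is not how magrittr implements `%>%`; it only shows that capturing and rewriting a call is enough to get the basic behaviour.

```r
# A toy pipe: insert the left-hand value as the first argument of the
# right-hand call. `%pipe%` and `.piped_value` are hypothetical names.
`%pipe%` <- function(lhs, rhs) {
  rhs_expr <- substitute(rhs)            # capture the unevaluated right-hand side
  if (is.call(rhs_expr)) {
    # Rebuild f(a, b) as f(.piped_value, a, b)
    parts <- as.list(rhs_expr)
    new_call <- as.call(c(parts[1], list(quote(.piped_value)), parts[-1]))
    # Evaluate with the piped value bound, falling back to the caller's scope
    eval(new_call, list(.piped_value = lhs), parent.frame())
  } else {
    rhs(lhs)                             # bare function name, e.g. x %pipe% sqrt
  }
}

result <- c(1, 4, 9) %pipe% sqrt() %pipe% sum()
print(result)                            # [1] 6
```

The sketch ignores placeholders, anonymous functions on the right-hand side, and everything else a real pipe has to handle.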

~~~
ropeladder
Anyone with a programming background getting into R should absolutely go read
_Advanced R_. I've been using R off and on for a while now but Advanced R was
a real revelation. All of R's weird behavior finally made sense.

Edit: Also, there is a 2nd edition in the works, confusingly hosted at the
same subdomain of a different version of Hadley Wickham's website:
[https://adv-r.hadley.nz/](https://adv-r.hadley.nz/)

~~~
hadley
The subdomain confusion will get resolved once the 2nd ed is a bit more mature
so I can just redirect the 1st ed.

------
amrrs
I have seen the HN crowd hating R much like it hates JS. While I'm not getting
into those details, I'd like to list a few reasons why I like R:

\- RStudio is simply great. I know Python has got Jupyter notebook but RStudio
makes a good IDE for anyone (even beginners).

\- Python is great because it lets beginners start doing magic without getting
frustrated, which makes it a good beginner's language. The same applies even
more to R: for anyone who wants to begin with data analytics, R is a lot
easier, with no struggle to figure out how to install a new package, load a
package, or make a plot. Hence the dropout rate would be lower.

\- Tidyverse. Without denial, it's a better Universe than Marvel's cinematic
universe. Not a single day in my job goes without using dplyr.

\- While I've quoted tidyverse in general, ggplot2, by embracing the grammar of
graphics, has set a very nice standard for visualization libraries, one that
matplotlib (the go-to library of Python) doesn't really meet.

\- Pandas is nothing but a library built on Numpy to offer R like data
wrangling functions hence I'd like to consider dplyr and R's inbuilt data
manipulation functions superior.

There is no doubt that Python has its own advantages with the single library
scikit-learn and with web services, but R in no way deserves to be hated.

Even millennial companies have found interest in R [https://medium.com/airbnb-
engineering/using-r-packages-and-e...](https://medium.com/airbnb-
engineering/using-r-packages-and-education-to-scale-data-science-at-
airbnb-906faa58e12d)

Edit:

Missed _RShiny_ to simply create a web app (unlike in Python starting a Flask
server and then writing stuff on top of it)

~~~
gaius
I don't understand the Jupyter hype. Sure it's clever that it runs in a
browser but it's less capable than the MathCAD I remember using in the 90s.

~~~
cup-of-tea
Indeed. I used Maple for the same thing.

I think the hype is due to the fact that the literate programming thing is a
good idea but many people haven't seen it before and there aren't many tools
for doing it. I just wish I could use a proper editor with Jupyter. Editing in
the browser is horrible.

~~~
setzer22
I believe emacs org can be used for this kind of notebook development,
however it looked like a configuration nightmare so I still haven't dived into
it.

~~~
cup-of-tea
It's actually pretty easy to set up for general use. I do know and use emacs
lisp, but I've not really used any at all for org-mode.

It does support "sessions" which allow persistence across the code throughout
the document (you could even have multiple sessions), but the way it's done
for Python is quite hacky. It uses an interactive Python shell so you have to
write code as if you're using the shell (double returns etc.) There is a
better way using ob-ipython, but after spending a long time getting it to work
at all I found it not good enough. Using Jupyter kernels is the way to go, I
think, but it would be a lot of work to get it working well with org-mode.

------
realPubkey
I know many people think otherwise, but I hate R for many reasons. Here are
some of them:

\- You can use '=' and '<-' to assign values to variables and both do the
same, except in a few edge-cases where you now spend one week finding the
error

\- It confuses and mixes functional programming and oop not only per entity
but also between the usage of them. Want to get a value of entity X? use
x.getValue(). Want to get a value of entity Y? Use Y.getValue(y).

\- The ide crashes once an hour and does not detect file-changes which forces
you to restart it manually.

\- People say R is the best and optimized for data-analytics which is simply
not true. It's a marketing-lie spread by the creators. There is no data-
analytics-task that you cannot do with the same ease in other programming
languages.

Disclaimer: My big-data-profs forced me to use R even for tasks where R
should not be used.
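
For readers wondering what such an edge case looks like, one well-known difference shows up inside a function call: there `=` names an argument while `<-` performs an assignment in the calling scope. A minimal sketch, where `total` and `scratch` are names made up for the example:

```r
total <- function(values) sum(values)
scratch <- new.env()   # throwaway environment so the side effect is visible

# `=` inside a call only names the argument; nothing is assigned in the caller
evalq(total(values = 1:3), scratch)
exists("values", envir = scratch, inherits = FALSE)   # FALSE

# `<-` inside a call assigns in the caller, then passes the value positionally
evalq(total(values <- 1:3), scratch)
exists("values", envir = scratch, inherits = FALSE)   # TRUE
```

Both calls return 6, which is why the stray variable created by the second form is easy to miss.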

~~~
roel_v
I complain about this every time a post on R programming comes up here, but my
favorite thing to hate (out of many) about R is that there's no way to find
out what the directory of the current script is. Imagine someone would want to
use relative paths to their data files so that they could version control
their scripts and run them unmodified on different machines! We wouldn't want
to enable such abominations now would we!

~~~
Yeikoff
rstudioapi::getActiveDocumentContext()$path

I believe that is what you are looking for.

~~~
roel_v
Yes, and now I want to also make it work when not invoked from RStudio; and
for various R versions. So now I find myself wrapping all these options into a
function, which I have to copy for every 10 line script. So then I make a
package for it; or use the functions in someone else's package and add a
dependency which I'm not sure will still work a year from now.

Or I could just use a sane language and go home in time for dinner.

(I mean I know about all the solutions and non-solutions; I've looked into
this at least a dozen times over the last 5+ years. My point is that this
shouldn't have been an issue in the first place.)
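
For reference, the kind of wrapper function being described tends to look roughly like the sketch below. `script_dir` is a hypothetical helper name, the `ofile` lookup relies on an implementation detail of `source()`, and the `data/input.csv` layout is invented for the example, so treat this as an illustration of the workaround rather than a robust solution.

```r
script_dir <- function() {
  # Case 1: started as `Rscript path/to/script.R` (Rscript passes --file=)
  args <- commandArgs(trailingOnly = FALSE)
  file_arg <- grep("^--file=", args, value = TRUE)
  if (length(file_arg) > 0) {
    return(dirname(normalizePath(sub("^--file=", "", file_arg[1]))))
  }
  # Case 2: run through source("path/to/script.R"), which keeps the path in
  # a variable called `ofile` in its own frame (an implementation detail)
  for (frame in sys.frames()) {
    if (!is.null(frame$ofile)) {
      return(dirname(normalizePath(frame$ofile)))
    }
  }
  # Case 3: interactive session or unknown launcher; fall back to getwd()
  getwd()
}

# Hypothetical layout: data files live next to the script
data_path <- file.path(script_dir(), "data", "input.csv")
```

The three separate cases, one of them undocumented, are pretty much the complaint in a nutshell.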

~~~
Yeikoff
You are absolutely right, but then either your first post was misworded or I
misunderstood the issue (most likely the latter), as there is a way to know
the directory of the script.

100% agree with R is not a sane language.

~~~
roel_v
Ah yes now I see - I said 'there's no way to find the current script' which
isn't true. So that's probably what the others in this thread are also
objecting to :) I guess what I meant was 'there's no sane way' or 'look
at how hard it is to do this tiny thingy which anyone with a programming
background would find so basic, they wouldn't even consider it might not
exist'. So yeah, I did screw up on making my point there.

------
JepZ
A few weeks ago I had to do some data transformation (just a few thousand
lines of data). Because I have some history with Excel I started LibreOffice
and wrote some formulas. After a few days I reached the point when LibreOffice
required one and a half hours to recalculate the formulas.

That was the moment when I asked a friend of mine who has some R experience to
help me with the basics (yes the syntax is kinda weird at the beginning).
After 4 hours of learning by doing we had the same result as what I had
reached in a few days of work with LibreOffice and it calculated everything in
about 17 seconds. Yes, this time I knew exactly what I wanted and R can do
much more efficient transformations than you could ever do with a spreadsheet
calculator. Nevertheless I was quite happy with the result.

As I am normally used to coding with vim and tmux I use R just like a
(bash)-script with the following shebang:

    
    
      #!/usr/bin/env Rscript
    

That way I can throw it into a _watch myScript.R_ while I write it in vim in a
different tmux pane. That might have some disadvantages compared to RStudio
(e.g. can't view graphics in a terminal), but as it fits very nicely into my
normal workflow and performs very well, I am very happy with that solution.

~~~
dm319
Have you tried Nvim-R?

I love it.

You can send a line to the R console using <space>. I've assigned loads of
keyboard shortcuts beginning with your local leader that will do things like
str(), levels(), head(), tail(), sum() on the object under the cursor.

It works fine with plotting figures, and I think you can set it up with tmux,
though I use vim's buffers.

Haven't seen any disadvantages compared to RStudio yet. I guess you could
even do :!git add ... from vim.

[1] [https://github.com/jalvesaq/Nvim-R](https://github.com/jalvesaq/Nvim-R)

~~~
disgruntledphd2
Just to settle the Vim vs Emacs debate in the context of R, I refer you to R
FAQ 6.2: [https://cran.r-project.org/doc/FAQ/R-FAQ.html#Should-I-
run-R...](https://cran.r-project.org/doc/FAQ/R-FAQ.html#Should-I-run-R-from-
within-Emacs_003f)

------
jeroenjanssens
The book "R for Data Science" by Garrett Grolemund and Hadley Wickham
(O'Reilly, 2017) [1] provides a comprehensive introduction to modern R and a
set of packages known as the tidyverse. Highly recommended.

[1] [http://r4ds.had.co.nz/](http://r4ds.had.co.nz/)

~~~
icc97
Hadley Wickham also has an Advanced R book [0] which has some of the
functional programming concepts that you can use in R

[0]: [http://adv-r.had.co.nz/](http://adv-r.had.co.nz/)

~~~
jeroenjanssens
I can't believe you start counting at 0 ;)

~~~
icc97
It's a common theme on HN, I copied it from other comments when I first
started commenting here

~~~
nkurz
It's also a common theme of those complaining about R that it starts with 1
rather than 0 like a "real" programming language:
[https://stackoverflow.com/questions/3135325/why-do-vector-
in...](https://stackoverflow.com/questions/3135325/why-do-vector-indices-in-r-
start-with-1-instead-of-0). Now that you've outed yourself as an insidious
traitor, Hadley will be by shortly to take back your copy of the book. At
least, that's how I read the smiley.

~~~
icc97
Ah, thank you for the explanation. You'll probably get done for being a
snitch.

------
jejones3141
To save others some of the head-banging sessions I've had with R:

R has an integer division operator, %/%. R gives you the ability to define
your own infix operators, as long as you give them symbols that start and end
with %. Here's the kicker--all such operators have a higher precedence than
multiply and divide, which can lead to unexpected results.

R as a programming language can be frustrating. It has scalar values; you just
can't store one in a variable (it becomes a vector of length one). Some
functions and operators will work with vectors of arbitrary length... but some
require a vector of length one.

(Speaking of which, binary operations on vectors are done by combining
corresponding elements, BUT if one operand runs out first, it will start
picking them off from the beginning again, with a warning if the length of the
longer one isn't a multiple of the length of the shorter one. This may be
surprising.)

The wonky list notation takes time to get used to: foo[1] gives you a sublist;
chances are you want foo[[1]].

Deciding which of the *apply() functions you want can be a pain. What passes
for lambda expressions in R is clunky.

m:n gives you a vector of m, m + 1, ..., n... unless m > n, in which case it
assumes you want m, m - 1, ..., n, so 1:0 won't give you an empty vector. This
makes for clumsy special case code.
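
A short sketch of the gotchas listed above, using base R only. `%plus%` and `foo` are names made up for the demonstration.

```r
# %...% operators bind tighter than * and /
a <- 2 * 7 %/% 2                 # parsed as 2 * (7 %/% 2), which is 6
b <- (2 * 7) %/% 2               # 7
`%plus%` <- function(x, y) x + y
d <- 2 * 3 %plus% 4              # parsed as 2 * (3 %plus% 4), which is 14

# Recycling: the shorter operand is reused from the start
r1 <- c(1, 2, 3, 4) + c(10, 20)                   # 11 22 13 24
r2 <- suppressWarnings(c(1, 2, 3) + c(10, 20))    # 11 22 13, normally with a warning

# Single brackets keep the list wrapper, double brackets extract the element
foo <- list(a = 42, b = "x")
sub_list <- foo[1]               # a list of length 1
element  <- foo[[1]]             # the number 42

# 1:0 counts down instead of being empty; seq_len() is the safe spelling
down  <- 1:0                     # 1 0
empty <- seq_len(0)              # integer(0)
for (i in seq_len(0)) stop("never reached")
```

Writing loops over `seq_len(n)` or `seq_along(x)` instead of `1:n` avoids the special-case code for empty input.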

~~~
kqr
> What passes for lambda expressions in R is clunky.

Is that really the case, though? It seems like `function (args) body` is about
as simple as it gets, and just as simple as in many other languages.
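
For comparison, a minimal sketch of what anonymous functions look like in practice, with values chosen arbitrarily:

```r
# An anonymous function passed straight to a higher-order function
squares <- sapply(c(1, 2, 3), function(x) x^2)          # 1 4 9

# Defined and called on the spot
three <- (function(x, y) x + y)(1, 2)                   # 3

# The same form works with Filter, Map and Reduce from base R
evens <- Filter(function(x) x %% 2 == 0, 1:6)           # 2 4 6
total <- Reduce(function(acc, x) acc + x, 1:4)          # 10
```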

------
tekkk
Man I dislike R for its syntax. It does a terrible disservice to people who
start coding in R and then think that they "know programming" while they have
missed most of the basic programming paradigms any "normal" programming
language has.

I think R has a lot of similar ideology as PHP and well everyone has their own
opinion about PHP.

Also, I found the tutorial seriously lacking. I mean, no data.frames, matrices,
vectors, tables or factors? How to iterate over a data.frame might be the
biggest thing a beginner needs to know before shooting themselves in the head.
apply, lapply, sapply or vapply - which one do I need? Well IMO apply is the
best one to start with as it's the basis of them all. sapply is almost the
same but it just transforms the result into a vector or matrix.
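
A small sketch of how the four differ, on made-up data. One caveat worth hedging: as far as base R goes, `sapply` is a wrapper around `lapply` that simplifies the result, while `apply` works over the margins of a matrix or array.

```r
m <- matrix(1:6, nrow = 2)       # rows are (1, 3, 5) and (2, 4, 6)

row_sums <- apply(m, 1, sum)     # over rows (margin 1): 9 12
col_sums <- apply(m, 2, sum)     # over columns (margin 2): 3 7 11

as_list   <- lapply(1:3, function(x) x^2)              # always returns a list
as_vector <- sapply(1:3, function(x) x^2)              # simplified to 1 4 9
checked   <- vapply(1:3, function(x) x^2, numeric(1))  # same, but type-checked

# Iterating over the columns of a data.frame: it is a list of columns
df <- data.frame(a = c(1, 2), b = c(10, 20))
col_means <- sapply(df, mean)    # named vector: a = 1.5, b = 15
```

`vapply` fails loudly if a result does not match the declared shape, which makes it the safer choice inside functions.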

~~~
yummy
Agree. It's amazing how such an ugly and inconsistent language can have so
many great packages.

~~~
cecilialee
The inconsistency is something that's quite annoying though.

------
neya
I'll probably get downvoted for this, but let me tell you - Please don't use R
in production. Please don't use R for any serious work.

Over the years, I've come to learn to appreciate the fact that languages are
just tools. You simply use the right tool for the job. If you let your
personal bias, love/hate get in the way, it will cause you a lot of pain in
the long run. By the same token, R is one of the most fucked up languages to
work with if you use it simply because you _assume_ it's good for all
analytics-related projects. It's not.

In one of my previous companies, we had a hipster, always used everything
that's on trend. Against all advice, he decided to use R for many of our
internal and client facing projects.

For what would have taken a week if Rails were used, he'd write everything in
R Shiny. Yes, he used a statistical programming language to write a web
application and serve APIs(!). Performance was terrible. There were a lot of
breakdowns. Development dragged on, and even his own team members lost morale. I
unfortunately had the ill luck of having to maintain some of his codebases and
those days were the worst in my life. Worse yet, he didn't have a formal
software engineering background, so he loved the idea that you are able to
code everything inside of this blackbox called R Studio. Fuck tests, there
were no tests written because he didn't understand the importance of tests.
The projects he worked on lasted for nearly 1.5 years without completion.
Almost every project had an instance on the cloud running an R server and it
also cost a LOT simply because it was eating a lot of memory. Even our Ruby
projects didn't consume as much.

Eventually most of the projects failed and we lost a lot of customers. Many team
members quit. All because of one singular mistake of choosing a language
that's not right for the job. Eventually, one of our competitors came up with
a working prototype in production using Python, Flask and with much better
analytic capability at scale in less than 3 months. Python can do a LOT that R
can do and cannot do and the code is much, much easier to read.

For example, string concatenation:

Python:

    
    
        "hello" + "world"
    

R:

    
    
       paste("hello","world",sep="")
    

If you're really interested in data science and/or analytics, I sincerely urge
you to start with Python and Pandas together rather than R. It is much, much
more performant, easier to reason about, and much, much easier to maintain and scale.
Please consider this as heartfelt advice based on my mistakes rather than a
rant. Thank you.
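
On the concatenation example specifically, base R has a few shorter spellings. A minimal sketch, where `%+%` is a made-up operator name and not part of base R:

```r
joined1 <- paste0("hello", "world")             # shorthand for paste(..., sep = "")
joined2 <- sprintf("%s%s", "hello", "world")    # format-string style

# A user-defined infix operator, without touching the built-in `+`
`%+%` <- function(a, b) paste0(a, b)
joined3 <- "hello" %+% "world"

print(joined3)                                  # [1] "helloworld"
```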

~~~
scottmmjackson
But if you know R, you can change the behavior of operators.

    
    
        > oldPlus <- `+`
        > `+` <- function(e1, e2) {
        +     if (is.character(e1) && is.character(e2))
        +       paste(e1,e2,sep="")
        +     else
        +       oldPlus(e1,e2)
        + }
        > "hello" + "world"
        [1] "helloworld"

------
kqr
I have started really enjoying R (with tidyverse) because it allows me to
present complicated topics in a very simple manner. I can easily embed short R
snippets and LaTeX equations in an Emacs Org mode document, and then export it
as a very nice-looking easy-to-read HTML or PDF document with basically no
effort other than coming up with the text itself.

It is incredibly liberating.

~~~
arca_vorago
I'm working on my data science degree, and this is my method as well, though
I'll admit I don't know much R yet so I'm using it very simply. I'll usually
have a mix of python, R, octave, etc snippets.

I really love emacs org-mode.

------
minimaxir
As the other comments on this submission imply, if you’re learning R _from
scratch_ , _start_ with tidyverse.

You can use base R, but when people talk about how much they hate R, it’s
usually because of base R, not tools like dplyr/ggplot2. (I had learned R and
used it in college, and _nearly quit R entirely_ until dplyr was released)

And over the last summer, I started using forcats/lubridate, and I am kicking
myself for wasting my time not using them sooner and using ugly hacks for the
appropriate functionality instead.

------
yters
R can be an annoying programming language, but for some reason I've found it
easier to use for prototyping than even Python. I think it's because I can
sloppily copy and paste between notepad and repl without much issue, whereas
in Python I have to be concerned about the whitespace and things are a bit
more verbose. I also get more out of the graphing capability of R, but that's
probably because I don't understand Python's graphing well enough. Be that as
it may, R just seems to have what I need to get things done as sloppily as I
need. My workflow tends to be a combination of Python or Java spitting out
numbers, and then using R to analyze and graph those numbers, all glued
together with Bash scripts.

------
catnaroek
> For someone like me, who has only had some programming experience in Python,
> the syntax of R feels alienating initially. However, I believe it’s just a
> matter of time before adapting to the unique logicality of a new language.

I preferred R to Python right from the start. However, R is anything but
logical, and its syntax is the least of its problems.

> And indeed, the grammar of R flows more naturally to me after having to
> practice for a while, and I began to grasp its kind of remarkable beauty,
> that has captivated the heart of countless statisticians throughout the
> years.

Wow, statisticians care about beauty? This is a shocking scientific discovery!
(In the social sciences, but don't let this detract from your achievement.)
What data do you use to support your theory?

------
tomerbd
I use R (or want to use it) whenever I find myself using excel or google-
spreadsheet. If I was more fluent in R I would use it many more times. I found
that using it instead of standard spreadsheet was much more robust.
Spreadsheets have their role; however, R is an amazing tool to have in your
programming toolset.

------
closed
It's clear there are a lot of strong opinions about R!

One kind of obscure problem I run in to is R's embrace of a global namespace.
Package developers sometimes assume people are using this namespace, and
access it via the globalenv() function. This means that to use the package
anywhere else, you basically have to patch their code.

(in contrast, I don't even think about problems like this occurring in python
packages. Worst case scenario, can just use a subprocess )
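
A minimal sketch of the pattern being described, next to the usual alternative. All function and variable names here are hypothetical and not taken from any real package.

```r
# Leaky: state is written into the user's global environment
set_option_global <- function(value) {
  assign("pkg_option", value, envir = globalenv())
}

# Contained: state lives in an environment owned by the package code
pkg_state <- new.env()
set_option_private <- function(value) {
  assign("pkg_option", value, envir = pkg_state)
}
get_option_private <- function() {
  get("pkg_option", envir = pkg_state)
}

set_option_private("quiet")
current <- get_option_private()   # "quiet", and the global workspace is untouched
```

Code written the first way only works when it is called from a session where the global environment is the right place to look, which is the patching problem described above.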

------
uptownfunk
R is great, haters gonna hate, but when you want to prototype a model, nothing
flows like R + tidyverse + RStudio.

------
doggydogs94
YAFL, yet another “fine” language.

------
abakus
The biggest problem with R: it is too slow.

------
jerianasmith
R is free software - see the R site for the terms of use. It runs on a wide
variety of platforms including UNIX, Windows and
MacOS.

------
gregman1
There's no reason to use R unless you are unable to learn.

~~~
pmyteh
That's silly. If you're doing research using new statistical methods, they're
almost certainly available on R first. And ggplot2 remains the best plotting
library I've ever seen.

