
Data Analysis and Visualization Using R (2014) - michaelsbradley
http://varianceexplained.org/RData/
======
minimaxir
These tutorials are from 2014. While they provide a good overview of R syntax,
a lot has been added to the R-verse such as dplyr, which the author primarily
used for his Trump Tweets blog post yesterday.

If you are interested in learning R, you may want to read the R for Data
Science book ([http://r4ds.had.co.nz/](http://r4ds.had.co.nz/)) by dplyr
(and ggplot2) author Hadley Wickham.

Relatedly, I have my own (slightly more complicated) notebooks using
R/dplyr/ggplot2, open-sourced on GitHub, if you want further examples of
real-world analysis with publicly available data along the lines of the Trump
Tweet analysis:

Processing Stack Overflow Developer data:
[https://github.com/minimaxir/stack-overflow-survey/blob/mast...](https://github.com/minimaxir/stack-overflow-survey/blob/master/stack_overflow_dev_survey.ipynb)

Identifying related Reddit Subreddits:
[https://github.com/minimaxir/subreddit-related/blob/master/f...](https://github.com/minimaxir/subreddit-related/blob/master/find_related_subreddits.ipynb)

Determining the correlation between the gender of a movie's lead actor and its
box office revenue:
[https://github.com/minimaxir/movie-gender/blob/master/movie_...](https://github.com/minimaxir/movie-gender/blob/master/movie_gender.ipynb)

~~~
var_explained
Course author here; I agree that most of the lessons have become outdated over
the last two years, and that R for Data Science is a great modern source.

I'm working with DataCamp to develop an R course that covers dplyr, tidyr, and
other newer additions to the R language.

~~~
minimaxir
Good to hear! :)

------
sgt101
I love R, but I have two problems with it that I would like suggestions to
deal with.

1. Debugging seems far more primitive than in other languages; I get cryptic
messages and really struggle to pinpoint what is happening. Debugging in
(free) Shiny is even harder: the page says "connection closed" and I have to
guess what has happened.

2. Code structure. R is simply fantastic in REPL and/or RStudio mode for
digging around in data, but longer programs remind me of COBOL (yes, I have
programmed in COBOL), and longer programs written by other people remind me of
the need to drink alcohol. Creating good code with R is vastly harder than
with Julia. In Julia the challenge is not to create working, clean code
(that's natural); the challenge is to create the best code that it's ever
possible to have. In R the challenge (for me) is to make it work and not make
a plate of spaghetti.
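
On point 1, here is a minimal sketch (base R only) of one way to make a failure less cryptic: record the call stack at the moment the error is raised, then return the error as a value instead of losing it. The helper names `parse_amount`, `total_amount` and `with_trace` are made up for illustration; interactively, `traceback()`, `debugonce()` or `options(error = recover)` may be all you need.

```r
# Hypothetical helpers for illustration; nothing here comes from a package.
parse_amount <- function(x) {
  if (!is.character(x)) stop("parse_amount: expected character input")
  as.numeric(sub(",", "", x, fixed = TRUE))
}

total_amount <- function(values) {
  sum(vapply(values, parse_amount, numeric(1)))
}

# Evaluate an expression; on error, keep both the condition and the call stack.
with_trace <- function(expr) {
  calls <- NULL
  result <- tryCatch(
    withCallingHandlers(
      expr,
      # Calling handlers run before the stack unwinds, so sys.calls()
      # still shows where the error came from.
      error = function(e) calls <<- sys.calls()
    ),
    error = function(e) e
  )
  list(result = result, calls = calls)
}

# The second element is numeric, so parse_amount() raises an error.
out <- with_trace(total_amount(list("1,000", 2)))
message("Error: ", conditionMessage(out$result))
print(out$calls)
```

The same wrapper idea can be placed around server-side code whose errors otherwise vanish, so the stack gets logged before the session dies; whether that fits a given Shiny deployment is an assumption on my part.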

~~~
blahi
Sounds like you need some Visual Studio in your life.

And you probably need to be more assertive.

[https://cran.r-project.org/web/packages/assertive/index.html](https://cran.r-project.org/web/packages/assertive/index.html)

[https://www.youtube.com/watch?v=JWjiMvlfCwk](https://www.youtube.com/watch?v=JWjiMvlfCwk)

(and you do use testthat for unit testing, right?)

Additionally, you want to write more modular code. There is lots of
infrastructure around that in R, but people just don't use it often enough
because a lot of them aren't programmers.
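
As a rough sketch of what "assertive and modular" can look like, here is a version using only base R's `stopifnot()` in place of the assertive and testthat packages (a deliberate swap so it runs without installing anything). The function and column names are hypothetical.

```r
# Each step is a small function that checks its inputs before doing any work.
clean_scores <- function(df) {
  stopifnot(is.data.frame(df), "score" %in% names(df), is.numeric(df$score))
  df[!is.na(df$score), , drop = FALSE]
}

summarise_scores <- function(df) {
  stopifnot(is.data.frame(df), nrow(df) > 0)
  data.frame(n = nrow(df), mean_score = mean(df$score))
}

# Steps compose, and each one can be tested on its own.
raw <- data.frame(id = 1:4, score = c(10, NA, 20, 30))
result <- summarise_scores(clean_scores(raw))

# Lightweight checks standing in for a real unit test framework.
stopifnot(result$n == 3, isTRUE(all.equal(result$mean_score, 20)))

# Bad input fails at the boundary, and the message names the failed condition.
bad <- tryCatch(
  clean_scores(list(score = 1)),
  error = function(e) conditionMessage(e)
)
print(bad)
```

With testthat, the two `stopifnot()` checks after the pipeline would normally move into a test file as expectations.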

mlr provides very convenient infrastructure for building data mining pipelines
where you can fuse steps with each other.

[http://mlr-org.github.io/mlr-tutorial/release/html/](http://mlr-org.github.io/mlr-tutorial/release/html/)

For non-model building activities, i.e. inference or exploratory analysis,
mason is a great way to do it.

[https://cran.r-project.org/web/packages/mason/vignettes/spec...](https://cran.r-project.org/web/packages/mason/vignettes/specifics.html)

~~~
sgt101
I use RStudio, but I'm not sure what Visual Studio will bring; will investigate.

~~~
blahi
Project management and development tools.

------
Mikeb85
Considering how R has exploded in recent years, I'm sure a more recent article
could have been found. That being said, R is amazing, easily the best
language/software for any sort of data analysis. And bonus points for easy
Fortran/C++ interop, as well as easy multicore/cluster computing. Oh, and a
shout out to RStudio, which is also amazing.

------
armistace
I want to thank everyone for those links. I am learning R at the moment and I
am finding them immensely helpful.

------
pwang
Since this link is from 2014, it doesn't mention rBokeh, which is a very
powerful interactive viz library for R:
[http://hafen.github.io/rbokeh/](http://hafen.github.io/rbokeh/)

------
denzil_correa
How good is the support for R when data is large and does not fit in memory?

~~~
blahi
Plenty of options, depending on what you need.
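
One of the simplest options, sketched with base R only: stream the file through an open connection in fixed-size chunks and keep just the running aggregates in memory. The demo file, the column names and the `chunked_mean` helper are made up for illustration, and the header parsing assumes plain column names with no embedded commas.

```r
# Small demo CSV standing in for a file too large to load at once.
path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:1000, value = rep(c(1, 2, 3, 4), 250)),
  path,
  row.names = FALSE
)

# Mean of one column, reading chunk_size rows at a time.
chunked_mean <- function(path, column, chunk_size = 100) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Read the header line once and strip the quotes write.csv adds.
  header <- strsplit(readLines(con, n = 1), ",", fixed = TRUE)[[1]]
  header <- gsub('"', "", header, fixed = TRUE)

  total <- 0
  n <- 0
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break  # end of file
    chunk <- read.csv(text = lines, header = FALSE, col.names = header)
    total <- total + sum(chunk[[column]])
    n <- n + nrow(chunk)
  }
  total / n
}

print(chunked_mean(path, "value"))
```

This only works for aggregates that can be updated incrementally (sums, counts, means); anything that needs the whole data set at once calls for a database or an on-disk data structure instead.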

~~~
denzil_correa
I guess one has options in every language including Python. Does something
make R stand out over Python for Big Data?

~~~
blahi
Everything that makes it stand out over Python for small data.

------
vegabook
I use R multiple hours every day. I love it, I love the ecosystem, but I can't
help thinking that it's showing its age. It does a great job on static
analysis and visualization of smaller (< 1 gigabyte) data sets, but it is
seriously challenged by anything significantly larger, and is unfit for
purpose if the data is changing rapidly (e.g. streaming). Unfortunately, I am
slowly coming to the conclusion that Spark- and Flink-style tools are where
data science will be in a few years' time. While I know you can use R as
a layer on top of these, I think other aspects of R also hold you back.
Paradoxically, these include the base and ggplot graphics, which are
rightly lauded as excellent but are very low-dimensional in a world where
tensors increasingly rule. I think R will remain hugely relevant for a long
time and is, as I tell people, like Excel^2, but it's getting to the point
where the world is moving on, and it will struggle if it's not rewritten from
the ground up with a much faster, multicore, threaded, distributed
implementation.

~~~
dandermotj
I genuinely think these kinds of thoughts come from having extensive experience
as a programmer who can consider building out systems that might reach the
performance ceiling of R. For 99% of people, multicore/distributed architecture
will never even be a consideration. But I'm with you in that having these
things would be incredible from a systems engineer's perspective. There are
other implementations of R out there (not just GNU R); Hadley Wickham
discusses them in Advanced R somewhere.

