
Teaching R to New Users – From tapply to the Tidyverse - adenadel
https://simplystatistics.org/2018/07/12/use-r-keynote-2018/
======
nextos
I like R, and I think it has some great DSLs, plus a wealth of packages
implementing many methods not found elsewhere. I also like its Scheme roots.
But I think R's design is starting to show its limitations. Ross Ihaka, one of
the main developers, has published some papers pointing out these issues and
suggesting starting over with another Lisp-based language [1].

I don't think he has developed anything yet, but interestingly Julia is quite
Lisp-based and addresses many of the problems he mentioned. I've used R
heavily for many years, but I'm toying with the idea of transitioning to Julia
because the combination of multiple dispatch and a type system designed for
efficient code is extremely pleasant to use. It's a simpler and faster
language, plus I can keep calling R code when needed. There are also awesome
packages being developed in Julia that are not found elsewhere [2].

[1]
[https://www.stat.auckland.ac.nz/%7Eihaka/downloads/JSM-2010....](https://www.stat.auckland.ac.nz/%7Eihaka/downloads/JSM-2010.pdf)

[2]
[https://discourse.julialang.org/t/what-package-s-are-state-o...](https://discourse.julialang.org/t/what-package-s-are-state-of-the-art-or-attract-you-to-julia-and-make-you-stay-there-not-easily-replicateable-in-e-g-python-r-matlab/11294)

~~~
peatmoss
Racket (see my other comment) is my dream for the right place to start over
for data and stats, but Julia is a pretty reasonable consolation prize.

I feel like there was enormous excitement about Julia early on, but I rarely
hear much about it in the stats space these days. Several years ago, I took an
introductory Bayesian stats course as part of a PhD program. At the time, I
was impressed that I was able to do the obnoxious toy problems (the almost
always unrealistic conjugate-prior examples) in Julia because the
distributions(?) package had built-in knowledge of conjugacy.
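
For readers who have not met conjugacy: with a conjugate prior, the posterior
stays in the same family as the prior, so the update reduces to parameter
arithmetic. A minimal Python sketch of the textbook Beta-Binomial case follows.
The names `BetaPrior` and `update_with_binomial` are made up for illustration
and are not the API of any Julia or Python package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrior:
    """Beta(alpha, beta) distribution over a success probability."""
    alpha: float
    beta: float

    def mean(self) -> float:
        # Mean of a Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)


def update_with_binomial(prior: BetaPrior, successes: int, trials: int) -> BetaPrior:
    """Posterior after observing `successes` out of `trials` Bernoulli trials.

    Beta is conjugate to the binomial likelihood, so the posterior is
    Beta(alpha + successes, beta + failures).
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    failures = trials - successes
    return BetaPrior(prior.alpha + successes, prior.beta + failures)


prior = BetaPrior(1.0, 1.0)  # uniform prior (hypothetical starting point)
posterior = update_with_binomial(prior, successes=7, trials=10)
print(posterior, posterior.mean())
```

A library with "built-in knowledge of conjugacy" would presumably perform this
kind of parameter update for many prior/likelihood pairs, not just this one.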

I think a lot of people are waiting for a "go signal" before jumping into
Julia. If RStudio were to announce support for Julia within the RStudio IDE,
I suspect that would serve as the signal. As for me, Julia has been well
supported by Emacs Speaks Statistics forever, so I feel like I'm already
living in the future of Julia editor tooling.

Maybe it's time for me to give Julia another look.

~~~
nextos
I think we are already seeing the first group of power users migrating to
Julia: those who are unhappy with the status quo and are looking for a better
language in which to create new stuff.

In my opinion, one sign of this is the new packages that are quite unique to
Julia: JuMP, DifferentialEquations, OnlineStats...

------
peatmoss
This is an uncharacteristically deep discussion of R. I also agree strongly
with the "making R a 'real programming language' almost killed it"
supposition. I feel like a lot of the genuine warts on R come from trying to
take it from something that could be (and originally was?) a set of macros on
top of Scheme, and turn it into the OOP language du jour.

R was my introduction to functional programming. About the time I was learning
R for statistics in a masters program, I also discovered the book "How to
Design Programs." The combination of HtDP plus the fact that you could (and
perhaps should) mostly ignore the OOP parts of R made this a winning
combination.

Today, I _would_ like to see a slightly better general purpose language with
first class statistics and graphics DSLs, however. To me, the obvious choice
for all that is something like Racket, which already builds a sound base for
pedagogy.

I'd also say that such an endeavor should more or less start from the idea
that it's cloning the Tidyverse. For me, the killer
feature of the Tidyverse is that so many components started with a reasonable
review of prior art in other languages.

Ggplot started with an academic understanding of the Grammar of Graphics, but
also mapped that conceptual framework onto the well-researched foundation that
was R's Grid graphics. When I see people try to clone the syntax of Ggplot, I
often find myself cringing that they've missed the extensibility of the
underlying Grid graphics system.

Given Racket's focus on building DSLs and purpose-specific languages, I think
excelling at creating a user-friendly set of stats and data DSLs could be the
killer demonstration for the Racket community. I just wish I had more time and
expertise in Racket to make that dream a reality.

~~~
klodolph
> I feel like a lot of the genuine warts on R come from trying to take it from
> something that could be (and originally was?) a set of macros on top of
> Scheme, and turn it into the OOP language du jour.

Well, R is based on S, which dates back to 1976. But R has lexical scoping,
which is one of the big things it took from Scheme. If you're going to steal
language features from Scheme, lexical scoping is at the top of the list.
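
For anyone unsure what lexical scoping buys you, here is a minimal sketch in
Python rather than R (the function names are made up for illustration). The
inner function resolves `count` in the scope where it was defined, not in the
scope of whoever calls it; in R the same pattern would use a function that
returns a function and `<<-` for the update.

```python
def make_counter():
    count = 0  # lives in make_counter's scope, captured by increment below

    def increment():
        nonlocal count  # refers to the enclosing function's variable
        count += 1
        return count

    return increment


count = 100  # unrelated global that happens to share the name
counter = make_counter()
first = counter()
second = counter()
print(first, second, count)
```

The global `count` is never touched, because name lookup follows where the
function was written rather than where it is called.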

~~~
peatmoss
That's a good point. My usage of R is quite a lot more recent than S, or even
the original advent of R. I know that R was originally built on an open source
Scheme runtime, but I don't know how much influence that had on the evolution
of R as a language separate from S.

------
nonbel
> _"But it’s worth noting that for the most part, people already had tools
> for analyzing data. They came in the form of SAS, Stata, SPSS, Minitab,
> Microsoft Excel, and my personal favorite, XLisp-Stat (thanks Luke
> Tierney!). But the commonly used data analysis packages had some key
> downsides:
>
> - The graphics were too “quick and dirty” and did not allow much control
>   over the details; they plotted the data, but that was about it;
> - There was relatively little ability to build custom tools on top of what
>   was available (although some capability was added to most packages
>   later)."_

Apart from XLISP-Stat, none of those are open source, so you can't fix the
bugs yourself, and when there is a bug the company tries to hide it. I wouldn't
even put any of those tools in the same category as R when it comes to serious
data processing/analysis/stats. It's more R vs. Python.

~~~
jhbadger
BTW, although XLISP-Stat is mostly of historical interest now, it compiles
quite nicely on modern UNIX-like systems (even OS X, which didn't even exist
when it was last updated).

There's also a Homebrew/Linuxbrew package that I contributed that makes it
even easier to install. It's still fun to play around with, and some things
like the spin-plot are still impressive today.

------
peatmoss
For people wondering about the XLispStat reference in the article, I'd point
you to this PDF written by someone from UCLA, who was a major user of
XLispStat:
[https://www.jstatsoft.org/article/view/v013i07/v13i07.pdf](https://www.jstatsoft.org/article/view/v013i07/v13i07.pdf)

It's a fascinating read, and makes me feel like we reinvent the world with the
same problems over and over.

EDIT: Actually, submitting this as a new article, as it may be of general
interest to the HN community.

------
fiveFeet
New users should probably learn Python instead of R.

------
aBioGuy
This was published yesterday (title says 2012).

~~~
sctb
Updated. Thanks!

