
R: Lessons Learned, Directions for the Future [pdf] - tosh
https://www.stat.auckland.ac.nz/~ihaka/downloads/JSM-2010.pdf
======
peatmoss
I feel like Julia is probably the thing that solves the technical /
performance issues with R.

That said, R’s strength these days is about expressive power through the
Tidyverse collection of libraries and DSLs.

If I were to pick an “R Next” project, it’d be to focus on a better, more
expressive Tidyverse for Racket that plays even more nicely with relational
databases and frameworks like Spark.

------
pickdenis
This is fine and all, but I think they're completely ignoring the elephant in
the room. R is a crazy, random, whimsical language that puts PHP to shame.

I use Python instead of R unless I'm told to use R. I hate using R, even if
there are better libraries written for it than the Python equivalents. I can't
stand the terrible naming conventions (seriously, can't you at least be
consistent with CORE FUNCTION names?) and the ridiculous number of data
structures. There are vectors, lists, matrices, tables, data frames, S4
classes, environments, oh my... I've been programming in R for a couple of
years now and it still takes me around 2-3 tries to figure out what's stored
in a variable and how to access it: do I need two ['s, a trailing comma inside
the [], and so on?

Debugging R basically seems to mean "use a hack to generate stack traces."

Maybe I'm just stupid, but I see _absolutely_ no reason to encourage use of R
over Python. I love lisp and the ideas it espouses, but R seems to take the
worst from that world.

~~~
scottlocklin
You haven't gone deep enough: the interactivity is vastly better, and the
package ecosystem for statistics and data science is generally much more
complete and actively developed. scikit-learn is very good, but if you're not
using that or doing dweeb learning, you're up the creek without a paddle.

Python doesn't even give you matrices as first class citizens; while I used a
lot of Python before I used R, it still feels like they bolted lapack onto an
unrelated scripting language and built things with it. More or less because
that's what it is.
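
A minimal sketch of the matrix point, using only core Python with no third-party packages. The `matmul` helper is hypothetical and written here purely for illustration; it is not a library function.

```python
# Plain Python, no third-party packages.
a = [[1, 2], [3, 4]]

# "*" on a list means repetition, not scalar multiplication.
doubled = a * 2          # [[1, 2], [3, 4], [1, 2], [3, 4]]

# The "@" matrix-multiplication operator is part of the syntax (Python 3.5+),
# but no built-in type implements it, so nested lists reject it.
try:
    a @ a
    matmul_supported = True
except TypeError:
    matmul_supported = False

def matmul(x, y):
    """Hand-rolled product of two nested-list matrices (hypothetical helper)."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

product = matmul(a, a)   # [[7, 10], [15, 22]]
```

In practice this is the gap that NumPy's array type fills, which is where the "bolted on" feeling comes from.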

Personally I don't think the R language is anything special, good or bad: it's
a typical sloppy interpreted language (though many of the difficulties
described in the above 2010 document no longer exist). It's the package
management system that makes it useful. It's not even a great package
management system, especially when dumb kids use it like it's nodejs. But it's
good enough to allow potentially crummy programmers (aka statisticians) to
contribute meaningful and useful code to the ecosystem.

~~~
j88439h84
Why care if data frames are built into Python or not? In R, I use tibbles
anyway.

------
ianbooker
Well, it is a document from 2010.

Machines are faster now. We have seen the hassle of Python 2 to 3 adoption
(or non-adoption) and how hard it is to change a language. The model of using
a slow but comfortable language for model specification and executing it via
a C library is generally more accepted now. And last but not least: the
Tidyverse really has momentum now.

Sure, Julia and Python are coming after R, but the ecosystem itself is far
from done.

~~~
tmalsburg2
Plus, R's performance limitations are not that big of a deal. In my
experience, the bottleneck is just a couple of lines of code that can easily
be replaced by some lines of Rcpp. Much easier than switching to a whole new
language and ecosystem. I was really excited when Julia was new but the cost
of switching is just never going to be worth it for me personally.

------
uptownfunk
I’ll be frank here. I’m an avid R user, I always hear about Julia but until
anyone can show me something even remotely close to tidyverse in Julia, I’ll
stick to my subset of R that gets me to 80/20 (and really it’s more like
98/2).

~~~
hardboiled
[https://www.queryverse.org/](https://www.queryverse.org/)

fin.

~~~
cwyers
That seems to cover the core of the tidyverse, but not the long tail.

------
RA_Fisher
R is plenty fast. Commonly it's calling C++. When I run a multilevel model,
it's C++ I'm waiting on, not R. The tidyverse + ggplot2 + statistical tools +
interactive nature makes working in R really productive.

~~~
hardboiled
Speed is not the only computational constraint - the tremendous memory
required for very large data sets is often a severe limitation especially
within R (and I actually enjoy R).

This is Julia's native, in-language approach (no delegating to C/C++):
[https://juliadb.org/](https://juliadb.org/)

------
timClicks
Note: this paper is from 2010 and he has been making similar statements since
2008
([https://www.stat.auckland.ac.nz/~ihaka/?Papers_and_Talks](https://www.stat.auckland.ac.nz/~ihaka/?Papers_and_Talks))

Given the timing, I'm interested in what he might think of Julia, which seems
to have reached a similar conclusion - statisticians need a new tool.

------
anthony_doan
I use R because of my stat major; also, I am not fond of SAS. Also, most
bleeding-edge statistics stuff is on R and nowhere else. There are tons of
statisticians and other researchers just publishing papers, packages and code
of their research and how to do it. R may be a one trick pony but it is very
very good at that one trick.

You can see via The R Journal
([https://journal.r-project.org/archive/2018-2/](https://journal.r-project.org/archive/2018-2/))
and read through what researchers have done and published via packages.

~~~
hardboiled
This is the best justification for R: its statistical lineage and adoption by
researchers. There's plenty of great thinking manifested in these libraries
and its community. This can't be overstated.

------
hexhead
R feels a lot like Perl to me, idiosyncratic but useful. I switched from Perl
to Python years ago, but like R, I remember it being a scrappy language. It
was sorta like how I use 'vi' for admin work, while I use a full IDE for large
code bases in C/C++. I can crank up R to do quick things I could do in Julia
or Python, but I am in the situation where all my colleagues use R (and
Matlab), since they are not programmers. If their prototypes work out, I code
them in C++ to run on large clusters. C(++) is the only language I've worked
in consistently for over 30 years. I have hope for Julia.

------
data_spy
I use R mainly because of the RStudio guys. I totally understand many of
the author's points but you can easily write your code to avoid storing
massive amounts of data in memory.
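
A minimal sketch of that idea, written in Python rather than R, assuming the data is a CSV with one numeric column. The function name, column name, and sample data are made up for illustration.

```python
import csv
import io

def streaming_mean(lines, column):
    """Mean of one numeric CSV column, reading one row at a time."""
    total = 0.0
    count = 0
    for row in csv.DictReader(lines):
        # Only the running total and count are kept, never the full table.
        total += float(row[column])
        count += 1
    return total / count if count else float("nan")

# Hypothetical in-memory data standing in for a large file on disk.
sample = io.StringIO("id,value\n1,2.0\n2,4.0\n3,9.0\n")
mean_value = streaming_mean(sample, "value")  # 5.0
```

The same function works on an open file handle, so memory use stays flat no matter how many rows the file has.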

edit: grammatical error fix

~~~
peatmoss
I think this reason is very valid. R has warts and issues, but RStudio and
Tidyverse authors have shown that R is an expressive language for building
coherent DSLs that _users_ love.

~~~
hardboiled
R is quite good at developing DSLs (quite Lispy), but Julia is even Lispy-er
with a more flexible syntax and unicode identifiers. R, definitely, has a
great statistical lineage, but that is also unfortunately what limits it from
being as expressive - especially regarding new data types. No one in R creates
their own domain-specific data types (that are performant as well) whereas
this is the norm for the Julia ecosystem.

[https://docs.julialang.org/en/v1/manual/metaprogramming/inde...](https://docs.julialang.org/en/v1/manual/metaprogramming/index.html)

------
gnat
2010 document.

------
wodenokoto
Can we add 2010 to title?

