
What's Next for R? - carlosgg
https://qz.com/1661487/hadley-wickham-on-the-future-of-r-python-and-the-tidyverse/
======
meztez
I would highly recommend the package data.table over tibble or the
basic data.frame if you are doing any kind of modeling in R with larger
datasets. Yes, R has many data structures, but knowing how to use data.table
will blow your mind in terms of efficiency. Matt and the other contributors
have built something extremely fast and flexible.
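For the unfamiliar, much of the appeal is data.table's terse `dt[i, j, by]` syntax and its by-reference updates; a minimal sketch (the table and column names here are invented for illustration):

```r
library(data.table)

# A toy policy table (invented data)
dt <- data.table(policy_id = 1:6,
                 region    = c("N", "S", "N", "S", "N", "S"),
                 premium   = c(100, 200, 150, 250, 120, 180))

# Aggregate by group: dt[i, j, by]
avg <- dt[, .(mean_premium = mean(premium)), by = region]

# Update a column in place, by reference (no copy of the table)
dt[, premium := premium * 1.05]
```

The `:=` operator mutates the table without copying it, which is a large part of where the speed on big datasets comes from.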

I get that R is not for everyone but used correctly it is a beast.

Now this is anecdotal, but in the insurance industry we have what we call
on-level premium calculators. It is basically a program that rerates all
policies with the current set of rates.

Our current R program can rate 41,000 policies a second, fully vectorized, on a
user laptop with an i5 from 2015.

In contrast, the previous SAS program could do 231 policies a minute on a
64-core Xeon processor from 2017.

For our workload and type of work, R has been a godsend.

Bonus: we can put what our data scientists develop in R directly into
production (after peer review, testing, etc., no different from any other
production code).

Back when I started in 2005, we modeled in proprietary software like Emblem,
used Excel to build a first-draft premium calculator, rebuilt the computation
in SAS for the on-level program, and sent specs to IT to rebuild the program
yet again for production. All three had to produce the same results.

I've tried Python, Go, Rust, and Julia. I'd say Python could be a good
alternative, but the speed of data.table, the RStudio IDE, and the ease of
package management in R make it an obvious choice for us. I believe Julia to
be the future, but so far the in-house adoption rate has been low.

~~~
ACow_Adonis
As someone "fully fluent" in both: for many workflows that can be properly
implemented in SAS, you would expect on a technical level that the SAS program
could be faster. It's a fully compiled language with a "simple" compilation
model (compared to R), and the interaction between incremental compilation and
the macro system lets you do some really useful blurring between run time and
compile time when performance matters. On top of that, you can define both SQL
and data step views to minimise disk reads/writes, use database pass-through
on certain procedures, and get in-memory operations (like R) with the sasfile
command. From a purely technical point of view, an experienced user of both
should be able to beat R in SAS.

But... and here's the big but... I almost never meet anyone these days capable
of putting all these steps together in SAS who actually understands the SAS
computation model end to end.

And SAS's strength, a computation model not limited by memory by default,
becomes a performance weakness when everyone reads/writes every step out to
disk and programs without understanding all those little intricacies. SAS
hasn't helped any of this by trying to move its ecosystem away from
"programmers" toward "application users", so now "programmers" can pick up an
interpreted language like R, with in-memory vectorised operations by default,
and beat SAS.

Of course, I'd still recommend places move to Python/R these days because of
the broader ecosystems, the university talent pool, and avoiding the extensive
lock-in of proprietary software, but I still feel I have to reflexively
respond to "R faster than SAS" claims :p

~~~
meztez
Believe me, I know. The code just becomes unreadable when you put all
execution inside the same data step and use hash tables to do fast
small-to-big merges. Not to mention debugging that mess when you have a macro
layer on top of it. Not having access to function source code, the
installation process being what it was... I do not miss it.

And yes, technically SAS is faster than R, but part of the equation is how
many people can make SAS code faster than R/Python. I had maybe 1-2 people who
could write efficient SAS code.

One version we had was a bunch of macros producing hash merges, plus the whole
"how can I do this without leaving the data step" exercise. Just horrible.
Limits on the number of characters in a line of code? You forgot a quote
somewhere and now you have to run the magic line.

I hope I'm not being too emotional when I say I hope SAS disappears from my
industry and we embrace less adversarial licensing.

~~~
ACow_Adonis
I don't think that's being emotional at all.

I'm being emotional when I say I have a soft spot for it because of some
nostalgia and occasionally dropping in to do some "rock star" programming
moments with it. But that's the opposite of what I'd want if/when I was
running my own ship.

I too almost always try to steer myself and others away from it now because of
the licensing/customer hostility. It's absolutely ridiculous...

------
RA_Fisher
I'm so thankful for R, its community, and their great libraries! I've built an
eight-year (so far) career in data science using R to model data and run
experiments. I love R's functional programming style and dplyr, which make
manipulating data a delight. ggplot2 is such a great plotting library, well
worth the investment to learn. Then there are all the stats tools, from glm
and MASS through brms for advanced Bayesian analysis
([https://github.com/paul-buerkner/brms#brms](https://github.com/paul-buerkner/brms#brms)).
With R and Python, it's a great time to be a statistician-programmer!
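The dplyr style being praised here looks like this in practice (a small sketch using the built-in mtcars dataset):

```r
library(dplyr)

# Average fuel economy by cylinder count, as one readable pipeline
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg)) %>%
  arrange(desc(mean_mpg))

by_cyl
```

Each verb does one thing, and the pipeline reads top to bottom in the order the operations happen.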

I recommend folks looking to start with R check out:
[https://r4ds.had.co.nz/](https://r4ds.had.co.nz/)

~~~
sedeki
There is also "Advanced R" by Wickham, which goes into more technical detail
on how the language itself works (data structures, etc.).

It is also available for free.

------
latte
I can't comment from personal impressions, as I have almost zero knowledge of
R compared to my several years of using Python for writing apps and working
with data. I like R's focus on functional programming, though.

However, a couple of years ago, my wife tried to transition from business
consulting to a data analytics / data science role. She started by taking an R
course. She was put off by R's complexity and the course's early focus on the
details of R syntax, function definitions, closures, etc., and abandoned it.

The year after, she decided to try again and enrolled in a course that used
Python (with numpy+pandas+scipy as the data science stack), and she found it
much simpler, more intuitive, and easier to learn than her previous experience
with R. She has now successfully completed the program and is employed as a
data analyst.

~~~
datashow
I guess that's more an issue with the courses than with the language per se.
Sometimes it is a good idea to begin a course with direct application instead
of focusing on the language.

~~~
cwyers
I have encountered a lot of really terrible R learning materials. One data viz
course I took (a very, very reputable and widely-used course on a major MOOC
platform) taught how to make several simple chart types in each of base R, a
library called lattice that I've never encountered since, and ggplot2. I think
a lot of it comes from R instructors who started out back before the tidyverse
trying to teach the path they _took_ to learning the language, rather than the
quickest path to being proficient in the language as it exists today.

The tidyverse is incredibly controversial in parts of the R community; it's
essentially an opinionated set of packages that comes with its own "standard"
library. But I think that wholeheartedly embracing it, and hiding the ways you
would do things in R without the affordances the tidyverse offers, is
absolutely the right way to teach R these days. Unfortunately, a lot of
courses and books haven't caught up to that yet.

------
glofish
What's Next for R?

Doing the exact same thing we did before!

We have a new library called "dtplyr" (no, seriously!). It is designed to save
users from the arcane and obtuse sides of R by combining the power of "dplyr"
and "data.table", the two libraries that were designed to save users from the
arcane and obtuse sides of base structures such as "data.frame" and ....

I wish I were kidding. There is an absurd contention in the R world that by
introducing yet another weirdly named package, people can avoid having to
learn and suffer through the "real" R.
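Snark aside, what dtplyr actually does is translate dplyr verbs into data.table calls, lazily; a minimal sketch (the toy data is invented):

```r
library(data.table)
library(dtplyr)
library(dplyr)

# Wrap a data.table so dplyr verbs are recorded rather than executed
dt <- lazy_dt(data.table(g = c("a", "a", "b"), x = c(1, 2, 3)))

# The pipeline builds a data.table expression; it runs when collected
result <- dt %>%
  group_by(g) %>%
  summarise(total = sum(x)) %>%
  as_tibble()
```

The idea is dplyr's readable verbs with data.table's speed underneath, which is precisely the layering the comment above is mocking.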

------
sammm
I started at a company using Shiny for their applications and R as part of
their data pipelines.

A huge pain point for us is the packaging system. It is absolutely awful.
Packages constantly get overridden, so we have to install packages in a
specific order. Whenever I have reached out to the community (including
prominent members who have written R books), I have always been told to just
use the latest version of all packages and get on with it, which, as anybody
knows, isn't always possible, especially as there are constantly breaking API
changes.

I understand R's history and that, in general, it is a lot better than it used
to be, but I would only recommend R for notebook-style work and would keep it
well away from production.

We have migrated to Python, which isn’t perfect, but the difference in logging
and packaging has been night and day.

~~~
truculent
I have also found R in production to be a nightmare. On packaging, the renv
package seems to be the new way to try to manage things. It's not perfect, but
it seems to be a step up from what was around before. Have you tried it out at
all?
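For reference, the basic renv workflow is only a few calls (a sketch; run from the project root, and note these commands install packages and write files):

```r
# One-time setup: gives the project its own package library and lockfile
renv::init()

# Work as usual; packages install into the project-local library
install.packages("data.table")

# Record the exact versions currently in use into renv.lock
renv::snapshot()

# On another machine (or in CI), reinstall exactly those versions
renv::restore()
```

The lockfile (renv.lock) is what you commit, which is roughly the role requirements.txt or a lockfile plays in Python projects.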

~~~
sammm
I haven’t, thank you for the suggestion. I will give it a go.

------
bransonf
Disappointed in the lack of discussion of R-Shiny or Plumber.

R-Shiny is a full-stack platform for web apps, and it's how I leveraged my
data science background to get into web development. It's incredibly powerful,
in my opinion, with the only obvious limitation being the speed of R itself.

And Plumber: it's become the de facto method for deploying R code behind a
REST API. It too is still maturing, but I see it eventually becoming the Flask
of R.
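A Plumber API really is Flask-like: route handlers are plain R functions with roxygen-style annotations. A minimal sketch (the route and parameter are invented for illustration):

```r
# api.R

#* Echo back a message
#* @param msg The message to echo
#* @get /echo
function(msg = "") {
  list(echo = paste0("You said: ", msg))
}
```

Served with `plumber::plumb("api.R")$run(port = 8000)`; return values are serialized to JSON by default.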

Truth be told, however, after developing quite a few projects on the
Shiny/Plumber stack, I wouldn’t recommend anyone do it.

If for some reason you can only have an R interpreter, go for it. But learning
multiple languages really is the best solution if you want to manage efficient
applications. I say this, however, realizing that all of my colleagues writing
R don’t have engineering backgrounds.

I can't help but feel that R is like JavaScript in many ways. The ease of use
and the ease of publishing packages very quickly clutter the repository.

R will always have a special place in my heart, after all it’s the language
that made me discover programming. However, I can’t help but feel that my
thirst for efficiency is making me outgrow it as a language quickly.

~~~
bllguo
On the Shiny note: check out Streamlit, a declarative Python equivalent. It's
pretty incredible how easy it is to use.

------
thegginthesky
When I used R in university (I majored in Applied Mathematics and Statistics),
I was always awestruck at how every sort of novel modeling technique, from
GLMs to beta regressions to GARCH, is easily accessible for free, with a
proper academic paper, documentation, and cohesive, standardized support.

It was really useful to be able to apply most of the theory I was learning to
actual research datasets. That's what I've missed the most since moving to
Python.

What I don't miss is R's terrible packaging system and how it made
collaborating with colleagues near impossible. I can't count the number of
times I had to debug dependencies in others' scripts just to be able to move
forward with a team project.

~~~
CreRecombinase
What didn't you like about the packaging system? Even if you hate R the
language, R has among the most user-friendly, cross-platform packaging systems
I'm aware of.

~~~
jhanschoo
[https://stackoverflow.com/questions/10947159/writing-robust-...](https://stackoverflow.com/questions/10947159/writing-robust-r-code-namespaces-masking-and-using-the-operator)

Historically, the conventional way to write R code tended to result in
shadowed names (and hence brittle code).
------
luhego
I used R when I took an online course on data analysis. I didn't like it at
all. Its syntax is weird and painful to read. The only nice things about R are
the tidyverse and ggplot. I found Python to be a better alternative: you can
use pandas for data analysis and EDA, Matplotlib and Seaborn for plotting, and
scikit-learn for training your models. An additional benefit is that Python is
a general-purpose language that you can use to build a complete application.

~~~
curiousgal
In almost all of the use cases you mentioned, R blows Python out of the water.

Working with data frames in R is much, much more convenient than pandas (loc,
iloc, etc.?).

Plotting is an obvious win for R. Matplotlib is horrible; powerful, yes, but
an absolute pain compared to ggplot.

Scikit-learn is definitely unmatched, but caret is not so far behind. Also, R
has a plethora of implemented models that Python lacks (from something as
basic as decent quantile regression to time series analysis tools).
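For instance, median (quantile) regression is a one-liner in R via the long-standing quantreg package (the model below is purely illustrative, fitted on the built-in mtcars data):

```r
library(quantreg)

# Median regression (tau = 0.5): fuel economy as a function of weight
fit <- rq(mpg ~ wt, tau = 0.5, data = mtcars)
coef(fit)   # intercept and slope of the conditional median
```

Changing `tau` gives any other conditional quantile with the same interface.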

As for building a complete application, Python is indeed the go-to.

Syntax-wise, using magrittr's pipes is an absolute pleasure. Good luck doing
that in Python.
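The pipe turns nested calls inside-out into a left-to-right chain:

```r
library(magrittr)

# sum(sqrt(c(1, 4, 9, 16))) reads inside-out;
# the piped form reads in the order the steps happen
total <- c(1, 4, 9, 16) %>%
  sqrt() %>%
  sum()

total  # 10
```

(Since R 4.1 there is also a native `|>` pipe in base R, but magrittr's `%>%` is what this thread-era code would use.)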

~~~
whoisnnamdi
Just as an FYI: the statsmodels Python package just released numerous new time
series tools in version 0.11 rc1 [1] and also has functions for quantile
regression [2].

[1]
[https://github.com/statsmodels/statsmodels/releases](https://github.com/statsmodels/statsmodels/releases)
[2]
[https://www.statsmodels.org/dev/examples/notebooks/generated...](https://www.statsmodels.org/dev/examples/notebooks/generated/quantile_regression.html)

------
xvilka
What's currently missing is better LSP (Language Server Protocol) support[1]
(the existing server supports only some of the LSP features), better
linting[2] and static analysis, better integration with GitHub[3], and so on.
More on the tooling side, I believe.

[1]
[https://cran.r-project.org/web/packages/languageserver/readm...](https://cran.r-project.org/web/packages/languageserver/readme/README.html)

[2] [https://github.com/jimhester/lintr](https://github.com/jimhester/lintr)

[3]
[https://github.com/github/semantic/issues/382](https://github.com/github/semantic/issues/382)

------
tzabal
I also got excited when I found out about R Markdown and how well it is
integrated with RStudio. I believe it is a decent alternative to Jupyter
Notebook.

------
roel_v
I hope a hospice. Ugh that language has damaged me worse than Perl.

------
pickdenis
I know this is a dead horse, but I think R seriously shot itself in the foot
with its data structures[1]. I don't really see a solution for this, as fixing
it would never be backward compatible. I'll always pick Python over R because
the data structures actually make sense to me as a programmer (objects that
look like lists, dicts, matrices, etc. or any combination of the above, and
they all behave in very predictable ways). I think this puts off a lot of
other people like me.

[1]:
[https://jamesmccaffrey.wordpress.com/2016/05/02/r-language-v...](https://jamesmccaffrey.wordpress.com/2016/05/02/r-language-vectors-vs-arrays-vs-lists-vs-matrices-vs-data-frames/)

~~~
zosima
True, the default semantics of R's data structures are somewhat arcane
(naturally, as they're based on S [1] from the '70s). And the current support
for, e.g., 64-bit integers leaves something to be desired.

But behind the scenes, R is just a Lisp with some data structures adapted to
statistics and data science.

All base data structures are immutable by default. And, e.g., the vector type
is extremely performant, as it's just a thinly wrapped C array. In Python you
need to reach for NumPy for anything similar, and you do feel some pain
converting between native Python types and NumPy types for the various
functions that support one or the other.

The data frame is immensely powerful, and it has excellent performance
characteristics because it's built upon vectors. A list of objects, like you'd
make in Python, is a lot slower and more unwieldy to deal with, and much
harder to write generalizable functions over.
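Both points are visible in a few lines: vectors have value (copy-on-modify) semantics, and a data frame is literally a list of equal-length column vectors.

```r
x <- c(1, 2, 3)
y <- x          # no copy yet; both names refer to the same vector
y[1] <- 99      # modifying y triggers a copy; x is untouched
x[1]            # still 1

# A data frame is just a list of equal-length column vectors
df <- data.frame(a = 1:3, b = c("u", "v", "w"))
is.list(df)     # TRUE
```

This is what "immutable by default" means in practice: no function or assignment can mutate a vector out from under another name.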

Hadley Wickham's Tidyverse[2] is exactly an attempt to hide away the arcane
details and create a modern, coherent, and consistent language on top of R
while keeping the power of all the great statistics libraries. The fact that
R is, behind the scenes, a Lisp with support for macros makes this possible.
For doing data transformations and statistics, I can't think of anything
currently as powerful as CRAN + the Tidyverse.

[1]
[https://en.wikipedia.org/wiki/S_(programming_language)](https://en.wikipedia.org/wiki/S_\(programming_language\))

[2] [https://www.tidyverse.org/](https://www.tidyverse.org/)

~~~
jpalomaki
This 5-minute video by Wickham was eye-opening for me regarding the Lispiness
of R.

[https://youtu.be/nERXS3ssntw](https://youtu.be/nERXS3ssntw)

~~~
lispm
Modern Lisps don't use quote/unquote like that.

This looks more like FEXPRs from decades ago.

The idea of macros was introduced in 1962: macros are source-code
transformers, which take source code and generate new source code. They can
also be used in a compiled implementation, where the macros translate the code
before compiling.

FEXPRs, by contrast, are functions which receive their arguments unevaluated
and can decide at runtime which of them to evaluate, and how.
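Ordinary R functions can behave in this FEXPR-like way: with `substitute()` they capture the argument expression unevaluated (thanks to R's lazy promises) and choose at runtime whether and where to evaluate it. A toy sketch:

```r
show_expr <- function(x) {
  expr <- substitute(x)          # capture the argument unevaluated
  cat("You wrote:", deparse(expr), "\n")
  eval(expr, parent.frame())     # evaluate it on demand, in the caller
}

show_expr(1 + 2)   # prints the expression, then returns 3
```

This runtime capture-and-evaluate pattern is what dplyr, ggplot2, and the rest of the tidyverse build their non-standard evaluation on.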

