
Announcing RStudio v1.0 - bsg75
https://blog.rstudio.org/2016/11/01/announcing-rstudio-v1-0/
======
capnrefsmmat
I've been co-teaching a class in computing for statisticians this semester
(some details on the previous iteration here
[https://www.refsmmat.com/posts/2016-01-22-stat-
computing.htm...](https://www.refsmmat.com/posts/2016-01-22-stat-
computing.html)) and have mixed feelings about RStudio.

Most of our students use RStudio for their work. It's convenient and easy. For
developing standalone scripts or functions, rather than notebooks or R
Markdown files, the typical workflow is to write code in a file, then run it
in the current R session by selecting pieces and hitting "Run".

But this encourages terrible practices. After a few iterations, the current
environment does not reflect what's written in the file. If students write
tests in a separate file, they neglect to source in their functions, because
they're already in the workspace and so the tests run fine. Datasets that were
loaded in the R console are used in code without being explicitly loaded
there. Code gets changed without being run, or variable definitions are
changed but old copies in the workspace accidentally used instead.

We end up getting many homework submissions that simply don't run if you start
them in a new R session. Pieces are missing, code is out of order, tests only
run if you manually select the test code and run it. It only worked in the R
session of the original author.

R Markdown is a decent step, since rendering the HTML should start a session
from scratch, but when we're asking them to write well-tested and modular
algorithmic code, using R Markdown doesn't really fit.

I've begun to appreciate DrRacket's approach, where there is a "Definitions"
pane and an "Interactions" window. When you Run the definitions, the current
workspace is blown away and everything is defined from scratch from the
Definitions, so there's no lingering state from REPL interactions.
(Unfortunately your REPL history is lost, which can be annoying.) You can't
run into the same inconsistent state as RStudio actively encourages.
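
To make the stale-workspace problem concrete, here is a minimal sketch in Python rather than R. The snippets and the `run_fresh` helper are hypothetical, invented purely for illustration. It mimics the "rebuild everything from the written definitions" behavior described above, and shows a test file that passes in a long-lived session but fails when started from scratch:

```python
# Hypothetical stand-ins for a student's two files.
definitions = "def mean(xs):\n    return sum(xs) / len(xs)\n"
tests = "assert mean([1, 2, 3]) == 2\n"  # never loads the definitions itself


def run_fresh(*sources):
    """Execute the given sources, in order, in a brand-new namespace."""
    namespace = {}
    for src in sources:
        exec(src, namespace)
    return namespace


# A long-lived interactive session: the definitions were run by hand earlier,
# so the tests pass only because of that lingering state.
session = {}
exec(definitions, session)
exec(tests, session)

# The same test file, run alone from scratch, cannot find `mean`.
try:
    run_fresh(tests)
    fresh_ok = True
except NameError:
    fresh_ok = False

# Running definitions and tests together from scratch works again.
rebuilt = run_fresh(definitions, tests)
```

The rough R analogue of `run_fresh` would be restarting the session and sourcing the files in order, instead of running selected pieces.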

~~~
yihui
Isn't this a simple instruction you give students in the very first class like
"before you submit your homework, restart R session, and make sure your
submission runs in the new session"? This only requires them to click a menu
item (Restart R session), and a button (Knit or Source or something). Not
really a burden for them, but will save your life as the instructor.

As someone who had been a student in statistics for more than 10 years, I
confess I had never written a single test for my homework. Frankly I just
didn't have the time or interest (too much homework, and becoming a
professional software engineer was not the goal of the homework assignments).
That said, when I put on my software engineer hat now at work, I'd definitely
do what you advertise here and write tests carefully. If you want your
students to enjoy the benefits of both R packages and R Markdown, I wrote some
thoughts here a couple of years ago:
[http://yihui.name/rlp/](http://yihui.name/rlp/)

Don't get me wrong. I'm all for teaching students good practice of software
engineering. I just want to speak from my own memories and experience as a
student. Sometimes I feel teachers are like parents: they want kids to learn
all possible right things, no matter if they are practically able to swallow
all the good stuff (sometimes this has bad psychological consequences, like
rebellious children). If I were an instructor in statistics, I'd only require
students to submit an R Markdown document. Other things like tests can earn
extra credits but not required.

~~~
munificent
> Isn't this a simple instruction you give students in the very first class
> like "before you submit your homework, restart R session, and make sure your
> submission runs in the new session"? This only requires them to click a menu
> item (Restart R session), and a button (Knit or Source or something).

It's the nature of learners to make mistakes. The more things they have to
remember to do, the less cognitive power they'll have to focus on what they're
trying to learn.

~~~
cheriot
That's a good point, though in this case the thing they need to remember is a
key part of doing the job. Seeing a line of code run correctly once does not
mean it's correct. It's one of those concepts that comes up in many forms.

Perhaps one way to make a teachable moment of it is to help them set up a baby
CI environment. Then every time it catches something the value of good
practices is driven home.
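
A minimal sketch of what such a "baby CI" check could look like, written in Python as a neutral harness. The function names are hypothetical, and the default command assumes `Rscript` is on the PATH. Each submission is started in a brand-new process, so nothing left over from an interactive session can make it pass:

```python
import subprocess


def runs_in_fresh_process(script_path, interpreter=("Rscript", "--vanilla"), timeout=60):
    """Return True if the script exits cleanly when started in a new process.

    The default interpreter is an assumption: `Rscript --vanilla` starts R
    without restoring a saved workspace or reading user startup files.
    """
    try:
        result = subprocess.run(
            [*interpreter, script_path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # Interpreter not installed, or the script ran too long: count as a failure.
        return False
    return result.returncode == 0


def failing_submissions(paths, interpreter=("Rscript", "--vanilla")):
    """Return the subset of paths that do not run cleanly from scratch."""
    return [p for p in paths if not runs_in_fresh_process(p, interpreter=interpreter)]
```

Calling `failing_submissions(["hw1.R", "hw1_tests.R"])` (hypothetical file names) before submitting would flag any file that relies on objects that exist only in the author's session.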

------
minimaxir
If you're looking into learning data science/visualization, RStudio is one of
the best IDEs out there in that field.

One of the reasons I switched to using Jupyter over R/RStudio directly was the
native rendering of notebooks on GitHub, which made it pretty (example of
mine: [https://github.com/minimaxir/stack-overflow-
survey/blob/mast...](https://github.com/minimaxir/stack-overflow-
survey/blob/master/stack_overflow_dev_survey.ipynb))

The addition of _native_ R notebooks may make me switch back, although I'll
have to experiment on the differences between Jupyter Notebook rendering and
.Rmd rendering on GitHub. (and since the notebooks are theoretically language
agnostic, it might be fun to experiment with Python code too!)

Native sparklyr is something I'll also have to research/experiment with, since
according to the official Spark documentation, although R has first class
support with Spark, there is not API parity with Python + Spark, for example.
(although, sparklyr has most of the important transformers/models so it is
definitely worth a look:
[http://spark.rstudio.com/mllib.html](http://spark.rstudio.com/mllib.html))

~~~
trestletech
The older/official Spark integration for R (SparkR) is quite lacking. Sparklyr
is newer and makes up much of the ground that was missing relative to the
Python integration. There are still a few features that need to be checked off
the list, but I
think most users will find that sparklyr has the subset of features that they
need.

------
carlmcqueen
The updates here are massive.

I'd been waiting for the official release rather than using the preview
because of some issues, and the official release seems to have ironed them out.

Excited to code directly into notebooks as reproducible code for fellow
workers.

Sparklyr is really also changing the game for me in how I can integrate new
users into R. It used to take a little work to get people off SAS EG or
whatever statistics package they looked for. Not so much anymore.

~~~
blahi
R Notebooks are such a killer feature!

~~~
gshulegaard
Funny, many people say this, but IPython notebooks have been around for a
while (now called Jupyter Notebooks since they are not Python specific
anymore) and it didn't spur a massive migration to Python...at least as far as
I can tell.

Not a bad thing, it's just R Notebooks kind of seem like old news to me.

Addendum: I looked it up and it appears that Jupyter Notebooks actually have
an R kernel ([https://irkernel.github.io/](https://irkernel.github.io/)).

~~~
blahi
Yeah, but Jupyter leaves a bad taste in the mouth. Very frustrating to debug,
binary file (no git/diff), much harder to layout than RMarkdown, no variable
explorer. Additionally, the kernel idea is kind of lame/not implemented well.
And you are getting nowhere in the data world without proper R support.

There wasn't a migration to Python because there's a total lack of anything to
migrate to in python. It has only textbook examples as far as statistical
models go. That's around 10% coverage and pretty much 0 coverage for all the
abstractions and utility packages in R. Pretty much the same with machine
learning (minus neural networks where it is strong). Python took some share
from matlab and data processing tools but I don't see it retaining that. IMO
it will lose it to Julia for the engineering & science people and to Scala for
the data processing people.

The "second best language for everything" doesn't really work too well in the
"data science" world.

~~~
plafl
What machine learning tools are missing from python? I think it's the best
supported language but I would like to know what I'm missing

~~~
apathy
GAMs, BARTs, extensions to the Cox model, etc.

------
statics2245
For all the naysayers... Try installing python/jupyter in a corporate
environment. It was a no go from the start at the last 4 companies I have
worked at.

R and RStudio just installed and worked for 3 of the 4 companies. The 4th
required a tweak to one environmental variable and everything installed/worked
after that.

Corporate IT restrictions can make or break software.

~~~
StefanKarpinski
The flip side is that RStudio is AGPL [1] – although not too surprisingly this
is not heavily advertised. It may be easy to set up, but your legal department
will have a heart attack if they find out that you're using it. At some point,
RStudio will ask you to comply with the AGPL or pay them for a non-AGPL
license.

[1]
[https://en.wikipedia.org/wiki/Affero_General_Public_License](https://en.wikipedia.org/wiki/Affero_General_Public_License)

~~~
jcheng
We're proud to be AGPL. The AGPL license is pretty clearly stated on both the
product page and the download page. We were an AGPL licensed project years
before we were a "real" company, because we thought that for server-oriented
software (RStudio was originally conceived to be server only), it was the
license most aligned with the principles of the R project.

It's certainly not our intention to deceive and then submarine our customers,
and I sincerely hope that's not what you were implying. IANAL but RStudio
users have nothing to fear from the AGPL, as the copyleft provisions are for
derivative works of RStudio itself.

If OTOH someone is trying to build an R editor interface for their commercial
SaaS data science startup, and wants to leverage our code to do it, then yeah--
the AGPL is going to apply and if that's a problem then we try to work
something out.

(BTW I'm a fan of the work you're doing with Julia!)

~~~
StefanKarpinski
That's a fair reason for the AGPL license choice for RStudio. It is a product-
like software project after all. My experience with corporate legal is that
they are much more wary of GPL software than strictly necessary – hence the
warning about AGPL since ease of use != ease of permission. (We've had
JuliaBox fire-walled from a few big banks, albeit not for license reasons, but
because they don't want private data on the cloud, so it's a familiar issue.)
I had a hard time finding any mention of the AGPL on the RStudio site,
including some broken links.

Congrats on the release! RStudio pushes the envelope on very many data
analytics UI features. Excellent work.

~~~
yihui
Just for the record, if you follow the Download RStudio link on the RStudio
homepage, or go to Products -> RStudio, you will see the license clearly
mentioned before you try to download RStudio. AGPL is also displayed in our
Github repository if anybody cares about looking at the source repository:
[https://github.com/rstudio/rstudio](https://github.com/rstudio/rstudio) So I
don't know why it was so hard for you to find the mention of AGPL...

------
makmanalp
The profiler integration is so cool! I've never seen anything like that in a
free tool.

AFAIK RStudio is now being led by JJ Allaire, same person who did ColdFusion
and stuff. Also in there is Hadley Wickham, of ggplot2 / dplyr fame.

------
sandGorgon
We have recently started migrating to Python from R because of notebooks and
pyspark.

I don't see us moving back anytime soon, because production code in Python is
orders of magnitude better than R.

I just wish there was a decent dplyr for python though :(

~~~
bckygldstn
I haven't used dplyr much, is it not similar to pandas?

~~~
sandGorgon
I think you should read the discussion here -
[https://news.ycombinator.com/item?id=11335491](https://news.ycombinator.com/item?id=11335491)

It explains it better than I can.

------
jbmorgado
I wish there was something like this for Python. Sure we have Jupyter/IPython
and sure we have Spyder, but nothing even comes close to RStudio.

~~~
madenine
Check out Rodeo[0] from Yhat. It's not RStudio, but it's an IDE with data
science in mind. Been getting better and better each release.

[0] [http://rodeo.yhat.com/](http://rodeo.yhat.com/)

~~~
jbmorgado
I was trying Rodeo these past few days. It still has a lot to improve. I
really don't think the author should have taken it out of beta so fast; it's
clearly still a 0.x version, not a 2.x.

Markdown is extremely limited (what we need is something more like knitr),
there are quite a few bugs and it's difficult to access documentation of the
packages.

I appreciate the work of the author, but all in all, it's nowhere near RStudio
and - except for the fancy interface - actually behind Spyder.

~~~
raizinho
There's something close to knitr for Python:
[https://github.com/pystitch/stitch](https://github.com/pystitch/stitch)

~~~
jbmorgado
This is great, thank you.

It would be great if Spyder would add a plugin to edit these formats or
something similar.

