RStudio: Integrated development environment (IDE) for R (github.com/rstudio)
99 points by _benj 8 months ago | 81 comments



As someone who learned most of my initial coding abilities through R and RStudio in a data science context, and since moved on to more “standard” languages and IDEs, I’ve yet to find anything that comes close to the flexibility and integration of RStudio for hacking together data analytics.

VS Code/Python has made some major improvements in the past couple years but it’s still very clunky compared to the ease of running R code line by line without having to start up a debug instance. And now with copilot the most frustrating parts of R (such as remembering all the Tidyverse syntax) have been abstracted away.


My partner does a lot of biostats in RStudio and I really think it breeds terrible habits. Instead of categorizing code by files, everything is shoved into massive files. Instead of running a file top-to-bottom, code is run out-of-order which makes the code organization and flow of a program a complete disaster.

There is something to be said for processing large CSVs once and keeping them in memory while running other parts of the program, as well as having clickable access to all the data frames loaded into memory.


There's nothing about RStudio that encourages big single files or writing huge unstructured scripts. RStudio is a pretty good IDE, and R is a highly expressive functional-first [0] language. R was heavily influenced by Scheme, and has its own powerful metaprogramming [1] system - which is used to great effect in Tidyverse [2] libraries to make APIs that are nicer and more convenient than anything reasonably practical in Python.
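
For a flavour of what that buys you, here's a tiny dplyr sketch (assumes dplyr is installed; uses the built-in mtcars data):

  library(dplyr)
  mtcars %>%
    filter(cyl == 6) %>%                  # bare column names are captured unevaluated (NSE)
    group_by(gear) %>%
    summarise(avg_mpg = mean(mpg))        # no quoting or lambda boilerplate needed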

The problem with a lot of end-user R code is that it is written by statisticians, not programmers. They'd write the same garbage and huge scripts in Python (trust me, I know).

[0] http://adv-r.had.co.nz/Functional-programming.html

[1] https://adv-r.hadley.nz/metaprogramming.html

[2] https://www.tidyverse.org/


I agree that RStudio isn't too awful, but the package management and reproducibility situation in R is dire, even compared to Python.

I have to deal with getting code from data scientists into production, and simply getting it to run outside of their mutant local environment can take days. Things are starting to get a bit better, with packrat initially and now renv/pak/rig and the like, but most data scientists haven't heard of them, and major breakages between minor library versions are still commonplace, as are undocumented system library dependencies. Then there is the whole stringsAsFactors nightmare, thankfully slowly on its way out but still around causing occasional catastrophic breakage.
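
For anyone who hasn't tried it, the basic renv workflow is only a few lines (sketch assumes the renv package is already installed):

  renv::init()                # create a project-local library and lockfile
  install.packages("dplyr")   # installs into the project library
  renv::snapshot()            # record exact package versions in renv.lock
  # later, on another machine or in CI:
  renv::restore()             # reinstall exactly what the lockfile records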

There are lots of nice things about R, but it makes it very easy to shoot yourself in the foot.


Yeah, the package management situation is a big weak spot. There are some issues with renv, but it is usable. It definitely helps to keep a lid on the number of dependencies, and for God's sake never pull anything in from Bioconductor. IMO, new code should always prefer Tidyverse libs for basic stuff, and avoid relying on the ancient and warty standard library.

All that said, I still greatly prefer it over Python for DS work.


>I agree that RStudio isn't too awful, but the package management and reproducibility situation in R is dire, even compared to Python.

I've had exactly the opposite experience. For R, I download R and install it, and download Rstudio and install it. Then when I need a new package I just install.packages("coolnewpackage") and it just works (TM). Occasionally I get info messages about packages being built in newer versions of R, and once a year or so I eventually get around to looking up how to use the updateR() function, but in five years of doing biostats in R I can't remember a single time I had a dependency issue.

Python, on the other hand, is a nightmare. Conda makes life a lot easier, but it is not easy to learn if you are not a software engineer (remember, R was made not just by statisticians, but for them as well). For many projects, my Python flow was something like...

Try creating a new conda env with the packages I think I need. Try starting the project; oops, I don't have spyder-kernels installed. Oh, and my environment isn't compatible with it. How about just running it in VS Code? Well, now I don't have my variable explorer. How about Jupyter? How do I get Jupyter to find my conda env again? Oh wait, I need this other library that's only on conda-forge, and then the conda environment solver fails. I guess I'll start from scratch with a new conda env, and maybe, after several trial-and-error sessions of carefully composing the correct "conda create -n ..." incantation in a text editor before copy-pasting it to the command line, I'll get the environment I need up and running, after conda finishes its 10-minute compatibility search and downloads 80 GB of Python libraries.

And using conda is the easy way of doing it! Don't even get me started on pip and venv...


With R on Windows you get prebuilt binary packages, but on Linux you need the system libraries for any package that wraps an external library. R uses HTTP headers to determine which binary package to send you, and no roll-your-own package system (for virus scanning and the like) supports either the Conda channel conventions or R's HTTP-based binary scheme. I think Conda used to be kind of cool, but I have the same problems, and its position was always to make a ton of assumptions about what you want to do. R is like that too... sensible and automatic defaults that you can't find or aren't told about.


I have never needed anything more than pip in 8 years of development, and have always run into issues with R packages (every new version of R seems to break 30% of existing tidyverse packages)


Do you do much DS/ML in Python? I definitely agree that pip is totally fine otherwise.

At work, I've been giving out about pip to one of our DEs for a while, and when he needed to upgrade a bunch of DS packages he finally started coming around to my opinion.


Great summary of the situation. If you've ever been in the position of trying to explain to a bunch of R users why Python packaging is so much harder to deal with, you know the struggle. R/RStudio really makes it incredibly easy to get up and going for non-developers in a way that's probably hard to appreciate for many people on HN who are SWEs by trade.


Your own experience seems to disprove the claim that conda makes running analytical/numerical code easier in Python. Simple venv and pip really is the simpler choice.


I think a lot of the problem is that R does everything it can to prevent people from writing modular code.

It doesn't have modules or namespaces, and the current fashion is for packages to use non-standard evaluation, which adds friction to users writing their own functions.


R does have namespaces. Take a look at the NAMESPACE file found at the root of every R package, which defines the symbols and methods exported by the package.

Note that for many R packages, the NAMESPACE file is autogenerated from roxygen docs: https://cran.r-project.org/web/packages/roxygen2/vignettes/n...
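
For example, a roxygen-documented function (hypothetical name) looks something like this, and the directives are what end up in NAMESPACE:

  #' Robust z-scores
  #'
  #' @param x A numeric vector.
  #' @importFrom stats median mad
  #' @export
  robust_z <- function(x) {
    (x - median(x)) / mad(x)
  }
  # devtools::document() then regenerates NAMESPACE with lines like:
  #   export(robust_z)
  #   importFrom(stats,mad)
  #   importFrom(stats,median)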


> which defines the symbols and methods exported by the package

Which are all dumped into the one single global namespace, regardless of whether you want everything or not.

I can't remember the exact number, but the tidyverse package attaches literally thousands of names to your session on package load; couple that with any other dependencies and you have a hell of a time figuring out where any function or constant came from.


Calling library() is kind of an antipattern in production R code. You can either call namespaced functions (like say dplyr::mutate()), or use roxygen.

https://roxygen2.r-lib.org/articles/namespace.html
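
A rough sketch of the namespaced style (toy data frame; assumes dplyr is installed):

  # attaching everything:
  #   library(dplyr)
  #   df <- mutate(df, z = x + y)
  # namespaced, so it stays obvious where mutate() comes from:
  df <- data.frame(x = 1:3, y = 4:6)
  df <- dplyr::mutate(df, z = x + y)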


Agreed, but the GP isn't wrong. It's much, much nicer to import a library with an alias in Python.


> it is written by statisticians, not programmers. They'd write the same garbage in Python

I guess I should take offense as a statistician. But it's a fairly common complaint. The reality is, most of us statisticians are trying to compute a result. Like once. Or sometimes twice. For a paper. Or a task. If someone comes to me with a time series and asks me to test it for stationarity, or find the p lags to make it MA(p) stationary, they aren't asking me to write a program. The goal is not reproducibility. The goal is a fast answer. I've used R at trading desks & financial institutions - the goal has seldom been "run the same program again, but with this new input". If that were the case, I would write a function & stick it in a nice library with documentation. But these aren't tech firms. We aren't shipping software. The goal is to compute something fast so you can get on with life & make the trade, or draft the next paragraph in your paper, or... Like if they give me a set of bespoke mortgages with some hairy constraints & ask me to compute the value at risk, there is not much point in building some VaR function. Because it's a once-in-a-while thing. Next time it will involve a different set of args & there'd be different constraints & so forth. So just write some 10-line script & get the number & move on. Yeah, sometimes I would stash the script in some repo & write a 1-line comment on how it works - but it's kinda pointless, it doesn't get much play/reuse. We aren't programmers in that sense, we are just trying to solve problems.

My kid knocked on my office door yesterday. He's in some AoPS course where they use generating functions to count stuff. So he had a problem about the number of ways to add three odd numbers to make 1001. He had worked out the algebra & gotten some number, but before he hits Submit, he wants to double-check with me because wrong answers have a penalty. Now, I don't have the time to go back to school and learn what a generating function is. And I don't want to write lots of for loops & if statements & fight with syntax errors & so forth. So my 1-liner in R

  dim(subset(expand.grid(a=seq(1,1001,2), b=seq(1,1001,2), c=seq(1,1001,2)), a+b+c==1001))

tells me there are 125250 ways. He says he got the same number with generating functions. Boom done! So that's what R is for. Quick & easy.


I have been an R "user" for a while now. After reading your single-line approach to the problem, I am reminded of the saying that goes something like this: "An idiot admires complexity, a genius admires simplicity!" Perfectly splendid!


> Instead of categorizing code by files, everything is shoved into massive files.

That's not really RStudio's fault. It is just how many people use R and were taught.

> code is run out-of-order which makes the code organization and flow of a program a complete disaster.

In my experience, with R Markdown, this is untrue. I see Jupyter Notebooks with cells run out of order much more often.


I have done a lot in R Markdown, and the project I'm currently working on has me mostly working in Databricks notebooks (which are very similar to Jupyter notebooks). My execution gets out of order a lot more often in Databricks.


This is the de facto standard way of operating it, as I understand it, which is mostly just hacking at stuff in small chunks until it sort of works and leaving comments throughout it like "run this bit on Tuesdays only".

I recently had to inherit someone's R stuff and I had to learn R and fix it all. It now runs from a makefile repeatably.

Anyway it could be worse. It could be Minitab.


> Instead of running a file top-to-bottom, code is run out-of-order which makes the code organization and flow of a program a complete disaster.

That's more a REPL issue than specific to a particular language. It's the tradeoff you make. I write my R programs in Geany and then run the whole thing using Rscript. That gives me a clean environment on every run.
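
Roughly, a script written for that style looks something like this (the file name and column names are made up):

  # analysis.R -- written to run top-to-bottom; invoked as:  Rscript analysis.R data.csv
  args <- commandArgs(trailingOnly = TRUE)                       # e.g. the input CSV path
  dat  <- read.csv(args[[1]])
  by_group <- aggregate(value ~ group, data = dat, FUN = mean)   # toy summary
  print(by_group)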


Emacs + ESS? Way more flexible. Maybe less integration because many of the big R package devs work for Posit. RStudio has a lot of superfluous junk in the UI I just don't need or care about.


I've used ESS for the past few years and recently tried using RStudio when I'm on Windows. For my purposes, which is just a little industrial statistics on the side, they are remarkably similar. I feel right at home in either!


I agree - I teach statistics at a university and there is really no alternative to RStudio for working with R. This is especially true considering that the vast majority of folk using R (in my field) have no prior programming experience. Downloading R, downloading VS Code, downloading some R plugin, getting them to talk to each other, and only then starting to learn R - it isn't very straightforward. RStudio is also remarkably consistent on different operating systems - something to consider when half the students are on Windows and half on macOS...


RStudio Server on a Digital Ocean instance made my life a lot easier. Students fire up a browser, log in, and they're using R with all the packages. It was horrible when students ran R on their own machines back in the old days. Most of the questions I got were tech support rather than related to the material. And these days it has good Python support too.


This works out of the box in VSCode?

Just open a .py file, then select the snippet of code you want to run and cmd+enter

It will open a new REPL for you (using your selected interpreter) the first time, and after that all commands are run in that same one.


RStudio is just way better at choosing what code to send (if you only send the line the cursor rests on, you're gonna have a bad time; VS Code is a bit better than that but not great). Also, where do your plots get drawn when you use this? RStudio just works in this regard.


It looks like, as far as I can tell, VS Code doesn't support the interactive window for working in R, which was a bit of a surprise to me when I looked it up.

The python interactive window has pretty much fully replaced my use of jupyter, since it gives you notebook-style output without the annoyance of the notebook format. My usual workflow is highlighting lines of code and shift-enter to execute (there's also a cells syntax).

I'm surprised by this because it _is_ possible to use R in Jupyter (although I never really liked the experience; RStudio was far superior).


?

Yes it does.


I'm specifically referring to: https://code.visualstudio.com/docs/python/jupyter-support-py

The support for R looks a bit different (to me at least?): https://code.visualstudio.com/docs/languages/r

In the screenshot the window on the right does not look comparable to the output in a jupyter notebook. It looks more like a standard terminal. e.g. does it support interactive charts, html tables etc?

The Python interactive window uses the ipykernel package to allow rich outputs like that.

I still might be wrong and would like to be corrected on this, since it would mean R support in VS Code is now better than I thought (I haven't tried it for a while)


I use R in a Jupyter notebook in VS Code via IRkernel. It's a gem.
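
Setup is a couple of lines from an R console (assuming Jupyter itself is already installed):

  install.packages("IRkernel")
  IRkernel::installspec()      # registers the R kernel with Jupyter for the current user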


Oh - nice, thanks - so it looks like the interactive window (which is effectively the same as the output in a Jupyter notebook) is also possible, but not (yet) 'properly'/'officially' supported

https://github.com/REditorSupport/vscode-R/issues/1412


Please supply references for the audience.


An alternative in the Python world that is definitely worth looking into is the JupyterLab Desktop app, which is a standalone installer that is cross-platform and works great for beginners (no command line needed): https://github.com/jupyterlab/jupyterlab-desktop?tab=readme-...

See my other comment in the main thread with more info.


> I’ve yet to find anything that comes close to the flexibility and integration of RStudio for hacking together data analytics.

Is there a good demo or video you can point to that shows this? I have no experience with R, RStudio, or data science, but you've piqued my interest.


Any of David Robinson's (or anyone else's) Tidy Tuesday videos.

https://www.youtube.com/@safe4democracy/featured


If you work with Python, Spyder comes really, really close and is way better than Jupyter


jupyter


Jupyter (ipynb) notebooks in VS Code.


cat, grep, sort and awk come pretty close :)


Came here to share that same experience. RStudio truly made me feel "close" to the data.


The killer feature of RStudio for me is RMarkdown.

I composed almost all my homeworks in grad school using RMarkdown in RStudio. You get LaTeX whenever you need it, code (I usually use it for R or Julia), and markdown for ordinary text. The kable function renders tables nicely from data frames and ggplot2 creates beautiful plots.

Mathematica and Jupyter have a few advantages, but overall I'm very happy with RStudio.
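
For anyone who hasn't seen the format, a minimal .Rmd along those lines looks something like this (title and contents are just placeholders; assumes knitr and ggplot2 are installed):

  ---
  title: "Homework 3"
  output: pdf_document
  ---

  The least-squares estimate is $\hat{\beta} = (X^\top X)^{-1} X^\top y$.

  ```{r}
  library(ggplot2)
  knitr::kable(head(mtcars[, c("mpg", "wt", "cyl")]))    # table rendered from a data frame
  ggplot(mtcars, aes(wt, mpg)) + geom_point()            # plot
  ```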


RMarkdown in RStudio was the killer feature, until the VSCode R extension matured. Not only does it support RMarkdown, it adds a ton of features RStudio doesn't have and runs a lot faster. https://github.com/REditorSupport/vscode-R/wiki/R-Markdown

For my uses, it replaced RStudio 100% of the time.


Can you use Quarto in VS Code? It's the next magic from Posit.co


Yes, quarto has native support for VSCode: https://quarto.org/docs/get-started/hello/vscode.html

There isn't much advantage to using it over RMarkdown for R, IMO.


Thanks for the link! Is it possible to display plots inline like in notebooks? (The screenshot shows a plot in a preview pane.)


Unfortunately no. (TBH I don't like that feature in RStudio anyway: it makes scrolling through large notebooks slower, and ggsave is better at rendering charts than R's native rendering.)

For knitting, you can use Markdown image links.
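
Something like this, roughly (assumes ggplot2; the file name is arbitrary):

  p <- ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg)) + ggplot2::geom_point()
  ggplot2::ggsave("wt_vs_mpg.png", plot = p, width = 6, height = 4, dpi = 300)
  # then reference it from the Markdown source:  ![Weight vs MPG](wt_vs_mpg.png)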


Thanks for letting me know.


That's a lot of prerequisites for something that just works in RStudio.


It takes 5-10 minutes to set up the dependencies.


I'll give it a try, your work has been an inspiration in the past so I trust your good taste!


It’s really nice to have everything you need in one spot. Plus it’ll run on any OS and is free. I started learning how to program with C++ back in the early 2000s which required Windows and a Visual Studio license and it was still a pain to get stuff done. Whether it’s RStudio or Jupyter there’s really never been a better time to start picking up a language and building something useful. Three cheers for the creators, maintainers and community who support tools like this.


Freemium is what they ("Posit") are pivoting to now.

https://posit.co/pricing/individual-products/

If you want an RStudio server to host for a research group of more than 5 people, talk to their sales rep.

Otherwise each person will need to host their own RStudio server side-by-side on the same machine.

Jupyter and JupyterHub is the way forward.

Especially if they get multi-kernel notebooks mainlined (read: what Org-Mode has been doing for decades)


That pricing sheet is for Posit Workbench; RStudio Server[0] can host as many people as you have the compute for, and it's free and open source. It does only support one session per user, but might meet the needs of a small research group.

[0] https://posit.co/download/rstudio-server/


The closest Python equivalent to RStudio is the JupyterLab Desktop app[1,2], which I highly recommend. I've entirely switched to using it for teaching, and it is a godsend, since it works the same way across platforms (win/mac/linux), installs its own Python interpreter independent of any system Python the student might have, and even comes with NumPy/SciPy/Pandas/Seaborn/statsmodels already installed, which makes it possible for me to skip the `pip ...` or `conda ...` instructions altogether.

Between the standalone desktop app and the convenience of running JupyterLab in the cloud thanks to https://mybinder.org/ links, there is now a smooth path for beginners getting into stats/ML/data science: (1) read notebooks on GitHub or nbviewer, (2) run notebooks in the cloud via mybinder links, (3) install the JupyterLab Desktop app, (4) learn to install Python+env-manager via the command line. Previously, new learners were forced to jump straight to (4), but now there are logical steps along the way!

[1] https://github.com/jupyterlab/jupyterlab-desktop?tab=readme-...

[2] https://blog.jupyter.org/jupyterlab-desktop-app-now-availabl...


Is it different from running it through the web server? I found it to have a lot of potential, but it's not there yet.


It's the same stack (jupyterlab server backend + web frontend) but wrapped as an electron app.

Yeah, for sure, when I use RStudio it seems much more polished, but I guess my attachment to (and comfort with) Python still makes it worthwhile to use JupyterLab rather than switch to RStudio.


RStudio and the R language are a couple of my absolute favorite pieces of software. While I'm a software engineer by trade, every once in a while I need to do some data analysis work and throwing together a notebook in RStudio always makes me feel like I'm using a cheat code. For simple tasks, everything is incredibly seamless, plus coworkers who are unfamiliar with R are usually impressed by how nice ggplot visualizations can look.


Are we just submitting GitHub repos as posts now?


I was thinking the same. RStudio is certainly not new, either.


Hasn't this been happening ever since GitHub opened?


The comment section is the most interesting after all, so why not link to the source instead of digging up a blog post no one will read anyway?


I'm about as old school as you can get with preference for CLI and simple text-oriented development environments. I recently picked up R again for a long-term data science project (https://matttproud.com/blog/posts/teaser-weather-temp-repres...) after having not used it since university. In spite of a fair bit of annoyance with the R language (https://matttproud.com/blog/posts/rant-and-r-melt-function.h...), I found RStudio to make the prototyping process with R actually tolerable. Big kudos to Posit and the R community for RStudio.

There are a couple of things I would love for the R ecosystem: project scaffolding to do bulk data generation (e.g., from continuously generated data sets). What's the best way to do this: makefiles, or what? I have a relatively short entrypoint R file that sources other leaf files to run specific analyses, but it makes the software engineer inside of me want to curl up and die.


reshape2 (where `melt` is from) has been superseded for some time, and for pretty good reasons. Try dplyr and tidyr instead - they are much nicer and more modern. The equivalent of melt would be pivot_longer. For packaging, renv is the usual choice. I wouldn't structure the project as a bunch of scripts with an entrypoint. Just write functions as you would in other languages, and keep any specific analysis script small.

https://tidyr.tidyverse.org/
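
Roughly, the pivot_longer equivalent of a melt call looks like this (column names are illustrative):

  library(tidyr)
  wide <- data.frame(id = 1:2, temp_min = c(3, 5), temp_max = c(11, 14))
  long <- pivot_longer(wide, cols = c(temp_min, temp_max),
                       names_to = "measure", values_to = "temp")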


I enjoy RStudio but the best feature of R is data.table. It’s simply unmatched.
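
A toy example of the dt[i, j, by] syntax, which is what people usually mean by "unmatched" (assumes data.table is installed):

  library(data.table)
  dt <- data.table(group = c("a", "a", "b"), value = c(1, 2, 10))
  dt[value > 0, .(mean_value = mean(value), n = .N), by = group]   # filter, aggregate, and group in one call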


Polars is faster? Data.table was a pioneering speed improvement at one point for sure.


It is but if we are talking speed, I’d just opt for RAPIDS.


Once you climb that steep learning curve, absolutely.


I think it's one of the most underrated pieces of software in modern history. Absolutely brilliant. Huge fan. I am glad to see it getting love. I've moved on from data science in a professional capacity, but for some pet projects of mine it has been indispensable. I think managing the namespace was one non-trivial concern (which may be resolved in modern versions). Otherwise, it's very well built for data science applications. Interesting that it didn't catch on for LLM training - I think a missed opportunity.


Weird - one minute it feels like the internet is screaming that I'm an out-of-touch dinosaur for using R, and the next a simple link to its most popular IDE makes the front page of HN.


If I complain here will they fix my year old bug?

https://github.com/rstudio/rstudio/issues/12508


This particular issue should be resolved in the latest daily builds of RStudio. The underlying issue here was a conda patch included in the conda-provided builds of R, which interfered with the way RStudio attempted to load R. Please see https://github.com/rstudio/rstudio/issues/13184#issuecomment... for more details.


Can't make any promises -- our dev team is pretty small! -- but it's been flagged for triage.


The answer, it turns out, was yes!


Ahhh, I started my programming with RStudio. Since then I've changed to Emacs with ESS.

RStudio is nice but lacks a lot of the nice things from something bigger.


If you work with Python, Spyder comes really, really close to RStudio and is way better than Jupyter


Is there a way to visualize a dataframe like a spreadsheet, as RStudio does, but for VSCode?


I use Jupyter a lot for Python. I occasionally have to use RStudio for bioinformatics; the UX is much, much worse. I just haven't bothered to get the R kernel for Jupyter working.


Ahh cool, now "r-studio" brings up this instead of the 24-year-old data recovery program.... :-(


RStudio is thirteen years old so I'm not sure what changed that makes the search results different "now"



