
I have always considered R the best tool for both simple and complex analytics. That said, the features responsible for R's usability often come at the cost of performance. As a result, I have some experience rewriting the underlying C code in other languages, and what one finds under the hood is often not pretty. It would be interesting to see a performance comparison between Python and R.

Given that R folks are porting it to the JVM, I guess performance on the R side will improve thanks to HotSpot and Graal/Truffle.

Then there is PyPy as well.

I also think they should probably add Julia and Wolfram/Mathematica to these comparisons.

I would say they're both at least as limited as Python, and Julia far more so. R's stats packages get ported to Julia faster, though. Mathematica still can't do mixed generalized linear modeling, and no other language (apart from SAS and Stata) has a package for analyzing simple effects within those models.

Thanks for the overview; I don't use them myself. It's more my language-geek side speaking. :)

I have found Renjin quite useful in the past, and I love the motivation behind the project. I know that the guys at Bedatadriven hope to improve its performance, but it does not always (or even often, depending on how you use R) outperform GNU R. Some great changes have been made lately (http://www.renjin.org/blog/2015-06-28-renjin-at-rsummit-2015...), so I hope to see Renjin's performance move past GNU R's across the board. I actually contributed Renjin's current PRNG, a Java translation of GNU R's, which was my first experience getting under R's hood.

The Purdue project you linked looks quite interesting. Unfortunately, development appears to have stagnated: https://github.com/allr/purdue-fastr

[edit] Another important thing Renjin contributes is its package ecosystem: http://packages.renjin.org/

R being single-threaded internally may also result in performance hits.

R also has tools to spread tasks over multiple cores or over a cluster quite effortlessly. In practice, I can create a Fortran or C++ module, then use R to apply it over multiple cores, and get fantastic performance for certain tasks.
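For anyone who wants to see the shape of that pattern, here is a rough sketch in Python rather than R (in R itself you would reach for something like the parallel package). It is only an illustration under assumptions: `kernel`, `split_range`, and `parallel_sum` are hypothetical names, and `kernel` is a pure-Python stand-in for the compiled Fortran or C++ routine. The idea is to split the work into chunks and map the kernel over worker processes.

```python
from concurrent.futures import ProcessPoolExecutor


def kernel(bounds):
    """Stand-in for a compiled routine: sum of squares over [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        stop = start + step + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def parallel_sum(n, workers=4):
    """Apply the kernel to each chunk in a separate worker process."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(kernel, split_range(n, workers)))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import.
    print(parallel_sum(1_000_000, workers=4))
```

Whether this pays off depends on the chunks being large enough to outweigh the cost of starting processes and shipping results back, which is the same trade-off you face when doing it from R.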
