One thing that sometimes gets overlooked when people decide whether to use R or Python is how robust the language and libraries are. I've programmed professionally in both, and R is really bad for production environments. The packages (and sometimes even the language internals) break fairly often for certain use cases, and regression testing in R is not as easy as in Python. If you're doing one-off analyses, R is great -- for anything else I'd recommend Python/pandas/scikit-learn.

Packrat is good for making production "packages" that need specific library versions etc. https://rstudio.github.io/packrat/

or Scala, Clojure, or indeed C.

R's great strength is finding the interesting bits of the data and testing the algorithm: doing the R&D, basically. It's better than Python at that.

Once that's done, why stop at Python? If your game is production, Python will do it, but others will do it so much better, faster, more efficiently.

One nice thing about Python is that you can make a piecewise transition from Python -> C, as it is fairly trivial to wrap C code for use in Python. On the other hand, Java's C interface system JNI is pretty much universally reviled.
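To make the "piecewise transition" concrete, here is a minimal sketch using ctypes from the Python standard library to call a function that already lives in compiled C code. It assumes a POSIX-like system where the C standard library can be loaded; the wrapper name `c_strlen` is made up for illustration, and a real project might prefer cffi, Cython, or a hand-written extension module instead.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX-like system. find_library may return None on some
# platforms; CDLL(None) then falls back to symbols already loaded in-process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Hypothetical wrapper: byte length of `text` as computed by C's strlen."""
    return libc.strlen(text.encode("utf-8"))


print(c_strlen("hello"))
```

The same pattern works for your own shared library: keep the Python function signature, and swap the body for a call into C once a hot spot has been ported.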

The same can be said about R. Rcpp makes it super easy for you to drop right into C++ for bits of code that need that level of performance.

You can beat Scala and approach C in Python, with Python syntax, using Numba. It JIT-compiles numerical Python code to machine code.
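For anyone who hasn't seen it, a rough sketch of what that looks like, assuming Numba is installed (`pip install numba`). The function `sum_of_squares` is a made-up example, and the fallback decorator is only there so the snippet still runs, without any speedup, when Numba is missing. No performance numbers are claimed here; measure on your own workload.

```python
try:
    from numba import njit  # third-party JIT compiler for numerical Python
except ImportError:
    # Assumption for illustration: without Numba, run as plain Python.
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    """Plain Python loop; Numba compiles it on the first call."""
    total = 0.0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(4))
```

The first call pays the compilation cost; later calls with the same argument types reuse the compiled version.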

Good point, but personally I am thinking about the future of clustered data analysis, and this seems to be a JVM world and Scala seems to be the language of choice. Flink / Storm / Spark etc.

Dask has that, and scikit-learn is moving that way too. It even beats Spark for out-of-core work on a single machine.

Yes Dask looks good! It's definitely featuring in my "must consider" list, but I must also, for reasons of responsible planning, give a lot of weight to the JVM technologies, with all their corporate backing etc.

I'd love to hear precisely what production problems you're seeing. I know people are successfully deploying R in production, but I'd like to hear more about the challenges.

First let me say thank you for your work on R packages, you've helped a lot of people accomplish some great things!

Unfortunately I can't go into specific details without potentially divulging proprietary information, but broadly most of the issues I've seen in production with R are corner cases involving multithreading with large amounts of allocated RAM (over 100GB), and corner cases involving the data.table package. I've also seen packages that update and break backwards compatibility, although that's less of an issue. The biggest concern we have with R, however, is that the documentation and coding practices for most R packages make small bug fixes difficult without having extensive knowledge of the package code. This is not always true, but it's true enough of the time that we can't afford to maintain much production R code.
