
How R Took the World of Statistics by Storm - mindcrime
http://www.statisticsviews.com/details/feature/8585391/Created-by-statisticians-for-statisticians-How-R-took-the-world-of-statistics-by.html
======
jordigh
I've regretted that Octave hasn't done for Matlab what R did for S. I
understand some of the circumstances that made this happen, but I'm deeply
saddened by how entrenched Matlab is in scientific computing. It's
getting chipped away little by little at the edges by Python, and to a lesser
extent by Julia, but Matlab is still strong. And yes, some uses of Matlab can
be replaced by R, but overall the two packages target different problem
domains.

I'm taking a break from Octave, but I plan to come back to it and take Matlab
head-on, not chip away at the edges.

~~~
sandGorgon
What are the problem domains? Do you think something like Python's pandas,
and especially Jupyter/IPython, can replace it?

~~~
B1gred
Python has a steeper learning curve and is not as tailored to simple data
analysis. Many use RStudio (an IDE) with its data import and other tools,
which lowers the barrier to entry even further.

Also, mathematicians and statisticians think functionally, and the general
attitude in Python is to do object-oriented programming, while R is primarily
functional with a little bit of object-oriented programming.

~~~
PlotlyThrowaway
I'm a little at odds on this--for production quality analyses, (and only for
analyses) R is excellent.

However, in my experience, for the data munging required as a preliminary to
the analyses, R is worse than bad. It's as if Satan himself designed a
language.

I find that what then happens is this: data scientists/statisticians/[your
favorite word here] become reliant on programmers to clean/format the data to
do the analyses.

This is all fine, but those same scientists are then put off learning Python,
where they could do all of their own munging, and probably 95% of the analysis
they need to do, and where they could further add value by writing programs
that are easier to productionize.

Job security for those who know how to write production code, I guess.

~~~
jasonpbecker
I couldn't disagree more. R is great at munging pretty much everything but
unstructured textual data. The tools are definitely behind Python if you're
dealing with literal written documents.

I don't know anyone who considers themselves a "data scientist" of any sort
who doesn't view their job as 80% or more data wrangling/munging/cleaning.

I write production ETL processes in R at my current job. AMA.

~~~
PlotlyThrowaway
May I ask what tools you favor in the R environment? I just haven't found
anything as performant for operations on irregular and poorly formatted time
series as the pandas library, and in fact I just finished an ETL in pandas for
my current job.

I'm always interested in learning a new tool, though.
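
For readers wondering what "irregular and poorly formatted time series" work looks like in practice, here is a minimal sketch using only the Python standard library rather than pandas. The list of timestamp formats, the hourly bucket size, and the function names `parse_timestamp` and `hourly_means` are all assumptions made for illustration; they are not taken from any ETL described in this thread.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical formats for illustration; real data would dictate this list.
FORMATS = ("%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M", "%Y%m%dT%H%M%S")


def parse_timestamp(text):
    """Try each known format in turn; return None if none of them matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


def hourly_means(records):
    """Average (timestamp_text, value) pairs into hourly buckets.

    Rows whose timestamp cannot be parsed are skipped.
    """
    buckets = defaultdict(list)
    for text, value in records:
        stamp = parse_timestamp(text)
        if stamp is None:
            continue
        # Truncate to the start of the hour so irregular samples share a key.
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(float(value))
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}
```

A library such as pandas handles far more cases than this, but the shape of the job is the same: normalize the timestamps first, then aggregate onto a regular grid.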

~~~
jasonpbecker
I don't work much with data that would benefit from being very tight about
datetimes as a dimension. I'd have to know a bit more about what was
challenging before I could confidently recommend anything for your particular case.
My email is on my profile and I'd be happy to chat there if it's something
that would be helpful.

I have largely avoided ts, zoo, etc. where possible. Time series stuff seems to
have a lot of specialized tooling, all of which tends to be much more strict
about data structure than I'm comfortable with for my flow.

------
B1gred
I program in R 80% of my day. I have experience with all the major alternatives
but keep returning to R. It has one huge flaw, being slow, but otherwise it is
fantastic to work with and has a vibrant community.

The bigger issue is that while R is liked by statisticians, it lacks many of
the features needed for software development. We run into difficulties with
logging, version control of packages, speed, Docker image size, build time,
etc. But even with these drawbacks I keep coming back, because I develop faster
and better in R.

~~~
Simorgh
I agree with your assertion that R is slow, yet quick to develop in.

I recently had to loop through 1.3 GB of data (5000 files) and merge just one
column from each file into a new dataset. It took ~2 hours, yet the loop
was just ~5 lines of code.
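
For comparison, a task of that shape might be sketched in Python as below. This is only an illustration under assumptions: the files are taken to be CSVs with a header row, and the function name `collect_column` and its parameters are hypothetical, not the code that was actually run.

```python
import csv
from pathlib import Path


def collect_column(paths, column):
    """Read one named column from each CSV file, keyed by the file's stem."""
    merged = {}
    for path in paths:
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle)
            # Keep only the requested column; other fields are discarded.
            merged[Path(path).stem] = [row[column] for row in reader]
    return merged
```

Each file is read once and its column is stored under its own key, so no growing table is rebuilt on every iteration. Whether that matters for the two-hour run above depends on how the original loop accumulated its results, which the comment does not say.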

~~~
B1gred
It is slow. And that is OK. Very rarely will R beat any other language.
Usually it is not off by much, but code written by a novice using for
loops instead of apply functions can be 100-1000x slower.

Another example is the immutable structure that causes R to be a memory hog,
creating copies of data everywhere. But again, if you plan well and execute
the 'best' solutions you can avoid the giant pitfalls, though you will rarely ever
beat an equally well written Python equivalent.
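
The cost of copying on every step can be illustrated outside R as well. The sketch below is a Python analogy, not R code and not a benchmark: it grows an immutable tuple one element at a time, which copies the whole tuple on each iteration, and compares that with building the same result in one pass. Both function names are made up for this example.

```python
def squares_by_copying(n):
    """Grow an immutable tuple one element at a time; each step copies it."""
    result = ()
    for i in range(n):
        # Tuple concatenation allocates a brand-new tuple one element longer.
        result = result + (i * i,)
    return result


def squares_in_one_pass(n):
    """Build the same values in a single pass, with no intermediate copies."""
    return tuple(i * i for i in range(n))
```

Both functions return identical values, but the first does roughly quadratic work because every iteration copies everything accumulated so far. That is the general pattern behind the loop-versus-apply and copy-heavy pitfalls described above.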

~~~
jasonpbecker
Post R 3.1 there are far fewer deep copies (e.g. modifying a list or adding a
column to a data.frame no longer copies the whole thing like it used to).

------
anotheryou
What would be a good start to learn this?

I have some programming background and really would like to get into
statistics. Should I do some R tutorial and throw my weblogs at it to see what
I can do? Or is there some awesome learning resource you could share?

~~~
Simorgh
There is a nice book by Brett Lantz called Machine Learning with R. The first
edition (which I have) builds machine learners in R; I assume the new second
edition does the same.

~~~
TheLogothete
Statistics does not mean machine learning, just FYI.

------
frik
R replaced SPSS.

Octave replaced Matlab.

Python based libraries are somewhere in between.

Julia with Jupyter will probably replace Mathematica, LabVIEW and Mathcad (and
unify all of the above) with a powerful native language and environment.

~~~
shele
Mathematica is quite a different beast. It will be a while until Julia _has_ a
native CAS, and even then it will likely not _be_ a CAS (computer algebra
system).

~~~
ihnorton
Still early days, but see [http://www.nemocas.org/](http://www.nemocas.org/)

------
chmielewski
[http://dirk.eddelbuettel.com/code/rblpapi.html](http://dirk.eddelbuettel.com/code/rblpapi.html)

