
Why Learn R? It's the language of Statistics - Anon84
http://blog.revolutionanalytics.com/2010/06/why-learn-r.html
======
T_S_
The article provides a very good example in support of the argument that
domain-specific languages help you crank out domain-specific code. I like R and
use it. But I also code my own methods (in Haskell these days, but that's not
the main point). Before you go and build your company's code base around R,
realize the following:

1) The primary use case for R is by academics and students comparing various
methods on various data sets accessible through R.

2) The code is not really designed with superb reliability in mind. I have
debugged a contributor's fortran code and sent in a patch that never appeared
in the R code base. The bug remains. Professors often don't support code all
that well. Don't blame them. Support is left as an exercise.

3) There are no assurances about the scalability of any particular routine--
even if the algorithm scales in theory.

Do try R. It's good. But don't think SAS and the like will disappear. They
cater to the production requirements of big companies. And don't use R as an
excuse not to write your own production code.

~~~
thangalin
1) With the integration of R and PostgreSQL (PL/R), this might change. For
example, calculate the area of a spherical complex polygon on Earth using
PL/R:

    
    
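      CREATE FUNCTION plr_polygon_area(
        latitude double precision[],
        longitude double precision[])
      RETURNS double precision AS
      $BODY$
        # Assumes the geosphere package, which provides areaPolygon(), is installed.
        library(geosphere)
        areaPolygon( cbind( longitude, latitude ) )
      $BODY$
      LANGUAGE 'plr' VOLATILE STRICT
        COST 1;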
    

Pretty powerful.

2) The learning curve for R was 30 days for me. (I am still learning, but
nothing feels alien anymore.) I submitted a bug to a professor. Not only
did he fix the bug within a few hours, but he offered to personally send a new
build for my platform. He also suggested a performance improvement for my code
(by practically rewriting it) that made it 43 times faster.

The PL/R mailing list has been nothing but helpful and expedient.

3) Scalability of R functions is not too difficult to test.
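
A minimal sketch of such a test, assuming wall-clock timing at a few input sizes is enough to spot a routine that scales badly. `time_at_sizes()` is a hypothetical helper written for this illustration, not part of R or PL/R, and `sort` is only a stand-in for the routine under test:

```r
# Time a function at increasing input sizes to see how it scales.
# time_at_sizes() is a hypothetical helper, not a built-in.
time_at_sizes <- function(f, sizes) {
  elapsed <- sapply(sizes, function(n) {
    x <- runif(n)                     # synthetic input of length n
    system.time(f(x))[["elapsed"]]    # wall-clock seconds for one call
  })
  data.frame(n = sizes, seconds = elapsed)
}

# sort() stands in for whatever routine you actually care about.
timings <- time_at_sizes(sort, c(1e4, 1e5, 1e6))
print(timings)
```

If the seconds column grows much faster than n does, that is the cue to look closer before relying on the routine in production.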

~~~
T_S_
Foreign function calls are a good thing to support. R does this well and lots
of R routines already call C or Fortran libraries.

Domain specific languages are good (and fun). Interacting with open source
developers can be an awesome experience. I agree.

I just don't see R as a platform that can integrate tightly with other
business systems. It's a user-oriented tool. If there are use cases out there
that disprove this, they would be interesting to hear about, especially as we
move into the era of "big data".

~~~
thangalin
It need not integrate tightly with other business systems. Especially as we
move into an era of big data, it need only integrate well with databases.

I am creating a website that allows the general public to create reports on
how the climate has changed, such as:

<http://i.imgur.com/o8fTg.jpg>

The site uses PHP, PostgreSQL, R, and JasperReports to analyse 273 million rows
of data across 8000 weather stations spanning The Great White (soon to be Green
by the looks of it) North for the last 110 years.

The trend line, shown in orange, is calculated in R using a Generalized
Additive Model. There is no way I was going to (or even could) write such a
complex algorithm myself. When I started the project, I was using MySQL. I
migrated the database to PostgreSQL specifically so that I could use R for the
analysis. I migrated the database before learning R.
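
For readers who have not seen one, a rough sketch of fitting a GAM trend line in R. It assumes the mgcv package (a recommended package shipped with most R builds) and uses synthetic temperatures made up for illustration. It is not the site's actual code or data:

```r
library(mgcv)  # assumed installed; provides gam() and s()

set.seed(1)
# Synthetic yearly temperatures, invented for this example only.
d <- data.frame(year = 1900:2009)
d$temp <- 0.01 * (d$year - 1900) + sin(d$year / 10) +
  rnorm(nrow(d), sd = 0.5)

# s(year) asks for a smooth function of year instead of a straight line.
fit <- gam(temp ~ s(year), data = d)

# One fitted trend value per year, ready to plot over the raw data.
d$trend <- as.numeric(predict(fit, newdata = d))
```

The whole model is one line, which is the point: the smoothing machinery lives in the package.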

~~~
T_S_
Good stuff.

------
tel
You know, I often forget to mention formula notation as a reason I love R. It
is superbly natural. R's natural support for data frames (collections of
multidimensional observations) and formula notation makes it easy for every
single disparate library to converge on a common language for high-level usage.
Pretty near every function shares the same first two parameters:

    
    
       out <- f(formula, data-frame, ... other options ...)
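
A small illustration of that pattern, using the `mtcars` data set that ships with R. The three functions are just examples picked from base R and the stats package; the variable names are arbitrary:

```r
# mpg ~ wt reads as "mpg modelled as a function of wt".
fit <- lm(mpg ~ wt, data = mtcars)                        # linear regression

# Same shape of call, different function: mean mpg per cylinder count.
agg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# And again: compare mpg between automatic and manual cars (am is 0 or 1).
tt <- t.test(mpg ~ am, data = mtcars)

print(coef(fit))
print(agg)
```

Each call takes a formula first and a data frame second; only the formula and the extra options change.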

~~~
Sukotto

       Pretty near every function shares the same 
       first two parameters
    

Better then to make those two params passed by default so you don't have to
type them every time.

~~~
scott_s
The functions take the same _idea_ as a parameter, not the exact same
parameter. You still need to specify _what_ formula the function will operate
on.

------
thesnark
I certainly agree that coming up the learning curve with R has greatly
improved my working knowledge of statistics. However, lately I find myself
becoming increasingly frustrated with the language itself as my projects grow
in complexity. Of course this could be due to my lack of skill with R or
programming in general... Does anyone else have this issue?

~~~
golwengaud
I did, quite quickly. In particular, R seems to be focused on univariate data;
support for multivariate data (things like "lego plot" histograms, or any kind
of histogram in more than one variable) is patchy at best.

Having said that, I must note that I am new to R, and rejecting anything so
quickly makes me very nervous. I suspect that I have not even come close to
plumbing the depths, so to speak, of R's capabilities.

~~~
jacobolus
To both of you: I’m far from a statistician, so YMMV, but after trying to work
with multivariate data in both R and MATLAB, I really found using numpy/scipy
to be substantially nicer. And even better, if you ever decide to do anything
_else_ in a program, like munge text or interact with internet services, a
general-purpose language like Python is a big advantage.

------
buckwild
If you guys like using R for statistics, you should definitely try S/S-PLUS
;-)

<http://en.wikipedia.org/wiki/S_%28programming_language%29>

~~~
jacobolus
Why? (Genuinely curious; your comment is just a throwaway line, but you
probably wouldn’t have said it if you didn’t have some real reasons.)

~~~
buckwild
MATLAB, R, and S-PLUS are all different animals. They are usually used for
different things (and what they are used for can vary from person to person).
For instance, I use MATLAB for object-oriented scientific programming, R for
quick and dirty (but math-intensive) scripts, and S-PLUS for quick and dirty
stats-intensive scripts. Long story short, S-PLUS is geared towards statistics
(better modules, functions, etc). If you do some searching, you'll find that R
is actually an offspring of S, with S-PLUS being R's sibling. If one can
program in R, picking up S-PLUS should be a piece of cake since the syntax and
programming habits are similar.

I don't really like to type that much, hence my initial terse commentary--but
I can certainly oblige someone who is genuinely interested :-)

~~~
T_S_
R and S-PLUS are very similar. Curious: when would you prefer S-PLUS over R?

~~~
wildanimal
S-PLUS objects reside on the hard drive whereas R's are stored entirely in
memory. So R is faster for smaller computations but runs up against limitations
when the data sets are large, though you can use the bigmemory package or
store your data in an external database (e.g., SQLite or PostgreSQL) and pull
off chunks as you need them. R also has a much more extensive library, but I
heard that S-PLUS (as of version 8) made their program compatible with R so
that R's libraries could be used in S-PLUS. Also, R has lexical scoping; I
think S-PLUS only has global and local scope, like MATLAB. I personally like
lexical scoping, so I can't think of cases when you'd find S-PLUS's scope
definition advantageous.
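
For anyone unsure what lexical scoping buys you in R, a minimal sketch. `make_counter()` is a made-up name for illustration. Each returned function remembers the `count` variable from the call that created it:

```r
# make_counter() is a hypothetical example function.
make_counter <- function() {
  count <- 0                 # lives in the environment of this call
  function() {
    count <<- count + 1      # <<- updates count in the enclosing environment
    count
  }
}

a <- make_counter()
b <- make_counter()

first  <- a()   # 1
second <- a()   # 2
other  <- b()   # 1, because b has its own separate count
```

With only global and local scope, the inner function would have no private `count` to hold on to between calls.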

