I guess my question is aimed more at the angle of: how does R do the things it does so well? ggplot2 is great enough to learn R for it alone. And some of the data munging methods, such as `melt`, don't seem to have a well-supported port in all the other popular languages. I know that Python's pandas has one...Ruby does not. Is there something about R the language that makes it especially good at its data and statistical methods (in the way Matlab is geared toward matrix manipulation)? Or is it just that R was so heavily adopted by the stats community that, if they had picked another language, that language would have functionality just as great as R does?
Note: I suffer from selection bias, though...a lot of the people I chat with are data scientists, among whom R is so ubiquitous. It may be that Python's pandas is just as good as the R libraries, but I just know more R users than Python users.
But R has some things going for it. There are some algorithms and tools that exist in R but nowhere in Python (this set seems to both shrink and grow over time as both languages add more stuff). R's overly terse syntax for some things is annoying for maintainers of R code, but R hackers enjoy it because they tend to be all about banging out piles of stuff quickly.
R also comes with a lot of stuff included that in the Python world would fall under many different umbrellas (see the several names I mentioned at the beginning--those are just some of the basics). Whether it's true or not, R users perceive Python as being relatively balkanized, with that long list of packages just to get started, and with the Python 2 vs. 3 divide which has plagued it for years and will continue for a while still.
And why would you need to google the function c? I don't think there's ever been anything more I've wanted to know about it than is written on ?c.
But your second paragraph makes a good point. For any given big csv of numbers it's a whole lot faster and fewer LOC to clean, organise and plot in R than in Python, even with Python's ever-growing list of imports.
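For contrast, here is a minimal pandas sketch of the clean-organise-summarise workflow described above (the column names and data are made up for illustration):

```python
import io

import pandas as pd

# Hypothetical messy CSV: a grouping column and a numeric column with a gap
csv = io.StringIO("group,value\na,1.0\na,\nb,3.5\nb,2.5\n")

df = pd.read_csv(csv)
df = df.dropna(subset=["value"])               # drop rows missing a value
summary = df.groupby("group")["value"].mean()  # per-group means
print(summary)
```

Even in pandas this takes a couple of imports and explicit cleanup steps; the equivalent in base R (`read.csv` plus `aggregate`) needs no imports at all, which is the point being made.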
Ultimately the ARMA calc didn't do what they wanted, mostly because ARMA was the wrong thing to use on the dataset in the first place, IMNSHO. This could be down to my general lack of experience with R, but I've been programming for 15+ years and it was one of the rougher languages to work with.
Anyway I ported the code to Python, numpy, scipy, scikits (and most significantly the time series stuff), and it was much easier to pull in the data, apply smoothing filters, and do some general data cleanup work, but ARMA was nowhere to be seen, so I settled for simple linear and quadratic fits and think they did a better job of forecasting. I really liked some things that R did automatically, like adding confidence intervals on the forecasts when trending data. I was actually tempted to port the ARMA libraries to Python over this but didn't want to dedicate the time to debug and validate them. R was really good for interactive manipulation, but Python was better for actual deployment.
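The kind of simple fit-and-forecast replacement described above can be sketched with numpy's `polyfit`/`polyval` (the series here is synthetic, standing in for the real dataset):

```python
import numpy as np

# Synthetic series standing in for the real data (exactly quadratic here,
# so the degree-2 least-squares fit recovers the coefficients)
t = np.arange(10, dtype=float)
y = 0.5 * t**2 - 2.0 * t + 3.0

# Fit a quadratic, then forecast the next three time steps
coefs = np.polyfit(t, y, deg=2)
forecast_t = np.arange(10, 13, dtype=float)
forecast = np.polyval(coefs, forecast_t)
print(forecast)
```

Unlike R's time-series tooling, this gives point forecasts only; confidence intervals, which R added automatically, would have to be computed by hand from the fit residuals.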
But other people always seem to have big problems with things that never even occurred to me.
In this case, I've been using R to pull data out of SQLite, SQL Server and Oracle db's every single day, for years. And I've never had any problems at all. It wouldn't even occur to me to think that R's ability to get data out of a db was anything other than "just fine".
I think part of the issue is that the typical use cases of Python and R are a bit different, so a lot of functionality that in Python's case comes in well-debugged and well-documented standard libraries comes, in R's case, in relatively little-supported user packages.
Also, the standard package documentation system in R is absolutely atrocious; I am convinced that R would have been far better off without any package documentation standards at all.
I just skimmed the first few google results for "install python module windows" and none seemed particularly helpful. The page you point to says "The files are unofficial (meaning: informal, unrecognized, personal, unsupported) and made available for testing and evaluation purposes." Anaconda looks appealing, but wants my email address (and automatically checks the bother me box), ugh.
It's also worth bearing in mind that python has only become a reasonable competitor to R (for statistics) in the last couple of years. Without pandas and IPython, python is a much less compelling option, especially given that most R users are not programmers and just want to figure out what's going on in their data.
(And thanks for the kind words :)
R is a vector/array-based language, which fits the problems of its domain in a natural way. On the other hand, you wouldn't really want to use such a language for anything other than data munging.
The language has its flaws, but those are well described, e.g., in "The R Inferno" (http://www.burns-stat.com/documents/books/the-r-inferno/).
As the above suggests I don't hold R the language in much esteem. What it had was familiarity for people who had used S, and now, years and years of accumulated libraries. I don't believe there were any technical features in R that led to success. It was purely social: a free and open source language that was close enough to an already familiar tool.
I'm interested to hear dissenting opinions, but you'll need to back them up with specifics.
First some preamble. One of the fundamental design features of Scheme is lexical scoping, at the time a relatively unusual feature. Lexical scoping greatly simplifies program comprehension and compilation. In a lexically scoped language, a binding -- that is, an association between a name and a value -- is only visible in the scope in which it is defined, and any scopes within that scope. This means the textual source of the language determines which bindings are visible -- simple and no surprises.
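The scoping point above can be illustrated in any lexically scoped language; a small Python sketch:

```python
def make_counter():
    # 'count' is bound in make_counter's scope: visible here and in the
    # nested function below, but nowhere outside this text.
    count = 0

    def increment():
        nonlocal count
        count += 1
        return count

    return increment

counter = make_counter()
print(counter())  # 1
print(counter())  # 2
# 'count' itself is not visible out here: which bindings are in scope is
# determined by the program text alone, with no surprises at runtime.
```

Each call to `make_counter` creates a fresh binding, and readers (and compilers) can determine where `count` is visible just by looking at the source.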
The save function in R doesn't save a value, it saves a binding. When you call load you actually add a binding into your scope (local or global scope? I'm not sure and the docs don't say.) This is absolute madness. It means you need to know the name that was bound to the value when it was saved, coupling the code that uses the saved value to the code that produces it. Imagine programmer A writes the code that calls save and programmer B loads values. They agree on a name, but then programmer A changes that name ... and breaks programmer B's code!
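For contrast, most languages serialize values rather than bindings. In Python's pickle (used here purely as a point of comparison, with made-up names), the producer's variable name is a private detail and the loader binds the value to whatever name it likes, so renaming on one side breaks nothing:

```python
import io
import pickle

# Producer: saves a *value*; the name 'model_weights' never leaves this code
model_weights = [0.1, 0.2, 0.3]
buf = io.BytesIO()
pickle.dump(model_weights, buf)

# Consumer: chooses its own name for the loaded value
buf.seek(0)
weights = pickle.load(buf)
print(weights)
```

R's `save`/`load`, by contrast, transmit the name along with the value, which is exactly the coupling complained about above.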
Now you might argue this is a standard library issue, not a language issue, but I argue the two are so tightly coupled you can't consider one in isolation.
Language and standard library are tightly coupled, but problems with the standard library are _much_ easier to fix than problems with the language.
I'm asking because S-PLUS (subsequently mostly replaced by R) was one of the first statistical languages I learned, and I have learned and actively used many other languages since, but I never before (or after) had this feeling that "this is the most convenient language in existence, and it does everything precisely the way I want and expect".
I don't know Ruby (the snippets I've seen do look very nice, but it's mostly used in a very different domain), but Python does not come anywhere close (e.g. compare the treatment of default parameter values!), nor does Matlab (one function per file? wtf?), or C++, or Gauss, or really anything reasonably high level that I can think of. SAS and Stata might have a slight edge over R in very specific use cases, but outside of those, there is absolutely no comparison. Julia has a lot of potential, but imo it's not quite there yet. Also, R is ridiculously easy to incorporate C/C++/Fortran/etc. code into, and S3 is a really wonderful OOP/abstraction system.
It's definitely not perfect, and today R does not quite evoke the same sentiment as S+ did back ~15 years ago, as R did away with some of my favorite S features and introduced a lot of complexity (although, it's also possible that my typical use cases became more complex, so I have to deal with internals more often). Also there are features like the apply() family of functions that I've seen done better, and some R features apparently make it hard to optimize code. But there is very little in the language that I could honestly say I seriously dislike.
I'll take a risk and ask: what are these strengths you're talking about that JS has?
For prototypal inheritance in R you can try the proto package .
No one expects the exception.