

How the R-project is taking over statistical analysis software - jsavimbi
https://sites.google.com/site/r4statistics/popularity

======
6ren
I think open-source eventually replaces commercial products, in the same way
that proprietary products become commoditized. The response for commercial
products is also the same: continual differentiation, adding new features,
benefits, support, documentation etc. Exceptions are also the same: natural
monopolies (e.g. strong network effects).

Open-source is great at hill-climbing, where there are clear directions for
improvement and especially for features that are obviously needed by users
(provided the structure of the project is sufficiently modular to facilitate
it), by tapping the collective intelligence of users.

It's not great at "hill-hopping": originating radically different products.

~~~
cageface
Counter-examples abound. Can you name even one open source app that has
displaced a mature, user-facing desktop app with a non-trivial UI, other than
a web browser?

Open source only seems to win in domains in which it makes sense for companies
to share work in order to compete at a higher tier of functionality.

~~~
jeffmk
It has by no means "displaced" its proprietary equivalent, but Inkscape is one
of the most user-friendly open source apps I've ever used. I find it far more
intuitive than Illustrator. An incredible amount of power and complexity is
presented in a way that makes it quite intuitive and a joy to use. It's also
easily extensible if you're a programmer.

------
dj_axl
Anecdotally, NumPy (Python) has some traction. Similarly they don't consider
SQL libraries. And I'm sure there are statistical analysis libraries for Java.
According to the bar chart below R is mentioned by 45%, SQL by 32%, Python by
25%, Java by 24%. This seems a more reasonable comparison to me than the
graphs earlier (higher up) in the post.

[https://sites.google.com/site/r4statistics/_/rsrc/1318535062...](https://sites.google.com/site/r4statistics/_/rsrc/1318535062528/popularity/Fig_6_KDnuggetsPollLanguages.PNG)

~~~
UrbanPat
What do you mean by "SQL Libraries"? Do these interface with SQL to perform
analysis?

------
dewarrn1
I use R as my primary data-analysis tool for almost all of my work, with
occasional recourse to SAS for certain specialized models (e.g., PROC GLIMMIX
for generalized mixed models).

My only complaint is the awful default IDE, which can be mitigated to a large
extent by scripting elsewhere and source()ing the script, and some odd edge
behaviors including the mystifying row names of dataframes, the difficulty of
dropping unused factor levels from aggregated or sliced data (another
dataframe issue), and the perhaps unnecessary obscurity of some of the
plotting functions (although holding R responsible for the lattice library is
unfair).
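
For readers who haven't hit the factor-level issue, here is a minimal sketch of why it happens. It is written in plain Python rather than R, and the `Factor` class and its method names are hypothetical, a toy model and not R's actual implementation. The idea it illustrates: a factor stores integer codes plus a separate table of levels, so slicing the data leaves the level table untouched until you rebuild it (in R, by calling `droplevels()` or re-applying `factor()`).

```python
class Factor:
    """Toy model of an R-style factor: integer codes plus a table of levels."""

    def __init__(self, values, levels=None):
        # Levels default to the sorted distinct values.
        self.levels = list(levels) if levels is not None else sorted(set(values))
        index = {level: i for i, level in enumerate(self.levels)}
        self.codes = [index[v] for v in values]

    def values(self):
        """Decode the stored codes back into their level labels."""
        return [self.levels[c] for c in self.codes]

    def subset(self, keep):
        """Keep rows where keep[i] is true; the level table is carried over as-is."""
        kept = [v for v, k in zip(self.values(), keep) if k]
        return Factor(kept, levels=self.levels)

    def drop_unused(self):
        """Rebuild the level table from the values that are actually present."""
        present = set(self.values())
        remaining = [lv for lv in self.levels if lv in present]
        return Factor(self.values(), levels=remaining)

    def counts(self):
        """Tabulate rows per level, including levels with zero rows."""
        result = {level: 0 for level in self.levels}
        for v in self.values():
            result[v] += 1
        return result


species = Factor(["a", "b", "c", "a"])
sliced = species.subset([v != "c" for v in species.values()])

print(sliced.counts())                # {'a': 2, 'b': 1, 'c': 0}
print(sliced.drop_unused().counts())  # {'a': 2, 'b': 1}
```

The empty `'c'` entry in the first tabulation is the "unused level" that keeps showing up in summaries and plots after slicing.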

All that said, for a free tool, it's extraordinary, and the authors of the
base language and the many packages that I use have my gratitude.

~~~
roxtar
Default IDE? Do you mean the R interpreter REPL? If you are looking for a nice
IDE for R, I would suggest RStudio: <http://rstudio.org/>

~~~
dfc
Why do all the R GUIs depend on Qt? Is Qt big in science applications in
general?

------
jcdreads
Probably worth noting about the author:

> Robert A. Muenchen is the author of R for SAS and SPSS Users and, with
> Joseph M. Hilbe, R for Stata Users. He is also the creator of r4stats.com, a
> popular web site devoted to helping people learn R. Bob is a consulting
> statistician with 30 years of experience

Disclaimer: I hate R's syntax, but my company's analytics group uses R for
just about everything.

~~~
zzleeper
I started learning SAS (I mostly use stata/matlab/python for my daily needs)
but also ended up abhorring some parts of the syntax..

------
migiale
Unfortunately, it's almost impossible to work with very large datasets in R
because of the speed limitations. Many researchers I know use Matlab because
of this.

~~~
hvs
What about Octave? Other than my use in the Stanford Machine Learning class,
I've never really used either, so I don't have any basis for comparison.

~~~
migiale
Octave is a Matlab clone; in fact, Octave developers openly say that, except
for some special cases, any difference between Octave and Matlab is a bug.

The biggest difference between Matlab and Octave is the JIT compiler in
Matlab, which does an incredibly good job of vectorizing simple (or sometimes
even not-so-simple) loops.

I think it's fair to say that Octave's performance is very close to that of
pre-JIT Matlab.

There's also a huge difference in toolboxes, profiling, sparse matrix
operations, parallel computing, and much more. In these areas I'm afraid
Octave is light-years behind Matlab.

However, you can still do a lot of useful simple stuff with Octave, and it's
free! Matlab-like syntax is really, really cool when it comes to vectorized
operations. These two reasons probably determined Andrew Ng's choice of
Octave as the main environment for ml-class. Huge win for Octave, I guess.
This might spur some interest in its development and attract new people to
the product. I think it's a well-deserved success for John W. Eaton and the
other people who have developed Octave all these years.

~~~
mturmon
I agree with your take on Octave performance relative to Matlab. The Matlab
parallel toolbox is getting more and more useful in a multicore world.

As you note, the Matlab profiler is very nice. You can zero in on the 80% of
the 80/20 tradeoff very fast, during your usual development cycle. It's as
simple as:

    >> profile on
    >> do_something
    >> profile report

and you get a nice graphical/textual report on time usage in everything
do_something called.
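
For anyone without a Matlab license, a roughly analogous workflow can be sketched with Python's standard-library `cProfile` and `pstats` modules. This is an illustration of the same on/run/report cycle, not a claim that the two profilers are equivalent; `do_something`, `slow_sum`, and `profile_report` are hypothetical names made up for the example.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive loop so it shows up in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def do_something():
    """Hypothetical stand-in for the work being profiled."""
    return [slow_sum(20_000) for _ in range(20)]


def profile_report(func, top=5):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()    # roughly "profile on"
    result = func()
    profiler.disable()

    # Roughly "profile report": print the top entries by cumulative time.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_report(do_something)
    print(report)
```

The report is text only, sorted so the functions that account for the most cumulative time come first, which is usually enough to spot the hot spot.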

------
aditya
Can someone point me to a good introduction/resources to R? Especially for web
stuff?

~~~
csmt
R in a Nutshell is a pretty good book:
<http://shop.oreilly.com/product/9780596801717.do>

~~~
linhir
I do a good bit of R programming, and R in a Nutshell has been the best quick
reference guide I have found.

------
lemming
Moderately related - has anyone who previously used R in a serious way
switched over to Incanter? Is Incanter comparably powerful?

------
traveldotto1
love R.. but have to say because it's open source, you do have to watch for
the quality of libraries

~~~
burgerbrain
Surely that would still be the case under any license.

~~~
pyoung
He is probably comparing R to SAS (the two most popular statistical
programming languages). SAS doesn't really have libraries; instead, you buy
additional packages from SAS, which are very reliable and well supported, but
expensive.

My company shuns R (although I personally like it), primarily because of this
issue. If we need to run a rare or uncommon statistical procedure, it is a lot
easier to trust the SAS procedure, rather than an open source R package
written by some grad student.

~~~
TalGalili
True, though if you need to run a rare or uncommon stat procedure, SAS is not
likely to have it in the core, and then you are back to using what "some grad
student wrote".

------
dfc
If R did a little more hand holding it would be awesome.

------
georgieporgie
Looks interesting, but that page renders only on the right half of my Android
screen, and can't be zoomed or reflowed.

------
ginzasparrow
DIE SAS DIE

~~~
eftpotrm
Having worked on and off with SAS in recent years I'm aware it has its
limitations, but round here we like constructive contributions please. Would
you like to expand upon your remarks?

