

The future of R - pessimistic thoughts by R founder Ross Ihaka - TalGalili
http://www.r-bloggers.com/%E2%80%9Csimply-start-over-and-build-something-better%E2%80%9D/
The recent post on the shortcomings of R has attracted a huge number of readers and Ross Ihaka has now posted a detailed comment that is fairly pessimistic… Given the directions drafted in this comment from the father of R (along with Robert Gentleman), I once again re-post this comment as a main entry to advertise more broadly its contents. (Obviously, the whole debate is now far beyond my reach!)
======
adbge
_> The license will need to do a better job of protecting work donated to the
commons than GPL2 seems to have done. I’m not willing to have any more of my
work purloined by the likes of Revolution Analytics, so I’ll be looking for
better protection from the license (and being a lot more careful about who I
work with)._

Not having used R or being more than passingly familiar with it, I'm wondering
if anyone could shed some light on what this is about? I notice that on the
Revolution Analytics website, they are selling an "enterprise" version of R
which they claim has a number of advantages over the mainstream form of R.

How exactly is a proprietary form of R even able to exist if R's codebase is
GPL'd?

~~~
whyenot
Revolution Analytics has released some extensions that are proprietary, for
instance their doSMP library that provides multicore support under Windows.
They have also released several of their libraries into the "commons" -- doMC
and foreach are the two that immediately come to mind.

I'm sort of torn on this issue because setting aside the terms under which RA
releases their libraries, what they produce comes with great documentation,
and tends to be pretty useful. Using foreach and doMC I was able to cut down a
calculation that normally takes 8 hours to 3 hours on a 6-core machine.
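
As a rough back-of-the-envelope check on those numbers (a sketch only, assuming the run follows Amdahl's law, which nothing above claims), going from 8 hours to 3 hours on 6 cores is a speedup of about 2.7x, which would imply that roughly three quarters of the run time was parallelizable. The function name `parallel_fraction` is made up for illustration.

```python
def parallel_fraction(serial_hours, parallel_hours, cores):
    """Solve Amdahl's law, S = 1 / ((1 - p) + p / cores), for p."""
    speedup = serial_hours / parallel_hours
    return (1 - 1 / speedup) / (1 - 1 / cores)


speedup = 8 / 3
p = parallel_fraction(8, 3, 6)
print(f"speedup: {speedup:.2f}x, implied parallel fraction: {p:.2f}")
```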

On the other hand, I strongly believe that proprietary code should be avoided
when doing scientific research because it inhibits peer review and makes it
harder for others to replicate your work. As Warren DeLano said: _"The only
way to publish software in a scientifically robust manner is to share source
code, and that means publishing via the internet in an open-access/open-source
fashion."_

~~~
bbgm
And Robert Gentleman has always been one of the strongest proponents of
reproducible research, which includes being able to create packages that can
be shared and freely distributed.

------
haberman
To me the problem with R isn't performance problems, which I've never run into
myself, but rather the complicated and confusing semantics of its data types.

R's aggregate data types are: vector, matrix, array, dataframe, and list. The
semantics of these types and the relationships between them are extremely
confusing. I wish I had gathered examples of this so I could be more specific,
but I have basically come to the conclusion that I will never get familiar
enough with them to do any better than random guessing until it works right.
And I've written somewhat in-depth analyses in R.

~~~
nanairo
I may be wrong but I think that:

list => basically a hash, or an array that can have mixed objects inside

vector, matrix, array => are all the same thing. They are what in most
computer languages are called arrays, and can have only one type. The
difference between those three is just the number of dimensions (vector:1,
matrix:2, array:3+).

dataframe I will concede is a little more complex, and I still have some
problems with it. But I basically think of it as a table, where a row
represents a value (say temperature) and the columns different measurements.
So, for example:

rows => temperature, humidity, hours of light, peak UV
columns => day1, day2, day3, day4, ...

Hope that helps.

~~~
pashields
Lists are hashes on acid. The default return value of indexing into one is NULL.
That's a weird semantic for a list, but maybe not so much for a hash. The weird
part is that if you explicitly assign NULL to an element of the list, it
deletes the element. That's particularly weird because the list knows all
the elements in it, so it's not like it can't tell the difference between a
value that has been set to NULL and a value that has never been set. See
<http://gist.github.com/578110> for a little transcript.
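
For contrast, here is how the same distinction plays out in a Python dict. This is only an analogy in a different language, not a transcript of R: in R, `l$a <- NULL` removes `a`, and storing an actual NULL takes something like `l["a"] <- list(NULL)`. A dict keeps "set to None" and "never set" apart, and deletion is its own operation. The `settings` name is made up for illustration.

```python
settings = {"a": 1, "b": 2}

# Assigning None keeps the key; the dict still has two entries.
settings["a"] = None
print(len(settings), "a" in settings)       # 2 True

# A key set to None is distinguishable from one that was never set.
print(settings.get("a"), "a" in settings)   # None True
print(settings.get("c"), "c" in settings)   # None False

# Removing an entry is a separate, explicit operation.
del settings["a"]
print(len(settings), "a" in settings)       # 1 False
```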

------
nanairo
To be honest I've been using R a bit lately for my work and while I like it I
don't find it at all innovative. That's not a criticism of R: the libraries it
has are amazing, as well as the mindshare among people who care about statistics.

But I wonder why R actually needs to exist as its own language. It seems it
could be recast in Ruby for example or one of the latest functional languages.

So I am kind of pleased by this news... if there's gonna be a need for R to
have its own language, speed seems to be the most important distinguishing
feature. A bit like Fortran is still used in science.

(incidentally... don't let people tell you otherwise: Fortran(90+) is a very
nice language... much more pleasurable than C to use and gives you better
performance (unless you know a lot about compilers and compiling flags... but
most scientists don't ^_^))

~~~
ewjordan
_But I wonder why R actually needs to exist as its own language. It seems it
could be recast in Ruby for example or one of the latest functional
languages._

Indeed - it would be a shame for them to start over from scratch and end up
coming up with a brand new language, brand new syntax, brand new quirks, brand
new performance problems, etc., while they could have simply searched around a
bit for something that's already mature and somewhat optimized as well as
suiting their needs.

If they wanted to add on to or modify an existing language (for instance, to
provide more concise syntax for some of the things that are more important in
statistics than in general purpose programming), that would be just fine, it
could become a dialect of some other language, but starting fresh seems like
an awful waste of energy...

Something that ran on the JVM would be awesome, they'd have no trouble at all
rebuilding the massive library of contributions.

~~~
Quiark
For JVM, there is Incanter which is a statistics library written in Clojure.
It is backed by Parallel Colt for the heavy number lifting. Note that I'm not
trying to say that Clojure would be a scientist-friendly language :)

~~~
nanairo
How does Incanter work on HPC? R is pretty awful from that point of view, and
if there ever will be room for a specialised statistical language it's got to
be able to do massive number crunching.

I've heard bad things about the JVM for tightly coupled jobs on HPC (though I
know there's been some improvement: e.g. a lot of work done by EPCC in
Edinburgh). Does Clojure manage to offer a good parallel implementation on top
of the JVM or has no work been done in this area?

~~~
Quiark
According to the website, Parallel Colt supports multicore machines. There's
no mention of MPI though. As far as I know, Incanter only wraps it, so it does
not influence the performance that much.

------
chrismealy
I would welcome a replacement for R just because the new language's name might
be easier to google. I hope they don't call it Q.

~~~
TalGalili
BTW, that was my motivation for starting the www.r-bloggers.com website. It
now has over 110 bloggers (who write about R) there. When I started and looked
for them on Google, all I could find were bloggers who wrote about pirates :D

~~~
SkyMarshal
Google thought R = Arrrgghh? Lol...

------
gwern
It almost sounds like they want to switch to Haskell.

~~~
ghotli
Or perhaps Incanter with all its lispy goodness.

"Incanter is a Clojure-based, R-like platform for statistical computing and
graphics."

<http://incanter.org/>

~~~
gwern
Wouldn't get you pervasive laziness, though I don't know whether R is lazy by
default or like Clojure in that you have to ask for it.

~~~
zaphar
If you use Clojure's primitives you pretty much end up with laziness by
default: for, map and most of the other list processing functions all return
lazy sequences. So while Clojure does technically require you to "turn on"
laziness, in most cases you'll find it's turned on already.
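
The same "lazy unless you force it" behaviour can be sketched with Python iterators. This is a rough analogy rather than Clojure itself: Python's `map` returns a lazy iterator, and nothing is computed until a consumer asks for values. The `noisy_square` helper is made up for illustration.

```python
from itertools import count, islice

calls = []


def noisy_square(n):
    """Square n and record that the call actually happened."""
    calls.append(n)
    return n * n


# map over an infinite counter: nothing is evaluated yet.
squares = map(noisy_square, count(1))
print(calls)                      # []

# Taking three values forces exactly three calls.
first_three = list(islice(squares, 3))
print(first_three)                # [1, 4, 9]
print(calls)                      # [1, 2, 3]
```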

------
pajarito
A translator from R to Maxima would be another idea. Maxima is Lisp-based, uses
an Algol-like language, is used in education, and is free. But developers are
scarce, so more hands are needed. Maxima can use BLAS and other libraries for
numeric computation, and it supports many implementations of Lisp (sbcl, gcl,
ccl, ecl ...); some of its members are working on a Java-based Lisp (Armed
Bear?). With so many tools available, it is something to consider.

------
hogu
Python

has RPy2, so you can still access R's statistics libraries, as well as
<http://scikits.appspot.com/statsmodels> and <http://pandas.sourceforge.net/>
(dataframes)

plus, scipy has a decent stats library as well (random variables, etc)

still rough around the edges, but a good solution in my opinion

