

Revolution R Open: The Enhanced Distribution of Open Source R - orin_hanner
http://mran.revolutionanalytics.com/

======
chollida1
> includes The Intel Math Kernel Libraries, which bring multi-threaded
> computations to R.

Say no more, I'm sold!

Now that I can use multiple threads, please let me distribute my calculations
over multiple machines, and I can drop Python.

I used to be the compiler developer for a proprietary language that was
vectorized like Matlab and R. If R had been where it is today back in 2000, we
probably wouldn't have created a new language (Quicscript) for quants to use.

I can't recommend R enough to anyone doing modelling. It's almost replaced
Excel for me and, in finance, that's pretty high praise :)

~~~
nkurz
This doesn't negate what you are saying, but yesterday I happened to be
helping a friend optimize her already optimized R routine by writing a C++
extension. In this case, a straightforward translation gave us about a 10x
speedup, due to much less memory shuffling. Switching to intersecting 2KB
bitmaps with AVX2 intrinsics instead of non-vectorizable IntegerArrays
(average length about 20 ints) gave another ~10x speedup. Profiling with
'perf', staring at the generated assembly, and choosing very specific syntax
that g++ didn't mangle gave another ~2x (although Intel's icpc is still about
20% faster, and I haven't yet been able to trick g++ into generating
comparable code).
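
A rough sketch of the data-structure switch described above, assuming the
task is intersecting small sets of integer IDs drawn from a universe that fits
in a 2KB bitmap (16,384 bits). The actual code was C++ with AVX2 intrinsics;
this Python version only shows the shape of the two approaches, and every
name in it is hypothetical.

```python
# Hypothetical illustration; the real extension was C++ with AVX2 intrinsics.
WORD_BITS = 64
NUM_WORDS = 2048 * 8 // WORD_BITS  # 256 64-bit words = a 2 KB bitmap
MAX_ID = NUM_WORDS * WORD_BITS     # IDs must be in range(16384)


def intersect_sorted(a, b):
    """Merge-style intersection of two sorted int lists.

    The data-dependent branches make this loop hard for a compiler to vectorize.
    """
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def to_bitmap(ids):
    """Pack integer IDs into a fixed-size list of 64-bit words."""
    words = [0] * NUM_WORDS
    for x in ids:
        if not 0 <= x < MAX_ID:
            raise ValueError(f"id {x} does not fit in the bitmap")
        words[x // WORD_BITS] |= 1 << (x % WORD_BITS)
    return words


def intersect_bitmaps(a_words, b_words):
    """Word-by-word AND over a fixed length, with no data-dependent branches.

    In C++ the AND step maps onto 256-bit AVX2 instructions (four words each).
    """
    return [x & y for x, y in zip(a_words, b_words)]


def intersect_count(a_words, b_words):
    """Size of the intersection: AND each word pair, then count the set bits."""
    return sum(bin(x & y).count("1") for x, y in zip(a_words, b_words))


def bitmap_to_ids(words):
    """Expand a bitmap back into a sorted list of IDs."""
    return [w * WORD_BITS + bit
            for w, word in enumerate(words)
            for bit in range(WORD_BITS)
            if word >> bit & 1]
```

With ~20 IDs per set, the merge does a few dozen unpredictable comparisons,
while the bitmap version does 256 predictable word operations that SIMD can
batch. Which one wins depends on set density and hardware, so treat this as
the idea, not a benchmark.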

Was this worth it? In this case, I think so. It took me a day and a half, and
it brings her per-model runtime down from about 3 minutes to 1.5 seconds. The
project requires running about 30,000 models, so the improvement will reduce
her total CPU time from about 2 months to about 12 hours. The AVX2 requirement
means she can no longer run it on the usual cluster, but the speedup is enough
that she can do multiple runs per day on a standard Haswell desktop.
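
The totals above follow from the per-model times; a quick check using only
the figures given (30,000 models, 3 minutes vs. 1.5 seconds each):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def total_cpu_seconds(n_models, seconds_per_model):
    """Total CPU time if every model costs the same."""
    return n_models * seconds_per_model


before = total_cpu_seconds(30_000, 3 * 60)  # 5,400,000 s
after = total_cpu_seconds(30_000, 1.5)      # 45,000 s

print(before / SECONDS_PER_DAY)  # 62.5 (days, i.e. about 2 months)
print(after / 3600)              # 12.5 (hours)
print(before / after)            # 120.0 (overall speedup)
```

Undoing only the final ~2x step (about 3 seconds per model) gives
30,000 × 3 / 3600 = 25 hours, so that last step is worth roughly 12 hours of
total compute.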

This is to say, while there are great things about prototyping with R, and
multithreading can't hurt, raw speed and efficiency may still be issues for
some users.

~~~
random_number
Cool result. But:

Re "Was this worth it?": looking at the last optimisation step, it looks like
it would be the hardest to do correctly without bugs, and the hardest to
maintain, document and so on, but it produced a gain of only 12 hours of
compute time (so cheap it's almost free).

On an 8-5 schedule, 12 hours is 24. Was it really worth it, or just fun?

~~~
nkurz
> Was it really worth it, or just fun?

That's the question for just about everything, isn't it? This was unpaid work
to help with her PhD thesis. Maybe that makes it "not worth it" from the start?
Does the lack of financial incentive make "for fun" the only applicable
answer?

The project models the correlation between air quality in school classrooms
and student absence rates. Maybe twice as fast means that she can do twice as
many runs in the same amount of time, and produce results that are accurate to
one decimal place more? She'd probably graduate in either case, and the
chances that someone will act differently based on the difference in the
results is very low. For that matter, the chance that someone will act at all
based on results published in a PhD thesis is very low.

Had someone been paying me by the hour to optimize for them, perhaps that
would make it worth it to me but not to them? Is it just for fun if you are
learning details and gaining experience that you can apply elsewhere? At what
point does further learning become superfluous? I've never figured these out.
As it is, I learn more and more about optimizing code, my social capital among
friends increases, and my financial state grows worse and worse.

------
twelfthnight
Not sure how I feel about this. It reminds me of Enthought or Anaconda for
Python, or RStudio, or Red Hat for Linux. Should companies be able to 'own'
open source software by funneling a majority of users through their
distributions and by funding developers to build the language in a way
advantageous to the company? In some ways, sure, since it can be mutually
beneficial to the company and the users. But in another way, it goes against
the spirit of a community, rather than an oligarchy, contributing to the
software.

EDIT: On second thought, it seems naive to think that any large scale open
source software doesn't have some corporate backing. For instance, Google with
Python and Go. I guess that's not so bad.

~~~
snoman
RStudio has an open source AGPL v3 version that, ostensibly, differs from
their commercial offering only in that the commercial version comes with
support. I'm perfectly fine with that.

This MRAN looks to be a similar situation (that is, it's available with their
OSS offering, Revolution R Open).

------
Stubb
R has high-quality libraries for just about everything. You're good to go if
your need mostly involves stringing together calls to them.

The main problem for me was expecting to find things like namespaces, utility
classes, and other things you'd take for granted in more modern languages like
Ruby or Java. I loved R for whipping out quick analysis utilities, but writing
larger ones became insanely painful. I switched to Ruby for this kind of
stuff. The library support isn't as good, but it's reached critical mass and
the actual coding side is so much easier.

~~~
grayclhn
There are namespaces, but you have to put your code in a package to use them.

------
bipin-nag
I checked their site. Revolution R has 4 products: Open, Plus, Enterprise, and
Cloud.

1\. Open: "This one’s not a difference at all: Revolution R Open 8.0 beta is
based on R 3.1.1. No modifications are made to core R".

Simply put, it is a repackaging: it comes with extra packages like the
Reproducible R Toolkit and has a mirror of CRAN.

2\. Revolution R Plus is to R what RHEL is to Linux: they provide technical
support on top of the Open distribution.

3\. This is where it smells fishy. "Revolution R Enterprise Workstation is
licensed for a single named user, and available in two editions:". But is it a
modified version of R? They say there are no changes to core R for Open, but
they don't say that for this one. If they use R, which is licensed under the
GPL, how can they sell it? And if it is proprietary, why call it "R"?

4\. They provide assistance in running Revolution R Enterprise on a Server.

~~~
Bootvis
I believe chollida's comment, posted 10 hours before yours, answers the
question of how they can sell it:

It's linked with Intel's commercial BLAS, see here:

[http://mran.revolutionanalytics.com/documents/rro/open/#inte...](http://mran.revolutionanalytics.com/documents/rro/open/#intelmkl1)

~~~
bipin-nag
2 issues here:

1\. You can't package GPL stuff into another and then sell the new product.

2\. If it requires Intel's commercial BLAS and they are giving it away for
free, that would be a great loss to Intel. So whatever they are giving away
must be available for free anyway. Otherwise it makes no sense.

Edit: The Open version uses the non-commercially licensed MKL, which you can
get anyway, see
[https://registrationcenter.intel.com/RegCenter/NComForm.aspx...](https://registrationcenter.intel.com/RegCenter/NComForm.aspx?ProductID=1461&pass=yes).
And most likely they are using the commercial version for Enterprise. But
again, can you compile R like that and charge for it?

------
JasonCEC
It's interesting to me that Revolution R Open supports R Shiny (developed by
RStudio). A natively multi-core version of R could be really useful for the
larger applications built on Shiny, as computations, input, graphing, and DB
I/O are all (often) thread-blocking.

------
grayclhn
Can someone please provide some context to the submission? Is this something
new?

edit: politeness.

~~~
_deh
It looks like they released it yesterday

~~~
grayclhn
Thanks!

------
wodzu
I wish there were a Maven for R. I hope the Reproducible R Toolkit (RRT) can
achieve a similar effect. Currently CRAN is heavily broken: installing exactly
the same version of the package will work today but might not work tomorrow.
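
One common way reproducibility tools attack this is to pin installs to a dated
snapshot of the repository, so "install X" means "install X as it was on date
D". The sketch below shows only that resolution rule; it is not RRT's code,
and the package name, versions and dates are made up.

```python
from datetime import date

# Hypothetical release history: (version, release date). Not real CRAN data.
RELEASES = {
    "examplepkg": [
        ("1.0.0", date(2014, 1, 10)),
        ("1.1.0", date(2014, 6, 2)),
        ("2.0.0", date(2014, 10, 20)),
    ],
}


def version_at(package, snapshot):
    """Return the newest version released on or before the snapshot date."""
    candidates = [(released, version)
                  for version, released in RELEASES[package]
                  if released <= snapshot]
    if not candidates:
        raise LookupError(f"{package} had no release by {snapshot}")
    # Tuples compare by date first, so max() picks the latest eligible release.
    return max(candidates)[1]


print(version_at("examplepkg", date(2014, 9, 1)))   # 1.1.0
print(version_at("examplepkg", date(2014, 12, 1)))  # 2.0.0
```

Because the snapshot date, not the current state of the repository, decides
the version, the same install request resolves the same way tomorrow.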

------
minimaxir
What functions benefit from the multithreading? The examples they give are
matrix PCA and SVD, which is cool, but would this help normal library
functions (e.g. lm() and glm())?

~~~
etrain
Various linear solvers (whether via normal equations, QR, etc.) all have
really fast multithreaded implementations in, e.g., OpenBLAS. These could
directly benefit lm() and glm(). That said, there's no reason why you couldn't
already call out to these (multithreaded) libraries from (single-threaded) R.
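
A toy sketch of why the normal-equations route parallelizes well: X^T X and
X^T y are sums over rows, so blocks of rows can be processed independently
and added up before solving (X^T X) b = X^T y. This is pure Python to show the
decomposition only (Python threads would not speed this loop up), and it does
not show what R's lm() actually calls. Normal equations are also less
numerically stable than QR for ill-conditioned data.

```python
def gram_and_xty(rows, ys):
    """Accumulate X^T X and X^T y for one block of rows."""
    p = len(rows[0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for row, y in zip(rows, ys):
        for i in range(p):
            xty[i] += row[i] * y
            for j in range(p):
                xtx[i][j] += row[i] * row[j]
    return xtx, xty


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small p x p system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def ols_normal_equations(X, y, n_blocks=4):
    """Least squares via per-block Gram matrices summed, then one small solve.

    Each block is independent work, which is what a multithreaded BLAS can
    spread across cores for the X^T X product.
    """
    size = max(1, -(-len(X) // n_blocks))  # ceiling division
    p = len(X[0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for start in range(0, len(X), size):
        bx, by = gram_and_xty(X[start:start + size], y[start:start + size])
        for i in range(p):
            xty[i] += by[i]
            for j in range(p):
                xtx[i][j] += bx[i][j]
    return solve(xtx, xty)
```

For example, with an intercept column and y = 1 + 2x for x = 0..9,
`ols_normal_equations([[1.0, float(x)] for x in range(10)], [1.0 + 2.0 * x for x in range(10)])`
recovers coefficients close to [1.0, 2.0] regardless of `n_blocks`.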

------
hamiltont
Is this anything more than a repackaging of R? If you can put multithreaded
libs behind core R functions, why not do it in the main codebase instead of
repackaging the project?

------
plg
IPython / SciPy

R

MATLAB

Julia

C

It's nice to have so many options today, it means when you bang up against the
shortcomings of one language/system you can often jump to another one to get
what you want (e.g. speed, or specific libraries, or graphics niceties, etc).

I only wish MATLAB wasn't so expensive and so proprietary.

~~~
wolfgke
> I only wish MATLAB wasn't so expensive and so proprietary.

Try Scilab or GNU Octave.

~~~
GFK_of_xmaspast
I have, and I wish matlab wasn't so expensive.

------
mahmoudimus
Can someone explain why R is preferred over Python for modeling?

Is it just because of ggplot2?

~~~
msherry
My 2c is that R has a wealth of libraries and utilities (for statistics,
machine learning, finance, etc.) that are not as easily available for other
languages. Things like scikit-learn for Python close some of the gap, but
looking at packages on
[http://cran.r-project.org/](http://cran.r-project.org/), it seems that Python
might have some catching up to do.

I make no representation as to the _quality_ of the libs available for R, as I
don't have a whole lot of experience with it yet.

~~~
epistasis
Quality is all over the map in R, but given the choice between an R library
and a Python library for the same recently developed stats method, I would
think that the R library has been more widely used, had more attention from
experts, and is probably more robust.

------
voisin
How does this compare in terms of speed to SAS?

