

R for the Enterprise - stunr69
http://www.oracle.com/technetwork/topics/bigdata/r-offerings-1566363.html

======
mwexler
This happened back in March.
[https://blogs.oracle.com/R/entry/r_execution_in_oracle_datab...](https://blogs.oracle.com/R/entry/r_execution_in_oracle_database).
It's a desperate attempt to co-opt the Red Hat approach to riches via open
source support. Revolution R has been doing pretty well in this game, so
Oracle is replicating. The only contribution back to the community so far has
been to do a few mild updates to ROracle, the R to Oracle connector.

Oracle has basically wrapped up R and embedded it in the database. One
nice touch is that they have made overloaded versions of most of R's base and
stat functions, and let them handle the "ore" dataframes; these overloaded
versions are basically wrappers on the "big-data" functions inside the DB.
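
To illustrate the general idea of an overloaded function that hands the work to the database, here is a minimal sketch in Python rather than R. It is not Oracle's API: `DbColumn`, the `sales` table and the `mean` function are all hypothetical names, and SQLite stands in for the real database.

```python
import sqlite3
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class DbColumn:
    """Hypothetical proxy for one column of a database table (holds no rows)."""
    conn: sqlite3.Connection
    table: str
    column: str


@singledispatch
def mean(values):
    """Default version: compute the mean of an in-memory sequence."""
    values = list(values)
    return sum(values) / len(values)


@mean.register
def _(values: DbColumn):
    """Overloaded version: the database computes the mean; only the result comes back."""
    sql = f'SELECT AVG("{values.column}") FROM "{values.table}"'
    return values.conn.execute(sql).fetchone()[0]


# Assumed toy data, in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?)", [(10.0,), (20.0,), (30.0,)])

local_mean = mean([10.0, 20.0, 30.0])                   # computed in Python
remote_mean = mean(DbColumn(conn, "sales", "amount"))   # computed by SQLite
```

The caller writes `mean(...)` either way; the type of the argument decides whether the rows ever leave the database.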

If this is really interesting to you, here is a set of PDFs describing
Oracle's approach:
[http://www.oracle.com/technetwork/database/options/advanced-...](http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise/index.html)

Bad news is: you are limited to Oracle approaches to scale and growth; while
there's some to love, there's also lots to hate.

I worry that some of this commercialization of R is going to cause trouble, as
we start having non-compatible forks of R creating non-replicable analyses. If
I build a great model that can only be replicated via some vendor's
proprietary approach, then it may be great for my business, but it doesn't
move the field forward.

~~~
batista
> _It's a desperate attempt to co-opt the Red Hat approach to riches via open
> source support._

Yes, because Red Hat is doing so much better than Oracle with its OSS
approach...

And it's not like other major companies support OSS with their products, like
say IBM...

------
tom_b
Are data hackers gravitating to R? Given that Oracle and (surprisingly to me)
SAS now both support R in their offerings, it seems that at least the
enterprise will be taking up more R for analytics.

~~~
jordanb
I've been using R because I have an amateur's interest in statistics.

As a programmer, it seems like an awkward language, although no more awkward
than SAS, SPSS, etc. And as I do more analysis the language makes more sense.
It's a special-purpose language made to do a specific task.

The general workflow for doing data analysis is 1) import the data 2) clean it
and format it properly as input for a pre-built package that does the actual
analysis 3) feed it to the package and 4) interpret the results.
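
The four steps above can be sketched roughly as follows. This is a Python illustration using only the standard library, not R code; the inline CSV data and column names are made up for the example, and the `statistics` module stands in for the "pre-built package".

```python
import csv
import io
import statistics

# Assumed sample data; a real analysis would read a file instead.
RAW = """height_cm,weight_kg
170,65
180,
165,58
n/a,70
175,72
"""


def to_float(text):
    """Return text as a float, or None if it is missing or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


# 1) Import the data.
rows = list(csv.DictReader(io.StringIO(RAW)))

# 2) Clean it: keep only rows where both fields are numeric.
parsed = [(to_float(r["height_cm"]), to_float(r["weight_kg"])) for r in rows]
clean = [(h, w) for h, w in parsed if h is not None and w is not None]

# 3) Feed it to the pre-built package.
weights = [w for _, w in clean]
mean_weight = statistics.mean(weights)
sd_weight = statistics.stdev(weights)

# 4) Interpret the results.
print(f"{len(clean)} usable rows, mean weight {mean_weight:.1f} kg, sd {sd_weight:.1f} kg")
```

Most of the lines go to importing and cleaning; the analysis itself is two library calls.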

To that end, R programs are typically short and pretty declarative. R packages
contain C or FORTRAN extensions that do all the heavy lifting. Any substantial
amount of imperative R code is going to be slow. For instance, looping over a
vector is always worse than applying a vector transformation, and R provides a
rich set of transformations for all its data types.

R has gotten popular because the proprietary guys dropped the ball at the
universities. I recall reading a posting by one researcher who said he
switched to R because his students could only use SAS at the school's stats
lab, whereas they could run R on their computers at home.

Once researchers switched to R, they started publishing their work with code
meant to be run with R. The cutting edge is important in stats, so people want
a short lead time between when a new test or model is published and when it's
available. SAS's "cathedral" can't really keep up. Combine that with SAS's
licensing costs (both arms and a kidney too) as well as its overall
"mainframey" feel, and you can see why R is winning.

EDIT: Another big win for R that I forgot to mention is its support for
visualizations. A step that should perhaps come after importing the data above
is investigating it with various diagnostic charts (scatter charts, box
charts, etc.); these are all just function calls in R. In addition, R has a
powerful graphics engine and there are a huge number of packages available to
create more sophisticated visualizations:
<http://addictedtor.free.fr/graphiques/>

~~~
pbh
This is a great description of R usage (import, clean, fit models), but I
think a slightly erroneous explanation of R history.

John Chambers created "S" at Bell Labs. S was a programming language designed
for interactive statistical analysis. Much like gcc and icc are
implementations of C compilers, R and S-PLUS are implementations of S. S-PLUS
was/is the primary proprietary implementation of the S language, whereas R is
the primary free one (also, sometimes called GNU S). (SAS and SPSS are
completely different languages/systems as far as I know.) I think that
statisticians at some point made a conscious effort to publish their work in
R, rather than S-PLUS (or any other statistical system like SAS) because it
was more widely available. That in turn led R to be a viable competitor to
S-PLUS (and other systems) because it had vast amounts of recent statistical
libraries, often implemented by the people who developed the techniques. That
said, SAS and SPSS seem to pretty much still have social science students
locked up --- the market for R is probably statisticians who are also
excellent functional programmers.

This history is in really marked contrast to MATLAB and its corresponding free
version Octave, where computer scientists pretty much refuse to use Octave,
despite MATLAB's massive price tag to pretty much everyone involved (even with
90% discounts).

(That said, if anyone lived through the changeover from S-PLUS to R, I'd love
to hear if this history is wrong!)

~~~
dekayed
> This history is in really marked contrast to MATLAB and its corresponding
> free version Octave, where computer scientists pretty much refuse to use
> Octave, despite MATLAB's massive price tag to pretty much everyone involved
> (even with 90% discounts).

Do you have any insights as to why Octave does not have higher adoption?

~~~
pbh
I've always been a bit sad about it, but everyone involved is probably a
rational actor.

Computer science professors probably view a couple hundred dollars per MATLAB
network license as a tiny expense on a $1m+ grant (whereas statistics grants
are apparently often smaller), and they may be charged for it in departmental
overhead anyway (removing the incentive to cut costs).

The type of people who could contribute either core code or toolbox type code
to Octave often have an extremely rare quantitative skill set that is worth
hundreds of dollars an hour, so there is a huge incentive to get paid to do
similar work instead. There probably isn't much community recognition (to
balance things out) for implementing a library in Octave. (Though, in the R
world there are certain recognizable superstars like Hadley Wickham.)

Graduate students (who might work for cheap on these problems) are probably
more focused on publications and networking.

As long as all of this is the case, Octave will always kind of just be a worse
MATLAB that happens to be open source, so a new user choosing between them
will probably just choose MATLAB by default.

~~~
jordigh
Octave core developer here.

It is true that we have a lot of trouble attracting new contributors. Most of
our users keep demanding features that seem to us unimportant but to them are
all the world: a GUI ("whatever for?", we think. "Use a real text editor!"), a
JIT compiler ("here's a nickel, get better vectorised code, kid"), perfect
Matlab compatibility (a never-ending chase, not very fun, in which we must
always be behind).

On these points, we're finally, slowly listening to our users. Two of our current
three GSoC students are working on a GUI and a JIT compiler respectively. I
have wild hope that this will attract more users and developers. I'm also
currently hosting an Octave conference in a few days towards this goal:

    
    
        http://www.octave.org/wiki/index.php?title=OctConf_2012
    

By the way, Octave is GNU (so is R, supposedly), so we're not really open
source; we're free. ;-)

I don't know why Octave hasn't been able to replicate R's success. I don't
know if R being GNU in name only has something to do with it (R
developers routinely try to find new ways to get around the GPL and link R to
non-free code, and I don't doubt that this linking to Oracle's database is
another example of that). I don't know if it's just that a lot of people with
big money care more about statistics and R than they care about Octave (banks
and brokers for R, electrical and civil engineers for Octave). Maybe our code
sucks more than R's.

Do you have any suggestions on how to make Octave the standard instead of Matlab?
The recent gratis classes that emerged from Stanford gave Octave a lot of
publicity. Do you have any suggestions for what else we might do?

~~~
pbh
You're probably in a much better position to evaluate than I am! My guess is
that more Octave-based classes would translate into more users and more code
written for Octave down the line, but I'm not sure how to encourage more use
of Octave in the classroom in the first place.

------
edouard1234567
R is great mostly for the breadth of plugins/implementations contributed by the
community... much better than its alternative SAS for an infinitely lower
price... ($0 vs ~$10k). Not surprised by this move from Oracle as data
analytics/mining gets more popular. For R enthusiasts there is a great blog I
follow: [http://www.win-vector.com/blog/2012/07/modeling-trick-masked...](http://www.win-vector.com/blog/2012/07/modeling-trick-masked-variables/)

------
mbq
This explains the care about Solaris compatibility of R packages.

