
How Google and Facebook are using R - soundsop
http://dataspora.com/blog/predictive-analytics-using-r/
======
pfisch
I've been working on a startup that is heavily reliant on R for more than a
year. R is not at all an easy language to use/learn. I have never been able to
really find what I would call a good resource for learning R. The online
manual has incomprehensible examples, very rarely shows the output, and has
very vague descriptions about what any method does. Additionally all the
datatypes are bizarre.

I would recommend to anyone trying to learn R to digest all the online
books/materials on the Rattle site, and then to use Rattle heavily and look at
the logs it produces.

Also, getting R to work with PHP is unpleasant and involves reading a fair
amount of Spanish comments.

All that hate said, I really love R and I don't know what I would be doing if
it didn't exist.

~~~
earl
Out of curiosity, where / what do you do? Feel free to email in profile if you
don't want it public. I'm always curious about what people are using R for
professionally.

------
dmolnar
One neat thing about R is that it's become standard in academic statistics to
include an R implementation of your new idea with a journal paper. For
example, Gareth James and his colleagues came up with a new method called the
Gauss-Dantzig estimator for doing prediction in the case where the number of
parameters is much larger than the number of data points. You can download the
R code from his research web page here:
<http://www-rcf.usc.edu/~gareth/research/>

This makes it much, much easier to try out new prediction methods on your own
data. No more having to write code from the paper's description and hoping
(praying) that you got it not entirely wrong! Instead you can use the
researcher's own code to quickly figure out if the new method is better or
worse than previous methods on your own data.

That being said, R does take a lot of getting used to. Graphics in general are
tricky, although the ggplot2 package makes some things easier and can produce
pretty results: <http://had.co.nz/ggplot2/>

There also isn't a great story for using R on massive data sets which don't
fit in main memory, so far as I know. It doesn't take much before you start
hitting data for which an algorithm that requires O(n^2) memory will eat > 15
GB of RAM. At that point you're out of the territory of Amazon instances you
can rent cheaply and into building a box just for R, or you're into
refactoring your data so you can do the computation in pieces. So you do have
to watch out for that a bit when using the default packages.
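
To put a rough number on the O(n^2) point, here is a back-of-the-envelope sketch in Python. It assumes the algorithm holds one dense n x n matrix of 8-byte doubles and nothing else, and it reads "15 GB" as 15 GiB; the function names are made up for illustration and are not part of any R package.

```python
import math

BYTES_PER_DOUBLE = 8   # assumption: dense matrix of 64-bit floats
GIB = 1024 ** 3        # assumption: "15 GB" read as 15 GiB


def dense_matrix_bytes(n):
    """Bytes needed to hold one dense n x n matrix of doubles."""
    return n * n * BYTES_PER_DOUBLE


def max_rows_for_budget(budget_bytes):
    """Largest n whose n x n matrix of doubles fits in budget_bytes."""
    return math.isqrt(budget_bytes // BYTES_PER_DOUBLE)


if __name__ == "__main__":
    limit = max_rows_for_budget(15 * GIB)
    print(f"largest n that fits in 15 GiB: {limit}")
    print(f"n = 100,000 would need {dense_matrix_bytes(100_000) / GIB:.1f} GiB")
```

Under those assumptions the ceiling is somewhere around 45,000 observations, which is not a large data set by web standards. Any temporary copies the algorithm makes would lower that figure further.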

------
colins_pride
Web data mining is so due for a renaissance. At big companies, purge rules
eliminate old data to save on storage costs. At startups the focus is always
on getting a product out, adding features & getting users. When guys start
figuring out how to tease real value out of the data, this is going to
snowball. And R is the right platform for figuring it out.

~~~
davidw
How so? Can you elaborate?

~~~
colins_pride
Sure; I'm assuming you're asking about my assertion that "R is the right
platform ... " I've used S-PLUS fairly extensively, and I think there's a lot
in common between R and S-PLUS. But R doesn't have the licensing constraints,
which makes all of the difference in the world. To me, R represents a good
high-level prototyping language with a very extensive native statistical
library.

I would also add that I'm much more attached to the idea that web data mining
is the future, than I am attached to the idea that R is the best platform.
We'll see what happens!

~~~
joeyo

      I think there's a lot in common between R and S-PLUS.
    

Indeed. R is sometimes called "GNU S".

~~~
zacharypinter
That would probably be easier to search for...

------
zacharypinter
Are there any good videos online for learning the statistics behind R? Most of
the R videos I've seen have been focused on the tool, not the statistics
behind the tool.

~~~
jfarmer
Statistics is statistics -- just get a good introductory statistics textbook.
MIT's OpenCourseWare is probably a good start, maybe this:
[http://ocw.mit.edu/OcwWeb/Mathematics/18-443Fall2003/CourseH...](http://ocw.mit.edu/OcwWeb/Mathematics/18-443Fall2003/CourseHome/index.htm)

------
aneesh
I hope they post the video soon. I'm an amateur R user, and it'd be great to
see actual use cases from statisticians & developers from Google, Facebook, et
al.

------
babo
A good post with interesting pointers for someone like me who is eyeing R. A
solid background in statistics is required, so it's time for me to catch up!

------
pskomoroch
I was at the meetup and there were a few bits about how Facebook uses R,
Hadoop, and Python for different levels of data analysis. The guys sitting behind me
were talking during the whole presentation which was really annoying... please
don't do that.

------
physcab
I just recently heard of R, so don't chew me up for this...but...

How is R different than Matlab (besides the license issue)?

~~~
lrajlich
R is a different language (vectorized and partially functional). It has a more
comprehensive stats library and is better for "statsy" things, whereas MATLAB
is better for more "mathy" things like linear algebra.

------
paraschopra
What about MATLAB? I like using it

------
globalrev
Does R have a Java binding?

