
Interact with R from Python - kirubakaran
http://www.daimi.au.dk/~besen/TBiB2007/lecture-notes/rpy.html
======
apathy
This is cute. RPy is "_the simplest thing that can possibly work_" (and it
does!), and most of all, it's awesome to see an R story high on the front page
of YC News.

But I would encourage readers (of this article and of YC News) to try playing
with R interactively, just like you should try using IPython+SciPy
interactively, to appreciate the truly awesome power of the libraries+language
in each case. Then you will know what can and cannot be done easily in each,
and can use your code in the other (or C, or _shudder_ maybe even Java if
necessary) to fill the gaps. Both languages play nicely with C and Java,
incidentally. But that's not the point -- let's look at an everyday situation.

For example -- I might like to do some dimensionality reduction on a high-
dimensional dataset which is information-starved along some of the features
that interest me (until I collapse it a little). Obviously, I'd like to read
in the data (in as rich a form as possible), look at the combinations of
dimensions that are most informative (probably via principal components
analysis, aka PCA, but maybe I'd try some other techniques as well), plot them
as predictors, and then perhaps use a bunch of resources on the Web to do some
further annotation or testing and see how useful my pile of predictors is for
various tasks. Off the top of my head, here is what I'd think of doing:

1) Depending on the nature of the data, I'd parse it in either R or Python and
bind it into a dataframe (sort of a fancy matrix) or a list of dataframes so I
can manipulate it in R.
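As a sketch of the Python side of step 1 (the file format and column names
here are made up for illustration), the parse can be as simple as building a
column-wise dict, which maps naturally onto the column-wise layout of an R
dataframe once you hand it across:

```python
import csv
import io

def read_columns(text):
    """Parse CSV text into a dict of column name -> list of values,
    the same column-wise shape an R dataframe uses."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return columns

# Toy data standing in for the real high-dimensional file.
raw = "x,y\n1.0,2.0\n3.0,4.0\n"
df = read_columns(raw)
print(df["x"])  # column-wise access, much like df$x in R
```

For anything real you'd use a proper reader (and let R or RPy do the type
coercion), but the point is that the hand-off structure is trivial.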

2) Almost certainly I'd do the PCA and other exploratory data analysis (EDA)
in R, due to the sheer power and variety of packages available for this sort
of task in the R environment. There are lots of libraries for this sort of
thing in Python, too, but if you have a wild hair up your ass to try some
revolutionary technique you saw in _Bioinformatics_ or _Genetic Epi_ (or
whatever) yesterday, odds are it was released as an R package.
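For intuition about what those R packages are doing under the hood in step 2,
here's a stdlib-only Python sketch of the core of PCA -- center the data, form
the covariance matrix, and pull out the leading eigenvector by power
iteration. (Real work would go through R's prcomp or a tuned library, not
this.)

```python
import math

def first_principal_component(rows, iters=200):
    """Return the leading eigenvector of the sample covariance matrix
    via power iteration -- the direction of maximum variance."""
    n, d = len(rows), len(rows[0])
    # Center each column.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    x = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1)
            for b in range(d)] for a in range(d)]
    # Power iteration from an arbitrary starting vector.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

# Points lying near the line y = x: the first PC should point
# roughly along (1, 1).
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
pc1 = first_principal_component(data)
```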

3) Most likely I'd plot the correspondences between each predictor and the
responses I care about in R, maybe with GGobi if I had to deal with time
series or high-dimensional plots. Not that Python can't do an awesome job, but
hey, we're already in R, so let's get this over with, shall we?
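A stdlib-only sanity check you might run alongside the plots in step 3:
Pearson correlation between a predictor and a response, assuming both are
plain numeric lists. (R's cor() does this and a great deal more in one call;
this just shows there's no magic in it.)

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# A predictor that tracks the response perfectly linearly.
r = pearson([1, 2, 3, 4], [2, 4, 6, 8])
```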

4) I'd probably want to use the results in Python for web- or database-backed
inquiries, because R's DBI packages sort of suck. Maybe I'd save entire
dataframes to MySQL or SQLite or what have you, then retrieve them in Python
to monkey around with the results (or use a stripped-down algorithm based on
my R results to implement the 'app' version in Python, because eventually I'll
bet we put this behind Django anyway...). But it's Python for sure if it's
ever going to talk to Windows or the Web. Like, duh, _rite_?
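The hand-off in step 4 can be about this simple with the sqlite3 module in
Python's standard library (the table and column names here are invented for
illustration):

```python
import sqlite3

# In-memory database stands in for the real SQLite file or MySQL target.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE predictors (name TEXT, loading REAL)")

# Pretend these rows came out of the R analysis (e.g. PCA loadings).
rows = [("gene_a", 0.91), ("gene_b", -0.40), ("gene_c", 0.12)]
conn.executemany("INSERT INTO predictors VALUES (?, ?)", rows)
conn.commit()

# Later, the Python/Django side pulls them back to monkey around with.
strongest = conn.execute(
    "SELECT name FROM predictors ORDER BY ABS(loading) DESC LIMIT 1"
).fetchone()[0]
```

On the R side you'd write the same table out with whatever DBI-ish package
annoys you least, and from then on the web app never has to touch R at all.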

So the point here is that it pays to have a couple of sharp tools lying around
when your interesting problem shows up and starts flopping around on your
desk. Don't be like the RDBMS guys who try to drive every damned screw with a
filigreed hammer!

Hope this gets one or a few people to try R (on its own) and then stick it in
their utility belt for later use.

