

R in production systems - papps
http://erehweb.wordpress.com/2011/02/02/r-in-production-systems/

======
thangalin
The Climate Reports (<http://www.whitemagicsoftware.com/software/climate>) has
been running for 10 months, without a single failure. The Climate Reports uses
R to calculate the trend line.

During development, I encountered a bug with one of the packages. I told the
author and he not only fixed the bug within 24 hours but also offered to
create a custom build for my CPU architecture.

I have only good things to say about R and its community. And I think R is
sufficiently stable for production.

------
wesm
R is a stable enough language to be used successfully in production systems.
But I would definitely _not_ recommend building anything "big" in R-- I've
tried and the entire process was not pleasant. R is a research environment for
statisticians or statistically-minded practitioners in other fields-- and
there has not been terribly much effort (from what I can tell) to change its
status. I'd be happy to hear from others who've had similar (or different,
perhaps positive) experiences.

It's actually my experiences trying to build production statistical processes
in R that have motivated me to work on building out the statistics-related
libraries for scientific Python. Hopefully within a couple of years we'll have
something that's competitive with R in terms of ease-of-use and comprehensive
statistical functionality out-of-the-box.

~~~
chillaxn
Do you think that Python libraries or a comprehensive development push will
overcome barriers to entry for statistical process platforms? What if wider
adoption of these techniques is limited more by communities of users than by
the developers?

~~~
hyperbovine
I have often wondered when something Python-ish will come along to compete
with R. I like the power of R, and there is no better choice if you require an
esoteric statistical estimator, but there are also a lot of downsides. It's
difficult to debug, the interpreter seems flaky, and parts of the S-PLUS syntax
feel dated. Also, I know it claims to be object-oriented, but I have never
really understood the OO system in R. So much more time and effort is being
poured into improving the Python runtime, that it seems silly not to try to
build something on top of it.

I am starting a stats PhD in the fall and have about nine months to kill
between now and then. I'm seriously considering devoting all that time to
building something like this.

~~~
wesm
I'm actually working on a stats PhD (in the early stages) and I've made it my
goal to build a lot of Python software as I go along. So if you have the
inclination I'd recommend giving it a shot. When you consider the wealth of
tools out there for both high-level (NumPy-based) and lower-level computation
(e.g. Cython, for speeding up algorithms), and software development
(especially interactive debugging and testing), it's a fairly compelling
proposition (for me, at least).

But the long and short of it is that people _are_ working on making Python more
amenable to applied statistics work. And the more people working toward that
goal, the faster we'll get there.

So I would recommend: join the numpy-discussion and scipy-user mailing lists
and explore the projects out there-- the last few SciPy conferences are a
decent place to start.

~~~
hyperbovine
Yes, it's a tantalizing prospect because so much of the groundwork has already
been laid. I have the sense that something is just needed to tie it all
together into a nice, easy-to-use package with a consistent interface, easily-
understood object model, etc.

My master's is in math, and I have watched Sage create a small revolution in
the past couple of years by doing exactly this.

Food for thought...

~~~
wesm
I agree-- and doing so could make some pretty serious waves across both
academia and industry. You should join the discussion on the mailing lists: in
particular pystatsmodels and the numpy/scipy lists. Any input would be much
appreciated.

------
pjscott
I believe this is one of the problems that Incanter was made for: really easy
embedding in a larger software system. Of course, it's nowhere near as
featureful or widely-used as R.

<http://incanter.org/>

------
chillaxn
I have been reconsidering my workflow in favor of a more data-centric approach.
When I first looked through the search results for "data-centric workflow", I
was shocked that they were limited mostly to theoretical discussions and
commentaries.

Why don't we have online communities focussed on using tools and platforms
that allow informed discussion and decision making to solve entrepreneurial
problems?

------
ShabbyDoo
ESRI (the geospatial software company) and SAS (the commercial alternative to
R) both got caught up reacting to their existing user bases instead of
proactively watching the analytics world. Both companies have (or at least,
until recently, had; my knowledge is circa 2008) very desktop-centric product
lines. Their customers used their products as if they were fancy versions of
Excel -- a perfectly valid use case, but not one suited to production
analytics. Both companies' first forays into production systems involved clunky
wrappers around desktop components. I'm sure they will eventually put out
first-rate headless production components, but there's probably a nice window
available in which start-ups may innovate.

------
mattwalker62
Good article... I have just launched a site (<http://www.promepi.com>) which
uses R to recommend news articles to users. My co-founder used it a lot when
getting his master's degree, but I was new to it. It has been great for quickly
developing our algorithm.

Anyone have experience with Revolution Analytics? My cofounder looked at these
guys as we try to scale the site... he definitely agrees with one of the
comments that "big" datasets are good in R and hopefully our site gets to a
big dataset, so we are looking at scaling options.

------
secret
The advice to go 64-bit and throw RAM at the problem brings back memories of
my computational math class.

