

Switching from C++ to R - ekm2
http://quant.stackexchange.com/questions/728/switching-from-c-to-r-limitations-applications

======
wesm
As a disclaimer, I'm a noted advocate of using Python to build production
systems for quant finance (old talk, but:
<http://python.mirocommunity.org/video/1531/pycon-2010-python-in-quantitat>).
I've been very successful at doing it, and largely as a result of my example
many other quant shops have chosen the Python route, with excellent results.
The pandas Python library (<http://pandas.sourceforge.net>) is an open-source
outgrowth of my proprietary work.

My question is: why program in C++? I don't think anyone will dispute that
it's an insanely low-productivity language relative to Python or R. But Python
and R are slow for iterative, procedural code. The near panacea for Pythonistas
is to use Cython (<http://cython.org>) to develop C-speed code that takes maybe
only 1.5-2x longer to write than Python code (to get all the type declarations
right etc.). You can also directly call methods in C / C++ libraries using
Cython, so it really is the best of both worlds in my experience.
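
To make "iterative, procedural code" concrete, here is a minimal pure-Python sketch of an exponentially weighted moving average. The function name `ewma`, the `alpha` parameter, and the sample numbers are made up for illustration and are not taken from pandas or Cython. Each output depends on the previous one, so the loop is naturally sequential and runs at interpreter speed; this is the sort of function where adding Cython type declarations would presumably pay off.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average computed with an explicit loop.

    Hypothetical helper for illustration only.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out = []
    prev = None
    for x in values:
        # Each result depends on the previous result, so the work is
        # done one element at a time inside the interpreter.
        prev = x if prev is None else alpha * x + (1.0 - alpha) * prev
        out.append(prev)
    return out


# Made-up sample data.
prices = [100.0, 102.0, 101.0, 105.0]
print(ewma(prices, alpha=0.5))  # [100.0, 101.0, 101.0, 103.0]
```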

I think in general that hybrid systems are best avoided if at all possible
since debugging across "the bridge" is a thorny problem. You typically end up
with more code than you planned in the higher-productivity language (e.g. R).
I like Python because Python is good at all the things that R is not good at.
Yes, Python's statistics libraries are very weak (though we're making progress
in <http://statsmodels.sourceforge.net>) compared with CRAN, but in quant
finance it turns out that 90% of the modeling and data analysis that you
_actually_ do isn't that statistically sophisticated. It's largely a
relational data manipulation and time series processing problem (which pandas
takes care of in spades; it has much better integrated data alignment features
than _just_ about anything in R, too).
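
For readers unfamiliar with the term, "data alignment" here means matching up observations from different series by their labels (dates, say) before combining them. The sketch below shows the idea in plain Python with dictionaries. It is not the pandas API; the `align` helper and the sample values are hypothetical.

```python
def align(left, right):
    """Align two date-keyed series on the union of their dates.

    Dates missing from one side are filled with None.
    Hypothetical helper for illustration only.
    """
    dates = sorted(set(left) | set(right))
    return [(d, left.get(d), right.get(d)) for d in dates]


# Made-up series whose dates only partly overlap.
# ISO-formatted date strings sort chronologically.
series_a = {"2011-03-01": 1.0, "2011-03-02": 2.0}
series_b = {"2011-03-02": 10.0, "2011-03-03": 20.0}

for row in align(series_a, series_b):
    print(row)
```

A library that does this automatically on every arithmetic operation saves writing this kind of bookkeeping by hand.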

I suppose I should post this on Stack Exchange. Python is also excellent for
building GUIs. I've used wxPython and PyQt and found that I could hack
together a GUI in an afternoon that would have taken a week to do in Java or
C++.

~~~
tricky
Not exactly the right place for this, but thank you for pandas.

------
lrm242
Dirk's answer on that page is what I believe to be the prevailing viewpoint.
Why switch? R is very productive and more than suitable for data exploration.
If you need to you can always write pieces of your research tool chain in C++
and integrate it quite easily with Rcpp. Many of the R packages do just this.
I find R to be very enjoyable, if a bit eccentric at times. The combination of
R, RStudio, and Vim makes for a very productive environment for data
exploration and modelling.

That being said, all of my production code is typically implemented in C++.
However, I know that I can always integrate with R from C++ if I need to (for
example, if I'm too lazy to reimplement an algorithm that is available in some
R package). If I did this, it would merely be a stopgap until a suitable C++
implementation could be deployed.

------
sunkencity
I like R, but some problems really run into performance trouble with it. In
particular, memory use snowballs when the data can't be converted to
enums/factors, and predictions involving lots of Date data make memory use
skyrocket.
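
For anyone unfamiliar with factors: the idea is to store each distinct value once and keep only a small integer code per row. Here is a rough Python sketch of that encoding. The `encode_factor` name and the sample data are hypothetical, and the codes here are 0-based, whereas R's factor codes are 1-based.

```python
def encode_factor(values):
    """Return (levels, codes): the distinct values, plus one integer per element.

    Hypothetical helper for illustration only.
    """
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    codes = [index[v] for v in values]
    return levels, codes


# Made-up categorical column.
levels, codes = encode_factor(["buy", "sell", "buy", "hold", "buy"])
print(levels)  # ['buy', 'hold', 'sell']
print(codes)   # [0, 2, 0, 1, 0]
```

When a column can't be represented this way, every row carries its full value, which is where the memory goes.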

I like to play in the kaggle.com competitions, and I'm thinking of moving
more of what I do there from R to waffles, which is a toolset and a C++
lib. It's speedy, and rather user friendly. <http://waffles.sourceforge.net/>

------
mjbellantoni
This is pretty timely for me.

I just ported some code that performs probability computations from Ruby to C
for performance reasons. While I'm really pleased with the runtime, I was
pretty unhappy with the implementation time.

I need to extend this code for the next phase of my project and have
been contemplating moving to R, but have hesitated because I couldn't find any
info on its performance.

Now I have!

------
Jun8
An analogous discussion goes on in image processing and computer vision about
Matlab and C++: one is easy to use but slow and a memory hog, while the other
makes it hard to quickly develop and play with ideas.

On a different note, I didn't know about Rcpp; it sounds great.

~~~
mturmon
Yes, and I think the prevailing wisdom is the same: Matlab for fast
prototyping, C/C++ for speed and memory-efficiency in non-vectorizable
contexts or pain points. Call the C/C++ from Matlab if you wish.

You can tell the same story about Python, although I've not been as happy with
the graphics in Python/Matplotlib/etc.

For large-scale work (and a lot of work will always be large-scale), it's not
possible to stick to a friendly managed-memory REPL environment.

People want the answer to be otherwise, hence these repeated questions.

------
16s
Nothing compares to C++ when it comes to sheer speed. If you want language X
to do task Y as fast as C++, then you must use C++.

The reason language X is slow is because it's been made _safe_ to use... it
has lots of checks and safety features to prevent accidents. It's not unlike
an over-protective mother hovering around her child double-checking everything
he picks up. And there's nothing wrong with that... unless you want to go
fast.
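
One example of such a check is bounds checking on every index. The Python sketch below spells out, as an explicit function, roughly the test the interpreter performs whenever a list is indexed. The `checked_get` helper is hypothetical and written only to make the check visible. By contrast, `operator[]` on a C++ `std::vector` performs no such test (although `at()` does).

```python
def checked_get(data, i):
    """Index into a list, with the bounds check written out explicitly.

    Hypothetical helper for illustration only; Python's own indexing
    already behaves this way.
    """
    # Python accepts negative indices that count from the end.
    if not -len(data) <= i < len(data):
        raise IndexError(f"index {i} out of range for length {len(data)}")
    return data[i]


values = [10, 20, 30]
print(checked_get(values, 1))  # 20

try:
    checked_get(values, 3)
except IndexError as exc:
    # The bad access is reported instead of reading arbitrary memory.
    print("caught:", exc)
```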

The reason C++ is fast is because it allows programmers to do away with all
the checks and safety features and just burn through CPU cycles. It doesn't
hover over the programmer asking him if he really wants to do that... time and
time again. It doesn't check behind everything he does. It assumes that he's
a big boy and that he understands the implications of his actions. C++ lets
him run with scissors because sometimes it's OK (and even necessary) to do
that.

