Switching from C++ to R (stackexchange.com)
58 points by ekm2 on Oct 5, 2011 | 9 comments

As a disclaimer, I'm a noted advocate of using Python to build production systems for quant finance (old talk but: http://python.mirocommunity.org/video/1531/pycon-2010-python...). I've been very successful at doing it and largely as the result of my example many other quant shops have chosen the Python route to excellent results. The pandas Python library (http://pandas.sourceforge.net) is an open-source outgrowth of my proprietary work.

My question is: why program in C++? I don't think anyone will dispute that it's an insanely low-productivity language relative to Python or R. But Python and R are slow for iterative, procedural code. The near panacea for Pythonistas is to use Cython (http://cython.org) to develop C-speed code while taking maybe only 1.5-2x longer than writing Python code (to get all the type declarations right, etc.). You can also directly call methods in C / C++ libraries using Cython, so it really is the best of both worlds in my experience.
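To make "iterative, procedural code" concrete, here is a minimal sketch of the kind of loop the comment seems to have in mind: an exponentially weighted moving average, where each output depends on the previous one. The function name `ewma` and the `alpha` parameter are illustrative choices, not taken from the comment or from any library. A loop like this is a typical candidate for adding Cython type declarations, though how much it speeds up would need to be measured.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average, computed one element at a time.

    Hypothetical example: the name and signature are illustrative only.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out = []
    prev = None
    for x in values:
        # Each output depends on the previous output, so this is not a
        # simple element-wise array operation; the interpreter executes
        # the loop body once per element.
        prev = x if prev is None else alpha * x + (1.0 - alpha) * prev
        out.append(prev)
    return out


if __name__ == "__main__":
    print(ewma([1.0, 2.0, 3.0], alpha=0.5))
```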

I think in general that hybrid systems are best avoided if at all possible since debugging across "the bridge" is a thorny problem. You typically end up with more code than you planned in the higher-productivity language (e.g. R). I like Python because Python is good at all the things that R is not good at. Yes, Python's statistics libraries are very weak (though we're making progress in http://statsmodels.sourceforge.net) compared with CRAN, but in quant finance it turns out that 90% of the modeling and data analysis that you actually do isn't that statistically sophisticated. It's largely a relational data manipulation and time series processing problem (which pandas takes care of in spades; it has much better integrated data alignment features than just about anything in R, too).
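As a rough illustration of what "data alignment" means here: when two labeled series are combined, values are matched by label rather than by position, and labels missing from one side produce a missing value. pandas does this automatically when two Series are added (using NaN for the gaps). The sketch below spells out the idea in plain Python so it runs without dependencies; `align_add` and the sample data are hypothetical, and this is not how pandas implements it.

```python
def align_add(left, right):
    """Add two label-indexed series (dicts), aligning on the union of labels.

    Labels present in only one input map to None in the result.
    Hypothetical helper for illustration; pandas would use NaN instead.
    """
    result = {}
    for label in sorted(set(left) | set(right)):
        if label in left and label in right:
            result[label] = left[label] + right[label]
        else:
            result[label] = None  # label missing on one side
    return result


if __name__ == "__main__":
    # Made-up sample data: two series observed on overlapping dates.
    series_a = {"2011-10-03": 10.0, "2011-10-04": 11.0}
    series_b = {"2011-10-04": 20.0, "2011-10-05": 21.0}
    print(align_add(series_a, series_b))
```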

I suppose I should post this in stackexchange-- Python is also excellent for building GUIs. I've used wxPython and PyQt and found that I could hack together a GUI in an afternoon that would have taken a week to do in Java or C++.

Not exactly the right place for this, but thank you for pandas.

Which do you prefer between wxPython and PyQt btw?

Dirk's answer on that page is what I believe to be the prevailing viewpoint. Why switch? R is very productive and more than suitable for data exploration. If you need to, you can always write pieces of your research tool chain in C++ and integrate them quite easily with Rcpp. Many of the R packages do just this. I find R to be very enjoyable, if a bit eccentric at times. The combination of R, RStudio, and Vim makes it a very productive environment for data exploration and modelling.

That being said, all of my production code is typically implemented in C++. However, I know that I can always integrate with R from C++ if I need to (for example, if I'm too lazy to implement an algorithm that is already available in some R package). If I did do this, it would merely be a stopgap until a suitable C++ implementation could be deployed.

I like R, but some problems really run into performance limits with it. In particular, memory use snowballs when it isn't possible to convert the data to enums/factors, and predictions with lots of Date data make memory use skyrocket.
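For readers unfamiliar with the enum/factor conversion mentioned above, the idea is to store each distinct value once and keep only small integer codes per row. The following is a minimal Python sketch of that encoding, not R's actual implementation: `to_factor` is a hypothetical name, and unlike R factors it uses 0-based codes and orders levels by first appearance.

```python
def to_factor(values):
    """Encode values as (levels, codes): each distinct value is stored once.

    Hypothetical helper for illustration. Codes are 0-based and levels are
    ordered by first appearance, which differs from R's defaults.
    """
    levels = []   # distinct values, in order of first appearance
    index = {}    # value -> position in levels
    codes = []    # one small integer per input value
    for v in values:
        if v not in index:
            index[v] = len(levels)
            levels.append(v)
        codes.append(index[v])
    return levels, codes


if __name__ == "__main__":
    levels, codes = to_factor(["buy", "sell", "buy", "hold", "sell"])
    print(levels, codes)
```

When a column has many rows but few distinct values, the codes plus the short list of levels take far less space than repeating each value; when nearly every value is distinct, the encoding saves little, which matches the situation the comment describes.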

I like to play in the kaggle.com competitions, and I'm thinking of switching more of what I do there from R to waffles, which is a toolset and a C++ lib. It's speedy, and rather user-friendly. http://waffles.sourceforge.net/

This is pretty timely for me.

I just ported some code that performs probability computations from Ruby to C for performance reasons. While I'm really pleased with the runtime, I was pretty unhappy with the implementation time.

I need to extend this work for the next phase of the work I'm doing and have been contemplating moving to R, but have hesitated because I couldn't find any info on its performance.

Now I have!

An analogous discussion goes on in image processing and computer vision about Matlab and C++: one is easy to use but slow and a memory hog, while the other makes it hard to develop quickly and play with ideas.

On a different note, I didn't know about Rcpp; it sounds great.

Yes, and I think the prevailing wisdom is the same: Matlab for fast prototyping, C/C++ for speed and memory-efficiency in non-vectorizable contexts or pain points. Call the C/C++ from Matlab if you wish.

You can tell the same story about Python, although I've not been as happy with the graphics in Python/Matplotlib/etc.

For large-scale work (and a lot of work will always be large-scale), it's not possible to stick to a friendly managed-memory REPL environment.

People want the answer to be otherwise, hence these repeated questions.

Nothing compares to C++ when it comes to sheer speed. If you want language X to do task Y as fast as C++, then you must use C++.

The reason language X is slow is that it's been made safe to use... it has lots of checks and safety features to prevent accidents. It's not unlike an over-protective mother hovering around her child double-checking everything he picks up. And there's nothing wrong with that... unless you want to go fast.

The reason C++ is fast is that it allows programmers to do away with all the checks and safety features and just burn through CPU cycles. It doesn't hover over the programmer asking him if he really wants to do that... time and time again. It doesn't check behind everything he does. It assumes that he's a big boy and that he understands the implications of his actions. C++ lets him run with scissors because sometimes it's OK (and even necessary) to do that.
