
I’ve known many quants who use both R and Python/NumPy/pandas for complementary tasks. The R standard library was generally spoken about in positive terms, but for data massaging and manipulation beyond pure maths/stats analysis, a Python environment probably offers much more flexibility.

Note that I don’t claim expertise in the above, but a number of very talented people I’ve worked with directly, who had every incentive to be productive, used R.

Perhaps your profs were trying to help you learn R, including its limitations, when they were setting you tasks?

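This is a really big deal. In the first edition of Python for Data Analysis, they suggest using mean imputation. In case you don't know, this badly breaks your variance calculations: every filled-in value sits exactly at the mean, so the variance shrinks, standard errors come out too small, and any statistical test built on them is off.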
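A small illustration of the variance problem, using only the Python standard library. The data are simulated (normal values with roughly 30% knocked out at random), and `mean_impute` is a hypothetical helper written for this sketch, not a pandas function. Because the imputed values add zero to the sum of squared deviations while the sample size grows, the variance drops by a factor of (n_observed - 1) / (n_total - 1).

```python
import random
import statistics

def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]

random.seed(0)
# Simulated data: 1000 draws from a normal distribution (mean 50, sd 10).
full = [random.gauss(50, 10) for _ in range(1000)]
# Knock out roughly 30% of the values completely at random.
with_missing = [None if random.random() < 0.3 else v for v in full]

imputed = mean_impute(with_missing)
observed = [v for v in with_missing if v is not None]

print(f"variance of observed values:    {statistics.variance(observed):.1f}")
print(f"variance after mean imputation: {statistics.variance(imputed):.1f}")
```

Any standard error computed from `imputed` treats all 1000 rows as real observations while also using the shrunken variance, so it is too small on both counts.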

In the second edition, they suggest doing some interpolation. Meanwhile, in R land there are multiple ways (as always) to do proper multiple imputation, which gives you a much more accurate analysis and makes better use of all of the data (mice, Amelia and mi are all good, and somewhat complementary).
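The step that makes multiple imputation honest about uncertainty is pooling: you fit the same model on each of the m completed datasets and combine the results with Rubin's rules (in R, mice's `pool()` does this). A minimal standard-library sketch of the combination step follows; `pool_rubin` is a hypothetical name, the input numbers are made up for illustration, and the degrees-of-freedom adjustment used for small m is left out.

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Combine m per-imputation results with Rubin's rules.

    estimates: the point estimate from each completed dataset
    variances: the squared standard error from each completed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (n - 1 denominator)
    t = w + (1 + 1 / m) * b               # total variance
    return q_bar, t

# Hypothetical results for one regression coefficient from m = 5 imputations.
est = [2.1, 1.9, 2.3, 2.0, 2.2]
se_squared = [0.04] * 5
q, t = pool_rubin(est, se_squared)
print(f"pooled estimate {q:.2f}, pooled SE {math.sqrt(t):.3f}")
```

The between-imputation term `b` is exactly what single mean imputation throws away: it is how the analysis admits that the missing values were never known.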

That being said, I just thought of using PyTorch and a GAN to do multiple imputation, so maybe it's not impossible to do in Python. There is far, far less support for it, though (but of course you could probably build it in NumPy).

I guess the big difference is that R comes with a NumPy equivalent (matrix), a pandas equivalent (data.frame and base), and well-tested, numerically stable reference implementations of pretty much every widely used statistical model.

Like, I really don't understand why you wouldn't want to look at residuals, even if all you care about is prediction. Checking them makes your predictions much more stable and accurate, and they often tell you how to model things more appropriately.
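A toy example of residuals informing the model, in plain Python. The data are hypothetical (y is exactly x squared), and `fit_line` and `residuals` are helpers written for this sketch. A straight-line fit finds no slope at all, yet the residuals form a clear U shape (positive at both ends, negative in the middle), which points straight at the missing quadratic term.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def residuals(xs, ys, a, b):
    """Observed minus fitted values."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

xs = list(range(-5, 6))
ys = [x * x for x in xs]  # hypothetical data with a purely quadratic relationship
a, b = fit_line(xs, ys)
res = residuals(xs, ys, a, b)
for x, r in zip(xs, res):
    print(f"x={x:+d}  residual={r:+.1f}")
```

In R the same check is `residuals()` on the fitted model (or `plot()` on an `lm` object); the point is that the pattern, not a single accuracy number, tells you what the model is missing.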

Finally, R's formula interface is a thing of beauty. Honestly, why the hell should I build a model matrix for regression/classification by hand when I can get R to do it for me?
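To make concrete what the formula interface saves you, here is a sketch of building by hand the design matrix that R would derive from `y ~ dose + group`. The `model_matrix` helper and the example rows are hypothetical; it assumes R's default treatment contrasts (intercept column, numeric column as-is, one 0/1 dummy per factor level except the first), and it sorts levels by Python string order, which can differ from R's locale-dependent sort.

```python
def model_matrix(rows, numeric, factor):
    """Build a design matrix for the formula `y ~ numeric + factor`.

    Intercept column, the numeric column unchanged, then one 0/1 dummy
    per factor level except the first level in sorted order, which
    serves as the reference level.
    """
    levels = sorted({row[factor] for row in rows})
    non_reference = levels[1:]
    header = ["(Intercept)", numeric] + [f"{factor}{lvl}" for lvl in non_reference]
    matrix = [
        [1.0, float(row[numeric])]
        + [1.0 if row[factor] == lvl else 0.0 for lvl in non_reference]
        for row in rows
    ]
    return header, matrix

# Hypothetical data: one numeric predictor and one three-level factor.
rows = [
    {"dose": 1.0, "group": "control"},
    {"dose": 2.0, "group": "treatA"},
    {"dose": 3.0, "group": "treatB"},
    {"dose": 4.0, "group": "control"},
]
header, X = model_matrix(rows, "dose", "group")
print(header)
for r in X:
    print(r)
```

And this covers only main effects for one factor; interactions, polynomial terms and transformations are where writing the formula instead really pays off.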

I will also say that R is a frustrating, domain-specific, really irritating, wonderful language. But then I'm a crazy person: I wrote a Stockfighter client in R.

I agree that there are also some good parts with R.

But the argument "It's good because many people use it" is the one I hear most often in discussions about programming languages, especially old ones like R and Java.

Actually for data massaging and manipulation, R is absolutely superior to Python.
