Having programmed R on the job for some heavy statistics, I will say this: it's good for quick analyses but burdened by legacy functionality from the S+ days. I switched to Python/NumPy and rewrote all the R code I had, and I could not be happier with the results. Of course, you have to create your own data structures if you want something like R's data frame, but at least you have a rich language to do that with.

However, if you need to do anything systematic, do NOT use R: bugs are elusive and extremely tedious to debug.




I have used R for a number of minor projects. I really like its functional programming design, but its syntax can be obtuse. It's preferable to MATLAB, IMHO. What do you consider to be the legacy aspects that weigh it down? What do you like better about NumPy? While I'm asking questions, have you tried Sage? If so, what do you think of it as a meta-package of mathematical software?

Also, here are some interesting links I've found comparing R and similar statistical programs. Sorry, I don't know much about NumPy's capabilities.

R is equivalent in speed to MATLAB:

http://www.sciviews.org/benchmark/index.html

Someone's search for a good data analysis language:

http://www.cs.ubc.ca/~murphyk/Software/which_language.html


One of R's big benefits is the huge amount of statistical functionality; for example, it offers numerous different quantile algorithms.
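Base R's quantile() alone implements nine estimation algorithms, selectable through its type argument (type 7 is the default); a quick sketch on a small sample, where the choice visibly matters:

    x <- c(1, 3, 4, 7, 12, 19)

    # Different interpolation rules give different answers on small samples:
    quantile(x, probs = 0.25, type = 7)   # default
    quantile(x, probs = 0.25, type = 1)   # inverse of the empirical CDF
    quantile(x, probs = 0.25, type = 6)   # common in other stats packages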

If you are working with a lot of heterogeneous data in R, it becomes a real headache. Merging data frames seems like it should work the way you think it should, but if one of your sets of keys (strings) is stored as 'factors' (what I am calling 'legacy S+ functionality'; I'm sure they're useful for many algorithms), you'll end up with garbage. There's a hack you can put in your code, options(stringsAsFactors=FALSE), which alleviates some of this, but in general I found aligning data to be a huge pain. If you're running regressions, this is pretty important.
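To give a flavor of the kind of silent corruption factors invite (a sketch of the general pitfall class, not necessarily the exact merge bug above):

    # Factors are stored as integer codes into a level table, so
    # innocent-looking conversions decode the codes, not the labels:
    f <- factor(c("30", "10", "20"))
    as.numeric(f)                  # 3 1 2  -- level indices, not values
    as.numeric(as.character(f))    # 30 10 20 -- the safe idiom

    # In older R, c() on factors dropped the labels entirely:
    c(factor("a"), factor("b"))    # 1 1 in R < 4.1.0

    # The workaround mentioned above turns off the automatic
    # string-to-factor conversion when data frames are built:
    options(stringsAsFactors = FALSE)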

I haven't tried Sage, but I have heard good things. NumPy is a good alternative because it's extremely well implemented and has consistent behavior across the board. Extensibility (with Fortran, Cython/Pyrex, C/C++) is clean and easy. I never thought I'd write Fortran 77 code, having been born quite a few years after '77, but it's an easy way to speed up simple procedural algorithms 50x or more.


Factors are nothing but enums and are used to shrink data and speed up processing. Plus, that matches what you typically want to happen in regressions: a string variable with n distinct values turns into (n-1) indicator variables. Otherwise, what is the meaning of using a string as an explanatory variable in a regression?
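You can see that expansion directly with model.matrix():

    # A factor with n levels becomes (n-1) indicator columns; the
    # first level is absorbed into the intercept as the baseline:
    f <- factor(c("a", "b", "c", "a"))
    model.matrix(~ f)
    #   (Intercept) fb fc
    # 1           1  0  0
    # 2           1  1  0
    # 3           1  0  1
    # 4           1  0  0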

If you want to merge data frames that were created with different factor levels, perhaps the easiest thing to do is turn your factors into strings first.

If d1 and d2 are your data frames, then:

    d1$factorVar <- as.character( d1$factorVar )
    d2$factorVar <- as.character( d2$factorVar )

merge your two data frames, then:

    merged <- merge( d1, d2 )
    merged$factorVar <- as.factor( merged$factorVar )

should set you right...

earl



