i've used R for several large projects. it sucks in several ways for the projects i worked on:
1. it is extremely slow for numerics. slower than MATLAB. slower than python (with numpy). after talking to several stats people, it seems pretty much everyone ends up writing most of their code in C when using R. (in contrast, i didn't find this necessary in python or MATLAB for similar projects.)
2. its syntax is quite clunky. want to concatenate two strings? paste(string_a, string_b, sep=''). radford neal has a series of blog posts on R's design flaws: http://radfordneal.wordpress.com/2008/09/21/design-flaws-in-...
3. uninformative error reporting. by default, the stack trace isn't printed. even if it is, often the errors don't really tell you what went wrong. (see the sketch after this list.)
i don't see any advantage to R over python. yield makes a nice replacement for the lazy evaluation of R.
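On the error-reporting point, a minimal sketch of the stock-R facilities that help surface the call stack (nothing beyond base R assumed):

    f <- function(x) g(x)
    g <- function(x) stop("something went wrong")
    f(1)                      # the message alone doesn't say which call failed
    traceback()               # print the call stack of the last uncaught error
    options(error = recover)  # or drop into a browser at the failing frame next time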
I agree with all of your points, yet I still find myself using R on a day to day basis instead of python. My reason for doing so is the large number of statistical packages that exist for R. I know that scipy exists, but it does not seem to have nearly the same coverage as CRAN + base libraries... Are there other python libraries I should look at?
it's true that CRAN+Bioconductor have a lot of coverage of statistics. if you need to fit a particular kind of model and it isn't fit via MCMC, then R is perhaps the best you can do, provided someone has already written the code.
we use sage (http://www.sagemath.org/) which includes a lot of the common python numerics: numpy, scipy, matplotlib, networkx. it provides nice interfaces to R and gp/pari.
i also use mayavi2 for 3d plots (something i could never get R to do well under linux...). enthought have a lot of nice things for python and scientific computing. there's also pymc, which i've not used (i just write the MCMC code directly).
Regarding syntax, what I find most annoying is the lack of a standard way to do common things across different functions. For example, to ignore missing values in a dataset (denoted by NA), there are as many variants as there are functions: na.rm, na.omit, na.action, etc. So there's different syntax for doing the same thing in different parts of the language.
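For concreteness, a small sketch of the three mechanisms being discussed (toy data, nothing beyond base R and the stats package assumed):

    x <- c(1, NA, 3)
    d <- data.frame(x = x, y = c(10, 20, 30))
    mean(x, na.rm = TRUE)                      # na.rm: an argument to many summary functions
    na.omit(d)                                 # na.omit: a function that drops incomplete rows
    lm(y ~ x, data = d, na.action = na.omit)   # na.action: an argument/option for model fitting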
na.rm is an argument name, na.omit a function, and na.action an attribute. They are not the same thing.
Anyway, you're right with respect to the lack of a standard way. One thing I personally find most annoying is function naming: some functions use the dot convention (do.this), some use camel case (doThis), some use underscores (do_this), etc. And what is most annoying: this is even true for novel functions that were just introduced in recent releases.
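A few real names from base R and the default packages that show the mix:

    read.csv     # dot-separated
    colMeans     # camelCase
    seq_along    # underscore_separated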
Are useful and powerful the opposite of sucky to most people?
Not to me. Consider a parallel example:
Visual Basic (pre-.NET, think ~3.0) opened up GUI programming to a much, much broader audience (because Hypercard was made for a far less popular system). VB made things like embedding hugely powerful applications within your own and making remote object calls simple. It was the most popular programming language.
It was incredibly successful, useful, and powerful. None of that could change the fact that it fundamentally sucked though.
It sucked so much though that Microsoft killed it. They killed their most popular programming product ever. They killed the product that was used to make most of the applications for their platform, their office suite, their web server, their database server, etc. In the transition to .NET, they apparently felt its foundations were so fundamentally flawed that they had to redesign the language.
I agree with your thesis but I wouldn't classify classic VB as particularly powerful either. Usually when people say a language is powerful, they seem to mean that a small number of primitives can be combined to produce a large variety of structures or that brief code can accomplish a lot of work.
Maybe a better term for powerful would be "power to weight ratio".
You're right: its primitives sucked. However, VB let people do a huge amount of work without code: think the GUI designer, VBX/OCX controls, and OLE embedding. That's the power I was talking about, not a way to implement advanced algorithms. (People used freaking SQL databases to get passable data structures... ugh!) Most of the code I've seen people write was more related to that gluey crap than to advanced topics.
Side-note: Most of my work in the past year has been done in R.
Those weren't really parts of the language, though; they were parts of the platform. We inconsistently compare languages (e.g. Java) to platforms (e.g. the JVM), which is, I think, a lot of the reason we're so bad at comparing both.
"It sucked so much though that Microsoft killed it."
Hmmm, I prefer the explanation that they just lost their Raymond Chen style backwards compatibility religion. Note that they did the same with Windows right after the release of XP. Joel has a lot more to say about this: http://www.joelonsoftware.com/articles/APIWar.html
Perhaps you're right. Have they continued to eliminate features they didn't like in releases since 2002?
As far as I could tell, VB.NET wasn't an upgrade to VB6, it was a product introduced to smooth the transition to C#... like Lotus 1-2-3 shortcut compatibility in Excel. VB.NET simply couldn't do what I was using VB for.
It's like if you went to the hardware store and they told you that instead of selling hammers, they now sell screwdrivers. They're better because they're easier to get out, they hold better, etc. That's all fine and good... unless you've got a truck full of nails and the roofing contract calls for the shingles to be nailed in.
> As in Haskell and O’Caml, operators are just syntactic sugar for ordinary functions. Enclosing any operator in backticks lets you use it as if it were an ordinary function. For example, calling `+`(2, 3) returns 5.
This is awesome. This is probably one of those things that I should have known but didn't. It strikes me as being very useful in combination with the 'apply' functions.
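For instance, a quick sketch of what that combination looks like (plain base R assumed):

    `+`(2, 3)            # 5, same as 2 + 3
    Reduce(`+`, 1:10)    # 55: fold a vector with +
    sapply(1:5, `^`, 2)  # 1 4 9 16 25: extra arguments are passed on to the operator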
> It strikes me as being very useful in combination with the 'apply' functions.
It's very useful with higher-order functions in general. Even more so because most of the languages in which operators are simply (binary) infix function calls also default to curried functions[1], which you can easily partially apply.
Haskell also has the reverse operation (MLs probably have it as well) of being able to use a binary function as an operator: "a `foo` b" is equivalent to "foo a b", but sometimes reads much better.
To me it's a statistical calculator with beautiful graphing capabilities. Once the question becomes difficult enough to consider it programming, it's time to pull up numpy.
Anyone who thinks R sucks obviously hasn't used the ggplot2 package for it: http://had.co.nz/ggplot2/ !
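A minimal ggplot2 sketch, assuming only the package itself and the built-in mtcars data:

    library(ggplot2)
    ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
      geom_point() +
      geom_smooth(method = "lm")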
The neat thing about R is that it supports the functional paradigm, but it doesn't bash you on the head with it. My fellow programmers who are not familiar with lazy evaluation, continuations, list iterators (is that the right word? such as map / filter / fold) can still use it without feeling like they're missing an arm.
Higher-order functions, or "combinators" if you want to sound all math-y. They're not really list iterators, because it makes just as much sense to map or fold over trees, arrays, matrices, etc. Whether you need structure-specific versions like maplist, maptree, etc. is just an implementation detail.
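For what it's worth, base R ships these higher-order functions directly (a small sketch):

    Filter(function(x) x %% 2 == 0, 1:10)   # keep the even numbers
    Map(function(x, y) x + y, 1:3, 4:6)     # element-wise: list(5, 7, 9)
    Reduce(`*`, 1:5)                        # fold: 120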
I have never used R, but why (in the beginning) would you generate data in python and then plot in R, rather than just plot in python (which looks much nicer IMO)?
I remember looking at that matplotlib page for the first time and coming away thinking those graphs look a lot better and sharper than the default R ones. However, I found the reason for that is that the images on that page are quite high-resolution PNGs. If you make graphs at 200 dpi or more in R, they look equally good. The annoying thing, though, is that you have to adjust all the margins and character-size settings in R to plot at a higher resolution.
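Roughly the kind of adjustment meant here, a sketch assuming the png device and the built-in pressure dataset:

    png("plot.png", width = 6, height = 4, units = "in", res = 200)
    par(mar = c(4, 4, 2, 1), cex = 1.1)   # margins and text need retuning at higher resolution
    plot(pressure, type = "l", main = "Vapour pressure vs. temperature")
    dev.off()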
Given the existence of Incanter+Clojure plus the Leiningen project (similar to R's inbuilt package management) - how does R compete?
Especially considering primitive types support in Clojure - http://groups.google.com/group/clojure/browse_thread/thread/...
R's library support is far more complete. R has been the de facto standard for statistics research for over a decade, which means that someone has already written an R library to handle basically any type of statistical or probability application you care to imagine. Also, for much the same reasons, the documentation for R is much, much better.
My programming background is fairly extensive in non-functional languages. I work in finance, and the company I joined uses R for a considerable amount of model prototyping.
I'd really like to switch to Python+numpy/scipy, but I haven't been able to find an equivalent of a data.frame, or some numeric+string data structure that allows for easy slicing on both.
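For reference, the data.frame behaviour in question, sketched with made-up data (mixed numeric and string columns, sliceable on either):

    prices <- data.frame(ticker = c("AAPL", "MSFT", "GOOG"),
                         close  = c(170.1, 315.2, 134.7),
                         stringsAsFactors = FALSE)
    prices[prices$close > 150, ]              # slice rows on the numeric column
    prices[prices$ticker == "MSFT", "close"]  # slice on the string column, get numbers back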
I'm having the same problem. R is slow; yet R's plyr, reshape, and lattice packages are indispensable (ggplot2 wasn't quite mature enough when I wanted to learn it). Maybe something could be written around record/structured arrays. (I'm halfway inclined to try...)
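For instance, the kind of plyr idiom that's hard to give up (a sketch with toy data):

    library(plyr)
    sales <- data.frame(region = c("east", "east", "west"),
                        amount = c(10, 20, 5))
    ddply(sales, "region", summarise, total = sum(amount), n = length(amount))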