i've used R for several large projects. it sucks in several ways for the projects i worked on:
1. it is extremely slow for numerics. slower than MATLAB. slower than python (with numpy). after talking to several stats people, it seems pretty much everyone ends up writing most of their code in C when using R. (in contrast, i didn't find this necessary in python or MATLAB for similar projects.)
2. its syntax is quite clunky. want to concatenate two strings? paste(string_a, string_b, sep=''). radford neal has a series of blog posts on R's design flaws: http://radfordneal.wordpress.com/2008/09/21/design-flaws-in-...
3. uninformative error reporting. by default, the stack trace isn't printed. even if it is, often the errors don't really tell you what went wrong. (see the sketch after this list.)
i don't see any advantage to R over python. yield makes a nice replacement for the lazy evaluation of R.
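On the error-reporting point, a minimal sketch of the stock-R facilities that help surface the call stack (nothing beyond base R assumed):

    f <- function(x) g(x)
    g <- function(x) stop("something went wrong")
    f(1)                      # the message alone doesn't say which call failed
    traceback()               # print the call stack of the last uncaught error
    options(error = recover)  # or drop into a browser at the failing frame next time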
I agree with all of your points, yet I still find myself using R on a day to day basis instead of python. My reason for doing so is the large number of statistical packages that exist for R. I know that scipy exists, but it does not seem to have nearly the same coverage as CRAN + base libraries... Are there other python libraries I should look at?
it's true that CRAN+Bioconductor have a lot of coverage of statistics. if you need to fit a particular kind of model and it isn't fit via MCMC, then R is perhaps the best you can do, provided someone has already written the code.
we use sage (http://www.sagemath.org/) which includes a lot of the common python numerics: numpy, scipy, matplotlib, networkx. it provides nice interfaces to R and gp/pari.
i also use mayavi2 for 3d plots (something i could never get R to do well under linux...). enthought have a lot of nice things for python and scientific computing. there's also pymc, which i've not used (i just write the MCMC code directly).
Regarding syntax, what I find most annoying is the lack of a standard way to do common things across different functions. For example, to ignore missing values in a dataset (denoted by NA), there are as many variants as there are functions: na.rm, na.omit, na.action, etc. So there's different syntax for doing the same thing in different parts of the language.
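For concreteness, a small sketch of the three mechanisms being discussed (toy data, nothing beyond base R and the stats package assumed):

    x <- c(1, NA, 3)
    d <- data.frame(x = x, y = c(10, 20, 30))
    mean(x, na.rm = TRUE)                      # na.rm: an argument to many summary functions
    na.omit(d)                                 # na.omit: a function that drops incomplete rows
    lm(y ~ x, data = d, na.action = na.omit)   # na.action: an argument/option for model fitting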
na.rm is an argument name, na.omit a function, and na.action an attribute. They are not the same thing.
Anyway, you're right with respect to the lack of a standard way. One thing I personally find most annoying is function naming: some functions use the dot convention (do.this), some use camel case (doThis), some use underscores (do_this), etc. And what is most annoying: this is even true for novel functions that were just introduced in recent releases.
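A few real names from base R and the default packages that show the mix:

    read.csv     # dot-separated
    colMeans     # camelCase
    seq_along    # underscore_separated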
Are useful and powerful the opposite of sucky to most people?
Not to me. Consider a parallel example:
Visual Basic (pre-.NET, think ~3.0) opened up GUI programming to a much, much broader audience (because Hypercard was made for a far less popular system). VB made things like embedding hugely powerful applications within your own and making remote object calls simple. It was the most popular programming language.
It was incredibly successful, useful, and powerful. None of that could change the fact that it fundamentally sucked though.
It sucked so much though that Microsoft killed it. They killed their most popular programming product ever. They killed the product that was used to make most of the applications for their platform, their office suite, their web server, their database server, etc. In the transition to .NET, they apparently felt its foundations were so fundamentally flawed that they had to redesign the language.
I agree with your thesis but I wouldn't classify classic VB as particularly powerful either. Usually when people say a language is powerful, they seem to mean that a small number of primitives can be combined to produce a large variety of structures or that brief code can accomplish a lot of work.
Maybe a better term for powerful would be "power to weight ratio".
You're right: its primitives sucked. However, VB let people do a huge amount of work without code: think the GUI designer, VBX/OCX controls, and OLE embedding. That's the power I was talking about, not a way to implement advanced algorithms. (People used freaking SQL databases to get passable data structures... ugh!) Most of the code I've seen people write was more related to that gluey crap than to advanced topics.
Side-note: Most of my work in the past year has been done in R.
Those weren't really parts of the language, though; they were parts of the platform. We inconsistently compare languages (e.g. Java) to platforms (e.g. the JVM), which is, I think, a lot of the reason we're so bad at comparing both.
"It sucked so much though that Microsoft killed it."
Hmmm, I prefer the explanation that they just lost their Raymond Chen style backwards compatibility religion. Note that they did the same with Windows right after the release of XP. Joel has a lot more to say about this: http://www.joelonsoftware.com/articles/APIWar.html
Perhaps you're right. Have they continued to eliminate features they didn't like in releases since 2002?
As far as I could tell, VB.NET wasn't an upgrade to VB6, it was a product introduced to smooth the transition to C#... like Lotus 1-2-3 shortcut compatibility in Excel. VB.NET simply couldn't do what I was using VB for.
It's like if you went to the hardware store and they told you that instead of selling hammers, they now sell screwdrivers. They're better because they're easier to get out, they hold better, etc. That's all fine and good... unless you've got a truck full of nails and the roofing contract calls for the shingles to be nailed in.
> As in Haskell and O’Caml, operators are just syntactic sugar for ordinary functions. Enclosing any operator in backticks lets you use it as if it were an ordinary function. For example, calling `+`(2, 3) returns 5.
This is awesome. This is probably one of those things that I should have known but didn't. It strikes me as being very useful in combination with the 'apply' functions.
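For instance, a quick sketch of what that combination looks like (plain base R assumed):

    `+`(2, 3)            # 5, same as 2 + 3
    Reduce(`+`, 1:10)    # 55: fold a vector with +
    sapply(1:5, `^`, 2)  # 1 4 9 16 25: extra arguments are passed on to the operator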
> It strikes me as being very useful in combination with the 'apply' functions.
It's very useful with higher-order functions in general. Even more so because most of the languages in which operators are simply (binary) infix function calls also default to curried functions[1], which you can easily partially apply.
Haskell also has the reverse operation (MLs probably have it as well) of being able to use a binary function as an operator: "a `foo` b" is equivalent to "foo a b", but sometimes reads much better.
To me it's a statistical calculator with beautiful graphing capabilities. Once the question becomes difficult enough to consider it programming, it's time to pull up numpy.
Anyone who thinks R sucks obviously hasn't used the ggplot2 package for it: http://had.co.nz/ggplot2/ !
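A minimal ggplot2 sketch, assuming only the package itself and the built-in mtcars data:

    library(ggplot2)
    ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
      geom_point() +
      geom_smooth(method = "lm")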
The neat thing about R is that it supports the functional paradigm, but it doesn't bash you on the head with it. My fellow programmers who are not familiar with lazy evaluation, continuations, list iterators (is that the right word? such as map / filter / fold) can still use it without feeling like they're missing an arm.
Higher-order functions, or "combinators" if you want to sound all math-y. They're not really list iterators, because it makes just as much sense to map or fold over trees, arrays, matrices, etc. Whether you need structure-specific versions like maplist, maptree, etc. is just an implementation detail.
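For what it's worth, base R ships these higher-order functions directly (a small sketch):

    Filter(function(x) x %% 2 == 0, 1:10)   # keep the even numbers
    Map(function(x, y) x + y, 1:3, 4:6)     # element-wise: list(5, 7, 9)
    Reduce(`*`, 1:5)                        # fold: 120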
I have never used R, but why (in the beginning) would you generate data in python and then plot in R, rather than just plot in python (which looks much nicer IMO)?
I remember looking at that matplotlib page for the first time and coming away thinking those graphs look a lot better and sharper than the default R ones. However, I found the reason for that is that the images on that page are quite high-resolution PNGs. If you make graphs at 200 dpi or more in R, they look equally good. The annoying thing, though, is that you have to adjust all the margins and character-size settings in R to plot at a higher resolution.
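Roughly the kind of adjustment meant here, a sketch assuming the png device and the built-in pressure dataset:

    png("plot.png", width = 6, height = 4, units = "in", res = 200)
    par(mar = c(4, 4, 2, 1), cex = 1.1)   # margins and text need retuning at higher resolution
    plot(pressure, type = "l", main = "Vapour pressure vs. temperature")
    dev.off()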
Given the existence of Incanter+Clojure plus the Leiningen project (similar to R's inbuilt package management) - how does R compete?
Especially considering primitive types support in Clojure - http://groups.google.com/group/clojure/browse_thread/thread/...
R's library support is far more complete. R has been the de facto standard for statistics research for over a decade, which means that someone has already written an R library to handle basically any type of statistical or probability application you care to imagine. Also, for much the same reasons, the documentation for R is much, much better.
My programming background is fairly extensive in non-functional languages. I work in finance, and the company I joined uses R for a considerable amount of model prototyping.
I'd really like to switch to Python+numpy/scipy, but I haven't been able to find an equivalent of a data.frame, or some numeric+string data structure that allows for easy slicing on both.
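For reference, the data.frame behaviour in question, sketched with made-up data (mixed numeric and string columns, sliceable on either):

    prices <- data.frame(ticker = c("AAPL", "MSFT", "GOOG"),
                         close  = c(170.1, 315.2, 134.7),
                         stringsAsFactors = FALSE)
    prices[prices$close > 150, ]              # slice rows on the numeric column
    prices[prices$ticker == "MSFT", "close"]  # slice on the string column, get numbers back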
I'm having the same problem. R is slow; yet R's plyr, reshape, and lattice packages are indispensable (ggplot2 wasn't quite mature enough when I wanted to learn it). Maybe something could be written around record/structured arrays. (I'm halfway inclined to try...)
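For instance, the kind of plyr idiom that's hard to give up (a sketch with toy data):

    library(plyr)
    sales <- data.frame(region = c("east", "east", "west"),
                        amount = c(10, 20, 5))
    ddply(sales, "region", summarise, total = sum(amount), n = length(amount))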