But when you start having to massage the data in the language (database lookups, integrating datasets, more complicated logic), Python is the better "general-purpose" language. It is a pretty steep learning curve to grok the R internal data representations and how things work.
The better part of this comparison, in my opinion, is how to perform similar tasks in each language. It would be more useful to have a comparison that says: here is where Python/Pandas is good, here is where R is better, and here is how to switch between them. Another way of saying this is figuring out when something is too hard in R and it's time to flip to Python for a while...
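To make the "massaging the data" point above concrete, here is a minimal sketch of a database lookup merged with a second dataset, using only the Python standard library. The table name `sites`, the column names, and the sample values are all hypothetical, made up purely for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical lookup table: site_id -> region, stored in an in-memory SQLite DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (site_id INTEGER PRIMARY KEY, region TEXT)")
conn.executemany("INSERT INTO sites VALUES (?, ?)", [(1, "north"), (2, "south")])

# Hypothetical measurements, as they might arrive in a CSV file.
csv_text = "site_id,value\n1,10.0\n2,4.5\n1,7.5\n3,2.0\n"


def mean_by_region(conn, csv_file):
    """Join CSV rows against the DB lookup and average the values per region."""
    regions = dict(conn.execute("SELECT site_id, region FROM sites"))
    totals = {}
    for row in csv.DictReader(csv_file):
        # Sites missing from the lookup table are grouped under "unknown".
        region = regions.get(int(row["site_id"]), "unknown")
        total, count = totals.get(region, (0.0, 0))
        totals[region] = (total + float(row["value"]), count + 1)
    return {region: total / count for region, (total, count) in totals.items()}


result = mean_by_region(conn, io.StringIO(csv_text))
print(result)
```

Once the inputs come from several places like this, the general-purpose tooling (database drivers, file parsers, plain dictionaries) is what carries most of the work.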
We just released v1.4 with all kinds of new features, check it out: https://github.com/twosigma/beaker-notebook/releases/tag/1.4...
I didn't get it working on my Linux machine, but you will definitely see some pull requests once I have time to fiddle with it. The electron version is a nice idea but I would prefer better instructions for installing the normal version. "This script will do it all" is not always helpful.
We are working on better Linux packaging and distribution (see our issue tracker), but it is not easy to do it right, and it will take a while.
PRs very welcome!
To be honest, matplotlib seems a good contender to me (http://matplotlib.org/).
Also, what's wrong with comparing R to Pandas/NumPy? They can only be used from within Python, right?
Edit: just realised from another comment that Pandas/NumPy can be accessed from R, too.
> To be honest, matplotlib seems a good contender to me (http://matplotlib.org/).
They're quite different, though, and I can see why many prefer ggplot. It's a declarative, domain-specific language that implements a Tufte-inspired "grammar of graphics" (hence the gg- in the name; see section 1.3 of , and [2,3]) for very fast and convenient interactive plotting, whereas matplotlib is just a clone of MATLAB's procedural plotting API.
I've waxed lyrical about Python all over this thread, but here you have to give the medal to R. Matplotlib is one of my least favourite libraries to use, been doing it for almost 2 years, and I still spend half my time buried in the documentation trying to figure out how I'm supposed to move the legend slightly to the right or whatever.
ggplot probably has slightly less flexibility overall (mpl is monolithic), but for just doing easy things that you need 99% of the time, ggplot is king.
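For what it's worth, the "move the legend slightly to the right" case mentioned above is usually handled with the `bbox_to_anchor` argument of `Axes.legend`. A minimal sketch, with made-up data and an arbitrary offset of 1.02 in axes coordinates:

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], label="quadratic")
ax.plot([0, 1, 2], [0, 1, 2], label="linear")

# `loc` picks which corner of the legend box is pinned; `bbox_to_anchor`
# says where to pin it, in axes coordinates (1.0 is the right edge of the axes).
legend = ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1.0))

# bbox_inches="tight" keeps the legend from being cropped out of the image.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", bbox_inches="tight")
```

It works, but it does rather prove the point: this is two interacting arguments plus a save-time option for something that sounds like a one-word tweak.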
I am not familiar with ggplot, so I wasn't comparing them on ease of use. Looking at some ggplot examples, though, they looked like something you can do with matplotlib too, so I pointed that option out.
I was referring to the article title, which framed it as an R vs. Python comparison. Python is so much more in terms of a general purpose language than R is. Similarly, R is much more in terms of stats (built-in) than Python. I just thought that it would be more accurate to call the article an R vs. Pandas/NumPy comparison.
Even though both of them need an extra plotting library to make publication quality plots. Matplotlib isn't bad by any means - and it's gotten better over the years. But R/ggplot2 produces nicer plots (IMO). I'm not sure that I'd export data from Python into R just for ggplot, but I might.
I am not that familiar with ggplot myself, but I'll give it a go as soon as I have the chance.
On paper perhaps, less so in application. Sure, you can probably make matplotlib do everything ggplot does with enough work, but working with ggplot is just so much quicker, easier, and more fun.
And I say that as someone who does all his data analysis in Python.
You can try out my dev version  (rewrite branch). It will be nearly API compatible.
I tried the ggplot for python (ggplot.yhathq.com/) but eventually settled for seaborn (http://stanford.edu/~mwaskom/software/seaborn/). It is really quite easy to get most of the common plots that I wanted and hasn't let me down yet. The standard plots look SO much better than the standard plots of MPL without a lot of customization.
On the other hand, with ggplot, you can create a good enough chart in a couple of lines of code for almost any data.
BTW, there's a ggplot port for Python: http://ggplot.yhathq.com/
Matplotlib with some good definitions ends up providing much better results and nicer looking plots for B&W, contrary to what people normally think.
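As a rough illustration of what "good definitions" for B&W might look like (this is my own assumption about what is meant, not the commenter's actual setup), one option is to replace the default color cycle with a single color and let line style distinguish the series:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from cycler import cycler

# Assumed B&W defaults: every line is black, and successive lines
# step through different dash patterns instead of different colors.
bw_cycle = cycler(color=["black"]) * cycler(linestyle=["-", "--", ":", "-."])
plt.rc("axes", prop_cycle=bw_cycle)
plt.rc("lines", linewidth=1.5)

fig, ax = plt.subplots()
xs = [0, 1, 2]
lines = [
    ax.plot(xs, [slope * x for x in xs], label=f"slope {slope}")[0]
    for slope in (1, 2, 3)
]
ax.legend()
```

Set once at the top of a script, these defaults apply to every later plot, so the individual plotting calls stay unchanged.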
And yet, you go right on in the next sentence to make it a Python/Pandas/Numpy vs. R/everything in CRAN comparison. Libraries count.
Out of curiosity, why do you consider CRAN to be much better than PyPI?
And the last time I checked, "pip install numpy" could be quite a pain, especially if you needed to compile dependencies. RStudio makes it ridiculously easy to install R and add packages.
However - for all other types of packages, PyPI is obviously superior. The breadth of packages on PyPI is much better than CRAN.
It's about choosing the right tool for the job.