
This is interesting, but not really an R vs. Python comparison. It's an R vs. Pandas/Numpy comparison. For basic (or even advanced) stats, R wins hands down. And it's really hard to beat ggplot. And CRAN is much better for finding other statistical or data analysis packages.

But when you start having to massage the data in the language (database lookups, integrating datasets, more complicated logic), Python is the better "general-purpose" language. It is a pretty steep learning curve to grok the R internal data representations and how things work.
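As a rough illustration of the "database lookups, integrating datasets" kind of work, here is a minimal sketch using only Python's standard library. The `samples` table, the column names, and the data are all made up for the example; a real setup would point at an actual database and file.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export of measurements (stands in for a real file).
MEASUREMENTS_CSV = """sample_id,value
s1,4.2
s2,3.9
s3,5.1
"""


def build_lookup_db():
    """Create an in-memory SQLite database with a made-up lookup table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (sample_id TEXT PRIMARY KEY, site TEXT)")
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?)",
        [("s1", "north"), ("s2", "south")],  # s3 is deliberately missing
    )
    return conn


def annotate(csv_text, conn):
    """Attach the site from the database to each CSV row (None if unknown)."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        hit = conn.execute(
            "SELECT site FROM samples WHERE sample_id = ?",
            (rec["sample_id"],),
        ).fetchone()
        rows.append(
            {
                "sample_id": rec["sample_id"],
                "value": float(rec["value"]),
                "site": hit[0] if hit else None,
            }
        )
    return rows


conn = build_lookup_db()
annotated = annotate(MEASUREMENTS_CSV, conn)
for row in annotated:
    print(row)
```

The same thing is possible in R, of course; the point is only that this kind of glue code needs nothing beyond what ships with Python.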

The better part of this comparison, in my opinion, is how to perform similar tasks in each language. It would be more beneficial to have a comparison of here is where Python/Pandas is good, here is where R is better, and how to switch between them. Another way of saying this is figuring out when something is too hard in R and it's time to flip to Python for a while...

totally agree and that's why we made Beaker: http://beakernotebook.com/

you can code in multiple languages in your notebook, and they can all communicate, making it easy to go from Python to R to JavaScript, seamlessly.

we just released v1.4 with all kinds of new features, check it out: https://github.com/twosigma/beaker-notebook/releases/tag/1.4...

I tried to install this the other day.

I didn't get it working on my Linux machine, but you will definitely see some pull requests once I have time to fiddle with it. The electron version is a nice idea but I would prefer better instructions for installing the normal version. "This script will do it all" is not always helpful.

Thanks for the report. Yeah, it is not easy to install on Linux unless you use the Docker version. There are many dependencies and PPAs required in the script because it does everything.

We are working on better Linux packaging and distribution (see our issue tracker), but it is not easy to do it right, and it will take a while.

PRs very welcome!

FYI - I tried one of the Mac all in one downloads, and it looks promising. However, all I get are status messages saying that it is waiting for Python or R to initialize...

Thanks. We don't have an all-in-one download; you have to install Python or R separately. But if you already have them, it should just work as long as they are in your PATH. Is that path set up by .bash_profile? Did you install the required R packages? Do you have IPython (not just Python)? We can probably debug this better by email or as a GitHub issue than in this forum.

Sorry, I meant the Electron version...

OK well it loads the backends the same way. Please raise by email or github.

> And it's really hard to beat ggplot.

To be honest, matplotlib seems a good contender to me (http://matplotlib.org/).

Also, what's wrong with comparing R to Pandas/Numpy? They can only be used from within Python, right?

Edit: just realised from another comment that Pandas/Numpy can be accessed from R, too.

> > And it's really hard to beat ggplot.

> To be honest, matplotlib seems a good contender to me (http://matplotlib.org/).

They're quite different, though, and I can see why many prefer ggplot. It's a declarative, domain-specific language that implements a Tufte-inspired "grammar of graphics" (hence the gg- in the name; see section 1.3 of [1], and [2,3]) for very fast and convenient interactive plotting, whereas matplotlib is just a clone of MATLAB's procedural plotting API.

[1] http://www.amazon.com/ggplot2-Elegant-Graphics-Data-Analysis...

[2] http://www.amazon.com/The-Grammar-Graphics-Statistics-Comput...

[3] http://vita.had.co.nz/papers/layered-grammar.html

"matplotlib seems a good contender to me"

I've waxed lyrical about Python all over this thread, but here you have to give the medal to R. Matplotlib is one of my least favourite libraries to use, been doing it for almost 2 years, and I still spend half my time buried in the documentation trying to figure out how I'm supposed to move the legend slightly to the right or whatever.
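For what it's worth, the legend case usually comes down to two arguments of `Axes.legend`. A minimal sketch, with made-up data, and the 1.02 offset picked arbitrarily for illustration:

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], label="quadratic")
ax.plot([0, 1, 2], [0, 1, 2], label="linear")

# loc names which corner of the legend box gets pinned;
# bbox_to_anchor says where that corner goes, in axes coordinates.
# x = 1.02 puts the legend just outside the right edge of the axes.
legend = ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1.0))

# bbox_inches="tight" grows the saved area so the legend is not cut off.
buf = io.BytesIO()
fig.savefig(buf, format="png", bbox_inches="tight")
```

Nudging the first number in `bbox_to_anchor` moves the legend left or right. None of this is obvious from the function name, which is rather the point being made above.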

ggplot probably has slightly less flexibility overall (mpl is monolithic), but for just doing easy things that you need 99% of the time, ggplot is king.

There is a ggplot clone in Python. Also, bokeh is starting to develop a grammar of graphics interface. Then there is seaborn and mbplot. Lots of stuff besides matplotlib.

I must give you that after a few years of using it, I still have to look up documentation for elementary things.

I am not familiar with ggplot, so I wasn't comparing them on ease of use. But the ggplot examples I looked at seemed like something you can do with matplotlib too, so I pointed that option out.

I couldn't agree more - the API seems very confusing and the examples provided are shitty in my opinion

> what's wrong with comparing R to Pandas/Numpy

Absolutely nothing.

I was referring to the article title, which called it an R vs. Python comparison. Python is so much more in terms of a general purpose language than R is. Similarly, R is much more in terms of stats (built-in) than Python. I just thought that it would be more accurate to call the article an R vs. Pandas/NumPy comparison.

Both of them need an extra plotting library to make publication-quality plots, though. Matplotlib isn't bad by any means - and it's gotten better over the years. But R/ggplot2 produces nicer plots (IMO). I'm not sure that I'd export data from Python into R just for ggplot, but I might.

Thanks for the clarification. I am sorry I took your comment the wrong way.

I am not that familiar with ggplot myself, but I'll give it a go as soon as I have the chance.

> matplotlib seems a good contender to me

On paper perhaps, less so in application. Sure, you can probably make matplotlib do everything ggplot does with enough work, but working with ggplot is just so much quicker, easier and more fun.

And I say that as someone who does all his data analysis in Python.

I have rewritten the Python ggplot to put it on par with ggplot2.

You can try out my dev version [1] (rewrite branch). It will be nearly API compatible.

[1] https://github.com/has2k1/ggplot

Please write a blog post when you are done?! This will be huge :)

Even regular R plotting is still far easier and more intuitive than matplotlib, not just ggplot.

I completely agree. ggplot is the only reason why I sometimes use R.

I don't have a lot of experience with either, but I was close to really digging in and learning R just for the ease of use of ggplot.

I tried the ggplot for python (ggplot.yhathq.com/) but eventually settled for seaborn (http://stanford.edu/~mwaskom/software/seaborn/). It is really quite easy to get most of the common plots that I wanted and hasn't let me down yet. The standard plots look SO much better than the standard plots of MPL without a lot of customization.

ggplot for Python is almost done. There is an active dev branch.

matplotlib allows you to create almost any chart you want. However, it is very low level.

On the other hand, with ggplot, you can create a good enough chart in a couple of lines of code for almost any data.

BTW, there's a ggplot port for Python: http://ggplot.yhathq.com/

Matplotlib can produce high quality plots. But it requires lots of code, and hours of digging around the API docs and tweaking subclasses.

Happily, a Python port of ggplot is underway [0], although it's still very much a work in progress.

[0] https://github.com/yhat/ggplot/

...with stalled progress. Right now I'd rather inline R code (with %%R in IPython Notebook) and use the real ggplot2.

There's a dev branch that has been actively developed

Well, for scientists wanting to publish, ggplot is quite impractical. Most of the time we have to publish in B&W magazines, and ggplot simply lacks the capabilities to do so properly (for instance, B&W filling patterns).

Matplotlib with some good definitions ends up providing much better results and nicer looking plots for B&W, contrary to what people normally think.
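To make the B&W filling patterns concrete, here is a minimal sketch using matplotlib's `hatch` argument. The group names and numbers are made up, and the two hatch styles are just one possible choice:

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt

# Made-up data for two series across three groups.
groups = ["A", "B", "C"]
control = [3, 5, 2]
treated = [4, 6, 3]
positions = list(range(len(groups)))
width = 0.4

fig, ax = plt.subplots()

# White fill, black outline, and a hatch pattern per series,
# so the bars stay distinguishable without any colour.
bars_control = ax.bar(
    [p - width / 2 for p in positions], control, width,
    color="white", edgecolor="black", hatch="//", label="control",
)
bars_treated = ax.bar(
    [p + width / 2 for p in positions], treated, width,
    color="white", edgecolor="black", hatch="..", label="treated",
)

ax.set_xticks(positions)
ax.set_xticklabels(groups)
ax.legend()

buf = io.BytesIO()
fig.savefig(buf, format="png")
```

Repeating the hatch character (for example `"////"`) makes the pattern denser, which helps when the figure is shrunk to a journal column.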

... and I remembered why I don't use ggplot at all, thanks. After lots and lots of plots done with R, I was starting to feel a bit weird reading the comments.

> It's an R vs. Pandas/Numpy comparison.

And yet, you go right on in the next sentence to make it a Python/Pandas/Numpy vs. R/everything in CRAN comparison. Libraries count.

mbreese's point was not that that is wrong or misguided, just that it was happening.

R has the equivalent of pandas/numpy/scipy built into the language (for the most used features at least), but that doesn't make much of a difference because anyone who wants to use these tools will do a quick "pip install" to grab them (which is pretty fast with the new Wheels system).

Out of curiosity, why do you consider CRAN to be much better than PyPI?

I'm only thinking about CRAN > PyPI in terms of statistical packages. CRAN is where new statistical analysis techniques / packages are initially published. If you're lucky they might get ported to Python after the fact. I didn't even mention Bioconductor, which is another beast entirely. There isn't an equivalent of Bioconductor for Python at all.

And the last time I checked, "pip install numpy" could be quite a pain, especially if you needed to compile dependencies. RStudio makes it ridiculously easy to install R and add packages.

However - for all other types of packages, PyPI is obviously superior. The breadth of packages on PyPI is much better than CRAN.

It's about choosing the right tool for the job.

the best way to get the entire numpy/scipy/numba package is anaconda[1]

[1] https://www.continuum.io/downloads
