
Comparison – R vs. Python: head to head data analysis - emre
https://www.dataquest.io/blog/python-vs-r/
======
mbreese
This is interesting, but not really an R vs. Python comparison. It's an R vs.
Pandas/Numpy comparison. For basic (or even advanced) stats, R wins hands
down. And it's really hard to beat ggplot. And CRAN is much better for finding
other statistical or data analysis packages.

But when you start having to massage the data in the language (database
lookups, integrating datasets, more complicated logic), Python is the better
"general-purpose" language. It is a pretty steep learning curve to grok the R
internal data representations and how things work.

The better part of this comparison, in my opinion, is how to perform similar
tasks in each language. It would be more beneficial to have a comparison of
here is where Python/Pandas is good, here is where R is better, and how to
switch between them. Another way of saying this is figuring out when something
is too hard in R and it's time to flip to Python for a while...

~~~
nicolapede
> And it's really hard to beat ggplot.

To be honest, matplotlib seems a good contender to me
([http://matplotlib.org/](http://matplotlib.org/)).

Also, what's wrong with comparing R to Pandas/Numpy ? They can only be used
from within Python, right?

Edit: just realised from another comment that Pandas/Numpy can be accessed
from R, too.

~~~
sweezyjeezy
"matplotlib seems a good contender to me"

I've waxed lyrical about Python all over this thread, but here you have to
give the medal to R. Matplotlib is one of my least favourite libraries to use,
been doing it for almost 2 years, and I still spend half my time buried in the
documentation trying to figure out how I'm supposed to move the legend
slightly to the right or whatever.

ggplot probably has slightly less flexibility overall (mpl is monolithic), but
for just doing easy things that you need 99% of the time, ggplot is king.

~~~
nrpprn
There is a ggplot clone in python. Also bokeh is starting to develop a grammar
of graphics interface. Then there is seaborn and mbplot. Lots of stuff besides
matplotlib

------
bigtunacan
R is certainly a unique language, but when it comes to statistics I haven't
seen anything else that compares. Often I see this R vs Python comparison
being made (not that this particular article has that slant) as a come drink
the Python kool-aid; it tastes better.

Yes; Python is a better general purpose language. It is inferior though when
it comes specifically to statistical analysis. Personally I don't even try to
use R as a general purpose language. I use it for data processing, statistics,
and static visualizations. If I want dynamic visualizations I process in R
then typically do a hand off to JavaScript and use D3.

Another clear advantage of R is that it is embedded into so many other tools.
Ruby, C++, Java, Postgres, SQL Server (2016); I'm sure there are others.

~~~
randomsearch
> R is certainly a unique language

I'd say R is a _terrible_ language. Its types are just really different from
every major programming language, and it's horrible for an experienced
programmer to use.

I totally agree that R has fantastic libraries, but I'd like to see people
focus on improving libraries for Python rather than sticking with R, which as
a language is less well-designed than Python.

[I use R for most of my stats, I also use Matlab and Python]

~~~
hadley
I think you're wrong. R is an excellent language, targeted specifically around
the problems you commonly see when doing data analysis. On the whole the
standard libraries aren't particularly good, but I think the language is good.

That said, the language is often taught poorly. Here's my attempt to do
better: [http://adv-r.had.co.nz](http://adv-r.had.co.nz)

~~~
roel_v
Well, time to bring out my favorite dead horse to beat:

    
    
       - http://stackoverflow.com/questions/1815606/rscript-determine-path-of-the-executing-script
       - http://stackoverflow.com/questions/3452086/getting-path-of-an-r-script
    

(where you already commented, so it's not like this is something new...)

I would say that any language that does not have a facility to get the path of
the current file, is not 'excellent' under the criteria an experienced
programmer would use for assessing it.

Now, I very well know that those criteria are different from what _scientists_
use, but still...

------
phillipamann
R is a wonderful language if you choose to get used to it. I love it. I've even
used R in production quality assurance to check for regressions in data (not
the statistical regressions). I see countless R posts where people try to
compare it to Python to find the one true language for working with data.
Article after article, there clearly isn't a winner. People like R and Python
for different reasons. I think it's actually quite intuitive to think about
everything in terms of vectors with R. I like the functional aspects of R. I
wish R was a bit faster but I am pretty sure the people who maintain R are
working on that. You can't beat the enormous library that R has.

~~~
baldfat
I also LOVE R. Plus the fact that Microsoft and other corporations are
supporting R will help more and more. With Hadley Wickham's universe it is a
great place to do all your work.

~~~
Mikeb85
Yup. R is supported by MS, Oracle, IBM and others, and companies like Twitter
and even the Python shop that is Google use it.

------
danso
I spent a few weeks a few months ago learning R. It's not a bad language, and
yes, the plotting is currently second-to-none, at least based on my limited
experience with matplotlib and seaborn.

There's scant few articles on going from Python to R...and I think that has
given me a lot of reason to hesitate. One of the big assets of R is Hadley
Wickham...the amount and variety of work he has contributed is prodigious (not
just ggplot2, but everything from data cleaning, web scraping, dev tools,
time-handling a la moment.js, and books). But that's not just evidence of how
generous and talented Wickham is, but how relatively little dev support there
is in R. If something breaks in ggplot2 -- or any of the many libraries he's
involved in, he's often the one to respond to the ticket. He's only one
person. There are many talented developers in R but it's not quite a deep
open-source ecosystem and community yet.

Also word-of-warning: ggplot2 (as of 2014[1]) is in maintenance mode and
Wickham is focused on ggvis, which will be a web visualization library. I
don't know if there has been much talk about non-Hadley-Wickham people taking
over ggplot2 and expanding it...it seems more that people are content to
follow him into ggvis, even though a static viz library is still very
valuable.

[1]
[https://groups.google.com/forum/#!topic/ggplot2/SSxt8B8QLfo/...](https://groups.google.com/forum/#!topic/ggplot2/SSxt8B8QLfo/discussion)

~~~
revorad
Hadley is actively working on ggplot2. In fact, he just tweeted a list of
improvements -
[https://twitter.com/hadleywickham/status/654283936755904512](https://twitter.com/hadleywickham/status/654283936755904512)

[https://github.com/hadley/ggplot2/blob/master/NEWS.md](https://github.com/hadley/ggplot2/blob/master/NEWS.md)

~~~
danso
Thanks...I didn't know that (though I had been paying attention to bug
fixes)...but my point exactly, he's prodigious, so maybe "maintenance mode" to
him is "major features every 3 months instead of 2" :).

Also worth pointing out, he's actively working on a new book for ggplot2,
which, AFAICT, he's providing for free (you just have to run the build tools)

[https://github.com/hadley/ggplot2-book](https://github.com/hadley/ggplot2-book)

I think if someone were to run an analysis of Wickham's Github activity, it
would produce a freakishly busy chart.

~~~
revorad
Agreed about Hadley's prolific work.

I used to work a lot with R many years ago. I was shocked to find how bad the
documentation was, and worse how rude and unfriendly the "community" of grumpy
professors was. I shudder to think of the horrible meanness towards beginners
asking questions on the mailing list.

I got so fed up I even wrote a book about R data visualisation. But this was
all just around the time ggplot2 came out. Unfortunately I stopped using R
soon after, but since then Hadley has single-handedly done more good for the
language than anyone else.

I don't know what the R community is like now, and whether people like Hadley
have made it friendlier, but it's clearly one reason Python is superior.

~~~
danso
I'm a late arrival to the language and have interacted with it almost
exclusively through StackOverflow and Github. I've been astonished at not just
how friendly people are, but how quickly I can get a helpful response to even
what I feel are pretty esoteric (and dumb) questions...again, one of the
problems of coming into R is that, because of the relatively small community,
there aren't as many references or easily Googlable answers compared to
Python...but getting answers to questions if you ask them is very easy, and I
think that's a credit to the community.

On the other hand, there seem to be a lot of useful libraries that haven't
been ported over to Github and aren't otherwise easily accessible beyond
CRAN...Many of them probably don't get as much exposure as they would if they
were more easily discoverable...and I honestly don't even know where, in those
cases, to start the bug reporting/patching process. That's obviously the fault
of my being spoiled by Github...but that's kind of the point, there's a bit
more friction in contributing to R than you might find in Python/Ruby/etc.

~~~
revorad
Yeah there's a lot more R stuff on SO now than when I was using it. The
mailing lists were more active so that's what I had to use to ask for help.

------
sweezyjeezy
This is just a series of incredibly generic operations on an already cleaned
dataset in csv format. In reality, you probably need to retrieve and clean the
dataset yourself from, say, a database, and you may well need to do
something non-standard with the data, which needs an external library with
good documentation. Python is better equipped in both regards. Not to mention,
if you're building this into any sort of product rather than just exploring, R
is a bad choice. Disclaimer, I learned R before Python, and won't go back.

~~~
The13thDoc
I agree. Once you incorporate the other necessary work and preparation, a
well-documented, object oriented language is a better way to go.

~~~
vegabook
I have to agree that Python is more powerful, and I am indeed doing more and
more in Python. Python was my first language, before R.

However when the dataset is medium sized (i.e.: fits into your computer's
memory / 2) R crushes Python ( _and_ Pandas) for the 80% of the time you'll be
spending wrangling. The reason is that R is vector-based from the ground up.
Pandas does everything that R does, but does it in a less-consistent, grafted-
on way, whereas the experienced R person who "thinks vectors" is way ahead of
the Python guy before the analysis has even started (i.e., most of the work).
I know both really well. I use Python when I want to "get (semi) serious"
production wise (I qualify with "semi" because if you're really serious about
production, you're probably going to go to Scala).

But when it comes to taking a big chunk of untidy data and bashing it around
till it's clean and cube-shaped, will parse, and has no obvious errors, R
is miles ahead of Python. R is where you do your discovering. Python can do it
too, but I would estimate the cognitive overhead as double.

By the way, that's why people who "think time series" all day long (i.e.,
vectors, not objects), and who want to implement their algos, not think CS,
will first typically build it in R, which is why CRAN beats Python all the
time and every time for off-the-shelf data analysis packages. Data people go
to R, computer-people go to Python (schematizing).

R is slow. That's its main problem. And that's saying something when comparing
it to Python! But the gem of vector-everything makes it a much more satisfying
language than imperative, OO, Python, when it comes to the world of data
first, code second.

Finally I'd add that Python 3.x is arguably distancing itself from the
pragmatism which data science requires, and 2.x provided, towards a world of
CS purity. It's not moving in a direction which is data science friendly. It's
moving towards a world of competition with Golang and Javascript, and Java
itself.

~~~
sweezyjeezy
Algo people use R because it's faster, nothing to do with being 'data people'.

I am a data person, and I have to deal with a lot of text in my job. If I had
to do it in R, I would quit.

Can you explain why you think it is easier to wrangle data in R? My experience
is the opposite.

~~~
vegabook
Do you mean they use _Python_ because it's faster? yes sure. But then, just
use scala. 10x faster again. With a REPL.

Perhaps I should clarify, I'm talking mainly time series and/or data which is
vectorizable. Python is better if you're scraping the web. If there's a lot of
if else going on. Ie imperative programming.

R's native functional aspects (all the apply family) and multilevel
vector/matrix hierarchical indexing is better built from the ground up for
large wrangling of multivariate datasets, in my opinion.

------
Mikeb85
The reason I like R - it just makes data exploration and analysis too damn
easy.

You've got R Studio, which is one of the best environments ever for exploring
data, visualisation, and it manages all your R packages, projects, and version
control effortlessly.

Then you've got the plethora of packages - if you're any of the following
fields: statistics, finance, economics, bioinformatics, and probably a few
others, there's packages that instantly make your life easier.

The environment is perfect for data exploration - it saves all the data in
your 'environment', allows you to define multiple environments, and your
project can be saved at any point, with all the global data intact.

If I want some extra speed, I can create C++ modules from within R Studio,
compile and link them, as easily as simply creating a new R script. Fortran is
a tiny bit more work, still easy enough however.

Want multicore or to spread tasks over a cluster? R has built in functions
that do that for you. As easy as calling mclapply, parApply, or clusterApply.
Heck, you can even write your function in another language, then R handles
applying that over however many cores you want.

Want to install and manage packages, update them, create them, etc...? All can
be done from R Studio's interface.

Knitr can create markdown/HTML/pdf/MS Word files from R markdown, or you can
simply compile everything to a 'notebook' style HTML page.

And all this is done incredibly easily, all from a single package (R Studio)
which itself is easy to get and install.

Oh yeah, visualisation, nothing really beats R.

And while there are quirks to the language, for non-programmers this isn't
really an obstacle, since they aren't already used to any particular paradigm.

As for Python, I'm sure it's great (I've used it a little), but I really don't
see how it can compare. R's entire environment is geared towards data analysis
and exploration, towards interfacing with the compiled languages most used for
HPC, and running tasks over the hardware you will most likely be using.

------
c3534l
I like Python better as a language, but Python's libraries take more work to
understand and the APIs aren't very unified. R is much more regular and the
documentation is better. Even complicated and obscure machine learning tasks
have good support in R. _BUT_ the performance for R can be very, very
annoying. Assignment is slow as all hell and it can often take work to figure
out how to rephrase complicated functions in a way that R can figure out how
to do efficiently. I think being much more functional than Python works well
for data. I mean the L in LISP stands for list! Visualizations are also easier
and more intuitive in R, too, IMO. Especially since half the time you can just
wrap some data in "plot" and R will figure out which one it should use.

I think the conclusion of the article is correct. R is more pleasant for
mathier type stuff, while Python is the better general-purpose language. If
your job involves showing people powerpoint presentations of the mathematical
analysis you've done, you'd probably want to use R. If, on the other hand,
you're prototyping data-driven applications, Python would probably be better.

That said, I really like Julia, but can't justify really diving into it at
this point. :\

~~~
baldfat
> prototyping data-driven applications, Python would probably be better

I would disagree. Python's libraries are really reimplementing R in Python
(Mainly Pandas). I find R to be very flexible and especially in the last 5
years with Hadley Wickham's libraries things are concise and very powerful.

Please look at dplyr and see how this new way of doing R works. Especially
with piping with %>%.
[https://cran.rstudio.com/web/packages/dplyr/vignettes/introd...](https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html)

Code in R can look like this (even if you don't code in R, I would expect
anyone can see what is happening), which is why I disagree that prototyping in
Python would be better:

    flights %>%
      group_by(year, month, day) %>%
      select(arr_delay, dep_delay) %>%
      summarise(
        arr = mean(arr_delay, na.rm = TRUE),
        dep = mean(dep_delay, na.rm = TRUE)) %>%
      filter(arr > 30 | dep > 30)

Python has .pipe but I find it strange it goes to the new line before the
items.

Python Code:

    >>> (df.pipe(h)
    ...    .pipe(g, arg1=a)
    ...    .pipe((f, 'arg2'), arg1=a, arg3=c)
    ... )

~~~
avdempsey
I find the following Pandas code pretty easy to read:

    
    
      (df
       .groupby(['a', 'b', 'c'], as_index=False)
       .agg({'d': 'sum', 'e': 'mean', 'f': 'std'})
       .assign(g=lambda x: x.a / x.c)
       .query("g > 0.05")
       .merge(df2, on='a'))
    

There are now methods in pandas to do pretty much anything, so you can chain
them together into one easy-to-read manipulation without lots of intermediate
variables.
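
For anyone who wants to run a chain like the one above, here is a minimal sketch on made-up data. The frames `df` and `df2`, their values, and the 0.05 threshold are illustrative assumptions, not anything taken from the article.

```python
import pandas as pd

# Toy data: column names mirror the snippet above, the values are made up.
df = pd.DataFrame({
    "a": [1, 1, 2, 2],
    "b": ["x", "x", "y", "y"],
    "c": [10, 10, 20, 20],
    "d": [1.0, 2.0, 3.0, 4.0],
    "e": [0.5, 1.5, 2.5, 3.5],
})
df2 = pd.DataFrame({"a": [1, 2], "label": ["first", "second"]})

result = (
    df
    .groupby(["a", "b", "c"], as_index=False)
    .agg({"d": "sum", "e": "mean"})   # one aggregation per column
    .assign(g=lambda x: x.a / x.c)    # derived column built from the group keys
    .query("g > 0.05")                # keep rows above the threshold
    .merge(df2, on="a")               # attach labels from the second frame
)
print(result)
```

Each step returns a new DataFrame, which is what lets the whole manipulation read top to bottom without intermediate variables.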

------
evanpw
If you only have time to learn one language, learn Python, because it's better
for non-statistical purposes (I don't think that's very controversial).

If you need cutting-edge or esoteric statistics, use R. If it exists, there is
an R implementation, but the major Python packages really only cover the most
popular techniques.

If neither of those apply, it's mostly a matter of taste which one you use,
and they interact pretty well with each other anyway.

~~~
blumkvist
R does not mean only esoteric statistics. You have many more utilities in the
R packages to diagnose and select models. Fitting a model is like 1% of the
work, diagnostic is the more important part and R has much more to offer than
Python ever will.

~~~
nrpprn
Statsmodels has tons of model diagnostic...and there is no R equivalent to
Pymc3 (stan has less capability and worse API)

------
acaloiar
I have always considered R the best tool for both simple and complex
analytics. But, it should not go unmentioned that the features responsible for
R's usability often manifest as poor performance. As a result, I have some
experience rewriting the underlying C code in other languages. What one finds
under the hood is not often pretty. It would be interesting to see a
performance comparison between Python and R.

~~~
pjmlp
Given that R folks are porting it to the JVM, I guess performance on the R
side will improve thanks to Hotspot and Graal/Truffle.

[http://www.renjin.org/](http://www.renjin.org/)

[http://www.oracle.com/technetwork/java/jvmls2013vitek-201352...](http://www.oracle.com/technetwork/java/jvmls2013vitek-2013524.pdf)

Then there is PyPy as well.

I also think they should probably add Julia and Wolfram/Mathematica to these
comparisons.

~~~
jtth
I would say they're both as limited as Python, Julia far more so. R's stats
packages get ported to Julia faster, though. Mathematica still can't do mixed
generalized linear modeling, and no other language (other than SAS and Stata)
has a package for analyzing simple effects within them.

~~~
pjmlp
Thanks for the overview, I don't use them. It is more my language geek side
speaking louder. :)

------
mojoe
The one thing that sometimes gets overlooked when people decide whether to use
R or Python is how robust the language and libraries are. I've programmed
professionally in both, and R is really bad for production environments. The
packages (and even language internals sometimes) break fairly often for
certain use cases, and doing regression testing on R is not as easy as Python.
If you're doing one-off analyses, R is great -- for anything else I'd
recommend Python/Pandas/Scikit.

~~~
vegabook
or Scala, Clojure, or indeed C.

R's great strength is _finding_ the interesting bits of the data. Testing the
Algo. Doing the R&D basically. Better than Python.

Once that's done, why stop at Python? If your game is production, Python will
do it, but others will do it so much better, faster, more efficiently.

~~~
CuriouslyC
One nice thing about Python is that you can make a piecewise transition from
Python -> C, as it is fairly trivial to wrap C code for use in Python. On the
other hand, Java's C interface system JNI is pretty much universally reviled.

~~~
tareef
The same can be said about R. Rcpp makes it super easy for you to drop right
into C++ for bits of code that need that level of performance.

------
ggrothendieck
For R: (1) instead of `sapply(nba, mean, na.rm = TRUE)` use `colMeans(nba,
na.rm = TRUE)`. (2) instead of `nba[, c("ast", "fg", "trb")]` use
`nba[c("ast", "fg", "trb")]`, (3) instead of `sum(is.na(col)) == 0` use
`!anyNA(col)`, (4) instead of `sample(1:nrow(nba), trainRowCount)` use
`sample(nrow(nba), trainRowCount)` and (5) instead of tons of code use
`library(XML); readHTMLTable(url, stringsAsFactors = FALSE)`
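
For readers on the pandas side, the first four idioms have rough counterparts. This is only a sketch on a made-up three-row frame: the `nba` values and the `train_row_count` name are assumptions for illustration, and the scraping one-liner is left out because it needs a live URL.

```python
import pandas as pd

# Made-up stand-in for the article's nba table.
nba = pd.DataFrame({
    "ast": [100.0, 200.0, None],
    "fg": [300.0, 400.0, 500.0],
    "trb": [50.0, 60.0, 70.0],
})

col_means = nba.mean(numeric_only=True)      # (1) column means, NaN skipped by default
subset = nba[["ast", "fg", "trb"]]           # (2) select columns by name
fg_complete = bool(nba["fg"].notna().all())  # (3) True when the column has no NaN
ast_complete = bool(nba["ast"].notna().all())
train_row_count = 2
train = nba.sample(n=train_row_count, random_state=1)  # (4) random rows, no replacement
```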

------
The13thDoc
The "cheat sheet" comparison between R and Python is helpful. The presentation
is well done.

The conclusions state what we already know: Python is object oriented; R is
functional.

The _Last Word_ appropriately tells us your opinion that Python is stronger
in more areas.

------
vegabook
Python's main problem is that it's moving in a CS direction and not a data
science direction.

The "weekend hack" that was Python, a philosophy carried into 2.x, made it a
supremely pragmatic language, which the data scientists love. They want to
think algorithms and maths. The language must not get in the way.

3.x is wanting to be serious. It wants to take on Golang, Javascript, Java. It
wants to be taken seriously. Enterprise and Web. There is nothing in 3.x for
data scientists other than the fig leaf of the @ operator. It's more
complicated to do simple stuff in 3.x. It's more robust from a theoretical
point of view, maybe, but it also imposes a cognitive overhead for those
people whose minds are already FULL of their algo problems and just want to
get from a -> b as easily as possible, without CS purity or implementation
elegance putting up barriers to pragmatism (I give you Unicode v Ascii,
print() v print, xrange v range, 01 v 1 (the first is an error in 3.x. Why
exactly?), focus on concurrency not raw parallelism, the list goes on).
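
On the `01` v `1` point: in Python 2 a leading zero marked an octal literal, so `010` meant 8, and Python 3 rejects the old form in favour of an explicit `0o` prefix. A small sketch that checks this from inside Python 3 (the helper name `parses` is made up for illustration):

```python
def parses(source: str) -> bool:
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<literal>", "eval")
        return True
    except SyntaxError:
        return False

old_style_ok = parses("01")     # literal with a leading zero
new_style_ok = parses("0o10")   # explicit octal prefix
octal_value = eval("0o10")      # value of the octal literal
print(old_style_ok, new_style_ok, octal_value)
```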

R wants to get things done, and is _vectors first_. Vectors are what big data
typically is all about (if not matrices and tensors). It's an order of
magnitude higher dimensionality in the default, canonical data structure.
Applies and indexing in R, vector-wise, feels natural. Numpy makes a good
effort, but must still operate in a scalar/OO world of its host language, and
inconsistencies inevitably creep in, even in Pandas.

As a final point, I'll suggest that R is much closer to the vectorised future,
and that even if it is tragically slow, it will train your mind in the first
steps towards "thinking parallel".

------
xname2
"data analysis" means different things in R and Python. In R, it's all kinds of
statistical analyses. In Python, it's basic statistical analysis plus data
mining stuff. There are too many statistical analyses that only exist in R.

------
acomjean
I work with biologists. R, which seems strange to me, they seem to take to. I
think some of it is Rstudio the ide, which shows variables in memory on the
side bar, you can click to see them. It makes everything really accessible for
those that aren't programmers. It seems to replace excel use for generating
plots.

I've grown to appreciate R, especially its plotting ability (ggplot).

~~~
mbreese
Rstudio is R for a lot of people. I'm a computational biologist in a group.
Our PI is trying to get the postdocs to learn R themselves, but it's an uphill
battle. I eventually warmed up to it - primarily for the plotting.

But a few weeks back he asked me how to do some kind of data sorting /
manipulation in R. My answer was that it was a 10 line Python script and I
gave him the code. Alas, he couldn't figure out how to save the script and run
it from a command-line.

You shouldn't underestimate how important Rstudio is to the popularity of R for
non-programmers.

------
falicon
Language comparisons are equiv. to religion comparisons...you aren't going to
find a universal answer or truth, it's an individual/faith sort of thing.

That being said - all the _serious_ math/data people I know love both R and
Python...R for the heavy math, Python for the simplicity, glue, and
organization.

------
zitterbewegung
This is not just interesting as a comparison; it's also interesting for people
who know R or Python and want to go from one to the other.

~~~
jtth
Kind of, but the R code is written a little oddly to my eye.

~~~
CJKinni
How so? As someone familiar with Python but not R, I've always been hesitant
to jump in. This code was very readable and made me think that it might be a
far more accessible language than I'd previously assumed.

~~~
geomark
One example in the section titled "Split into training and testing sets" would
be to use the createDataPartition() function from the caret package for
creating training and testing sets.

He says "In R, there are packages to make sampling simpler, but aren’t much
more concise than using the built-in sample function" but using caret is more
concise.

Added: Later in the section on random forests he says "With R, there are many
smaller packages containing individual algorithms, often with inconsistent
ways to access them." Which is why you want to use the caret package as it
makes accessing many machine learning packages consistent and easy.

------
fsiefken
It would be nice to compare JuliaStats and Clojure based Incanter with Python
Pandas/NumPy/SciPy.
[http://juliastats.github.io/](http://juliastats.github.io/)

------
willpearse
Very picky, but beware constantly using "set.seed" throughout your R scripts.
Always using the same random number is not necessarily helpful for stats, and
makes the R code look a lot trickier than it need be
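
The same caution carries over to Python. Here is a sketch using the standard library's `random` module (an illustration, not the article's code): re-seeding before every draw returns the same number each time, while seeding once gives a reproducible but varied stream. The function names are made up.

```python
import random

def draw_with_reseed(n, seed=1):
    """Re-seed before every draw: each value is identical."""
    values = []
    for _ in range(n):
        random.seed(seed)
        values.append(random.random())
    return values

def draw_seed_once(n, seed=1):
    """Seed a dedicated generator once: reproducible, but the draws differ."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

reseeded = draw_with_reseed(3)
seeded_once = draw_seed_once(3)
print(len(set(reseeded)))     # distinct values when re-seeding every time
print(len(set(seeded_once)))  # distinct values when seeding once
```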

------
wesm
I hope you all know that the people who have invested most in actually
building this software care the least about this discussion.

~~~
blumkvist
I see Hadley Wickham commenting here, so yeah...

~~~
danso
And now the creator of pandas -- whom you just replied to -- is here. It's
officially now a party :)

------
daveorzach
In manufacturing Minitab and JMP are used for data analysis (histograms,
control charts, DOE analysis, etc.) They are much easier to use and provide
helpful tutorials on the actual analysis.

What features or workflow does R or Pandas/Numpy offer to manufacturing that
Minitab & JMP can't?

~~~
dbbolton
R, Numpy, and Pandas are all FOSS. Probably not much of a practical concern,
but it might be preferable in some cases.

I don't know anything about Minitab/JMP scripting myself, but my understanding
is that R is generally the most intuitive of all the aforementioned (although
that would basically boil down to individual preference).

Here's a review including Minitab and R that might be of interest:
[http://www.prostatservices.com/statistical-consulting/articl...](http://www.prostatservices.com/statistical-consulting/articles-of-interest/a-review-of-the-top-five-statistical-software-systems)

------
andyjgarcia
The comparison is R to Python+pandas.

The equivalent comparison should be R+dplyr to Python+pandas.

Base R is quite verbose and convoluted compared to using dplyr. Likewise data
analysis in Python is painful compared to using pandas.

------
thebelal
The rvest implementation was the main thing that seemed like an R port of the
python implementation rather than best use of rvest.

An alternate (simpler) implementation of the rvest web scraping example is at
[https://gist.github.com/jimhester/01087e190618cc91a213](https://gist.github.com/jimhester/01087e190618cc91a213)

It would be even simpler but basketball-reference designs its tables for
humans rather than for easy scraping.

~~~
baldfat
>seemed like an R port of the python implementation

End of the github for rvest:

Inspirations

    
    
        Python: Robobrowser, beautiful soup.

------
xixi77
Really, syntax "nba.head(1)" is not any more "object-oriented" than "head(nba,
1)" \-- it's just syntax, and the R statement is in fact an application of R's
object system (there are several of them).

IMO, R's system is actually more powerful and intuitive -- e.g. it is fairly
straightforward to write a generic function dosomething(x,y) that would
dispatch specific code depending on classes of both x and y.

~~~
illumen
Single-dispatch generic functions are easy in python too:
[https://www.python.org/dev/peps/pep-0443/](https://www.python.org/dev/peps/pep-0443/)
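
A minimal sketch of what PEP 443 provides, using `functools.singledispatch` from the standard library. The function name `describe` and the handled types are made up for illustration. Dispatch is on the type of the first argument only, so this does not cover the two-argument `dosomething(x, y)` case mentioned above.

```python
from functools import singledispatch

@singledispatch
def describe(x):
    """Fallback used when no more specific implementation is registered."""
    return "something else"

@describe.register(int)
def _(x):
    return "an integer"

@describe.register(list)
def _(x):
    return "a list of %d items" % len(x)

print(describe(3))
print(describe([1, 2]))
print(describe("text"))
```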

~~~
xixi77
That's good to know, thanks :) Although, for single dispatch, the S3 system of
R is kinda hard to beat -- you just name your function print.myclass and you
are done :)

------
dekhn
In general, if I have to choose between two languages, one of which was
designed specifically for statistics, and one that was more general, I will
choose the more general one.

R's value is in the implementation of its libraries but there is no technical
reason a really OCD person couldn't implement such high quality of libraries
in Python.

------
vineet7kumar
It would be nice to also have some notes about performance of both the
languages for each of the tasks compared. I believe pandas would be faster due
to its implementation in C. The last time I checked R was an interpreted
language with its interpreter written in R.

~~~
hadley
And like pandas, many of the performance bottlenecks in R have been re-written
in C. See dplyr and data.table for packages that solve a similar problem to
pandas with similar speed (and for some scenarios they're actually faster!)

~~~
vineet7kumar
Looks interesting! Thanks for the information.

------
jkyle
Caret is a great package for a lot of utility functions and tuning in R. For
example, the sampling example can be done using Caret's createDataPartition
which maintains the relative distributions of the target classes and is more
'terse'.

    
    
        > library(caret)
        > data(iris)
        > idx <- caret::createDataPartition(iris$Species, p = 0.7, list = FALSE)
        > summary(iris$Species)
            setosa versicolor  virginica
                50         50         50
        > summary(iris[idx,]$Species)
            setosa versicolor  virginica
                35         35         35
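
The same idea can be sketched in plain Python. This is not caret's algorithm, just a hand-rolled stratified split that samples 70% within each class; `stratified_indices` is a made-up helper name and the labels stand in for `iris$Species`.

```python
import random
from collections import Counter, defaultdict

def stratified_indices(labels, p, seed=0):
    """Pick a fraction p of the row indices from each class separately."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    chosen = []
    for indices in by_class.values():
        k = round(len(indices) * p)   # rows to keep from this class
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)

# Stand-in for iris$Species: three classes with 50 rows each.
labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
idx = stratified_indices(labels, p=0.7)
counts = Counter(labels[i] for i in idx)
print(counts)
```

Sampling within each class is what keeps the class proportions of the training rows equal to those of the full data.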

------
hogu
IF you do your stuff in R, how do you move it into production? Or do you not
need to

~~~
Mikeb85
There are packages for that (web servers and such). Or you can call it from
Java/Python/whatever.

Most R tasks that people run just exit when they finish. A typical data science
task is: gather data, apply an operation over said data, analyse results.

------
Myrmornis

      python < world > csv
      R < csv > analysis

------
k8tte
I tried to help my wife, who uses R in school, only to get quickly lost. I also
attended a ~1 hour R course at university.

To me, R was a waste of time and I really don't understand why it's so popular
in academia. If you already have some programming knowledge, go with Python +
Scipy instead.

EDIT: R is even more useless without RStudio,
[http://www.rstudio.com/](http://www.rstudio.com/). And NO, don't go build a
website in R!

~~~
untothebreach
Maybe you didn't mean it this way, but to me your comment reads as, basically,
"I tried R for an hour and didn't immediately grok it, therefore it is a waste
of time."

That may not be what you meant, so I haven't downvoted yet, but it doesn't
seem to be an attitude that is helpful for the conversation.

~~~
k8tte
Thanks for your explanation. It seems my ability to communicate is getting
worse every year :-/.

What I meant to say was that I helped my wife during her master thesis (~6
months) with R, in addition to spending an hour in one of the classes.

Her teachers also were novices of both R and Excel, and we had several issues
with everything from how R processes csv:s, to just figuring out the proper
syntax to have R do what we wanted.

Sorry if my comment wasn't helpful; I was merely attempting to add some
reflections from personal experience to the discussion.

