I also think it gets something right - it does feel different to use Python and R, and the reason may be rooted in how these languages arrived at data science. Python, as the article points out repeatedly, is a general programming language that scientists liked using for numerical computing, so slowly it acquired a billion libraries for data analysis, stats, machine learning, and so forth. R was created by scientists and statisticians specifically to do stats and analysis, but in order to be useful, it needed to acquire the full capabilities of a programming language.
What feels most natural to you often depends on what direction you come from. If you're a programmer taking on data science, you may gravitate toward Python. If you're a statistician getting deeper into code, R may feel more natural. It would behoove you to learn both.
What on earth is wrong with that? I suppose that for a few people on HN, this might be a bit repetitive. But otherwise, I'd recommend this article as a relatively non-combative piece explaining the different languages, especially for someone getting started.
These tools are grafted onto R - but seem to have a completely different design philosophy. I actually don't know why they're in R and not Python or C++ or whatever other language - but they form a set that is very easy to work with and produce results really quickly (especially in combination with RStudio).
So the design principles behind R (or I guess the S language) kind of become irrelevant.
Guido has explicitly stated that he does not want Python to be "more lispy" e.g. in regards to lambdas (asterisk). Thus I've seen many people even at, say, Stanford, Harvard, and Cambridge going back to R from Python. Sometimes there does not exist a language that best suits a workflow, and a DSL works better. That is where lispy languages hold an advantage.
Use the right tool for the job, imho, but I fucking hate people that mix the two within a project intended for wide public release. Worst of both worlds, again imho.
(Asterisk) apparently functional data structures such as iterators and generators are OK though. Wtf guido
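For what it's worth, the iterator/generator side of that footnote looks like this; a minimal sketch of the lazy, functional-style constructs Python did keep even while keeping lambda limited to a single expression:

```python
import itertools

# lambda is deliberately limited to one expression, but generators and
# iterators cover many of the functional-style use cases Guido did keep.
def fib():
    """Infinite Fibonacci stream: lazy iteration, no list built up front."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Generator expression: squares computed lazily, then materialized.
squares = list(x * x for x in range(5))
first_five = list(itertools.islice(fib(), 5))
print(squares)     # [0, 1, 4, 9, 16]
print(first_five)  # [0, 1, 1, 2, 3]
```

Whether that counts as "lispy enough" is exactly the argument above.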
R is good for tabular data and Python is good for text/image/nontabular data. And there's nothing wrong with knowing and using both languages.
Likewise, the world will not end if you use Python pandas for tabular manipulation or various bespoke R packages for nontabular manipulation.
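To make that concrete, here's a minimal pandas sketch (assuming pandas is installed; the table and column names are made up) of the kind of tabular work either language handles fine:

```python
import pandas as pd

# Toy table: a group-by/aggregate, the bread and butter of tabular work.
df = pd.DataFrame({
    "species":   ["setosa", "setosa", "virginica", "virginica"],
    "petal_len": [1.0, 2.0, 5.0, 6.0],
})
means = df.groupby("species")["petal_len"].mean()
print(means["setosa"])     # 1.5
print(means["virginica"])  # 5.5
```

The R equivalent is a comparable one-liner with dplyr's `group_by` plus `summarise`, which is roughly why the comment treats this as a wash.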
This isn't the battle that people should be fighting. It's not even a religious argument like web development stacks, where a language can eke out better benchmarks. And as others note, this very article concludes that they both have their advantages.
No kidding. If you want to be a "data scientist" (whatever that means), it's good to have a scripting language (Perl, Python, Ruby), a math language (Matlab/Octave, R), and a fast language (probably C). You can torture any one of these to fill the others' roles, but usually it's easier to use the best tool for each job.
There are several times I've come across something in R or Matlab that I want to do in Python, and it's easier to port over code/processes if you have an awareness of both.
I'm a thousand times better at Python than R/Matlab, but being familiar with them has helped me a lot.
Since this is a data scientist thread, n=1
On the other hand, R and Python (not to forget LaTeX) are easy to connect, so there is no need to choose only one of them. You can call R from Lisp also, of course.
Everyone I know who's used both prefers Pandas to R for tabular data.
Any time someone blames their tools for their own inadequacies, show them this video of Kelly Slater surfing better on an overturned table than most of us can surf on a 7 foot three fin board: https://m.youtube.com/watch?v=XQ4owd3yQ_4. Up your game instead!
Edit: Hacker News doesn't use that part of Markdown
Some of our other FOSS luminaries have chosen a different interaction model. Sometimes that other approach to technical project leadership is touted as necessary. I'm sure Hadley and Wes suffer their fair share of fools, and they generally seem to do so with kindness.
The hell does this even mean?
I'm guessing it means clean your data first; then process your cleaned data; and if you're doing that, why not try another language.
I think the author has a forest and trees issue here. He's definitely missing the point of Python for his use case.
Maybe R would work, but I'm familiar with Python. So Python stayed out of my way
I have a lot of experience with Python doing all sorts of programming. This was my first R program and I don't regret it. The libraries in R (ggplot2) for making pretty graphs are much better than anything I could find for Python.
In Python, you can use Altair, and in R you can use the vegalite package. Note also that R's ggvis uses vega under the hood.
For static images to put into a PDF, maybe it doesn't matter what you use, but when you see the ease of creating interactive data exploration graphs on a web server using d3.js and its ecosystem, you will be pleasantly surprised.
I really like this approach, actually. Taking advantage of the strengths of both languages.
But you get the point: this article has no point.
Use both because the real tool that you are using for data analysis is the computer, not the programming language. R and Python are both just parts of the toolset.
Nowadays there are lots of ways to combine the two, from Rpy2 to Orange3 to Jupyter and the Beaker Notebook. Notably, the last two let you use Groovy, Java, Scala, and a host of other languages as well. Apache Taverna also plays in this space of integrating multiple tools with different strengths to do a job.
R will likely never be eclipsed by anything because it has such a broad and deep collection of statistical libraries. But Python won't go away because it is a great tool for general purpose computing and even hardcore stats heads have a lot of general purpose computing problems to deal with.
It is sad to read R code that copies files, gets data from S3 buckets, runs SQL queries, and so on. So much of it is crudely hacked together, and even the libraries that support this are shoddily built. The best of both worlds is to use Python for pre- and post-processing, but R for the stats libraries (CRAN, BioConductor).
For lots of S3 wrangling the best tool is a Java library called JetS3t, and using a language like Groovy or Scala makes it easy to tame. And Groovy is integrated deeply into Jenkins, which has evolved beyond a CI tool into a general-purpose dashboard for managing and running "jobs". Works great for big data stuff that is not purely Map/Reduce.
Beaker Notebook is leading the charge by integrating seamless conversion of data frames between languages so that you can write a script in two or three languages at the same time, building on the strengths of each one.
If you stick with just one language then expect the next generation of data scientists to leap far beyond you in a few years. A sea change is coming.
Neither Python nor R runs on the JVM, so if you end up using Java, Scala, Kotlin, etc., then you've decided to open that JVM can of worms, which is another huge pile of tradeoffs.
> JetS3t, and using a language like Groovy or Scala makes it easy to tame. And Groovy is integrated deeply into Jenkins which has evolved beyond a CI tool into a general purpose dashboard
If you end up there, know that only a subset of Apache Groovy is used by Jenkins, e.g. Groovy collections methods aren't supported. Each step along the "native Python or R" -> "Java on JVM" -> "Scala or Groovy" -> "Jenkins as dashboard" decision process entails some cost-benefit tradeoffs which need to be assessed.
Only after they learn a boring language first, either R or Python.
Don't bet your money on Julia, it's only at 0.6 so the API ain't even stable yet.
Devs promise no changes to the language when it hits 1.0.
At the same time, they also have a lot of weaknesses, most of which are summarized by the Julia benchmarks (https://julialang.org/benchmarks/). You can criticize these particular benchmarks, but similar patterns emerge in lots of other benchmarks.
R was never meant to do the heavy lifting it's doing today. Ihaka sort of lamented this fact for a while, and then got ignored as people went on to use it anyway.
Sure, you can wrap things around low-level C/C++/Fortran in either language, but eventually if you find yourself getting into nitty-gritty stuff, the computation and/or memory use of R and Python becomes a problem. It also complicates a task to rely on juggling two platforms at the same time.
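To make the "wrapping low-level C" point concrete, Python's stdlib ctypes is about the lightest version of it; a minimal sketch, assuming a POSIX system where libm can be found:

```python
import ctypes
import ctypes.util

# Locate and load the C math library. On Linux find_library("m")
# resolves to libm; falling back to CDLL(None) searches the symbols
# already loaded into the running process.
path = ctypes.util.find_library("m")
libm = ctypes.CDLL(path) if path else ctypes.CDLL(None)

# Declare the C signature so ctypes marshals doubles correctly.
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(9.0))  # 3.0
```

The catch the comment is making: the moment your hot loop can't be pushed down into a call like this, you're stuck juggling two platforms, which is the gap Julia aims at.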
Julia is new, but it reminds me a lot of R in its early stages. I started using R when it was in beta because it offered something new, and Julia has a similar feel at the moment. Maybe Julia will die away but it doesn't seem that way to me at the moment. I've seen lots of prospects come and go, and none of them had the same traction as Julia.
I guess my point is if a student asked me, sure, I'd recommend they prioritize R or Python first, but I would also explain Julia to them and recommend they become familiar with that as well.
If you are too afraid to actually analyze a situation and give your opinion, then just don't write about it and spare us all the time it takes us to read it.
I have no clue where he's getting this.
R has 3-4 ways to make a class, btw (S3, S4, and Reference Classes in base R, plus R6 as a package). The code base isn't cleaner. Most packages that need speed are coded in faster languages. So R is glue code for packages.
The code base is decent overall but I think Python is much better.
> Python does everything any language can do
I want it to preemptively stop processes the way Erlang can, but it can't. So this is wrong.
> The Python world has been trying to catch up lately by working with existing IDEs like Eclipse or Visual Studio.
Python has an RStudio equivalent: it's Rodeo.... this guy. He mentions it later on for some reason, which contradicts his previous statement.
I think the article is an unorganized brain dump. Maybe he just needs to reorganize his thoughts.
> I have no clue where he's getting this.
The original S language from Bell Labs (and the commercial version S-PLUS) used dynamic scoping (like Emacs Lisp).
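The distinction is easy to demo. Python, like modern R, is lexically scoped: a function's free variables resolve in the scope where the function was defined, whereas under dynamic scoping (old S, Emacs Lisp) they would resolve in the caller's environment. A sketch in Python:

```python
x = "global"

def make_reader():
    x = "lexical"
    def read():
        return x  # resolved in make_reader's scope, fixed at definition time
    return read

def caller():
    x = "caller"  # a dynamically scoped language would return this instead
    return make_reader()()

print(caller())  # lexical
```

This is the change Ihaka and Gentleman made when building R on S's syntax, and it's presumably what the article garbled.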
And Spyder is another alternative.