But I stick to teaching Python because my priority is that students become general-purpose programmers/hackers, and data science is a very small part of what they might use programming for in their careers. Python's pandas is definitely not as elegant as R's tidyverse, where the conventions and data structures are more baked into the language. But numpy+pandas is manageable enough for novices (and just fine for experienced programmers). It's a small tradeoff for all the things you can do in Python, whether that is building and deploying a web app with Flask/Django/etc., integrating with AWS via boto, or more specific things like manipulating video with moviepy.
That said, obviously not every stats-focused person wants to or needs to become an all-purpose engineer. I think R has basically become the lingua franca for folks in poli sci and similar departments who want to do statistical analysis, and it seems to be a huge step up from doing things in SPSS or Excel.
Here is my opinion of the Python vs R "debate" in a nutshell. Programmers prefer Python because data science is just another hack for them to accomplish as they make their applications. Statisticians/data scientists prefer R because it is a standalone mathematical suite. I believe what drives a person to use R or Python is what kind of tool they want in their toolbox to accomplish the task at hand.
If you watch you will notice I do a really sneaky thing where it's possible that I'm comparing the two in order to show the folly of saying that it's R vs Python... :) If you have any questions, I'll do my best to be around! Cheers.
As a side note on Julia: the only reasons I haven't started using it daily (and I've tried it and liked much of what I've seen) are that the module system doesn't let you build modular production code and apps like Python does, and that R is just so good for what I use it for that there's no reason to look for a replacement.
Can you elaborate on this?
In my opinion this is where the difference lies: with Python you can easily build bigger systems with more flexibility, where your data science code is an important piece. R shines in statistics and you can also produce an end-to-end system with it, but only if you fit well within the R ecosystem and its constraints.
It is relatively easy if your app fits the whole loop. I would recommend starting with a Shiny tutorial: http://shiny.rstudio.com/tutorial/
But if you need e.g. some stream processing or more complex GUIs, then R might not be enough, I guess.
I mostly use MATLAB (ugh).
Dollars to donuts it won't go smoothly. Let me know how long it takes you to successfully install it.
"THEY do a lot of things wrong in the United States, but they do a lot of things right, too.
The NFL draft is at, or at least near, the top of the good list."
The problem it creates is that someone, somewhere, will eventually present some projection they put together quickly in R, and that will lead stakeholders who know nothing about software to start demanding that the solution scale indefinitely, compute in real time, or be used inside some kind of production system.
It doesn't mean the same can't happen w/ Python, but it at least offers some migration path to more scalable / hardened solutions if you're careful.
Never said that.
What I said is that R, nowadays, creates the same kinds of issues / friction that Excel created in companies back in the '90s: you end up w/ someone, in some corner of the company, creating solutions on top of it that can't scale. In comparison, if the original work is in Python, you usually have more alternatives when the prototype lands in the hands of a software eng. That is the main, and only, difference that matters, IMO.
As a tool for research, I agree it's completely fine.
Also, I too feel the pain of some algos, sadly, being available only in R (I had to write my own wrappers to estimate some models with an R script and export the coefficients to be used in Python's sklearn).
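For anyone wondering what the export-the-coefficients workaround can look like, here is a minimal sketch, not the commenter's actual wrapper. It assumes a plain linear model and a hypothetical two-column CSV (`term`, `estimate`) written on the R side, for example with `write.csv()`; the feature names and values are made up. It applies the coefficients in plain Python instead of loading them into an sklearn estimator, just to keep the example self-contained.

```python
import csv
import io

# Hypothetical export from R (made-up numbers); "(Intercept)" is the name
# R gives the intercept term in a fitted linear model.
R_COEF_CSV = """term,estimate
(Intercept),1.5
age,0.2
income,-0.01
"""


def load_coefficients(text):
    """Parse a term,estimate CSV into (intercept, {feature: weight})."""
    rows = csv.DictReader(io.StringIO(text))
    coefs = {row["term"]: float(row["estimate"]) for row in rows}
    intercept = coefs.pop("(Intercept)", 0.0)
    return intercept, coefs


def predict(intercept, coefs, record):
    """Linear predictor: intercept + sum of weight * feature value."""
    return intercept + sum(w * record[name] for name, w in coefs.items())


intercept, coefs = load_coefficients(R_COEF_CSV)
prediction = predict(intercept, coefs, {"age": 40, "income": 100})
print(prediction)
```

Anything beyond a linear predictor (link functions, factor encodings, interactions) would need extra handling that this sketch does not attempt.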
My comment from the thread: At the end of the day, for machine learning applications, your data is in a tabular format (in Python, a pandas data frame). Yes, Python has a few tricks like list comprehensions for speeding up data processing into that analyzable form. R has a few tricks for processing tabular data as well (e.g. dplyr).
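To make the list comprehension point concrete, here is a small sketch of going from messy records to a pandas data frame. The records and column names are invented for illustration, and the final grouping step is only roughly comparable to a dplyr `group_by()` plus `summarise()`.

```python
import pandas as pd

# Hypothetical raw records, e.g. parsed from a log file or an API response.
raw = [
    {"user": "a", "clicks": "3", "country": "US"},
    {"user": "b", "clicks": "", "country": "DE"},
    {"user": "c", "clicks": "7", "country": "US"},
]

# List comprehension: skip records with a missing click count and cast
# the remaining counts from strings to integers.
rows = [
    {"user": r["user"], "clicks": int(r["clicks"]), "country": r["country"]}
    for r in raw
    if r["clicks"]
]

# The cleaned rows are now in the tabular form the comment refers to.
df = pd.DataFrame(rows)

# Total clicks per country.
clicks_by_country = df.groupby("country")["clicks"].sum()
print(clicks_by_country)
```

Whether the cleaning happens in a comprehension before building the frame or with pandas methods afterwards is mostly a matter of taste; the comprehension version just shows the "trick" mentioned above.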
There are tradeoffs and the skill is finding which works best. Using a single programming language is a bad philosophy even for non-statistical developers.
R's advantage is singular and simple: it's not that the language is better, but rather that it has existed in its niche for much longer, so it has a much larger set of libraries in the stats space.
In other words, CRAN.
Spoiler: R and Python (pandas) are both great tools.
R = Domain Language
Python = General Purpose
Some things in R are worth it for many people.
Just because it's a company's blog doesn't make the info bad; it's my experience with that particular blog that makes me hesitant.
I think the consensus is to let the data scientists use whatever tools they feel most productive with.