Was not a great place to find data science jobs during my last job hunt, but that might just be personal experience.
Beyond that, wouldn't there be some bias - in that the jobs that don't get good candidates are going to have to be re-posted more often to maintain their visibility, leading to an overrepresentation of undesirable roles in the data?
My response is always use-the-most-appropriate-tool-for-the-job-dammit and don't pigeon-hole yourself into one language, since each language has its pros and cons. I am very tempted to write an HN autocomment bot at this point. (in Python instead of R, of course, since that's the most appropriate tool for this job)
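To make the repost effect concrete, here's a toy sketch. The postings are entirely made up, and treating identical (company, title, language) rows as reposts of one job is my own assumption, not something the article does:

```python
from collections import Counter

# Hypothetical scraped postings as (company, title, language) tuples.
# The hard-to-fill role gets reposted to stay visible.
postings = [
    ("Initech", "ML Engineer", "R"),
    ("Initech", "ML Engineer", "R"),
    ("Initech", "ML Engineer", "R"),
    ("Globex", "Data Scientist", "Python"),
    ("Hooli", "Data Analyst", "Python"),
]

# Counting every posting rewards reposts.
raw_counts = Counter(lang for _, _, lang in postings)

# Counting each distinct job once removes that inflation.
unique_counts = Counter(lang for _, _, lang in set(postings))

print("raw:", dict(raw_counts))
print("deduplicated:", dict(unique_counts))
```

In the raw count the one reposted job makes its language look like the leader; once duplicates are collapsed the ordering flips.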
The other advantage of Python is that as a scripting language it's very powerful for data wrangling and pre-processing, without needing all the boilerplate that e.g. C++ would require.
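As a rough illustration of the low-boilerplate point, here's a minimal sketch using only the standard library. The input data and column names are invented for the example:

```python
import csv
import io
from collections import defaultdict

# Made-up messy input: inconsistent case, stray whitespace, a missing value.
raw = """language,salary
 Python ,100
python,120
R,
r,90
"""

salaries = defaultdict(list)
for row in csv.DictReader(io.StringIO(raw)):
    lang = row["language"].strip().lower()  # normalize the label
    salary = row["salary"].strip()
    if not salary:  # skip rows with no salary recorded
        continue
    salaries[lang].append(float(salary))

# Average salary per normalized language label.
averages = {lang: sum(vals) / len(vals) for lang, vals in salaries.items()}
print(averages)
```

Cleaning, filtering and aggregating fit in about a dozen lines, with no type declarations or build step.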
I built a recommendation system in Spark earlier this year that used terabytes of input and ran on a 40-node EMR cluster, so it took less than half an hour. It wasn't trivial to make it run in a clustered environment, but it wasn't very hard either.
You generally want an interactive language, though, because there is an iterative cycle in prototyping models.
R and Python are probably the two with the most support/community materials around them - lots of tutorials, libraries, guides etc.
I prefer Python to "R" because it gives me better search results.
> only choices
Obviously not, according to the linked article. However, many people like Python and/or R. Perhaps you should find out why before dismissing their choices.
There will always be someone making your statement as well.
Wow, and to think some fools think the most inappropriate tool should be used.
I don't really like the idea of looking at search correlations to infer popularity in a given field. People who use R might have a higher level of education, resulting in searches that are narrower and more focused than those of Python users, or simply be more likely to call it "AI" or "statistical learning" or something of that nature. Or it may be that people learn a language or tool because it is useful in a field, whereas people who use a more popular tool might tangentially search for a given combination, even though they're not really in that field or doing any real kind of ML work.
The KDnuggets survey is self-selected, which is inherently an unsound method, but it's not like the Google search results are really a random sample, either.
There is also a tendency for jobs that require deep understanding of some area (say video encoding, cryptography, statistics) not to be announced as such; instead the programming languages and frameworks used in the surrounding system are announced as essential for the job. This means that if your core competency is in such areas, it can be hard to find the employers that need such skills, even if the skill is in demand.
That being said, for ad hoc work, with Scala you get the best of both worlds.
Python is certainly approachable for early programmers, and R for mathematicians/business folks. Sadly there are more early access libs here, but all the popular stuff is available in the languages above.
C++ is really efficient, but it's a bit unforgiving.
I believe the majority of data science jobs today involve doing only the first (to gather one-off insights) and dropping the ball on the second, since it involves a lot more software engineering, and those jobs are currently being filled by people without this skill.
I foresee this being a source of frustration in the coming years for companies that fell for the data science hype, once they figure out it takes significant investment and commitment to build intelligence into their systems, or even curate high-quality data to do it right in the first place.
For machine learning it was: Python, Java, R, C++, C. I have to wonder if that difference in ordering is actually real.
You wouldn't use sklearn as a C++ developer, for example. But you could totally use TensorFlow.
Since the hard part about ML is more about manipulating the data into something manageable, this makes Python well suited for the task.
If TensorFlow for .NET had come out sooner, I might have jumped on that as C# is a nice balance of performance vs ease of manipulating data, but now all my code snippets are Python so I'd need a large project before the benefits of changing over push me that way...
The short story is that Rust is awesome for handling data - especially unclean data. But Python, or really anything with a REPL, is still ideal for the exploratory phase using ipython notebook and whatnot.
Rust would be suitable for building machine learning algorithms but it's still immature in that area. For now you could perform the data processing in Rust and the ML in Python.
From my understanding, most ML work isn't really done at the algorithm level but at the data level, as in manipulating data and experimenting with a variety of existing implementations. Since we have so much experience already available in optimizing C and there's not that much actual low level code to write, it makes sense to stick to C.
IMHO, ML people are pretty pragmatic, possibly because ML itself is a mixed paradigm with people from different backgrounds, so they don't hold religious beliefs about programming languages compared to some folks with a pure CS background.
Rust is entirely practical and well suited for the task. And yes, people use Rust.
What I am objecting to here are those annoying cool kids trying to sell their flashy green solutions, like 'A new machine learning library written in Rust', to other people because 'yeah, Rust is a better programming language than C++'.
That is what I call 'religious belief': people who don't really care what problems they solve, and just naively assume they are better because the language they are using is different.
I don't think people are saying that their solutions are better just because they're in Rust. Maybe somewhere someone is, but the majority of what I see doesn't really sound like that.
The Rust ecosystem is still very immature.
Give it a few years.
edited for typo.
Legit 3 lines
import rpy2.robjects as robjects
r_source = robjects.r['source']  # handle to R's source(); call it with a script path to run that script
print('r script finished running')
Before clicking on the link, I thought it would be Python, R, or Octave.
If the author or an editor of the article reads this: It would help a lot if you could move the graph legends to the left.
For the person who downvoted me, I wasn't saying R is better than Python. I was just saying that if you have just R, you're probably not doing data wrangling.
What this post is completely failing to capture is the exceedingly high value work in machine learning versus the typical work any skilled undergrad can do.
I also don't understand what you mean by "exceedingly high value work". For both production as well as research (ICML/NIPS/CVPR) the languages used are in most cases Python/C++/Lua.
Also Stats PhDs (who I believe are the sole users of R) aren't typically hired into Machine Learning roles.
R is okay for wrangling tabular data, applying statistical models for hypothesis testing and generating pretty charts for papers. But it is not suitable for state-of-the-art Audio, NLP, Vision or Reinforcement Learning.
FWIW, I used R for my undergrad, and still use R for personal projects afterwards. (with modern R, dplyr/ggplot2 are an order of magnitude easier to use, in my opinion, than the Python equivalents)
This is very different than trying to recognize street signs.
I've been using it for several years to build pipelines