What? R has a ridiculously low learning curve. I remember the very first time I used R: I loaded up a dataset and had a histogram and a QQ plot within 5 minutes and 3 or 4 lines of code. Just figuring out which libraries I would need to do that in Python (and installing them) would probably take me at least 30 minutes.
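For comparison, here is a minimal stdlib-only Python sketch that computes the numbers behind a histogram and a normal QQ plot. It assumes no plotting library is installed, so it prints a crude text histogram instead of drawing anything. The function names and the (i - 0.5) / n plotting positions are my own choices for illustration, not what R uses internally.

```python
import random
from statistics import NormalDist


def histogram(values, bins=10):
    """Count how many values fall into each of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin instead of overflowing.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def qq_points(values):
    """Pair each sorted sample value with the matching standard-normal quantile.

    Uses plotting positions (i - 0.5) / n, one common convention.
    """
    n = len(values)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(values)))


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(10, 2) for _ in range(200)]  # hypothetical stand-in dataset
    for count in histogram(data):
        print("#" * count)
    pts = qq_points(data)
    print("first and last QQ points:", pts[0], pts[-1])
```

If the points returned by `qq_points` fall roughly on a straight line, the sample is roughly normal; that is the same reading you would do on R's `qqnorm` plot.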
I think it's still highly debatable whether Python is the way to go for general data science, especially if you're spending a lot of time analyzing data. R is more mature, but the tide is steadily turning in Python's direction.
Also, as the comments below highlight, actual statistical analysis is but a small part of the data pipeline. Python has great facilities for interacting with data stores/sources in addition to being a powerful tool to clean and munge data.
>When I think about it, in my data programming related work, I'd say about 5% is doing analysis or executing statistical routines, and 95% of my time is spent on finding, cleaning, and properly normalizing data.
I hope the post doesn't downplay the importance of R to statistical analysis; it is a mature language with a great community surrounding it. The toolset of a data scientist is probably one of the most heterogeneous out there and necessitates learning and using many different abstractions.
For such a new (and hard to define) subject, I think dialogue is crucial to constructively advance the field. I would love to hear suggestions from the HN community about how to train the next generation of data scientists, what aspiring data scientists want to learn (or find difficult to learn), and how we can build a great data community.
Data analysis is one of the most important steps in data science, so I think it's worth keeping R around.
When I need to write something more substantive and reusable though, Python forever.
When I think about it, in my data programming related work, I'd say about 5% is doing analysis or executing statistical routines, and 95% of my time is spent on finding, cleaning, and properly normalizing data. This applies whether you're a solo researcher or Facebook...think about it: Facebook is a pretty good website, but what it excels at more than just about anyone is being a platform to collect personal data in a way that...well, causes you to quite willingly give it your personal data.
There was a presentation where Peter Norvig pointed out a data routine that someone had implemented with a naive Bayes classifier, with a comment saying they'd think of something better...and years later, no one had noticed it was still a TODO. Norvig said something like "You don't have to be very smart when you have a lot of data."
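To show how little code a "placeholder" naive Bayes classifier needs, here is a minimal multinomial naive Bayes sketch with add-one (Laplace) smoothing, using only the standard library. This is not the code from Norvig's talk; the function names, whitespace tokenization, and toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (label, text) pairs. Returns the counts the classifier needs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for label, text in docs:
        label_counts[label] += 1
        words = text.lower().split()  # naive whitespace tokenization
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log posterior (up to a shared constant)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(label)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing so a word unseen for this label doesn't zero the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train([
    ("spam", "win money now"),
    ("spam", "win a prize"),
    ("ham", "meeting at noon"),
    ("ham", "lunch at noon tomorrow"),
])
print(classify(model, "win money"))
```

Summing log probabilities instead of multiplying raw probabilities avoids floating-point underflow on long documents, which is the usual design choice for this classifier.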
It's called data munging. There was a good short article about it on Dataspora a while back:
1. Your boss says something vague
2. You think very hard on how to move the needle
3. Where’s the data?
4. What’s in this dataset?
5. What’s all the f#$#$ crap in the data?
6. Clean the data
7. Run some off-the-shelf data mining algorithm
9. Productionize, act on the insight
10. Rinse, repeat
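Steps 5 and 6 are where most of that 95% goes. Here is a minimal stdlib-only sketch of what "clean the data" often looks like in practice. The `name, age, city` schema, the set of missing-value markers, and the 0 to 120 plausibility range are all hypothetical choices for illustration.

```python
import csv
import io

# Hypothetical markers that mean "no value" in this made-up dataset.
MISSING = {"", "na", "n/a", "null", "none", "-"}


def normalize(value):
    """Trim whitespace; map common missing-value markers to None."""
    value = (value or "").strip()
    return None if value.lower() in MISSING else value


def clean_rows(raw_csv):
    """Parse messy CSV text into normalized records, dropping rows that can't be fixed."""
    records = []
    reader = csv.DictReader(io.StringIO(raw_csv))
    for row in reader:
        # Normalize header names and cell values; skip overflow columns (key None).
        fields = {k.strip().lower(): normalize(v) for k, v in row.items() if k is not None}
        name, age, city = fields.get("name"), fields.get("age"), fields.get("city")
        if name is None or age is None:
            continue  # required field missing: drop the row
        try:
            age = int(float(age))  # accept both "42" and "42.0"
        except (ValueError, OverflowError):
            continue  # unparseable age such as "abc", "nan", or "inf": drop the row
        if not 0 <= age <= 120:
            continue  # implausible value: drop the row
        records.append({
            "name": name.title(),
            "age": age,
            "city": city.title() if city else None,
        })
    return records


raw = "Name , Age,City\n alice ,34, boston\nBob,N/A,Chicago\ncarol,29.0,\n"
print(clean_rows(raw))
```

Whether to drop, impute, or flag a bad row is a judgment call that depends on the question from step 1; dropping is just the simplest policy to show.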
Assuming they don't do job placement (I didn't see anything about that), this is a total rip-off and just some people trying to cash in on the data science fad.
Besides, they just listed a bunch of free resources that invalidate the need to go through them, so unless they offer job placement in an actual data-science position, why waste your money on this?
Also, 12 weeks reminds me of Peter Norvig's "Teach Yourself Programming in Ten Years", his rebuttal to the "Teach Yourself X in 21 Days" books.
While we have shown there are many online resources available for understanding data science, we’ve found that structured, in-person programs provide the best environment for a collaborative learning experience.
Scholarships are also offered for particularly promising applicants and for students in need of financial aid. Please reach out to us directly if you would like to know more about our financial assistance options: email@example.com
It is nice to present the wealth of resources available to anyone looking to build their skill set in this field.
However, as far as the course intro goes, it doesn't get me very excited, partly because of the way it is worded: Python as the language choice, using libraries that others have built. Would it not be better to teach the basics from scratch, and then acknowledge that there is a library that can handle them?
Another gripe I have here is the page formatting: the single-column layout on that blog does not suit the length of the text. I am reading http://www.moserware.com/2010/03/computing-your-skill.html at the moment, and apart from the nice pictures (who doesn't like shiny?), the layout is much cleaner and the information is more readable.
Finally, as has been mentioned in other posts, you ought to include a short summary of what 'data science' is (in relation to other terms used to describe statistical analysis of datasets) and where it comes from.
I hope they will release the course in online form (similar to Coursera) and offer it at a reasonable price (max 100 USD per person).
There are plenty of people outside the USA who would be willing to take this kind of course.
Little Data being:
That and basic statistics. The latter is probably the one developers (at least here) would spend most of their effort on.
"Little Data" in itself can take you a long way.
A really friendly place to start understanding it is Scott Murray's tutorial: http://alignedleft.com/tutorials/d3/
If it doesn't involve data, it's not science.
And frankly, even if data science is just statistics rebranded, anything that can get more people to take an interest in statistics is a good thing. If a hype-fueled new sticker is what it takes, then why not?
And I would argue that the qualification that the data be "numerical" in this definition is wrong. Plenty of statistical analysis involves categorical text data (e.g. sex, race, occupation...).
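As a concrete example of statistics on categorical text data, here is a minimal sketch of Pearson's chi-squared test of independence, computed from raw text labels with only the standard library. The helper names and the toy labels are hypothetical. In practice `scipy.stats.chi2_contingency` computes the statistic together with a p-value, though for 2x2 tables it applies Yates' continuity correction by default, so its statistic will differ from the uncorrected one below.

```python
from collections import Counter


def contingency_table(pairs):
    """Cross-tabulate (a, b) category pairs into a table of counts.

    pairs: a list (iterated more than once) of (category_a, category_b) tuples.
    Returns (table, row_labels, col_labels) with labels sorted alphabetically.
    """
    counts = Counter(pairs)
    rows = sorted({a for a, _ in pairs})
    cols = sorted({b for _, b in pairs})
    return [[counts[(r, c)] for c in cols] for r in rows], rows, cols


def chi_squared(table):
    """Uncorrected Pearson chi-squared statistic and degrees of freedom."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two categories were independent.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


pairs = [("f", "eng")] * 10 + [("f", "art")] * 20 + [("m", "eng")] * 20 + [("m", "art")] * 10
table, rows, cols = contingency_table(pairs)
print(rows, cols, table, chi_squared(table))
```

No step here treats the labels as numbers; the only arithmetic is on the counts, which is the point about categorical data.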
Quantitative business analyst just doesn't sound as wallet-opening.
Unfortunately, data science is not well defined (in part because it is a young field), but it draws extensively from computer science and is targeted toward data sets that differ from those typically used in 'traditional' statistics: huge, dirty data sets.
This debate has a similar flavor to physicists saying that chemistry is not a science because it's all just physics, and chemists saying that biology is not a science because it's all just chemistry. They are extensions, just like data science, and pretending that an expert statistician will be an expert data scientist is just not tenable.
There is room for something more.