But yeah, you're right, nobody wants to do reporting because it's tedious and doesn't pay particularly well. And really, most suits are just looking for data that fits the story they want to tell, so the quality of the analysis that many "data scientists" do is irrelevant. This isn't the case at every company, but I've seen it happen enough to know it's not a rare occurrence.
I started in industry as "an applied or pragmatic statistician", that is, someone trained in social science research with a strong quantitative methodology bias. As I went along I added focus group moderation, in-depth interviewing, competitive analysis, ROI analysis and strategy consulting... so I started calling what I do "Research-based Consulting."
But that label doesn't seem to quite capture building taxonomies and text indexing systems or doing latent semantic analysis. Nor does "Research-based Consulting" capture teaching myself web development in order to create data-focused web applications. And what about all the database work that I do in operational systems? Or how do I fit in things like managing and validating data collection and aggregation systems that track prices for ~10K SKUs across multiple retail websites, combine them in a weighted algorithm that reflects my client's business priorities, and drive thousands of automated transactions every day?
So even though I came from a background with a lot of grad-level statistical training, and even at one point somewhat identified as a statistician, it feels like current definitions of "data scientist" capture more of what I actually do. So I have come to be at peace with the term.
I totally agree with the points in the article about a multi-disciplinary team. I would love to recruit people who are better than me at each sub-discipline and figure out how to help them work together.
Data science isn't a perfect label but it seems to be currently defined in a way that fits pretty well with the mix of things that I do - so I am willing to use it.
A data scientist is a statistician who lives in San Francisco.
Part of the problem is that data science doesn't have nearly the same formalism in its definition that statistics does. What's the difference between BI analysts, Data Miners, Data Analysts, Data Scientists, etc.? The tools used to arrive at conclusions (R vs. Python vs. SAS vs. Tableau/Excel/SPSS) don't seem like a good way of differentiating the roles.
A more useful discriminator would be the area where the statistics is applied (BI vs. biostatistics, for instance), the depth and complexity of the statistical algorithms used, and whether the main use is statistical inference or prediction (machine learning doesn't seem to focus on inference a whole lot, for example).
I agree wholeheartedly. It's not just data science, but all science that requires good statistics.
EDIT: IANADS, but data science seems tightly intertwined with statistics, to such a degree that I've had to double-check the difference multiple times. (Seems to be mostly terminology-based, tbh.)
It's worth reading the president of the American Statistical Association's take on the stats - data science divide as well.
And then toward the middle of it, it basically said data science is a marriage between stat and comp sci.
Congrats, I did comp sci as an undergraduate.
I think it's weak at best. Data science is a jack of all trades and master of none.
I disagree that it's just CS and stats. I know it might be pedantic, but it's also ML, which involves math that goes a bit beyond stats. Math is either right or wrong. With stats you can kind of bend things so that it's close enough.
Some may ask what the difference is. This article doesn't address it, but from what I've gathered, a neural network will categorize stuff for you, likewise KNN, but you won't gain insight into WHY it is categorized that way, and this is where statistics can tell you why. From lurking in the subreddit /r/statistic, Bayesian methods will tell you why but a NN will not.
You still need statistics. It's just that this is a new field and many people don't have the depth to grasp what's important.
It's like hyping up a NoSQL database, promising many things, and getting people to adopt it. Eventually they'll realize that it's just broken promises and they're stuck with it. In this case, the industry can just get smarter and have a better idea of what it really needs.
Because they could not translate their sense of entitlement to actual results.
> pure statisticians often scoff at the hype surrounding the rise of data scientists in the industry
> some statisticians simply have no interest in carrying out scientific methods for business-oriented data science
Statisticians are often too careful. They let tests decide if they should continue on a certain path. Machine learning researchers run blindfolded and trust cross-validation. The latter, though reckless, gets more impressive results.
You can perfectly well be a data scientist coming from a statistics or physics background. Adapt to it and use your knowledge to your advantage. You can't keep calling yourself a statistician and own data science at the same time. Start automating yourselves, like the rest of us are.
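For anyone who hasn't seen the "trust cross-validation" approach mentioned above, here is a rough sketch of k-fold cross-validation in plain Python. The function names (`k_fold_indices`, `fit_line`, `cross_val_mse`) and the choice of a straight-line model are made up for illustration, not taken from the article or any library.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def cross_val_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the training indices, then score on the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        fold_mse = sum((ys[i] - (a + b * xs[i])) ** 2 for i in held_out) / len(held_out)
        scores.append(fold_mse)
    return sum(scores) / len(scores)

if __name__ == "__main__":
    xs = [float(x) for x in range(20)]
    ys = [2.0 * x + 1.0 for x in xs]  # toy data with no noise
    print(cross_val_mse(xs, ys))
```

The point is that the model is judged only on data it did not see during fitting, rather than on a significance test.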
It says "72 hours of new video uploaded to YouTube every day".
Actually, "300 hours of video are uploaded to YouTube every minute".
The source of the infographic got the "minute" part right. (And probably the matter of 72-->300 is growth since the original infographic was produced... Web citations are hard!)
That's simply an incredible amount of data. The storage for all that must be huge.
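To put a rough number on it, here is a back-of-the-envelope sketch. The 300 hours per minute figure is the one quoted above; the 1 GB per hour of stored video is purely a placeholder assumption (real storage depends on resolution, codecs and how many transcoded copies are kept), and the function names are made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60

def daily_upload_hours(hours_per_minute):
    """Hours of video uploaded per day at a given per-minute rate."""
    return hours_per_minute * MINUTES_PER_DAY

def daily_storage_tb(hours_per_minute, gb_per_hour):
    """Terabytes added per day; gb_per_hour is an assumed average, not a known figure."""
    return daily_upload_hours(hours_per_minute) * gb_per_hour / 1000

if __name__ == "__main__":
    hours = daily_upload_hours(300)
    print(hours, "hours of video per day")
    # ASSUMPTION: 1 GB per hour of video, chosen only to make the arithmetic easy.
    print(daily_storage_tb(300, gb_per_hour=1.0), "TB per day under that assumption")
```

At 300 hours per minute that works out to 432,000 hours of new video per day, or about 49 years of footage uploaded every day.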