There's a large, active community of engineers who specialize in data, whose job is to give data scientists the technical means to perform their analyses. I know these people exist because I'm one of them, I work with them, and I've met them at meetups and conferences. I don't know why the author doesn't think these types of engineers exist. Not all of us who code want to work with the web.
> If you read the recruiting propaganda of data science and algorithm development departments in the valley, you might be convinced that the relationship between data scientists and engineers is highly collaborative, organic, and creative. Just like peas and carrots.
Almost every data team I've worked with is structured this way. I work daily with data scientists. I have a data scientist sitting to my right, two data scientists sitting across from me. Our teams are highly integrated and I can't imagine it working any other way. If the teams the author is familiar with don't operate in this manner, then I can see why he'd think the endeavor is hopeless.
I also disagree with the author's conclusion. The data scientist's job is to analyze and interpret data. They should not be spending any time thinking about how to get that data. They should not be concerned about where the data is coming from. The more time scientists have to spend thinking about ETL, the less time they have for what they were trained to do: statistical analysis.
The data we use comes from relational databases and document stores operated by different departments, external APIs and third-party services, Salesforce, server log files, etc. A stats PhD does not have the training to gather this data themselves.
In terms of a hybrid scientist/engineer role, I don't know many software engineers who are also good at stochastic calculus or ensemble learning. Likewise, I don't know many data scientists who are also comfortable writing cron jobs to retrieve external API data or who can diagnose server problems.
"In November 1997, C.F. Jeff Wu gave the inaugural lecture entitled "Statistics = Data Science?" for his appointment to the H. C. Carver Professorship at the University of Michigan. In this lecture, he characterized statistical work as a trilogy of data collection, data modeling and analysis, and decision making. In his conclusion, he initiated the modern, non-computer science, usage of the term "data science" and advocated that statistics be renamed data science and statisticians data scientists."
From the same article, a quote from Nate Silver:
"I think data-scientist is a sexed up term for a statistician....Statistics is a branch of science. Data scientist is slightly redundant in some way and people shouldn’t berate the term statistician."
If your skillset differs from a statistician's, then calling yourself a data scientist is not going to be a differentiating title in common parlance.