My experience doing data science at small companies that can't afford to hire more than 1 person for the role is that it is so much more than just building models or doing statistics.
You have to:
1. Build APIs and work with developers to get predictive models integrated into the rest of the software stack
2. Know how to add logging, auditing, and monitoring, containerize services, write web scrapers, clean data(!!), write SQL scripts, build dashboards, use BI tools, etc.
3. Do some basic descriptive stats, some basic inferential stats, some predictive modeling, work on time-series data, sometimes apply survival analysis, etc. (Python/R/Excel who cares)
4. Set up data pipelines and CI/CD to automate all this crap
5. Try to unpack vague high-level requirements along the lines of "Hey do you think we could use our data to build an 'AI' to do this instead of manually doing it" and then come up with a combination of software / statistical models that performs at least as well as humans at the task.
6. Work with non-technical business users and be able to translate this back to technical requirements.
Hey, if all you do all day is "build models" then that sounds like a very cushy DS job you have. It's definitely not been my experience. I would describe it more like a combination of software engineering, statistics, and business analysis. That's why it pays higher than just statistics. But this is just my experience.
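To make items 1 and 2 in the list above a bit more concrete, here is a minimal sketch of what "wrapping a model for the developers" can look like. Everything in it is hypothetical: the `WEIGHTS` / `BIAS` "model" is a placeholder for a real trained model, and `handle_predict` is just a plain function that a web framework route could call. The point is only that validation and logging end up being most of the code.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("predict_api")

# Hypothetical stand-in for a trained model: a fixed linear score.
WEIGHTS = {"age": 0.02, "visits": 0.1}
BIAS = -1.0


def predict(features: dict) -> float:
    """Score one record with the placeholder linear model."""
    return BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)


def handle_predict(body: str) -> tuple[int, str]:
    """Turn a JSON request body into an (HTTP status, JSON response) pair."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        log.warning("rejected request: body is not valid JSON")
        return 400, json.dumps({"error": "invalid JSON"})

    if not isinstance(payload, dict):
        log.warning("rejected request: body is not a JSON object")
        return 400, json.dumps({"error": "expected a JSON object"})

    # Every feature the model needs must be present and numeric
    # (bool is excluded explicitly because it is a subclass of int).
    invalid = [
        name for name in WEIGHTS
        if isinstance(payload.get(name), bool)
        or not isinstance(payload.get(name), (int, float))
    ]
    if invalid:
        log.warning("rejected request: missing or non-numeric fields %s", invalid)
        return 422, json.dumps(
            {"error": "missing or non-numeric fields", "fields": invalid}
        )

    score = predict(payload)
    # Log inputs and output so predictions can be audited later.
    log.info(
        "scored request: features=%s score=%.3f",
        {name: payload[name] for name in WEIGHTS},
        score,
    )
    return 200, json.dumps({"score": score})


if __name__ == "__main__":
    print(handle_predict('{"age": 50, "visits": 10}'))
    print(handle_predict('{"age": 50}'))
    print(handle_predict("not json"))
```

Keeping the handler a pure function of the request body is one design choice among many; it just makes it easy to unit test without spinning up a server.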
> The data science world may reject me and my lack of both experience and a credential above a bachelors degree
More likely the data science world will reject him because he is so confident about a field he has so little experience in or knowledge of.
Data scientist is a profession rather than the name of an academic field. So a data scientist's job is to solve practical problems. That involves a lot more than class assignments, and in some cases involves using machine learning to maximize predictive accuracy (because common ML models like gradient boosting capture interactions and non-linearities in a richer way than the GLMs the author is familiar with).
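A toy illustration of the non-linearity half of that claim, as a sketch under heavy assumptions: the data is a made-up noiseless `y = x^2`, the "GLM" is a plain least-squares line, and the boosting is a hand-rolled squared-error version with depth-1 stumps, not any real library's implementation. Stumps on a single feature only show the non-linearity part; capturing interactions between features would need deeper trees.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def fit_stump(xs, residuals):
    """Best single-split regression stump under squared error."""
    best = None
    # Skip the largest x so the right-hand leaf is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left)
        sse += sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    return best[1:]


def boost(xs, ys, rounds=50, learning_rate=0.3):
    """Gradient boosting for squared loss: each stump fits the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, lmean, rmean = fit_stump(xs, residuals)
        stumps.append((threshold, lmean, rmean))
        preds = [
            p + learning_rate * (lmean if x <= threshold else rmean)
            for x, p in zip(xs, preds)
        ]
    return preds, stumps


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Made-up data: a symmetric parabola that a straight line cannot follow.
xs = [i / 20 - 1 for i in range(41)]
ys = [x * x for x in xs]

intercept, slope = fit_line(xs, ys)
linear_mse = mse(ys, [intercept + slope * x for x in xs])

boosted_preds, _ = boost(xs, ys)
boosted_mse = mse(ys, boosted_preds)

print(f"linear fit MSE:     {linear_mse:.4f} (slope {slope:.4f})")
print(f"boosted stumps MSE: {boosted_mse:.4f}")
```

This only compares training error on clean data, so it says nothing about overfitting or about whether the extra accuracy is worth the loss of interpretable parameters, which is the actual disagreement in this thread.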
Their argument, "that's a garbage model because we can't reasonably interpret underlying parameters," puts their personal criteria above what is needed to solve some problems.
They can blame it on only having a bachelor's degree. But the real problem is the belief that a bachelor's degree taught them everything there is to know, and that those in the DS field are ~ idiots who got lucky enough to be paid more.
You're basically doing 3 jobs for the price of one: Software Engineer, Data Engineer, Data Scientist.
Sure, you'll be a jack of all trades, as far as data goes, but it'll be at the cost of some specialization.
I'm probably gonna get a lot of sh!t for this post - probably from data [x] people that are in that exact position themselves, but the above description is exactly why I'd aim for larger companies with somewhat established analytics / data / ML teams or offices. You get to focus on the important stuff, instead of juggling ten balls at the same time.
(And it's not only in the field of data science. Some of the traditional SE positions I see at startups or small companies look absolutely grotesque - basically the whole IT and Dev. department baked into one job)
You're basically being arbitrarily restricted to learning and enjoying exactly one thing when it would often make more sense in context to become involved in: Customer relations, systems administration, management, software engineering, data science, etc.
Sure, you'll become really good at that one thing, but it'll be at the cost of personal growth and job satisfaction.
> I'm probably gonna get a lot of sh!t for this post...
I mean, yeah. You've basically lampooned anybody who enjoys working in ill-defined cross-disciplinary circumstances as having "[not an] ounce of self-worth".
It sounds like, from your perspective, your field is "the important stuff" and other fields are just balls to be juggled. There's nothing wrong with that, but lots of people don't think that way. To some people, the important stuff is anything that makes their customers happy. To others, it's anything that helps them learn.
And let's dispense with the notion that "doing 3 jobs for the price of one" is an accurate description of having broad rather than narrow responsibilities. One comes at the cost of the other. If you're an equally capable specialist and generalist, and you're capable of genuinely performing those 3 jobs at once, then if you were to specialize you'd be performing the work of 3 average specialists, and you'd be in the same boat as before.
Do what you're best at and try to get the best possible compensation for it, monetary or experiential. It's as simple as that.
You're kind of saying that no one should work for a startup ever. At small companies you have to do many things. As you said, if you're a software engineer there aren't dedicated front-end engineers or DevOps. Marketing teams don't have content vs. growth vs. performance vs. brand vs. email marketers. You might be the first salesperson, which means no sales ops support, no account manager for ongoing relationships, etc.
There are trade-offs, of course! There are trade-offs to anything. Some people value working on many aspects of a company. Some people find understanding more than their narrow field to be interesting and rewarding and, you know, self-worth-y!
So I think it might be helpful to step back and consider that not everyone has the same priorities, experiences, interests, or definitions of happiness and self-worth as you do. And that's okay.
So on one hand, you can't build any models without the work of the engineers, but on the other the model building is "the important stuff"?
Maybe it's just me, but I enjoy working on all aspects of the data pipeline.
Specializing in physics as a whole is way too broad. But is specializing in front-end software development too broad?
Once you have learned the fundamentals of computer science and its associated fields (networking and systems engineering mainly), the difference between doing back-end and front-end work is not that large.
The problem with computer science is that it may take a lot of time to understand something thoroughly, but it only takes a little while to find a tutorial, copy some example code from the web, and build a simple version that only contains a few bugs and will be a nightmare to maintain and scale, but hey, it mostly works, as long as you use it in the predicted way and don't put in too much data and don't type or click too fast.
Yes, software developers who don't have deep knowledge in any field, and who do mediocre work, are quite interchangeable. No need to invent specialized job positions for them. If your company develops 10 products, all the wheels will be reinvented 10 times and most of them won't rotate properly, but that's life. The advantage is that replaceable people are cheaper and more obedient.
The general breakdown I give people is:
* Get data.
* Clean data (~60% - 70% of time required).
* Low level data analysis.
* Building models.
It's mostly the "knowing data" and the modelling.
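A tiny end-to-end sketch of that breakdown (get, clean, low-level analysis, model), just to show where the time goes. The `RAW` export, its column names, and its values are all invented for illustration, and the "model" is a bare least-squares line.

```python
import csv
import io
import statistics

# Hypothetical raw export: stray whitespace, a missing value,
# a non-numeric entry, and a duplicated row.
RAW = """customer,monthly_spend,tenure_months
alice, 120.5 ,12
bob,,5
carol,abc,30
dave,80,24
dave,80,24
erin,200,36
"""


def get_data(text):
    """Get data: parse the raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(raw_rows):
    """Clean data: drop malformed rows and exact duplicates."""
    seen = set()
    cleaned = []
    for row in raw_rows:
        try:
            record = (
                row["customer"].strip(),
                float(row["monthly_spend"]),
                int(row["tenure_months"]),
            )
        except ValueError:
            continue  # missing or non-numeric value
        if record in seen:
            continue  # duplicate row
        seen.add(record)
        cleaned.append(record)
    return cleaned


def fit_line(xs, ys):
    """Build a model: least-squares line, returns (intercept, slope)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


raw_rows = get_data(RAW)
rows = clean(raw_rows)

# Low-level analysis: basic descriptive statistics.
spend = [r[1] for r in rows]
tenure = [r[2] for r in rows]
print(f"kept {len(rows)} of {len(raw_rows)} rows")
print(f"mean spend {statistics.mean(spend):.2f}, median {statistics.median(spend):.2f}")

intercept, slope = fit_line(tenure, spend)
print(f"spend ~ {intercept:.2f} + {slope:.2f} * tenure_months")
```

Even in this toy, the cleaning function is the longest part and silently throws away half the input, which is exactly the kind of decision that needs someone who knows the data.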
* Data storage
* Data processing
It's about getting the Data Scientist's output into production / making data easily available to them.
This is especially true for big ETL jobs. The more we can automate your ETL jobs, the happier you'll be!
What makes for a nice paper doesn't necessarily make for a model that will survive contact with new user generated data.
Some math PhD was in charge of that modelling, without any domain knowledge concerning the data (logistics, consumption, maintenance) and thus unable to properly interpret the raw data to begin with. Most of the time was spent on writing some Python scripts to analyze the raw data, which was still full of errors, and on building predictive models on top of that mess. Kind of formed my view of data science, unfairly so.