I've worked at productionising data science models for the past 4 years. I'm currently responsible for delivering technology platforms to ~180 data scientists.
I find the data scientist label misleading.
Roughly 70% of the data scientists I've encountered are actually Excel analysts with little experience outside of a Windows desktop, bar Facebook on a Mac. They're unable to use basic software engineering tools such as git, VS Code and Python. Excel users and their managers are hostile to solutions that aren't Excel-like. They will fist-fight you if you restrict them from downloading and exploring data on their computer. Few understand their compliance/legal obligations.
Another 20% are familiar with a wide range of tools - such as Matlab, R, Jupyter notebooks and various ML/AI toolsets. As developers they're unaware of the tech stack, short of "I installed Anaconda and it doesn't work", but are happy to work in the cloud and learn new tech. They understand PII requirements and memory/CPU limits but don't always demonstrate the latter in practice. Nonetheless they produce the bulk of your analysis, having studied classification, and are reasonably cost-efficient if you pair them with a SWE.
The final 10% have mastered containers, venvs, wheels, cloud SDKs and how to configure their software in an environment-independent way. They require help to achieve production quality but are great self-starters. Given enough time and support they're able to quickly replicate this effort and teach others. As relative superstars they're in high demand, which makes capacity planning difficult. This pushes up their premium.
IMO the best data scientists are 1 in 10. Because we're desperate for quality, almost anyone can assume the title, meaning the market is open to newcomers - you just need to be skilled in Excel (harder than it seems - most developers can learn a lot from observing an analyst/consultant use Excel).
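To make "environment-independent" concrete, here is a minimal sketch of one common approach: reading settings from environment variables with local defaults, so the same code runs on a laptop, in a container or in the cloud. The variable names (`MODEL_DATA_PATH`, `MODEL_MAX_MEMORY_MB`) and the `Settings` class are hypothetical, picked only for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    data_path: str
    max_memory_mb: int


def load_settings(env=None) -> Settings:
    """Build settings from environment variables, falling back to local defaults.

    The variable names are hypothetical; use whatever your platform agrees on.
    """
    env = os.environ if env is None else env
    return Settings(
        data_path=env.get("MODEL_DATA_PATH", "./data"),
        max_memory_mb=int(env.get("MODEL_MAX_MEMORY_MB", "2048")),
    )


settings = load_settings()
print(settings)
```

Nothing in the code hard-codes a machine-specific path or limit, so moving between environments means changing variables rather than source.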
To answer your question: No - you're not too late. Just by posting here I expect you'll be in the top 30% - an asset in demand.
This is a deeply misleading (though somewhat accurate) comment.
The reason it's misleading is that the 70% above (who may be called data scientists) are not actually data scientists; at best they are data analysts.
In general, the core difference between data scientists and data analysts is that the former can code in at least one language (SQL doesn't count, unfortunately).
However, because the term data science became so popular, everyone re-branded their analyst roles as data scientists leading to this concern.
Additionally, the post I'm replying to is pretty biased, as the OP talks about productionising models. While this is a major facet of DS work, it's not the whole thing. TBH, I can find people to productionise models a lot quicker than I can find people who can figure out what to model, and how to measure it.
Some of those people are most comfortable with Excel, and while I'd prefer they used a different tool, I can't argue with their output.
Also, the OP here is focused on deployment of Python ML models, which again is a subset of a very, very broad field.
That being said, I agree with most of the categorisations, except that the two critical attributes of good data scientists are a strong background in statistics and data common sense.
Data common sense is a weird attribute: you look at the numbers and see whether they are reasonable. For example, if you are running a mobile gaming company and see an ARPU of $5, something has either gone horribly wrong, or you're going to be a billionaire (assuming you have equity).
This attribute is actually not that common amongst DS people, so it tends to be the limiting factor, rather than ability with containers and deployment (which I do agree is very important).
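The ARPU check above can be sketched as a simple range test. This is only an illustration: the plausible range and the revenue and user figures are made-up assumptions, not industry benchmarks, and the function names are hypothetical.

```python
def arpu(revenue: float, active_users: int) -> float:
    """Average revenue per user over some period."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return revenue / active_users


def looks_plausible(value: float, low: float, high: float) -> bool:
    """True if value falls inside the range we consider believable."""
    return low <= value <= high


# Assumed plausible range for mobile-game ARPU, for illustration only.
PLAUSIBLE_ARPU = (0.01, 1.00)

value = arpu(revenue=500_000.0, active_users=100_000)
if not looks_plausible(value, *PLAUSIBLE_ARPU):
    print(f"ARPU of ${value:.2f} is outside {PLAUSIBLE_ARPU}: check the pipeline first")
```

The hard part is knowing what range is believable for your business in the first place, which is the domain knowledge being described here.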
> are not actually data scientists, at best they are data analysts.
Unfortunately the phrase's usage has been corrupted by HR departments, and the BA types of job listings now outnumber the "real data scientist" listings.
No, it was a different role. The original data scientists were people who could both run large-scale social science experiments and write MapReduce jobs to analyse them.
Unfortunately, it was such a great name that everyone stole it, and they eventually had to call all of their analytics people data scientists (as otherwise they couldn't hire).
I remember being very angry when they changed all the product analytics people to be data scientists, as many of them (the ones I knew, at least) were strictly SQL monkeys.