Ask HN: Is the industry saturated with data scientists already?
49 points by legobridge on Dec 22, 2022 | 36 comments
I'm still pretty early in my career, which I started as a software developer. I transitioned into a data scientist role about a year ago, then moved to the USA for a Master's degree in AI/DS.

Here I'm seeing a trend of software developers being paid better than data scientists in general, and I was wondering if I've made a mistake transitioning away from software development. The number of opportunities also seems to be dwindling (or maybe I'm not looking well enough, please feel free to correct me).

My question is this: Did all the talk of data science being the "sexiest" job cause the market to become saturated, or is it still a viable career path?




I think it's less a case of saturation and more a case of companies realizing that most data scientists don't actually deliver value commensurate to the salaries they were asking. Most companies simply don't have enough data, or don't have hygienic enough data, or don't have the engineering heft to build a reliable data pipeline, so the data scientists often find themselves set up to fail. I've seen that happen a few times.

But it's not like the fundamentals of the field are wrong. Predictive modeling is still really useful. It's just that larger firms are the only ones capable of realizing that value.


In my experience it is common to meet managers who think they have far more data than is really the case. What data they do have tends to be inconsistent due to changes in processes and program changes. This is particularly gruesome when a migration between ERP systems has been undertaken, or after a company merger or even a reorganization.

It is impossible for a data scientist to reliably identify the data problems of a prospective employer during the interview stage. The biggest companies often have the messiest data. But you would need to talk to the IT staff in the trenches to discover that fact.


> In my experience it is common to meet managers who think they have far more data than is really the case. What data they do have tends to be inconsistent…

Can also back this up 100%.

Too many times I’ve had “we have some x data, we need you to do y”. Turns out, the data they “have” is a 15-line Excel file of unknown provenance and replicability; nobody in the dev/marketing/finance/etc. team has the time, interest, or project alignment to care what you want; and what they’re trying to do needs a team of PhDs anyway. But all of this falls on deaf ears.


Well, where I work, I was hired as a data scientist and ended up doing everything related to data lol: engineering, analytics and whatnot. It doesn't matter as long as they pay me to do that.


Same here!

I ended up shifting more towards the dev side - devops, software engineering, etc.


The problem with the Data Scientist market is that Data Scientist is a fuzzy definition. It's pretty clear what skillset makes a good software engineer, but nearly anyone can call themselves a Data Scientist, even if they are just clicking around in Excel. This has resulted in these bootcamps and certifications that "make you a Data Scientist in 14 days" - now everyone with minimal qualifications can apply for Data Scientist roles. That's why the market appears so crowded, and it is. People thought it was an easy and quick way to a high-paying job, resulting in a flood of low-quality applicants for these positions.

There is still a lot of room for people who have strong engineering AND data engineering/ML/math/statistics skills. But then don't call yourself a Data Scientist, because that puts you into the same low-barrier camp as all the others. From my own experience it's a clear resume red flag: almost anyone who markets themselves as primarily a "Data Scientist" has few technical skills.


> But then don't call yourself a Data Scientist, because that puts you into the same low-barrier camp as all the others. From my own experience it's a clear resume red flag: almost anyone who markets themselves as primarily a "Data Scientist" has few technical skills.

How do you imagine the specifics here? Let's say that you have the technical skills and come across a meaningful, technically challenging job that you like, and the job title is data scientist - do you not take the job because of the title? Or do you lie about what your title was in the future?


To be blunt, the market is saturated with people who call themselves “data scientists” but are actually just reasonably skilled software engineers with at best a college sophomore level understanding of math/stats.

On the other hand, the market is nowhere near saturation for people with both advanced software engineering and math/stats skills (i.e. PhD-level).


I don't think that is correct. I don't think the market actually needs PhD-level stats skills, just like the market doesn't need PhD-level pure math or PhD-level CS. The number of situations where this specialized knowledge is useful in industry is vanishingly small. Having a PhD is more about even being considered for an interview for such a position. The actual day-to-day knowledge used will consist of topics that even a person with a bachelor's degree could acquire without putting in years.


They are saying that the thing you are talking about right now is saturated - but positions that authentically need that level of expertise are not.


Thanks for making my point more clearly than I did!

As an aside, regarding whether people actually do PhD-level math at these jobs, the answer is indeed often no. However, the jobs can still require PhD-level experience. This is because for the average person, there is a lag between being able to merely learn concepts at a given level versus being able to actually synthesize those concepts to solve novel problems. It is relatively easy to learn a subject and solve exercises in that subject that you know pertain to the concepts you just studied, as you would encounter in a course. It is much harder to be given a problem out of the blue and realize what concepts are required to solve it, as you would encounter in a scientific career.

As a concrete example, I once was explaining neural networks to a bright college freshman. I showed him the forward pass equation, then asked him how he would optimize the network weights given said equation. Even though he learned the chain rule in his courses, he didn’t think to apply it to derive the backpropagation step. By contrast, a talented junior or senior can easily figure this out.
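For readers who want that step spelled out, here is a minimal sketch of applying the chain rule to a forward pass to get the weight gradients. The toy network (`y_hat = w2 * tanh(w1 * x)` with a squared-error loss) and all function names are assumptions made for illustration, not the equation from the anecdote. A finite-difference check is included to confirm the derived gradients.

```python
import math

def forward(w1, w2, x):
    """Forward pass of a toy network: y_hat = w2 * tanh(w1 * x)."""
    z = w1 * x
    h = math.tanh(z)
    y_hat = w2 * h
    return h, y_hat

def loss(w1, w2, x, y):
    """Squared-error loss L = 0.5 * (y_hat - y)^2."""
    _, y_hat = forward(w1, w2, x)
    return 0.5 * (y_hat - y) ** 2

def backward(w1, w2, x, y):
    """Gradients of L w.r.t. w1 and w2, obtained by chaining local derivatives."""
    h, y_hat = forward(w1, w2, x)
    dL_dyhat = y_hat - y              # dL/dy_hat
    dL_dw2 = dL_dyhat * h             # y_hat = w2 * h  ->  dy_hat/dw2 = h
    dL_dh = dL_dyhat * w2             # dy_hat/dh = w2
    dL_dz = dL_dh * (1.0 - h * h)     # h = tanh(z)     ->  dh/dz = 1 - h^2
    dL_dw1 = dL_dz * x                # z = w1 * x      ->  dz/dw1 = x
    return dL_dw1, dL_dw2

def numeric_grad(w1, w2, x, y, eps=1e-6):
    """Central finite differences, used only to sanity-check backward()."""
    g1 = (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, y) - loss(w1, w2 - eps, x, y)) / (2 * eps)
    return g1, g2

if __name__ == "__main__":
    print("chain rule:", backward(0.5, -1.2, 0.8, 0.3))
    print("numeric:   ", numeric_grad(0.5, -1.2, 0.8, 0.3))
```

A gradient-descent update would then just subtract a small multiple of each gradient from the corresponding weight.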

In my experience, for the average person, the learning/synthesis gap is usually a few years. Hence, your average new PhD-level data scientist would be capable of synthesizing advanced undergraduate material towards solving novel problems in their job. And there are a hell of a lot of data science jobs that require that.


This is the truth. Model tinkering in Jupyter notebooks is heavily saturated and won’t get you very far nowadays.

If you know data engineering and a little devops as well as graduate level stats and linear algebra… the jobs are bountiful.


Just out of curiosity, is there a way for someone to show off those skills if they don't have a PhD?


I'm a data science hiring manager. Other things besides just the raw academic credential (a PhD, which wouldn't typically be in "Data Science" anyway, i.e., even the PhDs are just Data-Science-adjacent) are publications, conference appearances/posters, generated data products/pipelines, and contributions to relevant software. (For me, in that approximate order.)

I'd work on building some of those, because you do need to stand out against the field @MonteCarloHall described. (Software engineer + undergrad math/stats)

Those kinds of achievements would satisfy screening filters. Then of course you'd have to have the knowledge to back that up. I think it's reasonable to say that typically this will be domain-specific, e.g. you would end up with a different background knowledge base for NLP than for spatiotemporal problems or for network/graph problem domains. With all the growth has come specialization.


> the market is nowhere near saturation

Need to show > publications, conference appearances/posters, generated data products/pipelines, and contributions to relevant software.

These appear to be in conflict. If companies need people with such skills, they can't just hope to get the elite few who present at conferences, they need to be hiring among the audience members as well. If they are in fact just hiring the speakers, then the jobs aren't really "bountiful"


> are publications AND/OR conference appearances/posters AND/OR generated data products/pipelines AND/OR contributions to relevant software. (For me, in that approximate order.)

^--- Taps sign

The above does not imply that I'd only consider conference speakers, right? (Although, there are a lot of conferences!) For some roles, contributions to relevant software would be the right standard.


I consolidated for brevity's sake; how could you not realize that? You are still talking about only considering the 1-3% (conference speakers, article publishers, contributors to popular repos, pipeline generators), not the other 97-99% (audience members, article readers, users of those repos, pipeline consumers).

Is that clearer? Same exact point. Tap tap.


You don’t need any complicated mathematics to plot analytic charts or similar.


Call this the 'Electrician Cycle'.

A job is initially incredibly sexy, risky, and newfangled, requiring knowledge that is not widespread. Like being an electrician in 1922.

There are no standards, and because everyone is therefore effectively trading on reputation, salaries are high, for the same reason that Heinz ketchup costs more than Kroger: the brand carries the value.

The job eventually becomes normalized. As part of normalization, the delta in quality between the highest and lowest earners becomes much smaller. If the industry becomes regulated, this gap narrows further. Consequently, salaries fall at the high end of the profession.

Eventually, being an electrician in 2022 is roughly as sexy as being a plumber in 2022, and both are approximately as sexy as being a plumber in 1922.

We've already seen this cycle consume web development and what used to be called system administration -- two positions which were HoT Sh_T in 1995, but are increasingly generic office jobs in 2022.

This cycle will eventually consume every technical field, a kind of sociological eutrophication, but the good news is, it starts fresh with each new gyre.

The bad news is, it happens faster with each gyre, because of the 'complexity ratchet'. You'd think the ever-increasing complexity of technical fields would slow down the cycle! But no -- the human capacity for knowing and valuing is fixed; so the complexity ratchet just means that the social-value cache gets flushed more often.

Data scientists are just plumbers from 2052


> Is the industry saturated with data scientists already? No :)

> Did all the talk of data science being the "sexiest" job cause the market to become saturated No :)

> is it still a viable career path? Yes

Source:

a) I Co-run the Data Science Weekly newsletter.

b) I was a mod of https://www.reddit.com/r/datascience/ from 15k to 30k members and people were asking that about 5 to 6 years ago. The sub now has ~829k members and that question still comes up.

> The number of opportunities also seems to be dwindling

The reason for this is that initially it was "data science", then it was "data science and machine learning researcher", then it was "data science and data engineering and machine learning researcher", then it was "AI, data scientist, machine learning researcher, machine learning engineer, data engineer, NLP", etc. So the jobs have multiplied, but so have the position titles. While you could just search for data scientist positions before, you now have to get a bit more specific.


This is a non answer.


> a) I Co-run the Data Science Weekly newsletter.

Is the newsletter still running? I looked at the archive on the website and I didn't see anything more recent than April 2022.


https://us3.campaign-archive.com/home/?u=71a2b2a38789d4d25b7...

We are moving platforms (email & hosting), and some personal stuff happened related to a flood, so the website archives haven't been updated yet.

Thanks for checking it out!


Data science as a title has bifurcated several times. A lot of the engineering-heavy DS roles are now data engineer or machine learning engineer. Those are probably the roles you should be looking for. Many of the DS jobs now are essentially analyst jobs, where SQL and stakeholder management are 75% of the role and writing code is just a bonus.

Data science isn't going away. Leadership is always going to need numbers explained to them, but DS roles have never been as numerous or as well paid level for level as software engineering.


Perhaps what is holding you back is phrasing it as "a Master's degree in AI/DS"!


Anecdotally, as a Principal Data Scientist, I'm getting a lot less recruiter spam than I did a year ago. Though I think that's probably something that's affecting everything in tech as the industry as a whole takes its foot off the gas.

At the entry level, I'm sure there's more competition for fewer spots.

I think the new generative AI models are absolute game-changers and will only get better. If I were starting out, I'd focus there.


<<I'm getting a lot less recruiter spam than I did a year ago.>>

For one thing, there's a lot less recruiters than there were a year ago!


It seems like you’re just chasing trends. Trends don’t matter that much for an individual. If you focus on being an expert in something that is valuable at all, then you should be valuable. Trying to min-max your career based on Gartner or something like that isn’t generally optimal for individuals.


I'm not familiar with the current job market, but data scientists being paid worse than developers is surprising to me considering the StackOverflow 2021 survey (in the US):

Median Salary | Job Title
--------------|----------------------------------------------
$177,500      | Senior Executive (C-Suite, VP, etc.)
$165,000      | Engineering manager
$150,000      | Engineer, site reliability
$135,000      | DevOps specialist
$133,000      | Developer, back-end
$130,000      | Product manager
$129,250      | Engineer, data
$128,000      | Developer, game or graphics
$127,500      | Marketing or sales professional
$125,000      | Data scientist or machine learning specialist
$120,000      | Developer, desktop or enterprise applications
$120,000      | Developer, embedded applications or devices
$120,000      | Developer, full-stack
$120,000      | Developer, mobile
$120,000      | Scientist

https://insights.stackoverflow.com/survey/2021#work-salary
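
To make the comparison in the figures above explicit, here is a small sketch that computes each developer role's median minus the data scientist median. The numbers are copied from the list in this comment; the dictionary and function names are made up for illustration.

```python
# Median US salaries as quoted in the comment above (Stack Overflow 2021 survey).
MEDIANS = {
    "Developer, back-end": 133_000,
    "Developer, game or graphics": 128_000,
    "Data scientist or machine learning specialist": 125_000,
    "Developer, desktop or enterprise applications": 120_000,
    "Developer, embedded applications or devices": 120_000,
    "Developer, full-stack": 120_000,
    "Developer, mobile": 120_000,
}

DS = "Data scientist or machine learning specialist"

def gaps_vs_data_scientist(medians):
    """Return {role: role_median - data_scientist_median} for developer roles."""
    base = medians[DS]
    return {
        role: pay - base
        for role, pay in medians.items()
        if role.startswith("Developer")
    }

if __name__ == "__main__":
    gaps = gaps_vs_data_scientist(MEDIANS)
    # Largest positive gap (developers paid more) first.
    for role, gap in sorted(gaps.items(), key=lambda kv: -kv[1]):
        print(f"{gap:+8,d}  {role}")
```

By these figures, only two of the six developer categories listed have a higher median than data scientists, which is why the claim in the question looks surprising against this particular survey.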


Data science is a fake field and a fake career. Used Excel once? Congrats you’re a data scientist.


I've been working as a data scientist for the last 8 years and have never used Excel. The only time I had to open an Excel file with data was when some company sent me their data in one for a take-home assignment. At least send me a CSV! I never bothered to finish that task.


I don't know, and it's really impossible to tell since things are changing quickly. People work as "data scientists", but there are headwinds and tailwinds. The main headwind is that companies are cutting budgets due to the recession and dropping analysis groups that are part of cost centers. It's also easy to get the basic skills (coding/stats) in your undergrad or master's without having research experience. The tailwinds are that ML capabilities are improving day by day, so the potential to use them to make money is increasing. There's also a huge digital transformation happening, and companies have more data than ever before and the potential to leverage that into savings, additional revenue, or new services.

When I started on my data science path, about 10 years ago, there was no training pipeline, so when I dropped out of a PhD a few years later it wasn't that hard to get a data science job with the intersection of skills: math/stats/coding/research. Today that role is probably filled by someone graduating from an undergraduate or grad program, but I know the same company is still hiring for improvements on the research project I helped start.

Good data science, for me, is when you "apply predictive models to end user problems and ship solutions in products", but when I looked around for other jobs I realized that very few companies are able to act cross-functionally to exploit the value of ML in products and services. Sure, finance does it, ads does it too, but it seems like the jobs I had access to were some ill-thought-out skunkworks that a VP or exec thought was a good idea, or work tucked away in some business unit. There are like 10 individual problems there for YC to solve, but the more fundamental issue is that as long as we are still in the hype phase of data science, there will be an incentive for business leaders to spend money on it in ways that are wasteful (at least for your career).

If you want to do data science or ML, I'd encourage you to find tech-first companies that are actually using ML to solve real-world problems for people, and avoid working on projects that haven't shipped. Also, stay under engineering orgs. In business units, you'll have a boss that doesn't understand what you do, and you'll be promoted out of tech.

Ultimately, I left data science and am now on an infrastructure team at a database company, which is just a better fit for my values. If you can get into big tech or any tech-first company, the data science is mostly figured out, but in my experience lots of companies aren't offering constructive experience. Good luck.


I believe that unless the role is understood at the top and given some slack to be done properly, data scientists are set up to fail.

The expectations are generally wildly unrealistic and the work may touch on many departments. It is a minefield politically, unless it is a very clear priority for the company.

If the value is understood at the top, data science can provide immense value.

When the media mentions x job being sexy or a shortage of x workers there is usually an agenda. I would not take those assertions at face value.


It is more a case of companies hiring drivers and then realizing they need to pave the roads first in order for them to get to a destination in a meaningful amount of time. However, paving “data roads” seems to be quite a tough problem, hence the success of companies that promise to deliver that (such as Palantir or Snowflake)


An aside, but a sincere question: what do Data Scientists do, and what can I expect one to produce as productive output?

I've worked as an SDE on data engineering projects myself (Spark / Hadoop stuff) and have friends who are ML researchers and develop things like better recommendation results. Never met a data scientist.


It's pretty nebulous, but the work output of a data scientist would be a predictive model, where those results are either useful for some business unit (forecasting), or capable of being shipped as part of a software system and product.

Using Uber as an example, a data scientist would figure out the algorithm/model for assigning the next driver when you request a ride, the model for the shortest route (maybe), and a model to correct how GPS works in cities with huge buildings, so drivers know exactly where to pick people up when they stand next to skyscrapers.

It requires a ton of infrastructure to do good data science work, since you not only need to validate that the model works using the exact same data in testing/production, but you need to take code that a non-SWE runs, integrate it into the build, figure out the right operational metrics, then deploy as part of some release strategy.

The model is really the smallest part of that process, but occasionally, you can get a huge lift by having someone apply a lot of interesting math.



