
Ask HN: What's the state of the job market in data science and machine learning? - TXV
	If one were to use Hacker News as their only source of information, it would seem that machine learning is a very overrated topic. There is something related to it on HN's front page almost every day. This proliferation of courses, resources, books and startups would hint that machine learning is becoming more and more accessible to the average programmer and that the market is on track to getting saturated quickly. Is this the current trend? If yes, is it limited to the US? What about the machine learning scene in Europe? Maybe someone here could provide some perspective.
======
rm999
Speaking for NYC, but I imagine Silicon Valley is similar.

The supply-demand dynamics have changed a lot in the last couple years. I'd
roughly break it out into two groups: people with work experience + strong
software development skills, and those without. The first group is in higher
demand than ever, and tends to add a lot of value to companies that really need
it.

The second group has gotten extremely crowded, especially from STEM graduates
- usually with a masters or phd - who have completed MOOCs or bootcamps.
Supply keeps growing while demand is flat or shrinking (especially as
executives get burned by "data scientists" who don't know how to help them
build things of value). There's a huge crunch here; a lot of people I know in
this group have been searching for jobs for months, eventually settling for a
low quality job or giving up entirely :(

~~~
drew
The problem is DS is really 2-3 different disciplines under one nebulous
title. What you're describing is folks who are prototyping and productionizing
models. That's definitely in short supply, but random STEM PhDs are in no way
competitive for those roles unless they're coming from CS programs + have work
experience in production engineering.

But that's by no means all of the DS field. There are lots of DS jobs where
you're collecting and interpreting and communicating about complex data sets.
An engineering mindset is occasionally helpful, but a bias towards building
versus towards analyzing and writing can just as often be counter-productive.
Not all problems are solved by systems; lots of problems are solved by better
understanding the problem and then letting other specialists build the right
solution.

The bootcamps have contributed to the problem by focusing so much on building
things. The idea that you can go from an econ undergrad to being a self-
sufficient member of a production ML team in 6-12 weeks is nuts. What's less
nuts (and what I wish programs like Insight focused on) is taking people from
having data skills in one domain and with one set of tools (e.g. longitudinal
medical record data, stored in CSVs and handled in Stata) to another set of
tools (billions of rows of event-based product data stored in a data
warehouse, processed in R or Python). But instead the bootcamps behave like
the missing skillset is the ability to make a predictive random forest model
on some arbitrary data set and build an AWS web app around it. THAT job market
definitely doesn't exist and is completely over-saturated.

But people who are smart communicators about data, can manipulate and make
sense of massive data sets, can ask incisive questions about their data, and
can use data to convince people of a complex argument are always going to have
job opportunities, even if they're not production grade engineers. If that
sounds like you, I'm hiring - hit me up on Twitter: @drewwww.

~~~
ma2rten
I was part of the Data Science team in my previous company. We mainly built
models for production, but we also were responsible for generating both daily
and ad hoc reports. We tried to hire someone to take over the reporting part,
but we found out even that requires engineering skills. This role ended up being
even more difficult to hire for because it's hard to find someone who has the
engineering skills but wants to work only on reporting. Maybe if we had a
dedicated data engineering team the story would be different.

~~~
jaxn
That is me. A decent developer who gets data and enjoys reporting,
particularly the variety of hard problems that crop up.

I am happily serving a niche market with my own company and I suspect part of
the difficulty in finding the skillset is that we can just start our own thing
when we find a domain that we like.

------
hardtke
I hire machine learning engineers and data scientists. In my opinion there is
a great shortage of truly qualified machine learning engineers. A lot of
people are entering the market with a general knowledge of machine learning
tools. These people should be considered analysts or product data scientists.
When it comes to people that can build machine learning systems that work at
scale, they are very rarely available for hire and often are the subject of
bidding wars by multiple companies. The key difference is whether the
candidate truly understands the mathematical and statistical basis of machine
learning, has the programming skills to execute their ideas, and is able to
write code that can be used in large scale production systems and can be
leveraged by others.

~~~
runT1ME
> The key difference is whether the candidate truly understands the
> mathematical and statistical basis of machine learning

Can you elaborate on this, and at what level? Are you talking about a PhD
level of understanding of cutting edge mathematics, or do you mean understand
the basics, or somewhere in between?

~~~
hardtke
Somewhere in between. A simple question I use is "how do I decide whether to
add a feature to a classification model?" Most candidates are fine until I
bring up the topic of correlated features.
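
A minimal sketch of why correlated features complicate that question, not necessarily the answer the interviewer is looking for: before adding a candidate feature, check how strongly it already tracks the features in the model. The feature names, the toy data, and the 0.9 cutoff are all made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def redundant_with(candidate, existing, threshold=0.9):
    """Return names of existing features whose |r| with the candidate
    exceeds the threshold (the cutoff is an arbitrary assumption)."""
    return [
        name
        for name, values in existing.items()
        if abs(pearson(candidate, values)) > threshold
    ]


# Hypothetical toy data: features already in the model.
existing = {
    "visits": [1, 2, 3, 4, 5, 6],
    "region_flag": [1, -1, -1, 1, 1, -1],
}
# Candidate feature that is almost a linear function of "visits".
page_views = [3, 5, 7, 9, 11, 14]

print(redundant_with(page_views, existing))
```

A candidate that is flagged this way may still improve a model, but it carries little information the model does not already have, and in linear models it tends to make the individual coefficients unstable.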

~~~
DrNuke
Are you hiring for finance? In Europe it still depends very much on the
industry: some mean coding, some maths, some infrastructure, all want data
preparation and in-depth stats. edit: other than understanding the business
fundamentals, of course.

------
PLenz
I've been working in a DS role for a few years now in NYC - and I definitely
feel the role is more valued on the east coast over SV. SV has a focus on
consumer facing applications that are in many ways fancy CRUD. DS roles have
their place but aren't the core of the business. East coast has a b2b /
infobroker focus where DS is the product. Media (especially adtech), finance,
government consulting are over on this coast.

I think you also need to not confuse the growing ease of machine learning
tools with the role becoming more accessible. There is a wide gap between
tooling and knowledge to use those tools appropriately and creatively.

And may I never write another HN comment on my cell phone again.

------
aub3bhat
The Stack Overflow salary calculator shows a significant 50% premium over
developer salaries, all other things being equal [1]. In my opinion the tool
is flawed and significantly underestimates salaries in SV/NYC (Stack Overflow
underpays), but it is still a good indicator.

The major issue is that Data Scientist is a very fuzzy term, applied to
everyone from undergraduates with a Stats degree to those with PhDs and papers
at KDD/ICML/NIPS/CVPR.

However rather than doing a Frontend or Mobile developer coding bootcamp, a
data science bootcamp is likely to lead to more transferable skills in case
you wish to get an MBA etc.

[1]
[http://stackoverflow.com/company/salary/calculator?p=7&e=1&s...](http://stackoverflow.com/company/salary/calculator?p=7&e=1&s=2.5&l=1)

~~~
huac
From my experience this year recruiting coming out of undergrad, for the top
kids who do DS vs the top kids who do CS, the median comp is higher for DS but
the highest comp packages come for CS. I wouldn't be surprised if this holds
for more experienced people too.

------
caminante
Currently, Gartner analysts place ML at the "peak" of its Hype Cycle for
Emerging Tech [0] with a runway of 2-5 years for mainstream adoption.

[0]
[http://www.gartner.com/newsroom/id/3412017](http://www.gartner.com/newsroom/id/3412017)

~~~
dnautics
that must be qualified because there are a lot of ML applications that have
basically been adopted in the mainstream, like voice recognition (which is
pretty good, probably in the 80-90% accuracy range and better for specialized
contextual tasks). I remember driving for lyft and being able to input
destination addresses by voice (2 years ago; had a Moto X where this was
cutting edge) was a godsend and improved my customer service ratings.

~~~
Eridrus
> voice recognition ... probably in the 80-90% accuracy range

State of the art systems are far better than this. Microsoft recently
published a paper with a 5.9% word error rate for conversational speech.
Speech directed at computers/assistants is already in the high 90s, though I
don't have a figure off the top of my head.
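
For readers comparing the two numbers: word error rate is usually computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words, so a 5.9% WER corresponds to roughly 94% word accuracy. A minimal sketch of that calculation, assuming simple whitespace tokenization (real evaluations normalize text more carefully):

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance (substitutions, deletions, insertions)
    divided by the number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 100%, which is one reason it does not map cleanly onto an "accuracy" figure.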

~~~
dnautics
I gauged it off of my personal gut feeling, which includes a coefficient for
"well I don't have a strong enough internet connection so I get google's
spinner instead - and then it fails."

Occasionally, android gets the words right (as demonstrated by the onscreen
text) and then flubs passing the correct intent because of "loss of
connection", which is just about the most frustrating ML fail.

No doubt android's voice recognition is spectacular. I can prompt it in three
different relatively orthogonal-sounding languages (English, French,
Japanese), and it can figure out which language I'm using and usually get the
transcription correct. Notably, I can't activate the italian/japanese pair and
get useful results - which makes sense if you know both languages.

Google voice is horrible, however, at transcribing voicemail.

------
vtange
The startup I work at really favors their data scientists, though I am not one
of them (I'm a frontend guy). The CEO and CTO pretty much keep a personal eye
on those guys' work.

Right now however the theme I've heard from the higher ups has been
profitability, and this applies to all tech companies in general. Easy capital
is gone and now companies are in the spotlight for not making profits.

So at least from my company's perspective, it's not that data science is
saturated, it's that we're trying to not break the bank and hire too much.

------
stared
I have only anecdotal experience (I live in Warsaw, but do contracts mostly
for Poland, UK and US).

General data science is in need. I can get contracts easily, I know that
people looking for competent people need to wait; especially as it is a skill
much harder to pick than, say, front-end web dev (unless someone starts from a
highly quantitative background like physics, modelling in biology, etc). My
general impressions are:

\- ML (especially the practical kind, like logistic regression and random forest)
is often an integral part of many data analyses (or at least a plus),

\- there are not as many jobs solely focused on ML; and if so, often they
require some specialistic expertise,

\- and even less only for deep learning (also, for DL there is relatively high
threshold for having skills at "hireable" level).

Some of my tips on how to learn data science:
[http://p.migdal.pl/2016/03/15/data-science-intro-for-math-ph...](http://p.migdal.pl/2016/03/15/data-science-intro-for-math-phys-background.html)
(on purpose I put the emphasis on general data exploration/analysis before
machine learning).

~~~
fnbr
How do you find contracts? I'm interested in doing contract data science work,
but I don't know how to start finding interested potential clients.

~~~
stared
In the last ~1.5 years it is solely people contacting me. (But I give a lot of
talks, workshops, and the chain of recommendations is going.)

One day I want to write how I got started, but I am not sure which steps were
essential, which - irrelevant. And many things are not ones one can replicate.

~~~
fnbr
Ah, fair enough. I'd be interested in reading it, but I think that
(unfortunately) you're correct that it's mostly non-replicable, at least based
on my experience.

------
user5994461
Like about everything on HN... You're either in the Silicon Valley or it
doesn't apply to you.

In my opinion, you could start by defining what is a data science, a quant, or
a machine learning job. Because that's not clearly defined. It means different
jobs to a lot of people, jobs that are all hard to learn and absolutely NOT
interchangeable.

------
lowglow
We hire applied ML/AI specialists. For me it's not just an understanding of
mathematical concepts, but also being able to apply new ideas to new problems.

This depends quite a bit on critical thinking, a good fundamental ability to
analyze a problem and understand its parameters, then manage the logical
operations required to deliver the feature and solve the problem.

As for why I think it's on HN every day: I also like to think of an innovation
pipeline happening something like this:

    
    
         [---------explore------|----------exploit-----------]
      ,->developers -> engineers/scientists -> data scientists->--,
     /----------<----------------<--------------------<----------/
    

We're now in some sort of refinement cycle of innovation, where the current
medium has been saturated on some level and there is a lot of push to mine
value from the discoveries.

~~~
solomatov
It seems that you made it the reverse of what you wanted. As far as I understand,
data scientists should start the exploration and developers should finish
exploitation.

~~~
lowglow
Interesting observation! Perhaps it's a bidirectional graph.

I was thinking in terms of the development like this:

1\. Observation made

2\. Idea created

3\. Software/Hardware made

4\. Revenue achieved

5\. Business parameters tuned using insights

6\. Maximum profit achieved

------
wjn0
I'm an undergrad at a big university known for CS in Canada. The CS program
here has several possible 'focuses'; 4 of 9 are related to ML/AI directly
(computer vision, NLP, AI, scientific computing). 2 others require AI/ML/NN
courses.

The bias might stem from the fact that we have some huge names in AI doing
research here, but the data points seem clear (we say undergraduate education
is slow to catch on, right?): the topic as a whole isn't overrated.

However, there seems to be a lack of understanding by people working in tech
of the differences (in uses, theory, implementation) between ML, AI, NN, DL,
etc. This might stem from a lack of understanding of the foundations of these
topics (ex: statistics, vector calculus) or simply because we can abstract a
lot of this away (ex: TensorFlow).

~~~
curiousgal
>or simply because we can abstract a lot of this away (ex: TensorFlow).

That would work up to the point a better abstraction tool/framework comes
along. I'd never try to build a career on a single framework, because
frameworks come and go.

~~~
wjn0
Building a career around a framework is never a good idea. If you know your
shit, it shouldn't matter what framework you're using.

Theano and TF, for example, both make similar abstractions: graphs and
numerical functions on top of the same matrix library, even. I would suspect
someone could move between the two fairly easily. The problem is that a
programmer can use TF/Theano/etc.'s built-in gradient descent functions pulled
from a tutorial with their data subbed in _instead_ of learning the details of
backpropagation, end up with decent results, and claim to have a basic
understanding of ML - when really, they've managed to avoid it almost
completely.
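
To make concrete what the built-in optimizers hide, here is a minimal sketch of gradient descent written by hand. It is deliberately not backpropagation through a network: it fits a single line `y = w * x + b` with mean squared error, with the gradients derived manually. The function name, learning rate, step count, and toy data are assumptions chosen for illustration.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every sample under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of mean((w*x + b - y)^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1, so the fit should approach w=2, b=1.
w, b = fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(w, b)
```

Backpropagation is the same idea applied layer by layer with the chain rule; a framework's automatic differentiation computes `grad_w` and `grad_b` for you, which is exactly the part a tutorial lets you skip.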

~~~
kafkaesq
_If you know your shit, it shouldn't matter what framework you're using._

And yet, what is the incessant drumbeat of most job ads, these days - even in
data science?

That's right: "N years in framework X"

~~~
wjn0
I don't follow job postings for ML, but I will say this - there is inherent
value in being familiar with certain frameworks when hiring for non-entry-
level positions. This is particularly true for ML, where it can be very useful
to know the optimizations that the libraries do/don't make behind the scenes,
etc.

Should most ML job postings require n years of experience in some framework?
Maybe not - but I can see why a company might see value in it.

------
TYPE_FASTER
In my limited experience, there's a difference between a data scientist who
can process data given data and a set of questions about it, and a data
scientist who can figure out what data you need, and the questions that need
to be answered.

I think making the transition from the first role to the second role comes
with experience, both with the toolsets, and thinking about the problem as a
whole.

~~~
thearn4
> data scientist who can figure out what data you need, and the questions that
> need to be answered.

Isn't that describing a statistician?

~~~
infinite8s
Isn't that how the joke goes? A data scientist is a statistician in
California...

------
simonhughes22
I am the Chief Data Scientist of Dice.com. If you are interested in working as
a junior Data Scientist, and are smart and hard working, please apply here:
[http://careeropportunities.dhigroupinc.com/](http://careeropportunities.dhigroupinc.com/).
The position is a telecommute role. We will absolutely consider people with no
data science experience, so long as they demonstrate an aptitude for data
science \ machine learning and can code.

~~~
simonhughes22
For those with difficulties accessing the link, I apologize. I have notified
HR, who maintain that aspect of the site. In the meantime, please email me
your resumes to simon.hughes@dice.com

~~~
simonhughes22
This links directly to the dice.com version of the posting (same position).
[http://www.dice.com/jobs/detail/-/Diceinc/790523?rno=7814317...](http://www.dice.com/jobs/detail/-/Diceinc/790523?rno=7814317..).
The application process via that link is smoother, or so I am told. Please
note that while this position is remote, we are only looking for US citizens
or Visa holders located within the US right now.

~~~
milia
thanks

------
platz
I considered a graduate program in data science, but compared to average
programmer salaries, it doesn't seem like data science pays all that much
(excluding data science jobs for PhDs in Silicon Valley). It's more
interesting than programming, but seems like a _much_ tighter market with no
discernible demand driving salaries up.

~~~
freyr
Interesting. I have a statistics/data science background, but personally I
find programming much more satisfying.

Programming is a tool to create and synthesize. It leads to new products,
companies, and solutions. Data science is analysis, not synthesis. You collect
data, you interpret it, you move on to other data. Nothing gets created, which
for me, is a deal breaker for job satisfaction.

~~~
taway_1212
The premise is often that, as a programmer, you are a part of tightly
controlled agile/SCRUM team, while DS get much more independence in their
jobs.

~~~
freyr
> _as a programmer, you are a part of tightly controlled agile/SCRUM team_

I see. I'm currently in a research lab, and programming is my main day-to-day
activity. I have a high degree of autonomy, but have recently considered
moving to a product division so I can work on something that actually gets
shipped. But it doesn't sound so great, as you describe it. Maybe it's a case
of the grass always looking greener on the other side.

~~~
taway_1212
Millions of programmers work on stuff that gets shipped and I'd say their job
satisfaction isn't very high on average. Shipping is overrated; the culture of
shipping as a cool thing is cultivated by company owners who want people to
think that they enjoy realising company goals.

------
vogt
I'm a designer but work for a data science company (LMI specifically). All of
our data work is done in D, which I never even knew existed until I started
working here.

I can't speak to anything regarding ML, but for whatever it's worth in our
segment of the market we have seen a lot of competition emerge in a big way
the last few years. Former academic-type firms who specialized in bespoke
economy analysis reports are starting to build software around all of the data
that is out there since it's never been easier to collect and normalize it. I
think it's a stretch to say the market is approaching saturation for us,
though.

------
laughfactory
As many have said here, and as a working and apparently in-demand data
scientist, I agree that the tricky part about data science is that being
effective isn't a matter of just any one thing. You have to be a unicorn of
sorts who is, above all things, capable of solving any problem which comes
your way. You have to be exceptionally flexible and very scrappy.

There are a lot of people who know more about modeling, software engineering,
statistics, machine learning, analytics, and so on than I do. But I excel at
bringing everything together and solving difficult business problems. It's
really difficult to train someone to be this way. It takes a lot of time,
experience, skills, and a unique disposition to be an effective data
scientist. At least to be the kind of data scientist I am. And I'm still early
in my career.

Just my two cents. I suspect there will continue to be a glut of people who,
on paper, have the data science skills, but lack all the intangibles. Who
knows, maybe the various programs and boot camps will start doing business
scenario learning: here's a tough real world problem where we don't tell you
how to solve it, but we desperately need you to figure it out. Go!

------
apohn
Background: I currently lead a Data Science team at a big non-tech company.
Previous to this I worked at a software company that had a Data Science team
in their customer facing consulting group.

I'm going to speak primarily about applied data science. This means a data
scientist who is solving a business need by doing ad-hoc analysis or building
a reusable solution (e.g. an R+Shiny dashboard) to a business problem.

Jobs: There are plenty of jobs out there, but you have to be careful. Many
"Data Science" jobs are really BI, Business Analyst, or Sales Engineer types
of jobs where some VP got it in their head that they need a Data Scientist.
These jobs are great for people who are okay with Technology and Data Science
being 10% of their job - and many people are like that. They don't care about
engineering, coding, or tech and statistics beyond the minimum to do their
jobs. But if you really want a job that involves solid tech and stats/ML
skills you will be unsatisfied at these types of jobs.

Right now there are plenty of hard business problems that people want to turn
into Data Science problems because they think it'll give them a competitive
edge or something to market and show off. This results in more data science
job openings. However, they are not really data science problems. As somebody
else said, people will eventually realize they are not getting the value they
need with data scientists doing these types of jobs. Then they'll replace that
person with an MBA with some DS coursework (e.g. MBA who can use KNIME or SAS
Enterprise Miner) or eliminate the position.

People: I interview people and I know people at other organizations who
interview candidates for Data Science roles. MOOCs and many degree programs
(including 2 year MS degrees) are pushing out people who have a very
superficial overview of data science. Basically they teach them about every ML
algorithm in the known universe and the functions to call them in
R/Python/SAS. The end result is a mediocre coder or non-coder who boils
everything down to a confusion matrix or root mean squared error. But they
cannot actually think through a business problem or see why a low error
doesn't equal a good model (see
[http://www.tylervigen.com/spurious-correlations](http://www.tylervigen.com/spurious-correlations)).
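
A minimal sketch of the spurious-correlation trap, using two synthetic series that share nothing except an upward trend. Their levels correlate almost perfectly, but their period-to-period changes do not, which is a quick hint that the relationship is an artifact of the trend. The series and their "wiggle" patterns are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def differences(series):
    """Change from each point to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


n = 101
wiggle_a = [1, -1]          # period-2 pattern, unrelated to wiggle_b
wiggle_b = [1, 1, -1, -1]   # period-4 pattern

# Both series trend upward over time but fluctuate independently.
series_a = [t + wiggle_a[t % 2] for t in range(n)]
series_b = [2 * t + wiggle_b[t % 4] for t in range(n)]

level_r = pearson(series_a, series_b)
change_r = pearson(differences(series_a), differences(series_b))

print(f"correlation of levels:  {level_r:.3f}")
print(f"correlation of changes: {change_r:.3f}")
```

A model that predicts one series from the other would show a low error on the levels while capturing nothing about how either series actually moves.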

Finding good people is hard and you have to be flexible to realize great
people can come from different backgrounds.

------
plafl
I can speak for Spain, although I sometimes get calls from other European
countries. Relative to the pathetic Spanish work market data science/machine
learning is doing great. I think right now there is too much hype, which is
going to stay for a few years. After that I suppose it won't be a hot thing
but I don't think it's going to disappear. I hope I'm mistaken and we are
really seeing some AI revolution, but after all my job is putting trust in
the data, and past data says fads come and go. If that happens I will keep
with me the math, the statistics, any development skills I can learn meanwhile
and of course the challenge of someday achieving true AI.

------
DrNuke
Worth a serious effort if you are going to use it originally in your own niche
/ industry, otherwise statistics will still help you more in any given market.
So just learn statistics very very well and then ask again.

------
manbilla
I am currently an MIS graduate student with 3 years of SAP functional
experience. After this boring stint and hearing the hype around Data Science,
I decided to give it a try (Decent statistics and engineering skills but no
coding expertise. I also finished MOOCs and am currently working on some small
projects during the holidays). Considering average pay as a prominent factor,
what is a better option: learning extra SAP skills (HANA etc.) and trying for a
job in SAP, or diving into Data Science completely and trying to start as an
entry-level Data Analyst?

------
androck1
Is there a market for competent developers without professional/academic
experience in data science or machine learning? Perhaps just a MOOC or some
Kaggle projects?

~~~
manish_gill
This is what I would like to know as well. I'm a professional dev competent with
handling large scale systems. I try to learn ML on my own time but that's not
quite as thorough as getting a dedicated degree. I can catch up with the grad
students if I put in more time but will an employer see it? Will they take a
risk even if I haven't had enough projects under my belt?

------
numinary1
If you're seeking work: If you want to be in demand, be the machine learning
person for __________: electric energy revenue protection, healthcare payer
fraud detection, investing, or supply chain. Pick a specialty.

If you're hiring: Get the above out of your pathetic small minds and start
hiring the smartest people you can find. Look for successes in any industry.
Your business isn't that unique. The best people can learn it much faster than
you did.

------
nl
I'm in Australia.

I'm hiring 6 people in a range of roles between "pure" data scientists to more
data engineer/SWE roles. The exact mix depends on who we can get.

The ability to find good people is the biggest constraint on the work we do.

Our current team ranges from applied mathematicians (as in they are Math
professors) to people with traditional SWE backgrounds. Basically we are a
long long way from saturated.

~~~
bear_child
What is your company? I am Australian, finishing a PhD in mathematics and
looking for a job.

~~~
nl
Email me. Contact details in my profile.

------
riqwant
People with acquired skills are plentiful and not really up to scratch most of
the time. So people who have these "acquired" skills have a high likelihood
of being scrapped at the CV stage.

If you're serious about machine learning - build a blog or online repository
of quality work and use that to get a job instead

------
throw_away_777
From someone who is looking to transition to data science, the field is
terrible if you haven't had an industry job before. I am ranked in the top 150
on Kaggle and can't even get phone interviews without someone in my network
recommending me for a position.

~~~
gallamine
Post a link to your resume. There may be some obvious problems we can
diagnose.

------
hamilyon2
There is certainly a big market for both data analysts and system builders
here in Moscow. Most want a person with credentials, e.g. the Yandex School of
Data Analysis. The field is rather on fire, with big companies investing a lot
of money in it.

------
pknerd
Off Topic: What would you advise for someone who writes code in Python
(scraping, mining) and has done some ML by watching Udacity courses? How can I
polish myself to get into position for a job?

------
praveer13
This thread is really depressing as someone currently going through a
bootcamp. (dataquest) Is it more realistic to aspire for data
engineering/analyst roles?

------
whenwillitstop
I am curious about this as well. I think the difference between machine
learning and software engineering is that companies may only need a few dozen
machine learning engineers. They may need thousands of software engineers.
There may be increasing demand, but the demand will never reach the demand of
software engineering. Except at the premium ultra competitive level, where a
data scientist who is globally known can have a massive impact on the
company's bottom line. But we aren't talking about those types of jobs.

I also believe that most traditional companies do have data scientists, but
they haven't really started incorporating machine learning into their products,
they are analyzing information about their customers, but their products are
not reliant on using data. Once that becomes more common, things will pick up.

~~~
wjn0
It seems that this is true for now (for 'traditional companies').

Soon, however, one could argue that 'traditional companies' will no longer be
the norm - data science, ML, etc. will play such a crucial role in the
majority of tech firms that the number of companies using it will rise. That's
when I expect we'll see a huge portion of software engineers knowing ML
concepts. Alternatively, I wonder if we might see the rise of smaller
companies contracting out all of their ML to larger ones.

I would also be curious to know if ML background helps one to get a job at a
place like Amazon/Google for even 'traditional' positions right now. The
amount of data they have now must drive demand for engineers who can write
software that takes advantage of it, regardless of position. Of course, like
you said, they'll always require engineers to fill more traditional roles with
no data interaction.

~~~
user5994461
You need a friend at GooMAzonSoft to refer you. It's always been and will
always be the easiest way in. Then traditional uninteresting phone call and
uninspired 6 hours on site with people who probably didn't read your resume.

Maybe if you have a good profile and you get lucky, you'll go interview
straight for one group who's interested in you, but I wouldn't bet on that.

~~~
shepardrtc
Referrals from your friends do little to nothing at those companies. At
Microsoft, for instance, all it does is assign a "handler" to the applications
you put in. They just let you know if anyone is looking at them. Which, as you
can imagine, is useless.

Almost no one, except for maybe very high level hires, gets a pass for the
initial weed-out interviews.

~~~
whenwillitstop
Not true. I'm a run of the mill developer. I had another offer, I called
google, told them I had another offer. They skipped all the rounds but the
final.

~~~
shepardrtc
They skipped screening and whiteboarding because you simply said you had
another offer?

------
payne92
Machine learning PhDs >>> everything else.

