
I wasn’t getting hired as a data scientist, so I sought data on who is - jonbaer
https://towardsdatascience.com/i-wasnt-getting-hired-as-a-data-scientist-so-i-sought-data-on-who-is-c59afd7d56f5
======
tjpaudio
This is an awesome analysis of the situation. Some things I have noticed as a
data scientist of 4 years so far:

\- Increasingly, data scientist is just being used in place of senior analyst
because it attracts more applications.

\- At the firms I've worked at that are software tech companies, there was an
outsized interest in mid-level software engineers wanting to be data
scientists, mostly because the career development prospects at that stage are
grim and data science usually means a pay bump. This demand has had the
opposite effect - software shops are leery of promoting engineers to data
scientists for fear of inciting contention among the ranks.

\- Building on the point that data scientist usually means senior analyst, it
has also come to mean an analyst who can build their SQL query into a
scheduled ETL or daily process of some sort. You work in pandas instead of
Excel sorta thing.

\- I have personally gotten all my data science jobs from talking about the
business side of things. I think engineers approaching the field from a
hard-skills perspective is totally wrong. My last technical take home was in a
language I had never used before and as a result my execution was shitty, but
I was able to explain the problem well, how the data could be used to predict
the variation, and how the data science product fit into the business. I got
an offer before I left the building.

~~~
learc83
>there was an outsized interest in mid-level software engineers wanting to be data
scientists, mostly because the career development prospects at that stage are
grim

What are you basing this on? Senior software engineer jobs are a lot easier to
come by than data scientist jobs and from what I've seen, pay better than the
average data scientist job as well.

~~~
tjpaudio
I am just going on my experience, but according to glassdoor data scientists
of the same years of experience earn about 20k more than senior software
engineers here in Boston.

~~~
codingslave
Glassdoor is not accurate

~~~
treypitt
source?

------
choppaface
“Finding 1: most data scientists have postgraduate degrees”

Data Scientists primarily function to tell a story (based upon data) that
technicals and non-technicals alike will use in business decisions. It’s
critical that a Data Scientist _be perceived_ as trustworthy, since the
decision-makers are unlikely to reproduce or even understand the Data
Scientist’s full argument.

What signals trustworthiness? A graduate degree from a Harvard Yale Princeton
Stanford (HYPS) or similar university definitely speaks well for a candidate.
Online degree programs like Coursera / Udacity / etc won’t carry nearly the
same weight until their alumni network grows, and that will require growing
into non-technical fields.

What signals _untrustworthiness_ ? Sadly, the “hacker” skills that are so very
key in DS (e.g. for data cleaning) are completely at odds with traditional
(and especially non-technical) assessments of trustworthiness. Many companies
in the Bay Area will look past this issue, but it’s arguably a competitive
advantage to simply be able to assess “hacker” skills effectively. That also
entails making space for “hackers” at your company. Can’t take hackers? You
probably will never hire a good Data Scientist.

~~~
nerdponx
That, or, candidates without formal education lack the basic statistical
knowledge and/or communication skills required.

PhDs can't code, hackers don't know stats. A good team has both. If you're a
one-man show, you need to do both yourself.

~~~
Blackstone4
Hedge funds often have data engineers who focus on building the infrastructure
and cleaning the data.

They also have quant guys/data scientists who use the data to help drive
investment decisions.

------
turingbike
Leo Breiman [0] (inventor of bagging and random forests) wrote a paper called
"Statistical Modeling: The Two Cultures" [1], and since I read it, I see it
everywhere. The basic idea is that Statisticians place(d) too high an emphasis
on model interpretability ("data modeling" in the paper), and as a result,
missed out on the revolution of machine learning ("algorithmic modeling" in
the paper). In the author's words (parenthetical added by me), "[T]he focus in
the statistical community on data models (simple, interpretable models) has
[l]ed to irrelevant theory and questionable scientific conclusions."

In this TDS post, the author says "Statisticians and Actuaries are at the
bottom of the heap as a prior role for existing data scientists." Maybe this
isn't a coincidence? Plenty of companies had statisticians on staff, but the
explosion of data science happened anyway. Why? Because data scientists do the
same types of tasks as statisticians, but while statisticians are of the data
modeling culture, data scientists are expected to be of the algorithmic
modeling culture. It seems that the market is saying that the algorithmic
modeling culture is getting results.

The author references "Type A vs Type B Data scientists" [2], which seems to
be getting at the same thing: "The Type A Data Scientist is very similar to a
statistician... Type B Data Scientists share some statistical background with
Type A, but they are also very strong coders and may be trained software
engineers. The Type B Data Scientist is mainly interested in using data "in
production." They build models which interact with users, often serving
recommendations (products, people you may know, ads, movies, search results)."
For whatever reason, there is a correlation between Algorithmic modeling /
Type B and "getting things done".

[0]
[https://en.wikipedia.org/wiki/Leo_Breiman](https://en.wikipedia.org/wiki/Leo_Breiman)

[1]
[https://projecteuclid.org/download/pdf_1/euclid.ss/100921372...](https://projecteuclid.org/download/pdf_1/euclid.ss/1009213726)

[2] [https://www.quora.com/What-is-data-science/answer/Michael-
Ho...](https://www.quora.com/What-is-data-science/answer/Michael-Hochster)

~~~
gshdg
With the failure modes we’ve seen from deep learning and related AI, maybe the
statisticians are on to something?

------
hansamad
Hello guys. It’s me, Hanif. Just a response to some of the comments here:

1\. I think it’s absolutely fair to criticize this aspect of the analysis: the
relative frequencies of the backgrounds of data scientists have been presented
as suggesting the success rate from each field. Many of the comments in the
post itself made a similar critique. As I’ve acknowledged in my responses to
these comments, what we need are the relative frequencies of applicants from
the different backgrounds, not just hires. However, one can justify the
inference about the success rate of, say, Statisticians and Actuaries if one
has the prior belief that the relative frequency of statistician applicants to
DS positions should be higher than the observed relative frequency of
statistician hires (<1%!) to DS positions. I don’t think this is unreasonable.
2\. I make a similar argument with regards to MOOCs/bootcamps: my prior belief
is that the relative frequency of bootcamp-only applicants should be higher
than the observed relative frequency of bootcamp-only hires. Hence my
statement about necessity vs. sufficiency. 3\. It’s somewhat more complicated
for applicants with both degrees and MOOCs/bootcamps. I haven’t done this, but
what I can do is to look at the education distribution for hires with and
without MOOCs. If the education distributions were similar, it would suggest
that MOOCs have negligible impact. If, however, there is a higher relative
frequency of say Bachelor’s degrees in the MOOC category, that would suggest
that MOOCs/bootcamps have some value-added impact. 4\. An ideal prospective
study for the above would be to extract a sample of individuals from a
precursor role, say, data analysts (hence naturally controlling for
education). Note which of them have MOOCs or bootcamps, then follow them up in
time to see how many end up as data scientists in each category. 5\. I might
actually change that profile picture. It’s 3 years old, in more innocent
times. 6\. As it happens I have landed a data scientist position in Singapore
and will be starting in September.
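
The comparison described in point 3 can be sketched roughly as follows. This is a minimal illustration only: the record layout, the `education_distribution` helper, and every count below are hypothetical and are not taken from the article's data.

```python
from collections import Counter


def education_distribution(hires, has_mooc):
    """Relative frequency of each degree among hires with the given MOOC status."""
    degrees = [h["degree"] for h in hires if h["mooc"] == has_mooc]
    counts = Counter(degrees)
    total = sum(counts.values())
    return {degree: n / total for degree, n in counts.items()}


# Hypothetical hire records, invented purely for illustration.
hires = (
    [{"degree": "Bachelor", "mooc": True}] * 4
    + [{"degree": "Masters", "mooc": True}] * 4
    + [{"degree": "PhD", "mooc": True}] * 2
    + [{"degree": "Bachelor", "mooc": False}] * 2
    + [{"degree": "Masters", "mooc": False}] * 4
    + [{"degree": "PhD", "mooc": False}] * 4
)

with_mooc = education_distribution(hires, True)
without_mooc = education_distribution(hires, False)

# Print the two distributions side by side for each degree.
for degree in ("Bachelor", "Masters", "PhD"):
    print(f"{degree:9s} with MOOC: {with_mooc[degree]:.2f}   "
          f"without MOOC: {without_mooc[degree]:.2f}")
```

In this made-up sample, Bachelor's degrees are a larger share of the MOOC group than of the non-MOOC group, which is the pattern point 3 would read as MOOCs adding some value. With real data, a formal test of whether the two distributions differ would still be needed.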

~~~
nickpsecurity
Nice that you showed up to respond to this stuff. Instead of a blob of text, I
suggest an edit to your comment to put each numbered point in its own
paragraph. It will make it more readable. Lots of folks here sort of mix
skimming and focused reading with the opening sentences helping them determine
what they should invest time in.

~~~
hansamad
Thanks nick. I’ve been trying to but the updates are never reflected :(

~~~
nickpsecurity
The oldest rules...

[https://github.com/minimaxir/hacker-news-
undocumented](https://github.com/minimaxir/hacker-news-undocumented)

...mentioned 2 hours as the limit on an edit. I don't know what it currently
is. You're past the 2 hour mark, though. Might explain it.

------
joker3
> For myself, it was worth noticing that Statisticians and Actuaries are at
> the bottom of the heap as a prior role for existing data scientists.

This has a lot more to do with the relatively small number of statisticians
and actuaries out there than it does the odds of people from various
backgrounds transitioning into data science roles.

~~~
dsqrt
Exactly. The data is about the backgrounds of data scientists, but is
incorrectly interpreted as the probability of becoming a data scientist given
a certain background. Obviously the two are related (Bayes' theorem), but to
draw any conclusion one would need to know the number of PhDs, Masters, etc.
that are applying to become data scientists. For example, the fact that a
small fraction of data scientists has a MOOC degree does not imply that the
probability of becoming a data scientist if having "only" a MOOC degree is
low. For all that we know the few people in the market having this kind of
non-traditional preparation could have 100% success rate in getting those
jobs.
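
A small sketch of the distinction drawn above, using made-up numbers. The hire and applicant counts are hypothetical (the article only has data on hires), and the helper names are invented for this example.

```python
def share_of_hires(hires):
    """P(background | hired): the quantity the article's data can show."""
    total = sum(hires.values())
    return {bg: n / total for bg, n in hires.items()}


def success_rate(hires, applicants):
    """P(hired | background): needs applicant counts, which the article lacks."""
    return {bg: hires[bg] / applicants[bg] for bg in hires}


# Hypothetical counts, invented for illustration only.
hires = {"PhD": 60, "Masters": 35, "MOOC only": 5}
applicants = {"PhD": 600, "Masters": 700, "MOOC only": 5}

shares = share_of_hires(hires)
rates = success_rate(hires, applicants)

# Compare the two quantities for each background.
for bg in hires:
    print(f"{bg:10s} share of hires = {shares[bg]:.2f}   "
          f"success rate = {rates[bg]:.2f}")
```

In this invented scenario the MOOC-only group is just 5% of hires yet every one of its applicants was hired, so a small share of hires says nothing by itself about the odds for a given background.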

------
gumby
> ...I had conflated the practice of data science with the strategy to become
> part of it.

What an excellent analysis that applies far beyond Data Science.

Perhaps describing themself as "Statistician, Data Scientist, Software
Developer" might have a better hit rate against the skimmers who pre-screen
the resumes. An honest-to-ghu statistician who became a programmer is much
more exciting than someone who looks like a programmer attempting to leverage
themselves into a new hot sector.

------
randyzwitch
Like any other job hunt, it all comes down to networking with people you hope
to work with. Blindly learning techniques won't get you anywhere if you don't
have anyone who will vouch for you/introduce you to other people

~~~
asdffdsa
Doesn't that defeat the premise of the tech industry as a meritocracy?

~~~
WalterBright
> Doesn't that defeat the premise of the tech industry as a meritocracy?

If you don't do any marketing of yourself, nobody is going to know about your
merit. "Build it and they will come" is a Hollywood fantasy, not reality.

~~~
Traster
I think this is unfair. There are lots of jobs out there, and lots of
employers looking for people with specific skills. The reason networking works
is because discoverability is a problem. If I apply to every job that listed
my skills as required I'd waste 90% of my time because my area of expertise is
so specialist that only 10% of the people who have all the skills listed are
experienced in the right ways.

Whereas if I network with people the jobs they recommend are far more likely
to be a fit because I've networked with them - they know my skills.

~~~
WalterBright
> I think this is unfair.

Fair or not, it's how most everything in life works. For good things to happen
to you, you have to put yourself in a position where good things can find you.
That means marketing yourself. It applies to getting your dream job just as
much as it applies to getting your dream partner.

Even if you get your dream job, talent and hard work simply aren't good enough.
You'll need to be able to sell your ideas to others in the company.

------
punnerud
I am looking for people that find and solve problems on their own, especially
for data science roles. The LinkedIn profile (hanifsamad) is all about how he
is good at solving problems given TO him. Indicators:

\- "I am for the problems worth solving (..)"

\- Uses the default header on LinkedIn

\- No sign of roles or additional engagement that indicate 'self driven
problem solver'

I love the article, but I need actionable insights.

~~~
rconti
I sure as heck put a lot more weight on his clever approach and execution in
gathering data on a problem he discovered _on his own_ than I do on what
header (??) he used on LinkedIn.

~~~
punnerud
It’s a good sign that you question what is given to you

------
killjoywashere
I'm in the US, but I have worked with a couple of folks from Singapore, so I'm
at least going to attempt to make what may be a culturally sensitive
suggestion: I would consider a somewhat less 'free, young, and happy' LinkedIn
picture. Collared shirt, short sleeve, empathetic smile. Your pic makes you
look like you're 16. I suspect you want to look like you're 25-30. Your
prospective employer wants to imagine you listening to them explaining their
problem, prepared to work with them on the data collection and software
development. Look like that.

------
data4lyfe
Data science is so obscure the author is right to point out how there's a real
difference between what a data scientist actually does and what he needs to do
to actually get a job in data science. If you can write queries and understand
product metrics, that's about half of all of the data science jobs and
interviews.

But IMO the field is getting to a point where the engineers are going into
machine learning because it's a pay bump and the data analysts are realizing
they can start calling themselves data scientists with some more experience
under their belt.

Ultimately the field is saturating to where people are now going into this
field as an easy way to hit six figures. You see that with the rise of the
data science bootcamps but I digress as I wrote more about it here:
[https://www.interviewquery.com/blog/the-saturation-of-
data-s...](https://www.interviewquery.com/blog/the-saturation-of-data-science)

------
sushilewis
Is this a situation of garbage in, garbage out?

That is, there might be more interesting distinguishing factors, but he was
limited to education, position, and years of experience?

------
wickerman
I started out my career as a 19 year old in business intelligence and old
school data warehousing, and have only in the last three years been able to
properly apply my skills in the world of big data as a data engineer, and I've
found that regardless of the fancy titles the kind of stuff I do is exactly
the same. Perhaps the most surprising thing is that because "data engineering"
is disassociated from the notion of traditional warehousing, you get lots of
"experts" who have never heard of an ETL and think about software instead of
data pipelines.

And with data scientists, I've found that it's a mix of a) people who did
mathematics or physics degrees suddenly getting into computer science b)
senior analysts learning how to power up their analysis and c) computer
science graduates who went on to do a PhD in data science

Working with them my humble opinion is that a) you can't ignore the software
aspect of your job, meaning that you need to understand basic database
principles, parallel computing, SQL, etc. b) you also need to understand that
it's not about how fancy your algorithm is, but also how it can be quantified,
how you can manage the life cycle, how you maintain it, etc.

~~~
dijksterhuis
> you can't ignore the software aspect of your job, meaning that you need to
> understand basic database principles, parallel computing, SQL, etc.

I still find it amazing that so many data scientists I know do not understand
basic data software principles. Stuff like distributed vs parallel, database
types (NoSQL vs RDBMS), immutability etc.

Madness.

------
cryptozeus
“Finally, I would note that while the data is silent on the necessity of
skills acquired from non-traditional certifications such as MOOCs and
bootcamps, it does suggest something about their sufficiency: they clearly
aren’t. A postgraduate degree is a far better indicator of your prospects as a
data science hire. ”

------
sadness2
Superb. This analysis shows that you are not only able to process data, but
that you will creatively find opportunities to make data useful, in ways which
other folks in whichever company you end up in may not even think to ask for.
If people with data science openings are not attempting to hire you from this
thread, they're crazy.

------
lifeisstillgood
This is such a mind-bogglingly _obvious_ way to find the skills needed to get
a position that I am amazed I have never heard of it before (I wish I had
thought of it!)

It also shows what I think is the most vital skill of a Data Scientist - to go
and find the data to support the question

nice one

------
semantic_x
Analysts have been doing data science since long before Data Science was a
thing. Unless I am missing something, it's a rebranding of tasks that have
always been performed.

~~~
zwkrt
My grandfather was an “analyst” for GE, which involved a lot of physical
calculation, but also FORTRAN and COBOL. It used to be before programming
became its own profession that managers, accountants, secretaries, and other
disparate jobs were just expected to program off-handedly. It’s interesting
that we’re somehow in the same place again, but in reverse, having to make new
names for jobs that involve programming but not “software engineering”

------
franciscop
Sidenote for anyone who encounters the same: Medium is giving me a paywall, so
I opened a private window and it works. Does anyone know how Medium paywall
works? Is it the author that says "premium" content, or is it medium based on
traffic?

~~~
teej
The author determines if it is premium or not. Medium visitors get a limited
allocation of premium articles they are allowed to view for free each month.

------
autokad
anecdotal evidence ahead ...

I wanted to be a data scientist at a top tech company, so I did as Hanif did,
and went to the data on LinkedIn. my search was more specific - only data
scientists at top tech firms, and is also a very tiny sample.

But first, my situation at the time: Masters in Computer Science at the
University of Pennsylvania. strong database, AWS, spark, and python skills.
worked in a social media research lab that looked at social media impacts on
health, mostly did NLP involving twitter and health outcome data. Coauthored a
paper that ended up in JAMA (journal of american medical association).
Eventually I got what I wanted, but it wasn't easy.

TL;DR:

\- message recruiters directly

\- find a way of showing them you are a good candidate beyond your resume - I
found kaggle was really helpful, I recommend it.

\- be careful of getting pigeonholed out of DS positions by recruiters. your
LinkedIn should speak 'I am a data scientist at heart'

\- be prepared to fail interviews and learn from mistakes

\- study stats (youtube,books), coding (leet code), and SQL (leet code?)

Findings:

\- Degree: need a Masters or PhD.

\- Major: Statistics or some version of it was most common, with MBAs from top
MBA programs 2nd. Computer Science was very rare. Why MBAs? Probably because
those programs had wonderful stats programs.

\- School: top schools are very important.

\- Previous job: intern at a top tech company. An internship as a data
scientist is hugely beneficial. The next most important feature was whether
the previous position was data scientist.

Not data backed, but I would argue that becoming a data engineer to get
adjacent skills is a bad strategy. DEs are highly needed, so a recruiter will
put you on DE loops, not DS ones. I feel like data analysts also struggle to
become DS.

I thought my situation was at least somewhat ideal, but I was not getting
interviews. 0. It's hard emotionally to not be able to get to where you want
without being given a chance; you just have to keep trying. The findings
(previous job in particular) helped me realize that I was going to need to go
about getting interviews in a more efficient manner.

I needed a way of getting attention. the reply rate of websites seems to be
1/50, which is problematic if you want to work for a specific set of
companies. I think the best thing to do is to go on LinkedIn, search <company
name> \+ recruiter. Message the recruiters directly, they have all the power
in setting up phone screens, and they send batches of candidates to open
positions hoping some of them will get a role. Now you got their attention, so
you also need a way of getting into those batches.

An important metric for success was already being what I wanted, so I had to
find a way of saying 'I can do this'. I started spending most of my free time
on Kaggle, in the Zillow home price prediction competition. I finished top
100, which I STRONGLY feel helped me get interviews. I recommend it. It's a
free, zero risk way to get experience and display your passion/skills.

Next, I got some phone screens but failed a few and failed an on site.
Technical phone screens are either stats, coding, or SQL - never been asked ML
questions. Sometimes I failed coding questions, sometimes I failed stat
questions. I addressed this by studying lots of stats (YouTube was very
helpful) and coding (leet code). I already had years of SQL experience, so
those questions were always easy for me but be prepared to answer the
histogram question. I had some recruiters tell me in the initial phone screen
'even though you applied to a data science position, we think you would be a
better match for this data engineering position instead'. Ouch. I realized my
LinkedIn and resume looked very DE-like because of my years as a database
administrator, and because I had added lots of Spark/Hive to my resume after
seeing it on most DS postings. It's important, but don't over-highlight the
wrong things.
I politely declined and kept trying.

Eventually I got exactly what I wanted, and I am very happy for it. It took me
2 years after graduation to get there, and I had failures at all parts of the
process. I know it sounds cliche, but keep trying is my best advice.

