
A Student's Guide to Preparing for Data Science Interviews - rogocopH
http://www.acheronanalytics.com/acheron-blog/how-to-prepare-for-a-data-science-interview
======
bllguo
Unfortunately there is very little here that isn't just general interview
advice.

As a new grad that went through the hunt very recently - it was a messy
process. Very few places will consider you without extensive experience, or a
masters/Ph.D. Of course if you're hiring people to research machine learning
algorithms that's justifiable, but plenty of the responsibilities people
associate with data scientists don't require advanced degrees.

And the number of posts asking for 5, 7, even 10 years of experience...
absolutely astounding.

As someone uninterested in going back to school, I've resigned myself to
getting some work experience and doing personal projects for 1-2 yrs before
trying again.

~~~
mindcrime
_spoopy01_ nailed it. Most of the people hiring data scientists don't really
know what data science is, or probably even why they need it. So they inflate
the hell out of the requirements to a. CYA and b. hopefully get somebody so
experienced they can come in and make up for the lack of organizational
understanding of data science. IOW, they want somebody who can "teach us what
we don't know".

~~~
softawre
Teaching me what I don't know is what I want with all of my engineering hires.
I want people better than me who will tell me why my architecture isn't ideal
or whatever the case is.

~~~
mindcrime
Yes, but I'd argue that's on a different level from "teach us why we need an
'architecture'" or "teach us how to use this 'data science' stuff". Some
people are trying to be buzzword compliant when they don't actually understand
the buzzwords. Ya know?

Or maybe there's a better explanation for asking for a candidate with a Ph.D.
in Statistics to create a linear regression model in Excel. Because truth is,
for many companies that's all they need.

~~~
gaius
_a Ph.D. in Statistics to create a linear regression model in Excel_

Linear regression, logistic regression and k-means clustering, if you can get
a project into actual real-money production on one of those, you are already
well ahead of 90% of data scientists. And these techniques are decades old!

------
minimaxir
So I've been interviewing for data analyst/science positions since leaving
Apple in April.

I may do a postmortem on my search later, but speaking from my experience with
many, many interviews over the past couple months, the TL;DR is that the
conventional interview wisdom on Hacker News/the cscareerquestions
subreddit/this article is _wrong and out of date_. Interviews for such
positions require a different set of skills than just reading Cracking the
Coding Interview (and ones that you _can't get at a data bootcamp_).

~~~
alexchantavy
What kinds of technical questions do they ask in a data science interview?

~~~
minimaxir
On the stats side, often higher-level theory questions, such as "How does the
k-means algorithm work?", "How do you select the best k for k-means?", "What
is the curse of dimensionality?" which again would not be things covered at a
data boot camp or data science thought pieces on Medium.
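
For readers who have not seen the first two questions before, here is a minimal sketch of Lloyd's algorithm (the usual k-means procedure) and the "elbow" heuristic for picking k. The toy data, the helper names, and the number of restarts are all made up for illustration; this is one common way to answer, not the only one.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: returns (centroids, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        updated = [mean_point(cl) if cl else centroids[i]
                   for i, cl in enumerate(clusters)]
        if updated == centroids:  # assignments stopped changing
            break
        centroids = updated
    # Inertia: total squared distance from each point to its nearest centroid.
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia

# Toy data (made up): three tight blobs around hypothetical centers.
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
offsets = [(-0.5, -0.3), (0.4, 0.2), (0.1, -0.6), (-0.2, 0.5), (0.3, 0.4)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

# Elbow heuristic: inertia drops sharply until k reaches the "natural"
# number of clusters, then flattens. Keep the best of a few restarts,
# since Lloyd's algorithm only finds a local optimum.
inertia_by_k = {
    k: min(kmeans(points, k, seed=s)[1] for s in range(5))
    for k in range(1, 6)
}
for k, inertia in inertia_by_k.items():
    print(k, round(inertia, 2))
```

Reading the printed inertias, you would look for the k after which further increases stop buying much. Silhouette scores or the gap statistic are other answers an interviewer might accept.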

On the technical side, there is often more-advanced SQL (nested JOINs +
PostgreSQL window functions). On the big data side, there is often discussion
of distributed systems (e.g. Spark clusters) and practical algorithmic
complexity at scale (i.e. instant fail if you suggest anything loglinear or
slower).
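
As a rough illustration of the window-function style of question, the sketch below computes a per-customer running total and rank. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs without a server (this assumes the bundled SQLite is 3.25 or newer, which is when window functions were added). The `orders` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, day INTEGER, amount REAL);
INSERT INTO orders VALUES
  ('alice', 1, 10.0), ('alice', 2, 30.0), ('alice', 3, 20.0),
  ('bob',   1,  5.0), ('bob',   2, 15.0);
""")

# Each OVER clause defines a window: rows are grouped per customer
# (PARTITION BY) and ordered inside the group, but unlike GROUP BY
# every input row is still returned.
rows = conn.execute("""
SELECT customer, day, amount,
       SUM(amount) OVER (PARTITION BY customer ORDER BY day) AS running_total,
       RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS amount_rank
FROM orders
ORDER BY customer, day
""").fetchall()

for row in rows:
    print(row)
```

The same SELECT should work unchanged in PostgreSQL; only the connection code is specific to SQLite.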

~~~
moab
I overheard some colleagues talking about a recent interview where a candidate
with "stellar industry experience", i.e. Kaggle wins and previous ML
experience at a valley company, couldn't explain Bayes' rule to them, let
alone rederive Naive Bayes. While books like the one below are extremely
theoretical, anyone interviewing for these kinds of roles should spend at least
a week or two just looking through this to see what kind of algorithms and
properties are studied in theory.

Foundations of Data Science (Blum, Hopcroft, Kannan)
[https://www.cs.cornell.edu/jeh/book2016June9.pdf](https://www.cs.cornell.edu/jeh/book2016June9.pdf)

~~~
reader5000
This is like not hiring a [big name coding competition] winner because he
didn't know radix sort.

~~~
mmierz
I think it's more like not hiring a big name coding competition winner because
they never bothered to learn how to use version control, or any coding best
practice, or any language other than C.

Trying to do data science with zero knowledge of the fundamentals of
probability is _dangerous_. Bayes rule isn't some kind of deep magic, it's
covered within the first few lectures of an undergraduate probability course
and it's absolutely necessary to understand the output of any machine learning
model.
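
To make the point about interpreting model output concrete, here is a small worked example of Bayes' rule, P(A|B) = P(B|A) P(A) / P(B), applied to a classifier that flags a rare condition. The numbers (1% base rate, 95% sensitivity, 5% false positive rate) are invented for illustration.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive prediction) via Bayes' rule."""
    # P(positive) by the law of total probability.
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / evidence

# Hypothetical classifier: the condition is rare, the model looks accurate.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(condition | flagged) = {p:.3f}")
```

With these made-up inputs the posterior comes out near 0.16: most flagged cases are false positives even though the model is "95% accurate", which is exactly the kind of mistake that ignoring the base rate produces.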

~~~
reader5000
> I think it's more like not hiring a big name coding competition winner
> because they never bothered to learn how to use version control, or any coding
> best practice, or any language other than C.

Depends on what you're hiring for, but I'll take "competition winner with no
version control" over "average programmer with expert VC capabilities".

> Bayes rule isn't some kind of deep magic

Yes, it's largely conceptually obsolete.

The people jamming out weekly SOTA machine learning models on arxiv aren't
sitting around meditating on conditional probabilities. They're making little
tweaks to giant models that are basically impossible for a human to
comprehend.

~~~
achompas
> Yes, it's largely conceptually obsolete.

I'm sorry, what? How did you arrive at a point where you believe this is true?
This is like calling compilers "obsolete."

Is it because you believe deep learning has "taken over" or something?

~~~
reader5000
Try to derive e.g. a face detector from bayes theorem. You immediately arrive
at computationally intractable sums/integrals. Yet, we have super-human image
classifiers. Therefore, bayes theorem is obsolete. Sure, you can try to
retrofit bayes theorem on top of a neural net, but who cares?

~~~
achompas
> You immediately arrive at computationally intractable sums/integrals.

So we instead sample from that posterior.

Unless you think MCMC is also obsolete, in which case I’ll see myself out.
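
For anyone unfamiliar with what "sample from that posterior" means in practice, below is a minimal random-walk Metropolis sketch. It is deliberately a toy: the posterior of a coin's bias under a uniform prior, chosen because the exact answer (a Beta distribution) is known and the sampler can be checked against it. The data (7 heads, 3 tails), step size, and burn-in length are arbitrary assumptions.

```python
import math
import random

def log_posterior(theta, heads, tails):
    """Unnormalized log posterior of coin bias theta under a uniform prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero probability outside (0, 1)
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def metropolis(heads, tails, n_samples=20000, step=0.2, burn_in=1000, seed=0):
    """Random-walk Metropolis: only posterior ratios are ever needed,
    so the intractable normalizing integral is never computed."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio), in log space.
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(theta)
    return samples

samples = metropolis(heads=7, tails=3)
estimate = sum(samples) / len(samples)
exact = (7 + 1) / (7 + 3 + 2)  # mean of the exact Beta(8, 4) posterior
print(f"MCMC mean {estimate:.3f} vs exact {exact:.3f}")
```

Real models would use a tuned sampler from an established library rather than hand-rolled code like this.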

~~~
reader5000
You're right, but a) you have comp efficiency issues with MCMC, and b) just
empirically MCMC models don't work as well as gradient descent + NN for many
tasks.

~~~
achompas
And you don't have computational efficiency issues with NNs?

We're also ignoring the benefits of a posterior distribution, which is useful
for understanding the data-generating process.

~~~
reader5000
Yeah of course. I can't explain to you why NNs outperform Bayesian approaches;
probably NNs are just capturing the correct type of prior for vision/language
tasks. And yeah, Bayesian models are more interpretable, but when you have
millions of latent variables I'm not sure interpretability is a thing.

~~~
achompas
Yep, we arrived at my larger point: if you care about interpretability, NNs
are horrible and Bayesian techniques are pretty damn great.

~~~
reader5000
Well certainly, but interpretability is obsolete.

~~~
achompas
Now you're just trolling. :)

------
mcrad
Analyzing the Analyzers, free eBook. Assuming the student is sharp on
technical skills, this look at the human side could be helpful to prepare.
[https://www.amazon.com/Analyzing-Analyzers-Introspective-Sur...](https://www.amazon.com/Analyzing-Analyzers-Introspective-Survey-Scientists-ebook/dp/B00DBHTE56)

------
booleandilemma
How many data scientist jobs are actually out there? I can understand data
scientist being a position at one of the big 5 tech companies, but are they
really in demand elsewhere?

I've never actually met someone off the internet who calls themselves a data
scientist.

~~~
reader5000
Technically anybody who uses Excel is a "data scientist". Just got to get the
right buzzwords.

~~~
achompas
This is absurd and false. This person is an analyst of some sort.

Maybe this holds in the consulting world? It definitely does not hold in the
tech world, IME.

~~~
gaius
_This is absurd and false. This person is an analyst of some sort._

And what is a data scientist then, if their work does not involve analysing
data and presenting their analysis?

99.9% of "data science" is exactly what people used to do in tools like Excel,
MATLAB, even SQL, just in Jupyter instead. On a Mac while sipping a latte.

~~~
achompas
> And what is a data scientist then, if their work does not involve analysing
> data and presenting their analysis?

This is a dead giveaway that you have no idea what you’re talking about.
You’ve captured about 5% of my work.

The rest of the time, I’m writing software (ETL pipelines or real-time
services, including tests), debugging some distributed system, collecting or
cleaning data, or gathering requirements and developing feature specs with
other folks.

Fortunately for you, you nailed the Mac-using latte-drinking part!

EDIT: Reading your comment history on regression and k-means, you _do_ know
what you’re talking about. It _is_ hard to get models into production, so I’m
surprised to see your snark here. What gives? Do you have experience with DS
who don’t deliver?

~~~
gaius
_What gives? Do you have experience with DS who don’t deliver?_

I have experience of DS who define what they do by the tools they use, not the
results they deliver, it's a pet peeve of mine :-)

Thanks for going back and making the edit!

------
denzil_correa
The thing is: Data Science requires ... "scientific" rigor and thought
process. A lot of people who hire often forget that science is integral to
data science: it's right there in the name.

~~~
rogocopH
I think this was posted earlier. But some companies really just want a
statistician.

Very few companies are actually using their data scientists as scientists, in
my experience. The exception was when I worked at a large hospital: we had a
research board, and had to be CITI-certified to study human subjects. But
beyond that...

~~~
gaius
_But some companies really just want a statistician_

In what way is what statisticians do _not_ "scientific"? Setting up and
rejecting (or not) the null hypothesis is the very definition of the
scientific process...
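
As a small illustration of "setting up and rejecting (or not) the null hypothesis", here is a sketch of a two-sided permutation test for a difference in means. The two samples are invented, and the permutation test is just one convenient choice of test, picked because it needs no distributional assumptions or external libraries.

```python
import random

def permutation_test(a, b, n_permutations=10000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: group labels are exchangeable, i.e. both samples
    come from the same distribution.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        left, right = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Made-up measurements for a control group and a variant group.
control = [10.9, 11.2, 10.5, 11.0, 11.4, 10.8]
variant = [12.1, 11.8, 12.6, 12.9, 12.4, 12.7]

p_value = permutation_test(control, variant)
print(f"p-value = {p_value:.4f}")
```

A small p-value means a difference this large would rarely arise from label shuffling alone, so the null hypothesis is rejected at the chosen significance level; otherwise it is not rejected, which is not the same as proving it true.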

------
reader5000
"Data science" is a field with so much conceptual churn and fads that
interviewing for it is a completely ridiculous notion.

