
Show HN: Real world (Jupyter notebook embed) way to assess data scientists - rvivek
https://www.hackerrank.com/tests/7j6a5mhn6lb/78766bda888f6c08605024290c926550
======
soVeryTired
What you're asking for looks a lot like a kaggle challenge. You're asking if
someone can use xgboost (or linear regression). It's the _last_ ten per cent
of a data science problem, if that.

The hard parts of data science include the following:

    
    
      - choosing the right input data (rather than relying on regularisation)
      - figuring out what the consequences are if you're wrong in a specific way, and avoiding the bad cases
      - wrangling your data into a nice CSV format
      - handling missing data
      - spotting biases in your data collection methodology
    

I'd expect a graduate to know about regression. For anyone else, this wouldn't
help me assess their skills.

~~~
shikharja
@soVeryTired, the challenge showcased in the test does involve a lot of
wrangling, handling missing data points, spotting bias and identifying the
right features for the regression model. The challenge is designed to allow
for a candidate's creativity.

Do you think the data set we used doesn't exercise these skills to the extent
you'd expect from a data scientist?

~~~
tfehring
Any assessment that directly provides data sets - even with "gotcha"s like
missing values - is testing, based on the conventional wisdom, at most 20% of
a real-world data science workflow. And IMO it's the least critical 20%.

The only good end-to-end "technical" data science assessment I can think of is
to pose a broad question or business problem that's addressable by applying
data science techniques to publicly available data. But a nontrivial version
of that assessment would take half a day on the very low end, and long
assessments anti-select against good candidates.

IMO, when it comes to evaluating data scientists, the only thing that online
coding assessments are good for is to ensure that they can perform basic
coding and data manipulation tasks. (I'd include tasks like web scraping,
image manipulation, API calls, and ORM stuff in this category). Everything
else needs to be evaluated in person.

~~~
shikharja
Do you think candidates looking for Data Science jobs would be open to
performing a half-day exercise?

We optimized these challenges to let candidates show as much of their skill as
they can in a timed window, without killing their creativity. I'd be curious
to know what you think is a good way to interview data scientists.

~~~
tfehring
I think _some_ candidates would be open to performing a half-day exercise. But
the best candidates wouldn't, which is what drives the anti-selection I
mentioned in my previous comment. More broadly, I don't think it's realistic
to create an assessment that's representative of real-world data science
workflows without being onerous enough to exclude good candidates.

If representative isn't an option, highly correlated is the next best thing.
In practice, for my team specifically, this means screening for math aptitude
and general business acumen during a phone screen, data manipulation
(moderately complex SQL + tidyverse/data.table/pandas) during a "take-home",
and delving more into problem solving approach, model selection and
validation, etc. during an onsite. Broad business questions (e.g., "How does a
life insurance company make money?") and communication skills generally weed
out the candidates who picked up the bare minimum math and programming
background through Kaggle + MOOCs.

As an aside, I absolutely think that the sort of assessment in the OP kills
creativity. I care a lot about whether a candidate would think to include
covariates like Internet usage and segmented urban population when predicting
mortality rates; I don't care at all whether they're able to write the trivial
amount of code that's needed to include those covariates in a model, given a
data set that already contains them.
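
The "trivial amount of code" point holds up: given a tidy data set, adding covariates to a regression is a one-liner. A minimal sketch with invented data (the column names just mirror the covariates mentioned above; they are not from any real data set):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for a country-level data set. Values are made up
# purely for illustration.
df = pd.DataFrame({
    "internet_usage":  [0.81, 0.45, 0.92, 0.30, 0.67, 0.55],
    "urban_pop_share": [0.74, 0.38, 0.88, 0.25, 0.60, 0.50],
    "mortality_rate":  [6.2, 9.8, 5.1, 11.4, 7.3, 8.0],
})

# Including (or excluding) covariates is just editing this list -- the
# judgment call is which names belong in it, not the code to use them.
features = ["internet_usage", "urban_pop_share"]
model = LinearRegression().fit(df[features], df["mortality_rate"])
print(model.coef_)
```

The interesting signal is whether the candidate thinks to put those names in `features`, which the code itself can't reveal.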

------
dannykwells
Working data scientist here. As many have said, this is effectively a Kaggle
challenge. Honestly, at this point I don't care at all how well someone can
predict anything - there is very little correlation between that and how good
a data scientist they are.

Tools to hire data scientists are going to keep failing until they recognize
that the interesting, hard part of being a data scientist is closer to the
work of a business lead (which can't really be tested in 60 minutes).

Concrete feedback:

- You ask for writing and descriptions on why a model was chosen, why
features matter - are you grading this automatically? That would be a feat.

- The task is waaay too easy (even if you do believe there is a market for
identifying people who can predict well).

- Python is overly limited. Why not SQL or R?

~~~
minimaxir
Disclosure: Got a preview of this product, my opinions only.

> You ask for writing and descriptions on why a model was chosen, why features
> matter - are you grading this automatically? That would be a feat.

Grading is apparently not automatic, which is good as I am not a fan of the
Kaggle approach in this demo.

> The task is waaay too easy (even if you do believe there is a market for
> identifying people who can predict well)

You'd be very surprised about how candidates can respond to these types of
questions!

> Python is overly limited. Why not SQL or R?

The full product allows Python, R, and Julia, with popular packages
preinstalled for Python/R.

------
data4lyfe
I am in the camp that thinks this notebook judges data scientists in a way
that will soon be obsolete.

If I'm given this clean dataset with all of the features properly set in
columns and data types labeled, I could spin up Azure or Google Cloud's ML
capabilities and have them run gridsearch and optimize my model.

To test data scientists, the role seems to be splitting into two buckets:
people who can pull analytics and query databases to create the datasets and
features, and people who can build the infrastructure to serve models,
engineer pipelines, etc...

FYI though we're working on this now at
[https://www.interviewquery.com](https://www.interviewquery.com) to try to
start creating suitable tests to assess data scientists without having them do
10+ take homes every month.

~~~
minimaxir
> If I'm given this clean dataset with all of the features properly set in
> columns and data types labeled, I could spin up Azure or Google Cloud's ML
> capabilities and have them run gridsearch and optimize my model.

That's the fault of the test design, which allows such techniques without
scrutiny - not of the notebook format.

------
ryanferg
I'm a data scientist (for an MLB team that will win the WS this year!) and I
love this. Of course this isn't a whole end to end evaluation platform. But we
will get 300-500 applications for a position sometimes, and often folks have
no business applying and this would be a great way to filter out some of the
noise. Great job!

~~~
shikharja
That's great to hear Ryan! You can sign up for the free trial for a full
experience here - [https://www.hackerrank.com/products/free-
trial](https://www.hackerrank.com/products/free-trial).

------
pequalsnp
This was pretty cool. For fun, I tried to get the best possible score I could
using XGBoost, without any feature engineering, and achieved an MAE of
0.042422154541399665.

------
morelandjs
No one does good science in 60 minutes.

~~~
minimaxir
True. But as long as _all candidates_ have the same time limit / same
expectation of work depth, and the test providers have a reasonable
expectation of how much can be accomplished in that timeframe, then it's fair.

That said, this demo should have a several hour time limit.

~~~
listenallyall
A standardized, precise 40-yard dash might be _fair_. But it is also pretty
useless if you are evaluating runners for a 1 mile race, or a marathon.

~~~
kthejoker2
I like this - let's consider how you would evaluate someone for a marathon.

Average marathon time is about 4.5 hours. Let's say you expect your hire to
last around 9 years (the average is 8 according to GlassDoor, but you're woebegone).

So scaling linearly, for every hour you get to assess a job candidate, you'd
get _0.2 seconds_ to evaluate a potential marathoner.

Assuming a typical candidate gives you maybe 12, 24 hours tops, you have all
of 3 to 5 seconds to evaluate a marathoner.
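
The arithmetic above checks out; a quick sanity check, assuming a 4.5-hour marathon and a 9-year tenure:

```python
# Time budget for evaluating a marathoner, scaled from hiring ratios.
marathon_s = 4.5 * 3600           # 4.5-hour marathon, in seconds
tenure_h = 9 * 365 * 24           # 9-year tenure, in hours

per_hour = marathon_s / tenure_h  # seconds of marathon per assessment hour
print(f"{per_hour:.2f} s per assessment hour")                # ~0.21 s
print(f"{12 * per_hour:.1f}-{24 * per_hour:.1f} s total")     # ~2.5-4.9 s
```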

The futility of such an exercise is obvious.

The only solutions are to:

* insist on a longer evaluation period

* hire them on a probationary basis

* only hire people who have run official marathons before, with proof and such

I leave it as an exercise for the reader to determine what's right for their
own evaluation process.

------
b_tterc_p
I think this is good as a way to filter people out, but not as a way to rank
people to find the best.

I would want to see a short script to clean and predict a dataset, plus a
small description of why choices were made.

Wouldn’t care much about the performance of the model.

------
tryitnow
I think this is great as a self-assessment tool, especially for beginners.

It would work great with other learning tools, like MOOCs, datacamp,
dataquest.io, as part of an overall data science learning process.

I'm more skeptical of its ability to help companies select candidates, but I
could be very wrong about this and if I am then it's a huge win, so thanks for
developing it.

I am super interested in seeing how you all develop this in the future; there's
a lot of potential here. Is there a data science specific mailing list I can
sign up for? I honestly have zero interest in hiring for other roles, so I am
not going to sign up for a general mailing list.

~~~
shikharja
If you are interested, we are looking for data scientists and developers in
general from the community to help us build these solutions and provide us
with honest feedback. We are also looking at building support for other Data
Science roles like Data Engineer. I would be more than happy to show you
what we have and hear your thoughts. Let me know if you'd like to be a part of
it and how I can reach you.

------
sireat
I suppose I am in the minority but I thought it was a pretty good FizzBuzz
challenge for DS.

In fact I'd say it is a bit aggressive for a 60 minute challenge.

Quite a bit of data wrangling is expected to complete modeling on all columns.
Some regex knowledge would help here too (for example, for wrangling the
internet_users column).
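
A column like internet_users often arrives as mixed strings; a sketch of the kind of regex cleanup involved (the raw values are invented, since the actual test data isn't shown here):

```python
import re

import pandas as pd

# Hypothetical messy values of the kind a raw internet_users column might hold.
raw = pd.Series(["75.3%", "1,234,567", "n/a", "0.62", "approx 48%"])

def to_number(value: str) -> float:
    """Strip everything but digits and the decimal point, or return NaN."""
    cleaned = re.sub(r"[^\d.]", "", value)
    return float(cleaned) if cleaned else float("nan")

numbers = raw.map(to_number)
print(numbers.tolist())  # -> [75.3, 1234567.0, nan, 0.62, 48.0]
```

(Whether "75.3%" and "1,234,567" are even the same unit is exactly the kind of judgment the regex can't make for you.)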

What was the idea behind asking for the 20 most important features when we have
16 columns? Are we expected to do some feature engineering?

Disclaimer: I teach Python and basic Data Science to adults, and I'd say most
people would struggle to complete this in 60 minutes, myself included.

~~~
shikharja
We had to reduce the time limit on the test to handle the traffic. The
intended test duration is 90-120 mins. I have updated the test duration to
90 mins now.

There is indeed some feature engineering involved. The challenge in the test
can be solved in the most obvious way possible, as well as in more creative
fashion. We believe that how a Data Scientist goes about solving the problem
is more important than a fixed outcome.

------
kthejoker2
Data scientists are fundamentally problem solvers.

The best way to assess technical problem solving is a structured hackathon.
That is, to be given

* a problem with multiple subproblems and solution milestones

* with both objective and subjective criteria

* freedom of tools

* a "junkyard" of resources

* a fixed amount of time for each deliverable of the problem

And then you observe the process and the results.

For data science, the subproblems should be:

* requirements gathering / understanding the problem

* data acquisition, prep, and analysis

* refinement of requirements / communication

* feature engineering

* modeling

* presentation / storytelling / viz

------
ska
Here are my high level thoughts after a quick look at the question and some
clarification in thread below.

- a single question is difficult to evaluate. "Answering a business question"
usually comes at the very end of a bunch of exploratory steps

- 60 min is reasonable but not much time to evaluate real work. You either
need to expand the time (also a problem for interviewing) or allow scoring of
"what I'd do next"

- tooling familiarity is going to be a huge factor with a short time limit.
Are you testing general knowledge or environment knowledge?

- too focused on models, too "kaggle-like". That covers about 20% of the
skills and the job.

Here are the sorts of things I look for. Do they understand:

1. How to verify & validate data, clean inputs, handle coding errors and
ELT-type issues

2. How to evaluate data set issues like bias, missing data and outliers, and
account for them (and when you can't)

3. (situational) How their infrastructure works and what they need it to do
(e.g. for distributed training, if appropriate). How to use it effectively.

4. How to control data and code throughout the lifecycle, so you don't waste
time and experiments

5. How to choose between approaches and models

6. How to evaluate performance rigorously

7. How to monitor performance over time

but here is the kicker:

8. How do you know you are trying to solve the right problem?

For junior people, the emphasis will be on the earlier points. For senior
people the last point is key.

Your question partially addresses some of the early points only.
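
The early points are the easiest to probe concretely; a minimal sketch of the kind of checks points 1 and 2 imply, on invented data:

```python
import pandas as pd

# Invented data with the usual problems: a missing value and an outlier.
df = pd.DataFrame({
    "age":    [34, 29, None, 41, 980],
    "income": [52, 48, 61, 55, 50],
})

# Point 1: verify & validate -- quantify missingness before modeling.
missing = df.isna().mean()

# Point 2: flag outliers, here with a simple IQR rule on one column.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]

print(missing["age"], len(outliers))
```

The code is trivial; the interview signal is whether a candidate runs checks like these unprompted, and what they decide to do about the age of 980.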

Some off-the-top-of-my-head suggestions:

- Have separate stages. Cleanup & verification can have objective and
subjective issues (missing & corrupt data? Outliers?)

- Don't focus too much on modeling; it's the least interesting part.

- Possibly allow different toolsets (e.g. R)

- Initial cleanup/eval stage on a CSV, but have a following stage pull from SQL?

- Possibly allow multiple inference choices from the same or a few data sets.
Give a short list of things the "business" is interested in; they pick one and
describe why

- Good idea to focus a bit on producing one or two graphics/tables to
communicate to a lay audience

- More focus on verification

- Add a validation discussion requirement. How are you going to know what you
did is worth doing?

- Add a "next things I would try/do"

The latter is going to be text heavy, but there's no way to avoid that unless
there is a follow-on voice/personal interview.

There isn't any way you are going to auto-score this stuff reliably, and that's
probably OK. The consequence is that your evaluators are going to have to
actually be good at this.

~~~
shikharja
Thank you, ska. This is pretty insightful and actually makes sense. We will
try to incorporate your suggestions as we create more challenges. We are also
looking for more data scientists and developers in general to help us build
these solutions, review them, and share honest feedback. Would you like to
work with us on this? I would love to hear your thoughts on the new features
we are building for more Data Science roles. If yes, let me know how to reach
you.

~~~
ska
Happy to discuss that, I have a lot of related experience that might help you.
Where can I reach you by email?

------
PLenz
Data scientists (in my opinion as one) should be spending most of our time
listening and talking to our colleagues, our clients, our peers, our teachers,
and - last of all - to the data. Last because we dive into the data searching
for things - answers to questions people asked for - and those other people
inform our journey and our methods.

Counterintuitive given the job title, but good DS is people work first and
data work like ninth.

------
rvivek
Hello folks, would love your feedback on our new product to assess data
scientists.

~~~
ska
I have a number of concerns about the efficacy of this, but they are made more
difficult to rank by not understanding how you are planning to evaluate and
use the results.

Can you elaborate?

~~~
anilgulecha
Evaluation is subjective at the moment, via a review of the Jupyter session by
the hiring managers.

For certain data science use cases, evaluation is possible by using a CSV
output by a user and comparing it to an expected CSV.
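
A sketch of what that CSV comparison could look like in pandas, assuming the candidate submits predictions keyed by an id (the column names and tolerance are hypothetical, not the product's actual scheme):

```python
import io

import pandas as pd

# Hypothetical submitted and expected CSVs, keyed by id.
submitted = pd.read_csv(io.StringIO("id,pred\n1,0.51\n2,0.90\n3,0.10\n"))
expected = pd.read_csv(io.StringIO("id,pred\n1,0.50\n2,0.92\n3,0.10\n"))

# Align on id, then compare within a tolerance -- exact float equality
# would punish harmless rounding differences.
merged = submitted.merge(expected, on="id", suffixes=("_sub", "_exp"))
close = (merged["pred_sub"] - merged["pred_exp"]).abs() <= 0.05
print(f"{close.mean():.0%} of rows within tolerance")
```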

(I worked on the product).

~~~
ska
Ok, although I would be wary of using a numerical comparison for anything
except catching obvious errors.

I should have asked this before, but "data science" is a pretty broad term -
who are you hoping to target with this? I'm guessing for pretty junior
positions but want to clarify.

Oh, and one other question, do you /can you enforce the 60 min time? [edit:
never mind, I answered this experimentally, you do cut it off at 60 min]

~~~
shikharja
Agreed, Data Science is a very broad term. The challenge is designed for Data
Scientists generally. We are trying to target all experience levels for now
through a screening/take-home test that should take about 60-90 mins at a
stretch. Do you think the timed challenge should be different for senior vs
junior data scientists?

What skills would you consider important for senior vs junior Data Scientist?

~~~
ska
I've added some top level comments.

For what it's worth, I think junior and senior DS roles should have fairly
different evaluations & interviews.

~~~
peterbell_nyc
I think at best this is fizzbuzz for DS, which is not inherently wrong. It's
nice to know a software developer can write a loop and a data scientist can
use a JN, so for weeding out people who have no practical experience with a
given tool set, it could make sense.

The question then is: how do you algorithmically (or even just consistently)
distinguish a great data scientist from one who can accurately model answers
to a question that was badly thought out?

Plus, as pointed out before, the length of a take-home could reduce
applications from the most qualified candidates.

I wonder if this should be even shorter and more quiz-like/fun so it intrigues
rather than annoys more senior applicants; I'm still wondering about the best
way to identify the data scientists who ask better questions.

------
tzm
This test is a relative assessment that tests the employer more than the
employee.

