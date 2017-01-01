
Some Reflections on Being Turned Down for a Lot of Data Science Jobs
Many recommend setting up an online portfolio for these types of positions. But I've applied to a number of Data Analyst/Scientist jobs recently and I am immediately rejected almost every time despite highlighting my blog/portfolio (http://minimaxir.com) and my GitHub with open-source code/Notebooks for each and every post (https://github.com/minimaxir), both of which have topped HN on occasion.

Internal recruiters have hinted that my Software QA Engineer background + no CS degree implies I have no technical skill.

The first job is the hardest to get. Keep trying, lower your expectations: Take anything you can and don't expect to be paid much.

Contrary to what the Reddit folks would have you think, the ONLY thing that consistently gets someone through the door is demonstrable real work experience, on real projects, in the industry (read: not academia). Side projects are not a substitute, and the lack of a degree is a barrier.

I recruited for one data science job. The hiring manager was super focused on getting someone with a PhD. Is that the norm for a position like that?

Yeah. The problem is recruiters are almost universally non-technical, so they can't properly gauge your talent nor do they understand the job requirements fully.

I've seen your posts on Spark. That's sad to hear.

Are you applying online or via getting coffees with people in the data science team and just talking shop? The latter seems to have a high conversion rate if you can nail it.

Send the recruiters your Reddit posts in r/beautifuldata or whatever that subreddit is. I'm sure the recruiters spend a lot of time on Reddit. Recruiters are ridiculous.

This is a bit off-topic, but one thing I'm curious about is how the author manages to interview with these companies while holding down a full-time job and keeping up to date with the latest developments. Interviewing for a data science position is typically a drawn-out process, with multiple rounds of interviews and possibly take-home projects. I found them to be very time- and energy-consuming.

To share my story, I also had a difficult time transitioning into a data scientist role after leaving academia (pure mathematics), and I always thought the root cause was my lack of experience and competency. So instead of keeping on applying, I spent over a year just to sharpen up my skills. It paid off in the end.

How can one develop his/her skills and cultivate expertise if one is job-shopping all the time (possibly aimlessly)?

I think being a good DS requires focused learning. For example, when I started out, I didn't know much about neural networks and their different architectures. I make time outside work to go over papers and theses, or watch lectures on YouTube, to keep myself up to date, so that now I can describe CNNs and RNNs to tech and non-tech people.

A better but more difficult approach is to distinguish oneself and have the companies go after you.

At least you have been told the reasons, whether or not they are true. I recently passed several technical and behavioral interviews with a team, but after a review with the VP of X, I was told only that "we decided not to move forward," which is way worse than this. I still keep wondering 'What is wrong with me?' even after years of interviews. I have some suspicions, but never a firm answer. If I found a company one day, the first company policy would be to tell candidates, politely, why they were not hired.

How about talking to a therapist / counsellor / coach that specialises in professionals?

Even one or two sessions can help you a lot - they are trained in pinpointing "soft" issues.

My analytical thought process on DS interviews:

- Signal is still quite low relative to the noise, even with long multiple interviews, take-home homework, coding challenges, etc. The most relevant data is still hidden and takes months to years to come out.

- Companies seek to minimize false positives much more than false negatives.

- It's a numbers game from both ends because the probabilities are low, due to the two points above.
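To make the trade-off in the list above concrete, here is a minimal sketch, not a model of any real hiring process. It assumes interview scores are normally distributed, with qualified candidates scoring one standard deviation higher on average than unqualified ones. Both distributions, the thresholds, and the `error_rates` helper are made up for illustration.

```python
from statistics import NormalDist

# Hypothetical score distributions (assumptions, not measured data):
# unqualified candidates score around 0, qualified candidates around 1,
# with the same interview noise (standard deviation 1) for both.
UNQUALIFIED = NormalDist(mu=0.0, sigma=1.0)
QUALIFIED = NormalDist(mu=1.0, sigma=1.0)


def error_rates(threshold):
    """Return (false_positive_rate, false_negative_rate) for a hiring bar.

    A candidate is hired when their noisy interview score exceeds `threshold`.
    """
    # Unqualified candidate who scores above the bar: a bad hire.
    false_positive_rate = 1.0 - UNQUALIFIED.cdf(threshold)
    # Qualified candidate who scores at or below the bar: a missed hire.
    false_negative_rate = QUALIFIED.cdf(threshold)
    return false_positive_rate, false_negative_rate


if __name__ == "__main__":
    for bar in (0.5, 1.5, 2.5):
        fp, fn = error_rates(bar)
        print(f"bar={bar:.1f}  bad-hire rate={fp:.3f}  missed-hire rate={fn:.3f}")
```

Under these assumptions, raising the bar always lowers the bad-hire rate and always raises the rate at which qualified people are turned away, which is one way to read the "numbers game" point.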

Companies often use interviews as a time to figure out what they're really looking for.

For startups, this transcends data science. It might be the one time that week they focus on that need.

Networking is still king.

Exactly, and this also argues against wanting to get hired to work remotely.

Is there a typical path for getting into Data Science? Posts like this make it seem like it's the Wild West.

The real reason you were rejected, or anyone applying in a highly competitive field: an oversupply of qualified candidates.

In this particular case (data science) it is more an oversupply of candidates (qualified and not), plus difficulties defining and measuring "qualified", plus buzz.

It's a difficult enough field to hire in when you understand what it is (and isn't) - and lots of companies are trying to do it with far more vague goals.

I think this is closer to reality. Tim is not competing with many folks like him -- he's a knowledgeable, experienced, and capable data scientist with significant infrastructure experience, and a net positive to any team he joins. Some problems possibly originating from the company's side:

(1) They are inundated with applications from folks with all sorts of backgrounds: engineering, finance, academia, marketing, BI/analytics, etc.

(2) They still haven't figured out hiring. To be fair, no one really has figured it out. Jeff Kolesky recently covered this as part of an excellent blog post. [0]

(3) In addition to the typical variance in engineering interview processes, we now introduce variance in the definition of data science across companies, which just complicates things further.

(4) Basically everything else Tim mentioned in his post: role or goals aren't clearly defined, remote data science is an unknown, etc.

[0] http://kolesky.com/datums/job-search/

Why do you think candidates for data science jobs aren't qualified relative to other positions?

I've probably interviewed about 70-100 such people in the past year and a half. Exactly 1 such person was qualified (I hired him). The issue in my view is the following: people who know both statistics and computer science are extremely rare.

People who actually understand statistics are rare. I can probably weed out 1/3 to 1/2 of candidates simply by asking what a p-value is, or what precision/recall are (this includes people who said they worked in search).

Of the ones who know basic stats, most are neither good at nor interested in programming. They just want to use existing libraries to crunch numbers in a Jupyter notebook, then hand that off to the developers.

Finding a person who can come up with a predictive model, understand what they did, optimize it without breaking its statistical validity, and deploy it to production is very hard.

(If you can do this, I'm hiring in Pune and Delhi. Email in my profile.)

Not knowing what a p-value is does not mean that you don't know statistics. p-tests are not some inherent statistical property; they're just a useful model for significance. People coming from a CS background most likely didn't have to deal with p-values, but they can still be good at linear algebra or Bayesian statistics.

(not sure I can defend somebody that does not know what precision/recall are)
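For anyone following along who hasn't met p-values, a rough sketch of one way to get one: an exact two-sided permutation test on the difference in means. The function name and the two tiny samples are hypothetical, picked only so every relabeling can be enumerated. This is one illustrative test, not the definition an interviewer would necessarily have in mind.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means.

    Under the null hypothesis the group labels are arbitrary, so we try
    every way of splitting the pooled values into groups of the original
    sizes and count how often the gap is at least as large as observed.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def mean_gap(xs, ys):
        return abs(sum(xs) / len(xs) - sum(ys) / len(ys))

    observed = mean_gap(group_a, group_b)
    at_least_as_extreme = 0
    total = 0
    for chosen in combinations(range(len(pooled)), n_a):
        chosen_set = set(chosen)
        xs = [pooled[i] for i in chosen]
        ys = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # Small tolerance so ties are not lost to floating-point rounding.
        if mean_gap(xs, ys) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


if __name__ == "__main__":
    # Made-up measurements, purely for illustration.
    a = [12, 15, 14]
    b = [10, 9, 11]
    print(permutation_p_value(a, b))
```

The returned number is the share of relabelings that produce a gap at least as large as the observed one. It is not the probability that the null hypothesis is true, which is the usual confusion.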

Regarding precision/recall, I have a background in financial econometrics and this is the first time I've encountered the concept.
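Since the terms came up: precision is the fraction of predicted positives that are actually positive, and recall is the fraction of actual positives that were found. A minimal sketch with hand-made labels follows; the `precision_recall` helper and the example lists are illustrative, not taken from any particular library.

```python
def precision_recall(y_true, y_pred):
    """Compute (precision, recall) for binary labels given as 0/1 values."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t and p)
    false_pos = sum(1 for t, p in pairs if not t and p)
    false_neg = sum(1 for t, p in pairs if t and not p)

    # Convention used here: report 0.0 when a ratio is undefined.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up labels: 1 = relevant document, 0 = irrelevant.
    actual = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 0, 1, 1, 1, 0, 0, 1]
    precision, recall = precision_recall(actual, predicted)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

In econometrics terms, recall is the same quantity as sensitivity (the true positive rate), while precision has no equally common name there, which may be why the pair looks unfamiliar.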


I'm surprised people bother spending so much energy looking for someone who is both a statistician and a computer scientist, knowing they are so rare. There are so many more statisticians who can at least communicate and work effectively with developers, and vice versa. Why not just compose a team? I feel that, just as other professionals have assistants, statisticians should have them too, focused on the computer science and deployment side of the applied statistics.

This is a classic problem that shows up equally with lots of related areas: numerical work, statistics, ML, signal processing etc.

"just compose a team" sounds easy, doesn't it? Unfortunately there are lots of failure modes involving different parts of the team not really understanding what each other are trying to do, let alone what they are doing, and subtle errors getting by people who don't know what to look for. So, you can find such teams and some of them work well but a lot of them don't.

So an alternative is to try to find or create domain experts who mix all the appropriate skills, but this is hard and in the extreme case involves chasing down unicorns.

Companies and industries flop back and forth between preferring different approaches - right now a lot of people are talking about "data scientists" as one of the latter, but it will likely change over time as it always does.

It's a hard problem, and it shows.

I have had similar experiences. To add some color: I find that for data science tasks, someone who knows statistics and can program is much, much more productive than someone who only knows one. Part of that is because data scientists have to do their own product management: the question you ask next changes quite rapidly depending on the results of a single query.

That said, most companies should probably be hiring data engineers rather than data scientists. For most "data science" jobs I've seen, almost no statistics is actually necessary or useful.

This is also very true! Usually we hire AI/stats folks and do a heck of a lot of training to get them up to speed on the development side of things. You can do it the other way around, but math is a lot harder to pick up outside of formal education than computer stuff.

Two main factors make data science stick out a little for me right now, although it isn't unique.

One is that there is buzz and excitement around "data science" right now. That is nothing specific to this area, but in my experience it creates a large number of under- or unqualified applicants. It also creates an environment in which companies want to hire for a role they are not well qualified to hire for. It is really difficult to hire well for roles you don't understand well.

The second thing is that extremely few people are actually ready for this sort of job straight out of an academic program. A related Ph.D. or postdoc plus a few years of solid training in industry can make you a great candidate, but the academic work alone usually isn't even remotely close. There is confusion about this among both candidates (who don't know what they don't know) and hiring managers (who don't know what they are actually looking for).

Add to that an oversupply of academic credentials relative to academic jobs and you have a problem. If you are a large company with a well defined data science program and a defined "entry level" data science role, if you take skill development and training seriously and have the senior staff for it, well then you are fine taking strong academic candidates and turning them into talented data scientists. If you are a less experienced company looking for scientists to solve a problem you don't fully understand, you may be in for a pretty rough ride.

Not OP, but I think many companies aren't qualified to judge who is and who isn't a qualified candidate, at least for the first hires. This turns the whole thing into a "market for lemons".

I've helped a few organisations solve this bootstrap problem by helping out with candidate selection and interviews, but many others just don't ask for help.

But, wait! I thought there was a "shortage of tech workers."

Which must necessarily be urgently addressed by open immigration for anyone who can write code! The future of Silicon Valley demands it!

"Networking is still king"

My takeaway from it.

I think that just "networking" would be the wrong takeaway, though. The author[0] is pretty active on Twitter / the blogosphere and, even though I don't follow him personally, his writing has popped up on my feed a number of times. He's likely meeting people and having interesting conversations ("networking") because he has interesting things to say.

[0] https://twitter.com/tdhopper

Not just blogging, but Tim is also active in terms of attending / speaking at various industry conferences and what-not. Whether he is doing it simply to "build his brand" or because he genuinely enjoys it, or both, is something I can't speak to. But I think he's definitely "meeting people and having interesting conversations".

Huh...he went to my school...

