I wasn’t getting hired as a data scientist, so I sought data on who is (towardsdatascience.com)
403 points by jonbaer on Aug 15, 2019 | 101 comments

This is an awesome analysis of the situation. Some things I have noticed as a data scientist of 4 years so far:

- Increasingly, data scientist is just being used in place of senior analyst because it attracts more applications.

- At the software tech companies I've worked at, there was an outsized interest in mid-level software engineers wanting to be data scientists, mostly because the career development prospects at that stage are grim and data science usually means a pay bump. This demand has had the opposite effect: software shops are leery of promoting engineers to data scientists for fear of inciting contention among the ranks.

- Building on the point that data scientist usually means senior analyst, it has also come to mean analyst that can build their sql query into a scheduled ETL or daily process of some sort. You work in pandas instead of excel sorta thing.

- I have personally gotten all my data science jobs from talking about the business side of things. I think engineers approaching the field from a hard-skills perspective is totally wrong. My last technical take-home was in a language I had never used before and my execution was accordingly shitty, but I was able to explain the problem well, along with how the data could be used to predict the variation and how the data science product fit into the business. I got an offer before I left the building.
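For anyone wondering what "build their sql query into a scheduled ETL or daily process" can look like in practice, here is a minimal sketch. Everything in it is an assumption made for illustration: the `orders` and `daily_revenue` tables, the `run_daily_rollup` name, and the use of SQLite are all hypothetical, and the scheduler itself (cron, Airflow, or whatever your shop uses) is left out.

```python
import sqlite3


def run_daily_rollup(conn: sqlite3.Connection, day: str) -> int:
    """Aggregate one day's orders into daily_revenue; safe to re-run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_revenue ("
        "day TEXT PRIMARY KEY, orders INTEGER, revenue REAL)"
    )
    # The analyst's ad hoc query, now parameterized by day.
    row = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) "
        "FROM orders WHERE order_day = ?",
        (day,),
    ).fetchone()
    # INSERT OR REPLACE keeps the job idempotent if the scheduler retries it.
    conn.execute(
        "INSERT OR REPLACE INTO daily_revenue (day, orders, revenue) "
        "VALUES (?, ?, ?)",
        (day, row[0], row[1]),
    )
    conn.commit()
    return row[0]


# Hypothetical source data, in memory so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2019-08-14", 20.0), ("2019-08-15", 10.0), ("2019-08-15", 5.5)],
)

run_daily_rollup(conn, "2019-08-15")
print(conn.execute("SELECT * FROM daily_revenue").fetchall())
```

The only real difference from a one-off query is that the date is a parameter and the write can be repeated without duplicating rows, which is most of what a daily job needs.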

The rise of Medium thought pieces/MOOCs has created the conception that data science jobs are a 40-hours-a-week Kaggle competition, whereas the reality is much, much different/less exciting to write thought pieces about. (I wrote a blog post last year about that phenomenon: https://minimaxir.com/2018/10/data-science-protips/ )

That is not limited to data science. I remember the horror of an intern when he realised that real work wasn’t at all like solving neat little puzzles on HackerRank. He discovered that he didn’t really want to be a programmer after all...

It would be a true horror if real work is like solving hackerrank puzzles --- everything is predefined and you're given a set task for a set amount of time, surely a robot could do the same?

I mean, hackerrank puzzles IS how most companies interview so it isn't unreasonable to assume that that's what the job is like. Presumably orgs would ask questions related to the problems they are solving during an interview

To be fair that is the same with many jobs. It always sounds like you will be building the next great thing from scratch, but end up spending far more time fixing other people's crappy code.

Ask any Postdoc or PhD candidate :-). Even they aren't spared from the "discover the next great thing" phenomenon. Perhaps the best thing for students/new employees to realize is that while they may be on the path to building the next great thing, that path is full of potholes and grunt work. Will save a lot of disappointment when they hit the road.

True, although data science seems to have an elevated discrepancy of expectations vs. reality.

> was an outsized interest in mid-level software engineers wanting to be data scientists, mostly because the career development prospects at that stage are grim

What are you basing this on? Senior software engineer jobs are a lot easier to come by than data scientist jobs and from what I've seen, pay better than the average data scientist job as well.

I agree with you, but I think data science has the perception of more money and prestige. While senior software engineers often get paid more, I do believe that data scientists have more opportunity to be involved in strategy and business decisions, which can help one get more exposure to high ranking employees. But this is not always the case.

Data scientists do have opportunities to work with management, but the realities of working with MBAs can be surprising. I found myself reading corporate finance textbooks to be able to fully participate when I started in data science.

This is truer outside of the Bay area- non-tech focused companies that see IT as a cost center, but have somehow decided they need to do 'Big Data Analytics'.

It’s the “it” thing right now. Everyone wants to “leverage” their data and projects in that space get readily funded. It’s also seen slightly better than the typical IT-is-cost-center mentality because of the immediate potential benefit or the risk of losing to a competitor.

I went into Data Science after grad school 5 years ago.

If it had been as difficult then as it is now I would probably have chosen software engineering.

Honestly, I feel software engineering might be better anyway as it's much easier to demonstrate value building features and shipping products rather than endless analysis and questionable models.

I am just going on my experience, but according to glassdoor data scientists of the same years of experience earn about 20k more than senior software engineers here in Boston.

Glassdoor is not accurate


Could really depend on location.

Current Software Engineer, for full disclosure, pondering how to move away from the prevalent web dev job market where I currently reside. Data science seems to pay on par.

Depends on the company. I fought the senior engineering leadership and HR to get data scientists on the same scale as engineers (they had previously been lower--much lower).

Even if that's true, the OP was talking about mid level software engineers, so they would be leaving a field where they had 5 or so years of experience and moving into one where they had 0.

OP had it right in the first sentence. Senior Analysts not Senior Software Engineers.

I used to work as a data scientist. This title is not what it should be. I witnessed firsthand how the business thought data science was the magic they could rely on to consistently deliver impossible work. So much junk gets shuffled our way with the expectation that gold will come out of it. It feels like the financial bubble of 2008, except it's with the inflation of the position and the 'clout' of data science.

There's nothing wrong with data science itself, just like there's nothing wrong with mortgages. But the current trend of software engineers and non-software engineers moving into data science is not sustainable. Things will break before they're fixed again. I've always considered myself a software engineer first and foremost, just with some extra machine learning/stats knowledge, and I'm glad to be out of that position now as it looks like we're in for a reckoning soon.

> data science usually means a pay bump

That’s interesting. I’m at one of F/G and a lot of the data scientists want to go the other direction to software engineering because we receive about 60% of the RSUs that they do. A few people on my team actually did switch; they said they found the data science work more interesting but an additional $40-100k per year can make a really big difference over the long run.

I think this is dependent on what someone means when they say "data scientist" (mentioned as Type A vs Type B in the article).

Facebook and Google "data scientists" (meaning those who hold the title) are really more like analysts -- they analyze data to inform decisions and use a lot of SQL. They make prototype models (usually based on less cutting-edge techniques) that get passed to engineering teams if they become worthwhile to scale/formalize. These folks get paid less than SDEs usually.

The other type of "data scientist" is basically an SDE (maybe SDE-lite) with research-level ML skills. These get paid similarly to (or in some cases higher than) SDEs. I believe Facebook and Google call these SDEs. Sometimes the term "applied scientist" is used to describe these at other companies as well.


At my company, Type A are called "Data Analysts", but at Google and Facebook they're called "Data Scientists". Type B are "Data Scientists" at my company, but "Machine Learning Engineers" (or SWE-ML or some other combination) at Google and Facebook.

As a Type B, at my company, I'm on the same pay scale as the SWEs. The Type As are not.

I have a research background and do pretty heavy duty machine learning. It’s not analyst work (the internal job family is “applied scientist” while the external facing title is “data scientist”). It’s still not SWE compensation. As far as I’m aware, there is only one team of data scientists that makes the same as SWEs at my company.

Does whatever you build go into production, or do you need someone else / another team to “take care of that part”? That is where the “data science is just Statistics” people’s wheels come off, when they realise production ML needs a senior software engineering background.

Some of what I've built is currently in prod at a very large scale (which honestly is a bit freaky). Depends on the particular project though. Our team very rarely hands stuff off to SWEs (although they frequently code review); for the most part we implement everything ourselves.

At one of the national labs whose jobs listings I’ve looked at, they have people in ML/data science and ML/data engineering. The first is in the research department and the second is in IT.

I really am surprised to hear this. I'm about 250 all in and I haven't heard of many software devs pulling over 200 all-in. This is for Boston and I'm 6 years out of my undergrad with no graduate degree (although I didn't go to college until my late twenties and I have noticed my maturity helps a little). Maybe on average it's the same but data scientist at the right company has a higher upper limit?

200k+ is common at L5+ at my company and it’s not Google or Facebook fwiw.

FAANG skews the graph. levels.fyi

That's interesting that there are data scientists going the other way - you don't really hear about that on the outside.

What sorts of SDE positions do these data scientists go into? Are there any additional skills they pick up as part of the transition, or are strong Python/SQL skills enough?

"analyst that can build their sql query into a scheduled ETL or daily process of some sort. You work in pandas instead of excel sorta thing"

In my experience from my old employer...clients like Google get billed $120/hr for SQL analysts' services; ten years ago, staff earned $20/hr or a little more, and today, they've replaced almost everyone with offshore employees making $3-4/hr.

Where is the place that mid level software engineers think there are not good career prospects and they'd get paid more? I can only guess it's a place where there aren't many dev jobs. My experience is in the Seattle area and we are begging for people to even apply for jobs. There are 10,000 jobs easily in Seattle. My company would love to grow its dev force 50% and we can only get people by hiring them away from another company (perhaps an obvious comment :-)), and by hiring new college grads.

If your job is working you too hard, with not enough pay, then people here get another job. It seems harder to hire people with some experience at my company anyway. New college grads make 120k+ at top companies (we are a startup but not a unicorn, we pay a little more than that).

Pardon the question: What skills are you looking for that define "mid level software engineer"? I'm a long-time engineer at a single company. I've been more and more tempted to strike out elsewhere as I feel like there is nowhere else for me to move into position/pay wise where I am. I need help to determine how to frame the skill level that I have and/or where to focus on so that I can claim/apply with a certain level of skill.

Title placement is often just a rough estimate based on years of experience. If you've been at a single company for a while, the usual advice is to break your time up into roles/projects on your resume.

Thank you

“Finding 1: most data scientists have postgraduate degrees”

Data Scientists primarily function to tell a story (based upon data) that technicals and non-technicals alike will use in business decisions. It’s critical that a Data Scientist be perceived as trustworthy, since the decision-makers are unlikely to reproduce or even understand the Data Scientist’s full argument.

What signals trustworthiness? A graduate degree from Harvard, Yale, Princeton, Stanford (HYPS) or a similar university definitely speaks well for a candidate. Online degree programs like Coursera / Udacity / etc won’t carry nearly the same weight until their alumni network grows, and that will require growing into non-technical fields.

What signals untrustworthiness? Sadly, the “hacker” skills that are so very key in DS (e.g. for data cleaning) are completely at odds with traditional (and especially non-technical) assessments of trustworthiness. Many companies in the Bay Area will look past this issue, but it’s arguably a competitive advantage to simply be able to assess “hacker” skills effectively. That also entails making space for “hackers” at your company. Can’t take hackers? You probably will never hire a good Data Scientist.

I never tried to get a job as a data scientist, and as a matter of fact, I never took a course in statistics as part of my undergraduate CS degree (only discrete probability), but I think a good thing to remember is that there are always interesting jobs you aren't aware of, so don't focus too much on what you think the ideal title would be.

I was called a few years ago by someone looking to hire a support person for scientists using supercomputers. I just couldn't believe my experience (SQL, ETL, data munging, etc) was applicable, and I talked him out of it. But some people might have taken the approach of get the job and then learn what it's about.

That, or, candidates without formal education lack the basic statistical knowledge and/or communication skills required.

PhDs can't code, hackers don't know stats. A good team has both. If you're a one-man show, you need to do both yourself.

Hedge funds often have data engineers who focus on building the infrastructure and cleaning the data.

They also have quant guys/data scientists who use the data to help drive investment decisions.

"PhDs can't code" what a load of crap.

They might trust the person. But Ivy League gives plausible deniability.

Exactly. It's the "no-one ever got fired for buying IBM" of the recruiting world.

Leo Breiman [0] (inventor of bagging and random forests) wrote a paper called "Statistical Modeling: The Two Cultures" [1], and since I read it, I see it everywhere. The basic idea is that Statisticians place(d) too high an emphasis on model interpretability ("data modeling" in the paper), and as a result, missed out on the revolution of machine learning ("algorithmic modeling" in the paper). In the author's words (parenthetical added by me), "[T]he focus in the statistical community on data models (simple, interpretable models) has [l]ed to irrelevant theory and questionable scientific conclusions."

In this TDS post, the author says "Statisticians and Actuaries are at the bottom of the heap as a prior role for existing data scientists." Maybe this isn't a coincidence? Plenty of companies had statisticians on staff, but the explosion of data science happened anyway. Why? Because data scientists do the same types of tasks as statisticians, but while statisticians are of the data modeling culture, data scientists are expected to be of the algorithmic modeling culture. It seems that the market is saying that the algorithmic modeling culture is getting results.

The author references "Type A vs Type B Data scientists" [2], which seems to be getting at the same thing: "The Type A Data Scientist is very similar to a statistician... Type B Data Scientists share some statistical background with Type A, but they are also very strong coders and may be trained software engineers. The Type B Data Scientist is mainly interested in using data "in production." They build models which interact with users, often serving recommendations (products, people you may know, ads, movies, search results)." For whatever reason, there is a correlation between Algorithmic modeling / Type B and "getting things done".

[0] https://en.wikipedia.org/wiki/Leo_Breiman

[1] https://projecteuclid.org/download/pdf_1/euclid.ss/100921372...

[2] https://www.quora.com/What-is-data-science/answer/Michael-Ho...

With the failure modes we’ve seen from deep learning and related AI, maybe the statisticians are on to something?

Hello guys. It’s me, Hanif. Just a response to some of the comments here:

1. I think it’s absolutely fair to criticize this aspect of the analysis: the relative frequencies of the backgrounds of data scientists have been presented as suggesting the success rate from each field. Many of the comments in the post itself made a similar critique. As I’ve acknowledged in my responses to these comments, what we need are the relative frequencies of applicants from the different backgrounds, not just hires. However, one can justify the inference about the success rate of, say, Statisticians and Actuaries if one has the prior belief that the relative frequency of statistician applicants to DS positions should be higher than the observed relative frequency of statistician hires (<1%!) to DS positions. I don’t think this is unreasonable. 2. I make a similar argument with regards to MOOCs/bootcamps: my prior belief is that the relative frequency of bootcamp-only applicants should be higher than the observed relative frequency of bootcamp-only hires. Hence my statement about necessity vs. sufficiency. 3. It’s somewhat more complicated for applicants with both degrees and MOOCs/bootcamps. I haven’t done this, but what I can do is to look at the education distribution for hires with and without MOOCs. If the education distributions were similar, it would suggest that MOOCs have negligible impact. If, however, there is a higher relative frequency of say Bachelor’s degrees in the MOOC category, that would suggest that MOOCs/bootcamps have some value-added impact. 4. An ideal prospective study for the above would be to extract a sample of individuals from a precursor role, say, data analysts (hence naturally controlling for education). Note which of them have MOOCs or bootcamps, then follow them up in time to see how many end up as data scientists in each category. 5. I might actually change that profile picture. It’s 3 years old, in more innocent times. 6. As it happens I have landed a data scientist position in Singapore and will be starting in September.
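A rough sketch of the comparison described in point 3, for anyone who wants to try it on their own scrape. The counts below are invented purely to show the mechanics, not taken from the article, and the helper names (`education_distribution`, `total_variation`) are hypothetical.

```python
from collections import Counter

# Hypothetical hire records as (highest_degree, has_mooc). Not real data.
hires = (
    [("bachelor", True)] * 30
    + [("master", True)] * 50
    + [("phd", True)] * 20
    + [("bachelor", False)] * 40
    + [("master", False)] * 200
    + [("phd", False)] * 160
)


def education_distribution(records, has_mooc):
    """Relative frequency of each degree among hires with/without a MOOC."""
    counts = Counter(degree for degree, mooc in records if mooc == has_mooc)
    total = sum(counts.values())
    return {degree: n / total for degree, n in counts.items()}


def total_variation(p, q):
    """Half the L1 distance between two distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


with_mooc = education_distribution(hires, True)
without_mooc = education_distribution(hires, False)

for degree in ("bachelor", "master", "phd"):
    print(f"{degree:9s} with MOOC: {with_mooc[degree]:.2f}   "
          f"without: {without_mooc[degree]:.2f}")
print(f"total variation distance: {total_variation(with_mooc, without_mooc):.2f}")
```

With these made-up counts the Bachelor's share is higher in the MOOC group, which is the pattern point 3 says would hint at some value-added impact. A real analysis would also want a significance test and the applicant-side numbers discussed in point 1.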

Nice that you showed up to respond to this stuff. Instead of a blob of text, I suggest an edit to your comment to put each numbered point in its own paragraph. It will make it more readable. Lots of folks here sort of mix skimming and focused reading with the opening sentences helping them determine what they should invest time in.

Thanks nick. I’ve been trying to but the updates are never reflected :(

The oldest rules mentioned 2 hours as the limit on an edit. I don't know what it currently is. You're past the 2 hour mark, though. Might explain it.

It requires double instead of single line breaks

> For myself, it was worth noticing that Statisticians and Actuaries are at the bottom of the heap as a prior role for existing data scientists.

This has a lot more to do with the relatively small number of statisticians and actuaries out there than it does the odds of people from various backgrounds transitioning into data science roles.

Exactly. The data is about the backgrounds of data scientists, but is incorrectly interpreted as the probability of becoming a data scientist given a certain background. Obviously the two are related (Bayes' theorem), but to draw any conclusion one would need to know the number of PhDs, Masters, etc. that are applying to become data scientists. For example, the fact that a small fraction of data scientists has a MOOC degree does not imply that the probability of becoming a data scientist if having "only" a MOOC degree is low. For all that we know the few people in the market having this kind of non-traditional preparation could have 100% success rate in getting those jobs.
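To make the Bayes' theorem point concrete, here is a toy calculation. All of the numbers are made up for illustration (nobody in this thread has the applicant-side data), and the function name is hypothetical.

```python
# Hypothetical shares, chosen only to illustrate the argument.
# share_of_hires[b]      = P(background = b | hired as data scientist)
# share_of_applicants[b] = P(background = b | applied)
share_of_hires = {"phd": 0.45, "masters": 0.53, "mooc_only": 0.02}
share_of_applicants = {"phd": 0.30, "masters": 0.69, "mooc_only": 0.01}
overall_hire_rate = 0.10  # P(hired | applied), also invented


def hire_rate_given_background(background: str) -> float:
    """Bayes' theorem: P(hired | b) = P(b | hired) * P(hired) / P(b)."""
    return (
        share_of_hires[background]
        * overall_hire_rate
        / share_of_applicants[background]
    )


for b in share_of_hires:
    print(f"P(hired | {b}) = {hire_rate_given_background(b):.3f}")
```

In this invented scenario MOOC-only people are just 2% of hires, yet they have the highest success rate of the three groups, because they are an even smaller share of applicants. The hires-only data cannot tell these scenarios apart.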

(Which seems like the kind of thing you would hope someone applying to Data Science positions would have considered.)

Actuaries are trying hard to market themselves against “data scientists”. Adding basic intro to statistical learning type material and r programming to their exam syllabus.

One can get an idea of the relevant frequencies from table 13 of the doctoral completion survey here [1].


> ...I had conflated the practice of data science with the strategy to become part of it.

What an excellent analysis that applies far beyond Data Science.

Perhaps describing themself as "Statistician, Data Scientist, Software Developer" might have a better hit rate against the skimmers who pre-screen the resumes. An honest-to-ghu statistician who became a programmer is much more exciting than someone who looks like a programmer attempting to leverage themselves into a new hot sector.

Like any other job hunt, it all comes down to networking with people you hope to work with. Blindly learning techniques won't get you anywhere if you don't have anyone who will vouch for you/introduce you to other people

It's too bad I didn't figure out until after college that college was about meeting people and networking and not about coasting through the easy classes.

the idea is to drink a bunch of beer while coasting through the easy classes and then the meeting people and networking just happens

Not if you're a Loner™.

Doesn't that defeat the premise of the tech industry as a meritocracy?

I'm not sure there's any reason to believe the tech industry is any more a meritocracy than most other industries. Look at sales just about anyplace. Your success is very tightly linked to how much stuff you sell.

In any case, how much merit someone has often isn't obvious. Pretty much anyone involved in hiring will tell you that internal referrals are one of the best ways (if not the best way) to at least figure out who to put into the hiring pipeline.

> Doesn't that defeat the premise of the tech industry as a meritocracy?

If you don't do any marketing of yourself, nobody is going to know about your merit. "Build it and they will come" is a Hollywood fantasy, not reality.

I think this is unfair. There are lots of jobs out there, and lots of employers looking for people with specific skills. The reason networking works is because discoverability is a problem. If I apply to every job that listed my skills as required I'd waste 90% of my time because my area of expertise is so specialist that only 10% of the people who have all the skills listed are experienced in the right ways.

Whereas if I network with people the jobs they recommend are far more likely to be a fit because I've networked with them - they know my skills.

> I think this is unfair.

Fair or not, it's how most everything in life works. For good things to happen to you, you have to put yourself in a position where good things can find you. That means marketing yourself. It applies to getting your dream job just as much as it applies to getting your dream partner.

Even if you get your dream job, talent and hard work simply aren't good enough. You'll need to be able to sell your ideas to others in the company.

Like it or not, it's how a lot of job hunting works. I haven't gotten a job in over 30 years that didn't come about primarily through one or more personal connections.

I realized that what's inside a person doesn't count because no one can see it.

I didn't realize you were such a philosopher.

That's my point!


Upvote for Classic Dilbert!

Only if you think the merit of being able to form social connections and work with people is orthogonal to the skills being hired for. I'm not sure it should count for as much as it sometimes does, but how we measure "merit" and how it translates into outputs is a complicated process and I'm fairly sure it shouldn't count for nothing.

Although the networking stuff is arguably the most circular of all of the criteria for hiring someone then.

"Why should we let you join our group?"

"Because I'm part of your group."


Also, frequently these issues become very Rashomon-like. Why did Bob not get along with anyone at site X? Is it because of Bob, or X, or because they were a poor fit for each other?

What's infuriating (or depressing) about this stuff from my perspective is that there's an implicit assumption always that the person complaining about not building networks has not built the networks because of poor social skills, rather than problems with the networks themselves.

I'm not naive about social connections, but in my experience the social skills stuff is vastly overrated. Serious problems get ignored when it's a friend, and molehills are made into mountains when it's not. It tends to devolve into petty gossip and junior high infighting.

There are tens of thousands of CS majors every year, and probably hundreds of thousands of people who think being a data scientist would be "fun". No company is going to weed through that pile to find the literal "best" candidate, because acquiring that information is expensive.

> Theoretically, this is what testing is for. How many companies do that?

definitely not. soft skills are as important as hard skills are in the workplace. demonstrating effective networking skills shows that you are personable and could be tolerable in a work environment. it's just as important for your co-workers to be able to get along with you as it is for you to get your work done correctly and on time.

It starts to resemble a meritocracy once your cv is in the “for interview” pile.

The expected shredinger-distance of your cv is given by G * N^2, G being cv weight and N your networking coefficient.

The tech industry is only a meritocracy at the edge. It's large enough now that schmoozing is enough to get a job.

It’s a faulty premise. Or, at least we should include ability to network and/or make friends in high places in the definition of “merit”. Does anyone seriously believe that Facebook achieved its early success because Mark Zuckerberg was that excellent of a developer? The very name of it and the apparent utility and appeal, came from Harvard having a print Facebook. Not to mention an elite, influential userbase. Zuckerberg was obviously a talented and inventive developer, but at a large public university, he might have instead created the next Hot or Not competitor.

Not necessarily. I'm not going to vouch for someone who is terrible. That reflects poorly on me.

I will vouch for someone who I know and is not terrible. I'm not going to vouch for someone who I don't know. So there is an element of having connections. There is also an element of social skills as in any endeavor that involves more than one person.

It's most definitely not a meritocracy.

No it just reveals the weights of the economic indexes for new hires

I am looking for people that find and solve problems on their own, especially for data science roles. The Linkedin profile (hanifsamad) is all about how he is good at solving problems given TO him. Indicators:

- "I am for the problems worth solving (..)"

- Use the default header on Linkedin

- No sign of roles or additional engagement that indicate 'self driven problem solver'

I love the article, but I need actionable insights.

I sure as heck put a lot more weight on his clever approach and execution in gathering data on a problem he discovered _on his own_ than I do on what header (??) he used on LinkedIn.

It’s a good sign that you question what is given to you

> - Use the default header on Linkedin

Lol are you serious?

Is the profile header on LinkedIn a robust hiring signal?

Did he ask you for a job?

I'm not exactly sure what you are responding to, or complaining about for that matter.

I'm in the US, but I have worked with a couple of folks from Singapore, so I'm at least going to attempt to make what may be a culturally sensitive suggestion: I would consider a somewhat less 'free, young, and happy' LinkedIn picture. Collared shirt, short sleeve, empathetic smile. Your pic makes you look like you're 16. I suspect you want to look like you're 25-30. Your prospective employer wants to imagine you listening to them explaining their problem, prepared to work with them on the data collection and software development. Look like that.

Data science is so obscure the author is right to point out how there's a real difference between what a data scientist actually does and what he needs to do to actually get a job in data science. If you can write queries and understand product metrics, that's about half of all of the data science jobs and interviews.

But IMO the field is getting to a point where the engineers are going into machine learning because it's a pay bump and the data analysts are realizing they can start calling themselves data scientists with some more experience under their belt.

Ultimately the field is saturating to where people are now going into this field as an easy way to hit six figures. You see that with the rise of the data science bootcamps but I digress as I wrote more about it here: https://www.interviewquery.com/blog/the-saturation-of-data-s...

Is this a situation of garbage in, garbage out?

That is, there might be more interesting distinguishing factors, but he was limited to education, position, and years of experience?

I started out my career as a 19 year old in business intelligence and old school data warehousing, and have only in the last three years been able to properly apply my skills in the world of big data as a data engineer, and I've found that regardless of the fancy titles the kind of stuff I do is exactly the same. Perhaps the most surprising thing is that because "data engineering" is disassociated from the notion of traditional warehousing, you get lots of "experts" who have never heard of an ETL and think about software instead of data pipelines.

And with data scientists, I've found that it's a mix of a) people who did mathematics or physics degrees suddenly getting into computer science b) senior analysts learning how to power up their analysis and c) computer science graduates who went on to do phd in data science

Working with them my humble opinion is that a) you can't ignore the software aspect of your job, meaning that you need to understand basic database principles, parallel computing, SQL, etc. b) you also need to understand that it's not about how fancy your algorithm is, but also how it can be quantified, how you can manage the life cycle, how you maintain it, etc.

> you can't ignore the software aspect of your job, meaning that you need to understand basic database principles, parallel computing, SQL, etc.

I still find it amazing that so many data scientists I know do not understand basic data software principles. Stuff like distributed vs parallel, database types (NoSQL vs RDBMS), immutability etc.


“Finally, I would note that while the data is silent on the necessity of skills acquired from non-traditional certifications such as MOOCs and bootcamps, it does suggest something about their sufficiency: they clearly aren’t. A postgraduate degree is a far better indicator of your prospects as a data science hire. ”

Superb. This analysis shows that you are not only able to process data, but that you will creatively find opportunities to make data useful, in ways which other folks in whichever company you end up in may not even think to ask for. If people with data science openings are not attempting to hire you from this thread, they're crazy.

This is such a mind-bogglingly obvious way to find the skills needed to get a position that I am amazed I have never heard of it before (I wish I had thought of it!)

It also shows what I think is the most vital skill of a Data Scientist: to go and find the data to support the question

nice one

Analysts were doing data science long before Data Science was a thing. Unless I am missing something, it's a rebranding of tasks that have always been performed.

My grandfather was an “analyst” for GE, which involved a lot of physical calculation, but also FORTRAN and COBOL. Before programming became its own profession, managers, accountants, secretaries, and people in other disparate jobs were just expected to program offhandedly. It’s interesting that we’re somehow in the same place again, but in reverse, having to make new names for jobs that involve programming but not “software engineering”.

Sidenote for anyone who encounters the same: Medium is giving me a paywall, so I opened a private window and it works. Does anyone know how the Medium paywall works? Is it the author who marks content as "premium", or does Medium decide based on traffic?

The author determines if it is premium or not. Medium visitors get a limited allocation of premium articles they are allowed to view for free each month.

I can never read medium.com so I use outline.com.

anecdotal evidence ahead ...

I wanted to be a data scientist at a top tech company, so I did as Hanif did, and went to the data on LinkedIn. My search was more specific (only data scientists at top tech firms), and it is also a very tiny sample.

But first, my situation at the time: Master's in Computer Science at the University of Pennsylvania. Strong database, AWS, Spark, and Python skills. Worked in a social media research lab that looked at social media's impact on health; mostly did NLP involving Twitter and health outcome data. Coauthored a paper that ended up in JAMA (Journal of the American Medical Association). Eventually I got what I wanted, but it wasn't easy.


- message recruiters directly

- find a way of showing them you are a good candidate beyond your resume - I found Kaggle was really helpful, I recommend it.

- be careful of getting pigeonholed out of DS positions by recruiters. Your LinkedIn should say 'I am a data scientist at heart'

- be prepared to fail interviews and learn from mistakes

- study stats (YouTube, books), coding (LeetCode), and SQL (LeetCode?)

Findings:

- Degree: you need a master's or a PhD.

- Major: Statistics or some version of it was most common, MBAs from top MBA programs were second, and Computer Science was very rare. Why MBAs? Probably because those programs had wonderful stats programs.

- School: top schools are very important.

- Previous job: an internship at a top tech company. An internship as a data scientist is hugely beneficial. The next most important feature was whether the previous position was data scientist.

Not data backed, but I would argue that becoming a data engineer to get adjacent skills is a bad strategy. DEs are highly needed, so a recruiter will put you on DE loops, not DS ones. I feel like data analysts also struggle to become DS.

I thought my situation was at least somewhat ideal, but I was getting zero interviews. It's hard emotionally to not be able to get where you want without being given a chance; you just have to keep trying. The findings (previous job in particular) helped me realize that I was going to need to go about getting interviews in a more efficient manner.

I needed a way of getting attention. The reply rate for applications through websites seems to be 1/50, which is problematic if you want to work for a specific set of companies. I think the best thing to do is to go on LinkedIn and search <company name> + recruiter. Message the recruiters directly: they have all the power in setting up phone screens, and they send batches of candidates to open positions hoping some of them will get a role. Now that you have their attention, you also need a way of getting into those batches.
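To put the 1/50 figure in perspective, here is a rough back-of-the-envelope sketch. It assumes each application gets a reply independently with the same probability, which is a simplification and not something measured here; the function names are made up for illustration.

```python
import math

def prob_at_least_one_reply(n_applications, reply_rate=1 / 50):
    """Chance of at least one reply, assuming independent applications."""
    return 1 - (1 - reply_rate) ** n_applications

def applications_needed(target_prob, reply_rate=1 / 50):
    """Smallest number of applications giving at least target_prob of one reply."""
    return math.ceil(math.log(1 - target_prob) / math.log(1 - reply_rate))

if __name__ == "__main__":
    for target in (0.5, 0.9):
        n = applications_needed(target)
        print(f"{target:.0%} chance of one reply needs about {n} applications")
```

Under that assumption you need dozens of applications for even a coin-flip chance of a single reply, which is not possible when the list of companies you care about is short.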

An important metric for success was already being what I wanted, so I had to find a way of saying 'I can do this'. I started spending most of my free time on Kaggle, on the Zillow home price prediction competition. I finished top 100, which I STRONGLY feel helped me get interviews. I recommend it. It's a free, zero risk way to get experience and display your passion/skills.

Next, I got some phone screens but failed a few and failed an on site. Technical phone screens are either stats, coding, or SQL; I have never been asked ML questions. Sometimes I failed coding questions, sometimes I failed stat questions. I addressed this by studying lots of stats (YouTube was very helpful) and coding (LeetCode). I already had years of SQL experience, so those questions were always easy for me, but be prepared to answer the histogram question. I had some recruiters tell me in the initial phone screen 'even though you applied to a data science position, we think you would be a better match for this data engineering position instead'. Ouch. I realized my LinkedIn and resume looked very DE-like because of my years as a database administrator, and because I had added lots of Spark/Hive to my resume after seeing it on most DS postings. It's important, but don't over-highlight the wrong things. I politely declined and kept trying.
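For anyone wondering what "the histogram question" looks like: the exact wording varies by company, but one common form (an assumption on my part, not necessarily the version asked in these interviews) is "given a table of posts, how many users have 1 post, 2 posts, 3 posts, and so on?". A minimal sketch using Python's built-in sqlite3, with a hypothetical `posts(user_id, post_id)` table:

```python
import sqlite3

def posts_histogram(rows):
    """rows: iterable of (user_id, post_id). Returns [(n_posts, n_users), ...]."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (user_id INTEGER, post_id INTEGER)")
    conn.executemany("INSERT INTO posts VALUES (?, ?)", rows)
    query = """
        SELECT n_posts, COUNT(*) AS n_users
        FROM (
            -- inner query: one row per user with their post count
            SELECT user_id, COUNT(*) AS n_posts
            FROM posts
            GROUP BY user_id
        ) AS per_user
        -- outer query: bucket users by that count
        GROUP BY n_posts
        ORDER BY n_posts
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Made-up sample data: user 1 has 2 posts, user 2 has 1, user 3 has 2, user 4 has 3.
sample = [(1, 10), (1, 11), (2, 12), (3, 13), (3, 14), (4, 15), (4, 16), (4, 17)]

if __name__ == "__main__":
    print(posts_histogram(sample))
```

The trick is the two levels of GROUP BY: first count per user, then count how many users share each count. Note that users with zero posts never show up unless you LEFT JOIN from a users table, which is a typical follow-up.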

Eventually I got exactly what I wanted, and I am very happy for it. It took me 2 years after graduation to get there, and I had failures at all parts of the process. I know it sounds cliche, but keep trying is my best advice.
