Ask HN: What's the state of the job market in data science and machine learning?
204 points by TXV on Dec 21, 2016 | 137 comments
If one were to use Hacker News as their only source of information, it would seem that machine learning is a very overrated topic. There is something related to it on HN's front page almost every day. This proliferation of courses, resources, books and startups would hint that machine learning is becoming more and more accessible to the average programmer and that the market is on track to getting saturated quickly. Is this the current trend? If yes, is it limited to the US? What about the machine learning scene in Europe? Maybe someone here could provide some perspective.

Speaking for NYC, but I imagine silicon valley is similar.

The supply-demand dynamics have changed a lot in the last couple years. I'd roughly break it out into two groups: people with work experience + strong software development skills, and those without. The first group is in higher demand than ever, and tend to add a lot of value to companies that really need it.

The second group has gotten extremely crowded, especially from STEM graduates - usually with a masters or phd - who have completed MOOCs or bootcamps. Supply keeps growing while demand is flat or shrinking (especially as executives get burned by "data scientists" who don't know how to help them build things of value). There's a huge crunch here; a lot of people I know in this group have been searching for jobs for months, eventually settling for a low quality job or giving up entirely :(

I've only been hiring DS folks since 2012, but my experience matches what you've said exactly. The biggest differentiator I've seen is to be able to participate in actually building production quality systems vs being proficient enough in R or python to hack together a prototype on a very small dataset.

The former kind of data scientists were very successful at our company, the latter, not so much. Both categories I described usually had a STEM type PhD.

That sounds weird to me. Don't American PhDs have to work for a few years at real companies as part of the PhD curriculum?

I think the scenario they are describing is for say math/physics PhDs who are transitioning to data science. During their degrees, they concentrate on research, so they don't (on average) have real software engineering experience.

no. it's even a stereotype that they've never worked at a company. where is it required that PhDs work at companies to receive a degree?

France. There are various internships due during the PhD, and the final paper is done while working 1-3 years at one place.

Basically, everyone has experience when they graduate from a master or a PhD there.

Not all experiences are born equal. e.g. Some work is at top tech companies, some work is at universities or low quality research centers with little work & expectations.

I am used to the PhD being an academic, not practice (trade) degree. This is true even in fields where the PhD is pretty much required for work in industry (medicinal chemistry, biology etc). And even in computer science where a PhD can even be a hindrance to getting an industry job, the PhD is pretty much university only.

I am curious what countries it works the other way around.

The problem is DS is really 2-3 different disciplines under one nebulous title. What you're describing is folks who are prototyping and productionizing models. That's definitely in short supply, but random STEM PhDs are in no way competitive for those roles unless they're coming from CS programs + have work experience in production engineering.

But that's by no means all of the DS field. There are lots of DS jobs where you're collecting and interpreting and communicating about complex data sets. An engineering mindset is occasionally helpful, but a bias towards building versus towards analyzing and writing can just as often be counter-productive. Not all problems are solved by systems; lots of problems are solved by better understanding the problem and then letting other specialists build the right solution.

The bootcamps have contributed to the problem by focusing so much on building things. The idea that you can go from an econ undergrad to being a self-sufficient member of a production ML team in 6-12 weeks is nuts. What's less nuts (and what I wish programs like Insight focused on) is taking people from having data skills in one domain and with one set of tools (e.g. longitudinal medical record data, stored in CSVs and handled in Stata) to another set of tools (billions of rows of event-based product data stored in a data warehouse, processed in R or Python). But instead the bootcamps behave like the missing skillset is the ability to make a predictive random forest model on some arbitrary data set and build an AWS web app around it. THAT job market definitely doesn't exist and is completely over-saturated.

But people who are smart communicators about data, can manipulate and make sense of massive data sets, can ask incisive questions about their data, and can use data to convince people of a complex argument are always going to have job opportunities, even if they're not production grade engineers. If that sounds like you, I'm hiring - hit me up on Twitter: @drewwww.

"Not all problems are solved by systems; lots of problems are solved by better understanding the problem and then letting other specialists build the right solution."

Agreed wholeheartedly. Reminds me of another quote from http://www.john-foreman.com/blog/surviving-data-science-at-t... :

""" You know what can keep up with a rapidly changing business?

Solid summary analysis of data. Especially when conducted by an analyst who's paying attention, can identify what's happening in the business, and can communicate their analysis in that chaotic context.

Boring, I know. But if you're a nomad living out of a yurt, you dig a hole, not a sewer system. """

I was part of the Data Science team in my previous company. We mainly built models for production, but we were also responsible for generating both daily and ad hoc reports. We tried to hire someone to take over the reporting part, but we found out that even that requires engineering skills. This role ended up being even more difficult to hire for, because it's hard to find someone who has the engineering skills but wants to work only on reporting. Maybe if we had had a dedicated data engineering team the story would be different.

That is me. A decent developer who gets data and enjoys reporting, particularly the variety of hard problems that crop up.

I am happily serving a niche market with my own company and I suspect part of the difficulty in finding the skillset is that we can just start our own thing when we find a domain that we like.

Yeah, in my org we have a dedicated data engineering team that owns the pipeline and production data systems plus an analytics team that owns reporting and making data available to non-technical data consumers. That leaves complex research work for data scientists who are (in theory, anyway) building on top of stable data infrastructure and good data sources and free from ad hoc reporting work.

In France, and other Francophone countries, Statistics and Information Analysis is an engineering discipline.

I'm coming from a PhD in STEM where I did a lot of application of basic ML to my (neuroscience) research, and it took me a good while to get a data science position. But once I got the job, I have been inundated with interview requests, both from recruiters and from specific companies, as in multiple a week (granted, most are blast-em-all style recruiters that I'm sure anybody with any tech skills gets). Maybe it's because I'm on the East Coast? I do feel like everybody in neuro is jumping on the bandwagon, and that generates a bit of a "you don't belong here" feeling from the CS- or math-educated crowd. But over time I think this will all level out.

I'm not coming from a CS background and don't purport to know absolutely all of the details of all the mathematics and theory behind many of the machine learning algorithms that I use. I try my best daily to expand my knowledge, understand the algorithms, and apply them appropriately. I would hope that any company who is looking for someone who has a PhD-in-CS-or-ML could weed someone with lesser knowledge out during the interview process.

With that being said, a couple lines of code using SciKit Learn and all the default parameters is enough to impress many non-tech companies that are looking for a way to use 'predictive' in their marketing materials. And they pay very well for it. I get the feeling that provokes the ire of people who think those types of basic implementations belong to the traditional label of 'data analyst'.

For what it's worth, I work with data sets that aren't quite large enough to justify anything more than Python, Pandas, SKLearn, a Luigi pipeline, and PySpark. The vast majority of my time is spent cleaning the data and generating features, much less on the hyperparameter tuning and model training side itself.

Anyways, I think I'm rambling a bit.

I just want to say that I LOVE this job, whatever the label is, or whatever the hype surrounding the label is, and I hope it's around for a while before it's automated...

> a couple lines of code using SciKit Learn and all the default parameters is enough

Except for when it's not - those are the cases you're hired for.

There is also no shortage of "Data Science" projects that are basically just showing off (e.g. somebody in a large org trying to get recognition for being advanced) or using an algorithm to justify a decision that somebody had already decided to make. In this case, it's okay to solve the problems with a few lines of code and a nice presentation.

But if you have an important data driven decision that is going to cost you a lot of resources (money, people, time, etc), you want the analyst to be able to speak to what the data and/or model is telling you and why. When you have this type of problem and it's "solved" by a few lines of SciKit Learn, you should be prepared to have a really bad day at some point in the future. This is the type of Data Scientist that draws the ire of people who do analysis with more depth.

Let's be blunt: firms needing real data science know where to go and find real data scientists. Anything less real, on both sides, is a matter of opinion, and that is where the free market comes in.

Even outside Silicon Valley this is very similar. When I post a programming job I get more applicants from data scientists than from actual programmers. When I post a marketing job I get more applicants from data scientists than from actual marketers. It is clear to me that supply has far outpaced demand.

I'm getting burned because I have a STEM PhD and am a competent programmer (have been programming on the side for decades) and am looking for a not-necessarily-data-science job, but it seems like all the places I'm applying to want to bin me in the "data-science" category.

Do you have advice for someone in the second camp? I have a master's in econ and I'm comfortable in R but I don't have good development skills. I'm wondering if I need to give up on data science jobs for a bit and try and find an entry level software development job

Target research-intensive jobs. Job descriptions are clear about what sort of machine learning professionals the company is looking for. In most cases, they want data engineers and developers who can make big, scalable production. Not researchers.

Just find who's looking for economists, specifically. Consulting firms and government agencies hire economists, but they don't call them data scientists. Investment funds may be interested, though hedge funds prefer math and physics graduates because they are easier to train.

Take whatever job you can... as far as it's related to the kind of things you want to do.

Forget about the big companies that everyone knows, forget about the high pay that appears in the newspapers. It's not real, it's not for you.

Get an entry or junior level position and get experience. There are many unknown places that will take people only because they're cheap. That's where everyone started.

Thanks for your comment. It has eased some of the anxiety caused by reading through this thread. My current job fits your description. I currently have an analyst role and I spend maybe 25% of my time building predictive models in R. I just haven't had any success finding a role that ups that percentage (and the pay). I know I need to be more patient, but it is difficult.

I'm looking to go through a DS boot camp this summer, have a master's in aerospace engineering. I'd be very content with mid-five figures (40% lower than what I made as an engineer) at a no-name company if it's in the city that I want (not SF). Are such lower-end data analyst positions in decent supply?

I think what you say is easily applicable to software engineering in general. Data science maybe is a field that is even more negatively impacted by bad hires because the threshold after which you start adding value to the company is higher.

This is correct. I've worked in industry for about 3 years, much in a top industry lab that built a product you probably use. I've got a couple papers, including a really good one. I've worked as a software engineer. I hopped jobs twice in a relatively short frame, with a 20% and 40% raise each time. The market is on fire if you have software engineering competence and real industrial experience.

Isn't it 2 profiles of engineers? Those who make production code and those who work in prototypes? One thing is to understand why your model isn't converging, another is how to scale it up...

A machine learning model is core to my company, and core to that is pulling in, cleaning, and working with large datasets from a variety of platforms. This means:

- excellence in being able to clean data at the column level for millions of data points

- knowing how to work with such large scale data in a time efficient manner. One of our newer hires worked at the Postgres/SSD level to optimize and got it to where he could produce a full set within 5 minutes. Before that it was once every few months.

Being able to do these things is a prerequisite to building even a prototype of a model, and it requires substantial real-world programming experience to deal with those complexities.

2 profiles:

- Research departments that make prototypes that are consistently abandoned and never get any real world usage.

- Actual companies that make prototypes for real business cases that are then shipped to production, maintained and improved as they bring in $$$.

The first experience is of limited value to the world of real business.

the data science people I've worked with in multiple VC backed companies were not engineers at all. They were either "data scientists" or had engineer tacked on to the name.

I'm continually exposed to new kinds of software engineering roles I never heard of at tech companies. (fwiw software engineer is sometimes still considered an inflated title.)

How do you test for programming ability in an interview? Especially for people without a lot of work experience (e.g. graduating from college)

One data engineering role asked me to implement k-means from scratch and one data science role asked me to do some algorithms whiteboarding. But beyond this, people just asked SQL. From the rest of this thread it doesn't look like people would consider that to be 'SWE ability'.
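For anyone who hasn't met the "k-means from scratch" question mentioned above, here is a minimal sketch of what an answer might look like. It assumes plain Lloyd's algorithm in standard-library Python; the function name `kmeans`, its parameters, and the random-sample initialization are illustrative choices, not what any particular interviewer expects.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster equal-length numeric tuples with Lloyd's algorithm.

    Returns (centroids, assignments). Initial centroids are k distinct
    input points chosen at random (an assumption; k-means++ is a common
    alternative).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance; ties go to the lower index).
        new_assignments = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so we have converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments
```

Usage would be something like `centroids, labels = kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)], k=2)`. The follow-up questions (how to pick k, what happens with bad initialization, how to scale past memory) are probably where the real interview starts.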

Set the interviewee a programming task. This is pretty standard for developer jobs, and is absolutely fantastic for people without work experience (as it gives you another way to probe their skill).

I mean, I did a bunch of take-home tests. But those were focused on doing an analysis on some dataset, focused on answering a question.

Should data scientists be expected to do more than that? Or are you expecting a certain level of code quality from them? Within a 48 hour time window?

There's always room for smart, hard working and creative thinkers in any field. Except maybe marine biology. But as in any lucrative field, e.g. law, you have a wide variety of capabilities and work quality. After 10 years in quantitative work (I got an MS in Applied Math before they were calling Statistics "Data Science") I can say that it's just a tough field to work in. Experience goes further than education, but education (yes, a real accredited graduate level education) is necessary. The most challenging aspect of DS isn't the technical aspects, it's being able to have a thick enough skin to not let the skeptical and reluctant engineering types upon whom you depend to implement your brilliant models get under your skin and enough patience to explain and convince the skeptical and not too savvy business types who cut your checks, that their intuition is wrong and your math is right. And then of course, there's the inevitable boredom that comes with solving yet another mundane business problem with the simplest and least sexy tools. I'm not complaining, just saying that almost all STEM graduates turned DS I've ever worked with have a small hollow spot in their soul that burns with passion for the astrophysics or theoretical math problems they traded in for the perk filled corporate life.

3 easy steps to get a job in DS if you want them though: Grad Degree in Math/Stats/CompSci; work on a bunch of hard to predict problems and then publish and present them to your local meetup community to gain experience; learn engineering tools and devops and be about 90% as good a software engineer as your team's actual engineers (git, hg, IDEs, java, pig)... your brilliant models are way less important than being able to help the already overwhelmed engineering team make them work.

Frankly, MOOC-type machine learning- being able to do plain vanilla logistic regression or black-box deep learning techniques is not enough to get a job. This knowledge and experience has to be paired with one or more strengths: excellence in programming, grad-level math/stat/numerical skills/theoretical machine learning, domain specific expertise or experience (e.g. vision, audio, natural language, networks, geophysics, biomed, finance, etc.), proven ability to learn and adapt very fast.

> plain vanilla logistic regression

Eh? It's not as easy as it sounds. Search for "mlelr" for a rather detailed illustration of how to code LR in C with only standard libraries. Now that was some fun to put together!
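To give a flavour of what "coding LR with only standard libraries" involves, here is a rough sketch in standard-library Python rather than C. It is not the mlelr code: it assumes plain batch gradient ascent on the log-likelihood, and the names `fit_logistic`, `predict_proba`, `lr` and `epochs` are made up for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and intercept by batch gradient ascent.

    X is a list of feature lists, y a list of 0/1 labels. The gradient
    of the mean log-likelihood is mean((y_i - p_i) * x_i).
    """
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = yi - p
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj + lr * g / n for wj, g in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Even this toy version leaves out most of what makes the real thing hard: feature scaling, regularization, perfectly separable data (where the weights grow without bound), convergence checks, and standard errors for the coefficients.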

But that does mean that you have pretty good C programming skills! I think your comment reinforces my claim about pairing MOOC-depth ML with other skills.

(Edit: I'm not saying MOOC-depth derisively. Serious learners and autodidacts can go pretty deep with just MOOCs for guidance. I just mean prima facie, based on content of some of the lighter and more popular MOOCs like Andrew Ng's. Abu-Mostafa's on EdX is meatier.)

I definitely agree: the overuse of techniques like logistic regression in introductory courses without sufficient mathematical background and context can lead to a "plug and chug" culture where the technique is applied without proper understanding of its assumptions and limitations.

Anyone can be quickly taught how to run a logistic regression by calling one line of code from a high level library. But I'd argue it takes years of study to really know what you're doing.

> vision

Could you elaborate on which elements of vision science are relevant? Do you have any examples of what these people end up working on?

I loosely used the term to refer to computer vision, which overlaps a bit with vision science (especially color, perception, etc.). But yeah, computer vision.

So... for instance an MD with MOOC-type machine learning could make it in health related enterprises? Is that what you mean by domain specific expertise?

That's stretching it. I think I should have qualified domain specific expertise as quantitative-domain specific expertise. Afaik, MD is more about human biology and meant to help you become a practicing doctor. PhD-biomedicine, MPH/MS-bioinformatics, or MD-PhD programs tend to have more substantial quantitative components than just an MD. Caveat: these are gross generalizations. I know some MDs who are competent enough in quantitative analysis to give any engineer a run for their money!

Would an English MA or PhD, for example couple well with an ML-MOOC for natural language processing jobs? Possibly, but would involve a lot of effort to build the bridge. However, if someone has a mechanical engineering degree or a PhD in something similar, a MOOC-ML can work well to position them for a bunch of jobs in large mechanical engineering companies looking to jump on the ML bandwagon or ones traditionally strong in ML-type mechanical engineering like UTC.

A few months ago Mikio Braun wrote a great post on this topic (the need for data scientists with strong software development skills) https://www.oreilly.com/ideas/what-is-hardcore-data-science-...

I am currently studying for a bachelor's in CS and am interested in ML/DS. How can I make sure I fall in the first group? I try to do some side projects to expand my skill set and practice software dev skills, but this whole thread seems pretty discouraging.

If your perspective of ML was based solely on the comments in this thread you'd think there's 0 opportunity in the field -- obviously not true. The people who comment on these kinds of posts tend to always be incredibly cynical, and I am not sure why. They offer a limited perspective on the industry and you should not let it discourage you from your desire to learn and become an ML engineer. If I were an undergraduate CS student interested in ML/DS I would start by taking an intro course for ML and creating a side project you're proud of.

Given you're strong in CS, study as much statistical theory as possible. That'd be my advice. You need to be a great programmer, but statistics is what's really in short supply.

Depending on where you are studying you can take ML courses. However, the easiest way is to get a Master's degree in CS from Stanford, GA Tech, Cornell, CMU, etc. Few universities offer a Data Science Master's degree, e.g. I think NYU, Columbia & Harvard. You can also try joining a company and then starting the degree program a year later.

As far as this thread, don't worry too much about it.

I hire machine learning engineers and data scientists. In my opinion there is a great shortage of truly qualified machine learning engineers. A lot of people are entering the market with a general knowledge of machine learning tools. These people should be considered analysts or product data scientists. When it comes to people that can build machine learning systems that work at scale, they are very rarely available for hire and often are the subject of bidding wars by multiple companies. The key difference is whether the candidate truly understands the mathematical and statistical basis of machine learning, has the programming skills to execute their ideas, and is able to write code that can be used in large scale production systems and can be leveraged by others.

And that's why I think finance is smarter.

They've long understood that there are the finance analysts on the one hand and the software devs on the other. They get both and make them work together.

Looking for 5 rare skills in a single person is bound to disappoint: maths, statistics, programming, large scale systems, production.

Companies that have a research division and a separate engineering team to implement the research ideas are rarely successful. Any engineer good enough to do the implementation will figure out they can build something better without the input of the researchers. Microsoft and Yahoo both had great research teams whose ideas rarely saw the light of day.

I didn't say to put them in two separate divisions. Put both guys in the same place, working closely with each other.

Any engineer will quickly figure out that he's out of his depth in the maths & statistics. Any mathematician will quickly figure out that he's out of his depth in the system building.

And this is why a lot of trading desks would couple a trader with a quant with a technologist.

> The key difference is whether the candidate truly understands the mathematical and statistical basis of machine learning

Can you elaborate on this, and at what level? Are you talking about a PhD level of understanding of cutting edge mathematics, or do you mean understand the basics, or somewhere in between?

I think the following interview questions can help screen candidates (not an exhaustive list):

1. Asking a candidate to cast a non-standard problem of classification or estimation into a tractable optimization problem. (this is a very valuable skill that someone who has done good studies in numerical linear algebra/machine learning/stat/information theory/control systems/signal processing/math/etc. should be able to do)

2. Asking them to take an algorithm they have used and explain every step in deriving the algorithm. (It will help interviewer calibrate the level of learning in the interviewee. Also helps screen for indisciplined black-box users.)

3. Presenting challenging machine learning scenarios: using customized ensemble learning approaches, imbalanced data sets, noisy labels, multiple instances, different error metrics, etc. and seeing how interviewee approaches the problem from first principles (real world problems almost always involve some of these issues)

4. Testing their intuitions in "feature-engineering" for different types of data. (with the partial exception of cases where rigorous research/successful products show the utility of deep learning, one has to almost necessarily do a fair bit of feature-engineering)
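As a small illustration of why the "imbalanced data sets" and "different error metrics" items in point 3 matter, here is a sketch in standard-library Python. The function name `basic_metrics` and the 5%-positive data are made-up assumptions for the example.

```python
def basic_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for a binary classifier.

    Precision and recall are reported as 0.0 when their denominator is
    zero (a convention chosen for this sketch).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical data: 5 positives in 100 examples, and a "model" that
# always predicts the majority class.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
report = basic_metrics(y_true, y_pred)
```

Here the always-negative model scores 95% accuracy while finding none of the positives (recall 0.0), which is the kind of thing a candidate should spot without being prompted.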

very good response

I usually ask about what the candidate has worked on. Or maybe the topic of their PhD thesis. I prefer that people explain to me what they know instead of having a list of questions about what I know.

Somewhere in between. A simple question I use is "how do I decide whether to add a feature to a classification model?" Most candidates are fine until I bring up the topic of correlated features.
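One simple first check for the correlated-features issue, sketched in standard-library Python. This is only an assumption about what a reasonable answer might start with: the helper names `pearson` and `redundant_with`, the 0.9 threshold, and the toy data are all invented, and pairwise Pearson correlation will miss non-linear or multi-feature dependence.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def redundant_with(candidate, existing, threshold=0.9):
    """Names of existing features whose |r| with the candidate is >= threshold."""
    return [
        name
        for name, values in existing.items()
        if abs(pearson(candidate, values)) >= threshold
    ]


# Toy data: the candidate is just height converted to inches.
existing = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "age": [30.0, 25.0, 41.0, 22.0, 35.0],
}
height_in = [h / 2.54 for h in existing["height_cm"]]
flagged = redundant_with(height_in, existing)
```

A flagged feature is not automatically useless, but it does mean the new column adds little information and can make coefficients in linear models unstable, so the decision usually comes down to held-out performance with and without it.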

Are you hiring for finance? In Europe it still depends very much on the industry: some mean coding, some maths, some infrastructure, all want data preparation and in-depth stats. edit: other than understanding the business fundamentals, of course.

Do you mean like non-orthogonal dimensions?

That's terrible.

My colleague focused on machine learning for his phd and this is a very accurate description. Talking the talk can come with experience, being able to understand the math and execute a new system at scale is where the separation in experience will start to surface.

Someone who can build models and someone who can scale it are 2 different people and professions. Data scientist vs data engineer.

What's the time from hire to a built live production system you would expect a truly qualified engineer to be able to achieve?

I've been working in a DS role for a few years now in NYC - and I definitely feel the role is more valued on the east coast over SV. SV has a focus on consumer facing applications that are in many ways fancy CRUD. DS roles have their place but aren't the core of the business. East coast has a b2b / infobroker focus where DS is the product. Media (especially adtech), finance, government consulting are over on this coast.

I think you also need to not confuse the growing ease of machine learning tools with the role becoming more accessible. There is a wide gap between tooling and knowledge to use those tools appropriately and creatively.

And may I never write another HN comment on my cell phone again.

Stack overflow salary calculator shows a significant 50% premium over Developer salaries, all other things remaining the same. [1] Even though in my opinion the tool is flawed and actually significantly underestimates (stackoverflow underpays) salaries in SV/NYC. It is still a good indicator.

The major issue is that Data Scientist is a very fuzzy term with it being applied to everyone from undergraduates with Stats degree and to those with PhDs and papers at KDD/ICML/NIPS/CVPR.

However rather than doing a Frontend or Mobile developer coding bootcamp, a data science bootcamp is likely to lead to more transferable skills in case you wish to get an MBA etc.

[1] http://stackoverflow.com/company/salary/calculator?p=7&e=1&s...

From my experience this year recruiting coming out of undergrad, for the top kids who do DS vs the top kids who do CS, the median comp is higher for DS but the highest comp packages come for CS. I wouldn't be surprised if this holds for more experienced people too.

Currently, Gartner analysts place ML at the "peak" of its Hype Cycle for Emerging Tech [0] with a runway of 2-5 years for mainstream adoption.

[0] http://www.gartner.com/newsroom/id/3412017

It was past the peak last year, i.e. machine learning somehow went backwards up the hype cycle over the past year according to Gartner. Maybe due to deep learning news stories over the past year+.


that must be qualified because there are a lot of ML applications that have basically been adopted in the mainstream, like voice recognition (which is pretty good, probably in the 80-90% accuracy range and better for specialized contextual tasks). I remember driving for lyft and being able to input destination addresses by voice (2 years ago; had a Moto X where this was cutting edge) was a godsend and improved my customer service ratings.

80% accuracy for voice is basically unusable. Low 90s will get someone to persist enough to optimise the voice model for them.

For continuous use, you need 95% at least. See http://anewdomain.net/2014/01/28/lamont-wood-windows-speech-...

> voice recognition ... probably in the 80-90% accuracy range

State of the art systems are far better than this. Microsoft recently published a paper with a 5.9% word error rate for conversational speech. Speech directed at computers/assistants is already in the high 90s, though I don't have a figure off the top of my head.
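Since "accuracy" and "word error rate" are being compared here, a short sketch of how WER is usually computed may help: word-level edit distance (substitutions, insertions, deletions) divided by the number of words in the reference transcript. This is standard-library Python; the function name `word_error_rate` and the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length.

    Both arguments are whitespace-separated transcripts. Raises
    ZeroDivisionError for an empty reference.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate(
    "set destination to main street",
    "set destination to maine street",
)
```

One substituted word out of five gives a WER of 0.2 here. Reading "1 - WER" as accuracy is only a loose approximation, because insertions can push WER above 100%.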

I gauged it off of my personal gut feeling, which includes a coefficient for "well I don't have a strong enough internet connection so I get google's spinner instead - and then it fails."

Occasionally, android gets the words right (as demonstrated by the onscreen text) and then flubs passing the correct intent because of "loss of connection", which is just about the most frustrating ML fail.

No doubt android's voice recognition is spectacular. I can prompt it in three different relatively orthogonal-sounding languages (English, French, Japanese), and it can figure out which language I'm using and usually get the transcription correct. Notably, I can't activate the italian/japanese pair and get useful results - which makes sense if you know both languages.

Google voice is horrible, however, at transcribing voicemail.

The startup I work at really favors their data scientists, though I am not one of them (I'm a frontend guy). The CEO and CTO pretty much keep a personal eye on those guys' work.

Right now however the theme I've heard from the higher ups has been profitability, and this applies to all tech companies in general. Easy capital is gone and now companies are in the spotlight for not making profits.

So at least from my company's perspective, it's not that data science is saturated, it's that we're trying to not break the bank and hire too much.

I have only anecdotal experience (I live in Warsaw, but do contracts mostly for Poland, UK and US).

General data science is in demand. I can get contracts easily, and I know that people looking for competent people need to wait, especially as it is a skill much harder to pick up than, say, front-end web dev (unless someone starts from a highly quantitative background like physics, modelling in biology, etc). My general impressions are:

- ML (especially the practical kind, like logistic regression and random forest) is often an integral part of many data analyses (or at least a plus),

- there are not as many jobs solely focused on ML; and if so, often they require some specialist expertise,

- and even less only for deep learning (also, for DL there is relatively high threshold for having skills at "hireable" level).

Some of my tips on how to learn data science: http://p.migdal.pl/2016/03/15/data-science-intro-for-math-ph... (on purpose I put the emphasis on general data exploration/analysis before machine learning).

How do you find contracts? I'm interested in doing contract data science work, but I don't know how to start finding interested potential clients.

In the last ~1.5 years it is solely people contacting me. (But I give a lot of talks, workshops, and the chain of recommendations is going.)

One day I want to write up how I got started, but I am not sure which steps were essential and which were irrelevant. And many of them are not things one can replicate.

Ah, fair enough. I'd be interested in reading it, but I think that (unfortunately) you're correct that it's mostly non-replicable, at least based on my experience.

Like about everything on HN... You're either in the Silicon Valley or it doesn't apply to you.

In my opinion, you could start by defining what is a data science, a quant, or a machine learning job. Because that's not clearly defined. It means different jobs to a lot of people, jobs that are all hard to learn and absolutely NOT interchangeable.

We hire applied ML/AI specialists. For me it's not just an understanding of mathematical concepts, but also being able to apply new ideas to new problems.

This depends quite a bit on critical thinking, a good fundamental ability to analyze a problem and understand its parameters, then manage the logical operations required to deliver the feature and solve the problem.

As for why I think it's on HN every day: I also like to think of an innovation pipeline happening something like this:

  ,->developers -> engineers/scientists -> data scientists->--,
We're now in some sort of refinement cycle of innovation, where the current medium has been saturated on some level and there is a lot of push to mine value from the discoveries.

It seems that you made it reverse of what you wanted. As far as I understand, data scientists should start the exploration and developers should finish exploitation.

Interesting observation! Perhaps it's a bidirectional graph.

I was thinking in terms of the development like this:

1. Observation made
2. Idea created
3. Software/Hardware made
4. Revenue achieved
5. Business parameters tuned using insights
6. Maximum profit achieved

I'm an undergrad at a big university known for CS in Canada. The CS program here has several possible 'focuses'; 4 of 9 are related to ML/AI directly (computer vision, NLP, AI, scientific computing). 2 others require AI/ML/NN courses.

The bias might stem from the fact that we have some huge names in AI doing research here, but the data points seem clear (we say undergraduate education is slow to catch on, right?): the topic as a whole isn't overrated.

However, there seems to be a lack of understanding by people working in tech of the differences (in uses, theory, implementation) between ML, AI, NN, DL, etc. This might stem from a lack of understanding of the foundations of these topics (ex: statistics, vector calculus) or simply because we can abstract a lot of this away (ex: TensorFlow).

>or simply because we can abstract a lot of this away (ex: TensorFlow).

That would work up to the point a better abstraction tool/framework comes along. I'd never try to build a career on a single framework, because frameworks come and go.

Building a career around a framework is never a good idea. If you know your shit, it shouldn't matter what framework you're using.

Theano and TF, for example, both make similar abstractions: graphs and numerical functions on top of the same matrix library, even. I would suspect someone could move between the two fairly easily. The problem is that a programmer can use TF/Theano/etc.'s built-in gradient descent functions pulled from a tutorial with their data subbed in _instead_ of learning the details of backpropagation, end up with decent results, and claim to have a basic understanding of ML - when really, they've managed to avoid it almost completely.
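
To make concrete what the built-in optimizers hide, here is a rough sketch of gradient descent written by hand for a one-feature logistic regression, using only the standard library. This is not full backpropagation, just its single-layer special case (one application of the chain rule); the data, learning rate, and function names are made up for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.5, steps=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by plain batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            # Chain rule: for cross-entropy loss with a sigmoid output,
            # d(loss)/d(w*x + b) simplifies to (p - y).
            grad_w += (p - y) * x
            grad_b += (p - y)
        # Step against the average gradient.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Toy data (hypothetical): negative inputs are class 0, positive are class 1.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
```

A framework's automatic differentiation computes the same kind of gradient for every layer of a deeper model; the update rule at the end of the loop is what a built-in gradient descent optimizer applies for you.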

If you know your shit, it shouldn't matter what framework you're using.

And yet, what is the incessant drumbeat of most job ads, these days - even in data science?

That's right: "N years in framework X"

I don't follow job postings for ML, but I will say this - there is inherent value in being familiar with certain frameworks when hiring for non-entry-level positions. This is particularly true for ML, where it can be very useful to know the optimizations that the libraries do/don't make behind the scenes, etc.

Should most ML job postings require n years of experience in some framework? Maybe not - but I can see why a company might see value in it.

> That's right: "N years in framework X"

I don't know where this meme comes from. Six out of the seven jobs/internships I've had (in SF, Seattle, and Toronto) didn't care about any specific framework or language whatsoever and still don't.

Build your career on whatever tool(s) companies want.

It's only in web development that the hype changes every year. (And even there, plenty of companies lag far enough behind that there are still opportunities in the old thing.)

Is there some reason you'd rather write

> I'm an undergrad at a big university known for CS in Canada

than the actual name of the university?

> I'm an undergrad at ${name of university}

It conveys more relevant information than just the name would, in this case.

In my limited experience, there's a difference between a data scientist who can process data given data and a set of questions about it, and a data scientist who can figure out what data you need, and the questions that need to be answered.

I think making the transition from the first role to the second role comes with experience, both with the toolsets, and thinking about the problem as a whole.

> data scientist who can figure out what data you need, and the questions that need to be answered.

Isn't that describing a statistician?

Isn't that how the joke goes? A data scientist is a statistician in California...

I am the Chief Data Scientist of Dice.com. If you are interested in working as a junior Data Scientist, and are smart and hard working, please apply here: http://careeropportunities.dhigroupinc.com/. The position is a telecommute role. We will absolutely consider people with no data science experience, so long as they demonstrate an aptitude for data science \ machine learning and can code.

I would recommend checking the ADP application page. It does not allow the application to go beyond the "Personal Information" tab because it keeps saying "Select a Disability status" even though I have selected it. I tried every combination of checking and unchecking the boxes. Nothing works. I have the latest Chrome on Windows 8.1 :)

Is this only for people living in the US and have the relevant work permit?

I understand that it's a remote job, but I'm not sure if you'd be considering people from Europe as well.

ps: the link works fine now

For those with difficulties accessing the link, I apologize. I have notified HR, who maintain that aspect of the site. In the meantime, please email me your resumes to simon.hughes@dice.com


This links directly to the dice.com version of the posting (same position). http://www.dice.com/jobs/detail/-/Diceinc/790523?rno=7814317.... The application process via that link is smoother, or so I am told. Please note that while this position is remote, we are only looking for US citizens or Visa holders located within the US right now.


It's a bug on Chrome; it works fine on Internet Explorer. The check box and the radio choices on the "Disability" page don't play well together on Chrome.

Just tried applying, but the page for the data scientist remote position doesn't load correctly.

Please send me your resume directly to simon.hughes@dice.com. Apologies for the issues with the link. I've notified our HR department, hopefully they will address it.

I considered a graduate program in data science, but compared to average programmer salaries, it doesn't seem like data science pays all that much (excluding data science jobs for PhDs in Silicon Valley). It's more interesting than programming, but seems like a much tighter market with no discernible demand driving salaries up.

Interesting. I have a statistics/data science background, but personally I find programming much more satisfying.

Programming is a tool to create and synthesize. It leads to new products, companies, and solutions. Data science is analysis, not synthesis. You collect data, you interpret it, you move on to other data. Nothing gets created, which for me, is a deal breaker for job satisfaction.

The premise is often that, as a programmer, you are part of a tightly controlled agile/SCRUM team, while data scientists get much more independence in their jobs.

> as a programmer, you are a part of tightly controlled agile/SCRUM team

I see. I'm currently in a research lab, and programming is my main day-to-day activity. I have a high degree of autonomy, but have recently considered moving to a product division so I can work on something that actually gets shipped. But it doesn't sound so great, as you describe it. Maybe it's a case of the grass always looking greener on the other side.

Millions of programmers work on stuff that gets shipped and I'd say their job satisfaction isn't very high on average. Shipping is overrated; the culture of shipping as a cool thing is cultivated by company owners who want people to think that they enjoy realising company goals.

I'm a designer but work for a data science company (LMI specifically). All of our data work is done in D, which I never even knew existed until I started working here.

I can't speak to anything regarding ML, but for whatever it's worth in our segment of the market we have seen a lot of competition emerge in a big way the last few years. Former academic-type firms who specialized in bespoke economy analysis reports are starting to build software around all of the data that is out there since it's never been easier to collect and normalize it. I think it's a stretch to say the market is approaching saturation for us, though.

As many have said here, and as a working and apparently in-demand data scientist, I agree that the tricky part about data science is that being effective isn't a matter of just any one thing. You have to be a unicorn of sorts who is, above all things, capable of solving any problem which comes your way. You have to be exceptionally flexible and very scrappy.

There are a lot of people who know more about modeling, software engineering, statistics, machine learning, analytics, and so on than I do. But I excel at bringing everything together and solving difficult business problems. It's really difficult to train someone to be this way. It takes a lot of time, experience, skills, and a unique disposition to be an effective data scientist. At least to be the kind of data scientist I am. And I'm still early in my career.

Just my two cents. I suspect there will continue to be a glut of people who, on paper, have the data science skills, but lack all the intangibles. Who knows, maybe the various programs and boot camps will start doing business scenario learning: here's a tough real world problem where we don't tell you how to solve it, but we desperately need you to figure it out. Go!

Background: I currently lead a Data Science team at a big non-tech company. Previous to this I worked at a software company that had a Data Science team in their customer facing consulting group.

I'm going to speak primarily about applied data science. This means a data scientist who is solving a business need by doing ad-hoc analysis or building a reusable solution (e.g. an R+Shiny dashboard) to a business problem.

Jobs: There are plenty of jobs out there, but you have to be careful. Many "Data Science" jobs are really BI, Business Analyst, or Sales Engineer types of jobs where some VP got it in their head that they need a Data Scientist. These jobs are great for people who are okay with Technology and Data Science being 10% of their job - and many people are like that. They don't care about engineering, coding, or tech and statistics beyond the minimum to do their jobs. But if you really want a job that involves solid tech and stats/ML skills you will be unsatisfied at these types of jobs.

Right now there are plenty of hard business problems that people want to turn into Data Science problems because they think it'll give them a competitive edge or something to market and show off. This results in more data science job openings. However, they are not really data science problems. As somebody else said, people will eventually realize they are not getting the value they need with data scientists doing these types of jobs. Then they'll replace that person with an MBA with some DS coursework (e.g. MBA who can use KNIME or SAS Enterprise Miner) or eliminate the position.

People: I interview people and I know people at other organizations who interview candidates for Data Science roles. MOOCs and many degree programs (including 2 year MS degrees) are pushing out people who have a very superficial overview of data science. Basically they teach them about every ML algorithm in the known universe and the functions to call them in R/Python/SAS. The end result is a mediocre coder or non-coder who boils everything down to a confusion matrix or root mean squared error. But they cannot actually think through a business problem or see why a low error doesn't equal a good model (see http://www.tylervigen.com/spurious-correlations).
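
One way to see why a low error doesn't equal a good model is the classic imbalanced-class case. The sketch below uses made-up numbers (10 fraud cases in 1,000 records) and a hypothetical "model" that never flags anything: its error rate is 1%, yet it catches no fraud at all.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


# Made-up data: 10 positive (fraud) cases among 1,000 records.
y_true = [1] * 10 + [0] * 990
# A useless "model" that always predicts the majority class.
y_pred = [0] * 1000

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
error_rate = (fp + fn) / len(y_true)             # share of wrong predictions
recall = tp / (tp + fn) if (tp + fn) else 0.0    # share of fraud cases caught
```

Which number matters depends on the business problem: if the point is to catch fraud, recall is the figure to look at, and here it is zero despite the low error rate.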

Finding good people is hard and you have to be flexible to realize great people can come from different backgrounds.

I can speak for Spain, although I sometimes get calls from other European countries. Relative to the pathetic Spanish job market, data science/machine learning is doing great. I think right now there is too much hype, which is going to stay for a few years. After that I suppose it won't be a hot thing, but I don't think it's going to disappear. I hope I'm mistaken and we are really seeing some AI revolution, but after all my job is putting trust in the data, and past data says fads come and go. If that happens I will keep with me the math, the statistics, any development skills I can learn meanwhile and of course the challenge of someday achieving true AI.

Worth a serious effort if you are going to use it originally in your own niche / industry, otherwise statistics will still help you more in any given market. So just learn statistics very very well and then ask again.

I am currently an MIS graduate student with 3 years of SAP functional experience. After this boring stint and hearing the hype around Data Science, I decided to give it a try (decent statistics and engineering skills but no coding expertise; I also finished MOOCs and am currently working on some small projects during the holidays). Considering average pay as a prominent factor, which is the better option: learning extra SAP skills (HANA etc.) and trying for a job in SAP, or diving into Data Science completely and trying to start as an entry-level Data Analyst?

Is there a market for competent developers without professional/academic experience in data science or machine learning? Perhaps just a MOOC or some Kaggle projects?

This is what I would like to know as well. I'm a professional dev, competent with handling large scale systems. I try to learn ML on my own time, but that's not quite as thorough as getting a dedicated degree. I can catch up with the grad students if I put in more time, but will an employer see it? Will they take a risk even if I haven't had enough projects under my belt, etc.?

If you're seeking work: If you want to be in demand, be the machine learning person for __________: electric energy revenue protection, healthcare payer fraud detection, investing, or supply chain. Pick a specialty.

If you're hiring: Get the above out of your pathetic small minds and start hiring the smartest people you can find. Look for successes in any industry. Your business isn't that unique. The best people can learn it much faster than you did.

I'm in Australia.

I'm hiring 6 people in a range of roles between "pure" data scientists to more data engineer/SWE roles. The exact mix depends on who we can get.

The ability to find good people is the biggest constraint on the work we do.

Our current team ranges from applied mathematicians (as in they are Math professors) to people with traditional SWE backgrounds. Basically we are a long long way from saturated.

What is your company? I am Australian, finishing a PhD in mathematics and looking for a job.

Email me. Contact details in my profile.

People with acquired skills are plentiful and not really up to scratch most of the time. So people who have these "acquired" skills have a high likelihood of being scrapped at the CV stage.

If you're serious about machine learning - build a blog or online repository of quality work and use that to get a job instead

From someone who is looking to transition to data science, the field is terrible if you haven't had an industry job before. I am ranked in the top 150 on Kaggle and can't even get phone interviews without someone in my network recommending me for a position.

Post a link to your resume. There may be some obvious problems we can diagnose.

There is certainly a big market for both data analysts and system builders here in Moscow. Most want a person with credentials, e.g. the Yandex School of Data Analysis. The field is rather on fire, with big companies investing a lot of money in it.

Off Topic: What would you advise for someone who writes code in Python (scraping, mining) and has done some ML by watching Udacity courses? How can I polish myself to get into position for a job?

This thread is really depressing as someone currently going through a bootcamp (Dataquest). Is it more realistic to aspire to data engineering/analyst roles?

I am curious about this as well. I think the difference between machine learning and software engineering is that companies may only need a few dozen machine learning engineers. They may need thousands of software engineers. There may be increasing demand, but the demand will never reach the demand for software engineering. Except at the premium ultra competitive level, where a data scientist who is globally known can have a massive impact on the company's bottom line. But we aren't talking about those types of jobs.

I also believe that most traditional companies do have data scientists, but they haven't really started incorporating machine learning into their products. They are analyzing information about their customers, but their products are not reliant on using data. Once that becomes more common, things will pick up.

It seems that this is true for now (for 'traditional companies').

Soon, however, one could argue that 'traditional companies' will no longer be the norm - data science, ML, etc. will play such a crucial role in the majority of tech firms that the number of companies using it will rise. That's when I expect we'll see a huge portion of software engineers knowing ML concepts. Alternatively, I wonder if we might see the rise of smaller companies contracting out all of their ML to larger ones.

I would also be curious to know if ML background helps one to get a job at a place like Amazon/Google for even 'traditional' positions right now. The amount of data they have now must drive demand for engineers who can write software that takes advantage of it, regardless of position. Of course, like you said, they'll always require engineers to fill more traditional roles with no data interaction.

You need a friend at GooMAzonSoft to refer you. It's always been and will always be the easiest way in. Then traditional uninteresting phone call and uninspired 6 hours on site with people who probably didn't read your resume.

Maybe if you have a good profile and you get lucky, you'll go interview straight for one group who's interested in you, but I wouldn't bet on that.

Referrals from your friends do little to nothing at those companies. At Microsoft, for instance, all it does is assign a "handler" to the applications you put in. They just let you know if anyone is looking at them. Which, as you can imagine, is useless.

Almost no one, except for maybe very high level hires, gets a pass for the initial weed-out interviews.

Not true. I'm a run of the mill developer. I had another offer, I called Google, told them I had another offer. They skipped all the rounds but the final.

They skipped screening and whiteboarding because you simply said you had another offer?

You can't get into the interview pipeline without a referral.

I've gotten into the interview pipeline without a referral in other places. My friend got into the interview pipeline in Microsoft without a referral.

Did you know that your use of that term uniquely fingerprints you? (Google it)

An interesting remark indeed.

Assuming the other people who used that word are me.

Machine learning PhDs >>> everything else.
