
How to Get a Job in Deep Learning - stephensonsco
http://blog.deepgram.com/how-to-get-a-job-in-deep-learning/
======
csantini
TL;DR: Deep Learning will become a commodity. Software will eat Deep Learning
too.

I'd like to clear some of the hype fog from the air:

DL is giving amazing results only when you have big sets of labelled data.
Hence it will be much cheaper for companies to buy Google/Microsoft
Vision/Audio REST APIs rather than paying the costs of: cloud + find data +
deep learning experts. So, I don't think we will see a massive growth of DL
gigs.

e.g. Google Vision API:
[https://cloud.google.com/vision/](https://cloud.google.com/vision/)

Except those areas where your own CNN implementation is needed (automotive,
industrial automation), Deep Learning will be another "library" in the ever
increasing Software Engineering mess of gluing many open source libraries and
REST apis to get something useful done. You need 1 guy training a Neural
Network for every 100 software monkeys maintaining the infrastructure
complexity. There are now many Software Engineering jobs because it's hard to
glue and maintain publicly-available code to solve some specific business
problem.

I think the same applies to many Data Scientist jobs, which are these
days more about fetching/cleaning/visualizing data than doing machine
learning on it.

~~~
agibsonccc
Just to give a different perspective. We are an on prem shop that sells to
banks, telcos etc who can't use the cloud for compliance reasons. We make it
work by doing other things people need in those environments well. One example
is my willingness to do an install via dvds rather than "the cloud".

We mainly do fraud detection and security related work. We have also seen
operations workloads (forecasting when machines are going to break, or
preventative maintenance). Most of our core business isn't even CNNs.

One thing that's missing from the narrative is that researchers do vision and
speech because it's "pretty" and you can demonstrate results on a large
feature vector. It's also very relatable for normal people.

Most of corporate america (not silicon valley) has more traditional things
like time series data. Not images.

I would say here that deep learning use cases aren't explored by most people
and that there are other areas besides what the marketing with self driving
cars is perpetuating.

The other thing I would add here is when companies get bigger they typically
need to take core competencies like vision and speech in house. It can be hard
to justify outsourcing your core business to google as you grow.

This trend however might change over time. I would love to hear contradictory
statements here.

Disclosure: I am a deep learning founder in the same batch as the authors of
this blog post.

~~~
zzleeper
If you were to use deep learning in-house for smallish problems, would you
still use tensorflow plus a batch of GPUs, or something different? I can't
take stuff to the cloud for regulatory reasons, and in order to contract w/a
startup like yours, I would first need to make a case for it (with useful
examples that just need to be improved)

~~~
agibsonccc
I am the wrong person to ask. We compete with tensorflow going after the
larger scale stuff where hadoop is a requirement and the cloud is a swear
word.

I work with .NET and Java shops where running on Windows is a requirement and
HIPAA is considered "lite" and they don't know what a GPU is.

Despite the marketing noise that is still most of the world.

Look no further than another comment that showed java job postings vs machine
learning.

My biased comment out of the way: if you can convince ops on why you need this
deep learning thing then be my guest. Their first question to you is probably
going to be: Does dell/hp sell this as a reference box?

It mostly comes down to roi and what your existing stack is.

My job is hard enough as it is. If you can convince IT to allocate an r&d
budget be my guest. That is more or less our specialty. Email is in profile if
you want specifics.

------
tom_b
I am curious about demand for this skill in the market.

But I just don't see it - machine/statistical/deep learning gigs just seem
really rare.

I know this isn't a great metric, but searches on Indeed.com:

    
    
      "deep learning" - 873
      "machine learning" - 9,762
      "statistical learning" - 65
      java - 72,802
      javascript - 43,785
    

Same searches on LinkedIn:

    
    
      "deep learning" - 646
      "machine learning" - 6,952
      "statistical learning" - 34
      java - 43,845
      javascript - 30,818
    

Even the "machine learning" search on Indeed, with 9K+ results has 1300+ from
Amazon, followed by a much smaller number (in low hundreds each) from
Microsoft, Google, others (including some that look like staffing companies).

Even on HN's "Who is hiring?" Sept 2016 thread, the phrase counts are: 14
"deep learning", 79 "machine learning".
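
To put the Indeed and LinkedIn counts above in proportion, here is a small sketch that expresses each search term as a percentage of the Java postings on the same site. The counts are the ones quoted above; the dictionary and function names are just illustrative.

```python
# Posting counts copied from the searches quoted above.
indeed = {
    "deep learning": 873,
    "machine learning": 9762,
    "statistical learning": 65,
    "java": 72802,
    "javascript": 43785,
}
linkedin = {
    "deep learning": 646,
    "machine learning": 6952,
    "statistical learning": 34,
    "java": 43845,
    "javascript": 30818,
}

def share_of(counts, term, baseline="java"):
    """Return postings for `term` as a percentage of postings for `baseline`."""
    return 100.0 * counts[term] / counts[baseline]

for site, counts in (("Indeed", indeed), ("LinkedIn", linkedin)):
    for term in ("deep learning", "machine learning"):
        print(f"{site}: {term!r} is {share_of(counts, term):.1f}% of java postings")
```

On these numbers "deep learning" comes out at roughly 1 to 1.5% of the Java postings on both sites, and "machine learning" at roughly 13 to 16%.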

I completely agree with the idea that being able to use some
deep/machine/statistical learning is going to be a toolset that data hackers
need to have. I even think that there is a bit of the "build it and they will
come" magic waiting out there.

But I think the best way forward is to be working in data and figure out how
to generate value with deep learning - this will be _much_ more productive
than trying to seek out a deep learning gig in terms of promoting deep
learning in the workplace. Heck, that's a suggestion I would be wise to take
myself . . .

~~~
arimorcos
You can't only look at demand. You also have to look at supply. Most machine
learning engineers have a masters at least (many have a PhD), and almost all
'scientist' positions require a PhD. Of course there are fewer deep learning
jobs available than front-end developers. That doesn't mean they're not highly
coveted.

As an example, one of Fei-Fei Li's recent grad students got multiple job
offers upon graduating, one for more than a million dollars a year:
[http://www.nytimes.com/2016/03/26/technology/the-race-is-
on-...](http://www.nytimes.com/2016/03/26/technology/the-race-is-on-to-
control-artificial-intelligence-and-techs-future.html?_r=0)

~~~
jamesblonde
That $1m offer refers to Andrej Karpathy. He's the Jeff Dean of Machine
Learning these days.

~~~
emcq
They are certainly great within their fields, but one thing Karpathy is
particularly good at is social media. While in grad school he was quite active
on HN, Reddit, and Twitter. He had fantastic blog posts that made
difficult subjects seem comprehensible. Perhaps it's just part of being part
of a younger generation, but Jeff Dean doesn't share that feature.

------
orthoganol
My question is, it feels like machine learning is reaching its "Rails" stage.
You can implement the latest Bi-directional NN or LSTM-RNN using a high level
API that already sits on top of another high level framework. Even beyond the
core setup it will do the peripherals - smart initializations, anti-
overfitting, split up your data, etc.

Do people who implement (albeit real, useful) deep learning systems, but who
have no formal machine learning background, who don't really know much or care
about implementing derivatives or softmax functions because the frameworks
abstract all that away - are these people getting offered jobs?

~~~
nharada
Am I the only one who feels like all that math is required in order to
properly implement an ML system? Bugs in ML systems are insidious and tricky,
often difficult to find and manifesting in unexpected ways based on subtle
issues in the underlying data or mathematics. Having a ground up understanding
of why things work the way they do is a requirement to reasoning through all
the layers of the system to find and fix issues.

Maybe someday all of that will be abstracted enough that you don't need to
know math to do ML (like how now you generally don't need to know machine
architecture to program), but how soon will that be?

~~~
tjl
It might be some time before it's abstracted enough. For example, there are
software packages for Finite Element Analysis, but if you don't know the
underlying math, you can't do anything more complicated than the basics. Plus,
if there's a problem, without that knowledge you can't really debug it. I'm
guessing ML will be like that for a while. If you want to do something simple
using a package is fine, but as soon as something goes wrong you'll need that
knowledge to figure out what's happening.

For example, for a while there's been work on doing sentiment analysis using
machine learning and they typically train them on a data set of movie reviews.
It turns out that as soon as you apply that trained system to anything other
than movie reviews the actual results are quite poor, but you might not catch
it.
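
A toy sketch of how one might catch this: score the same trained model on an in-domain test set and on an out-of-domain one, and compare. Everything here is an assumption made for illustration: the word-count "model" is deliberately crude, and the example sentences are hand-made, not real review data.

```python
from collections import Counter

def train(examples):
    """Count how often each word appears in positive vs. negative texts."""
    pos, neg = Counter(), Counter()
    for text, label in examples:
        (pos if label == 1 else neg).update(text.lower().split())
    return pos, neg

def predict(model, text):
    """Return 1 if the text's words were seen more often in positive examples."""
    pos, neg = model
    score = sum(pos[w] - neg[w] for w in text.lower().split())
    return 1 if score > 0 else 0

def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    hits = sum(predict(model, text) == label for text, label in examples)
    return hits / len(examples)

# Hypothetical, hand-made examples; NOT real review data.
movie_train = [
    ("a gripping plot and superb acting", 1),
    ("superb cast and a moving story", 1),
    ("dull plot and wooden acting", 0),
    ("a dull story with a wooden cast", 0),
]
movie_test = [
    ("superb acting", 1),
    ("wooden and dull", 0),
]
gadget_test = [
    ("battery lasts all day", 1),
    ("screen cracked within a week", 0),
    ("fast shipping sturdy build", 1),
    ("stopped charging after a month", 0),
]

model = train(movie_train)
print("in-domain accuracy:    ", accuracy(model, movie_test))
print("out-of-domain accuracy:", accuracy(model, gadget_test))
```

With this toy data the model is right on both movie-style test sentences and on only half of the gadget-style ones, because none of the vocabulary it learned carries over and it falls back to predicting "negative". The data was built to show the effect, so it is an illustration of the check, not evidence for the claim.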

------
protomikron
My advice:

Do not label yourself as a data scientist or machine learning expert. Go for
the domain, i.e. become comfortable with the actual data and the methods used
there:

\- predict land use in aerial imagery - become comfortable with
photogrammetry, geography, etc.

\- predict biological tissue(s) - become comfortable with specific branches of
biology or medicine

\- predict $something_relevant

I actually stole this advice from the epilogue of some text about programming,
and it really stuck with me. Otherwise your expertise is just too generic and
you compete with a big pool of people who call themselves machine learning
experts, because they can write a for loop in Bash.

------
xor1
>Speaking of math, you should have some familiarity with calculus, probability
and linear algebra

Curious to know if anyone has had success learning/re-learning these as a
mid-20s or older adult who works fulltime, and if you could potentially
provide a list of books/courses to go through. I personally never learned
anything past geometry (in high school). The most advanced math class I took
in college was College Algebra. That means I never learned trig or anything
past it (so no calc, linear algebra, or probability), and I'm sure most people
on HN surpassed me math-wise sometime in high school :)

I've been able to skate by with my embarrassing lack of math knowledge/skills
as a developer, but I feel like it's only a matter of time until the
mathematical steamroller becomes a serious threat career-wise and I get
crushed.

~~~
conjectures
It's totally possible. Which isn't to say that it's easy. If you go down that
road, be prepared to sacrifice your social life for several years.

The most frustrating thing will be going through basics that don't seem
connected to your eventual goal of ML; this is potentially a long phase.

I'd recommend Khan Academy for the math basics. Schaum's Outlines series for
lots of worked problems. I'd recommend Strang's MIT OCW videos and books for
Linear Algebra. I first learned calculus from an economics book, probably best
not repeated ;). Bishop's PRML book is a good source on probability, Bayesian
stats and machine learning.

I did a part time MSc in applied stats while working full time. Even in the
MOOC era, there's nothing like real exams to focus your efforts.

~~~
RandomInteger4
Khan Academy is a great place to start. I second that recommendation.

Although I was pretty confident in my maths up through basic calculus, I began
fresh earlier this year starting from pre-algebra on their videos and problem
sets.

I took the approach of looking at the mind as a muscle and if I had taken a
long break from lifting, I wouldn't jump right back in lifting the same
weights. The analogy isn't perfect, but I feel like it helped to reinforce the
old basic neural pathways in order to prime my mind for more difficult topics.

EDIT: Also the achievement points aspect helped as a learning tool as well.

------
partycoder
"A job in deep learning".

It is highly unlikely that you will get a job in which you exclusively use
deep learning alone, and not any other ML/AI technique.

Once you learn DL, then, "congratulations... here are 100 other topics you
might need to know about before getting a job". [http://scikit-
learn.org/stable/tutorial/machine_learning_map...](http://scikit-
learn.org/stable/tutorial/machine_learning_map/)

~~~
tostitos1979
I disagree. I think we might soon be at a point where someone might be able to
get a job just knowing how to use CNNs well. Why do I think this? Well .. CNNs
have basically licked the problem of image classification. They require a lot
of trial and error. So .. I can totally see people packaging this up (e.g.
NVidia's DIGITS or TStreamer), and CNN skills become sort of like Word/MS
Office for some industrial applications. These people won't get paid 500K ..
more like what web developers make. Just my personal opinion.

~~~
rawnlq
I think you're right that the barrier is a lot lower now that there is a
somewhat successful blackbox abstraction.

In the past you needed to know which technique fit each task: to recognize
lines, the Hough transform; to recognize polygons, line simplification; to
recognize faces, cascades; etc.

Now? You can almost just feed it arbitrary labeled training data and do well
without any sort of feature engineering. Just another api to glue.

------
thefastlane
i just want to be a software engineer without having to continually burn away
evenings and weekends studying the latest shiny, continually for the next two
decades, just to keep my career afloat. is that even an option anymore?

~~~
pmyjavec
Stick to understanding fundamentals and the cruft on the top layer is easy to
grok. Getting good jobs is about who you know most of the time, so just keep a
network happening.

Don't read too much Hacker News, it kind of becomes stressful, and I would try
to enjoy your weekends. Don't worry too much about the market; a lot of good
people just burn out and have breakdowns by trying to understand all that's
going on and become useless anyway. Just know the basics well and learn what
you need to _in work hours_ , make time.

What I'm finding is that in the end, most of the good / important stuff ends
up condensed into a nice O'Reilly (or similar) volume that you can read at your
leisure later on when the hype has evaporated. If you invest yourself too
much in the latest tech constantly, you run the risk of it being redundant /
replaced anyway.

~~~
Chronic9q
> What I'm finding is that in the end, most of the good / important stuff ends
> up condensed into a nice O'Reilly (or similar) volume that you can read at
> your leisure later on when the hype has evaporated.

If you are okay with median compensation and median project importance
(internally and externally), then sure, wait a couple years when the interest
has died down.

------
imron
> I built a twitter analysis DNN from scratch using Theano and can predict the
> number of retweets a tweet will get with good accuracy

I imagine a product like this could actually charge a fair bit of money
helping companies and people improve the 'virality' of their tweets.

------
cbgb
This is just a nit, but Andrej Karpathy was never a professor at Stanford; he
received his PhD from Stanford and now works at OpenAI.

~~~
stephensonsco
good catch. fixed it

------
FT_intern
This should be titled "How to Learn Deep Learning".

"How to get a job in deep learning" would include:

\- What specific topics will be asked during interviews

\- What the interview question format is like

\- How to prepare for the interviews

\- How to get interviews without a PhD. What do you need to show competence in
your self learned skills?

~~~
stephensonsco
These are definitely great points. Most companies looking for DL/ML talent
aren't interested in setting up HR hoops for the applicant to jump through.

They want to see if you did cool stuff before you applied for the job. If you
didn't then you won't get an interview, but if you did then you have a chance
no matter what your background is. Of course, the question of "what is cool
stuff?" comes up. If it is building small projects with a little bit of
success, that probably won't do it (it might work for larger companies, or
companies that need light ML/DL performed). But it will if it is something like
"built a twitter analysis DNN from scratch using Theano that can predict the
number of retweets a tweet will get: here's the accuracy, here's a link to my
write-up on it and here's a link to github for the code."

Edit: added words similar to this at the end of the blog post.

~~~
throw_away_777
> So even if you're a beginner with deep learning, you're welcome to apply for
> one of our open positions

Statements like this contradict what you are saying here - to really build a
model that predicts number of retweets based on the content of the message
(not something like the average number of retweets this user has) is very non-
trivial. If your threshold of a side project is publishable [1], it is an
unrealistic expectation.

[1]
[http://homepages.inf.ed.ac.uk/miles/papers/icwsm11.pdf](http://homepages.inf.ed.ac.uk/miles/papers/icwsm11.pdf)

~~~
namank
Speaking for the other side, the point is not that you achieved the accuracy
but to understand how you thought about the problem. The ability to decide
on a useful hypothesis, formulate the problem around it, and have some way to
measure your progress is very very valuable from the employer's perspective.

~~~
throw_away_777
Sure, my point was more that this can be demonstrated by a much simpler
project. How many candidates have you seen who haven't had a deep learning job
before complete such a complicated project?

~~~
stephensonsco
If you:

\- have good problem solving and coding skills

\- spend a month or two learning how to build networks using good libraries

Then you will be able to get a good result with the Twitter task I pointed
out. It takes being able to work with input data correctly, think about what
matters and what doesn't, then synthesize using readymade DL tools, usually in
python. None of that is complicated esoteric neural net stuff, it is just
motivated problem solving.

I'll add that as a big point — you should be able to code, problem solve, and
be motivated.

~~~
throw_away_777
The important part to mention is that the expected completion time of your
project is 1-2 months. I have extensive ML experience (no deep learning
though), and I think this project would take me at least 2 months to do well.
This is a long time, especially for a beginner side-project. It is good to
tell people to do side-projects, but suggesting projects like this is de-
motivating for beginners. If someone on your team (who is presumably an expert
already) does this side-project I'd be very interested in the results, and on
how long it took them - it would make a good blog post. Your proposed beginner
side-project should be less complicated than a published paper with 200
citations (the paper didn't use a nn).

~~~
Chronic9q
Everyone thinks they can download tensorflow, hook it up to some stock market
or Twitter api, hack a system in a week, and become a ML engineer. It is
simply not this simple. It takes many months or years to achieve this
experience.

Sometimes software engineers need to accept they aren't the smartest ones
anymore. This is why they can't get the "smart" and "cool cutting edge" ML/DL
jobs.

------
bbctol
You may not need a PhD or tons of experience to _learn_ Deep Learning, but
what about the gap between that and getting a job?

~~~
stephensonsco
If you are a great software engineer that can solve hard problems yourself,
and then you add to that real experience solving your own nontrivial problems
with DL, then people will hire with no questions asked. The reason is that not
many people have experience in DL. So companies need to get good engineers
that have the problem solving mindset and motivation+creativity to learn new
frameworks and lick the problems they face in new awesome ways. Industry is
dying for people like that.

If you are that type of person then you can kill it with just a few weeks of
hard study on your own.

~~~
Chronic9q
> If you are a great software engineer that can solve hard problems yourself
> [...] then people will hire with no questions asked.

False. Being a great software engineer is not enough to get a job in deep
learning. For the same price, or a little more, you can hire an "expert"
scientist or engineer with a PhD in CS/stats/ML.

~~~
stephensonsco
Good point. Should probably say "you'll get an interview". I think you snipped
out something pretty important though. The fact that you have done something
nontrivial with DL is actually a big tell for the employer. If you just 'want
to learn more about DL because it seems interesting' then employers aren't
interested. But if you say "I am a good engineer and I have already done
something nontrivial" then you have a much better chance of getting an
interview and then the job.

------
Xcelerate
This is applied deep learning. There's a ton of jobs available for taking
someone's library on GitHub and applying it to a bunch of data. But other than
DeepMind, FAIR, Google Brain, Open AI, Vicarious, and Microsoft Research, who
is hiring for theoretical machine learning? That's what I'm interested in —
developing better algorithms that eventually approach AGI.

~~~
emcq
Allen Institute for AI and many other smaller companies depending on how left
field you want to get.

------
max_
Why is the Machine Learning subreddit so toxic?

~~~
zump
There's so much hype. It's attracting masters of the universe type people that
would otherwise be, y'know, at Goldman Sachs or something.

------
stephensonsco
just wrote a blog post that I think a lot of folks will like if they are
looking for a job in ML/DL.

Would love to hear if I missed something!

~~~
throw_away_777
One minor point is that you don't mention some fundamentals of the machine
learning process, like making sure to evaluate your models on a different data
set than you used to train your models.
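
As a minimal sketch of that point, here is one way to hold out a test set using only the standard library. The function name, the 80/20 ratio, and the toy dataset are assumptions for illustration, not anything the article prescribes.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of (feature, label) pairs, for illustration only.
data = [(x, x % 2) for x in range(100)]
train_rows, test_rows = train_test_split(data)

# Fit only on train_rows; report metrics only on test_rows.
assert not set(train_rows) & set(test_rows)
print(len(train_rows), "training rows,", len(test_rows), "held-out rows")
```

The key property is that no row appears in both lists, so a score computed on `test_rows` is not inflated by examples the model has already seen.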

Another point is that this article is really about how to learn deep learning,
not how to get a job. I would really like to see some evidence that: "The good
news is that basically everyone is hiring people that understand deep
learning." Most data scientist jobs I have seen don't require or use deep
learning.

~~~
stephensonsco
It's true I didn't mention this but I was hoping the coverage in the related
links would be sufficient. I would have loved to include a section that points
out some basic tenets though.

------
jamisteven
1st sentence should be: 1\. Be a fuckin math whiz.

Had it been, i would have clicked the back button.

~~~
nkozyra
Take a look at the primary algorithms used commonly in AI today - nothing
exceeds high school level math.

I posted about this earlier today but ML really should be demystified. You can
write a lot of commonly used algorithms in 100 lines or fewer. The math is not
complex. If you can get past the notation and buzzwords like "deep learning"
(it's an artificial neural network, itself a grandiose term) you'll see it's
not as daunting as most think.
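
To make the "100 lines or fewer" claim concrete, here is a sketch of k-nearest neighbours, which needs nothing beyond Euclidean distance and counting. The choice of algorithm and the sample points are mine, picked for illustration; the comment above does not name a specific algorithm.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs; points are equal-length tuples.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points, made up for illustration.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
print(knn_predict(train, (1.1, 1.0)))  # the three closest points are all "a"
print(knn_predict(train, (5.0, 4.8)))  # the three closest points are all "b"
```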

The reality is _most_ "data scientists" will be working on implementation
rather than creation. They'll be working on data sets and error analysis, not
creating the next buzzword-laden algorithm.

~~~
cbHXBY1D
Most ML algorithms can be treated as an optimization problem.

Convex optimization is not high school level math.
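
As a small illustration of the "ML as optimization" view, here is a sketch that fits a line by gradient descent on mean squared error, which is a convex objective. The learning rate, step count, and data points are assumptions chosen so that this toy case converges; real problems need more care.

```python
def fit_line(points, lr=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in points) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data lying exactly on y = 2x + 1.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = fit_line(points)
print(round(w, 3), round(b, 3))
```

The loop itself is short; the harder questions (why this converges, how to pick the step size, what happens when the objective is not convex) are where the theory comes in.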

