
Tensorflow and Deep Learning, Without a PhD, Martin Gorner, Google [video] - ayanray
https://www.youtube.com/watch?v=sEciSlAClL8
======
nl
Or you could just implement this (A submission from Google Brain to ICLR
2017):

 _In this paper, we use a recurrent network to generate the model descriptions
of neural networks and train this RNN with reinforcement learning to maximize
the expected accuracy of the generated architectures on a validation set. On
the CIFAR-10 dataset, our method, starting from scratch, can design a novel
network architecture that rivals the best human-invented architecture in terms
of test set accuracy. Our CIFAR-10 model achieves a test error rate of 3.84,
which is only 0.1 percent worse and 1.2x faster than the current state-of-the-
art model. On the Penn Treebank dataset, our model can compose a novel
recurrent cell that outperforms the widely-used LSTM cell, and other state-of-
the-art baselines. Our cell achieves a test set perplexity of 62.4 on the Penn
Treebank, which is 3.6 perplexity better than the previous state-of-the-art._
[1]

To translate that: they built and trained an RNN to design neural networks. These
machine-designed networks are almost equal to the best human-designed network
on an image-recognition benchmark, and outperform the best human-designed
systems on a text-understanding benchmark.

[1]
[http://openreview.net/forum?id=r1Ue8Hcxg](http://openreview.net/forum?id=r1Ue8Hcxg)
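
The idea can be sketched in miniature: a controller keeps a distribution over
architecture choices, samples one, and reinforces choices that score well on
validation. The toy below is my own illustration of that loop, not the paper's
actual method; the candidate widths and the reward function are made up
(a real run would train a child network and measure validation accuracy).

```python
import numpy as np

rng = np.random.default_rng(0)

choices = [16, 32, 64, 128]      # hypothetical hidden-layer widths
logits = np.zeros(len(choices))  # controller parameters

def evaluate(width):
    # Stand-in for "train a child network, return validation accuracy".
    # Here we simply pretend 64 units is best.
    return 1.0 - abs(width - 64) / 128.0

baseline = 0.0
for step in range(2000):
    probs = np.exp(logits) / np.exp(logits).sum()
    i = rng.choice(len(choices), p=probs)
    reward = evaluate(choices[i])
    advantage = reward - baseline            # reduce variance with a baseline
    baseline = 0.9 * baseline + 0.1 * reward
    # REINFORCE-style update: raise the probability of high-reward choices.
    grad = -probs
    grad[i] += 1.0
    logits += 0.1 * advantage * grad

best = choices[int(np.argmax(logits))]
```

After a couple of thousand samples the controller's distribution collapses
onto the width the mocked reward prefers.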

------
minimaxir
Although TensorFlow/Keras allows anyone to implement Deep Learning easily,
that doesn't mean that they will get _hired_ for a relevant job position
without a PhD. In my experience, most Deep Learning jobs, and even relatively
mundane Data Scientist jobs nowadays, want a PhD. There is a surplus of
Statistics/CS PhDs; why would a company hire someone without one if they do
not have to?

Without a relevant job position, knowing how to implement Deep Learning is a
buzzword trick for Medium thought pieces, or for getting $$ in funding from
venture capitalists for a generic "AI" startup whose workings no one actually
understands.

~~~
agibsonccc
I have seen this from multiple angles.

I used to teach at a data science bootcamp where many of the students got
hired by big companies.

I've also been running a deep learning startup for the last few years and have
hired quite a few people.

Many of our team don't have PhDs but can still write backprop code for even
complex modules like Inception, among other things. A lot of my students
didn't have PhDs either.
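
Writing backprop by hand is roughly the following exercise: push a batch
forward, then apply the chain rule layer by layer and verify against a
numerical gradient. This is a generic sketch with shapes and names of my own
choosing, not code from the book or from any particular course:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 3))   # batch of 4 inputs, 3 features
y = rng.normal(size=(4, 2))   # regression targets
W1 = rng.normal(size=(3, 5)); b1 = np.zeros(5)
W2 = rng.normal(size=(5, 2)); b2 = np.zeros(2)

def forward(W1, b1, W2, b2):
    h = np.tanh(x @ W1 + b1)              # hidden activations
    out = h @ W2 + b2                     # linear output layer
    loss = 0.5 * np.mean((out - y) ** 2)  # mean squared error
    return loss, h, out

loss, h, out = forward(W1, b1, W2, b2)

# Backward pass: chain rule, one layer at a time.
dout = (out - y) / y.size     # dL/dout for the 0.5 * mean(...) loss
dW2 = h.T @ dout
db2 = dout.sum(axis=0)
dh = dout @ W2.T
dpre = dh * (1 - h ** 2)      # tanh'(z) = 1 - tanh(z)^2
dW1 = x.T @ dpre
db1 = dpre.sum(axis=0)

# Sanity check one entry of dW1 by finite differences.
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
num = (forward(W1p, b1, W2, b2)[0] - loss) / eps
```

The finite-difference check is the standard way to convince yourself the
hand-derived gradients are right before trusting them on anything bigger.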

A few of us (me included) are self-taught. I've also co-authored the largest
O'Reilly book on deep learning:
[http://shop.oreilly.com/product/0636920035343.do](http://shop.oreilly.com/product/0636920035343.do)

One piece of advice I would offer is to build something that differentiates
you from the rest. Many of these "medium thought pieces" you're talking about
are actually very cool applications of deep learning. If you want to get hired
for these kinds of roles, demonstrate that you understand how to build things
with deep learning. The litmus test I would look for is "I trained a net from
scratch and innovated in x way." Honestly, talent that can do well at both
software engineering and deep learning is rare. I'm not convinced a PhD is a
hard requirement.

I get that recruiters at these larger companies definitely tend to look for
the buzzwords and often can't tell the difference, so it's definitely harder
going the traditional route.

Tech hiring also tends to be as much about networking as it is buzzword
bingo, no matter what field you're in. If you can network a bit and build
something cool that demonstrates an understanding of deep learning, I don't
see the problem.

~~~
imakecomments
Regarding your book, have you expanded the math section? I saw a draft of the
material somewhere, and the math review seemed to be broken up into short
paragraphs. These short paragraphs lacked examples and appeared to assume
previous background knowledge in the subject, which seems contradictory to the
book's title and aim. For example, I believe you mentioned somewhere: "The
Jacobian is an m x n matrix containing the 1st order partial derivatives of
vectors with respect to vectors." -- Since I have a math background, I can
understand what you write. But someone with little to no math background
(e.g. a software practitioner) may be thrown off.
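
To show what I mean by "give examples": the definition in question becomes
concrete in a few lines. For f : R^n -> R^m, the Jacobian is the m x n matrix
J[i, j] = d f_i / d x_j. Worked here for a small function of my own choosing,
checked by finite differences:

```python
import numpy as np

def f(x):
    # f : R^2 -> R^3
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def jacobian_analytic(x):
    # Rows index the outputs f_i, columns index the inputs x_j.
    return np.array([
        [x[1],         x[0]],     # d(x0*x1)/dx0,  d(x0*x1)/dx1
        [np.cos(x[0]), 0.0],      # d(sin x0)/dx0, d(sin x0)/dx1
        [0.0,          2 * x[1]], # d(x1^2)/dx0,   d(x1^2)/dx1
    ])

def jacobian_numeric(f, x, eps=1e-6):
    # Finite-difference approximation, one input coordinate at a time.
    J = np.zeros((f(x).size, x.size))
    for j in range(x.size):
        xp = x.copy(); xp[j] += eps
        J[:, j] = (f(xp) - f(x)) / eps
    return J

x = np.array([1.0, 2.0])
```

A couple of examples like this would cost the book half a page.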

I am hesitant to recommend your book to a true practitioner due to the assumed
knowledge in the math section. A better treatment of the mathematics would
assume the reader has little to no background but is intelligent enough to
learn, from the ground up, the specific uses of mathematics in the deep
learning techniques presented in the book. See
[http://www.deeplearningbook.org/](http://www.deeplearningbook.org/) for a
better treatment of the math review; it seems more thorough and makes fewer
assumptions about the reader's math background.

I would love to recommend your book to a practitioner, but I'm afraid the math
section (the version I reviewed) would scare them off, or they would get
little out of it.

~~~
Jugurtha
> _I believe you mentioned somewhere "The Jacobian is an m x n matrix
> containing the 1st order partial derivatives of vectors with respect to
> vectors." -- Since I have a math background I can understand what you
> write. But for someone with little to no math background (e.g. a software
> practitioner) this may throw them off._

This makes sense. However, there will _always_ be prerequisites for
understanding any given topic. It is recursive and dangerous to assume
otherwise, because knowledge builds on previous knowledge. Gaps in
prerequisite knowledge should be an exception handled by the reader, not by
the author, because handling them in the book penalizes everyone who doesn't
have that gap.

I understand authors wanting their books to be self-contained and
_inclusive_, _bringing everyone up to speed_, but this brings up awful
college memories of waiting for the one person who didn't know matrix
multiplication to finish asking questions in a class that was _not_ about
linear algebra. That person was the exception, and instead of learning it on
his own time, he was willing to penalize everyone.

Similarly, in the context of books, this is the reason 600-page books are the
norm, with the first 400 pages "bringing everyone up to speed" (100 pages for
a Python introduction, 70 pages for elementary linear algebra, etc.).

The overlap is just _staggering_, and it is safe to assume that a 600-page
book does _not_ cost the same as a 200-page book. In other words, everyone is
paying the price for the one guy who wants to do the sexy Machine
Learning/Deep Learning/Pattern Recognition but doesn't want to bother looking
up the _Jacobian_ on his own. We're paying for 400 pages we'll never read.

A large percentage of books caters to the beginner/neophyte, even though being
a beginner is a relatively short phase for someone with a long road ahead.
There's an assumption of non-evolution, an everlasting tutorial 0. Imagine how
frustrating it would be if every item in the world were designed for crawling
babies, disregarding the fact that they're on their way to becoming adults.

~~~
imakecomments
Chapter 1 is completely trivial to anyone with mathematics training. It makes
no difference to me whether the author expands the section or not; it doesn't
hurt my demographic of readers, because we wouldn't read that section anyway.
You know who would read it? The 'practitioner': someone who hasn't seen a
matrix since high school or freshman year of college.

The interviews the authors give paint the picture that this book is for the
'practitioner'. If Chapter 1 is only meant as a brief review, then don't
advertise the book to the complete practitioner/beginner. Either make the book
for the practitioner or don't. If you do, then don't pretend to serve
introductory math that the unfamiliar reader will not actually understand; the
chapter fails its purpose there. So either make that chapter useful for the
practitioner, or leave it out and assume the mathematicians already know it.
Maybe put it in an appendix and let us get to the meat quicker. It honestly
does not take much space to define what a matrix is, give an example, define
matrix multiplication, give examples, and so on. The same applies to basic
definitions and examples of derivatives. These are mindless mechanical
procedures anyone can learn, and it wouldn't take much extra space to include
some thoughtful examples. Maybe I should write an 'introductory group theory'
textbook and start discussing geometric group theory two pages in, if we want
to talk about not serving an intended audience's purpose.
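
To make the point about how little space it takes: here is the kind of worked
example I have in mind, with my own numbers. Matrix multiplication is just
C[i][j] = sum over k of A[i][k] * B[k][j], which fits in a few lines with a
NumPy cross-check:

```python
import numpy as np

A = [[1, 2],
     [3, 4]]   # 2 x 2
B = [[5, 6],
     [7, 8]]   # 2 x 2

def matmul(A, B):
    # C[i][j] = sum_k A[i][k] * B[k][j]
    n, p, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for k in range(p):
                C[i][j] += A[i][k] * B[k][j]
    return C

# By hand: [[1*5+2*7, 1*6+2*8], [3*5+4*7, 3*6+4*8]] = [[19, 22], [43, 50]]
C = matmul(A, B)
```

One definition, one worked example, one sanity check: half a page at most.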

I like what the authors are doing. I'm on their side, but I'm making
suggestions that could serve a wider audience.

------
brilee
This video has nothing to do with the title "without a PhD". It's a
walkthrough of basic techniques for training neural networks on the MNIST
handwriting dataset.

------
paulsutter
This is a great tutorial for beginners. His sample code [1] is much better
than the MNIST tutorials on the TensorFlow page, especially the
visualizations.

[1] [https://github.com/martin-gorner/tensorflow-mnist-tutorial/](https://github.com/martin-gorner/tensorflow-mnist-tutorial/)

------
dorianm
Here is the full code from his slides:
[https://gist.github.com/c4127d2d899386179dbd2e6cd013a87e](https://gist.github.com/c4127d2d899386179dbd2e6cd013a87e)

I just added a few comments and constant names.

------
annnnd
I am struggling to understand how TensorFlow should be used; the syntax is
very alien to me. I found the Theano documentation much easier to follow. Is
it just me?
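
My guess at why it feels alien: TensorFlow (like Theano) builds a symbolic
graph first and executes it later, so the code describes computations rather
than performing them. A toy deferred-execution model in plain Python, with
invented names (this is not TensorFlow's actual API, just the shape of the
idea behind placeholders and session.run):

```python
class Node:
    """A node in a tiny computation graph: an op plus its input nodes."""
    def __init__(self, op, inputs=()):
        self.op, self.inputs = op, inputs
    def __add__(self, other): return Node("add", (self, other))
    def __mul__(self, other): return Node("mul", (self, other))

def placeholder(name):
    # A symbolic input whose value is supplied only at run time.
    return Node(("placeholder", name))

def run(node, feed):
    # Recursively evaluate the graph, like session.run(node, feed_dict=...).
    if isinstance(node.op, tuple):          # placeholder: look up fed value
        return feed[node.op[1]]
    a, b = (run(n, feed) for n in node.inputs)
    return a + b if node.op == "add" else a * b

x = placeholder("x")
y = placeholder("y")
z = x * y + y                         # builds a graph; nothing computed yet
result = run(z, {"x": 3, "y": 4})     # only now does evaluation happen
```

The line `z = x * y + y` looks like arithmetic but only records structure,
which is exactly what makes the real API read so strangely at first.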

------
zeratul
99.3% accuracy is not that great. Currently the best performer on this data
set has 99.77% accuracy:

[http://yann.lecun.com/exdb/mnist/](http://yann.lecun.com/exdb/mnist/)

~~~
danieltillett
What is the human accuracy on this data set?

~~~
mamon
A more interesting question: if some of those "digits" are so hard to
recognize even for humans, how can we ever label them with the one true
"correct" answer?

Do we track down the person who drew those digits and ask what the artist had
in mind when making this masterpiece? And even then, someone might have been
trying to draw a "2" while the end result looks more like a "3".

I think some of the test cases simply don't have a definitive answer, and
trying to reach 100% accuracy is a misguided effort.

~~~
danieltillett
Another interesting question is which approach most closely matches the errors
made by humans.

