
Dive into Deep Learning - soohyung
https://d2l.ai/
======
fareesh
As an engineer I find myself in this type of situation quite often - if anyone
can point me to some good resources or has any advice, I'd be quite grateful:

\- Some non-technical stakeholder comes to me and says "can we solve this
problem with Machine Learning?" Usually it's something like "there need to be
two supervisors on the factory floor at all times, and I want an email alert
every time there are fewer than 2 supervisors for more than 20 minutes"

\- I ask for some sample footage to build a prototype and get a few very poor
quality videos, at a very different standard from what I see in most of these
tutorials.

\- I find some pre-trained model that is able to do people detection or face
detection and return bounding rectangles and download it in whatever form

\- After about 30 minutes of fiddling and googling errors, I run it against
the sample footage

\- I get about 60% accuracy - this is no good. Where do I go from here? Keep
trying different models? There are all sorts of models like YOLO and SSD and
RetinaNet and YOLO2 and YOLO3.

\- At some point I try a bunch of models and all of them are at best 75% good.
At this point I figure I should train it with my own dataset, and so I guess I
need to arrange to have this stuff labelled. In my experience stakeholders are
usually willing to appoint someone to do it, but they want to know how much
footage they need to label, whether their team will need special training
to do the labelling, and whether, after it's all done, this is even going to work.

What are some effective / opinionated workflows for this part of the overall
process that have worked well for you? What's a labelling tool that
non-technical users can use intuitively? How good are tools/services like
Mechanical Turk and Ground Truth?

This part of the process costs time and money - stakeholders, particularly
managers who are non-technical, tend to want an answer beforehand - "If we
spend all this time and money labelling footage, how well is this going to
work? How much footage do we need to label?". How do you handle these kinds of
conversations?

I find this space fairly well-populated with ML tutorials and resources but
haven't been able to find content that is focused on this part of the process.
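
For reference, the alert rule from the first bullet (fewer than 2 supervisors for more than 20 minutes) is independent of whichever detector ends up producing the counts. A minimal sketch, assuming the detector yields time-ordered `(timestamp, supervisor_count)` samples; the function name `find_alerts` and its parameters are hypothetical and not taken from any library:

```python
from datetime import datetime, timedelta


def find_alerts(samples, min_supervisors=2, grace=timedelta(minutes=20)):
    """Return the timestamps at which an alert should fire.

    `samples` is assumed to be a time-ordered list of
    (timestamp, supervisor_count) pairs, e.g. one per processed frame.
    One alert fires per shortage, at the first sample where the count has
    stayed below `min_supervisors` for longer than `grace`.
    """
    alerts = []
    shortage_start = None  # when the current shortage began, if any
    alerted = False        # whether this shortage already raised an alert
    for ts, count in samples:
        if count >= min_supervisors:
            # Enough supervisors again: reset the shortage state.
            shortage_start = None
            alerted = False
            continue
        if shortage_start is None:
            shortage_start = ts
        if not alerted and ts - shortage_start > grace:
            alerts.append(ts)
            alerted = True
    return alerts


# Illustrative data: one made-up sample every 10 minutes.
start = datetime(2020, 1, 1, 9, 0)
counts = [2, 1, 1, 1, 1, 2, 1]
samples = [(start + timedelta(minutes=10 * i), c) for i, c in enumerate(counts)]
alerts = find_alerts(samples)
```

Keeping this rule separate from the model means a noisy detector can be swapped out later without touching the alerting code. Misdetections in single frames would still reset the timer here, so some smoothing of the counts is probably needed in practice.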

~~~
newfeatureok
I'm somewhat surprised at the responses for this.

I believe your issue can be easily solved - have supervisors wear a color
distinct from what non-supervisors wear. For example, let's say it's yellow.

OK, so now you have yellow-wearing supervisors and everyone else. To resolve
the issue you have described, acquire a month or so of footage, with labels per
minute describing how many yellow-wearing supervisors and how many people (in
total) there are.

So the data you have is:

1\. Number of yellow-wearing supervisors

2\. Total number of workers on the floor

Then with this data you can train a network to do what you're describing
pretty easily. Assuming there are a lot of workers on the floor, trying to do
person detection or face detection would require too much data. Just have a
uniform enforced and train on the colors/presence.
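
To make the colour idea concrete: before training anything, a hand-written colour threshold can show whether yellow is separable at all in the footage. The sketch below is not the network described above, only a baseline under assumptions: pixels arrive as 0-255 RGB tuples, and the hue, saturation and brightness cut-offs are guesses that would need tuning on real frames. The name `yellow_fraction` is hypothetical.

```python
import colorsys


def yellow_fraction(pixels, hue_range=(40 / 360, 70 / 360),
                    min_sat=0.5, min_val=0.5):
    """Return the fraction of RGB pixels that look yellow.

    `pixels` is a list of (r, g, b) tuples with values in 0-255.
    The thresholds are illustrative assumptions, not calibrated values.
    """
    if not pixels:
        return 0.0
    hits = 0
    for r, g, b in pixels:
        # colorsys works on floats in [0, 1] and returns hue in [0, 1].
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if hue_range[0] <= h <= hue_range[1] and s >= min_sat and v >= min_val:
            hits += 1
    return hits / len(pixels)


# Two pure-yellow pixels, one grey, one blue.
patch = [(255, 255, 0), (255, 255, 0), (128, 128, 128), (0, 0, 255)]
fraction = yellow_fraction(patch)
```

If the yellow fraction inside a detected person's bounding box cleanly separates supervisors from everyone else on a few labelled frames, that is some evidence the uniform approach is workable with the available camera quality.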

~~~
mrspeaker
"Easily solved - just have them wear special clothes." Everything is easy if
you can arbitrarily change the requirements!

~~~
1MoreThing
This is good problem-solving. Why spend tens (if not hundreds) of thousands of
dollars building technology to do a complicated task if you can cut that
effort in half or more by having somebody wear a funny vest?

Remember, the problem is "I need to know when I don't have two managers on the
floor," not "how do I use machine learning to know when I don't have two
managers on the floor."

~~~
mrspeaker
This particular problem is "I need to know when I don't have two managers on
the floor, and they aren't always wearing funny vests just because the
computer guys are bad at deep learning".

If we can make up arbitrary rules and assumptions then just have them jot down
on a piece of paper when they come and go, and if they are the last to leave
then they have to send an email.

~~~
nolite
Honestly, despite your facetiousness, this is the best starting point. And
then from here work up to more complex solutions if there are reasons why this
simple one isn’t suitable.

------
dragandj
I'll chip in with my book, which is written with programmers in mind,
implements everything from scratch, works on CPU and GPU, at great speed.
Directly links theory to implementation, and you can use it along with
Goodfellow's Deep Learning book. Also, discusses all steps, and does not skip
gradients by using autograd.

Deep Learning for Programmers: An Interactive Tutorial with CUDA, OpenCL,
DNNL, Java, and Clojure.

[https://aiprobook.com/deep-learning-for-programmers/](https://aiprobook.com/deep-learning-for-programmers/)

~~~
vga805
And, what makes me want to dive into this the most, there's some Clojure! Will
definitely have to take a look at this one. Thanks.

~~~
dragandj
There's lots of Clojure! (in relative terms. In absolute terms, there's not
much of it because Clojure is so concise and powerful that everything is
implemented with very little code :)

------
sanxiyn
See also Dive into Deep Learning Compiler from the same team:
[http://tvm.d2l.ai/](http://tvm.d2l.ai/)

------
whoisnnamdi
Great guide - though unless I missed it I think this is missing the latest
advancements around Transformers, BERT, ELMo, etc.

This stuff is pretty fresh, so it's understandable, but the NLP chapter would
be greatly enhanced by covering these newer topics

~~~
enitihas
Is there any book which has more than a passing mention of BERT?

------
dang
Discussed a year ago:
[https://news.ycombinator.com/item?id=18838808](https://news.ycombinator.com/item?id=18838808)

------
whoevercares
Does MXNet as a DL framework still have a place, given that PyTorch/TensorFlow
pretty much dominate all use cases? Amazon/AWS still “officially” supports it,
but given its product-driven culture it could replace it with whatever
framework moves faster and is more in demand by customers. Vendor lock-in
probably won’t work as well in this case, since Amazon is not quite a leader
here.

~~~
samcodes
MXNet existed before AWS picked it, and it has a lot of strengths. I’d use it
(especially with Gluon) over TF any day. But that said, PyTorch is usually
easy to use on AWS... the preference for MXNet seems weak

~~~
thatsenough
It existed at CMU, but it seems like even CMU has moved over to PyTorch. I
think Amazon just doesn't want to seem like an "also ran" by conceding to one
of its competitor's frameworks.

------
throwlaplace
this looks pretty good. certainly much better than goodfellow's deep learning
book. the diagrams and code are definitely much appreciated, but i'm
curious why mxnet over pytorch?

~~~
sanjose321
I find this comment amusing. Have you read Goodfellow's book? That book is
amazing.

~~~
hnarayanan
I suppose you and I have very different notions of the word 'amazing'.

------
bor100003
Has anyone read this book? It looks very attractive but I want to hear some
feedback before bookmarking another ML book.

~~~
lindbergh
Kinda did, but mostly the first chapters, actually up to the CNN chapter (where
real modern DL starts). But so far, I really liked what I read. It has a very
good blend of code and theory, with hands-on applications throughout the whole
book. Most importantly, all those applications could perfectly be copy-pasted
into your own environment. So it actually reminded me of a very thorough
tutorial on a framework, more so than a regular textbook, although the
authors don't compromise on mathematical arguments (but don't get lost in them
either, they skimmed pretty fast over regularization theory imho). If you've had
previous exposure to classical ML, I think it's a fantastic introduction to
DL, enough to get started.

------
kolleykibber
RFID at the doors?
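
A rough sketch of what the RFID route would involve, assuming door readers emit time-ordered `(timestamp, badge_id, direction)` events with `direction` being `"in"` or `"out"`. The event format and the name `presence_timeline` are made up for illustration and do not reflect any real reader's output:

```python
def presence_timeline(events):
    """Turn badge events into (timestamp, supervisors_present) pairs.

    `events` is assumed to be a time-ordered list of
    (timestamp, badge_id, direction) tuples for supervisor badges only.
    """
    present = set()  # badge ids currently on the floor
    timeline = []
    for ts, badge_id, direction in events:
        if direction == "in":
            present.add(badge_id)
        elif direction == "out":
            # discard() tolerates a missed "in" read for this badge.
            present.discard(badge_id)
        else:
            raise ValueError(f"unknown direction: {direction!r}")
        timeline.append((ts, len(present)))
    return timeline


# Timestamps here are plain minutes since the start of the shift.
events = [(0, "A", "in"), (5, "B", "in"), (30, "A", "out"), (60, "A", "in")]
timeline = presence_timeline(events)
```

The resulting counts could feed whatever rule decides when to send the email. The weak point is missed or tailgated badge reads, which this sketch only partly tolerates.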

