
Machine Learning Crash Course: Part 2 - llazzaro
https://ml.berkeley.edu/blog/2016/12/24/tutorial-2/
======
wiradikusuma
Now that there are a bunch of AI/ML-related links on the front page, now is
probably the best time to ask:

As I learn deep learning, from a practical point of view, I've found that the
idea is simply to feed some "black box" with labeled data so that next time it
can give you the correct label for unlabeled data. In essence, it's pattern
recognition. What do you think?

And then, as I try to find use cases for ML (you know, finding a problem for
the solution), I found that many problems that can be solved with ML can
actually be solved with rules. For example, detecting transaction fraud: you
just need to find the right rules/formula. Forget ML; if you can't hardcode
if-else, just use a rules engine. What do you think?

So, I'm starting to think that ML is good for solving problems where (1) we're
too lazy to formulate the rules, or (2) the data is too complex/big to analyze
by rules (as in, understanding image or voice). What do you think?

~~~
micro_cam
ML is about creating a process that produces models from data in a way that is
likely to generalize to new data.

For example, if you just start trying lots of rules by hand on your fraud data
set, there is a good chance you'll come up with a rule that looks good on your
data but doesn't generalize to new data.

The number of models (or rules or formulas) you try by hand increases this
chance (this is related to multiple testing), and worse, the process that
generates them isn't repeatable, since once you know a feature worked on the
data set you're biased towards finding it again.

So in ML you try to come up with a model-generating process that is entirely
automated and repeatable. This means you can run it repeatedly (in
cross-validation, over bootstraps, out of time, etc.) on the same data set and
be fairly sure that it will generalize.

The goal can still be a simple rule or formula, but you achieve this
simplicity by penalizing complexity (as in the lasso) or doing explicit
simplification (as in pruning).
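To illustrate the "penalize complexity" idea (this is only a caricature of what the lasso or pruning actually do, with made-up numbers): among candidate rule sets, pick the one maximizing fit minus a penalty proportional to the number of rules.

```python
def select_model(candidates, penalty=0.05):
    """candidates: list of (accuracy, n_rules) pairs for each rule set.

    Returns the candidate with the best complexity-penalized score.
    """
    return max(candidates, key=lambda c: c[0] - penalty * c[1])

# A 40-rule set with 95% accuracy loses to a 2-rule set with 90%:
# penalized scores are -1.05 vs. 0.80.
best = select_model([(0.90, 2), (0.93, 10), (0.95, 40)])
```
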

The reason that complex "black box" models are so popular is that they often
have really nice statistical properties in terms of generalization. It's
fairly intuitive that averaging over a bunch of slightly perturbed simple
models will give you a nice combined model, as in a random forest, GBM, or
other ensemble.

Deep neural networks are less intuitive but it's been hypothesized that the
depth makes them less prone to overfitting than a shallow network.

A shallow network will have one globally optimal set of weights that's simple
and really good, but it also has lots of locally optimal states that aren't as
good... and the chances are high that your training procedure (or manual
search for a simple model that works) will get caught in one of these.

It's been shown that for deeper networks these local optima tend to be much
closer in performance to the global optimum, so in effect making the model
more complex makes it less likely you'll end up with a bad model.

------
michaelsbradley
Here's _Part 1_ :
[https://ml.berkeley.edu/blog/2016/11/06/tutorial-1/](https://ml.berkeley.edu/blog/2016/11/06/tutorial-1/)

------
BrandoElFollito
A comment I made within this post: fitting a formula to data implies that
there are reasons for the formula to be this and not something else. For
example, one can prove, within the current model, that the distance fallen by
an object is d(t) = 1/2 g t^2, and it is reasonable to fit experimental data
to this formula to determine g. It may be a false assumption (it is, actually),
and this is the whole idea of modelling and understanding the limitations of a
model.

In contrast, the article assumes that a linear expression will be fitted to
any kind of data as if they behaved, by miracle, in a linear fashion. Any kind
of deduction from this will be false, except if, again by miracle, the data
actually behaves linearly.

I am a big fan of clustering and data-behaviour discovery - the process that
highlights relationships in data we do not know anything about. I believe this
is a huge win in ML. Fitting something (1D, 2D, ...) to data without a model
and drawing conclusions is, to say the least, perilous.
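To make the free-fall example concrete: once the model d(t) = 1/2 g t^2 is justified, fitting it is a one-parameter least-squares problem with a closed form. The data below is synthetic, purely for illustration:

```python
def estimate_g(ts, ds):
    """Least squares for d = a * t^2:  a = sum(d*t^2) / sum(t^4),  g = 2a."""
    a = sum(d * t * t for t, d in zip(ts, ds)) / sum(t ** 4 for t in ts)
    return 2 * a

# synthetic, noise-free drop measurements at g = 9.81 m/s^2
ts = [0.5, 1.0, 1.5, 2.0]
ds = [0.5 * 9.81 * t * t for t in ts]
```

The point of the comment stands: this recovers g only because the quadratic form was justified beforehand; running the same one-line fit on data with no reason to be quadratic would "work" just as happily and mean nothing.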

------
supremus_58
Does anyone have a recommendation for a text that is mathematically heavy but
also looks at modern approaches/applications?

~~~
davmre
You can order the standard machine learning texts from most to least math-y,
and least to most modern:

- Pattern Recognition and Machine Learning (Bishop, 2006)

- Machine Learning: A Probabilistic Perspective (Murphy, 2012)

- Deep Learning (Goodfellow, Bengio, and Courville, 2016)

If you want cutting-edge material, read the Deep Learning book (which is still
quite technical, though some of its content may be outdated in a few years).
If you want timeless mathematical foundations very clearly presented, read
Bishop. Murphy is a good middle ground.

If you're self-teaching and have trouble focusing on a textbook for long
periods, the Stanford CS229 lectures combined with Andrew Ng's course notes
and assignments are probably the best resource. They are still quite rigorous
and working through them will give you a solid foundation, after which you'll
be more prepared to understand the deeper content in any of the texts above.

~~~
wjn0
Where would you place "Elements of Statistical Learning" in relation to these,
if you know?

~~~
platz
You want ISLR
([http://www-bcf.usc.edu/~gareth/ISL/](http://www-bcf.usc.edu/~gareth/ISL/)),
not ESL. ESL is the prototype for the former.

~~~
wjn0
Interesting, thanks - I wasn't aware of this text.

But according to your link, ESL isn't a prototype of ISL, but a more 'advanced
treatment'. The R applications in ISL seem like they might be very useful,
though.

~~~
argonaut
ISLR was written after ESL to be an easier-to-read version. I guarantee you
will have trouble if you try to teach yourself ML with ESL.

------
Keyframe
I also have a question regarding ML. Are there resources where I can see how I
could treat video sequences (series of images, with spatial/temporal
continuity) as inputs? I'm trying to find a starting point for learning, and I
have a use case in mind.

~~~
lgessler
There's a really straightforward approach to turning video into feature
vectors which you can readily plug into any old ML algorithm. You can turn
every decoded frame (which is essentially a PNG, right?) into a
(Width*Height)x1 vector, where each cell is an RGB pixel (or a
(Width*Height*3)x1 vector if you split out the channels). You can then compose
these vectors into a matrix, or perform further operations on them, e.g. SVD.
Whether or not this is a good approach, though, will of course depend on your
application. For face detection, for example, this is not adequate, and you'll
need more sophisticated algorithms. For a good overview of that, see Jason
Lawrence's slides[1] and also take a look at some of the other stuff he
covered in his Computer Vision course[2].

Sorry if that's too basic for what you were asking. If you want to see some
messy code that does this using OpenCV, here's some I wrote a while back with
a friend, starting on line 127:
[https://github.com/sprestwood/CompVisionS2015/blob/master/te...](https://github.com/sprestwood/CompVisionS2015/blob/master/test_scripts/people_detector.py#L127)

[1]
[http://www.cs.virginia.edu/~gfx/Courses/2015/TopicsVision/le...](http://www.cs.virginia.edu/~gfx/Courses/2015/TopicsVision/lectures/lecture07_recognition.pdf)

[2]
[http://www.cs.virginia.edu/~gfx/Courses/2015/TopicsVision/sc...](http://www.cs.virginia.edu/~gfx/Courses/2015/TopicsVision/schedule.html)
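The flattening step itself is tiny; here's a sketch with nested lists standing in for a decoded frame (in practice you'd use something like numpy's reshape on the decoded image array, but the idea is the same):

```python
def flatten_frame(frame):
    """Turn an H x W frame of (R, G, B) pixels into one flat vector of
    H*W*3 channel values, row by row."""
    return [channel for row in frame for pixel in row for channel in pixel]

# a tiny 2x2 "frame": each pixel is an (R, G, B) triple
frame = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
```
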

~~~
Keyframe
Thanks for the pointers! I'm basically below/at 101 with ML, but have a
background in computer graphics.

I was wondering if one could utilise ML for either or both of two things:
spatial/temporal object outlining (feathering can solve for motion blur) and
better chroma keying.

~~~
lgessler
For outlining, my initial reaction is to reach for two things:

1. Kalman filters to track an object in motion within a frame [1]

2. Edge detection on the subframe you got from (1) [2]

Both appear to be available out of the box in OpenCV[3][4], though you'll have
to fiddle with parameters I'm sure.

[1] Example:
[https://www.youtube.com/watch?v=K14SK4v3-IY](https://www.youtube.com/watch?v=K14SK4v3-IY)

[2]
[https://en.wikipedia.org/wiki/Canny_edge_detector](https://en.wikipedia.org/wiki/Canny_edge_detector)

[3]
[http://docs.opencv.org/trunk/dd/d6a/classcv_1_1KalmanFilter....](http://docs.opencv.org/trunk/dd/d6a/classcv_1_1KalmanFilter.html)

[4]
[http://docs.opencv.org/2.4/doc/tutorials/imgproc/imgtrans/ca...](http://docs.opencv.org/2.4/doc/tutorials/imgproc/imgtrans/canny_detector/canny_detector.html)
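To give a feel for what the Kalman filter in (1) is doing, here's a minimal 1-D constant-velocity version in plain Python. This is only a sketch of the idea with hand-rolled 2x2 matrix math and made-up noise parameters; OpenCV's KalmanFilter handles the general multi-dimensional case for you:

```python
def kalman_track(measurements, q=1e-4, r=0.1):
    """Track a 1-D position with a constant-velocity Kalman filter.

    State is [position, velocity]; q is process noise, r is measurement
    noise. Returns the final state estimate.
    """
    x = [measurements[0], 0.0]          # initial state guess
    P = [[1.0, 0.0], [0.0, 1.0]]        # state covariance
    for z in measurements[1:]:
        # predict: x <- F x with F = [[1, 1], [0, 1]],  P <- F P F^T + Q
        x = [x[0] + x[1], x[1]]
        P = [[P[0][0] + P[0][1] + P[1][0] + P[1][1] + q, P[0][1] + P[1][1]],
             [P[1][0] + P[1][1], P[1][1] + q]]
        # update with position measurement z (H = [1, 0])
        S = P[0][0] + r                 # innovation covariance
        K = [P[0][0] / S, P[1][0] / S]  # Kalman gain
        y = z - x[0]                    # innovation (residual)
        x = [x[0] + K[0] * y, x[1] + K[1] * y]
        P = [[(1 - K[0]) * P[0][0], (1 - K[0]) * P[0][1]],
             [P[1][0] - K[1] * P[0][0], P[1][1] - K[1] * P[0][1]]]
    return x
```

Fed a point moving one unit per frame, the estimated position and velocity converge toward the truth; for 2-D tracking in OpenCV you'd use a 4-D state (x, y, dx, dy) with the same structure.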

