As I learn deep learning from a practical point of view, I've found that the idea is simply to feed some "black box" labeled data so that, next time, it can give you the correct label for unlabeled data. In essence, it's pattern recognition. What do you think?
And then, as I try to find use cases for ML (you know, finding a problem for the solution), I've found that many problems that can be solved with ML can actually be solved with rules. For example, detecting transaction fraud: you just need to find the right rules/formula. Forget ML; if you can't hardcode the if-else logic, just use a rules engine. What do you think?
So I'm starting to think that ML is good for solving problems where (1) we're too lazy to formulate the rules, or (2) the data is too complex or too big to analyze with rules (as with understanding images or speech). What do you think?
For example, if you just start trying lots of rules by hand on your fraud data set, there is a good chance you'll come up with a rule that looks good on your data but doesn't generalize to new data.
The number of models (or rules, or formulas) you try by hand increases this chance (this is related to multiple testing), and worse, the process that generates them isn't repeatable, since once you know a feature worked on the data set you're biased towards finding it again.
So in ML you try to come up with a model-generating process that is entirely automated and repeatable. This means you can run it repeatedly (in cross-validation, over bootstraps, on out-of-time splits, etc.) on the same data set and be fairly sure that the result will generalize.
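Concretely, that automated-and-repeatable loop might look like this toy sketch (plain NumPy; the 1D "transaction" data and single-threshold rule class are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=200)                                   # toy transaction feature
y = (X + rng.normal(scale=0.5, size=200) > 0).astype(int)  # noisy "fraud" label

def fit_rule(x_train, y_train):
    # Automated rule generation: pick the threshold with the best
    # training accuracy from a fixed candidate grid.
    candidates = np.linspace(x_train.min(), x_train.max(), 50)
    accs = [np.mean((x_train > t) == y_train) for t in candidates]
    return candidates[int(np.argmax(accs))]

# 5-fold cross-validation: rerun the *whole* rule-fitting process per fold,
# scoring each fitted rule only on data it never saw.
folds = np.array_split(np.arange(200), 5)
scores = []
for i in range(5):
    train = np.concatenate([f for j, f in enumerate(folds) if j != i])
    t = fit_rule(X[train], y[train])
    scores.append(np.mean((X[folds[i]] > t) == y[folds[i]]))
print(round(float(np.mean(scores)), 3))  # honest estimate of generalization
```

The key point is that `fit_rule` is a procedure, not a hand-picked rule, so the cross-validated score estimates how well the *process* generalizes.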
The goal can still be a simple rule or formula, but you achieve that simplicity by penalizing complexity (as in the lasso) or by explicit simplification (as in pruning).
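For instance, here's a minimal sketch of how an L1 penalty (lasso-style, implemented with plain-NumPy proximal gradient steps rather than any particular library) snaps irrelevant coefficients exactly to zero, leaving a simple formula:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
true_w = np.zeros(10)
true_w[:2] = [3.0, -2.0]                       # only 2 of 10 features matter
y = X @ true_w + rng.normal(scale=0.1, size=100)

# Lasso via proximal gradient descent (ISTA): a gradient step on the squared
# error, then a soft-threshold, which is what the L1 penalty turns into.
lam, lr = 0.5, 0.1
w = np.zeros(10)
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y)
    w -= lr * grad
    w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
print(int(np.sum(np.abs(w) > 1e-3)))  # count of surviving coefficients
```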
The reason complex "black box" models are so popular is that they often have really nice statistical properties in terms of generalization. It's fairly intuitive that averaging over a bunch of slightly perturbed simple models will give you a nice combined model, as in a random forest, GBM, or other ensemble.
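A toy illustration of that averaging effect, using decision stumps as the perturbed simple models (plain NumPy; the data and the stump rule are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    x = rng.normal(size=(n, 5))
    y = (x.sum(axis=1) + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return x, y

def fit_stump(x, y):
    # A deliberately weak base model: threshold one randomly chosen feature.
    j = int(rng.integers(5))
    ts = np.linspace(-1.5, 1.5, 31)
    accs = [np.mean((x[:, j] > t) == y) for t in ts]
    return j, float(ts[int(np.argmax(accs))])

X_tr, y_tr = make_data(400)
X_te, y_te = make_data(4000)

# Baseline: one simple model on its own.
j, t = fit_stump(X_tr, y_tr)
single_acc = np.mean((X_te[:, j] > t) == y_te)

# Bagging: refit the stump on bootstrap resamples and average the votes,
# roughly as a random forest does with trees.
B = 200
votes = np.zeros(len(y_te))
for _ in range(B):
    idx = rng.integers(0, len(y_tr), size=len(y_tr))
    j, t = fit_stump(X_tr[idx], y_tr[idx])
    votes += (X_te[:, j] > t)
ensemble_acc = np.mean((votes / B > 0.5) == y_te)
print(round(float(single_acc), 3), round(float(ensemble_acc), 3))
```

On this toy problem the averaged ensemble comfortably beats any single stump, even though each component model is trivially simple.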
Deep neural networks are less intuitive, but it's been hypothesized that their depth makes them less prone to overfitting than a shallow network.
A shallow network will have one globally optimal set of weights that's simple and really good, but it also has lots of locally optimal states that aren't as good, and the chances are high that your training procedure (or manual search for a simple model that works) will get caught in one of these.
It's been shown that for deeper networks these local optima tend to be much closer in performance to the global optimum, so in effect making the model more complex makes it less likely you'll end up with a bad model.
In the article ("A Plan for Spam", Paul Graham) he outlines the challenges of taking a rules-based approach:
The statistical approach is not usually the first one people try when they write spam filters. Most hackers' first instinct is to try to write software that recognizes individual properties of spam. You look at spams and you think, the gall of these guys to try sending me mail that begins "Dear Friend" or has a subject line that's all uppercase and ends in eight exclamation points. I can filter out that stuff with about one line of code.
And so you do, and in the beginning it works. A few simple rules will take a big bite out of your incoming spam. Merely looking for the word "click" will catch 79.7% of the emails in my spam corpus, with only 1.2% false positives.
I spent about six months writing software that looked for individual spam features before I tried the statistical approach. What I found was that recognizing that last few percent of spams got very hard, and that as I made the filters stricter I got more false positives.
False positives are innocent emails that get mistakenly identified as spams. For most users, missing legitimate email is an order of magnitude worse than receiving spam, so a filter that yields false positives is like an acne cure that carries a risk of death to the patient.
The more spam a user gets, the less likely he'll be to notice one innocent mail sitting in his spam folder. And strangely enough, the better your spam filters get, the more dangerous false positives become, because when the filters are really good, users will be more likely to ignore everything they catch.
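The statistical approach Graham landed on boils down to comparing word frequencies in spam vs. ham. A toy naive-Bayes sketch of the idea (the three-message corpora are invented for the example; his actual filter used much larger corpora and a different probability-combining scheme):

```python
from collections import Counter

# Tiny made-up corpora; a real filter trains on thousands of messages.
spam = ["click here to win money", "win money now click", "free money click here"]
ham = ["meeting notes attached", "lunch tomorrow", "project notes for the meeting"]

def word_counts(docs):
    counts = Counter(w for d in docs for w in d.split())
    return counts, sum(counts.values())

s_counts, s_total = word_counts(spam)
h_counts, h_total = word_counts(ham)

def spam_score(text):
    # Naive Bayes with add-one smoothing: multiply per-word likelihood ratios,
    # then squash the result to a probability-like value in (0, 1).
    score = 1.0
    for w in text.split():
        p_spam = (s_counts[w] + 1) / (s_total + 2)
        p_ham = (h_counts[w] + 1) / (h_total + 2)
        score *= p_spam / p_ham
    return score / (score + 1)

print(spam_score("click to win free money") > 0.5)  # flagged as spam
print(spam_score("notes for the meeting") > 0.5)    # passes through
```

No individual rule had to be written; the evidence for each word is learned from the corpora, which is exactly what made the last few percent tractable.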
For example, a human can think "oh, I will look at numbers like P/E and growth and...", whereas with ML you can feed it all of those, plus things like the number of times the CEO has tweeted in the past week, the number of PR articles, or even the language in the press releases, to see if there is some strong correlative signal in the bunch, or if it is all just noise.
But in some fields, like computer vision, we humans have failed (at least so far) to make rules better than black-box neural networks.
Also, there is another group of ML people who deal with rules+data. I quite agree with this article about the two types of ML: http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726...
In most cases we don't even know _how_ to formulate the rules. For example, if someone asks you what makes an 'a' an 'a', what would your response be?
Load the data into X, call model.fit(X, y), then predictions = model.predict(new_X). Applying ML is not more complicated than that.
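For example, here is that workflow end to end, with a toy nearest-centroid classifier standing in for the model so it runs without any ML library (the fit/predict method names mirror scikit-learn's convention):

```python
import numpy as np

class NearestCentroid:
    """Minimal classifier exposing the fit/predict interface."""

    def fit(self, X, y):
        # Store one mean vector (centroid) per class.
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Assign each point to the class of its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y = np.array([0, 0, 1, 1])

model = NearestCentroid().fit(X, y)
new_X = np.array([[0.5, 0.5], [5.5, 5.0]])
print(model.predict(new_X))  # → [0 1]
```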
In contrast, the article assumes that a linear expression will be fitted to any kind of data, as if it behaved, by some miracle, in a linear fashion. Any deduction from this will be false, unless, again by miracle, the data actually does behave linearly.
I am a big fan of clustering and data-behaviour discovery, the process that highlights relationships in data we know nothing about. I believe this is a huge win for ML.
Fitting something (1D, 2D, ...) to data without a model and drawing conclusions from the fit is perilous at best.
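A quick illustration of the peril: force a linear fit onto perfectly deterministic quadratic data and the fit reports essentially no relationship at all (NumPy, made-up data):

```python
import numpy as np

# Quadratic data with no linear trend whatsoever.
x = np.linspace(-3, 3, 61)
y = x ** 2

slope, intercept = np.polyfit(x, y, 1)  # force a linear model anyway
resid = y - (slope * x + intercept)
r2 = 1 - resid.var() / y.var()          # R^2 of the linear fit

print(round(float(slope), 6), round(float(r2), 3))
```

The slope and R^2 both come out at roughly zero, so a model-free reading would conclude the variables are unrelated, when in fact y is a deterministic function of x.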
- Pattern Recognition and Machine Learning (Bishop 2007)
- Machine Learning: A Probabilistic Perspective (Murphy 2012)
- Deep Learning (Goodfellow, Bengio, Courville 2016)
If you want cutting-edge material, read the Deep Learning book (which is still quite technical, though some of its content may be outdated in a few years). If you want timeless mathematical foundations very clearly presented, read Bishop. Murphy is a good middle ground.
If you're self-teaching and have trouble focusing on a textbook for long periods, the Stanford CS229 lectures combined with Andrew Ng's course notes and assignments are probably the best resource. They are still quite rigorous and working through them will give you a solid foundation, after which you'll be more prepared to understand the deeper content in any of the texts above.
- The Elements of Statistical Learning, Hastie et al.
- Understanding Machine Learning, Shalev-Shwartz and Ben-David: http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning...
- Information Theory, Inference, and Learning Algorithms, the late David MacKay
- Bayesian Reasoning and Machine Learning, Barber
- Foundations of Data Science, Hopcroft and Kannan (this is an older version; you can google the latest): http://www.cs.cornell.edu/jeh/book112013.pdf
I took the Stanford ML Class in 2011 taught by Andrew Ng; ultimately, Coursera was born from it, and you can still find that class in their offerings:
On a similar note, Udacity sprang up from the AI Class that ran at the same time (taught by Peter Norvig and Sebastian Thrun); Udacity has since added the class to their lineup (though at the time they had trouble doing this, which is what spawned the CS373 course):
I took the CS373 course later in 2012 (I had started the AI Class, but had to drop out due to personal issues at the time).
I am currently taking Udacity's "Self-Driving Car Engineer" nanodegree program.
But it all started with the ML Class. Prior to that, I had played around with things on my own, but nothing really made a whole lot of sense to me, because I lacked some of the basic insights, which the ML Class gave me.
Primarily, these are key (and if you don't have an idea about them, you should study them first):
1. Machine learning uses a lot of tools based on and around probability and statistics.
2. Machine learning uses a good amount of linear algebra.
3. Neural networks use a lot of matrix math (which is why they can be fast and scale, especially on GPUs and other many-core systems).
4. If you want to go beyond the "black box" aspect of machine learning, brush up on your calculus (mainly derivatives).
That last one is what I am currently struggling with and working through; while the course I am taking isn't stressing this part, I want to know more about what is going on "under the hood", so to speak. Right now we are neck-deep in learning TensorFlow (with Python); TensorFlow makes it pretty simple to create neural networks, but understanding how forward and back-propagation work (in the ML Class we had to implement them ourselves in Octave, without a library) has been extremely helpful.
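For anyone curious what that Octave exercise boils down to, here's a rough NumPy sketch of the forward and backward passes for a one-hidden-layer network learning XOR (my own toy setup, not the course's code; the output-layer delta assumes a cross-entropy loss):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)   # input -> hidden
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)   # hidden -> output

lr = 0.5
for _ in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass (chain rule, layer by layer). With cross-entropy loss
    # the output delta simplifies to (out - y).
    d_out = out - y
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel()))
```

Each backward line is just the derivative of the corresponding forward line; once you've written this out by hand, what TensorFlow automates becomes much less mysterious.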
Did I find the ML Class difficult? Yeah, I did. I hadn't touched linear algebra in 20+ years when I took the course, and I certainly had no background in probability (so, Khan Academy and the like to the rescue). Even now, while things are a bit easier, I still find certain tasks challenging in this nanodegree course. But then, if you aren't challenged, you aren't learning.
As a researcher I tend to prefer the Bayesian perspective in Bishop, because it gives you a unifying framework for thinking about building your own models and learning algorithms. But lots of people seem to respect ESL and speak very highly of it. It's probably most valuable if you are implementing one of the methods it covers and want to understand that specific method in great depth.
But according to your link, ESL isn't a prototype of ISL, but a more 'advanced treatment'. The R applications in ISL seem like they might be very useful, though.
Sorry if that's too basic for what you were asking. If you want to see some messy code that does this using OpenCV, here's some I wrote a while back with a friend, starting on line 127: https://github.com/sprestwood/CompVisionS2015/blob/master/te...
I was wondering if one could use ML for either or both of two things: object outlining in space and time (feathering can compensate for motion blur) and better chroma keying.
1. Kalman filters to track an object in motion within a frame 
2. Edge detection on the subframe you got from (1) 
Both appear to be available out of the box in OpenCV, though I'm sure you'll have to fiddle with the parameters.
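To give a feel for what step (1) is doing under the hood, here's a toy 1D constant-velocity Kalman tracker in plain NumPy (OpenCV's cv2.KalmanFilter wraps the same predict/update recursion, and you'd run cv2.Canny on the tracked subframe for step (2); all the noise settings here are made up):

```python
import numpy as np

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])  # state transition: [position, velocity]
H = np.array([[1.0, 0.0]])             # we only measure position
Q = 0.01 * np.eye(2)                   # process noise covariance
R = np.array([[4.0]])                  # measurement noise covariance

rng = np.random.default_rng(0)
true_pos = np.arange(50, dtype=float) * 2.0       # object moving 2 px/frame
meas = true_pos + rng.normal(scale=2.0, size=50)  # noisy per-frame detections

x = np.array([[0.0], [0.0]])  # state estimate
P = 10.0 * np.eye(2)          # estimate covariance
est = []
for z in meas:
    # Predict step
    x = F @ x
    P = F @ P @ F.T + Q
    # Update step
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
    x = x + K @ (np.array([[z]]) - H @ x)
    P = (np.eye(2) - K @ H) @ P
    est.append(float(x[0, 0]))

# Compare tracking error after a short burn-in: filtered vs raw detections.
err_raw = float(np.abs(meas[10:] - true_pos[10:]).mean())
err_kf = float(np.abs(np.array(est[10:]) - true_pos[10:]).mean())
print(round(err_raw, 2), round(err_kf, 2))
```

The filtered positions track the object noticeably more tightly than the raw detections, which is why a Kalman stage before edge detection gives you a stable subframe to work on.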
Example: https://www.youtube.com/watch?v=K14SK4v3-IY
https://www.pyimagesearch.com (free tutorials in the blog)
https://www.pyimagesearch.com/pyimagesearch-gurus/ (paid for "guru" course)