As I learn deep learning from a practical point of view, I've found that the idea is simply to feed some "black box" labeled data so that, next time, it can give you the correct label for unlabeled data. In essence, it's pattern recognition. What do you think?
And then, as I try to find use cases for ML (you know, finding a problem for the solution), I've found that many problems that can be solved with ML can actually be solved with rules. For example, detecting transaction fraud: you just need to find the right rules/formula. Forget ML; if you can't hardcode the if-else logic, just use a rules engine. What do you think?
So I'm starting to think that ML is good for solving problems where (1) we're too lazy to formulate the rules, or (2) the data is too complex or too big to analyze with rules (as with understanding images or speech). What do you think?
For example, if you just start trying lots of rules by hand on your fraud data set, there is a good chance you'll come up with a rule that looks good on your data but doesn't generalize to new data.
The number of models (or rules, or formulas) you try by hand increases this chance (this is related to multiple testing), and worse, the process that generates them isn't repeatable, since once you know a feature worked on the data set you're biased towards finding it again.
So in ML you try to come up with a model-generating process that is entirely automated and repeatable. This means you can run it repeatedly (in cross-validation, over bootstraps, on out-of-time splits, etc.) on the same data set and be fairly sure that the result will generalize.
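Concretely, that automated-and-repeatable loop might look like this toy sketch (plain NumPy; the 1D "transaction" data and single-threshold rule class are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=200)                                   # toy transaction feature
y = (X + rng.normal(scale=0.5, size=200) > 0).astype(int)  # noisy "fraud" label

def fit_rule(x_train, y_train):
    # Automated rule generation: pick the threshold with the best
    # training accuracy from a fixed candidate grid.
    candidates = np.linspace(x_train.min(), x_train.max(), 50)
    accs = [np.mean((x_train > t) == y_train) for t in candidates]
    return candidates[int(np.argmax(accs))]

# 5-fold cross-validation: rerun the *whole* rule-fitting process per fold,
# scoring each fitted rule only on data it never saw.
folds = np.array_split(np.arange(200), 5)
scores = []
for i in range(5):
    train = np.concatenate([f for j, f in enumerate(folds) if j != i])
    t = fit_rule(X[train], y[train])
    scores.append(np.mean((X[folds[i]] > t) == y[folds[i]]))
print(round(float(np.mean(scores)), 3))  # honest estimate of generalization
```

The key point is that `fit_rule` is a procedure, not a hand-picked rule, so the cross-validated score estimates how well the *process* generalizes.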
The goal can still be a simple rule or formula, but you achieve that simplicity by penalizing complexity (as in the lasso) or by explicit simplification (as in pruning).
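For instance, here's a minimal sketch of how an L1 penalty (lasso-style, implemented with plain-NumPy proximal gradient steps rather than any particular library) snaps irrelevant coefficients exactly to zero, leaving a simple formula:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
true_w = np.zeros(10)
true_w[:2] = [3.0, -2.0]                       # only 2 of 10 features matter
y = X @ true_w + rng.normal(scale=0.1, size=100)

# Lasso via proximal gradient descent (ISTA): a gradient step on the squared
# error, then a soft-threshold, which is what the L1 penalty turns into.
lam, lr = 0.5, 0.1
w = np.zeros(10)
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y)
    w -= lr * grad
    w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
print(int(np.sum(np.abs(w) > 1e-3)))  # count of surviving coefficients
```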
The reason complex "black box" models are so popular is that they often have really nice statistical properties in terms of generalization. It's fairly intuitive that averaging over a bunch of slightly perturbed simple models will give you a nice combined model, as in a random forest, GBM, or other ensemble.
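A toy illustration of that averaging effect, using decision stumps as the perturbed simple models (plain NumPy; the data and the stump rule are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    x = rng.normal(size=(n, 5))
    y = (x.sum(axis=1) + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return x, y

def fit_stump(x, y):
    # A deliberately weak base model: threshold one randomly chosen feature.
    j = int(rng.integers(5))
    ts = np.linspace(-1.5, 1.5, 31)
    accs = [np.mean((x[:, j] > t) == y) for t in ts]
    return j, float(ts[int(np.argmax(accs))])

X_tr, y_tr = make_data(400)
X_te, y_te = make_data(4000)

# Baseline: one simple model on its own.
j, t = fit_stump(X_tr, y_tr)
single_acc = np.mean((X_te[:, j] > t) == y_te)

# Bagging: refit the stump on bootstrap resamples and average the votes,
# roughly as a random forest does with trees.
B = 200
votes = np.zeros(len(y_te))
for _ in range(B):
    idx = rng.integers(0, len(y_tr), size=len(y_tr))
    j, t = fit_stump(X_tr[idx], y_tr[idx])
    votes += (X_te[:, j] > t)
ensemble_acc = np.mean((votes / B > 0.5) == y_te)
print(round(float(single_acc), 3), round(float(ensemble_acc), 3))
```

On this toy problem the averaged ensemble comfortably beats any single stump, even though each component model is trivially simple.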
Deep neural networks are less intuitive, but it's been hypothesized that their depth makes them less prone to overfitting than a shallow network.
A shallow network will have one globally optimal set of weights that's simple and really good, but it also has lots of locally optimal states that aren't as good, and the chances are high that your training procedure (or manual search for a simple model that works) will get caught in one of these.
It's been shown that for deeper networks these local optima tend to be much closer in performance to the global optimum, so in effect making the model more complex makes it less likely you'll end up with a bad model.
In the article ("A Plan for Spam", Paul Graham) he outlines the challenges of taking a rules-based approach:
The statistical approach is not usually the first one people try when they write spam filters. Most hackers' first instinct is to try to write software that recognizes individual properties of spam. You look at spams and you think, the gall of these guys to try sending me mail that begins "Dear Friend" or has a subject line that's all uppercase and ends in eight exclamation points. I can filter out that stuff with about one line of code.
And so you do, and in the beginning it works. A few simple rules will take a big bite out of your incoming spam. Merely looking for the word "click" will catch 79.7% of the emails in my spam corpus, with only 1.2% false positives.
I spent about six months writing software that looked for individual spam features before I tried the statistical approach. What I found was that recognizing that last few percent of spams got very hard, and that as I made the filters stricter I got more false positives.
False positives are innocent emails that get mistakenly identified as spams. For most users, missing legitimate email is an order of magnitude worse than receiving spam, so a filter that yields false positives is like an acne cure that carries a risk of death to the patient.
The more spam a user gets, the less likely he'll be to notice one innocent mail sitting in his spam folder. And strangely enough, the better your spam filters get, the more dangerous false positives become, because when the filters are really good, users will be more likely to ignore everything they catch.
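The statistical approach Graham landed on boils down to comparing word frequencies in spam vs. ham. A toy naive-Bayes sketch of the idea (the three-message corpora are invented for the example; his actual filter used much larger corpora and a different probability-combining scheme):

```python
from collections import Counter

# Tiny made-up corpora; a real filter trains on thousands of messages.
spam = ["click here to win money", "win money now click", "free money click here"]
ham = ["meeting notes attached", "lunch tomorrow", "project notes for the meeting"]

def word_counts(docs):
    counts = Counter(w for d in docs for w in d.split())
    return counts, sum(counts.values())

s_counts, s_total = word_counts(spam)
h_counts, h_total = word_counts(ham)

def spam_score(text):
    # Naive Bayes with add-one smoothing: multiply per-word likelihood ratios,
    # then squash the result to a probability-like value in (0, 1).
    score = 1.0
    for w in text.split():
        p_spam = (s_counts[w] + 1) / (s_total + 2)
        p_ham = (h_counts[w] + 1) / (h_total + 2)
        score *= p_spam / p_ham
    return score / (score + 1)

print(spam_score("click to win free money") > 0.5)  # flagged as spam
print(spam_score("notes for the meeting") > 0.5)    # passes through
```

No individual rule had to be written; the evidence for each word is learned from the corpora, which is exactly what made the last few percent tractable.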
For example, a human can think "oh, I will look at numbers like P/E and growth and...", whereas with ML you can feed it all of those, plus things like the number of times the CEO has tweeted in the past week, the number of PR articles, or even the language in the press releases, to see if there is some strong correlative signal in the bunch, or if it is all just noise.
But in some fields, like computer vision, we humans have failed (at least so far) to make rules better than black-box neural networks.
Also, there is another group of ML people who deal with rules+data. I quite agree with this article about the two types of ML: http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726...
In most cases we don't even know _how_ to formulate the rules. For example, if someone asks you what makes an 'a' an 'a', what would your response be?
Load the data into X, call model.fit(X, y), then predictions = model.predict(new_X). Applying ML is not more complicated than that.
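For example, here is that workflow end to end, with a toy nearest-centroid classifier standing in for the model so it runs without any ML library (the fit/predict method names mirror scikit-learn's convention):

```python
import numpy as np

class NearestCentroid:
    """Minimal classifier exposing the fit/predict interface."""

    def fit(self, X, y):
        # Store one mean vector (centroid) per class.
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Assign each point to the class of its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y = np.array([0, 0, 1, 1])

model = NearestCentroid().fit(X, y)
new_X = np.array([[0.5, 0.5], [5.5, 5.0]])
print(model.predict(new_X))  # → [0 1]
```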
In contrast, the article assumes that a linear expression will be fitted to any kind of data, as if it behaved, by some miracle, in a linear fashion. Any deduction from this will be false, unless, again by miracle, the data actually does behave linearly.
I am a big fan of clustering and data-behaviour discovery, the process that highlights relationships in data we know nothing about. I believe this is a huge win for ML.
Fitting something (1D, 2D, ...) to data without a model and drawing conclusions from the fit is perilous at best.
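A quick illustration of the peril: force a linear fit onto perfectly deterministic quadratic data and the fit reports essentially no relationship at all (NumPy, made-up data):

```python
import numpy as np

# Quadratic data with no linear trend whatsoever.
x = np.linspace(-3, 3, 61)
y = x ** 2

slope, intercept = np.polyfit(x, y, 1)  # force a linear model anyway
resid = y - (slope * x + intercept)
r2 = 1 - resid.var() / y.var()          # R^2 of the linear fit

print(round(float(slope), 6), round(float(r2), 3))
```

The slope and R^2 both come out at roughly zero, so a model-free reading would conclude the variables are unrelated, when in fact y is a deterministic function of x.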
- Pattern Recognition and Machine Learning (Bishop 2007)
- Machine Learning: A Probabilistic Perspective (Murphy 2012)
- Deep Learning (Goodfellow, Bengio, Courville 2016)
If you want cutting-edge material, read the Deep Learning book (which is still quite technical, though some of its content may be outdated in a few years). If you want timeless mathematical foundations very clearly presented, read Bishop. Murphy is a good middle ground.
If you're self-teaching and have trouble focusing on a textbook for long periods, the Stanford CS229 lectures combined with Andrew Ng's course notes and assignments are probably the best resource. They are still quite rigorous and working through them will give you a solid foundation, after which you'll be more prepared to understand the deeper content in any of the texts above.
- The Elements of Statistical Learning, Hastie et al.
- Understanding Machine Learning, Shalev-Shwartz and Ben-David: http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning...
- Information Theory, Inference, and Learning Algorithms, the late David MacKay
- Bayesian Reasoning and Machine Learning, Barber
- Foundations of Data Science, Hopcroft and Kannan (this is an older version; you can google the latest): http://www.cs.cornell.edu/jeh/book112013.pdf
I took the Stanford ML Class in 2011 taught by Andrew Ng; ultimately, Coursera was born from it, and you can still find that class in their offerings:
On a similar note, Udacity sprang up from the AI Class that ran at the same time (taught by Peter Norvig and Sebastian Thrun); Udacity has since added the class to their lineup (though at the time they had trouble doing this, which is what spawned the CS373 course):
I took the CS373 course later in 2012 (I had started the AI Class, but had to drop out due to personal issues at the time).
I am currently taking Udacity's "Self-Driving Car Engineer" nanodegree program.
But it all started with the ML Class. Prior to that, I had played around with things on my own, but nothing really made a whole lot of sense to me, because I lacked some of the basic insights, which the ML Class gave me.
Primarily, these are key (and if you don't have an idea about them, you should study them first):
1. Machine learning uses a lot of tools based on and around probability and statistics.
2. Machine learning uses a good amount of linear algebra.
3. Neural networks use a lot of matrix math (which is why they can be fast and scale, especially on GPUs and other many-core systems).
4. If you want to go beyond the "black box" aspect of machine learning, brush up on your calculus (mainly derivatives).
That last one is what I am currently struggling with and working through; while the course I am taking isn't stressing this part, I want to know more about what is going on "under the hood", so to speak. Right now we are neck-deep in learning TensorFlow (with Python); TensorFlow makes it pretty simple to create neural networks, but understanding how forward and back-propagation work (in the ML Class we had to implement them ourselves in Octave, without a library) has been extremely helpful.
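For anyone curious what that Octave exercise boils down to, here's a rough NumPy sketch of the forward and backward passes for a one-hidden-layer network learning XOR (my own toy setup, not the course's code; the output-layer delta assumes a cross-entropy loss):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)   # input -> hidden
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)   # hidden -> output

lr = 0.5
for _ in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass (chain rule, layer by layer). With cross-entropy loss
    # the output delta simplifies to (out - y).
    d_out = out - y
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel()))
```

Each backward line is just the derivative of the corresponding forward line; once you've written this out by hand, what TensorFlow automates becomes much less mysterious.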
Did I find the ML Class difficult? Yeah, I did. I hadn't touched linear algebra in 20+ years when I took the course, and I certainly had no background in probability (so, Khan Academy and the like to the rescue). Even now, while things are a bit easier, I still find certain tasks challenging in this nanodegree course. But then, if you aren't challenged, you aren't learning.
As a researcher I tend to prefer the Bayesian perspective in Bishop, because it gives you a unifying framework for thinking about building your own models and learning algorithms. But lots of people seem to respect ESL and speak very highly of it. It's probably most valuable if you are implementing one of the methods it covers and want to understand that specific method in great depth.
But according to your link, ESL isn't a prototype of ISL, but a more 'advanced treatment'. The R applications in ISL seem like they might be very useful, though.
Sorry if that's too basic for what you were asking. If you want to see some messy code that does this using OpenCV, here's some I wrote a while back with a friend, starting on line 127: https://github.com/sprestwood/CompVisionS2015/blob/master/te...
I was wondering if one could use ML for either or both of two things: object outlining in space and time (feathering can compensate for motion blur) and better chroma keying.
1. Kalman filters to track an object in motion within a frame 
2. Edge detection on the subframe you got from (1) 
Both appear to be available out of the box in OpenCV, though I'm sure you'll have to fiddle with the parameters.
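To give a feel for what step (1) is doing under the hood, here's a toy 1D constant-velocity Kalman tracker in plain NumPy (OpenCV's cv2.KalmanFilter wraps the same predict/update recursion, and you'd run cv2.Canny on the tracked subframe for step (2); all the noise settings here are made up):

```python
import numpy as np

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])  # state transition: [position, velocity]
H = np.array([[1.0, 0.0]])             # we only measure position
Q = 0.01 * np.eye(2)                   # process noise covariance
R = np.array([[4.0]])                  # measurement noise covariance

rng = np.random.default_rng(0)
true_pos = np.arange(50, dtype=float) * 2.0       # object moving 2 px/frame
meas = true_pos + rng.normal(scale=2.0, size=50)  # noisy per-frame detections

x = np.array([[0.0], [0.0]])  # state estimate
P = 10.0 * np.eye(2)          # estimate covariance
est = []
for z in meas:
    # Predict step
    x = F @ x
    P = F @ P @ F.T + Q
    # Update step
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
    x = x + K @ (np.array([[z]]) - H @ x)
    P = (np.eye(2) - K @ H) @ P
    est.append(float(x[0, 0]))

# Compare tracking error after a short burn-in: filtered vs raw detections.
err_raw = float(np.abs(meas[10:] - true_pos[10:]).mean())
err_kf = float(np.abs(np.array(est[10:]) - true_pos[10:]).mean())
print(round(err_raw, 2), round(err_kf, 2))
```

The filtered positions track the object noticeably more tightly than the raw detections, which is why a Kalman stage before edge detection gives you a stable subframe to work on.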
Example: https://www.youtube.com/watch?v=K14SK4v3-IY
https://www.pyimagesearch.com (free tutorials in the blog)
https://www.pyimagesearch.com/pyimagesearch-gurus/ (paid for "guru" course)