
Basics of Machine Learning - luu
http://homepages.inf.ed.ac.uk/vlavrenk/iaml.html
======
3pt14159
If you want to get into machine learning, it is actually pretty easy, provided
you studied Math, Engineering, or Science in University. The papers are hard
at first, but they don't hide things the way papers in other fields do. Everything is laid out
there for you to code and run with your own data. In a field like, say,
Structural Engineering, the paper writers can make claims about the structural
resilience of an Ultra High Performance Concrete that they tested, without you
_ever_ being able to hope to repeat the exact experiment. You may not even be
able to get your hands on the proprietary compound they used.

In ML, you might not be able to use the same corpus / training set, but you're
usually able to recode the actual algorithm and you're usually able to find a
compatible type of data set to work off of.

Also, most ML people are lazy, so if they don't work for Google or Facebook
they're usually using open data datasets anyway, which are trivial to drop in
and verify.

~~~
visarga
> most ML people are lazy

That's a great virtue for programmers in general.

"Laziness: The quality that makes you go to great effort to reduce overall
energy expenditure. It makes you write labor-saving programs that other people
will find useful, and document what you wrote so you don't have to answer so
many questions about it. Hence, the first great virtue of a programmer. Also
hence, this book. See also impatience and hubris. (LarryWall, ProgrammingPerl,
1st edition)"

> they're usually using open data datasets anyway

They do that in order to be able to judge the performance of various
algorithms - they need to have a common standard to test against.

~~~
3pt14159
I used lazy in the positive sense; I'd actually forgotten that it had negative
connotations when I wrote the above.

------
petulla
I'm sure these lectures are great but my computer was none too happy about 100
embedded videos in a single Web page.

~~~
nhebb
Yeah, my machine learned not to visit that page again.

------
imurray
The main course website is here:
[http://www.inf.ed.ac.uk/teaching/courses/iaml/](http://www.inf.ed.ac.uk/teaching/courses/iaml/)

There are more notes and so on there. It isn't an online course though, so
don't expect too much.

~~~
edran
We also have two more in-depth (and harder) courses on ML with publicly
available slides:

* Machine Learning and Pattern Recognition: [http://www.inf.ed.ac.uk/teaching/courses/mlpr/](http://www.inf.ed.ac.uk/teaching/courses/mlpr/)

* Probabilistic Modelling and Reasoning: [http://www.inf.ed.ac.uk/teaching/courses/pmr/](http://www.inf.ed.ac.uk/teaching/courses/pmr/)

Unfortunately video lectures are often only available internally.

~~~
catkin
I didn't take it myself, but my (now former) classmates that took PMR told me
it was one of the hardest courses they ever did.

------
jds375
I also highly recommend these lectures from Cornell. The lecturer is well-
known for his free SVM-light implementation. [http://machine-learning-
course.joachims.org](http://machine-learning-course.joachims.org)

------
graycat
Best I can tell, it comes in a Champagne bottle with a cork held in by wire
and a nice, new label, but, when you open the bottle and pour out the contents,
what you get is 99 44/100% pure, old cookbook-style applied statistics: some
curve fitting, some hypothesis testing, some statistical estimation, with the
definitions, theorems, and proofs mostly filtered out. Also not much on
experimental design and only a little on 'resampling' techniques.

Right: Since we need a computer to do the data manipulations, the computer
science people want to conclude that the statistics is also part of 'computer
science'? Now bookkeeping, accounting, numerical solutions of differential
equations, etc. are also part of 'computer science'? Or, what the heck ever
happened to the field of statistics, biostatistics, quality control, etc.?

~~~
Houshalter
What's your complaint exactly? There is a lot of overlap with statistics,
sure. Especially in the simpler stuff. Machine learning is literally using
machines to automatically create models. What did you expect?

~~~
graycat
> Machine learning is literally using machines to automatically create models.

Good, we got that clear. In statistics, the 'models' have hypotheses that
should be satisfied. Checking the hypotheses, and also the model that gets
built, is tricky work not much mentioned in the book and, really, super tough
to program in general.

> What's your complaint exactly?

Old wine, filtered, in new bottles.

I believe that students would be better advised to go for the original stuff
in courses/books on statistics, the field itself. Yes, then a computer can do
the data manipulations.

There has long been a threat of 'cook-book' statistics: get some software
and a 'tool', apply it where some crucial hypothesis does not hold, and then
get misleading results. Do this in medicine and you can kill people; indeed,
that is partly why biostatistics is a relatively serious subject.

Well, it seems fairly clear that with 'machine learning' the threat of 'cook-
book' statistics is much greater.

Or, there long was threat enough with SPSS, SAS, R, Matlab, etc., but now the
threat is greater.

> What did you expect?

Something new and good, not 'statistics done badly'.

~~~
Houshalter
I don't know enough about the field of statistics to argue with you. But if
what you say is true, why aren't statisticians using traditional methods
winning competitions and benchmarks, and taking over the field?

The "cook-book" problem is overfitting and it's pretty well known and
avoidable.
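As a toy sketch of that point (all data, degrees, and names here are made up for illustration), overfitting shows up immediately once you compare error on the fitting data against error on held-out data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a smooth underlying function (illustrative toy data).
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Hold out every other point; fit polynomials only on the rest.
x_fit, x_hold = x[::2], x[1::2]
y_fit, y_hold = y[::2], y[1::2]

errs = {}
for degree in (1, 3, 12):
    coeffs = np.polyfit(x_fit, y_fit, degree)
    train_mse = np.mean((np.polyval(coeffs, x_fit) - y_fit) ** 2)
    hold_mse = np.mean((np.polyval(coeffs, x_hold) - y_hold) ** 2)
    errs[degree] = (train_mse, hold_mse)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, held-out MSE {hold_mse:.3f}")
```

The high-degree fit drives its training error down while its held-out error climbs, which is exactly the failure mode a holdout split makes visible.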

~~~
graycat
> taking over the field

Sorry, but "field"? What "field"? 'Machine learning' in 'computer science'?
That's not a 'field'; it's a really bad joke.

> The "cook-book" problem is overfitting and it's pretty well known and
> avoidable.

Gee, people have been criticizing 'cook-book' statistics for decades, and the
only problem was "overfitting"? Amazing. Gee, we can clean out nearly all of
the definitions, theorems, and proofs of statistics and just put in a simple
solution for the "avoidable" problem of overfitting! Astounding.

Did I mention a bad joke?

~~~
Houshalter
> Sorry, but "field"? What "field"? 'Machine learning' in 'computer science'?
> That's not a 'field'; it's a really bad joke.

Ok then. I guess biology isn't a field either because I don't like their
methods. They are too messy and impure, and really it's just applied physics.
(Mildly relevant: [http://xkcd.com/435/](http://xkcd.com/435/))

And sorry I meant overfitting is the equivalent of the "cook-book" problem in
machine learning. Otherwise it's fairly easy to try a model against test data
and see how well it performs. No need to guess whether the underlying
assumptions of the model are true, just see if it works.

~~~
graycat
> No need to guess whether the underlying assumptions of the model are true,
> just see if it works.

No, there are assumptions for this approach also, and these assumptions also
need to be checked.

Then in applying the model that has been checked, there are still more
assumptions that need to be checked.

And checking models is often essentially hypothesis testing, where one needs
more assumptions and, commonly, some experience to know which hypothesis tests
to use. E.g., tests in regression can be F-ratios and/or t-tests, and one needs
to check some assumptions, or at least robustness, here.
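For concreteness, here is a sketch of the slope t-test in simple regression (toy data, pure numpy; the numbers and names are illustrative), with the assumptions it leans on noted in the comments:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear model y = a + b*x + noise. The t-test below assumes the
# errors are independent, roughly normal, with constant variance --
# exactly the hypotheses that need checking before trusting the test.
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)

X = np.column_stack([np.ones(n), x])          # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares estimates

resid = y - X @ beta
dof = n - 2
sigma2 = resid @ resid / dof                  # residual variance estimate
cov = sigma2 * np.linalg.inv(X.T @ X)         # covariance of the estimates

t_slope = beta[1] / np.sqrt(cov[1, 1])        # t statistic for the slope
# For dof = 48, the two-sided 5% critical value is about 2.01.
print(f"slope {beta[1]:.3f}, t = {t_slope:.1f} (|t| > 2.01 rejects b = 0)")
```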

E.g., in power spectral estimation (see Blackman and Tukey, 'The Measurement
of Power Spectra') there is a severe trade-off between resolution and
stability, another case of the trade-off between bias and variance. Tough to
'automate' that consideration.
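That resolution/stability trade-off can be seen in a few lines. This is a sketch of Bartlett-style periodogram averaging (the signal and segment counts are illustrative, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(7)

# A sinusoid buried in white noise.
n = 4096
t = np.arange(n)
x = np.sin(2 * np.pi * 0.1 * t) + rng.normal(size=n)

def averaged_periodogram(x, n_segments):
    """Average periodograms of non-overlapping segments.
    More segments: lower variance (stability) but coarser resolution."""
    seg_len = len(x) // n_segments
    segs = x[: seg_len * n_segments].reshape(n_segments, seg_len)
    p = np.abs(np.fft.rfft(segs, axis=1)) ** 2 / seg_len
    return p.mean(axis=0)

fine = averaged_periodogram(x, 1)     # 2049 bins, very noisy estimate
stable = averaged_periodogram(x, 16)  # 129 bins, roughly 16x less variance
print(len(fine), len(stable))
```

Averaging over 16 segments cuts the variance of the noise floor by about a factor of 16, but the frequency grid is 16 times coarser: the trade-off has to be chosen, not automated away.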

So, take a box of data. Divide it in half. Use the first half for the
'fitting' or 'training' data. Test on the second half. Not deep; I advised
some people in finance to do that in about 1982; not nearly new; and I very
much doubt I was the first; gotta believe that J. Tukey, L. Breiman, and
others thought of this long before I did.
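The split described above takes only a few lines (a sketch on made-up data):

```python
import numpy as np

rng = np.random.default_rng(42)

# A "box of data": three features and a noisy linear response (made up).
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

# Divide it in half: first half for 'fitting' or 'training'.
half = len(y) // 2
coef, *_ = np.linalg.lstsq(X[:half], y[:half], rcond=None)

# Test on the second half, data the fit never saw.
mse = np.mean((X[half:] @ coef - y[half:]) ** 2)
print(f"held-out MSE: {mse:.3f}")
```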

And if the test fails, then try to fit again. Run all night and in the
morning, presto, have a model that does well on the test data. Now what have
we? Not a trivial question to answer.

Why least squares instead of something else? A good answer is not easy.

I'm not criticizing biology, and it's done some excellent science, e.g., see
the E. Lander lectures in his MIT course in biology.

Statistics is a good field with a lot of good work by a lot of bright people
with excellent backgrounds in pure and applied math.

The first time I heard about 'machine learning' I guessed they meant filling
in the optimal value function for stochastic dynamic programming. Okay, that
is a kind of 'learning'. But, that's not what they had in mind.

Next, an example was the children's interactive computer game Animals, where a
child thinks of an animal, answers some questions about the animal to chase
down a binary tree, stored in the program, to a leaf, and has the computer
guess the animal. If the computer is wrong, it asks the child, "What is true
for your animal and false for my guess?", adds to the binary tree, and thus
'learns'. Call it a parlor trick.
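A minimal sketch of that Animals game (all questions and animals here are illustrative):

```python
# Leaves are animal names; internal nodes are
# (question, yes_subtree, no_subtree) tuples.
tree = ("Does it live in water?", "fish", "dog")

def descend(node, answers):
    """Follow a list of yes/no answers down the tree to the program's guess."""
    for ans in answers:
        if isinstance(node, str):
            break
        _, yes_branch, no_branch = node
        node = yes_branch if ans else no_branch
    return node

def grown(guess, actual, question):
    """The 'learning' step: replace a wrong leaf with a new question node."""
    return (question, actual, guess)

# The child thought of "whale"; the program guesses "fish" and is wrong,
# asks for a distinguishing question, and grows the tree.
guess = descend(tree, [True])
tree = (tree[0], grown(guess, "whale", "Is it a mammal?"), tree[2])
print(descend(tree, [True, True]))   # now reaches "whale"
```

All the program ever does is insert nodes into a lookup structure, which is why graycat calls it a parlor trick rather than learning.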

A little like 'self driving cars': Really 'self-driving'? Nope, not close.
Only on streets where everything has been mapped down to 1 cm or better
including all the painted lines, all the curbs, all the traffic signs, and all
the traffic lights, etc. In no way does the 'self driving car' actually look
at a new street scene, make sense out of it, and use that 'learning' to drive.
Instead, so far a self driving car is about as amazing as a train on a track.

We're talking hype, old wine, filtered, in new bottles, and not much that is
new and good. Or, as the old academic joke goes, "the new is not good and the
good, not new." The last time I looked at a leading 'machine learning' prof,
I had to conclude that he needed to return to ugrad school, be a math major,
and learn how to read/write math.

E.g., why maximum likelihood estimation? There are reasons, but I saw no hint
of them in the Ng lectures at Stanford.
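One piece of the answer can at least be shown numerically. This toy sketch (data and grid are made up) checks the classical fact that for Gaussian data with known variance, the likelihood is maximized at the sample mean; the deeper justifications (consistency, asymptotic efficiency) are the part courses tend to omit:

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(loc=2.0, scale=1.0, size=500)

def log_likelihood(mu, x, sigma=1.0):
    """Gaussian log-likelihood of the sample at candidate mean mu."""
    return (-0.5 * np.sum((x - mu) ** 2) / sigma**2
            - len(x) * np.log(sigma * np.sqrt(2 * np.pi)))

# Maximize over a grid; the maximizer lands on the sample mean.
grid = np.linspace(0.0, 4.0, 2001)
lls = np.array([log_likelihood(m, data) for m in grid])
mle = grid[np.argmax(lls)]
print(f"grid MLE {mle:.3f}, sample mean {data.mean():.3f}")
```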

We've known a lot about pure and applied statistics and how to use them for a
very long time. The good work in statistics is high quality pure/applied math,
e.g., Billingsley, 'Convergence of Probability Measures', e.g. with Ulam's
result Le Cam called 'tightness', Brillinger on time series, Serfling on limit
theorems, Rao on linear methods, Breiman on CART, and much more. I'm not
seeing comparable quality in 'machine learning'; indeed, from all I've seen
only a tiny fraction of the computer science machine learning people have the
math prerequisites for good research in statistics.

Yes, I've published peer-reviewed original research in statistics, indeed,
applied to a problem in computer science.

~~~
Houshalter
For tradeoffs you design your error metric to be whatever best represents your
problem.

> And if the test fails, then try to fit again. Run all night and in the
> morning, presto, have a model that does well on the test data. Now what have
> we? Not a trivial question to answer.

Minor quibble, but the proper way is to use validation data to find the best
model and training parameters, the test data is only used once to test how
well it actually works.
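That three-way split can be sketched as follows (toy data; the candidate models are arbitrary polynomial degrees):

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy problem: choose a polynomial degree on validation data,
# then report error exactly once on the test set.
x = rng.uniform(-1, 1, 300)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=300)

x_tr, y_tr = x[:100], y[:100]        # training: fit each candidate model
x_va, y_va = x[100:200], y[100:200]  # validation: pick among them
x_te, y_te = x[200:], y[200:]        # test: touched only once, at the end

def mse(c, xs, ys):
    return np.mean((np.polyval(c, xs) - ys) ** 2)

fits = {d: np.polyfit(x_tr, y_tr, d) for d in range(1, 10)}
best = min(fits, key=lambda d: mse(fits[d], x_va, y_va))
test_mse = mse(fits[best], x_te, y_te)
print(f"chosen degree {best}, test MSE {test_mse:.3f}")
```

Because the test set played no part in choosing the degree, the final number is an honest estimate, which is the point of keeping it separate from the validation data.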

> A little like 'self driving cars': Really 'self-driving'? Nope, not close.
> Only on streets where everything has been mapped down to 1 cm or better
> including all the painted lines, all the curbs, all the traffic signs, and all
> the traffic lights, etc. In no way does the 'self driving car' actually look
> at a new street scene, make sense out of it, and use that 'learning' to drive.
> Instead, so far a self driving car is about as amazing as a train on a track.

That's not accurate. Self-driving cars do take input and can't rely on a
static map, since the world changes and they have to detect obstacles and other
cars. I'm not entirely certain, but I don't believe they are using very much
machine learning in current-generation self-driving cars. However, there is a
startup working to make one with pure machine vision, and it was done once in
the '90s with neural nets (ALVINN).

~~~
graycat
> Minor quibble, but the proper way is to use validation data to find the best
> model and training parameters, the test data is only used once to test how
> well it actually works.

No: I was assuming that at your step, the test with the 'validation data'
fails. So we have to return and use the validation data on each of the
fitting efforts. Then, to my question: what do we have? Not so easy to answer.
Net, the two-step approach of 'training data' and 'validation data' is not so
good. Also, we need some assumptions about the two boxes of data and, then,
about any data we plug into the resulting model.

For the self driving cars, once they see something not on their static map of
the street, apparently they have to stop and wait, maybe drive around. The
point is, those cars can only drive on streets very carefully mapped out, down
to 1 cm.

Apparently the basic problem is, driving a car takes some basic intelligence,
that is, available essentially only from humans. For anything much like real
human intelligence, we don't know how to program it. In particular, machine
learning doesn't know how to program it.

------
pistle
In case you want a better user experience...

Playlists on youtube vs. the mass of embedded videos on one page.

Naive Bayes
[https://www.youtube.com/watch?v=xYqiIjaqydU&list=SPBv09BD7ez...](https://www.youtube.com/watch?v=xYqiIjaqydU&list=SPBv09BD7ez_7-4V3IJIzCHWQj9nd4rVWB&feature=share)

Decision Tree
[https://www.youtube.com/watch?v=eKD5gxPPeY0&list=PLBv09BD7ez...](https://www.youtube.com/watch?v=eKD5gxPPeY0&list=PLBv09BD7ez_4temBw7vLA19p3tdQH6FYO&feature=share)

Generalization and Overfitting
[https://www.youtube.com/watch?v=j9_yzC-x-js&list=PLBv09BD7ez...](https://www.youtube.com/watch?v=j9_yzC-x-js&list=PLBv09BD7ez_6FzIS3smoa8QYj3ykdPkhT&feature=share)

Nearest Neighbor Methods
[https://www.youtube.com/watch?v=k_7gMp5wh5A&list=PLBv09BD7ez...](https://www.youtube.com/watch?v=k_7gMp5wh5A&list=PLBv09BD7ez_68OwSB97WXyIOvvI5nqi-3&feature=share)

K-means Clustering
[https://www.youtube.com/watch?v=mHl5P-qlnCQ&list=PLBv09BD7ez...](https://www.youtube.com/watch?v=mHl5P-qlnCQ&list=PLBv09BD7ez_6cgkSUAqBXENXEhCkb_2wl&feature=share)

Mixture Models and the E-M Algorithm
[https://www.youtube.com/watch?v=REypj2sy_5U&list=PLBv09BD7ez...](https://www.youtube.com/watch?v=REypj2sy_5U&list=PLBv09BD7ez_4e9LtmK626Evn1ion6ynrt&feature=share)

Principal Component Analysis
[https://www.youtube.com/watch?v=IbE0tbjy6JQ&list=PLBv09BD7ez...](https://www.youtube.com/watch?v=IbE0tbjy6JQ&list=PLBv09BD7ez_5_yapAg86Od6JeeypkS4YM&feature=share)

Hierarchical Agglomerative Clustering
[http://www.youtube.com/watch?v=GVz6Y8r5AkY&list=PLBv09BD7ez_...](http://www.youtube.com/watch?v=GVz6Y8r5AkY&list=PLBv09BD7ez_7qIbBhyQDr-LAKWUeycZtx&feature=share)

------
Legoben
Any chance we can get the lectures before #5?

~~~
mrknmc
Here's a link to the course website:
[http://www.inf.ed.ac.uk/teaching/courses/iaml/](http://www.inf.ed.ac.uk/teaching/courses/iaml/)
I took this course and Professor Lavrenko is one of the best lecturers at the
uni.

~~~
arethuza
Interesting to see that Edinburgh is using US style position naming in
addition to more traditional UK positions (i.e. "Lecturer" == "Assistant
Professor", "Reader" == "Associate Professor" etc.)

~~~
ssabev
It's probably because a few of our lecturers are actually from/did their PhDs
in the US :)

------
frozenport
I have 32GB of ram and this nearly caused my computer to lock up.

