
Online graduate-level machine learning course from CMU's Tom Mitchell - ilcavero
http://www.cs.cmu.edu/~tom/10701_sp11/lectures.shtml
======
monk_the_dog
I'm enrolled in the online Applied ML class from Stanford, and I've also been
watching this course from CMU (I'm up to the Graphical Model 4 lecture -
almost the midterm). If you've taken at least one stats class you'll get much
more out of CMU's class.

BTW, here are some good online resources for machine learning:

* The Elements of Statistical Learning (free pdf book): <http://www-stat.stanford.edu/~tibs/ElemStatLearn/>

* Information Theory, Inference, and Learning Algorithms (free pdf book): <http://www.inference.phy.cam.ac.uk/mackay/itila/>

* Videos from Autumn School 2006: Machine Learning over Text and Images: <http://videolectures.net/mlas06_pittsburgh/>

* Bonus link. An Empirical Comparison of Supervised Learning Algorithms (pdf paper): [http://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icm...](http://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml06.pdf) (Note the top 3 are tree ensembles, then SVM, ANN, KNN. Yes, I know there is no 'best' classifier.)

~~~
zeratul
About the _bonus link_ :

It does not make sense to compare _ensemble_ methods (bagging & boosting) with
_single instance_ classifiers. In practice, you try all the classifiers and
then use the best one to create an ensemble. The paper leaves me unsatisfied,
thinking that bagging or boosting SVMs would probably give the best results.

~~~
monk_the_dog
You make a good point. Ensemble methods seem to outperform single classifiers,
and there's no reason you can't have an ensemble of SVMs. The paper should
have included ensembles of something other than trees.

I tried to find a paper comparing an ensemble of SVM to an ensemble of trees
and I came up empty (after a quick search). I did find papers showing
ensembles of SVMs outperforming a single SVM. I also found a comment on a
paper claiming an ensemble of trees outperformed a "Parallel Mixture of
SVM" (see here:
[http://www.mitpressjournals.org/doi/abs/10.1162/089976604323...](http://www.mitpressjournals.org/doi/abs/10.1162/089976604323057416?journalCode=neco)).
Of course, that's not a great source.

I absolutely agree they should have included ensembles other than trees. I
don't necessarily agree an ensemble of SVMs would have beaten an ensemble of
trees. It would have been interesting to see.
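To make the bagging idea concrete, here's a toy pure-Python sketch. The
threshold "stump" base learner and the synthetic data are made up purely for
illustration; the same wrapper would work around an SVM or any other base
learner:

```python
# Toy sketch of bagging (bootstrap aggregating): train each base learner on
# a bootstrap resample of the data, then predict by majority vote.
import random

random.seed(42)

def fit_stump(data):
    """Find the threshold thr that best separates labels as '1 if x > thr'."""
    best_thr, best_err = None, None
    for thr, _ in data:
        err = sum(1 for x, y in data if (1 if x > thr else 0) != y)
        if best_err is None or err < best_err:
            best_thr, best_err = thr, err
    return lambda x: 1 if x > best_thr else 0

def fit_bagged(data, n_models=25):
    """Bag n_models stumps; each sees a bootstrap resample of the data."""
    models = []
    for _ in range(n_models):
        resample = [random.choice(data) for _ in data]  # sample with replacement
        models.append(fit_stump(resample))
    def predict(x):
        votes = sum(m(x) for m in models)
        return 1 if 2 * votes > len(models) else 0  # majority vote
    return predict

# Noisy 1-D data: true boundary at x = 0.5, 10% of labels flipped.
data = []
for _ in range(100):
    x = random.random()
    y = 1 if x > 0.5 else 0
    if random.random() < 0.1:
        y = 1 - y
    data.append((x, y))

ensemble = fit_bagged(data)
print(ensemble(0.9), ensemble(0.1))
```

Swapping `fit_stump` for an SVM trainer is all it would take to get the bagged
SVM the paper left out.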

~~~
zeratul
There was a suicide note emotion classification challenge:

<http://computationalmedicine.org/home-0>

Very noisy and sparse data. 25 teams. 22 system description papers. The winner
used an SVM ensemble.

~~~
monk_the_dog
Zeratul, you're obviously into ML. Would you mind if I asked what your
application is? I'm just curious.

I work in computer vision. When I work on a machine learning problem, I spend
most of my time brainstorming and implementing good features. I'm getting
deeper into ML (and loving it). I'm always curious what other people are doing
with ML.

~~~
zeratul
Medical language processing, information extraction from patient data, text
classification, and clustering.

Yes, it would be great to get a list of hackers that do ML and the domain that
they are working with.

~~~
aperrien
I'm using ML in the casino industry. I use multiple forms of classification
and forecasting.

~~~
monk_the_dog
Once upon a time I thought about using ML/vision in slot machines. I would try
to read the gamblers' emotions/age/sex, and the slot machine would change
stimulation (music/lights, etc.; not mess with the odds) to try to keep them
at the machine longer.

I thought it was a good idea until I actually visited a casino. People sit at
the slots in what looks like a hypnotic state. The emotions don't change much.
I don't think I could have made a measurable difference.

I'm not surprised the gambling industry is using ml, but cool to hear about
it. Thanks.

------
drats
Silverlight? Are these people serious? Whether you are an educational
institution or a for-profit media company, you are trying to get to the
largest number of people and cause them the fewest problems. Silverlight fails
spectacularly at both those objectives.

Edit: I know there seems to be a Flash player component as well, but it's
failing for me and I can't get to the .mp4. Which doesn't speak well of the
joker who cobbled the site together either.

~~~
SkyMarshal
Especially when the target audience for such a class is likely to have an
outsized portion of *nix users.

------
amirmc
_"To view a video you will have to login with your CMU Andrew username and
password, ..._ "

Also, requires Silverlight (which I don't fancy installing)

Edit: This is the Tom Mitchell that Andrew Ng refers to early on in the
Stanford ML lectures (when defining Machine Learning)

~~~
ya3r
You don't have to login to watch videos.

He is the author of one of the most used texts on machine learning: "Machine
Learning, Tom Mitchell, McGraw Hill, 1997."

------
Maven911
I hope this question doesn't come off as too naive, but given the number of
links on the front page about ML: what is so fascinating about ML? Why isn't
there the same level of interest/links on topics such as cryptology, graphics,
circuits, or computer architecture?

~~~
law
There's this enormous focus on 'web scale' technologies. This focus
necessarily involves visualizing and making sense of terabytes and eventually
even petabytes of data; conventional approaches would take thousands or
millions of man-hours to accomplish the same level of analysis that computers
can perform in hours or days.

Tom Mitchell's definition of machine learning algorithms as those that
_improve_ their performance at some _task_ with _experience_ is precisely the
way in which humans go about learning what's necessary to perform the same
tasks that formerly took thousands or millions of hours.

For highly dimensional problems, such as text classification (e.g., spam
detection) or image classification (e.g., face detection), it's almost
impossible to hard-code an algorithm to accomplish its goal without using
machine learning. It's much easier to use a binary spam/not-spam or face/not-
face labeling system that, given the attributes of an example, can learn
which attributes beget that specific label. In other words, it's much easier
for a learning system to determine which variables are important in the
ultimate classification than to try to model the "true" function that gives
rise to the labeling.
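To make that concrete, here's a toy sketch of a learner weighting word-presence
features from labelled examples. It's a tiny perceptron; the vocabulary and
messages are made up for illustration, and real spam filters are of course far
more involved:

```python
# Tiny perceptron learning spam/not-spam from word-presence features,
# instead of hand-coding rules for what "spam" looks like.
VOCAB = ["free", "winner", "money", "meeting", "report", "lunch"]

def features(message):
    """Binary feature per vocabulary word: present (1) or absent (0)."""
    words = message.lower().split()
    return [1 if w in words else 0 for w in VOCAB]

# (message, label) pairs: 1 = spam, 0 = not spam. Made-up training data.
train = [
    ("free money winner", 1),
    ("winner free prize money", 1),
    ("claim free money now", 1),
    ("meeting report at lunch", 0),
    ("lunch meeting tomorrow", 0),
    ("quarterly report attached", 0),
]

weights = [0.0] * len(VOCAB)
bias = 0.0
for _ in range(10):  # a few passes over the training data
    for msg, label in train:
        x = features(msg)
        pred = 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0
        # Perceptron update: nudge weights toward the correct label.
        for i in range(len(weights)):
            weights[i] += (label - pred) * x[i]
        bias += (label - pred)

def classify(msg):
    x = features(msg)
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

print(classify("free money"))    # 1 — spam-like words
print(classify("lunch report"))  # 0 — work-like words
```

The learner never sees a hand-written rule; it just finds which attributes
correlate with each label, which is exactly the point made above.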

~~~
tapertaper
Great comment.

Probably also worth speculating on why this is happening NOW. Why is this
breaking out of CS departments in 2011 and not 2002?

The datasets are new.

Bandwidth? Storage capacity? Computing power? All of the above?

~~~
law
Actually, this has been actively researched since ICs started gaining
widespread usage in the 1970s! Even before that there were plenty of journal
papers produced that deal with the basics of ML and AI.

It wasn't until the 1990s, when computers started becoming reasonably priced
and more accessible to researchers and hobbyists, that we began seeing
_exponential_ growth in the amount of research output. In many ways, one could
argue that the proliferation and development of AI has very much followed
Moore's law, since these are extremely complex and costly calculations.

Bandwidth increases have certainly increased the availability of data sets
(Google has its entire ngrams data set fully available, and it's multiple
terabytes in size), but storage capacity (hard disk, RAM, and CPU cache) and
computing power have really formed the bottleneck. It's not just storage
capacity, either: I/O read/write times are also immensely important. It's all
just a huge balancing act right now.

------
kky
I love that open source mentality (sharing and collaborating for the love of
the work, community, and result) is reaching higher ed. I can't wait for it to
reach lower ed! If kids start seeing this model at a young age...

------
ya3r
As Tom Mitchell says in the first video, this course is recommended for PhD
students.

------
igrekel
Cool. I'm disappointed that there isn't a video for hidden Markov models and
other models for time series though, just slides. The schedule says that
session is in March; maybe by then there will be a video online.

------
zeratul
Three of the most important issues in ML are missing from this course:

* Feature selection

* Overfitting

* Bias-variance tradeoff

Maybe one of Prof. Mitchell's students can make the missing slides available
online?
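For anyone who hasn't seen the bias-variance tradeoff, here's a toy sketch
using k-nearest neighbours on made-up 1-D data (nothing to do with the course
slides), where k directly controls the tradeoff:

```python
# Bias-variance tradeoff with k-nearest neighbours: small k fits the noise
# (low bias, high variance), large k smooths it out (high bias, low variance).
import random

random.seed(0)

# 1-D training data: true label is 1 for x > 0.5, but 15% of labels are
# flipped to simulate noise.
train = []
for _ in range(200):
    x = random.random()
    y = 1 if x > 0.5 else 0
    if random.random() < 0.15:
        y = 1 - y
    train.append((x, y))

def knn_predict(x, k):
    """Majority vote among the k nearest training points (uses `train`)."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(y for _, y in neighbours)
    return 1 if 2 * votes > k else 0

def training_error(k):
    wrong = sum(1 for x, y in train if knn_predict(x, k) != y)
    return wrong / len(train)

print(training_error(1))   # 0.0 — k=1 memorises every point, noise included
print(training_error(51))  # higher — roughly the label-noise rate
```

k=1 achieves zero training error by memorising the noise (overfitting); k=51
can't fit the flipped labels, so its training error is higher but it tracks
the true boundary better. Feature selection plays the same role: fewer,
better features cut variance at the cost of some bias.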

~~~
law
If I'm not mistaken, that was just a recitation that replaced the regular
Thursday class. It was one of the TAs covering that stuff briefly. All three
topics were covered by Tom Mitchell in previous classes.

