They haven't been very satisfied with it. It's co-taught by two professors, one who teaches like it's an introduction for people who have never heard of Bayes' theorem and one who teaches like it's a graduate seminar for people who've seen it all before.
See Prof. Yaser's 1 min overview: http://www.youtube.com/watch?v=KlP0DpiM7Lw
The "Learning from Data" book videos are online for free, and the book is on Amazon...
The course is also available on EdX: https://www.edx.org/course/caltechx/cs1156x/learning-data/11...
Honestly, this is not a course that I would recommend. The most problematic part of this course is its lack of a clear outline. It jumps between different fields of machine learning, which can have fundamentally different focuses and motivations, without showing the students how they connect. It talks about the Nadaraya-Watson classifier in the second class, then we have two lectures to explain the most basic naive Bayes algorithm. I just don't get it.
Though it confuses me a lot of the time, the course is useful in that it gives me a lot of keywords to search for and read articles about. Also, the homework can be challenging at times, but working through it did improve my understanding of things I might have thought trivial before, like the linear regression stuff.
And if you are really interested, I would recommend a book, which covers most of the material of the course while being much more organized:
A refresher in linear algebra will also help~
: Wiki with code, exercises and explanation
: Video lecture one with a recap on backprop
: Video lecture two on Sparse Auto Encoders
I've been following the self-paced AI class on Udacity https://www.udacity.com/course/cs271
Statistical Learning Theory and Applications
I think it is interesting how different ML courses can put such different emphasis on content. The MIT course is all about regularization.
It would have been a piece of cake in college, but I have not used that math in a long time :-/
Warning: History shows that the
US economy looked at the material
in operations research, statistics,
and the mathematical sciences and
rolled their eyes, did a big upchuck,
laughed, turned, and walked away.
One might look for alarms from their
hype and fad detectors.
Homework is much better in the Caltech course, too. In the Coursera course, they give you programs and environments in Octave that are all prewritten for you, and you just need to plug in a few key lines (often there's essentially one way to do it due to dimensionality). You feel like you understand what's going on, but the understanding is not really grounded. The Caltech course has multiple choice questions, but they look like this: "implement this algorithm, run it through a data set chosen randomly with such and such parameters, calculate learning error, do all this 1000 times and average. What value out of these 5 is your learning error closest to?". You choose the language, you implement the algorithm from scratch, you debug the hell out of it, you visualize your data to understand what's wrong... then the knowledge and the understanding stay with you.
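To make that kind of question concrete, here is a rough sketch of what such an exercise might look like. Everything specific in it is an assumption for illustration, not the actual homework: the algorithm (a plain perceptron), the random linear target on [-1, 1]^2, the parameter values, and the function names. "Learning error" is taken here to mean how often the learned hypothesis disagrees with the target on fresh random points.

```python
import random


def random_target(rng):
    """Random line through two uniform points in [-1, 1]^2, as weights (w0, w1, w2)."""
    x1, y1 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    x2, y2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    a, b = y2 - y1, x1 - x2          # normal vector of the line
    c = -(a * x1 + b * y1)           # offset so both points lie on the line
    return (c, a, b)


def predict(w, point):
    """Classify a 2D point as +1 or -1 by the sign of w0 + w1*x + w2*y."""
    return 1 if w[0] + w[1] * point[0] + w[2] * point[1] > 0 else -1


def run_perceptron(points, labels, rng, max_iters=10_000):
    """Perceptron: repeatedly pick a misclassified point and nudge the weights toward it."""
    w = [0.0, 0.0, 0.0]
    for _ in range(max_iters):       # cap is a safety net for very small margins
        wrong = [i for i, p in enumerate(points) if predict(w, p) != labels[i]]
        if not wrong:
            break
        i = rng.choice(wrong)
        w[0] += labels[i]
        w[1] += labels[i] * points[i][0]
        w[2] += labels[i] * points[i][1]
    return w


def experiment(n_train=10, n_test=1000, n_runs=1000, seed=0):
    """Average disagreement between learned hypothesis and target over many random runs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        target = random_target(rng)
        train = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_train)]
        labels = [predict(target, p) for p in train]
        w = run_perceptron(train, labels, rng)
        test = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_test)]
        total += sum(predict(w, p) != predict(target, p) for p in test) / n_test
    return total / n_runs


if __name__ == "__main__":
    print(experiment())
```

You would then compare the printed average against the five answer choices. The point is that nothing is prewritten: the data generation, the learner, and the error estimate are all yours to get wrong and debug.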
I'm currently doing both the Coursera course and the Caltech course concurrently. I really like the level and delivery style of the Caltech course. It covers a lot of material, with good depth and rigour where needed and with a lot of colour. Makes you want to jump and try the techniques out.
In contrast the Coursera course seems a bit easy and dry. I also dislike the dependency on Octave.
That said, the course covered a lot of ground and touched on a number of different interesting/important topics. Some of the lecture material was a bit disorganized/had errors and didn't flow all that well from one topic to another but there was a lot of good material there, especially if you had enough background to appreciate it. I was comfortable enough but it was obvious that the expectations set by the prereqs were off.
Hopefully the course will run again with most of the kinks worked out and, perhaps, a better level-setting of what's needed to get the most out of the course.
Because the learner is quite a good fit for the task, it performs better in terms of speed/accuracy trade-off than many other algorithms, such as CRFs.
A follow-up post on statistical dependency parsing should be finished in about a month (it's further down my queue...)
Source: Am a CMU 2012 grad.
1971–75: DARPA's frustration with the Speech Understanding Research program at Carnegie Mellon University
In the case of Harvard's CS50x, it was essentially the exact same course. (Plus it helped that David Malan is an outstanding teacher.)