A Course in Machine Learning (ciml.info)
229 points by sebg on Sept 15, 2013 | hide | past | favorite | 39 comments

For those who might not know, Hal Daumé III is a highly respected researcher in machine learning who spends much of his time working in Natural Language Processing (NLP). He also contributes his knowledge to many interesting open source projects. After developing an algorithm called SEARN for solving hard structured prediction problems, Daumé et al. built a practical, open implementation of it in Vowpal Wabbit, an online machine learning library built for speed and used at Yahoo and Microsoft Research, among other places. He's the sort of academic who contributes not just to theory but also to practice. Browsing through his publications gives an idea of his contributions [1].

Given all that, I'm looking forward to his take on machine learning and for another valuable resource to be available to everyone.

[1]: http://www.umiacs.umd.edu/~hal/publications.html

Nice. He was a fellow PhD student at USC/ISI, I remember him being the dude who always walked around barefoot.

Seems like he's done really well since, I will definitely take a look at this.

It looks very well organized, and that he's given thought to "How to teach" as well as "What to teach." Makes me wish I had 3 free hours a night over the next month. :-)

OP - Thanks for sharing!

Hal uses this as the textbook for his undergrad machine learning course. Having recently taken that course (in the spring 2013 semester), I feel justified in saying that this book needs a lot of work before it's usable as a textbook or even a learning tool. Most of the time this text served primarily as an initial "dereference" of ideas, yielding not concrete information but a series of other pointers that I'd need to chase on Google before getting anything usable.

He clearly has high hopes for CIML, based on all the infoboxes and chapters that are still incomplete; it's a shame he'll likely need to get tenure before finishing it.

(This should in no way be taken as a slight on Hal's teaching; his class was the best sort of challenging in that it required a large amount of work, and yielded a correspondingly large amount of insight. He's also a fantastic lecturer and a fair grader.)

> "... it's a shame he'll likely need to get tenure before finishing it."

Judging by his homepage [1], he is at least on the tenure track, unless the notion of an Assistant Professor is different over in the States. I agree with the sentiment about the book; at the very least, the chapter on Neural Networks needs some serious work (I believe there are errors in there - maybe I should mail Hal). As a person and researcher I have only good things to say about him; do check out his blog if you are into ML and NLP [2], and let's hope he finds the time to keep polishing this freely available book (I do like his writing).

[1]: http://www.umiacs.umd.edu/~hal/

[2]: http://nlpers.blogspot.com/

I wish more people followed what Hal writes in here :

> A second goal of this book is to provide a view of machine learning that focuses on ideas and models, not on math. It is not possible (or even advisable) to avoid math. But math should be there to aid understanding, not hinder it.

No book (no, not even Bishop's PRML) follows this diligently.

This is a harder problem than it sounds, and something I've given a lot of thought to. I think the underlying issue is that all machine learning was discovered through a combination of applied math and intuitive ideas/models. Without the intuitive model no one would have thought to discover the method, and without the math the intuitive idea would be a pipe dream. Both are fundamentally linked, and it's a bad idea to separate them.

For example, one of the most basic and oldest statistical methods is linear regression. The first thing anyone will tell you when they are teaching it is the basic "idea and model" - finding a line that fits a scatter plot (and then extending that idea). But this doesn't give you a real understanding of linear regression: where does that line come from, and why does a clean, algorithmic solution exist? Why does the standard solution often lead to numerical errors, and why is regularization a valid solution?

These questions require an increasing amount of math, but they are essential to really understanding linear regression. I agree that some textbooks just throw you a wall of math, but many actually do a solid job of explaining what is going on as they do so.
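To make those last two questions concrete, here is a minimal numpy sketch (synthetic data; not from any particular textbook). It shows the "clean, algorithmic solution" - the normal equations w = (XᵀX)⁻¹Xᵀy - and why nearly collinear features make it numerically fragile, while ridge regularization (adding λI) stabilizes it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two nearly collinear features: X^T X becomes badly conditioned.
x1 = rng.normal(size=200)
x2 = x1 + 1e-6 * rng.normal(size=200)   # almost an exact copy of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=200)

# Ordinary least squares via the normal equations:
# w = (X^T X)^{-1} X^T y -- the "clean, algorithmic solution".
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge regression adds lam * I, shrinking the condition number:
lam = 1e-3
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

print(np.linalg.cond(X.T @ X))  # huge: the OLS solve is numerically fragile
print(w_ols)                    # coefficients may be wildly off
print(w_ridge)                  # stable: weight split roughly evenly, summing to ~3
```

With near-duplicate features, OLS can assign enormous opposite-signed weights that cancel; the ridge penalty breaks the tie and splits the weight between the two copies, which is exactly the "why regularization works" question made visible.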

If you don't want to know the details of machine learning methods, which are inherently mathematical, you might as well just remember the names of libraries that implement the solutions for you.

Completely agree. However one thing that I think is missing in most books with a lot of math is the opportunity to use programming to help teach and communicate the math in question.

A great example of a book that communicates abstract mathematical concepts via algorithms is the Little Schemer. Ironically, it doesn't even set out to communicate the math, but actually just uses the math to communicate other programming ideas like recursion. That, however, doesn't diminish the fact that it demonstrates ways to teach math through programming and algorithms.

Another book which also does a good job at using programming to demonstrate more concrete math is Allen Downey's "Think Stats" book. All through the book, you learn the mathematical concepts of statistics through hands on programming activities.

There definitely is a chicken-and-egg problem in areas like machine learning: unlike the above resources, which have only one layer of abstraction to cross, machine learning presents two layers of programming<->math abstraction for most people who decide to learn it. To really understand and apply machine learning you need to understand the math and models behind it. However, the math and models are presented in a pure form that is difficult to grok unless you arrive with a classically trained mathematical background. Given that the target market for such learning resources is not mathematicians but programmers, I would hope these resources would present content that helps you arrive at the math from a programmer's point of view.

You are right, it is indeed difficult and I never meant to say it is easy. My gripe was with the people who intentionally make it difficult. The thing is that you can learn ML with minimal (not minimum) math, and that is what should be aimed for. I would be curious to know which textbooks you are talking about - I have read quite a lot of them and struggled with them.

For example, linear regression is just brilliantly tackled by Andrew Ng in his ML course (the CS229 lecture notes at Stanford - not Coursera).

One trouble is that the "idea" you've just communicated isn't actually the idea behind linear regression. The idea behind regression is this.

You're trying to make a prediction for some number you care about - let's say the value of a given stock price. You've come up with a set of hypotheses about which characteristics might help you make that prediction. Moreover, you have a set of examples that you've witnessed in the past, and you want to learn from that experience.

Using linear regression, you can test those hypotheses. You turn those characteristic features into a quantifiable number themselves. Linear regression is simply the name we give to the process of testing whether there's any validity to your hypothesis. If that hypothesis is true and you've discovered what makes the stock go up, then the corresponding feature will be given a high absolute coefficient. You'll also know whether it's an indicator of the stock going up or down, based on the sign of that coefficient. There's no math involved - you're testing your own intuition about how to make predictions.

The idea behind linear regression isn't "finding a line that fits a scatter plot." That's still math and it's still unhelpful to many people with serious, real-world applications. It's just an abstraction of the math that happens to leave out the formal representation.

To really communicate ideas in application, you need to move past the math entirely, and get to how it ties into people's judgments about data that they know well and have experience with, and show them that the intuition they can bring to the table is valuable (for feature determination). Otherwise, even scatter plots will often shut people out, because they "aren't good at math."

Hey there, I'd like to follow up on this point (and some other comments you've posted on HN). Do you mind if I email you?

Personally, I find Bishop's book to hit on a good balance between developing intuition and presenting the math to give precision to understanding (the opposite of this would be Murphy's book--the math seems too esoteric and disjointed to be useful). I have a hard time feeling confident I thoroughly understand a concept without having precision in the presentation of the math behind the concept.

There is certainly something to be said for courses and books that can present a complex idea without requiring a graduate-level degree of math literacy. But at the end of the day, ML is a subfield of mathematics so not having a thorough grasp of the math underlying it will definitely hinder your understanding.

Yeah, of what use can mathematics be, when you can just study "ideas and models"...

edX is also offering the Caltech ML course in MOOC format: https://www.edx.org/course/caltechx/cs1156x/learning-data/11...

Note that this is not a watered-down course! It's the same course as the students at Caltech are taking, and in fact they will have the option not to attend the lectures and watch them online instead. The homework assignments will be the same ones as well (AFAIK).

I've taken the previous (non-edX) version and this is by far one of the best and clearest MOOCs (and even offline courses) I've taken.

If you are interested in this you might want to also look at Andrew Ng's (Stanford) Machine Learning course that is starting soon on Coursera.


Here's a comment I had written earlier on another article. The context was learning ML with Python - Hal's objective is of course more general, but some parts of it apply here too.


The book details building ML systems with Python and does not necessarily teach ML per se. It is a good time to write an ML book in Python, particularly keeping in mind efforts to make Python scale to Big Data [0].

What material you want to refer to depends entirely on what you want to do. Here are some of my recommendations:

Q: Do you want an "Introduction to ML" and some applications, with Octave/Matlab as your toolbox?

A: Take up Andrew Ng's course on ML on Coursera [1].

Q: Do you want a complete understanding of ML with the mathematics and proofs, and to build your own algorithms in Octave/Matlab?

A: Take up Andrew Ng's course on ML as taught at Stanford; video lectures are available for free download [2]. Note - this is NOT the same as the Coursera course. For textbook lovers, I have found the handouts distributed in this course far better than textbooks with obscure and esoteric terms. It is entirely self-contained. If you want an alternate opinion, try out Yaser Abu-Mostafa's ML course at Caltech [3].

Q: Do you want to apply ML along with NLP using Python?

A: Try out the Natural Language Toolkit [4]. The HTML version of the NLTK book is freely available (jump to Chapter 6 for the ML part) [5]. There is an NLTK cookbook available as well, with simple code examples to get you started [6].

Q: Do you want to apply standard ML algorithms using Python?

A: Try out scikit-learn [7]. The OP's book also seems to be a good fit in this category (Disclaimer - I haven't read the OP's book and this is not an endorsement).

[0] http://www.drdobbs.com/tools/us-defense-agency-feeds-python/....

[1] https://www.coursera.org/course/ml

[2] http://academicearth.org/courses/machine-learning/

[3] http://work.caltech.edu/telecourse.html

[4] http://nltk.org

[5] http://nltk.org/book/

[6] http://www.amazon.com/Python-Text-Processing-NLTK-Cookbook/d....

[7] http://scikit-learn.org

Is this worth going through over picking up a textbook or two? I've found that Coursera courses are actually quite bloated. Lots and lots of empty talking, and very little substance.

I've taken the Coursera ML class. It's very easy if you have an adequate math background, and it's not very comprehensive; there are lots of machine learning methods that are not covered. So it's more like an introductory course to machine learning.

But it's absolutely commendable how Andrew Ng takes the topic to such an understandable level that a clever high schooler who knows a little about programming could take the course. There are even extra videos serving as a crash course on linear algebra and Octave programming in the first week.

So he really manages to make the topic accessible to a very large audience.

>adequate math background

Do you know what kind of math is needed, other than linear algebra?

Basic vector and matrix operations. The first half of a typical freshman linear algebra course is more than enough. But like I said, there is a matrix review in the beginning, so if you're willing to study those extra lectures, then almost no prior knowledge is needed.

Also being able to take derivatives helps in a couple of places, but is not necessary.
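For a sense of scale, here's roughly all the math in play, in one sketch (my own illustration, not course material): matrix-vector products plus a single derivative - the gradient of the squared error - driving batch gradient descent for linear regression:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.0, -2.0]) + 0.5   # true weights [1, -2], intercept 0.5

# Add an intercept column (the "x0 = 1" convention).
Xb = np.column_stack([np.ones(len(X)), X])

# Batch gradient descent on J(w) = (1/2m) ||Xb w - y||^2.
# The "derivatives" part is just this gradient: dJ/dw = (1/m) Xb^T (Xb w - y).
w = np.zeros(3)
alpha = 0.1                           # learning rate
for _ in range(2000):
    grad = Xb.T @ (Xb @ w - y) / len(y)
    w -= alpha * grad

print(w)  # approaches [0.5, 1.0, -2.0]
```

Everything above is first-semester linear algebra plus one partial derivative, which matches the parent's point about how little prior math the course really demands.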

That's linear algebra.

Some calculus and linear algebra. The majority is linear algebra.

Haven't done the course, but from the preview of the videos, they cover the math you need and it's just basic, high-school level knowledge of linear algebra.

For a beginner to machine learning I'd recommend Andrew Ng's course notes and lectures over any textbook I've seen. But I prefer his Stanford CS 229 notes to Coursera for exactly the reasons you state: they are watered down. Once you can really understand Andrew Ng's course notes, I'd recommend a textbook, because textbooks go into more detail and cover more topics. My two favorites for general statistical machine learning are:

* Pattern Recognition and Machine Learning by Christopher M. Bishop

* The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman

Both are very intensive, perhaps to a fault. But they are good references and are good to at least skim through after you have baseline machine learning knowledge. At this stage you should be able to read almost any machine learning paper and actually understand it.

Isn't Murphy's book more up to date and comprehensive as a reference?

Edit: Andrew Ng's Coursera course is CS229A (http://cs229a.stanford.edu/), not really watered down.

I'm a big fan of Murphy but its comprehensiveness means you lose some detailed explanations. Bishop really gets at those details (so does EoSL).

Depends on the course, but other than a select few, I tend to agree. Not the case with edX courses, though; I think they are up to a much higher standard (at least the programming/math ones).

Just wanted to add another textbook option that's of comparable breadth and quality, but easier to read than Bishop. There are pre-print PDFs probably available online with a little searching. The exercises and accompanying data are really fun IMO. http://www.cs.ubc.ca/~murphyk/MLbook/index.html

It's an excellent book with a good balance of rigor and code/graphs/visuals to develop intuition, but please see the errata before buying.


(The freely available books by Barber, MacKay and Hastie/Tibshirani/Friedman (ESL) are all worth close reads.)

I think he needs to re-run LaTeX a couple of times before releasing that PDF. A fair few ?? in there...

The lines are not properly justified, so it appears the output was not generated using TeX. I fail to see why anyone would choose left-flushed alignment (with hyphenation!) when using TeX.

The book seems accessible.

This subject is difficult and requires a lot of time to understand and to put something useful into practice. AI books appear frequently these days. I feel that most books and tutorials fail to deliver good practical examples and pseudocode, focusing more on mathematical proofs.

I am struggling to understand Pieter Abbeel's apprenticeship learning: http://citeseerx.ist.psu.edu/viewdoc/download?doi=

He can summarize the whole thing in a couple of slides. It seems short and simple (!?) but I think I am still far from truly understanding it. Maybe next year.

The pages are watermarked, "Draft. Do not distribute". I wouldn't rely so much on these lectures until they stabilize.

Doesn't make them any less worth reading.

If the content is wrong or suboptimal it makes them less worth reading. If the guy hasn't spent time thinking about the content of his slides it might be a confused and confusing jumble of nonsense: sometimes the best researchers are the worst teachers.

Please change the logo! I'm not trying to be a jerk - it's always remarkable when someone wants to share his knowledge, and I know it's probably not your priority, but the book and website would be better off without it.

Also, why do you have the topics "introduction to machine learning" and "decision trees" in the same chapter?

this looks like a pleasant introduction to machine learning

