As someone who works in machine learning, I have mixed feelings about this article. While encouraging people to start learning about ML by demystifying it is a great thing, this article comes off as slightly cocky and dangerous. Programmers who believe they understand ML while having only a simplistic view of it risk not only creating less-than-optimal algorithms, but also downright dangerous models:
http://static.squarespace.com/static/5150aec6e4b0e340ec52710...
In the context of fraud detection (one of the main areas I work in these days), a model that is right for the wrong reasons might lead to catastrophic losses when the underlying assumption that made the results valid suddenly ceases to be true.
Aside from the fact that the techniques he mentioned are some of the simplest in machine learning (and are hardly those that would immediately come to mind when I think "machine learning"), the top comment on the article is spot on:
> "The academic papers are introducing new algorithms and proving properties about them, you’re applying the result. You’re standing on giants’ shoulders and thinking it’s easy to see as far as they do."
While understanding how an algorithm works is of course important (and I do agree that algorithms are often more readable when translated to code), understanding why (and when) it works is equally important. Does each K-Means iteration always reach a stable configuration? When can you expect it to converge fast? How do you choose the number of clusters, and how does this affect convergence speed? Does the way you initialize your centroids have a significant effect on the outcome? If so, which initializations tend to work better in which situations?
These are all questions I might ask in an interview, but more importantly, being able to answer these is often the difference between blindly applying a technique and applying it intelligently. Even for "simple" algorithms such as K-Means, implementing them is often only the tip of the iceberg.
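To make a couple of those questions concrete, here's a rough sketch of Lloyd's algorithm (plain K-Means) in Python with NumPy. The function name, defaults, and random-restart strategy are just my own illustration, not anything from the article:

    import numpy as np

    def kmeans(X, k, n_init=10, max_iter=100, seed=0):
        """Bare-bones Lloyd's algorithm with random restarts (illustrative only)."""
        rng = np.random.default_rng(seed)
        best = (np.inf, None, None)
        for _ in range(n_init):
            # Naive initialization: sample k distinct points as centroids.
            # Smarter schemes (e.g. k-means++) spread the initial centroids
            # out and tend to reach better local optima faster.
            centroids = X[rng.choice(len(X), size=k, replace=False)]
            for _ in range(max_iter):
                # Assignment step: each point goes to its nearest centroid.
                dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
                labels = dists.argmin(axis=1)
                # Update step: move each centroid to the mean of its cluster,
                # keeping the old centroid if its cluster went empty.
                new_centroids = np.array(
                    [X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                     for j in range(k)])
                if np.allclose(new_centroids, centroids):
                    break  # a stable configuration: no centroid moved
                centroids = new_centroids
            # Within-cluster sum of squares; lower is better.
            inertia = ((X - centroids[labels]) ** 2).sum()
            if inertia < best[0]:
                best = (inertia, centroids, labels)
        return best

Each iteration never increases the within-cluster sum of squares, and there are only finitely many partitions, which is why the inner loop always terminates. But it terminates at a local optimum, which is exactly why the initialization and the restarts matter.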
> Aside from the fact that the techniques he mentioned are some of the simplest in machine learning (and are hardly those that would immediately come to mind when I think "machine learning")
The primary point of this article was the contrast between something as simple as K-Means, and the literature that describes it. It wasn't meant as a full intro to ML, but rather something along the lines of "give it a try, you might be surprised by what you can achieve".
> Even for "simple" algorithms such as K-Means, implementing them is often only the tip of the iceberg.
Yup. But getting more people to explore the tip of the iceberg is, in my opinion, a good thing. We don't discourage people from programming because they don't instantly understand the runtime complexity of hash tables and binary trees. We encourage them to use what's already built knowing that smart people will eventually explore the rest of the iceberg.
Thanks for responding. I fully agree with your comment -- as I said, I too think people are often put off by the apparent complexity of machine learning, and demystifying how it works is a great thing.
Unfortunately there's always a risk that a "hey, maybe this isn't so hard after all" might turn into a "wow, that was easy". While I think the former is great, the latter is dangerous because machine learning is often used to make decisions (sometimes crucial ones, for example when dealing with financial transactions), so I would argue more care should be taken than with general-purpose programming: if you trust an algorithm to make important business decisions, then you'd better have an intimate knowledge of how it works.
While I again agree with the underlying sentiment, I was just a bit disappointed that the article seems to invite readers to be satisfied with themselves rather than motivate them to dig deeper. Nothing a future blog post can't solve, though!