Do competitions on Kaggle (or find them on other sites, but Kaggle is definitely the best place to start). Once you get past the point where you are finishing in the middle of the pack (multiple top 10% or 25% finishes and maybe a prize win), then you are an expert. That is proof that you are separate from the hackers who just throw scikit-learn algorithms at a matrix. The people in the master tier use clever feature engineering and/or code up custom learning algorithms to get themselves above the masses. Looking at a problem and figuring out the correct modeling approach is what the experts do. They don't just create a data frame and run down the list of classification algorithms that they have access to. Read the "No Free Hunch" reports on how the winners did it and you'll quickly see the difference between yourself and the experts.
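To make the feature engineering point concrete, here is a minimal sketch, not anything taken from a Kaggle winner. It assumes a synthetic two-circles data set from scikit-learn's make_circles, and the squared-radius feature (radius_sq) is just an illustrative choice. The same linear model is scored twice: once on the raw matrix and once with the one hand-built feature added.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic toy data (an assumption for illustration): two concentric circles,
# which no straight line in the raw (x1, x2) space can separate.
X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)

# Baseline: throw a linear classifier at the raw matrix.
raw_score = cross_val_score(LogisticRegression(), X, y, cv=5).mean()

# Engineered feature: squared distance from the origin, appended as a third column.
radius_sq = (X ** 2).sum(axis=1, keepdims=True)
X_engineered = np.hstack([X, radius_sq])
engineered_score = cross_val_score(LogisticRegression(), X_engineered, y, cv=5).mean()

print(f"raw features:        {raw_score:.3f}")
print(f"with radius feature: {engineered_score:.3f}")
```

The algorithm never changes here; only the representation of the data does, which is roughly the gap the comment describes.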
I opened an issue ... https://github.com/hangtwenty/dive-into-machine-learning/iss...
I'll wait a bit in case you want to add a note on this in your own words (via PR). Otherwise, tonight or tomorrow I'll paraphrase you or something. Whether you or I make the change I want it in a branch, and then I'll try to get a bit of review for that branch ...
THANKS AGAIN, the guide really needs some insight like this.
The guide's primary recommended course is Andrew Ng's Machine Learning course. The current session started November 2nd; you must enroll by the 7th. Another session is starting November 30th.
What I had in mind is that some people get a lot from the community features on Coursera, which are more active while a class is in session. So that's all I meant.
As suggested in the other comment, the best place to start is probably by working on projects with open data sets. Try experimenting with different algorithms and feature engineering techniques. This is especially important because there are plenty of algorithms, and knowing which algorithm works for which kind of data set is useful.
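As a rough sketch of what that experimenting can look like, the snippet below scores a few scikit-learn classifiers on one open data set with 5-fold cross-validation. The data set (the breast cancer set bundled with scikit-learn), the three models, and their settings are all assumptions picked for illustration, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# An open data set that ships with scikit-learn (illustrative choice).
X, y = load_breast_cancer(return_X_y=True)

# Candidate models; the scale-sensitive ones get a StandardScaler in front.
candidates = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "k-nearest neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

Swapping in a different data set and watching how the ranking changes is a quick way to build intuition for which algorithm suits which kind of data.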
It's loaded with useful R snippets and practical examples.
But most importantly - it's not a dump of all possible links, which would make for a daunting list that "I will never go through".
Source: I run workshops introducing ML and Big Data (http://workshops.deepsense.io/, next one in London) and I made a lot of choices converging with this one (Python + scikit-learn, everything in Jupyter Notebook, etc). Also, a lot of the links there are already in my Delicious list of things I send to friends wanting to jump into data science (and many of them were already on the HN main page).
BTW: See also discussion on the same post on DataTau: http://www.datatau.com/item?id=10093
No one gets fired for using Scikit.
Experts use all sorts of things: MATLAB, R, Python (with scikit-learn), etc.
tdaltonc said "No one gets fired for using Scikit." Maybe I read too much into this comment, but it seemed to have a negative tone. So I got the impression that tdaltonc might have more to say about it. Maybe not though!
This is a really good list of resources on machine learning and has a section dedicated to NLP/text mining.
I would love to learn ML concepts, but I really don't have the cognitive bandwidth to learn a new language that I most likely will never use in my day job (Python, Ruby, Java).
When I last looked, most of the top-quality courses used some variant of proprietary tools or MATLAB, but production code is in Python or Java (with R sometimes).
I have been having a bad day on HN, so before I get misconstrued - there is nothing wrong with MATLAB. I was just hoping for a go-to-production language like Java or Python for learning ML.
That being said, you can do ML completely in Python.
Yes - Python/pandas/scikit is pretty popular for writing production ML code. The question really is: are there any good courses? Most of the top courses I see use some variant of MATLAB to teach.