I'll add my thoughts on the topic of "how do you know when you're out of the danger zone and can start marketing yourself as a machine learning expert?"
Do competitions on Kaggle (or find them on other sites, but Kaggle is definitely the best place to start). Once you get past the point where you are finishing in the middle of the pack (multiple top 10% or 25% finishes and maybe a prize win) then you are an expert. That is proof that you are separate from the hackers who just throw scikit-learn algorithms at a matrix. The people in the master tier use clever feature engineering and/or code up custom learning algorithms to get themselves above the masses. Looking at a problem and figuring out the correct modeling approach is what the experts do. They don't just create a data frame and run down the list of classification algorithms that they have access to. Read the "No Free Hunch" reports on how the winners did it and you'll quickly see the difference between yourself and the experts.
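To make the feature engineering point concrete, here is a minimal sketch on a toy problem. It is not taken from any Kaggle solution: the data set (scikit-learn's `make_circles`), the choice of logistic regression, and the helper name `mean_cv_accuracy` are all my own assumptions for illustration. The idea is that the same plain classifier goes from roughly guessing to nearly perfect once it gets one feature that matches the shape of the problem.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy data (an assumption for illustration): two concentric rings,
# which no straight line through the raw (x, y) coordinates can separate.
X, y = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=0)


def mean_cv_accuracy(features, labels):
    """Mean 5-fold cross-validated accuracy of a plain logistic regression."""
    model = LogisticRegression()
    return cross_val_score(model, features, labels, cv=5).mean()


# "Throwing the algorithm at the matrix": raw coordinates only.
raw_score = mean_cv_accuracy(X, y)

# Engineered feature: squared distance from the origin, which is what
# actually distinguishes the inner ring from the outer ring.
radius_sq = (X ** 2).sum(axis=1, keepdims=True)
X_engineered = np.hstack([X, radius_sq])
engineered_score = mean_cv_accuracy(X_engineered, y)

print(f"raw features:        {raw_score:.2f}")
print(f"with radius feature: {engineered_score:.2f}")
```

Real competition data is never this clean, but the pattern is the same: understanding the structure of the problem often matters more than the choice of algorithm.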
I'll wait a bit in case you want to add a note on this in your own words (via PR). Otherwise, tonight or tomorrow I'll paraphrase you or something. Whether you or I make the change I want it in a branch, and then I'll try to get a bit of review for that branch ...
THANKS AGAIN, the guide really needs some insight like this.
The guide's primary recommended course is Andrew Ng's Machine Learning course. The current session started November 2nd, and you must enroll by the 7th. Another session starts November 30th.
It's not time-sensitive at all. After the course has ended, you can still enter the course and do everything; you just won't get the certificate (which doesn't matter).
What I had in mind is that some people get a lot from the community features on Coursera, which are more active while a class is in session. So that's all I meant.
If the author is here, thank you very much for providing this. I wanted to look into Jupyter and machine learning, and this is probably the right way to start. I tried the Udacity course on machine learning (Python, scikit-learn), but it is not my way of learning things, since I like to fiddle around instead of going the straight way. If anyone is interested in an alternative, check out the Udacity course: https://www.udacity.com/course/machine-learning-supervised-l...
Like any topic/skill, it can be learnt, but only if you spend significant time and effort by doing projects, exercises, asking questions (stackexchange, etc.). It's very important to pay attention to fundamentals and thinking from scratch rather than mastering a laundry list of tips/tricks, because fundamental ideas can be composed in different ways and adapted to a new situation. The fundamentals here would be probability, statistics, linear algebra, optimisation.
I started my career in machine learning with absolutely no knowledge of it. It is definitely something that you can learn on the job. You do need a background in linear algebra and statistics to understand the theory behind the different algorithms, which will help you decide which algorithm to choose (SVM vs Random Forest, for example).
As suggested in the other comment, the best place to start is probably by working on projects with open data sets. Try experimenting with different algorithms and feature engineering techniques. This is especially important because there are plenty of algorithms, and knowing which algorithm works for which kind of data set is useful.
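For anyone wondering what "experimenting with different algorithms" looks like in practice, here is a rough sketch using scikit-learn. The data set (the breast cancer data bundled with scikit-learn), the 5-fold cross-validation, and the default hyperparameters are my own choices for illustration, not a recommendation for any particular problem.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# A small open data set that ships with scikit-learn (no download needed).
X, y = load_breast_cancer(return_X_y=True)

# Two candidate models. The SVM is sensitive to feature scale, so it gets
# a scaling step; the random forest does not need one.
candidates = {
    "SVM (RBF kernel, scaled inputs)": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Score each candidate with the same 5-fold cross-validation.
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

Running the same comparison on a few different data sets (and with and without the scaling step) is a quick way to build intuition for which algorithm suits which kind of data.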
It's a nice list of resources for starting. General tools he mentions are both easy to start and are used in practice; also, I like the overview part.
But most importantly, it's not a dump of all possible links, the kind of daunting list that "I will never go through".
Source: I run workshops introducing ML and Big Data (http://workshops.deepsense.io/, next one in London) and I made a lot of choices that converge with this one (Python + scikit-learn, everything in Jupyter Notebook, etc). Also, a lot of the links there are already in my Delicious list of things I send to friends wanting to jump into data science (and many of them were already on the HN main page).
I'm curious if you can speak more to this, or share any resources about it. It seems clear that scikit-learn is a good fit for this kind of hacking-learning. If there's a way I can throw in a sentence (with link to more detail), giving context about where it sits in the eyes of experts ... Would be nice.
What is there to be worried about? scikit-learn is a solid, tested implementation of most machine learning algorithms. If you're doing work in Python and want to run your data through a standard ML algorithm, and the algo is implemented by scikit-learn, then just use scikit-learn. If it isn't implemented by scikit-learn, you find some other implementation or implement it yourself.
Experts use all sorts of things: MATLAB, R, Python (with scikit-learn), etc.
What you're saying -- actually every sentence of your comment -- was my existing impression.
tdaltonc said "No one gets fired for using Scikit." Maybe I read too much into this comment, but it seemed to have a negative tone. So I got the impression that tdaltonc might have more to say about it. Maybe not though!
I'd love one such list about AI in general and other sub-fields like NLP/Computational Linguistics as well. I've recently started the Berkeley AI course on EdX along with Russell & Norvig's standard textbook. :)
How does an academic introduction to and study of Machine Learning compare to a self-taught one? I know it's a shallow question, but there has to be some sort of line where the difference opens and closes opportunities.
Are there any quality ML courses (of Norvig or Ng quality) that use Python or Java?
I would love to learn ML concepts, but I really don't have the cognitive bandwidth to learn a new language that I will most likely never use in my day job (Python, Ruby, Java).
When I last looked, most of the top quality courses used MATLAB or some other proprietary tool, but production code is in Python or Java (with R sometimes).
I agree - the problem is that there are some problems at my work that I can probably solve by applying some concepts of ML. But I don't think I can do that through MATLAB.
I have been having a bad day on HN, so before I get misconstrued - there is nothing wrong with MATLAB. I was just hoping for a go-to-production language like Java or Python for learning ML.
1. I don't have MATLAB, and I don't want to buy it.
2. When you go into production (say, predicting top customers for an ecommerce site), you are not going to run MATLAB on the server.
Yes - Python/pandas/scikit-learn is pretty popular for writing production ML code. The question really is: are there any good courses? Most of the top courses I see use some variant of MATLAB to teach.