
Ask HN: What is the best way to learn Machine Learning in Python? - karan_dev
I am comfortable with coding and familiar with Python programming.
What approach should I follow to learn machine learning in a SHORT TIME?

Should I start with a book (if yes, which one), with a machine learning library, with a project, or with complete machine learning algorithm implementations in Python?

Please provide a step-by-step guide that I should follow (with source links and references, if possible) to learn machine learning in one or two months.
======
drallison
Machine Learning is a sub-field of computer science and an area of intense
current research. It has nothing to do with Python (a programming language)
except that some machine learning algorithms might be implemented in Python.

You might find Andrew Ng's Stanford Coursera course a good place to start.
[https://www.coursera.org/learn/machine-learning/home/info](https://www.coursera.org/learn/machine-learning/home/info).

~~~
craigching
"Introduction to Statistical Learning" by Trevor Hastie et al. [1]. They have
a free online class through Stanford [2]. Sign in to their system and you can
take the archived version for free.

ISL is an excellent, free book, _introducing_ you to ML, you can go deeper,
but, to me this is where I wish I'd started. I am taking the Data Science
track at Coursera (on Practical Machine Learning now) and I am kicking myself
that I didn't start with ISL instead.

Now, I know you specifically asked about Python, but the concepts are bigger
than the implementation. All of these techniques are available in Python's ML
stack, scikit-learn, NumPy, pandas, etc. I don't know of the equivalent of ISL
for Python, but if you learn the concepts and you're a programmer of any
worth, you will be able to move from R to Python. Maybe take/read ISL, but do
the labs in Python, that might be a fun way to go.

Lastly, to go along with ISL, "Elements of Statistical Learning" also by
Hastie et al is available for free to dive deeper [3]

[1] -- [http://www-bcf.usc.edu/~gareth/ISL/](http://www-bcf.usc.edu/~gareth/ISL/)

[2] --
[https://lagunita.stanford.edu/courses/HumanitiesandScience/S...](https://lagunita.stanford.edu/courses/HumanitiesandScience/StatLearning/Winter2015/about)

[3] --
[http://statweb.stanford.edu/~tibs/ElemStatLearn/](http://statweb.stanford.edu/~tibs/ElemStatLearn/)

~~~
cschmidt
I also think this is one of the best entry level books, and the Stanford
course looks good. This is what I recommend to people. In some ways, R is a
very good match for this material, and you could move to python later.

------
88e282102ae2e5b
I don't mean to be blunt, but I don't think you're going to get what you want
out of machine learning if you still need people to give you step-by-step
instructions.

~~~
bra-ket
that's how education works

~~~
88e282102ae2e5b
OP clearly has not even googled this. "python machine learning" pulls up many
easily-accessible articles meant for beginners with no background in machine
learning. The scikit-learn website is chock full of tutorials meant for
beginners, with code examples!

How is someone with this little motivation going to learn something so
complex? I want to allocate my time helping people who at least try first.

~~~
icpmacdo
I don't think Hacker News is a bad place to ask a question like this. The most
helpful answers on a popular post will filter out a lot of the low-quality
content that you're going to come across with Google.

------
jawns
Step-by-step guide, with source links and references?

I don't think that's necessarily something one can (or ought to) expect to
order up on Hacker News.

~~~
theseatoms
This. "Please provide ..." is pretty demanding. At least be courteous when
asking for free labor.

That said, I'm also interested in the topic. As others are acknowledging, it's
a broad field and one really needs to focus on well-defined projects in order
to learn anything tangible.

~~~
jmount
I've seen the phrase "please provide" a lot. And I agree with you, I've never
seen it actually used politely; it always comes off like a strong demand.
Maybe part of the problem is that it is a stock phrase on exams.

~~~
chc
I suspect that a lot of them might not be native English speakers (e.g. OP's
username suggests that he is from India, where a lot of people speak English,
but it is relatively few people's native tongue).

~~~
karan_dev
Yes, I am from India, and it was my first question on Hacker News. Apologies
if I offended you with my language; I should not have used that combination
of words.

~~~
theseatoms
Hey, sorry about calling you out. My bad.

------
scuba_man_spiff
I enjoyed this book; you may want to check it out:

[http://www.amazon.com/Machine-Learning-Python-Techniques-Pre...](http://www.amazon.com/Machine-Learning-Python-Techniques-Predictive/dp/1118961749)

The main thing to understand though is that machine learning is a big topic,
and you aren't going to be able to become an expert in two months.

Narrow down to a specific area, or type of problem, and focus on learning
techniques and tools for that.

My guess is that there's something you're working on or want to work on, which
is why you want to learn. If that's the case, I'd recommend that you read up a
bit to give yourself a good understanding of the different kinds of problems
out there (classification, prediction, anomaly detection, etc.) and the
different classes of tools available, and then pick a simple real-world
problem to tackle that is similar.

The best way to really learn is going to be getting hands-on with a project
and suffering through it after you've read up a bit to understand the basics.
Then, when you hit something you can't wrap your head around, search and read
articles (or talk to someone with experience and expertise) until it clicks
and you can keep working through it.

By the end you'll have a good grasp of at least one technique, and be in a
great place to keep learning more.

------
chipmonkey75
My personal preference is to learn by doing, and the best place I've found for
this particular task is Kaggle
([http://www.kaggle.com](http://www.kaggle.com)). They have a variety of
datasets and scored data mining tasks, great forums for every level, code
examples, and even a set of tutorials specifically for learning scikit-learn
(one of Python's machine learning libraries):
[http://blog.kaggle.com/2015/04/08/new-video-series-introduct...](http://blog.kaggle.com/2015/04/08/new-video-series-introduction-to-machine-learning-with-scikit-learn/)

I don't have any relationship with Kaggle other than being a semi-active user,
but I really dig what they've got going. For a step-by-step approach, start
with their blog posts and work on their "Getting Started" competitions.
Everything you need is there.

~~~
apeeyush
I created a GitHub repo
([https://github.com/apeeyush/machine-learning](https://github.com/apeeyush/machine-learning))
to store and organize the code I used in Kaggle contests (mainly knowledge
contests). Recently, I have participated in some vision and CTR prediction
contests as well, but I have not added that code here since it is still very
hacky. I would really appreciate any contributions from the community.

~~~
craigching
That's a nice repository, thanks for sharing! I'll be combing through that as
I make the transition from R to Python ;)

------
loumf
I liked "Programming Collective Intelligence"
[http://www.amazon.com/Programming-Collective-Intelligence-Bu...](http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325),
but it might be a little dated (in not using the latest libraries). It's a
good way to learn some simple algorithms (optimization, clustering).
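
To give a feel for what a "simple algorithm" in the clustering family looks like, here is a minimal k-means sketch in plain Python. It is not taken from the book. The function name `kmeans`, the toy points, and the naive choice of the first k points as starting centroids are all assumptions made for illustration; real implementations usually pick starting centroids more carefully.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on a list of (x, y) tuples.

    Returns (centroids, labels). Starting centroids are simply the first
    k points, which is a simplification made for this sketch.
    """
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels


# Hypothetical toy data: two small, well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, labels = kmeans(points, 2)
print(labels)
print(centroids)
```

Writing the two steps out by hand like this is mostly useful for understanding; for real work a library implementation is the safer choice.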

Also, rather than learning ML in 2 months (which is a very unfocussed and
unattainable goal) -- try to narrow it down to some problem domain. You'd get
better recommendations if you are more specific.

------
rapid_snail
Machine learning is a very large field - you shouldn't expect to learn it in
one or two months. Maybe you will be able to scratch the surface and learn to
implement a few learning algorithms.

I would recommend starting with scikit-learn.

------
sozerberk
[1] -- First, you need to learn machine learning (ML) basics. Andrew Ng's
course on Coursera is a good start:
[https://www.coursera.org/learn/machine-learning/home/info](https://www.coursera.org/learn/machine-learning/home/info)

It doesn't teach you ML with Python, but it is extremely important to learn
the ML concepts without any programming language in mind. In addition to that
course, Google searches will help you a lot. There are a lot of good
explanations of ML concepts on various websites. If you don't understand how
the algorithms work, you will end up copying and pasting example code without
knowing what you're doing. You need to picture what you want to do in your
head before you type a single letter.
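
As one illustration of understanding an algorithm before reaching for a library, here is a rough sketch of fitting a straight line by gradient descent in plain Python. The function name `fit_line`, the learning rate, the step count, and the toy data are assumptions chosen for this example, not something prescribed by the course.

```python
def fit_line(xs, ys, learning_rate=0.01, steps=5000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")
```

Once the loop above makes sense, a library call that does the same job is much less of a black box.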

[2] -- Once you have the initial introduction, you can use Python to implement
ML concepts. Fortunately, Python has a very easy-to-learn ML package:
Scikit-learn ([http://scikit-learn.org](http://scikit-learn.org)). It's free
and is used by various companies such as Spotify and Evernote. Scikit-learn
has great documentation and many examples that will make the whole learning
process exciting.

[3] -- After you feel comfortable with ML in Python, if you don't have
datasets of your own, you can find a lot of datasets on UC Irvine's machine
learning repository:
[http://archive.ics.uci.edu/ml/](http://archive.ics.uci.edu/ml/)

The more you practice, the more comfortable you will feel playing with data.
To cover an ML technique well, try every parameter of the scikit-learn
functions for that technique while keeping the dataset fixed. Also, always try
to visualize the data (scikit-learn has matplotlib examples you can learn
from) so you can actually see how the results change when the parameters of
the function change. This will make everything a lot easier.
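
The "same dataset, vary one parameter" habit might look roughly like the sketch below. The choice of the bundled iris dataset, the k-nearest-neighbours classifier, and the particular `n_neighbors` values are assumptions made for illustration; any scikit-learn estimator and parameter could be swapped in.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Small dataset that ships with scikit-learn, so nothing is downloaded.
X, y = load_iris(return_X_y=True)

scores = {}
for n_neighbors in (1, 3, 5, 10, 20):
    model = KNeighborsClassifier(n_neighbors=n_neighbors)
    # Mean accuracy over 5 cross-validation folds for this setting.
    scores[n_neighbors] = cross_val_score(model, X, y, cv=5).mean()

for n_neighbors, score in scores.items():
    print(f"n_neighbors={n_neighbors:>2}  mean accuracy={score:.3f}")
```

Plotting `scores` with matplotlib would be the natural next step, so the effect of the parameter is visible at a glance.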

Good luck!

------
century19
There is an edX course running that covers Machine Learning with Python, though
it does require "...familiarity with basic machine learning concepts".

"All exercises will use PySpark, but previous experience with Spark or
distributed computing is NOT required. "

[https://www.edx.org/course/scalable-machine-learning-uc-berk...](https://www.edx.org/course/scalable-machine-learning-uc-berkeleyx-cs190-1x)

------
bit2mask
Coursera Machine Learning is one of the best places to start, but there are
countless resources. There's a huge, wonderful open list of links on
GitHub [1]. I definitely recommend you take a look at it.

There are also other great resources online, like those I list below:

1.) In-depth introduction to machine learning in 15 hours of expert videos[2]

2.) Deep Learning Tutorial (@ ufldl.Stanford.edu/tutorial/, can't post the
link because I'm out of mana, I mean, not enough reputation yet)

[1]: [https://github.com/josephmisiti/awesome-machine-learning](https://github.com/josephmisiti/awesome-machine-learning)

[2]: [https://www.dataschool.io/15-hours-of-expert-machine-learnin...](https://www.dataschool.io/15-hours-of-expert-machine-learning-videos/)

------
burningion
I just started going down this path. I began with using audio analysis to do
some machine learning. (Detecting a specific audio pattern very easily
recognizable to humans). Can't get too specific about it, as it's under NDA.
But I had a little under two weeks to get a prototype built that either proved
or disproved it would be possible.

The very first thing I did was take a step back and understand the domain of
the data I was working with, and what the best way to present it for machine
learning would be. In my case, I had to understand what the best format for
presenting my audio would be (slightly modified MFCCs), and what the best
library would be to get my data in that format.

Next, I needed to build a data set of proper training data. This meant I had to
manually build a (largish) data set that matched exactly what I was looking
for. So I went and downloaded a bunch of example audio, and then manually went
through it, tagging it into the two bins I was looking to differentiate
between.

Once I had this, (which actually took much more time than the learning
itself), I was ready to do the actual machine learning itself. I used Theano,
and figuring out how to translate my dataset into a format digestible by
Theano took another chunk of time. Once I had my data in the proper format for
Theano, it came down to basically playing with how I presented my initial data
to Theano, and then tweaking my gradient.

Finally, I was able to train and get a net that was about 80% right with my
hypothesis. There were a few edge cases I hadn't anticipated that wouldn't
necessarily work well, but it gave us enough confidence to go through with
more machine learning for our project.

So, takeaway suggestions: find a real project, something you want to learn,
and then just do it. Gather knowledge of your data, build a dataset, and test
a hypothesis. Most of this isn't machine learning, it's mostly just moving and
shaping data, and knowing what in your data is significant. The machine
learning algorithms are really just a tiny piece of the whole picture. Good
luck.
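
The overall shape of that workflow (labelled data, a held-out split, a simple model, a measured accuracy) can be sketched in plain Python. Everything below is a stand-in: the synthetic two-feature data replaces the hand-tagged audio, and a nearest-centroid classifier replaces the Theano network, so the names `make_dataset`, `train_centroids`, and `predict` are hypothetical helpers, not the code described above.

```python
import random


def make_dataset(n_per_class=100, seed=0):
    """Synthetic stand-in for a hand-labelled dataset of (features, label)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_per_class):
        data.append(((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)), 0))
        data.append(((rng.gauss(4.0, 1.0), rng.gauss(4.0, 1.0)), 1))
    rng.shuffle(data)
    return data


def train_centroids(train):
    """Average the feature vectors of each class."""
    sums, counts = {}, {}
    for (x, y), label in train:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {
        label: (sx / counts[label], sy / counts[label])
        for label, (sx, sy) in sums.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the features."""
    x, y = features
    return min(
        centroids,
        key=lambda label: (x - centroids[label][0]) ** 2
        + (y - centroids[label][1]) ** 2,
    )


data = make_dataset()
split = int(0.8 * len(data))           # 80% for training, 20% held out
train, test = data[:split], data[split:]
centroids = train_centroids(train)
correct = sum(predict(centroids, features) == label for features, label in test)
accuracy = correct / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Note how little of the script is the "learning" part; most of it is building, shuffling, and splitting data, which matches the point made above.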

------
pjungwir
Machine learning is a pretty big field. The Coursera course is very good. It
uses Octave not Python, but what you learn will be easy to transfer. It is
mostly focused on neural networks. If you don't already know linear algebra
you should probably learn that first.

These are three very good O'Reilly books that all use Python:

\- Programming Collective Intelligence: A broad and shallow survey of
automated machine learning techniques.

\- Data Analysis with Open Source Tools: Also a survey. More focused on manual
data exploration.

\- Python for Data Analysis: A pandas tutorial (and more). Very helpful to
learn the ML tools in the python ecosystem.

Fitting all that into two months sounds challenging.

------
gavinh
Shameless plug: [http://www.amazon.com/Mastering-Machine-Learning-With-
scikit...](http://www.amazon.com/Mastering-Machine-Learning-With-scikit-
learn/dp/1783988363)

------
jmount
Which do you want to know? (as it affects the answer greatly). And how short
is "SHORT TIME"? For certain small values of "SHORT TIME" the answer is come
back when you have more time.

How to apply machine learning using Python? (then scikit learn related
materials).

How to tinker with machine learning implementations? (then which one are you
trying to tinker with and what problem that isn't solved in the standard
libraries is your concern?)

The theory of machine learning? (then "The Elements of Statistical Learning"
and "An Introduction to Statistical Learning", but that is in R not in Python)

------
selleck
Russell and Norvig's Artificial Intelligence: A Modern Approach:

[http://www.amazon.com/Artificial-Intelligence-Modern-
Approac...](http://www.amazon.com/Artificial-Intelligence-Modern-
Approach-3rd/dp/0136042597/ref=sr_1_1?ie=UTF8&qid=1437143592&sr=8-1&keywords=norvig+ai)

Has plenty of examples in Python. You can also look at different Udacity
courses. They have a couple dealing with ML with Python.

~~~
craigching
I have Norvig's book, I'm not sure I'd recommend that as an _introduction_ ;)
Awesome book though!

~~~
plinkplonk
AIMA is about as introductory as these texts get (while still being valuable).
It is an undergrad textbook, after all.

~~~
craigching
I guess my point is that it's such a broad overview of all topics that fall
under artificial intelligence that you don't get much of a good introduction
to applying machine learning. But point taken, you're right, it is an
introductory text.

------
armabiz
From my favorites list, very easy introduction tutorial about ML, good as
starting point:

[http://radimrehurek.com/data_science_python/](http://radimrehurek.com/data_science_python/)
\- Practical Data Science with spam detection example (Machine Learning, NLP,
sklearn, Python).

------
hootguy
O'Reilly's publishing Introduction to Machine Learning with Python by Sarah
Guido and Andreas Mueller in January 2016.

[http://shop.oreilly.com/product/0636920030515.do](http://shop.oreilly.com/product/0636920030515.do)

------
Isamu
I've been looking at this deep learning tutorial in Python:

[http://deeplearning.net/tutorial/contents.html](http://deeplearning.net/tutorial/contents.html)

Has the advantage of a Python framework (Theano) specifically for deep
learning.

------
ldom22
use lolpython
[https://en.wikipedia.org/wiki/LOLCODE](https://en.wikipedia.org/wiki/LOLCODE)

------
andreasvc
Check out scikit-learn and its excellent documentation.

~~~
MrLeap
Its excellent documentation has been down for 24 hours due to a failure at
sourceforge. :(

[https://twitter.com/sfnet_ops?original_referer=http%3A%2F%2F...](https://twitter.com/sfnet_ops?original_referer=http%3A%2F%2Fscikit-
learn.org%2F&profile_id=22783784&tw_i=621859945487581184&tw_p=embeddedtimeline&tw_w=347335110670049280)

~~~
andreasvc
alternative: apt-get install python-sklearn-doc

