

Machine Learning with scikit-learn - derpapst
http://amueller.github.io/sklearn_tutorial/#/

======
ColinWright
I cannot tell you how much I hate these drip-feed presentations. There isn't
even an indication of how long it goes for. The early stuff is obvious (for
me) - how many times do I have to click to get to the interesting bits?

There might be some great stuff here, but much of your potential audience will
never find out, because they'll give up.

~~~
jkldotio
In most reveal.js-based presentations you can press "Esc" to get a slide
overview, although that's of limited use. Slides in general are simply not as
useful as a reasonable web page unless the talk itself comes along with them
as audio or video. It's a peculiar part of HN culture to link to just slides;
I don't know of any other group that does it.

A far better introduction to scikit-learn is the project's examples page,
[http://scikit-learn.org/stable/auto_examples/index.html](http://scikit-learn.org/stable/auto_examples/index.html),
which you can use to get example code and data to generate each type of graph.
The documentation on the rest of the site is also of very high quality.

------
blauwbilgorgel
Andreas Mueller is one of the core devs of scikit-learn.

He is active on Kaggle.com too.

For more practical ML projects see:
[https://github.com/amueller](https://github.com/amueller)

~~~
toisanji
He has only done one competition, as noted in his slides.

------
sjtgraham
I know what I'm doing tonight. Great idea including sample data to play with
in the library! Is that the MNIST data set?

~~~
dhammack
Here's how to get MNIST, from the sklearn docs:

>>> from sklearn.datasets import fetch_mldata
>>> mnist = fetch_mldata('MNIST original', data_home=custom_data_home)

I think the handwritten digits dataset used in the presentation is just a
subset of MNIST; MNIST is 28x28 and the handwritten digits are 8x8.
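
For anyone following along today, here is a minimal sketch of loading the small 8x8 digits that ship with scikit-learn, which needs no download. Note that `fetch_mldata` was removed in later scikit-learn releases. `fetch_openml("mnist_784")` is the usual replacement for full MNIST. It appears only as a comment here because it needs network access, and the `version=1` argument is an assumption about which OpenML copy you want.

```python
from sklearn.datasets import load_digits

# The bundled digits set: small 8x8 grayscale images, no network required.
digits = load_digits()

print(digits.images.shape)  # (n_samples, 8, 8): one 8x8 image per sample
print(digits.data.shape)    # (n_samples, 64): the same images, flattened
print(sorted(set(digits.target)))  # the class labels

# Full 28x28 MNIST (downloads from OpenML, so it is left commented out):
# from sklearn.datasets import fetch_openml
# mnist = fetch_openml("mnist_784", version=1)
```

The flattened `digits.data` array is what you would normally pass to an estimator's `fit` method.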

------
tucson
I'd like to know more about slide 6:

[http://amueller.github.io/sklearn_tutorial/#/6](http://amueller.github.io/sklearn_tutorial/#/6)

Why the [Classification][100K sample?] checkpoint?

And more info in general about this whole cheat-sheet.

~~~
IanCal
SGD is _fast_ but not necessarily more accurate. If you've got lots and lots
of data, then a simple yet fast approach is likely to be a good choice to
start with.
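
To make that concrete, here is a rough sketch of a linear classifier trained with SGD. It uses the small bundled digits set only so it runs quickly, which is nowhere near the 100K-sample regime the cheat sheet is talking about. The split ratio, scaling step and hyperparameters are illustrative assumptions, not recommendations from the slides.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small stand-in dataset; the SGD advice is really aimed at much larger data.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# SGD is sensitive to feature scale, so standardize before fitting.
clf = make_pipeline(
    StandardScaler(),
    SGDClassifier(max_iter=1000, tol=1e-3, random_state=0),
)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print("held-out accuracy: %.3f" % accuracy)
```

For data that does not fit in memory, `SGDClassifier` also offers `partial_fit`, which lets you feed it one mini-batch at a time.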

~~~
tucson
Thanks.

I also found a blog post that features the diagram, with some background info:

[http://peekaboo-vision.blogspot.de/2013/01/machine-learning-...](http://peekaboo-vision.blogspot.de/2013/01/machine-learning-cheat-sheet-for-scikit.html)

~~~
alok-g
And a prior HN discussion:
[https://news.ycombinator.com/item?id=5831512](https://news.ycombinator.com/item?id=5831512)

There are many others too, but with few points and comments.

