The Best Machine Learning, NLP, and Python Tutorials I’ve Found (unsupervisedmethods.com)
523 points by RobbieStats on June 26, 2017 | 24 comments

This is a fantastic list.

I've been trying to teach myself ML and AI for a while now, and though it's only tangentially relevant to the article, perhaps others can take a few tips from my experience. First, I didn't really have a breakthrough until I ditched the 'recommended' textbooks and video courses and just started picking a topic and learning by doing. Anything I don't understand or know when trying to implement it, I just google/youtube/wikipedia and keep messing with it until I know how it works at a conceptual level. That's where resources like this article really come in handy.

For the heavier math parts, I just write a short summary of the formula and what its various parameters do and try to make a mental note of it. I certainly do not try to use the formulas to solve complex mathematical problems or write an implementation in Python. I chalk this task up to `someday when I have the time`.

Finally, I'll get a dataset and try to solve various problems using the new skill I just learned using R/python & co.

That's it.

This method has ridiculously accelerated the speed at which I've been able to acquire ML/AI skills that I also know how to apply in the real world. Before, I felt like I was moving at a snail's pace.

This method might not work well for everyone, but it's at least an interesting alternative to most of the recommendations of doing A,B,C online courses and reading X,Y,Z books.

Your approach resonates with me. I would recommend Practical Deep Learning For Coders[1]. The course is taught by Jeremy Howard, who won Kaggle competitions for 2 consecutive years. His motto is to "Make Deep Learning Uncool again".

I personally found it very hands-on. It jumps into practical application right off the bat, which helps keep the motivation steady. Having said that, it's not easy or dumbed down in any sense.

[1] http://course.fast.ai/

Edit: Grammar

I will try your approach. I am currently frustrated not by the actual material but by the lack of application in my studies.

I know NLTK isn't used for learning by itself, but it's such an essential player in the realm of NLP+Python that I think it should definitely be included (and that the list would be incomplete without it). The best link is probably http://www.nltk.org/book/ which is an excellent place to get started.

You should probably also include the gensim package https://radimrehurek.com/gensim/, since it has the most popular Python word2vec implementation. One of the links I clicked on mentions it and walks you through using it, but I think it would make sense to point people to it directly in case they don't have time for a tutorial.

I work in NLP and I would really disagree that NLTK is still an essential player. Some academic courses still teach with it, but I think it's been around five years since it was the best option. I would definitely recommend gensim, but also spaCy, which is very fast and has a better documented and more extensive API than NLTK. I have nothing to do with spaCy personally (except being an enthusiastic user) but I recommend this argument: https://explosion.ai/blog/dead-code-should-be-buried

There is also a new higher-level library built on spaCy which looks good: https://textacy.readthedocs.io/en/latest/

Thanks, I tried NLTK many times and ended up dropping it because it became too frustrating to deal with the various parsers, with no explanation of why there are so many or how to pick between them.

I skimmed through spaCy briefly and it looks great.

How far can you go in spaCy with the core 50MB English model? The other parts are gigabytes in size.

The v2 models are much smaller (15MB), because neural networks. The parsing, NER and tagging are mostly okay with the 50MB model. There are only word vectors for the top 5k words though, which can be a problem.

The v2 English models are more accurate, and can assign vectors to any word, including unknown words using the context and the word shape. Overall it's much better -- but it's still in alpha. The docs are already better, though.
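To make "word shape" a bit more concrete: the idea is roughly to map each character to its class, so that unseen words with the same capitalization and digit pattern end up looking alike. A simplified sketch in plain Python follows. The `word_shape` helper is hypothetical and written only for illustration; it is not spaCy's actual implementation, which differs in its details.

```python
def word_shape(word):
    """Map each character to a class symbol: X (upper), x (lower), d (digit)."""
    out = []
    for ch in word:
        if ch.isalpha():
            # Letters keep only their case information.
            out.append("X" if ch.isupper() else "x")
        elif ch.isdigit():
            # All digits collapse to the same symbol.
            out.append("d")
        else:
            # Punctuation and other characters are kept as they are.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for w in ["Apple", "U.S.", "2017"]:
        print(w, word_shape(w))
```

Two unknown words such as "Zyxo" and "Abcd" would share the shape "Xxxx", which is the kind of signal a model can fall back on when it has never seen the word itself.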

A hundred and fifty tutorials is useless, TMI. What are the five best? ;-)

(This is an awesome resource, thank you for compiling it.)

https://github.com/spro/practical-pytorch is also worth checking out

I can't help but agree (disclaimer here).

These were written to demonstrate modern techniques with readable code, after seeing way too many indecipherable tangles of models. PyTorch plays a big part in that readability.

I actually recently started ML using a book, Jupyter and Python. But I have a hard time translating the formulas to code and graphs. I know/understand what the formulas do, but the code to write seems hard. Perhaps because I don't know Python. Any advice on this?

Ps. Only 1 day since I installed Jupyter to recreate graphs and implement formulas

Ps2. Thought about implementing C# as a kernel in Jupyter, but I think it's better to continue with Python.
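One way to practice the formula-to-code step is to write the formula in a docstring and then implement it with nothing but the standard library, before reaching for any numeric package. A minimal sketch, assuming two common formulas (the sigmoid and the mean squared error) that are just illustrative picks and not taken from the book mentioned above:

```python
import math


def sigmoid(z):
    """sigma(z) = 1 / (1 + e^(-z))"""
    return 1.0 / (1.0 + math.exp(-z))


def mean_squared_error(y_true, y_pred):
    """MSE = (1/n) * sum over i of (y_i - yhat_i)^2"""
    n = len(y_true)
    # zip pairs each true value y_i with its prediction yhat_i,
    # which plays the role of the index i in the formula.
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n


if __name__ == "__main__":
    print(sigmoid(0.0))
    print(mean_squared_error([1, 2, 3], [1, 2, 5]))
```

The pattern is usually the same: a sum over an index becomes a loop or generator expression, and each symbol in the formula becomes a named variable.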

These kinds of repositories are always helpful for bookmarking and coming back to in the future. Thanks for taking the time and effort to compile it.

Nice collection. What I am looking for in addition to these is an explanation of different deep network topologies and how to construct new solutions utilizing ever more complex structures.

Along these same lines, every time I see a new topology I wonder how much of it was inspiration and how much was trial and error.

On Twitter [1] I saw mention of GDGS: Gradient Descent by Grad Student :)

[1]: https://twitter.com/hardmaru/status/876303574900264960

There was a good link in the feed, but there seems to be fairly little theory for a general approach to solving these more advanced architectures:


What are the most valuable, unsolved problems in the field?

I'm working on using TensorFlow to improve Chinese OCR. So far I've collected 1500 fonts. Now I'm exporting all the glyphs, and I ran out of disk space yesterday. Then I'll upload the data set of all 75,000 characters here on HN where people can play with it.

Before you say that OCR is a solved problem because of Tesseract, please read this:


Sentience. ;)

Neither machine learning nor NLP nor Python will give you sentience. However...

Machine learning - no one will tell you the answer b/c they're working on it (whatever "it" is) right now. But here's a good tip from Mr. Andrew Ng:

"...almost anything a typical human can do with less than one second of mental thought, we can probably [do] now or in the near future automate using AI... Take a security guard looking at a video feed and saying, “Are there people in this? Are they doing something suspicious?” That task is actually a lot of one-second judgment thoughts strung together, so I think a lot of it can be automated."



So that's the level of current ML.

NLP - most valuable application is natural language.

I'm starting a Machine Learning channel, will publish weekly https://www.youtube.com/watch?v=5IPuNDVRhkk

Good find!

I personally think Google > this kind of laundry list.

You certainly can, but you'll spend hours like I did sifting through the good and the bad. Mine is a curated list that can hopefully save others time.
