Show HN: Python Machine Learning – A Crash Course (github.com)
350 points by irsina 39 days ago | 20 comments

The tutorials introduce how the different algorithms work, but the code just uses the libraries rather than implementing what's in the library from primitive operations.

For me, this type of tutorial doesn't stick. I found the explanations in Joel Grus' book, which were accompanied by succinct, idiomatic Python implementations of the algorithms, much easier to understand.

Really nice series of tutorials that are code-focused! I think one of the things that surprised me most about ML is how simple a lot of the code is.

The code itself is indeed simple, thanks to the combined efforts of very smart and capable researchers and developers across the world. But the time taken to write the actual code to perform ML is negligible compared to:

- choosing the right algorithm(s) for the specific task and data at hand

- tuning hyperparameters

- interpreting preliminary results and/or the output of statistical tests

- chasing down and cleaning data(!!)

- deploying the resulting ML pipeline into something more maintainable than a GitHub repo of inconsistently named Jupyter Notebooks strewn with hardcoded paths to CSV files and dangerously obsolete documentation written by employees who have since left your company.

Do you have any references that could help with 'choosing the right algorithm'? That seems to be something that comes with experience, or knowing how a particular algorithm works, or more importantly, does not work.

One possible strategy is:

1. Find out what kind of information is wanted from the model and what kind of data you have available that could help with this. 1b) Note down any other requirements for the solution.

2. Find out how to formulate the information needs in terms of a well-understood and well-researched problem.

3. Find out what the current best-performing and well-understood solutions are for this problem, usually through a literature search.

4. Choose a few candidates, and rank them with respect to your particular requirements.

5. Test out 1-3 of them.

This is a business-needs-first, top-down type of method. Prior knowledge of the details of a lot of algorithms is not needed. The ability to understand terminology and quickly identify relevant material and papers is critical. A good overview of common problem formulations and methods will build up over time, and that speeds things up immensely.
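Steps 4 and 5 of the strategy above can be sketched in code. This is only a minimal illustration, assuming a made-up two-cluster toy dataset and two stand-in candidates (a majority-class baseline and 1-nearest-neighbour). Every function name here is hypothetical and not taken from the tutorials or from any library; in practice a library's cross-validation utilities would normally replace the hand-rolled loop.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_majority(train_X, train_y):
    """Baseline candidate: always predict the most common training label."""
    label = max(sorted(set(train_y)), key=train_y.count)
    return lambda x: label


def fit_nearest_neighbour(train_X, train_y):
    """Second candidate: 1-nearest-neighbour with squared Euclidean distance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in train_X]
        return train_y[dists.index(min(dists))]
    return predict


def cross_val_accuracy(fit, X, y, k=5):
    """Mean held-out accuracy of one candidate over k folds."""
    fold_scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in fold)
        fold_scores.append(correct / len(fold))
    return sum(fold_scores) / k


def make_toy_data(per_class=20, seed=1):
    """Assumed toy data: two well-separated 2-D Gaussian clusters."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 5.0)):
        for _ in range(per_class):
            X.append((rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)))
            y.append(label)
    return X, y


X, y = make_toy_data()
candidates = {
    "majority baseline": fit_majority,
    "1-nearest-neighbour": fit_nearest_neighbour,
}
# Score every candidate on the same folds, then rank best-first.
scores = {name: cross_val_accuracy(fit, X, y) for name, fit in candidates.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

Keeping a trivial baseline in the ranking is a deliberate choice: if a candidate cannot clearly beat it on held-out data, the problem formulation from step 2 probably needs another look before any tuning.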

This one has been recommended to me a lot: https://scikit-learn.org/stable/tutorial/machine_learning_ma...

But I agree, I wish there were more resources on that; it seems to be just a trial-and-error process.

For my final project in control engineering I was tasked with writing a couple of ML algorithms and ranking them to suggest the best one. The code I was working on was not good at all and all the data was being processed in Matlab, so I spent the whole project refactoring and proposing a viable way to use Python as a math backend for the Java application.

Anyway, I had to research the subject a little, and what I found is that there isn't a straightforward approach to choosing algorithms. I could be mistaken, but I believe the best approach for you would be to get intimate knowledge of every ML algorithm and maybe use a cheat sheet to guide you. Ultimately, though, knowing your data set well (distribution of classes, occurrences, which traits are better for which categorization you want to do, etc.) will take you farther than looking for a recipe for choosing an algorithm.
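As a small illustration of the "know your data set" point, here is a sketch of checking the class distribution before picking anything. The helper names and the example labels are made up for illustration; this is just one of many first checks, not a recipe.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the data set, most common first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}


def imbalance_ratio(labels):
    """Ratio of the largest class to the smallest (1.0 means balanced)."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


# Hypothetical labels: three "ok" samples for every "fault" sample.
labels = ["ok", "ok", "ok", "fault"] * 25
print(class_distribution(labels))
print(imbalance_ratio(labels))
```

A strongly skewed ratio is a hint that plain accuracy will be misleading and that the choice of metric and algorithm should account for the rare class.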

I wonder if there is some course that would cover a bit more advanced topics in a comprehensible manner. There are hundreds of courses/books/tutorials that cover pretty much the same stuff again and again.

General ML: supervised vs unsupervised, K-means clustering, linear regression, logistic regression, maybe several ensemble learning methods based on trees.

NNs: backpropagation, gradient descent, TensorFlow, a bit about hyperparameter selection, CNNs (basically, just ImageNet), sometimes RNNs are mentioned.

This is all pretty entry-level and covered many times over, but, surprisingly, that's pretty much it. Discussion of models pretty much stops at ImageNet. I rarely see RBMs or autoencoders, and pretty much nothing about how real problems are encoded into inputs and outputs.

I am ashamed to admit it, but I still don't really understand how AlphaZero, AlphaStar or various language models (GPT, BERT) really work. Is there something good on that, maybe?

Speaking of machine learning, I love the TensorFlow Docker images. I got TensorFlow running with an IPython UI in less than 2 minutes with a single command, just 5 minutes ago:

>> docker run -it --rm -v $(realpath ~/notebooks):/tf/notebooks -p 8888:8888 tensorflow/tensorflow:latest-py3-jupyter

You can actually use the nightly-gpu-py3-jupyter tag if you'd like GPU-enabled TensorFlow as well :D

Yup, though it also needs the nvidia-docker image for the driver. Still, I got it working in less than 10 minutes too!

Really nice! One suggestion would be to have a list of resources for slightly more theoretical materials, so curious students can be exposed to deeper parts of ML (of course, such materials need links to applied tutorials like this). Perhaps this could be done through pull requests from the community.

Nice tutorials and reference resources.

One note on something that may be confusing for beginners: why choose one fitting method (e.g. least squares) over another (e.g. gradient descent)? This matters especially because libraries often pack a lot of alternatives.
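To make the question above concrete, here is a minimal sketch for simple one-variable linear regression. Both functions minimise the same squared error: one solves for the answer directly (the closed-form least-squares solution), the other gets there iteratively with gradient descent. The function names, the tiny data set, the learning rate and the step count are all assumptions chosen for illustration, not anything taken from the tutorials.

```python
def fit_closed_form(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, solved directly."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_gradient_descent(xs, ys, lr=0.01, steps=5000):
    """Minimise the same mean squared error by repeated gradient steps."""
    n = len(xs)
    slope = intercept = 0.0
    for _ in range(steps):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        slope -= lr * grad_slope
        intercept -= lr * grad_intercept
    return slope, intercept


# Made-up data lying roughly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

print("closed form:     ", fit_closed_form(xs, ys))
print("gradient descent:", fit_gradient_descent(xs, ys))
```

On a small problem like this the two agree closely, and the closed form is simpler and exact. Gradient descent needs a learning rate and a stopping rule, but it carries over to models with no closed-form solution and to data sets too large to solve in one go, which is roughly why libraries offer both.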

I swear an AI crash course gets posted to HN every other day..

I have to agree, actually. If one is actually serious about wanting to learn this stuff, he really should just do a Google search; there doesn't really need to be any more of these.

It certainly does seem that way, especially with Python. I wouldn’t be opposed to a fun tutorial in a different language. Also not trying to be pedantic (is anyway) but ... should always have three .’s. (I learned that recently)

Not trying to be pedantic, but the "Also" in your third sentence should be followed by a comma.

Wait, that was annoying and didn't contribute to the discussion. Huh, really makes you think.

Hey, well, at least my comment also added input to the conversation.

You could have let it be, but instead you derailed things further. All good, I learned something new about ellipsis + periods from someone else’s response.

Also not trying to be pedantic, but… it should be just one character (ellipsis).

Actually, if it's at the end of a sentence, it should have four '.'s. Three for the ellipsis, and one for the end of the sentence....
