End to End Machine Learning Pipeline Tutorial (spandan-madan.github.io)
377 points by tancik 99 days ago | 24 comments

Reading articles like this, written by people who want to share their fabulous domain knowledge free of charge, really is the reason why I read Hacker News. Thank you. I hope I will have the time to read through it all carefully and later use it in my own projects.

And people like you, who take time out to read and learn, are exactly the reason why people like me write such articles! Absolutely thrilled that people liked it and that I will be contributing to people learning this beautiful field of science I do research in :)

Thanks for putting so much time and effort into this. This is definitely not "Yet-another-intro-to-ML".

I'm gonna go through it as well. That was a significant amount of work you put in.


Hi! This is a really great page, I love reading it. Just a few tips:

The for loops in your code can be made more concise: instead of

  for i in range(len(movies_with_overviews)):
you can write

  for movie in movies_with_overviews:
Also, at around In[82], you don't declare Y, but still reference it at the train-test split. Another way to do the train-test split is to use the train_test_split function in scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.mod...
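
To make the two tips above concrete, here is a minimal sketch. The records in movies_with_overviews are made-up placeholders, not the tutorial's data, and simple_train_test_split is a hypothetical standard-library helper written for illustration. It is not scikit-learn's function, which offers more options.

```python
import random

# Placeholder records standing in for the tutorial's movie list (assumed shape).
movies_with_overviews = [
    {"title": "Movie A", "overview": "A heist goes wrong."},
    {"title": "Movie B", "overview": "Two friends take a road trip."},
    {"title": "Movie C", "overview": "A detective reopens an old case."},
    {"title": "Movie D", "overview": "A robot learns to paint."},
]

# Iterate over the items directly instead of over range(len(...)).
titles = [movie["title"] for movie in movies_with_overviews]

# When the index is also needed, enumerate supplies it.
indexed_titles = [(i, movie["title"]) for i, movie in enumerate(movies_with_overviews)]


def simple_train_test_split(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists.

    Hypothetical helper for illustration only.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


train, test = simple_train_test_split(movies_with_overviews, test_fraction=0.25)
```

Shuffling before splitting matters when the data is ordered (for example by genre or release date), since otherwise the test set would not be representative.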

These are the tutorials that depict the reality of a machine learning career. Everyone broadly understands that data preparation is the key, but few realize what that involves. Half of this tutorial is just about getting and prepping data for training. Kudos!

To quote one of the greatest professors in ML, Pedro Domingos: "First-timers are often surprised by how little time in a machine learning project is spent actually doing machine learning. But it makes sense if you consider how time-consuming it is to gather data, integrate it, clean it and pre-process it, and how much trial and error can go into feature design. ... Learning is often the quickest part of this, but that’s because we’ve already mastered it pretty well! Feature engineering is more difficult because it’s domain-specific, while learners can be largely general-purpose."

This is so, so helpful. It would take me months to gather resources to learn this stuff, and I wouldn't even know what to look for. To the author: please share more content if your valuable time permits.

Working on them already! The next one is going to be on Word Embeddings for Natural Language Processing. Basically, it covers how we convert words and sentences to numbers so that a computer can work with them. Applications like text classification and sentiment analysis all depend on this one fundamental backbone!
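
As a rough illustration of "converting words and sentences to numbers", here is a minimal sketch of a bag-of-words count vector. This is a much simpler technique than learned word embeddings and is not taken from the tutorial. The example sentences and the bag_of_words helper are made up for illustration.

```python
from collections import Counter

# Made-up example sentences (assumption, not from the tutorial).
sentences = ["the movie was great", "the plot was thin"]

# Build a vocabulary: each distinct word gets an integer index,
# assigned in order of first appearance.
vocab = {}
for sentence in sentences:
    for word in sentence.split():
        vocab.setdefault(word, len(vocab))


def bag_of_words(sentence, vocab):
    """Return a list of word counts, one slot per vocabulary word."""
    counts = Counter(sentence.split())
    vector = [0] * len(vocab)
    for word, count in counts.items():
        if word in vocab:  # words outside the vocabulary are ignored
            vector[vocab[word]] = count
    return vector


vectors = [bag_of_words(s, vocab) for s in sentences]
```

Count vectors like these ignore word order and meaning. Word embeddings instead map each word to a dense vector learned from data, so that related words end up close together.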

That sounds great! Hope to catch it on here when you post it, thanks again for this tutorial - it's a fantastic resource.

Great write-up, especially the fact that half of it was about finding, cleaning, and structuring data! You can tell someone isn't applying ML if they aren't spending most of their time getting their data organized. It's the "sharpening the axe" part of the hour Lincoln describes.

For example, they never introduce you to how you can run the same algorithm on your own dataset

I actually think the TensorFlow tutorial on CNNs runs through training and classification on your own set with Inception pretty well.

You mention you're a CV student. Any particular area of focus?

Sure, would love to get in touch about my work over mail! What's your email ID, Andrew?


While much of this goes over my head, detailed write-ups like this, by people who have no direct way of gaining financially from all their hard work, are the cornerstone of why the internet is fantastic.

Amazing work!

I have been searching for exactly this type of tutorial for months. Your explanation of the state of online "10 minute introductions" for machine learning is spot on. I understand the concepts, and have a thorough background in programming, yet there always was a gap in my knowledge base. Thank you for sharing this!

This is wonderful. I just became interested in this subject but had difficulty finding resources that weren't simple copy/paste examples, as you mentioned, or semester-long courses. Thank you!

Are there more great resources like this for learning how to find, clean, and structure data? I would greatly appreciate it if someone could point me in a direction.


I couldn't find much, which is why I stressed it in the tutorial. Scraping is a fun hobby, and it's also extremely useful. I strongly suggest spending time using Python's Selenium and Beautiful Soup libraries. The former is good for automating pages with JavaScript elements, and the latter for parsing HTML!
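
To give a feel for what "parsing HTML" involves, here is a minimal sketch that pulls links out of a page fragment. It deliberately uses Python's standard-library html.parser module rather than Beautiful Soup, which offers a friendlier API for the same kind of task. The sample_html fragment and the LinkCollector class are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags. Illustrative only."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None  # set while inside an <a href=...> tag
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a link.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


# Made-up page fragment standing in for a scraped page (assumption).
sample_html = """
<ul>
  <li><a href="/movie/1">Movie A</a></li>
  <li><a href="/movie/2">Movie <b>B</b></a></li>
</ul>
"""

parser = LinkCollector()
parser.feed(sample_html)
links = parser.links
```

This only works on HTML that is already in hand. Pages that build their content with JavaScript need a browser-automation tool such as Selenium to render them first.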

I saw this tutorial by you somewhere, Spandan, and found it here on HN. I have yet to explore it, but I have already marked your Git repo. Thanks for the hard work.

Spandan, this is a fantastic and detailed write-up. Kudos! And thanks for investing your time to do this!

This looks amazing, thank you for sharing! :)

Very nice work. Thanks for sharing.

Amazing piece of work
