
End to End Machine Learning Pipeline Tutorial - tancik
https://spandan-madan.github.io/DeepLearningProject/
======
tekkk
Reading articles like this, written by people who want to share their fabulous
domain knowledge free of charge, really is the reason why I read Hacker
News. Thank you. I hope I will have the time to read through it all with
thought and later, hopefully, use it in my own projects.

~~~
spandan-madan
And people like you who take time out to read and learn are exactly the reason
why people like me write such articles! Absolutely thrilled that people liked
it and that I will be contributing to people learning this beautiful field of
science I do research in :)

~~~
anantzoid
Thanks for putting so much time and effort into this. This is definitely not
"Yet-another-intro-to-ML".

------
fabatka
Hi! This is a really great page, I love reading it. Just a few tips:

The for loops in your code can be made more concise: instead of

    
    
      for i in range(len(movies_with_overviews)):
          movie=movies_with_overviews[i]
    

you can write

    
    
      for movie in movies_with_overviews:
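
As a minimal sketch of the difference (the sample records below are made up for illustration, since the tutorial's actual list contents are not shown here), both loops visit the same items, and `enumerate` covers the case where the index is still needed:

```python
# Hypothetical stand-in data; the real list in the tutorial is built elsewhere.
movies_with_overviews = [
    {"title": "Movie A", "overview": "A heist goes wrong."},
    {"title": "Movie B", "overview": "Two friends travel north."},
]

# Index-based loop, as in the original code.
titles_by_index = []
for i in range(len(movies_with_overviews)):
    movie = movies_with_overviews[i]
    titles_by_index.append(movie["title"])

# Direct iteration: same result, no index bookkeeping.
titles_direct = []
for movie in movies_with_overviews:
    titles_direct.append(movie["title"])

# If the position is still needed, enumerate yields (index, item) pairs.
numbered = [(i, movie["title"]) for i, movie in enumerate(movies_with_overviews)]
```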
    

Also, at around In[82], you don't declare Y, but still reference it at the
train-test split. Another way to do the train-test split is to use
train_test_split from scikit-learn:
http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
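
A rough sketch of that scikit-learn split, assuming `X` and `Y` are arrays with one row per sample. The data here is a made-up placeholder, not the tutorial's features or labels, and the `test_size` and `random_state` values are arbitrary choices:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 samples with 3 features each, one label per sample.
X = np.arange(30).reshape(10, 3)
Y = np.arange(10) % 2  # Y has to be defined before it is passed to the split

# Hold out 20% of the rows; random_state makes the shuffle reproducible.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)
```

The function shuffles `X` and `Y` together, so each row keeps its own label on both sides of the split.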

------
deepGem
These are the tutorials that depict the reality of a machine learning career.
Everyone broadly understands that data preparation is the key, but few realize
what that involves. Half of this tutorial is just about getting and prepping
data for training. Kudos!

~~~
spandan-madan
To quote one of the greatest professors in ML, Pedro Domingos: "First-timers
are often surprised by how little time in a machine learning project is spent
actually doing machine learning. But it makes sense if you consider how time-
consuming it is to gather data, integrate it, clean it and pre-process it, and
how much trial and error can go into feature design.....Learning is often the
quickest part of this, but that’s because we’ve already mastered it pretty
well! Feature engineering is more difficult because it’s domain-specific,
while learners can be largely general-purpose."

------
Omnipresent
This is so helpful. It would take me months to gather resources to learn this
stuff and I wouldn't even know what I would be looking for. To the author:
please share more content if your valuable time permits.

~~~
spandan-madan
Working on them already! The next one is going to be on Word Embeddings for
Natural Language Processing. Basically, how do we convert words and sentences
to numbers so that a computer can work with them? Applications like text
classification and sentiment analysis all depend on this one
fundamental backbone!
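
To make "words to numbers" concrete, here is a minimal sketch of the simplest such encoding, a bag-of-words count vector. This is a much cruder technique than learned word embeddings and is not taken from the upcoming tutorial. The function names and sample sentences are hypothetical:

```python
from collections import Counter


def build_vocabulary(sentences):
    """Map each distinct lowercase word to an integer index, in first-seen order."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def bag_of_words(sentence, vocab):
    """Return a count vector with one slot per vocabulary word."""
    counts = Counter(sentence.lower().split())
    vector = [0] * len(vocab)
    for word, count in counts.items():
        if word in vocab:  # words outside the vocabulary are ignored
            vector[vocab[word]] = count
    return vector


# Made-up example sentences.
sentences = ["the movie was great", "the plot was thin"]
vocab = build_vocabulary(sentences)
vectors = [bag_of_words(s, vocab) for s in sentences]
```

Each sentence becomes a fixed-length list of integers that a classifier can consume. Embeddings go further by giving each word a dense vector, so that similar words end up close together.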

~~~
companycalls
That sounds great! Hope to catch it on here when you post it, thanks again for
this tutorial - it's a fantastic resource.

------
AndrewKemendo
Great write-up. Especially the fact that half of it was about finding, cleaning,
and structuring data! You can tell someone isn't applying ML if they aren't
spending most of their time getting their data organized. It's the "sharpening
the axe" part of the hour Lincoln describes.

 _For example, they never introduce you to how you can run the same algorithm
on your own dataset_

I actually think the TensorFlow tutorial on CNNs runs through
training and classification on your own set with Inception pretty well.

You mention you're a CV student. Any particular area of focus?

~~~
spandan-madan
Sure, would love to get in touch about my work over mail! What's your email ID,
Andrew?

~~~
AndrewKemendo
andrew@pair3d.com

------
sekasi
While much of this goes over my head, detailed write-ups like this by people
who have no direct way of gaining a financial outcome from all their hard work
are the cornerstone of why the internet is fantastic.

Amazing work!

------
stevew20
I have been searching for exactly this type of tutorial for months. Your
explanation of the state of online "10 minute introductions" for machine
learning is spot on. I understand the concepts, and have a thorough background
in programming, yet there always was a gap in my knowledge base. Thank you for
sharing this!

------
jonheller
This is wonderful. I just became interested in this subject but had difficulty
finding resources that weren't simple copy/paste examples, as you mentioned,
or semester-long courses. Thank you!

------
ireadfaces
I saw this tutorial by you somewhere, Spandan, and found it here on HN. I have
yet to explore it, but I have bookmarked your Git repo already. Thanks for the hard
work.

------
praveer13
Are there more great resources like this for learning how to find, clean, and
structure data? Would greatly appreciate it if someone could point me in a
direction.

~~~
spandan-madan
Hi!

I couldn't find much, that's why I stressed it in the tutorial. Scraping is
a fun hobby, but it's also extremely useful. I strongly suggest spending time using
Python's Selenium and Beautiful Soup libraries. The former is good for automating
pages with JavaScript elements, and the latter for parsing HTML!
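
For a feel of the HTML-parsing half, here is a minimal sketch that pulls every link out of a page. To keep it runnable without installs it uses the standard library's `html.parser` as a stand-in for Beautiful Soup. The `LinkCollector` class and the sample markup are made up for illustration:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical page content; a real scraper would download this first.
html = """
<html><body>
  <a href="/movie/1">First</a>
  <p>No link here</p>
  <a href="/movie/2">Second</a>
</body></html>
"""

collector = LinkCollector()
collector.feed(html)
links = collector.links
```

With Beautiful Soup installed, roughly the same result should come from `[a.get("href") for a in BeautifulSoup(html, "html.parser").find_all("a")]`, which is usually more convenient once pages get messy.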

------
allpratik
Spandan, this is a fantastic and detailed write-up. Kudos! And thanks for
investing your time to do this!

------
mcintyre1994
This looks amazing, thank you for sharing! :)

------
code4tee
Very nice work. Thanks for sharing.

------
craptocurrency
Amazing piece of work

