

Ask HN: Simple projects to implement machine learning - fjellfras

I have been working through Tom Mitchell's book on machine learning, and I want to supplement the theory with a practical project.

Does anyone have experience with small projects that helped them in a similar situation? The most I have come up with is a news classifier that uses the Feedzilla and NY Times APIs, but if anyone has other good ideas, please let me know.

I have reasonable programming experience with Python and access to a Linode, so I will use these to implement the project.

Thank you
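
For anyone wanting a starting point for the news classifier idea, here is a minimal sketch. It assumes a naive Bayes text classifier with Laplace smoothing (one of the methods Mitchell's book covers), which is only one possible choice. The headlines and labels are made up for illustration, and fetching articles from a news API is not shown; the function names `tokenize`, `train`, and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a real project would do more."""
    return text.lower().split()


def train(labeled_docs):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return {"labels": label_counts, "words": word_counts, "vocab": vocab}


def classify(model, text):
    """Return the label with the highest log posterior under naive Bayes."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, float("-inf")
    for label, doc_count in model["labels"].items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(model["words"][label].values())
        for word in tokenize(text):
            if word not in model["vocab"]:
                continue  # ignore words never seen in training
            count = model["words"][label][word]
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training headlines, purely for illustration.
headlines = [
    ("stocks rally as markets close higher", "business"),
    ("central bank raises interest rates", "business"),
    ("team wins championship in overtime", "sports"),
    ("striker scores twice in cup final", "sports"),
]
model = train(headlines)
print(classify(model, "markets react to interest rates"))  # prints "business"
```

With real data you would replace `headlines` with labeled articles pulled from whichever news source you use, and hold some of them back to measure accuracy.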
======
asit
I used Lua to create a simple AI. It can learn words and store them in a
dictionary, and it can link one word to another when they have some
resemblance or relationship. I also implemented an "emotional link" between
words: one word can be in either a GOOD or a BAD relationship with another.
For example, the word "rat" has a BAD relationship with "cat". It works from
the meaning of words rather than merely processing or crawling vocabulary.
The AI can communicate with me: it asks me questions that it generates from
meaningful concepts. Of course, it is very basic right now and makes errors,
but it will grow. The bottom line is that you must implement "from the
roots"; perception and recognition are built on these roots. I hope that
gives you an idea of what I am attempting to build. What it needs most right
now is a solid grammar engine. If I were not a lazy programmer, I would have
added it already. :D
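
The word store with GOOD/BAD links described above could be sketched roughly as follows. This is not the original Lua code; it is a Python illustration under stated assumptions: links are treated as symmetric, and the class name `WordMemory` and its methods are hypothetical.

```python
from collections import defaultdict

GOOD, BAD = "GOOD", "BAD"


class WordMemory:
    """Stores words and a GOOD/BAD link between pairs of words."""

    def __init__(self):
        # Maps word -> {other_word: GOOD or BAD}.
        self.links = defaultdict(dict)

    def link(self, a, b, feeling):
        """Record a relationship; assumed symmetric for this sketch."""
        if feeling not in (GOOD, BAD):
            raise ValueError(f"feeling must be GOOD or BAD, got {feeling!r}")
        self.links[a][b] = feeling
        self.links[b][a] = feeling

    def relationship(self, a, b):
        """Return GOOD, BAD, or None if the pair has never been linked."""
        return self.links.get(a, {}).get(b)

    def question_about(self, word):
        """Ask about a word that has no links yet; None if it is known."""
        if word in self.links:
            return None
        return f'What is "{word}" related to?'


memory = WordMemory()
memory.link("rat", "cat", BAD)
print(memory.relationship("cat", "rat"))  # prints "BAD"
print(memory.question_about("cheese"))
```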

------
jfaucett
I once built an image interpretation app that was really interesting from a
machine learning perspective. Using ImageMagick, I extracted shapes and, based
on their layout, fed them into a "composition" algorithm that judged whether
the image layout was good or not. I also implemented a "plant detection"
feature to find out whether a plant was in the image. I think images are an
area that could still benefit a lot from machine learning, e.g. facial
recognition and image search (extracting meaning from images). Those are just
some of my ideas. Machine learning is awesome. I wish you all the best!!

~~~
fjellfras
Thank you, that sounds very interesting. I assume you would use known images
with plants in them as the training set.

------
anujkk
How about creating a recommendation engine for HN articles? It should
recommend articles upvoted by users whose likes/dislikes are similar to mine.
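
One way to read this idea is as user-based collaborative filtering. A minimal sketch, assuming each user is represented only by the set of articles they upvoted (dislikes are not modeled) and that similarity is Jaccard overlap; the user names, article ids, and function names are made up for illustration:

```python
def jaccard(a, b):
    """Overlap between two sets: |a & b| / |a | b|, or 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(upvotes, user, top_n=3):
    """Rank articles the user has not upvoted by the similarity of their fans."""
    mine = upvotes[user]
    scores = {}
    for other, theirs in upvotes.items():
        if other == user:
            continue
        sim = jaccard(mine, theirs)
        if sim == 0:
            continue  # users with nothing in common contribute nothing
        for article in theirs - mine:
            scores[article] = scores.get(article, 0.0) + sim
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda a: (-scores[a], a))
    return ranked[:top_n]


# Hypothetical upvote data: user -> set of upvoted article ids.
upvotes = {
    "me": {"ml-intro", "python-tips"},
    "alice": {"ml-intro", "python-tips", "kaggle-guide"},
    "bob": {"ml-intro", "lua-ai"},
    "carol": {"car-parts"},
}
print(recommend(upvotes, "me"))  # prints ['kaggle-guide', 'lua-ai']
```

Here alice shares two of three articles with "me", so her extra article outranks bob's, and carol shares nothing, so her article is never suggested.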

I would also like to browse HN articles/comments by topic. For example, right
now if I want to find HN posts related to "machine learning", I use HN Search.
Can you make it better by using machine learning techniques, so that I can get
all relevant HN articles on a given topic, sortable by upvotes and time?

~~~
fjellfras
Thank you, that sounds like a good idea. I may start looking into the HN API.
I have been building a search engine for Reddit using their API for a week now
(it is not very mature; so far I have only started building a database of tags
for the posts), but I believe this idea will work here as well.

~~~
anujkk
May I know which books/courses/blogs you used to learn machine learning? I'm
learning it myself and I have just finished reading "Programming Collective
Intelligence".

~~~
fjellfras
Sure, I started off with Andrew Ng's course on Coursera. Then I moved on to
Tom Mitchell's book, Machine Learning. I also have the PCI book to supplement
Mitchell's book with code examples. I got Bishop's book too, but to be honest
I'm finding it a little harder to follow than the others.

~~~
hoodwink
I'm almost through Andrew Ng's course. Did you do all the programming
activities to reinforce the lectures? I've been keeping pace with the
lectures, but I haven't done the homework/programming. Now I'm going back and
completing the assignments one by one.

I was thinking about watching Tom Mitchell's CMU course online next. Have you
checked that out?

PS. Is it just me or is Andrew Ng incredible? I thought I had good professors
in college, but he is on another level.

~~~
fjellfras
No, I am in exactly the same place. I missed the programming activities (I was
going to use NumPy instead of Octave, and it turned out to be too much of an
effort). I have re-enrolled in the current session of the course, so I am
going to do the assignments this time.

Also, thanks for mentioning Tom Mitchell's course. I didn't know about it, and
I will be sure to check it out.

And yes, I agree Andrew is a great teacher. It became even more obvious when I
tried to read up on areas he had not covered in the course.

------
tzm
How about analyzing a database of approximately 400k car parts requests? You
could analyze keywords, year-make-model, geographic location, etc.

I'm currently doing this to generate a choropleth map for PartsLine.com.
[https://skitch.com/tzmartin/eydmm/partsline.com-rfq-to-fips-...](https://skitch.com/tzmartin/eydmm/partsline.com-rfq-to-fips-map)

------
ig1
Why not sign up for Kaggle and do some of their challenges? That way you can
also benchmark yourself against others.

~~~
fjellfras
Thanks for the suggestion. I actually did sign up for Kaggle a couple of days
ago and am working on some of their problems. I was looking for something more
internet-related, as pulling in data from the internet and grouping or ranking
it feels more immediately fun to me.

