
Ask HN: What's a good machine learning independent study project? - shakeel_mohamed
My school doesn't offer very many CS electives, and the ones they do offer aren't interesting to me.

So, I'm going to propose an independent study around machine learning - which I know nothing about right now. The highest level math classes I've taken are multivariable calculus and linear algebra. Other CS courses I've taken are data structures, OO design, OS & networks, web design, database fundamentals. Coming up I'm taking: languages & computation, and algorithms analysis.

I was thinking about making an AI that learns to play chess over time, but I don't know if that's too much work for an 11 week quarter.

What other projects would fit within the scope of 11 weeks?
======
patio11
YMMV on this, but I studied CS with an informal concentration on AI/natural
languages. Here are some take-them-or-leave-them suggestions.

If you want to maximize the return on your time for this class, do a project
which:

1) Uses one or many data sources which are publicly available but which,
ideally, are not quite as simple to access as straight downloading a CSV file.
A bit of practical experience with scraping, API use, or data processing
doesn't hurt. Bonus points if you get a taste for working with large data
sets.

2) You will not make an AI which learns to play chess in 11 weeks, or in 11
years. Just to set expectations. A more reasonable task for the same timeframe
given your current skillset is e.g. "Given a large corpus of documents, a
small number of which are hand-tagged, explore a few different approaches for
classifying the remainder of the documents." A motivated undergrad can succeed
at implementing a Bayesian classifier, but you will not advance the state of
the art on chess.

3) A lot of academic projects focus on toy problems, e.g. chess or a
contrived simplification of a real system. There is no reason that you have to
adopt this academic convention: consider picking a real system with
consequences. There exist many websites which have information on them that
actually impacts decisions which people care about -- wouldn't you rather learn
to do analysis on that than pull arbitrary trivia out of e.g. the British
National Corpus (which, I rush to mention, is an excellent tool)?

4) Think about the presentation layer for findings in more detail than the
typical academic paper, which spits out a sentence or two of summary stats and
maybe graphs them. This might be an opportunity to have a bit of fun doing,
e.g., a website which lets you search through your (voluminous) findings.
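
To make the document-classification task from point 2 concrete, here is a
minimal sketch of a multinomial naive Bayes classifier with add-one smoothing,
using only the Python standard library. The `NaiveBayes` class, the `tokenize`
helper, and the four hand-tagged documents are all made up for illustration; a
real project would use a much larger corpus and probably better tokenization.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (deliberately naive)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.label_counts = Counter(labels)      # documents per label
        self.word_counts = defaultdict(Counter)  # word counts per label
        self.vocab = set()
        for doc, label in zip(docs, labels):
            words = tokenize(doc)
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, doc):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Work in log space: log prior plus log likelihood of each word.
            score = math.log(n_docs / total_docs)
            for word in tokenize(doc):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-tagged documents, invented for this example.
tagged = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("meeting notes for monday", "ham"),
    ("notes from the project meeting", "ham"),
]
model = NaiveBayes().fit([d for d, _ in tagged], [l for _, l in tagged])
print(model.predict("buy cheap pills"))
print(model.predict("monday project notes"))
```

Swapping in a second approach (say, nearest-neighbour on word counts) and
comparing accuracy on a held-out slice of the tagged documents would cover the
"explore a few different approaches" part.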

Putting it all together, you could imagine something like "I have developed a
website and/or Chrome plugin which, when pointed at an Etsy item, predicts the
likelihood that it will sell. Or it predicts the likelihood that a KickStarter
campaign will succeed. Or it predicts the final sale value of an eBay auction
-- better in some categories than others, see page 6. Or it successfully
paints a red/blue map of the United States using no prior knowledge other than
a geolocation database and the Twitter stream. Or it asks you ten questions
about seemingly irrelevant trivia and then makes a surprisingly accurate
prediction on how long it has been since you ate sushi."

~~~
shakeel_mohamed
This is some solid advice, thanks!

------
angersock
Simple idea:

Given a post text or image, give the three boards it was most likely posted to
on 4chan.

Data is easily available on the 4chan API, and you can do things from very
simple (matching word frequencies) to complex (NLP and image recognition).
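
A rough sketch of the simple end of that range, assuming you have already
collected some posts per board: build a word-frequency profile for each board
and rank boards by cosine similarity to the new post. The sample posts and the
`term_freq`, `cosine`, and `top_boards` helpers are invented for illustration,
and fetching real posts from the API is left out entirely.

```python
import math
from collections import Counter


def term_freq(text):
    """Word counts for a piece of text (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_boards(post, board_profiles, k=3):
    """Return the k boards whose word profile best matches the post."""
    post_tf = term_freq(post)
    ranked = sorted(
        board_profiles,
        key=lambda board: cosine(post_tf, board_profiles[board]),
        reverse=True,
    )
    return ranked[:k]


# Made-up sample posts per board, standing in for scraped data.
samples = {
    "g": ["which linux distro should i install", "my laptop keyboard broke"],
    "ck": ["how long should i cook rice", "best knife for cutting onions"],
    "mu": ["new album is great", "best guitar album of the year"],
    "sci": ["proof of the theorem", "is the physics degree worth it"],
}
profiles = {board: term_freq(" ".join(posts)) for board, posts in samples.items()}
guess = top_boards("what should i cook with rice and onions", profiles)
print(guess)
```

With real data you would likely want to down-weight common words (e.g. TF-IDF)
before comparing, since words like "the" and "i" show up on every board.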

EDIT:

Bonus round--train it to generate posts for a given board.

------
rfergie
I'm doing some work for a small UK based charity.

I have several clustering/prediction problems in my pipeline at the moment.

Drop me a line (email in profile) if you are interested in having a crack at
one of them. Should give you insight into all sorts of stuff apart from big
data.

------
Irishsteve
Students in my place usually end up going through all the content in
[http://www.cs.waikato.ac.nz/ml/weka/book.html](http://www.cs.waikato.ac.nz/ml/weka/book.html)

In terms of projects etc., there are about 4 or 5 assignments that range from
spam detection, to parameter setting optimisation.

------
sharemywin
Check out Restricted Boltzmann Machines and Deep learning.

