

Ask HN: Learning, understanding and applying Data mining - madmaze

Hello fellow HNers,

I am a soon-to-graduate Computer Science undergrad (Aug 2011). I have always been fascinated by data mining and scientific computing.

During one of my internships I did some scientific computing consisting of processing, indexing and searching terabyte-sized sets of time series on GPUs and other massively parallel systems. This and other personal projects ignited my interest and passion in extracting knowledge out of large datasets.

I have been soaking up any material I can find and have looked through Google, YouTube and textbooks, but have found too little to satisfy me.

Sadly, my college does not offer any data mining courses, so I turn to you, the HN community, to point me to further resources.

Are there any websites, lectures or books you can recommend? Any projects I could play with to further my knowledge and understanding?

Thank you,

Maze
======
helwr
check out some of these self-study guides:

What are some good "toy problems" in data science?
<http://www.quora.com/Programming-Challenges-1/What-are-some-good-toy-problems-in-data-science>

What are some good resources for learning about machine learning?
<http://www.quora.com/Machine-Learning/What-are-some-good-resources-for-learning-about-machine-learning>

How do I become a data scientist?
<http://www.quora.com/Educational-Resources/How-do-I-become-a-data-scientist>

What are some introductory resources for learning about large scale machine
learning?
<http://www.quora.com/Machine-Learning/What-are-some-introductory-resources-for-learning-about-large-scale-machine-learning>

What are some good learning projects to teach oneself about machine learning?
<http://www.quora.com/Machine-Learning/What-are-some-good-learning-projects-to-teach-oneself-about-machine-learning>

What are some good class projects for machine learning using MapReduce?
<http://www.quora.com/Machine-Learning/What-are-some-good-class-projects-for-machine-learning-using-MapReduce>

for a list of available courses see:
<http://news.ycombinator.com/item?id=2656156>

~~~
madmaze
Thanks, that's exactly what I have been looking for.

------
iworkforthem
You can consider going through the Getting Started guides for Apache Hadoop,
Cassandra, Lucene & Solr. These tools are used quite widely in the areas you
mention. Contributing to their code bases will help too.

~~~
madmaze
I have done some Hadoop work in the past on Amazon's EC2. Those are definitely
useful data mining tools, but I'm looking more for concepts and approaches.

------
dakotasmith
Stanford: Mining of Massive Datasets.

<http://infolab.stanford.edu/~ullman/mmds.html>

The complete book is available there.

~~~
peterpeters
Here are also some lecture notes:
<http://infolab.stanford.edu/~ullman/mining/mining.html>

