

Ask HN: How to learn Data Analysis? - hhimanshu

I completed the ml-class offered by Prof. Andrew Ng last fall.

I started trying one of the problems on Kaggle.

When I fit a logistic regression, the algorithm gave me poor results.

I saw that people do a lot of data analysis before applying any machine learning algorithm.

I have completed a book on statistics and want to learn about data analysis.

Please help me identify resources/courses/tools that I can use to learn data analysis.

Thank you
======
johnhess
You're fluent with regressions and stats. Sounds like you're competent with
the mechanics. Where you might have gaps, there are some great tools that can
do your heavy lifting. But that's only half the battle.

When you're trying to do meaningful data analysis, you really have to
understand your dataset. Fancy math can't substitute for domain expertise.
Think long and hard about what's in the set, what the causal connections might
be, and how an "expert" in the field might approach the problem.
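
As a rough illustration of "thinking about what's in the set", a first pass over a tabular dataset can count missing values, distinct values, and label balance before any model is fit. A skewed label or a mostly-empty column is a common reason a plain logistic regression underperforms. This is a minimal stdlib-only sketch, assuming rows loaded as dicts (for example via `csv.DictReader`); `summarize` and `class_balance` are hypothetical helper names, not part of any library.

```python
from collections import Counter
from statistics import mean


def summarize(rows):
    """Per-column summary: missing count, distinct values, and mean (if every present value is numeric)."""
    summary = {}
    if not rows:
        return summary
    for col in rows[0].keys():
        values = [r.get(col) for r in rows]
        # Treat None and empty strings (common in CSV exports) as missing.
        present = [v for v in values if v not in (None, "")]
        numeric = []
        for v in present:
            try:
                numeric.append(float(v))
            except (TypeError, ValueError):
                pass  # non-numeric value, e.g. a category name
        summary[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            # Only report a mean when the whole column parsed as numbers.
            "mean": mean(numeric) if numeric and len(numeric) == len(present) else None,
        }
    return summary


def class_balance(rows, target):
    """Count how often each label of the target column occurs."""
    return Counter(r[target] for r in rows)


rows = [
    {"age": "23", "label": "1"},
    {"age": "", "label": "0"},
    {"age": "41", "label": "1"},
]
print(summarize(rows)["age"])  # {'missing': 1, 'distinct': 2, 'mean': 32.0}
print(class_balance(rows, "label"))
```

These numbers only tell you where to look; deciding whether a missing value or a rare label actually matters still takes the domain knowledge described above.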

The guys over at OKCupid are awesome at this. Check out this post to see what
I'm talking about.

<http://blog.okcupid.com/index.php/dont-be-ugly-by-accident/>

Their advice on taking a good picture is just about exactly what my
professional photographer mother recommends. Good stuff. But the way they
analyzed and presented the data shows (a) exactly how powerful putting numbers
on something subjective can be and (b) that they know their domain.

If you read through the other posts (do that), you'll see that they have a
solid understanding of their dataset. They know what to look for, namely photo
attractiveness. They know how to get good data on that (the dependent
variable) and they know which independent variables probably matter the most.
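
One quick, hedged way to get a first feel for which independent variables might matter is to rank numeric columns by their correlation with the dependent variable. This sketch assumes every column is numeric and uses Pearson correlation, which only captures linear association and says nothing about causation, so it complements the domain thinking above rather than replacing it. `pearson` and `rank_features` are hypothetical helper names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        # Correlation is undefined for a constant column; score it 0.0 for ranking.
        return 0.0
    return cov / (sx * sy)


def rank_features(rows, target):
    """Rank the non-target columns by |correlation| with the target, strongest first."""
    ys = [float(r[target]) for r in rows]
    scores = {}
    for col in rows[0]:
        if col == target:
            continue
        scores[col] = pearson([float(r[col]) for r in rows], ys)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


rows = [
    {"hours": 1, "const": 5, "score": 2},
    {"hours": 2, "const": 5, "score": 4},
    {"hours": 3, "const": 5, "score": 6},
]
print(rank_features(rows, "score"))  # "hours" ranks first; "const" scores 0.0
```

With a 0/1 target, as in logistic regression, Pearson correlation reduces to the point-biserial correlation, so the same helper still gives a rough first ranking.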

Throwing math at a complex dataset can be useful (e.g., Bayesian spam
classifiers), but if you really want to do something that will work well or
"speak" to a client, invest a bit of time in understanding the field.
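
For a sense of what the "throwing math at it" end looks like, here is a toy multinomial naive Bayes text classifier with add-one (Laplace) smoothing in stdlib Python. It is a minimal sketch assuming whitespace tokenization and `(text, label)` training pairs; real spam filters do far more preprocessing. `train` and `classify` are hypothetical names.

```python
import math
from collections import Counter


def train(docs):
    """docs: list of (text, label) pairs. Returns the counts multinomial naive Bayes needs."""
    word_counts = {}        # label -> Counter of word occurrences
    doc_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        doc_counts[label] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the label with the highest log posterior (up to a constant)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        # Log prior plus summed log likelihoods; add-one smoothing avoids log(0).
        score = math.log(doc_counts[label] / total_docs)
        for w in text.lower().split():
            score += math.log((counts[w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
model = train(docs)
print(classify(model, "money offer"))
```

Working in log space keeps the product of many small probabilities from underflowing.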

~~~
hhimanshu
Thank you for your valuable suggestion. I will definitely spend time
understanding the data first!

~~~
johnhess
<http://blog.stephenwolfram.com/2012/03/the-personal-analytics-of-my-life/>

This is on the front page now, but holy crap is it beautiful. This is
everything data analysis should be.

------
skadamat
<http://www.amazon.com/Data-Analysis-Open-Source-Tools/dp/0596802358/ref=sr_1_sc_3?ie=UTF8&qid=1330285039&sr=8-3-spell>

~~~
mattgratt
+1 to this. This book rocks.

There's also a UC Berkeley data science class (<http://datascienc.es/>) that's
helpful.

~~~
hhimanshu
This class looks good, and Jeff is a well-known person in this field. I am
definitely going to attend it offline.

Thank you!

