

Ask HN: How do I get into data mining/analysis? - parkrrr

Some background: Bachelor's in comp sci, currently a full-time internal developer. Unfortunately, I only took one stats course in college, about 3 years ago, and my data mining course was worthless.

I'd really like to get into data analysis and data mining. I do a fair amount of basic reporting right now, but it's nothing beyond basic select-and-display database reports. For example, right now we have a report that shows client satisfaction survey results. I'd like to extend it so that we can see trends such as movers and shakers, whether some customers are consistently returning negative feedback, or whether the number of notes left in a ticket has a bearing on the survey feedback.

I can't afford to go back to school at the moment, so I'm looking for ways I can bootstrap some basics into my work. Can someone recommend some resources for this? I have the books "Think Stats" and "Mining of Massive Datasets" by Rajaraman and Ullman (although the latter is quite a bit over my head), but some more basic resources would be nice. Thank you in advance!
======
sfrechtling
I come from a text analysis background. There are a lot of nice packages in
Python that make it easy to do data analysis of any kind. Have a look at
NumPy, SciPy, and NLTK. You can find useful insights without a strong
statistics background, though one would certainly help. Even just the basic
mean, median, and mode give you a level of insight that you can extrapolate
from to give extra meaning to your data.
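
As a rough sketch of what those basic statistics could look like for a
satisfaction report, here is a version using only Python's standard library.
The customer names, the 1-5 scores, and the threshold of 3 are all made up for
illustration; a real report would pull scores from the database instead.

```python
from statistics import mean, median, mode

# Hypothetical survey scores (1-5) keyed by customer; not real data.
scores_by_customer = {
    "acme": [5, 4, 5, 5, 3],
    "globex": [2, 1, 2, 3, 2],
    "initech": [4, 4, 3, 5, 4],
}


def summarize(scores):
    """Return the mean, median, and mode of one customer's scores."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "mode": mode(scores),
    }


summaries = {name: summarize(s) for name, s in scores_by_customer.items()}

# Flag customers whose median score falls below an arbitrary threshold.
NEGATIVE_THRESHOLD = 3
unhappy = [
    name for name, s in summaries.items() if s["median"] < NEGATIVE_THRESHOLD
]

for name, s in summaries.items():
    print(name, s)
print("consistently negative:", unhappy)
```

The median is used for the flag because a single unusually good or bad survey
moves it less than it moves the mean.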

------
NnamdiJr
I would say Coursera's "Computing for Data Analysis" and "Data Analysis"
courses, taught by JHU's Roger Peng and Jeff Leek respectively, were great
introductions to the field using R. Both courses are over now, but you might
be able to find archived content on the Coursera sites or some of the vids on
YouTube. The courses also pointed you to many additional resources that should
do a lot to supplement your learning.

After that, you should have a good foundation to self-direct your learning by
studying relevant texts (like the two you already picked up) and finding data
sets you can play with to just see what you can do and push your skills
further.

Good luck to you.

------
mwetzler
There is a serious shortage of analytics skills in the market right now. I
recommend you set up an IFTTT trigger that emails you any time a job posting
is made to Craigslist with the terms "analytics", "big data", or "data mining". Update
your resume to express an interest in this field and describe the work you've
done that's related to it. Send your resume to all the postings and talk to
the companies about the type of work they have available. Most companies don't
need a data genius; they need a smart person who is passionate about
analytics. There is a ton of opportunity in this field; you'll be welcomed
with open arms!

~~~
parkrrr
Is there a particular area I should look at? Not a lot of activity in
Indianapolis :\

~~~
mwetzler
SF, Seattle, NYC, and Boulder are tech hubs.

------
warrenmar
I would also recommend the Coursera courses Machine Learning by Andrew Ng
and Probabilistic Graphical Models by Daphne Koller. I would also review some
basic probability and statistics, and maybe some linear algebra too.
Python is a great language to do data analysis in. I recommend the
scikit-learn and pandas packages and using IPython notebooks. Another book is
The Elements of Statistical Learning
(<http://www-stat.stanford.edu/~tibs/ElemStatLearn/>). There are also Kaggle
contests for testing your chops.
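
To give a feel for pandas, here is a minimal sketch aimed at the two questions
in the original post: which customers tend to score low, and whether the
number of notes on a ticket tracks the survey score. The column names and every
value in the table are invented for illustration, so treat it as a starting
point rather than a finished analysis.

```python
import pandas as pd

# Hypothetical ticket data; column names and values are invented.
tickets = pd.DataFrame({
    "customer": ["acme", "acme", "globex", "globex", "initech", "initech"],
    "note_count": [2, 3, 9, 11, 4, 5],
    "survey_score": [5, 5, 2, 1, 4, 4],
})

# Average score and number of surveys per customer, lowest average first.
per_customer = (
    tickets.groupby("customer")["survey_score"]
    .agg(["mean", "count"])
    .sort_values("mean")
)

# Pearson correlation between notes left on a ticket and its survey score.
notes_vs_score = tickets["note_count"].corr(tickets["survey_score"])

print(per_customer)
print(f"correlation between notes and score: {notes_vs_score:.2f}")
```

A strongly negative correlation here would only say that tickets with more
notes tend to get lower scores in this sample. It does not show that the notes
cause the low scores, and six rows is far too few to trust.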

