

For Today’s Graduate, Just One Word: Statistics - tokenadult
http://www.nytimes.com/2009/08/06/technology/06stats.html

======
defen
This is a question I've been meaning to ask on here, but didn't think it
warranted its own "Ask HN": What's the best way for someone with an otherwise
decent math background, but no statistics, to get started on the topic? Any
particular books, websites, etc. that people recommend?

~~~
physcab
The key is to come up with a well-formulated question, such as "I wonder if I
can predict stock trends." Then in doing so, you'll search the web and come
across predictive modeling, which will lead to machine learning techniques,
which will lead to good resources. Within those machine learning sources,
search for chapters on prediction and classification, and you'll come across
regression techniques, support vector machines, relevance vector machines,
etc. Then you'll wonder, "ok, how do I actually solve this problem" so you may
search for "SVM implementations" and find Steve Gunn's for Matlab for example.
Then after much codebanging, you'll realize that in order to solve this
problem, you need a good dataset, so you go to Yahoo finance and see if you
can download some data for IBM.

This is usually the process one needs to follow, albeit with some intermediate
steps switched out for others here and there.

~~~
discojesus
It seems like if he did that, he'd be biting off WAY more than he can chew at
the present time.

If you want to learn statistics, it's probably better to start at the
beginning (_Cartoon Guide to Statistics_, O'Reilly's new _Head First
Statistics_, or Huff's _How to Lie With Statistics_) than to leap headlong
into a huge, mostly intractable problem and pagefault in knowledge at each
point you come across something you don't know how to do.

~~~
physcab
I agree, but at the same time, you only learn by doing. In my experience, even
when you dive in way over your head, you tend to pick up the information
rather quickly. I took a machine learning class with no statistics background
and after a few weeks of floundering and learning terminology I was fine. The
benefit of forming a problem first is that it gives you an ultimate goal to
work towards.

Don't tell Google that its slogan ("Organize the world's information") is
biting off more than it could chew.

~~~
discojesus
_I agree, but at the same time, you only learn by doing._

This is true, but I think in the original poster's case it could be much more
easily and reliably accomplished by continually giving him problems that
are within (or more ideally, _just_ outside) his circle of competence, rather
than a problem like "predict stock prices" which isn't in _any_ human being's
circle of competence. Moreover, with the latter problem, he'll get virtually
no feedback as to whether his answer was correct, because a correct answer for
"use statistics to predict stock prices" doesn't exist. Odds are that if he
follows that path he'll quit before making any progress, or at the least he
won't be able to close the feedback loop that is so vital to gaining
expertise.

I think that if he's starting from ground zero and wanting to learn
statistics, he'd be much better served by sitting down with The Cartoon Guide
to Statistics and a deck of cards and set of dice first. He can work his way
up to conquering the stock market :)

_Don't tell Google that its slogan ("Organize the world's information") is
biting off more than it could chew._

I think that's apples and oranges - he's trying to learn statistics, not
trying to convince potential clients or investors that he's already an expert.
I don't think it would be harmful at all for him to have a lofty, far-out goal
like "predict stock prices" to aim toward, but I do think that if he starts
out trying to learn statistics by typing "statistical stock prediction
methods" into Google, he will burn out rather quickly. Pagefaulting in
knowledge when you need it is probably optimal for something where you just
want to make sure your knowledge is passable, but if he wants to truly know
his domain, he's gotta get out the marbles and urns. :)
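
As a rough illustration of that feedback loop (a minimal sketch, not anything from the books mentioned above), an urn problem has a known exact answer that a simulation can be checked against. The urn contents (3 red, 2 blue), the trial count, and the function names are all made up for the example:

```python
import random
from fractions import Fraction


def exact_two_red(red, blue):
    """Exact probability of drawing two red marbles without replacement."""
    total = red + blue
    return Fraction(red, total) * Fraction(red - 1, total - 1)


def simulate_two_red(red, blue, trials, seed=0):
    """Estimate the same probability by repeatedly drawing two marbles."""
    rng = random.Random(seed)  # fixed seed so reruns give the same estimate
    urn = ["red"] * red + ["blue"] * blue
    hits = 0
    for _ in range(trials):
        first, second = rng.sample(urn, 2)  # two draws, no replacement
        if first == "red" and second == "red":
            hits += 1
    return hits / trials


# Hypothetical urn: 3 red marbles and 2 blue ones.
exact = exact_two_red(3, 2)
estimate = simulate_two_red(3, 2, trials=100_000)
print(f"exact = {exact} ({float(exact):.3f}), simulated = {estimate:.3f}")
```

The point is only that the simulated number can be compared with the exact one, which is the kind of right-or-wrong feedback a stock prediction problem does not give you.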

------
kevinpet
I completely agree. Statistics is what I most wish I had studied more
intensely in school. And not just so I wouldn't have those poor marks on my
transcript.

~~~
fauigerzigerk
Well, school may be over, but life is not :-)

------
jamesk2
Stephen Baker's book, _The Numerati_, covers what data geeks are doing with
stats, data mining, semantic analysis, machine learning...

Here's a link to some of his talks on the subject:
<http://thenumerati.net/index.cfm?catID=4>

~~~
hooande
I have some background with machine learning and I'm reading this book now. It
doesn't seem like a good way to learn much about applied statistics, but
it does a good job of illustrating how data mining impacts people's daily
lives.

There's an "aren't you shocked that they're gathering all of this data?!" tone
that gets old as the book goes on, but it's generally a good read. He even
describes some of the major algorithms (support vector machines and
clustering) in layman's terms.

~~~
dkersten
SVMs and clustering are AWESOME. My housemate uses both to analyse EEG data
and his classifiers are absolutely amazing. Looks like his work may end up
being used by the European Space Agency too.

------
bravura
Or, instead of statistics, you could study its sister field: Machine
Learning.

Robert Tibshirani provides the following comparative glossary for machine
learning and statistics:
<http://anyall.org/blog/2008/12/statistics-vs-machine-learning-fight/>

    
    
                            Glossary
    
      Machine learning              Statistics
      network, graphs               model
      weights                       parameters
      learning                      fitting
      generalization                test set performance
      supervised learning           regression/classification
      unsupervised learning         density estimation, clustering
      large grant = $1,000,000      large grant = $50,000
      nice place to have a meeting: nice place to have a meeting:
      Snowbird, Utah, French Alps   Las Vegas in August
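
To make the first few rows concrete, here is a small sketch (an illustration only, not taken from the linked post) in which the same few lines can be read in either vocabulary. The data is synthetic, and the names `fit_line` and `mean_squared_error` are made up for the example:

```python
import random


def fit_line(xs, ys):
    """Least-squares line: 'fitting parameters' or 'learning weights'."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(params, xs, ys):
    """Average squared prediction error of the line on the given points."""
    slope, intercept = params
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)
# Synthetic data (an assumption for the example): y = 2x + 1 plus Gaussian noise.
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]

# Hold out the last 50 points; they are never used for fitting.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

params = fit_line(train_x, train_y)                     # fitting / learning
test_mse = mean_squared_error(params, test_x, test_y)   # test set performance / generalization
print(f"slope = {params[0]:.2f}, intercept = {params[1]:.2f}, test MSE = {test_mse:.2f}")
```

Since the target here is a number, this is regression in one column of the glossary and supervised learning in the other.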

------
davidw
So, for those of us who have mostly forgotten math that doesn't get used
regularly, what's a good way to get an overview of at least what's _possible_,
and where it's applicable? Enough to get an idea of what to go study in
further detail in order to accomplish something, or at least ask for help/hire
someone.

~~~
tokenadult
<http://www.mrderksen.com/textbooks.htm>

These books have good indexes leading to statistics issues in particular
fields of research or applications.

------
xhuang
This article makes statistics look really cool. I want to learn it now. Does
anyone know how hard it could be?

------
krishna2
Just like: Plastics

Of course, should add the obligatory remark: Lies, Damn Lies and Statistics.

~~~
joshu
I was worried that nobody else here would get the reference. Whew!

------
travisjeffery
Does anyone else who isn't registered keep getting pushed off the site? F-that.

~~~
dejv
Just google the name of the article and go to the site from there. The site
never blocks traffic from search engines :)

------
oz
Gotta love the T-shirt...

