
A Thousand Foot View of Machine Learning - fogus
http://awwthor.wordpress.com/2009/12/31/a-thousand-foot-view-of-machine-learning/
======
physcab
This is a good overview, although I have some critiques.

1) The author doesn't really emphasize how complicated these algorithms are.
Most people look at ML and think it can solve their problems (i.e. predict the
future). In practical applications though (other than "toy" perfect Gaussian
data), you have to understand _what the hell the algorithm is actually doing
to your data_. A lot of the questions that I encounter when doing ML research
include: a) am I sure I implemented the algorithm correctly? b) is the
algorithm doing what it's supposed to be doing? c) how can I cross-check its
answers? d) if I use another algorithm, how do I gauge performance?

2) Like it or not, ML requires a very solid background in mathematics. You can
get away with using algorithms "out of the box", but then it's just that...a
box. Knowing what goes on inside gives you the confidence that the solutions
are correct and can be used reliably. It's important to know that ML is not
like a car that you can just pick up and drive without knowing how an engine
works.

3) I think the author should have listed easier algorithms that are used more
often in industry. K-nearest neighbors is good, but I would also add k-means
clustering and perhaps Principal Component Analysis (PCA) over SVM or KSVM.
And if you're going to cover the more advanced stuff...you need to talk about
Bayesian learning.
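
For what point 2 calls "out of the box" use, here's roughly what k-means and
PCA look like in practice. This is my own minimal sketch on toy Gaussian
blobs using scikit-learn (a library choice of mine, not anything named in the
article):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    rng = np.random.RandomState(0)
    # Toy data: two Gaussian blobs in 5 dimensions.
    X = np.vstack([rng.normal(0, 1, (100, 5)),
                   rng.normal(4, 1, (100, 5))])

    # k-means: partition the points into 2 clusters.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

    # PCA: project onto the 2 directions of highest variance.
    X_2d = PCA(n_components=2).fit_transform(X)

    print(labels[:10])      # cluster assignments for the first few points
    print(X_2d.shape)       # (200, 2)

Two lines of fitting each, which is exactly why it's tempting to skip the
math underneath.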

~~~
adamsmith
Nearest neighbor and k-means will return noise unless your data only has two
or "at most" three dimensions, due to the curse of dimensionality.
(<http://en.wikipedia.org/wiki/Curse_of_dimensionality>) This is the #1 trap
that newbies fall into, since these algorithms seem deceptively intuitive but
there's this huge unintuitive pitfall.
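
A rough numerical illustration of that pitfall (my own toy sketch, uniform
random data): as the dimension grows, the nearest and farthest points from a
query end up at almost the same distance, so "nearest" stops meaning much.

    import numpy as np

    rng = np.random.RandomState(0)
    for d in (2, 10, 100, 1000):
        X = rng.rand(1000, d)     # 1000 random points in the unit cube
        q = rng.rand(d)           # one query point
        dists = np.linalg.norm(X - q, axis=1)
        print("d=%4d  nearest/farthest distance ratio = %.3f"
              % (d, dists.min() / dists.max()))
    # The ratio climbs toward 1.0 as d grows: everything is about
    # equally far from the query, so neighbor-based methods degrade.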

I agree wrt points #1 and #2. Especially #1.

My favorite supervised learning algorithm is decision trees. They have built
in feature selection, and they are simple to understand and express in code.

Boosted decision stumps are my second favorite.

Both are "far better than" support vector machines for the real world problems
I've run into. Except the algorithm names don't sound nearly as cool as SVM.
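
For concreteness, a minimal sketch of both favorites (the library and dataset
are my own illustrative choices, not anything from this thread): a plain
decision tree, and AdaBoost over depth-1 trees, i.e. boosted decision stumps.

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # A single depth-limited tree, and AdaBoost over depth-1 trees ("stumps").
    tree = DecisionTreeClassifier(max_depth=4, random_state=0)
    stumps = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                n_estimators=100, random_state=0)

    for name, clf in (("decision tree", tree), ("boosted stumps", stumps)):
        acc = cross_val_score(clf, X, y, cv=5).mean()
        print(name, round(acc, 3))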

~~~
paraschopra
Any particular hypothesis why decision trees work better for you than SVMs?

~~~
adamsmith
Sorry for the late reply.

Here are the two I can think of off the top of my head: the first is that
decision trees (or even naive Bayes) produce results that are intuitively
understandable and human-readable. In contrast, the result of training an SVM
is a huge vector of hundreds of floats, and there's no real way to explain to
someone (e.g. someone buying your tech, a manager, etc.) what's actually going
on.

I didn't mention this before, but SVMs do get you something that decision
trees, naive Bayes, and others don't: they will look at all kinds of
combinations of attributes. This becomes critical for applications like
machine vision, where looking at any single pixel as an attribute
won't do anything for you, so you need something that will understand lots
of different possible combinations of attributes.

Which is also the second weakness of SVMs: most data sets that aren't images
don't express strong patterns across a wide variety of combinations of
features. To take naive Bayes as an example: it is a hugely popular algorithm,
and the 'naive' part (which tends to work well on my data sets) is the
assumption that each attribute is statistically independent!
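
Spelled out as a toy sketch (mine, not a production implementation): the
independence assumption just means each class is scored by adding up
per-feature log-likelihoods, as if the features had nothing to do with each
other.

    import numpy as np

    def fit_nb(X, y):
        """X: binary feature matrix, y: 0/1 labels. Returns log-priors and
        per-class, per-feature log-likelihoods (Laplace-smoothed)."""
        params = {}
        for c in (0, 1):
            Xc = X[y == c]
            p_feat = (Xc.sum(axis=0) + 1) / (len(Xc) + 2)   # P(x_i = 1 | c)
            params[c] = (np.log(len(Xc) / len(X)),           # log P(c)
                         np.log(p_feat), np.log(1 - p_feat))
        return params

    def predict_nb(params, x):
        scores = {}
        for c, (log_prior, log_p, log_q) in params.items():
            # The "naive" step: just add the per-feature log-likelihoods.
            scores[c] = log_prior + np.sum(np.where(x == 1, log_p, log_q))
        return max(scores, key=scores.get)

    # Toy data: 4 binary features, 2 classes.
    X = np.array([[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]])
    y = np.array([0, 0, 1, 1])
    params = fit_nb(X, y)
    print(predict_nb(params, np.array([1, 1, 0, 1])))   # -> 0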

------
msbmsb
The final two phrases, "how to use the data to make it as useful as possible
to the algorithm, and how to fine-tune the parameters that each of these
algorithms take", describe work that is just as influential in the quality of
any ML process as the choice of training method. Data type and quantity,
overfitting, sparseness, dataset shift, etc. are all constant challenges that
affect the classification and require work to deal with.

When you see the effort involved at times in data wrangling, feature
selection, equation optimization or parameter experimentation to overcome
deficiencies in training sets, any notion of 'black magic' fades.
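
One concrete form that parameter work takes is a cross-validated search over
settings instead of trusting defaults. A minimal sketch with scikit-learn
(the library, dataset, and parameter grid here are my own illustrative
choices):

    from sklearn.datasets import load_digits
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    # Search over C and gamma with 5-fold cross-validation on the
    # training split, then check the winner on held-out data.
    search = GridSearchCV(
        SVC(),
        param_grid={"C": [0.1, 1, 10, 100], "gamma": [1e-4, 1e-3, 1e-2]},
        cv=5)
    search.fit(X_train, y_train)

    print("best params:", search.best_params_)
    print("held-out accuracy:", search.score(X_test, y_test))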

------
rwolf
Those are two great examples. I've heard of KNN, but I will certainly be
looking into KSVM!

My personal recommendation is the Naive Bayesian Classifier:

<http://en.wikipedia.org/wiki/Naive_Bayes_classifier>

<http://www.paulgraham.com/spam.html>

~~~
pgbovine
peter norvig has a great chapter in the O'Reilly book "Beautiful Data" about
using naive bayes + a ton of data mined from the web to do fairly good natural
language processing. one of his recent themes has been "naive algorithms +
lots of training data can often beat sophisticated algorithms + little data"

i dunno if his article is online somewhere, though.

~~~
rwolf
The O'Reilly book "Collective Intelligence" is another nice resource for Naive
Bayes. I picked it up because of a sidebar describing an improvement I didn't
find online (something about a Chi distribution...I'm in a different town than
my bookshelf today).

------
wheaties
From the website of Awwthor LLC: "In our model portfolio, our initial balance
of $5,000 grew within three months to $150,000." -- So let me get this
straight: a 3000% increase in only 3 months!?

~~~
zandorg
It says 'down', meaning it predicted a move down. But every bet is 'down'...?
What does that mean?

~~~
Imprecate
I would assume it means to short sell the security.

No clever modeling is going to generate 3000% quarterly returns in mostly
efficient markets without taking significant risk. I do this kind of stuff for
a living. Either the model is excessively curve-fit or they didn't account for
real-world concerns like slippage, latency, market impact, the bid-offer
spread, and transaction costs. A typical user of their signal services won't
be able to reproduce those results.

Statistical techniques have some interesting uses in quantitative finance, but
it's important to find an underlying economic reason to explain why your model
works. Stat arb (<http://en.wikipedia.org/wiki/Statistical_arbitrage>) is a
common strategy, but there are sensible reasons why it (sometimes) works.
If two securities are in the same industry, they'll tend to move in tandem
since they're affected by similar factors. Thorp's articles (linked in the
Wikipedia page) are worth a read if it's something that interests you.
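
As a toy sketch of the pairs flavor of stat arb (synthetic prices and made-up
thresholds of mine; it also ignores the slippage, spread, and transaction-cost
issues above, which is exactly the trap): track the spread between two related
securities and bet on it reverting to its mean.

    import numpy as np

    rng = np.random.RandomState(0)
    common = np.cumsum(rng.normal(0, 1, 500))     # shared industry factor
    a = 100 + common + np.cumsum(rng.normal(0, 0.3, 500))
    b = 100 + common + np.cumsum(rng.normal(0, 0.3, 500))

    spread = a - b
    # z-score of the spread (in-sample mean/std, for simplicity only)
    z = (spread - spread.mean()) / spread.std()

    # Signal: short the spread (short a, long b) when it is stretched high,
    # long the spread when it is stretched low, flat otherwise.
    signal = np.where(z > 2, -1, np.where(z < -2, 1, 0))
    print("days long:", (signal == 1).sum(),
          "days short:", (signal == -1).sum())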

Also, the difficult thing about modeling financial markets vs. other phenomena
is that there's a huge incentive for market participants to avoid leaking
information. Most inefficiencies are arbitraged away quickly or players who
"show their hands" wise up; the market is a harsh mistress. Maybe years ago a
big execution in a particular stock was indicative of its direction, but buy-
side traders are smarter now and use algorithmic strategies that split orders
temporally and physically across exchanges or trade in dark pools to avoid
information leakage. There's not nearly as much incentive for you to hide your
Google searches or ad clicks, so there's more opportunity for useful (and
profitable) modeling using these techniques.

