

Research Directions for Machine Learning and Algorithms  - yarapavan
http://cacm.acm.org/blogs/blog-cacm/108385-research-directions-for-machine-learning-and-algorithms/fulltext

======
bravura
As always, John Langford has one foot firmly planted in practice, and knows
deeply what practitioners will care about in the next five years.
Incidentally, he is the author of Vowpal Wabbit, one of the fastest out-of-
core learning implementations. He also has a profound theoretical knowledge.

This article is a more open-ended, forward-looking version of his blog post
"The Ideal Large Scale Learning Class" (<http://hunch.net/?p=1729>). That blog
post is required reading for anyone who wants to do large-scale learning, and
understand the current state-of-the-art.

 _"Almost all the big impact algorithms operate in pseudo-linear or better
time."_

This is one of the reasons for the resurgence in neural networks. It's useful
to learn a non-linear model over 1 billion examples, which is something you
can't do with a kernel SVM.
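To make the scaling point concrete, here's a minimal sketch (all names and numbers mine, not from Langford's post) of why stochastic gradient descent fits the linear-time regime: one pass touches each example once, whereas a kernel SVM typically needs pairwise kernel evaluations.

```python
import math
import random

# One SGD pass of logistic regression: O(n * d) total work -- each of the
# n examples is touched once, the "pseudo-linear or better" regime.
# Examples use a sparse (index, value) encoding, as Vowpal Wabbit does.
def sgd_logistic(data, dim, lr=0.1):
    w = [0.0] * dim
    for x, y in data:                      # y in {0, 1}
        z = sum(w[i] * v for i, v in x)
        p = 1.0 / (1.0 + math.exp(-z))
        g = p - y                          # gradient of log loss w.r.t. z
        for i, v in x:
            w[i] -= lr * g * v
    return w

# Toy usage: the label is the sign of the first feature; feature 1 is a bias.
rng = random.Random(0)
data = []
for _ in range(1000):
    a = rng.uniform(-1, 1)
    data.append(([(0, a), (1, 1.0)], 1 if a > 0 else 0))
w = sgd_logistic(data, dim=2)
```

After one pass the weight on the informative feature should come out positive, and the cost stays linear in the number of examples no matter how large the dataset grows.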

 _"How do we efficiently learn in settings where exploration is required?"_

I've been exploring this problem recently. For example, I am designing an
interface where a user can interact with search results to improve the
quality. Besides ML issues, this also touches on UX questions: Is it better to
force the user to give feedback on five crucial search results? Or should you
show 50 results and have the user cherry pick results that are particularly
good/bad?
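As a toy sketch of the exploration side of this (the epsilon value, result names, and click model are all made up for illustration), an epsilon-greedy ranker occasionally shows a result its current estimates don't favor, so feedback isn't restricted to what the model already likes:

```python
import random

# Epsilon-greedy sketch of "learning where exploration is required":
# usually exploit the current best estimate, but with probability epsilon
# surface some other result so feedback isn't limited to current favorites.
def choose_result(scores, epsilon=0.1, rng=random):
    """scores: dict mapping result id -> estimated relevance."""
    if rng.random() < epsilon:
        return rng.choice(list(scores))        # explore
    return max(scores, key=scores.get)         # exploit

def update(scores, counts, shown, reward):
    """Incremental mean of the observed feedback for each result."""
    counts[shown] = counts.get(shown, 0) + 1
    scores[shown] += (reward - scores[shown]) / counts[shown]

# Toy run: result "b" is truly better; feedback is noisy clicks.
random.seed(1)
true_ctr = {"a": 0.2, "b": 0.6, "c": 0.3}
scores, counts = {k: 0.0 for k in true_ctr}, {}
for _ in range(2000):
    shown = choose_result(scores)
    update(scores, counts, shown,
           1.0 if random.random() < true_ctr[shown] else 0.0)
best = max(scores, key=scores.get)   # should usually converge to "b"
```

The UX question above maps directly onto the exploration budget: forcing feedback on five crucial results is concentrated exploration, while letting users cherry-pick from 50 is diffuse exploration with selection bias.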

 _"How can we learn to index efficiently?"_

Another important question. One particularly interesting approach is using a
dense hash code to do semantic search: see, for example, Semantic Hashing,
which I describe in this forward-looking O'Reilly Strata talk about ML:
<http://strataconf.com/strata2011/public/schedule/detail/16934>
(YouTube: <http://www.youtube.com/watch?v=fEUw8igr1IY>). That talk overlaps a
bit with Langford's post, except my Strata talk was about new developments for
people in industry, not upcoming topics of interest for academics.
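As a rough sketch of the idea (using random hyperplanes as a stand-in for the learned autoencoder codes of actual Semantic Hashing), nearby vectors get binary codes at small Hamming distance, so lookup reduces to cheap bit operations:

```python
import random

# Index vectors with short binary codes. Semantic Hashing learns the codes
# with an autoencoder; here sign random projections (an LSH for cosine
# similarity) stand in for the learned encoder, since the point is the
# lookup structure: similar vectors -> codes at small Hamming distance.
def make_encoder(dim, bits, seed=0):
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]
    def encode(vec):
        code = 0
        for p in planes:
            code = (code << 1) | (sum(a * b for a, b in zip(p, vec)) > 0)
        return code
    return encode

def hamming(a, b):
    return bin(a ^ b).count("1")

encode = make_encoder(dim=4, bits=16)
docs = {"close": [1.0, 0.9, 0.0, 0.1],
        "far":   [-1.0, 0.1, 0.9, -0.5]}
q = encode([1.0, 1.0, 0.0, 0.0])
best = min(docs, key=lambda d: hamming(q, encode(docs[d])))
```

Because the codes are small integers, the whole index fits in memory and candidate retrieval is XOR plus popcount rather than a full similarity scan.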

~~~
iandanforth
"Is it better to force the user to give feedback on five crucial search
results? Or should you show 50 results and have the user cherry pick results
that are particularly good/bad?"

What metaphor are you trying to support? Is the system one that learns (think
child) or one that adapts (think adult)?

If you present a system as one that needs to be taught you will invoke a whole
different set of expectations than one where it is assumed to know a great
deal but can adapt.

Personally I always recommend forcing a user to teach the system (see
Netflix's new-user flow) because it breaks the assumption that the system will
'just know', which is often unreasonable.

------
hamner
The argument that important ML algorithms should be highly scalable (O(log N),
O(N), O(N log N)) holds in fields that are rich in "big data," with millions
to trillions of data points.

However, there are also many fields where acquiring a large dataset (more than
hundreds to thousands of samples) is infeasible. This is especially relevant
in medicine and biology. Many applications are constrained by small sample
sizes and may have a feature count that is orders of magnitude larger than the
sample count. Examples include fMRI studies and gene expression studies. Don't
discount research in methodologies (such as SVMs and many graphical models)
that have superlinear running time as impractical for real-world applications,
because these are used heavily in certain fields.

~~~
Wuzzy
My impression was that the OP didn't say superlinear algorithms are somehow
useless; merely that there are reasons why the linear (or better) ones can be
used in much more general settings, which is what makes them "big impact".

------
alextp
This is a really valuable list, especially because reading journals and going
to conferences would not suggest these are the major problems (most of them
are too hard to be tackled directly, yet need simple solutions that would not
translate into dozens of publications). At the same time, when talking to
non-researchers, there is too much noise clouding this clear picture.

I'm trying to focus more on these problems in my research, but it is no
accident that they are still unsolved.

I think the main recent ideas to tackle these are: sampling instead of
computing things exactly, random projections to compress information when
needed, limiting the memory footprint of existing algorithms, and controlling
shared state to increase distributedness. The hardest of these problems, how
to act when what you do changes the world and the data you see, desperately
needs at least some guiding principles.
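As a sketch of the random-projection idea (the dimensions and seeds here are arbitrary), projecting onto k random Gaussian directions approximately preserves pairwise distances (Johnson-Lindenstrauss), trading exactness for memory:

```python
import math
import random

# Random projection: compress d-dimensional points to k dimensions while
# roughly preserving pairwise Euclidean distances (Johnson-Lindenstrauss).
def random_projection(k, d, seed=0):
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)             # keeps expected norms unchanged
    proj = [[rng.gauss(0, 1) * scale for _ in range(d)] for _ in range(k)]
    def project(x):
        return [sum(row[i] * x[i] for i in range(d)) for row in proj]
    return project

def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

rng = random.Random(1)
d, k = 1000, 200
x = [rng.gauss(0, 1) for _ in range(d)]
y = [rng.gauss(0, 1) for _ in range(d)]
project = random_projection(k, d)
original = dist(x, y)
compressed = dist(project(x), project(y))
ratio = compressed / original            # should land near 1.0
```

The projection is data-independent, so it needs no training pass and composes naturally with the sampling and memory-limiting ideas above.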

