

Do We Need Hundreds of Classifiers to Solve Real World Classification Problems? [pdf] - alexcasalboni
http://jmlr.org/papers/volume15/delgado14a/delgado14a.pdf

======
benhamner
Random Forests are great on many tasks, but this analysis is incredibly
biased: it only includes the small and simple datasets in the UCI
repository. Many real world tasks are far more complex than that, especially
those involving text, speech, images, video, and large scale web data.

------
ely-s
This is interesting experimental evidence in spite of the no-free-lunch
(NFL) theorem, which refutes the notion of a generally superior algorithm.

[https://en.wikipedia.org/wiki/No_free_lunch_theorem](https://en.wikipedia.org/wiki/No_free_lunch_theorem)

I would reconcile it by saying that the UCI repository contains a biased
subset of all theoretical classification tasks.

~~~
Houshalter
This is not surprising, because the no free lunch theorem assumes all
possible datasets are equally likely. In reality, real world problems are
drawn from a much narrower distribution, so an algorithm really can be
better on average across them.
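The uniform-labeling assumption behind the NFL theorem can be illustrated with a toy sketch (function names are mine, not from the thread): if every labeling of the unseen points is equally likely, every prediction rule has the same average accuracy.

```python
import itertools

# Tiny no-free-lunch illustration: average accuracy over ALL possible
# labelings of two unseen test points. When every labeling is equally
# likely, any fixed prediction rule scores 50% on average, so no
# learner can beat another.
def average_accuracy(predict):
    """Mean accuracy of `predict` over every labeling of two test points."""
    accs = []
    for labels in itertools.product([0, 1], repeat=2):  # 4 possible "worlds"
        correct = sum(predict(i) == y for i, y in enumerate(labels))
        accs.append(correct / 2)
    return sum(accs) / 4

print(average_accuracy(lambda i: 0))      # always predict 0 -> 0.5
print(average_accuracy(lambda i: i % 2))  # alternate 0,1    -> 0.5
```

Real datasets are not drawn uniformly from all possible labelings, which is why the theorem's averaging argument doesn't rule out a method like Random Forests doing well across the UCI suite.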

------
gwern
It makes its point well, but I'd like to see a followup paper addressing
neural networks: given the extreme complexity of successful deep neural
networks, which outperform anything he considers there on real world problems
he doesn't consider, what implications can we draw?

~~~
Houshalter
Neural networks tend to overfit very easily, which is the main reason other
methods usually outperform them. They are mainly successful where they can
exploit the structure of a problem in ways other methods can't.

E.g. convolutional neural networks take advantage of local structure within
images: the fact that nearby pixels are strongly correlated.
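That local-structure idea can be sketched in a few lines of NumPy (a minimal single-channel convolution with valid padding; the helper name is mine): the same small kernel slides over every patch, so the layer only ever sees nearby pixels and shares its weights everywhere.

```python
import numpy as np

def conv2d(image, kernel):
    """Minimal 2D convolution sketch: single channel, valid padding."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # local neighborhood only
            out[i, j] = np.sum(patch * kernel)  # weights shared across positions
    return out

# A horizontal-gradient kernel responds strongly at a vertical edge.
image = np.zeros((5, 5))
image[:, 2:] = 1.0                   # vertical edge down the middle
kernel = np.array([[-1, 0, 1]] * 3)  # 3x3 horizontal gradient filter
response = conv2d(image, kernel)     # large values only near the edge
```

Fully connected layers, by contrast, have no built-in notion of pixel adjacency and must learn it from data, which is part of why they overfit more readily on images.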

However, I'd really like to see a reemergence of Bayesian neural networks,
which can address the overfitting problem. Methods like dropout are also
relatively new, and they alleviate overfitting far more than was possible
in the past.
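For reference, the dropout mentioned above can be sketched as "inverted" dropout in NumPy (the function name and rate are my own illustration, not from the thread): each activation is zeroed with probability p at training time and the survivors are scaled by 1/(1-p), so the expected activation is unchanged and the network needs no rescaling at test time.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5, training=True):
    """Inverted dropout sketch: drop units with probability p during training."""
    if not training or p == 0.0:
        return activations                     # identity at test time
    mask = rng.random(activations.shape) >= p  # keep each unit with prob 1-p
    return activations * mask / (1.0 - p)      # rescale survivors

h = np.ones(10000)
dropped = dropout(h, p=0.5)   # roughly half zeros, rest scaled to 2.0
print(dropped.mean())         # close to 1.0 in expectation
```

Randomly thinning the network each step prevents units from co-adapting, which is the regularizing effect the comment credits for the reduced overfitting.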

------
minthd
So basically, to get 93% (on average) of the value of machine learning, you
can use BigML's extremely easy interface [1], even without writing code?

[1] [http://blog.bigml.com/2013/07/01/you-dont-need-coursera-to-get-started-with-machine-learning/](http://blog.bigml.com/2013/07/01/you-dont-need-coursera-to-get-started-with-machine-learning/)

------
sjtrny
Do we need hundreds of reposts?

