
Why decision trees is the best data mining algorithm - ColinWright
http://zyxo.wordpress.com/2010/09/17/why-decision-trees-is-the-best-data-mining-algorithm/
======
hamner
There is no such thing as a "best" data mining algorithm. Almost all the
advantages you mentioned for decision trees, a form of recursive binary
partitioning, apply to a greater extent to Random Forests, which are
bootstrapped decision trees that only consider a subset of features at each
node.
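To make that concrete, here's a minimal pure-Python sketch of the two sources of randomness a Random Forest adds on top of plain trees: bootstrap resampling of rows, plus a random subset of features considered at each node. The sqrt(n_features) subset size is a common default I'm assuming here, not something stated in the comment:

```python
import random

def bootstrap_sample(data, rng):
    # Draw n rows with replacement from the n original rows.
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

def candidate_features(n_features, rng):
    # At each node, consider only ~sqrt(n_features) features
    # (a common default for classification forests).
    k = max(1, int(n_features ** 0.5))
    return rng.sample(range(n_features), k)

rng = random.Random(0)
data = [(x, x % 2) for x in range(10)]   # toy (feature, label) rows
sample = bootstrap_sample(data, rng)     # one tree's training bag
feats = candidate_features(16, rng)      # e.g. 4 of 16 features at a node
```

Each tree in the forest gets its own bag and its own feature draws, which is what decorrelates the trees.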

Examples of domains where decision trees perform poorly include:

- Low amounts of data
- Domains where you have extra knowledge about the data (such as some
  features coming from certain probability distributions) that you can
  incorporate into classifiers.

Decision trees work well in a variety of applications, but that does not make
them the "best" algorithm, and it is rare that a classical decision tree
provides state of the art performance on any given data set.
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122...](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.5901&rep=rep1&type=pdf)

------
bravura
Decision trees are useful for the points enumerated in the blog article.

One disadvantage of decision trees is that they can be slow on large data sets
(> 1M examples). They are a batch algorithm, which means that you have to look
at all examples to build a tree, although they can be trained in a mini-online
setting (looking at only 10K examples per tree), which is faster.

More importantly, decision tree induction involves combinatorial optimization
over splitting features. This is much slower than a continuous optimization
over non-linearities. So decision tree induction is slower than, say,
stochastic gradient descent over a neural network.
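To illustrate the combinatorial part: for each feature, split search enumerates every candidate threshold and scores the resulting partition. A minimal sketch for one binary-labeled feature, using Gini impurity as the split criterion (one common choice; the comment doesn't name a specific criterion):

```python
def gini(labels):
    # Gini impurity of a list of 0/1 labels.
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 2 * p1 * (1 - p1)

def best_split(xs, ys):
    # Combinatorial search: try every threshold between consecutive
    # sorted values, keep the lowest weighted child impurity.
    best = (float("inf"), None)
    pairs = sorted(zip(xs, ys))
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        thresh = (pairs[i - 1][0] + pairs[i][0]) / 2
        if score < best[0]:
            best = (score, thresh)
    return best

# Two cleanly separated clusters: impurity 0 at threshold 6.5.
score, thresh = best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
```

This enumeration is repeated per feature, per node, which is where the cost comes from; an SGD step, by contrast, is a single continuous gradient update per example.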

(As hamner points out, there is no "best" data mining algorithm, just like
there is no "best" programming language.)

~~~
bane
They're also easy to overfit; similarly, finding representative training data
is often non-trivial.

------
StavrosK
This article is weird. It makes an odd point, includes the words "decision
trees" _seventeen times_, and doesn't have much content. It feels like SEO
bait.

~~~
Estragon
Plus some flame bait, to attract links like this thread.

------
iskander
Decision trees suffer from high variance. A slightly different sample might
give you entirely different splits; using decision trees for data
interpretation or feature selection is an art at best (and for some data sets
uselessly unreliable).

>Decision trees are weak learners.

This is untrue. They're only used as weak learners in boosting because the
tree depth is limited to some small constant.

>Decision trees run fast even with lots of observations and variables

I don't know all the decision tree learning algorithms, but at least some of
the common ones run in O(features * samples * splits). That's not terrible,
but you can handle much larger data sets by optimizing with stochastic
gradient descent or coordinate descent.

>Decision trees can easily handle unbalanced datasets.

This links to a post about bagging, which is not really specific to decision
trees (it can be done with any learning algorithm).
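For instance, one algorithm-agnostic way to use bagging on an unbalanced set is a balanced bootstrap: sample equally (with replacement) from each class when building each bag. A small sketch; the helper name and the equal-per-class policy are my assumptions, not from the linked post:

```python
import random

def balanced_bootstrap(rows, rng):
    # One balanced bag: draw the same number of rows (with replacement)
    # from each class, so rare classes aren't swamped by the majority.
    by_class = {}
    for x, y in rows:
        by_class.setdefault(y, []).append((x, y))
    k = min(len(v) for v in by_class.values())
    bag = []
    for members in by_class.values():
        bag.extend(rng.choice(members) for _ in range(k))
    return bag

rng = random.Random(1)
rows = [(i, 0) for i in range(8)] + [(i, 1) for i in range(2)]  # 8:2 imbalance
bag = balanced_bootstrap(rows, rng)  # 2 rows of each class
```

Any base learner (a tree, a linear model, whatever) can then be trained on each bag and the results averaged.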

------
snippyhollow
No it is not. It depends on the structural form of what you are learning.
Check <http://www.pnas.org/content/105/31/10687.short> for some other forms.

------
shaggyfrog
Decision trees are a learning system, and like all other tools, they have pros
and cons. They are designed for state spaces where data can be easily divided
up at each branching point; thus, they do not handle stochastic domains very
well compared to something like a Bayesian network.

There is no silver bullet in data mining/machine learning.

------
tgrisfal
They're nice when they work and when they're appropriate.

Spoilers: that's not always.

------
xedarius
In this article there's a small screenshot of an application that looks like
a decision tree designer ... does anyone know what this is? And is it public
domain software?

~~~
OliverM
Looks like some sort of Excel plugin. Googling 'excel decision tree' gives you
several options.

~~~
xedarius
Thanks.

