
An Introduction to Decision Trees with Julia - ptwobrussell
http://bensadeghi.com/decision-trees-julia/
======
tlarkworthy
"Also, keep in mind that even though the construction of decision trees may be
expensive, the classification of a new sample is just a walk down the tree,
and so, is very fast."

I would say decision tree training is very fast compared to other methods.
That's the main benefit of decision trees vs. neural nets (iterative
backpropagation), MCMC (sampling), and SVMs (O(n^3) training).

I can't think of any mainstream technique that's faster to train. (k-nearest
neighbors is fast to train but slow at prediction, so that doesn't count.)
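The article's point that classification "is just a walk down the tree" can be sketched in a few lines (a toy node layout of my own, not the structure any particular library actually uses):

```python
# Minimal sketch of decision-tree prediction as a walk down the tree.
# The node representation here is hypothetical, not from any library.

class Leaf:
    def __init__(self, label):
        self.label = label

class Node:
    def __init__(self, feature, threshold, left, right):
        self.feature = feature      # index of the feature to test
        self.threshold = threshold  # split value
        self.left = left            # subtree for feature < threshold
        self.right = right          # subtree for feature >= threshold

def predict(node, x):
    # Prediction is O(depth): one comparison per level, no training data needed.
    while isinstance(node, Node):
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.label

# Tiny hand-built tree: split on feature 0 at 2.5, then on feature 1 at 1.0.
tree = Node(0, 2.5, Leaf("A"), Node(1, 1.0, Leaf("B"), Leaf("C")))
print(predict(tree, [1.0, 0.0]))  # walks left once -> "A"
print(predict(tree, [3.0, 2.0]))  # right, then right -> "C"
```

Compare that to k-NN, where prediction has to touch the training set (or an index over it) every time.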

~~~
ajtulloch
For what it's worth, I don't think it's fair to claim that SVM is O(n^3).

From [1], LibLinear trains linear SVMs in O(nd log(1/p)) time, where n is the
number of examples, d is the number of features, and p is the optimization
tolerance; Pegasos/FOLOS and other subgradient methods can do online kernel
SVM training in O(s/(λp)) iterations, where s is the number of non-zero
features in each example, p is as above, and λ is the regularization
parameter.
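To make the subgradient idea concrete, here is a minimal Pegasos-style sketch for the linear case in plain Python (simplified; the kernelized online variant discussed in [1] is more involved, and the toy data below is mine):

```python
import random

def pegasos(X, y, lam=0.1, iters=2000, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM.

    Minimizes (lam/2)*||w||^2 + mean hinge loss. Each step touches one
    random example, so per-iteration cost depends only on that example's
    non-zero features -- the source of the bound quoted above.
    """
    rng = random.Random(seed)
    d = len(X[0])
    w = [0.0] * d
    for t in range(1, iters + 1):
        i = rng.randrange(len(X))
        eta = 1.0 / (lam * t)  # decreasing step size
        margin = y[i] * sum(w[j] * X[i][j] for j in range(d))
        # Subgradient step: shrink w, then add the example if it
        # violates the margin.
        w = [(1 - eta * lam) * wj for wj in w]
        if margin < 1:
            w = [wj + eta * y[i] * xij for wj, xij in zip(w, X[i])]
    return w

# Linearly separable toy data: the label is the sign of the first coordinate.
X = [[1.0, 0.2], [2.0, -0.5], [-1.5, 0.3], [-2.0, -0.1]]
y = [1, 1, -1, -1]
w = pegasos(X, y)
preds = [1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1 for x in X]
print(preds)  # expect [1, 1, -1, -1] on this separable toy set
```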

There are also lots of ways to speed up decision tree training from the naive
approach - I explain/implement some in [2] (and in the Go decision tree
library I worked on [3]), and there are many more (bucketizing, presorting,
etc.). The bound I've seen is O(mn log m + mnT) with T rounds, m examples, and
n features.

[1]:
[http://cseweb.ucsd.edu/~akmenon/ResearchExam.pdf](http://cseweb.ucsd.edu/~akmenon/ResearchExam.pdf)

[2]: [http://tullo.ch/articles/speeding-up-decision-tree-training/](http://tullo.ch/articles/speeding-up-decision-tree-training/)

[3]:
[https://github.com/ajtulloch/decisiontrees](https://github.com/ajtulloch/decisiontrees)
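The presorting trick can be sketched roughly as follows: sort each feature once, then sweep the sorted order, updating class counts incrementally so every candidate threshold is scored in amortized O(1). (A simplified toy illustration of the idea, not code from [2] or [3]):

```python
# Toy sketch of split-finding with presorted features: rather than
# re-scanning the data for every candidate threshold, sort each feature
# once and sweep left-to-right with running class counts.

def gini(counts, total):
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def best_split(X, y):
    n = len(y)
    best = (float("inf"), None, None)  # (impurity, feature, threshold)
    for f in range(len(X[0])):
        # Presort this feature's values once: O(n log n).
        order = sorted(range(n), key=lambda i: X[i][f])
        left, right = {}, {}
        for label in y:
            right[label] = right.get(label, 0) + 1
        # Sweep: each candidate split is scored in O(1) amortized.
        for k in range(n - 1):
            label = y[order[k]]
            left[label] = left.get(label, 0) + 1
            right[label] -= 1
            if X[order[k]][f] == X[order[k + 1]][f]:
                continue  # can't split between equal values
            nl, nr = k + 1, n - k - 1
            score = (nl * gini(left, nl) + nr * gini(right, nr)) / n
            if score < best[0]:
                thr = (X[order[k]][f] + X[order[k + 1]][f]) / 2
                best = (score, f, thr)
    return best

X = [[1.0], [2.0], [3.0], [4.0]]
y = ["A", "A", "B", "B"]
score, feature, threshold = best_split(X, y)
print(feature, threshold)  # perfect split on feature 0 at 2.5, impurity 0.0
```

In a real implementation the sort is done once up front and reused across the whole tree, which is where most of the savings come from.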

~~~
tlarkworthy
Oooh, that's some cool stuff. Incremental tree training is really nice!

I generally use random forests, so I can control the subset sizes pretty
easily. It's generally not worth my tweaking things beyond that. I'm mostly
applying techniques rather than doing machine learning research, so the gains
aren't really worth it for me.

------
jamescryer
The expressiveness of decision trees lends itself to other applications. I'm
looking at how we might apply the ideas behind decision trees to UI/UX
modeling and UI testing. (Slightly off topic, but)
[https://github.com/Huddle/PhantomFlow](https://github.com/Huddle/PhantomFlow)
attempts to do this. Breaking away from conventional testing approaches is
interesting enough, but having the tests produce mineable, visualizable data
is a great way to better understand system complexity.

