
A Practical Guide to Tree-Based Learning Algorithms - sadanand4singh
https://sadanand-singh.github.io/posts/treebasedmodels/#.WXT8Kli2pUw.hackernews
======
iamnafets
I've found Adele Cutler's presentation on random forests to be an outstanding
resource for building intuition about tree-based algorithms.

[http://www.math.usu.edu/adele/RandomForests/UofU2013.pdf](http://www.math.usu.edu/adele/RandomForests/UofU2013.pdf)

Thinking about trees as a supervised recursive partitioning algorithm or a
clustering algorithm is useful for problems that may not appear to be simple
classification or regression problems.
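
To make the partitioning view concrete, here is a minimal sketch using scikit-learn. The synthetic dataset and every parameter value are assumptions chosen only for illustration. Each leaf of the fitted tree is treated as one cell of a label-guided partition of the feature space.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; the dataset and all parameter values are illustrative assumptions.
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, random_state=0
)

# A shallow tree keeps the number of partition cells small and readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# apply() returns the index of the leaf each sample falls into, so the
# leaves behave like the cells of a supervised partition of the data.
leaf_ids = tree.apply(X)

for leaf in np.unique(leaf_ids):
    members = y[leaf_ids == leaf]
    print(f"leaf {leaf}: {members.size} samples, "
          f"class-1 fraction {members.mean():.2f}")
```

The leaf ids can then be used like cluster labels, with the difference that the splits were chosen to separate the target rather than to group nearby points.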

~~~
lackadaisicall
I like this one better:

[https://web.csulb.edu/~tebert/teaching/lectures/551/random_f...](https://web.csulb.edu/~tebert/teaching/lectures/551/random_forest.pdf)

I made it.

------
thearn4
As interesting as I find the current state of deep learning to be, there is
something about random forests that I can't help but find much more cool.
Probably the amazing out-of-box performance.

~~~
platz
Also, the model can be analyzed to determine which variables contribute the
most.
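
A minimal sketch of what that can look like with scikit-learn's random forest, assuming a synthetic dataset in which only the first two columns carry signal (the data and parameter values are made up for illustration). Note that these impurity-based scores are only one way to rank variables, and they can be biased toward high-cardinality features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption for illustration): with shuffle=False the
# two informative features are columns 0 and 1, the rest are pure noise.
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=2, n_redundant=0,
    class_sep=2.0, shuffle=False, random_state=0,
)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances, normalized so that they sum to 1.
importances = forest.feature_importances_

# Rank features from most to least important.
ranking = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)
for index, score in ranking:
    print(f"feature {index}: {score:.3f}")
```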

~~~
nerdponx
For some very nice Random Forest visualizations, check out the R package
"forestFloor" [0].

I also once started implementing an R package for "partial dependence plots"
[1][2], which are popularly associated with Random Forests but aren't specific
to them.

[0]:
[https://CRAN.R-project.org/package=forestFloor](https://CRAN.R-project.org/package=forestFloor)

[1]:
[http://scikit-learn.org/stable/auto_examples/ensemble/plot_p...](http://scikit-learn.org/stable/auto_examples/ensemble/plot_partial_dependence.html)

[2]:
[https://github.com/gwerbin/statsplots/blob/master/R/partialp...](https://github.com/gwerbin/statsplots/blob/master/R/partialplot.R)
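
To illustrate why partial dependence isn't tied to Random Forests, here is a hand-rolled sketch that only needs a model with a `predict` method. The helper name `partial_dependence_1d`, the synthetic data, and the parameter values are all assumptions for illustration; this is not the code from either link above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def partial_dependence_1d(model, X, feature, grid_size=20):
    """Average prediction as one feature is swept over a grid of values.

    Hypothetical helper: works with any fitted model exposing predict().
    """
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), grid_size)
    averages = np.empty(grid_size)
    for i, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value  # force every row to the same feature value
        averages[i] = model.predict(X_mod).mean()
    return grid, averages


# Synthetic data (an assumption): the target depends only on feature 0.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
grid, pd_values = partial_dependence_1d(model, X, feature=0)

for value, average in zip(grid, pd_values):
    print(f"x0 = {value:+.2f} -> mean prediction {average:+.2f}")
```

Swapping `RandomForestRegressor` for any other regressor leaves the helper unchanged, which is the point of the comment above.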

------
Aqwis
Does anyone know why most machine learning libraries (notably scikit-learn)
implement trees and ensembles of trees based on the CART algorithm? It seems
like using other types of trees (See5, MARS) particularly in ensembles could
possibly have advantages as these types of trees were specifically developed
as _improvements_ to CART/C4.5.

~~~
lackadaisicall
> Does anyone know why most machine learning libraries (notably scikit-learn)
> implement trees and ensembles of trees based on the CART algorithm?

This is just my theory.

Because it was the first tree-based algorithm and Leo Breiman really did
market it. He even trademarked "Random Forests".

Kinda like what XGBoost is doing right now.

My professor is also trying to market his version, if I ever get around to
finishing my thesis. The problem with his algorithm is that it hasn't been
ported to any other language. It was written years ago in C, and he's not a
programmer.

I'd imagine it is the same with the other algorithms. Leo, on the other hand,
is a CS major on top of a stats major.

Also, there are tons of regression algorithms out there that can be turned
into trees (their fully nonparametric counterparts).

But in the end, linear regression is the most popular, next to logistic, iirc.
There are also survival trees and BART (Bayesian additive regression trees),
which are still in their infancy.

~~~
joe636434
A professor who invents his own version of a tree algorithm but cannot
program? Seriously? Is it common in academic circles for a computer science
professor to be unable to program?

------
6502nerdface
Nice write-up, thanks for sharing. One possible typo I noticed:

> Maximum depth of tree (vertical depth) The maximum depth of trees. It is
> used to control over-fitting, higher values prevent a model from learning
> relations which might be highly specific to the particular sample.

Shouldn't it be _lower values_ , i.e., shallower trees, that control over-
fitting?

~~~
sadanand4singh
Thanks for pointing that out. Yes, it should be lower values that prevent
over-fitting.
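
A small sketch of the effect, assuming a noisy synthetic dataset and scikit-learn's `DecisionTreeClassifier` (the data and the depth values are illustrative assumptions, not taken from the article). It compares train and test accuracy as `max_depth` grows; `None` means the tree is grown until the leaves are pure.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data (an assumption): flip_y randomizes a share of labels,
# so a tree that memorizes the training set cannot generalize perfectly.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

results = {}
for depth in (2, 4, 8, None):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    # Store (train accuracy, test accuracy) for each depth setting.
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    train_acc, test_acc = results[depth]
    print(f"max_depth={str(depth):>4}: train={train_acc:.3f} test={test_acc:.3f}")
```

The gap between the two columns is the thing to watch: the unrestricted tree fits the training set exactly, and the shallow trees keep train and test accuracy much closer together.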

~~~
Bishonen88
In the part where it compares both sets, df_train_set.Income.value_counts()
should probably be df_test_set.Income.value_counts().
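
A sketch of the intended comparison, assuming pandas and two tiny made-up frames that stand in for the article's `df_train_set` and `df_test_set` (the label strings and the counts are hypothetical):

```python
import pandas as pd

# Hypothetical stand-ins for the article's data frames; the values are made up.
df_train_set = pd.DataFrame({"Income": ["<=50K"] * 75 + [">50K"] * 25})
df_test_set = pd.DataFrame({"Income": ["<=50K"] * 38 + [">50K"] * 12})

# Put the class proportions of both sets side by side, so each column
# really comes from a different frame.
comparison = pd.DataFrame({
    "train": df_train_set.Income.value_counts(normalize=True),
    "test": df_test_set.Income.value_counts(normalize=True),
})
print(comparison)
```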

------
lugg
Is OP here?

Can you please remove the text justification? Makes it really hard to read on
mobile.

~~~
ErikBjare
Looks fine to me. I wouldn't want him to remove it.

~~~
zephyrppt
Looks fine in portrait mode, but it is truncated width-wise in landscape mode.

