
How to visualize decision trees - parrt
http://explained.ai/decision-tree-viz/index.html
======
parrt
Decision trees are the fundamental building block of gradient boosting
machines and Random Forests™, probably the two most popular machine learning
models for structured data. Visualizing decision trees is a tremendous aid
when learning how these models work and when interpreting models.
Unfortunately, current visualization packages are rudimentary and not
immediately helpful to the novice. So, we've created a general package called
animl for scikit-learn decision tree visualization and model interpretation.
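For contrast, here is roughly what scikit-learn gives you out of the box. This is a minimal sketch that uses scikit-learn's own `export_text`, not animl. The Iris dataset and `max_depth=2` are arbitrary choices made only for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Example data only: the Iris dataset bundled with scikit-learn.
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Plain-text dump of each split condition and each leaf's predicted class.
report = export_text(clf, feature_names=list(iris.feature_names))
print(report)
```

Each output line is either a split test or a leaf's predicted class. Nothing in it shows how the training samples are distributed at each node.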

~~~
cschmidt
Wow, I wondered why you put a TM on Random Forests. I guess it is a trademark of
Salford Systems, which is kind of weird. Maybe we can just call them random
forests and ignore that.

~~~
jph00
I'm often guilty of this too - but we really should put the (tm) there. It's
nice that they made the code for the algorithm publicly available, and all they
ask is that we respect their trademark in return. I think that's more than fair.
:)

(I discussed this a few years ago with the co-inventor of random forests, Adele
Cutler, and she confirmed that this is something she wants to see
happen.)

~~~
marktangotango
Are algorithms patentable? Last I checked, in the US they were. Copyrightable?

~~~
kgwgk
Not the answer to your question, but in case it helps anyone: trademarks are
unrelated to patents. You can use a random forest, but you cannot call it
“random forest”. “Aleatory jungle” is fine, though.

~~~
sacado2
"stochastic treeset". Sounds way more scientific, which can be required to
convince a pointy-haired boss. "Random" forest sounds... well, I can flip a coin
too, how is that going to solve my problem?

For the same reason, "naive" Bayes classifiers are very hard to sell, to the
point that I stopped naming them and now just say "a very fast machine learning
algorithm", unless specifically asked.

------
oscilloscope
There's a nice interactive version of a decision tree diagram here, in the
section "Growing a Tree".

[http://www.r2d3.us/visual-intro-to-machine-learning-part-1/](http://www.r2d3.us/visual-intro-to-machine-learning-part-1/)

~~~
comboy
This is the first time ever that a website has done something when I scroll
and I'm not mad.

~~~
anonytrary
I was particularly impressed with the continuous scroll which lets you go
through the animations frame-by-frame, so to speak.

------
benmccann
I have a model trained with XGBoost Java. Can I take the model file and read
it with scikit-learn to visualize it with this library?

------
jdonaldson
Good to see others looking into tree model viz. I've done work with larger
scale tree visualizations and found you quickly run out of space. I wound up
using interactivity to reveal branch level info, dynamically pruned the tree
based on train support, and I used a more sophisticated layout technique to
pack more info in.
[https://www.google.com/amp/s/blog.bigml.com/2012/01/23/beaut...](https://www.google.com/amp/s/blog.bigml.com/2012/01/23/beautiful-decisions-inside-bigmls-decision-trees/amp/)
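The "dynamically pruned the tree based on train support" idea can be sketched on scikit-learn's array representation of a fitted tree (`tree_.children_left`, `tree_.children_right`, `tree_.n_node_samples`, where -1 marks a leaf). The collapse rule below is an assumed, illustrative policy, not BigML's actual one. A split is expanded only when both of its children have at least `min_samples` training rows. Otherwise the split is drawn collapsed and its subtree is hidden. `prune_by_support` is a hypothetical helper name.

```python
def prune_by_support(children_left, children_right, n_node_samples, min_samples):
    """Return (visible, collapsed) node ids for drawing a pruned view.

    Arrays follow scikit-learn's tree_ layout: children_left[i] == -1 means
    node i is a leaf. Assumed policy (illustrative only): expand a split only
    if both children have at least `min_samples` training samples; otherwise
    draw the split node collapsed and hide everything below it.
    """
    visible, collapsed = set(), set()
    stack = [0]  # scikit-learn stores the root as node 0
    while stack:
        node = stack.pop()
        visible.add(int(node))
        left, right = children_left[node], children_right[node]
        if left == -1:
            continue  # a real leaf: nothing below it
        if n_node_samples[left] < min_samples or n_node_samples[right] < min_samples:
            collapsed.add(int(node))  # thinly supported split: show it as a stub
            continue
        stack.extend([left, right])
    return visible, collapsed


# Tiny hand-made tree in the same layout (node 0 is the root).
children_left = [1, -1, 3, 5, -1, -1, -1]
children_right = [2, -1, 4, 6, -1, -1, -1]
n_node_samples = [100, 60, 40, 35, 5, 30, 5]
visible, collapsed = prune_by_support(children_left, children_right, n_node_samples, 10)
# visible == {0, 1, 2}, collapsed == {2}
```

With a fitted `DecisionTreeClassifier` named `clf`, you would pass `clf.tree_.children_left`, `clf.tree_.children_right` and `clf.tree_.n_node_samples` instead. A renderer would then draw only the visible nodes.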

FWIW visualizing trees like that helps spot problems really quickly.
Overfitting behavior typically involves overusing a certain field, or growing
long and relatively narrow branches.
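Both overfitting signals mentioned here (one field used over and over, and long, thin branches) can also be checked numerically before drawing anything. This is a minimal sketch that assumes the same scikit-learn `tree_` array layout, where -1 marks a leaf and `feature` holds the split column of each internal node. `tree_shape_report` is a hypothetical name, and the thresholds for "overused" or "narrow" are left to you.

```python
from collections import Counter


def tree_shape_report(children_left, children_right, feature, n_node_samples):
    """Summarise a binary tree stored in scikit-learn-style parallel arrays.

    Returns (split_counts, max_depth, leaves):
      split_counts maps feature index -> number of internal nodes splitting on it,
      max_depth is the longest root-to-leaf path (root has depth 0),
      leaves is a list of (node id, depth, training samples) tuples.
    """
    split_counts = Counter()
    leaves = []
    stack = [(0, 0)]  # (node id, depth), starting at the root
    while stack:
        node, depth = stack.pop()
        left, right = children_left[node], children_right[node]
        if left == -1:
            leaves.append((int(node), depth, int(n_node_samples[node])))
            continue
        split_counts[int(feature[node])] += 1  # count only real splits
        stack.append((right, depth + 1))
        stack.append((left, depth + 1))
    max_depth = max(d for _, d, _ in leaves)
    return dict(split_counts), max_depth, leaves


# Hand-made example tree (feature -2 marks leaves, as in scikit-learn).
children_left = [1, -1, 3, 5, -1, -1, -1]
children_right = [2, -1, 4, 6, -1, -1, -1]
feature = [0, -2, 0, 1, -2, -2, -2]
n_node_samples = [100, 60, 40, 35, 5, 30, 5]
print(tree_shape_report(children_left, children_right, feature, n_node_samples))
```

A feature that dominates `split_counts`, or deep leaves holding only a handful of samples, are the numerical counterparts of what the picture shows.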

~~~
parrt
Thanks for that link. Super useful. Looks like BigML uses the same layout I used
for ANTLR parse trees. It really packs stuff in; e.g.,
[https://cdn-images-1.medium.com/max/1760/1*k0mO4kJyQvPCyyev0...](https://cdn-images-1.medium.com/max/1760/1*k0mO4kJyQvPCyyev0nn_EQ.png)

~~~
jdonaldson
Yeah, the general algorithm is "Reingold-Tilford" with some tweaks from
Buchheim.

[https://en.m.wiktionary.org/wiki/Reingold-Tilford_algorithm](https://en.m.wiktionary.org/wiki/Reingold-Tilford_algorithm)

Really good algorithm to have in any visualization toolkit.
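The core idea works like this. Each subtree is laid out first. The right subtree is then pushed just far enough right that its left outline (contour) clears the left subtree's right outline at every shared depth, and the parent is centred over its two children. Below is a simplified, quadratic-time sketch of that idea for binary trees such as scikit-learn's. It compares whole contour lists instead of using the threads that make Reingold-Tilford linear-time, and it leaves out Buchheim's extensions for nodes with more than two children. `Node`, `tidy_layout` and `sep` (the minimum horizontal gap) are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def _layout(node: Node, sep: float):
    """Lay out the subtree rooted at `node`, with `node` itself at x = 0.

    Returns (positions, left_contour, right_contour): positions maps a node
    name to (x, depth) relative to this root; the contours hold the minimum
    and maximum x at each depth of the subtree.
    """
    if node.left is None and node.right is None:
        return {node.name: (0.0, 0)}, [0.0], [0.0]
    if node.left is None or node.right is None:
        raise ValueError("sketch assumes every internal node has two children")
    lpos, l_left, l_right = _layout(node.left, sep)
    rpos, r_left, r_right = _layout(node.right, sep)
    # Smallest distance between the two child roots such that, at every depth
    # both subtrees reach, the right subtree starts at least `sep` to the
    # right of where the left subtree ends.
    gap = max(a - b for a, b in zip(l_right, r_left)) + sep
    half = gap / 2  # parent sits midway between its children
    positions = {node.name: (0.0, 0)}
    for name, (x, d) in lpos.items():
        positions[name] = (x - half, d + 1)
    for name, (x, d) in rpos.items():
        positions[name] = (x + half, d + 1)
    # Merge the shifted child contours below the parent's own level.
    left_contour, right_contour = [0.0], [0.0]
    for d in range(max(len(l_left), len(r_left))):
        lows, highs = [], []
        if d < len(l_left):
            lows.append(l_left[d] - half)
            highs.append(l_right[d] - half)
        if d < len(r_left):
            lows.append(r_left[d] + half)
            highs.append(r_right[d] + half)
        left_contour.append(min(lows))
        right_contour.append(max(highs))
    return positions, left_contour, right_contour


def tidy_layout(root: Node, sep: float = 1.0) -> Dict[str, Tuple[float, int]]:
    """Absolute (x, depth) for every node, shifted so the leftmost x is 0."""
    positions, left_contour, _ = _layout(root, sep)
    shift = -min(left_contour)
    return {name: (x + shift, d) for name, (x, d) in positions.items()}
```

For example, `tidy_layout(Node("root", Node("A"), Node("B")))` returns `{'root': (0.5, 0), 'A': (0.0, 1), 'B': (1.0, 1)}`. These are abstract x positions plus depth, which a renderer would scale to pixels.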

------
parrt
Just a note that dtreeviz works cross-platform now! Mac, Windows, Linux. Install
it with "pip install -U dtreeviz"; see more at
[https://github.com/parrt/dtreeviz](https://github.com/parrt/dtreeviz)

------
beamatronic
If you model the decision tree as a directed graph, there’s no reason you
couldn’t export it into a Doom/Quake level or even Minecraft or Roblox.
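The first step of that idea is just reading the fitted tree as a directed graph. This sketch assumes scikit-learn's `tree_` arrays as input. It lists parent-to-child edges and emits Graphviz DOT text that some other tool could consume; mapping edges onto rooms and corridors is not shown. scikit-learn's own `export_graphviz` already produces DOT with full node labels. `tree_edges` and `to_dot` are hypothetical helpers that only make the graph structure explicit.

```python
from collections import deque


def tree_edges(children_left, children_right):
    """Breadth-first list of directed (parent, child, branch) edges.

    Arrays follow scikit-learn's tree_ layout (-1 marks a leaf). In
    scikit-learn the left child receives samples with
    X[:, feature] <= threshold, so that edge is labelled "yes" here.
    """
    edges = []
    queue = deque([0])  # node 0 is the root
    while queue:
        node = queue.popleft()
        left, right = children_left[node], children_right[node]
        if left == -1:
            continue  # leaves have no outgoing edges
        edges.append((int(node), int(left), "yes"))
        edges.append((int(node), int(right), "no"))
        queue.extend([left, right])
    return edges


def to_dot(edges):
    """Render an edge list as Graphviz DOT text."""
    lines = ["digraph tree {"]
    lines += [f'  {p} -> {c} [label="{b}"];' for p, c, b in edges]
    lines.append("}")
    return "\n".join(lines)


children_left = [1, -1, 3, 5, -1, -1, -1]
children_right = [2, -1, 4, 6, -1, -1, -1]
print(to_dot(tree_edges(children_left, children_right)))
```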

~~~
parrt
Heh, that’s a cool idea. Fly through the tree like a maze.

------
nestorD
Not sure about the choice of pie chart as the default leaf format (humans are
bad at guessing proportions from pie charts), but otherwise it does look great
and conveys the information efficiently.

~~~
parrt
Howdy! We use a pie chart for classifier leaves, despite their bad reputation.
For the purpose of indicating purity, the viewer only needs an indication of
whether there is a single strong majority category. The viewer does not need
to see the exact relationship between elements of the pie chart, which is one
key area where pie charts fail.
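The purity question a pie-chart leaf answers ("is there one strong majority class?") can also be stated numerically. Two common measures are the majority fraction and Gini impurity, which is the default `criterion` of scikit-learn's `DecisionTreeClassifier`. Here is a minimal sketch over the per-class sample counts at a leaf; `leaf_purity` is a hypothetical name.

```python
def leaf_purity(class_counts):
    """Return (majority_fraction, gini_impurity) for per-class counts at a leaf."""
    total = sum(class_counts)
    if total == 0:
        raise ValueError("leaf has no samples")
    proportions = [c / total for c in class_counts]
    majority = max(proportions)                   # share of the biggest pie slice
    gini = 1.0 - sum(p * p for p in proportions)  # 0.0 for a perfectly pure leaf
    return majority, gini


print(leaf_purity([45, 5]))   # one dominant class: majority 0.9, Gini about 0.18
print(leaf_purity([25, 25]))  # evenly split: majority 0.5, Gini 0.5
```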

------
swinghu
Good, a lot of information distilled from the drawing.

